Science.gov

Sample records for java sequence alignment

  1. Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV).

    PubMed

    Martin, Andrew C R

    2014-01-01

    The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and 'dotifying' repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/. PMID:25653836

  2. Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV)

    PubMed Central

    Martin, Andrew C. R.

    2014-01-01

    The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and ’dotifying’ repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/. PMID:25653836

  3. JVM: Java Visual Mapping tool for next generation sequencing read.

    PubMed

    Yang, Ye; Liu, Juan

    2015-01-01

    We developed a program JVM (Java Visual Mapping) for mapping next generation sequencing read to reference sequence. The program is implemented in Java and is designed to deal with millions of short read generated by sequence alignment using the Illumina sequencing technology. It employs seed index strategy and octal encoding operations for sequence alignments. JVM is useful for DNA-Seq, RNA-Seq when dealing with single-end resequencing. JVM is a desktop application, which supports reads capacity from 1 MB to 10 GB. PMID:25387956

  4. Pairwise Sequence Alignment Library

    Energy Science and Technology Software Center (ESTSC)

    2015-05-20

    Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with applications in graphics, signal processing, and scientific applications. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprintmore » that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Parallel Sequence Alignment Library (parasail). Parasail features: Reference implementations of all known vectorized sequence alignment approaches. Implementations of Smith Waterman (SW), semi-global (SG), and Needleman Wunsch (NW) sequence alignment algorithms. Implementations across all modern CPU instruction sets including AVX2 and KNC. Language interfaces for C/C++ and Python.« less

  5. Pairwise Sequence Alignment Library

    SciTech Connect

    Jeff Daily, PNNL

    2015-05-20

    Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with applications in graphics, signal processing, and scientific applications. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Parallel Sequence Alignment Library (parasail). Parasail features: Reference implementations of all known vectorized sequence alignment approaches. Implementations of Smith Waterman (SW), semi-global (SG), and Needleman Wunsch (NW) sequence alignment algorithms. Implementations across all modern CPU instruction sets including AVX2 and KNC. Language interfaces for C/C++ and Python.

  6. Alignment-Annotator web server: rendering and annotating sequence alignments

    PubMed Central

    Gille, Christoph; Fähling, Michael; Weyand, Birgit; Wieland, Thomas; Gille, Andreas

    2014-01-01

    Alignment-Annotator is a novel web service designed to generate interactive views of annotated nucleotide and amino acid sequence alignments (i) de novo and (ii) embedded in other software. All computations are performed at server side. Interactivity is implemented in HTML5, a language native to web browsers. The alignment is initially displayed using default settings and can be modified with the graphical user interfaces. For example, individual sequences can be reordered or deleted using drag and drop, amino acid color code schemes can be applied and annotations can be added. Annotations can be made manually or imported (BioDAS servers, the UniProt, the Catalytic Site Atlas and the PDB). Some edits take immediate effect while others require server interaction and may take a few seconds to execute. The final alignment document can be downloaded as a zip-archive containing the HTML files. Because of the use of HTML the resulting interactive alignment can be viewed on any platform including Windows, Mac OS X, Linux, Android and iOS in any standard web browser. Importantly, no plugins nor Java are required and therefore Alignment-Anotator represents the first interactive browser-based alignment visualization. Availability: http://www.bioinformatics.org/strap/aa/ and http://strap.charite.de/aa/. PMID:24813445

  7. Sequence alignment with tandem duplication

    SciTech Connect

    Benson, G.

    1997-12-01

    Algorithm development for comparing and aligning biological sequences has, until recently, been based on the SI model of mutational events which assumes that modification of sequences proceeds through any of the operations of substitution, insertion or deletion (the latter two collectively termed indels). While this model has worked farily well, it has long been apparent that other mutational events occur. In this paper, we introduce a new model, the DSI model which includes another common mutational event, tandem duplication. Tandem duplication produces tandem repeats which are common in DNA, making up perhaps 10% of the human genome. They are responsible for some human diseases and may serve a multitude of functions in DNA regulation and evolution. Using the DSI model, we develop new exact and heuristic algorithms for comparing and aligning DNA sequences when they contain tandem repeats. 30 refs., 3 figs.

  8. Global Alignment System for Large Genomic Sequencing

    Energy Science and Technology Software Center (ESTSC)

    2002-03-01

    AVID is a global alignment system tailored for the alignment of large genomic sequences up to megabases in length. Features include the possibility of one sequence being in draft form, fast alignment, robustness and accuracy. The method is an anchor based alignment using maximal matches derived from suffix trees.

  9. Conditional alignment random fields for multiple motion sequence alignment.

    PubMed

    Kim, Minyoung

    2013-11-01

    We consider the multiple time-series alignment problem, typically focusing on the task of synchronizing multiple motion videos of the same kind of human activity. Finding an optimal global alignment of multiple sequences is infeasible, while there have been several approximate solutions, including iterative pairwise warping algorithms and variants of hidden Markov models. In this paper, we propose a novel probabilistic model that represents the conditional densities of the latent target sequences which are aligned with the given observed sequences through the hidden alignment variables. By imposing certain constraints on the target sequences at the learning stage, we have a sensible model for multiple alignments that can be learned very efficiently by the EM algorithm. Compared to existing methods, our approach yields more accurate alignment while being more robust to local optima and initial configurations. We demonstrate its efficacy on both synthetic and real-world motion videos including facial emotions and human activities. PMID:24051737

  10. Robust temporal alignment of multimodal cardiac sequences

    NASA Astrophysics Data System (ADS)

    Perissinotto, Andrea; Queirós, Sandro; Morais, Pedro; Baptista, Maria J.; Monaghan, Mark; Rodrigues, Nuno F.; D'hooge, Jan; Vilaça, João. L.; Barbosa, Daniel

    2015-03-01

    Given the dynamic nature of cardiac function, correct temporal alignment of pre-operative models and intraoperative images is crucial for augmented reality in cardiac image-guided interventions. As such, the current study focuses on the development of an image-based strategy for temporal alignment of multimodal cardiac imaging sequences, such as cine Magnetic Resonance Imaging (MRI) or 3D Ultrasound (US). First, we derive a robust, modality-independent signal from the image sequences, estimated by computing the normalized cross-correlation between each frame in the temporal sequence and the end-diastolic frame. This signal is a resembler for the left-ventricle (LV) volume curve over time, whose variation indicates different temporal landmarks of the cardiac cycle. We then perform the temporal alignment of these surrogate signals derived from MRI and US sequences of the same patient through Dynamic Time Warping (DTW), allowing to synchronize both sequences. The proposed framework was evaluated in 98 patients, which have undergone both 3D+t MRI and US scans. The end-systolic frame could be accurately estimated as the minimum of the image-derived surrogate signal, presenting a relative error of 1.6 +/- 1.9% and 4.0 +/- 4.2% for the MRI and US sequences, respectively, thus supporting its association with key temporal instants of the cardiac cycle. The use of DTW reduces the desynchronization of the cardiac events in MRI and US sequences, allowing to temporally align multimodal cardiac imaging sequences. Overall, a generic, fast and accurate method for temporal synchronization of MRI and US sequences of the same patient was introduced. This approach could be straightforwardly used for the correct temporal alignment of pre-operative MRI information and intra-operative US images.

  11. Image correlation method for DNA sequence alignment.

    PubMed

    Curilem Saldías, Millaray; Villarroel Sassarini, Felipe; Muñoz Poblete, Carlos; Vargas Vásquez, Asticio; Maureira Butler, Iván

    2012-01-01

    The complexity of searches and the volume of genomic data make sequence alignment one of bioinformatics most active research areas. New alignment approaches have incorporated digital signal processing techniques. Among these, correlation methods are highly sensitive. This paper proposes a novel sequence alignment method based on 2-dimensional images, where each nucleic acid base is represented as a fixed gray intensity pixel. Query and known database sequences are coded to their pixel representation and sequence alignment is handled as object recognition in a scene problem. Query and database become object and scene, respectively. An image correlation process is carried out in order to search for the best match between them. Given that this procedure can be implemented in an optical correlator, the correlation could eventually be accomplished at light speed. This paper shows an initial research stage where results were "digitally" obtained by simulating an optical correlation of DNA sequences represented as images. A total of 303 queries (variable lengths from 50 to 4500 base pairs) and 100 scenes represented by 100 x 100 images each (in total, one million base pair database) were considered for the image correlation analysis. The results showed that correlations reached very high sensitivity (99.01%), specificity (98.99%) and outperformed BLAST when mutation numbers increased. However, digital correlation processes were hundred times slower than BLAST. We are currently starting an initiative to evaluate the correlation speed process of a real experimental optical correlator. By doing this, we expect to fully exploit optical correlation light properties. As the optical correlator works jointly with the computer, digital algorithms should also be optimized. The results presented in this paper are encouraging and support the study of image correlation methods on sequence alignment. PMID:22761742

  12. DNA Sequence Chromatogram Browsing Using JAVA and CORBA

    PubMed Central

    Parsons, Jeremy D.; Buehler, Eugen; Hillier, LaDeana

    1999-01-01

    DNA sequence chromatograms (traces) are the primary data source for all large-scale genomic and expressed sequence tags (ESTs) sequencing projects. Access to the sequencing trace assists many later analyses, for example contig assembly and polymorphism detection, but obtaining and using traces is problematic. Traces are not collected and published centrally, they are much larger than the base calls derived from them, and viewing them requires the interactivity of a local graphical client with local data. To provide efficient global access to DNA traces, we developed a client/server system based on flexible Java components integrated into other applications including an applet for use in a WWW browser and a stand-alone trace viewer. Client/server interaction is facilitated by CORBA middleware which provides a well-defined interface, a naming service, and location independence. [The software is packaged as a Jar file available from the following URL: http://www.ebi.ac.uk/∼jparsons. Links to working examples of the trace viewers can be found at http://corba.ebi.ac.uk/EST. All the Washington University mouse EST traces are available for browsing at the same URL.] PMID:10077534

  13. EzEditor: a versatile sequence alignment editor for both rRNA- and protein-coding genes.

    PubMed

    Jeon, Yoon-Seong; Lee, Kihyun; Park, Sang-Cheol; Kim, Bong-Soo; Cho, Yong-Joon; Ha, Sung-Min; Chun, Jongsik

    2014-02-01

    EzEditor is a Java-based molecular sequence editor allowing manipulation of both DNA and protein sequence alignments for phylogenetic analysis. It has multiple features optimized to connect initial computer-generated multiple alignment and subsequent phylogenetic analysis by providing manual editing with reference to biological information specific to the genes under consideration. It provides various functionalities for editing rRNA alignments using secondary structure information. In addition, it supports simultaneous editing of both DNA sequences and their translated protein sequences for protein-coding genes. EzEditor is, to our knowledge, the first sequence editing software designed for both rRNA- and protein-coding genes with the visualization of biologically relevant information and should be useful in molecular phylogenetic studies. EzEditor is based on Java, can be run on all major computer operating systems and is freely available from http://sw.ezbiocloud.net/ezeditor/. PMID:24425826

  14. Regular language constrained sequence alignment revisited.

    PubMed

    Kucherov, Gregory; Pinhas, Tamar; Ziv-Ukelson, Michal

    2011-05-01

    Imposing constraints in the form of a finite automaton or a regular expression is an effective way to incorporate additional a priori knowledge into sequence alignment procedures. With this motivation, the Regular Expression Constrained Sequence Alignment Problem was introduced, which proposed an O(n²t⁴) time and O(n²t²) space algorithm for solving it, where n is the length of the input strings and t is the number of states in the input non-deterministic automaton. A faster O(n²t³) time algorithm for the same problem was subsequently proposed. In this article, we further speed up the algorithms for Regular Language Constrained Sequence Alignment by reducing their worst case time complexity bound to O(n²t³)/log t). This is done by establishing an optimal bound on the size of Straight-Line Programs solving the maxima computation subproblem of the basic dynamic programming algorithm. We also study another solution based on a Steiner Tree computation. While it does not improve the worst case, our simulations show that both approaches are efficient in practice, especially when the input automata are dense. PMID:21554020

  15. Multiple Sequence Alignment Based on Chaotic PSO

    NASA Astrophysics Data System (ADS)

    Lei, Xiu-Juan; Sun, Jing-Jing; Ma, Qian-Zhi

    This paper introduces a new improved algorithm called chaotic PSO (CPSO) based on the thought of chaos optimization to solve multiple sequence alignment. For one thing, the chaotic variables are generated between 0 and 1 when initializing the population so that the particles are distributed uniformly in the solution space. For another thing, the chaotic sequences are generated using the Logistic mapping function in order to make chaotic search and strengthen the diversity of the population. The simulation results of several benchmark data sets of BAliBase show that the improved algorithm is effective and has good performances for the data sets with different similarity.

  16. Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification

    PubMed Central

    Borozan, Ivan; Watt, Stuart; Ferretti, Vincent

    2015-01-01

    Motivation: Alignment-based sequence similarity searches, while accurate for some type of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim to improve the accuracy with which DNA and protein sequences are characterized. Results: Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. Availability and implementation: All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25573913

  17. DNA Sequence Alignment during Homologous Recombination.

    PubMed

    Greene, Eric C

    2016-05-27

    Homologous recombination allows for the regulated exchange of genetic information between two different DNA molecules of identical or nearly identical sequence composition, and is a major pathway for the repair of double-stranded DNA breaks. A key facet of homologous recombination is the ability of recombination proteins to perfectly align the damaged DNA with homologous sequence located elsewhere in the genome. This reaction is referred to as the homology search and is akin to the target searches conducted by many different DNA-binding proteins. Here I briefly highlight early investigations into the homology search mechanism, and then describe more recent research. Based on these studies, I summarize a model that includes a combination of intersegmental transfer, short-distance one-dimensional sliding, and length-specific microhomology recognition to efficiently align DNA sequences during the homology search. I also suggest some future directions to help further our understanding of the homology search. Where appropriate, I direct the reader to other recent reviews describing various issues related to homologous recombination. PMID:27129270

  18. A Rank-Based Sequence Aligner with Applications in Phylogenetic Analysis

    PubMed Central

    2014-01-01

    Recent tools for aligning short DNA reads have been designed to optimize the trade-off between correctness and speed. This paper introduces a method for assigning a set of short DNA reads to a reference genome, under Local Rank Distance (LRD). The rank-based aligner proposed in this work aims to improve correctness over speed. However, some indexing strategies to speed up the aligner are also investigated. The LRD aligner is improved in terms of speed by storing -mer positions in a hash table for each read. Another improvement, that produces an approximate LRD aligner, is to consider only the positions in the reference that are likely to represent a good positional match of the read. The proposed aligner is evaluated and compared to other state of the art alignment tools in several experiments. A set of experiments are conducted to determine the precision and the recall of the proposed aligner, in the presence of contaminated reads. In another set of experiments, the proposed aligner is used to find the order, the family, or the species of a new (or unknown) organism, given only a set of short Next-Generation Sequencing DNA reads. The empirical results show that the aligner proposed in this work is highly accurate from a biological point of view. Compared to the other evaluated tools, the LRD aligner has the important advantage of being very accurate even for a very low base coverage. Thus, the LRD aligner can be considered as a good alternative to standard alignment tools, especially when the accuracy of the aligner is of high importance. Source code and UNIX binaries of the aligner are freely available for future development and use at http://lrd.herokuapp.com/aligners. The software is implemented in C++ and Java, being supported on UNIX and MS Windows. PMID:25133391

  19. PSAR: measuring multiple sequence alignment reliability by probabilistic sampling

    PubMed Central

    Kim, Jaebum; Ma, Jian

    2011-01-01

    Multiple sequence alignment, which is of fundamental importance for comparative genomics, is a difficult problem and error-prone. Therefore, it is essential to measure the reliability of the alignments and incorporate it into downstream analyses. We propose a new probabilistic sampling-based alignment reliability (PSAR) score. Instead of relying on heuristic assumptions, such as the correlation between alignment quality and guide tree uncertainty in progressive alignment methods, we directly generate suboptimal alignments from an input multiple sequence alignment by a probabilistic sampling method, and compute the agreement of the input alignment with the suboptimal alignments as the alignment reliability score. We construct the suboptimal alignments by an approximate method that is based on pairwise comparisons between each single sequence and the sub-alignment of the input alignment where the chosen sequence is left out. By using simulation-based benchmarks, we find that our approach is superior to existing ones, supporting that the suboptimal alignments are highly informative source for assessing alignment reliability. We apply the PSAR method to the alignments in the UCSC Genome Browser to measure the reliability of alignments in different types of regions, such as coding exons and conserved non-coding regions, and use it to guide cross-species conservation study. PMID:21576232

  20. PROMALS web server for accurate multiple protein sequence alignments.

    PubMed

    Pei, Jimin; Kim, Bong-Hyun; Tang, Ming; Grishin, Nick V

    2007-07-01

    Multiple sequence alignments are essential in homology inference, structure modeling, functional prediction and phylogenetic analysis. We developed a web server that constructs multiple protein sequence alignments using PROMALS, a progressive method that improves alignment quality by using additional homologs from PSI-BLAST searches and secondary structure predictions from PSIPRED. PROMALS shows higher alignment accuracy than other advanced methods, such as MUMMALS, ProbCons, MAFFT and SPEM. The PROMALS web server takes FASTA format protein sequences as input. The output includes a colored alignment augmented with information about sequence grouping, predicted secondary structures and positional conservation. The PROMALS web server is available at: http://prodata.swmed.edu/promals/ PMID:17452345

  1. Blasting and Zipping: Sequence Alignment and Mutual Information

    NASA Astrophysics Data System (ADS)

    Penner, Orion; Grassberger, Peter; Paczuski, Maya

    2009-03-01

    Alignment of biological sequences such as DNA, RNA or proteins is one of the most widely used tools in computational bioscience. While the accomplishments of sequence alignment algorithms are undeniable the fact remains that these algorithms are based upon heuristic scoring schemes. Therefore, these algorithms do not provide model independent and objective measures for how similar two (or more) sequences actually are. Although information theory provides such a similarity measure - the mutual information (MI) - numerous previous attempts to connect sequence alignment and information have not produced realistic estimates for the MI from a given alignment. We report on a simple and flexible approach to get robust estimates of MI from global alignments. The presented results may help establish MI as a reliable tool for evaluating the quality of global alignments, judging the relative merits of different alignment algorithms, and estimating the significance of specific alignments.

  2. Cover song identification by sequence alignment algorithms

    NASA Astrophysics Data System (ADS)

    Wang, Chih-Li; Zhong, Qian; Wang, Szu-Ying; Roychowdhury, Vwani

    2011-10-01

    Content-based music analysis has drawn much attention due to the rapidly growing digital music market. This paper describes a method that can be used to effectively identify cover songs. A cover song is a song that preserves only the crucial melody of its reference song but different in some other acoustic properties. Hence, the beat/chroma-synchronous chromagram, which is insensitive to the variation of the timber or rhythm of songs but sensitive to the melody, is chosen. The key transposition is achieved by cyclically shifting the chromatic domain of the chromagram. By using the Hidden Markov Model (HMM) to obtain the time sequences of songs, the system is made even more robust. Similar structure or length between the cover songs and its reference are not necessary by the Smith-Waterman Alignment Algorithm.

  3. On Quantum Algorithm for Multiple Alignment of Amino Acid Sequences

    NASA Astrophysics Data System (ADS)

    Iriyama, Satoshi; Ohya, Masanori

    2009-02-01

    The alignment of genome sequences or amino acid sequences is one of fundamental operations for the study of life. Usual computational complexity for the multiple alignment of N sequences with common length L by dynamic programming is O(LN). This alignment is considered as one of the NP problems, so that it is desirable to find a nice algorithm of the multiple alignment. Thus in this paper we propose the quantum algorithm for the multiple alignment based on the works12,1,2 in which the NP complete problem was shown to be the P problem by means of quantum algorithm and chaos information dynamics.

  4. NSeq: a multithreaded Java application for finding positioned nucleosomes from sequencing data

    PubMed Central

    Nellore, Abhinav; Bobkov, Konstantin; Howe, Elizabeth; Pankov, Aleksandr; Diaz, Aaron; Song, Jun S.

    2012-01-01

    We introduce NSeq, a fast and efficient Java application for finding positioned nucleosomes from the high-throughput sequencing of MNase-digested mononucleosomal DNA. NSeq includes a user-friendly graphical interface, computes false discovery rates (FDRs) for candidate nucleosomes from Monte Carlo simulations, plots nucleosome coverage and centers, and exploits the availability of multiple processor cores by parallelizing its computations. Java binaries and source code are freely available at https://github.com/songlab/NSeq. The software is supported on all major platforms equipped with Java Runtime Environment 6 or later. PMID:23335939

  5. MULTAN: a program to align multiple DNA sequences.

    PubMed Central

    Bains, W

    1986-01-01

    I describe a computer program which can align a large number of nucleic acid sequences with one another. The program uses an heuristic, iterative algorithm which has been tested extensively, and is found to produce useful alignments of a variety of sequence families. The algorithm is fast enough to be practical for the analysis of large number of sequences, and is implemented in a program which contains a variety of other functions to facilitate the analysis of the aligned result. PMID:3003672

  6. Local alignment of two-base encoded DNA sequence

    PubMed Central

    Homer, Nils; Merriman, Barry; Nelson, Stanley F

    2009-01-01

    Background DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity. Results We present an extension of the standard dynamic programming method for local alignment, which simultaneously decodes the data and performs the alignment, maximizing a similarity score based on a weighted combination of errors and edits, and allowing an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two base encoded alignment method and contrast those with standard DNA sequence alignment under the same conditions. Conclusion The new local alignment algorithm for two-base encoded data has substantial power to properly detect and correct measurement errors while identifying underlying sequence variants, and facilitating genome re-sequencing efforts based on this form of sequence data. PMID:19508732

  7. Probabilistic sequence alignment of stratigraphic records

    NASA Astrophysics Data System (ADS)

    Lin, Luan; Khider, Deborah; Lisiecki, Lorraine E.; Lawrence, Charles E.

    2014-10-01

    The assessment of age uncertainty in stratigraphically aligned records is a pressing need in paleoceanographic research. The alignment of ocean sediment cores is used to develop mutually consistent age models for climate proxies and is often based on the δ18O of calcite from benthic foraminifera, which records a global ice volume and deep water temperature signal. To date, δ18O alignment has been performed by manual, qualitative comparison or by deterministic algorithms. Here we present a hidden Markov model (HMM) probabilistic algorithm to find 95% confidence bands for δ18O alignment. This model considers the probability of every possible alignment based on its fit to the δ18O data and transition probabilities for sedimentation rate changes obtained from radiocarbon-based estimates for 37 cores. Uncertainty is assessed using a stochastic back trace recursion to sample alignments in exact proportion to their probability. We applied the algorithm to align 35 late Pleistocene records to a global benthic δ18O stack and found that the mean width of 95% confidence intervals varies between 3 and 23 kyr depending on the resolution and noisiness of the record's δ18O signal. Confidence bands within individual cores also vary greatly, ranging from ~0 to >40 kyr. These alignment uncertainty estimates will allow researchers to examine the robustness of their conclusions, including the statistical evaluation of lead-lag relationships between events observed in different cores.

  8. Quantifying the Displacement of Mismatches in Multiple Sequence Alignment Benchmarks

    PubMed Central

    Bawono, Punto; van der Velde, Arjan; Abeln, Sanne; Heringa, Jaap

    2015-01-01

    Multiple Sequence Alignment (MSA) methods are typically benchmarked on sets of reference alignments. The quality of the alignment can then be represented by the sum-of-pairs (SP) or column (CS) scores, which measure the agreement between a reference and corresponding query alignment. Both the SP and CS scores treat mismatches between a query and reference alignment as equally bad, and do not take the separation into account between two amino acids in the query alignment, that should have been matched according to the reference alignment. This is significant since the magnitude of alignment shifts is often of relevance in biological analyses, including homology modeling and MSA refinement/manual alignment editing. In this study we develop a new alignment benchmark scoring scheme, SPdist, that takes the degree of discordance of mismatches into account by measuring the sequence distance between mismatched residue pairs in the query alignment. Using this new score along with the standard SP score, we investigate the discriminatory behavior of the new score by assessing how well six different MSA methods perform with respect to BAliBASE reference alignments. The SP score and the SPdist score yield very similar outcomes when the reference and query alignments are close. However, for more divergent reference alignments the SPdist score is able to distinguish between methods that keep alignments approximately close to the reference and those exhibiting larger shifts. We observed that by using SPdist together with SP scoring we were able to better delineate the alignment quality difference between alternative MSA methods. With a case study we exemplify why it is important, from a biological perspective, to consider the separation of mismatches. The SPdist scoring scheme has been implemented in the VerAlign web server (http://www.ibi.vu.nl/programs/veralignwww/). The code for calculating SPdist score is also available upon request. PMID:25993129

  9. An enhanced algorithm for multiple sequence alignment of protein sequences using genetic algorithm

    PubMed Central

    Kumar, Manish

    2015-01-01

    One of the most fundamental operations in biological sequence analysis is multiple sequence alignment (MSA). The basic of multiple sequence alignment problems is to determine the most biologically plausible alignments of protein or DNA sequences. In this paper, an alignment method using genetic algorithm for multiple sequence alignment has been proposed. Two different genetic operators mainly crossover and mutation were defined and implemented with the proposed method in order to know the population evolution and quality of the sequence aligned. The proposed method is assessed with protein benchmark dataset, e.g., BALIBASE, by comparing the obtained results to those obtained with other alignment algorithms, e.g., SAGA, RBT-GA, PRRP, HMMT, SB-PIMA, CLUSTALX, CLUSTAL W, DIALIGN and PILEUP8 etc. Experiments on a wide range of data have shown that the proposed algorithm is much better (it terms of score) than previously proposed algorithms in its ability to achieve high alignment quality. PMID:27065770

  10. An enhanced algorithm for multiple sequence alignment of protein sequences using genetic algorithm.

    PubMed

    Kumar, Manish

    2015-01-01

    One of the most fundamental operations in biological sequence analysis is multiple sequence alignment (MSA). The basic of multiple sequence alignment problems is to determine the most biologically plausible alignments of protein or DNA sequences. In this paper, an alignment method using genetic algorithm for multiple sequence alignment has been proposed. Two different genetic operators mainly crossover and mutation were defined and implemented with the proposed method in order to know the population evolution and quality of the sequence aligned. The proposed method is assessed with protein benchmark dataset, e.g., BALIBASE, by comparing the obtained results to those obtained with other alignment algorithms, e.g., SAGA, RBT-GA, PRRP, HMMT, SB-PIMA, CLUSTALX, CLUSTAL W, DIALIGN and PILEUP8 etc. Experiments on a wide range of data have shown that the proposed algorithm is much better (it terms of score) than previously proposed algorithms in its ability to achieve high alignment quality. PMID:27065770

  11. A simple method to control over-alignment in the MAFFT multiple sequence alignment program

    PubMed Central

    Katoh, Kazutaka; Standley, Daron M.

    2016-01-01

    Motivation: We present a new feature of the MAFFT multiple alignment program for suppressing over-alignment (aligning unrelated segments). Conventional MAFFT is highly sensitive in aligning conserved regions in remote homologs, but the risk of over-alignment is recently becoming greater, as low-quality or noisy sequences are increasing in protein sequence databases, due, for example, to sequencing errors and difficulty in gene prediction. Results: The proposed method utilizes a variable scoring matrix for different pairs of sequences (or groups) in a single multiple sequence alignment, based on the global similarity of each pair. This method significantly increases the correctly gapped sites in real examples and in simulations under various conditions. Regarding sensitivity, the effect of the proposed method is slightly negative in real protein-based benchmarks, and mostly neutral in simulation-based benchmarks. This approach is based on natural biological reasoning and should be compatible with many methods based on dynamic programming for multiple sequence alignment. Availability and implementation: The new feature is available in MAFFT versions 7.263 and higher. http://mafft.cbrc.jp/alignment/software/ Contact: katoh@ifrec.osaka-u.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153688

  12. An information theoretic approach to macromolecular modeling: I. Sequence alignments.

    PubMed

    Aynechi, Tiba; Kuntz, Irwin D

    2005-11-01

    We are interested in applying the principles of information theory to structural biology calculations. In this article, we explore the information content of an important computational procedure: sequence alignment. Using a reference state developed from exhaustive sequences, we measure alignment statistics and evaluate gap penalties based on first-principle considerations and gap distributions. We show that there are different gap penalties for different alphabet sizes and that the gap penalties can depend on the length of the sequences being aligned. In a companion article, we examine the information content of molecular force fields. PMID:16254389

  13. Improving multiple sequence alignment by using better guide trees

    PubMed Central

    2015-01-01

    Progressive sequence alignment is one of the most commonly used method for multiple sequence alignment. Roughly speaking, the method first builds a guide tree, and then aligns the sequences progressively according to the topology of the tree. It is believed that guide trees are very important to progressive alignment; a better guide tree will give an alignment with higher accuracy. Recently, we have proposed an adaptive method for constructing guide trees. This paper studies the quality of the guide trees constructed by such method. Our study showed that our adaptive method can be used to improve the accuracy of many different progressive MSA tools. In fact, we give evidences showing that the guide trees constructed by the adaptive method are among the best. PMID:25859903

  14. Look-Align: an interactive web-based multiple sequence alignment viewer with polymorphism analysis support

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We have developed Look-Align, an interactive web-based viewer to display pre-computed multiple sequence alignments. Although initially developed to support the visualization needs of the maize diversity website Panzea (http://www.panzea.org), the viewer is a generic stand-alone tool that can be easi...

  15. Refinement by shifting secondary structure elements improves sequence alignments.

    PubMed

    Tong, Jing; Pei, Jimin; Otwinowski, Zbyszek; Grishin, Nick V

    2015-03-01

    Constructing a model of a query protein based on its alignment to a homolog with experimentally determined spatial structure (the template) is still the most reliable approach to structure prediction. Alignment errors are the main bottleneck for homology modeling when the query is distantly related to the template. Alignment methods often misalign secondary structural elements by a few residues. Therefore, better alignment solutions can be found within a limited set of local shifts of secondary structures. We present a refinement method to improve pairwise sequence alignments by evaluating alignment variants generated by local shifts of template-defined secondary structures. Our method SFESA is based on a novel scoring function that combines the profile-based sequence score and the structure score derived from residue contacts in a template. Such a combined score frequently selects a better alignment variant among a set of candidate alignments generated by local shifts and leads to overall increase in alignment accuracy. Evaluation of several benchmarks shows that our refinement method significantly improves alignments made by automatic methods such as PROMALS, HHpred and CNFpred. The web server is available at http://prodata.swmed.edu/sfesa. PMID:25546158

  16. Refinement by shifting secondary structure elements improves sequence alignments

    PubMed Central

    Tong, Jing; Pei, Jimin; Otwinowski, Zbyszek; Grishin, Nick V.

    2015-01-01

    Constructing a model of a query protein based on its alignment to a homolog with experimentally determined spatial structure (the template) is still the most reliable approach to structure prediction. Alignment errors are the main bottleneck for homology modeling when the query is distantly related to the template. Alignment methods often misalign secondary structural elements by a few residues. Therefore, better alignment solutions can be found within a limited set of local shifts of secondary structures. We present a refinement method to improve pairwise sequence alignments by evaluating alignment variants generated by local shifts of template-defined secondary structures. Our method SFESA is based on a novel scoring function that combines the profile-based sequence score and the structure score derived from residue contacts in a template. Such a combined score frequently selects a better alignment variant among a set of candidate alignments generated by local shifts and leads to overall increase in alignment accuracy. Evaluation of several benchmarks shows that our refinement method significantly improves alignments made by automatic methods such as PROMALS, HHpred and CNFpred. The web server is available at http://prodata.swmed.edu/sfesa. PMID:25546158

  17. Protein multiple sequence alignment by hybrid bio-inspired algorithms.

    PubMed

    Cutello, Vincenzo; Nicosia, Giuseppe; Pavone, Mario; Prizzi, Igor

    2011-03-01

    This article presents an immune inspired algorithm to tackle the Multiple Sequence Alignment (MSA) problem. MSA is one of the most important tasks in biological sequence analysis. Although this paper focuses on protein alignments, most of the discussion and methodology may also be applied to DNA alignments. The problem of finding the multiple alignment was investigated in the study by Bonizzoni and Vedova and Wang and Jiang, and proved to be a NP-hard (non-deterministic polynomial-time hard) problem. The presented algorithm, called Immunological Multiple Sequence Alignment Algorithm (IMSA), incorporates two new strategies to create the initial population and specific ad hoc mutation operators. It is based on the 'weighted sum of pairs' as objective function, to evaluate a given candidate alignment. IMSA was tested using both classical benchmarks of BAliBASE (versions 1.0, 2.0 and 3.0), and experimental results indicate that it is comparable with state-of-the-art multiple alignment algorithms, in terms of quality of alignments, weighted Sums-of-Pairs (SP) and Column Score (CS) values. The main novelty of IMSA is its ability to generate more than a single suboptimal alignment, for every MSA instance; this behaviour is due to the stochastic nature of the algorithm and of the populations evolved during the convergence process. This feature will help the decision maker to assess and select a biologically relevant multiple sequence alignment. Finally, the designed algorithm can be used as a local search procedure to properly explore promising alignments of the search space. PMID:21071394

  18. Scoring consensus of multiple ECG annotators by optimal sequence alignment.

    PubMed

    Haghpanahi, Masoumeh; Sameni, Reza; Borkholder, David A

    2014-01-01

    Development of ECG delineation algorithms has been an area of intense research in the field of computational cardiology for the past few decades. However, devising evaluation techniques for scoring and/or merging the results of such algorithms, both in the presence or absence of gold standards, still remains as a challenge. This is mainly due to existence of missed or erroneous determination of fiducial points in the results of different annotation algorithms. The discrepancy between different annotators increases when the reference signal includes arrhythmias or significant noise and its morphology deviates from a clean ECG signal. In this work, we propose a new approach to evaluate and compare the results of different annotators under such conditions. Specifically, we use sequence alignment techniques similar to those used in bioinformatics for the alignment of gene sequences. Our approach is based on dynamic programming where adequate mismatch penalties, depending on the type of the fiducial point and the underlying signal, are defined to optimally align the annotation sequences. We also discuss how to extend the algorithm for more than two sequences by using suitable data structures to align multiple annotation sequences with each other. Once the sequences are aligned, different heuristics are devised to evaluate the performance against a gold standard annotation, or to merge the results of multiple annotations when no gold standard exists. PMID:25570339

  19. Mercury BLASTP: Accelerating Protein Sequence Alignment.

    PubMed

    Jacob, Arpith; Lancaster, Joseph; Buhler, Jeremy; Harris, Brandon; Chamberlain, Roger D

    2008-06-01

    Large-scale protein sequence comparison is an important but compute-intensive task in molecular biology. BLASTP is the most popular tool for comparative analysis of protein sequences. In recent years, an exponential increase in the size of protein sequence databases has required either exponentially more running time or a cluster of machines to keep pace. To address this problem, we have designed and built a high-performance FPGA-accelerated version of BLASTP, Mercury BLASTP. In this paper, we describe the architecture of the portions of the application that are accelerated in the FPGA, and we also describe the integration of these FPGA-accelerated portions with the existing BLASTP software. We have implemented Mercury BLASTP on a commodity workstation with two Xilinx Virtex-II 6000 FPGAs. We show that the new design runs 11-15 times faster than software BLASTP on a modern CPU while delivering close to 99% identical results. PMID:19492068

  20. Mercury BLASTP: Accelerating Protein Sequence Alignment

    PubMed Central

    Jacob, Arpith; Lancaster, Joseph; Buhler, Jeremy; Harris, Brandon; Chamberlain, Roger D.

    2008-01-01

    Large-scale protein sequence comparison is an important but compute-intensive task in molecular biology. BLASTP is the most popular tool for comparative analysis of protein sequences. In recent years, an exponential increase in the size of protein sequence databases has required either exponentially more running time or a cluster of machines to keep pace. To address this problem, we have designed and built a high-performance FPGA-accelerated version of BLASTP, Mercury BLASTP. In this paper, we describe the architecture of the portions of the application that are accelerated in the FPGA, and we also describe the integration of these FPGA-accelerated portions with the existing BLASTP software. We have implemented Mercury BLASTP on a commodity workstation with two Xilinx Virtex-II 6000 FPGAs. We show that the new design runs 11-15 times faster than software BLASTP on a modern CPU while delivering close to 99% identical results. PMID:19492068

  1. Evaluating the Accuracy and Efficiency of Multiple Sequence Alignment Methods

    PubMed Central

    Pervez, Muhammad Tariq; Babar, Masroor Ellahi; Nadeem, Asif; Aslam, Muhammad; Awan, Ali Raza; Aslam, Naeem; Hussain, Tanveer; Naveed, Nasir; Qadri, Salman; Waheed, Usman; Shoaib, Muhammad

    2014-01-01

    A comparison of 10 most popular Multiple Sequence Alignment (MSA) tools, namely, MUSCLE, MAFFT(L-INS-i), MAFFT (FFT-NS-2), T-Coffee, ProbCons, SATe, Clustal Omega, Kalign, Multalin, and Dialign-TX is presented. We also focused on the significance of some implementations embedded in algorithm of each tool. Based on 10 simulated trees of different number of taxa generated by R, 400 known alignments and sequence files were constructed using indel-Seq-Gen. A total of 4000 test alignments were generated to study the effect of sequence length, indel size, deletion rate, and insertion rate. Results showed that alignment quality was highly dependent on the number of deletions and insertions in the sequences and that the sequence length and indel size had a weaker effect. Overall, ProbCons was consistently on the top of list of the evaluated MSA tools. SATe, being little less accurate, was 529.10% faster than ProbCons and 236.72% faster than MAFFT(L-INS-i). Among other tools, Kalign and MUSCLE achieved the highest sum of pairs. We also considered BALiBASE benchmark datasets and the results relative to BAliBASE- and indel-Seq-Gen-generated alignments were consistent in the most cases. PMID:25574120

  2. Recursive dynamic programming for adaptive sequence and structure alignment

    SciTech Connect

    Thiele, R.; Zimmer, R.; Lengauer, T.

    1995-12-31

    We propose a new alignment procedure that is capable of aligning protein sequences and structures in a unified manner. Recursive dynamic programming (RDP) is a hierarchical method which, on each level of the hierarchy, identifies locally optimal solutions and assembles them into partial alignments of sequences and/or structures. In contrast to classical dynamic programming, RDP can also handle alignment problems that use objective functions not obeying the principle of prefix optimality, e.g. scoring schemes derived from energy potentials of mean force. For such alignment problems, RDP aims at computing solutions that are near-optimal with respect to the involved cost function and biologically meaningful at the same time. Towards this goal, RDP maintains a dynamic balance between different factors governing alignment fitness such as evolutionary relationships and structural preferences. As in the RDP method gaps are not scored explicitly, the problematic assignment of gap cost parameters is circumvented. In order to evaluate the RDP approach we analyse whether known and accepted multiple alignments based on structural information can be reproduced with the RDP method.

  3. Image-based temporal alignment of echocardiographic sequences

    NASA Astrophysics Data System (ADS)

    Danudibroto, Adriyana; Bersvendsen, Jørn; Mirea, Oana; Gerard, Olivier; D'hooge, Jan; Samset, Eigil

    2016-04-01

    Temporal alignment of echocardiographic sequences enables fair comparisons of multiple cardiac sequences by showing corresponding frames at given time points in the cardiac cycle. It is also essential for spatial registration of echo volumes where several acquisitions are combined for enhancement of image quality or forming larger field of view. In this study, three different image-based temporal alignment methods were investigated. First, a method based on dynamic time warping (DTW). Second, a spline-based method that optimized the similarity between temporal characteristic curves of the cardiac cycle using 1D cubic B-spline interpolation. Third, a method based on the spline-based method with piecewise modification. These methods were tested on in-vivo data sets of 19 echo sequences. For each sequence, the mitral valve opening (MVO) time was manually annotated. The results showed that the average MVO timing error for all methods are well under the time resolution of the sequences.

  4. Heuristic reusable dynamic programming: efficient updates of local sequence alignment.

    PubMed

    Hong, Changjin; Tewfik, Ahmed H

    2009-01-01

    Recomputation of the previously evaluated similarity results between biological sequences becomes inevitable when researchers realize errors in their sequenced data or when the researchers have to compare nearly similar sequences, e.g., in a family of proteins. We present an efficient scheme for updating local sequence alignments with an affine gap model. In principle, using the previous matching result between two amino acid sequences, we perform a forward-backward alignment to generate heuristic searching bands which are bounded by a set of suboptimal paths. Given a correctly updated sequence, we initially predict a new score of the alignment path for each contour to select the best candidates among them. Then, we run the Smith-Waterman algorithm in this confined space. Furthermore, our heuristic alignment for an updated sequence shows that it can be further accelerated by using reusable dynamic programming (rDP), our prior work. In this study, we successfully validate "relative node tolerance bound" (RNTB) in the pruned searching space. Furthermore, we improve the computational performance by quantifying the successful RNTB tolerance probability and switch to rDP on perturbation-resilient columns only. In our searching space derived by a threshold value of 90 percent of the optimal alignment score, we find that 98.3 percent of contours contain correctly updated paths. We also find that our method consumes only 25.36 percent of the runtime cost of sparse dynamic programming (sDP) method, and to only 2.55 percent of that of a normal dynamic programming with the Smith-Waterman algorithm. PMID:19875856

  5. A novel approach to multiple sequence alignment using hadoop data grids.

    PubMed

    Sudha Sadasivam, G; Baktavatchalam, G

    2010-01-01

    Multiple alignment of protein sequences helps to determine evolutionary linkage and to predict molecular structures. The factors to be considered while aligning multiple sequences are speed and accuracy of alignment. Although dynamic programming algorithms produce accurate alignments, they are computation intensive. In this paper we propose a time efficient approach to sequence alignment that also produces quality alignment. The dynamic nature of the algorithm coupled with data and computational parallelism of hadoop data grids improves the accuracy and speed of sequence alignment. The principle of block splitting in hadoop coupled with its scalability facilitates alignment of very large sequences. PMID:21224205

  6. SRP-RNA sequence alignment and secondary structure.

    PubMed Central

    Larsen, N; Zwieb, C

    1991-01-01

    The secondary structures of the RNAs from the signal recognition particle, termed SRP-RNA, were derived buy comparative analyses of an alignment of 39 sequences. The models are minimal in that only base pairs are included for which there is comparative evidence. The structures represent refinements of earlier versions and include a new short helix. PMID:1707519

  7. IVisTMSA: Interactive Visual Tools for Multiple Sequence Alignments.

    PubMed

    Pervez, Muhammad Tariq; Babar, Masroor Ellahi; Nadeem, Asif; Aslam, Naeem; Naveed, Nasir; Ahmad, Sarfraz; Muhammad, Shah; Qadri, Salman; Shahid, Muhammad; Hussain, Tanveer; Javed, Maryam

    2015-01-01

    IVisTMSA is a software package of seven graphical tools for multiple sequence alignments. MSApad is an editing and analysis tool. It can load 409% more data than Jalview, STRAP, CINEMA, and Base-by-Base. MSA comparator allows the user to visualize consistent and inconsistent regions of reference and test alignments of more than 21-MB size in less than 12 seconds. MSA comparator is 5,200% efficient and more than 40% efficient as compared to BALiBASE c program and FastSP, respectively. MSA reconstruction tool provides graphical user interfaces for four popular aligners and allows the user to load several sequence files at a time. FASTA generator converts seven formats of alignments of unlimited size into FASTA format in a few seconds. MSA ID calculator calculates identity matrix of more than 11,000 sequences with a sequence length of 2,696 base pairs in less than 100 seconds. Tree and Distance Matrix calculation tools generate phylogenetic tree and distance matrix, respectively, using neighbor joining% identity and BLOSUM 62 matrix. PMID:25861209

  8. IVisTMSA: Interactive Visual Tools for Multiple Sequence Alignments

    PubMed Central

    Pervez, Muhammad Tariq; Babar, Masroor Ellahi; Nadeem, Asif; Aslam, Naeem; Naveed, Nasir; Ahmad, Sarfraz; Muhammad, Shah; Qadri, Salman; Shahid, Muhammad; Hussain, Tanveer; Javed, Maryam

    2015-01-01

    IVisTMSA is a software package of seven graphical tools for multiple sequence alignments. MSApad is an editing and analysis tool. It can load 409% more data than Jalview, STRAP, CINEMA, and Base-by-Base. MSA comparator allows the user to visualize consistent and inconsistent regions of reference and test alignments of more than 21-MB size in less than 12 seconds. MSA comparator is 5,200% efficient and more than 40% efficient as compared to BALiBASE c program and FastSP, respectively. MSA reconstruction tool provides graphical user interfaces for four popular aligners and allows the user to load several sequence files at a time. FASTA generator converts seven formats of alignments of unlimited size into FASTA format in a few seconds. MSA ID calculator calculates identity matrix of more than 11,000 sequences with a sequence length of 2,696 base pairs in less than 100 seconds. Tree and Distance Matrix calculation tools generate phylogenetic tree and distance matrix, respectively, using neighbor joining% identity and BLOSUM 62 matrix. PMID:25861209

  9. HIVE-Hexagon: High-Performance, Parallelized Sequence Alignment for Next-Generation Sequencing Data Analysis

    PubMed Central

    Santana-Quintero, Luis; Dingerdissen, Hayley; Thierry-Mieg, Jean; Mazumder, Raja; Simonyan, Vahan

    2014-01-01

    Due to the size of Next-Generation Sequencing data, the computational challenge of sequence alignment has been vast. Inexact alignments can take up to 90% of total CPU time in bioinformatics pipelines. High-performance Integrated Virtual Environment (HIVE), a cloud-based environment optimized for storage and analysis of extra-large data, presents an algorithmic solution: the HIVE-hexagon DNA sequence aligner. HIVE-hexagon implements novel approaches to exploit both characteristics of sequence space and CPU, RAM and Input/Output (I/O) architecture to quickly compute accurate alignments. Key components of HIVE-hexagon include non-redundification and sorting of sequences; floating diagonals of linearized dynamic programming matrices; and consideration of cross-similarity to minimize computations. Availability https://hive.biochemistry.gwu.edu/hive/ PMID:24918764

  10. FastaValidator: an open-source Java library to parse and validate FASTA formatted sequences

    PubMed Central

    2014-01-01

    Background Advances in sequencing technologies challenge the efficient importing and validation of FASTA formatted sequence data which is still a prerequisite for most bioinformatic tools and pipelines. Comparative analysis of commonly used Bio*-frameworks (BioPerl, BioJava and Biopython) shows that their scalability and accuracy is hampered. Findings FastaValidator represents a platform-independent, standardized, light-weight software library written in the Java programming language. It targets computer scientists and bioinformaticians writing software which needs to parse quickly and accurately large amounts of sequence data. For end-users FastaValidator includes an interactive out-of-the-box validation of FASTA formatted files, as well as a non-interactive mode designed for high-throughput validation in software pipelines. Conclusions The accuracy and performance of the FastaValidator library qualifies it for large data sets such as those commonly produced by massive parallel (NGS) technologies. It offers scientists a fast, accurate and standardized method for parsing and validating FASTA formatted sequence data. PMID:24929426

  11. Distributed sequence alignment applications for the public computing architecture.

    PubMed

    Pellicer, S; Chen, G; Chan, K C C; Pan, Y

    2008-03-01

    The public computer architecture shows promise as a platform for solving fundamental problems in bioinformatics such as global gene sequence alignment and data mining with tools such as the basic local alignment search tool (BLAST). Our implementation of these two problems on the Berkeley open infrastructure for network computing (BOINC) platform demonstrates a runtime reduction factor of 1.15 for sequence alignment and 16.76 for BLAST. While the runtime reduction factor of the global gene sequence alignment application is modest, this value is based on a theoretical sequential runtime extrapolated from the calculation of a smaller problem. Because this runtime is extrapolated from running the calculation in memory, the theoretical sequential runtime would require 37.3 GB of memory on a single system. With this in mind, the BOINC implementation not only offers the reduced runtime, but also the aggregation of the available memory of all participant nodes. If an actual sequential run of the problem were compared, a more drastic reduction in the runtime would be seen due to an additional secondary storage I/O overhead for a practical system. Despite the limitations of the public computer architecture, most notably in communication overhead, it represents a practical platform for grid- and cluster-scale bioinformatics computations today and shows great potential for future implementations. PMID:18334454

  12. Rapid automatic detection and alignment of repeats in protein sequences.

    PubMed

    Heger, A; Holm, L

    2000-11-01

    Many large proteins have evolved by internal duplication and many internal sequence repeats correspond to functional and structural units. We have developed an automatic algorithm, RADAR, for segmenting a query sequence into repeats. The segmentation procedure has three steps: (i) repeat length is determined by the spacing between suboptimal self-alignment traces; (ii) repeat borders are optimized to yield a maximal integer number of repeats, and (iii) distant repeats are validated by iterative profile alignment. The method identifies short composition biased as well as gapped approximate repeats and complex repeat architectures involving many different types of repeats in the query sequence. No manual intervention and no prior assumptions on the number and length of repeats are required. Comparison to the Pfam-A database indicates good coverage, accurate alignments, and reasonable repeat borders. Screening the Swissprot database revealed 3,000 repeats not annotated in existing domain databases. A number of these repeats had been described in the literature but most were novel. This illustrates how in times when curated databases grapple with ever increasing backlogs, automatic (re)analysis of sequences provides an efficient way to capture this important information. PMID:10966575

  13. Sequence Alignment Tools: One Parallel Pattern to Rule Them All?

    PubMed Central

    2014-01-01

    In this paper, we advocate high-level programming methodology for next generation sequencers (NGS) alignment tools for both productivity and absolute performance. We analyse the problem of parallel alignment and review the parallelisation strategies of the most popular alignment tools, which can all be abstracted to a single parallel paradigm. We compare these tools to their porting onto the FastFlow pattern-based programming framework, which provides programmers with high-level parallel patterns. By using a high-level approach, programmers are liberated from all complex aspects of parallel programming, such as synchronisation protocols, and task scheduling, gaining more possibility for seamless performance tuning. In this work, we show some use cases in which, by using a high-level approach for parallelising NGS tools, it is possible to obtain comparable or even better absolute performance for all used datasets. PMID:25147803

  14. Optimizing Data Intensive GPGPU Computations for DNA Sequence Alignment

    PubMed Central

    Trapnell, Cole; Schatz, Michael C.

    2009-01-01

    MUMmerGPU uses highly-parallel commodity graphics processing units (GPU) to accelerate the data-intensive computation of aligning next generation DNA sequence data to a reference sequence for use in diverse applications such as disease genotyping and personal genomics. MUMmerGPU 2.0 features a new stackless depth-first-search print kernel and is 13× faster than the serial CPU version of the alignment code and nearly 4× faster in total computation time than MUMmerGPU 1.0. We exhaustively examined 128 GPU data layout configurations to improve register footprint and running time and conclude higher occupancy has greater impact than reduced latency. MUMmerGPU is available open-source at http://mummergpu.sourceforge.net. PMID:20161021

  15. Reconfigurable systems for sequence alignment and for general dynamic programming.

    PubMed

    Jacobi, Ricardo P; Ayala-Rincón, Mauricio; Carvalho, Luis G A; Llanos, Carlos H; Hartenstein, Reiner W

    2005-01-01

    Reconfigurable systolic arrays can be adapted to efficiently resolve a wide spectrum of computational problems; parallelism is naturally explored in systolic arrays and reconfigurability allows for redefinition of the interconnections and operations even during run time (dynamically). We present a reconfigurable systolic architecture that can be applied for the efficient treatment of several dynamic programming methods for resolving well-known problems, such as global and local sequence alignment, approximate string matching and longest common subsequence. The dynamicity of the reconfigurability was found to be useful for practical applications in the construction of sequence alignments. A VHDL (VHSIC hardware description language) version of this new architecture was implemented on an APEX FPGA (Field programmable gate array). It would be several magnitudes faster than the software algorithm alternatives. PMID:16342039

  16. Prokaryotic Phylogeny Based on Complete Genomes Without Sequence Alignment

    NASA Astrophysics Data System (ADS)

    Hao, Bailin; Qi, Ji; Wang, Bin

    We present a brief review of a series of on-going work on bacterial phylogeny. We propose a new method to infer relatedness of prokaryotes from their complete genome data without using sequence alignment, leading to results comparable with the bacteriologist's systematics as reflected in the latest 2001 edition of Bergey's Manual of Systematic Bacteriology.1 We only touch on the mathematical aspects of the method. The biological implications of our results will be published elsewhere.

  17. Exploring Dance Movement Data Using Sequence Alignment Methods

    PubMed Central

    Chavoshi, Seyed Hossein; De Baets, Bernard; Neutens, Tijs; De Tré, Guy; Van de Weghe, Nico

    2015-01-01

    Despite the abundance of research on knowledge discovery from moving object databases, only a limited number of studies have examined the interaction between moving point objects in space over time. This paper describes a novel approach for measuring similarity in the interaction between moving objects. The proposed approach consists of three steps. First, we transform movement data into sequences of successive qualitative relations based on the Qualitative Trajectory Calculus (QTC). Second, sequence alignment methods are applied to measure the similarity between movement sequences. Finally, movement sequences are grouped based on similarity by means of an agglomerative hierarchical clustering method. The applicability of this approach is tested using movement data from samba and tango dancers. PMID:26181435

  18. MACSIMS : multiple alignment of complete sequences information management system

    PubMed Central

    Thompson, Julie D; Muller, Arnaud; Waterhouse, Andrew; Procter, Jim; Barton, Geoffrey J; Plewniak, Frédéric; Poch, Olivier

    2006-01-01

    Background In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for the integration of this information in the context of the protein family. Results MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified and the retrieved data is evaluated and propagated from known to unknown sequences with these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60%, compared to the features available in the public databases. An XML format output file allows automatic parsing of the MACSIM results, while a graphical display using the JalView program allows manual analysis. Conclusion MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at . PMID:16792820

  19. Extracting protein alignment models from the sequence database.

    PubMed Central

    Neuwald, A F; Liu, J S; Lipman, D J; Lawrence, C E

    1997-01-01

    Biologists often gain structural and functional insights into a protein sequence by constructing a multiple alignment model of the family. Here a program called Probe fully automates this process of model construction starting from a single sequence. Central to this program is a powerful new method to locate and align only those, often subtly, conserved patterns essential to the family as a whole. When applied to randomly chosen proteins, Probe found on average about four times as many relationships as a pairwise search and yielded many new discoveries. These include: an obscure subfamily of globins in the roundworm Caenorhabditis elegans ; two new superfamilies of metallohydrolases; a lipoyl/biotin swinging arm domain in bacterial membrane fusion proteins; and a DH domain in the yeast Bud3 and Fus2 proteins. By identifying distant relationships and merging families into superfamilies in this way, this analysis further confirms the notion that proteins evolved from relatively few ancient sequences. Moreover, this method automatically generates models of these ancient conserved regions for rapid and sensitive screening of sequences. PMID:9108146

  20. Genome-wide synteny through highly sensitive sequence alignment: Satsuma

    PubMed Central

    Grabherr, Manfred G.; Russell, Pamela; Meyer, Miriah; Mauceli, Evan; Alföldi, Jessica; Di Palma, Federica; Lindblad-Toh, Kerstin

    2010-01-01

    Motivation: Comparative genomics heavily relies on alignments of large and often complex DNA sequences. From an engineering perspective, the problem here is to provide maximum sensitivity (to find all there is to find), specificity (to only find real homology) and speed (to accommodate the billions of base pairs of vertebrate genomes). Results: Satsuma addresses all three issues through novel strategies: (i) cross-correlation, implemented via fast Fourier transform; (ii) a match scoring scheme that eliminates almost all false hits; and (iii) an asynchronous ‘battleship’-like search that allows for aligning two entire fish genomes (470 and 217 Mb) in 120 CPU hours using 15 processors on a single machine. Availability: Satsuma is part of the Spines software package, implemented in C++ on Linux. The latest version of Spines can be freely downloaded under the LGPL license from http://www.broadinstitute.org/science/programs/genome-biology/spines/ Contact: grabherr@broadinstitute.org PMID:20208069

  1. Fractal MapReduce decomposition of sequence alignment

    PubMed Central

    2012-01-01

    Background The dramatic fall in the cost of genomic sequencing, and the increasing convenience of distributed cloud computing resources, positions the MapReduce coding pattern as a cornerstone of scalable bioinformatics algorithm development. In some cases an algorithm will find a natural distribution via use of map functions to process vectorized components, followed by a reduce of aggregate intermediate results. However, for some data analysis procedures such as sequence analysis, a more fundamental reformulation may be required. Results In this report we describe a solution to sequence comparison that can be thoroughly decomposed into multiple rounds of map and reduce operations. The route taken makes use of iterated maps, a fractal analysis technique, that has been found to provide a "alignment-free" solution to sequence analysis and comparison. That is, a solution that does not require dynamic programming, relying on a numeric Chaos Game Representation (CGR) data structure. This claim is demonstrated in this report by calculating the length of the longest similar segment by inspecting only the USM coordinates of two analogous units: with no resort to dynamic programming. Conclusions The procedure described is an attempt at extreme decomposition and parallelization of sequence alignment in anticipation of a volume of genomic sequence data that cannot be met by current algorithmic frameworks. The solution found is delivered with a browser-based application (webApp), highlighting the browser's emergence as an environment for high performance distributed computing. Availability Public distribution of accompanying software library with open source and version control at http://usm.github.com. Also available as a webApp through Google Chrome's WebStore http://chrome.google.com/webstore: search with "usm". PMID:22551205

  2. Phylo: A Citizen Science Approach for Improving Multiple Sequence Alignment

    PubMed Central

    Kam, Alfred; Kwak, Daniel; Leung, Clarence; Wu, Chu; Zarour, Eleyine; Sarmenta, Luis; Blanchette, Mathieu; Waldispühl, Jérôme

    2012-01-01

    Background Comparative genomics, or the study of the relationships of genome structure and function across different species, offers a powerful tool for studying evolution, annotating genomes, and understanding the causes of various genetic disorders. However, aligning multiple sequences of DNA, an essential intermediate step for most types of analyses, is a difficult computational task. In parallel, citizen science, an approach that takes advantage of the fact that the human brain is exquisitely tuned to solving specific types of problems, is becoming increasingly popular. There, instances of hard computational problems are dispatched to a crowd of non-expert human game players and solutions are sent back to a central server. Methodology/Principal Findings We introduce Phylo, a human-based computing framework applying “crowd sourcing” techniques to solve the Multiple Sequence Alignment (MSA) problem. The key idea of Phylo is to convert the MSA problem into a casual game that can be played by ordinary web users with a minimal prior knowledge of the biological context. We applied this strategy to improve the alignment of the promoters of disease-related genes from up to 44 vertebrate species. Since the launch in November 2010, we received more than 350,000 solutions submitted from more than 12,000 registered users. Our results show that solutions submitted contributed to improving the accuracy of up to 70% of the alignment blocks considered. Conclusions/Significance We demonstrate that, combined with classical algorithms, crowd computing techniques can be successfully used to help improving the accuracy of MSA. More importantly, we show that an NP-hard computational problem can be embedded in casual game that can be easily played by people without significant scientific training. This suggests that citizen science approaches can be used to exploit the billions of “human-brain peta-flops” of computation that are spent every day playing games. Phylo is

  3. Prokaryotic Phylogeny Based on Complete Genomes Without Sequence Alignment

    NASA Astrophysics Data System (ADS)

    Hao, Bailin; Qi, Ji; Wang, Bin

    2003-04-01

    This is a brief review of a series of on-going work on bacterial phylogeny. We have proposed a new method to infer relatedness of prokaryotes from their complete genome data without using sequence alignment. It has led to results comparable with the bacteriologists' systematics as reflected in the latest 2001 edition of the Bergey's Manual of Systematic Bacteriology1. In what follows we only touch on the mathematical aspects of the method. The biological implications of our results will be published elsewhere.

  4. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega

    PubMed Central

    Sievers, Fabian; Wilm, Andreas; Dineen, David; Gibson, Toby J; Karplus, Kevin; Li, Weizhong; Lopez, Rodrigo; McWilliam, Hamish; Remmert, Michael; Söding, Johannes; Thompson, Julie D; Higgins, Desmond G

    2011-01-01

    Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam. PMID:21988835

  5. Implied alignment: a synapomorphy-based multiple-sequence alignment method and its use in cladogram search

    NASA Technical Reports Server (NTRS)

    Wheeler, Ward C.

    2003-01-01

    A method to align sequence data based on parsimonious synapomorphy schemes generated by direct optimization (DO; earlier termed optimization alignment) is proposed. DO directly diagnoses sequence data on cladograms without an intervening multiple-alignment step, thereby creating topology-specific, dynamic homology statements. Hence, no multiple-alignment is required to generate cladograms. Unlike general and globally optimal multiple-alignment procedures, the method described here, implied alignment (IA), takes these dynamic homologies and traces them back through a single cladogram, linking the unaligned sequence positions in the terminal taxa via DO transformation series. These "lines of correspondence" link ancestor-descendent states and, when displayed as linearly arrayed columns without hypothetical ancestors, are largely indistinguishable from standard multiple alignment. Since this method is based on synapomorphy, the treatment of certain classes of insertion-deletion (indel) events may be different from that of other alignment procedures. As with all alignment methods, results are dependent on parameter assumptions such as indel cost and transversion:transition ratios. Such an IA could be used as a basis for phylogenetic search, but this would be questionable since the homologies derived from the implied alignment depend on its natal cladogram and any variance, between DO and IA + Search, due to heuristic approach. The utility of this procedure in heuristic cladogram searches using DO and the improvement of heuristic cladogram cost calculations are discussed. c2003 The Willi Hennig Society. Published by Elsevier Science (USA). All rights reserved.

  6. Implied alignment: a synapomorphy-based multiple-sequence alignment method and its use in cladogram search.

    PubMed

    Wheeler, Ward C

    2003-06-01

    A method to align sequence data based on parsimonious synapomorphy schemes generated by direct optimization (DO; earlier termed optimization alignment) is proposed. DO directly diagnoses sequence data on cladograms without an intervening multiple-alignment step, thereby creating topology-specific, dynamic homology statements. Hence, no multiple-alignment is required to generate cladograms. Unlike general and globally optimal multiple-alignment procedures, the method described here, implied alignment (IA), takes these dynamic homologies and traces them back through a single cladogram, linking the unaligned sequence positions in the terminal taxa via DO transformation series. These "lines of correspondence" link ancestor-descendent states and, when displayed as linearly arrayed columns without hypothetical ancestors, are largely indistinguishable from standard multiple alignment. Since this method is based on synapomorphy, the treatment of certain classes of insertion-deletion (indel) events may be different from that of other alignment procedures. As with all alignment methods, results are dependent on parameter assumptions such as indel cost and transversion:transition ratios. Such an IA could be used as a basis for phylogenetic search, but this would be questionable since the homologies derived from the implied alignment depend on its natal cladogram and any variance, between DO and IA + Search, due to heuristic approach. The utility of this procedure in heuristic cladogram searches using DO and the improvement of heuristic cladogram cost calculations are discussed. PMID:12901383

  7. DNA sequence alignment by microhomology sampling during homologous recombination

    PubMed Central

    Qi, Zhi; Redding, Sy; Lee, Ja Yil; Gibb, Bryan; Kwon, YoungHo; Niu, Hengyao; Gaines, William A.; Sung, Patrick

    2015-01-01

    Summary Homologous recombination (HR) mediates the exchange of genetic information between sister or homologous chromatids. During HR, members of the RecA/Rad51 family of recombinases must somehow search through vast quantities of DNA sequence to align and pair ssDNA with a homologous dsDNA template. Here we use single-molecule imaging to visualize Rad51 as it aligns and pairs homologous DNA sequences in real-time. We show that Rad51 uses a length-based recognition mechanism while interrogating dsDNA, enabling robust kinetic selection of 8-nucleotide (nt) tracts of microhomology, which kinetically confines the search to sites with a high probability of being a homologous target. Successful pairing with a 9th nucleotide coincides with an additional reduction in binding free energy and subsequent strand exchange occurs in precise 3-nt steps, reflecting the base triplet organization of the presynaptic complex. These findings provide crucial new insights into the physical and evolutionary underpinnings of DNA recombination. PMID:25684365

  8. MSA-PAD: DNA multiple sequence alignment framework based on PFAM accessed domain information.

    PubMed

    Balech, Bachir; Vicario, Saverio; Donvito, Giacinto; Monaco, Alfonso; Notarangelo, Pasquale; Pesole, Graziano

    2015-08-01

    Here we present the MSA-PAD application, a DNA multiple sequence alignment framework that uses PFAM protein domain information to align DNA sequences encoding either single or multiple protein domains. MSA-PAD has two alignment options: gene and genome mode. PMID:25819080

  9. PROMALS3D web server for accurate multiple protein sequence and structure alignments.

    PubMed

    Pei, Jimin; Tang, Ming; Grishin, Nick V

    2008-07-01

    Multiple sequence alignments are essential in computational sequence and structural analysis, with applications in homology detection, structure modeling, function prediction and phylogenetic analysis. We report PROMALS3D web server for constructing alignments for multiple protein sequences and/or structures using information from available 3D structures, database homologs and predicted secondary structures. PROMALS3D shows higher alignment accuracy than a number of other advanced methods. Input of PROMALS3D web server can be FASTA format protein sequences, PDB format protein structures and/or user-defined alignment constraints. The output page provides alignments with several formats, including a colored alignment augmented with useful information about sequence grouping, predicted secondary structures and consensus sequences. Intermediate results of sequence and structural database searches are also available. The PROMALS3D web server is available at: http://prodata.swmed.edu/promals3d/. PMID:18503087

  10. Multiple sequence alignment based on combining genetic algorithm with chaotic sequences.

    PubMed

    Gao, C; Wang, B; Zhou, C J; Zhang, Q

    2016-01-01

    In bioinformatics, sequence alignment is one of the most common problems. Multiple sequence alignment is an NP (nondeterministic polynomial time) problem, which requires further study and exploration. The chaos optimization algorithm is a type of chaos theory, and a procedure for combining the genetic algorithm (GA), which uses ergodicity, and inherent randomness of chaotic iteration. It is an efficient method to solve the basic premature phenomenon of the GA. Applying the Logistic map to the GA and using chaotic sequences to carry out the chaotic perturbation can improve the convergence of the basic GA. In addition, the random tournament selection and optimal preservation strategy are used in the GA. Experimental evidence indicates good results for this process. PMID:27420977

  11. Cardiac motion estimation by joint alignment of tagged MRI sequences.

    PubMed

    Oubel, E; De Craene, M; Hero, A O; Pourmorteza, A; Huguet, M; Avegliano, G; Bijnens, B H; Frangi, A F

    2012-01-01

    Image registration has been proposed as an automatic method for recovering cardiac displacement fields from tagged Magnetic Resonance Imaging (tMRI) sequences. Initially performed as a set of pairwise registrations, these techniques have evolved to the use of 3D+t deformation models, requiring metrics of joint image alignment (JA). However, only linear combinations of cost functions defined with respect to the first frame have been used. In this paper, we have applied k-Nearest Neighbors Graphs (kNNG) estimators of the α-entropy (H(α)) to measure the joint similarity between frames, and to combine the information provided by different cardiac views in an unified metric. Experiments performed on six subjects showed a significantly higher accuracy (p<0.05) with respect to a standard pairwise alignment (PA) approach in terms of mean positional error and variance with respect to manually placed landmarks. The developed method was used to study strains in patients with myocardial infarction, showing a consistency between strain, infarction location, and coronary occlusion. This paper also presents an interesting clinical application of graph-based metric estimators, showing their value for solving practical problems found in medical imaging. PMID:22000567

  12. Fast single-pass alignment and variant calling using sequencing data

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Sequencing research requires efficient computation. Few programs use already known information about DNA variants when aligning sequence data to the reference map. New program findmap.f90 reads the previous variant list before aligning sequence, calling variant alleles, and summing the allele counts...

  13. PROMALS3D: multiple protein sequence alignment enhanced with evolutionary and 3-dimensional structural information

    PubMed Central

    Pei, Jimin; Grishin, Nick V.

    2015-01-01

    SUMMARY Multiple sequence alignment (MSA) is an essential tool with many applications in bioinformatics and computational biology. Accurate MSA construction for divergent proteins remains a difficult computational task. The constantly increasing protein sequences and structures in public databases could be used to improve alignment quality. PROMALS3D is a tool for protein MSA construction enhanced with additional evolutionary and structural information from database searches. PROMALS3D automatically identifies homologs from sequence and structure databases for input proteins, derives structure-based constraints from alignments of 3-dimensional structures, and combines them with sequence-based constraints of profile-profile alignments in a consistency-based framework to construct high-quality multiple sequence alignments. PROMALS3D output is a consensus alignment enriched with sequence and structural information about input proteins and their homologs. PROMALS3D web server and package are available at http://prodata.swmed.edu/PROMALS3D. PMID:24170408

  14. PROMALS3D: multiple protein sequence alignment enhanced with evolutionary and three-dimensional structural information.

    PubMed

    Pei, Jimin; Grishin, Nick V

    2014-01-01

    Multiple sequence alignment (MSA) is an essential tool with many applications in bioinformatics and computational biology. Accurate MSA construction for divergent proteins remains a difficult computational task. The constantly increasing protein sequences and structures in public databases could be used to improve alignment quality. PROMALS3D is a tool for protein MSA construction enhanced with additional evolutionary and structural information from database searches. PROMALS3D automatically identifies homologs from sequence and structure databases for input proteins, derives structure-based constraints from alignments of three-dimensional structures, and combines them with sequence-based constraints of profile-profile alignments in a consistency-based framework to construct high-quality multiple sequence alignments. PROMALS3D output is a consensus alignment enriched with sequence and structural information about input proteins and their homologs. PROMALS3D Web server and package are available at http://prodata.swmed.edu/PROMALS3D. PMID:24170408

  15. B-MIC: An Ultrafast Three-Level Parallel Sequence Aligner Using MIC.

    PubMed

    Cui, Yingbo; Liao, Xiangke; Zhu, Xiaoqian; Wang, Bingqiang; Peng, Shaoliang

    2016-03-01

    Sequence alignment is the central process for sequence analysis, where mapping raw sequencing data to reference genome. The large amount of data generated by NGS is far beyond the process capabilities of existing alignment tools. Consequently, sequence alignment becomes the bottleneck of sequence analysis. Intensive computing power is required to address this challenge. Intel recently announced the MIC coprocessor, which can provide massive computing power. The Tianhe-2 is the world's fastest supercomputer now equipped with three MIC coprocessors each compute node. A key feature of sequence alignment is that different reads are independent. Considering this property, we proposed a MIC-oriented three-level parallelization strategy to speed up BWA, a widely used sequence alignment tool, and developed our ultrafast parallel sequence aligner: B-MIC. B-MIC contains three levels of parallelization: firstly, parallelization of data IO and reads alignment by a three-stage parallel pipeline; secondly, parallelization enabled by MIC coprocessor technology; thirdly, inter-node parallelization implemented by MPI. In this paper, we demonstrate that B-MIC outperforms BWA by a combination of those techniques using Inspur NF5280M server and the Tianhe-2 supercomputer. To the best of our knowledge, B-MIC is the first sequence alignment tool to run on Intel MIC and it can achieve more than fivefold speedup over the original BWA while maintaining the alignment precision. PMID:26358141

  16. Score distributions of gapped multiple sequence alignments down to the low-probability tail.

    PubMed

    Fieth, Pascal; Hartmann, Alexander K

    2016-08-01

    Assessing the significance of alignment scores of optimally aligned DNA or amino acid sequences can be achieved via the knowledge of the score distribution of random sequences. But this requires obtaining the distribution in the biologically relevant high-scoring region, where the probabilities are exponentially small. For gapless local alignments of infinitely long sequences this distribution is known analytically to follow a Gumbel distribution. Distributions for gapped local alignments and global alignments of finite lengths can only be obtained numerically. To obtain result for the small-probability region, specific statistical mechanics-based rare-event algorithms can be applied. In previous studies, this was achieved for pairwise alignments. They showed that, contrary to results from previous simple sampling studies, strong deviations from the Gumbel distribution occur in case of finite sequence lengths. Here we extend the studies to multiple sequence alignments with gaps, which are much more relevant for practical applications in molecular biology. We study the distributions of scores over a large range of the support, reaching probabilities as small as 10^{-160}, for global and local (sum-of-pair scores) multiple alignments. We find that even after suitable rescaling, eliminating the sequence-length dependence, the distributions for multiple alignment differ from the pairwise alignment case. Furthermore, we also show that the previously discussed Gaussian correction to the Gumbel distribution needs to be refined, also for the case of pairwise alignments. PMID:27627266

  17. Determining Word Sequence Variation Patterns in Clinical Documents using Multiple Sequence Alignment

    PubMed Central

    Meng, Frank; Morioka, Craig A.; El-Saden, Suzie

    2011-01-01

    Sentences and phrases that represent a certain meaning often exhibit patterns of variation where they differ from a basic structural form by one or two words. We present an algorithm that utilizes multiple sequence alignments (MSAs) to generate a representation of groups of phrases that possess the same semantic meaning but also share in common the same basic word sequence structure. The MSA enables the determination not only of the words that compose the basic word sequence, but also of the locations within the structure that exhibit variation. The algorithm can be utilized to generate patterns of text sequences that can be used as the basis for a pattern-based classifier, as a starting point to bootstrap the pattern building process for a regular expression-based classifiers, or serve to reveal the variation characteristics of sentences and phrases within a particular domain. PMID:22195152

  18. The Construction and Use of Log-Odds Substitution Scores for Multiple Sequence Alignment

    PubMed Central

    Altschul, Stephen F.; Wootton, John C.; Zaslavsky, Elena; Yu, Yi-Kuo

    2010-01-01

    Most pairwise and multiple sequence alignment programs seek alignments with optimal scores. Central to defining such scores is selecting a set of substitution scores for aligned amino acids or nucleotides. For local pairwise alignment, substitution scores are implicitly of log-odds form. We now extend the log-odds formalism to multiple alignments, using Bayesian methods to construct “BILD” (“Bayesian Integral Log-odds”) substitution scores from prior distributions describing columns of related letters. This approach has been used previously only to define scores for aligning individual sequences to sequence profiles, but it has much broader applicability. We describe how to calculate BILD scores efficiently, and illustrate their uses in Gibbs sampling optimization procedures, gapped alignment, and the construction of hidden Markov model profiles. BILD scores enable automated selection of optimal motif and domain model widths, and can inform the decision of whether to include a sequence in a multiple alignment, and the selection of insertion and deletion locations. Other applications include the classification of related sequences into subfamilies, and the definition of profile-profile alignment scores. Although a fully realized multiple alignment program must rely upon more than substitution scores, many existing multiple alignment programs can be modified to employ BILD scores. We illustrate how simple BILD score based strategies can enhance the recognition of DNA binding domains, including the Api-AP2 domain in Toxoplasma gondii and Plasmodium falciparum. PMID:20657661

  19. Update of AMmtDB: a database of multi-aligned metazoa mitochondrial DNA sequences.

    PubMed

    Lanave, C; Liuni, S; Licciulli, F; Attimonelli, M

    2000-01-01

    The AMmtDB database (http://bio-www.ba.cnr.it:8000/srs6/ ) has been updated by collecting the multi-aligned sequences of Chordata mitochondrial genes coding for proteins and tRNAs. The genes coding for proteins are multi-aligned based on the translated sequences and both the nucleotide and amino acid multi-alignments are provided. AMmtDB data selected through SRS can be viewed and managed using GeneDoc or other programs for the management of multi-aligned data depending on the user's operative system. The multiple alignments have been produced with CLUSTALW and PILEUP programs and then carefully optimized manually. PMID:10592208

  20. Multiple sequence alignment: a major challenge to large-scale phylogenetics

    PubMed Central

    Liu, Kevin; Linder, C. Randal; Warnow, Tandy

    2011-01-01

    Over the last decade, dramatic advances have been made in developing methods for large-scale phylogeny estimation, so that it is now feasible for investigators with moderate computational resources to obtain reasonable solutions to maximum likelihood and maximum parsimony, even for datasets with a few thousand sequences. There has also been progress on developing methods for multiple sequence alignment, so that greater alignment accuracy (and subsequent improvement in phylogenetic accuracy) is now possible through automated methods. However, these methods have not been tested under conditions that reflect properties of datasets confronted by large-scale phylogenetic estimation projects. In this paper we report on a study that compares several alignment methods on a benchmark collection of nucleotide sequence datasets of up to 78,132 sequences. We show that as the number of sequences increases, the number of alignment methods that can analyze the datasets decreases. Furthermore, the most accurate alignment methods are unable to analyze the very largest datasets we studied, so that only moderately accurate alignment methods can be used on the largest datasets. As a result, alignments computed for large datasets have relatively large error rates, and maximum likelihood phylogenies computed on these alignments also have high error rates. Therefore, the estimation of highly accurate multiple sequence alignments is a major challenge for Tree of Life projects, and more generally for large-scale systematics studies. PMID:21113338

  1. Slider—maximum use of probability information for alignment of short sequence reads and SNP detection

    PubMed Central

    Malhis, Nawar; Butterfield, Yaron S. N.; Ester, Martin; Jones, Steven J. M.

    2009-01-01

    Motivation: A plethora of alignment tools have been created that are designed to best fit different types of alignment conditions. While some of these are made for aligning Illumina Sequence Analyzer reads, none of these are fully utilizing its probability (prb) output. In this article, we will introduce a new alignment approach (Slider) that reduces the alignment problem space by utilizing each read base's probabilities given in the prb files. Results: Compared with other aligners, Slider has higher alignment accuracy and efficiency. In addition, given that Slider matches bases with probabilities other than the most probable, it significantly reduces the percentage of base mismatches. The result is that its SNP predictions are more accurate than other SNP prediction approaches used today that start from the most probable sequence, including those using base quality. Contact: nmalhis@bcgsc.ca Supplementary information and availability: http://www.bcgsc.ca/platform/bioinfo/software/slider PMID:18974170

  2. Progressive structure-based alignment of homologous proteins: Adopting sequence comparison strategies.

    PubMed

    Joseph, Agnel Praveen; Srinivasan, Narayanaswamy; de Brevern, Alexandre G

    2012-09-01

    Comparison of multiple protein structures has a broad range of applications in the analysis of protein structure, function and evolution. Multiple structure alignment tools (MSTAs) are necessary to obtain a simultaneous comparison of a family of related folds. In this study, we have developed a method for multiple structure comparison largely based on sequence alignment techniques. A widely used Structural Alphabet named Protein Blocks (PBs) was used to transform the information on 3D protein backbone conformation as a 1D sequence string. A progressive alignment strategy similar to CLUSTALW was adopted for multiple PB sequence alignment (mulPBA). Highly similar stretches identified by the pairwise alignments are given higher weights during the alignment. The residue equivalences from PB based alignments are used to obtain a three dimensional fit of the structures followed by an iterative refinement of the structural superposition. Systematic comparisons using benchmark datasets of MSTAs underlines that the alignment quality is better than MULTIPROT, MUSTANG and the alignments in HOMSTRAD, in more than 85% of the cases. Comparison with other rigid-body and flexible MSTAs also indicate that mulPBA alignments are superior to most of the rigid-body MSTAs and highly comparable to the flexible alignment methods. PMID:22676903

  3. Java bioinformatics analysis web services for multiple sequence alignment—JABAWS:MSA

    PubMed Central

    Troshin, Peter V.; Procter, James B.; Barton, Geoffrey J.

    2011-01-01

    Summary: JABAWS is a web services framework that simplifies the deployment of web services for bioinformatics. JABAWS:MSA provides services for five multiple sequence alignment (MSA) methods (Probcons, T-coffee, Muscle, Mafft and ClustalW), and is the system employed by the Jalview multiple sequence analysis workbench since version 2.6. A fully functional, easy to set up server is provided as a Virtual Appliance (VA), which can be run on most operating systems that support a virtualization environment such as VMware or Oracle VirtualBox. JABAWS is also distributed as a Web Application aRchive (WAR) and can be configured to run on a single computer and/or a cluster managed by Grid Engine, LSF or other queuing systems that support DRMAA. JABAWS:MSA provides clients full access to each application's parameters, allows administrators to specify named parameter preset combinations and execution limits for each application through simple configuration files. The JABAWS command-line client allows integration of JABAWS services into conventional scripts. Availability and Implementation: JABAWS is made freely available under the Apache 2 license and can be obtained from: http://www.compbio.dundee.ac.uk/jabaws. Contact: g.j.barton@dundee.ac.uk PMID:21593132

  4. BarraCUDA - a fast short read sequence aligner using graphics processing units

    PubMed Central

    2012-01-01

    Background With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU), extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computational-intensive alignment component of BWA to GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers a magnitude of performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net PMID:22244497

  5. Linking GPS and travel diary data using sequence alignment in a study of children's independent mobility

    PubMed Central

    2011-01-01

    Background Global positioning systems (GPS) are increasingly being used in health research to determine the location of study participants. Combining GPS data with data collected via travel/activity diaries allows researchers to assess where people travel in conjunction with data about trip purpose and accompaniment. However, linking GPS and diary data is problematic and to date the only method has been to match the two datasets manually, which is time consuming and unlikely to be practical for larger data sets. This paper assesses the feasibility of a new sequence alignment method of linking GPS and travel diary data in comparison with the manual matching method. Methods GPS and travel diary data obtained from a study of children's independent mobility were linked using sequence alignment algorithms to test the proof of concept. Travel diaries were assessed for quality by counting the number of errors and inconsistencies in each participant's set of diaries. The success of the sequence alignment method was compared for higher versus lower quality travel diaries, and for accompanied versus unaccompanied trips. Time taken and percentage of trips matched were compared for the sequence alignment method and the manual method. Results The sequence alignment method matched 61.9% of all trips. Higher quality travel diaries were associated with higher match rates in both the sequence alignment and manual matching methods. The sequence alignment method performed almost as well as the manual method and was an order of magnitude faster. However, the sequence alignment method was less successful at fully matching trips and at matching unaccompanied trips. Conclusions Sequence alignment is a promising method of linking GPS and travel diary data in large population datasets, especially if limitations in the trip detection algorithm are addressed. PMID:22142322

  6. Bayesian Top-Down Protein Sequence Alignment with Inferred Position-Specific Gap Penalties

    PubMed Central

    Neuwald, Andrew F.; Altschul, Stephen F.

    2016-01-01

    We describe a Bayesian Markov chain Monte Carlo (MCMC) sampler for protein multiple sequence alignment (MSA) that, as implemented in the program GISMO and applied to large numbers of diverse sequences, is more accurate than the popular MSA programs MUSCLE, MAFFT, Clustal-Ω and Kalign. Features of GISMO central to its performance are: (i) It employs a “top-down” strategy with a favorable asymptotic time complexity that first identifies regions generally shared by all the input sequences, and then realigns closely related subgroups in tandem. (ii) It infers position-specific gap penalties that favor insertions or deletions (indels) within each sequence at alignment positions in which indels are invoked in other sequences. This favors the placement of insertions between conserved blocks, which can be understood as making up the proteins’ structural core. (iii) It uses a Bayesian statistical measure of alignment quality based on the minimum description length principle and on Dirichlet mixture priors. Consequently, GISMO aligns sequence regions only when statistically justified. This is unlike methods based on the ad hoc, but widely used, sum-of-the-pairs scoring system, which will align random sequences. (iv) It defines a system for exploring alignment space that provides natural avenues for further experimentation through the development of new sampling strategies for more efficiently escaping from suboptimal traps. GISMO’s superior performance is illustrated using 408 protein sets containing, on average, 235 sequences. These sets correspond to NCBI Conserved Domain Database alignments, which have been manually curated in the light of available crystal structures, and thus provide a means to assess alignment accuracy. GISMO fills a different niche than other MSA programs, namely identifying and aligning a conserved domain present within a large, diverse set of full length sequences. The GISMO program is available at http://gismo.igs.umaryland.edu/. PMID

  7. Bayesian Top-Down Protein Sequence Alignment with Inferred Position-Specific Gap Penalties.

    PubMed

    Neuwald, Andrew F; Altschul, Stephen F

    2016-05-01

    We describe a Bayesian Markov chain Monte Carlo (MCMC) sampler for protein multiple sequence alignment (MSA) that, as implemented in the program GISMO and applied to large numbers of diverse sequences, is more accurate than the popular MSA programs MUSCLE, MAFFT, Clustal-Ω and Kalign. Features of GISMO central to its performance are: (i) It employs a "top-down" strategy with a favorable asymptotic time complexity that first identifies regions generally shared by all the input sequences, and then realigns closely related subgroups in tandem. (ii) It infers position-specific gap penalties that favor insertions or deletions (indels) within each sequence at alignment positions in which indels are invoked in other sequences. This favors the placement of insertions between conserved blocks, which can be understood as making up the proteins' structural core. (iii) It uses a Bayesian statistical measure of alignment quality based on the minimum description length principle and on Dirichlet mixture priors. Consequently, GISMO aligns sequence regions only when statistically justified. This is unlike methods based on the ad hoc, but widely used, sum-of-the-pairs scoring system, which will align random sequences. (iv) It defines a system for exploring alignment space that provides natural avenues for further experimentation through the development of new sampling strategies for more efficiently escaping from suboptimal traps. GISMO's superior performance is illustrated using 408 protein sets containing, on average, 235 sequences. These sets correspond to NCBI Conserved Domain Database alignments, which have been manually curated in the light of available crystal structures, and thus provide a means to assess alignment accuracy. GISMO fills a different niche than other MSA programs, namely identifying and aligning a conserved domain present within a large, diverse set of full length sequences. The GISMO program is available at http://gismo.igs.umaryland.edu/. PMID:27192614

  8. Upcoming challenges for multiple sequence alignment methods in the high-throughput era

    PubMed Central

    Kemena, Carsten; Notredame, Cedric

    2009-01-01

    This review focuses on recent trends in multiple sequence alignment tools. It describes the latest algorithmic improvements including the extension of consistency-based methods to the problem of template-based multiple sequence alignments. Some results are presented suggesting that template-based methods are significantly more accurate than simpler alternative methods. The validation of existing methods is also discussed at length with the detailed description of recent results and some suggestions for future validation strategies. The last part of the review addresses future challenges for multiple sequence alignment methods in the genomic era, most notably the need to cope with very large sequences, the need to integrate large amounts of experimental data, the need to accurately align non-coding and non-transcribed sequences and finally, the need to integrate many alternative methods and approaches. Contact: cedric.notredame@crg.es PMID:19648142

  9. Update of AMmtDB: a database of multi-aligned Metazoa mitochondrial DNA sequences.

    PubMed

    Lanave, Cecilia; Licciulli, Flavio; De Robertis, Mariateresa; Marolla, Alessandra; Attimonelli, Marcella

    2002-01-01

    The AMmtDB database (http://bighost.area.ba.cnr.it/mitochondriome) has been updated by collecting the multi-aligned sequences of Chordata and Invertebrata mitochondrial genes coding for proteins and tRNAs. Links to the multi-aligned mtDNA intraspecies variants, collected in VarMmtDB at the Mitochondriome web site, have been introduced. The genes coding for proteins are multi-aligned based on the translated sequences and both the nucleotide and amino acid multi-alignments are provided. AMmtDB data selected through SRS can be viewed and managed using GeneDoc or other programs for the management of multi-aligned data depending on the user's operative system. The multiple alignments have been produced with CLUSTALW and PILEUP programs and then carefully optimized manually. PMID:11752285

  10. Update of AMmtDB: a database of multi-aligned Metazoa mitochondrial DNA sequences

    PubMed Central

    Lanave, Cecilia; Licciulli, Flavio; De Robertis, Mariateresa; Marolla, Alessandra; Attimonelli, Marcella

    2002-01-01

    The AMmtDB database (http://bighost.area.ba.cnr.it/mitochondriome) has been updated by collecting the multi-aligned sequences of Chordata and Invertebrata mitochondrial genes coding for proteins and tRNAs. Links to the multi-aligned mtDNA intraspecies variants, collected in VarMmtDB at the Mitochondriome web site, have been introduced. The genes coding for proteins are multi-aligned based on the translated sequences and both the nucleotide and amino acid multi-alignments are provided. AMmtDB data selected through SRS can be viewed and managed using GeneDoc or other programs for the management of multi-aligned data depending on the user’s operative system. The multiple alignments have been produced with CLUSTALW and PILEUP programs and then carefully optimized manually. PMID:11752285

  11. Efficient mapping of genomic sequences to optimize multiple pairwise alignment in hybrid cluster platforms.

    PubMed

    Montañola, Alberto; Roig, Concepció; Hernández, Porfidio

    2014-01-01

    Multiple sequence alignment (MSA), used in biocomputing to study similarities between different genomic sequences, is known to require important memory and computation resources. Nowadays, researchers are aligning thousands of these sequences, creating new challenges in order to solve the problem using the available resources efficiently. Determining the efficient amount of resources to allocate is important to avoid waste of them, thus reducing the economical costs required in running for example a specific cloud instance. The pairwise alignment is the initial key step of the MSA problem, which will compute all pair alignments needed. We present a method to determine the optimal amount of memory and computation resources to allocate by the pairwise alignment, and we will validate it through a set of experimental results for different possible inputs. These allow us to determine the best parameters to configure the applications in order to use effectively the available resources of a given system. PMID:25339085

  12. Gapped alignment of protein sequence motifs through Monte Carlo optimization of a hidden Markov model

    PubMed Central

    Neuwald, Andrew F; Liu, Jun S

    2004-01-01

    Background Certain protein families are highly conserved across distantly related organisms and belong to large and functionally diverse superfamilies. The patterns of conservation present in these protein sequences presumably are due to selective constraints maintaining important but unknown structural mechanisms with some constraints specific to each family and others shared by a larger subset or by the entire superfamily. To exploit these patterns as a source of functional information, we recently devised a statistically based approach called contrast hierarchical alignment and interaction network (CHAIN) analysis, which infers the strengths of various categories of selective constraints from co-conserved patterns in a multiple alignment. The power of this approach strongly depends on the quality of the multiple alignments, which thus motivated development of theoretical concepts and strategies to improve alignment of conserved motifs within large sets of distantly related sequences. Results Here we describe a hidden Markov model (HMM), an algebraic system, and Markov chain Monte Carlo (MCMC) sampling strategies for alignment of multiple sequence motifs. The MCMC sampling strategies are useful both for alignment optimization and for adjusting position specific background amino acid frequencies for alignment uncertainties. Associated statistical formulations provide an objective measure of alignment quality as well as automatic gap penalty optimization. Improved alignments obtained in this way are compared with PSI-BLAST based alignments within the context of CHAIN analysis of three protein families: Giα subunits, prolyl oligopeptidases, and transitional endoplasmic reticulum (p97) AAA+ ATPases. Conclusion While not entirely replacing PSI-BLAST based alignments, which likewise may be optimized for CHAIN analysis using this approach, these motif-based methods often more accurately align very distantly related sequences and thus can provide a better measure of

  13. Pattern recognition and probabilistic measures in alignment-free sequence analysis.

    PubMed

    Schwende, Isabel; Pham, Tuan D

    2014-05-01

    With the massive production of genomic and proteomic data, the number of available biological sequences in databases has reached a level that is not feasible anymore for exact alignments even when just a fraction of all sequences is used. To overcome this inevitable time complexity, ultrafast alignment-free methods are studied. Within the past two decades, a broad variety of nonalignment methods have been proposed including dissimilarity measures on classical representations of sequences like k-words or Markov models. Furthermore, articles were published that describe distance measures on alternative representations such as compression complexity, spectral time series or chaos game representation. However, alignments are still the standard method for real world applications in biological sequence analysis, and the time efficient alignment-free approaches are usually applied in cases when the accustomed algorithms turn out to fail or be too inconvenient. PMID:24096012

  14. PhyPA: Phylogenetic method with pairwise sequence alignment outperforms likelihood methods in phylogenetics involving highly diverged sequences.

    PubMed

    Xia, Xuhua

    2016-09-01

    While pairwise sequence alignment (PSA) by dynamic programming is guaranteed to generate one of the optimal alignments, multiple sequence alignment (MSA) of highly divergent sequences often results in poorly aligned sequences, plaguing all subsequent phylogenetic analysis. One way to avoid this problem is to use only PSA to reconstruct phylogenetic trees, which can only be done with distance-based methods. I compared the accuracy of this new computational approach (named PhyPA for phylogenetics by pairwise alignment) against the maximum likelihood method using MSA (the ML+MSA approach), based on nucleotide, amino acid and codon sequences simulated with different topologies and tree lengths. I present a surprising discovery that the fast PhyPA method consistently outperforms the slow ML+MSA approach for highly diverged sequences even when all optimization options were turned on for the ML+MSA approach. Only when sequences are not highly diverged (i.e., when a reliable MSA can be obtained) does the ML+MSA approach outperforms PhyPA. The true topologies are always recovered by ML with the true alignment from the simulation. However, with MSA derived from alignment programs such as MAFFT or MUSCLE, the recovered topology consistently has higher likelihood than that for the true topology. Thus, the failure to recover the true topology by the ML+MSA is not because of insufficient search of tree space, but by the distortion of phylogenetic signal by MSA methods. I have implemented in DAMBE PhyPA and two approaches making use of multi-gene data sets to derive phylogenetic support for subtrees equivalent to resampling techniques such as bootstrapping and jackknifing. PMID:27377322

  15. Predicting and improving the protein sequence alignment quality by support vector regression

    PubMed Central

    Lee, Minho; Jeong, Chan-seok; Kim, Dongsup

    2007-01-01

    Background For successful protein structure prediction by comparative modeling, in addition to identifying a good template protein with known structure, obtaining an accurate sequence alignment between a query protein and a template protein is critical. It has been known that the alignment accuracy can vary significantly depending on our choice of various alignment parameters such as gap opening penalty and gap extension penalty. Because the accuracy of sequence alignment is typically measured by comparing it with its corresponding structure alignment, there is no good way of evaluating alignment accuracy without knowing the structure of a query protein, which is obviously not available at the time of structure prediction. Moreover, there is no universal alignment parameter option that would always yield the optimal alignment. Results In this work, we develop a method to predict the quality of the alignment between a query and a template. We train the support vector regression (SVR) models to predict the MaxSub scores as a measure of alignment quality. The alignment between a query protein and a template of length n is transformed into a (n + 1)-dimensional feature vector, then it is used as an input to predict the alignment quality by the trained SVR model. Performance of our work is evaluated by various measures including Pearson correlation coefficient between the observed and predicted MaxSub scores. Result shows high correlation coefficient of 0.945. For a pair of query and template, 48 alignments are generated by changing alignment options. Trained SVR models are then applied to predict the MaxSub scores of those and to select the best alignment option which is chosen specifically to the query-template pair. This adaptive selection procedure results in 7.4% improvement of MaxSub scores, compared to those when the single best parameter option is used for all query-template pairs. Conclusion The present work demonstrates that the alignment quality can be

  16. Update of AMmtDB: a database of multi-aligned metazoa mitochondrial DNA sequences.

    PubMed

    Lanave, C; Attimonelli, M; De Robertis, M; Licciulli, F; Liuni, S; Sbisá, E; Saccone, C

    1999-01-01

    The present paper describes AMmtDB, a database collecting the multi-aligned sequences of vertebrate mitochondrial genes coding for proteins and tRNAs, as well as the multiple alignment of the mammalian mtDNA main regulatory region (D-loop) sequences. The genes coding for proteins are multi-aligned based on the translated sequences and both the nucleotide and amino acid multi-alignments are provided. As far as the genes coding for tRNAs are concerned, the multi-alignments based on the primary and the secondary structures are both provided; for the mammalian D-loop multi-alignments we report the conserved regions of the entire D-loop (CSB1, CSB2, CSB3, the central region, ETAS1 and ETAS2) as defined by Sbisà et al. [ Gene (1997), 205, 125-140). A flatfile format for AMmtDB has been designed allowing its implementation in SRS (http://bio-www.ba.cnr.it:8000/BioWWW/#AMMTDB ). Data selected through SRS can be managed using GeneDoc or other programs for the management of multi-aligned data depending on the user's operative system. The multiple alignments have been produced with CLUSTALV and PILEUP programs and then carefully optimized manually. PMID:9847158

  17. Studying long 16S rDNA sequences with ultrafast-metagenomic sequence classification using exact alignments (Kraken).

    PubMed

    Valenzuela-González, Fabiola; Martínez-Porchas, Marcel; Villalpando-Canchola, Enrique; Vargas-Albores, Francisco

    2016-03-01

    Ultrafast-metagenomic sequence classification using exact alignments (Kraken) is a novel approach to classify 16S rDNA sequences. The classifier is based on mapping short sequences to the lowest ancestor and performing alignments to form subtrees with specific weights in each taxon node. This study aimed to evaluate the classification performance of Kraken with long 16S rDNA random environmental sequences produced by cloning and then Sanger sequenced. A total of 480 clones were isolated and expanded, and 264 of these clones formed contigs (1352 ± 153 bp). The same sequences were analyzed using the Ribosomal Database Project (RDP) classifier. Deeper classification performance was achieved by Kraken than by the RDP: 73% of the contigs were classified up to the species or variety levels, whereas 67% of these contigs were classified no further than the genus level by the RDP. The results also demonstrated that unassembled sequences analyzed by Kraken provide similar or inclusively deeper information. Moreover, sequences that did not form contigs, which are usually discarded by other programs, provided meaningful information when analyzed by Kraken. Finally, it appears that the assembly step for Sanger sequences can be eliminated when using Kraken. Kraken cumulates the information of both sequence senses, providing additional elements for the classification. In conclusion, the results demonstrate that Kraken is an excellent choice for use in the taxonomic assignment of sequences obtained by Sanger sequencing or based on third generation sequencing, of which the main goal is to generate larger sequences. PMID:26812576

  18. A statistical physics perspective on alignment-independent protein sequence comparison

    PubMed Central

    Chattopadhyay, Amit K.; Nasiev, Diar; Flower, Darren R.

    2015-01-01

    Motivation: Within bioinformatics, the textual alignment of amino acid sequences has long dominated the determination of similarity between proteins, with all that implies for shared structure, function and evolutionary descent. Despite the relative success of modern-day sequence alignment algorithms, so-called alignment-free approaches offer a complementary means of determining and expressing similarity, with potential benefits in certain key applications, such as regression analysis of protein structure-function studies, where alignment-base similarity has performed poorly. Results: Here, we offer a fresh, statistical physics-based perspective focusing on the question of alignment-free comparison, in the process adapting results from ‘first passage probability distribution’ to summarize statistics of ensemble averaged amino acid propensity values. In this article, we introduce and elaborate this approach. Contact: d.r.flower@aston.ac.uk PMID:25810434

  19. Update of AMmtDB: a database of multi-aligned Metazoa mitochondrial DNA sequences

    PubMed Central

    Lanave, Cecilia; Liuni, Sabino; Licciulli, Flavio; Attimonelli, Marcella

    2000-01-01

    The AMmtDB database (http://bio-www.ba.cnr.it:8000/srs6/ ) has been updated by collecting the multi-aligned sequences of Chordata mitochondrial genes coding for proteins and tRNAs. The genes coding for proteins are multi-aligned based on the translated sequences and both the nucleotide and amino acid multi-alignments are provided. AMmtDB data selected through SRS can be viewed and managed using GeneDoc or other programs for the management of multi-aligned data depending on the user’s operative system. The multiple alignments have been produced with CLUSTALW and PILEUP programs and then carefully optimized manually. PMID:10592208

  20. WebGMAP: a web service for mapping and aligning cDNA sequences to genomes

    PubMed Central

    Liang, Chun; Liu, Lin; Ji, Guoli

    2009-01-01

    The genomes of thousands of organisms are being sequenced, often with accompanying sequences of cDNAs or ESTs. One of the great challenges in bioinformatics is to make these genomic sequences and genome annotations accessible in a user-friendly manner to general biologists to address interesting biological questions. We have created an open-access web service called WebGMAP (http://www.bioinfolab.org/software/webgmap) that seamlessly integrates cDNA-genome alignment tools, such as GMAP, with easy-to-use data visualization and mining tools. This web service is intended to facilitate community efforts in improving genome annotation, determining accurate gene structures and their variations, and exploring important biological processes such as alternative splicing and alternative polyadenylation. For routine sequence analysis, WebGMAP provides a web-based sequence viewer with many useful functions, including nucleotide positioning, six-frame translations, sequence reverse complementation, and imperfect motif detection and alignment. WebGMAP also provides users with the ability to sort, filter and search for individual cDNA sequences and cDNA-genome alignments. Our EST-Genome-Browser can display annotated gene structures and cDNA-genome alignments at scales from 100 to 50 000 nt. With its ability to highlight base differences between query cDNAs and the genome, our EST-Genome-Browser allows biologists to discover potential point or insertion-deletion variations from cDNA-genome alignments. PMID:19465381

  1. Fast multiple alignment of ungapped DNA sequences using information theory and a relaxation method.

    PubMed

    Schneider, Thomas D; Mastronarde, David N

    1996-12-01

    An information theory based multiple alignment ("Malign") method was used to align the DNA binding sequences of the OxyR and Fis proteins, whose sequence conservation is so spread out that it is difficult to identify the sites. In the algorithm described here, the information content of the sequences is used as a unique global criterion for the quality of the alignment. The algorithm uses look-up tables to avoid recalculating computationally expensive functions such as the logarithm. Because there are no arbitrary constants and because the results are reported in absolute units (bits), the best alignment can be chosen without ambiguity. Starting from randomly selected alignments, a hill-climbing algorithm can track through the immense space of s(n) combinations where s is the number of sequences and n is the number of positions possible for each sequence. Instead of producing a single alignment, the algorithm is fast enough that one can afford to use many start points and to classify the solutions. Good convergence is indicated by the presence of a single well-populated solution class having higher information content than other classes. The existence of several distinct classes for the Fis protein indicates that those binding sites have self-similar features. PMID:19953199

  2. SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data

    PubMed Central

    2016-01-01

    Next-generation sequencing (NGS) technologies have led to a huge amount of genomic data that need to be analyzed and interpreted. This fact has a huge impact on the DNA sequence alignment process, which nowadays requires the mapping of billions of small DNA sequences onto a reference genome. In this way, sequence alignment remains the most time-consuming stage in the sequence analysis workflow. To deal with this issue, state of the art aligners take advantage of parallelization strategies. However, the existent solutions show limited scalability and have a complex implementation. In this work we introduce SparkBWA, a new tool that exploits the capabilities of a big data technology as Spark to boost the performance of one of the most widely adopted aligner, the Burrows-Wheeler Aligner (BWA). The design of SparkBWA uses two independent software layers in such a way that no modifications to the original BWA source code are required, which assures its compatibility with any BWA version (future or legacy). SparkBWA is evaluated in different scenarios showing noticeable results in terms of performance and scalability. A comparison to other parallel BWA-based aligners validates the benefits of our approach. Finally, an intuitive and flexible API is provided to NGS professionals in order to facilitate the acceptance and adoption of the new tool. The source code of the software described in this paper is publicly available at https://github.com/citiususc/SparkBWA, with a GPL3 license. PMID:27182962

  3. SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data.

    PubMed

    Abuín, José M; Pichel, Juan C; Pena, Tomás F; Amigo, Jorge

    2016-01-01

    Next-generation sequencing (NGS) technologies have led to a huge amount of genomic data that need to be analyzed and interpreted. This fact has a huge impact on the DNA sequence alignment process, which nowadays requires the mapping of billions of small DNA sequences onto a reference genome. In this way, sequence alignment remains the most time-consuming stage in the sequence analysis workflow. To deal with this issue, state of the art aligners take advantage of parallelization strategies. However, the existent solutions show limited scalability and have a complex implementation. In this work we introduce SparkBWA, a new tool that exploits the capabilities of a big data technology as Spark to boost the performance of one of the most widely adopted aligner, the Burrows-Wheeler Aligner (BWA). The design of SparkBWA uses two independent software layers in such a way that no modifications to the original BWA source code are required, which assures its compatibility with any BWA version (future or legacy). SparkBWA is evaluated in different scenarios showing noticeable results in terms of performance and scalability. A comparison to other parallel BWA-based aligners validates the benefits of our approach. Finally, an intuitive and flexible API is provided to NGS professionals in order to facilitate the acceptance and adoption of the new tool. The source code of the software described in this paper is publicly available at https://github.com/citiususc/SparkBWA, with a GPL3 license. PMID:27182962

  4. Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference

    PubMed Central

    Tan, Ge; Muffato, Matthieu; Ledergerber, Christian; Herrero, Javier; Goldman, Nick; Gil, Manuel; Dessimoz, Christophe

    2015-01-01

    Phylogenetic inference is generally performed on the basis of multiple sequence alignments (MSA). Because errors in an alignment can lead to errors in tree estimation, there is a strong interest in identifying and removing unreliable parts of the alignment. In recent years several automated filtering approaches have been proposed, but despite their popularity, a systematic and comprehensive comparison of different alignment filtering methods on real data has been lacking. Here, we extend and apply recently introduced phylogenetic tests of alignment accuracy on a large number of gene families and contrast the performance of unfiltered versus filtered alignments in the context of single-gene phylogeny reconstruction. Based on multiple genome-wide empirical and simulated data sets, we show that the trees obtained from filtered MSAs are on average worse than those obtained from unfiltered MSAs. Furthermore, alignment filtering often leads to an increase in the proportion of well-supported branches that are actually wrong. We confirm that our findings hold for a wide range of parameters and methods. Although our results suggest that light filtering (up to 20% of alignment positions) has little impact on tree accuracy and may save some computation time, contrary to widespread practice, we do not generally recommend the use of current alignment filtering methods for phylogenetic inference. By providing a way to rigorously and systematically measure the impact of filtering on alignments, the methodology set forth here will guide the development of better filtering algorithms. PMID:26031838

  5. Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference.

    PubMed

    Tan, Ge; Muffato, Matthieu; Ledergerber, Christian; Herrero, Javier; Goldman, Nick; Gil, Manuel; Dessimoz, Christophe

    2015-09-01

    Phylogenetic inference is generally performed on the basis of multiple sequence alignments (MSA). Because errors in an alignment can lead to errors in tree estimation, there is a strong interest in identifying and removing unreliable parts of the alignment. In recent years several automated filtering approaches have been proposed, but despite their popularity, a systematic and comprehensive comparison of different alignment filtering methods on real data has been lacking. Here, we extend and apply recently introduced phylogenetic tests of alignment accuracy on a large number of gene families and contrast the performance of unfiltered versus filtered alignments in the context of single-gene phylogeny reconstruction. Based on multiple genome-wide empirical and simulated data sets, we show that the trees obtained from filtered MSAs are on average worse than those obtained from unfiltered MSAs. Furthermore, alignment filtering often leads to an increase in the proportion of well-supported branches that are actually wrong. We confirm that our findings hold for a wide range of parameters and methods. Although our results suggest that light filtering (up to 20% of alignment positions) has little impact on tree accuracy and may save some computation time, contrary to widespread practice, we do not generally recommend the use of current alignment filtering methods for phylogenetic inference. By providing a way to rigorously and systematically measure the impact of filtering on alignments, the methodology set forth here will guide the development of better filtering algorithms. PMID:26031838

  6. Volcanoes, Central Java, Indonesia

    NASA Technical Reports Server (NTRS)

    1992-01-01

    The island of Java (8.0S, 112.0E), perhaps better than any other, illustrates the volcanic origin of Pacific Island groups. Seen in this single view are at least a dozen once active volcano craters. Alignment of the craters even defines the linear fault line of Java as well as the other some 1500 islands of the Indonesian Archipelago. Deep blue water of the Indian Ocean to the south contrasts to the sediment laden waters of the Java Sea to the north.

  7. Alignment-free comparison of genome sequences by a new numerical characterization.

    PubMed

    Huang, Guohua; Zhou, Houqing; Li, Yongfan; Xu, Lixin

    2011-07-21

    In order to compare different genome sequences, an alignment-free method has proposed. First, we presented a new graphical representation of DNA sequences without degeneracy, which is conducive to intuitive comparison of sequences. Then, a new numerical characterization based on the representation was introduced to quantitatively depict the intrinsic nature of genome sequences, and considered as a 10-dimensional vector in the mathematical space. Alignment-free comparison of sequences was performed by computing the distances between vectors of the corresponding numerical characterizations, which define the evolutionary relationship. Two data sets of DNA sequences were constructed to assess the performance on sequence comparison. The results illustrate well validity of the method. The new numerical characterization provides a powerful tool for genome comparison. PMID:21536050

  8. Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments

    DOE PAGESBeta

    Daily, Jeffrey A.

    2016-02-10

    Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. As a result, a faster intra-sequence pairwise alignment implementation is described and benchmarked. Using a 375 residue query sequence a speed of 136 billion cell updates permore » second (GCUPS) was achieved on a dual Intel Xeon E5-2670 12-core processor system, the highest reported for an implementation based on Farrar’s ’striped’ approach. When using only a single thread, parasail was 1.7 times faster than Rognes’s SWIPE. For many score matrices, parasail is faster than BLAST. The software library is designed for 64 bit Linux, OS X, or Windows on processors with SSE2, SSE41, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. In conclusion, applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.« less

  9. SINA: Accurate high-throughput multiple sequence alignment of ribosomal RNA genes

    PubMed Central

    Pruesse, Elmar; Peplies, Jörg; Glöckner, Frank Oliver

    2012-01-01

    Motivation: In the analysis of homologous sequences, computation of multiple sequence alignments (MSAs) has become a bottleneck. This is especially troublesome for marker genes like the ribosomal RNA (rRNA) where already millions of sequences are publicly available and individual studies can easily produce hundreds of thousands of new sequences. Methods have been developed to cope with such numbers, but further improvements are needed to meet accuracy requirements. Results: In this study, we present the SILVA Incremental Aligner (SINA) used to align the rRNA gene databases provided by the SILVA ribosomal RNA project. SINA uses a combination of k-mer searching and partial order alignment (POA) to maintain very high alignment accuracy while satisfying high throughput performance demands. SINA was evaluated in comparison with the commonly used high throughput MSA programs PyNAST and mothur. The three BRAliBase III benchmark MSAs could be reproduced with 99.3, 97.6 and 96.1 accuracy. A larger benchmark MSA comprising 38 772 sequences could be reproduced with 98.9 and 99.3% accuracy using reference MSAs comprising 1000 and 5000 sequences. SINA was able to achieve higher accuracy than PyNAST and mothur in all performed benchmarks. Availability: Alignment of up to 500 sequences using the latest SILVA SSU/LSU Ref datasets as reference MSA is offered at http://www.arb-silva.de/aligner. This page also links to Linux binaries, user manual and tutorial. SINA is made available under a personal use license. Contact: epruesse@mpi-bremen.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22556368

  10. A distributed system for fast alignment of next-generation sequencing data

    PubMed Central

    Srimani, Jaydeep K.; Wu, Po-Yen; Phan, John H.; Wang, May D.

    2016-01-01

    We developed a scalable distributed computing system using the Berkeley Open Interface for Network Computing (BOINC) to align next-generation sequencing (NGS) data quickly and accurately. NGS technology is emerging as a promising platform for gene expression analysis due to its high sensitivity compared to traditional genomic microarray technology. However, despite the benefits, NGS datasets can be prohibitively large, requiring significant computing resources to obtain sequence alignment results. Moreover, as the data and alignment algorithms become more prevalent, it will become necessary to examine the effect of the multitude of alignment parameters on various NGS systems. We validate the distributed software system by (1) computing simple timing results to show the speed-up gained by using multiple computers, (2) optimizing alignment parameters using simulated NGS data, and (3) computing NGS expression levels for a single biological sample using optimal parameters and comparing these expression levels to that of a microarray sample. Results indicate that the distributed alignment system achieves approximately a linear speed-up and correctly distributes sequence data to and gathers alignment results from multiple compute clients.

  11. PyMod: sequence similarity searches, multiple sequence-structure alignments, and homology modeling within PyMOL

    PubMed Central

    2012-01-01

    Background In recent years, an exponential growing number of tools for protein sequence analysis, editing and modeling tasks have been put at the disposal of the scientific community. Despite the vast majority of these tools have been released as open source software, their deep learning curves often discourages even the most experienced users. Results A simple and intuitive interface, PyMod, between the popular molecular graphics system PyMOL and several other tools (i.e., [PSI-]BLAST, ClustalW, MUSCLE, CEalign and MODELLER) has been developed, to show how the integration of the individual steps required for homology modeling and sequence/structure analysis within the PyMOL framework can hugely simplify these tasks. Sequence similarity searches, multiple sequence and structural alignments generation and editing, and even the possibility to merge sequence and structure alignments have been implemented in PyMod, with the aim of creating a simple, yet powerful tool for sequence and structure analysis and building of homology models. Conclusions PyMod represents a new tool for the analysis and the manipulation of protein sequences and structures. The ease of use, integration with many sequence retrieving and alignment tools and PyMOL, one of the most used molecular visualization system, are the key features of this tool. Source code, installation instructions, video tutorials and a user's guide are freely available at the URL http://schubert.bio.uniroma1.it/pymod/index.html PMID:22536966

  12. Skeleton-based human action recognition using multiple sequence alignment

    NASA Astrophysics Data System (ADS)

    Ding, Wenwen; Liu, Kai; Cheng, Fei; Zhang, Jin; Li, YunSong

    2015-05-01

    Human action recognition and analysis is an active research topic in computer vision for many years. This paper presents a method to represent human actions based on trajectories consisting of 3D joint positions. This method first decompose action into a sequence of meaningful atomic actions (actionlets), and then label actionlets with English alphabets according to the Davies-Bouldin index value. Therefore, an action can be represented using a sequence of actionlet symbols, which will preserve the temporal order of occurrence of each of the actionlets. Finally, we employ sequence comparison to classify multiple actions through using string matching algorithms (Needleman-Wunsch). The effectiveness of the proposed method is evaluated on datasets captured by commodity depth cameras. Experiments of the proposed method on three challenging 3D action datasets show promising results.

  13. Support for linguistic macrofamilies from weighted sequence alignment

    PubMed Central

    Jäger, Gerhard

    2015-01-01

    Computational phylogenetics is in the process of revolutionizing historical linguistics. Recent applications have shed new light on controversial issues, such as the location and time depth of language families and the dynamics of their spread. So far, these approaches have been limited to single-language families because they rely on a large body of expert cognacy judgments or grammatical classifications, which is currently unavailable for most language families. The present study pursues a different approach. Starting from raw phonetic transcription of core vocabulary items from very diverse languages, it applies weighted string alignment to track both phonetic and lexical change. Applied to a collection of ∼1,000 Eurasian languages and dialects, this method, combined with phylogenetic inference, leads to a classification in excellent agreement with established findings of historical linguistics. Furthermore, it provides strong statistical support for several putative macrofamilies contested in current historical linguistics. In particular, there is a solid signal for the Nostratic/Eurasiatic macrofamily. PMID:26403857

  14. Support for linguistic macrofamilies from weighted sequence alignment.

    PubMed

    Jäger, Gerhard

    2015-10-13

    Computational phylogenetics is in the process of revolutionizing historical linguistics. Recent applications have shed new light on controversial issues, such as the location and time depth of language families and the dynamics of their spread. So far, these approaches have been limited to single-language families because they rely on a large body of expert cognacy judgments or grammatical classifications, which is currently unavailable for most language families. The present study pursues a different approach. Starting from raw phonetic transcription of core vocabulary items from very diverse languages, it applies weighted string alignment to track both phonetic and lexical change. Applied to a collection of ∼1,000 Eurasian languages and dialects, this method, combined with phylogenetic inference, leads to a classification in excellent agreement with established findings of historical linguistics. Furthermore, it provides strong statistical support for several putative macrofamilies contested in current historical linguistics. In particular, there is a solid signal for the Nostratic/Eurasiatic macrofamily. PMID:26403857

  15. Flexible, Fast and Accurate Sequence Alignment Profiling on GPGPU with PaSWAS

    PubMed Central

    Warris, Sven; Yalcin, Feyruz; Jackson, Katherine J. L.; Nap, Jan Peter

    2015-01-01

    Motivation To obtain large-scale sequence alignments in a fast and flexible way is an important step in the analyses of next generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often either not fast enough, limited to dedicated tasks or not sufficiently accurate due to statistical issues. Current SW implementations that run on graphics hardware do not report the alignment details necessary for further analysis. Results With the Parallel SW Alignment Software (PaSWAS) it is possible (a) to have easy access to the computational power of NVIDIA-based general purpose graphics processing units (GPGPUs) to perform high-speed sequence alignments, and (b) retrieve relevant information such as score, number of gaps and mismatches. The software reports multiple hits per alignment. The added value of the new SW implementation is demonstrated with two test cases: (1) tag recovery in next generation sequence data and (2) isotype assignment within an immunoglobulin 454 sequence data set. Both cases show the usability and versatility of the new parallel Smith-Waterman implementation. PMID:25830241

  16. Alignment of Escherichia coli K12 DNA sequences to a genomic restriction map.

    PubMed Central

    Rudd, K E; Miller, W; Ostell, J; Benson, D A

    1990-01-01

    We use the extensive published information describing the genome of Escherichia coli and new restriction map alignment software to align DNA sequence, genetic, and physical maps. Restriction map alignment software is used which considers restriction maps as strings analogous to DNA or protein sequences except that two values, enzyme name and DNA base address, are associated with each position on the string. The resulting alignments reveal a nearly linear relationship between the physical and genetic maps of the E. coli chromosome. Physical map comparisons with the 1976, 1980, and 1983 genetic maps demonstrate a better fit with the more recent maps. The results of these alignments are genomic kilobase coordinates, orientation and rank of the alignment that best fits the genetic data. A statistical measure based on extreme value distribution is applied to the alignments. Additional computer analyses allow us to estimate the accuracy of the published E. coli genomic restriction map, simulate rearrangements of the bacterial chromosome, and search for repetitive DNA. The procedures we used are general enough to be applicable to other genome mapping projects. PMID:2183179

  17. Self-consistently optimized statistical mechanical energy functions for sequence structure alignment.

    PubMed Central

    Koretke, K. K.; Luthey-Schulten, Z.; Wolynes, P. G.

    1996-01-01

    A quantitative form of the principle of minimal frustration is used to obtain from a database analysis statistical mechanical energy functions and gap parameters for aligning sequences to three-dimensional structures. The analysis that partially takes into account correlations in the energy landscape improves upon the previous approximations of Goldstein et al. (1994, 1995) (Goldstein R, Luthey-Schulten Z, Wolynes P, 1994, Proceedings of the 27th Hawaii International Conference on System Sciences. Los Alamitos, California: IEEE Computer Society Press. pp 306-315; Goldstein R, Luthey-Schulten Z, Wolynes P, 1995, In: Elber R, ed. New developments in theoretical studies of proteins. Singapore: World Scientific). The energy function allows for ordering of alignments based on the compatibility of a sequence to be in a given structure (i.e., lowest energy) and therefore removes the necessity of using percent identity or similarity as scoring parameters. The alignments produced by the energy function on distant homologues with low percent identity (less than 21%) are generally better than those generated with evolutionary information. The lowest energy alignment generated with the energy function for sequences containing prosite signatures but unknown structures is a structure containing the same prosite signature, providing a check on the robustness of the algorithm. Finally, the energy function can make use of known experimental evidence as constraints within the alignment algorithm to aid in finding the correct structural alignment. PMID:8762136

  18. Alignment-Free Sequence Comparison Based on Next-Generation Sequencing Reads

    PubMed Central

    Song, Kai; Ren, Jie; Zhai, Zhiyuan; Liu, Xuemei

    2013-01-01

    Abstract Next-generation sequencing (NGS) technologies have generated enormous amounts of shotgun read data, and assembly of the reads can be challenging, especially for organisms without template sequences. We study the power of genome comparison based on shotgun read data without assembly using three alignment-free sequence comparison statistics, D2, \\documentclass{aastex}\\usepackage{amsbsy}\\usepackage{amsfonts}\\usepackage{amssymb}\\usepackage{bm}\\usepackage{mathrsfs}\\usepackage{pifont}\\usepackage{stmaryrd}\\usepackage{textcomp}\\usepackage{portland, xspace}\\usepackage{amsmath, amsxtra}\\pagestyle{empty}\\DeclareMathSizes{10}{9}{7}{6}\\begin{document} $$\\textbf{\\textit{D}}_{\\bf 2}^{\\bf *}$$ \\end{document}, and \\documentclass{aastex}\\usepackage{amsbsy}\\usepackage{amsfonts}\\usepackage{amssymb}\\usepackage{bm}\\usepackage{mathrsfs}\\usepackage{pifont}\\usepackage{stmaryrd}\\usepackage{textcomp}\\usepackage{portland, xspace}\\usepackage{amsmath, amsxtra}\\pagestyle{empty}\\DeclareMathSizes{10}{9}{7}{6}\\begin{document} $$\\textbf{\\textit{D}}_{\\bf 2}^S$$ \\end{document}, both theoretically and by simulations. Theoretical formulas for the power of detecting the relationship between two sequences related through a common motif model are derived. It is shown that both \\documentclass{aastex}\\usepackage{amsbsy}\\usepackage{amsfonts}\\usepackage{amssymb}\\usepackage{bm}\\usepackage{mathrsfs}\\usepackage{pifont}\\usepackage{stmaryrd}\\usepackage{textcomp}\\usepackage{portland, xspace}\\usepackage{amsmath, amsxtra}\\pagestyle{empty}\\DeclareMathSizes{10}{9}{7}{6}\\begin{document} $$\\textbf{\\textit{D}}_{\\bf 2}^{\\bf *}$$ \\end{document} and \\documentclass{aastex}\\usepackage{amsbsy}\\usepackage{amsfonts}\\usepackage{amssymb}\\usepackage{bm}\\usepackage{mathrsfs}\\usepackage{pifont}\\usepackage{stmaryrd}\\usepackage{textcomp}\\usepackage{portland, xspace}\\usepackage{amsmath, amsxtra}\\pagestyle{empty}\\DeclareMathSizes{10}{9}{7}{6}\\begin

  19. Aptaligner: Automated Software for Aligning Pseudorandom DNA X-Aptamers from Next-Generation Sequencing Data

    PubMed Central

    2015-01-01

    Next-generation sequencing results from bead-based aptamer libraries have demonstrated that traditional DNA/RNA alignment software is insufficient. This is particularly true for X-aptamers containing specialty bases (W, X, Y, Z, ...) that are identified by special encoding. Thus, we sought an automated program that uses the inherent design scheme of bead-based X-aptamers to create a hypothetical reference library and Markov modeling techniques to provide improved alignments. Aptaligner provides this feature as well as length error and noise level cutoff features, is parallelized to run on multiple central processing units (cores), and sorts sequences from a single chip into projects and subprojects. PMID:24866698

  20. DIALIGN at GOBICS—multiple sequence alignment using various sources of external information

    PubMed Central

    Al Ait, Layal; Yamak, Zaher; Morgenstern, Burkhard

    2013-01-01

    DIALIGN is an established tool for multiple sequence alignment that is particularly useful to detect local homologies in sequences with low overall similarity. In recent years, various versions of the program have been developed, some of which are fully automated, whereas others are able to accept user-specified external information. In this article, we review some versions of the program that are available through ‘Göttingen Bioinformatics Compute Server’. In addition to previously described implementations, we present a new release of DIALIGN called ‘DIALIGN-PFAM’, which uses hits to the PFAM database for improved protein alignment. Our software is available through http://dialign.gobics.de/. PMID:23620293

  1. Aptaligner: automated software for aligning pseudorandom DNA X-aptamers from next-generation sequencing data.

    PubMed

    Lu, Emily; Elizondo-Riojas, Miguel-Angel; Chang, Jeffrey T; Volk, David E

    2014-06-10

    Next-generation sequencing results from bead-based aptamer libraries have demonstrated that traditional DNA/RNA alignment software is insufficient. This is particularly true for X-aptamers containing specialty bases (W, X, Y, Z, ...) that are identified by special encoding. Thus, we sought an automated program that uses the inherent design scheme of bead-based X-aptamers to create a hypothetical reference library and Markov modeling techniques to provide improved alignments. Aptaligner provides this feature as well as length error and noise level cutoff features, is parallelized to run on multiple central processing units (cores), and sorts sequences from a single chip into projects and subprojects. PMID:24866698

  2. Mulan: Multiple-Sequence Local Alignment and Visualization for Studying Function and Evolution

    SciTech Connect

    Ovcharenko, I; Loots, G; Giardine, B; Hou, M; Ma, J; Hardison, R; Stubbs, L; Miller, W

    2004-07-14

    Multiple sequence alignment analysis is a powerful approach for understanding phylogenetic relationships, annotating genes and detecting functional regulatory elements. With a growing number of partly or fully sequenced vertebrate genomes, effective tools for performing multiple comparisons are required to accurately and efficiently assist biological discoveries. Here we introduce Mulan (http://mulan.dcode.org/), a novel method and a network server for comparing multiple draft and finished-quality sequences to identify functional elements conserved over evolutionary time. Mulan brings together several novel algorithms: the tba multi-aligner program for rapid identification of local sequence conservation and the multiTF program for detecting evolutionarily conserved transcription factor binding sites in multiple alignments. In addition, Mulan supports two-way communication with the GALA database; alignments of multiple species dynamically generated in GALA can be viewed in Mulan, and conserved transcription factor binding sites identified with Mulan/multiTF can be integrated and overlaid with extensive genome annotation data using GALA. Local multiple alignments computed by Mulan ensure reliable representation of short-and large-scale genomic rearrangements in distant organisms. Mulan allows for interactive modification of critical conservation parameters to differentially predict conserved regions in comparisons of both closely and distantly related species. We illustrate the uses and applications of the Mulan tool through multi-species comparisons of the GATA3 gene locus and the identification of elements that are conserved differently in avians than in other genomes allowing speculation on the evolution of birds. Source code for the aligners and the aligner-evaluation software can be freely downloaded from http://bio.cse.psu.edu/.

  3. ALVIS: interactive non-aggregative visualization and explorative analysis of multiple sequence alignments.

    PubMed

    Schwarz, Roland F; Tamuri, Asif U; Kultys, Marek; King, James; Godwin, James; Florescu, Ana M; Schultz, Jörg; Goldman, Nick

    2016-05-01

    Sequence Logos and its variants are the most commonly used method for visualization of multiple sequence alignments (MSAs) and sequence motifs. They provide consensus-based summaries of the sequences in the alignment. Consequently, individual sequences cannot be identified in the visualization and covariant sites are not easily discernible. We recently proposed Sequence Bundles, a motif visualization technique that maintains a one-to-one relationship between sequences and their graphical representation and visualizes covariant sites. We here present Alvis, an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of the visualization method with an interactive toolkit allowing detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly, highly customizable and can export results in publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis) and its Sequence Bundles visualization module is further available as a web application (http://science-practice.com/projects/sequence-bundles). PMID:26819408

  4. ALVIS: interactive non-aggregative visualization and explorative analysis of multiple sequence alignments

    PubMed Central

    Schwarz, Roland F.; Tamuri, Asif U.; Kultys, Marek; King, James; Godwin, James; Florescu, Ana M.; Schultz, Jörg; Goldman, Nick

    2016-01-01

    Sequence Logos and its variants are the most commonly used method for visualization of multiple sequence alignments (MSAs) and sequence motifs. They provide consensus-based summaries of the sequences in the alignment. Consequently, individual sequences cannot be identified in the visualization and covariant sites are not easily discernible. We recently proposed Sequence Bundles, a motif visualization technique that maintains a one-to-one relationship between sequences and their graphical representation and visualizes covariant sites. We here present Alvis, an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of the visualization method with an interactive toolkit allowing detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly, highly customizable and can export results in publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis) and its Sequence Bundles visualization module is further available as a web application (http://science-practice.com/projects/sequence-bundles). PMID:26819408

  5. Multiple sequence alignment algorithm based on a dispersion graph and ant colony algorithm.

    PubMed

    Chen, Weiyang; Liao, Bo; Zhu, Wen; Xiang, Xuyu

    2009-10-01

    In this article, we describe a representation for the processes of multiple sequences alignment (MSA) and used it to solve the problem of MSA. By this representation, we took every possible aligning result into account by defining the representation of gap insertion, the value of heuristic information in every optional path and scoring rule. On the basis of the proposed multidimensional graph, we used the ant colony algorithm to find the better path that denotes a better aligning result. In our article, we proposed the instance of three-dimensional graph and four-dimensional graph and advanced a special ichnographic representation to analyze MSA. It is yet only an experimental software, and we gave an example for finding the best aligning result by three-dimensional graph and ant colony algorithm. Experimental results show that our method can improve the solution quality on MSA benchmarks. PMID:19130503

  6. A Phylogenetic Analysis of the Brassicales Clade Based on an Alignment-Free Sequence Comparison Method

    PubMed Central

    Hatje, Klas; Kollmar, Martin

    2012-01-01

    Phylogenetic analyses reveal the evolutionary derivation of species. A phylogenetic tree can be inferred from multiple sequence alignments of proteins or genes. The alignment of whole genome sequences of higher eukaryotes is a computational intensive and ambitious task as is the computation of phylogenetic trees based on these alignments. To overcome these limitations, we here used an alignment-free method to compare genomes of the Brassicales clade. For each nucleotide sequence a Chaos Game Representation (CGR) can be computed, which represents each nucleotide of the sequence as a point in a square defined by the four nucleotides as vertices. Each CGR is therefore a unique fingerprint of the underlying sequence. If the CGRs are divided by grid lines each grid square denotes the occurrence of oligonucleotides of a specific length in the sequence (Frequency Chaos Game Representation, FCGR). Here, we used distance measures between FCGRs to infer phylogenetic trees of Brassicales species. Three types of data were analyzed because of their different characteristics: (A) Whole genome assemblies as far as available for species belonging to the Malvidae taxon. (B) EST data of species of the Brassicales clade. (C) Mitochondrial genomes of the Rosids branch, a supergroup of the Malvidae. The trees reconstructed based on the Euclidean distance method are in general agreement with single gene trees. The Fitch–Margoliash and Neighbor joining algorithms resulted in similar to identical trees. Here, for the first time we have applied the bootstrap re-sampling concept to trees based on FCGRs to determine the support of the branchings. FCGRs have the advantage that they are fast to calculate, and can be used as additional information to alignment based data and morphological characteristics to improve the phylogenetic classification of species in ambiguous cases. PMID:22952468

  7. A Parallel Non-Alignment Based Approach to Efficient Sequence Comparison using Longest Common Subsequences

    NASA Astrophysics Data System (ADS)

    Bhowmick, S.; Shafiullah, M.; Rai, H.; Bastola, D.

    2010-11-01

    Biological sequence comparison programs have revolutionized the practice of biochemistry, and molecular and evolutionary biology. Pairwise comparison of genomic sequences is a popular method of choice for analyzing genetic sequence data. However the quality of results from most sequence comparison methods are significantly affected by small perturbations in the data and furthermore, there is a dearth of computational tools to compare sequences beyond a certain length. In this paper, we describe a parallel algorithm for comparing genetic sequences using an alignment free-method based on computing the Longest Common Subsequence (LCS) between genetic sequences. We validate the quality of our results by comparing the phylogenetic tress obtained from ClustalW and LCS. We also show through complexity analysis of the isoefficiency and by empirical measurement of the running time that our algorithm is very scalable.

  8. Simultaneous Bayesian Estimation of Alignment and Phylogeny under a Joint Model of Protein Sequence and Structure

    PubMed Central

    Herman, Joseph L.; Challis, Christopher J.; Novák, Ádám; Hein, Jotun; Schmidler, Scott C.

    2014-01-01

    For sequences that are highly divergent, there is often insufficient information to infer accurate alignments, and phylogenetic uncertainty may be high. One way to address this issue is to make use of protein structural information, since structures generally diverge more slowly than sequences. In this work, we extend a recently developed stochastic model of pairwise structural evolution to multiple structures on a tree, analytically integrating over ancestral structures to permit efficient likelihood computations under the resulting joint sequence–structure model. We observe that the inclusion of structural information significantly reduces alignment and topology uncertainty, and reduces the number of topology and alignment errors in cases where the true trees and alignments are known. In some cases, the inclusion of structure results in changes to the consensus topology, indicating that structure may contain additional information beyond that which can be obtained from sequences. We use the model to investigate the order of divergence of cytoglobins, myoglobins, and hemoglobins and observe a stabilization of phylogenetic inference: although a sequence-based inference assigns significant posterior probability to several different topologies, the structural model strongly favors one of these over the others and is more robust to the choice of data set. PMID:24899668

  9. Assessing Activity Pattern Similarity with Multidimensional Sequence Alignment based on a Multiobjective Optimization Evolutionary Algorithm

    PubMed Central

    Kwan, Mei-Po; Xiao, Ningchuan; Ding, Guoxiang

    2015-01-01

    Due to the complexity and multidimensional characteristics of human activities, assessing the similarity of human activity patterns and classifying individuals with similar patterns remains highly challenging. This paper presents a new and unique methodology for evaluating the similarity among individual activity patterns. It conceptualizes multidimensional sequence alignment (MDSA) as a multiobjective optimization problem, and solves this problem with an evolutionary algorithm. The study utilizes sequence alignment to code multiple facets of human activities into multidimensional sequences, and to treat similarity assessment as a multiobjective optimization problem that aims to minimize the alignment cost for all dimensions simultaneously. A multiobjective optimization evolutionary algorithm (MOEA) is used to generate a diverse set of optimal or near-optimal alignment solutions. Evolutionary operators are specifically designed for this problem, and a local search method also is incorporated to improve the search ability of the algorithm. We demonstrate the effectiveness of our method by comparing it with a popular existing method called ClustalG using a set of 50 sequences. The results indicate that our method outperforms the existing method for most of our selected cases. The multiobjective evolutionary algorithm presented in this paper provides an effective approach for assessing activity pattern similarity, and a foundation for identifying distinctive groups of individuals with similar activity patterns. PMID:26190858

  10. DUC-Curve, a highly compact 2D graphical representation of DNA sequences and its application in sequence alignment

    NASA Astrophysics Data System (ADS)

    Li, Yushuang; Liu, Qian; Zheng, Xiaoqi

    2016-08-01

    A highly compact and simple 2D graphical representation of DNA sequences, named DUC-Curve, is constructed through mapping four nucleotides to a unit circle with a cyclic order. DUC-Curve could directly detect nucleotide, di-nucleotide compositions and microsatellite structure from DNA sequences. Moreover, it also could be used for DNA sequence alignment. Taking geometric center vectors of DUC-Curves as sequence descriptor, we perform similarity analysis on the first exons of β-globin genes of 11 species, oncogene TP53 of 27 species and twenty-four Influenza A viruses, respectively. The obtained reasonable results illustrate that the proposed method is very effective in sequence comparison problems, and will at least play a complementary role in classification and clustering problems.

  11. Probabilistic sequence alignment of Late Pleistocene benthic δ18O data

    NASA Astrophysics Data System (ADS)

    Lawrence, C.; Lin, L.; Lisiecki, L. E.; Stern, J.

    2013-12-01

    The stratigraphic alignment of ocean sediment cores plays a vital role in paleoceanographic research because it is used to develop mutually consistent age models for climate proxies measured in these cores. The most common proxy used for alignment is the The stratigraphic alignment of ocean sediment cores plays a vital role in paleoceanographic research because it is used to develop mutually consistent age models for climate proxies measured in these cores. The most common proxy used for alignment is the δ18O of calcite from benthic or planktonic foraminifera because a large fraction of δ18O variance derives from the global signal of ice volume. To date, alignment has been performed either by manual, qualitative comparison or by deterministic algorithms (Martinson, Pisias et al. Quat. Res. 27 1987; Lisiecki and Lisiecki Paleoceanography 17, 2002; Huybers and Wunsch, Paleoceanography 19, 2004). Here we present a probabilistic sequence alignment algorithm which provides 95% confidence bands for the alignment of pairs of benthic δ18O records. The probabilistic algorithm presented here is based on a hidden Markov model (HMM) (Levinson, Rabiner et al. Bell Systems Technical Journal, 62,1983) similar to those that have been used extensively to align DNA and protein sequences (Durbin, Eddy et al. Biological Sequence Analysis, Ch. 4, 1998). However, here the need to the alignment of sequences stems from expansion and/or contraction in the records due to changes in sedimentation rates rather than the insertion or deletion of residues. Transition probabilities that are used in this HMM to model changes in sedimentation rates are based on radiocarbon estimates of sedimentation rates. The probabilistic algorithm considers all possible alignments with these predefined sedimentation rates. Exact calculations are completed using dynamic programming recursions. The algorithm yields the probability distributions of the age at each point in the record, which are probabilistically

  12. A max-margin model for efficient simultaneous alignment and folding of RNA sequences

    PubMed Central

    Do, Chuong B.; Foo, Chuan-Sheng; Batzoglou, Serafim

    2008-01-01

    Motivation: The need for accurate and efficient tools for computational RNA structure analysis has become increasingly apparent over the last several years: RNA folding algorithms underlie numerous applications in bioinformatics, ranging from microarray probe selection to de novo non-coding RNA gene prediction. In this work, we present RAF (RNA Alignment and Folding), an efficient algorithm for simultaneous alignment and consensus folding of unaligned RNA sequences. Algorithmically, RAF exploits sparsity in the set of likely pairing and alignment candidates for each nucleotide (as identified by the CONTRAfold or CONTRAlign programs) to achieve an effectively quadratic running time for simultaneous pairwise alignment and folding. RAF's fast sparse dynamic programming, in turn, serves as the inference engine within a discriminative machine learning algorithm for parameter estimation. Results: In cross-validated benchmark tests, RAF achieves accuracies equaling or surpassing the current best approaches for RNA multiple sequence secondary structure prediction. However, RAF requires nearly an order of magnitude less time than other simultaneous folding and alignment methods, thus making it especially appropriate for high-throughput studies. Availability: Source code for RAF is available at:http://contra.stanford.edu/contrafold/ Contact: chuongdo@cs.stanford.edu PMID:18586747

  13. SDT: A Virus Classification Tool Based on Pairwise Sequence Alignment and Identity Calculation

    PubMed Central

    Muhire, Brejnev Muhizi; Varsani, Arvind; Martin, Darren Patrick

    2014-01-01

    The perpetually increasing rate at which viral full-genome sequences are being determined is creating a pressing demand for computational tools that will aid the objective classification of these genome sequences. Taxonomic classification approaches that are based on pairwise genetic identity measures are potentially highly automatable and are progressively gaining favour with the International Committee on Taxonomy of Viruses (ICTV). There are, however, various issues with the calculation of such measures that could potentially undermine the accuracy and consistency with which they can be applied to virus classification. Firstly, pairwise sequence identities computed based on multiple sequence alignments rather than on multiple independent pairwise alignments can lead to the deflation of identity scores with increasing dataset sizes. Also, when gap-characters need to be introduced during sequence alignments to account for insertions and deletions, methodological variations in the way that these characters are introduced and handled during pairwise genetic identity calculations can cause high degrees of inconsistency in the way that different methods classify the same sets of sequences. Here we present Sequence Demarcation Tool (SDT), a free user-friendly computer program that aims to provide a robust and highly reproducible means of objectively using pairwise genetic identity calculations to classify any set of nucleotide or amino acid sequences. SDT can produce publication quality pairwise identity plots and colour-coded distance matrices to further aid the classification of sequences according to ICTV approved taxonomic demarcation criteria. Besides a graphical interface version of the program for Windows computers, command-line versions of the program are available for a variety of different operating systems (including a parallel version for cluster computing platforms). PMID:25259891

  14. R3D-2-MSA: the RNA 3D structure-to-multiple sequence alignment server.

    PubMed

    Cannone, Jamie J; Sweeney, Blake A; Petrov, Anton I; Gutell, Robin R; Zirbel, Craig L; Leontis, Neocles

    2015-07-01

    The RNA 3D Structure-to-Multiple Sequence Alignment Server (R3D-2-MSA) is a new web service that seamlessly links RNA three-dimensional (3D) structures to high-quality RNA multiple sequence alignments (MSAs) from diverse biological sources. In this first release, R3D-2-MSA provides manual and programmatic access to curated, representative ribosomal RNA sequence alignments from bacterial, archaeal, eukaryal and organellar ribosomes, using nucleotide numbers from representative atomic-resolution 3D structures. A web-based front end is available for manual entry and an Application Program Interface for programmatic access. Users can specify up to five ranges of nucleotides and 50 nucleotide positions per range. The R3D-2-MSA server maps these ranges to the appropriate columns of the corresponding MSA and returns the contents of the columns, either for display in a web browser or in JSON format for subsequent programmatic use. The browser output page provides a 3D interactive display of the query, a full list of sequence variants with taxonomic information and a statistical summary of distinct sequence variants found. The output can be filtered and sorted in the browser. Previous user queries can be viewed at any time by resubmitting the output URL, which encodes the search and re-generates the results. The service is freely available with no login requirement at http://rna.bgsu.edu/r3d-2-msa. PMID:26048960

  15. R3D-2-MSA: the RNA 3D structure-to-multiple sequence alignment server

    PubMed Central

    Cannone, Jamie J.; Sweeney, Blake A.; Petrov, Anton I.; Gutell, Robin R.; Zirbel, Craig L.; Leontis, Neocles

    2015-01-01

    The RNA 3D Structure-to-Multiple Sequence Alignment Server (R3D-2-MSA) is a new web service that seamlessly links RNA three-dimensional (3D) structures to high-quality RNA multiple sequence alignments (MSAs) from diverse biological sources. In this first release, R3D-2-MSA provides manual and programmatic access to curated, representative ribosomal RNA sequence alignments from bacterial, archaeal, eukaryal and organellar ribosomes, using nucleotide numbers from representative atomic-resolution 3D structures. A web-based front end is available for manual entry and an Application Program Interface for programmatic access. Users can specify up to five ranges of nucleotides and 50 nucleotide positions per range. The R3D-2-MSA server maps these ranges to the appropriate columns of the corresponding MSA and returns the contents of the columns, either for display in a web browser or in JSON format for subsequent programmatic use. The browser output page provides a 3D interactive display of the query, a full list of sequence variants with taxonomic information and a statistical summary of distinct sequence variants found. The output can be filtered and sorted in the browser. Previous user queries can be viewed at any time by resubmitting the output URL, which encodes the search and re-generates the results. The service is freely available with no login requirement at http://rna.bgsu.edu/r3d-2-msa. PMID:26048960

  16. Alignment-free microbial phylogenomics under scenarios of sequence divergence, genome rearrangement and lateral genetic transfer

    PubMed Central

    Bernard, Guillaume; Chan, Cheong Xin; Ragan, Mark A.

    2016-01-01

    Alignment-free (AF) approaches have recently been highlighted as alternatives to methods based on multiple sequence alignment in phylogenetic inference. However, the sensitivity of AF methods to genome-scale evolutionary scenarios is little known. Here, using simulated microbial genome data we systematically assess the sensitivity of nine AF methods to three important evolutionary scenarios: sequence divergence, lateral genetic transfer (LGT) and genome rearrangement. Among these, AF methods are most sensitive to the extent of sequence divergence, less sensitive to low and moderate frequencies of LGT, and most robust against genome rearrangement. We describe the application of AF methods to three well-studied empirical genome datasets, and introduce a new application of the jackknife to assess node support. Our results demonstrate that AF phylogenomics is computationally scalable to multi-genome data and can generate biologically meaningful phylogenies and insights into microbial evolution. PMID:27363362

  17. Phylo-VISTA: An Interactive Visualization Tool for Multiple DNA Sequence Alignments

    SciTech Connect

    Shah, Nameeta; Couronne, Olivier; Pennacchio, Len A.; Brudno, Michael; Batzoglou, Serafim; Bethel, E. Wes; Rubin, Edward M.; Hamann, Bernd; Dubchak, Inna

    2004-04-01

    We have developed Phylo-VISTA (Shah et al., 2003), an interactive software tool for analyzing multiple alignments by visualizing a similarity measure for DNA sequences of multiple species. The complexity of visual presentation is effectively organized using a framework based upon inter-species phylogenetic relationships. The phylogenetic organization supports rapid, user-guided inter-species comparison. To aid in navigation through large sequence datasets, Phylo-VISTA provides a user with the ability to select and view data at varying resolutions. The combination of multi-resolution data visualization and analysis, combined with the phylogenetic framework for inter-species comparison, produces a highly flexible and powerful tool for visual data analysis of multiple sequence alignments.

  18. A parallel approach of COFFEE objective function to multiple sequence alignment

    NASA Astrophysics Data System (ADS)

    Zafalon, G. F. D.; Visotaky, J. M. V.; Amorim, A. R.; Valêncio, C. R.; Neves, L. A.; de Souza, R. C. G.; Machado, J. M.

    2015-09-01

    The computational tools to assist genomic analyzes show even more necessary due to fast increasing of data amount available. With high computational costs of deterministic algorithms for sequence alignments, many works concentrate their efforts in the development of heuristic approaches to multiple sequence alignments. However, the selection of an approach, which offers solutions with good biological significance and feasible execution time, is a great challenge. Thus, this work aims to show the parallelization of the processing steps of MSA-GA tool using multithread paradigm in the execution of COFFEE objective function. The standard objective function implemented in the tool is the Weighted Sum of Pairs (WSP), which produces some distortions in the final alignments when sequences sets with low similarity are aligned. Then, in studies previously performed we implemented the COFFEE objective function in the tool to smooth these distortions. Although the nature of COFFEE objective function implies in the increasing of execution time, this approach presents points, which can be executed in parallel. With the improvements implemented in this work, we can verify the execution time of new approach is 24% faster than the sequential approach with COFFEE. Moreover, the COFFEE multithreaded approach is more efficient than WSP, because besides it is slightly fast, its biological results are better.

  19. SVM-BALSA: Remote Homology Detection based on Bayesian Sequence Alignment

    SciTech Connect

    Webb-Robertson, Bobbie-Jo M.; Oehmen, Chris S.; Matzke, Melissa M.

    2005-11-10

    Using biopolymer sequence comparison methods to identify evolutionarily related proteins is one of the most common tasks in bioinformatics. Recently, support vector machines (SVMs) utilizing statistical learning theory have been employed in the problem of remote homology detection and shown to outperform iterative profile methods such as PSI-BLAST. In this study we demonstrate the utilization of a Bayesian alignment score, which accounts for the uncertainty of all possible alignments, in the SVM construction improves sensitivity compared to the traditional dynamic programming implementation.

  20. Comparative Topological Analysis of Neuronal Arbors via Sequence Representation and Alignment

    NASA Astrophysics Data System (ADS)

    Gillette, Todd Aaron

    Neuronal morphology is a key mediator of neuronal function, defining the profile of connectivity and shaping signal integration and propagation. Reconstructing neurite processes is technically challenging and thus data has historically been relatively sparse. Data collection and curation along with more efficient and reliable data production methods provide opportunities for the application of informatics to find new relationships and more effectively explore the field. This dissertation presents a method for aiding the development of data production as well as a novel representation and set of analyses for extracting morphological patterns. The DIADEM Challenge was organized for the purposes of determining the state of the art in automated neuronal reconstruction and what existing challenges remained. As one of the co-organizers of the Challenge, I developed the DIADEM metric, a tool designed to measure the effectiveness of automated reconstruction algorithms by comparing resulting reconstructions to expert-produced gold standards and identifying errors of various types. It has been used in the DIADEM Challenge and in the testing of several algorithms since. Further, this dissertation describes a topological sequence representation of neuronal trees amenable to various forms of sequence analysis, notably motif analysis, global pairwise alignment, clustering, and multiple sequence alignment. Motif analysis of neuronal arbors shows a large difference in bifurcation type proportions between axons and dendrites, but that relatively simple growth mechanisms account for most higher order motifs. Pairwise global alignment of topological sequences, modified from traditional sequence alignment to preserve tree relationships, enabled cluster analysis which displayed strong correspondence with known cell classes by cell type, species, and brain region. Multiple alignment of sequences in selected clusters enabled the extraction of conserved features, revealing mouse

  1. Alignment of Red-Sequence Cluster Dwarf Galaxies: From the Frontier Fields to the Local Universe

    NASA Astrophysics Data System (ADS)

    Barkhouse, Wayne Alan; Archer, Haylee; Burgad, Jaford; Foote, Gregory; Rude, Cody; Lopez-Cruz, Omar

    2015-08-01

    Galaxy clusters are the largest virialized structures in the universe. Due to their high density and mass, they are an excellent laboratory for studying the environmental effects on galaxy evolution. Numerical simulations have predicted that tidal torques acting on dwarf galaxies as they fall into the cluster environment will cause the major axis of the galaxies to align with their radial position vector (a line that extends from the cluster center to the galaxy's center). We have undertaken a study to measure the redshift evolution of the alignment of red-sequence cluster dwarf galaxies based on a sample of 57 low-redshift Abell clusters imaged at KPNO using the 0.9-meter telescope, and 64 clusters from the WINGS dataset. To supplement our low-redshift sample, we have included galaxies selected from the Hubble Space Telescope Frontier fields. Leveraging the HST data allows us to look for evolutionary changes in the alignment of red-sequence cluster dwarf galaxies over a redshift range of 0 < z < 0.35. The alignment of the major axis of the dwarf galaxies is measured by fitting a Sersic function to each red-sequence galaxy using GALFIT. The quality of each model is checked visually after subtracting the model from the galaxy. The cluster sample is then combined by scaling each cluster by r200. We present our preliminary results based on the alignment of the red-sequence dwarf galaxies with: 1) the major axis of the brightest cluster galaxy, 2) the major axis of the cluster defined by the position of cluster members, and 3) a radius vector pointing from the cluster center to individual dwarf galaxies. Our combined cluster sample is sub-divided into different radial regions and redshift bins.

  2. ANTICALIgN: visualizing, editing and analyzing combined nucleotide and amino acid sequence alignments for combinatorial protein engineering.

    PubMed

    Jarasch, Alexander; Kopp, Melanie; Eggenstein, Evelyn; Richter, Antonia; Gebauer, Michaela; Skerra, Arne

    2016-07-01

    ANTIC ALIGN: is an interactive software developed to simultaneously visualize, analyze and modify alignments of DNA and/or protein sequences that arise during combinatorial protein engineering, design and selection. ANTIC ALIGN: combines powerful functions known from currently available sequence analysis tools with unique features for protein engineering, in particular the possibility to display and manipulate nucleotide sequences and their translated amino acid sequences at the same time. ANTIC ALIGN: offers both template-based multiple sequence alignment (MSA), using the unmutated protein as reference, and conventional global alignment, to compare sequences that share an evolutionary relationship. The application of similarity-based clustering algorithms facilitates the identification of duplicates or of conserved sequence features among a set of selected clones. Imported nucleotide sequences from DNA sequence analysis are automatically translated into the corresponding amino acid sequences and displayed, offering numerous options for selecting reading frames, highlighting of sequence features and graphical layout of the MSA. The MSA complexity can be reduced by hiding the conserved nucleotide and/or amino acid residues, thus putting emphasis on the relevant mutated positions. ANTIC ALIGN: is also able to handle suppressed stop codons or even to incorporate non-natural amino acids into a coding sequence. We demonstrate crucial functions of ANTIC ALIGN: in an example of Anticalins selected from a lipocalin random library against the fibronectin extradomain B (ED-B), an established marker of tumor vasculature. Apart from engineered protein scaffolds, ANTIC ALIGN: provides a powerful tool in the area of antibody engineering and for directed enzyme evolution. PMID:27261456

  3. Accuracy of sequence alignment and fold assessment using reduced amino acid alphabets.

    PubMed

    Melo, Francisco; Marti-Renom, Marc A

    2006-06-01

    Reduced or simplified amino acid alphabets group the 20 naturally occurring amino acids into a smaller number of representative protein residues. To date, several reduced amino acid alphabets have been proposed, which have been derived and optimized by a variety of methods. The resulting reduced amino acid alphabets have been applied to pattern recognition, generation of consensus sequences from multiple alignments, protein folding, and protein structure prediction. In this work, amino acid substitution matrices and statistical potentials were derived based on several reduced amino acid alphabets and their performance assessed in a large benchmark for the tasks of sequence alignment and fold assessment of protein structure models, using as a reference frame the standard alphabet of 20 amino acids. The results showed that a large reduction in the total number of residue types does not necessarily translate into a significant loss of discriminative power for sequence alignment and fold assessment. Therefore, some definitions of a few residue types are able to encode most of the relevant sequence/structure information that is present in the 20 standard amino acids. Based on these results, we suggest that the use of reduced amino acid alphabets may allow to increasing the accuracy of current substitution matrices and statistical potentials for the prediction of protein structure of remote homologs. PMID:16506243

  4. TCS: a web server for multiple sequence alignment evaluation and phylogenetic reconstruction

    PubMed Central

    Chang, Jia-Ming; Di Tommaso, Paolo; Lefort, Vincent; Gascuel, Olivier; Notredame, Cedric

    2015-01-01

    This article introduces the Transitive Consistency Score (TCS) web server; a service making it possible to estimate the local reliability of protein multiple sequence alignments (MSAs) using the TCS index. The evaluation can be used to identify the aligned positions most likely to contain structurally analogous residues and also most likely to support an accurate phylogenetic reconstruction. The TCS scoring scheme has been shown to be accurate predictor of structural alignment correctness among commonly used methods. It has also been shown to outperform common filtering schemes like Gblocks or trimAl when doing MSA post-processing prior to phylogenetic tree reconstruction. The web server is available from http://tcoffee.crg.cat/tcs. PMID:25855806

  5. Comparison of alignment software for genome-wide bisulphite sequence data

    PubMed Central

    Chatterjee, Aniruddha; Stockwell, Peter A.; Rodger, Euan J.; Morison, Ian M.

    2012-01-01

    Recent advances in next generation sequencing (NGS) technology now provide the opportunity to rapidly interrogate the methylation status of the genome. However, there are challenges in handling and interpretation of the methylation sequence data because of its large volume and the consequences of bisulphite modification. We sequenced reduced representation human genomes on the Illumina platform and efficiently mapped and visualized the data with different pipelines and software packages. We examined three pipelines for aligning bisulphite converted sequencing reads and compared their performance. We also comment on pre-processing and quality control of Illumina data. This comparison highlights differences in methods for NGS data processing and provides guidance to advance sequence-based methylation data analysis for molecular biologists. PMID:22344695

  6. Evidence of Statistical Inconsistency of Phylogenetic Methods in the Presence of Multiple Sequence Alignment Uncertainty

    PubMed Central

    Md Mukarram Hossain, A.S.; Blackburne, Benjamin P.; Shah, Abhijeet; Whelan, Simon

    2015-01-01

    Evolutionary studies usually use a two-step process to investigate sequence data. Step one estimates a multiple sequence alignment (MSA) and step two applies phylogenetic methods to ask evolutionary questions of that MSA. Modern phylogenetic methods infer evolutionary parameters using maximum likelihood or Bayesian inference, mediated by a probabilistic substitution model that describes sequence change over a tree. The statistical properties of these methods mean that more data directly translates to an increased confidence in downstream results, providing the substitution model is adequate and the MSA is correct. Many studies have investigated the robustness of phylogenetic methods in the presence of substitution model misspecification, but few have examined the statistical properties of those methods when the MSA is unknown. This simulation study examines the statistical properties of the complete two-step process when inferring sequence divergence and the phylogenetic tree topology. Both nucleotide and amino acid analyses are negatively affected by the alignment step, both through inaccurate guide tree estimates and through overfitting to that guide tree. For many alignment tools these effects become more pronounced when additional sequences are added to the analysis. Nucleotide sequences are particularly susceptible, with MSA errors leading to statistical support for long-branch attraction artifacts, which are usually associated with gross substitution model misspecification. Amino acid MSAs are more robust, but do tend to arbitrarily resolve multifurcations in favor of the guide tree. No inference strategies produce consistently accurate estimates of divergence between sequences, although amino acid MSAs are again more accurate than their nucleotide counterparts. We conclude with some practical suggestions about how to limit the effect of MSA uncertainty on evolutionary inference. PMID:26139831

  7. Enzyme sequence similarity improves the reaction alignment method for cross-species pathway comparison

    SciTech Connect

    Ovacik, Meric A.; Androulakis, Ioannis P.

    2013-09-15

    Pathway-based information has become an important source of information for both establishing evolutionary relationships and understanding the mode of action of a chemical or pharmaceutical among species. Cross-species comparison of pathways can address two broad questions: comparison in order to inform evolutionary relationships and to extrapolate species differences used in a number of different applications including drug and toxicity testing. Cross-species comparison of metabolic pathways is complex as there are multiple features of a pathway that can be modeled and compared. Among the various methods that have been proposed, reaction alignment has emerged as the most successful at predicting phylogenetic relationships based on NCBI taxonomy. We propose an improvement of the reaction alignment method by accounting for sequence similarity in addition to reaction alignment method. Using nine species, including human and some model organisms and test species, we evaluate the standard and improved comparison methods by analyzing glycolysis and citrate cycle pathways conservation. In addition, we demonstrate how organism comparison can be conducted by accounting for the cumulative information retrieved from nine pathways in central metabolism as well as a more complete study involving 36 pathways common in all nine species. Our results indicate that reaction alignment with enzyme sequence similarity results in a more accurate representation of pathway specific cross-species similarities and differences based on NCBI taxonomy.

  8. A Comprehensive Approach to Clustering of Expressed Human Gene Sequence: The Sequence Tag Alignment and Consensus Knowledge Base

    PubMed Central

    Miller, Robert T.; Christoffels, Alan G.; Gopalakrishnan, Chella; Burke, John; Ptitsyn, Andrey A.; Broveak, Tania R.; Hide, Winston A.

    1999-01-01

    The expressed human genome is being sequenced and analyzed by disparate groups producing disparate data. The majority of the identified coding portion is in the form of expressed sequence tags (ESTs). The need to discover exonic representation and expression forms of full-length cDNAs for each human gene is frustrated by the partial and variable quality nature of this data delivery. A highly redundant human EST data set has been processed into integrated and unified expressed transcript indices that consist of hierarchically organized human transcript consensi reflecting gene expression forms and genetic polymorphism within an index class. The expression index and its intermediate outputs include cleaned transcript sequence, expression, and alignment information and a higher fidelity subset, SANIGENE. The STACK_PACK clustering system has been applied to dbEST release 121598 (GenBank version 110). Sixty-four percent of 1,313,103 Homo sapiens ESTs are condensed into 143,885 tissue level multiple sequence clusters; linking through clone-ID annotations produces 68,701 total assemblies, such that 81% of the original input set is captured in a STACK multiple sequence or linked cluster. Indexing of alignments by substituent EST accession allows browsing of the data structure and its cross-links to UniGene. STACK metaclusters consolidate a greater number of ESTs by a factor of 1.86 with respect to the corresponding UniGene build. Fidelity comparison with genome reference sequence AC004106 demonstrates consensus expression clusters that reflect significantly lower spurious repeat sequence content and capture alternate splicing within a whole body index cluster and three STACK v.2.3 tissue-level clusters. Statistics of a staggered release whole body index build of STACK v.2.0 are presented. PMID:10568754

  9. Java XMGR

    SciTech Connect

    Dr. George L. Mesina; Steven P. Miller

    2004-08-01

    The XMGR5 graphing package [1] for drawing RELAP5 [2] plots is being re-written in Java [3]. Java is a robust programming language that is available at no cost for most computer platforms from Sun Microsystems, Inc. XMGR5 is an extension of an XY plotting tool called ACE/gr extended to plot data from several US Nuclear Regulatory Commission (NRC) applications. It is also the most popular graphing package worldwide for making RELAP5 plots. In Section 1, a short review of XMGR5 is given, followed by a brief overview of Java. In Section 2, shortcomings of both tkXMGR [4] and XMGR5 are discussed and the value of converting to Java is given. Details of the conversion to Java are given in Section 3. The progress to date, some conclusions and future work are given in Section 4. Some screen shots of the Java version are shown.

  10. KMAD: knowledge-based multiple sequence alignment for intrinsically disordered proteins

    PubMed Central

    Lange, Joanna; Wyrwicz, Lucjan S.; Vriend, Gert

    2016-01-01

    Summary: Intrinsically disordered proteins (IDPs) lack tertiary structure and thus differ from globular proteins in terms of their sequence–structure–function relations. IDPs have lower sequence conservation, different types of active sites and a different distribution of functionally important regions, which altogether make their multiple sequence alignment (MSA) difficult. The KMAD MSA software has been written specifically for the alignment and annotation of IDPs. It augments the substitution matrix with knowledge about post-translational modifications, functional domains and short linear motifs. Results: MSAs produced with KMAD describe well-conserved features among IDPs, tend to agree well with biological intuition, and are a good basis for designing new experiments to shed light on this large, understudied class of proteins. Availability and implementation: KMAD web server is accessible at http://www.cmbi.ru.nl/kmad/. A standalone version is freely available. Contact: vriend@cmbi.ru.nl PMID:26568635

  11. A Convex Atomic-Norm Approach to Multiple Sequence Alignment and Motif Discovery

    PubMed Central

    Yen, Ian E. H.; Lin, Xin; Zhang, Jiong; Ravikumar, Pradeep; Dhillon, Inderjit S.

    2016-01-01

    Multiple Sequence Alignment and Motif Discovery, known as NP-hard problems, are two fundamental tasks in Bioinformatics. Existing approaches to these two problems are based on either local search methods such as Expectation Maximization (EM), Gibbs Sampling or greedy heuristic methods. In this work, we develop a convex relaxation approach to both problems based on the recent concept of atomic norm and develop a new algorithm, termed Greedy Direction Method of Multiplier, for solving the convex relaxation with two convex atomic constraints. Experiments show that our convex relaxation approach produces solutions of higher quality than those standard tools widely-used in Bioinformatics community on the Multiple Sequence Alignment and Motif Discovery problems. PMID:27559428

  12. seqphase: a web tool for interconverting phase input/output files and fasta sequence alignments.

    PubMed

    Flot, J-F

    2010-01-01

    The program phase is widely used for Bayesian inference of haplotypes from diploid genotypes; however, manually creating phase input files from sequence alignments is an error-prone and time-consuming process, especially when dealing with numerous variable sites and/or individuals. Here, a web tool called seqphase is presented that generates phase input files from fasta sequence alignments and converts phase output files back into fasta. During the production of the phase input file, several consistency checks are performed on the dataset and suitable command line options to be used for the actual phase data analysis are suggested. seqphase was written in perl and is freely accessible over the Internet at the address http://www.mnhn.fr/jfflot/seqphase. PMID:21565002

  13. RBT-GA: a novel metaheuristic for solving the multiple sequence alignment problem

    PubMed Central

    Taheri, Javid; Zomaya, Albert Y

    2009-01-01

    Background Multiple Sequence Alignment (MSA) has always been an active area of research in Bioinformatics. MSA is mainly focused on discovering biologically meaningful relationships among different sequences or proteins in order to investigate the underlying main characteristics/functions. This information is also used to generate phylogenetic trees. Results This paper presents a novel approach, namely RBT-GA, to solve the MSA problem using a hybrid solution methodology combining the Rubber Band Technique (RBT) and the Genetic Algorithm (GA) metaheuristic. RBT is inspired by the behavior of an elastic Rubber Band (RB) on a plate with several poles, which is analogues to locations in the input sequences that could potentially be biologically related. A GA attempts to mimic the evolutionary processes of life in order to locate optimal solutions in an often very complex landscape. RBT-GA is a population based optimization algorithm designed to find the optimal alignment for a set of input protein sequences. In this novel technique, each alignment answer is modeled as a chromosome consisting of several poles in the RBT framework. These poles resemble locations in the input sequences that are most likely to be correlated and/or biologically related. A GA-based optimization process improves these chromosomes gradually yielding a set of mostly optimal answers for the MSA problem. Conclusion RBT-GA is tested with one of the well-known benchmarks suites (BALiBASE 2.0) in this area. The obtained results show that the superiority of the proposed technique even in the case of formidable sequences. PMID:19594869

  14. JAR3D Webserver: Scoring and aligning RNA loop sequences to known 3D motifs.

    PubMed

    Roll, James; Zirbel, Craig L; Sweeney, Blake; Petrov, Anton I; Leontis, Neocles

    2016-07-01

    Many non-coding RNAs have been identified and may function by forming 2D and 3D structures. RNA hairpin and internal loops are often represented as unstructured on secondary structure diagrams, but RNA 3D structures show that most such loops are structured by non-Watson-Crick basepairs and base stacking. Moreover, different RNA sequences can form the same RNA 3D motif. JAR3D finds possible 3D geometries for hairpin and internal loops by matching loop sequences to motif groups from the RNA 3D Motif Atlas, by exact sequence match when possible, and by probabilistic scoring and edit distance for novel sequences. The scoring gauges the ability of the sequences to form the same pattern of interactions observed in 3D structures of the motif. The JAR3D webserver at http://rna.bgsu.edu/jar3d/ takes one or many sequences of a single loop as input, or else one or many sequences of longer RNAs with multiple loops. Each sequence is scored against all current motif groups. The output shows the ten best-matching motif groups. Users can align input sequences to each of the motif groups found by JAR3D. JAR3D will be updated with every release of the RNA 3D Motif Atlas, and so its performance is expected to improve over time. PMID:27235417

  15. Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis.

    PubMed

    Bonham-Carter, Oliver; Steele, Joe; Bastola, Dhundy

    2014-11-01

    Modern sequencing and genome assembly technologies have provided a wealth of data, which will soon require an analysis by comparison for discovery. Sequence alignment, a fundamental task in bioinformatics research, may be used but with some caveats. Seminal techniques and methods from dynamic programming are proving ineffective for this work owing to their inherent computational expense when processing large amounts of sequence data. These methods are prone to giving misleading information because of genetic recombination, genetic shuffling and other inherent biological events. New approaches from information theory, frequency analysis and data compression are available and provide powerful alternatives to dynamic programming. These new methods are often preferred, as their algorithms are simpler and are not affected by synteny-related problems. In this review, we provide a detailed discussion of computational tools, which stem from alignment-free methods based on statistical analysis from word frequencies. We provide several clear examples to demonstrate applications and the interpretations over several different areas of alignment-free analysis such as base-base correlations, feature frequency profiles, compositional vectors, an improved string composition and the D2 statistic metric. Additionally, we provide detailed discussion and an example of analysis by Lempel-Ziv techniques from data compression. PMID:23904502

  16. Finite-temperature local protein sequence alignment: percolation and free-energy distribution.

    PubMed

    Wolfsheimer, S; Melchert, O; Hartmann, A K

    2009-12-01

    Sequence alignment is a tool in bioinformatics that is used to find homological relationships in large molecular databases. It can be mapped on the physical model of directed polymers in random media. We consider the finite-temperature version of local sequence alignment for proteins and study the transition between the linear phase and the biologically relevant logarithmic phase, where the free energy grows linearly or logarithmically with the sequence length. By means of numerical simulations and finite-size-scaling analysis, we determine the phase diagram in the plane that is spanned by the gap costs and the temperature. We use the most frequently used parameter set for protein alignment. The critical exponents that describe the parameter-driven transition are found to be explicitly temperature dependent. Furthermore, we study the shape of the (free-) energy distribution close to the transition by rare-event simulations down to probabilities on the order 10(-64). It is well known that in the logarithmic region, the optimal score distribution (T=0) is described by a modified Gumbel distribution. We confirm that this also applies for the free-energy distribution (T>0). However, in the linear phase, the distribution crosses over to a modified Gaussian distribution. PMID:20365196

  17. Large-Scale Multiple Sequence Alignment and Tree Estimation Using SATé

    PubMed Central

    Liu, Kevin; Warnow, Tandy

    2016-01-01

    SATé is a method for estimating multiple sequence alignments and trees that has been shown to produce highly accurate results for datasets with large numbers of sequences. Running SATé using its default settings is very simple, but improved accuracy can be obtained by modifying its algorithmic parameters. We provide a detailed introduction to the algorithmic approach used by SATé, and instructions for running a SATé analysis using the GUI under default settings. We also provide a discussion of how to modify these settings to obtain improved results, and how to use SATé in a phylogenetic analysis pipeline. PMID:24170406

  18. Identification of Anoectochilus based on rDNA ITS sequences alignment and SELDI-TOF-MS.

    PubMed

    Gao, Chuan; Zhang, Fusheng; Zhang, Jun; Guo, Shunxing; Shao, Hongbo

    2009-01-01

    The internal transcribed spacer (ITS) sequences alignment and proteomic difference of Anoectochilus interspecies have been studied by means of ITS molecular identification and surface enhanced laser desorption ionization time of flight mass spectrography. Results showed that variety certification on Anoectochilus by ITS sequences can not determine species, and there is proteomic difference among Anoectochilus interspecies. Moreover, proteomic finger printings of five Anoectochilus species have been established for identifying species, and genetic relationships of five species within Anoectochilus have been deduced according to proteomic differences among five species. PMID:20016748

  19. Identification of Anoectochilus based on rDNA ITS sequences alignment and SELDI-TOF-MS

    PubMed Central

    Gao, Chuan; Zhang, Fusheng; Zhang, Jun; Guo, Shunxing; Shao, Hongbo

    2009-01-01

    The internal transcribed spacer (ITS) sequences alignment and proteomic difference of Anoectochilus interspecies have been studied by means of ITS molecular identification and surface enhanced laser desorption ionization time of flight mass spectrography. Results showed that variety certification on Anoectochilus by ITS sequences can not determine species, and there is proteomic difference among Anoectochilus interspecies. Moreover, proteomic finger printings of five Anoectochilus species have been established for identifying species, and genetic relationships of five species within Anoectochilus have been deduced according to proteomic differences among five species. PMID:20016748

  20. Genomic signal processing methods for computation of alignment-free distances from DNA sequences.

    PubMed

    Borrayo, Ernesto; Mendizabal-Ruiz, E Gerardo; Vélez-Pérez, Hugo; Romo-Vázquez, Rebeca; Mendizabal, Adriana P; Morales, J Alejandro

    2014-01-01

    Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. A possible application of GSP that has not been fully explored is the computation of the distance between a pair of sequences. In this work we present GAFD, a novel GSP alignment-free distance computation method. We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments. PMID:25393409

  1. Genomic Signal Processing Methods for Computation of Alignment-Free Distances from DNA Sequences

    PubMed Central

    Borrayo, Ernesto; Mendizabal-Ruiz, E. Gerardo; Vélez-Pérez, Hugo; Romo-Vázquez, Rebeca; Mendizabal, Adriana P.; Morales, J. Alejandro

    2014-01-01

    Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. A possible application of GSP that has not been fully explored is the computation of the distance between a pair of sequences. In this work we present GAFD, a novel GSP alignment-free distance computation method. We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments. PMID:25393409

  2. Multiple Amino Acid Sequence Alignment Nitrogenase Component 1: Insights into Phylogenetics and Structure-Function Relationships

    PubMed Central

    Howard, James B.; Kechris, Katerina J.; Rees, Douglas C.; Glazer, Alexander N.

    2013-01-01

    Amino acid residues critical for a protein's structure-function are retained by natural selection and these residues are identified by the level of variance in co-aligned homologous protein sequences. The relevant residues in the nitrogen fixation Component 1 α- and β-subunits were identified by the alignment of 95 protein sequences. Proteins were included from species encompassing multiple microbial phyla and diverse ecological niches as well as the nitrogen fixation genotypes, anf, nif, and vnf, which encode proteins associated with cofactors differing at one metal site. After adjusting for differences in sequence length, insertions, and deletions, the remaining >85% of the sequence co-aligned the subunits from the three genotypes. Six Groups, designated Anf, Vnf , and Nif I-IV, were assigned based upon genetic origin, sequence adjustments, and conserved residues. Both subunits subdivided into the same groups. Invariant and single variant residues were identified and were defined as “core” for nitrogenase function. Three species in Group Nif-III, Candidatus Desulforudis audaxviator, Desulfotomaculum kuznetsovii, and Thermodesulfatator indicus, were found to have a seleno-cysteine that replaces one cysteinyl ligand of the 8Fe:7S, P-cluster. Subsets of invariant residues, limited to individual groups, were identified; these unique residues help identify the gene of origin (anf, nif, or vnf) yet should not be considered diagnostic of the metal content of associated cofactors. Fourteen of the 19 residues that compose the cofactor pocket are invariant or single variant; the other five residues are highly variable but do not correlate with the putative metal content of the cofactor. The variable residues are clustered on one side of the cofactor, away from other functional centers in the three dimensional structure. Many of the invariant and single variant residues were not previously recognized as potentially critical and their identification provides the bases

  3. Alignment-free analysis of barcode sequences by means of compression-based methods

    PubMed Central

    2013-01-01

    Background The key idea of DNA barcode initiative is to identify, for each group of species belonging to different kingdoms of life, a short DNA sequence that can act as a true taxon barcode. DNA barcode represents a valuable type of information that can be integrated with ecological, genetic, and morphological data in order to obtain a more consistent taxonomy. Recent studies have shown that, for the animal kingdom, the mitochondrial gene cytochrome c oxidase I (COI), about 650 bp long, can be used as a barcode sequence for identification and taxonomic purposes of animals. In the present work we aims at introducing the use of an alignment-free approach in order to make taxonomic analysis of barcode sequences. Our approach is based on the use of two compression-based versions of non-computable Universal Similarity Metric (USM) class of distances. Our purpose is to justify the employ of USM also for the analysis of short DNA barcode sequences, showing how USM is able to correctly extract taxonomic information among those kind of sequences. Results We downloaded from Barcode of Life Data System (BOLD) database 30 datasets of barcode sequences belonging to different animal species. We built phylogenetic trees of every dataset, according to compression-based and classic evolutionary methods, and compared them in terms of topology preservation. In the experimental tests, we obtained scores with a percentage of similarity between evolutionary and compression-based trees between 80% and 100% for the most of datasets (94%). Moreover we carried out experimental tests using simulated barcode datasets composed of 100, 150, 200 and 500 sequences, each simulation replicated 25-fold. In this case, mean similarity scores between evolutionary and compression-based trees span between 83% and 99% for all simulated datasets. Conclusions In the present work we aims at introducing the use of an alignment-free approach in order to make taxonomic analysis of barcode sequences. Our

  4. Studies on structure-based sequence alignment and phylogenies of beta-lactamases.

    PubMed

    Salahuddin, Parveen; Khan, Asad U

    2014-01-01

    The β-lactamases enzymes cleave the amide bond in β-lactam ring, rendering β-lactam antibiotics harmless to bacteria. In this communication we have studied structure-function relationship and phylogenies of class A, B and D beta-lactamases using structure-based sequence alignment and phylip programs respectively. The data of structure-based sequence alignment suggests that in different isolates of TEM-1, mutations did not occur at or near sequence motifs. Since deletions are reported to be lethal to structure and function of enzyme. Therefore, in these variants antibiotic hydrolysis profile and specificity will be affected. The alignment data of class A enzyme SHV-1, CTX-M-15, class D enzyme, OXA-10, and class B enzyme VIM-2 and SIM-1 show sequence motifs along with other part of polypeptide are essentially conserved. These results imply that conformations of betalactamases are close to native state and possess normal hydrolytic activities towards beta-lactam antibiotics. However, class B enzyme such as IMP-1 and NDM-1 are less conserved than other class A and D studied here because mutation and deletions occurred at critically important region such as active site. Therefore, the structure of these beta-lactamases will be altered and antibiotic hydrolysis profile will be affected. Phylogenetic studies suggest that class A and D beta-lactamases including TOHO-1 and OXA-10 respectively evolved by horizontal gene transfer (HGT) whereas other member of class A such as TEM-1 evolved by gene duplication mechanism. Taken together, these studies justify structure-function relationship of beta-lactamases and phylogenetic studies suggest these enzymes evolved by different mechanisms. PMID:24966539

  5. Studies on structure-based sequence alignment and phylogenies of beta-lactamases

    PubMed Central

    Salahuddin, Parveen; Khan, Asad U

    2014-01-01

    The β-lactamases enzymes cleave the amide bond in β-lactam ring, rendering β-lactam antibiotics harmless to bacteria. In this communication we have studied structure-function relationship and phylogenies of class A, B and D beta-lactamases using structure-based sequence alignment and phylip programs respectively. The data of structure-based sequence alignment suggests that in different isolates of TEM-1, mutations did not occur at or near sequence motifs. Since deletions are reported to be lethal to structure and function of enzyme. Therefore, in these variants antibiotic hydrolysis profile and specificity will be affected. The alignment data of class A enzyme SHV-1, CTX-M-15, class D enzyme, OXA-10, and class B enzyme VIM-2 and SIM-1 show sequence motifs along with other part of polypeptide are essentially conserved. These results imply that conformations of betalactamases are close to native state and possess normal hydrolytic activities towards beta-lactam antibiotics. However, class B enzyme such as IMP-1 and NDM-1 are less conserved than other class A and D studied here because mutation and deletions occurred at critically important region such as active site. Therefore, the structure of these beta-lactamases will be altered and antibiotic hydrolysis profile will be affected. Phylogenetic studies suggest that class A and D beta-lactamases including TOHO-1 and OXA-10 respectively evolved by horizontal gene transfer (HGT) whereas other member of class A such as TEM-1 evolved by gene duplication mechanism. Taken together, these studies justify structure-function relationship of beta-lactamases and phylogenetic studies suggest these enzymes evolved by different mechanisms. PMID:24966539

  6. Feature-based multiexposure image-sequence fusion with guided filter and image alignment

    NASA Astrophysics Data System (ADS)

    Xu, Liang; Du, Junping; Zhang, Zhenhong

    2015-01-01

    Multiexposure fusion images have a higher dynamic range and reveal more details than a single captured image of a real-world scene. A clear and intuitive feature-based fusion technique for multiexposure image sequences is conceptually proposed. The main idea of the proposed method is to combine three image features [phase congruency (PC), local contrast, and color saturation] to obtain weight maps of the images. Then, the weight maps are further refined using a guided filter which can improve their accuracy. The final fusion result is constructed using the weighted sum of the source image sequence. In addition, for multiexposure image-sequence fusion involving dynamic scenes containing moving objects, ghost artifacts can easily occur if fusion is directly performed. Therefore, an image-alignment method is first used to adjust the input images to correspond to a reference image, after which fusion is performed. Experimental results demonstrate that the proposed method has a superior performance compared to the existing methods.

  7. CUDA ClustalW: An efficient parallel algorithm for progressive multiple sequence alignment on Multi-GPUs.

    PubMed

    Hung, Che-Lun; Lin, Yu-Shiang; Lin, Chun-Yuan; Chung, Yeh-Ching; Chung, Yi-Fang

    2015-10-01

    For biological applications, sequence alignment is an important strategy to analyze DNA and protein sequences. Multiple sequence alignment is an essential methodology to study biological data, such as homology modeling, phylogenetic reconstruction and etc. However, multiple sequence alignment is a NP-hard problem. In the past decades, progressive approach has been proposed to successfully align multiple sequences by adopting iterative pairwise alignments. Due to rapid growth of the next generation sequencing technologies, a large number of sequences can be produced in a short period of time. When the problem instance is large, progressive alignment will be time consuming. Parallel computing is a suitable solution for such applications, and GPU is one of the important architectures for contemporary parallel computing researches. Therefore, we proposed a GPU version of ClustalW v2.0.11, called CUDA ClustalW v1.0, in this work. From the experiment results, it can be seen that the CUDA ClustalW v1.0 can achieve more than 33× speedups for overall execution time by comparing to ClustalW v2.0.11. PMID:26052076

  8. Alignment of Short Reads: A Crucial Step for Application of Next-Generation Sequencing Data in Precision Medicine

    PubMed Central

    Ye, Hao; Meehan, Joe; Tong, Weida; Hong, Huixiao

    2015-01-01

    Precision medicine or personalized medicine has been proposed as a modernized and promising medical strategy. Genetic variants of patients are the key information for implementation of precision medicine. Next-generation sequencing (NGS) is an emerging technology for deciphering genetic variants. Alignment of raw reads to a reference genome is one of the key steps in NGS data analysis. Many algorithms have been developed for alignment of short read sequences since 2008. Users have to make a decision on which alignment algorithm to use in their studies. Selection of the right alignment algorithm determines not only the alignment algorithm but also the set of suitable parameters to be used by the algorithm. Understanding these algorithms helps in selecting the appropriate alignment algorithm for different applications in precision medicine. Here, we review current available algorithms and their major strategies such as seed-and-extend and q-gram filter. We also discuss the challenges in current alignment algorithms, including alignment in multiple repeated regions, long reads alignment and alignment facilitated with known genetic variants. PMID:26610555

  9. A tabu search algorithm for post-processing multiple sequence alignment.

    PubMed

    Riaz, Tariq; Yi, Wang; Li, Kuo-Bin

    2005-02-01

    Tabu search is a meta-heuristic approach that is proven to be useful in solving combinatorial optimization problems. We implement the adaptive memory features of tabu search to refine a multiple sequence alignment. Adaptive memory helps the search process to avoid local optima and explores the solution space economically and effectively without getting trapped into cycles. The algorithm is further enhanced by introducing extended tabu search features such as intensification and diversification. The neighborhoods of a solution are generated stochastically and a consistency-based objective function is employed to measure its quality. The algorithm is tested with the datasets from BAliBASE benchmarking database. We have observed through experiments that tabu search is able to improve the quality of multiple alignments generated by other software such as ClustalW and T-Coffee. The source code of our algorithm is available at http://www.bii.a-star.edu.sg/~tariq/tabu/. PMID:15751117

  10. T-Coffee: a web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension

    PubMed Central

    Di Tommaso, Paolo; Moretti, Sebastien; Xenarios, Ioannis; Orobitg, Miquel; Montanyola, Alberto; Chang, Jia-Ming; Taly, Jean-François; Notredame, Cedric

    2011-01-01

    This article introduces a new interface for T-Coffee, a consistency-based multiple sequence alignment program. This interface provides an easy and intuitive access to the most popular functionality of the package. These include the default T-Coffee mode for protein and nucleic acid sequences, the M-Coffee mode that allows combining the output of any other aligners, and template-based modes of T-Coffee that deliver high accuracy alignments while using structural or homology derived templates. These three available template modes are Expresso for the alignment of protein with a known 3D-Structure, R-Coffee to align RNA sequences with conserved secondary structures and PSI-Coffee to accurately align distantly related sequences using homology extension. The new server benefits from recent improvements of the T-Coffee algorithm and can align up to 150 sequences as long as 10 000 residues and is available from both http://www.tcoffee.org and its main mirror http://tcoffee.crg.cat. PMID:21558174

  11. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

    PubMed

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-04-01

    The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost prohibitive High Performance Computing (HPC) to function. As such, parallelised solutions have been proposed but many exhibit scalability limitations and are incapable of effectively processing "Big Data" - the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both data sets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and well balanced computational work load while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap memory constrained hardware has significant implications for in field clinical diagnostic testing; enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples. PMID:25625550

  12. Alignment of high-throughput sequencing data inside in-memory databases.

    PubMed

    Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias

    2014-01-01

    In times of high-throughput DNA sequencing techniques, performance-capable analysis of DNA sequences is of high importance. Computer supported DNA analysis is still an intensive time-consuming task. In this paper we explore the potential of a new In-Memory database technology by using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both, HANA and the free database system MySQL, to compare execution time and memory management. To ensure that the results are comparable, MySQL has been running in memory as well, utilizing its integrated memory engine for database table creation. We implemented stored procedures, containing exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, the inexact matching problem could not be implemented on this platform. Hence, performance analysis between HANA and MySQL was made by comparing the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL which means, that there is a high potential within the new In-Memory concepts, leading to further developments of DNA analysis procedures in the future. PMID:25160230

  13. Assessing genetic diversity in java fine-flavor cocoa (theobroma cacao l.) Germplasm by simple sequence repeat (ssr) markers

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Indonesia is the 3rd largest cocoa producing countries in the world, with an annual cacao bean production of 572,000 tons. The currently cultivated cacao varieties in Indonesia were inter-hybrids of various clones introduced from the Americas since the 16th century. Among them, “Java cocoa” is a wel...

  14. Antibody-specific model of amino acid substitution for immunological inferences from alignments of antibody sequences.

    PubMed

    Mirsky, Alexander; Kazandjian, Linda; Anisimova, Maria

    2015-03-01

    Antibodies are glycoproteins produced by the immune system as a dynamically adaptive line of defense against invading pathogens. Very elegant and specific mutational mechanisms allow B lymphocytes to produce a large and diversified repertoire of antibodies, which is modified and enhanced throughout all adulthood. One of these mechanisms is somatic hypermutation, which stochastically mutates nucleotides in the antibody genes, forming new sequences with different properties and, eventually, higher affinity and selectivity to the pathogenic target. As somatic hypermutation involves fast mutation of antibody sequences, this process can be described using a Markov substitution model of molecular evolution. Here, using large sets of antibody sequences from mice and humans, we infer an empirical amino acid substitution model AB, which is specific to antibody sequences. Compared with existing general amino acid models, we show that the AB model provides significantly better description for the somatic evolution of mice and human antibody sequences, as demonstrated on large next generation sequencing (NGS) antibody data. General amino acid models are reflective of conservation at the protein level due to functional constraints, with most frequent amino acids exchanges taking place between residues with the same or similar physicochemical properties. In contrast, within the variable part of antibody sequences we observed an elevated frequency of exchanges between amino acids with distinct physicochemical properties. This is indicative of a sui generis mutational mechanism, specific to antibody somatic hypermutation. We illustrate this property of antibody sequences by a comparative analysis of the network modularity implied by the AB model and general amino acid substitution models. We recommend using the new model for computational studies of antibody sequence maturation, including inference of alignments and phylogenetic trees describing antibody somatic hypermutation in

  15. Open-Phylo: a customizable crowd-computing platform for multiple sequence alignment.

    PubMed

    Kwak, Daniel; Kam, Alfred; Becerra, David; Zhou, Qikuan; Hops, Adam; Zarour, Eleyine; Kam, Arthur; Sarmenta, Luis; Blanchette, Mathieu; Waldispühl, Jérôme

    2013-01-01

    Citizen science games such as Galaxy Zoo, Foldit, and Phylo aim to harness the intelligence and processing power generated by crowds of online gamers to solve scientific problems. However, the selection of the data to be analyzed through these games is under the exclusive control of the game designers, and so are the results produced by gamers. Here, we introduce Open-Phylo, a freely accessible crowd-computing platform that enables any scientist to enter our system and use crowds of gamers to assist computer programs in solving one of the most fundamental problems in genomics: the multiple sequence alignment problem. PMID:24148814

  16. Two Simple and Efficient Algorithms to Compute the SP-Score Objective Function of a Multiple Sequence Alignment

    PubMed Central

    Ranwez, Vincent

    2016-01-01

    Background Multiple sequence alignment (MSA) is a crucial step in many molecular analyses and many MSA tools have been developed. Most of them use a greedy approach to construct a first alignment that is then refined by optimizing the sum of pair score (SP-score). The SP-score estimation is thus a bottleneck for most MSA tools since it is repeatedly required and is time consuming. Results Given an alignment of n sequences and L sites, I introduce here optimized solutions reaching O(nL) time complexity for affine gap cost, instead of O(n2L), which are easy to implement. PMID:27505054

  17. Clustering protein sequences with a novel metric transformed from sequence similarity scores and sequence alignments with neural networks

    PubMed Central

    Ma, Qicheng; Chirn, Gung-Wei; Cai, Richard; Szustakowski, Joseph D; Nirmala, NR

    2005-01-01

    Background The sequencing of the human genome has enabled us to access a comprehensive list of genes (both experimental and predicted) for further analysis. While a majority of the approximately 30000 known and predicted human coding genes are characterized and have been assigned at least one function, there remains a fair number of genes (about 12000) for which no annotation has been made. The recent sequencing of other genomes has provided us with a huge amount of auxiliary sequence data which could help in the characterization of the human genes. Clustering these sequences into families is one of the first steps to perform comparative studies across several genomes. Results Here we report a novel clustering algorithm (CLUGEN) that has been used to cluster sequences of experimentally verified and predicted proteins from all sequenced genomes using a novel distance metric which is a neural network score between a pair of protein sequences. This distance metric is based on the pairwise sequence similarity score and the similarity between their domain structures. The distance metric is the probability that a pair of protein sequences are of the same Interpro family/domain, which facilitates the modelling of transitive homology closure to detect remote homologues. The hierarchical average clustering method is applied with the new distance metric. Conclusion Benchmarking studies of our algorithm versus those reported in the literature shows that our algorithm provides clustering results with lower false positive and false negative rates. The clustering algorithm is applied to cluster several eukaryotic genomes and several dozens of prokaryotic genomes. PMID:16202129

  18. Nucleotide sequence alignment of hdcA from Gram-positive bacteria.

    PubMed

    Diaz, Maria; Ladero, Victor; Redruello, Begoña; Sanchez-Llana, Esther; Del Rio, Beatriz; Fernandez, Maria; Martin, Maria Cruz; Alvarez, Miguel A

    2016-03-01

    The decarboxylation of histidine -carried out mainly by some gram-positive bacteria- yields the toxic dietary biogenic amine histamine (Ladero et al. 2010 〈10.2174/157340110791233256〉 [1], Linares et al. 2016 〈http://dx.doi.org/10.1016/j.foodchem.2015.11.013〉〉 [2]). The reaction is catalyzed by a pyruvoyl-dependent histidine decarboxylase (Linares et al. 2011 〈10.1080/10408398.2011.582813〉 [3]), which is encoded by the gene hdcA. In order to locate conserved regions in the hdcA gene of Gram-positive bacteria, this article provides a nucleotide sequence alignment of all the hdcA sequences from Gram-positive bacteria present in databases. For further utility and discussion, see 〈http://dx.doi.org/ 10.1016/j.foodcont.2015.11.035〉〉 [4]. PMID:26958625

  19. Nucleotide sequence alignment of hdcA from Gram-positive bacteria

    PubMed Central

    Diaz, Maria; Ladero, Victor; Redruello, Begoña; Sanchez-Llana, Esther; del Rio, Beatriz; Fernandez, Maria; Martin, Maria Cruz; Alvarez, Miguel A.

    2016-01-01

    The decarboxylation of histidine -carried out mainly by some gram-positive bacteria- yields the toxic dietary biogenic amine histamine (Ladero et al. 2010 〈10.2174/157340110791233256〉 [1], Linares et al. 2016 〈http://dx.doi.org/10.1016/j.foodchem.2015.11.013〉〉 [2]). The reaction is catalyzed by a pyruvoyl-dependent histidine decarboxylase (Linares et al. 2011 〈10.1080/10408398.2011.582813〉 [3]), which is encoded by the gene hdcA. In order to locate conserved regions in the hdcA gene of Gram-positive bacteria, this article provides a nucleotide sequence alignment of all the hdcA sequences from Gram-positive bacteria present in databases. For further utility and discussion, see 〈http://dx.doi.org/ 10.1016/j.foodcont.2015.11.035〉〉 [4]. PMID:26958625

  20. Utilizing mapping targets of sequences underrepresented in the reference assembly to reduce false positive alignments

    PubMed Central

    Miga, Karen H.; Eisenhart, Christopher; Kent, W. James

    2015-01-01

    The human reference assembly remains incomplete due to the underrepresentation of repeat-rich sequences that are found within centromeric regions and acrocentric short arms. Although these sequences are marginally represented in the assembly, they are often fully represented in whole-genome short-read datasets and contribute to inappropriate alignments and high read-depth signals that localize to a small number of assembled homologous regions. As a consequence, these regions often provide artifactual peak calls that confound hypothesis testing and large-scale genomic studies. To address this problem, we have constructed mapping targets that represent roughly 8% of the human genome generally omitted from the human reference assembly. By integrating these data into standard mapping and peak-calling pipelines we demonstrate a 10-fold reduction in signals in regions common to the blacklisted region and identify a comprehensive set of regions that exhibit mapping sensitivity with the presence of the repeat-rich targets. PMID:26163063

  1. An alignment-free method to find and visualise rearrangements between pairs of DNA sequences

    PubMed Central

    Pratas, Diogo; Silva, Raquel M.; Pinho, Armando J.; Ferreira, Paulo J.S.G.

    2015-01-01

    Species evolution is indirectly registered in their genomic structure. The emergence and advances in sequencing technology provided a way to access genome information, namely to identify and study evolutionary macro-events, as well as chromosome alterations for clinical purposes. This paper describes a completely alignment-free computational method, based on a blind unsupervised approach, to detect large-scale and small-scale genomic rearrangements between pairs of DNA sequences. To illustrate the power and usefulness of the method we give complete chromosomal information maps for the pairs human-chimpanzee and human-orangutan. The tool by means of which these results were obtained has been made publicly available and is described in detail. PMID:25984837

  2. An alignment-free method to find and visualise rearrangements between pairs of DNA sequences.

    PubMed

    Pratas, Diogo; Silva, Raquel M; Pinho, Armando J; Ferreira, Paulo J S G

    2015-01-01

    Species evolution is indirectly registered in their genomic structure. The emergence and advances in sequencing technology provided a way to access genome information, namely to identify and study evolutionary macro-events, as well as chromosome alterations for clinical purposes. This paper describes a completely alignment-free computational method, based on a blind unsupervised approach, to detect large-scale and small-scale genomic rearrangements between pairs of DNA sequences. To illustrate the power and usefulness of the method we give complete chromosomal information maps for the pairs human-chimpanzee and human-orangutan. The tool by means of which these results were obtained has been made publicly available and is described in detail. PMID:25984837

  3. Multidimensional mutual information methods for the analysis of covariation in multiple sequence alignments

    PubMed Central

    2014-01-01

    Background Several methods are available for the detection of covarying positions from a multiple sequence alignment (MSA). If the MSA contains a large number of sequences, information about the proximities between residues derived from covariation maps can be sufficient to predict a protein fold. However, in many cases the structure is already known, and information on the covarying positions can be valuable to understand the protein mechanism and dynamic properties. Results In this study we have sought to determine whether a multivariate (multidimensional) extension of traditional mutual information (MI) can be an additional tool to study covariation. The performance of two multidimensional MI (mdMI) methods, designed to remove the effect of ternary/quaternary interdependencies, was tested with a set of 9 MSAs each containing <400 sequences, and was shown to be comparable to that of the newest methods based on maximum entropy/pseudolikelyhood statistical models of protein sequences. However, while all the methods tested detected a similar number of covarying pairs among the residues separated by < 8 Å in the reference X-ray structures, there was on average less than 65% overlap between the top scoring pairs detected by methods that are based on different principles. Conclusions Given the large variety of structure and evolutionary history of different proteins it is possible that a single best method to detect covariation in all proteins does not exist, and that for each protein family the best information can be derived by merging/comparing results obtained with different methods. This approach may be particularly valuable in those cases in which the size of the MSA is small or the quality of the alignment is low, leading to significant differences in the pairs detected by different methods. PMID:24886131

  4. Amino acid sequence alignment of bacterial and mammalian pancreatic serine proteases based on topological equivalences.

    PubMed

    James, M N; Delbaere, L T; Brayer, G D

    1978-06-01

    The three-dimensional structures of the bacterial serine proteases SGPA, SGPB, and alpha-lytic protease have been compared with those of the pancreatic enzymes alpha-chymotrypsin and elastase. This comparison shows that approximately 60% (55-64%) of the alpha-carbon atom positions of the bacterial serine proteases are topologically equivalent to the alpha-carbon atom positions of the pancreatic enzymes. The corresponding value for a comparison of the bacterial enzymes among themselves is approximately 84%. The results of these topological comparisons have been used to deduce an experimentally sound sequence alignment for these several enzymes. This alignment shows that there is extensive tertiary structural homology among the bacteria and pancreatic enzymes without significant primary sequence identity (less than 21%). The acquisition of a zymogen function by the pancreatic enzymes is accompanied by two major changes to the bacterial enzymes' architecture: an insertion of 9 residues to increase the length of the N-terminal loop, and one of 12 residues to a loop near the activation salt bridge. In addition, in these two enzyme families, the methionine loop (residues 164-182) adopts very different comformations which are associated with their altered substrate specificities. PMID:96920

  5. MapNext: a software tool for spliced and unspliced alignments and SNP detection of short sequence reads

    PubMed Central

    2009-01-01

    Background Next-generation sequencing technologies provide exciting avenues for studies of transcriptomics and population genomics. There is an increasing need to conduct spliced and unspliced alignments of short transcript reads onto a reference genome and estimate minor allele frequency from sequences of population samples. Results We have designed and implemented MapNext, a software tool for both spliced and unspliced alignments of short sequence reads onto reference sequences, and automated SNP detection using neighbourhood quality standards. MapNext provides four main analyses: (i) unspliced alignment and clustering of reads, (ii) spliced alignment of transcript reads over intron boundaries, (iii) SNP detection and estimation of minor allele frequency from population sequences, and (iv) storage of result data in a database to make it available for more flexible queries and for further analyses. The software tool has been tested using both simulated and real data. Conclusion MapNext is a comprehensive and powerful tool for both spliced and unspliced alignments of short reads and automated SNP detection from population sequences. The simplicity, flexibility and efficiency of MapNext makes it a valuable tool for transcriptomic and population genomic research. PMID:19958476

  6. Extraction of high quality k-words for alignment-free sequence comparison.

    PubMed

    Gunasinghe, Upuli; Alahakoon, Damminda; Bedingfield, Susan

    2014-10-01

    The weighted Euclidean distance (D(2)) is one of the earliest dissimilarity measures used for alignment free comparison of biological sequences. This distance measure and its variants have been used in numerous applications due to its fast computation, and many variants of it have been subsequently introduced. The D(2) distance measure is based on the count of k-words in the two sequences that are compared. Traditionally, all k-words are compared when computing the distance. In this paper we show that similar accuracy in sequence comparison can be achieved by using a selected subset of k-words. We introduce a term variance based quality measure for identifying the important k-words. We demonstrate the application of the proposed technique in phylogeny reconstruction and show that up to 99% of the k-words can be filtered out for certain datasets, resulting in faster sequence comparison. The paper also presents an exploratory analysis based evaluation of optimal k-word values and discusses the impact of using subsets of k-words in such optimal instances. PMID:24846728

  7. AREM: Aligning Short Reads from ChIP-Sequencing by Expectation Maximization

    NASA Astrophysics Data System (ADS)

    Newkirk, Daniel; Biesinger, Jacob; Chon, Alvin; Yokomori, Kyoko; Xie, Xiaohui

    High-throughput sequencing coupled to chromatin immunoprecipitation (ChIP-Seq) is widely used in characterizing genome-wide binding patterns of transcription factors, cofactors, chromatin modifiers, and other DNA binding proteins. A key step in ChIP-Seq data analysis is to map short reads from high-throughput sequencing to a reference genome and identify peak regions enriched with short reads. Although several methods have been proposed for ChIP-Seq analysis, most existing methods only consider reads that can be uniquely placed in the reference genome, and therefore have low power for detecting peaks located within repeat sequences. Here we introduce a probabilistic approach for ChIP-Seq data analysis which utilizes all reads, providing a truly genome-wide view of binding patterns. Reads are modeled using a mixture model corresponding to K enriched regions and a null genomic background. We use maximum likelihood to estimate the locations of the enriched regions, and implement an expectation-maximization (E-M) algorithm, called AREM (aligning reads by expectation maximization), to update the alignment probabilities of each read to different genomic locations. We apply the algorithm to identify genome-wide binding events of two proteins: Rad21, a component of cohesin and a key factor involved in chromatid cohesion, and Srebp-1, a transcription factor important for lipid/cholesterol homeostasis. Using AREM, we were able to identify 19,935 Rad21 peaks and 1,748 Srebp-1 peaks in the mouse genome with high confidence, including 1,517 (7.6%) Rad21 peaks and 227 (13%) Srebp-1 peaks that were missed using only uniquely mapped reads. The open source implementation of our algorithm is available at http://sourceforge.net/projects/arem

  8. Simultaneous alignment and folding of 28S rRNA sequences uncovers phylogenetic signal in structure variation.

    PubMed

    Letsch, Harald O; Greve, Carola; Kück, Patrick; Fleck, Günther; Stocsits, Roman R; Misof, Bernhard

    2009-12-01

    Secondary structure models of mitochondrial and nuclear (r)RNA sequences are frequently applied to aid the alignment of these molecules in phylogenetic analyses. Additionally, it is often speculated that structure variation of (r)RNA sequences might profitably be used as phylogenetic markers. The benefit of these approaches depends on the reliability of structure models. We used a recently developed approach to show that reliable inference of large (r)RNA secondary structures as a prerequisite of simultaneous sequence and structure alignment is feasible. The approach iteratively establishes local structure constraints of each sequence and infers fully folded individual structures by constrained MFE optimization. A comparison of structure edit distances of individual constraints and fully folded structures showed pronounced phylogenetic signal in fully folded structures. As model sequences we characterized secondary structures of 28S rRNA sequences of selected insects and examined their phylogenetic signal according to established phylogenetic hypotheses. PMID:19654047

  9. elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling

    PubMed Central

    Decap, Dries; Fostier, Jan; Reumers, Joke

    2015-01-01

    elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing identical results. What sets elPrep apart is its software architecture that allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up the execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1:40 hours, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can remove several hundreds of hours of computing time, and thus substantially reduce analysis time and cost. PMID:26182406

  10. Alignment of 3D Building Models and TIR Video Sequences with Line Tracking

    NASA Astrophysics Data System (ADS)

    Iwaszczuk, D.; Stilla, U.

    2014-11-01

    Thermal infrared imagery of urban areas became interesting for urban climate investigations and thermal building inspections. Using a flying platform such as UAV or a helicopter for the acquisition and combining the thermal data with the 3D building models via texturing delivers a valuable groundwork for large-area building inspections. However, such thermal textures are useful for further analysis if they are geometrically correctly extracted. This can be achieved with a good coregistrations between the 3D building models and thermal images, which cannot be achieved by direct georeferencing. Hence, this paper presents methodology for alignment of 3D building models and oblique TIR image sequences taken from a flying platform. In a single image line correspondences between model edges and image line segments are found using accumulator approach and based on these correspondences an optimal camera pose is calculated to ensure the best match between the projected model and the image structures. Among the sequence the linear features are tracked based on visibility prediction. The results of the proposed methodology are presented using a TIR image sequence taken from helicopter in a densely built-up urban area. The novelty of this work is given by employing the uncertainty of the 3D building models and by innovative tracking strategy based on a priori knowledge from the 3D building model and the visibility checking.

  11. Accurate prediction of protein–protein interactions from sequence alignments using a Bayesian method

    PubMed Central

    Burger, Lukas; van Nimwegen, Erik

    2008-01-01

    Accurate and large-scale prediction of protein–protein interactions directly from amino-acid sequences is one of the great challenges in computational biology. Here we present a new Bayesian network method that predicts interaction partners using only multiple alignments of amino-acid sequences of interacting protein domains, without tunable parameters, and without the need for any training examples. We first apply the method to bacterial two-component systems and comprehensively reconstruct two-component signaling networks across all sequenced bacteria. Comparisons of our predictions with known interactions show that our method infers interaction partners genome-wide with high accuracy. To demonstrate the general applicability of our method we show that it also accurately predicts interaction partners in a recent dataset of polyketide synthases. Analysis of the predicted genome-wide two-component signaling networks shows that cognates (interacting kinase/regulator pairs, which lie adjacent on the genome) and orphans (which lie isolated) form two relatively independent components of the signaling network in each genome. In addition, while most genes are predicted to have only a small number of interaction partners, we find that 10% of orphans form a separate class of ‘hub' nodes that distribute and integrate signals to and from up to tens of different interaction partners. PMID:18277381

  12. GMAP and GSNAP for Genomic Sequence Alignment: Enhancements to Speed, Accuracy, and Functionality.

    PubMed

    Wu, Thomas D; Reeder, Jens; Lawrence, Michael; Becker, Gabe; Brauer, Matthew J

    2016-01-01

    The programs GMAP and GSNAP, for aligning RNA-Seq and DNA-Seq datasets to genomes, have evolved along with advances in biological methodology to handle longer reads, larger volumes of data, and new types of biological assays. The genomic representation has been improved to include linear genomes that can compare sequences using single-instruction multiple-data (SIMD) instructions, compressed genomic hash tables with fast access using SIMD instructions, handling of large genomes with more than four billion bp, and enhanced suffix arrays (ESAs) with novel data structures for fast access. Improvements to the algorithms have included a greedy match-and-extend algorithm using suffix arrays, segment chaining using genomic hash tables, diagonalization using segmental hash tables, and nucleotide-level dynamic programming procedures that use SIMD instructions and eliminate the need for F-loop calculations. Enhancements to the functionality of the programs include standardization of indel positions, handling of ambiguous splicing, clipping and merging of overlapping paired-end reads, and alignments to circular chromosomes and alternate scaffolds. The programs have been adapted for use in pipelines by integrating their usage into R/Bioconductor packages such as gmapR and HTSeqGenie, and these pipelines have facilitated the discovery of numerous biological phenomena. PMID:27008021

  13. Kohonen map as a visualization tool for the analysis of protein sequences: multiple alignments, domains and segments of secondary structures.

    PubMed

    Hanke, J; Reich, J G

    1996-12-01

    The method of Kohonen maps, a special form of neural networks, was applied as a visualization tool for the analysis of protein sequence similarity. The procedure converts sequence (domains, aligned sequences, segments of secondary structure) into a characteristic signal matrix. This conversion depends on the property or replacement score vector selected by the user. Similar sequences have small distance in the signal space. The trained Kohonen network is functionally equivalent to an unsupervised non-linear cluster analyzer. Protein families, or aligned sequences, or segments of similar secondary structure, aggregate as clusters, and their proximity may be inspected on a color screen or on paper. Pull-down menus permit access to background information in the established text-oriented way. PMID:9021261

  14. A genome survey sequencing of the Java mouse deer (Tragulus javanicus) adds new aspects to the evolution of lineage specific retrotransposons in Ruminantia (Cetartiodactyla).

    PubMed

    Gallus, S; Kumar, V; Bertelsen, M F; Janke, A; Nilsson, M A

    2015-10-25

    Ruminantia, the ruminating, hoofed mammals (cow, deer, giraffe and allies) are an unranked artiodactylan clade. Around 50-60 million years ago the BovB retrotransposon entered the ancestral ruminantian genome through horizontal gene transfer. A survey genome screen using 454-pyrosequencing of the Java mouse deer (Tragulus javanicus) and the lesser kudu (Tragelaphus imberbis) was done to investigate and to compare the landscape of transposable elements within Ruminantia. The family Tragulidae (mouse deer) is the only representative of Tragulina and phylogenetically important, because it represents the earliest divergence in Ruminantia. The data analyses show that, relative to other ruminantian species, the lesser kudu genome has seen an expansion of BovB Long INterspersed Elements (LINEs) and BovB related Short INterspersed Elements (SINEs) like BOVA2. In comparison the genome of Java mouse deer has fewer BovB elements than other ruminants, especially Bovinae, and has in addition a novel CHR-3 SINE most likely propagated by LINE-1. By contrast the other ruminants have low amounts of CHR SINEs but high numbers of actively propagating BovB-derived and BovB-propagated SINEs. The survey sequencing data suggest that the transposable element landscape in mouse deer (Tragulina) is unique among Ruminantia, suggesting a lineage specific evolutionary trajectory that does not involve BovB mediated retrotransposition. This shows that the genomic landscape of mobile genetic elements can rapidly change in any lineage. PMID:26123917

  15. Efficient and Accurate OTU Clustering with GPU-Based Sequence Alignment and Dynamic Dendrogram Cutting.

    PubMed

    Nguyen, Thuy-Diem; Schmidt, Bertil; Zheng, Zejun; Kwoh, Chee-Keong

    2015-01-01

    De novo clustering is a popular technique to perform taxonomic profiling of a microbial community by grouping 16S rRNA amplicon reads into operational taxonomic units (OTUs). In this work, we introduce a new dendrogram-based OTU clustering pipeline called CRiSPy. The key idea used in CRiSPy to improve clustering accuracy is the application of an anomaly detection technique to obtain a dynamic distance cutoff instead of using the de facto value of 97 percent sequence similarity as in most existing OTU clustering pipelines. This technique works by detecting an abrupt change in the merging heights of a dendrogram. To produce the output dendrograms, CRiSPy employs the OTU hierarchical clustering approach that is computed on a genetic distance matrix derived from an all-against-all read comparison by pairwise sequence alignment. However, most existing dendrogram-based tools have difficulty processing datasets larger than 10,000 unique reads due to high computational complexity. We address this difficulty by developing two efficient algorithms for CRiSPy: a compute-efficient GPU-accelerated parallel algorithm for pairwise distance matrix computation and a memory-efficient hierarchical clustering algorithm. Our experiments on various datasets with distinct attributes show that CRiSPy is able to produce more accurate OTU groupings than most OTU clustering applications. PMID:26451819

  16. SNUFER: A software for localization and presentation of single nucleotide polymorphisms using a Clustal multiple sequence alignment output file

    PubMed Central

    Mansur, Marco A B; Cardozo, Giovana P; Santos, Elaine V; Marins, Mozart

    2008-01-01

    SNUFER is a software for the automatic localization and generation of tables used for the presentation of single nucleotide polymorphisms (SNPs). After input of a fasta file containing the sequences to be analyzed, a multiple sequence alignment is generated using ClustalW ran inside SNUFER. The ClustalW output file is then used to generate a table which displays the SNPs detected in the aligned sequences and their degree of similarity. This table can be exported to Microsoft Word, Microsoft Excel or as a single text file, permitting further editing for publication. The software was written using Delphi 7 for programming and FireBird 2.0 for sequence database management. It is freely available for noncommercial use and can be downloaded from http://www.heranza.com.br/bioinformatica2.htm. PMID:19238196

  17. Evolution of the cytochrome P450 superfamily: sequence alignments and pharmacogenetics.

    PubMed

    Lewis, D F; Watson, E; Lake, B G

    1998-06-01

    The evolution of the cytochrome P450 (CYP) superfamily is described, with particular reference to major events in the development of biological forms during geological time. It is noted that the currently accepted timescale for the elaboration of the P450 phylogenetic tree exhibits close parallels with the evolution of terrestrial biota. Indeed, the present human P450 complement of xenobiotic-metabolizing enzymes may have originated from coevolutionary 'warfare' between plants and animals during the Devonian period about 400 million years ago. A number of key correspondences between the evolution of P450 system and the course of biological development over time, point to a mechanistic molecular biology of evolution which is consistent with a steady increase in atmospheric oxygenation beginning over 2000 million years ago, whereas dietary changes during more recent geological time may provide one possible explanation for certain species differences in metabolism. Alignment between P450 protein sequences within the same family or subfamily, together with across-family comparisons, aid the rationalization of drug metabolism specificities for different P450 isoforms, and can assist in an understanding of genetic polymorphisms in P450-mediated oxidations at the molecular level. Moreover, the variation in P450 regulatory mechanisms and inducibilities between different mammalian species are likely to have important implications for current procedures of chemical safety evaluation, which rely on pure genetic strains of laboratory bred rodents for the testing of compounds destined for human exposure. PMID:9630657

  18. Discovery of Novel ncRNA Sequences in Multiple Genome Alignments on the Basis of Conserved and Stable Secondary Structures.

    PubMed

    Fu, Yinghan; Xu, Zhenjiang Zech; Lu, Zhi J; Zhao, Shan; Mathews, David H

    2015-01-01

    Recently, non-coding RNAs (ncRNAs) have been discovered with novel functions, and it has been appreciated that there is pervasive transcription of genomes. Moreover, many novel ncRNAs are not conserved on the primary sequence level. Therefore, de novo computational ncRNA detection that is accurate and efficient is desirable. The purpose of this study is to develop a ncRNA detection method based on conservation of structure in more than two genomes. A new method called Multifind, using Multilign, was developed. Multilign predicts the common secondary structure for multiple input sequences. Multifind then uses measures of structure conservation to estimate the probability that the input sequences are a conserved ncRNA using a classification support vector machine. Multilign is based on Dynalign, which folds and aligns two sequences simultaneously using a scoring scheme that does not include sequence identity; its structure prediction quality is therefore not affected by input sequence diversity. Additionally, ensemble defect was introduced to Multifind as an additional discriminating feature that quantifies the compactness of the folding space for a sequence. Benchmarks showed Multifind performs better than RNAz and LocARNATE+RNAz, a method that uses RNAz on structure alignments generated by LocARNATE, on testing sequences extracted from the Rfam database. For de novo ncRNA discovery in three genomes, Multifind and LocARNATE+RNAz had an advantage over RNAz in low similarity regions of genome alignments. Additionally, Multifind and LocARNATE+RNAz found different subsets of known ncRNA sequences, suggesting the two approaches are complementary. PMID:26075601

  19. Structure-Based Sequence Alignment of the Transmembrane Domains of All Human GPCRs: Phylogenetic, Structural and Functional Implications

    PubMed Central

    Cvicek, Vaclav; Goddard, William A.; Abrol, Ravinder

    2016-01-01

    The understanding of G-protein coupled receptors (GPCRs) is undergoing a revolution due to increased information about their signaling and the experimental determination of structures for more than 25 receptors. The availability of at least one receptor structure for each of the GPCR classes, well separated in sequence space, enables an integrated superfamily-wide analysis to identify signatures involving the role of conserved residues, conserved contacts, and downstream signaling in the context of receptor structures. In this study, we align the transmembrane (TM) domains of all experimental GPCR structures to maximize the conserved inter-helical contacts. The resulting superfamily-wide GpcR Sequence-Structure (GRoSS) alignment of the TM domains for all human GPCR sequences is sufficient to generate a phylogenetic tree that correctly distinguishes all different GPCR classes, suggesting that the class-level differences in the GPCR superfamily are encoded at least partly in the TM domains. The inter-helical contacts conserved across all GPCR classes describe the evolutionarily conserved GPCR structural fold. The corresponding structural alignment of the inactive and active conformations, available for a few GPCRs, identifies activation hot-spot residues in the TM domains that get rewired upon activation. Many GPCR mutations, known to alter receptor signaling and cause disease, are located at these conserved contact and activation hot-spot residue positions. The GRoSS alignment places the chemosensory receptor subfamilies for bitter taste (TAS2R) and pheromones (Vomeronasal, VN1R) in the rhodopsin family, known to contain the chemosensory olfactory receptor subfamily. The GRoSS alignment also enables the quantification of the structural variability in the TM regions of experimental structures, useful for homology modeling and structure prediction of receptors. Furthermore, this alignment identifies structurally and functionally important residues in all human GPCRs

  20. Structure-Based Sequence Alignment of the Transmembrane Domains of All Human GPCRs: Phylogenetic, Structural and Functional Implications.

    PubMed

    Cvicek, Vaclav; Goddard, William A; Abrol, Ravinder

    2016-03-01

    The understanding of G-protein coupled receptors (GPCRs) is undergoing a revolution due to increased information about their signaling and the experimental determination of structures for more than 25 receptors. The availability of at least one receptor structure for each of the GPCR classes, well separated in sequence space, enables an integrated superfamily-wide analysis to identify signatures involving the role of conserved residues, conserved contacts, and downstream signaling in the context of receptor structures. In this study, we align the transmembrane (TM) domains of all experimental GPCR structures to maximize the conserved inter-helical contacts. The resulting superfamily-wide GpcR Sequence-Structure (GRoSS) alignment of the TM domains for all human GPCR sequences is sufficient to generate a phylogenetic tree that correctly distinguishes all different GPCR classes, suggesting that the class-level differences in the GPCR superfamily are encoded at least partly in the TM domains. The inter-helical contacts conserved across all GPCR classes describe the evolutionarily conserved GPCR structural fold. The corresponding structural alignment of the inactive and active conformations, available for a few GPCRs, identifies activation hot-spot residues in the TM domains that get rewired upon activation. Many GPCR mutations, known to alter receptor signaling and cause disease, are located at these conserved contact and activation hot-spot residue positions. The GRoSS alignment places the chemosensory receptor subfamilies for bitter taste (TAS2R) and pheromones (Vomeronasal, VN1R) in the rhodopsin family, known to contain the chemosensory olfactory receptor subfamily. The GRoSS alignment also enables the quantification of the structural variability in the TM regions of experimental structures, useful for homology modeling and structure prediction of receptors. Furthermore, this alignment identifies structurally and functionally important residues in all human GPCRs

  1. Design Pattern Mining Using Distributed Learning Automata and DNA Sequence Alignment

    PubMed Central

    Esmaeilpour, Mansour; Naderifar, Vahideh; Shukur, Zarina

    2014-01-01

    Context Over the last decade, design patterns have been used extensively to generate reusable solutions to frequently encountered problems in software engineering and object oriented programming. A design pattern is a repeatable software design solution that provides a template for solving various instances of a general problem. Objective This paper describes a new method for pattern mining, isolating design patterns and relationship between them; and a related tool, DLA-DNA for all implemented pattern and all projects used for evaluation. DLA-DNA achieves acceptable precision and recall instead of other evaluated tools based on distributed learning automata (DLA) and deoxyribonucleic acid (DNA) sequences alignment. Method The proposed method mines structural design patterns in the object oriented source code and extracts the strong and weak relationships between them, enabling analyzers and programmers to determine the dependency rate of each object, component, and other section of the code for parameter passing and modular programming. The proposed model can detect design patterns better that available other tools those are Pinot, PTIDEJ and DPJF; and the strengths of their relationships. Results The result demonstrate that whenever the source code is build standard and non-standard, based on the design patterns, then the result of the proposed method is near to DPJF and better that Pinot and PTIDEJ. The proposed model is tested on the several source codes and is compared with other related models and available tools those the results show the precision and recall of the proposed method, averagely 20% and 9.6% are more than Pinot, 27% and 31% are more than PTIDEJ and 3.3% and 2% are more than DPJF respectively. Conclusion The primary idea of the proposed method is organized in two following steps: the first step, elemental design patterns are identified, while at the second step, is composed to recognize actual design patterns. PMID:25243670

  2. eMatchSite: Sequence Order-Independent Structure Alignments of Ligand Binding Pockets in Protein Models

    PubMed Central

    Brylinski, Michal

    2014-01-01

    Detecting similarities between ligand binding sites in the absence of global homology between target proteins has been recognized as one of the critical components of modern drug discovery. Local binding site alignments can be constructed using sequence order-independent techniques, however, to achieve a high accuracy, many current algorithms for binding site comparison require high-quality experimental protein structures, preferably in the bound conformational state. This, in turn, complicates proteome scale applications, where only various quality structure models are available for the majority of gene products. To improve the state-of-the-art, we developed eMatchSite, a new method for constructing sequence order-independent alignments of ligand binding sites in protein models. Large-scale benchmarking calculations using adenine-binding pockets in crystal structures demonstrate that eMatchSite generates accurate alignments for almost three times more protein pairs than SOIPPA. More importantly, eMatchSite offers a high tolerance to structural distortions in ligand binding regions in protein models. For example, the percentage of correctly aligned pairs of adenine-binding sites in weakly homologous protein models is only 4–9% lower than those aligned using crystal structures. This represents a significant improvement over other algorithms, e.g. the performance of eMatchSite in recognizing similar binding sites is 6% and 13% higher than that of SiteEngine using high- and moderate-quality protein models, respectively. Constructing biologically correct alignments using predicted ligand binding sites in protein models opens up the possibility to investigate drug-protein interaction networks for complete proteomes with prospective systems-level applications in polypharmacology and rational drug repositioning. eMatchSite is freely available to the academic community as a web-server and a stand-alone software distribution at http://www.brylinski.org/ematchsite. PMID

  3. Choice of Reference Sequence and Assembler for Alignment of Listeria monocytogenes Short-Read Sequence Data Greatly Influences Rates of Error in SNP Analyses

    PubMed Central

    Pightling, Arthur W.; Petronella, Nicholas; Pagotto, Franco

    2014-01-01

    The wide availability of whole-genome sequencing (WGS) and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs) in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs) are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps) are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied the effects of some of these choices, including: i) depth of sequencing coverage, ii) choice of reference-guided short-read sequence assembler, iii) choice of reference genome, and iv) whether to perform read-quality filtering and trimming, on our ability to detect true SNPs and on the frequencies of errors. We performed benchmarking experiments, during which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality with four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT), using reference genomes of varying genetic distances, and with or without read pre-processing (i.e., quality filtering and trimming). We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome, while using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. In total, this study demonstrates that researchers should

  4. Choice of reference sequence and assembler for alignment of Listeria monocytogenes short-read sequence data greatly influences rates of error in SNP analyses.

    PubMed

    Pightling, Arthur W; Petronella, Nicholas; Pagotto, Franco

    2014-01-01

    The wide availability of whole-genome sequencing (WGS) and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs) in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs) are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps) are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied the effects of some of these choices, including: i) depth of sequencing coverage, ii) choice of reference-guided short-read sequence assembler, iii) choice of reference genome, and iv) whether to perform read-quality filtering and trimming, on our ability to detect true SNPs and on the frequencies of errors. We performed benchmarking experiments, during which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality with four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT), using reference genomes of varying genetic distances, and with or without read pre-processing (i.e., quality filtering and trimming). We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome, while using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. In total, this study demonstrates that researchers should

  5. Science course sequences: The alignment of written, enacted, and tested curricula and their impact on grade 11 HSPA science scores

    NASA Astrophysics Data System (ADS)

    Lentz, Christine A.

    The purpose of this mixed method study was to examine the alignment of the written, enacted, and tested curricula of the Ocean City High School science course sequencing and its impact on student achievement. This study also examined the school's ability to predict student scores on the science portion of the High School Proficiency Assessment (HSPA). Data collected for science achievement included the science portion of the Grade Eight Proficiency Assessment (GEPA) as a pretest and the scores for the science portion of the HSPA as a posttest. Data collected for curriculum alignment included an examination of teacher generated course curriculum maps to determine the alignment with the New Jersey Core Curriculum Content Standards and the HSPA Test Specifications Directory. The quantitative data were treated through a series of paired samples t-tests, Pearson product moment correlation was used to examine relationships between variables, an ANCOVA analysis and a stepwise regression analysis were also completed. Based on the findings of the data analysis of this research effort, the following conclusions were drawn: (1) the alignment of the enacted curriculum with the tested and written curricula affected science achievement. (2) GEPA scores are significantly tied to HSPA scores and (3) GEPA scores and enrollment in the science sequence whose curriculum was aligned with the written and tested curricula, met the requirements of a predictor of scores on the HSPA exam. It is expected that educational leadership will use the results of this research to inform practice and drive decision-making in respect to student placement in to course sequences. It is hoped that the results will not only increase support for the district's curricula development plan but also add to the overall body of knowledge surrounding science program effectiveness in relation to the No Child Left Behind standards.

  6. Bio-samtools: Ruby bindings for SAMtools, a library for accessing BAM files containing high-throughput sequence alignments

    PubMed Central

    2012-01-01

    Background The SAMtools utilities comprise a very useful and widely used suite of software for manipulating files and alignments in the SAM and BAM format, used in a wide range of genetic analyses. The SAMtools utilities are implemented in C and provide an API for programmatic access, to help make this functionality available to programmers wishing to develop in the high level Ruby language we have developed bio-samtools, a Ruby binding to the SAMtools library. Results The utility of SAMtools is encapsulated in 3 main classes, Bio::DB::Sam, representing the alignment files and providing access to the data in them, Bio::DB::Alignment, representing the individual read alignments inside the files and Bio::DB::Pileup, representing the summarised nucleotides of reads over a single point in the nucleotide sequence to which the reads are aligned. Conclusions Bio-samtools is a flexible and easy to use interface that programmers of many levels of experience can use to access information in the popular and common SAM/BAM format. PMID:22640879

  7. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB

    PubMed Central

    Pruesse, Elmar; Quast, Christian; Knittel, Katrin; Fuchs, Bernhard M.; Ludwig, Wolfgang; Peplies, Jörg; Glöckner, Frank Oliver

    2007-01-01

    Sequencing ribosomal RNA (rRNA) genes is currently the method of choice for phylogenetic reconstruction, nucleic acid based detection and quantification of microbial diversity. The ARB software suite with its corresponding rRNA datasets has been accepted by researchers worldwide as a standard tool for large scale rRNA analysis. However, the rapid increase of publicly available rRNA sequence data has recently hampered the maintenance of comprehensive and curated rRNA knowledge databases. A new system, SILVA (from Latin silva, forest), was implemented to provide a central comprehensive web resource for up to date, quality controlled databases of aligned rRNA sequences from the Bacteria, Archaea and Eukarya domains. All sequences are checked for anomalies, carry a rich set of sequence associated contextual information, have multiple taxonomic classifications, and the latest validly described nomenclature. Furthermore, two precompiled sequence datasets compatible with ARB are offered for download on the SILVA website: (i) the reference (Ref) datasets, comprising only high quality, nearly full length sequences suitable for in-depth phylogenetic analysis and probe design and (ii) the comprehensive Parc datasets with all publicly available rRNA sequences longer than 300 nucleotides suitable for biodiversity analyses. The latest publicly available database release 91 (August 2007) hosts 547 521 sequences split into 461 823 small subunit and 85 689 large subunit rRNAs. PMID:17947321

  8. DNA Align Editor: DNA Alignment Editor Tool

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The SNPAlignEditor is a DNA sequence alignment editor that runs on Windows platforms. The purpose of the program is to provide an intuitive, user-friendly tool for manual editing of multiple sequence alignments by providing functions for input, editing, and output of nucleotide sequence alignments....

  9. Improvements to GALA and dbERGE II: databases featuring genomic sequence alignment, annotation and experimental results.

    PubMed

    Elnitski, Laura; Giardine, Belinda; Shah, Prachi; Zhang, Yi; Riemer, Cathy; Weirauch, Matthew; Burhans, Richard; Miller, Webb; Hardison, Ross C

    2005-01-01

    We describe improvements to two databases that give access to information on genomic sequence similarities, functional elements in DNA and experimental results that demonstrate those functions. GALA, the database of Genome ALignments and Annotations, is now a set of interlinked relational databases for five vertebrate species, human, chimpanzee, mouse, rat and chicken. For each species, GALA records pairwise and multiple sequence alignments, scores derived from those alignments that reflect the likelihood of being under purifying selection or being a regulatory element, and extensive annotations such as genes, gene expression patterns and transcription factor binding sites. The user interface supports simple and complex queries, including operations such as subtraction and intersections as well as clustering and finding elements in proximity to features. dbERGE II, the database of Experimental Results on Gene Expression, contains experimental data from a variety of functional assays. Both databases are now run on the DB2 database management system. Improved hardware and tuning has reduced response times and increased querying capacity, while simplified query interfaces will help direct new users through the querying process. Links are available at http://www.bx.psu.edu/. PMID:15608239

  10. New Challenges of the Computation of Multiple Sequence Alignments in the High-Throughput Era (2010 JGI/ANL HPC Workshop)

    ScienceCinema

    Notredame, Cedric [Centre for Genomic Regulation

    2011-06-08

    Cedric Notredame from the Centre for Genomic Regulation gives a presentation on "New Challenges of the Computation of Multiple Sequence Alignments in the High-Throughput Era" at the JGI/Argonne HPC Workshop on January 26, 2010.

  11. New Challenges of the Computation of Multiple Sequence Alignments in the High-Throughput Era (2010 JGI/ANL HPC Workshop)

    SciTech Connect

    Notredame, Cedric

    2010-01-26

    Cedric Notredame from the Centre for Genomic Regulation gives a presentation on "New Challenges of the Computation of Multiple Sequence Alignments in the High-Throughput Era" at the JGI/Argonne HPC Workshop on January 26, 2010.

  12. Identifying New Drug Targets for Potent Phospholipase D Inhibitors: Combining Sequence Alignment, Molecular Docking, and Enzyme Activity/Binding Assays.

    PubMed

    Djakpa, Helene; Kulkarni, Aditya; Barrows-Murphy, Scheneque; Miller, Greg; Zhou, Weihong; Cho, Hyejin; Török, Béla; Stieglitz, Kimberly

    2016-05-01

    Phospholipase D enzymes cleave phospholipid substrates generating choline and phosphatidic acid. Phospholipase D from Streptomyces chromofuscus is a non-HKD (histidine, lysine, and aspartic acid) phospholipase D as the enzyme is more similar to members of the diverse family of metallo-phosphodiesterase/phosphatase enzymes than phospholipase D enzymes with active site HKD repeats. A highly efficient library of phospholipase D inhibitors based on 1,3-disubstituted-4-amino-pyrazolopyrimidine core structure was utilized to evaluate the inhibition of purified S. chromofuscus phospholipase D. The molecules exhibited inhibition of phospholipase D activity (IC50 ) in the nanomolar range with monomeric substrate diC4 PC and micromolar range with phospholipid micelles and vesicles. Binding studies with vesicle substrate and phospholipase D strongly indicate that these inhibitors directly block enzyme vesicle binding. Following these compelling results as a starting point, sequence searches and alignments with S. chromofuscus phospholipase D have identified potential new drug targets. Using AutoDock, inhibitors were docked into the enzymes selected from sequence searches and alignments (when 3D co-ordinates were available) and results analyzed to develop next-generation inhibitors for new targets. In vitro enzyme activity assays with several human phosphatases demonstrated that the predictive protocol was accurate. The strategy of combining sequence comparison, docking, and high-throughput screening assays has helped to identify new drug targets and provided some insight into how to make potential inhibitors more specific to desired targets. PMID:26691755

  13. SHARAKU: an algorithm for aligning and clustering read mapping profiles of deep sequencing in non-coding RNA processing

    PubMed Central

    Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi

    2016-01-01

    Motivation: Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns can be detected called read mapping profiles, which are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. Results: We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5′-end processing and 3′-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. Availability and Implementation: The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA

  14. Nearest Alignment Space Termination

    Energy Science and Technology Software Center (ESTSC)

    2006-07-13

    Near Alignment Space Termination (NAST) is the Greengenes algorithm that matches up submitted sequences with the Greengenes database to look for similarities and align the submitted sequences based on those similarities.

  15. Energy-based RNA consensus secondary structure prediction in multiple sequence alignments.

    PubMed

    Washietl, Stefan; Bernhart, Stephan H; Kellis, Manolis

    2014-01-01

    Many biologically important RNA structures are conserved in evolution leading to characteristic mutational patterns. RNAalifold is a widely used program to predict consensus secondary structures in multiple alignments by combining evolutionary information with traditional energy-based RNA folding algorithms. Here we describe the theory and applications of the RNAalifold algorithm. Consensus secondary structure prediction not only leads to significantly more accurate structure models, but it also allows to study structural conservation of functional RNAs. PMID:24639158

  16. Sequence alignment status and amplicon size difference affecting EST-SSR primer performance and polymorphism

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Little attention has been given to failed, poorly-performing, and non-polymorphic expressed sequence tag (EST) simple sequence repeat (SSR) primers. This is due in part to a lack of interest and value in reporting them but also because of the difficulty in addressing the causes of failure on a prime...

  17. Prediction of Antimicrobial Peptides Based on Sequence Alignment and Support Vector Machine-Pairwise Algorithm Utilizing LZ-Complexity

    PubMed Central

    Shahrudin, Shahriza

    2015-01-01

    This study concerns an attempt to establish a new method for predicting antimicrobial peptides (AMPs) which are important to the immune system. Recently, researchers are interested in designing alternative drugs based on AMPs because they have found that a large number of bacterial strains have become resistant to available antibiotics. However, researchers have encountered obstacles in the AMPs designing process as experiments to extract AMPs from protein sequences are costly and require a long set-up time. Therefore, a computational tool for AMPs prediction is needed to resolve this problem. In this study, an integrated algorithm is newly introduced to predict AMPs by integrating sequence alignment and support vector machine- (SVM-) LZ complexity pairwise algorithm. It was observed that, when all sequences in the training set are used, the sensitivity of the proposed algorithm is 95.28% in jackknife test and 87.59% in independent test, while the sensitivity obtained for jackknife test and independent test is 88.74% and 78.70%, respectively, when only the sequences that has less than 70% similarity are used. Applying the proposed algorithm may allow researchers to effectively predict AMPs from unknown protein peptide sequences with higher sensitivity. PMID:25802839

  18. Program Synthesizes UML Sequence Diagrams

    NASA Technical Reports Server (NTRS)

    Barry, Matthew R.; Osborne, Richard N.

    2006-01-01

    A computer program called "Rational Sequence" generates Universal Modeling Language (UML) sequence diagrams of a target Java program running on a Java virtual machine (JVM). Rational Sequence thereby performs a reverse engineering function that aids in the design documentation of the target Java program. Whereas previously, the construction of sequence diagrams was a tedious manual process, Rational Sequence generates UML sequence diagrams automatically from the running Java code.

  19. SeqAPASS (Sequence Alignment to Predict Across Species Susceptibility) software and documentation

    EPA Science Inventory

    SeqAPASS is a software application facilitates rapid and streamlined, yet transparent, comparisons of the similarity of toxicologically-significant molecular targets across species. The present application facilitates analysis of primary amino acid sequence similarity (including ...

  20. Sequence stratigraphy, structure, and tectonic history of the southwestern Ontong Java Plateau adjacent to the North Solomon Trench and Solomon Islands Arc

    NASA Astrophysics Data System (ADS)

    Phinney, Eric J.; Mann, Paul; Coffin, Millard F.; Shipley, Thomas H.

    1999-09-01

    The Ontong Java Plateau (OJP) is the largest and thickest oceanic plateau on Earth and one of the few oceanic plateaus actively converging on an island arc. We present velocity determinations and geologic interpretation of 2000 km of two-dimensional, multi-channel seismic data from the southwestern Ontong Java Plateau, North Solomon Trench, and northern Solomon Islands. We recognize three megasequences, ranging in age from early Cretaceous to Quaternary, on the basis of distinct interval velocities and seismic stratigraphic facies. Megasequence OJ1 is early Cretaceous, upper igneous crust of the OJP and correlates with basalt outcrops dated at 122-125 Ma on the island of Malaita. The top of the overlying megasequence OJ2, a late Cretaceous mudstone unit, had been identified by previous workers as the top of igneous basement. Seismic facies and correlation to distant Deep Sea Drilling Project/Ocean Drilling Program sites indicate that OJ2 was deposited in a moderately low-energy, marine environment near a fluctuating carbonate compensation depth that resulted in multiple periods of dissolution. OJ2 thins south of the Stewart Arch onto the Solomon Islands where it is correlated with the Kwaraae Mudstone Formation. Megasequence OJ3 is late Cretaceous through Quaternary pelagic cover which caps the Ontong Java Plateau; it thickens into the North Solomon Trench, and seismic facies suggest that OJ3 was deposited in a low-energy marine environment. We use seismic facies analysis, sediment thickness, structural observations, and quantitative plate reconstructions of the position of the OJP and Solomon Islands to propose a tectonic, magmatic, and sedimentary history of the southwestern Ontong Java Plateau. Prior to 125 Ma late Jurassic and early Cretaceous oceanic crust formed. From 125 to 122 Ma, the first mantle plume formed igneous crust (OJ1). Between 122 and 92 Ma, marine mudstone (OJ2 and Kwaraae mudstone of Malaita, Solomon Islands) was deposited on Ontong Java

  1. Fast and Sensitive Alignment of Microbial Whole Genome Sequencing Reads to Large Sequence Datasets on a Desktop PC: Application to Metagenomic Datasets and Pathogen Identification

    PubMed Central

    2014-01-01

    Next generation sequencing (NGS) of metagenomic samples is becoming a standard approach to detect individual species or pathogenic strains of microorganisms. Computer programs used in the NGS community have to balance between speed and sensitivity and as a result, species or strain level identification is often inaccurate and low abundance pathogens can sometimes be missed. We have developed Taxoner, an open source, taxon assignment pipeline that includes a fast aligner (e.g. Bowtie2) and a comprehensive DNA sequence database. We tested the program on simulated datasets as well as experimental data from Illumina, IonTorrent, and Roche 454 sequencing platforms. We found that Taxoner performs as well as, and often better than BLAST, but requires two orders of magnitude less running time meaning that it can be run on desktop or laptop computers. Taxoner is slower than the approaches that use small marker databases but is more sensitive due the comprehensive reference database. In addition, it can be easily tuned to specific applications using small tailored databases. When applied to metagenomic datasets, Taxoner can provide a functional summary of the genes mapped and can provide strain level identification. Taxoner is written in C for Linux operating systems. The code and documentation are available for research applications at http://code.google.com/p/taxoner. PMID:25077800

  2. PASS2 database for the structure-based sequence alignment of distantly related SCOP domain superfamilies: update to version 5 and added features.

    PubMed

    Gandhimathi, Arumugam; Ghosh, Pritha; Hariharaputran, Sridhar; Mathew, Oommen K; Sowdhamini, R

    2016-01-01

    Structure-based sequence alignment is an essential step in assessing and analysing the relationship of distantly related proteins. PASS2 is a database that records such alignments for protein domain superfamilies and has been constantly updated periodically. This update of the PASS2 version, named as PASS2.5, directly corresponds to the SCOPe 2.04 release. All SCOPe structural domains that share less than 40% sequence identity, as defined by the ASTRAL compendium of protein structures, are included. The current version includes 1977 superfamilies and has been assembled utilizing the structure-based sequence alignment protocol. Such an alignment is obtained initially through MATT, followed by a refinement through the COMPARER program. The JOY program has been used for structural annotations of such alignments. In this update, we have automated the protocol and focused on inclusion of new features such as mapping of GO terms, absolutely conserved residues among the domains in a superfamily and inclusion of PDBs, that are absent in SCOPe 2.04, using the HMM profiles from the alignments of the superfamily members and are provided as a separate list. We have also implemented a more user-friendly manner of data presentation and options for downloading more features. PASS2.5 version is available at http://caps.ncbs.res.in/pass2/. PMID:26553811

  3. PASS2 database for the structure-based sequence alignment of distantly related SCOP domain superfamilies: update to version 5 and added features

    PubMed Central

    Gandhimathi, Arumugam; Ghosh, Pritha; Hariharaputran, Sridhar; Mathew, Oommen K.; Sowdhamini, R.

    2016-01-01

    Structure-based sequence alignment is an essential step in assessing and analysing the relationship of distantly related proteins. PASS2 is a database that records such alignments for protein domain superfamilies and has been constantly updated periodically. This update of the PASS2 version, named as PASS2.5, directly corresponds to the SCOPe 2.04 release. All SCOPe structural domains that share less than 40% sequence identity, as defined by the ASTRAL compendium of protein structures, are included. The current version includes 1977 superfamilies and has been assembled utilizing the structure-based sequence alignment protocol. Such an alignment is obtained initially through MATT, followed by a refinement through the COMPARER program. The JOY program has been used for structural annotations of such alignments. In this update, we have automated the protocol and focused on inclusion of new features such as mapping of GO terms, absolutely conserved residues among the domains in a superfamily and inclusion of PDBs, that are absent in SCOPe 2.04, using the HMM profiles from the alignments of the superfamily members and are provided as a separate list. We have also implemented a more user-friendly manner of data presentation and options for downloading more features. PASS2.5 version is available at http://caps.ncbs.res.in/pass2/. PMID:26553811

  4. Mutation biases and mutation rate variation around very short human microsatellites revealed by human-chimpanzee-orangutan genomic sequence alignments.

    PubMed

    Amos, William

    2010-09-01

    I have studied mutation patterns around very short microsatellites, focusing mainly on sequences carrying only two repeat units. By using human-chimpanzee-orangutan alignments, inferences can be made about both the relative rates of mutations and which bases have mutated. I find remarkable non-randomness, with mutation rate depending on a base's position relative to the microsatellite, the identity of the base itself and the motif in the microsatellite. Comparing the patterns around AC2 with those around other four-base combinations reveals that AC2 does not stand out as being special in the sense that non-repetitive tetramers also generate strong mutation biases. However, comparing AC2 and AC3 with AC4 reveals a step change in both the rate and nature of mutations occurring, suggesting a transition state, AC4 exhibiting an alternating high-low mutation rate pattern consistent with the sequence patterning seen around longer microsatellites. Surprisingly, most changes in repeat number occur through base substitutions rather than slippage, and the relative probability of gaining versus losing a repeat in this way varies greatly with repeat number. Slippage mutations reveal rather similar patterns of mutability compared with point mutations, being rare at two repeats where most cause the loss of a repeat, with both mutation rate and the proportion of expansion mutations increasing up to 6-8 repeats. Inferences about longer repeat tracts are hampered by uncertainties about the proportion of multi-species alignments that fail due to multi-repeat mutations and other rearrangements. PMID:20700734

  5. An Alignment-Free "Metapeptide" Strategy for Metaproteomic Characterization of Microbiome Samples Using Shotgun Metagenomic Sequencing.

    PubMed

    May, Damon H; Timmins-Schiffman, Emma; Mikan, Molly P; Harvey, H Rodger; Borenstein, Elhanan; Nunn, Brook L; Noble, William S

    2016-08-01

    In principle, tandem mass spectrometry can be used to detect and quantify the peptides present in a microbiome sample, enabling functional and taxonomic insight into microbiome metabolic activity. However, the phylogenetic diversity constituting a particular microbiome is often unknown, and many of the organisms present may not have assembled genomes. In ocean microbiome samples, with particularly diverse and uncultured bacterial communities, it is difficult to construct protein databases that contain the bulk of the peptides in the sample without losing detection sensitivity due to the overwhelming number of candidate peptides for each tandem mass spectrum. We describe a method for deriving "metapeptides" (short amino acid sequences that may be represented in multiple organisms) from shotgun metagenomic sequencing of microbiome samples. In two ocean microbiome samples, we constructed site-specific metapeptide databases to detect more than one and a half times as many peptides as by searching against predicted genes from an assembled metagenome and roughly three times as many peptides as by searching against the NCBI environmental proteome database. The increased peptide yield has the potential to enrich the taxonomic and functional characterization of sample metaproteomes. PMID:27396978

  6. The map-based genome sequence of Spirodela polyrhiza aligned with its chromosomes, a reference for karyotype evolution.

    PubMed

    Cao, Hieu Xuan; Vu, Giang Thi Ha; Wang, Wenqin; Appenroth, Klaus J; Messing, Joachim; Schubert, Ingo

    2016-01-01

    Duckweeds are aquatic monocotyledonous plants of potential economic interest with fast vegetative propagation, comprising 37 species with variable genome sizes (0.158-1.88 Gbp). The genomic sequence of Spirodela polyrhiza, the smallest and the most ancient duckweed genome, needs to be aligned to its chromosomes as a reference and prerequisite to study the genome and karyotype evolution of other duckweed species. We selected physically mapped bacterial artificial chromosomes (BACs) containing Spirodela DNA inserts with little or no repetitive elements as probes for multicolor fluorescence in situ hybridization (mcFISH), using an optimized BAC pooling strategy, to validate its physical map and correlate it with its chromosome complement. By consecutive mcFISH analyses, we assigned the originally assembled 32 pseudomolecules (supercontigs) of the genomic sequences to the 20 chromosomes of S. polyrhiza. A Spirodela cytogenetic map containing 96 BAC markers with an average distance of 0.89 Mbp was constructed. Using a cocktail of 41 BACs in three colors, all chromosome pairs could be individualized simultaneously. Seven ancestral blocks emerged from duplicated chromosome segments of 19 Spirodela chromosomes. The chromosomally integrated genome of S. polyrhiza and the established prerequisites for comparative chromosome painting enable future studies on the chromosome homoeology and karyotype evolution of duckweed species. PMID:26305472

  7. CLaMS: Classifier for Metagenomic Sequences

    Energy Science and Technology Software Center (ESTSC)

    2010-12-01

    CLaMS-"Classifer for Metagenonic Sequences" is a Java application for binning assembled metagenomes wings user-specified training sequence sets and other user-specified initial parameters. Since ClAmS analyzes and matches sequence composition-based genomic signatures, it is much faster than binning tools that rely on alignments to homologs; CLaMS can bin ~20,000 sequences in 3 minutes on a laptop with a 2.4 Ghz. Intel Core 2 Duo processor and 2 GB Ram. CLaMS is meant to be desktop applicationmore » for biologist and can be run on any machine under any operating system on which the Java Runtime Environment is enabled. CLaMS is freely available in both GVI-based and command-line based forms.« less

  8. aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity.

    PubMed

    Kuraku, Shigehiro; Zmasek, Christian M; Nishimura, Osamu; Katoh, Kazutaka

    2013-07-01

    We report a new web server, aLeaves (http://aleaves.cdb.riken.jp/), for homologue collection from diverse animal genomes. In molecular comparative studies involving multiple species, orthology identification is the basis on which most subsequent biological analyses rely. It can be achieved most accurately by explicit phylogenetic inference. More and more species are subjected to large-scale sequencing, but the resultant resources are scattered in independent project-based, and multi-species, but separate, web sites. This complicates data access and is becoming a serious barrier to the comprehensiveness of molecular phylogenetic analysis. aLeaves, launched to overcome this difficulty, collects sequences similar to an input query sequence from various data sources. The collected sequences can be passed on to the MAFFT sequence alignment server (http://mafft.cbrc.jp/alignment/server/), which has been significantly improved in interactivity. This update enables to switch between (i) sequence selection using the Archaeopteryx tree viewer, (ii) multiple sequence alignment and (iii) tree inference. This can be performed as a loop until one reaches a sensible data set, which minimizes redundancy for better visibility and handling in phylogenetic inference while covering relevant taxa. The work flow achieved by the seamless link between aLeaves and MAFFT provides a convenient online platform to address various questions in zoology and evolutionary biology. PMID:23677614

  9. GUTSS: An Alignment-Free Sequence Comparison Method for Use in Human Intestinal Microbiome and Fecal Microbiota Transplantation Analysis

    PubMed Central

    Heltshe, Sonya L.; Hayden, Hillary S.; Radey, Matthew C.; Weiss, Eli J.; Damman, Christopher J.; Zisman, Timothy L.; Suskind, David L.; Miller, Samuel I.

    2016-01-01

    Background Comparative analysis of gut microbiomes in clinical studies of human diseases typically rely on identification and quantification of species or genes. In addition to exploring specific functional characteristics of the microbiome and potential significance of species diversity or expansion, microbiome similarity is also calculated to study change in response to therapies directed at altering the microbiome. Established ecological measures of similarity can be constructed from species abundances, however methods for calculating these commonly used ecological measures of similarity directly from whole genome shotgun (WGS) metagenomic sequence are lacking. Results We present an alignment-free method for calculating similarity of WGS metagenomic sequences that is analogous to the Bray–Curtis index for species, implemented by the General Utility for Testing Sequence Similarity (GUTSS) software application. This method was applied to intestinal microbiomes of healthy young children to measure developmental changes toward an adult microbiome during the first 3 years of life. We also calculate similarity of donor and recipient microbiomes to measure establishment, or engraftment, of donor microbiota in fecal microbiota transplantation (FMT) studies focused on mild to moderate Crohn's disease. We show how a relative index of similarity to donor can be calculated as a measure of change in a patient's microbiome toward that of the donor in response to FMT. Conclusion Because clinical efficacy of the transplant procedure cannot be fully evaluated without analysis methods to quantify actual FMT engraftment, we developed a method for detecting change in the gut microbiome that is independent of species identification and database bias, sensitive to changes in relative abundance of the microbial constituents, and can be formulated as an index for correlating engraftment success with clinical measures of disease. More generally, this method may be applied to clinical

  10. ISHAN: sequence homology analysis package.

    PubMed

    Shil, Pratip; Dudani, Niraj; Vidyasagar, Pandit B

    2006-01-01

    Sequence based homology studies play an important role in evolutionary tracing and classification of proteins. Various methods are available to analyze biological sequence information. However, with the advent of proteomics era, there is a growing demand for analysis of huge amount of biological sequence information, and it has become necessary to have programs that would provide speedy analysis. ISHAN has been developed as a homology analysis package, built on various sequence analysis tools viz FASTA, ALIGN, CLUSTALW, PHYLIP and CODONW (for DNA sequences). This JAVA application offers the user choice of analysis tools. For testing, ISHAN was applied to perform phylogenetic analysis for sets of Caspase 3 DNA sequences and NF-kappaB p105 amino acid sequences. By integrating several tools it has made analysis much faster and reduced manual intervention. PMID:17274766

  11. The Oryza map alignment project: Construction, alignment and analysis of 12 BAC fingerprint/end sequence framework physical maps that represent the 10 genome types of genus Oryza

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The Oryza Map Alignment Project (OMAP) provides the first comprehensive experimental system for understanding the evolution, physiology and biochemistry of a full genus in plants or animals. We have constructed twelve deep-coverage BAC libraries that are representative of both diploid and tetraploid...

  12. Model Checking JAVA Programs Using Java Pathfinder

    NASA Technical Reports Server (NTRS)

    Havelund, Klaus; Pressburger, Thomas

    2000-01-01

    This paper describes a translator called JAVA PATHFINDER from JAVA to PROMELA, the "programming language" of the SPIN model checker. The purpose is to establish a framework for verification and debugging of JAVA programs based on model checking. This work should be seen in a broader attempt to make formal methods applicable "in the loop" of programming within NASA's areas such as space, aviation, and robotics. Our main goal is to create automated formal methods such that programmers themselves can apply these in their daily work (in the loop) without the need for specialists to manually reformulate a program into a different notation in order to analyze the program. This work is a continuation of an effort to formally verify, using SPIN, a multi-threaded operating system programmed in Lisp for the Deep-Space 1 spacecraft, and of previous work in applying existing model checkers and theorem provers to real applications.

  13. Diversity Measures in Environmental Sequences Are Highly Dependent on Alignment Quality—Data from ITS and New LSU Primers Targeting Basidiomycetes

    PubMed Central

    Fischer, Christiane; Daniel, Rolf; Wubet, Tesfaye

    2012-01-01

    The ribosomal DNA comprised of the ITS1-5.8S-ITS2 regions is widely used as a fungal marker in molecular ecology and systematics but cannot be aligned with confidence across genetically distant taxa. In order to study the diversity of Agaricomycotina in forest soils, we designed primers targeting the more alignable 28S (LSU) gene, which should be more useful for phylogenetic analyses of the detected taxa. This paper compares the performance of the established ITS1F/4B primer pair, which targets basidiomycetes, to that of two new pairs. Key factors in the comparison were the diversity covered, off-target amplification, rarefaction at different Operational Taxonomic Unit (OTU) cutoff levels, sensitivity of the method used to process the alignment to missing data and insecure positional homology, and the congruence of monophyletic clades with OTU assignments and BLAST-derived OTU names. The ITS primer pair yielded no off-target amplification but also exhibited the least fidelity to the expected phylogenetic groups. The LSU primers give complementary pictures of diversity, but were more sensitive to modifications of the alignment such as the removal of difficult-to align stretches. The LSU primers also yielded greater numbers of singletons but also had a greater tendency to produce OTUs containing sequences from a wider variety of species as judged by BLAST similarity. We introduced some new parameters to describe alignment heterogeneity based on Shannon entropy and the extent and contents of the OTUs in a phylogenetic tree space. Our results suggest that ITS should not be used when calculating phylogenetic trees from genetically distant sequences obtained from environmental DNA extractions and that it is inadvisable to define OTUs on the basis of very heterogeneous alignments. PMID:22363808

  14. Data in support of the discovery of alternative splicing variants of quail LEPR and the evolutionary conservation of qLEPRl by nucleotide and amino acid sequences alignment.

    PubMed

    Wang, Dandan; Xu, Chunlin; Wang, Taian; Li, Hong; Li, Yanmin; Ren, Junxiao; Tian, Yadong; Li, Zhuanjian; Jiao, Yuping; Kang, Xiangtao; Liu, Xiaojun

    2016-03-01

    Leptin receptor (LEPR) belongs to the class I cytokine receptor superfamily which share common structural features and signal transduction pathways. Although multiple LEPR isoforms, which are derived from one gene, were identified in mammals, they were rarely found in avian except the long LEPR. Four alternative splicing variants of quail LEPR (qLEPR) had been cloned and sequenced for the first time (Wang et al., 2015 [1]). To define patterns of the four splicing variants (qLEPRl, qLEPR-a, qLEPR-b and qLEPR-c) and locate the conserved regions of qLEPRl, this data article provides nucleotide sequence alignment of qLEPR and amino acid sequence alignment of representative vertebrate LEPR. The detailed analysis was shown in [1]. PMID:26759819

  15. Data for amino acid alignment of Japanese stingray melanocortin receptors with other gnathostome melanocortin receptor sequences, and the ligand selectivity of Japanese stingray melanocortin receptors.

    PubMed

    Takahashi, Akiyoshi; Davis, Perry; Reinick, Christina; Mizusawa, Kanta; Sakamoto, Tatsuya; Dores, Robert M

    2016-06-01

    This article contains structure and pharmacological characteristics of melanocortin receptors (MCRs) related to research published in "Characterization of melanocortin receptors from stingray Dasyatis akajei, a cartilaginous fish" (Takahashi et al., 2016) [1]. The amino acid sequences of the stingray, D. akajei, MC1R, MC2R, MC3R, MC4R, and MC5R were aligned with the corresponding melanocortin receptor sequences from the elephant shark, Callorhinchus milii, the dogfish, Squalus acanthias, the goldfish, Carassius auratus, and the mouse, Mus musculus. These alignments provide the basis for phylogenetic analysis of these gnathostome melanocortin receptor sequences. In addition, the Japanese stingray melanocortin receptors were separately expressed in Chinese Hamster Ovary cells, and stimulated with stingray ACTH, α-MSH, β-MSH, γ-MSH, δ-MSH, and β-endorphin. The dose response curves reveal the order of ligand selectivity for each stingray MCR. PMID:27408924

  16. Data in support of the discovery of alternative splicing variants of quail LEPR and the evolutionary conservation of qLEPRl by nucleotide and amino acid sequences alignment

    PubMed Central

    Wang, Dandan; Xu, Chunlin; Wang, Taian; Li, Hong; Li, Yanmin; Ren, Junxiao; Tian, Yadong; Li, Zhuanjian; Jiao, Yuping; Kang, Xiangtao; Liu, Xiaojun

    2015-01-01

    Leptin receptor (LEPR) belongs to the class I cytokine receptor superfamily which share common structural features and signal transduction pathways. Although multiple LEPR isoforms, which are derived from one gene, were identified in mammals, they were rarely found in avian except the long LEPR. Four alternative splicing variants of quail LEPR (qLEPR) had been cloned and sequenced for the first time (Wang et al., 2015 [1]). To define patterns of the four splicing variants (qLEPRl, qLEPR-a, qLEPR-b and qLEPR-c) and locate the conserved regions of qLEPRl, this data article provides nucleotide sequence alignment of qLEPR and amino acid sequence alignment of representative vertebrate LEPR. The detailed analysis was shown in [1]. PMID:26759819

  17. PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases.

    PubMed

    Floden, Evan W; Tommaso, Paolo D; Chatzou, Maria; Magis, Cedrik; Notredame, Cedric; Chang, Jia-Ming

    2016-07-01

    The PSI/TM-Coffee web server performs multiple sequence alignment (MSA) of proteins by combining homology extension with a consistency based alignment approach. Homology extension is performed with Position Specific Iterative (PSI) BLAST searches against a choice of redundant and non-redundant databases. The main novelty of this server is to allow databases of reduced complexity to rapidly perform homology extension. This server also gives the possibility to use transmembrane proteins (TMPs) reference databases to allow even faster homology extension on this important category of proteins. Aside from an MSA, the server also outputs topological prediction of TMPs using the HMMTOP algorithm. Previous benchmarking of the method has shown this approach outperforms the most accurate alignment methods such as MSAProbs, Kalign, PROMALS, MAFFT, ProbCons and PRALINE™. The web server is available at http://tcoffee.crg.cat/tmcoffee. PMID:27106060

  18. Java for flight software

    NASA Technical Reports Server (NTRS)

    Benowitz, E. G.; Niessner, A. F.

    2003-01-01

    We have successfully demonstrated a portion of the spacecraft attitude control and fault protection, running on a standard Java platform, and are currently in the process of taking advantage of the features provided by the RTSJ.

  19. Java Programming Language

    NASA Technical Reports Server (NTRS)

    Shaykhian, Gholam Ali

    2007-01-01

    The Java seminar covers the fundamentals of Java programming language. No prior programming experience is required for participation in the seminar. The first part of the seminar covers introductory concepts in Java programming including data types (integer, character, ..), operators, functions and constants, casts, input, output, control flow, scope, conditional statements, and arrays. Furthermore, introduction to Object-Oriented programming in Java, relationships between classes, using packages, constructors, private data and methods, final instance fields, static fields and methods, and overloading are explained. The second part of the seminar covers extending classes, inheritance hierarchies, polymorphism, dynamic binding, abstract classes, protected access. The seminar conclude by introducing interfaces, properties of interfaces, interfaces and abstract classes, interfaces and cailbacks, basics of event handling, user interface components with swing, applet basics, converting applications to applets, the applet HTML tags and attributes, exceptions and debugging.

  20. The ENSDF Java Package

    SciTech Connect

    Sonzogni, A.A.

    2005-05-24

    A package of computer codes has been developed to process and display nuclear structure and decay data stored in the ENSDF (Evaluated Nuclear Structure Data File) library. The codes were written in an object-oriented fashion using the java language. This allows for an easy implementation across multiple platforms as well as deployment on web pages. The structure of the different java classes that make up the package is discussed as well as several different implementations.

  1. HPMV: human protein mutation viewer - relating sequence mutations to protein sequence architecture and function changes.

    PubMed

    Sherman, Westley Arthur; Kuchibhatla, Durga Bhavani; Limviphuvadh, Vachiranee; Maurer-Stroh, Sebastian; Eisenhaber, Birgit; Eisenhaber, Frank

    2015-10-01

    Next-generation sequencing advances are rapidly expanding the number of human mutations to be analyzed for causative roles in genetic disorders. Our Human Protein Mutation Viewer (HPMV) is intended to explore the biomolecular mechanistic significance of non-synonymous human mutations in protein-coding genomic regions. The tool helps to assess whether protein mutations affect the occurrence of sequence-architectural features (globular domains, targeting signals, post-translational modification sites, etc.). As input, HPMV accepts protein mutations - as UniProt accessions with mutations (e.g. HGVS nomenclature), genome coordinates, or FASTA sequences. As output, HPMV provides an interactive cartoon showing the mutations in relation to elements of the sequence architecture. A large variety of protein sequence architectural features were selected for their particular relevance to mutation interpretation. Clicking a sequence feature in the cartoon expands a tree view of additional information including multiple sequence alignments of conserved domains and a simple 3D viewer mapping the mutation to known PDB structures, if available. The cartoon is also correlated with a multiple sequence alignment of similar sequences from other organisms. In cases where a mutation is likely to have a straightforward interpretation (e.g. a point mutation disrupting a well-understood targeting signal), this interpretation is suggested. The interactive cartoon can be downloaded as standalone viewer in Java jar format to be saved and viewed later with only a standard Java runtime environment. The HPMV website is: http://hpmv.bii.a-star.edu.sg/ . PMID:26503432

  2. easyPAC: A Tool for Fast Prediction, Testing and Reference Mapping of Degenerate PCR Primers from Alignments or Consensus Sequences

    PubMed Central

    Rosenkranz, David

    2012-01-01

    The PCR-amplification of unknown homologous or paralogous genes generally relies on PCR primers predicted from multi sequence alignments. But increasing sequence divergence can induce the need to use degenerate primers which entails the problem of testing the characteristics, unwanted interactions and potential mispriming of degenerate primers. Here I introduce easyPAC, a new software for the prediction of degenerate primers from multi sequence alignments or single consensus sequences. As a major innovation, easyPAC allows to apply all customary primer test procedures to degenerate primer sequences including fast mapping to reference files. Thus, easyPAC simplifies and expedites the designing of specific degenerate primers enormously. Degenerate primers suggested by easyPAC were used in PCR amplification with subsequent de novo sequencing of TDRD1 exon 11 homologs from several representatives of the haplorrhine primate phylogeny. The results demonstrate the efficient performance of the suggested primers and therefore show that easyPAC can advance upcoming comparative genetic studies.

  3. Structure-based sequence alignment for the beta-trefoil subdomain of the clostridial neurotoxin family provides residue level information about the putative ganglioside binding site.

    PubMed

    Ginalski, K; Venclovas, C; Lesyng, B; Fidelis, K

    2000-09-29

    Clostridial neurotoxins embrace a family of extremely potent toxins comprised of tetanus toxin (TeNT) and seven different serotypes of botulinum toxin (BoNT/A-G). The beta-trefoil subdomain of the C-terminal part of the heavy chain (H(C)), responsible for ganglioside binding, is the most divergent region in clostridial neurotoxins with sequence identity as low as 15%. We re-examined the alignment between family sequences within this subdomain, since in this region all alignments published to date show obvious inconsistencies with the beta-trefoil fold. The final alignment was obtained by considering the general constraints imposed by this fold, and homology modeling studies based on the TeNT structure. Recently solved structures of BoNT/A confirm the validity of this structure-based approach. Taking into account biochemical data and crystal structures of TeNT and BoNT/A, we also re-examined the location of the putative ganglioside binding site and, using the new alignment, characterized this site in other BoNT serotypes. PMID:11018534

  4. Large-scale nucleotide sequence alignment and sequence variability assessment to identify the evolutionarily highly conserved regions for universal screening PCR assay design: an example of influenza A virus.

    PubMed

    Nagy, Alexander; Jiřinec, Tomáš; Černíková, Lenka; Jiřincová, Helena; Havlíčková, Martina

    2015-01-01

    The development of a diagnostic polymerase chain reaction (PCR) or quantitative PCR (qPCR) assay for universal detection of highly variable viral genomes is always a difficult task. The purpose of this chapter is to provide a guideline on how to align, process, and evaluate a huge set of homologous nucleotide sequences in order to reveal the evolutionarily most conserved positions suitable for universal qPCR primer and hybridization probe design. Attention is paid to the quantification and clear graphical visualization of the sequence variability at each position of the alignment. In addition, specific problems related to the processing of the extremely large sequence pool are highlighted. All of these steps are performed using an ordinary desktop computer without the need for extensive mathematical or computational skills. PMID:25697651

  5. Fast statistical alignment.

    PubMed

    Bradley, Robert K; Roberts, Adam; Smoot, Michael; Juvekar, Sudeep; Do, Jaeyoung; Dewey, Colin; Holmes, Ian; Pachter, Lior

    2009-05-01

    We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment program is based on pair hidden Markov models which approximate an insertion/deletion process on a tree and uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment. FSA uses its explicit statistical model to produce multiple alignments which are accompanied by estimates of the alignment accuracy and uncertainty for every column and character of the alignment--previously available only with alignment programs which use computationally-expensive Markov Chain Monte Carlo approaches--yet can align thousands of long sequences. Moreover, FSA utilizes an unsupervised query-specific learning procedure for parameter estimation which leads to improved accuracy on benchmark reference alignments in comparison to existing programs. The centroid alignment approach taken by FSA, in combination with its learning procedure, drastically reduces the amount of false-positive alignment on biological data in comparison to that given by other methods. The FSA program and a companion visualization tool for exploring uncertainty in alignments can be used via a web interface at http://orangutan.math.berkeley.edu/fsa/, and the source code is available at http://fsa.sourceforge.net/. PMID:19478997

  6. Pure Java-based streaming MPEG player

    NASA Astrophysics Data System (ADS)

    Tolba, Osama; Briceno, Hector; McMillan, Leonard

    1999-01-01

    We present a pure Java-based streaming MPEG-1 video player. By implementing the player entirely in Java, we guarantee its functionality across platforms within any Java-enabled web browsers, without the need for native libraries. This allows greater sue of MPEG video sequences, because the users will no longer need to pre-install any software to display video, beyond Java compatibility. This player features a novel forward-mapping IDCT algorithm that allows it to play locally stored, CIF-sized video sequences at 11 frames per second, when run on a personal computer with Java 'just-in-time' compiler. The IDCT algorithm can run with greater speed when the sequence is viewed at reduced size; e.g., performing approximately 1/4 the amount of computation when the user resizes the sequence to 1/2 its original width and height. We are able to play video streams stored anywhere on the Internet with acceptable performance using a proxy server, eliminating the need for large-capacity auxiliary storage. Thus, the player is well suited to small devices, such as digital TV set-top decoders, requiring little more memory than is required for three video frames. Because of our modular design, it is possible to assemble multiple video streams from remote sources and present them simultaneously to the viewers, subject to network and local performance limitations. The same modular system can further provide viewers with their own customized view of each sessions; e.g., moving and resizing the video display window dynamically, and selecting their preferred set of video controls.

  7. Automated insertion of sequences into a ribosomal RNA alignment: An application of computational linguistics in molecular biology

    SciTech Connect

    Taylor, R.C.

    1991-11-01

    This thesis involved the construction of (1) a grammar that incorporates knowledge on base invariancy and secondary structure in a molecule and (2) a parser engine that uses the grammar to position bases into the structural subunits of the molecule. These concepts were combined with a novel pinning technique to form a tool that semi-automates insertion of a new species into the alignment for the 16S rRNA molecule (a component of the ribosome) maintained by Dr. Carl Woese's group at the University of Illinois at Urbana. The tool was tested on species extracted from the alignment and on a group of entirely new species. The results were very encouraging, and the tool should be substantial aid to the curators of the 16S alignment. The construction of the grammar was itself automated, allowing application of the tool to alignments for other molecules. The logic programming language Prolog was used to construct all programs involved. The computational linguistics approach used here was found to be a useful way to attach the problem of insertion into an alignment.

  8. Automated insertion of sequences into a ribosomal RNA alignment: An application of computational linguistics in molecular biology

    SciTech Connect

    Taylor, R.C.

    1991-11-01

    This thesis involved the construction of (1) a grammar that incorporates knowledge on base invariancy and secondary structure in a molecule and (2) a parser engine that uses the grammar to position bases into the structural subunits of the molecule. These concepts were combined with a novel pinning technique to form a tool that semi-automates insertion of a new species into the alignment for the 16S rRNA molecule (a component of the ribosome) maintained by Dr. Carl Woese`s group at the University of Illinois at Urbana. The tool was tested on species extracted from the alignment and on a group of entirely new species. The results were very encouraging, and the tool should be substantial aid to the curators of the 16S alignment. The construction of the grammar was itself automated, allowing application of the tool to alignments for other molecules. The logic programming language Prolog was used to construct all programs involved. The computational linguistics approach used here was found to be a useful way to attach the problem of insertion into an alignment.

  9. Handling Permutation in Sequence Comparison: Genome-Wide Enhancer Prediction in Vertebrates by a Novel Non-Linear Alignment Scoring Principle

    PubMed Central

    Dolle, Dirk; Mateo, Juan L.; Eichenlaub, Michael P.; Sinn, Rebecca; Reinhardt, Robert; Höckendorf, Burkhard; Inoue, Daigo; Centanin, Lazaro; Ettwiller, Laurence; Wittbrodt, Joachim

    2015-01-01

    Enhancers have been described to evolve by permutation without changing function. This has posed the problem of how to predict enhancer elements that are hidden from alignment-based approaches due to the loss of co-linearity. Alignment-free algorithms have been proposed as one possible solution. However, this approach is hampered by several problems inherent to its underlying working principle. Here we present a new approach, which combines the power of alignment and alignment-free techniques into one algorithm. It allows the prediction of enhancers based on the query and target sequence only, no matter whether the regulatory logic is co-linear or reshuffled. To test our novel approach, we employ it for the prediction of enhancers across the evolutionary distance of ~450Myr between human and medaka. We demonstrate its efficacy by subsequent in vivo validation resulting in 82% (9/11) of the predicted medaka regions showing reporter activity. These include five candidates with partially co-linear and four with reshuffled motif patterns. Orthology in flanking genes and conservation of the detected co-linear motifs indicates that those candidates are likely functionally equivalent enhancers. In sum, our results demonstrate that the proposed principle successfully predicts mutated as well as permuted enhancer regions at an encouragingly high rate. PMID:26505748

  10. JAVA PathFinder

    NASA Technical Reports Server (NTRS)

    Mehhtz, Peter

    2005-01-01

    JPF is an explicit state software model checker for Java bytecode. Today, JPF is a swiss army knife for all sort of runtime based verification purposes. This basically means JPF is a Java virtual machine that executes your program not just once (like a normal VM), but theoretically in all possible ways, checking for property violations like deadlocks or unhandled exceptions along all potential execution paths. If it finds an error, JPF reports the whole execution that leads to it. Unlike a normal debugger, JPF keeps track of every step how it got to the defect.

  11. Java for flight software

    NASA Technical Reports Server (NTRS)

    Benowitz, E.; Niessner, A.

    2003-01-01

    This work involves developing representative mission-critical spacecraft software using the Real-Time Specification for Java (RTSJ). This work currently leverages actual flight software used in the design of actual flight software in the NASA's Deep Space 1 (DSI), which flew in 1998.

  12. Java Metadata Facility

    SciTech Connect

    Buttler, D J

    2008-03-06

    The Java Metadata Facility is introduced by Java Specification Request (JSR) 175 [1], and incorporated into the Java language specification [2] in version 1.5 of the language. The specification allows annotations on Java program elements: classes, interfaces, methods, and fields. Annotations give programmers a uniform way to add metadata to program elements that can be used by code checkers, code generators, or other compile-time or runtime components. Annotations are defined by annotation types. These are defined the same way as interfaces, but with the symbol {at} preceding the interface keyword. There are additional restrictions on defining annotation types: (1) They cannot be generic; (2) They cannot extend other annotation types or interfaces; (3) Methods cannot have any parameters; (4) Methods cannot have type parameters; (5) Methods cannot throw exceptions; and (6) The return type of methods of an annotation type must be a primitive, a String, a Class, an annotation type, or an array, where the type of the array is restricted to one of the four allowed types. See [2] for additional restrictions and syntax. The methods of an annotation type define the elements that may be used to parameterize the annotation in code. Annotation types may have default values for any of its elements. For example, an annotation that specifies a defect report could initialize an element defining the defect outcome submitted. Annotations may also have zero elements. This could be used to indicate serializability for a class (as opposed to the current Serializability interface).

  13. Java Tool Retirement

    Atmospheric Science Data Center

    2014-05-15

    ... 08:00 am EDT Event Impact:  The ASDC Java Order Tool was officially retired on Wednesday, May 14, 2014.  The HTML Order Tool and additional options are available from our Order Data Page .  Please update all bookmarks.     ...

  14. A Java commodity grid kit.

    SciTech Connect

    von Laszewski, G.; Foster, I.; Gawor, J.; Lane, P.; Mathematics and Computer Science

    2001-07-01

    In this paper we report on the features of the Java Commodity Grid Kit. The Java CoG Kit provides middleware for accessing Grid functionality from the Java framework. Java CoG Kit middleware is general enough to design a variety of advanced Grid applications with quite different user requirements. Access to the Grid is established via Globus protocols, allowing the Java CoG Kit to communicate also with the C Globus reference implementation. Thus, the Java CoG Kit provides Grid developers with the ability to utilize the Grid, as well as numerous additional libraries and frameworks developed by the Java community to enable network, Internet, enterprise, and peer-to peer computing. A variety of projects have successfully used the client libraries of the Java CoG Kit to access Grids driven by the C Globus software. In this paper we also report on the efforts to develop server side Java CoG Kit components. As part of this research we have implemented a prototype pure Java resource management system that enables one to run Globus jobs on platforms on which a Java virtual machine is supported, including Windows NT machines.

  15. GOblet: a platform for Gene Ontology annotation of anonymous sequence data

    PubMed Central

    Groth, Detlef; Lehrach, Hans; Hennig, Steffen

    2004-01-01

    GOblet is a comprehensive web server application providing the annotation of anonymous sequence data with Gene Ontology (GO) terms. It uses a variety of different protein databases (human, murines, invertebrates, plants, sp-trembl) and their respective GO mappings. The user selects the appropriate database and alignment threshold and thereafter submits single or multiple nucleotide or protein sequences. Results are shown in different ways, e.g. as survey statistics for the main GO categories for all sequences or as detailed results for each single sequence that has been submitted. In its newest version, GOblet allows the batch submission of sequences and provides an improved display of results with the aid of Java applets. All output data, together with the Java applet, are packed to a downloadable archive for local installation and analysis. GOblet can be accessed freely at http://goblet.molgen.mpg.de. PMID:15215401

  16. CATO: The Clone Alignment Tool.

    PubMed

    Henstock, Peter V; LaPan, Peter

    2016-01-01

    High-throughput cloning efforts produce large numbers of sequences that need to be aligned, edited, compared with reference sequences, and organized as files and selected clones. Different pieces of software are typically required to perform each of these tasks. We have designed a single piece of software, CATO, the Clone Alignment Tool, that allows a user to align, evaluate, edit, and select clone sequences based on comparisons to reference sequences. The input and output are designed to be compatible with standard data formats, and thus suitable for integration into a clone processing pipeline. CATO provides both sequence alignment and visualizations to facilitate the analysis of cloning experiments. The alignment algorithm matches each of the relevant candidate sequences against each reference sequence. The visualization portion displays three levels of matching: 1) a top-level summary of the top candidate sequences aligned to each reference sequence, 2) a focused alignment view with the nucleotides of matched sequences displayed against one reference sequence, and 3) a pair-wise alignment of a single reference and candidate sequence pair. Users can select the minimum matching criteria for valid clones, edit or swap reference sequences, and export the results to a summary file as part of the high-throughput cloning workflow. PMID:27459605

  17. CATO: The Clone Alignment Tool

    PubMed Central

    Henstock, Peter V.; LaPan, Peter

    2016-01-01

    High-throughput cloning efforts produce large numbers of sequences that need to be aligned, edited, compared with reference sequences, and organized as files and selected clones. Different pieces of software are typically required to perform each of these tasks. We have designed a single piece of software, CATO, the Clone Alignment Tool, that allows a user to align, evaluate, edit, and select clone sequences based on comparisons to reference sequences. The input and output are designed to be compatible with standard data formats, and thus suitable for integration into a clone processing pipeline. CATO provides both sequence alignment and visualizations to facilitate the analysis of cloning experiments. The alignment algorithm matches each of the relevant candidate sequences against each reference sequence. The visualization portion displays three levels of matching: 1) a top-level summary of the top candidate sequences aligned to each reference sequence, 2) a focused alignment view with the nucleotides of matched sequences displayed against one reference sequence, and 3) a pair-wise alignment of a single reference and candidate sequence pair. Users can select the minimum matching criteria for valid clones, edit or swap reference sequences, and export the results to a summary file as part of the high-throughput cloning workflow. PMID:27459605

  18. Java Vertexing Tools

    SciTech Connect

    Strube, Jan; Graf, Norman; /SLAC

    2006-03-03

    This document describes the implementation of the topological vertex finding algorithm ZVTOP within the org.lcsim reconstruction and analysis framework. At the present date, Java vertexing tools allow users to perform topological vertexing on tracks that have been obtained from a Fast MC simulation. An implementation that will be able to handle fully reconstructed events is being designed from the ground up for longevity and maintainability.

  19. The PCR-SSP Manager computer program: a tool for maintaining sequence alignments and automatically updating the specificities of PCR-SSP primers and primer mixes.

    PubMed

    Bunce, M; Barnardo, M C; Welsh, K I

    1998-08-01

    An emerging problem of molecular typing methods such as PCR amplification using sequence-specific primers (PCR-SSP) is that they frequently require updating as new alleles are constantly being described which potentially affect the specificity of every PCR-SSP reaction. PCR-SSP uses pairs of primers to detect cis-linked polymorphisms and thus each new allele described must be compared to each individual primer pair. Furthermore, sequence homology between the various loci for class I and class II means that, for example, new HLA-A sequences have to be compared with HLA-B and HLA-C primer mixes to rule out cross-locus amplification. We have developed a computer program known as SSP Manager which is capable of aligning HLA class I and class II sequences obtained from Internet-accessible databases such as GenBank. The program then updates all individual primer specificities held in its database before updating the specificities of all primer mixes. Sets of primer mixes can then be combined from the primer mix directory to create PCR-SSP typing trays which are subsequently analysed by the program. A report is generated which stipulates whether all known sequences are amplified and the reason for apparent failure to test for individual alleles, e.g. a lack of relevant sequence information. SSP Manager has the flexibility to cope with unusual sequences (deletions and insertions), primers with internal mismatches and primers with a deliberate mismatch. The program also has many tools for developing new primer mixes, such as the facility to search for novel reactions using Boolean operators. The organisation and operational use of the SSP Manager program is described and its uses are illustrated with an updated allele list for our previously described Phototyping PCR-SSP class I and class II typing set. The SSP Manager is available on request from the authors. PMID:9756405

  20. Theoretical assessment of feasibility to sequence DNA through interlayer electronic tunneling transport at aligned nanopores in bilayer graphene

    PubMed Central

    Prasongkit, Jariyanee; Feliciano, Gustavo T.; Rocha, Alexandre R.; He, Yuhui; Osotchan, Tanakorn; Ahuja, Rajeev; Scheicher, Ralph H.

    2015-01-01

    Fast, cost effective, single-shot DNA sequencing could be the prelude of a new era in genetics. As DNA encodes the information for the production of proteins in all known living beings on Earth, determining the nucleobase sequences is the first and necessary step in that direction. Graphene-based nanopore devices hold great promise for next-generation DNA sequencing. In this work, we develop a novel approach for sequencing DNA using bilayer graphene to read the interlayer conductance through the layers in the presence of target nucleobases. Classical molecular dynamics simulations of DNA translocation through the pore were performed to trace the nucleobase trajectories and evaluate the interaction between the nucleobases and the nanopore. This interaction stabilizes the bases in different orientations, resulting in smaller fluctuations of the nucleobases inside the pore. We assessed the performance of a bilayer graphene nanopore setup for the purpose of DNA sequencing by employing density functional theory and non-equilibrium Green’s function method to investigate the interlayer conductance of nucleobases coupling simultaneously to the top and bottom graphene layers. The obtained conductance is significantly affected by the presence of DNA in the bilayer graphene nanopore, allowing us to analyze DNA sequences. PMID:26634811

  1. Theoretical assessment of feasibility to sequence DNA through interlayer electronic tunneling transport at aligned nanopores in bilayer graphene

    NASA Astrophysics Data System (ADS)

    Prasongkit, Jariyanee; Feliciano, Gustavo T.; Rocha, Alexandre R.; He, Yuhui; Osotchan, Tanakorn; Ahuja, Rajeev; Scheicher, Ralph H.

    2015-12-01

    Fast, cost effective, single-shot DNA sequencing could be the prelude of a new era in genetics. As DNA encodes the information for the production of proteins in all known living beings on Earth, determining the nucleobase sequences is the first and necessary step in that direction. Graphene-based nanopore devices hold great promise for next-generation DNA sequencing. In this work, we develop a novel approach for sequencing DNA using bilayer graphene to read the interlayer conductance through the layers in the presence of target nucleobases. Classical molecular dynamics simulations of DNA translocation through the pore were performed to trace the nucleobase trajectories and evaluate the interaction between the nucleobases and the nanopore. This interaction stabilizes the bases in different orientations, resulting in smaller fluctuations of the nucleobases inside the pore. We assessed the performance of a bilayer graphene nanopore setup for the purpose of DNA sequencing by employing density functional theory and non-equilibrium Green’s function method to investigate the interlayer conductance of nucleobases coupling simultaneously to the top and bottom graphene layers. The obtained conductance is significantly affected by the presence of DNA in the bilayer graphene nanopore, allowing us to analyze DNA sequences.

  2. Theoretical assessment of feasibility to sequence DNA through interlayer electronic tunneling transport at aligned nanopores in bilayer graphene.

    PubMed

    Prasongkit, Jariyanee; Feliciano, Gustavo T; Rocha, Alexandre R; He, Yuhui; Osotchan, Tanakorn; Ahuja, Rajeev; Scheicher, Ralph H

    2015-01-01

    Fast, cost effective, single-shot DNA sequencing could be the prelude of a new era in genetics. As DNA encodes the information for the production of proteins in all known living beings on Earth, determining the nucleobase sequences is the first and necessary step in that direction. Graphene-based nanopore devices hold great promise for next-generation DNA sequencing. In this work, we develop a novel approach for sequencing DNA using bilayer graphene to read the interlayer conductance through the layers in the presence of target nucleobases. Classical molecular dynamics simulations of DNA translocation through the pore were performed to trace the nucleobase trajectories and evaluate the interaction between the nucleobases and the nanopore. This interaction stabilizes the bases in different orientations, resulting in smaller fluctuations of the nucleobases inside the pore. We assessed the performance of a bilayer graphene nanopore setup for the purpose of DNA sequencing by employing density functional theory and non-equilibrium Green's function method to investigate the interlayer conductance of nucleobases coupling simultaneously to the top and bottom graphene layers. The obtained conductance is significantly affected by the presence of DNA in the bilayer graphene nanopore, allowing us to analyze DNA sequences. PMID:26634811

  3. Differentiated evolutionary relationships among chordates from comparative alignments of multiple sequences of MyoD and MyoG myogenic regulatory factors.

    PubMed

    Oliani, L C; Lidani, K C F; Gabriel, J E

    2015-01-01

    MyoD and MyoG are transcription factors that have essential roles in myogenic lineage determination and muscle differentiation. The purpose of this study was to compare multiple amino acid sequences of myogenic regulatory proteins to infer evolutionary relationships among chordates. Protein sequences from Mus musculus (P10085 and P12979), human Homo sapiens (P15172 and P15173), bovine Bos taurus (Q7YS82 and Q7YS81), wild pig Sus scrofa (P49811 and P49812), quail Coturnix coturnix (P21572 and P34060), chicken Gallus gallus (P16075 and P17920), rat Rattus norvegicus (Q02346 and P20428), domestic water buffalo Bubalus bubalis (D2SP11 and A7L034), and sheep Ovis aries (Q90477 and D3YKV7) were searched from a non-redundant protein sequence database UniProtKB/Swiss-Prot, and subsequently analyzed using the Mega6.0 software. MyoD evolutionary analyses revealed the presence of three main clusters with all mammals branched in one cluster, members of the order Rodentia (mouse and rat) in a second branch linked to the first, and birds of the order Galliformes (chicken and quail) remaining isolated in a third. MyoG evolutionary analyses aligned sequences in two main clusters, all mammalian specimens grouped in different sub-branches, and birds clustered in a second branch. These analyses suggest that the evolution of MyoD and MyoG was driven by different pathways. PMID:26505406

  4. Phylo-VISTA: An interactive visualization tool for multiple DNAsequence alignments

    SciTech Connect

    Shah, Nameeta; Couronne, Olivier; Pennacchio, Len A.; Brudno,Michael; Batzoglou, Serafim; Bethel, E. Wes; Rubin, Edward M.; Hamann,Bernd; Dubchak, Inna

    2003-04-25

    Motivation. The power of multi-sequence comparison forbiological discovery is well established and sequence data from a growinglist of organisms is becoming available. Thus, a need exists forcomputational strategies to visually compare multiple aligned sequencesto support conservation analysis across various species. To be efficientthese visualization algorithms require the ability to universally handlea wide range of evolutionary distances while taking into accountphylogeny Results. We have developed Phylo-VISTA, an interactive tool foranalyzing multiple alignments by visualizing the similarity of DNAsequences among multiple species while considering their phylogenicrelationships. Features include a broad spectrum of resolution parametersfor examining the alignment and the ability to easily compare any subtreeof sequences within a complete alignment dataset. Phylo-VISTA uses VISTAconcepts that have been successfully applied previously to a wide rangeof comparative genomics data analysis problems. Availability Phylo-VISTAis an interactive java applet available for downloading athttp://graphics.cs.ucdavis.edu/~;nyshah/Phylo-VISTA. It is also availableon-line at http://www-gsd.lbl.gov/phylovista and is integrated with theglobal alignment program LAGAN athttp://lagan.stanford.edu.Contactphylovista@lbl.gov

  5. Jannovar: a java library for exome annotation.

    PubMed

    Jäger, Marten; Wang, Kai; Bauer, Sebastian; Smedley, Damian; Krawitz, Peter; Robinson, Peter N

    2014-05-01

    Transcript-based annotation and pedigree analysis are two basic steps in the computational analysis of whole-exome sequencing experiments in genetic diagnostics and disease-gene discovery projects. Here, we present Jannovar, a stand-alone Java application as well as a Java library designed to be used in larger software frameworks for exome and genome analysis. Jannovar uses an interval tree to identify all transcripts affected by a given variant, and provides Human Genome Variation Society-compliant annotations both for variants affecting coding sequences and splice junctions as well as untranslated regions and noncoding RNA transcripts. Jannovar can also perform family-based pedigree analysis with Variant Call Format (VCF) files with data from members of a family segregating a Mendelian disorder. Using a desktop computer, Jannovar requires a few seconds to annotate a typical VCF file with exome data. Jannovar is freely available under the BSD2 license. Source code as well as the Java application and library file can be downloaded from http://compbio.charite.de (with tutorial) and https://github.com/charite/jannovar. PMID:24677618

  6. MC64-ClustalWP2: a highly-parallel hybrid strategy to align multiple sequences in many-core architectures.

    PubMed

    Díaz, David; Esteban, Francisco J; Hernández, Pilar; Caballero, Juan Antonio; Guevara, Antonio; Dorado, Gabriel; Gálvez, Sergio

    2014-01-01

    We have developed the MC64-ClustalWP2 as a new implementation of the Clustal W algorithm, integrating a novel parallelization strategy and significantly increasing the performance when aligning long sequences in architectures with many cores. It must be stressed that in such a process, the detailed analysis of both the software and hardware features and peculiarities is of paramount importance to reveal key points to exploit and optimize the full potential of parallelism in many-core CPU systems. The new parallelization approach has focused into the most time-consuming stages of this algorithm. In particular, the so-called progressive alignment has drastically improved the performance, due to a fine-grained approach where the forward and backward loops were unrolled and parallelized. Another key approach has been the implementation of the new algorithm in a hybrid-computing system, integrating both an Intel Xeon multi-core CPU and a Tilera Tile64 many-core card. A comparison with other Clustal W implementations reveals the high-performance of the new algorithm and strategy in many-core CPU architectures, in a scenario where the sequences to align are relatively long (more than 10 kb) and, hence, a many-core GPU hardware cannot be used. Thus, the MC64-ClustalWP2 runs multiple alignments more than 18x than the original Clustal W algorithm, and more than 7x than the best x86 parallel implementation to date, being publicly available through a web service. Besides, these developments have been deployed in cost-effective personal computers and should be useful for life-science researchers, including the identification of identities and differences for mutation/polymorphism analyses, biodiversity and evolutionary studies and for the development of molecular markers for paternity testing, germplasm management and protection, to assist breeding, illegal traffic control, fraud prevention and for the protection of the intellectual property (identification

  7. MC64-ClustalWP2: A Highly-Parallel Hybrid Strategy to Align Multiple Sequences in Many-Core Architectures

    PubMed Central

    Díaz, David; Esteban, Francisco J.; Hernández, Pilar; Caballero, Juan Antonio; Guevara, Antonio

    2014-01-01

    We have developed the MC64-ClustalWP2 as a new implementation of the Clustal W algorithm, integrating a novel parallelization strategy and significantly increasing the performance when aligning long sequences in architectures with many cores. It must be stressed that in such a process, the detailed analysis of both the software and hardware features and peculiarities is of paramount importance to reveal key points to exploit and optimize the full potential of parallelism in many-core CPU systems. The new parallelization approach has focused into the most time-consuming stages of this algorithm. In particular, the so-called progressive alignment has drastically improved the performance, due to a fine-grained approach where the forward and backward loops were unrolled and parallelized. Another key approach has been the implementation of the new algorithm in a hybrid-computing system, integrating both an Intel Xeon multi-core CPU and a Tilera Tile64 many-core card. A comparison with other Clustal W implementations reveals the high-performance of the new algorithm and strategy in many-core CPU architectures, in a scenario where the sequences to align are relatively long (more than 10 kb) and, hence, a many-core GPU hardware cannot be used. Thus, the MC64-ClustalWP2 runs multiple alignments more than 18x than the original Clustal W algorithm, and more than 7x than the best x86 parallel implementation to date, being publicly available through a web service. Besides, these developments have been deployed in cost-effective personal computers and should be useful for life-science researchers, including the identification of identities and differences for mutation/polymorphism analyses, biodiversity and evolutionary studies and for the development of molecular markers for paternity testing, germplasm management and protection, to assist breeding, illegal traffic control, fraud prevention and for the protection of the intellectual property (identification

  8. FANSe2: a robust and cost-efficient alignment tool for quantitative next-generation sequencing applications.

    PubMed

    Xiao, Chuan-Le; Mai, Zhi-Biao; Lian, Xin-Lei; Zhong, Jia-Yong; Jin, Jing-Jie; He, Qing-Yu; Zhang, Gong

    2014-01-01

    Correct and bias-free interpretation of the deep sequencing data is inevitably dependent on the complete mapping of all mappable reads to the reference sequence, especially for quantitative RNA-seq applications. Seed-based algorithms are generally slow but robust, while Burrows-Wheeler Transform (BWT) based algorithms are fast but less robust. To have both advantages, we developed an algorithm FANSe2 with iterative mapping strategy based on the statistics of real-world sequencing error distribution to substantially accelerate the mapping without compromising the accuracy. Its sensitivity and accuracy are higher than the BWT-based algorithms in the tests using both prokaryotic and eukaryotic sequencing datasets. The gene identification results of FANSe2 is experimentally validated, while the previous algorithms have false positives and false negatives. FANSe2 showed remarkably better consistency to the microarray than most other algorithms in terms of gene expression quantifications. We implemented a scalable and almost maintenance-free parallelization method that can utilize the computational power of multiple office computers, a novel feature not present in any other mainstream algorithm. With three normal office computers, we demonstrated that FANSe2 mapped an RNA-seq dataset generated from an entire Illunima HiSeq 2000 flowcell (8 lanes, 608 M reads) to masked human genome within 4.1 hours with higher sensitivity than Bowtie/Bowtie2. FANSe2 thus provides robust accuracy, full indel sensitivity, fast speed, versatile compatibility and economical computational utilization, making it a useful and practical tool for deep sequencing applications. FANSe2 is freely available at http://bioinformatics.jnu.edu.cn/software/fanse2/. PMID:24743329

  9. FANSe2: A Robust and Cost-Efficient Alignment Tool for Quantitative Next-Generation Sequencing Applications

    PubMed Central

    Xiao, Chuan-Le; Mai, Zhi-Biao; Lian, Xin-Lei; Zhong, Jia-Yong; Jin, Jing-jie; He, Qing-Yu; Zhang, Gong

    2014-01-01

    Correct and bias-free interpretation of the deep sequencing data is inevitably dependent on the complete mapping of all mappable reads to the reference sequence, especially for quantitative RNA-seq applications. Seed-based algorithms are generally slow but robust, while Burrows-Wheeler Transform (BWT) based algorithms are fast but less robust. To have both advantages, we developed an algorithm FANSe2 with iterative mapping strategy based on the statistics of real-world sequencing error distribution to substantially accelerate the mapping without compromising the accuracy. Its sensitivity and accuracy are higher than the BWT-based algorithms in the tests using both prokaryotic and eukaryotic sequencing datasets. The gene identification results of FANSe2 is experimentally validated, while the previous algorithms have false positives and false negatives. FANSe2 showed remarkably better consistency to the microarray than most other algorithms in terms of gene expression quantifications. We implemented a scalable and almost maintenance-free parallelization method that can utilize the computational power of multiple office computers, a novel feature not present in any other mainstream algorithm. With three normal office computers, we demonstrated that FANSe2 mapped an RNA-seq dataset generated from an entire Illunima HiSeq 2000 flowcell (8 lanes, 608 M reads) to masked human genome within 4.1 hours with higher sensitivity than Bowtie/Bowtie2. FANSe2 thus provides robust accuracy, full indel sensitivity, fast speed, versatile compatibility and economical computational utilization, making it a useful and practical tool for deep sequencing applications. FANSe2 is freely available at http://bioinformatics.jnu.edu.cn/software/fanse2/. PMID:24743329

  10. Implementation of NAS Parallel Benchmarks in Java

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael; Schultz, Matthew; Jin, Hao-Qiang; Yan, Jerry

    2000-01-01

    A number of features make Java an attractive but a debatable choice for High Performance Computing (HPC). In order to gauge the applicability of Java to the Computational Fluid Dynamics (CFD) we have implemented NAS Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would move Java closer to Fortran in the competition for CFD applications.

  11. SigniSite: Identification of residue-level genotype-phenotype correlations in protein multiple sequence alignments.

    PubMed

    Jessen, Leon Eyrich; Hoof, Ilka; Lund, Ole; Nielsen, Morten

    2013-07-01

    Identifying which mutation(s) within a given genotype is responsible for an observable phenotype is important in many aspects of molecular biology. Here, we present SigniSite, an online application for subgroup-free residue-level genotype-phenotype correlation. In contrast to similar methods, SigniSite does not require any pre-definition of subgroups or binary classification. Input is a set of protein sequences where each sequence has an associated real number, quantifying a given phenotype. SigniSite will then identify which amino acid residues are significantly associated with the data set phenotype. As output, SigniSite displays a sequence logo, depicting the strength of the phenotype association of each residue and a heat-map identifying 'hot' or 'cold' regions. SigniSite was benchmarked against SPEER, a state-of-the-art method for the prediction of specificity determining positions (SDP) using a set of human immunodeficiency virus protease-inhibitor genotype-phenotype data and corresponding resistance mutation scores from the Stanford University HIV Drug Resistance Database, and a data set of protein families with experimentally annotated SDPs. For both data sets, SigniSite was found to outperform SPEER. SigniSite is available at: http://www.cbs.dtu.dk/services/SigniSite/. PMID:23761454

  12. Knowledge-based expert systems and a proof-of-concept case study for multiple sequence alignment construction and analysis

    PubMed Central

    Aniba, Mohamed Radhouene; Siguenza, Sophie; Friedrich, Anne; Plewniak, Frédéric; Poch, Olivier; Marchler-Bauer, Aron

    2009-01-01

    The traditional approach to bioinformatics analyses relies on independent task-specific services and applications, using different input and output formats, often idiosyncratic, and frequently not designed to inter-operate. In general, such analyses were performed by experts who manually verified the results obtained at each step in the process. Today, the amount of bioinformatics information continuously being produced means that handling the various applications used to study this information presents a major data management and analysis challenge to researchers. It is now impossible to manually analyse all this information and new approaches are needed that are capable of processing the large-scale heterogeneous data in order to extract the pertinent information. We review the recent use of integrated expert systems aimed at providing more efficient knowledge extraction for bioinformatics research. A general methodology for building knowledge-based expert systems is described, focusing on the unstructured information management architecture, UIMA, which provides facilities for both data and process management. A case study involving a multiple alignment expert system prototype called AlexSys is also presented. PMID:18971242

  13. Use of Alignment-Free Phylogenetics for Rapid Genome Sequence-Based Typing of Helicobacter pylori Virulence Markers and Antibiotic Susceptibility

    PubMed Central

    Kusters, Johannes G.

    2015-01-01

    Whole-genome sequencing is becoming a leading technology in the typing and epidemiology of microbial pathogens, but the increase in genomic information necessitates significant investment in bioinformatic resources and expertise, and currently used methodologies struggle with genetically heterogeneous bacteria such as the human gastric pathogen Helicobacter pylori. Here we demonstrate that the alignment-free analysis method feature frequency profiling (FFP) can be used to rapidly construct phylogenetic trees of draft bacterial genome sequences on a standard desktop computer and that coupling with in silico genotyping methods gives useful information for comparative and clinical genomic and molecular epidemiology applications. FFP-based phylogenetic trees of seven gastric Helicobacter species matched those obtained by analysis of 16S rRNA genes and ribosomal proteins, and FFP- and core genome single nucleotide polymorphism-based analysis of 63 H. pylori genomes again showed comparable phylogenetic clustering, consistent with genomotypes assigned by using multilocus sequence typing (MLST). Analysis of 377 H. pylori genomes highlighted the conservation of genomotypes and linkage with phylogeographic characteristics and predicted the presence of an incomplete or nonfunctional cag pathogenicity island in 18/276 genomes. In silico analysis of antibiotic susceptibility markers suggests that most H. pylori hspAmerind and hspEAsia isolates are predicted to carry the T2812C mutation potentially conferring low-level clarithromycin resistance, while levels of metronidazole resistance were similar in all multilocus sequence types. In conclusion, the use of FFP phylogenetic clustering and in silico genotyping allows determination of genome evolution and phylogeographic clustering and can contribute to clinical microbiology by genomotyping for outbreak management and the prediction of pathogenic potential and antibiotic susceptibility. PMID:26135867

  14. The influence of alignment-free sequence representations on the semi-supervised classification of class C G protein-coupled receptors: semi-supervised classification of class C GPCRs.

    PubMed

    Cruz-Barbosa, Raúl; Vellido, Alfredo; Giraldo, Jesús

    2015-02-01

    G protein-coupled receptors (GPCRs) are integral cell membrane proteins of relevance for pharmacology. The tertiary structure of the transmembrane domain, a gate to the study of protein functionality, is unknown for almost all members of class C GPCRs, which are the target of the current study. As a result, their investigation must often rely on alignments of their amino acid sequences. Sequence alignment entails the risk of missing relevant information. Various approaches have attempted to circumvent this risk through alignment-free transformations of the sequences on the basis of different amino acid physicochemical properties. In this paper, we use several of these alignment-free methods, as well as a basic amino acid composition representation, to transform the available sequences. Novel semi-supervised statistical machine learning methods are then used to discriminate the different class C GPCRs types from the transformed data. This approach is relevant due to the existence of orphan proteins to which type labels should be assigned in a process of deorphanization or reverse pharmacology. The reported experiments show that the proposed techniques provide accurate classification even in settings of extreme class-label scarcity and that fair accuracy can be achieved even with very simple transformation strategies that ignore the sequence ordering. PMID:25367737

  15. Java PathFinder: A Translator From Java to Promela

    NASA Technical Reports Server (NTRS)

    Havelund, Klaus

    1999-01-01

    JAVA PATHFINDER, JPF, is a prototype translator from JAVA to PROMELA, the modeling language of the SPIN model checker. JPF is a product of a major effort by the Automated Software Engineering group at NASA Ames to make model checking technology part of the software process. Experience has shown that severe bugs can be found in final code using this technique, and that automated translation from a programming language to a modeling language like PROMELA can help reducing the effort required.

  16. Combining Multiple Pairwise Structure-based Alignments

    SciTech Connect

    2014-11-12

    CombAlign is a new Python code that generates a gapped, one-to-many, multiple structure-based sequence alignment(MSSA) given a set of pairwise structure-based alignments. In order to better define regions of similarity among related protein structures, it is useful to detect the residue-residue correspondences among a set of pairwise structure alignments. Few codes exist for constructing a one-to-many, multiple sequence alignment derived from a set of structure alignments, and we perceived a need for creating a new tool for combing pairwise structure alignments that would allow for insertion of gaps in the reference structure.

  17. Java applets for Physics Instruction

    NASA Astrophysics Data System (ADS)

    Dukes, Phillip; Hatch, Dorian

    1998-10-01

    We will present a number of Web accessible single concept Java applets created at Brigham Young University and designed for teaching introductory physics. Java applet based problems, along with digitized video, are being developed as part of a web-based physics course at Brigham Young University. It is believed that video and animation can significantly influence students' visualization of some types of physics and that use of these types of tools could significantly improve their understanding of physics concepts.

  18. Model Checker for Java Programs

    NASA Technical Reports Server (NTRS)

    Visser, Willem

    2007-01-01

    Java Pathfinder (JPF) is a verification and testing environment for Java that integrates model checking, program analysis, and testing. JPF consists of a custom-made Java Virtual Machine (JVM) that interprets bytecode, combined with a search interface to allow the complete behavior of a Java program to be analyzed, including interleavings of concurrent programs. JPF is implemented in Java, and its architecture is highly modular to support rapid prototyping of new features. JPF is an explicit-state model checker, because it enumerates all visited states and, therefore, suffers from the state-explosion problem inherent in analyzing large programs. It is suited to analyzing programs less than 10kLOC, but has been successfully applied to finding errors in concurrent programs up to 100kLOC. When an error is found, a trace from the initial state to the error is produced to guide the debugging. JPF works at the bytecode level, meaning that all of Java can be model-checked. By default, the software checks for all runtime errors (uncaught exceptions), assertions violations (supports Java s assert), and deadlocks. JPF uses garbage collection and symmetry reductions of the heap during model checking to reduce state-explosion, as well as dynamic partial order reductions to lower the number of interleavings analyzed. JPF is capable of symbolic execution of Java programs, including symbolic execution of complex data such as linked lists and trees. JPF is extensible as it allows for the creation of listeners that can subscribe to events during searches. The creation of dedicated code to be executed in place of regular classes is supported and allows users to easily handle native calls and to improve the efficiency of the analysis.

  19. On the alignment space.

    PubMed

    Shen, Shi-Yi; Wang, Kui; Hu, Gang; Chen, Lu-Sheng; Zhang, Hua; Xia, Shu-Tao

    2005-01-01

    Sequences with generalized errors which are called mutations in bioinformatics and generalized error-correcting codes are studied in this paper. In the areas of bioinformatics, computer science and information theory, sequences with generalized errors are discussed respectively for different aims. Firstly, we give the definitions of alignment distance and Levenshtein distance by expansion sequences and discuss their properties and relations. Then the modular structure theory is introduced for strictly describe the expansion sequences. We show that the expansion modular structures of sequences form a Boolean algebra. As applications of the modular structure theory, we give a new and more strict proof of triangle inequality for alignment distance. At last, the definition and construction of generalized error-correcting codes are studied, and some optimal codes with small length are listed. PMID:17282158

  20. Going back to Java.

    PubMed

    Critchfield, R

    1985-01-01

    In Indonesia, achievements in food production have helped lower the country's deaths rates and increase life expectancy, making concern about the birthrate all the more critical, particularly in the already crowded Java. Indonesia's rice production in 1985 is expected to reach 26.3 million tons, 58% more than the 1975-79 average. With every country except Malaysia now self-sufficient or surplus in rice, the world market price for rice has dropped markedly. Indonesia's National Logistics Board (BULOG), which aims to establish a floor price for rice, has had to stockpile 3.5 million tons, double its normal reserve and enough for 3 years. Some of it has been kept 2 years already, but it cannot be exported as the quality is low and everybody else also has plenty of rice. Peasants and agriculture experts agree that alternatives to rice pose greater risks in terms of weather and disease. Whatever the government does, rice prices have dropped sharply and are likely to stay down. Fertilizer use can also be expected to decline for the 1st time in years. Indonesia is the scene of a scientific breakthrough, a new hybrid seed corn that grows in the tropics. If seed companies are able to sell seed for half of Indonesia's existing corn acreage, this would be an increase of 1.3 million tons, which would mostly be a surplus to be used for export, processing, or increased human or animal consumption. In revisiting Indonesia, the biggest dissapointment is the failure of family planning to slow the rate of population growth more drastically. 5 years ago, Indonesia's family planning program, started in 1970, appeared a great success. Countrywide, the proportion of women aged 15-44 using contraceptives increased from almost nothing to almost 40% and in Bali topped 60%. Indonesia's overall annual population growth rate had dropped to 1.7%, raising hopes it could be brought down to the 1.2% rate of East Java and Bali by 1985. What has happended instead is that an unexpectedly fast

  1. Java based open architecture controller

    SciTech Connect

    Weinert, G F

    2000-01-13

    At Lawrence Livermore National Laboratory (LLNL) the authors have been developing an open architecture machine tool controller. This work has been patterned after the General Motors (GM) led Open Modular Architecture Controller (OMAC) work, where they have been involved since its inception. The OMAC work has centered on creating sets of implementation neutral application programming interfaces (APIs) for machine control software components. In the work at LLNL, they were among the early adopters of the Java programming language. As an application programming language, it is particularly well suited for component software development. The language contains many features, which along with a well-defined implementation API (such as the OMAC APIs) allows third party binary files to be integrated into a working system. Because of its interpreted nature, Java allows rapid integration testing of components. However, for real-time systems development, the Java programming language presents many drawbacks. For instance, lack of well defined scheduling semantics and threading behavior can present many unwanted challenges. Also, the interpreted nature of the standard Java Virtual Machine (JVM) presents an immediate performance hit. Various real-time Java vendors are currently addressing some of these drawbacks. The various pluses and minuses of using the Java programming language and environment, with regard to a component-based controller, will be outlined.

  2. JAVA Stereo Display Toolkit

    NASA Technical Reports Server (NTRS)

    Edmonds, Karina

    2008-01-01

    This toolkit provides a common interface for displaying graphical user interface (GUI) components in stereo using either specialized stereo display hardware (e.g., liquid crystal shutter or polarized glasses) or anaglyph display (red/blue glasses) on standard workstation displays. An application using this toolkit will work without modification in either environment, allowing stereo software to reach a wider audience without sacrificing high-quality display on dedicated hardware. The toolkit is written in Java for use with the Swing GUI Toolkit and has cross-platform compatibility. It hooks into the graphics system, allowing any standard Swing component to be displayed in stereo. It uses the OpenGL graphics library to control the stereo hardware and to perform the rendering. It also supports anaglyph and special stereo hardware using the same API (application-program interface), and has the ability to simulate color stereo in anaglyph mode by combining the red band of the left image with the green/blue bands of the right image. This is a low-level toolkit that accomplishes simply the display of components (including the JadeDisplay image display component). It does not include higher-level functions such as disparity adjustment, 3D cursor, or overlays all of which can be built using this toolkit.

  3. Java Radar Analysis Tool

    NASA Technical Reports Server (NTRS)

    Zaczek, Mariusz P.

    2005-01-01

    Java Radar Analysis Tool (JRAT) is a computer program for analyzing two-dimensional (2D) scatter plots derived from radar returns showing pieces of the disintegrating Space Shuttle Columbia. JRAT can also be applied to similar plots representing radar returns showing aviation accidents, and to scatter plots in general. The 2D scatter plots include overhead map views and side altitude views. The superposition of points in these views makes searching difficult. JRAT enables three-dimensional (3D) viewing: by use of a mouse and keyboard, the user can rotate to any desired viewing angle. The 3D view can include overlaid trajectories and search footprints to enhance situational awareness in searching for pieces. JRAT also enables playback: time-tagged radar-return data can be displayed in time order and an animated 3D model can be moved through the scene to show the locations of the Columbia (or other vehicle) at the times of the corresponding radar events. The combination of overlays and playback enables the user to correlate a radar return with a position of the vehicle to determine whether the return is valid. JRAT can optionally filter single radar returns, enabling the user to selectively hide or highlight a desired radar return.

  4. Tertiary structure prediction of the KIX domain of CBP using Monte Carlo simulations driven by restraints derived from multiple sequence alignments.

    PubMed

    Ortiz, A R; Kolinski, A; Skolnick, J

    1998-02-15

    Using a recently developed protein folding algorithm, a prediction of the tertiary structure of the KIX domain of the CREB binding protein is described. The method incorporates predicted secondary and tertiary restraints derived from multiple sequence alignments in a reduced protein model whose conformational space is explored by Monte Carlo dynamics. Secondary structure restraints are provided by the PHD secondary structure prediction algorithm that was modified for the presence of predicted U-turns, i.e., regions where the chain reverses global direction. Tertiary restraints are obtained via a two-step process: First, seed side-chain contacts are identified from a correlated mutation analysis, and then, a threading-based algorithm expands the number of these seed contacts. Blind predictions indicate that the KIX domain is a putative three-helix bundle, although the chirality of the bundle could not be uniquely determined. The expected root-mean-square deviation for the correct chirality of the KIX domain is between 5.0 and 6.2 A. This is to be compared with the estimate of 12.9 A that would be expected by a random prediction, using the model of F. Cohen and M. Sternberg (J. Mol. Biol. 138:321-333, 1980). PMID:9517544

  5. Java Application Shell: A Framework for Piecing Together Java Applications

    NASA Technical Reports Server (NTRS)

    Miller, Philip; Powers, Edward I. (Technical Monitor)

    2001-01-01

    This session describes the architecture of Java Application Shell (JAS), a Swing-based framework for developing interactive Java applications. Java Application Shell is being developed by Commerce One, Inc. for NASA Goddard Space Flight Center Code 588. The purpose of JAS is to provide a framework for the development of Java applications, providing features that enable the development process to be more efficient, consistent and flexible. Fundamentally, JAS is based upon an architecture where an application is considered a collection of 'plugins'. In turn, a plug-in is a collection of Swing actions defined using XML and packaged in a jar file. Plug-ins may be local to the host platform or remotely-accessible through HTTP. Local and remote plugins are automatically discovered by JAS upon application startup; plugins may also be loaded dynamically without having to re-start the application. Using Extensible Markup Language (XML) to define actions, as opposed to hardcoding them in application logic, allows easier customization of application-specific operations by separating application logic from presentation. Through XML, a developer defines an action that may appear on any number of menus, toolbars, and buttons. Actions maintain and propagate enable/disable states and specify icons, tool-tips, titles, etc. Furthermore, JAS allows actions to be implemented using various scripting languages through the use of IBM's Bean Scripting Framework. Scripted action implementation is seamless to the end-user. In addition to action implementation, scripts may be used for application and unit-level testing. In the case of application-level testing, JAS has hooks to assist a script in simulating end-user input. JAS also provides property and user preference management, JavaHelp, Undo/Redo, Multi-Document Interface, Single-Document Interface, printing, and logging. Finally, Jini technology has also been included into the framework by means of a Jini services browser and the

  6. Model Checking Real Time Java Using Java PathFinder

    NASA Technical Reports Server (NTRS)

    Lindstrom, Gary; Mehlitz, Peter C.; Visser, Willem

    2005-01-01

    The Real Time Specification for Java (RTSJ) is an augmentation of Java for real time applications of various degrees of hardness. The central features of RTSJ are real time threads; user defined schedulers; asynchronous events, handlers, and control transfers; a priority inheritance based default scheduler; non-heap memory areas such as immortal and scoped, and non-heap real time threads whose execution is not impeded by garbage collection. The Robust Software Systems group at NASA Ames Research Center has JAVA PATHFINDER (JPF) under development, a Java model checker. JPF at its core is a state exploring JVM which can examine alternative paths in a Java program (e.g., via backtracking) by trying all nondeterministic choices, including thread scheduling order. This paper describes our implementation of an RTSJ profile (subset) in JPF, including requirements, design decisions, and current implementation status. Two examples are analyzed: jobs on a multiprogramming operating system, and a complex resource contention example involving autonomous vehicles crossing an intersection. The utility of JPF in finding logic and timing errors is illustrated, and the remaining challenges in supporting all of RTSJ are assessed.

  7. Monitoring Java Programs with Java PathExplorer

    NASA Technical Reports Server (NTRS)

    Havelund, Klaus; Rosu, Grigore; Clancy, Daniel (Technical Monitor)

    2001-01-01

    We present recent work on the development Java PathExplorer (JPAX), a tool for monitoring the execution of Java programs. JPAX can be used during program testing to gain increased information about program executions, and can potentially furthermore be applied during operation to survey safety critical systems. The tool facilitates automated instrumentation of a program's late code which will then omit events to an observer during its execution. The observer checks the events against user provided high level requirement specifications, for example temporal logic formulae, and against lower level error detection procedures, for example concurrency related such as deadlock and data race algorithms. High level requirement specifications together with their underlying logics are defined in the Maude rewriting logic, and then can either be directly checked using the Maude rewriting engine, or be first translated to efficient data structures and then checked in Java.

  8. Vertical Alignment and Collaboration.

    ERIC Educational Resources Information Center

    Bergman, Donna; Calzada, Lucio; LaPointe, Nancy; Lee, Audra; Sullivan, Lynn

    This study investigated whether vertical (grade level sequence) alignment of the curriculum in conjunction with teacher collaboration would enhance student performance on the Texas Assessment of Academic Skills (TAAS) test in south Texas school districts of various sizes. Surveys were mailed to the office of the superintendent of 47 school…

  9. JAVA based LCD Reconstruction and Analysis Tools

    SciTech Connect

    Bower, G.

    2004-10-11

    We summarize the current status and future developments of the North American Group's Java-based system for studying physics and detector design issues at a linear collider. The system is built around Java Analysis Studio (JAS) an experiment-independent Java-based utility for data analysis. Although the system is an integrated package running in JAS, many parts of it are also standalone Java utilities.

  10. Dynamic triggering of Lusi, East Java Basin

    NASA Astrophysics Data System (ADS)

    Lupi, Matteo; Saenger, Erik H.; Fuchs, Florian; Miller, Steve

    2016-04-01

    On the 27th of May 2006, a M6.3 strike slip earthquake struck beneath Yogyakarta, Java. Forty-seven hours later a mixture of mud, breccia, and gas reached the surface near Sidoarjo, 250 km far from the epicenter, creating several mud vents aligned along a NW-SE direction. The mud eruption reached a peak of 180.000 km3 of erupted material per day and it is still ongoing. The major eruption crater was named Lusi and represents the surface expression of a newborn sedimentary-hosted hydrothermal system. Lusi flooded several villages causing a loss of approximately 4 billions to Indonesia. Previous geochemical and geological data suggest that the Yogyakarta earthquake may have reactivated parts of the Watukosek fault system, a strike slip structure upon which Lusi resides. The Watukosek fault systems connects the East Java basin to the volcanic arc, which may explain the presence of both biogenic and thermogenic fluids. To quantify the effects of incoming seismic energy at Lusi we conducted a seismic wave propagation study on a geological model of Lusi's structure. A key feature of our model is a low velocity shear zone in the Kalibeng formation caused by elevated pore pressures, which is often neglected in other studies. Our analysis highlights the importance of the overall geological structure that focused the seismic energy causing elevated strain rates at depth. In particular, we show that body waves generated by the Yogyakarta earthquake may have induced liquefaction of the Kalibeng formation. As consequence, the liquefied mud injected and reactivated parts of the Watukosek fault system. Our findings are in agreement with previous studies suggesting that Lusi was an unfortunate case of dynamic triggering promoted by the Yogyakarta earthquake.

  11. MzJava: An open source library for mass spectrometry data processing.

    PubMed

    Horlacher, Oliver; Nikitin, Frederic; Alocci, Davide; Mariethoz, Julien; Müller, Markus; Lisacek, Frederique

    2015-11-01

    Mass spectrometry (MS) is a widely used and evolving technique for the high-throughput identification of molecules in biological samples. The need for sharing and reuse of code among bioinformaticians working with MS data prompted the design and implementation of MzJava, an open-source Java Application Programming Interface (API) for MS related data processing. MzJava provides data structures and algorithms for representing and processing mass spectra and their associated biological molecules, such as metabolites, glycans and peptides. MzJava includes functionality to perform mass calculation, peak processing (e.g. centroiding, filtering, transforming), spectrum alignment and clustering, protein digestion, fragmentation of peptides and glycans as well as scoring functions for spectrum-spectrum and peptide/glycan-spectrum matches. For data import and export MzJava implements readers and writers for commonly used data formats. For many classes support for the Hadoop MapReduce (hadoop.apache.org) and Apache Spark (spark.apache.org) frameworks for cluster computing was implemented. The library has been developed applying best practices of software engineering. To ensure that MzJava contains code that is correct and easy to use the library's API was carefully designed and thoroughly tested. MzJava is an open-source project distributed under the AGPL v3.0 licence. MzJava requires Java 1.7 or higher. Binaries, source code and documentation can be downloaded from http://mzjava.expasy.org and https://bitbucket.org/sib-pig/mzjava. This article is part of a Special Issue entitled: Computational Proteomics. PMID:26141507

  12. DNAAlignEditor: DNA alignment editor tool

    PubMed Central

    Sanchez-Villeda, Hector; Schroeder, Steven; Flint-Garcia, Sherry; Guill, Katherine E; Yamasaki, Masanori; McMullen, Michael D

    2008-01-01

    Background With advances in DNA re-sequencing methods and Next-Generation parallel sequencing approaches, there has been a large increase in genomic efforts to define and analyze the sequence variability present among individuals within a species. For very polymorphic species such as maize, this has lead to a need for intuitive, user-friendly software that aids the biologist, often with naïve programming capability, in tracking, editing, displaying, and exporting multiple individual sequence alignments. To fill this need we have developed a novel DNA alignment editor. Results We have generated a nucleotide sequence alignment editor (DNAAlignEditor) that provides an intuitive, user-friendly interface for manual editing of multiple sequence alignments with functions for input, editing, and output of sequence alignments. The color-coding of nucleotide identity and the display of associated quality score aids in the manual alignment editing process. DNAAlignEditor works as a client/server tool having two main components: a relational database that collects the processed alignments and a user interface connected to database through universal data access connectivity drivers. DNAAlignEditor can be used either as a stand-alone application or as a network application with multiple users concurrently connected. Conclusion We anticipate that this software will be of general interest to biologists and population genetics in editing DNA sequence alignments and analyzing natural sequence variation regardless of species, and will be particularly useful for manual alignment editing of sequences in species with high levels of polymorphism. PMID:18366684

  13. Java: An Explosion on the Internet.

    ERIC Educational Resources Information Center

    Read, Tim; Hall, Hazel

    Summer 1995 saw the release, with considerable media attention, of draft versions of Sun Microsystems' Java computer programming language and the HotJava browser. Java has been heralded as the latest "killer" technology in the Internet explosion. Sun Microsystems and numerous companies including Microsoft, IBM, and Netscape have agreed upon…

  14. Features of the Java commodity grid kit.

    SciTech Connect

    von Laszewski, G.; Gawor, J.; Lane, P.; Rehn, N.; Russell, M.; Mathematics and Computer Science

    2002-11-01

    In this paper we report on the features of the Java Commodity Grid Kit (Java CoG Kit). The Java CoG Kit provides middleware for accessing Grid functionality from the Java framework. Java CoG Kit middleware is general enough to design a variety of advanced Grid applications with quite different user requirements. Access to the Grid is established via Globus Toolkit protocols, allowing the Java CoG Kit to also communicate with the services distributed as part of the C Globus Toolkit reference implementation. Thus, the Java CoG Kit provides Grid developers with the ability to utilize the Grid, as well as numerous additional libraries and frameworks developed by the Java community to enable network, Internet, enterprise and peer-to-peer computing. A variety of projects have successfully used the client libraries of the Java CoG Kit to access Grids driven by the C Globus Toolkit software. In this paper we also report on the efforts to develop serverside Java CoG Kit components. As part of this research we have implemented a prototype pure Java resource management system that enables one to run Grid jobs on platforms on which a Java virtual machine is supported, including Windows NT machines.

  15. Alignment validation

    SciTech Connect

    ALICE; ATLAS; CMS; LHCb; Golling, Tobias

    2008-09-06

    The four experiments, ALICE, ATLAS, CMS and LHCb are currently under constructionat CERN. They will study the products of proton-proton collisions at the Large Hadron Collider. All experiments are equipped with sophisticated tracking systems, unprecedented in size and complexity. Full exploitation of both the inner detector andthe muon system requires an accurate alignment of all detector elements. Alignmentinformation is deduced from dedicated hardware alignment systems and the reconstruction of charged particles. However, the system is degenerate which means the data is insufficient to constrain all alignment degrees of freedom, so the techniques are prone to converging on wrong geometries. This deficiency necessitates validation and monitoring of the alignment. An exhaustive discussion of means to validate is subject to this document, including examples and plans from all four LHC experiments, as well as other high energy experiments.

  16. Support Vector Training of Protein Alignment Models

    PubMed Central

    Joachims, Thorsten; Elber, Ron; Pillardy, Jaroslaw

    2008-01-01

    Abstract Sequence to structure alignment is an important step in homology modeling of protein structures. Incorporation of features such as secondary structure, solvent accessibility, or evolutionary information improve sequence to structure alignment accuracy, but conventional generative estimation techniques for alignment models impose independence assumptions that make these features difficult to include in a principled way. In this paper, we overcome this problem using a Support Vector Machine (SVM) method that provides a well-founded way of estimating complex alignment models with hundred of thousands of parameters. Furthermore, we show that the method can be trained using a variety of loss functions. In a rigorous empirical evaluation, the SVM algorithm outperforms the generative alignment method SSALN, a highly accurate generative alignment model that incorporates structural information. The alignment model learned by the SVM aligns 50% of the residues correctly and aligns over 70% of the residues within a shift of four positions. PMID:18707536

  17. Alignments of RNA structures.

    PubMed

    Blin, Guillaume; Denise, Alain; Dulucq, Serge; Herrbach, Claire; Touzet, Hélène

    2010-01-01

    We describe a theoretical unifying framework to express the comparison of RNA structures, which we call alignment hierarchy. This framework relies on the definition of common supersequences for arc-annotated sequences and encompasses the main existing models for RNA structure comparison based on trees and arc-annotated sequences with a variety of edit operations. It also gives rise to edit models that have not been studied yet. We provide a thorough analysis of the alignment hierarchy, including a new polynomial-time algorithm and an NP-completeness proof. The polynomial-time algorithm involves biologically relevant edit operations such as pairing or unpairing nucleotides. It has been implemented in a software, called gardenia, which is available at the Web server http://bioinfo.lifl.fr/RNA/gardenia. PMID:20431150

  18. Java Mission Evaluation Workstation System

    NASA Technical Reports Server (NTRS)

    Pettinger, Ross; Watlington, Tim; Ryley, Richard; Harbour, Jeff

    2006-01-01

    The Java Mission Evaluation Workstation System (JMEWS) is a collection of applications designed to retrieve, display, and analyze both real-time and recorded telemetry data. This software is currently being used by both the Space Shuttle Program (SSP) and the International Space Station (ISS) program. JMEWS was written in the Java programming language to satisfy the requirement of platform independence. An object-oriented design was used to satisfy additional requirements and to make the software easily extendable. By virtue of its platform independence, JMEWS can be used on the UNIX workstations in the Mission Control Center (MCC) and on office computers. JMEWS includes an interactive editor that allows users to easily develop displays that meet their specific needs. The displays can be developed and modified while viewing data. By simply selecting a data source, the user can view real-time, recorded, or test data.

  19. Alignment fixture

    DOEpatents

    Bell, Grover C.; Gibson, O. Theodore

    1980-01-01

    A part alignment fixture is provided which may be used for precise variable lateral and tilt alignment relative to the fixture base of various shaped parts. The fixture may be used as a part holder for machining or inspection of parts or alignment of parts during assembly and the like. The fixture includes a precisely machined diameter disc-shaped hub adapted to receive the part to be aligned. The hub is nested in a guide plate which is adapted to carry two oppositely disposed pairs of positioning wedges so that the wedges may be reciprocatively positioned by means of respective micrometer screws. The sloping faces of the wedges contact the hub at respective quadrants of the hub periphery. The lateral position of the hub relative to the guide plate is adjusted by positioning the wedges with the associated micrometer screws. The tilt of the part is adjusted relative to a base plate, to which the guide plate is pivotally connected by means of a holding plate. Two pairs of oppositely disposed wedges are mounted for reciprocative lateral positioning by means of separate micrometer screws between flanges of the guide plate and the base plate. Once the wedges are positioned to achieve the proper tilt of the part or hub on which the part is mounted relative to the base plate, the fixture may be bolted to a machining, inspection, or assembly device.

  20. Semiautomated improvement of RNA alignments

    PubMed Central

    Andersen, Ebbe S.; Lind-Thomsen, Allan; Knudsen, Bjarne; Kristensen, Susie E.; Havgaard, Jakob H.; Torarinsson, Elfar; Larsen, Niels; Zwieb, Christian; Sestoft, Peter; Kjems, Jørgen; Gorodkin, Jan

    2007-01-01

    We have developed a semiautomated RNA sequence editor (SARSE) that integrates tools for analyzing RNA alignments. The editor highlights different properties of the alignment by color, and its integrated analysis tools prevent the introduction of errors when doing alignment editing. SARSE readily connects to external tools to provide a flexible semiautomatic editing environment. A new method, Pcluster, is introduced for dividing the sequences of an RNA alignment into subgroups with secondary structure differences. Pcluster was used to evaluate 574 seed alignments obtained from the Rfam database and we identified 71 alignments with significant prediction of inconsistent base pairs and 102 alignments with significant prediction of novel base pairs. Four RNA families were used to illustrate how SARSE can be used to manually or automatically correct the inconsistent base pairs detected by Pcluster: the mir-399 RNA, vertebrate telomase RNA (vert-TR), bacterial transfer-messenger RNA (tmRNA), and the signal recognition particle (SRP) RNA. The general use of the method is illustrated by the ability to accommodate pseudoknots and handle even large and divergent RNA families. The open architecture of the SARSE editor makes it a flexible tool to improve all RNA alignments with relatively little human intervention. Online documentation and software are available at http://sarse.ku.dk. PMID:17804647

  1. Developing JAVA Card Application with RMI API

    NASA Astrophysics Data System (ADS)

    JunWu, Xu; JunLing, Liang

    This paper describes research in the use of the RMI to develop Java Card applications. the Java Card RMI (JCRMI), which is based on the J2SE RMI distributed-object model. In the RMI model a server application creates and makes accessible remote objects, and a client application obtains remote references to the server's remote objects, and then invokes remote methods on them. In JCRMI, the Java Card applet is the server, and the host application is the client.

  2. Jess, the Java expert system shell

    SciTech Connect

    Friedman-Hill, E.J.

    1997-11-01

    This report describes Jess, a clone of the popular CLIPS expert system shell written entirely in Java. Jess supports the development of rule-based expert systems which can be tightly coupled to code written in the powerful, portable Java language. The syntax of the Jess language is discussed, and a comprehensive list of supported functions is presented. A guide to extending Jess by writing Java code is also included.

  3. ALIGNING JIG

    DOEpatents

    Culver, J.S.; Tunnell, W.C.

    1958-08-01

    A jig or device is described for setting or aligning an opening in one member relative to another member or structure, with a predetermined offset, or it may be used for measuring the amount of offset with which the parts have previously been sct. This jig comprises two blocks rabbeted to each other, with means for securing thc upper block to the lower block. The upper block has fingers for contacting one of the members to be a1igmed, the lower block is designed to ride in grooves within the reference member, and calibration marks are provided to determine the amount of offset. This jig is specially designed to align the collimating slits of a mass spectrometer.

  4. A Compact Java Class Library for FITS

    NASA Astrophysics Data System (ADS)

    Grosbøl, Preben

    The Java Platform offers an efficient environment for development of administrational tasks with its very comprehensive libraries, good support for graphical user interfaces, and concept for distributed processing. With large amounts of FITS data being produced by facilities like the VLT, it is therefore natural to consider the Java Platform for development of tasks for management of FITS files including administration related to pipeline processing. In the context of a feasibility study for using Java for such applications, a small basic Java class library for accessing FITS files was developed. The paper describe the main requirements and general design of the library. Furthermore, benchmarks are presented to illustrate performance and limitations.

  5. Image alignment

    DOEpatents

    Dowell, Larry Jonathan

    2014-04-22

    Disclosed is a method and device for aligning at least two digital images. An embodiment may use frequency-domain transforms of small tiles created from each image to identify substantially similar, "distinguishing" features within each of the images, and then align the images together based on the location of the distinguishing features. To accomplish this, an embodiment may create equal sized tile sub-images for each image. A "key" for each tile may be created by performing a frequency-domain transform calculation on each tile. A information-distance difference between each possible pair of tiles on each image may be calculated to identify distinguishing features. From analysis of the information-distance differences of the pairs of tiles, a subset of tiles with high discrimination metrics in relation to other tiles may be located for each image. The subset of distinguishing tiles for each image may then be compared to locate tiles with substantially similar keys and/or information-distance metrics to other tiles of other images. Once similar tiles are located for each image, the images may be aligned in relation to the identified similar tiles.

  6. Ultra-large alignments using phylogeny-aware profiles.

    PubMed

    Nguyen, Nam-Phuong D; Mirarab, Siavash; Kumar, Keerthana; Warnow, Tandy

    2015-01-01

    Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very difficult, especially when the dataset contains fragmentary sequences. We present UPP, a multiple sequence alignment method that uses a new machine learning technique, the ensemble of hidden Markov models, which we propose here. UPP produces highly accurate alignments for both nucleotide and amino acid sequences, even on ultra-large datasets or datasets containing fragmentary sequences. UPP is available at https://github.com/smirarab/sepp . PMID:26076734

  7. Implementation of the NAS Parallel Benchmarks in Java

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Several features make Java an attractive choice for High Performance Computing (HPC). In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for CFD applications.

  8. Performance and Scalability of the NAS Parallel Benchmarks in Java

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    Several features make Java an attractive choice for scientific applications. In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for scientific applications.

  9. Combining Multiple Pairwise Structure-based Alignments

    Energy Science and Technology Software Center (ESTSC)

    2014-11-12

    CombAlign is a new Python code that generates a gapped, one-to-many, multiple structure-based sequence alignment(MSSA) given a set of pairwise structure-based alignments. In order to better define regions of similarity among related protein structures, it is useful to detect the residue-residue correspondences among a set of pairwise structure alignments. Few codes exist for constructing a one-to-many, multiple sequence alignment derived from a set of structure alignments, and we perceived a need for creating a newmore » tool for combing pairwise structure alignments that would allow for insertion of gaps in the reference structure.« less

  10. ARYANA: Aligning Reads by Yet Another Approach

    PubMed Central

    2014-01-01

    Motivation Although there are many different algorithms and software tools for aligning sequencing reads, fast gapped sequence search is far from solved. Strong interest in fast alignment is best reflected in the $106 prize for the Innocentive competition on aligning a collection of reads to a given database of reference genomes. In addition, de novo assembly of next-generation sequencing long reads requires fast overlap-layout-concensus algorithms which depend on fast and accurate alignment. Contribution We introduce ARYANA, a fast gapped read aligner, developed on the base of BWA indexing infrastructure with a completely new alignment engine that makes it significantly faster than three other aligners: Bowtie2, BWA and SeqAlto, with comparable generality and accuracy. Instead of the time-consuming backtracking procedures for handling mismatches, ARYANA comes with the seed-and-extend algorithmic framework and a significantly improved efficiency by integrating novel algorithmic techniques including dynamic seed selection, bidirectional seed extension, reset-free hash tables, and gap-filling dynamic programming. As the read length increases ARYANA's superiority in terms of speed and alignment rate becomes more evident. This is in perfect harmony with the read length trend as the sequencing technologies evolve. The algorithmic platform of ARYANA makes it easy to develop mission-specific aligners for other applications using ARYANA engine. Availability ARYANA with complete source code can be obtained from http://github.com/aryana-aligner PMID:25252881

  11. Principal component analysis implementation in Java

    NASA Astrophysics Data System (ADS)

    Wójtowicz, Sebastian; Belka, Radosław; Sławiński, Tomasz; Parian, Mahnaz

    2015-09-01

    In this paper we show how PCA (Principal Component Analysis) method can be implemented using Java programming language. We consider using PCA algorithm especially in analysed data obtained from Raman spectroscopy measurements, but other applications of developed software should also be possible. Our goal is to create a general purpose PCA application, ready to run on every platform which is supported by Java.

  12. Sandia secure processor : a native Java processor.

    SciTech Connect

    Wickstrom, Gregory Lloyd; Gale, Jason Carl; Ma, Kwok Kee

    2003-08-01

    The Sandia Secure Processor (SSP) is a new native Java processor that has been specifically designed for embedded applications. The SSP's design is a system composed of a core Java processor that directly executes Java bytecodes, on-chip intelligent IO modules, and a suite of software tools for simulation and compiling executable binary files. The SSP is unique in that it provides a way to control real-time IO modules for embedded applications. The system software for the SSP is a 'class loader' that takes Java .class files (created with your favorite Java compiler), links them together, and compiles a binary. The complete SSP system provides very powerful functionality with very light hardware requirements with the potential to be used in a wide variety of small-system embedded applications. This paper gives a detail description of the Sandia Secure Processor and its unique features.

  13. A Java GUI for /μSR

    NASA Astrophysics Data System (ADS)

    Brewer, J. H.; Duty, T. L.; Cowperthwaite, D.

    2000-08-01

    A universal Graphical User Interface (GUI) for μSR data presentation and analysis has been developed using a Java Applet in the ubiquitous Web browser. This “thin Client” communicates with a Server via (remote method invocation) (RMI) to retrieve data over the Internet in the form of compact Java Objects. The Server has access to a massive archive of μSR “run” files and a relational database of “header” information from those runs. The only component of this system that depends on the local data file format is a single Java method that converts the data file into a standard Java “μSR Run” Object. Because all components are written in Java, they can be implemented on any common operating system. ( test site: http://musr.physics.ubc.ca/gui /)

  14. IUS prerelease alignment

    NASA Technical Reports Server (NTRS)

    Evans, F. A.

    1978-01-01

    Space shuttle orbiter/IUS alignment transfer was evaluated. Although the orbiter alignment accuracy was originally believed to be the major contributor to the overall alignment transfer error, it was shown that orbiter alignment accuracy is not a factor affecting IUS alignment accuracy, if certain procedures are followed. Results are reported of alignment transfer accuracy analysis.

  15. Accelerator and transport line survey and alignment

    SciTech Connect

    Ruland, R.E.

    1991-10-01

    This paper summarizes the survey and alignment processes of accelerators and transport lines and discusses the propagation of errors associated with these processes. The major geodetic principles governing the survey and alignment measurement space are introduced and their relationship to a lattice coordinate system shown. The paper continues with a broad overview about the activities involved in the step sequence from initial absolute alignment to final smoothing. Emphasis is given to the relative alignment of components, in particular to the importance of incorporating methods to remove residual systematic effects in surveying and alignment operations. Various approaches to smoothing used at major laboratories are discussed. 47 refs., 19 figs., 1 tab.

  16. BinAligner: a heuristic method to align biological networks.

    PubMed

    Yang, Jialiang; Li, Jun; Grünewald, Stefan; Wan, Xiu-Feng

    2013-01-01

    The advances in high throughput omics technologies have made it possible to characterize molecular interactions within and across various species. Alignments and comparison of molecular networks across species will help detect orthologs and conserved functional modules and provide insights on the evolutionary relationships of the compared species. However, such analyses are not trivial due to the complexity of network and high computational cost. Here we develop a mixture of global and local algorithm, BinAligner, for network alignments. Based on the hypotheses that the similarity between two vertices across networks would be context dependent and that the information from the edges and the structures of subnetworks can be more informative than vertices alone, two scoring schema, 1-neighborhood subnetwork and graphlet, were introduced to derive the scoring matrices between networks, besides the commonly used scoring scheme from vertices. Then the alignment problem is formulated as an assignment problem, which is solved by the combinatorial optimization algorithm, such as the Hungarian method. The proposed algorithm was applied and validated in aligning the protein-protein interaction network of Kaposi's sarcoma associated herpesvirus (KSHV) and that of varicella zoster virus (VZV). Interestingly, we identified several putative functional orthologous proteins with similar functions but very low sequence similarity between the two viruses. For example, KSHV open reading frame 56 (ORF56) and VZV ORF55 are helicase-primase subunits with sequence identity 14.6%, and KSHV ORF75 and VZV ORF44 are tegument proteins with sequence identity 15.3%. These functional pairs can not be identified if one restricts the alignment into orthologous protein pairs. In addition, BinAligner identified a conserved pathway between two viruses, which consists of 7 orthologous protein pairs and these proteins are connected by conserved links. This pathway might be crucial for virus packing and

  17. A theoretical model for whole genome alignment.

    PubMed

    Belal, Nahla A; Heath, Lenwood S

    2011-05-01

    We present a graph-based model for representing two aligned genomic sequences. An alignment graph is a mixed graph consisting of two sets of vertices, each representing one of the input sequences, and three sets of edges. These edges allow the model to represent a number of evolutionary events. This model is used to perform sequence alignment at the level of nucleotides. We define a scoring function for alignment graphs. We show that minimizing the score is NP-complete. However, we present a dynamic programming algorithm that solves the minimization problem optimally for a certain class of alignments, called breakable arrangements. Algorithms for analyzing breakable arrangements are presented. We also present a greedy algorithm that is capable of representing reversals. We present a dynamic programming algorithm that optimally aligns two genomic sequences, when one of the input sequences is a breakable arrangement of the other. Comparing what we define as breakable arrangements to alignments generated by other algorithms, it is seen that many already aligned genomes fall into the category of being breakable. Moreover, the greedy algorithm is shown to represent reversals, besides rearrangements, mutations, and other evolutionary events. PMID:21210739

  18. Java transactions for the Internet

    NASA Astrophysics Data System (ADS)

    Little, M. C.; Shrivastava, S. K.

    1998-12-01

    The Web frequently suffers from failures which affect the performance and consistency of applications run over it. An important fault-tolerance technique is the use of atomic transactions for controlling operations on services. While it has been possible to make server-side Web applications transactional, browsers typically did not possess such facilities. However, with the advent of Java it is now possible to consider empowering browsers so that they can fully participate within transactional applications. In this paper we present the design and implementation of a standards-compliant transactional toolkit for the Web. The toolkit allows transactional applications to span Web browsers and servers and supports application-specific customization, so that an application can be made transactional without compromising the security policies operational at browsers and servers.

  19. Bringing Interactivity to the Web: The JAVA Solution.

    ERIC Educational Resources Information Center

    Knee, Richard H.; Cafolla, Ralph

    Java is an object-oriented programming language of the Internet. It's popularity lies in its ability to create interactive Web sites across platforms. The most common Java programs are applications and applets, which adhere to a set of conventions that lets them run within a Java-compatible browser. Java is becoming an essential subject matter and…

  20. Java and its future in biomedical computing.

    PubMed Central

    Rodgers, R P

    1996-01-01

    Java, a new object-oriented computing language related to C++, is receiving considerable attention due to its use in creating network-sharable, platform-independent software modules (known as "applets") that can be used with the World Wide Web. The Web has rapidly become the most commonly used information-retrieval tool associated with the global computer network known as the Internet, and Java has the potential to further accelerate the Web's application to medical problems. Java's potentially wide acceptance due to its Web association and its own technical merits also suggests that it may become a popular language for non-Web-based, object-oriented computing. PMID:8880677

  1. Multiple alignment using hidden Markov models

    SciTech Connect

    Eddy, S.R.

    1995-12-31

    A simulated annealing method is described for training hidden Markov models and producing multiple sequence alignments from initially unaligned protein or DNA sequences. Simulated annealing in turn uses a dynamic programming algorithm for correctly sampling suboptimal multiple alignments according to their probability and a Boltzmann temperature factor. The quality of simulated annealing alignments is evaluated on structural alignments of ten different protein families, and compared to the performance of other HMM training methods and the ClustalW program. Simulated annealing is better able to find near-global optima in the multiple alignment probability landscape than the other tested HMM training methods. Neither ClustalW nor simulated annealing produce consistently better alignments compared to each other. Examination of the specific cases in which ClustalW outperforms simulated annealing, and vice versa, provides insight into the strengths and weaknesses of current hidden Maxkov model approaches.

  2. Analysis of variables affecting unemployment rate and detecting for cluster in West Java, Central Java, and East Java in 2012

    NASA Astrophysics Data System (ADS)

    Samuel, Putra A.; Widyaningsih, Yekti; Lestari, Dian

    2016-02-01

    The objective of this study is modeling the Unemployment Rate (UR) in West Java, Central Java, and East Java, with rate of disease, infant mortality rate, educational level, population size, proportion of married people, and GDRP as the explanatory variables. Spatial factors are also considered in the modeling since the closer the distance, the higher the correlation. This study uses the secondary data from BPS (Badan Pusat Statistik). The data will be analyzed using Moran I test, to obtain the information about spatial dependence, and using Spatial Autoregressive modeling to obtain the information, which variables are significant affecting UR and how great the influence of the spatial factors. The result is, variables proportion of married people, rate of disease, and population size are related significantly to UR. In all three regions, the Hotspot of unemployed will also be detected districts/cities using Spatial Scan Statistics Method. The results are 22 districts/cities as a regional group with the highest unemployed (Most likely cluster) in the study area; 2 districts/cities as a regional group with the highest unemployed in West Java; 1 district/city as a regional groups with the highest unemployed in Central Java; 15 districts/cities as a regional group with the highest unemployed in East Java.

  3. Java PathFinder User Guide

    NASA Technical Reports Server (NTRS)

    Havelund, Klaus

    1999-01-01

    The JAVA PATHFINDER, JPF, is a translator from a subset of JAVA 1.0 to PROMELA, the programming language of the SPIN model checker. The purpose of JPF is to establish a framework for verification and debugging of JAVA programming based on model checking. The main goal is to automate program verification such that a programmer can apply it in the daily work without the need for a specialist to manually reformulate a program into a different notation in order to analyze the program. The system is especially suited for analyzing multi-threaded JAVA applications, where normal testing usually falls short. The system can find deadlocks and violations of boolean assertions stated by the programmer in a special assertion language. This document explains how to Use JPF.

  4. On comparing two structured RNA multiple alignments.

    PubMed

    Patel, Vandanaben; Wang, Jason T L; Setia, Shefali; Verma, Anurag; Warden, Charles D; Zhang, Kaizhong

    2010-12-01

    We present a method, called BlockMatch, for aligning two blocks, where a block is an RNA multiple sequence alignment with the consensus secondary structure of the alignment in Stockholm format. The method employs a quadratic-time dynamic programming algorithm for aligning columns and column pairs of the multiple alignments in the blocks. Unlike many other tools that can perform pairwise alignment of either single sequences or structures only, BlockMatch takes into account the characteristics of all the sequences in the blocks along with their consensus structures during the alignment process, thus being able to achieve a high-quality alignment result. We apply BlockMatch to phylogeny reconstruction on a set of 5S rRNA sequences taken from fifteen bacteria species. Experimental results showed that the phylogenetic tree generated by our method is more accurate than the tree constructed based on the widely used ClustalW tool. The BlockMatch algorithm is implemented into a web server, accessible at http://bioinformatics.njit.edu/blockmatch. A jar file of the program is also available for download from the web server. PMID:21121021

  5. Generation of Java code from Alvis model

    NASA Astrophysics Data System (ADS)

    Matyasik, Piotr; Szpyrka, Marcin; Wypych, Michał

    2015-12-01

    Alvis is a formal language that combines graphical modelling of interconnections between system entities (called agents) and a high level programming language to describe behaviour of any individual agent. An Alvis model can be verified formally with model checking techniques applied to the model LTS graph that represents the model state space. This paper presents transformation of an Alvis model into executable Java code. Thus, the approach provides a method of automatic generation of a Java application from formally verified Alvis model.

  6. Muria Volcano, Island of Java, Indonesia

    NASA Technical Reports Server (NTRS)

    1991-01-01

    This view of the north coast of central Java, Indonesia centers on the currently inactive Muria Volcano (6.5S, 111.0E). Muria is 5,330 ft. tall and lies just north of Java's main volcanic belt which runs east - west down the spine of the island attesting to the volcanic origin of the more than 1,500 Indonesian Islands.

  7. Analyses of the radiation of birnaviruses from diverse host phyla and of their evolutionary affinities with other double-stranded RNA and positive strand RNA viruses using robust structure-based multiple sequence alignments and advanced phylogenetic methods

    PubMed Central

    2013-01-01

    Background Birnaviruses form a distinct family of double-stranded RNA viruses infecting animals as different as vertebrates, mollusks, insects and rotifers. With such a wide host range, they constitute a good model for studying the adaptation to the host. Additionally, several lines of evidence link birnaviruses to positive strand RNA viruses and suggest that phylogenetic analyses may provide clues about transition. Results We characterized the genome of a birnavirus from the rotifer Branchionus plicalitis. We used X-ray structures of RNA-dependent RNA polymerases and capsid proteins to obtain multiple structure alignments that allowed us to obtain reliable multiple sequence alignments and we employed “advanced” phylogenetic methods to study the evolutionary relationships between some positive strand and double-stranded RNA viruses. We showed that the rotifer birnavirus genome exhibited an organization remarkably similar to other birnaviruses. As this host was phylogenetically very distant from the other known species targeted by birnaviruses, we revisited the evolutionary pathways within the Birnaviridae family using phylogenetic reconstruction methods. We also applied a number of phylogenetic approaches based on structurally conserved domains/regions of the capsid and RNA-dependent RNA polymerase proteins to study the evolutionary relationships between birnaviruses, other double-stranded RNA viruses and positive strand RNA viruses. Conclusions We show that there is a good correlation between the phylogeny of the birnaviruses and that of their hosts at the phylum level using the RNA-dependent RNA polymerase (genomic segment B) on the one hand and a concatenation of the capsid protein, protease and ribonucleoprotein (genomic segment A) on the other hand. This correlation tends to vanish within phyla. The use of advanced phylogenetic methods and robust structure-based multiple sequence alignments allowed us to obtain a more accurate picture (in terms of

  8. Java simulations of embedded control systems.

    PubMed

    Farias, Gonzalo; Cervin, Anton; Arzén, Karl-Erik; Dormido, Sebastián; Esquembre, Francisco

    2010-01-01

    This paper introduces a new Open Source Java library suited for the simulation of embedded control systems. The library is based on the ideas and architecture of TrueTime, a toolbox of Matlab devoted to this topic, and allows Java programmers to simulate the performance of control processes which run in a real time environment. Such simulations can improve considerably the learning and design of multitasking real-time systems. The choice of Java increases considerably the usability of our library, because many educators program already in this language. But also because the library can be easily used by Easy Java Simulations (EJS), a popular modeling and authoring tool that is increasingly used in the field of Control Education. EJS allows instructors, students, and researchers with less programming capabilities to create advanced interactive simulations in Java. The paper describes the ideas, implementation, and sample use of the new library both for pure Java programmers and for EJS users. The JTT library and some examples are online available on http://lab.dia.uned.es/jtt. PMID:22163674

  9. Java Simulations of Embedded Control Systems

    PubMed Central

    Farias, Gonzalo; Cervin, Anton; Årzén, Karl-Erik; Dormido, Sebastián; Esquembre, Francisco

    2010-01-01

    This paper introduces a new Open Source Java library suited for the simulation of embedded control systems. The library is based on the ideas and architecture of TrueTime, a toolbox of Matlab devoted to this topic, and allows Java programmers to simulate the performance of control processes which run in a real time environment. Such simulations can improve considerably the learning and design of multitasking real-time systems. The choice of Java increases considerably the usability of our library, because many educators program already in this language. But also because the library can be easily used by Easy Java Simulations (EJS), a popular modeling and authoring tool that is increasingly used in the field of Control Education. EJS allows instructors, students, and researchers with less programming capabilities to create advanced interactive simulations in Java. The paper describes the ideas, implementation, and sample use of the new library both for pure Java programmers and for EJS users. The JTT library and some examples are online available on http://lab.dia.uned.es/jtt. PMID:22163674

  10. Plague in Central Java, Indonesia

    PubMed Central

    Williams, J. E.; Hudson, B. W.; Turner, R. W.; Saroso, J. Sulianti; Cavanaugh, D. C.

    1980-01-01

    Plague in man occurred from 1968 to 1970 in mountain villages of the Boyolali Regency in Central Java. Infected fleas, infected rats, and seropositive rats were collected in villages with human plague cases. Subsequent isolations of Yersinia pestis and seropositive rodents, detected during investigations of rodent plague undertaken by the Government of Indonesia and the WHO, attested to the persistence of plague in the region from 1972 to 1974. Since 1968, the incidence of both rodent and human plague has been greatest from December to May at elevations over 1000 m. Isolations of Y. pestis were obtained from the fleas Xenopsylla cheopis and Stivalius cognatus and the rats Rattus rattus diardii and R. exulans ephippium. The major risk to man has been fleas infected with Y. pestis of unique electrophoretic phenotype. Infected fleas were collected most often in houses. Introduced in 1920, rodent plague had persisted in the Boyolali Regency for at least 54 years. The recent data support specific requirements for continued plague surveillance. ImagesFig. 2 PMID:6968252

  11. CSA: An efficient algorithm to improve circular DNA multiple alignment

    PubMed Central

    Fernandes, Francisco; Pereira, Luísa; Freitas, Ana T

    2009-01-01

    Background The comparison of homologous sequences from different species is an essential approach to reconstruct the evolutionary history of species and of the genes they harbour in their genomes. Several complete mitochondrial and nuclear genomes are now available, increasing the importance of using multiple sequence alignment algorithms in comparative genomics. MtDNA has long been used in phylogenetic analysis and errors in the alignments can lead to errors in the interpretation of evolutionary information. Although a large number of multiple sequence alignment algorithms have been proposed to date, they all deal with linear DNA and cannot handle directly circular DNA. Researchers interested in aligning circular DNA sequences must first rotate them to the "right" place using an essentially manual process, before they can use multiple sequence alignment tools. Results In this paper we propose an efficient algorithm that identifies the most interesting region to cut circular genomes in order to improve phylogenetic analysis when using standard multiple sequence alignment algorithms. This algorithm identifies the largest chain of non-repeated longest subsequences common to a set of circular mitochondrial DNA sequences. All the sequences are then rotated and made linear for multiple alignment purposes. To evaluate the effectiveness of this new tool, three different sets of mitochondrial DNA sequences were considered. Other tests considering randomly rotated sequences were also performed. The software package Arlequin was used to evaluate the standard genetic measures of the alignments obtained with and without the use of the CSA algorithm with two well known multiple alignment algorithms, the CLUSTALW and the MAVID tools, and also the visualization tool SinicView. Conclusion The results show that a circularization and rotation pre-processing step significantly improves the efficiency of public available multiple sequence alignment algorithms when used in the

  12. Java Expert System Shell Version 6.0

    SciTech Connect

    Friedman-Hill, Ernest

    2002-06-18

    Java Expert Shell System - Jess - is a rule engine and scripting environment written entirely in Sun's Java language, Jess was orginially inspired by the CLIPS expert system shell, but has grown int a complete, distinct JAVA-influenced environment of its own. Using Jess, you can build Java applets and applications that have the capacity to "reason" using knowledge you supply in the form of declarative rules. Jess is surprisingly fast, and for some problems is faster than CLIPS, in that many Jess scripts are valid CLIPS scripts and vice-versa. Like CLIPS, Jess uses the Rete algorithm to process rules, a very efficient mechanism for solving the difficult many-to-many matching problem. Jess adds many features to CLIPS, including backwards chaining and the ability to manipulate and directly reason about Java objects. Jess is also a powerful Java scripting environment, from which you can create Java objects and call Java methods without compiling any Java Code.

  13. Hyper-Threaded Java: Use the Java Concurrency API to Speed Up Time-Consuming Tasks

    SciTech Connect

    Scarberry, Randy

    2006-11-21

    This is for a Java World article that was already published on Nov 21, 2006. When I originally submitted the draft, Java World wasn't in the available lists of publications. Now that it is, Hanford Library staff recommended that I resubmit so it would be counted. Original submission ID: PNNL-SA-52490

  14. Phylogenetic Inference From Conserved sites Alignments

    SciTech Connect

    grundy, W.N.; Naylor, G.J.P.

    1999-08-15

    Molecular sequences provide a rich source of data for inferring the phylogenetic relationships among species. However, recent work indicates that even an accurate multiple alignment of a large sequence set may yield an incorrect phylogeny and that the quality of the phylogenetic tree improves when the input consists only of the highly conserved, motif regions of the alignment. This work introduces two methods of producing multiple alignments that include only the conserved regions of the initial alignment. The first method retains conserved motifs, whereas the second retains individual conserved sites in the initial alignment. Using parsimony analysis on a mitochondrial data set containing 19 species among which the phylogenetic relationships are widely accepted, both conserved alignment methods produce better phylogenetic trees than the complete alignment. Unlike any of the 19 inference methods used before to analyze this data, both methods produce trees that are completely consistent with the known phylogeny. The motif-based method employs far fewer alignment sites for comparable error rates. For a larger data set containing mitochondrial sequences from 39 species, the site-based method produces a phylogenetic tree that is largely consistent with known phylogenetic relationships and suggests several novel placements.

  15. Java classes for nonprocedural variogram modeling

    NASA Astrophysics Data System (ADS)

    Faulkner, Barton R.

    2002-04-01

    A set of Java TM classes was written for variogram modeling to support research for US EPA's Regional Vulnerability Assessment Program (ReVA). The modeling objectives of this research program are to use conceptual programming tools for numerical analysis for regional risk assessment. The classes presented use of object-oriented design elements, and their use is described for the benefit of programmers. To help facilitate their use, class diagrams and standard JavaDoc commenting were employed. Java's support for polymorphism and inheritance is used and these are described as ways to promote extension of these classes for other geostatistical applications. Among the advantages is the ease of programming, code reuse, and conceptual, rather than procedural implementation. A graphical application for variogram modeling that uses the classes is also provided and described. It can also be used by non-programmers. This application uses a generalized least-squares fitting algorithm for robust parametric variogram model fitting through the variogram cloud. This feature makes this program unique from other freely available variogram modeling programs, though the classes are presented primarily so they may be extended for use in other Java programs. More traditional variogram plotting and fitting utilities are also provided. This application is graphical and platform-neutral. It uses classes of the recently proposed Java API for linear algebra, called the JAMA package.

  16. IMGT/V-QUEST: the highly customized and integrated system for IG and TR standardized V-J and V-D-J sequence analysis.

    PubMed

    Brochet, Xavier; Lefranc, Marie-Paule; Giudicelli, Véronique

    2008-07-01

    IMGT/V-QUEST is the highly customized and integrated system for the standardized analysis of the immunoglobulin (IG) and T cell receptor (TR) rearranged nucleotide sequences. IMGT/V-QUEST identifies the variable (V), diversity (D) and joining (J) genes and alleles by alignment with the germline IG and TR gene and allele sequences of the IMGT reference directory. New functionalities were added through a complete rewrite in Java. IMGT/V-QUEST analyses batches of sequences (up to 50) in a single run. IMGT/V-QUEST describes the V-REGION mutations and identifies the hot spot positions in the closest germline V gene. IMGT/V-QUEST can detect insertions and deletions in the submitted sequences by reference to the IMGT unique numbering. IMGT/V-QUEST integrates IMGT/JunctionAnalysis for a detailed analysis of the V-J and V-D-J junctions, and IMGT/Automat for a full V-J- and V-D-J-REGION annotation. IMGT/V-QUEST displays, in 'Detailed view', the results and alignments for each submitted sequence individually and, in 'Synthesis view', the alignments of the sequences that, in a given run, express the same V gene and allele. The 'Advanced parameters' allow to modify default parameters used by IMGT/V-QUEST and IMGT/JunctionAnalysis according to the users' interest. IMGT/V-QUEST is freely available for academic research at http://imgt.cines.fr. PMID:18503082

  17. Computing posterior probabilities for score-based alignments using ppALIGN.

    PubMed

    Wolfsheimer, Stefan; Hartmann, Alexander; Rabus, Ralf; Nuel, Gregory

    2012-01-01

    Score-based pairwise alignments are widely used in bioinformatics in particular with molecular database search tools, such as the BLAST family. Due to sophisticated heuristics, such algorithms are usually fast but the underlying scoring model unfortunately lacks a statistical description of the reliability of the reported alignments. In particular, close to gaps, in low-score or low-complexity regions, a huge number of alternative alignments arise which results in a decrease of the certainty of the alignment. ppALIGN is a software package that uses hidden Markov Model techniques to compute position-wise reliability of score-based pairwise alignments of DNA or protein sequences. The design of the model allows for a direct connection between the scoring function and the parameters of the probabilistic model. For this reason it is suitable to analyze the outcomes of popular score based aligners and search tools without having to choose a complicated set of parameters. By contrast, our program only requires the classical score parameters (the scoring function and gap costs). The package comes along with a library written in C++, a standalone program for user defined alignments (ppALIGN) and another program (ppBLAST) which can process a complete result set of BLAST. The main algorithms essentially exhibit a linear time complexity (in the alignment lengths), and they are hence suitable for on-line computations. We have also included alternative decoding algorithms to provide alternative alignments. ppALIGN is a fast program/library that helps detect and quantify questionable regions in pairwise alignments. Due to its structure, the input/output interface it can to be connected to other post-processing tools. Empirically, we illustrate its usefulness in terms of correctly predicted reliable regions for sequences generated using the ROSE model for sequence evolution, and identify sensor-specific regions in the denitrifying betaproteobacterium Aromatoleum aromaticum. PMID

  18. Sawja: Static Analysis Workshop for Java

    NASA Astrophysics Data System (ADS)

    Hubert, Laurent; Barré, Nicolas; Besson, Frédéric; Demange, Delphine; Jensen, Thomas; Monfort, Vincent; Pichardie, David; Turpin, Tiphaine

    Static analysis is a powerful technique for automatic verification of programs but raises major engineering challenges when developing a full-fledged analyzer for a realistic language such as Java. Efficiency and precision of such a tool rely partly on low level components which only depend on the syntactic structure of the language and therefore should not be redesigned for each implementation of a new static analysis. This paper describes the Sawja library: a static analysis workshop fully compliant with Java 6 which provides OCaml modules for efficiently manipulating Java bytecode programs. We present the main features of the library, including i) efficient functional data-structures for representing a program with implicit sharing and lazy parsing, ii) an intermediate stack-less representation, and iii) fast computation and manipulation of complete programs. We provide experimental evaluations of the different features with respect to time, memory and precision.

  19. Instrumentation of Java Bytecode for Runtime Analysis

    NASA Technical Reports Server (NTRS)

    Goldberg, Allen; Haveland, Klaus

    2003-01-01

    This paper describes JSpy, a system for high-level instrumentation of Java bytecode and its use with JPaX, OUT system for runtime analysis of Java programs. JPaX monitors the execution of temporal logic formulas and performs predicative analysis of deadlocks and data races. JSpy s input is an instrumentation specification, which consists of a collection of rules, where a rule is a predicate/action pair The predicate is a conjunction of syntactic constraints on a Java statement, and the action is a description of logging information to be inserted in the bytecode corresponding to the statement. JSpy is built using JTrek an instrumentation package at a lower level of abstraction.

  20. An Integrated SNP Mining and Utilization (ISMU) Pipeline for Next Generation Sequencing Data

    PubMed Central

    Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M.; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A. V. S. K.; Varshney, Rajeev K.

    2014-01-01

    Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly requires working knowledge of command line interface, massive computational resources and expertise which is a daunting task for biologists. Further, the SNP information generated may not be readily used for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools along with a graphical user interface called Integrated SNP Mining and Utilization (ISMU) for SNP discovery and their utilization by developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data at a fast speed. The pipeline is very useful for plant genetics and breeding community with no computational expertise in order to discover SNPs and utilize in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge datasets of next generation sequencing. It has been developed in Java language and is available at http://hpc.icrisat.cgiar.org/ISMU as a standalone

  1. Multiple Whole Genome Alignments Without a Reference Organism

    SciTech Connect

    Dubchak, Inna; Poliakov, Alexander; Kislyuk, Andrey; Brudno, Michael

    2009-01-16

    Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes rely either on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, preventing the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as a part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and sixDrosophila genomes. The resulting whole-genome alignments demonstrate a higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed the best at aligning genes from multigene families?perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.

  2. Tectonic Control of Piercement Structures in Central Java, Indonesia

    NASA Astrophysics Data System (ADS)

    Mazzini, A.; Hadi, S.; Etiope, G.; Inguaggiato, S.

    2014-12-01

    A recent field expedition in Central Java targeted the mapping and sampling of several piercements structures in central Java (Indonesia), most of which have never been documented before. Here, at least seven structures erupting mud water and gas are distributed along a NE-SW alignment that extends for about 10 kilometers. Some of the mapped structures (Bledug Kuwu, Bledug Cangkring Krabagan, Mendikil, Banjarsari, Krewek) have been named after the neighboring local village. None of these have obvious elevation despite the vigorous emission of gas and mud, suggesting that significant caldera collapse is ongoing. Among the most relevant: Bledug Kuwu is certainly the most impressive structure with three main eruption sites in the crater area bursting more than 5 m large hot mud bubbles. Similar characteristics are present at the smaller (200 m in diameter) Bledug Cangkring Krabagan, that is also surrounded by numerous pools and gryphons seeping around the main crater. The smaller sized Mendikil is the only visited structure that, at the moment of the sampling, did not show seepage of hot fluids. Banjarsari and Krewek (up to 200 m wide) are characterized by scattered hot water-dominated pools where gas is vented vigorously. In particular the hot pools are systematically covered by travertine concretions. Water and gas geochemisty confirms the seepage of CO2 dominated gas and water with hydrothermal signature. The investigated structures appear to follow an obvious NE-SW oriented lineament that most likely coincides with a tectonic structure (fault?) that controls their location. Indeed the field observations and the analyses suggest that likely scenario is that this fault (?) acts as a preferential pathway for the expulsion of hydrothermal fluids to the surface. Very little is known about this region, neither is known why several of these structures erupt hot mud despite their significant distance from the two closest volcanic structures (i.e. Mt. Muria 60 km to the NW

  3. A Java Applet for Illustrating Internet Error Control

    ERIC Educational Resources Information Center

    Holliday, Mark A.

    2004-01-01

    This paper discusses the author's experiences developing a Java applet that illustrates how error control is implemented in the Transmission Control Protocol (TCP). One section discusses the concepts which the TCP error control Java applet is intended to convey, while the nature of the Java applet is covered in another section. The author…

  4. Local Structural Alignment of RNA with Affine Gap Model

    NASA Astrophysics Data System (ADS)

    Wong, Thomas K. F.; Cheung, Brenda W. Y.; Lam, T. W.; Yiu, S. M.

    Predicting new non-coding RNAs (ncRNAs) of a family can be done by aligning the potential candidate with a member of the family with known sequence and secondary structure. Existing tools either only consider the sequence similarity or cannot handle local alignment with gaps. In this paper, we consider the problem of finding the optimal local structural alignment between a query RNA sequence (with known secondary structure) and a target sequence (with unknown secondary structure) with the affine gap penalty model. We provide the algorithm to solve the problem. Based on a preliminary experiment, we show that there are ncRNA families in which considering local structural alignment with gap penalty model can identify real hits more effectively than using global alignment or local alignment without gap penalty model.

  5. R3D Align: global pairwise alignment of RNA 3D structures using local superpositions

    PubMed Central

    Rahrig, Ryan R.; Leontis, Neocles B.; Zirbel, Craig L.

    2010-01-01

    Motivation: Comparing 3D structures of homologous RNA molecules yields information about sequence and structural variability. To compare large RNA 3D structures, accurate automatic comparison tools are needed. In this article, we introduce a new algorithm and web server to align large homologous RNA structures nucleotide by nucleotide using local superpositions that accommodate the flexibility of RNA molecules. Local alignments are merged to form a global alignment by employing a maximum clique algorithm on a specially defined graph that we call the ‘local alignment’ graph. Results: The algorithm is implemented in a program suite and web server called ‘R3D Align’. The R3D Align alignment of homologous 3D structures of 5S, 16S and 23S rRNA was compared to a high-quality hand alignment. A full comparison of the 16S alignment with the other state-of-the-art methods is also provided. The R3D Align program suite includes new diagnostic tools for the structural evaluation of RNA alignments. The R3D Align alignments were compared to those produced by other programs and were found to be the most accurate, in comparison with a high quality hand-crafted alignment and in conjunction with a series of other diagnostics presented. The number of aligned base pairs as well as measures of geometric similarity are used to evaluate the accuracy of the alignments. Availability: R3D Align is freely available through a web server http://rna.bgsu.edu/R3DAlign. The MATLAB source code of the program suite is also freely available for download at that location. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: r-rahrig@onu.edu PMID:20929913

  6. GATA: A graphic alignment tool for comparative sequenceanalysis

    SciTech Connect

    Nix, David A.; Eisen, Michael B.

    2005-01-01

    Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution yet the enhancer retains its activity. Dotplot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.

  7. Lisp as an Alternative to Java

    NASA Technical Reports Server (NTRS)

    Gat, E.

    2000-01-01

    In a recent study, Prechelt compared the relative performance of Java and C++ in terms of execution time and memory utilization. Unlike many benchmark studies, Prechelt compared mulitple implementations of the same task by multiple programmers in order to control for the effects of difference in programmer skill.

  8. JAVA CLASSES FOR NONPROCEDURAL VARIOGRAM MONITORING

    EPA Science Inventory

    A set of Java classes was written for variogram modeling to support research for US EPA's Regional Vulnerability Assessment Program (ReVA). The modeling objectives of this research program are to use conceptual programming tools for numerical analysis for regional risk assessm...

  9. Interactive Economics Instruction with Java and CGI.

    ERIC Educational Resources Information Center

    Gerdes, Geoffrey R.

    2000-01-01

    States that this Web site is based on the conviction that Web-based materials must contain interactive modules to achieve value beyond that obtained by conventional media. Discusses three applets that can be reached at the homepage of the Web site by selecting the Java applets link. (CMK)

  10. BigFoot: Bayesian alignment and phylogenetic footprinting with MCMC

    PubMed Central

    Satija, Rahul; Novák, Ádám; Miklós, István; Lyngsø, Rune; Hein, Jotun

    2009-01-01

    Background We have previously combined statistical alignment and phylogenetic footprinting to detect conserved functional elements without assuming a fixed alignment. Considering a probability-weighted distribution of alignments removes sensitivity to alignment errors, properly accommodates regions of alignment uncertainty, and increases the accuracy of functional element prediction. Our method utilized standard dynamic programming hidden markov model algorithms to analyze up to four sequences. Results We present a novel approach, implemented in the software package BigFoot, for performing phylogenetic footprinting on greater numbers of sequences. We have developed a Markov chain Monte Carlo (MCMC) approach which samples both sequence alignments and locations of slowly evolving regions. We implement our method as an extension of the existing StatAlign software package and test it on well-annotated regions controlling the expression of the even-skipped gene in Drosophila and the α-globin gene in vertebrates. The results exhibit how adding additional sequences to the analysis has the potential to improve the accuracy of functional predictions, and demonstrate how BigFoot outperforms existing alignment-based phylogenetic footprinting techniques. Conclusion BigFoot extends a combined alignment and phylogenetic footprinting approach to analyze larger amounts of sequence data using MCMC. Our approach is robust to alignment error and uncertainty and can be applied to a variety of biological datasets. The source code and documentation are publicly available for download from PMID:19715598

  11. Modular VO oriented Java EE service deployer

    NASA Astrophysics Data System (ADS)

    Molinaro, Marco; Cepparo, Francesco; De Marco, Marco; Knapic, Cristina; Apollo, Pietro; Smareglia, Riccardo

    2014-07-01

    The International Virtual Observatory Alliance (IVOA) has produced many standards and recommendations whose aim is to generate an architecture that starts from astrophysical resources, in a general sense, and ends up in deployed consumable services (that are themselves astrophysical resources). Focusing on the Data Access Layer (DAL) system architecture, that these standards define, in the last years a web based application has been developed and maintained at INAF-OATs IA2 (Italian National institute for Astrophysics - Astronomical Observatory of Trieste, Italian center of Astronomical Archives) to try to deploy and manage multiple VO (Virtual Observatory) services in a uniform way: VO-Dance. However a set of criticalities have arisen since when the VO-Dance idea has been produced, plus some major changes underwent and are undergoing at the IVOA DAL layer (and related standards): this urged IA2 to identify a new solution for its own service layer. Keeping on the basic ideas from VO-Dance (simple service configuration, service instantiation at call time and modularity) while switching to different software technologies (e.g. dismissing Java Reflection in favour of Enterprise Java Bean, EJB, based solution), the new solution has been sketched out and tested for feasibility. Here we present the results originating from this test study. The main constraints for this new project come from various fields. A better homogenized solution rising from IVOA DAL standards: for example the new DALI (Data Access Layer Interface) specification that acts as a common interface system for previous and oncoming access protocols. The need for a modular system where each component is based upon a single VO specification allowing services to rely on common capabilities instead of homogenizing them inside service components directly. The search for a scalable system that takes advantage from distributed systems. The constraints find answer in the adopted solutions hereafter sketched. The

  12. Short Read Alignment Using SOAP2.

    PubMed

    Hurgobin, Bhavna

    2016-01-01

    Next-generation sequencing (NGS) technologies have rapidly evolved in the last 5 years, leading to the generation of millions of short reads in a single run. Consequently, various sequence alignment algorithms have been developed to compare these reads to an appropriate reference in order to perform important downstream analysis. SOAP2 from the SOAP series is one of the most commonly used alignment programs to handle NGS data, and it efficiently does so using low computer memory usage and fast alignment speed. This chapter describes the protocol used to align short reads to a reference genome using SOAP2, and highlights the significance of using the in-built command-line options to tune the behavior of the algorithm according to the inputs and the desired results. PMID:26519410

  13. Shiva automatic pinhole alignment

    SciTech Connect

    Suski, G.J.

    1980-09-05

    This paper describes a computer controlled closed loop alignment subsystem for Shiva, which represents the first use of video sensors for large laser alignment at LLNL. The techniques used on this now operational subsystem are serving as the basis for all closed loop alignment on Nova, the 200 terawatt successor to Shiva.

  14. JavaGenes and Condor: Cycle-Scavenging Genetic Algorithms

    NASA Technical Reports Server (NTRS)

    Globus, Al; Langhirt, Eric; Livny, Miron; Ramamurthy, Ravishankar; Soloman, Marvin; Traugott, Steve

    2000-01-01

    A genetic algorithm code, JavaGenes, was written in Java and used to evolve pharmaceutical drug molecules and digital circuits. JavaGenes was run under the Condor cycle-scavenging batch system managing 100-170 desktop SGI workstations. Genetic algorithms mimic biological evolution by evolving solutions to problems using crossover and mutation. While most genetic algorithms evolve strings or trees, JavaGenes evolves graphs representing (currently) molecules and circuits. Java was chosen as the implementation language because the genetic algorithm requires random splitting and recombining of graphs, a complex data structure manipulation with ample opportunities for memory leaks, loose pointers, out-of-bound indices, and other hard to find bugs. Java garbage-collection memory management, lack of pointer arithmetic, and array-bounds index checking prevents these bugs from occurring, substantially reducing development time. While a run-time performance penalty must be paid, the only unacceptable performance we encountered was using standard Java serialization to checkpoint and restart the code. This was fixed by a two-day implementation of custom checkpointing. JavaGenes is minimally integrated with Condor; in other words, JavaGenes must do its own checkpointing and I/O redirection. A prototype Java-aware version of Condor was developed using standard Java serialization for checkpointing. For the prototype to be useful, standard Java serialization must be significantly optimized. JavaGenes is approximately 8700 lines of code and a few thousand JavaGenes jobs have been run. Most jobs ran for a few days. Results include proof that genetic algorithms can evolve directed and undirected graphs, development of a novel crossover operator for graphs, a paper in the journal Nanotechnology, and another paper in preparation.

  15. Amino acid alignment of cholinesterases, esterases, lipases, and related proteins

    SciTech Connect

    Gentry, M.K.; Doctor, B.P.

    1995-12-31

    The alignments previously published (Gentry Doctor, 1991; Cygler et al., 1993), nine and 32 sequences respectively, have been further expanded by the addition of 22 newly-found sequences. References and protein sequences were found by searching on the term acetylcholinesterase using the software package Entrez, an integrated citation and sequence retrieval system (National Center for Biotechnology Information, NLM, Bethesda, MD).

  16. Girder Alignment Plan

    SciTech Connect

    Wolf, Zackary; Ruland, Robert; LeCocq, Catherine; Lundahl, Eric; Levashov, Yurii; Reese, Ed; Rago, Carl; Poling, Ben; Schafer, Donald; Nuhn, Heinz-Dieter; Wienands, Uli; /SLAC

    2010-11-18

    The girders for the LCLS undulator system contain components which must be aligned with high accuracy relative to each other. The alignment is one of the last steps before the girders go into the tunnel, so the alignment must be done efficiently, on a tight schedule. This note documents the alignment plan which includes efficiency and high accuracy. The motivation for girder alignment involves the following considerations. Using beam based alignment, the girder position will be adjusted until the beam goes through the center of the quadrupole and beam finder wire. For the machine to work properly, the undulator axis must be on this line and the center of the undulator beam pipe must be on this line. The physics reasons for the undulator axis and undulator beam pipe axis to be centered on the beam are different, but the alignment tolerance for both are similar. In addition, the beam position monitor must be centered on the beam to preserve its calibration. Thus, the undulator, undulator beam pipe, quadrupole, beam finder wire, and beam position monitor axes must all be aligned to a common line. All relative alignments are equally important, not just, for example, between quadrupole and undulator. We begin by making the common axis the nominal beam axis in the girder coordinate system. All components will be initially aligned to this axis. A more accurate alignment will then position the components relative to each other, without incorporating the girder itself.

  17. Alignment of multiple proteins with an ensemble of Hidden Markov Models

    PubMed Central

    Song, Yinglei; Qu, Junfeng; Hura, Gurdeep S.

    2011-01-01

    In this paper, we developed a new method that progressively construct and update a set of alignments by adding sequences in certain order to each of the existing alignments. Each of the existing alignments is modelled with a profile Hidden Markov Model (HMM) and an added sequence is aligned to each of these profile HMMs. We introduced an integer parameter for the number of profile HMMs. The profile HMMs are then updated based on the alignments with leading scores. Our experiments on BaliBASE showed that our approach could efficiently explore the alignment space and significantly improve the alignment accuracy. PMID:20376922

  18. BLASTGrabber: a bioinformatic tool for visualization, analysis and sequence selection of massive BLAST data

    PubMed Central

    2014-01-01

    Background Advances in sequencing efficiency have vastly increased the sizes of biological sequence databases, including many thousands of genome-sequenced species. The BLAST algorithm remains the main search engine for retrieving sequence information, and must consequently handle data on an unprecedented scale. This has been possible due to high-performance computers and parallel processing. However, the raw BLAST output from contemporary searches involving thousands of queries becomes ill-suited for direct human processing. Few programs attempt to directly visualize and interpret BLAST output; those that do often provide a mere basic structuring of BLAST data. Results Here we present a bioinformatics application named BLASTGrabber suitable for high-throughput sequencing analysis. BLASTGrabber, being implemented as a Java application, is OS-independent and includes a user friendly graphical user interface. Text or XML-formatted BLAST output files can be directly imported, displayed and categorized based on BLAST statistics. Query names and FASTA headers can be analysed by text-mining. In addition to visualizing sequence alignments, BLAST data can be ordered as an interactive taxonomy tree. All modes of analysis support selection, export and storage of data. A Java interface-based plugin structure facilitates the addition of customized third party functionality. Conclusion The BLASTGrabber application introduces new ways of visualizing and analysing massive BLAST output data by integrating taxonomy identification, text mining capabilities and generic multi-dimensional rendering of BLAST hits. The program aims at a non-expert audience in terms of computer skills; the combination of new functionalities makes the program flexible and useful for a broad range of operations. PMID:24885091

  19. Software tool for the analysis and visualization of whole genome alignments

    Energy Science and Technology Software Center (ESTSC)

    2011-08-01

    GenomeVISTA is a tool which performs and displays pairwise and multiple whole genome DNA alignments. The tools provides a graphical user interface by which users can navigate alignments and multiple levels of resolution and get imformation about individual aligned regions. Users can load their own sequences into GenomeVISTA or view pre-computed alignments for genomes in the VISTA database.

  20. A Reconfigurable Processor Infrastructure for Accelerating Java Applications

    NASA Astrophysics Data System (ADS)

    Han, Youngsun; Hwang, Seok Joong; Kim, Seon Wook

    In this paper, we present a reconfigurable processor infrastructure to accelerate Java applications, called Jaguar. The Jaguar infrastructure consists of a compiler framework and a runtime environment support. The compiler framework selects a group of Java methods to be translated into hardware for delivering the best performance under limited resources, and translates the selected Java methods into Verilog synthesizable code modules. The runtime environment support includes the Java virtual machine (JVM) running on a host processor to provide Java execution environment to the generated Java accelerator through communication interface units while preserving Java semantics. Our compiler infrastructure is a tightly integrated and solid compiler-aided solution for Java reconfigurable computing. There is no limitation in generating synthesizable Verilog modules from any Java application while preserving Java semantics. In terms of performance, our infrastructure achieves the speedup by 5.4 times on average and by up to 9.4 times in measured benchmarks with respect to JVM-only execution. Furthermore, two optimization schemes such as an instruction folding and a live buffer removal can reduce 24% on average and up to 39% of the resource consumption.

  1. Finch: A System for Evolving Java (Bytecode)

    NASA Astrophysics Data System (ADS)

    Orlov, Michael; Sipper, Moshe

    The established approach in genetic programming (GP) involves the definition of functions and terminals appropriate to the problem at hand, after which evolution of expressions using these definitions takes place. We have recently developed a system, dubbed FINCH (Fertile Darwinian Bytecode Harvester), to evolutionarily improve actual, extant software, which was not intentionally written for the purpose of serving as a GP representation in particular, nor for evolution in general. This is in contrast to existing work that uses restricted subsets of the Java bytecode instruction set as a representation language for individuals in genetic programming. The ability to evolve Java programs will hopefully lead to a valuable new tool in the software engineer's toolkit.

  2. Safe Commits for Transactional Featherweight Java

    NASA Astrophysics Data System (ADS)

    Thuong Tran, Thi Mai; Steffen, Martin

    Transactions are a high-level alternative for low-level concurrency-control mechanisms such as locks, semaphores, monitors. A recent proposal for integrating transactional features into programming languages is Transactional Featherweight Java (TFJ), extending Featherweight Java by adding transactions. With support for nested and multi-threaded transactions, its transactional model is rather expressive. In particular, the constructs governing transactions - to start and to commit a transaction - can be used freely with a non-lexical scope. On the downside, this flexibility also allows for an incorrect use of these constructs, e.g., trying to perform a commit outside any transaction. To catch those kinds of errors, we introduce a static type and effect system for the safe use of transactions for TFJ. We prove the soundness of our type system by subject reduction.

  3. The state of the Java universe

    SciTech Connect

    2011-02-08

    Speaker Bio: James Gosling received a B.Sc. in computer science from the University of Calgary, Canada in 1977. He received a Ph.D. in computer science from Carnegie-Mellon University in 1983. The title of his thesis was The Algebraic Manipulation of Constraints. He has built satellite data acquisition systems, a multiprocessor version of UNIX®, several compilers, mail systems, and window managers. He has also built a WYSIWYG text editor, a constraint-based drawing editor, and a text editor called Emacs, for UNIX systems. At Sun his early activity was as lead engineer of the NeWS window system. He did the original design of the Java programming language and implemented its original compiler and virtual machine. He has recently been a contributor to the Real-Time Specification for Java.

  4. Java 3D Interactive Visualization for Astrophysics

    NASA Astrophysics Data System (ADS)

    Chae, K.; Edirisinghe, D.; Lingerfelt, E. J.; Guidry, M. W.

    2003-05-01

    We are developing a series of interactive 3D visualization tools that employ the Java 3D API. We have applied this approach initially to a simple 3-dimensional galaxy collision model (restricted 3-body approximation), with quite satisfactory results. Running either as an applet under Web browser control, or as a Java standalone application, this program permits real-time zooming, panning, and 3-dimensional rotation of the galaxy collision simulation under user mouse and keyboard control. We shall also discuss applications of this technology to 3-dimensional visualization for other problems of astrophysical interest such as neutron star mergers and the time evolution of element/energy production networks in X-ray bursts. *Managed by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725.

  5. The state of the Java universe

    ScienceCinema

    None

    2011-10-06

    Speaker Bio: James Gosling received a B.Sc. in computer science from the University of Calgary, Canada in 1977. He received a Ph.D. in computer science from Carnegie-Mellon University in 1983. The title of his thesis was The Algebraic Manipulation of Constraints. He has built satellite data acquisition systems, a multiprocessor version of UNIX®, several compilers, mail systems, and window managers. He has also built a WYSIWYG text editor, a constraint-based drawing editor, and a text editor called Emacs, for UNIX systems. At Sun his early activity was as lead engineer of the NeWS window system. He did the original design of the Java programming language and implemented its original compiler and virtual machine. He has recently been a contributor to the Real-Time Specification for Java.

  6. Implementation of a parallel protein structure alignment service on cloud.

    PubMed

    Hung, Che-Lun; Lin, Yaw-Ling

    2013-01-01

    Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform. PMID:23671842

  7. Implementation of a Parallel Protein Structure Alignment Service on Cloud

    PubMed Central

    Hung, Che-Lun; Lin, Yaw-Ling

    2013-01-01

    Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform. PMID:23671842

  8. A Java Interface for Roche Lobe Calculations

    NASA Astrophysics Data System (ADS)

    Leahy, D. A.; Leahy, J. C.

    2015-09-01

    A JAVA interface for calculating various properties of the Roche lobe has been created. The geometry of the Roche lobe is important for studying interacting binary stars, particularly those with compact objects which have a companion which fills the Roche lobe. There is no known analytic solution to the Roche lobe problem. Here the geometry of the Roche lobe is calculated numerically to high accuracy and made available to the user for arbitrary input mass ratio, q.

  9. JavaGenes: Evolving Graphs with Crossover

    NASA Technical Reports Server (NTRS)

    Globus, Al; Atsatt, Sean; Lawton, John; Wipke, Todd

    2000-01-01

    Genetic algorithms usually use string or tree representations. We have developed a novel crossover operator for a directed and undirected graph representation, and used this operator to evolve molecules and circuits. Unlike strings or trees, a single point in the representation cannot divide every possible graph into two parts, because graphs may contain cycles. Thus, the crossover operator is non-trivial. A steady-state, tournament selection genetic algorithm code (JavaGenes) was written to implement and test the graph crossover operator. All runs were executed by cycle-scavagging on networked workstations using the Condor batch processing system. The JavaGenes code has evolved pharmaceutical drug molecules and simple digital circuits. Results to date suggest that JavaGenes can evolve moderate sized drug molecules and very small circuits in reasonable time. The algorithm has greater difficulty with somewhat larger circuits, suggesting that directed graphs (circuits) are more difficult to evolve than undirected graphs (molecules), although necessary differences in the crossover operator may also explain the results. In principle, JavaGenes should be able to evolve other graph-representable systems, such as transportation networks, metabolic pathways, and computer networks. However, large graphs evolve significantly slower than smaller graphs, presumably because the space-of-all-graphs explodes combinatorially with graph size. Since the representation strongly affects genetic algorithm performance, adding graphs to the evolutionary programmer's bag-of-tricks should be beneficial. Also, since graph evolution operates directly on the phenotype, the genotype-phenotype translation step, common in genetic algorithm work, is eliminated.

  10. STAR: ultrafast universal RNA-seq aligner

    PubMed Central

    Dobin, Alexander; Davis, Carrie A.; Schlesinger, Felix; Drenkow, Jorg; Zaleski, Chris; Jha, Sonali; Batut, Philippe; Chaisson, Mark; Gingeras, Thomas R.

    2013-01-01

    Motivation: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. Results: To align our large (>80 billon reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80–90% success rate, corroborating the high precision of the STAR mapping strategy. Availability and implementation: STAR is implemented as a standalone C++ code. STAR is free open source software distributed under GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/. Contact: dobin@cshl.edu. PMID:23104886

  11. ELIST8: simulating military deployments in Java

    SciTech Connect

    Van Groningen, C. N.; Blachowicz, D.; Braun, M. D.; Simunich, K. L.; Widing, M. A.

    2002-04-12

    Planning for the transportation of large amounts of equipment, troops, and supplies presents a complex problem. Many options, including modes of transportation, vehicles, facilities, routes, and timing, must be considered. The amount of data involved in generating and analyzing a course of action (e.g., detailed information about military units, logistical infrastructures, and vehicles) is enormous. Software tools are critical in defining and analyzing these plans. Argonne National Laboratory has developed ELIST (Enhanced Logistics Intra-theater Support Tool), a simulation-based decision support system, to assist military planners in determining the logistical feasibility of an intra-theater course of action. The current version of ELIST (v.8) contains a discrete event simulation developed using the Java programming language. Argonne selected Java because of its object-oriented framework, which has greatly facilitated entity and process development within the simulation, and because it fulfills a primary requirement for multi-platform execution. This paper describes the model, including setup and analysis, a high-level architectural design, and an evaluation of Java.

  12. A Visual Editor in Java for View

    NASA Technical Reports Server (NTRS)

    Stansifer, Ryan

    2000-01-01

    In this project we continued the development of a visual editor in the Java programming language to create screens on which to display real-time data. The data comes from the numerous systems monitoring the operation of the space shuttle while on the ground and in space, and from the many tests of subsystems. The data can be displayed on any computer platform running a Java-enabled World Wide Web (WWW) browser and connected to the Internet. Previously a special-purpose program bad been written to display data on emulations of character-based display screens used for many years at NASA. The goal now is to display bit-mapped screens created by a visual editor. We report here on the visual editor that creates the display screens. This project continues the work we bad done previously. Previously we had followed the design of the 'beanbox,' a prototype visual editor created by Sun Microsystems. We abandoned this approach and implemented a prototype using a more direct approach. In addition, our prototype is based on newly released Java 2 graphical user interface (GUI) libraries. The result has been a visually more appealing appearance and a more robust application.

  13. astrojs: JavaScript Libraries for Astronomy

    NASA Astrophysics Data System (ADS)

    Kapadia, A.; Smith, A.

    2013-10-01

    Astronomers mainly use the web for data retrieval. To create visualizations and conduct analyses requires installation of many external packages, often creating a difficult task for the astronomer. An ideal situation would move many of the common tasks to a browser — a homogenous solution for data access, visualization, and analyses in one application. As part of an effort to build research tools around core citizen science experiences, the Zooniverse is building science grade tools for handling astronomical data. As the browser is Zooniverse's medium, JavaScript — the only client-side programming language — becomes ever more relevant for feature-rich web applications. The technology industry is investing large development time in improving JavaScript engines resulting in performance gains that exceed other scripting languages. The science community could benefit from this investment by migrating development of desktop applications to web applications. Similar to the astropy initiative, ASTROJS is providing a consolidation of JavaScript libraries for in-browser client-side astronomical data visualization and analyses.

  14. Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner

    PubMed Central

    Lu, David V.; Brown, Randall H.; Arumugam, Manimozhiyan; Brent, Michael R.

    2009-01-01

    Motivation: The most accurate way to determine the intron–exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary determinant of alignment accuracy, while heuristics that prevent consideration of certain alignments are a primary determinant of runtime and memory usage. Both accuracy and speed are important considerations in choosing an alignment algorithm, but scoring systems have received much less attention than heuristics. Results: We present Pairagon, a pair hidden Markov model based cDNA-to-genome alignment program, as the most accurate aligner for sequences with high- and low-identity levels. We conducted a series of experiments testing alignment accuracy with varying sequence identity. We first created ‘perfect’ simulated cDNA sequences by splicing the sequences of exons in the reference genome sequences of fly and human. The complete reference genome sequences were then mutated to various degrees using a realistic mutation simulator and the perfect cDNAs were aligned to them using Pairagon and 12 other aligners. To validate these results with natural sequences, we performed cross-species alignment using orthologous transcripts from human, mouse and rat. We found that aligner accuracy is heavily dependent on sequence identity. For sequences with 100% identity, Pairagon achieved accuracy levels of >99.6%, with one quarter of the errors of any other aligner. Furthermore, for human/mouse alignments, which are only 85% identical, Pairagon achieved 87% accuracy, higher than any other aligner. Availability: Pairagon source and executables are freely available at http://mblab.wustl.edu/software/pairagon/ Contact: davidlu@wustl.edu; brent@cse.wustl.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19414532

  15. An Accurate Scalable Template-based Alignment Algorithm.

    PubMed

    Gardner, David P; Xu, Weijia; Miranker, Daniel P; Ozer, Stuart; Cannone, Jamie J; Gutell, Robin R

    2012-12-31

    The rapid determination of nucleic acid sequences is increasing the number of sequences that are available. Inherent in a template or seed alignment is the culmination of structural and functional constraints that are selecting those mutations that are viable during the evolution of the RNA. While we might not understand these structural and functional, template-based alignment programs utilize the patterns of sequence conservation to encapsulate the characteristics of viable RNA sequences that are aligned properly. We have developed a program that utilizes the different dimensions of information in rCAD, a large RNA informatics resource, to establish a profile for each position in an alignment. The most significant include sequence identity and column composition in different phylogenetic taxa. We have compared our methods with a maximum of eight alternative alignment methods on different sets of 16S and 23S rRNA sequences with sequence percent identities ranging from 50% to 100%. The results showed that CRWAlign outperformed the other alignment methods in both speed and accuracy. A web-based alignment server is available at http://www.rna.ccbb.utexas.edu/SAE/2F/CRWAlign. PMID:24772376

  16. Horizontal carbon nanotube alignment.

    PubMed

    Cole, Matthew T; Cientanni, Vito; Milne, William I

    2016-09-21

    The production of horizontally aligned carbon nanotubes offers a rapid means of realizing a myriad of self-assembled near-atom-scale technologies - from novel photonic crystals to nanoscale transistors. The ability to reproducibly align anisotropic nanostructures has huge technological value. Here we review the present state-of-the-art in horizontal carbon nanotube alignment. For both in and ex situ approaches, we quantitatively assess the reported linear packing densities alongside the degree of alignment possible for each of these core methodologies. PMID:27546174

  17. Orthodontics and Aligners

    MedlinePlus

    ... Repairing Chipped Teeth Teeth Whitening Tooth-Colored Fillings Orthodontics and Aligners Straighten teeth for a healthier smile. Orthodontics When consumers think about orthodontics, braces are the ...

  18. Alignability of Optical Interconnects

    NASA Astrophysics Data System (ADS)

    Beech, Russell Scott

    With the continuing drive towards higher speed, density, and functionality in electronics, electrical interconnects become inadequate. Due to optics' high speed and bandwidth, freedom from capacitive loading effects, and freedom from crosstalk, optical interconnects can meet more stringent interconnect requirements. But, an optical interconnect requires additional components, such as an optical source and detector, lenses, holographic elements, etc. Fabrication and assembly of an optical interconnect requires precise alignment of these components. The successful development and deployment of optical interconnects depend on how easily the interconnect components can be aligned and/or how tolerant the interconnect is to misalignments. In this thesis, a method of quantitatively specifying the relative difficulty of properly aligning an optical interconnect is described. Ways of using this theory of alignment to obtain design and packaging guidelines for optical interconnects are examined. The measure of the ease with which an optical interconnect can be aligned, called the alignability, uses the efficiency of power transfer as a measure of alignment quality. The alignability is related to interconnect package design through the overall cost measure, which depends upon various physical parameters of the interconnect, such as the cost of the components and the time required for fabrication and alignment. Through a mutual dependence on detector size, the relationship between an interconnect's alignability and its bandwidth, signal-to-noise ratio, and bit-error -rate is examined. The results indicate that a range of device sizes exists for which given performance threshold values are satisfied. Next, the alignability of integrated planar-optic backplanes is analyzed in detail. The resulting data show that the alignability can be optimized by varying the substrate thickness or the angle of reflection. By including the effects of crosstalk, in a multi-channel backplane, the

  19. Tidal alignment of galaxies

    NASA Astrophysics Data System (ADS)

    Blazek, Jonathan; Vlah, Zvonimir; Seljak, Uroš

    2015-08-01

    We develop an analytic model for galaxy intrinsic alignments (IA) based on the theory of tidal alignment. We calculate all relevant nonlinear corrections at one-loop order, including effects from nonlinear density evolution, galaxy biasing, and source density weighting. Contributions from density weighting are found to be particularly important and lead to bias dependence of the IA amplitude, even on large scales. This effect may be responsible for much of the luminosity dependence in IA observations. The increase in IA amplitude for more highly biased galaxies reflects their locations in regions with large tidal fields. We also consider the impact of smoothing the tidal field on halo scales. We compare the performance of this consistent nonlinear model in describing the observed alignment of luminous red galaxies with the linear model as well as the frequently used "nonlinear alignment model," finding a significant improvement on small and intermediate scales. We also show that the cross-correlation between density and IA (the "GI" term) can be effectively separated into source alignment and source clustering, and we accurately model the observed alignment down to the one-halo regime using the tidal field from the fully nonlinear halo-matter cross correlation. Inside the one-halo regime, the average alignment of galaxies with density tracers no longer follows the tidal alignment prediction, likely reflecting nonlinear processes that must be considered when modeling IA on these scales. Finally, we discuss tidal alignment in the context of cosmic shear measurements.

  20. Java Expert System Shell Version 6.0

    Energy Science and Technology Software Center (ESTSC)

    2002-06-18

    Java Expert Shell System - Jess - is a rule engine and scripting environment written entirely in Sun's Java language, Jess was orginially inspired by the CLIPS expert system shell, but has grown int a complete, distinct JAVA-influenced environment of its own. Using Jess, you can build Java applets and applications that have the capacity to "reason" using knowledge you supply in the form of declarative rules. Jess is surprisingly fast, and for some problemsmore » is faster than CLIPS, in that many Jess scripts are valid CLIPS scripts and vice-versa. Like CLIPS, Jess uses the Rete algorithm to process rules, a very efficient mechanism for solving the difficult many-to-many matching problem. Jess adds many features to CLIPS, including backwards chaining and the ability to manipulate and directly reason about Java objects. Jess is also a powerful Java scripting environment, from which you can create Java objects and call Java methods without compiling any Java Code.« less

  1. GRAT--genome-scale rapid alignment tool.

    PubMed

    Kindlund, Ellen; Tammi, Martti T; Arner, Erik; Nilsson, Daniel; Andersson, Björn

    2007-04-01

    Modern alignment methods designed to work rapidly and efficiently with large datasets often do so at the cost of method sensitivity. To overcome this, we have developed a novel alignment program, GRAT, built to accurately align short, highly similar DNA sequences. The program runs rapidly and requires no more memory and CPU power than a desktop computer. In addition, specificity is ensured by statistically separating the true alignments from spurious matches using phred quality values. An efficient separation is especially important when searching large datasets and whenever there are repeats present in the dataset. Results are superior in comparison to widely used existing software, and analysis of two large genomic datasets show the usefulness and scalability of the algorithm. PMID:17292508

  2. Structural analysis of aligned RNAs.

    PubMed

    Voss, Björn

    2006-01-01

    The knowledge about classes of non-coding RNAs (ncRNAs) is growing very fast and it is mainly the structure which is the common characteristic property shared by members of the same class. For correct characterization of such classes it is therefore of great importance to analyse the structural features in great detail. In this manuscript I present RNAlishapes which combines various secondary structure analysis methods, such as suboptimal folding and shape abstraction, with a comparative approach known as RNA alignment folding. RNAlishapes makes use of an extended thermodynamic model and covariance scoring, which allows to reward covariation of paired bases. Applying the algorithm to a set of bacterial trp-operon leaders using shape abstraction it was able to identify the two alternating conformations of this attenuator. Besides providing in-depth analysis methods for aligned RNAs, the tool also shows a fairly well prediction accuracy. Therefore, RNAlishapes provides the community with a powerful tool for structural analysis of classes of RNAs and is also a reasonable method for consensus structure prediction based on sequence alignments. RNAlishapes is available for online use and download at http://rna.cyanolab.de. PMID:17020924

  3. Long Read Alignment with Parallel MapReduce Cloud Platform

    PubMed Central

    Al-Absi, Ahmed Abdulhakim; Kang, Dae-Ki

    2015-01-01

    Genomic sequence alignment is an important technique to decode genome sequences in bioinformatics. Next-Generation Sequencing technologies produce genomic data of longer reads. Cloud platforms are adopted to address the problems arising from storage and analysis of large genomic data. Existing genes sequencing tools for cloud platforms predominantly consider short read gene sequences and adopt the Hadoop MapReduce framework for computation. However, serial execution of map and reduce phases is a problem in such systems. Therefore, in this paper, we introduce Burrows-Wheeler Aligner's Smith-Waterman Alignment on Parallel MapReduce (BWASW-PMR) cloud platform for long sequence alignment. The proposed cloud platform adopts a widely accepted and accurate BWA-SW algorithm for long sequence alignment. A custom MapReduce platform is developed to overcome the drawbacks of the Hadoop framework. A parallel execution strategy of the MapReduce phases and optimization of Smith-Waterman algorithm are considered. Performance evaluation results exhibit an average speed-up of 6.7 considering BWASW-PMR compared with the state-of-the-art Bwasw-Cloud. An average reduction of 30% in the map phase makespan is reported across all experiments comparing BWASW-PMR with Bwasw-Cloud. Optimization of Smith-Waterman results in reducing the execution time by 91.8%. The experimental study proves the efficiency of BWASW-PMR for aligning long genomic sequences on cloud platforms. PMID:26839887

  4. Long Read Alignment with Parallel MapReduce Cloud Platform.

    PubMed

    Al-Absi, Ahmed Abdulhakim; Kang, Dae-Ki

    2015-01-01

    Genomic sequence alignment is an important technique to decode genome sequences in bioinformatics. Next-Generation Sequencing technologies produce genomic data of longer reads. Cloud platforms are adopted to address the problems arising from storage and analysis of large genomic data. Existing genes sequencing tools for cloud platforms predominantly consider short read gene sequences and adopt the Hadoop MapReduce framework for computation. However, serial execution of map and reduce phases is a problem in such systems. Therefore, in this paper, we introduce Burrows-Wheeler Aligner's Smith-Waterman Alignment on Parallel MapReduce (BWASW-PMR) cloud platform for long sequence alignment. The proposed cloud platform adopts a widely accepted and accurate BWA-SW algorithm for long sequence alignment. A custom MapReduce platform is developed to overcome the drawbacks of the Hadoop framework. A parallel execution strategy of the MapReduce phases and optimization of Smith-Waterman algorithm are considered. Performance evaluation results exhibit an average speed-up of 6.7 considering BWASW-PMR compared with the state-of-the-art Bwasw-Cloud. An average reduction of 30% in the map phase makespan is reported across all experiments comparing BWASW-PMR with Bwasw-Cloud. Optimization of Smith-Waterman results in reducing the execution time by 91.8%. The experimental study proves the efficiency of BWASW-PMR for aligning long genomic sequences on cloud platforms. PMID:26839887

  5. Volume visualization of multiple alignment of genomic DNA

    SciTech Connect

    Shah, Nameeta; Weber, Gunther H.; Dillard, Scott E.; Hamann, Bernd

    2004-05-01

    Genomes of hundreds of species have been sequenced to date and many more are being sequenced. As more and more sequence data sets become available, and as the challenge of comparing these massive ''billion basepair DNA sequences'' becomes substantial, so does the need for more powerful tools supporting the exploration of these data sets. Similarity score data used to compare aligned DNA sequences is inherently one-dimensional. One-dimensional (1D) representations of these data sets do not effectively utilize screen real estate. We present a technique to arrange 1D data in 3D space to allow us to apply state-of-the-art interactive volume visualization techniques for data exploration. We provide results for aligned DNA sequence data and compare it with traditional 1D line plots. Our technique, coupled with 1D line plots, results in effective multiresolution visualization of very large aligned sequence data sets.

  6. Sequence information signal processor

    DOEpatents

    Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

    1999-01-01

    An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.

  7. SPEAR3 Construction Alignment

    SciTech Connect

    LeCocq, Catherine; Banuelos, Cristobal; Fuss, Brian; Gaudreault, Francis; Gaydosh, Michael; Griffin, Levirt; Imfeld, Hans; McDougal, John; Perry, Michael; Rogers, Michael; /SLAC

    2005-08-17

    An ambitious seven month shutdown of the existing SPEAR2 synchrotron radiation facility was successfully completed in March 2004 when the first synchrotron light was observed in the new SPEAR3 ring, SPEAR3 completely replaced SPEAR2 with new components aligned on a new highly-flat concrete floor. Devices such as magnets and vacuum chambers had to be fiducialized and later aligned on girder rafts that were then placed into the ring over pre-aligned support plates. Key to the success of aligning this new ring was to ensure that the new beam orbit matched the old SPEAR2 orbit so that existing experimental beamlines would not have to be reoriented. In this presentation a pictorial summary of the Alignment Engineering Group's surveying tasks for the construction of the SPEAR3 ring is provided. Details on the networking and analysis of various surveys throughout the project can be found in the accompanying paper.

  8. Efficient alignment-free DNA barcode analytics

    PubMed Central

    Kuksa, Pavel; Pavlovic, Vladimir

    2009-01-01

    Background In this work we consider barcode DNA analysis problems and address them using alternative, alignment-free methods and representations which model sequences as collections of short sequence fragments (features). The methods use fixed-length representations (spectrum) for barcode sequences to measure similarities or dissimilarities between sequences coming from the same or different species. The spectrum-based representation not only allows for accurate and computationally efficient species classification, but also opens possibility for accurate clustering analysis of putative species barcodes and identification of critical within-barcode loci distinguishing barcodes of different sample groups. Results New alignment-free methods provide highly accurate and fast DNA barcode-based identification and classification of species with substantial improvements in accuracy and speed over state-of-the-art barcode analysis methods. We evaluate our methods on problems of species classification and identification using barcodes, important and relevant analytical tasks in many practical applications (adverse species movement monitoring, sampling surveys for unknown or pathogenic species identification, biodiversity assessment, etc.) On several benchmark barcode datasets, including ACG, Astraptes, Hesperiidae, Fish larvae, and Birds of North America, proposed alignment-free methods considerably improve prediction accuracy compared to prior results. We also observe significant running time improvements over the state-of-the-art methods. Conclusion Our results show that newly developed alignment-free methods for DNA barcoding can efficiently and with high accuracy identify specimens by examining only few barcode features, resulting in increased scalability and interpretability of current computational approaches to barcoding. PMID:19900305

  9. Java interface for asserting interactive telerobotic control

    NASA Astrophysics Data System (ADS)

    DePasquale, Peter; Lewis, John; Stein, Matthew R.

    1997-12-01

    Many current web-based telerobotic interfaces use HyperText Markup Language (HTML) forms to assert user control on a robot. While acceptable for some tasks, a Java interface can provide better client-server interaction. The Puma Paint project is a joint effort between the Department of Computing Sciences at Villanova University and the Department of Mechanical and Materials Engineering at Wilkes University. THe project utilizes a Java applet to control a Unimation Puma 1760 robot during the task of painting on a canvas. The interface allows the user to control the paint strokes as well as the pressure of a brush on the canvas and how deep the brush is dipped into a paint jar. To provide immediate feedback, a virtual canvas models the effects of the controls as the artist paints. Live color video feedback is provided, allowing the user to view the actual results of the robot's motions. Unlike the step-at-a-time model of many web forms, the application permits the user to assert interactive control. The greater the complexity of the interaction between the robot and its environment, the greater the need for high quality information presentation to the user. The use of Java allows the sophistication of the user interface to be raised to the level required for satisfactory control. This paper describes the Puma Paint project, including the interface and communications model. It also examines the challenges of using the Internet as the medium of communications and the challenges of encoding free ranging motions for transmission from the client to the robot.

  10. Java implementation of Class Association Rule algorithms

    Energy Science and Technology Software Center (ESTSC)

    2007-08-30

    Java implementation of three Class Association Rule mining algorithms, NETCAR, CARapriori, and clustering based rule mining. NETCAR algorithm is a novel algorithm developed by Makio Tamura. The algorithm is discussed in a paper: UCRL-JRNL-232466-DRAFT, and would be published in a peer review scientific journal. The software is used to extract combinations of genes relevant with a phenotype from a phylogenetic profile and a phenotype profile. The phylogenetic profiles is represented by a binary matrix andmore » a phenotype profile is represented by a binary vector. The present application of this software will be in genome analysis, however, it could be applied more generally.« less

  11. Visualizing systems engineering data with Java

    SciTech Connect

    Barter, Robert H.; Vinzant, Aleta

    1998-11-10

    Systems Engineers are required to deal with complex sets of data. To be useful, the data must be managed effectively, and presented in meaningful terms to a wide variety of information consumers. Two software patterns are presented as the basis for exploring the visualization of systems engineering data. The Model, View, Controller pattern defines an information management system architecture. The Entity, Relation, Attribute pattern defines the information model. MVC "Views" then form the basis for the user interface between the information consumer and the MVC "Controller"/"Model" combination. A Java tool set is described for exploring alternative views into the underlying complex data structures encountered in systems engineering.

  12. Java implementation of Class Association Rule algorithms

    SciTech Connect

    Tamura, Makio

    2007-08-30

    Java implementation of three Class Association Rule mining algorithms, NETCAR, CARapriori, and clustering based rule mining. NETCAR algorithm is a novel algorithm developed by Makio Tamura. The algorithm is discussed in a paper: UCRL-JRNL-232466-DRAFT, and would be published in a peer review scientific journal. The software is used to extract combinations of genes relevant with a phenotype from a phylogenetic profile and a phenotype profile. The phylogenetic profiles is represented by a binary matrix and a phenotype profile is represented by a binary vector. The present application of this software will be in genome analysis, however, it could be applied more generally.

  13. Implementation of BT, SP, LU, and FT of NAS Parallel Benchmarks in Java

    NASA Technical Reports Server (NTRS)

    Schultz, Matthew; Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry

    2000-01-01

    A number of Java features make it an attractive but a debatable choice for High Performance Computing. We have implemented benchmarks working on single structured grid BT,SP,LU and FT in Java. The performance and scalability of the Java code shows that a significant improvement in Java compiler technology and in Java thread implementation are necessary for Java to compete with Fortran in HPC applications.

  14. HIV Sequence Compendium 2015

    SciTech Connect

    Foley, Brian Thomas; Leitner, Thomas Kenneth; Apetrei, Cristian; Hahn, Beatrice; Mizrachi, Ilene; Mullins, James; Rambaut, Andrew; Wolinsky, Steven; Korber, Bette Tina Marie

    2015-10-05

    This compendium is an annual printed summary of the data contained in the HIV sequence database. We try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2015. Hence, though it is published in 2015 and called the 2015 Compendium, its contents correspond to the 2014 curated alignments on our website. The number of sequences in the HIV database is still increasing. In total, at the end of 2014, there were 624,121 sequences in the HIV Sequence Database, an increase of 7% since the previous year. This is the first year that the number of new sequences added to the database has decreased compared to the previous year. The number of near complete genomes (>7000 nucleotides) increased to 5834 by end of 2014. However, as in previous years, the compendium alignments contain only a fraction of these. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/ content/sequence/NEWALIGN/align.html As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  15. HubAlign: an accurate and efficient method for global alignment of protein–protein interaction networks

    PubMed Central

    Hashemifar, Somaye; Xu, Jinbo

    2014-01-01

    Motivation: High-throughput experimental techniques have produced a large amount of protein–protein interaction (PPI) data. The study of PPI networks, such as comparative analysis, shall benefit the understanding of life process and diseases at the molecular level. One way of comparative analysis is to align PPI networks to identify conserved or species-specific subnetwork motifs. A few methods have been developed for global PPI network alignment, but it still remains challenging in terms of both accuracy and efficiency. Results: This paper presents a novel global network alignment algorithm, denoted as HubAlign, that makes use of both network topology and sequence homology information, based upon the observation that topologically important proteins in a PPI network usually are much more conserved and thus, more likely to be aligned. HubAlign uses a minimum-degree heuristic algorithm to estimate the topological and functional importance of a protein from the global network topology information. Then HubAlign aligns topologically important proteins first and gradually extends the alignment to the whole network. Extensive tests indicate that HubAlign greatly outperforms several popular methods in terms of both accuracy and efficiency, especially in detecting functionally similar proteins. Availability: HubAlign is available freely for non-commercial purposes at http://ttic.uchicago.edu/∼hashemifar/software/HubAlign.zip Contact: jinboxu@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25161231

  16. Multi-threading the generation of Burrows-Wheeler Alignment.

    PubMed

    Jo, H

    2016-01-01

    Along with recent progress in next-generation sequencing technology, it has become easier to process larger amounts of genome sequencing data at a lower cost. The most time-consuming step of next-generation sequencing data analysis involves the mapping of read data into a reference genome. Although the Burrows-Wheeler Alignment (BWA) tool is one of the most widely used open-source software tools for aligning read sequences, it still has a limitation in that it does not fully support a multi-thread mechanism during the alignment generation step. In this article, we propose a BWA-MT tool based on BWA that supports multi-thread mechanisms for processing alignment generation. To evaluate BWA-MT, we used an evaluation system equipped with 24 cores and 128 GB of memory. As workloads, we used the hg19 human genome reference sequence and sequences of various read sizes from the 1 to 40 M spots. In our evaluation, BWA-MT showed a maximum of 3.66-times better performance, and generated the same Sequence Alignment/Map result file as that of BWA. Although the ability to speed up the procedure might be dependent on computing resources, we confirmed that BWA-MT is a highly effective and fast alignment tool. PMID:27323088

  17. Dynamic Learning Objects to Teach Java Programming Language

    ERIC Educational Resources Information Center

    Narasimhamurthy, Uma; Al Shawkani, Khuloud

    2010-01-01

    This article describes a model for teaching Java Programming Language through Dynamic Learning Objects. The design of the learning objects was based on effective learning design principles to help students learn the complex topic of Java Programming. Visualization was also used to facilitate the learning of the concepts. (Contains 1 figure and 2…

  18. A Geostationary Earth Orbit Satellite Model Using Easy Java Simulation

    ERIC Educational Resources Information Center

    Wee, Loo Kang; Goh, Giam Hwee

    2013-01-01

    We develop an Easy Java Simulation (EJS) model for students to visualize geostationary orbits near Earth, modelled using a Java 3D implementation of the EJS 3D library. The simplified physics model is described and simulated using a simple constant angular velocity equation. We discuss four computer model design ideas: (1) a simple and realistic…

  19. JavaScript: Convenient Interactivity for the Class Web Page.

    ERIC Educational Resources Information Center

    Gray, Patricia

    This paper shows how JavaScript can be used within HTML pages to add interactive review sessions and quizzes incorporating graphics and sound files. JavaScript has the advantage of providing basic interactive functions without the use of separate software applications and players. Because it can be part of a standard HTML page, it is…

  20. Developmental Process Model for the Java Intelligent Tutoring System

    ERIC Educational Resources Information Center

    Sykes, Edward

    2007-01-01

    The Java Intelligent Tutoring System (JITS) was designed and developed to support the growing trend of Java programming around the world. JITS is an advanced web-based personalized tutoring system that is unique in several ways. Most programming Intelligent Tutoring Systems require the teacher to author problems with corresponding solutions. JITS,…

  1. High-Performance Java Codes for Computational Fluid Dynamics

    NASA Technical Reports Server (NTRS)

    Riley, Christopher; Chatterjee, Siddhartha; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2001-01-01

    The computational science community is reluctant to write large-scale computationally -intensive applications in Java due to concerns over Java's poor performance, despite the claimed software engineering advantages of its object-oriented features. Naive Java implementations of numerical algorithms can perform poorly compared to corresponding Fortran or C implementations. To achieve high performance, Java applications must be designed with good performance as a primary goal. This paper presents the object-oriented design and implementation of two real-world applications from the field of Computational Fluid Dynamics (CFD): a finite-volume fluid flow solver (LAURA, from NASA Langley Research Center), and an unstructured mesh adaptation algorithm (2D_TAG, from NASA Ames Research Center). This work builds on our previous experience with the design of high-performance numerical libraries in Java. We examine the performance of the applications using the currently available Java infrastructure and show that the Java version of the flow solver LAURA performs almost within a factor of 2 of the original procedural version. Our Java version of the mesh adaptation algorithm 2D_TAG performs within a factor of 1.5 of its original procedural version on certain platforms. Our results demonstrate that object-oriented software design principles are not necessarily inimical to high performance.

  2. Real-time Java for flight applications: an update

    NASA Technical Reports Server (NTRS)

    Dvorak, D.

    2003-01-01

    The RTSJ is a specification for supporting real-time execution in the Java programming language. The specification has been shaped by several guiding principles, particularly: predictable execution as the first priority in all tradeoffs, no syntactic extensions to Java, and backward compatibility.

  3. JAVA SWING-BASED PLOTTING PACKAGE RESIDING WITHIN XAL

    SciTech Connect

    Shishlo, Andrei P; Chu, Paul; Pelaia II, Tom

    2007-01-01

    A data plotting package residing in the XAL tools set is presented. This package is based on Java SWING, and therefore it has the same portability as Java itself. The data types for charts, bar-charts, and color-surface plots are described. The algorithms, performance, interactive capabilities, limitations, and the best usage practices of this plotting package are discussed.

  4. Java: A New Brew for Educators, Administrators and Students.

    ERIC Educational Resources Information Center

    Gordon, Barbara

    1996-01-01

    Java is an object-oriented programming language developed by Sun Microsystems; its benefits include platform independence, security, and interactivity. Within the college community, Java is being used in programming courses, collaborative technology research projects, computer graphics instruction, and distance education. (AEF)

  5. Paintbrush of Discovery: Using Java Applets to Enhance Mathematics Education

    ERIC Educational Resources Information Center

    Eason, Ray; Heath, Garrett

    2004-01-01

    This article addresses the enhancement of the learning environment by using Java applets in the mathematics classroom. Currently, the first year mathematics program at the United States Military Academy involves one semester of modeling with discrete dynamical systems (DDS). Several faculty members from the Academy have integrated Java applets…

  6. Precision alignment device

    DOEpatents

    Jones, N.E.

    1988-03-10

    Apparatus for providing automatic alignment of beam devices having an associated structure for directing, collimating, focusing, reflecting, or otherwise modifying the main beam. A reference laser is attached to the structure enclosing the main beam producing apparatus and produces a reference beam substantially parallel to the main beam. Detector modules containing optical switching devices and optical detectors are positioned in the path of the reference beam and are effective to produce an electrical output indicative of the alignment of the main beam. This electrical output drives servomotor operated adjustment screws to adjust the position of elements of the structure associated with the main beam to maintain alignment of the main beam. 5 figs.

  7. Precision alignment device

    DOEpatents

    Jones, Nelson E.

    1990-01-01

    Apparatus for providing automatic alignment of beam devices having an associated structure for directing, collimating, focusing, reflecting, or otherwise modifying the main beam. A reference laser is attached to the structure enclosing the main beam producing apparatus and produces a reference beam substantially parallel to the main beam. Detector modules containing optical switching devices and optical detectors are positioned in the path of the reference beam and are effective to produce an electrical output indicative of the alignment of the main beam. This electrical output drives servomotor operated adjustment screws to adjust the position of elements of the structure associated with the main beam to maintain alignment of the main beam.

  8. Hybrid vehicle motor alignment

    DOEpatents

    Levin, Michael Benjamin

    2001-07-03

    A rotor of an electric motor for a motor vehicle is aligned to an axis of rotation for a crankshaft of an internal combustion engine having an internal combustion engine and an electric motor. A locator is provided on the crankshaft, a piloting tool is located radially by the first locator to the crankshaft. A stator of the electric motor is aligned to a second locator provided on the piloting tool. The stator is secured to the engine block. The rotor is aligned to the crankshaft and secured thereto.

  9. JPARSS: A Java Parallel Network Package for Grid Computing

    SciTech Connect

    Chen, Jie; Akers, Walter; Chen, Ying; Watson, William

    2002-03-01

    The emergence of high speed wide area networks makes grid computinga reality. However grid applications that need reliable data transfer still have difficulties to achieve optimal TCP performance due to network tuning of TCP window size to improve bandwidth and to reduce latency on a high speed wide area network. This paper presents a Java package called JPARSS (Java Parallel Secure Stream (Socket)) that divides data into partitions that are sent over several parallel Java streams simultaneously and allows Java or Web applications to achieve optimal TCP performance in a grid environment without the necessity of tuning TCP window size. This package enables single sign-on, certificate delegation and secure or plain-text data transfer using several security components based on X.509 certificate and SSL. Several experiments will be presented to show that using Java parallelstreams is more effective than tuning TCP window size. In addition a simple architecture using Web services

  10. Parameterized BLOSUM Matrices for Protein Alignment.

    PubMed

    Song, Dandan; Chen, Jiaxing; Chen, Guang; Li, Ning; Li, Jin; Fan, Jun; Bu, Dongbo; Li, Shuai Cheng

    2015-01-01

    Protein alignment is a basic step for many molecular biology researches. The BLOSUM matrices, especially BLOSUM62, are the de facto standard matrices for protein alignments. However, after widely utilization of the matrices for 15 years, programming errors were surprisingly found in the initial version of source codes for their generation. And amazingly, after bug correction, the "intended" BLOSUM62 matrix performs consistently worse than the "miscalculated" one. In this paper, we find linear relationships among the eigenvalues of the matrices and propose an algorithm to find optimal unified eigenvectors. With them, we can parameterize matrix BLOSUMx for any given variable x that could change continuously. We compare the effectiveness of our parameterized isentropic matrix with BLOSUM62. Furthermore, an iterative alignment and matrix selection process is proposed to adaptively find the best parameter and globally align two sequences. Experiments are conducted on aligning 13,667 families of Pfam database and on clustering MHC II protein sequences, whose improved accuracy demonstrates the effectiveness of our proposed method. PMID:26357279

  11. BioViews: Java-based tools for genomic data visualization.

    PubMed

    Helt, G A; Lewis, S; Loraine, A E; Rubin, G M

    1998-03-01

    Visualization tools for bioinformatics ideally should provide universal access to the most current data in an interactive and intuitive graphical user interface. Since the introduction of Java, a language designed for distributed programming over the Web, the technology now exists to build a genomic data visualization tool that meets these requirements. Using Java we have developed a prototype genome browser applet (BioViews) that incorporates a three-level graphical view of genomic data: a physical map, an annotated sequence map, and a DNA sequence display. Annotated biological features are displayed on the physical and sequence-based maps, and the different views are interconnected. The applet is linked to several databases and can retrieve features and display hyperlinked textual data on selected features. In addition to browsing genomic data, different types of analyses can be performed interactively and the results of these analyses visualized alongside prior annotations. Our genome browser is built on top of extensible, reusable graphic components specifically designed for bioinformatics. Other groups can (and do) reuse this work in various ways. Genome centers can reuse large parts of the genome browser with minor modifications, bioinformatics groups working on sequence analysis can reuse components to build front ends for analysis programs, and biology laboratories can reuse components to publish results as dynamic Web documents. PMID:9521932

  12. Global tectonic significance of the Solomon Islands and Ontong Java Plateau convergent zone

    NASA Astrophysics Data System (ADS)

    Mann, Paul; Taira, Asahiko

    2004-10-01

    Oceanic plateaus, areas of anomalously thick oceanic crust, cover about 3% of the Earth's seafloor and are thought to mark the surface location of mantle plume "heads". Hotspot tracks represent continuing magmatism associated with the remaining plume conduit or "tail". It is presently controversial whether voluminous and mafic oceanic plateau lithosphere is eventually accreted at subduction zones, and, therefore: (1) influences the eventual composition of continental crust and; (2) is responsible for significantly higher rates of continental growth than growth only by accretion of island arcs. The Ontong Java Plateau (OJP) of the southwestern Pacific Ocean is the largest and thickest oceanic plateau on Earth and the largest plateau currently converging on an island arc (Solomon Islands). For this reason, this convergent zone is a key area for understanding the fate of large and thick plateaus on reaching subduction zones. This volume consists of a series of four papers that summarize the results of joint US-Japan marine geophysical studies in 1995 and 1998 of the Solomon Islands-Ontong Java Plateau convergent zone. Marine geophysical data include single and multi-channel seismic reflection, ocean-bottom seismometer (OBS) refraction, gravity, magnetic, sidescan sonar, and earthquake studies. Objectives of this introductory paper include: (1) review of the significance of oceanic plateaus as potential contributors to continental crust; (2) review of the current theories on the fate of oceanic plateaus at subduction zones; (3) establish the present-day and Neogene tectonic setting of the Solomon Islands-Ontong Java Plateau convergent zone; (4) discuss the controversial sequence and timing of tectonic events surrounding Ontong Java Plateau-Solomon arc convergence; (5) present a series of tectonic reconstructions for the period 20 Ma (early Miocene) to the present-day in support of our proposed timing of major tectonic events affecting the Ontong Java Plateau

  13. Analysis of viral protein-2 encoding gene of avian encephalomyelitis virus from field specimens in Central Java region, Indonesia

    PubMed Central

    Haryanto, Aris; Ermawati, Ratna; Wati, Vera; Irianingsih, Sri Handayani; Wijayanti, Nastiti

    2016-01-01

    Aim: Avian encephalomyelitis (AE) is a viral disease which can infect various types of poultry, especially chicken. In Indonesia, the incidence of AE infection in chicken has been reported since 2009, the AE incidence tends to increase from year to year. The objective of this study was to analyze viral protein 2 (VP-2) encoding gene of AE virus (AEV) from various species of birds in field specimen by reverse transcription polymerase chain reaction (RT-PCR) amplification using specific nucleotides primer for confirmation of AE diagnosis. Materials and Methods: A total of 13 AEV samples are isolated from various species of poultry which are serologically diagnosed infected by AEV from some areas in central Java, Indonesia. Research stage consists of virus samples collection from field specimens, extraction of AEV RNA, amplification of VP-2 protein encoding gene by RT-PCR, separation of RT-PCR product by agarose gel electrophoresis, DNA sequencing and data analysis. Results: Amplification products of the VP-2 encoding gene of AEV by RT-PCR methods of various types of poultry from field specimens showed a positive results on sample code 499/4/12 which generated DNA fragment in the size of 619 bp. Sensitivity test of RT-PCR amplification showed that the minimum concentration of RNA template is 127.75 ng/µl. The multiple alignments of DNA sequencing product indicated that positive sample with code 499/4/12 has 92% nucleotide homology compared with AEV with accession number AV1775/07 and 85% nucleotide homology with accession number ZCHP2/0912695 from Genbank database. Analysis of VP-2 gene sequence showed that it found 46 nucleotides difference between isolate 499/4/12 compared with accession number AV1775/07 and 93 nucleotides different with accession number ZCHP2/0912695. Conclusions: Analyses of the VP-2 encoding gene of AEV with RT-PCR method from 13 samples from field specimen generated the DNA fragment in the size of 619 bp from one sample with sample code 499

  14. Antares alignment gimbal positioner

    SciTech Connect

    Day, R.D.; Viswanathan, V.K.; Saxman, A.C.; Lujan, R.E.; Woodfin, G.L.; Sweatt, W.C.

    1981-01-01

    Antares is a 24-beam 40-TW carbon-dioxide (CO/sub 2/) laser fusion system currently under construction at the Los Alamos National Laboratory. The Antares alignment gimbal positioner (AGP) is an optomechanical instrument that will be used for target alignment and alignment of the 24 laser beams, as well as beam quality assessments. The AGP will be capable of providing pointing, focusing, and wavefront optical path difference, as well as aberration information at both helium-neon (He-Ne) and CO/sub 2/ wavelengths. It is designed to allow the laser beams to be aligned to any position within a 1-cm cube to a tolerance of 10 ..mu..m.

  15. EINSTEIN Cluster Alignments Revisited

    NASA Astrophysics Data System (ADS)

    Chambers, S. W.; Melott, A. L.; Miller, C. J.

    2000-12-01

    We have examined whether the major axes of rich galaxy clusters tend to point (in projection) toward their nearest neighboring cluster. We used the data of Ulmer, McMillan and Kowalski, who used x-ray morphology to define position angles. Our cluster samples, with well measured redshifts and updated positions, were taken from the MX Northern Abell Cluster Survey. The usual Kolmogorov-Smirnov test shows no significant alignment signal for nonrandom angles for all separations less than 100 Mpc/h. Refining the null hypothesis, however, with the Wilcoxon rank-sum test, reveals a high confidence signal for alignment. This confidence is highest when we restrict our sample to small nearest neighbor separations. We conclude that we have identified a more powerful tool for testing cluster-cluster alignments. Moreover, there is a strong signal in the data for alignment, consistent with a picture of hierarchical cluster formation in which matter falls into clusters along large scale filamentary structures.

  16. Java multi-histogram volume rendering framework for medical images

    NASA Astrophysics Data System (ADS)

    Senseney, Justin; Bokinsky, Alexandra; Cheng, Ruida; McCreedy, Evan; McAuliffe, Matthew J.

    2013-03-01

    This work extends the multi-histogram volume rendering framework proposed by Kniss et al. [1] to provide rendering results based on the impression of overlaid triangles on a graph of image intensity versus gradient magnitude. The developed method of volume rendering allows for greater emphasis to boundary visualization while avoiding issues common in medical image acquisition. For example, partial voluming effects in computed tomography and intensity inhomogeneity of similar tissue types in magnetic resonance imaging introduce pixel values that will not reflect differing tissue types when a standard transfer function is applied to an intensity histogram. This new framework uses developing technology to improve upon the Kniss multi-histogram framework by using Java, the GPU, and MIPAV, an open-source medical image processing application, to allow multi-histogram techniques to be widely disseminated. The OpenGL view aligned texture rendering approach suffered from performance setbacks, inaccessibility, and usability problems. Rendering results can now be interactively compared with other rendering frameworks, surfaces can now be extracted for use in other programs, and file formats that are widely used in the field of biomedical imaging can be visualized using this multi-histogram approach. OpenCL and GLSL are used to produce this new multi-histogram approach, leveraging texture memory on the graphics processing unit of desktops to provide a new interactive method for visualizing biomedical images. Performance results for this method are generated and qualitative rendering results are compared. The resulting framework provides the opportunity for further applications in medical imaging, both in volume rendering and in generic image processing.

  17. An implicit spatial memory alignment effect.

    PubMed

    Cerles, Mélanie; Gomez, Alice; Rousset, Stéphane

    2015-09-01

    The memory alignment effect is the advantage of reasoning from a perspective which is aligned with the frame of reference used to encode an environment in memory. It usually occurs when participants have to consciously take a perspective to perform a spatial memory task. The present experiment assesses whether the memory alignment effect can occur without requiring to consciously take a given perspective, when the misaligned perspective is only perceptively provided. In others words, does the memory alignment effect still arise when it is only implicitly prompted? Thirty participants learned a sequence of four objects' positions in a room from a north-as-up survey perspective. During the testing phase, they had to point to the direction of a target object from another object ('the reference') with a fixed north-up orientation. The background behind the reference object displayed either a uniform color (control condition) or a misaligned ground-level perspective. The latter displayed a reference object's position information which was either congruent with the studied environment (congruent misaligned condition) or incongruent (incongruent misaligned condition). Mean pointing errors were higher in the congruent misaligned condition than in the control condition, whereas the incongruent misaligned condition did not differ from the control one. The present study shows that the memory alignment effect can arise without requiring a conscious misaligned perspective taking. Moreover, the perceived misaligned perspective must share the same spatial content as the memorized spatial representation in order to induce an alignment effect. PMID:26233526

  18. Anatomy of the western Java plate interface from depth-migrated seismic images

    USGS Publications Warehouse

    Kopp, H.; Hindle, D.; Klaeschen, D.; Oncken, O.; Reichert, C.; Scholl, D.

    2009-01-01

    Newly pre-stack depth-migrated seismic images resolve the structural details of the western Java forearc and plate interface. The structural segmentation of the forearc into discrete mechanical domains correlates with distinct deformation styles. Approximately 2/3 of the trench sediment fill is detached and incorporated into frontal prism imbricates, while the floor sequence is underthrust beneath the d??collement. Western Java, however, differs markedly from margins such as Nankai or Barbados, where a uniform, continuous d??collement reflector has been imaged. In our study area, the plate interface reveals a spatially irregular, nonlinear pattern characterized by the morphological relief of subducted seamounts and thicker than average patches of underthrust sediment. The underthrust sediment is associated with a low velocity zone as determined from wide-angle data. Active underplating is not resolved, but likely contributes to the uplift of the large bivergent wedge that constitutes the forearc high. Our profile is located 100 km west of the 2006 Java tsunami earthquake. The heterogeneous d??collement zone regulates the friction behavior of the shallow subduction environment where the earthquake occurred. The alternating pattern of enhanced frictional contact zones associated with oceanic basement relief and weak material patches of underthrust sediment influences seismic coupling and possibly contributed to the heterogeneous slip distribution. Our seismic images resolve a steeply dipping splay fault, which originates at the d??collement and terminates at the sea floor and which potentially contributes to tsunami generation during co-seismic activity. ?? 2009 Elsevier B.V.

  19. An explorative multiproxy approach to characterize the ecospace of Homo erectus at Sangiran (Java, Indonesia)

    NASA Astrophysics Data System (ADS)

    Hertler, Christine; Haupt, Susanne; Lüdecke, Tina; Wirkner, Mathias; Bruch, Angela

    2015-04-01

    Homo erectus inhabited the islands of the Sunda Shelf in the late Early Pleistocene. This is illustrated by an extensive record of hominid specimens stemming from a variety of sites in Java. The hominid locality Sangiran plays a crucial role in studying related environments, because the geological record at the Sangiran dome covers a stratigraphic sequence, unlike any other hominid site in Java. Although the detailed chronology of the localities in Java is still under dispute, it covers the period between the late Early and early Middle Pleistocene. Fossil evidence includes the hominin specimens proper, diverse and evolving vertebrate faunas as well as pollen profiles. We applied a multiproxy approach to analyse and reconstruct features of the Homo erectus ecospace. Preliminary results of our explorative study are introduced in this paper. Based on the pollen record, we reconstructed temperature and precipitation for the major stratigraphic units. Although resulting values are averaging over wide chronological intervals, they illustrate general climatic trends in the late Early and early Middle Pleistocene in accordance with previous studies and the MIS record. The mammalian specimens we selected for this preliminary study possess a more restricted stratigraphic provenience. Our analyses are based on a dental sample of Duboisia santeng from the Koenigswald collection (n=14). The occurrence of the taxon is restricted to 3 layers in the stratigraphy. We reconstructed body mass and inferred diet from mesowear and isotope studies. There is no significant shift in body masses of Duboisia santeng. This result is in accordance with studies from other localities in Java. However, slight shifts in the mesowear signals (mixed feeder with increasingly browsing signal) are confirmed by studies of carbon isotopes. The analysis of oxygen isotopes provides evidence for seasonality which is compared with the signals from the vegetation.

  20. PDV Probe Alignment Technique

    SciTech Connect

    Whitworth, T L; May, C M; Strand, O T

    2007-10-26

    This alignment technique was developed while performing heterodyne velocimetry measurements at LLNL. There are a few minor items needed, such as a white card with aperture in center, visible alignment laser, IR back reflection meter, and a microscope to view the bridge surface. The work was performed on KCP flyers that were 6 and 8 mils wide. The probes used were Oz Optics manufactured with focal distances of 42mm and 26mm. Both probes provide a spot size of approximately 80?m at 1550nm. The 42mm probes were specified to provide an internal back reflection of -35 to -40dB, and the probe back reflections were measured to be -37dB and -33dB. The 26mm probes were specified as -30dB and both measured -30.5dB. The probe is initially aligned normal to the flyer/bridge surface. This provides a very high return signal, up to -2dB, due to the bridge reflectivity. A white card with a hole in the center as an aperture can be used to check the reflected beam position relative to the probe and launch beam, and the alignment laser spot centered on the bridge, see Figure 1 and Figure 2. The IR back reflection meter is used to measure the dB return from the probe and surface, and a white card or similar object is inserted between the probe and surface to block surface reflection. It may take several iterations between the visible alignment laser and the IR back reflection meter to complete this alignment procedure. Once aligned normal to the surface, the probe should be tilted to position the visible alignment beam as shown in Figure 3, and the flyer should be translated in the X and Y axis to reposition the alignment beam onto the flyer as shown in Figure 4. This tilting of the probe minimizes the amount of light from the bridge reflection into the fiber within the probe while maintaining the alignment as near normal to the flyer surface as possible. When the back reflection is measured after the tilt adjustment, the level should be about -3dB to -6dB higher than the probes

  1. Petroleum systems of the Northwest Java Province, Java and offshore southeast Sumatra, Indonesia

    USGS Publications Warehouse

    Bishop, Michele G.

    2000-01-01

    Mature, synrift lacustrine shales of Eocene to Oligocene age and mature, late-rift coals and coaly shales of Oligocene to Miocene age are source rocks for oil and gas in two important petroleum systems of the onshore and offshore areas of the Northwest Java Basin. Biogenic gas and carbonate-sourced gas have also been identified. These hydrocarbons are trapped primarily in anticlines and fault blocks involving sandstone and carbonate reservoirs. These source rocks and reservoir rocks were deposited in a complex of Tertiary rift basins formed from single or multiple half-grabens on the south edge of the Sunda Shelf plate. The overall transgressive succession was punctuated by clastic input from the exposed Sunda Shelf and marine transgressions from the south. The Northwest Java province may contain more than 2 billion barrels of oil equivalent in addition to the 10 billion barrels of oil equivalent already identified.

  2. Mineralogy of a perudic Andosol in central Java, Indonesia

    SciTech Connect

    Van Ranst, Eric; Utami, S. R.; Verdoodt, A.; Qafoku, Nikolla

    2008-02-15

    We studied the mineralogy of a perudic Andosol developed on the Dieng Tephra Sequence in central Java, Indonesia. The objective was to confirm the presence and determine the origin and stability of 2:1 and interlayered 2:1 phyllosilicates in well-drained Andosols. This was and still is a debated topic in the literature. Total elemental and selective dissolution, as well as microscopic and X-ray diffraction analyses, were performed on the soil samples collected from this site. These analyses confirmed that andic properties were present in the soil samples. The allophane content determined by selective dissolution was 3-4% in the A horizons, and increased to 12-18% in the deeper subsoil horizons. In addition, the clay fraction contained dioctahedral smectite, hydroxy-Al-interlayered 2:1 minerals (HIS), Al-chlorite, kaolinite, pyrophyllite, mica, cristobalite and some gibbsite. The silt and sand fractions were rich in plagioclase and pyroxene. The 2:1 minerals (smectite and pyrophyllite), as well as chlorite and kaolinite were of hydrothermal origin and were incorporated in the tephra during volcanic eruption. Besides desilication during dissolution of unstable minerals, Al interlayering of 2:1 layer silicates was most likely the most prominent pedogenic process. Although hydroxy-Al polymeric interlayers would normally stabilize the 2:1 clay phases, the strong weakening, and even disappearance of the characteristic XRD peaks, indicated instability of these minerals in the upper A horizons due to the perudic and intensive leaching conditions.

  3. HIV sequence compendium 2002

    SciTech Connect

    Kuiken, Carla; Foley, Brian; Freed, Eric; Hahn, Beatrice; Marx, Preston; McCutchan, Francine; Mellors, John; Wolinsky, Steven; Korber, Bette

    2002-12-31

    This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Traditionally, we present the sequence data themselves in the form of alignments: Section II, an alignment of a selection of HIV-1/SIVcpz full-length genomes (a lot of LAI-like sequences, for example, have been omitted because they are so similar that they bias the alignment); Section III, a combined HIV-1/HIV-2/SIV whole genome alignment; Sections IV–VI, amino acid alignments for HIV-1/SIV-cpz, HIV-2/SIV, and SIVagm. The HIV-2/SIV and SIVagm amino acid alignments are separate because the genetic distances between these groups are so great that presenting them in one alignment would make it very elongated because of the large number of gaps that have to be inserted. As always, tables with extensive background information gathered from the literature accompany the whole genome alignments. The collection of whole-gene sequences in the database is now large enough that we have abundant representation of most subtypes. For many subtypes, and especially for subtype B, a large number of sequences that span entire genes were not included in the printed alignments to conserve space. A more complete version of all alignments is available on our website, http://hiv-web.lanl.gov/content/hiv-db/ALIGN_CURRENT/ALIGN-INDEX.html. Importantly, all these alignments have been edited to include only one sequence per person, based on phylogenetic trees that were created for all of them, as well as on the literature. Because of the number of sequences available, we have decided to use a different selection principle this year, based on the epidemiological importance of the subtypes. Subtypes A–D and CRFs 01 and 02 are by far the most widespread variants, and for these (when available) we have included 8–10 representatives in the alignments. The other

  4. Curriculum Alignment Research Suggests that Alignment Can Improve Student Achievement

    ERIC Educational Resources Information Center

    Squires, David

    2012-01-01

    Curriculum alignment research has developed showing the relationship among three alignment categories: the taught curriculum, the tested curriculum and the written curriculum. Each pair (for example, the taught and the written curriculum) shows a positive impact for aligning those results. Following this, alignment results from the Third…

  5. HIV Sequence Compendium 2010

    SciTech Connect

    Kuiken, Carla; Foley, Brian; Leitner, Thomas; Apetrei, Christian; Hahn, Beatrice; Mizrachi, Ilene; Mullins, James; Rambaut, Andrew; Wolinsky, Steven; Korber, Bette

    2010-12-31

    This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2010. Hence, though it is called the 2010 Compendium, its contents correspond to the 2009 curated alignments on our website. The number of sequences in the HIV database is still increasing exponentially. In total, at the time of printing, there were 339,306 sequences in the HIV Sequence Database, an increase of 45% since last year. The number of near complete genomes (>7000 nucleotides) increased to 2576 by end of 2009, reflecting a smaller increase than in previous years. However, as in previous years, the compendium alignments contain only a small fraction of these. Included in the alignments are a small number of sequences representing each of the subtypes and the more prevalent circulating recombinant forms (CRFs) such as 01 and 02, as well as a few outgroup sequences (group O and N and SIV-CPZ). Of the rarer CRFs we included one representative each. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. Reprints are available from our website in the form of both HTML and PDF files. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  6. New Web Server - the Java Version of Tempest - Produced

    NASA Technical Reports Server (NTRS)

    York, David W.; Ponyik, Joseph G.

    2000-01-01

    A new software design and development effort has produced a Java (Sun Microsystems, Inc.) version of the award-winning Tempest software (refs. 1 and 2). In 1999, the Embedded Web Technology (EWT) team received a prestigious R&D 100 Award for Tempest, Java Version. In this article, "Tempest" will refer to the Java version of Tempest, a World Wide Web server for desktop or embedded systems. Tempest was designed at the NASA Glenn Research Center at Lewis Field to run on any platform for which a Java Virtual Machine (JVM, Sun Microsystems, Inc.) exists. The JVM acts as a translator between the native code of the platform and the byte code of Tempest, which is compiled in Java. These byte code files are Java executables with a ".class" extension. Multiple byte code files can be zipped together as a "*.jar" file for more efficient transmission over the Internet. Today's popular browsers, such as Netscape (Netscape Communications Corporation) and Internet Explorer (Microsoft Corporation) have built-in Virtual Machines to display Java applets.

  7. VOTable JAVA Streaming Writer and Applications.

    NASA Astrophysics Data System (ADS)

    Kulkarni, P.; Kembhavi, A.; Kale, S.

    2004-07-01

    Virtual Observatory related tools use a new standard for data transfer called the VOTable format. This is a variant of the xml format that enables easy transfer of data over the web. We describe a streaming interface that can bridge the VOTable format, through a user friendly graphical interface, with the FITS and ASCII formats, which are commonly used by astronomers. A streaming interface is important for efficient use of memory because of the large size of catalogues. The tools are developed in JAVA to provide a platform independent interface. We have also developed a stand-alone version that can be used to convert data stored in ASCII or FITS format on a local machine. The Streaming writer is successfully being used in VOPlot (See Kale et al 2004 for a description of VOPlot).We present the test results of converting huge FITS and ASCII data into the VOTable format on machines that have only limited memory.

  8. Debris Dispersion Model Using Java 3D

    NASA Technical Reports Server (NTRS)

    Thirumalainambi, Rajkumar; Bardina, Jorge

    2004-01-01

    This paper describes web based simulation of Shuttle launch operations and debris dispersion. Java 3D graphics provides geometric and visual content with suitable mathematical model and behaviors of Shuttle launch. Because the model is so heterogeneous and interrelated with various factors, 3D graphics combined with physical models provides mechanisms to understand the complexity of launch and range operations. The main focus in the modeling and simulation covers orbital dynamics and range safety. Range safety areas include destruct limit lines, telemetry and tracking and population risk near range. If there is an explosion of Shuttle during launch, debris dispersion is explained. The shuttle launch and range operations in this paper are discussed based on the operations from Kennedy Space Center, Florida, USA.

  9. Enhancement of initial equivalency for protein structure alignment based on encoded local structures.

    PubMed

    Hung, Kenneth; Wang, Jui-Chih; Chen, Cheng-Wei; Chuang, Cheng-Long; Tsai, Kun-Nan; Chen, Chung-Ming

    2012-11-01

    Most alignment algorithms find an initial equivalent residue pair followed by an iterative optimization process to explore better near-optimal alignments in the surrounding solution space of the initial alignment. It plays a decisive role in determining the alignment quality since a poor initial alignment may make the final alignment trapped in an undesirable local optimum even with an iterative optimization. We proposed a vector-based alignment algorithm with a new initial alignment approach accounting for local structure features called MIRAGE-align. The new idea is to enhance the quality of the initial alignment based on encoded local structural alphabets to identify the protein structure pair whose sequence identity falls in or below twilight zone. The statistical analysis of alignment quality based on Match Index (MI) and computation time demonstrated that MIRAGE-align algorithm outperformed four previously published algorithms, i.e., the residue-based algorithm (CE), the vector-based algorithm (SSM), TM-align, and Fr-TM-align. MIRAGE-align yields a better estimate of initial solution to enhance the quality of initial alignment and enable the employment of a non-iterative optimization process to achieve a better alignment. PMID:22717522

  10. Where Does the Alignment Score Distribution Shape Come from?

    PubMed Central

    Ortet, Philippe; Bastien, Olivier

    2010-01-01

    Alignment algorithms are powerful tools for searching for homologous proteins in databases, providing a score for each sequence present in the database. It has been well known for 20 years that the shape of the score distribution looks like an extreme value distribution. The extremely large number of times biologists face this class of distributions raises the question of the evolutionary origin of this probability law. We investigated the possibility of deriving the main properties of sequence alignment score distributions from a basic evolutionary process: a duplication-divergence protein evolution process in a sequence space. Firstly, the distribution of sequences in this space was defined with respect to the genetic distance between sequences. Secondly, we derived a basic relation between the genetic distance and the alignment score. We obtained a novel score probability distribution which is qualitatively very similar to that of Karlin-Altschul but performing better than all other previous model. PMID:21258650

  11. HotJava: Sun's Animated Interactive World Wide Web Browser for the Internet.

    ERIC Educational Resources Information Center

    Machovec, George S., Ed.

    1995-01-01

    Examines HotJava and Java, World Wide Web technology for use on the Internet. HotJava, an interactive, animated Web browser, based on the object-oriented Java programming language, is different from HTML-based browsers such as Netscape. Its client/server design does not understand Internet protocols but can dynamically find what it needs to know.…

  12. Spilling the beans on java 3D: a tool for the virtual anatomist.

    PubMed

    Guttmann, G D

    1999-04-15

    The computing world has just provided the anatomist with another tool: Java 3D, within the Java 2 platform. On December 9, 1998, Sun Microsystems released Java 2. Java 3D classes are now included in the jar (Java Archive) archives of the extensions directory of Java 2. Java 3D is also a part of the Java Media Suite of APIs (Application Programming Interfaces). But what is Java? How does Java 3D work? How do you view Java 3D objects? A brief introduction to the concepts of Java and object-oriented programming is provided. Also, there is a short description of the tools of Java 3D and of the Java 3D viewer. Thus, the virtual anatomist has another set of computer tools to use for modeling various aspects of anatomy, such as embryological development. Also, the virtual anatomist will be able to assist the surgeon with virtual surgery using the tools found in Java 3D. Java 3D will be able to fulfill gaps, such as the lack of platform independence, interactivity, and manipulability of 3D images, currently existing in many anatomical computer-aided learning programs. PMID:10321435

  13. Sequence Maneuverer: tool for sequence extraction from genomes

    PubMed Central

    Yasmin, Tayyaba; Rehman, Inayat Ur; Ansari, Adnan Ahmad; liaqat, Khurrum; khan, Muhammad Irfan

    2012-01-01

    The availability of genomic sequences of many organisms has opened new challenges in many aspects particularly in terms of genome analysis. Sequence extraction is a vital step and many tools have been developed to solve this issue. These tools are available publically but have limitations with reference to the sequence extraction, length of the sequence to be extracted, organism specificity and lack of user friendly interface. We have developed a java based software package having three modules which can be used independently or sequentially. The tool efficiently extracts sequences from large datasets with few simple steps. It can efficiently extract multiple sequences of any desired length from a genome of any organism. The results are crosschecked by published data. Availability URL 1: http://ww3.comsats.edu.pk/bio/ResearchProjects.aspx URL 2: http://ww3.comsats.edu.pk/bio/SequenceManeuverer.aspx PMID:23275734

  14. BBMap: A Fast, Accurate, Splice-Aware Aligner

    SciTech Connect

    Bushnell, Brian

    2014-03-17

    Alignment of reads is one of the primary computational tasks in bioinformatics. Of paramount importance to resequencing, alignment is also crucial to other areas - quality control, scaffolding, string-graph assembly, homology detection, assembly evaluation, error-correction, expression quantification, and even as a tool to evaluate other tools. An optimal aligner would greatly improve virtually any sequencing process, but optimal alignment is prohibitively expensive for gigabases of data. Here, we will present BBMap [1], a fast splice-aware aligner for short and long reads. We will demonstrate that BBMap has superior speed, sensitivity, and specificity to alternative high-throughput aligners bowtie2 [2], bwa [3], smalt, [4] GSNAP [5], and BLASR [6].

  15. Mortality of domesticated java deer attributed to Surra.

    PubMed

    Nurulaini, R; Jamnah, O; Adnan, M; Zaini, C M; Khadijah, S; Rafiah, A; Chandrawathani, P

    2007-12-01

    This paper reports an outbreak of trypanosomiasis due to Trypanosoma evansi in Java deer (Cervus timorensis) on a government deer farm in Lenggong, Perak. Seventeen adult female Java deer were found dead within a week. Symptoms of dullness, inappetence, anaemia, anorexia, respiratory distress and recumbency were seen prior to death in the infected Java deer. Beside trypanosomiasis, other parasitic infections such as theileriosis, helminthiasis and ectoparasite infestation were also recorded. Post mortem results showed generalized anaemia in most animals with isolated cases of jaundice. There was no significant finding with respect to bacteriological and viral investigations. PMID:18209710

  16. Prototyping Faithful Execution in a Java virtual machine.

    SciTech Connect

    Tarman, Thomas David; Campbell, Philip LaRoche; Pierson, Lyndon George

    2003-09-01

    This report presents the implementation of a stateless scheme for Faithful Execution, the design for which is presented in a companion report, ''Principles of Faithful Execution in the Implementation of Trusted Objects'' (SAND 2003-2328). We added a simple cryptographic capability to an already simplified class loader and its associated Java Virtual Machine (JVM) to provide a byte-level implementation of Faithful Execution. The extended class loader and JVM we refer to collectively as the Sandia Faithfully Executing Java architecture (or JavaFE for short). This prototype is intended to enable exploration of more sophisticated techniques which we intend to implement in hardware.

  17. FMIT alignment cart

    SciTech Connect

    Potter, R.C.; Dauelsberg, L.B.; Clark, D.C.; Grieggs, R.J.

    1981-01-01

    The Fusion Materials Irradiation Test (FMIT) Facility alignment cart must perform several functions. It must serve as a fixture to receive the drift-tube girder assembly when it is removed from the linac tank. It must transport the girder assembly from the linac vault to the area where alignment or disassembly is to take place. It must serve as a disassembly fixture to hold the girder while individual drift tubes are removed for repair. It must align the drift tube bores in a straight line parallel to the girder, using an optical system. These functions must be performed without violating any clearances found within the building. The bore tubes of the drift tubes will be irradiated, and shielding will be included in the system for easier maintenance.

  18. Barrel alignment fixture

    NASA Astrophysics Data System (ADS)

    Sheeley, J. D.

    1981-04-01

    Fabrication of slapper type detonator cables requires bonding of a thin barrel over a bridge. Location of the barrel hole with respect to the bridge is critical: the barrel hole must be centered over the bridge uniform spacing on each side. An alignment fixture which permits rapid adjustment of the barrel position with respect to the bridge is described. The barrel is manipulated by pincer-type fingers which are mounted on a small x-y table equipped with micrometer adjustments. Barrel positioning, performed under a binocular microscopy, is rapid and accurate. After alignment, the microscope is moved out of position and an infrared (IR) heat source is aimed at the barrel. A 5-second pulse of infrared heat flows the adhesive under the barrel and bonds it to the cable. Sapphire and Fotoform glass barrels were bonded successfully with the alignment fixture.

  19. Improved docking alignment system

    NASA Technical Reports Server (NTRS)

    Monford, Leo G. (Inventor)

    1988-01-01

    Improved techniques are provided for the alignment of two objects. The present invention is particularly suited for 3-D translation and 3-D rotational alignment of objects in outer space. A camera is affixed to one object, such as a remote manipulator arm of the spacecraft, while the planar reflective surface is affixed to the other object, such as a grapple fixture. A monitor displays in real-time images from the camera such that the monitor displays both the reflected image of the camera and visible marking on the planar reflective surface when the objects are in proper alignment. The monitor may thus be viewed by the operator and the arm manipulated so that the reflective surface is perpendicular to the optical axis of the camera, the roll of the reflective surface is at a selected angle with respect to the camera, and the camera is spaced a pre-selected distance from the reflective surface.

  20. Optics Alignment Panel

    NASA Technical Reports Server (NTRS)

    Schroeder, Daniel J.

    1992-01-01

    The Optics Alignment Panel (OAP) was commissioned by the HST Science Working Group to determine the optimum alignment of the OTA optics. The goal was to find the position of the secondary mirror (SM) for which there is no coma or astigmatism in the camera images due to misaligned optics, either tilt or decenter. The despace position was reviewed of the SM and the optimum focus was sought. The results of these efforts are as follows: (1) the best estimate of the aligned position of the SM in the notation of HDOS is (DZ,DY,TZ,TY) = (+248 microns, +8 microns, +53 arcsec, -79 arcsec), and (2) the best focus, defined to be that despace which maximizes the fractional energy at 486 nm in a 0.1 arcsec radius of a stellar image, is 12.2 mm beyond paraxial focus. The data leading to these conclusions, and the estimated uncertainties in the final results, are presented.

  1. MUSE optical alignment procedure

    NASA Astrophysics Data System (ADS)

    Laurent, Florence; Renault, Edgard; Loupias, Magali; Kosmalski, Johan; Anwand, Heiko; Bacon, Roland; Boudon, Didier; Caillier, Patrick; Daguisé, Eric; Dubois, Jean-Pierre; Dupuy, Christophe; Kelz, Andreas; Lizon, Jean-Louis; Nicklas, Harald; Parès, Laurent; Remillieux, Alban; Seifert, Walter; Valentin, Hervé; Xu, Wenli

    2012-09-01

    MUSE (Multi Unit Spectroscopic Explorer) is a second generation VLT integral field spectrograph (1x1arcmin² Field of View) developed for the European Southern Observatory (ESO), operating in the visible wavelength range (0.465-0.93 μm). A consortium of seven institutes is currently assembling and testing MUSE in the Integration Hall of the Observatoire de Lyon for the Preliminary Acceptance in Europe, scheduled for 2013. MUSE is composed of several subsystems which are under the responsibility of each institute. The Fore Optics derotates and anamorphoses the image at the focal plane. A Splitting and Relay Optics feed the 24 identical Integral Field Units (IFU), that are mounted within a large monolithic instrument mechanical structure. Each IFU incorporates an image slicer, a fully refractive spectrograph with VPH-grating and a detector system connected to a global vacuum and cryogenic system. During 2011, all MUSE subsystems were integrated, aligned and tested independently in each institute. After validations, the systems were shipped to the P.I. institute at Lyon and were assembled in the Integration Hall This paper describes the end-to-end optical alignment procedure of the MUSE instrument. The design strategy, mixing an optical alignment by manufacturing (plug and play approach) and few adjustments on key components, is presented. We depict the alignment method for identifying the optical axis using several references located in pupil and image planes. All tools required to perform the global alignment between each subsystem are described. The success of this alignment approach is demonstrated by the good results for the MUSE image quality. MUSE commissioning at the VLT (Very Large Telescope) is planned for 2013.

  2. Orientation and Alignment Echoes

    NASA Astrophysics Data System (ADS)

    Karras, G.; Hertz, E.; Billard, F.; Lavorel, B.; Hartmann, J.-M.; Faucher, O.; Gershnabel, Erez; Prior, Yehiam; Averbukh, Ilya Sh.

    2015-04-01

    We present one of the simplest classical systems featuring the echo phenomenon—a collection of randomly oriented free rotors with dispersed rotational velocities. Following excitation by a pair of time-delayed impulsive kicks, the mean orientation or alignment of the ensemble exhibits multiple echoes and fractional echoes. We elucidate the mechanism of the echo formation by the kick-induced filamentation of phase space, and provide the first experimental demonstration of classical alignment echoes in a thermal gas of CO2 molecules excited by a pair of femtosecond laser pulses.

  3. Segment alignment control system

    NASA Technical Reports Server (NTRS)

    Aubrun, JEAN-N.; Lorell, Ken R.

    1988-01-01

    The segmented primary mirror for the LDR will require a special segment alignment control system to precisely control the orientation of each of the segments so that the resulting composite reflector behaves like a monolith. The W.M. Keck Ten Meter Telescope will utilize a primary mirror made up of 36 actively controlled segments. Thus the primary mirror and its segment alignment control system are directly analogous to the LDR. The problems of controlling the segments in the face of disturbances and control/structures interaction, as analyzed for the TMT, are virtually identical to those for the LDR. The two systems are briefly compared.

  4. JavaProtein Dossier: a novel web-based data visualization tool for comprehensive analysis of protein structure

    PubMed Central

    Neshich, Goran; Rocchia, Walter; Mancini, Adauto L.; Yamagishi, Michel E. B.; Kuser, Paula R.; Fileto, Renato; Baudet, Christian; Pinto, Ivan P.; Montagner, Arnaldo J.; Palandrani, Juliana F.; Krauchenco, Joao N.; Torres, Renato C.; Souza, Savio; Togawa, Roberto C.; Higa, Roberto H.

    2004-01-01

    JavaProtein Dossier (JPD) is a new concept, database and visualization tool providing one of the largest collections of the physicochemical parameters describing proteins' structure, stability, function and interaction with other macromolecules. By collecting as many descriptors/parameters as possible within a single database, we can achieve a better use of the available data and information. Furthermore, data grouping allows us to generate different parameters with the potential to provide new insights into the sequence–structure–function relationship. In JPD, residue selection can be performed according to multiple criteria. JPD can simultaneously display and analyze all the physicochemical parameters of any pair of structures, using precalculated structural alignments, allowing direct parameter comparison at corresponding amino acid positions among homologous structures. In order to focus on the physicochemical (and consequently pharmacological) profile of proteins, visualization tools (showing the structure and structural parameters) also had to be optimized. Our response to this challenge was the use of Java technology with its exceptional level of interactivity. JPD is freely accessible (within the Gold Sting Suite) at http://sms.cbi.cnptia.embrapa.br, http://mirrors.rcsb.org/SMS, http://trantor.bioc.columbia.edu/SMS and http://www.es.embnet.org/SMS/ (Option: JavaProtein Dossier). PMID:15215458

  5. Proteins comparison through probabilistic optimal structure local alignment

    PubMed Central

    Micale, Giovanni; Pulvirenti, Alfredo; Giugno, Rosalba; Ferro, Alfredo

    2014-01-01

    Multiple local structure comparison helps to identify common structural motifs or conserved binding sites in 3D structures in distantly related proteins. Since there is no best way to compare structures and evaluate the alignment, a wide variety of techniques and different similarity scoring schemes have been proposed. Existing algorithms usually compute the best superposition of two structures or attempt to solve it as an optimization problem in a simpler setting (e.g., considering contact maps or distance matrices). Here, we present PROPOSAL (PROteins comparison through Probabilistic Optimal Structure local ALignment), a stochastic algorithm based on iterative sampling for multiple local alignment of protein structures. Our method can efficiently find conserved motifs across a set of protein structures. Only the distances between all pairs of residues in the structures are computed. To show the accuracy and the effectiveness of PROPOSAL we tested it on a few families of protein structures. We also compared PROPOSAL with two state-of-the-art tools for pairwise local alignment on a dataset of manually annotated motifs. PROPOSAL is available as a Java 2D standalone application or a command line program at http://ferrolab.dmi.unict.it/proposal/proposal.html. PMID:25228906

  6. Multiple structure alignment with msTALI

    PubMed Central

    2012-01-01

    Background Multiple structure alignments have received increasing attention in recent years as an alternative to multiple sequence alignments. Although multiple structure alignment algorithms can potentially be applied to a number of problems, they have primarily been used for protein core identification. A method that is capable of solving a variety of problems using structure comparison is still absent. Here we introduce a program msTALI for aligning multiple protein structures. Our algorithm uses several informative features to guide its alignments: torsion angles, backbone Cα atom positions, secondary structure, residue type, surface accessibility, and properties of nearby atoms. The algorithm allows the user to weight the types of information used to generate the alignment, which expands its utility to a wide variety of problems. Results msTALI exhibits competitive results on 824 families from the Homstrad and SABmark databases when compared to Matt and Mustang. We also demonstrate success at building a database of protein cores using 341 randomly selected CATH domains and highlight the contribution of msTALI compared to the CATH classifications. Finally, we present an example applying msTALI to the problem of detecting hinges in a protein undergoing rigid-body motion. Conclusions msTALI is an effective algorithm for multiple structure alignment. In addition to its performance on standard comparison databases, it utilizes clear, informative features, allowing further customization for domain-specific applications. The C++ source code for msTALI is available for Linux on the web at http://ifestos.cse.sc.edu/mstali. PMID:22607234

  7. Establishing a framework for comparative analysis of genome sequences

    SciTech Connect

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment The framework integrates the information derived from multiple sequence alignment and phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied on protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.

  8. The Java based control software of the LUCIFER instrument

    NASA Astrophysics Data System (ADS)

    Jütte, Marcus; Polsterer, Kai; Knierim, Volker; Luks, Thomas; Schimmelmann, Jan; Muhlack, Tobias; Mandel, Holger; Lehmitz, Michael

    2006-06-01

    The LUCIFER instrument is a near infrared spectrograph/imager with MOS for the Large Binocular Telescope. Here we present the final software design, the interrelation of the software packages and the used hardware architecture. The software package is completely running under Java using intensively its Remote Method Invocation (RMI) mechanisms in a distributed system environment. The use of Java helped us to cope with a small amount of available manpower for the SW development, providing many native built-in Java methods and classes, which speed up the development process a lot. The control software will be finally installed on a Solaris OS, hosted on a Sun Fire V880 server, which results from a specific hardware constraint. For testing purposes a standard Linux environment is used. This shows another big Java advantage, the platform independency. The "First Light" of LUCIFER 1 is estimated for summer/fall 2007, following LUCIFER 2 one year later.

  9. The CERN PS/SL Controls Java Application Programming Interface

    SciTech Connect

    I. Deloose; J. Cuperus; P. Charrue; F. DiMaio; K. Kostro; M. Vanden Eynden; W. Watson

    1999-10-01

    The PS/SL Convergence Project was launched in March 1998. Its objective is to deliver a common control as infrastructure for the CERN accelerators by year 2001. In the framework of this convergence activity, a project was launched to develop a Java Application Programming Interface (API) between programs written in the Java language and the PS and SL accelerator equipment. This Java API was specified and developed in collaboration with TJNAF. It is based on the Java CDEV [1] package that has been extended in order to end up with a CERN/TJNAF common product. It implements a detailed model composed of devices organized in named classes that provide a property-based interface. It supports data subscription and introspection facilities. The device model is presented and the capabilities of the API are described with syntax examples. The software architecture is also described.

  10. The 17 July 2006 Tsunami earthquake in West Java, Indonesia

    USGS Publications Warehouse

    Mori, J.; Mooney, W.D.; Afnimar; Kurniawan, S.; Anaya, A.I.; Widiyantoro, S.

    2007-01-01

    A tsunami earthquake (Mw = 7.7) occurred south of Java on 17 July 2006. The event produced relatively low levels of high-frequency radiation, and local felt reports indicated only weak shaking in Java. There was no ground motion damage from the earthquake, but there was extensive damage and loss of life from the tsunami along 250 km of the southern coasts of West Java and Central Java. An inspection of the area a few days after the earthquake showed extensive damage to wooden and unreinforced masonry buildings that were located within several hundred meters of the coast. Since there was no tsunami warning system in place, efforts to escape the large waves depended on how people reacted to the earthquake shaking, which was only weakly felt in the coastal areas. This experience emphasizes the need for adequate tsunami warning systems for the Indian Ocean region.

  11. MALIDUP: a database of manually constructed structure alignments for duplicated domain pairs.

    PubMed

    Cheng, Hua; Kim, Bong-Hyun; Grishin, Nick V

    2008-03-01

    We describe MALIDUP (manual alignments of duplicated domains), a database of 241 pairwise structure alignments for homologous domains originated by internal duplication within the same polypeptide chain. Since duplicated domains within a protein frequently diverge in function and thus in sequence, this would be the first database of structurally similar homologs that is not strongly biased by sequence or functional similarity. Our manual alignments in most cases agree with the automatic structural alignments generated by several commonly used programs. This carefully constructed database could be used in studies on protein evolution and as a reference for testing structure alignment programs. The database is available at http://prodata.swmed.edu/malidup. PMID:17932926

  12. Curriculum Alignment: Establishing Coherence

    ERIC Educational Resources Information Center

    Gagné, Philippe; Dumont, Laurence; Brunet, Sabine; Boucher, Geneviève

    2013-01-01

    In this paper, we present a step-by-step guide to implement a curricular alignment project, directed at professional development and student support, and developed in a higher education French as a second language department. We outline best practices and preliminary results from our experience and provide ways to adapt our experience to other…

  13. Aligning brains and minds

    PubMed Central

    Tong, Frank

    2012-01-01

    In this issue of Neuron, Haxby and colleagues describe a new method for aligning functional brain activity patterns across participants. Their study demonstrates that objects are similarly represented across different brains, allowing for reliable classification of one person’s brain activity based on another’s. PMID:22017984

  14. Aligned-or Not?

    ERIC Educational Resources Information Center

    Roseman, Jo Ellen; Koppal, Mary

    2015-01-01

    When state leaders and national partners in the development of the Next Generation Science Standards met to consider implementation strategies, states and school districts wanted to know which materials were aligned to the new standards. The answer from the developers was short but not sweet: You won't find much now, and it's going to…

  15. Optically Aligned Drill Press

    NASA Technical Reports Server (NTRS)

    Adderholdt, Bruce M.

    1994-01-01

    Precise drill press equipped with rotary-indexing microscope. Microscope and drill exchange places when turret rotated. Microscope axis first aligned over future hole, then rotated out of way so drill axis assumes its precise position. New procedure takes less time to locate drilling positions and produces more accurate results. Apparatus adapted to such other machine tools as milling and measuring machines.

  16. A tool for learning the programming style of Java

    NASA Astrophysics Data System (ADS)

    Arai, Masayuki

    2013-03-01

    We developed a tool for learning the programming style of Java. The tool has the following functions: (1) recommends the use of CamelCase and English words for the names of classes, methods, and variables; (2) recommends setting the correct scope level of variables and the appropriate length of variable names; (3) recommends writing comments in source programs; (4) shows sample source programs according to the programming style of Java.

  17. A rapid protein structure alignment algorithm based on a text modeling technique

    PubMed Central

    Razmara, Jafar; Deris, Safaai; Parvizpour, Sepideh

    2011-01-01

    Structural alignment of proteins is widely used in various fields of structural biology. In order to further improve the quality of alignment, we describe an algorithm for structural alignment based on text modelling techniques. The technique firstly superimposes secondary structure elements of two proteins and then, models the 3D-structure of the protein in a sequence of alphabets. These sequences are utilized by a step-by-step sequence alignment procedure to align two protein structures. A benchmark test was organized on a set of 200 non-homologous proteins to evaluate the program and compare it to state of the art programs, e.g. CE, SAL, TM-align and 3D-BLAST. On average, the results of all-against-all structure comparison by the program have a competitive accuracy with CE and TM-align where the algorithm has a high running speed like 3D-BLAST. PMID:21814392

  18. A JAVA User Interface for the Virtual Human

    SciTech Connect

    Easterly, C E; Strickler, D J; Tolliver, J S; Ward, R C

    1999-10-13

    A human simulation environment, the Virtual Human (VH), is under development at the Oak Ridge National Laboratory (ORNL). Virtual Human connects three-dimensional (3D) anatomical models of the body with dynamic physiological models to investigate a wide range of human biological and physical responses to stimuli. We have utilized the Java programming language to develop a flexible user interface to the VH. The Java prototype interface has been designed to display dynamic results from selected physiological models, with user control of the initial model parameters and ability to steer the simulation as it is proceeding. Taking advantage of Java's Remote Method Invocation (RMI) features, the interface runs as a Java client that connects to a Java RMI server process running on a remote server machine. The RMI server can couple to physiological models written in Java, or in other programming languages, including C and FORTRAN. Future versions of the interface will be linked to 3D anatomical models of the human body to complete the development of the VH.

  19. Java Performance for Scientific Applications on LLNL Computer Systems

    SciTech Connect

    Kapfer, C; Wissink, A

    2002-05-10

    Languages in use for high performance computing at the laboratory--Fortran (f77 and f90), C, and C++--have many years of development behind them and are generally considered the fastest available. However, Fortran and C do not readily extend to object-oriented programming models, limiting their capability for very complex simulation software. C++ facilitates object-oriented programming but is a very complex and error-prone language. Java offers a number of capabilities that these other languages do not. For instance it implements cleaner (i.e., easier to use and less prone to errors) object-oriented models than C++. It also offers networking and security as part of the language standard, and cross-platform executables that make it architecture neutral, to name a few. These features have made Java very popular for industrial computing applications. The aim of this paper is to explain the trade-offs in using Java for large-scale scientific applications at LLNL. Despite its advantages, the computational science community has been reluctant to write large-scale computationally intensive applications in Java due to concerns over its poor performance. However, considerable progress has been made over the last several years. The Java Grande Forum [1] has been promoting the use of Java for large-scale computing. Members have introduced efficient array libraries, developed fast just-in-time (JIT) compilers, and built links to existing packages used in high performance parallel computing.

  20. jFuzz: A Concolic Whitebox Fuzzer for Java

    NASA Technical Reports Server (NTRS)

    Jayaraman, Karthick; Harvison, David; Ganesh, Vijay; Kiezun, Adam

    2009-01-01

    We present jFuzz, a automatic testing tool for Java programs. jFuzz is a concolic whitebox fuzzer, built on the NASA Java PathFinder, an explicit-state Java model checker, and a framework for developing reliability and analysis tools for Java. Starting from a seed input, jFuzz automatically and systematically generates inputs that exercise new program paths. jFuzz uses a combination of concrete and symbolic execution, and constraint solving. Time spent on solving constraints can be significant. We implemented several well-known optimizations and name-independent caching, which aggressively normalizes the constraints to reduce the number of calls to the constraint solver. We present preliminary results due to the optimizations, and demonstrate the effectiveness of jFuzz in creating good test inputs. The source code of jFuzz is available as part of the NASA Java PathFinder. jFuzz is intended to be a research testbed for investigating new testing and analysis techniques based on concrete and symbolic execution. The source code of jFuzz is available as part of the NASA Java PathFinder.

  1. Sedimentary deposits study of the 2006 Java tsunami, in Pangandaran, West Java (preliminary result)

    NASA Astrophysics Data System (ADS)

    Maemunah, Imun; Suparka, Emmy; Puspito, Nanang T.; Hidayati, Sri

    2015-04-01

    The 2006 Java Earthquake (Mw 7.2) has generated a tsunami that reached Pangandaran coastal plain with 9.7 m above sea level height of wave. In 2014 we examined the tsunami deposit exposed in shallow trenches along a˜300 m at 5 transect from shoreline to inland on Karapyak and Madasari, Pangandaran. We documented stratigraphically and sedimentologically, the characteristics of Java Tsunami deposit on Karapyak and Madasari and compared both sediments. In local farmland a moderately-sorted, brown soil is buried by a poorly-sorted, grey, medium-grained sand-sheet. The tsunami deposit was distinguished from the underlying soil by a pronounced increase in grain size that becomes finner upwards and landwards. Decreasing concentration of coarse size particles with distance toward inland are in agreement with grain size analysis. The thickest tsunami deposit is about 25 cm found at 84 m from shoreline in Madasari and about 15 cm found at 80 m from shoreline in Karapyak. The thickness of tsunami deposits in some transect become thinner landward but in some other transect lack a consistent suggested strongly affected by local topography. Tsunami deposits at Karapyak and Madasari show many similarities. Both deposits consist of coarse sand that sharply overlies a finer sandy soil. The presence mud drapes and other sedimentary structure like graded bedding, massive beds, mud clasts in many locations shows a dynamics process of tsunami waves. The imbrication coarse and shell fragments of the 2006 Java, tsunami deposits also provide information about the curent direction, allowing us to distinguish run up deposits from backwash deposits.

  2. Sedimentary deposits study of the 2006 Java tsunami, in Pangandaran, West Java (preliminary result)

    SciTech Connect

    Maemunah, Imun; Suparka, Emmy Puspito, Nanang T; Hidayati, Sri

    2015-04-24

    The 2006 Java Earthquake (Mw 7.2) has generated a tsunami that reached Pangandaran coastal plain with 9.7 m above sea level height of wave. In 2014 we examined the tsunami deposit exposed in shallow trenches along a∼300 m at 5 transect from shoreline to inland on Karapyak and Madasari, Pangandaran. We documented stratigraphically and sedimentologically, the characteristics of Java Tsunami deposit on Karapyak and Madasari and compared both sediments. In local farmland a moderately-sorted, brown soil is buried by a poorly-sorted, grey, medium-grained sand-sheet. The tsunami deposit was distinguished from the underlying soil by a pronounced increase in grain size that becomes finner upwards and landwards. Decreasing concentration of coarse size particles with distance toward inland are in agreement with grain size analysis. The thickest tsunami deposit is about 25 cm found at 84 m from shoreline in Madasari and about 15 cm found at 80 m from shoreline in Karapyak. The thickness of tsunami deposits in some transect become thinner landward but in some other transect lack a consistent suggested strongly affected by local topography. Tsunami deposits at Karapyak and Madasari show many similarities. Both deposits consist of coarse sand that sharply overlies a finer sandy soil. The presence mud drapes and other sedimentary structure like graded bedding, massive beds, mud clasts in many locations shows a dynamics process of tsunami waves. The imbrication coarse and shell fragments of the 2006 Java, tsunami deposits also provide information about the curent direction, allowing us to distinguish run up deposits from backwash deposits.

  3. Aligning genomes with inversions and swaps

    SciTech Connect

    Holloway, J.L.; Cull, P.

    1994-12-31

    The decision about what operators to allow and how to charge for these operations when aligning strings that arise in a biological context is the decision about what model of evolution to assume. Frequently the operators used to construct an alignment between biological sequences axe limited to deletion, insertion, or replacement of a character or block of characters, but there is biological evidence for the evolutionary operations of exchanging the positions of two segments in a sequence and the replacement of a segment by its reversed complement. In this paper we describe a family of heuristics designed to compute alignments of biological sequences assuming a model of evolution with swaps and inversions. The heuristics will necessarily be approximate since the appropriate way to charge for the evolutionary events (delete, insert, substitute, swap, and invert) is not known. The paper concludes with a pair-wise comparison of 20 Picornavirus genomes, and a detailed comparison of the hepatitis delta virus with the citrus exocortis viroid.

  4. Molecular characterization and phylogenetic analysis of Fasciola gigantica from western Java, Indonesia.

    PubMed

    Hayashi, Kei; Ichikawa-Seki, Madoka; Allamanda, Puttik; Wibowo, Putut Eko; Mohanta, Uday Kumar; Sodirun; Guswanto, Azirwan; Nishikawa, Yoshifumi

    2016-10-01

    Fasciola gigantica and aspermic (hybrid) Fasciola flukes are thought to be distributed in Southeast Asian countries. The objectives of this study were to investigate the distribution of these flukes from unidentified ruminants in western Java, Indonesia, and to determine their distribution history into the area. Sixty Fasciola flukes from western Java were identified as F. gigantica based on the nucleotide sequences of the nuclear phosphoenolpyruvate carboxykinase (pepck) and DNA polymerase delta (pold) genes. The flukes were then analyzed phylogenetically based on the nucleotide sequence of the mitochondrial NADH dehydrogenase subunit 1 (nad1) gene, together with Fasciola flukes from other Asian countries. All but one F. gigantica fluke were classified in F. gigantica haplogroup C, which mainly contains nad1 haplotypes detected in flukes from Thailand, Vietnam, and China. A population genetic analysis suggested that haplogroup C spread from Thailand to the neighboring countries including Indonesia together with domestic ruminants, such as the swamp buffalo, Bubalus bubalis. The swamp buffalo is one of the important definitive hosts of Fasciola flukes in Indonesia, and is considered to have been domesticated in the north of Thailand. The remaining one fluke displayed a novel nad1 haplotype that has never been detected in the reference countries. Therefore, the origin of the fluke could not be established. No hybrid Fasciola flukes were detected in this study, in contrast to neighboring Asian countries. PMID:27266482

  5. Jeagle: a JAVA Runtime Verification Tool

    NASA Technical Reports Server (NTRS)

    DAmorim, Marcelo; Havelund, Klaus

    2005-01-01

    We introduce the temporal logic Jeagle and its supporting tool for runtime verification of Java programs. A monitor for an Jeagle formula checks if a finite trace of program events satisfies the formula. Jeagle is a programming oriented extension of the rule-based powerful Eagle logic that has been shown to be capable of defining and implementing a range of finite trace monitoring logics, including future and past time temporal logic, real-time and metric temporal logics, interval logics, forms of quantified temporal logics, and so on. Monitoring is achieved on a state-by-state basis avoiding any need to store the input trace. Jeagle extends Eagle with constructs for capturing parameterized program events such as method calls and method returns. Parameters can be the objects that methods are called upon, arguments to methods, and return values. Jeagle allows one to refer to these in formulas. The tool performs automated program instrumentation using AspectJ. We show the transformational semantics of Jeagle.

  6. Org.Lcsim: Event Reconstruction in Java

    SciTech Connect

    Graf, Norman A.; /SLAC

    2012-04-19

    Maximizing the physics performance of detectors being designed for the International Linear Collider, while remaining sensitive to cost constraints, requires a powerful, efficient, and flexible simulation, reconstruction and analysis environment to study the capabilities of a large number of different detector designs. The preparation of Letters Of Intent for the International Linear Collider involved the detailed study of dozens of detector options, layouts and readout technologies; the final physics benchmarking studies required the reconstruction and analysis of hundreds of millions of events. We describe the Java-based software toolkit (org.lcsim) which was used for full event reconstruction and analysis. The components are fully modular and are available for tasks from digitization of tracking detector signals through to cluster finding, pattern recognition, track-fitting, calorimeter clustering, individual particle reconstruction, jet-finding, and analysis. The detector is defined by the same xml input files used for the detector response simulation, ensuring the simulation and reconstruction geometries are always commensurate by construction. We discuss the architecture as well as the performance.

  7. MUSE alignment onto VLT

    NASA Astrophysics Data System (ADS)

    Laurent, Florence; Renault, Edgard; Boudon, Didier; Caillier, Patrick; Daguisé, Eric; Dupuy, Christophe; Jarno, Aurélien; Lizon, Jean-Louis; Migniau, Jean-Emmanuel; Nicklas, Harald; Piqueras, Laure

    2014-07-01

    MUSE (Multi Unit Spectroscopic Explorer) is a second generation Very Large Telescope (VLT) integral field spectrograph developed for the European Southern Observatory (ESO). It combines a 1' x 1' field of view sampled at 0.2 arcsec for its Wide Field Mode (WFM) and a 7.5"x7.5" field of view for its Narrow Field Mode (NFM). Both modes will operate with the improved spatial resolution provided by GALACSI (Ground Atmospheric Layer Adaptive Optics for Spectroscopic Imaging), that will use the VLT deformable secondary mirror and 4 Laser Guide Stars (LGS) foreseen in 2015. MUSE operates in the visible wavelength range (0.465-0.93 μm). A consortium of seven institutes is currently commissioning MUSE in the Very Large Telescope for the Preliminary Acceptance in Chile, scheduled for September, 2014. MUSE is composed of several subsystems which are under the responsibility of each institute. The Fore Optics derotates and anamorphoses the image at the focal plane. A Splitting and Relay Optics feed the 24 identical Integral Field Units (IFU), that are mounted within a large monolithic structure. Each IFU incorporates an image slicer, a fully refractive spectrograph with VPH-grating and a detector system connected to a global vacuum and cryogenic system. During 2012 and 2013, all MUSE subsystems were integrated, aligned and tested to the P.I. institute at Lyon. After successful PAE in September 2013, MUSE instrument was shipped to the Very Large Telescope in Chile where that was aligned and tested in ESO integration hall at Paranal. After, MUSE was directly transported, fully aligned and without any optomechanical dismounting, onto VLT telescope where the first light was overcame the 7th of February, 2014. This paper describes the alignment procedure of the whole MUSE instrument with respect to the Very Large Telescope (VLT). It describes how 6 tons could be move with accuracy better than 0.025mm and less than 0.25 arcmin in order to reach alignment requirements. The success

  8. Aspect-Oriented Subprogram Synthesizes UML Sequence Diagrams

    NASA Technical Reports Server (NTRS)

    Barry, Matthew R.; Osborne, Richard N.

    2006-01-01

    The Rational Sequence computer program described elsewhere includes a subprogram that utilizes the capability for aspect-oriented programming when that capability is present. This subprogram is denoted the Rational Sequence (AspectJ) component because it uses AspectJ, which is an extension of the Java programming language that introduces aspect-oriented programming techniques into the language

  9. A Sequence Based Synteny Map Between Soybean and Arabidopis Thaliana.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In an effort to identify conserved sequences between soybean (Glycine max, L. Merr.) and the model organism Arabidopsis thaliana (thale cress), a series of JAVA-based programs were created that processed and compared 341,619 soybean DNA sequences against A. thaliana chromosomal DNA. A. thaliana DNA ...

  10. Automated whole-genome multiple alignment of rat, mouse, and human

    SciTech Connect

    Brudno, Michael; Poliakov, Alexander; Salamov, Asaf; Cooper, Gregory M.; Sidow, Arend; Rubin, Edward M.; Solovyev, Victor; Batzoglou, Serafim; Dubchak, Inna

    2004-07-04

    We have built a whole genome multiple alignment of the three currently available mammalian genomes using a fully automated pipeline which combines the local/global approach of the Berkeley Genome Pipeline and the LAGAN program. The strategy is based on progressive alignment, and consists of two main steps: (1) alignment of the mouse and rat genomes; and (2) alignment of human to either the mouse-rat alignments from step 1, or the remaining unaligned mouse and rat sequences. The resulting alignments demonstrate high sensitivity, with 87% of all human gene-coding areas aligned in both mouse and rat. The specificity is also high: <7% of the rat contigs are aligned to multiple places in human and 97% of all alignments with human sequence > 100kb agree with a three-way synteny map built independently using predicted exons in the three genomes. At the nucleotide level <1% of the rat nucleotides are mapped to multiple places in the human sequence in the alignment; and 96.5% of human nucleotides within all alignments agree with the synteny map. The alignments are publicly available online, with visualization through the novel Multi-VISTA browser that we also present.

  11. Inflation by alignment

    SciTech Connect

    Burgess, C.P.; Roest, Diederik

    2015-06-08

    Pseudo-Goldstone bosons (pGBs) can provide technically natural inflatons, as has been comparatively well-explored in the simplest axion examples. Although inflationary success requires trans-Planckian decay constants, f≳M{sub p}, several mechanisms have been proposed to obtain this, relying on (mis-)alignments between potential and kinetic energies in multiple-field models. We extend these mechanisms to a broader class of inflationary models, including in particular the exponential potentials that arise for pGB potentials based on noncompact groups (and so which might apply to moduli in an extra-dimensional setting). The resulting potentials provide natural large-field inflationary models and can predict a larger primordial tensor signal than is true for simpler single-field versions of these models. In so doing we provide a unified treatment of several alignment mechanisms, showing how each emerges as a limit of the more general setup.

  12. Alignment reference device

    DOEpatents

    Patton, Gail Y.; Torgerson, Darrel D.

    1987-01-01

    An alignment reference device provides a collimated laser beam that minimizes angular deviations therein. A laser beam source outputs the beam into a single mode optical fiber. The output end of the optical fiber acts as a source of radiant energy and is positioned at the focal point of a lens system where the focal point is positioned within the lens. The output beam reflects off a mirror back to the lens that produces a collimated beam.

  13. Nuclear reactor alignment plate configuration

    DOEpatents

    Altman, David A; Forsyth, David R; Smith, Richard E; Singleton, Norman R

    2014-01-28

    An alignment plate that is attached to a core barrel of a pressurized water reactor and fits within slots within a top plate of a lower core shroud and upper core plate to maintain lateral alignment of the reactor internals. The alignment plate is connected to the core barrel through two vertically-spaced dowel pins that extend from the outside surface of the core barrel through a reinforcement pad and into corresponding holes in the alignment plate. Additionally, threaded fasteners are inserted around the perimeter of the reinforcement pad and into the alignment plate to further secure the alignment plate to the core barrel. A fillet weld also is deposited around the perimeter of the reinforcement pad. To accomodate thermal growth between the alignment plate and the core barrel, a gap is left above, below and at both sides of one of the dowel pins in the alignment plate holes through with the dowel pins pass.

  14. SPA: A Probabilistic Algorithm for Spliced Alignment

    PubMed Central

    van Nimwegen, Erik; Paul, Nicodeme; Sheridan, Robert; Zavolan, Mihaela

    2006-01-01

    Recent large-scale cDNA sequencing efforts show that elaborate patterns of splice variation are responsible for much of the proteome diversity in higher eukaryotes. To obtain an accurate account of the repertoire of splice variants, and to gain insight into the mechanisms of alternative splicing, it is essential that cDNAs are very accurately mapped to their respective genomes. Currently available algorithms for cDNA-to-genome alignment do not reach the necessary level of accuracy because they use ad hoc scoring models that cannot correctly trade off the likelihoods of various sequencing errors against the probabilities of different gene structures. Here we develop a Bayesian probabilistic approach to cDNA-to-genome alignment. Gene structures are assigned prior probabilities based on the lengths of their introns and exons, and based on the sequences at their splice boundaries. A likelihood model for sequencing errors takes into account the rates at which misincorporation, as well as insertions and deletions of different lengths, occurs during sequencing. The parameters of both the prior and likelihood model can be automatically estimated from a set of cDNAs, thus enabling our method to adapt itself to different organisms and experimental procedures. We implemented our method in a fast cDNA-to-genome alignment program, SPA, and applied it to the FANTOM3 dataset of over 100,000 full-length mouse cDNAs and a dataset of over 20,000 full-length human cDNAs. Comparison with the results of four other mapping programs shows that SPA produces alignments of significantly higher quality. In particular, the quality of the SPA alignments near splice boundaries and SPA's mapping of the 5′ and 3′ ends of the cDNAs are highly improved, allowing for more accurate identification of transcript starts and ends, and accurate identification of subtle splice variations. Finally, our splice boundary analysis on the human dataset suggests the existence of a novel non-canonical splice

  15. Dynamic Alignment at SLS

    SciTech Connect

    Ruland, Robert E.

    2003-04-23

    The relative alignment of components in the storage ring of the Swiss Light Source (SLS) is guaranteed by mechanical means. The magnets are rigidly fixed to 48 girders by means of alignment rails with tolerances of less than {+-}15 {micro}m. The bending magnets, supported by 3 point ball bearings, overlap adjacent girders and thus establish virtual train links between the girders, located near the bending magnet centres. Keeping the distortion of the storage ring geometry within a tolerance of {+-}100 {micro}m in order to guarantee sufficient dynamic apertures, requires continuous monitoring and correction of the girder locations. Two monitoring systems for the horizontal and the vertical direction will be installed to measure displacements of the train link between girders, which are due to ground settings and temperature effects: The hydrostatic levelling system (HLS) gives an absolute vertical reference, while the horizontal positioning system (HPS), which employs low cost linear encoders with sub-micron resolution, measures relative horizontal movements. The girder mover system based on five DC motors per girder allows a dynamic realignment of the storage ring within a working window of more than {+-}1 mm for girder translations and {+-}1 mrad for rotations. We will describe both monitoring systems (HLS and HPS) as well as the applied correction scheme based on the girder movers. We also show simulations indicating that beam based girder alignment takes care of most of the static closed orbit correction.

  16. Docking alignment system

    NASA Technical Reports Server (NTRS)

    Monford, Leo G. (Inventor)

    1990-01-01

    Improved techniques are provided for alignment of two objects. The present invention is particularly suited for three-dimensional translation and three-dimensional rotational alignment of objects in outer space. A camera 18 is fixedly mounted to one object, such as a remote manipulator arm 10 of the spacecraft, while the planar reflective surface 30 is fixed to the other object, such as a grapple fixture 20. A monitor 50 displays in real-time images from the camera, such that the monitor displays both the reflected image of the camera and visible markings on the planar reflective surface when the objects are in proper alignment. The monitor may thus be viewed by the operator and the arm 10 manipulated so that the reflective surface is perpendicular to the optical axis of the camera, the roll of the reflective surface is at a selected angle with respect to the camera, and the camera is spaced a pre-selected distance from the reflective surface.

  17. ASH structure alignment package: Sensitivity and selectivity in domain classification

    PubMed Central

    Standley, Daron M; Toh, Hiroyuki; Nakamura, Haruki

    2007-01-01

    Background Structure alignment methods offer the possibility of measuring distant evolutionary relationships between proteins that are not visible by sequence-based analysis. However, the question of how structural differences and similarities ought to be quantified in this regard remains open. In this study we construct a training set of sequence-unique CATH and SCOP domains, from which we develop a scoring function that can reliably identify domains with the same CATH topology and SCOP fold classification. The score is implemented in the ASH structure alignment package, for which the source code and a web service are freely available from the PDBj website . Results The new ASH score shows increased selectivity and sensitivity compared with values reported for several popular programs using the same test set of 4,298,905 structure pairs, yielding an area of .96 under the receiver operating characteristic (ROC) curve. In addition, weak sequence homologies between similar domains are revealed that could not be detected by BLAST sequence alignment. Also, a subset of domain pairs is identified that exhibit high similarity, even though their CATH and SCOP classification differs. Finally, we show that the ranking of alignment programs based solely on geometric measures depends on the choice of the quality measure. Conclusion ASH shows high selectivity and sensitivity with regard to domain classification, an important step in defining distantly related protein sequence families. Moreover, the CPU cost per alignment is competitive with the fastest programs, making ASH a practical option for large-scale structure classification studies. PMID:17407606

  18. COS to FGS Alignment {NUV}

    NASA Astrophysics Data System (ADS)

    Hartig, George

    2009-07-01

    DESCRIPTION: In order to determine the location of the COS reference frame with respect to the FGS reference frames, NUV MIRRORA images will be obtained of an astrometric target and field. Astrometric guide stars and targets must be employed for this activity in order to facilitate the alignment wth the FGS. Images will be obtained at the initial pointing and at positions offset in V2 and in V3. Starting with the original blind pointing, obtain MIRRORA image exposures in a 5x5 POS-TARG grid centered on initial pointing; repeat the image sequence at two bracketing focus positions in same visit. Following completion of third pattern, return to nominal focus and perform 5x5 ACQ/SEARCH target acquisition and obtain one TIME-TAG MIRRORA image and one ACCUM verification exposure. Next perform an ACQ/IMAGE target acquisition followed by an ACCUM verification exposure. Also obtain ACCUM verification exposure for each of the two alternate focus positions used previously. Using MIRRORB obtain ACCUM confirmation image at nominal focus and ACCUM images at alternate focus positions and then perform an ACQ/IMAGE and confirming image at nominal focus. Analyze imagery, uplink pointing offset as offset 11469A and adjust nominal focus via patchable constant uplinked with subsequent visit of this program; update aperture locations via modified SIAF file uplinked with subsequent SMS. Use updated focus and offset pointing as input for COS 09 {program 11469 - NUV Optics Alignment and Focus} {note the SIAF update is not a prerequisite for COS 09 to proceed, but the pointing offset and focus update are}.

  19. Sequence logos: a new way to display consensus sequences.

    PubMed Central

    Schneider, T D; Stephens, R M

    1990-01-01

    A graphical method is presented for displaying the patterns in a set of aligned sequences. The characters representing the sequence are stacked on top of each other for each position in the aligned sequences. The height of each letter is made proportional to its frequency, and the letters are sorted so the most common one is on top. The height of the entire stack is then adjusted to signify the information content of the sequences at that position. From these 'sequence logos', one can determine not only the consensus sequence but also the relative frequency of bases and the information content (measured in bits) at every position in a site or sequence. The logo displays both significant residues and subtle sequence patterns. PMID:2172928

  20. SWAMP+: multiple subsequence alignment using associative massive parallelism

    SciTech Connect

    Steinfadt, Shannon Irene; Baker, Johnnie W

    2010-10-18

    A new parallel algorithm SWAMP+ incorporates the Smith-Waterman sequence alignment on an associative parallel model known as ASC. It is a highly sensitive parallel approach that expands traditional pairwise sequence alignment. This is the first parallel algorithm to provide multiple non-overlapping, non-intersecting subsequence alignments with the accuracy of Smith-Waterman. The efficient algorithm provides multiple alignments similar to BLAST while creating a better workflow for the end users. The parallel portions of the code run in O(m+n) time using m processors. When m = n, the algorithmic analysis becomes O(n) with a coefficient of two, yielding a linear speedup. Implementation of the algorithm on the SIMD ClearSpeed CSX620 confirms this theoretical linear speedup with real timings.

  1. Establishing homologies in protein sequences

    NASA Technical Reports Server (NTRS)

    Dayhoff, M. O.; Barker, W. C.; Hunt, L. T.

    1983-01-01

    Computer-based statistical techniques used to determine homologies between proteins occurring in different species are reviewed. The technique is based on comparison of two protein sequences, either by relating all segments of a given length in one sequence to all segments of the second or by finding the best alignment of the two sequences. Approaches discussed include selection using printed tabulations, identification of very similar sequences, and computer searches of a database. The use of the SEARCH, RELATE, and ALIGN programs (Dayhoff, 1979) is explained; sample data are presented in graphs, diagrams, and tables and the construction of scoring matrices is considered.

  2. Polar cap arcs: Sun-aligned or cusp-aligned?

    NASA Astrophysics Data System (ADS)

    Zhang, Y.; Paxton, L. J.; Zhang, Qinghe; Xing, Zanyang

    2016-08-01

    Polar cap arcs are often called sun-aligned arcs. Satellite observations reveal that polar cap arcs join together at the cusp and are actually cusp aligned. Strong ionospheric plasma velocity shears, thus field aligned currents, were associated with polar arcs and they were likely caused by Kelvin-Helmholtz waves around the low-latitude magnetopause under a northward IMF Bz. The magnetic field lines around the magnetopause join together in the cusp region so are the field aligned currents and particle precipitation. This explains why polar arcs are cusp aligned.

  3. Alignment and alignment transition of bent core nematics

    NASA Astrophysics Data System (ADS)

    Elamain, Omaima; Hegde, Gurumurthy; Komitov, Lachezar

    2013-07-01

    We report on the alignment of nematics consisting of bimesogen bent core molecules of chlorine substituent of benzene derivative and their binary mixture with rod like nematics. It was found that the alignment layer made from polyimide material, which is usually used for promoting vertical (homeotropic) alignment of rod like nematics, promotes instead a planar alignment of the bent core nematic and its nematic mixtures. At higher concentration of the rod like nematic component in these mixtures, a temperature driven transition from vertical to planar alignment was found near the transition to isotropic phase.

  4. Alignment as a Teacher Variable

    ERIC Educational Resources Information Center

    Porter, Andrew C.; Smithson, John; Blank, Rolf; Zeidner, Timothy

    2007-01-01

    With the exception of the procedures developed by Porter and colleagues (Porter, 2002), other methods of defining and measuring alignment are essentially limited to alignment between tests and standards. Porter's procedures have been generalized to investigating the alignment between content standards, tests, textbooks, and even classroom…

  5. QuRe: software for viral quasispecies reconstruction from next-generation sequencing data

    PubMed Central

    Prosperi, Mattia C. F.; Salemi, Marco

    2012-01-01

    Summary: Next-generation sequencing (NGS) is an ideal framework for the characterization of highly variable pathogens, with a deep resolution able to capture minority variants. However, the reconstruction of all variants of a viral population infecting a host is a challenging task for genome regions larger than the average NGS read length. QuRe is a program for viral quasispecies reconstruction, specifically developed to analyze long read (>100 bp) NGS data. The software performs alignments of sequence fragments against a reference genome, finds an optimal division of the genome into sliding windows based on coverage and diversity and attempts to reconstruct all the individual sequences of the viral quasispecies—along with their prevalence—using a heuristic algorithm, which matches multinomial distributions of distinct viral variants overlapping across the genome division. QuRe comes with a built-in Poisson error correction method and a post-reconstruction probabilistic clustering, both parameterized on given error rates in homopolymeric and non-homopolymeric regions. Availability: QuRe is platform-independent, multi-threaded software implemented in Java. It is distributed under the GNU General Public License, available at https://sourceforge.net/projects/qure/. Contact: ahnven@yahoo.it; ahnven@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22088846

  6. Recombination-aware alignment of diploid individuals

    PubMed Central

    2014-01-01

    Background Traditionally biological similarity search has been studied under the abstraction of a single string to represent each genome. The more realistic representation of diploid genomes, with two strings defining the genome, has so far been largely omitted in this context. With the development of sequencing techniques and better phasing routines through haplotype assembly algorithms, we are not far from the situation when individual diploid genomes could be represented in their full complexity with a pair-wise alignment defining the genome. Results We propose a generalization of global alignment that is designed to measure similarity between phased predictions of individual diploid genomes. This generalization takes into account that individual diploid genomes evolve through a mutation and recombination process, and that predictions may be erroneous in both dimensions. Even though our model is generic, we focus on the case where one wants to measure only the similarity of genome content allowing free recombination. This results into efficient algorithms for direct application in (i) evaluation of variation calling predictions and (ii) progressive multiple alignments based on labeled directed acyclic graphs (DAGs) to represent profiles. The latter may be of more general interest, in connection to covering alignment of DAGs. Extensions of our model and algorithms can be foreseen to have applications in evaluating phasing algorithms, as well as more fundamental role in phasing child genome based on parent genomes. PMID:25572943

  7. Creating Web-Based Scientific Applications Using Java Servlets

    NASA Technical Reports Server (NTRS)

    Palmer, Grant; Arnold, James O. (Technical Monitor)

    2001-01-01

    There are many advantages to developing web-based scientific applications. Any number of people can access the application concurrently. The application can be accessed from a remote location. The application becomes essentially platform-independent because it can be run from any machine that has internet access and can run a web browser. Maintenance and upgrades to the application are simplified since only one copy of the application exists in a centralized location. This paper details the creation of web-based applications using Java servlets. Java is a powerful, versatile programming language that is well suited to developing web-based programs. A Java servlet provides the interface between the central server and the remote client machines. The servlet accepts input data from the client, runs the application on the server, and sends the output back to the client machine. The type of servlet that supports the HTTP protocol will be discussed in depth. Among the topics the paper will discuss are how to write an http servlet, how the servlet can run applications written in Java and other languages, and how to set up a Java web server. The entire process will be demonstrated by building a web-based application to compute stagnation point heat transfer.

  8. First geodetic measurement of convergence across the Java Trench

    NASA Technical Reports Server (NTRS)

    Tregoning, P.; Brunner, F. K.; Bock, Y.; Puntodewo, S. S. O.; Mccraffrey, R.; Genrich, J. F.; Calais, E.; Rais, J.; Subarya, C.

    1994-01-01

    Convergence across the Java Trench has been estimated for the first time, from annual Global Positioning System (GPS) measurements commencing in 1989. The directions of motion of Christmas and Cocos Island are within 1 deg of that predicted by the No-Net Rotation (NNR) NUVEL-1 plate motion model for the Australian plate although their rates are 25% and 37% less than predcited, respectively. The motion of West Java differs significantly from the NNR NUVEL-1 prediction for the Eurasian plate with a 1 deg difference in direction and a 40% increase in rate. We infer that either West Java moves with a distinct Southeast Asian plate or this region experiences plate margin deformation. The convergence of Christmas Island with respect to West Java is 67 +/- mm/yr in a direction N11 deg E +/- 4 deg which is orthogonal to the trench. The magnitude of convergence agrees well with rescaled NUVEL-1 relative plate model which predicts a value of 71 mm/yr between Australia and Eurasia. The direction of motion matches the direction inferred from earthquake slip vectors at the trench but may be more northerly than the N20 deg E +/- 3 deg predicted by NUVEL-1. On June 2, 1994, almost a year after the last GPS survey, an M(sub W) = 7.5 earthquake with slip vector direction N5 deg occurred south of central Java.

  9. Solar Alignments - Identification and Analysis

    NASA Astrophysics Data System (ADS)

    Belmonte, Juan Antonio

    The sun was such an important divinity in antiquity, and even today, that solar alignments should be expected within a large variety of places and cultures. These are probably the most conspicuous kind of astronomical alignments a field researcher can deal with. The need for a correct identification is thus evident. The different kind of solar phenomena susceptible of being determined by astronomical alignments will be scrutinized, following by the way in which such alignments can materialize in space. It will be shown that analyzing solar alignments is not always an easy task.

  10. Towards Alignment Independent Quantitative Assessment of Homology Detection

    PubMed Central

    Kliger, Yossef

    2006-01-01

    Identification of homologous proteins provides a basis for protein annotation. Sequence alignment tools reliably identify homologs sharing high sequence similarity. However, identification of homologs that share low sequence similarity remains a challenge. Lowering the cutoff value could enable the identification of diverged homologs, but also introduces numerous false hits. Methods are being continuously developed to minimize this problem. Estimation of the fraction of homologs in a set of protein alignments can help in the assessment and development of such methods, and provides the users with intuitive quantitative assessment of protein alignment results. Herein, we present a computational approach that estimates the amount of homologs in a set of protein pairs. The method requires a prevalent and detectable protein feature that is conserved between homologs. By analyzing the feature prevalence in a set of pairwise protein alignments, the method can estimate the number of homolog pairs in the set independently of the alignments' quality. Using the HomoloGene database as a standard of truth, we implemented this approach in a proteome-wide analysis. The results revealed that this approach, which is independent of the alignments themselves, works well for estimating the number of homologous proteins in a wide range of homology values. In summary, the presented method can accompany homology searches and method development, provides validation to search results, and allows tuning of tools and methods. PMID:17205117

  11. Evaluation of microRNA alignment techniques.

    PubMed

    Ziemann, Mark; Kaspi, Antony; El-Osta, Assam

    2016-08-01

    Genomic alignment of small RNA (smRNA) sequences such as microRNAs poses considerable challenges due to their short length (∼21 nucleotides [nt]) as well as the large size and complexity of plant and animal genomes. While several tools have been developed for high-throughput mapping of longer mRNA-seq reads (>30 nt), there are few that are specifically designed for mapping of smRNA reads including microRNAs. The accuracy of these mappers has not been systematically determined in the case of smRNA-seq. In addition, it is unknown whether these aligners accurately map smRNA reads containing sequence errors and polymorphisms. By using simulated read sets, we determine the alignment sensitivity and accuracy of 16 short-read mappers and quantify their robustness to mismatches, indels, and nontemplated nucleotide additions. These were explored in the context of a plant genome (Oryza sativa, ∼500 Mbp) and a mammalian genome (Homo sapiens, ∼3.1 Gbp). Analysis of simulated and real smRNA-seq data demonstrates that mapper selection impacts differential expression results and interpretation. These results will inform on best practice for smRNA mapping and enable more accurate smRNA detection and quantification of expression and RNA editing. PMID:27284164

  12. TSGC and JSC Alignment

    NASA Technical Reports Server (NTRS)

    Sanchez, Humberto

    2013-01-01

    NASA and the SGCs are, by design, intended to work closely together and have synergistic Vision, Mission, and Goals. The TSGC affiliates and JSC have been working together, but not always in a concise, coordinated, nor strategic manner. Today we have a couple of simple ideas to present about how TSGC and JSC have started to work together in a more concise, coordinated, and strategic manner, and how JSC and non-TSG Jurisdiction members have started to collaborate: Idea I: TSGC and JSC Technical Alignment Idea II: Concept of Clusters.

  13. Optimized Java Binary and Virtual Machine for Tiny Motes

    NASA Astrophysics Data System (ADS)

    Aslam, Faisal; Fennell, Luminous; Schindelhauer, Christian; Thiemann, Peter; Ernst, Gidon; Haussmann, Elmar; Rührup, Stefan; Uzmi, Zastash A.

    We have developed TakaTuka, a Java Virtual Machine optimized for tiny embedded devices such as wireless sensor motes. TakaTuka requires very little memory and processing power from the host device. This has been verified by successfully running TakaTuka on four different mote platforms. The focus of this paper is TakaTuka's optimization of program memory usage. In addition, it also gives an overview of TakaTuka's linkage with TinyOS and power management. TakaTuka optimizes storage requirements for the Java classfiles as well as for the JVM interpreter, both of which are expected to be stored on the embedded devices. These optimizations are performed on the desktop computer during the linking phase, before transferring the Java binary and the corresponding JVM interpreter onto a mote and thus without burdening its memory or computation resources. We have compared TakaTuka with the Sentilla, Darjeeling and Squawk JVMs.

  14. Scientific Programming Using Java: A Remote Sensing Example

    NASA Technical Reports Server (NTRS)

    Prados, Don; Mohamed, Mohamed A.; Johnson, Michael; Cao, Changyong; Gasser, Jerry

    1999-01-01

    This paper presents results of a project to port remote sensing code from the C programming language to Java. The advantages and disadvantages of using Java versus C as a scientific programming language in remote sensing applications are discussed. Remote sensing applications deal with voluminous data that require effective memory management, such as buffering operations, when processed. Some of these applications also implement complex computational algorithms, such as Fast Fourier Transformation analysis, that are very performance intensive. Factors considered include performance, precision, complexity, rapidity of development, ease of code reuse, ease of maintenance, memory management, and platform independence. Performance of radiometric calibration code written in Java for the graphical user interface and of using C for the domain model are also presented.

  15. OPUS: A CORBA Pipeline for Java, Python, and Perl Applications

    NASA Astrophysics Data System (ADS)

    Miller, W. W., III; Sontag, C.; Rose, J. F.

    With the introduction of the OPUS CORBA mode, a limited subset of OPUS Applications Programming Interface (OAPI) functionality was cast into CORBA IDL so that both OPUS applications and the Java-based OPUS pipeline managers were able to use the same CORBA infrastructure to access information on blackboards. Exposing even more of the OAPI through CORBA interfaces benefits OPUS applications in similar ways. Those applications not developed in C++ could use CORBA to interact with OPUS facilities directly, providing that a CORBA binding exists for the programming language of choice. Other applications might benefit from running `outside' of the traditional file system-based OPUS environment like the Java managers and, in particular, on platforms not supported by OPUS. The enhancements to OPUS discussed in this paper have been exercised in both Java and Python, and the code for these examples are available on the web.

  16. A Java application for tissue section image analysis.

    PubMed

    Kamalov, R; Guillaud, M; Haskins, D; Harrison, A; Kemp, R; Chiu, D; Follen, M; MacAulay, C

    2005-02-01

    The medical industry has taken advantage of Java and Java technologies over the past few years, in large part due to the language's platform-independence and object-oriented structure. As such, Java provides powerful and effective tools for developing tissue section analysis software. The background and execution of this development are discussed in this publication. Object-oriented structure allows for the creation of "Slide", "Unit", and "Cell" objects to simulate the corresponding real-world objects. Different functions may then be created to perform various tasks on these objects, thus facilitating the development of the software package as a whole. At the current time, substantial parts of the initially planned functionality have been implemented. Getafics 1.0 is fully operational and currently supports a variety of research projects; however, there are certain features of the software that currently introduce unnecessary complexity and inefficiency. In the future, we hope to include features that obviate these problems. PMID:15652632

  17. Alignment of asymmetric-top molecules using multiple-pulse trains

    SciTech Connect

    Pabst, Stefan; Santra, Robin

    2010-06-15

    We theoretically analyze the effectiveness of multiple-pulse laser alignment methods for asymmetric-top molecules. As an example, we choose SO{sub 2} and investigate the alignment dynamics induced by two different sequences, each consisting of four identical laser pulses. Each sequence differs only in the time delay between the pulses. Equally spaced pulses matching the alignment revival of the symmetrized SO{sub 2} rotor model are exploited in the first sequence. The pulse separations in the second sequence are short compared to the rotation dynamics of the molecule and monotonically increase the degree of alignment until the maximum alignment is reached. We point out the significant differences between the alignment dynamics of SO{sub 2} treated as an asymmetric-top and a symmetric-top rotor, respectively. We also explain why the fast sequence of laser pulses creates considerably stronger one-dimensional molecular alignment for asymmetric-top molecules. In addition, we show that multiple-pulse trains with elliptically polarized pulses do not enhance one-dimensional alignment or create three-dimensional alignment.

  18. Alignment of asymetric-top molecules using multiple-pulse trains.

    SciTech Connect

    Pabst, S.; Santra, R.; X-Ray Science Division; Univ. Erlangen-Nuremberg; Univ. of Chicago

    2010-06-07

    We theoretically analyze the effectiveness of multiple-pulse laser alignment methods for asymmetric-top molecules. As an example, we choose SO2 and investigate the alignment dynamics induced by two different sequences, each consisting of four identical laser pulses. Each sequence differs only in the time delay between the pulses. Equally spaced pulses matching the alignment revival of the symmetrized SO2 rotor model are exploited in the first sequence. The pulse separations in the second sequence are short compared to the rotation dynamics of the molecule and monotonically increase the degree of alignment until the maximum alignment is reached. We point out the significant differences between the alignment dynamics of SO2 treated as an asymmetric-top and a symmetric-top rotor, respectively. We also explain why the fast sequence of laser pulses creates considerably stronger one-dimensional molecular alignment for asymmetric-top molecules. In addition, we show that multiple-pulse trains with elliptically polarized pulses do not enhance one-dimensional alignment or create three-dimensional alignment.

  19. Add Java extensions to your wiki: Java applets can bring dynamic functionality to your wiki pages

    SciTech Connect

    Scarberry, Randall E.

    2008-08-12

    Virtually everyone familiar with today’s world wide web has encountered the free online encyclopedia Wikipedia many times. What you may not know is that Wikipedia is driven by an excellent open-source product called MediaWiki which is available to anyone for free. This has led to a proliferation of wiki sites devoted to just about any topic one can imagine. Users of a wiki can add content -- all that is required of them is that they type in their additions into their web browsers using the simple markup language called wikitext. Even better, the developers of wikitext made it extensible. With a little server-side development of your own, you can add your own custom syntax. Users aware of your extensions can then utilize them on their wiki pages with a few simple keystrokes. These extensions can be custom decorations, formatting, web applications, and even instances of the venerable old Java applet. One example of a Java applet extension is the Jmol extension (REF), used to embed a 3-D molecular viewer. This article will walk you through the deployment of a fairly elaborate applet via a MediaWiki extension. By no means exhaustive -- an entire book would be required for that -- it will demonstrate how to give the applet resize handles using using a little Javascript and CSS coding and some popular Javascript libraries. It even describes how a user may customize the extension somewhat using a wiki template. Finally, it explains a rudimentary persistence mechanism which allows applets to save data directly to the wiki pages on which they reside.

  20. The r-Java 2.0 code: nuclear physics

    NASA Astrophysics Data System (ADS)

    Kostka, M.; Koning, N.; Shand, Z.; Ouyed, R.; Jaikumar, P.

    2014-08-01

    Aims: We present r-Java 2.0, a nucleosynthesis code for open use that performs r-process calculations, along with a suite of other analysis tools. Methods: Equipped with a straightforward graphical user interface, r-Java 2.0 is capable of simulating nuclear statistical equilibrium (NSE), calculating r-process abundances for a wide range of input parameters and astrophysical environments, computing the mass fragmentation from neutron-induced fission and studying individual nucleosynthesis processes. Results: In this paper we discuss enhancements to this version of r-Java, especially the ability to solve the full reaction network. The sophisticated fission methodology incorporated in r-Java 2.0 that includes three fission channels (beta-delayed, neutron-induced, and spontaneous fission), along with computation of the mass fragmentation, is compared to the upper limit on mass fission approximation. The effects of including beta-delayed neutron emission on r-process yield is studied. The role of Coulomb interactions in NSE abundances is shown to be significant, supporting previous findings. A comparative analysis was undertaken during the development of r-Java 2.0 whereby we reproduced the results found in the literature from three other r-process codes. This code is capable of simulating the physical environment of the high-entropy wind around a proto-neutron star, the ejecta from a neutron star merger, or the relativistic ejecta from a quark nova. Likewise the users of r-Java 2.0 are given the freedom to define a custom environment. This software provides a platform for comparing proposed r-process sites.