Sample records for sequence analysis program

  1. A DNA sequence analysis package for the IBM personal computer.

    PubMed Central

    Lagrimini, L M; Brentano, S T; Donelson, J E

    1984-01-01

    We present here a collection of DNA sequence analysis programs, called "PC Sequence" (PCS), which are designed to run on the IBM Personal Computer (PC). These programs are written in IBM PC compiled BASIC and take full advantage of the IBM PC's speed, error handling, and graphics capabilities. For a modest initial expense in hardware any laboratory can use these programs to quickly perform computer analysis on DNA sequences. They are written with the novice user in mind and require very little training or previous experience with computers. Also provided are a text editing program for creating and modifying DNA sequence files and a communications program which enables the PC to communicate with and collect information from mainframe computers and DNA sequence databases. PMID:6546433

  2. RSAT 2015: Regulatory Sequence Analysis Tools

    PubMed Central

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-01-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632

  3. RSAT 2015: Regulatory Sequence Analysis Tools.

    PubMed

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-07-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. 'DNA Strider': a 'C' program for the fast analysis of DNA and protein sequences on the Apple Macintosh family of computers.

    PubMed Central

    Marck, C

    1988-01-01

    DNA Strider is a new integrated DNA and protein sequence analysis program written in the C language for the Macintosh Plus, SE and II computers. It has been designed as an easy to learn and use program as well as a fast and efficient tool for day-to-day sequence analysis work. The program consists of a multi-window sequence editor and of various DNA and protein analysis functions. The editor may use 4 different types of sequences (DNA, degenerate DNA, RNA and one-letter coded protein) and can handle simultaneously 6 sequences of any type up to 32.5 kB each. Negative numbering of the bases is allowed for DNA sequences. All classical restriction and translation analysis functions are present and can be performed in any order on any open sequence or part of a sequence. The main feature of the program is that the same analysis function can be repeated several times on different sequences, thus generating multiple windows on the screen. Many graphic capabilities have been incorporated, such as the graphic restriction map, the hydrophobicity profile and the CAI plot (codon adaptation index according to Sharp and Li). The restriction sites search uses a newly designed fast hexamer look-ahead algorithm. Typical runtime for the search of all sites with a library of 130 restriction endonucleases is 1 second per 10,000 bases. The circular graphic restriction map of the pBR322 plasmid can therefore be computed from its sequence and displayed on the Macintosh Plus screen within 2 seconds, and its multiline restriction map obtained in a scrolling window within 5 seconds. PMID:2832831
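
    To make the idea of a restriction site search concrete, here is a minimal Python sketch. It is a naive string scan, not the hexamer look-ahead algorithm the abstract mentions, and the function name, the three-enzyme table and the `circular` flag are assumptions made for illustration only.

```python
# Illustrative only: a naive scan, NOT DNA Strider's hexamer look-ahead algorithm.
# The enzyme table is a tiny hypothetical library (the abstract mentions 130 enzymes).
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_sites(sequence, enzymes=ENZYMES, circular=False):
    """Return {enzyme: [1-based start positions]} for each recognition site."""
    seq = sequence.upper()
    result = {}
    for name, site in enzymes.items():
        # For a circular molecule, append a short prefix so that sites
        # spanning the origin are also found.
        text = seq + seq[:len(site) - 1] if circular else seq
        positions = []
        start = text.find(site)
        while start != -1:
            positions.append(start + 1)  # report 1-based coordinates
            start = text.find(site, start + 1)
        result[name] = positions
    return result
```

    Calling `find_sites(seq, circular=True)` on a plasmid sequence would also report sites that straddle the first and last bases, which a linear scan misses.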

  5. Oligo Design: a computer program for development of probes for oligonucleotide microarrays.

    PubMed

    Herold, Keith E; Rasooly, Avraham

    2003-12-01

    Oligonucleotide microarrays have demonstrated potential for the analysis of gene expression, genotyping, and mutational analysis. Our work focuses primarily on the detection and identification of bacteria based on known short sequences of DNA. Oligo Design, the software described here, automates several design aspects that enable the improved selection of oligonucleotides for use with microarrays for these applications. Two major features of the program are: (i) a tiling algorithm for the design of short overlapping temperature-matched oligonucleotides of variable length, which are useful for the analysis of single nucleotide polymorphisms and (ii) a set of tools for the analysis of multiple alignments of gene families and related short DNA sequences, which allow for the identification of conserved DNA sequences for PCR primer selection and variable DNA sequences for the selection of unique probes for identification. Note that the program does not address the full genome perspective but, instead, is focused on the genetic analysis of short segments of DNA. The program is Internet-enabled and includes a built-in browser and the automated ability to download sequences from GenBank by specifying the GI number. The program also includes several utilities, including audio recital of a DNA sequence (useful for verifying sequences against a written document), a random sequence generator that provides insight into the relationship between melting temperature and GC content, and a PCR calculator.
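
    The tiling of temperature-matched oligonucleotides of variable length can be sketched as follows. This is an illustration under stated assumptions: melting temperature is estimated with the simple Wallace rule (2 °C per A/T, 4 °C per G/C), which the abstract does not say Oligo Design uses, and the function names and default parameters are hypothetical.

```python
def wallace_tm(oligo):
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C).

    Assumption: the abstract does not state which Tm model Oligo Design uses.
    """
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def tile_oligos(sequence, target_tm=50, step=5, min_len=12, max_len=30):
    """Return (start, oligo) tiles; each oligo is extended until it reaches target_tm."""
    seq = sequence.upper()
    tiles = []
    for start in range(0, len(seq) - min_len + 1, step):
        for length in range(min_len, max_len + 1):
            oligo = seq[start:start + length]
            if len(oligo) < length:
                break  # ran off the end of the sequence
            if wallace_tm(oligo) >= target_tm:
                tiles.append((start, oligo))
                break
    return tiles
```

    Because G/C bases contribute more to the estimate than A/T bases, GC-rich regions yield shorter tiles than AT-rich regions for the same target temperature, which is the relationship between melting temperature and GC content that the abstract refers to.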

  6. Bellerophon: A program to detect chimeric sequences in multiple sequence alignments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huber, Thomas; Faulkner, Geoffrey; Hugenholtz, Philip

    2003-12-23

    Bellerophon is a program for detecting chimeric sequences in multiple sequence datasets by an adaption of partial treeing analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries of environmental samples but can be applied to other nucleotide sequence alignments.

  7. Interactive computer programs for the graphic analysis of nucleotide sequence data.

    PubMed Central

    Luckow, V A; Littlewood, R K; Rownd, R H

    1984-01-01

    A group of interactive computer programs have been developed which aid in the collection and graphical analysis of nucleotide and protein sequence data. The programs perform the following basic functions: a) enter, edit, list, and rearrange sequence data; b) permit automatic entry of nucleotide sequence data directly from an autoradiograph into the computer; c) search for restriction sites or other specified patterns and plot a linear or circular restriction map, or print their locations; d) plot base composition; e) analyze homology between sequences by plotting a two-dimensional graphic matrix; and f) aid in plotting predicted secondary structures of RNA molecules. PMID:6546437
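
    Function (e), the two-dimensional graphic matrix for homology analysis, is commonly known as a dot plot. A minimal Python sketch, assuming exact matching over a fixed window (the abstract does not describe the scoring the authors used; the function names are hypothetical):

```python
def dot_matrix(seq_a, seq_b, window=3):
    """Return a boolean matrix: cell [i][j] is True when the windows of length
    `window` starting at seq_a[i] and seq_b[j] are identical."""
    a, b = seq_a.upper(), seq_b.upper()
    return [
        [a[i:i + window] == b[j:j + window] for j in range(len(b) - window + 1)]
        for i in range(len(a) - window + 1)
    ]


def render(matrix):
    """Render the matrix as text, '*' for a match and '.' otherwise."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)
```

    Regions of similarity show up as diagonal runs of `*`; a larger `window` suppresses chance matches at the cost of sensitivity.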

  8. Processing and population genetic analysis of multigenic datasets with ProSeq3 software.

    PubMed

    Filatov, Dmitry A

    2009-12-01

    The current tendency in molecular population genetics is to use increasing numbers of genes in the analysis. Here I describe a program for handling and population genetic analysis of DNA polymorphism data collected from multiple genes. The program includes a sequence/alignment editor and an internal relational database that simplify the preparation and manipulation of multigenic DNA polymorphism datasets. The most commonly used DNA polymorphism analyses are implemented in ProSeq3, facilitating population genetic analysis of large multigenic datasets. Extensive input/output options make ProSeq3 a convenient hub for sequence data processing and analysis. The program is available free of charge from http://dps.plants.ox.ac.uk/sequencing/proseq.htm.

  9. TRAP: automated classification, quantification and annotation of tandemly repeated sequences.

    PubMed

    Sobreira, Tiago José P; Durham, Alan M; Gruber, Arthur

    2006-02-01

    TRAP, the Tandem Repeats Analysis Program, is a Perl program that provides a unified set of analyses for the selection, classification, quantification and automated annotation of tandemly repeated sequences. TRAP uses the results of the Tandem Repeats Finder program to perform a global analysis of the satellite content of DNA sequences, permitting researchers to easily assess the tandem repeat content for both individual sequences and whole genomes. The results can be generated in convenient formats such as HTML and comma-separated values. TRAP can also be used to automatically generate annotation data in the format of feature table and GFF files.

  10. REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

    PubMed Central

    Leonard, Guy; Stevens, Jamie R.; Richards, Thomas A.

    2009-01-01

    The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends. PMID:19812722
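
    The renaming step can be illustrated with a short Python sketch. It is not the REFGEN implementation: it assumes the accession number is the first whitespace-delimited token of each FASTA header and that the user supplies a species lookup table; both the function name and that table are hypothetical.

```python
def rename_fasta(lines, species_by_accession):
    """Rewrite FASTA headers as '>Species_name_ACCESSION'.

    Assumptions (not from the source): the accession is the first token of
    the header, and species names come from a user-supplied dictionary.
    """
    out = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            accession = tokens[0] if tokens else ""
            species = species_by_accession.get(accession, "unknown")
            out.append(">" + species.replace(" ", "_") + "_" + accession)
        else:
            out.append(line)  # sequence lines pass through unchanged
    return out
```

    Keeping the accession in every label means each tip of the resulting tree can still be traced back to its database record.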

  11. SEQassembly: A Practical Tools Program for Coding Sequences Splicing

    NASA Astrophysics Data System (ADS)

    Lee, Hongbin; Yang, Hang; Fu, Lei; Qin, Long; Li, Huili; He, Feng; Wang, Bo; Wu, Xiaoming

    The CDS (coding sequence) is the portion of an mRNA sequence that is composed of a number of exon sequence segments. Constructing the CDS is important for in-depth genetic analyses such as genotyping. A program in the MATLAB environment is presented that processes batches of sample sequences into code segments under the guidance of reference exon models, and splices the code segments from the same sample source into a CDS according to the exon order in a queue file. The program is useful for transcriptional polymorphism detection and gene function studies.
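
    The splicing step, joining per-exon segments from one sample in a prescribed order, can be sketched in a few lines. The original program is written in MATLAB; the Python below is only an illustration, and the function name and the dictionary and list inputs are assumptions standing in for the extracted segments and the queue file.

```python
def splice_cds(exon_segments, exon_order):
    """Concatenate exon segments of one sample in the given order.

    exon_segments: hypothetical dict mapping exon name -> sequence segment.
    exon_order: hypothetical list of exon names, standing in for the queue file.
    """
    missing = [name for name in exon_order if name not in exon_segments]
    if missing:
        raise KeyError("missing exon segments: %s" % missing)
    return "".join(exon_segments[name].upper() for name in exon_order)
```

    Raising an error for a missing exon, rather than silently skipping it, avoids producing a CDS with a shifted reading frame.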

  12. Software for Analyzing Sequences of Flow-Related Images

    NASA Technical Reports Server (NTRS)

    Klimek, Robert; Wright, Ted

    2004-01-01

    Spotlight is a computer program for analysis of sequences of images generated in combustion and fluid physics experiments. Spotlight can perform analysis of a single image in an interactive mode or a sequence of images in an automated fashion. The primary type of analysis is tracking of positions of objects over sequences of frames. Features and objects that are typically tracked include flame fronts, particles, droplets, and fluid interfaces. Spotlight automates the analysis of object parameters, such as centroid position, velocity, acceleration, size, shape, intensity, and color. Images can be processed to enhance them before statistical and measurement operations are performed. An unlimited number of objects can be analyzed simultaneously. Spotlight saves results of analyses in a text file that can be exported to other programs for graphing or further analysis. Spotlight is a graphical-user-interface-based program that at present can be executed on Microsoft Windows and Linux operating systems. A version that runs on Macintosh computers is being considered.
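
    As an illustration of how velocity can be derived from tracked centroid positions, here is a minimal Python sketch using forward differences between consecutive frames. It does not reflect Spotlight's actual code; the function name and the constant frame interval `dt` are assumptions.

```python
def velocities(centroids, dt):
    """Forward-difference velocity between consecutive frames.

    centroids: list of (x, y) positions, one per frame.
    dt: time between frames (assumed constant).
    """
    return [
        ((x1 - x0) / dt, (y1 - y0) / dt)
        for (x0, y0), (x1, y1) in zip(centroids, centroids[1:])
    ]
```

    Applying the same differencing to the velocity list would give a rough acceleration estimate, at the cost of amplifying measurement noise.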

  13. Bellerophon: a program to detect chimeric sequences in multiple sequence alignments.

    PubMed

    Huber, Thomas; Faulkner, Geoffrey; Hugenholtz, Philip

    2004-09-22

    Bellerophon is a program for detecting chimeric sequences in multiple sequence datasets by an adaption of partial treeing analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries of environmental samples but can be applied to other nucleotide sequence alignments. Bellerophon is available as an interactive web server at http://foo.maths.uq.edu.au/~huber/bellerophon.pl

  14. The current status and portability of our sequence handling software.

    PubMed Central

    Staden, R

    1986-01-01

    I describe the current status of our sequence analysis software. The package contains a comprehensive suite of programs for managing large shotgun sequencing projects, a program containing 61 functions for analysing single sequences and a program for comparing pairs of sequences for similarity. The programs that have been described before have been improved by the addition of new functions and by being made very much easier to use. The major interactive programs have 125 pages of online help available from within them. Several new programs are described, including screen editing of aligned gel readings for shotgun sequencing projects, a method to highlight errors in aligned gel readings, and new methods for searching for putative signals in sequences. We use the programs on a VAX computer but the whole package has been rewritten to make it easy to transport it to other machines. I believe the programs will now run on any machine with a FORTRAN77 compiler and sufficient memory. We are currently putting the programs onto an IBM PC XT/AT and another micro running under UNIX. PMID:3511446

  15. Improved programs for DNA and protein sequence analysis on the IBM personal computer and other standard computer systems.

    PubMed Central

    Mount, D W; Conrad, B

    1986-01-01

    We have previously described programs for a variety of types of sequence analysis (1-4). These programs have now been integrated into a single package. They are written in the standard C programming language and run on virtually any computer system with a C compiler, such as the IBM/PC and other computers running under the MS/DOS and UNIX operating systems. The programs are widely distributed and may be obtained from the authors as described below. PMID:3753780

  16. GATA: A graphic alignment tool for comparative sequence analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nix, David A.; Eisen, Michael B.

    2005-01-01

    Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution yet the enhancer retains its activity. Dotplot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.

  17. Interim reliability-evaluation program: analysis of the Browns Ferry, Unit 1, nuclear plant. Appendix C - sequence quantification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mays, S.E.; Poloski, J.P.; Sullivan, W.H.

    1982-07-01

    This report describes a risk study of the Browns Ferry, Unit 1, nuclear plant. The study is one of four such studies sponsored by the NRC Office of Research, Division of Risk Assessment, as part of its Interim Reliability Evaluation Program (IREP), Phase II. This report is contained in four volumes: a main report and three appendixes. Appendix C generally describes the methods used to estimate accident sequence frequency values. Information is presented concerning the approach, example collection, failure data, candidate dominant sequences, uncertainty analysis, and sensitivity analysis.

  18. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data

    USGS Publications Warehouse

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
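
    The third module's task, finding tandemly repeated motifs that satisfy user-specified parameters, can be approximated with a regular expression. The Python sketch below is an illustration only and is not the SSR_pipeline search algorithm; the function name and parameter names are hypothetical, it handles simple (not compound) repeats, and the 2 to 25 bp defaults merely echo the motif lengths quoted in the abstract.

```python
import re


def find_ssrs(sequence, min_motif=2, max_motif=25, min_repeats=4):
    """Return (0-based start, motif, repeat count) for simple tandem repeats.

    Illustrative regex scan, not SSR_pipeline's own algorithm.
    """
    # A lazily matched motif followed by at least (min_repeats - 1) exact copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAA...
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits
```

    The lazy quantifier makes the expression report the shortest repeating unit, so a run of `ACACACAC` is described as four copies of `AC` rather than two copies of `ACAC`.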

  19. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data.

    PubMed

    Miller, Mark P; Knaus, Brian J; Mullins, Thomas D; Haig, Susan M

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).

  20. Object-oriented sequence analysis: SCL--a C++ class library.

    PubMed

    Vahrson, W; Hermann, K; Kleffe, J; Wittig, B

    1996-04-01

    SCL (Sequence Class Library) is a class library written in the C++ programming language. Designed using object-oriented programming principles, SCL consists of classes of objects performing tasks typically needed for analyzing DNA or protein sequences. Among them are very flexible sequence classes, classes accessing databases in various formats, classes managing collections of sequences, as well as classes performing higher-level tasks like calculating a pairwise sequence alignment. SCL also includes classes that provide general programming support, like a dynamically growing array, sets, matrices, strings, classes performing file input/output, and utilities for error handling. By providing these components, SCL fosters an explorative programming style: experimenting with algorithms and alternative implementations is encouraged rather than punished. A description of SCL's overall structure as well as an overview of its classes is given. Important aspects of the work with SCL are discussed in the context of a sample program.

  1. Programming in a Robotics Context in the Kindergarten Classroom: The Impact on Sequencing Skills

    ERIC Educational Resources Information Center

    Kazakoff, Elizabeth; Bers, Marina

    2012-01-01

    This paper examines the impact of computer programming of robots on sequencing ability in early childhood and the relationship between sequencing skills, class size, and teacher's comfort level and experience with technology. Fifty-eight children participated in the study, 54 of whom were included in data analysis. This study was conducted in two…

  2. Illuminator, a desktop program for mutation detection using short-read clonal sequencing.

    PubMed

    Carr, Ian M; Morgan, Joanne E; Diggle, Christine P; Sheridan, Eamonn; Markham, Alexander F; Logan, Clare V; Inglehearn, Chris F; Taylor, Graham R; Bonthron, David T

    2011-10-01

    Current methods for sequencing clonal populations of DNA molecules yield several gigabases of data per day, typically comprising reads of < 100 nt. Such datasets permit widespread genome resequencing and transcriptome analysis or other quantitative tasks. However, this huge capacity can also be harnessed for the resequencing of smaller (gene-sized) target regions, through the simultaneous parallel analysis of multiple subjects, using sample "tagging" or "indexing". These methods promise to have a huge impact on diagnostic mutation analysis and candidate gene testing. Here we describe a software package developed for such studies, offering the ability to resolve pooled samples carrying barcode tags and to align reads to a reference sequence using a mutation-tolerant process. The program, Illuminator, can identify rare sequence variants, including insertions and deletions, and permits interactive data analysis on standard desktop computers. It facilitates the effective analysis of targeted clonal sequencer data without dedicated computational infrastructure or specialized training. Copyright © 2011 Elsevier Inc. All rights reserved.
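
    Resolving pooled samples by their barcode tags can be pictured as a simple binning step. The Python sketch below is a hypothetical illustration, not Illuminator's method: it assumes each read begins with its tag, requires an exact match, and uses made-up barcode and sample names.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by leading barcode tag and strip the tag.

    barcodes: hypothetical dict mapping tag sequence -> sample name.
    Reads with no recognised tag are collected under 'unassigned'.
    """
    bins = defaultdict(list)
    for read in reads:
        for tag, sample in barcodes.items():
            if read.startswith(tag):
                bins[sample].append(read[len(tag):])
                break
        else:
            bins["unassigned"].append(read)
    return dict(bins)
```

    A practical tool would usually tolerate sequencing errors within the tag; exact matching is used here only to keep the example short.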

  3. TCW: Transcriptome Computational Workbench

    PubMed Central

    Soderlund, Carol; Nelson, William; Willer, Mark; Gang, David R.

    2013-01-01

    Background: The analysis of transcriptome data involves many steps and various programs, along with organization of large amounts of data and results. Without a methodical approach for storage, analysis and query, the resulting ad hoc analysis can lead to human error, loss of data and results, inefficient use of time, and lack of verifiability, repeatability, and extensibility. Methodology: The Transcriptome Computational Workbench (TCW) provides Java graphical interfaces for methodical analysis for both single and comparative transcriptome data without the use of a reference genome (e.g. for non-model organisms). The singleTCW interface steps the user through importing transcript sequences (e.g. Illumina) or assembling long sequences (e.g. Sanger, 454, transcripts), annotating the sequences, and performing differential expression analysis using published statistical programs in R. The data, metadata, and results are stored in a MySQL database. The multiTCW interface builds a comparison database by importing sequence and annotation from one or more single TCW databases, executes the ESTscan program to translate the sequences into proteins, and then incorporates one or more clusterings, where the clustering options are to execute the orthoMCL program, compute transitive closure, or import clusters. Both singleTCW and multiTCW allow extensive query and display of the results, where singleTCW displays the alignment of annotation hits to transcript sequences, and multiTCW displays multiple transcript alignments with MUSCLE or pairwise alignments. The query programs can be executed on the desktop for fastest analysis, or from the web for sharing the results. Conclusion: It is now affordable to buy a multi-processor machine, and easy to install Java and MySQL. By simply downloading the TCW, the user can interactively analyze, query and view their data. The TCW allows in-depth data mining of the results, which can lead to a better understanding of the transcriptome. TCW is freely available from www.agcol.arizona.edu/software/tcw. PMID:23874959

  4. TCW: transcriptome computational workbench.

    PubMed

    Soderlund, Carol; Nelson, William; Willer, Mark; Gang, David R

    2013-01-01

    The analysis of transcriptome data involves many steps and various programs, along with organization of large amounts of data and results. Without a methodical approach for storage, analysis and query, the resulting ad hoc analysis can lead to human error, loss of data and results, inefficient use of time, and lack of verifiability, repeatability, and extensibility. The Transcriptome Computational Workbench (TCW) provides Java graphical interfaces for methodical analysis for both single and comparative transcriptome data without the use of a reference genome (e.g. for non-model organisms). The singleTCW interface steps the user through importing transcript sequences (e.g. Illumina) or assembling long sequences (e.g. Sanger, 454, transcripts), annotating the sequences, and performing differential expression analysis using published statistical programs in R. The data, metadata, and results are stored in a MySQL database. The multiTCW interface builds a comparison database by importing sequence and annotation from one or more single TCW databases, executes the ESTscan program to translate the sequences into proteins, and then incorporates one or more clusterings, where the clustering options are to execute the orthoMCL program, compute transitive closure, or import clusters. Both singleTCW and multiTCW allow extensive query and display of the results, where singleTCW displays the alignment of annotation hits to transcript sequences, and multiTCW displays multiple transcript alignments with MUSCLE or pairwise alignments. The query programs can be executed on the desktop for fastest analysis, or from the web for sharing the results. It is now affordable to buy a multi-processor machine, and easy to install Java and MySQL. By simply downloading the TCW, the user can interactively analyze, query and view their data. The TCW allows in-depth data mining of the results, which can lead to a better understanding of the transcriptome. TCW is freely available from www.agcol.arizona.edu/software/tcw.

  5. JGI Fungal Genomics Program

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grigoriev, Igor V.

    2011-03-14

    Genomes of energy and environment fungi are the focus of the Fungal Genomics Program at the US Department of Energy Joint Genome Institute (JGI). Its key project, the Genomics Encyclopedia of Fungi, targets fungi related to plant health (symbionts, pathogens, and biocontrol agents) and biorefinery processes (cellulose degradation, sugar fermentation, industrial hosts), and explores fungal diversity by means of genome sequencing and analysis. Over 50 fungal genomes have been sequenced by JGI to date and released through MycoCosm (www.jgi.doe.gov/fungi), a fungal web portal that integrates sequence and functional data with genome analysis tools for the user community. Sequence analysis supported by functional genomics leads to developing a parts list for complex systems ranging from ecosystems of biofuel crops to biorefineries. Recent examples of such 'parts' suggested by comparative genomics and functional analysis in these areas are presented here.

  6. SSR_pipeline--computer software for the identification of microsatellite sequences from paired-end Illumina high-throughput DNA sequence data

    USGS Publications Warehouse

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (SSRs; for example, microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains three analysis modules along with a fourth control module that can be used to automate analyses of large volumes of data. The modules are used to (1) identify the subset of paired-end sequences that pass quality standards, (2) align paired-end reads into a single composite DNA sequence, and (3) identify sequences that possess microsatellites conforming to user specified parameters. Each of the three separate analysis modules also can be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc). All modules are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, Windows). The program suite relies on a compiled Python extension module to perform paired-end alignments. Instructions for compiling the extension from source code are provided in the documentation. Users who do not have Python installed on their computers or who do not have the ability to compile software also may choose to download packaged executable files. These files include all Python scripts, a copy of the compiled extension module, and a minimal installation of Python in a single binary executable. See program documentation for more information.
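
    The third module described above identifies sequences containing microsatellites that conform to user-specified parameters. A minimal sketch of that step, assuming the parameters are a motif-length range and a minimum repeat count (the function and parameter names are hypothetical and do not reflect SSR_pipeline's actual options); it finds perfect tandem repeats only.

```python
import re


def find_ssrs(seq, min_motif=2, max_motif=6, min_repeats=5):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    Parameter names and defaults are illustrative assumptions.
    """
    seq = seq.upper()
    # A motif of min_motif..max_motif bases (shortest tried first), followed
    # by at least (min_repeats - 1) further exact copies of that motif.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


read = "TTG" + "AC" * 6 + "GGT"
print(find_ssrs(read))
```

    With min_motif=2, a long homopolymer run would be reported as a dinucleotide repeat (for example AA); a real tool would need to handle that case explicitly.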

  7. Supplement to the ICRPG turbulent boundary layer nozzle analysis computer program

    NASA Technical Reports Server (NTRS)

    Omori, S.; Gross, K. W.

    1972-01-01

    A supplement is presented for a turbulent boundary layer nozzle analysis computer program. It describes the program calculation sequence and presents a detailed documentation of each subroutine. Important equations are derived explicitly, and improvements to the program are discussed.

  8. Designing a Bioengine for Detection and Analysis of Base String on an Affected Sequence in High-Concentration Regions

    PubMed Central

    Mandal, Bijoy Kumar; Kim, Tai-hoon

    2013-01-01

    We design an algorithm for a bioengine. The program enables searching for optimal alignments between two sequences, the host sequence (normal plant) and the query sequence (virus). Searching for homologues has become a routine operation on biological sequences, here in 4 × 4 combination with different subsequences (word sizes). The program takes advantage of the high degree of homology between such sequences to construct an alignment of the matching regions. Its main aim is to detect overlapping reading frames. The program also makes it possible to find highly infected colones by selecting the highest matching region with minimum gap or mismatch zones, and to find unique virus colones matches. It is a small, portable, interactive, front-end program intended for finding the regions of matching between the host sequence and query subsequences. All operations are carried out in a fraction of a second, depending on the required task and on the sequence length. PMID:24000321

  9. Dynamic Assessment of Microbial Ecology (DAME): A web app for interactive analysis and visualization of microbial sequencing data

    USDA-ARS's Scientific Manuscript database

    Dynamic Assessment of Microbial Ecology (DAME) is a shiny-based web application for interactive analysis and visualization of microbial sequencing data. DAME provides researchers not familiar with R programming the ability to access the most current R functions utilized for ecology and gene sequenci...

  10. Fungal Genomics for Energy and Environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grigoriev, Igor V.

    2013-03-11

    Genomes of fungi relevant to energy and environment are in focus of the Fungal Genomic Program at the US Department of Energy Joint Genome Institute (JGI). One of its projects, the Genomics Encyclopedia of Fungi, targets fungi related to plant health (symbionts, pathogens, and biocontrol agents) and biorefinery processes (cellulose degradation, sugar fermentation, industrial hosts) by means of genome sequencing and analysis. New chapters of the Encyclopedia can be opened with user proposals to the JGI Community Sequencing Program (CSP). Another JGI project, the 1000 fungal genomes, explores fungal diversity on genome level at scale and is open for users to nominate new species for sequencing. Over 200 fungal genomes have been sequenced by JGI to date and released through MycoCosm (www.jgi.doe.gov/fungi), a fungal web-portal, which integrates sequence and functional data with genome analysis tools for user community. Sequence analysis supported by functional genomics leads to developing parts list for complex systems ranging from ecosystems of biofuel crops to biorefineries. Recent examples of such parts suggested by comparative genomics and functional analysis in these areas are presented here.

  11. RSAT 2018: regulatory sequence analysis tools 20th anniversary.

    PubMed

    Nguyen, Nga Thi Thuy; Contreras-Moreira, Bruno; Castro-Mondragon, Jaime A; Santana-Garcia, Walter; Ossio, Raul; Robles-Espinoza, Carla Daniela; Bahin, Mathieu; Collombet, Samuel; Vincens, Pierre; Thieffry, Denis; van Helden, Jacques; Medina-Rivera, Alejandra; Thomas-Chollier, Morgane

    2018-05-02

    RSAT (Regulatory Sequence Analysis Tools) is a suite of modular tools for the detection and the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, including from genome-wide datasets like ChIP-seq/ATAC-seq, (ii) motif scanning, (iii) motif analysis (quality assessment, comparisons and clustering), (iv) analysis of regulatory variations, (v) comparative genomics. Six public servers jointly support 10 000 genomes from all kingdoms. Six novel or refactored programs have been added since the 2015 NAR Web Software Issue, including updated programs to analyse regulatory variants (retrieve-variation-seq, variation-scan, convert-variations), along with tools to extract sequences from a list of coordinates (retrieve-seq-bed), to select motifs from motif collections (retrieve-matrix), and to extract orthologs based on Ensembl Compara (get-orthologs-compara). Three use cases illustrate the integration of new and refactored tools to the suite. This Anniversary update gives a 20-year perspective on the software suite. RSAT is well-documented and available through Web sites, SOAP/WSDL (Simple Object Access Protocol/Web Services Description Language) web services, virtual machines and stand-alone programs at http://www.rsat.eu/.

  12. Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization.

    PubMed

    Bauer, Markus; Klau, Gunnar W; Reinert, Knut

    2007-07-27

    The discovery of functional non-coding RNA sequences has led to an increasing interest in algorithms related to RNA analysis. Traditional sequence alignment algorithms, however, fail at computing reliable alignments of low-homology RNA sequences. The spatial conformation of RNA sequences largely determines their function, and therefore RNA alignment algorithms have to take structural information into account. We present a graph-based representation for sequence-structure alignments, which we model as an integer linear program (ILP). We sketch how we compute an optimal or near-optimal solution to the ILP using methods from combinatorial optimization, and present results on a recently published benchmark set for RNA alignments. The implementation of our algorithm yields better alignments in terms of two published scores than the other programs that we tested: This is especially the case with an increasing number of input sequences. Our program LARA is freely available for academic purposes from http://www.planet-lisa.net.

  13. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

    PubMed

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications in sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, and sequence-specific information such as the total number of nucleotide bases and ATGC base contents with their respective percentages, as well as a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation, and it provides a user-friendly environment for sequence analysis. It is freely available at http://www.cemb.edu.pk/sw.html. Abbreviations: RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language.
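
    The abstract says the tool uses and extends the Nussinov dynamic programming algorithm and offers reverse complement generation. A minimal sketch of the textbook Nussinov recurrence applied to DNA, assuming only Watson-Crick pairs (A-T, G-C) and a minimum hairpin loop of three bases; it returns the maximum number of base pairs and does not reproduce RDNAnalyzer's own extensions, which the abstract does not describe.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (textbook Nussinov recurrence).

    min_loop is the assumed minimum number of unpaired bases in a hairpin.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if (seq[k], seq[j]) in WATSON_CRICK:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


print(nussinov_max_pairs("GGGAAAACCC"))
print(reverse_complement("ATGC"))
```

    The sketch scores only the number of pairs; recovering the actual pairing would need a traceback over the same table.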

  14. Program for Editing Spacecraft Command Sequences

    NASA Technical Reports Server (NTRS)

    Gladden, Roy; Waggoner, Bruce; Kordon, Mark; Hashemi, Mahnaz; Hanks, David; Salcedo, Jose

    2006-01-01

    Sequence Translator, Editor, and Expander Resource (STEER) is a computer program that facilitates construction of sequences and blocks of sequences (hereafter denoted generally as sequence products) for commanding a spacecraft. STEER also provides mechanisms for translating among various sequence product types and quickly expanding activities of a given sequence in chronological order for review and analysis of the sequence. To date, construction of sequence products has generally been done by use of such clumsy mechanisms as text-editor programs, translating among sequence product types has been challenging, and expanding sequences to time-ordered lists has involved arduous processes of converting sequence products to "real" sequences and running them through Class-A software (defined, loosely, as flight and ground software critical to a spacecraft mission). Also, heretofore, generating sequence products in standard formats has been troublesome because precise formatting and syntax are required. STEER alleviates these issues by providing a graphical user interface containing intuitive fields in which the user can enter the necessary information. The STEER expansion function provides a "quick and dirty" means of seeing how a sequence and sequence block would expand into a chronological list, without the need to use Class-A software.

  15. Sequence analysis reveals genomic factors affecting EST-SSR primer performance and polymorphism

    USDA-ARS's Scientific Manuscript database

    Search for simple sequence repeat (SSR) motifs and design of flanking primers in expressed sequence tag (EST) sequences can be easily done at a large scale using bioinformatics programs. However, failed amplification and/or detection, along with lack of polymorphism, is often seen among randomly sel...

  16. Fueling the Future with Fungal Genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grigoriev, Igor V.

    2014-10-27

    Genomes of fungi relevant to energy and environment are in focus of the JGI Fungal Genomic Program. One of its projects, the Genomics Encyclopedia of Fungi, targets fungi related to plant health (symbionts and pathogens) and biorefinery processes (cellulose degradation and sugar fermentation) by means of genome sequencing and analysis. New chapters of the Encyclopedia can be opened with user proposals to the JGI Community Science Program (CSP). Another JGI project, the 1000 fungal genomes, explores fungal diversity on genome level at scale and is open for users to nominate new species for sequencing. Over 400 fungal genomes have been sequenced by JGI to date and released through MycoCosm (www.jgi.doe.gov/fungi), a fungal web-portal, which integrates sequence and functional data with genome analysis tools for user community. Sequence analysis supported by functional genomics will lead to developing parts list for complex systems ranging from ecosystems of biofuel crops to biorefineries. Recent examples of such ‘parts’ suggested by comparative genomics and functional analysis in these areas are presented here.

  17. Fungal Genomics Program

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grigoriev, Igor

    The JGI Fungal Genomics Program aims to scale up sequencing and analysis of fungal genomes to explore the diversity of fungi important for energy and the environment, and to promote functional studies on a system level. Combining new sequencing technologies and comparative genomics tools, JGI is now leading the world in fungal genome sequencing and analysis. Over 120 sequenced fungal genomes with analytical tools are available via MycoCosm (www.jgi.doe.gov/fungi), a web-portal for fungal biologists. Our model of interacting with user communities, unique among other sequencing centers, helps organize these communities, improves genome annotation and analysis work, and facilitates new larger-scale genomic projects. This resulted in 20 high-profile papers published in 2011 alone and contributing to the Genomics Encyclopedia of Fungi, which targets fungi related to plant health (symbionts, pathogens, and biocontrol agents) and biorefinery processes (cellulose degradation, sugar fermentation, industrial hosts). Our next grand challenges include larger scale exploration of fungal diversity (1000 fungal genomes), developing molecular tools for DOE-relevant model organisms, and analysis of complex systems and metagenomes.

  18. Prediction and phylogenetic analysis of mammalian short interspersed elements (SINEs).

    PubMed

    Rogozin, I B; Mayorov, V I; Lavrentieva, M V; Milanesi, L; Adkison, L R

    2000-09-01

    The presence of repetitive elements can create serious problems for sequence analysis, especially in the case of homology searches in nucleotide sequence databases. Repetitive elements should be treated carefully by using special programs and databases. In this paper, various aspects of SINE (short interspersed repetitive element) identification, analysis and evolution are discussed.

  19. Introduction on Using the FastPCR Software and the Related Java Web Tools for PCR and Oligonucleotide Assembly and Analysis.

    PubMed

    Kalendar, Ruslan; Tselykh, Timofey V; Khassenov, Bekbolat; Ramanculov, Erlan M

    2017-01-01

    This chapter introduces the FastPCR software as an integrated tool environment for PCR primer and probe design, which predicts properties of oligonucleotides based on experimental studies of the PCR efficiency. The software provides comprehensive facilities for designing primers for most PCR applications and their combinations. These include the standard PCR as well as the multiplex, long-distance, inverse, real-time, group-specific, unique, overlap extension PCR for multi-fragments assembling cloning and loop-mediated isothermal amplification (LAMP). It also contains a built-in program to design oligonucleotide sets both for long sequence assembly by ligase chain reaction and for design of amplicons that tile across a region(s) of interest. The software calculates the melting temperature for the standard and degenerate oligonucleotides including locked nucleic acid (LNA) and other modifications. It also provides analyses for a set of primers with the prediction of oligonucleotide properties, dimer and G/C-quadruplex detection, linguistic complexity as well as a primer dilution and resuspension calculator. The program consists of various bioinformatical tools for analysis of sequences with the GC or AT skew, CG% and GA% content, and the purine-pyrimidine skew. It also analyzes the linguistic sequence complexity and performs generation of random DNA sequence as well as restriction endonucleases analysis. The program allows to find or create restriction enzyme recognition sites for coding sequences and supports the clustering of sequences. It performs efficient and complete detection of various repeat types with visual display. The FastPCR software allows the sequence file batch processing that is essential for automation. The program is available for download at http://primerdigital.com/fastpcr.html , and its online version is located at http://primerdigital.com/tools/pcr.html .
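
    Among the analyses listed above is the GC or AT skew of a sequence. A minimal sketch using the conventional definitions, GC skew = (G - C) / (G + C) and AT skew = (A - T) / (A + T), computed per window; these formulas and the function names are assumptions for illustration, since the abstract does not state how FastPCR computes skew.

```python
def skew(seq, first, second):
    """(count(first) - count(second)) / (count(first) + count(second))."""
    a, b = seq.count(first), seq.count(second)
    return (a - b) / (a + b) if a + b else 0.0


def windowed_gc_skew(seq, window, step):
    """GC skew for each window of `window` bases, advancing by `step`."""
    seq = seq.upper()
    return [
        skew(seq[i:i + window], "G", "C")
        for i in range(0, len(seq) - window + 1, step)
    ]


print(skew("GGGC", "G", "C"))
print(windowed_gc_skew("GGGGCCCC", window=4, step=4))
```

    Windows containing neither base are reported as 0.0 here, which is a design choice of this sketch rather than a convention from the source.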

  20. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis

    PubMed Central

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications in sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, and sequence-specific information such as the total number of nucleotide bases and ATGC base contents with their respective percentages, as well as a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation, and it provides a user-friendly environment for sequence analysis. It is freely available. Availability: http://www.cemb.edu.pk/sw.html Abbreviations: RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language. PMID:23055611

  1. A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies.

    PubMed

    Thakur, Shalabh; Guttman, David S

    2016-06-30

    Comparative analysis of whole genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While there are a number of excellent tools developed for performing this task, most scale poorly when faced with hundreds of genome sequences, and many require extensive manual curation. We have developed a de-novo genome analysis pipeline (DeNoGAP) for the automated, iterative and high-throughput analysis of data from comparative genomics projects involving hundreds of whole genome sequences. The pipeline is designed to perform reference-assisted and de novo gene prediction, homolog protein family assignment, ortholog prediction, functional annotation, and pan-genome analysis using a range of proven tools and databases. While most existing methods scale quadratically with the number of genomes since they rely on pairwise comparisons among predicted protein sequences, DeNoGAP scales linearly since the homology assignment is based on iteratively refined hidden Markov models. This iterative clustering strategy enables DeNoGAP to handle a very large number of genomes using minimal computational resources. Moreover, the modular structure of the pipeline permits easy updates as new analysis programs become available. DeNoGAP integrates bioinformatics tools and databases for comparative analysis of a large number of genomes. The pipeline offers tools and algorithms for annotation and analysis of completed and draft genome sequences. The pipeline is developed using Perl, BioPerl and SQLite on Ubuntu Linux version 12.04 LTS. Currently, the software package includes a script for automated installation of necessary external programs on Ubuntu Linux; however, the pipeline should also be compatible with other Linux and Unix systems after the necessary external programs are installed. DeNoGAP is freely available at https://sourceforge.net/projects/denogap/ .
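
    The scaling claim above can be made concrete with simple arithmetic. A minimal sketch, assuming an all-against-all method performs one comparison per unordered genome pair and an iterative method performs one model-scanning pass per added genome; the function names are hypothetical and the counts ignore the per-comparison cost.

```python
def pairwise_comparisons(n_genomes):
    """Unordered genome pairs in an all-against-all comparison: n(n-1)/2."""
    return n_genomes * (n_genomes - 1) // 2


def iterative_passes(n_genomes):
    """One pass against the current model set per genome added."""
    return n_genomes


for n in (10, 100, 300):
    print(n, pairwise_comparisons(n), iterative_passes(n))
```

    Going from 100 to 300 genomes triples the number of iterative passes but multiplies the number of pairwise comparisons roughly ninefold.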

  2. Multiply controlled verbal operants: an analysis and extension to the picture exchange communication system.

    PubMed

    Bondy, Andy; Tincani, Matt; Frost, Lori

    2004-01-01

    This paper presents Skinner's (1957) analysis of verbal behavior as a framework for understanding language acquisition in children with autism. We describe Skinner's analysis of pure and impure verbal operants and illustrate how this analysis may be applied to the design of communication training programs. The picture exchange communication system (PECS) is a training program influenced by Skinner's framework. We describe the training sequence associated with PECS and illustrate how this sequence may establish multiply controlled verbal behavior in children with autism. We conclude with an examination of how Skinner's framework may apply to other communication modalities and training strategies.

  3. Elimination sequence optimization for SPAR

    NASA Technical Reports Server (NTRS)

    Hogan, Harry A.

    1986-01-01

    SPAR is a large-scale computer program for finite element structural analysis. The program allows user specification of the order in which the joints of a structure are to be eliminated since this order can have significant influence over solution performance, in terms of both storage requirements and computer time. An efficient elimination sequence can improve performance by over 50% for some problems. Obtaining such sequences, however, requires the expertise of an experienced user and can take hours of tedious effort to effect. Thus, an automatic elimination sequence optimizer would enhance productivity by reducing the analysts' problem definition time and by lowering computer costs. Two possible methods for automating the elimination sequence specifications were examined. Several algorithms based on the graph theory representations of sparse matrices were studied with mixed results. Significant improvement in the program performance was achieved, but sequencing by an experienced user still yields substantially better results. The initial results provide encouraging evidence that the potential benefits of such an automatic sequencer would be well worth the effort.

  4. DROMPA: easy-to-handle peak calling and visualization software for the computational analysis and validation of ChIP-seq data.

    PubMed

    Nakato, Ryuichiro; Itoh, Tahehiko; Shirahige, Katsuhiko

    2013-07-01

    Chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) can identify genomic regions that bind proteins involved in various chromosomal functions. Although the development of next-generation sequencers offers the technology needed to identify these protein-binding sites, the analysis can be computationally challenging because sequencing data sometimes consist of >100 million reads/sample. Herein, we describe a cost-effective and time-efficient protocol that is generally applicable to ChIP-seq analysis; this protocol uses a novel peak-calling program termed DROMPA to identify peaks and an additional program, parse2wig, to preprocess read-map files. This two-step procedure drastically reduces computational time and memory requirements compared with other programs. DROMPA enables the identification of protein localization sites in repetitive sequences and efficiently identifies both broad and sharp protein localization peaks. Specifically, DROMPA outputs a protein-binding profile map in pdf or png format, which can be easily manipulated by users who have a limited background in bioinformatics. © 2013 The Authors Genes to Cells © 2013 by the Molecular Biology Society of Japan and Wiley Publishing Asia Pty Ltd.

  5. RSAT: regulatory sequence analysis tools.

    PubMed

    Thomas-Chollier, Morgane; Sand, Olivier; Turatsinze, Jean-Valéry; Janky, Rekin's; Defrance, Matthieu; Vervisch, Eric; Brohée, Sylvain; van Helden, Jacques

    2008-07-01

    The regulatory sequence analysis tools (RSAT, http://rsat.ulb.ac.be/rsat/) is a software suite that integrates a wide collection of modular tools for the detection of cis-regulatory elements in genome sequences. The suite includes programs for sequence retrieval, pattern discovery, phylogenetic footprint detection, pattern matching, genome scanning and feature map drawing. Random controls can be performed with random gene selections or by generating random sequences according to a variety of background models (Bernoulli, Markov). Beyond the original word-based pattern-discovery tools (oligo-analysis and dyad-analysis), we recently added a battery of tools for matrix-based detection of cis-acting elements, with some original features (adaptive background models, Markov-chain estimation of P-values) that do not exist in other matrix-based scanning tools. The web server offers an intuitive interface, where each program can be accessed either separately or connected to the other tools. In addition, the tools are now available as web services, enabling their integration in programmatic workflows. Genomes are regularly updated from various genome repositories (NCBI and EnsEMBL) and 682 organisms are currently supported. Since 1998, the tools have been used by several hundreds of researchers from all over the world. Several predictions made with RSAT were validated experimentally and published.

  6. RetroTector online, a rational tool for analysis of retroviral elements in small and medium size vertebrate genomic sequences

    PubMed Central

    Sperber, Göran; Lövgren, Anders; Eriksson, Nils-Einar; Benachenhou, Farid; Blomberg, Jonas

    2009-01-01

    Background: The rapid accumulation of genomic information in databases necessitates rapid and specific algorithms for extracting biologically meaningful information. More or less complete retroviral sequences, also called proviral or endogenous retroviral sequences (ERVs), constitute at least 5% of vertebrate genomes. After infecting the host, these retroviruses have integrated in germ line cells, and have then been carried in genomes for at least several 100 million years. A better understanding of structure and function of these sequences can have profound biological and medical consequences. Methods: RetroTector© (ReTe) is a platform-independent Java program for identification and characterization of proviral sequences in vertebrate genomes. The full ReTe requires a local installation with a MySQL database. Although not overly complicated, the installation may take some time. A "light" version of ReTe (RetroTector online; ROL), which does not require specific installation procedures, is provided via the World Wide Web. Results: ROL was implemented under the Batchelor web interface (A Lövgren et al). It allows both GenBank accession number, file and FASTA cut-and-paste admission of sequences (5 to 10 000 kilobases). Up to ten submissions can be done simultaneously, allowing batch analysis of <= 100 Megabases. Jobs are shown in an IP-number specific list. Results are text files, and can be viewed with the program RetroTectorViewer.jar (at the same site), which has the full graphical capabilities of the basic ReTe program. A detailed analysis of any retroviral sequences found in the submitted sequence is graphically presented, exportable in standard formats. With the current server, a complete analysis of a 1 Megabase sequence is complete in 10 minutes. It is possible to mask nonretroviral repetitive sequences in the submitted sequence, using host genome specific "brooms", which increase specificity. Discussion: Proviral sequences can be hard to recognize, especially if the integration occurred many million years ago. Precise delineation of LTR, gag, pro, pol and env can be difficult, requiring manual work. ROL is a way of simplifying these tasks. Conclusion: ROL provides 1. annotation and presentation of known retroviral sequences, 2. detection of proviral chains in unknown genomic sequences, with up to 100 Mbase per submission. PMID:19534753

  7. RetroTector online, a rational tool for analysis of retroviral elements in small and medium size vertebrate genomic sequences.

    PubMed

    Sperber, Göran; Lövgren, Anders; Eriksson, Nils-Einar; Benachenhou, Farid; Blomberg, Jonas

    2009-06-16

    The rapid accumulation of genomic information in databases necessitates rapid and specific algorithms for extracting biologically meaningful information. More or less complete retroviral sequences, also called proviral or endogenous retroviral sequences (ERVs), constitute at least 5% of vertebrate genomes. After infecting the host, these retroviruses have integrated in germ line cells, and have then been carried in genomes for at least several 100 million years. A better understanding of structure and function of these sequences can have profound biological and medical consequences. RetroTector (ReTe) is a platform-independent Java program for identification and characterization of proviral sequences in vertebrate genomes. The full ReTe requires a local installation with a MySQL database. Although not overly complicated, the installation may take some time. A "light" version of ReTe (RetroTector online; ROL), which does not require specific installation procedures, is provided via the World Wide Web. ROL http://www.fysiologi.neuro.uu.se/jbgs/ was implemented under the Batchelor web interface (A Lövgren et al). It allows both GenBank accession number, file and FASTA cut-and-paste admission of sequences (5 to 10,000 kilobases). Up to ten submissions can be done simultaneously, allowing batch analysis of

  8. Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis

    PubMed Central

    Steele, Joe; Bastola, Dhundy

    2014-01-01

    Modern sequencing and genome assembly technologies have provided a wealth of data, which will soon require an analysis by comparison for discovery. Sequence alignment, a fundamental task in bioinformatics research, may be used but with some caveats. Seminal techniques and methods from dynamic programming are proving ineffective for this work owing to their inherent computational expense when processing large amounts of sequence data. These methods are prone to giving misleading information because of genetic recombination, genetic shuffling and other inherent biological events. New approaches from information theory, frequency analysis and data compression are available and provide powerful alternatives to dynamic programming. These new methods are often preferred, as their algorithms are simpler and are not affected by synteny-related problems. In this review, we provide a detailed discussion of computational tools, which stem from alignment-free methods based on statistical analysis from word frequencies. We provide several clear examples to demonstrate applications and the interpretations over several different areas of alignment-free analysis such as base–base correlations, feature frequency profiles, compositional vectors, an improved string composition and the D2 statistic metric. Additionally, we provide detailed discussion and an example of analysis by Lempel–Ziv techniques from data compression. PMID:23904502
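
    The D2 statistic mentioned above is the simplest of the word-frequency measures: it counts the k-word matches shared by two sequences, which equals the inner product of their k-mer count vectors. A minimal sketch of that uncentred form follows (the function names are chosen for illustration; the normalised variants discussed in the literature are not shown).

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Uncentred D2: sum over all k-words of count_a(w) * count_b(w)."""
    counts_a, counts_b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    return sum(n * counts_b[word] for word, n in counts_a.items())


print(d2("ACGT", "ACGT", 2))
print(d2("AAAA", "AAA", 2))
print(d2("ACGT", "TTTT", 2))
```

    No alignment is computed at any point, so the cost grows with the sequence lengths rather than with their product.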

  9. Timeline Resource Analysis Program (TRAP): User's manual and program document

    NASA Technical Reports Server (NTRS)

    Sessler, J. G.

    1981-01-01

    The Timeline Resource Analysis Program (TRAP), developed for scheduling and timelining problems, is described. Given an activity network, TRAP generates timeline plots, resource histograms, and tabular summaries of the network, schedules, and resource levels. It is written in ANSI FORTRAN for the Honeywell SIGMA 5 computer and operates in the interactive mode using the TEKTRONIX 4014-1 graphics terminal. The input network file may be a standard SIGMA 5 file or one generated using the Interactive Graphics Design System. The timeline plots can be displayed in two orderings: according to the sequence in which the tasks were read on input, and a waterfall sequence in which the tasks are ordered by start time. The input order is especially meaningful when the network consists of several interacting subnetworks. The waterfall sequence is helpful in assessing the project status at any point in time.
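
    The two timeline orderings described above differ only in the sort key. A minimal sketch, assuming each task is a record with a name, a start time and a duration (the Task type and its fields are hypothetical and not TRAP's file format):

```python
from typing import NamedTuple


class Task(NamedTuple):
    name: str
    start: float
    duration: float


def input_order(tasks):
    """Keep tasks in the sequence in which they were read."""
    return list(tasks)


def waterfall_order(tasks):
    """Order tasks by start time; ties keep input order (stable sort)."""
    return sorted(tasks, key=lambda task: task.start)


tasks = [Task("integrate", 5.0, 2.0), Task("design", 0.0, 3.0), Task("build", 3.0, 2.0)]
print([t.name for t in input_order(tasks)])
print([t.name for t in waterfall_order(tasks)])
```

    Plotting the waterfall order top to bottom gives the descending staircase that makes project status at a given time easy to read.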

  10. VisRseq: R-based visual framework for analysis of sequencing data

    PubMed Central

    2015-01-01

    Background. Several tools have been developed to enable biologists to perform initial browsing and exploration of sequencing data. However, the computational tool set for further analyses often requires significant computational expertise to use, and many of the biologists with the knowledge needed to interpret these data must rely on programming experts. Results. We present VisRseq, a framework for analysis of sequencing datasets that provides a computationally rich and accessible framework for integrative and interactive analyses without requiring programming expertise. We achieve this aim by providing R apps, which offer a semi-auto generated and unified graphical user interface for computational packages in R and repositories such as Bioconductor. To address the interactivity limitation inherent in R libraries, our framework includes several native apps that provide exploration and brushing operations as well as an integrated genome browser. The apps can be chained together to create more powerful analysis workflows. Conclusions. To validate the usability of VisRseq for analysis of sequencing data, we present two case studies performed by our collaborators and report their workflow and insights. PMID:26328469

  11. VisRseq: R-based visual framework for analysis of sequencing data.

    PubMed

    Younesy, Hamid; Möller, Torsten; Lorincz, Matthew C; Karimi, Mohammad M; Jones, Steven J M

    2015-01-01

    Several tools have been developed to enable biologists to perform initial browsing and exploration of sequencing data. However the computational tool set for further analyses often requires significant computational expertise to use and many of the biologists with the knowledge needed to interpret these data must rely on programming experts. We present VisRseq, a framework for analysis of sequencing datasets that provides a computationally rich and accessible framework for integrative and interactive analyses without requiring programming expertise. We achieve this aim by providing R apps, which offer a semi-auto generated and unified graphical user interface for computational packages in R and repositories such as Bioconductor. To address the interactivity limitation inherent in R libraries, our framework includes several native apps that provide exploration and brushing operations as well as an integrated genome browser. The apps can be chained together to create more powerful analysis workflows. To validate the usability of VisRseq for analysis of sequencing data, we present two case studies performed by our collaborators and report their workflow and insights.

  12. Iterative refinement of structure-based sequence alignments by Seed Extension

    PubMed Central

    Kim, Changhoon; Tai, Chin-Hsien; Lee, Byungkook

    2009-01-01

    Background. Accurate sequence alignment is required in many bioinformatics applications but, when sequence similarity is low, it is difficult to obtain accurate alignments based on sequence similarity alone. The accuracy improves when the structures are available, but current structure-based sequence alignment procedures still mis-align substantial numbers of residues. In order to correct such errors, we previously explored the possibility of replacing the residue-based dynamic programming algorithm in structure alignment procedures with the Seed Extension algorithm, which does not use a gap penalty. Here, we describe a new procedure called RSE (Refinement with Seed Extension) that iteratively refines a structure-based sequence alignment. Results. RSE uses SE (Seed Extension) in its core, which is an algorithm that we reported recently for obtaining a sequence alignment from two superimposed structures. The RSE procedure was evaluated by comparing the correctly aligned fractions of residues before and after the refinement of the structure-based sequence alignments produced by popular programs. CE, DaliLite, FAST, LOCK2, MATRAS, MATT, TM-align, SHEBA and VAST were included in this analysis and the NCBI's CDD root node set was used as the reference alignments. RSE improved the average accuracy of sequence alignments for all programs tested when no shift error was allowed. The amount of improvement varied depending on the program. The average improvements were small for DaliLite and MATRAS but about 5% for CE and VAST. More substantial improvements have been seen in many individual cases. The additional computation times required for the refinements were negligible compared to the times taken by the structure alignment programs. Conclusion. RSE is a computationally inexpensive way of improving the accuracy of a structure-based sequence alignment. It can be used as a standalone procedure following a regular structure-based sequence alignment or to replace the traditional iterative refinement procedures based on a residue-level dynamic programming algorithm in many structure alignment programs. PMID:19589133

  13. Minimizing the average distance to a closest leaf in a phylogenetic tree.

    PubMed

    Matsen, Frederick A; Gallagher, Aaron; McCoy, Connor O

    2013-11-01

    When performing an analysis on a collection of molecular sequences, it can be convenient to reduce the number of sequences under consideration while maintaining some characteristic of a larger collection of sequences. For example, one may wish to select a subset of high-quality sequences that represent the diversity of a larger collection of sequences. One may also wish to specialize a large database of characterized "reference sequences" to a smaller subset that is as close as possible on average to a collection of "query sequences" of interest. Such a representative subset can be useful whenever one wishes to find a set of reference sequences that is appropriate to use for comparative analysis of environmentally derived sequences, such as for selecting "reference tree" sequences for phylogenetic placement of metagenomic reads. In this article, we formalize these problems in terms of the minimization of the Average Distance to the Closest Leaf (ADCL) and investigate algorithms to perform the relevant minimization. We show that the greedy algorithm is not effective, show that a variant of the Partitioning Around Medoids (PAM) heuristic gets stuck in local minima, and develop an exact dynamic programming approach. Using this exact program we note that the performance of PAM appears to be good for simulated trees, and is faster than the exact algorithm for small trees. On the other hand, the exact program gives solutions for all numbers of leaves less than or equal to the given desired number of leaves, whereas PAM only gives a solution for the prespecified number of leaves. Via application to real data, we show that the ADCL criterion chooses chimeric sequences less often than random subsets, whereas the maximization of phylogenetic diversity chooses them more often than random. These algorithms have been implemented in publicly available software.
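
    To make the ADCL objective in this abstract concrete, the sketch below evaluates it on a plain leaf-to-leaf distance matrix and finds the best subset by exhaustive search. Both choices are assumptions for illustration: the paper works on phylogenetic trees and develops an exact dynamic program, which this brute-force stand-in does not reproduce.

```python
from itertools import combinations


def adcl(dist, chosen):
    """Average, over all leaves, of the distance to the closest chosen leaf."""
    n = len(dist)
    return sum(min(dist[i][j] for j in chosen) for i in range(n)) / n


def best_subset(dist, k):
    """Exhaustively pick the size-k subset with minimal ADCL (tiny inputs only)."""
    return min(combinations(range(len(dist)), k), key=lambda c: adcl(dist, c))


if __name__ == "__main__":
    # Hypothetical distances: leaves 0 and 1 are close, as are leaves 2 and 3.
    dist = [
        [0, 1, 10, 10],
        [1, 0, 10, 10],
        [10, 10, 0, 1],
        [10, 10, 1, 0],
    ]
    subset = best_subset(dist, 2)
    print(subset, adcl(dist, subset))
```

    Exhaustive search grows combinatorially with the number of leaves, which is one reason heuristics such as PAM and an exact dynamic program are of interest.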

  14. Pulseq-Graphical Programming Interface: Open source visual environment for prototyping pulse sequences and integrated magnetic resonance imaging algorithm development.

    PubMed

    Ravi, Keerthi Sravan; Potdar, Sneha; Poojar, Pavan; Reddy, Ashok Kumar; Kroboth, Stefan; Nielsen, Jon-Fredrik; Zaitsev, Maxim; Venkatesan, Ramesh; Geethanath, Sairam

    2018-03-11

    To provide a single open-source platform for comprehensive MR algorithm development inclusive of simulations, pulse sequence design and deployment, reconstruction, and image analysis. We integrated the "Pulseq" platform for vendor-independent pulse programming with Graphical Programming Interface (GPI), a scientific development environment based on Python. Our integrated platform, Pulseq-GPI, permits sequences to be defined visually and exported to the Pulseq file format for execution on an MR scanner. For comparison, Pulseq files using either MATLAB only ("MATLAB-Pulseq") or Python only ("Python-Pulseq") were generated. We demonstrated three fundamental sequences on a 1.5 T scanner. Execution times of the three variants of implementation were compared on two operating systems. In vitro phantom images indicate equivalence with the vendor supplied implementations and MATLAB-Pulseq. The examples demonstrated in this work illustrate the unifying capability of Pulseq-GPI. The execution times of all the three implementations were fast (a few seconds). The software is capable of user-interface based development and/or command line programming. The tool demonstrated here, Pulseq-GPI, integrates the open-source simulation, reconstruction and analysis capabilities of GPI Lab with the pulse sequence design and deployment features of Pulseq. Current and future work includes providing an ISMRMRD interface and incorporating Specific Absorption Rate and Peripheral Nerve Stimulation computations.

  15. Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

    PubMed Central

    Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

    2012-01-01

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3. PMID:22778697

  16. Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis.

    PubMed

    Bonham-Carter, Oliver; Steele, Joe; Bastola, Dhundy

    2014-11-01

    Modern sequencing and genome assembly technologies have provided a wealth of data, which will soon require an analysis by comparison for discovery. Sequence alignment, a fundamental task in bioinformatics research, may be used but with some caveats. Seminal techniques and methods from dynamic programming are proving ineffective for this work owing to their inherent computational expense when processing large amounts of sequence data. These methods are prone to giving misleading information because of genetic recombination, genetic shuffling and other inherent biological events. New approaches from information theory, frequency analysis and data compression are available and provide powerful alternatives to dynamic programming. These new methods are often preferred, as their algorithms are simpler and are not affected by synteny-related problems. In this review, we provide a detailed discussion of computational tools, which stem from alignment-free methods based on statistical analysis from word frequencies. We provide several clear examples to demonstrate applications and the interpretations over several different areas of alignment-free analysis such as base-base correlations, feature frequency profiles, compositional vectors, an improved string composition and the D2 statistic metric. Additionally, we provide detailed discussion and an example of analysis by Lempel-Ziv techniques from data compression.
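
    The Lempel-Ziv idea named in this abstract can be illustrated by counting the phrases of an LZ78-style parse: repetitive sequences break into fewer distinct phrases than varied ones of the same length. The abstract does not say which Lempel-Ziv variant the review uses, so the LZ78 parsing below is an assumption chosen for brevity.

```python
def lz78_phrase_count(seq):
    """Number of phrases in a simple LZ78-style parse of seq."""
    phrases = set()
    current = ""
    for symbol in seq:
        current += symbol
        if current not in phrases:
            # A new phrase ends here; start collecting the next one.
            phrases.add(current)
            current = ""
    # A non-empty remainder is a repeat of an earlier phrase; count it once.
    return len(phrases) + (1 if current else 0)


if __name__ == "__main__":
    print(lz78_phrase_count("AAAAAAAA"), lz78_phrase_count("ACGTACGT"))
```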

  17. Factor Analysis of the HEW National Strategy for Youth Development Model's Community Program Impact Scales.

    ERIC Educational Resources Information Center

    Truckenmiller, James L.

    The former HEW (Health, Education, and Welfare) National Strategy for Youth Development Model proposed a community-based program to promote positive youth development and to prevent delinquency through a sequence of youth needs assessments, needs-targeted programs, and program impact evaluation. HEW Community Program Impact Scales data obtained…

  18. Unified Engineering Software System

    NASA Technical Reports Server (NTRS)

    Purves, L. R.; Gordon, S.; Peltzman, A.; Dube, M.

    1989-01-01

    Collection of computer programs performs diverse functions in prototype engineering. NEXUS, NASA Engineering Extendible Unified Software system, is research set of computer programs designed to support full sequence of activities encountered in NASA engineering projects. Sequence spans preliminary design, design analysis, detailed design, manufacturing, assembly, and testing. Primarily addresses process of prototype engineering, task of getting single or small number of copies of product to work. Written in FORTRAN 77 and PROLOG.

  19. SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments

    PubMed Central

    Wiehe, Thomas; Gebauer-Jung, Steffi; Mitchell-Olds, Thomas; Guigó, Roderic

    2001-01-01

    Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors. PMID:11544202

  20. A DMAP Program for the Selection of Accelerometer Locations in MSC/NASTRAN

    NASA Technical Reports Server (NTRS)

    Peck, Jeff; Torres, Isaias

    2004-01-01

    A new program for selecting sensor locations has been written in the DMAP (Direct Matrix Abstraction Program) language of MSC/NASTRAN. The program implements the method of Effective Independence for selecting sensor locations, and is executed within a single NASTRAN analysis as a "rigid format alter" to the normal modes solution sequence (SOL 103). The user of the program is able to choose among various analysis options using Case Control and Bulk Data entries. Algorithms tailored for the placement of both uni-axial and tri-axial accelerometers are available, as well as several options for including the model's mass distribution into the calculations. Target modes for the Effective Independence analysis are selected from the MSC/NASTRAN ASET modes calculated by the "SOL 103" solution sequence. The initial candidate sensor set is also under user control, and is selected from the ASET degrees of freedom. Analysis results are printed to the MSC/NASTRAN output file (*.f06), and may include the current candidate sensors set, and their associated Effective Independence distribution, at user specified iteration intervals. At the conclusion of the analysis, the model is reduced to the final sensor set, and frequencies and orthogonality checks are printed. Example results are given for a pre-test analysis of NASA's five-segment solid rocket booster modal test.

  1. Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis.

    PubMed

    Jakupciak, John P; Wells, Jeffrey M; Karalus, Richard J; Pawlowski, David R; Lin, Jeffrey S; Feldman, Andrew B

    2013-01-01

    Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations.

  2. Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis

    PubMed Central

    Jakupciak, John P.; Wells, Jeffrey M.; Karalus, Richard J.; Pawlowski, David R.; Lin, Jeffrey S.; Feldman, Andrew B.

    2013-01-01

    Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations. PMID:24455204

  3. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets.

    PubMed

    Shrimankar, D D; Sathe, S R

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputer often consists of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, the OpenMP programs cannot be scaled for more than a single SMP node. However, programs written in MPI can have more than single SMP nodes. But such a programming paradigm has an overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead incurs significantly even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting to a large variety of input data files. We have developed our own load balancing and cache optimization technique for message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameter, such as sequence size and tile size, on a wide variety of multicore architectures.
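
    For readers unfamiliar with optimal pairwise alignment, the sketch below computes a global alignment score with the classic dynamic programming recurrence, keeping only two rows in memory. The abstract does not state which alignment algorithm or scoring scheme the authors parallelize, so the recurrence and the match, mismatch and gap values here are assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b under a linear gap penalty."""
    # Row 0: aligning the empty prefix of a against each prefix of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Each cell depends only on its diagonal, upper and left neighbours.
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

    Because every cell depends only on three neighbouring cells, blocks of the table can in principle be computed independently once their upper and left borders are known, which is the general property that tile-based schemes exploit.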

  4. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets

    PubMed Central

    Shrimankar, D. D.; Sathe, S. R.

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today’s supercomputer often consists of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, the OpenMP programs cannot be scaled for more than a single SMP node. However, programs written in MPI can have more than single SMP nodes. But such a programming paradigm has an overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead incurs significantly even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting to a large variety of input data files. We have developed our own load balancing and cache optimization technique for message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameter, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868

  5. Promoter classifier: software package for promoter database analysis.

    PubMed

    Gershenzon, Naum I; Ioshikhes, Ilya P

    2005-01-01

    Promoter Classifier is a package of seven stand-alone Windows-based C++ programs allowing the following basic manipulations with a set of promoter sequences: (i) calculation of positional distributions of nucleotides averaged over all promoters of the dataset; (ii) calculation of the averaged occurrence frequencies of the transcription factor binding sites and their combinations; (iii) division of the dataset into subsets of sequences containing or lacking certain promoter elements or combinations; (iv) extraction of the promoter subsets containing or lacking CpG islands around the transcription start site; and (v) calculation of spatial distributions of the promoter DNA stacking energy and bending stiffness. All programs have a user-friendly interface and provide the results in a convenient graphical form. The Promoter Classifier package is an effective tool for various basic manipulations with eukaryotic promoter sequences that usually are necessary for analysis of large promoter datasets. The program Promoter Divider is described in more detail as a representative component of the package.
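
    Item (i) in this abstract, the positional distribution of nucleotides averaged over a promoter set, can be sketched as a per-position frequency table. The sketch assumes equal-length sequences already aligned on the transcription start site; the function name and the toy sequences are hypothetical and unrelated to the package's own code.

```python
from collections import Counter


def positional_frequencies(seqs):
    """For each position, the fraction of sequences carrying each nucleotide."""
    n = len(seqs)
    return [
        {base: count / n for base, count in Counter(column).items()}
        for column in zip(*seqs)  # zip(*seqs) yields one tuple per position
    ]


if __name__ == "__main__":
    promoters = ["TATA", "TATA", "CATA", "TGTA"]  # hypothetical toy sequences
    for position, freqs in enumerate(positional_frequencies(promoters)):
        print(position, freqs)
```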

  6. WImpiBLAST: web interface for mpiBLAST to help biologists perform large-scale annotation using high performance computing.

    PubMed

    Sharma, Parichit; Mantri, Shrikant S

    2014-01-01

    The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. Here, we describe the WImpiBLAST web interface features and architecture, explain design decisions, describe workflows and provide a detailed analysis.

  7. WImpiBLAST: Web Interface for mpiBLAST to Help Biologists Perform Large-Scale Annotation Using High Performance Computing

    PubMed Central

    Sharma, Parichit; Mantri, Shrikant S.

    2014-01-01

    The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. Here, we describe the WImpiBLAST web interface features and architecture, explain design decisions, describe workflows and provide a detailed analysis. PMID:24979410

  8. DnaSAM: Software to perform neutrality testing for large datasets with complex null models.

    PubMed

    Eckert, Andrew J; Liechty, John D; Tearse, Brandon R; Pande, Barnaly; Neale, David B

    2010-05-01

    Patterns of DNA sequence polymorphisms can be used to understand the processes of demography and adaptation within natural populations. High-throughput generation of DNA sequence data has historically been the bottleneck with respect to data processing and experimental inference. Advances in marker technologies have largely solved this problem. Currently, the limiting step is computational, with most molecular population genetic software allowing a gene-by-gene analysis through a graphical user interface. An easy-to-use analysis program that allows both high-throughput processing of multiple sequence alignments along with the flexibility to simulate data under complex demographic scenarios is currently lacking. We introduce a new program, named DnaSAM, which allows high-throughput estimation of DNA sequence diversity and neutrality statistics from experimental data along with the ability to test those statistics via Monte Carlo coalescent simulations. These simulations are conducted using the ms program, which is able to incorporate several genetic parameters (e.g. recombination) and demographic scenarios (e.g. population bottlenecks). The output is a set of diversity and neutrality statistics, with associated probability values under a user-specified null model, stored in an easy-to-manipulate text file.
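
    The abstract does not list which diversity statistics DnaSAM reports. As an assumed, textbook-style illustration of the kind of quantity involved, the sketch below computes the mean number of pairwise differences, the number of segregating sites, and Watterson's estimator from a gap-free alignment. Neutrality tests such as Tajima's D are built on the difference between the first and last of these.

```python
from itertools import combinations


def pairwise_diversity(alignment):
    """Mean number of differences between all pairs of aligned sequences."""
    pairs = list(combinations(alignment, 2))
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return diffs / len(pairs)


def segregating_sites(alignment):
    """Number of alignment columns showing more than one state."""
    return sum(len(set(column)) > 1 for column in zip(*alignment))


def watterson_theta(alignment):
    """Segregating sites divided by the harmonic number a_n = sum(1/i), i < n."""
    a_n = sum(1 / i for i in range(1, len(alignment)))
    return segregating_sites(alignment) / a_n


if __name__ == "__main__":
    sample = ["ACGT", "ACGA", "ACCA"]  # hypothetical toy alignment
    print(pairwise_diversity(sample), segregating_sites(sample),
          watterson_theta(sample))
```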

  9. Using HIV Sequence and Epidemiologic Data to Assess the Effect of Self-referral Testing for Acute HIV Infection on Incident Diagnoses in San Diego, California

    PubMed Central

    Mehta, Sanjay R.; Murrell, Ben; Anderson, Christy M.; Kosakovsky Pond, Sergei L.; Wertheim, Joel O.; Young, Jason A.; Freitas, Lorri; Richman, Douglas D.; Mathews, W. Chris; Scheffler, Konrad; Little, Susan J.; Smith, Davey M.

    2016-01-01

    Background. Because recently infected individuals disproportionately contribute to the spread of human immunodeficiency virus (HIV), we evaluated the impact of a primary HIV screening program (the Early Test) implemented in San Diego. Methods. The Early Test program used combined nucleic acid and serology testing to screen for primary infection targeting local high-risk individuals. Epidemiologic, HIV sequence, and geographic data were obtained from the San Diego County Department of Public Health and the Early Test program. Poisson regression analysis was performed to determine whether the Early Test program was temporally and geographically associated with changes in incident HIV diagnoses. Transmission chains were inferred by phylogenetic analysis of sequence data. Results. Over time, a decrease in incident HIV diagnoses was observed proportional to the number of primary HIV infections diagnosed in each San Diego region (P < .001). Molecular network analyses also showed that transmission chains were more likely to terminate in regions where the program was marketed (P = .002). Although individuals in these zip codes had infection diagnosed earlier (P = .08), they were not treated earlier (P = .83). Conclusions. These findings suggest that early HIV diagnoses by this primary infection screening program probably contributed to the observed decrease in new HIV diagnoses in San Diego, and they support the expansion and evaluation of similar programs. PMID:27174704

  10. SeqFIRE: a web application for automated extraction of indel regions and conserved blocks from protein multiple sequence alignments.

    PubMed

    Ajawatanawong, Pravech; Atkinson, Gemma C; Watson-Haigh, Nathan S; Mackenzie, Bryony; Baldauf, Sandra L

    2012-07-01

    Analyses of multiple sequence alignments generally focus on well-defined conserved sequence blocks, while the rest of the alignment is largely ignored or discarded. This is especially true in phylogenomics, where large multigene datasets are produced through automated pipelines. However, some of the most powerful phylogenetic markers have been found in the variable length regions of multiple alignments, particularly insertions/deletions (indels) in protein sequences. We have developed Sequence Feature and Indel Region Extractor (SeqFIRE) to enable the automated identification and extraction of indels from protein sequence alignments. The program can also extract conserved blocks and identify fast evolving sites using a combination of conservation and entropy. All major variables can be adjusted by the user, allowing them to identify the sets of variables most suited to a particular analysis or dataset. Thus, all major tasks in preparing an alignment for further analysis are combined in a single flexible and user-friendly program. The output includes a numbered list of indels, alignments in NEXUS format with indels annotated or removed and indel-only matrices. SeqFIRE is a user-friendly web application, freely available online at www.seqfire.org/.
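
    The two ideas in this record, locating indel regions and scoring columns by entropy, can be sketched as follows. This is a minimal illustration under simple assumptions (a gap is the character "-", an indel region is any run of columns containing at least one gap, and entropy is Shannon entropy over non-gap residues); it is not SeqFIRE's actual algorithm, and the function names and toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the non-gap residues in one alignment column."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def indel_regions(alignment):
    """Return half-open (start, end) column ranges in which any sequence has a gap."""
    regions, start = [], None
    for i, column in enumerate(zip(*alignment)):
        has_gap = "-" in column
        if has_gap and start is None:
            start = i                      # a gapped run begins here
        elif not has_gap and start is not None:
            regions.append((start, i))     # the run ended at the previous column
            start = None
    if start is not None:                  # run extends to the end of the alignment
        regions.append((start, len(alignment[0])))
    return regions


# Hypothetical toy protein alignment with one two-column indel.
toy = ["MKT--LLV", "MKTAALLV", "MRT--LIV"]

if __name__ == "__main__":
    print(indel_regions(toy))
    print([round(column_entropy(col), 3) for col in zip(*toy)])
```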

  11. Computing Lives And Reliabilities Of Turboprop Transmissions

    NASA Technical Reports Server (NTRS)

    Coy, J. J.; Savage, M.; Radil, K. C.; Lewicki, D. G.

    1991-01-01

    Computer program PSHFT calculates lifetimes of variety of aircraft transmissions. Consists of main program, series of subroutines applying to specific configurations, generic subroutines for analysis of properties of components, subroutines for analysis of system, and common block. Main program selects routines used in analysis and causes them to operate in desired sequence. Series of configuration-specific subroutines put in configuration data, perform force and life analyses for components (with help of generic component-property-analysis subroutines), fill property array, call up system-analysis routines, and finally print out results of analysis for system and components. Written in FORTRAN 77(IV).

  12. 10 CFR 70.62 - Safety program and integrated safety analysis.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ...; (iv) Potential accident sequences caused by process deviations or other events internal to the... of occurrence of each potential accident sequence identified pursuant to paragraph (c)(1)(iv) of this... have experience in nuclear criticality safety, radiation safety, fire safety, and chemical process...

  13. 10 CFR 70.62 - Safety program and integrated safety analysis.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ...; (iv) Potential accident sequences caused by process deviations or other events internal to the... of occurrence of each potential accident sequence identified pursuant to paragraph (c)(1)(iv) of this... have experience in nuclear criticality safety, radiation safety, fire safety, and chemical process...

  14. 10 CFR 70.62 - Safety program and integrated safety analysis.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ...; (iv) Potential accident sequences caused by process deviations or other events internal to the... of occurrence of each potential accident sequence identified pursuant to paragraph (c)(1)(iv) of this... have experience in nuclear criticality safety, radiation safety, fire safety, and chemical process...

  15. PHASTpep: Analysis Software for Discovery of Cell-Selective Peptides via Phage Display and Next-Generation Sequencing

    PubMed Central

    Dasa, Siva Sai Krishna; Kelly, Kimberly A.

    2016-01-01

    Next-generation sequencing has enhanced the phage display process, allowing for the quantification of millions of sequences resulting from the biopanning process. In response, many valuable analysis programs focused on specificity and finding targeted motifs or consensus sequences were developed. For targeted drug delivery and molecular imaging, it is also necessary to find peptides that are selective—targeting only the cell type or tissue of interest. We present a new analysis strategy and accompanying software, PHage Analysis for Selective Targeted PEPtides (PHASTpep), which identifies highly specific and selective peptides. Using this process, we discovered and validated, both in vitro and in vivo in mice, two sequences (HTTIPKV and APPIMSV) targeted to pancreatic cancer-associated fibroblasts that escaped identification using previously existing software. Our selectivity analysis makes it possible to discover peptides that target a specific cell type and avoid other cell types, enhancing clinical translatability by circumventing complications with systemic use. PMID:27186887

  16. Microbial community analysis using MEGAN.

    PubMed

    Huson, Daniel H; Weber, Nico

    2013-01-01

    Metagenomics, the study of microbes in the environment using DNA sequencing, depends upon dedicated software tools for processing and analyzing very large sequencing datasets. One such tool is MEGAN (MEtaGenome ANalyzer), which can be used to interactively analyze and compare metagenomic and metatranscriptomic data, both taxonomically and functionally. To perform a taxonomic analysis, the program places the reads onto the NCBI taxonomy, while functional analysis is performed by mapping reads to the SEED, COG, and KEGG classifications. Samples can be compared taxonomically and functionally, using a wide range of different charting and visualization techniques. PCoA analysis and clustering methods allow high-level comparison of large numbers of samples. Different attributes of the samples can be captured and used within analysis. The program supports various input formats for loading data and can export analysis results in different text-based and graphical formats. The program is designed to work with very large samples containing many millions of reads. It is written in Java and installers for the three major computer operating systems are available from http://www-ab.informatik.uni-tuebingen.de. © 2013 Elsevier Inc. All rights reserved.

  17. HTSstation: a web application and open-access libraries for high-throughput sequencing data analysis.

    PubMed

    David, Fabrice P A; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion

    2014-01-01

    The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. Besides, our programming framework empowers developers with the possibility to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch.

  18. HTSstation: A Web Application and Open-Access Libraries for High-Throughput Sequencing Data Analysis

    PubMed Central

    David, Fabrice P. A.; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J.; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion

    2014-01-01

    The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. Besides, our programming framework empowers developers with the possibility to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch. PMID:24475057

  19. Analysis of simulated image sequences from sensors for restricted-visibility operations

    NASA Technical Reports Server (NTRS)

    Kasturi, Rangachar

    1991-01-01

    A real time model of the visible output from a 94 GHz sensor, based on a radiometric simulation of the sensor, was developed. A sequence of images as seen from an aircraft as it approaches for landing was simulated using this model. Thirty frames from this sequence of 200 x 200 pixel images were analyzed to identify and track objects in the image using the Cantata image processing package within the visual programming environment provided by the Khoros software system. The image analysis operations are described.

  20. Integrative analysis of environmental sequences using MEGAN4.

    PubMed

    Huson, Daniel H; Mitra, Suparna; Ruscheweyh, Hans-Joachim; Weber, Nico; Schuster, Stephan C

    2011-09-01

    A major challenge in the analysis of environmental sequences is data integration. The question is how to analyze different types of data in a unified approach, addressing both the taxonomic and functional aspects. To facilitate such analyses, we have substantially extended MEGAN, a widely used taxonomic analysis program. The new program, MEGAN4, provides an integrated approach to the taxonomic and functional analysis of metagenomic, metatranscriptomic, metaproteomic, and rRNA data. While taxonomic analysis is performed based on the NCBI taxonomy, functional analysis is performed using the SEED classification of subsystems and functional roles or the KEGG classification of pathways and enzymes. A number of examples illustrate how such analyses can be performed, and show that one can also import and compare classification results obtained using others' tools. MEGAN4 is freely available for academic purposes, and installers for all three major operating systems can be downloaded from www-ab.informatik.uni-tuebingen.de/software/megan.

  1. Physical-chemical property based sequence motifs and methods regarding same

    DOEpatents

    Braun, Werner [Friendswood, TX; Mathura, Venkatarajan S [Sarasota, FL; Schein, Catherine H [Friendswood, TX

    2008-09-09

    A data analysis system, program, and/or method, e.g., a data mining/data exploration method, using physical-chemical property motifs. For example, a sequence database may be searched for identifying segments thereof having physical-chemical properties similar to the physical-chemical property motifs.

  2. Implementation of Quality Management in Core Service Laboratories

    PubMed Central

    Creavalle, T.; Haque, K.; Raley, C.; Subleski, M.; Smith, M.W.; Hicks, B.

    2010-01-01

    CF-28 The Genetics and Genomics group of the Advanced Technology Program of SAIC-Frederick exists to bring innovative genomic expertise, tools and analysis to NCI and the scientific community. The Sequencing Facility (SF) provides next generation short read (Illumina) sequencing capacity to investigators using a streamlined production approach. The Laboratory of Molecular Technology (LMT) offers a wide range of genomics core services including microarray expression analysis, miRNA analysis, array comparative genome hybridization, long read (Roche) next generation sequencing, quantitative real time PCR, transgenic genotyping, Sanger sequencing, and clinical mutation detection services to investigators from across the NIH. As the technology supporting this genomic research becomes more complex, the need for basic quality processes within all aspects of the core service groups becomes critical. The Quality Management group works alongside members of these labs to establish or improve processes supporting operations control (equipment, reagent and materials management), process improvement (reengineering/optimization, automation, acceptance criteria for new technologies and tech transfer), and quality assurance and customer support (controlled documentation/SOPs, training, service deficiencies and continual improvement efforts). Implementation and expansion of quality programs within unregulated environments demonstrates SAIC-Frederick's dedication to providing the highest quality products and services to the NIH community.

  3. "First generation" automated DNA sequencing technology.

    PubMed

    Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M

    2011-10-01

    Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.

  4. High-throughput sequence alignment using Graphics Processing Units

    PubMed Central

    Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh

    2007-01-01

    Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. PMID:18070356

  5. Analysis of loss of decay-heat-removal sequences at Browns Ferry Unit One

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harrington, R.M.

    1983-01-01

    This paper summarizes the Oak Ridge National Laboratory (ORNL) report Loss of DHR Sequences at Browns Ferry Unit One - Accident Sequence Analysis (NUREG/CR-2973). The Loss of DHR investigation is the third in a series of accident studies concerning the BWR 4 - MK I containment plant design. These studies, sponsored by the Nuclear Regulatory Commission Severe Accident Sequence Analysis (SASA) program, have been conducted at ORNL with the full cooperation of the Tennessee Valley Authority (TVA). The purpose of the SASA studies is to predetermine the probable course of postulated severe accidents so as to establish the timing and the sequence of events. The SASA studies also produce recommendations concerning the implementation of better system design and better emergency operating instructions and operator training. The ORNL studies also include a detailed, best-estimate calculation of the release and transport of radioactive fission products following postulated severe accidents.

  6. ISSYS: An integrated synergistic Synthesis System

    NASA Technical Reports Server (NTRS)

    Dovi, A. R.

    1980-01-01

    Integrated Synergistic Synthesis System (ISSYS), an integrated system of computer codes in which the sequence of program execution and data flow is controlled by the user, is discussed. The commands available to exert such control, the ISSYS major function and rules, and the computer codes currently available in the system are described. Computational sequences frequently used in the aircraft structural analysis and synthesis are defined. External computer codes utilized by the ISSYS system are documented. A bibliography on the programs is included.

  7. Using HIV Sequence and Epidemiologic Data to Assess the Effect of Self-referral Testing for Acute HIV Infection on Incident Diagnoses in San Diego, California.

    PubMed

    Mehta, Sanjay R; Murrell, Ben; Anderson, Christy M; Kosakovsky Pond, Sergei L; Wertheim, Joel O; Young, Jason A; Freitas, Lorri; Richman, Douglas D; Mathews, W Chris; Scheffler, Konrad; Little, Susan J; Smith, Davey M

    2016-07-01

    Because recently infected individuals disproportionately contribute to the spread of human immunodeficiency virus (HIV), we evaluated the impact of a primary HIV screening program (the Early Test) implemented in San Diego. The Early Test program used combined nucleic acid and serology testing to screen for primary infection targeting local high-risk individuals. Epidemiologic, HIV sequence, and geographic data were obtained from the San Diego County Department of Public Health and the Early Test program. Poisson regression analysis was performed to determine whether the Early Test program was temporally and geographically associated with changes in incident HIV diagnoses. Transmission chains were inferred by phylogenetic analysis of sequence data. Over time, a decrease in incident HIV diagnoses was observed proportional to the number of primary HIV infections diagnosed in each San Diego region (P < .001). Molecular network analyses also showed that transmission chains were more likely to terminate in regions where the program was marketed (P = .002). Although individuals in these zip codes had infection diagnosed earlier (P = .08), they were not treated earlier (P = .83). These findings suggest that early HIV diagnoses by this primary infection screening program probably contributed to the observed decrease in new HIV diagnoses in San Diego, and they support the expansion and evaluation of similar programs. © The Author 2016. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved. For permissions, e-mail journals.permissions@oup.com.

  8. A Correlational Analysis of the Effects of Learner and Linear Programming Characteristics on Learning Programmed Instruction. Final Report.

    ERIC Educational Resources Information Center

    Seibert, Warren F.; Reid, Christopher J.

    Learning and retention may be influenced by subtle instructional stimulus characteristics and certain visual memory aptitudes. Ten stimulus characteristics were chosen for study; 50 sequences of programed instructional material were specially written to conform to sampled values of each stimulus characteristic. Seventy-three freshman subjects…

  9. Programmers manual for static and dynamic reusable surface insulation stresses (resist)

    NASA Technical Reports Server (NTRS)

    Ogilvie, P. L.; Levy, A.; Austin, F.; Ojalvo, I. U.

    1974-01-01

    Programming information for the RESIST program for the dynamic and thermal stress analysis of the space shuttle surface insulation is presented. The overall flow chart of the program, overlay chart, data set allocation, and subprogram calling sequence are given along with a brief description of the individual subprograms and typical subprogram output.

  10. Discrepancy Analysis and Continuity Matrix: Tools for Measuring the Impact of Inservice Training.

    ERIC Educational Resources Information Center

    Kite, R. Hayman

    Within an inservice training program there is a functional interdependent relationship among problems, causes, and solutions. During a sequence of eight steps to ascertain program impact, a "continuity matrix", a management technique that assists in dealing with the problem/solution paradox, is created. A successful training program must: (1) aim…

  11. Aircraft stress sequence development: A complex engineering process made simple

    NASA Technical Reports Server (NTRS)

    Schrader, K. H.; Butts, D. G.; Sparks, W. A.

    1994-01-01

    Development of stress sequences for critical aircraft structure requires flight measured usage data, known aircraft loads, and established relationships between aircraft flight loads and structural stresses. Resulting cycle-by-cycle stress sequences can be directly usable for crack growth analysis and coupon spectra tests. Often, an expert in loads and spectra development manipulates the usage data into a typical sequence of representative flight conditions for which loads and stresses are calculated. For a fighter/trainer type aircraft, this effort is repeated many times for each of the fatigue critical locations (FCL) resulting in expenditure of numerous engineering hours. The Aircraft Stress Sequence Computer Program (ACSTRSEQ), developed by Southwest Research Institute under contract to San Antonio Air Logistics Center, presents a unique approach for making complex technical computations in a simple, easy to use method. The program is written in Microsoft Visual Basic for the Microsoft Windows environment.

  12. For the Love of Statistics: Appreciating and Learning to Apply Experimental Analysis and Statistics through Computer Programming Activities

    ERIC Educational Resources Information Center

    Mascaró, Maite; Sacristán, Ana Isabel; Rufino, Marta M.

    2016-01-01

    For the past 4 years, we have been involved in a project that aims to enhance the teaching and learning of experimental analysis and statistics, of environmental and biological sciences students, through computational programming activities (using R code). In this project, through an iterative design, we have developed sequences of R-code-based…

  13. Genomic Encyclopedia of Fungi

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grigoriev, Igor

    Genomes of fungi relevant to energy and environment are in focus of the Fungal Genomic Program at the US Department of Energy Joint Genome Institute (JGI). Its key project, the Genomics Encyclopedia of Fungi, targets fungi related to plant health (symbionts, pathogens, and biocontrol agents) and biorefinery processes (cellulose degradation, sugar fermentation, industrial hosts), and explores fungal diversity by means of genome sequencing and analysis. Over 150 fungal genomes have been sequenced by JGI to date and released through MycoCosm (www.jgi.doe.gov/fungi), a fungal web-portal, which integrates sequence and functional data with genome analysis tools for user community. Sequence analysis supported by functional genomics leads to developing parts list for complex systems ranging from ecosystems of biofuel crops to biorefineries. Recent examples of such parts suggested by comparative genomics and functional analysis in these areas are presented here.

  14. CRITICA: coding region identification tool invoking comparative analysis

    NASA Technical Reports Server (NTRS)

    Badger, J. H.; Olsen, G. J.; Woese, C. R. (Principal Investigator)

    1999-01-01

    Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).
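
    The comparative signal described in this record can be illustrated with a small sketch that computes nucleotide identity and amino acid identity for one pair of aligned, gap-free, in-frame sequences using the standard genetic code. It only produces the two quantities being compared; how CRITICA derives the identity "expected" for a given nucleotide identity is not stated in the abstract and is not modeled here. The function names and the two toy sequences are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA string, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def identity(a, b):
    """Fraction of aligned positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical aligned pair: the differences fall mostly at third codon positions.
query = "ATGGCTAAAGGTCTG"
subject = "ATGGCCAAGGGACTT"

if __name__ == "__main__":
    print(identity(query, subject))                        # nucleotide identity
    print(identity(translate(query), translate(subject)))  # amino acid identity
```

    In this toy pair the translations agree at every position even though the DNA does not, which is the direction of signal the abstract interprets as evidence for coding.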

  15. Integrated databanks access and sequence/structure analysis services at the PBIL.

    PubMed

    Perrière, Guy; Combet, Christophe; Penel, Simon; Blanchet, Christophe; Thioulouse, Jean; Geourjon, Christophe; Grassot, Julien; Charavay, Céline; Gouy, Manolo; Duret, Laurent; Deléage, Gilbert

    2003-07-01

    The World Wide Web server of the PBIL (Pôle Bioinformatique Lyonnais) provides on-line access to sequence databanks and to many tools for nucleic acid and protein sequence analysis. This server allows users to query nucleotide sequence banks in the EMBL and GenBank formats and protein sequence banks in the SWISS-PROT and PIR formats. The query engine on which our data bank access is based is the ACNUC system. It makes it possible to build complex queries to access functional zones of biological interest and to retrieve large sequence sets. Of special interest are the unique features provided by this system to query the data banks of gene families developed at the PBIL. The server also provides access to a wide range of sequence analysis methods: similarity search programs, multiple alignments, protein structure prediction and multivariate statistics. An originality of this server is the integration of these two aspects: sequence retrieval and sequence analysis. Indeed, thanks to the introduction of re-usable lists, it is possible to perform treatments on large sets of data. The PBIL server can be reached at: http://pbil.univ-lyon1.fr.

  16. DNATagger, colors for codons.

    PubMed

    Scherer, N M; Basso, D M

    2008-09-16

    DNATagger is a web-based tool for coloring and editing DNA, RNA and protein sequences and alignments. It is dedicated to the visualization of protein coding sequences and also protein sequence alignments to facilitate the comprehension of evolutionary processes in sequence analysis. The distinctive feature of DNATagger is the use of codons as informative units for coloring DNA and RNA sequences. The codons are colored according to their corresponding amino acids. It is the first program that colors codons in DNA sequences without being affected by "out-of-frame" gaps of alignments. It can handle single gaps and gaps inside the triplets. The program also provides the possibility to edit the alignments and change color patterns and translation tables. DNATagger is a JavaScript application, following the W3C guidelines, designed to work on standards-compliant web browsers. It therefore requires no installation and is platform independent. The web-based DNATagger is available as free and open source software at http://www.inf.ufrgs.br/~dmbasso/dnatagger/.
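
    One way to treat codons as units in a gapped alignment row, as this record describes, is to collect non-gap bases into triplets while remembering the alignment columns they came from, so that a gap inside a triplet does not shift the reading frame. The sketch below shows that idea only; it is written in Python rather than JavaScript, is not DNATagger's implementation, and the function name and example row are hypothetical.

```python
def codons_with_columns(aligned_seq, gap="-"):
    """Group non-gap bases into codons, keeping each base's alignment column.

    Returns a list of (codon, [col, col, col]) tuples; a trailing incomplete
    triplet is dropped.
    """
    result = []
    codon, columns = "", []
    for col, base in enumerate(aligned_seq):
        if base == gap:
            continue                       # gaps never consume a codon position
        codon += base
        columns.append(col)
        if len(codon) == 3:
            result.append((codon, columns))
            codon, columns = "", []
    return result


# Hypothetical aligned row with a single gap inside one triplet and a
# double gap inside the next.
row = "AT-GGC--TAA"

if __name__ == "__main__":
    for codon, cols in codons_with_columns(row):
        print(codon, cols)
```

    A coloring step would then look up each codon's amino acid and apply one color to all three recorded columns.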

  17. FOUNTAIN: A JAVA open-source package to assist large sequencing projects

    PubMed Central

    Buerstedde, Jean-Marie; Prill, Florian

    2001-01-01

    Background Better automation, lower cost per reaction and a heightened interest in comparative genomics has led to a dramatic increase in DNA sequencing activities. Although the large sequencing projects of specialized centers are supported by in-house bioinformatics groups, many smaller laboratories face difficulties managing the appropriate processing and storage of their sequencing output. The challenges include documentation of clones, templates and sequencing reactions, and the storage, annotation and analysis of the large number of generated sequences. Results We describe here a new program, named FOUNTAIN, for the management of large sequencing projects. FOUNTAIN uses the JAVA computer language and data storage in a relational database. Starting with a collection of sequencing objects (clones), the program generates and stores information related to the different stages of the sequencing project using a web browser interface for user input. The generated sequences are subsequently imported and annotated based on BLAST searches against the public databases. In addition, simple algorithms to cluster sequences and determine putative polymorphic positions are implemented. Conclusions A simple, but flexible and scalable software package is presented to facilitate data generation and storage for large sequencing projects. Open source and largely platform and database independent, we wish FOUNTAIN to be improved and extended in a community effort. PMID:11591214

  18. Elements of Mathematics, Book O: Intuitive Background. Chapter 1, Operational Systems.

    ERIC Educational Resources Information Center

    Exner, Robert; And Others

    The sixteen chapters of this book provide the core material for the Elements of Mathematics Program, a secondary sequence developed for highly motivated students with strong verbal abilities. The sequence is based on a functional-relational approach to mathematics teaching, and emphasizes teaching by analysis of real-life situations. This text is…

  19. Elements of Mathematics, Book O: Intuitive Background. Chapter 5, Mappings.

    ERIC Educational Resources Information Center

    Exner, Robert; And Others

    The sixteen chapters of this book provide the core material for the Elements of Mathematics Program, a secondary sequence developed for highly motivated students with strong verbal abilities. The sequence is based on a functional-relational approach to mathematics teaching, and emphasizes teaching by analysis of real-life situations. This text is…

  20. Teaching Research Methodology Using a Project-Based Three Course Sequence Critical Reflections on Practice

    ERIC Educational Resources Information Center

    Braguglia, Kay H.; Jackson, Kanata A.

    2012-01-01

    This article presents a reflective analysis of teaching research methodology through a three course sequence using a project-based approach. The authors reflect critically on their experiences in teaching research methods courses in an undergraduate business management program. The introduction of a range of specific techniques including student…

  1. Elements of Mathematics, Book O: Intuitive Background. Chapter 2, The Integers.

    ERIC Educational Resources Information Center

    Exner, Robert; And Others

    The sixteen chapters of this book provide the core materials for the Elements of Mathematics Program, a secondary sequence developed for highly motivated students with strong verbal abilities. The sequence is based on a functional-relational approach to mathematics teaching, and emphasizes teaching by analysis of real-life situations. This text is…

  2. Managing complex processing of medical image sequences by program supervision techniques

    NASA Astrophysics Data System (ADS)

    Crubezy, Monica; Aubry, Florent; Moisan, Sabine; Chameroy, Virginie; Thonnat, Monique; Di Paola, Robert

    1997-05-01

    Our objective is to offer clinicians wider access to evolving medical image processing (MIP) techniques, crucial to improve assessment and quantification of physiological processes, but difficult to handle for non-specialists in MIP. Based on artificial intelligence techniques, our approach consists in the development of a knowledge-based program supervision system, automating the management of MIP libraries. It comprises a library of programs, a knowledge base capturing the expertise about programs and data and a supervision engine. It selects, organizes and executes the appropriate MIP programs given a goal to achieve and a data set, with dynamic feedback based on the results obtained. It also advises users in the development of new procedures chaining MIP programs. We have tested the approach on an application of factor analysis of medical image sequences as a means of predicting the response of osteosarcoma to chemotherapy, with both MRI and NM dynamic image sequences. As a result our program supervision system frees clinical end-users from performing tasks outside their competence, permitting them to concentrate on clinical issues. Therefore our approach enables a better exploitation of possibilities offered by MIP and higher quality results, both in terms of robustness and reliability.

  3. SOBA: sequence ontology bioinformatics analysis.

    PubMed

    Moore, Barry; Fan, Guozhen; Eilbeck, Karen

    2010-07-01

    The advent of cheaper, faster sequencing technologies has pushed the task of sequence annotation from the exclusive domain of large-scale multi-national sequencing projects to that of research laboratories and small consortia. The bioinformatics burden placed on these laboratories, some with very little programming experience, can be daunting. Fortunately, there exist software libraries and pipelines designed with these groups in mind, to ease the transition from an assembled genome to an annotated and accessible genome resource. We have developed the Sequence Ontology Bioinformatics Analysis (SOBA) tool to provide a simple statistical and graphical summary of an annotated genome. We envisage its use during annotation jamborees, genome comparison and for use by developers for rapid feedback during annotation software development and testing. SOBA also provides annotation consistency feedback to ensure correct use of terminology within annotations, and guides users to add new terms to the Sequence Ontology when required. SOBA is available at http://www.sequenceontology.org/cgi-bin/soba.cgi.

  4. Visualization of Concurrent Program Executions

    NASA Technical Reports Server (NTRS)

    Artho, Cyrille; Havelund, Klaus; Honiden, Shinichi

    2007-01-01

    Various program analysis techniques are efficient at discovering failures and properties. However, it is often difficult to evaluate results, such as program traces. This calls for abstraction and visualization tools. We propose an approach based on UML sequence diagrams, addressing shortcomings of such diagrams for concurrency. The resulting visualization is expressive and provides all the necessary information at a glance.

  5. Designing and Deploying Programming Courses: Strategies, Tools, Difficulties and Pedagogy

    ERIC Educational Resources Information Center

    Xinogalos, Stelios

    2016-01-01

    Designing and deploying programming courses is undoubtedly a challenging task. In this paper, an attempt to analyze important aspects of a sequence of two courses on imperative-procedural and object-oriented programming in a non-CS majors Department is made. This analysis is based on a questionnaire filled in by fifty students in a voluntary…

  6. Activity Catalog Tool (ACT) user manual, version 2.0

    NASA Technical Reports Server (NTRS)

    Segal, Leon D.; Andre, Anthony D.

    1994-01-01

    This report comprises the user manual for version 2.0 of the Activity Catalog Tool (ACT) software program, developed by Leon D. Segal and Anthony D. Andre in cooperation with NASA Ames Aerospace Human Factors Research Division, FLR branch. ACT is a software tool for recording and analyzing sequences of activity over time that runs on the Macintosh platform. It was designed as an aid for professionals who are interested in observing and understanding human behavior in field settings, or from video or audio recordings of the same. Specifically, the program is aimed at two primary areas of interest: human-machine interactions and interactions between humans. The program provides a means by which an observer can record an observed sequence of events, logging such parameters as frequency and duration of particular events. The program goes further by providing the user with a quantified description of the observed sequence, through application of a basic set of statistical routines, and enables merging and appending of several files and more extensive analysis of the resultant data.
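
    The frequency and duration logging described above amounts to a simple aggregation over timestamped events. The Python sketch below illustrates that idea only; the `(label, start, end)` record layout and the function name are assumptions and do not reflect ACT's actual file format or routines.

```python
from collections import defaultdict


def summarize_events(log):
    """Summarize an observation log of (label, start, end) records.

    Times are in seconds. Returns, per event label, how often the event
    occurred and the total time it lasted.
    """
    summary = defaultdict(lambda: {"count": 0, "total_duration": 0.0})
    for label, start, end in log:
        if end < start:
            raise ValueError(f"event {label!r} ends before it starts")
        summary[label]["count"] += 1
        summary[label]["total_duration"] += end - start
    return dict(summary)


observed = [
    ("talk", 0.0, 2.5),
    ("press_button", 2.5, 3.0),
    ("talk", 4.0, 5.0),
]
for label, stats in summarize_events(observed).items():
    mean = stats["total_duration"] / stats["count"]
    print(label, stats["count"], stats["total_duration"], mean)
```

    Merging several observation files, as the manual describes, would correspond here to concatenating their logs before summarizing.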

  7. WebMOTIFS: automated discovery, filtering and scoring of DNA sequence motifs using multiple programs and Bayesian approaches

    PubMed Central

    Romer, Katherine A.; Kayombya, Guy-Richard; Fraenkel, Ernest

    2007-01-01

    WebMOTIFS provides a web interface that facilitates the discovery and analysis of DNA-sequence motifs. Several studies have shown that the accuracy of motif discovery can be significantly improved by using multiple de novo motif discovery programs and using randomized control calculations to identify the most significant motifs or by using Bayesian approaches. WebMOTIFS makes it easy to apply these strategies. Using a single submission form, users can run several motif discovery programs and score, cluster and visualize the results. In addition, the Bayesian motif discovery program THEME can be used to determine the class of transcription factors that is most likely to regulate a set of sequences. Input can be provided as a list of gene or probe identifiers. Used with the default settings, WebMOTIFS accurately identifies biologically relevant motifs from diverse data in several species. WebMOTIFS is freely available at http://fraenkel.mit.edu/webmotifs. PMID:17584794

  8. DraGnET: Software for storing, managing and analyzing annotated draft genome sequence data

    PubMed Central

    2010-01-01

    Background: New "next generation" DNA sequencing technologies offer individual researchers the ability to rapidly generate large amounts of genome sequence data at dramatically reduced costs. As a result, a need has arisen for new software tools for storage, management and analysis of genome sequence data. Although bioinformatic tools are available for the analysis and management of genome sequences, limitations still remain. For example, restrictions on the submission of data and use of these tools may be imposed, thereby making them unsuitable for sequencing projects that need to remain in-house or proprietary during their initial stages. Furthermore, the availability and use of next generation sequencing in industrial, governmental and academic environments requires biologists to have access to computational support for the curation and analysis of the data generated; however, this type of support is not always immediately available. Results: To address these limitations, we have developed DraGnET (Draft Genome Evaluation Tool). DraGnET is an open source web application which allows researchers, with no experience in programming and database management, to set up their own in-house projects for storing, retrieving, organizing and managing annotated draft and complete genome sequence data. The software provides a web interface for the use of BLAST, allowing users to perform preliminary comparative analysis among multiple genomes. We demonstrate the utility of DraGnET for performing comparative genomics on closely related bacterial strains. Furthermore, DraGnET can be further developed to incorporate additional tools for more sophisticated analyses. Conclusions: DraGnET is designed for use either by individual researchers or as a collaborative tool available through Internet (or Intranet) deployment. For genome projects that require genome sequencing data to initially remain proprietary, DraGnET provides the means for researchers to keep their data in-house for analysis using local programs or until it is made publicly available, at which point it may be uploaded to additional analysis software applications. The DraGnET home page is available at http://www.dragnet.cvm.iastate.edu and includes example files for examining the functionalities, a link for downloading the DraGnET setup package and a link to the DraGnET source code hosted with full documentation on SourceForge. PMID:20175920

  9. Small Bodies, Big Concepts: Bringing Visual Analysis into the Middle School Classroom

    NASA Astrophysics Data System (ADS)

    Cobb, W. H.; Lebofsky, L. A.; Ristvey, J. D.; Buxner, S.; Weeks, S.; Zolensky, M. E.

    2012-03-01

    Multi-disciplinary PD model digs into high-end planetary science backed by a pedagogical framework, Designing Effective Science Instruction. NASA activities are sequenced to promote visual analysis of emerging data from Discovery Program missions.

  10. Protein domain analysis of genomic sequence data reveals regulation of LRR related domains in plant transpiration in Ficus.

    PubMed

    Lang, Tiange; Yin, Kangquan; Liu, Jinyu; Cao, Kunfang; Cannon, Charles H; Du, Fang K

    2014-01-01

    Predicting protein domains is essential for understanding a protein's function at the molecular level. However, up till now, there has been no direct and straightforward method for predicting protein domains in species without a reference genome sequence. In this study, we developed a functionality with a set of programs that can predict protein domains directly from genomic sequence data without a reference genome. Using whole genome sequence data, the programming functionality mainly comprised DNA assembly in combination with next-generation sequencing (NGS) assembly methods and traditional methods, peptide prediction and protein domain prediction. The proposed new functionality avoids problems associated with de novo assembly due to micro reads and small single repeats. Furthermore, we applied our functionality for the prediction of leucine rich repeat (LRR) domains in four species of Ficus with no reference genome, based on NGS genomic data. We found that the LRRNT_2 and LRR_8 domains are related to plant transpiration efficiency, as indicated by the stomata index, in the four species of Ficus. The programming functionality established in this study provides new insights for protein domain prediction, which is particularly timely in the current age of NGS data expansion.

  11. Elements of Mathematics, Book O: Intuitive Background. Chapter 14, Geometry: Similitudes, Coordinates, and Trigonometry.

    ERIC Educational Resources Information Center

    Exner, Robert; And Others

    The sixteen chapters of this book provide the core material for the Elements of Mathematics Program, a secondary sequence developed for highly motivated students with strong verbal abilities. The sequence is based on a functional-relational approach to mathematics teaching, and emphasizes teaching by analysis of real-life situations. This text is…

  12. Validation of Skeletal Muscle cis-Regulatory Module Predictions Reveals Nucleotide Composition Bias in Functional Enhancers

    PubMed Central

    Kwon, Andrew T.; Chou, Alice Yi; Arenillas, David J.; Wasserman, Wyeth W.

    2011-01-01

    We performed a genome-wide scan for muscle-specific cis-regulatory modules (CRMs) using three computational prediction programs. Based on the predictions, 339 candidate CRMs were tested in cell culture with NIH3T3 fibroblasts and C2C12 myoblasts for capacity to direct selective reporter gene expression to differentiated C2C12 myotubes. A subset of 19 CRMs validated as functional in the assay. The rate of predictive success reveals striking limitations of computational regulatory sequence analysis methods for CRM discovery. Motif-based methods performed no better than predictions based only on sequence conservation. Analysis of the properties of the functional sequences relative to inactive sequences indicates that nucleotide sequence composition can be an important characteristic to incorporate in future methods for improved predictive specificity. Muscle-related TFBSs predicted within the functional sequences display greater sequence conservation than non-TFBS flanking regions. Comparison with recent MyoD and histone modification ChIP-Seq data supports the validity of the functional regions. PMID:22144875

  13. Program Synthesizes UML Sequence Diagrams

    NASA Technical Reports Server (NTRS)

    Barry, Matthew R.; Osborne, Richard N.

    2006-01-01

    A computer program called "Rational Sequence" generates Unified Modeling Language (UML) sequence diagrams of a target Java program running on a Java virtual machine (JVM). Rational Sequence thereby performs a reverse engineering function that aids in the design documentation of the target Java program. Whereas the construction of sequence diagrams was previously a tedious manual process, Rational Sequence generates UML sequence diagrams automatically from the running Java code.

  14. Decision-Making Theory Applied to Architectural Programming: Some Research Implications.

    ERIC Educational Resources Information Center

    Green, Meg

    The implications of delineating and determining the sequence of programming decisions are shown in the selection of building committee membership. The role relationships of client and architect are discussed in terms of decision-making function. Decision tables are described as aids in problem analysis. Other topics include information and…

  15. An Interactive Multiobjective Programming Approach to Combinatorial Data Analysis.

    ERIC Educational Resources Information Center

    Brusco, Michael J.; Stahl, Stephanie

    2001-01-01

    Describes an interactive procedure for multiobjective asymmetric unidimensional seriation problems that uses a dynamic-programming algorithm to generate partially the efficient set of sequences for small to medium-sized problems and a multioperational heuristic to estimate the efficient set for larger problems. Applies the procedure to an…

  16. Update on Rover Sequencing and Visualization Program

    NASA Technical Reports Server (NTRS)

    Cooper, Brian; Hartman, Frank; Maxwell, Scott; Yen, Jeng; Wright, John; Balacuit, Carlos

    2005-01-01

    The Rover Sequencing and Visualization Program (RSVP) has been updated. RSVP was reported in Rover Sequencing and Visualization Program (NPO-30845), NASA Tech Briefs, Vol. 29, No. 4 (April 2005), page 38. To recapitulate: The Rover Sequencing and Visualization Program (RSVP) is the software tool to be used in the Mars Exploration Rover (MER) mission for planning rover operations and generating command sequences for accomplishing those operations. RSVP combines three-dimensional (3D) visualization for immersive exploration of the operations area, stereoscopic image display for high-resolution examination of the downlinked imagery, and a sophisticated command-sequence editing tool for analysis and completion of the sequences. RSVP is linked with actual flight code modules for operations rehearsal to provide feedback on the expected behavior of the rover prior to committing to a particular sequence. Playback tools allow for review of both rehearsed rover behavior and downlinked results of actual rover operations. These can be displayed simultaneously for comparison of rehearsed and actual activities for verification. The primary inputs to RSVP are downlink data products from the Operations Storage Server (OSS) and activity plans generated by the science team. The activity plans are high-level goals for the next day's activities. The downlink data products include imagery, terrain models, and telemetered engineering data on rover activities and state. The Rover Sequence Editor (RoSE) component of RSVP performs activity expansion to command sequences, command creation and editing with setting of command parameters, and viewing and management of rover resources. The HyperDrive component of RSVP performs 2D and 3D visualization of the rover's environment, graphical and animated review of rover predicted and telemetered state, and creation and editing of command sequences related to mobility and Instrument Deployment Device (robotic arm) operations. Additionally, RoSE and HyperDrive together evaluate command sequences for potential violations of flight and safety rules. The products of RSVP include command sequences for uplink that are stored in the Distributed Object Manager (DOM) and predicted rover state histories stored in the OSS for comparison and validation of downlinked telemetry. The majority of components comprising RSVP utilize the MER command and activity dictionaries to automatically customize the system for MER activities.

  17. Analysis and Functional Annotation of an Expressed Sequence Tag Collection for Tropical Crop Sugarcane

    PubMed Central

    Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo

    2003-01-01

    To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged. PMID:14613979
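
    The percentages quoted in the abstract can be checked directly from its counts. The short Python sketch below does so. The variable names are illustrative, and computing redundancy as one minus the ratio of estimated unique genes to assembled transcripts is an assumption about how the 22% figure was derived.

```python
# Counts quoted in the abstract above.
high_quality_ests = 237_954
assembled_transcripts = 43_141
with_full_length_clone = 14_409
estimated_unique_genes = 33_620

# Share of assembled sequences with at least one full-length insert.
full_length_share = with_full_length_clone / assembled_transcripts

# Assumed definition: fraction of transcripts that collapse on reassembly.
redundancy = 1 - estimated_unique_genes / assembled_transcripts

# Average number of high-quality ESTs contributing to each transcript.
ests_per_transcript = high_quality_ests / assembled_transcripts

print(f"full-length share: {full_length_share:.1%}")
print(f"redundancy:        {redundancy:.1%}")
print(f"ESTs/transcript:   {ests_per_transcript:.2f}")
```

    Both ratios round to the figures reported in the abstract (33% and 22%).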

  18. Large-Scale Biomonitoring of Remote and Threatened Ecosystems via High-Throughput Sequencing

    PubMed Central

    Gibson, Joel F.; Shokralla, Shadi; Curry, Colin; Baird, Donald J.; Monk, Wendy A.; King, Ian; Hajibabaei, Mehrdad

    2015-01-01

    Biodiversity metrics are critical for assessment and monitoring of ecosystems threatened by anthropogenic stressors. Existing sorting and identification methods are too expensive and labour-intensive to be scaled up to meet management needs. Alternatively, a high-throughput DNA sequencing approach could be used to determine biodiversity metrics from bulk environmental samples collected as part of a large-scale biomonitoring program. Here we show that both morphological and DNA sequence-based analyses are suitable for recovery of individual taxonomic richness, estimation of proportional abundance, and calculation of biodiversity metrics using a set of 24 benthic samples collected in the Peace-Athabasca Delta region of Canada. The high-throughput sequencing approach was able to recover all metrics with a higher degree of taxonomic resolution than morphological analysis. The reduced cost and increased capacity of DNA sequence-based approaches will finally allow environmental monitoring programs to operate at the geographical and temporal scale required by industrial and regulatory end-users. PMID:26488407

  19. A Web interface generator for molecular biology programs in Unix.

    PubMed

    Letondal, C

    2001-01-01

    Almost all users encounter problems using sequence analysis programs. Not only are they difficult to learn because of their parameters, syntax and semantics, but many also differ from one another. That is why we have developed a Web interface generator for more than 150 molecular biology command-line driven programs, including: phylogeny, gene prediction, alignment, RNA, DNA and protein analysis, motif discovery, structure analysis and database searching programs. The generator uses XML as a high-level description language of the legacy software parameters. Its aim is to provide users with the equivalent of a basic Unix environment, with program combination, customization and basic scripting through macro registration. The program has been used for three years by about 15000 users throughout the world; it has recently been installed on other sites and evaluated as a standard user interface for EMBOSS programs.

  20. Genetic diversity assessment of anoxygenic photosynthetic bacteria by distance-based grouping analysis of pufM sequences.

    PubMed

    Zeng, Y H; Chen, X H; Jiao, N Z

    2007-12-01

    The aim was to assess how completely the diversity of anoxygenic phototrophic bacteria (APB) was sampled in natural environments. All nucleotide sequences of the APB marker gene pufM from cultures and environmental clones were retrieved from the GenBank database. A set of cutoff values (sequence distances 0.06, 0.15 and 0.48 for species, genus, and (sub)phylum levels, respectively) was established using a distance-based grouping program. Analysis of the environmental clones revealed that current efforts on APB isolation and sampling in natural environments are largely inadequate. Analysis of the average distance between each identified genus and an uncultured environmental pufM sequence indicated that the majority of cultured APB genera lack environmental representatives. The distance-based grouping method is fast and efficient for bulk analysis of functional gene sequences. The results clearly show that we are at a relatively early stage in sampling the global richness of APB species. Periodical assessment will undoubtedly facilitate in-depth analysis of potential biogeographical distribution pattern of APB. This is the first attempt to assess the present understanding of APB diversity in natural environments. The method used is also useful for assessing the diversity of other functional genes.
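
    Distance-based grouping at the three cutoffs quoted above can be illustrated with a small clustering sketch. The abstract does not name the grouping program or its linkage rule, so the Python code below assumes single linkage (two sequences share a group if a chain of pairwise distances at or below the cutoff connects them). The sequence names and distances are invented for the example.

```python
def group_by_distance(names, dist, cutoff):
    """Single-linkage grouping of sequences at a distance cutoff.

    `dist` maps frozenset({a, b}) to the pairwise distance between
    sequences a and b. Returns a sorted list of sorted groups.
    """
    parent = {name: name for name in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dist[frozenset((a, b))] <= cutoff:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical pairwise distances between four pufM sequences.
names = ["s1", "s2", "s3", "s4"]
dist = {
    frozenset(("s1", "s2")): 0.03,
    frozenset(("s1", "s3")): 0.12,
    frozenset(("s2", "s3")): 0.10,
    frozenset(("s1", "s4")): 0.50,
    frozenset(("s2", "s4")): 0.52,
    frozenset(("s3", "s4")): 0.45,
}
for level, cutoff in [("species", 0.06), ("genus", 0.15), ("(sub)phylum", 0.48)]:
    print(level, group_by_distance(names, dist, cutoff))
```

    With these toy distances the four sequences fall into three groups at the species cutoff, two at the genus cutoff and one at the (sub)phylum cutoff. Other linkage rules, such as furthest neighbor, can yield different groups from the same distances.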

  1. Mobile Genome Express (MGE): A comprehensive automatic genetic analyses pipeline with a mobile device.

    PubMed

    Yoon, Jun-Hee; Kim, Thomas W; Mendez, Pedro; Jablons, David M; Kim, Il-Jin

    2017-01-01

    The development of next-generation sequencing (NGS) technology makes it possible to sequence whole exomes or genomes. However, data analysis is still the biggest bottleneck for its wide implementation. Most laboratories still depend on manual procedures for data handling and analyses, which translates into a delay and decreased efficiency in the delivery of NGS results to doctors and patients. Thus, there is high demand for an automatic and easy-to-use NGS data analysis system. We developed a comprehensive, automatic genetic analysis controller named Mobile Genome Express (MGE) that works on smartphones or other mobile devices. MGE can handle all the steps for genetic analyses, such as sample information submission, sequencing run quality check from the sequencer, secured data transfer and results review. We sequenced an Actrometrix control DNA containing multiple proven human mutations using a targeted sequencing panel, and the whole analysis was managed by MGE and its data reviewing program, called ELECTRO. All steps were processed automatically except for the final sequencing review procedure with ELECTRO to confirm mutations. The data analysis process was completed within several hours. We confirmed that the mutations we identified were consistent with our previous results obtained by using multi-step, manual pipelines.

  2. VDJServer: A Cloud-Based Analysis Portal and Data Commons for Immune Repertoire Sequences and Rearrangements.

    PubMed

    Christley, Scott; Scarborough, Walter; Salinas, Eddie; Rounds, William H; Toby, Inimary T; Fonner, John M; Levin, Mikhail K; Kim, Min; Mock, Stephen A; Jordan, Christopher; Ostmeyer, Jared; Buntzman, Adam; Rubelt, Florian; Davila, Marco L; Monson, Nancy L; Scheuermann, Richard H; Cowell, Lindsay G

    2018-01-01

    Recent technological advances in immune repertoire sequencing have created tremendous potential for advancing our understanding of adaptive immune response dynamics in various states of health and disease. Immune repertoire sequencing produces large, highly complex data sets, however, which require specialized methods and software tools for their effective analysis and interpretation. VDJServer is a cloud-based analysis portal for immune repertoire sequence data that provide access to a suite of tools for a complete analysis workflow, including modules for preprocessing and quality control of sequence reads, V(D)J gene segment assignment, repertoire characterization, and repertoire comparison. VDJServer also provides sophisticated visualizations for exploratory analysis. It is accessible through a standard web browser via a graphical user interface designed for use by immunologists, clinicians, and bioinformatics researchers. VDJServer provides a data commons for public sharing of repertoire sequencing data, as well as private sharing of data between users. We describe the main functionality and architecture of VDJServer and demonstrate its capabilities with use cases from cancer immunology and autoimmunity. VDJServer provides a complete analysis suite for human and mouse T-cell and B-cell receptor repertoire sequencing data. The combination of its user-friendly interface and high-performance computing allows large immune repertoire sequencing projects to be analyzed with no programming or software installation required. VDJServer is a web-accessible cloud platform that provides access through a graphical user interface to a data management infrastructure, a collection of analysis tools covering all steps in an analysis, and an infrastructure for sharing data along with workflows, results, and computational provenance. VDJServer is a free, publicly available, and open-source licensed resource.

  3. Genome Improvement at JGI-HAGSC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grimwood, Jane; Schmutz, Jeremy J.; Myers, Richard M.

    Since the completion of the sequencing of the human genome, the Joint Genome Institute (JGI) has rapidly expanded its scientific goals in several DOE mission-relevant areas. At the JGI-HAGSC, we have kept pace with this rapid expansion of projects with our focus on assessing, assembling, improving and finishing eukaryotic whole genome shotgun (WGS) projects for which the shotgun sequence is generated at the Production Genomic Facility (JGI-PGF). We follow this by combining the draft WGS with genomic resources generated at JGI-HAGSC or in collaborator laboratories (including BAC end sequences, genetic maps and FLcDNA sequences) to produce an improved draft sequence. For eukaryotic genomes important to the DOE mission, we then add further information from directed experiments to produce reference genomic sequences that are publicly available for any scientific researcher. Also, we have continued our program for producing BAC-based finished sequence, both for adding information to JGI genome projects and for small BAC-based sequencing projects proposed through any of the JGI sequencing programs. We have now built our computational expertise in WGS assembly and analysis and have moved eukaryotic genome assembly from the JGI-PGF to JGI-HAGSC. We have concentrated our assembly development work on large plant genomes and complex fungal and algal genomes.

  4. Simple and efficient identification of rare recessive pathologically important sequence variants from next generation exome sequence data.

    PubMed

    Carr, Ian M; Morgan, Joanne; Watson, Christopher; Melnik, Svitlana; Diggle, Christine P; Logan, Clare V; Harrison, Sally M; Taylor, Graham R; Pena, Sergio D J; Markham, Alexander F; Alkuraya, Fowzan S; Black, Graeme C M; Ali, Manir; Bonthron, David T

    2013-07-01

    Massively parallel ("next generation") DNA sequencing (NGS) has quickly become the method of choice for seeking pathogenic mutations in rare uncharacterized monogenic diseases. Typically, before DNA sequencing, protein-coding regions are enriched from patient genomic DNA, representing either the entire genome ("exome sequencing") or selected mapped candidate loci. Sequence variants, identified as differences between the patient's and the human genome reference sequences, are then filtered according to various quality parameters. Changes are screened against datasets of known polymorphisms, such as dbSNP and the 1000 Genomes Project, in the effort to narrow the list of candidate causative variants. An increasing number of commercial services now offer to both generate and align NGS data to a reference genome. This potentially allows small groups with limited computing infrastructure and informatics skills to utilize this technology. However, the capability to effectively filter and assess sequence variants is still an important bottleneck in the identification of deleterious sequence variants in both research and diagnostic settings. We have developed an approach to this problem comprising a user-friendly suite of programs that can interactively analyze, filter and screen data from enrichment-capture NGS data. These programs ("Agile Suite") are particularly suitable for small-scale gene discovery or for diagnostic analysis. © 2013 WILEY PERIODICALS, INC.
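
    The filtering steps outlined above (quality thresholds, then screening against known polymorphisms) can be sketched as a small pipeline. This Python sketch is not the Agile Suite's code: the `Variant` record, the quality threshold of 30, and the rule of keeping only homozygous calls for a recessive model are all assumptions made for illustration, and the homozygous-only rule would miss compound heterozygotes.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    quality: float
    genotype: str  # "hom" or "het" (hypothetical encoding)


def filter_candidates(variants, known_polymorphisms, min_quality=30.0):
    """Keep high-quality, homozygous variants absent from a known set.

    `known_polymorphisms` is a set of (chrom, pos, ref, alt) tuples, for
    example loaded from a local copy of a public polymorphism database.
    """
    kept = []
    for v in variants:
        if v.quality < min_quality:
            continue  # fails the quality filter
        if (v.chrom, v.pos, v.ref, v.alt) in known_polymorphisms:
            continue  # already catalogued as a polymorphism
        if v.genotype != "hom":
            continue  # simple recessive model: homozygous calls only
        kept.append(v)
    return kept


calls = [
    Variant("chr1", 1000, "A", "G", 50.0, "hom"),
    Variant("chr1", 2000, "C", "T", 10.0, "hom"),
    Variant("chr2", 3000, "G", "A", 60.0, "hom"),
    Variant("chr3", 4000, "T", "C", 70.0, "het"),
]
known = {("chr2", 3000, "G", "A")}
print(filter_candidates(calls, known))
```

    Only the first call survives in this toy input: the second fails the quality threshold, the third is a known polymorphism, and the fourth is heterozygous.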

  5. Characterization, genetic diversity, and evolutionary link of Cucumber mosaic virus strain New Delhi from India.

    PubMed

    Koundal, Vikas; Haq, Qazi Mohd Rizwanul; Praveen, Shelly

    2011-02-01

    The genome of Cucumber mosaic virus New Delhi strain (CMV-ND) from India, obtained from tomato, was completely sequenced and compared with full genome sequences of 14 known CMV strains from subgroups I and II, for their genetic diversity. Sequence analysis suggests CMV-ND shares maximum sequence identity at the nucleotide level with a CMV strain from Taiwan. Among all 15 strains of CMV, the encoded protein 2b is least conserved, whereas the coat protein (CP) is most conserved. Sequence identity values and phylogram results indicate that CMV-ND belongs to subgroup I. Based on the recombination detection program result, it appears that CMV is prone to recombination, and different RNA components of CMV-ND have evolved differently. Recombinational analysis of all 15 CMV strains detected maximum recombination breakpoints in RNA2; CP showed the least recombination sites.

  6. Special Focus

    PubMed Central

    Nawrocki, Eric P.; Burge, Sarah W.

    2013-01-01

    The development of RNA bioinformatic tools began more than 30 y ago with the description of the Nussinov and Zuker dynamic programming algorithms for single sequence RNA secondary structure prediction. Since then, many tools have been developed for various RNA sequence analysis problems such as homology search, multiple sequence alignment, de novo RNA discovery, read-mapping, and many more. In this issue, we have collected a sampling of reviews and original research that demonstrate some of the many ways bioinformatics is integrated with current RNA biology research. PMID:23948768
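
    As a reminder of what the Nussinov-style dynamic programming mentioned above computes, the following is a minimal sketch that maximizes the number of base pairs in a single RNA sequence. The function name, the allowed pair set (Watson-Crick plus G-U wobble) and the minimum loop length of 3 are assumptions chosen for illustration; the Zuker algorithm, which minimizes free energy, is not shown.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in seq.

    dp[i][j] holds the best pair count for the subsequence seq[i..j].
    A pair (k, j) is allowed only if more than min_loop positions apart.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j is left unpaired
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    # case 2: j pairs with k, splitting the interval in two
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

    The table fill takes cubic time in the sequence length, which is the usual cost quoted for this recurrence.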

  7. The MHOST finite element program: 3-D inelastic analysis methods for hot section components. Volume 3: Systems' manual

    NASA Technical Reports Server (NTRS)

    Nakazawa, Shohei

    1989-01-01

    The internal structure of the MHOST finite element program, designed for 3-D inelastic analysis of gas turbine hot section components, is discussed. The computer code is the first implementation of the mixed iterative solution strategy for improved efficiency and accuracy over the conventional finite element method. The control structure of the program is covered, along with the data storage scheme, the memory allocation procedure, and the file handling facilities, including the read and/or write sequences.

  8. Electronic Warfare Training Analysis.

    DTIC Science & Technology

    1972-01-01

    associated have a question, the instructor can select a with it one plugboard and two program switch previously presented training step sequence by drums...operates in conjunction with Located on the control panel. an overlay, programmed plugboard, and two 1-13. JACK PIN CONTROL. The device also program...drums. Supplied with Device 3CI27A are one plugboard and two program switch has means for student participation. An open-...

  9. MBGD update 2015: microbial genome database for flexible ortholog analysis utilizing a diverse set of genomic data.

    PubMed

    Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu

    2015-01-01

    The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, it becomes increasingly challenging to maintain high-quality orthology relationships while allowing the users to incorporate the latest genomic data available into an analysis. Because many of the recently accumulating genomic data are draft genome sequences for which some complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows the users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated into an existing ortholog table created only from the complete genome data in an incremental manner to prevent low-quality draft data from affecting clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program for improving domain-level clustering using multiple sequence alignment information. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Genometa--a fast and accurate classifier for short metagenomic shotgun reads.

    PubMed

    Davenport, Colin F; Neugebauer, Jens; Beckmann, Nils; Friedrich, Benedikt; Kameri, Burim; Kokott, Svea; Paetow, Malte; Siekmann, Björn; Wieding-Drewes, Matthias; Wienhöfer, Markus; Wolf, Stefan; Tümmler, Burkhard; Ahlers, Volker; Sprengel, Frauke

    2012-01-01

    Metagenomic studies use high-throughput sequence data to investigate microbial communities in situ. However, considerable challenges remain in the analysis of these data, particularly with regard to speed and reliable analysis of microbial species as opposed to higher level taxa such as phyla. We here present Genometa, a computationally undemanding graphical user interface program that enables identification of bacterial species and gene content from datasets generated by inexpensive high-throughput short read sequencing technologies. Our approach was first verified on two simulated metagenomic short read datasets, detecting 100% and 94% of the bacterial species included with few false positives or false negatives. Subsequent comparative benchmarking analysis against three popular metagenomic algorithms on an Illumina human gut dataset revealed Genometa to attribute the most reads to bacteria at species level (i.e. including all strains of that species) and demonstrate similar or better accuracy than the other programs. Lastly, speed was demonstrated to be many times that of BLAST due to the use of modern short read aligners. Our method is highly accurate if bacteria in the sample are represented by genomes in the reference sequence but cannot find species absent from the reference. This method is one of the most user-friendly and resource efficient approaches and is thus feasible for rapidly analysing millions of short reads on a personal computer. The Genometa program, a step by step tutorial and Java source code are freely available from http://genomics1.mh-hannover.de/genometa/ and on http://code.google.com/p/genometa/. This program has been tested on Ubuntu Linux and Windows XP/7.

  11. A quantitative study of a physics-first pilot program

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pasero, Spencer Lee; /Northern Illinois U.

    Hundreds of high schools around the United States have inverted the traditional core sequence of high school science courses, putting physics first, followed by chemistry, and then biology. A quarter-century of theory, opinion, and anecdote is available, but the literature lacks empirical evidence of the effects of the program. The current study was designed to investigate the effects of the program on science achievement gain, growth in attitude toward science, and growth in understanding of the nature of scientific knowledge. One hundred eighty-five honor students participated in this quasi-experiment, self-selecting into either the traditional or inverted sequence. Students took the Explore test as freshmen and the Plan test as sophomores. Gain scores were calculated for the composite scores and for the science and mathematics subscale scores. A two-factor analysis of variance (ANOVA) on course sequence and cohort showed significantly greater composite score gains by students taking the inverted sequence. Participants were administered surveys measuring attitude toward science and understanding of the nature of scientific knowledge twice per year. A multilevel growth model, compared across program groups, did not show any significant effect of the inverted sequence on either attitude or understanding of the nature of scientific knowledge. The sole significant parameter showed a decline in student attitude toward science over the first two years of high school, independent of course sequence. The results of this study support the theory that moving physics to the front of the science sequence can improve achievement. The importance of the composite gain score on tests vertically aligned with the high-stakes ACT is discussed, and several ideas for extensions of the current study are offered.

  12. Spotlight-8 Image Analysis Software

    NASA Technical Reports Server (NTRS)

    Klimek, Robert; Wright, Ted

    2006-01-01

    Spotlight is a cross-platform GUI-based software package designed to perform image analysis on sequences of images generated by combustion and fluid physics experiments run in a microgravity environment. Spotlight can perform analysis on a single image in an interactive mode or perform analysis on a sequence of images in an automated fashion. Image processing operations can be employed to enhance the image before various statistics and measurement operations are performed. An arbitrarily large number of objects can be analyzed simultaneously with independent areas of interest. Spotlight saves results in a text file that can be imported into other programs for graphing or further analysis. Spotlight can be run on Microsoft Windows, Linux, and Apple OS X platforms.

  13. GPUmotif: An Ultra-Fast and Energy-Efficient Motif Analysis Program Using Graphics Processing Units

    PubMed Central

    Zandevakili, Pooya; Hu, Ming; Qin, Zhaohui

    2012-01-01

    Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/ PMID:22662128

  14. GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units.

    PubMed

    Zandevakili, Pooya; Hu, Ming; Qin, Zhaohui

    2012-01-01

    Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/

  15. Open discovery: An integrated live Linux platform of Bioinformatics tools.

    PubMed

    Vetrivel, Umashankar; Pilla, Kalabharath

    2008-01-01

    Historically, live Linux distributions for Bioinformatics have paved the way for portability of the Bioinformatics workbench in a platform-independent manner. Moreover, most of the existing live Linux distributions limit their usage to sequence analysis and basic molecular visualization programs and are devoid of data persistence. Hence, Open Discovery, a live Linux distribution, has been developed with the capability to perform complex tasks like molecular modeling, docking and molecular dynamics in a swift manner. Furthermore, it is also equipped with a complete sequence analysis environment and is capable of running Windows executable programs in a Linux environment. Open Discovery portrays the advanced customizable configuration of Fedora, with data persistency accessible via USB drive or DVD. Open Discovery is distributed free under the Academic Free License (AFL) and can be downloaded from http://www.OpenDiscovery.org.in.

  16. Environmental Barcoding: A Next-Generation Sequencing Approach for Biomonitoring Applications Using River Benthos

    PubMed Central

    Hajibabaei, Mehrdad; Shokralla, Shadi; Zhou, Xin; Singer, Gregory A. C.; Baird, Donald J.

    2011-01-01

    Timely and accurate biodiversity analysis poses an ongoing challenge for the success of biomonitoring programs. Morphology-based identification of bioindicator taxa is time consuming, and rarely supports species-level resolution especially for immature life stages. Much work has been done in the past decade to develop alternative approaches for biodiversity analysis using DNA sequence-based approaches such as molecular phylogenetics and DNA barcoding. On-going assembly of DNA barcode reference libraries will provide the basis for a DNA-based identification system. The use of recently introduced next-generation sequencing (NGS) approaches in biodiversity science has the potential to further extend the application of DNA information for routine biomonitoring applications to an unprecedented scale. Here we demonstrate the feasibility of using 454 massively parallel pyrosequencing for species-level analysis of freshwater benthic macroinvertebrate taxa commonly used for biomonitoring. We designed our experiments in order to directly compare morphology-based, Sanger sequencing DNA barcoding, and next-generation environmental barcoding approaches. Our results show the ability of 454 pyrosequencing of mini-barcodes to accurately identify all species with more than 1% abundance in the pooled mixture. Although the approach failed to identify 6 rare species in the mixture, the presence of sequences from 9 species that were not represented by individuals in the mixture provides evidence that DNA based analysis may yet provide a valuable approach in finding rare species in bulk environmental samples. We further demonstrate the application of the environmental barcoding approach by comparing benthic macroinvertebrates from an urban region to those obtained from a conservation area. Although considerable effort will be required to robustly optimize NGS tools to identify species from bulk environmental samples, our results indicate the potential of an environmental barcoding approach for biomonitoring programs. PMID:21533287

  17. MACSIMS : multiple alignment of complete sequences information management system

    PubMed Central

    Thompson, Julie D; Muller, Arnaud; Waterhouse, Andrew; Procter, Jim; Barton, Geoffrey J; Plewniak, Frédéric; Poch, Olivier

    2006-01-01

    Background: In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for the integration of this information in the context of the protein family. Results: MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified and the retrieved data is evaluated and propagated from known to unknown sequences with these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60%, compared to the features available in the public databases. An XML format output file allows automatic parsing of the MACSIMS results, while a graphical display using the JalView program allows manual analysis. Conclusion: MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at . PMID:16792820

  18. Assembly of 500,000 inter-specific catfish expressed sequence tags and large scale gene-associated marker development for whole genome association studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Catfish Genome Consortium; Wang, Shaolin; Peatman, Eric

    2010-03-23

    Background: Through the Community Sequencing Program, a catfish EST sequencing project was carried out through a collaboration between the catfish research community and the Department of Energy's Joint Genome Institute. Prior to this project, only a limited EST resource from catfish was available for the purpose of SNP identification. Results: A total of 438,321 quality ESTs were generated from 8 channel catfish (Ictalurus punctatus) and 4 blue catfish (Ictalurus furcatus) libraries, bringing the number of catfish ESTs to nearly 500,000. Assembly of all catfish ESTs resulted in 45,306 contigs and 66,272 singletons. Over 35% of the unique sequences had significant similarities to known genes, allowing the identification of 14,776 unique genes in catfish. Over 300,000 putative SNPs have been identified, of which approximately 48,000 are high-quality SNPs identified from contigs with at least four sequences and the minor allele presence of at least two sequences in the contig. The EST resource should be valuable for identification of microsatellites, genome annotation, large-scale expression analysis, and comparative genome analysis. Conclusions: This project generated a large EST resource for catfish that captured the majority of the catfish transcriptome. The parallel analysis of ESTs from two closely related Ictalurid catfishes should also provide powerful means for the evaluation of ancient and recent gene duplications, and for the development of high-density microarrays in catfish. The inter- and intra-specific SNPs identified from all catfish EST dataset assembly will greatly benefit the catfish introgression breeding program and whole genome association studies.
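
    The high-quality SNP criterion stated in the abstract (at least four sequences in the contig, with the minor allele present in at least two of them) can be written as a small check. This is a sketch under assumptions: it treats the base calls covering one candidate position as the contig depth, and the function name and argument layout are hypothetical rather than taken from the project's pipeline.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}

def is_high_quality_snp(bases, min_depth=4, min_minor=2):
    """Apply the depth and minor-allele thresholds to one alignment column.

    bases: iterable of single-character base calls from the ESTs covering
    the candidate position; gaps and ambiguity codes are ignored.
    """
    counts = Counter(b.upper() for b in bases if b.upper() in VALID_BASES)
    if sum(counts.values()) < min_depth or len(counts) < 2:
        return False  # too shallow, or no second allele at all
    minor_count = counts.most_common()[1][1]  # second most frequent allele
    return minor_count >= min_minor
```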

  19. Detailed Test Plan Redundant Sensor Strapdown IMU Evaluation Program

    NASA Technical Reports Server (NTRS)

    Hartwell, T.; Miyatake, Y.; Wedekind, D. E.

    1971-01-01

    The test plan for a redundant sensor strapdown inertial measuring unit evaluation program is presented. The subjects discussed are: (1) test philosophy and limitations, (2) test sequence, (3) equipment specifications, (4) general operating procedures, (5) calibration procedures, (6) alignment test phase, and (7) navigation test phase. The data and analysis requirements are analyzed.

  20. Assessing Global Awareness over Short-Term Study Abroad Sequence: A Factor Analysis

    ERIC Educational Resources Information Center

    Kurt, Mark R.; Olitsky, Neal H.; Geis, Paul

    2013-01-01

    Academic study abroad programs are uniquely equipped to give students the opportunities to achieve outcomes for global citizenship (Langran, Langran, and Ozment 2009). These programs take students outside the confines of their home institutions and expose students to new cultures and languages while integrating academic content to enhance the…

  1. Scenario-Based Programming, Usability-Oriented Perception

    ERIC Educational Resources Information Center

    Alexandron, Giora; Armoni, Michal; Gordon, Michal; Harel, David

    2014-01-01

    In this article, we discuss the possible connection between the programming language and the paradigm behind it, and programmers' tendency to adopt an external or internal perspective of the system they develop. Based on a qualitative analysis, we found that when working with the visual, interobject language of live sequence charts (LSC),…

  2. TraceContract: A Scala DSL for Trace Analysis

    NASA Technical Reports Server (NTRS)

    Barringer, Howard; Havelund, Klaus

    2011-01-01

    In this paper we describe TRACECONTRACT, an API for trace analysis, implemented in the SCALA programming language. We argue that for certain forms of trace analysis the best weapon is a high level programming language augmented with constructs for temporal reasoning. A trace is a sequence of events, which may for example be generated by a running program, instrumented appropriately to generate events. The API supports writing properties in a notation that combines an advanced form of data parameterized state machines with temporal logic. The implementation utilizes SCALA's support for defining internal Domain Specific Languages (DSLs). Furthermore SCALA's combination of object oriented and functional programming features, including partial functions and pattern matching, makes it an ideal host language for such an API.

  3. MANGO: a new approach to multiple sequence alignment.

    PubMed

    Zhang, Zefeng; Lin, Hao; Li, Ming

    2007-01-01

    Multiple sequence alignment is a classical and challenging task for biological sequence analysis. The problem is NP-hard. The full dynamic programming takes too much time. The progressive alignment heuristics adopted by most state-of-the-art multiple sequence alignment programs suffer from the 'once a gap, always a gap' phenomenon. Is there a radically new way to do multiple sequence alignment? This paper introduces a novel and orthogonal multiple sequence alignment method, using multiple optimized spaced seeds and new algorithms to handle these seeds efficiently. Our new algorithm processes information of all sequences as a whole, avoiding problems caused by the popular progressive approaches. Because the optimized spaced seeds are provably significantly more sensitive than the consecutive k-mers, the new approach promises to be more accurate and reliable. To validate our new approach, we have implemented MANGO: Multiple Alignment with N Gapped Oligos. Experiments were carried out on large 16S RNA benchmarks showing that MANGO compares favorably, in both accuracy and speed, against state-of-the-art multiple sequence alignment methods, including ClustalW 1.83, MUSCLE 3.6, MAFFT 5.861, ProbConsRNA 1.11, Dialign 2.2.1, DIALIGN-T 0.2.1, T-Coffee 4.85, POA 2.0 and Kalign 2.0.
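
    To illustrate why a spaced seed can be more sensitive than a consecutive k-mer, the sketch below indexes one sequence by a seed pattern and reports matching positions in a second sequence. It shows generic spaced-seed matching only and is not MANGO's algorithm; the pattern notation ("1" for a position that must match, "0" for a don't-care position) and the function names are assumptions for the example.

```python
from collections import defaultdict

def seed_key(seq, pos, pattern):
    """Characters of seq at the '1' positions of the pattern, starting at pos."""
    return "".join(seq[pos + i] for i, c in enumerate(pattern) if c == "1")

def spaced_seed_hits(a, b, pattern):
    """Return (i, j) pairs where a[i:] and b[j:] agree at every '1' position."""
    span = len(pattern)
    index = defaultdict(list)
    for i in range(len(a) - span + 1):
        index[seed_key(a, i, pattern)].append(i)
    hits = []
    for j in range(len(b) - span + 1):
        for i in index.get(seed_key(b, j, pattern), []):
            hits.append((i, j))
    return hits
```

    With a = "ACGTAC" and b = "ACTTAC", the pattern "1101" reports a hit at (0, 0) because the mismatch falls on the don't-care position, whereas the consecutive pattern "1111" over the same span reports none.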

  4. Flexbar 3.0 - SIMD and multicore parallelization.

    PubMed

    Roehr, Johannes T; Dieterich, Christoph; Reinert, Knut

    2017-09-15

    High-throughput sequencing machines can process many samples in a single run. For Illumina systems, sequencing reads are barcoded with an additional DNA tag that is contained in the respective sequencing adapters. The recognition of barcode and adapter sequences is hence commonly needed for the analysis of next-generation sequencing data. Flexbar performs demultiplexing based on barcodes and adapter trimming for such data. The massive amounts of data generated on modern sequencing machines demand that this preprocessing is done as efficiently as possible. We present Flexbar 3.0, the successor of the popular program Flexbar. It now employs twofold parallelism: multi-threading and, additionally, SIMD vectorization. Both types of parallelism are used to speed up the computation of pair-wise sequence alignments, which are used for the detection of barcodes and adapters. Furthermore, new features were included to cover a wide range of applications. We evaluated the performance of Flexbar based on a simulated sequencing dataset. Our program outcompetes other tools in terms of speed and is among the best tools in the presented quality benchmark. Availability: https://github.com/seqan/flexbar. Contact: johannes.roehr@fu-berlin.de or knut.reinert@fu-berlin.de. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
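
    Barcode-based demultiplexing, as described above, assigns each read to a sample by recognizing the barcode and then removes it. The sketch below is a deliberately simplified stand-in: it compares the read prefix to each barcode by Hamming distance instead of computing pairwise alignments as Flexbar does, and the function names, the 5' barcode position and the mismatch limit are assumptions for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def assign_barcode(read, barcodes, max_mismatches=1):
    """Return (sample_name, trimmed_read), or (None, read) if unassigned.

    barcodes: dict mapping sample name to barcode sequence, assumed to sit
    at the 5' end of the read. Ambiguous best matches are left unassigned.
    """
    best_name, best_dist, tie = None, max_mismatches + 1, False
    for name, barcode in barcodes.items():
        if len(read) < len(barcode):
            continue
        dist = hamming(read[:len(barcode)], barcode)
        if dist < best_dist:
            best_name, best_dist, tie = name, dist, False
        elif dist == best_dist and best_name is not None:
            tie = True  # two barcodes fit equally well
    if best_name is None or tie:
        return None, read
    return best_name, read[len(barcodes[best_name]):]
```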

  5. Programs for analysis and resizing of complex structures. [computerized minimum weight design

    NASA Technical Reports Server (NTRS)

    Haftka, R. T.; Prasad, B.

    1978-01-01

    The paper describes the PARS (Programs for Analysis and Resizing of Structures) system. PARS is a user oriented system of programs for the minimum weight design of structures modeled by finite elements and subject to stress, displacement, flutter and thermal constraints. The system is built around SPAR - an efficient and modular general purpose finite element program, and consists of a series of processors that communicate through the use of a data base. An efficient optimizer based on the Sequence of Unconstrained Minimization Technique (SUMT) with an extended interior penalty function and Newton's method is used. Several problems are presented for demonstration of the system capabilities.

  6. Shuttle cryogenics supply system. Optimization study. Volume 5 B-4: Programmers manual for space shuttle orbit injection analysis (SOPSA)

    NASA Technical Reports Server (NTRS)

    1973-01-01

    A computer program for space shuttle orbit injection propulsion system analysis (SOPSA) is described to show the operational characteristics and the computer system requirements. The program was developed as an analytical tool to aid in the preliminary design of propellant feed systems for the space shuttle orbiter main engines. The primary purpose of the program is to evaluate the propellant tank ullage pressure requirements imposed by the need to accelerate propellants rapidly during the engine start sequence. The SOPSA program will generate parametric feed system pressure histories and weight data for a range of nominal feedline sizes.

  7. NGSANE: a lightweight production informatics framework for high-throughput data analysis.

    PubMed

    Buske, Fabian A; French, Hugh J; Smith, Martin A; Clark, Susan J; Bauer, Denis C

    2014-05-15

    The initial steps in the analysis of next-generation sequencing data can be automated by way of software 'pipelines'. However, individual components depreciate rapidly because of the evolving technology and analysis methods, often rendering entire versions of production informatics pipelines obsolete. Constructing pipelines from Linux bash commands enables the use of hot swappable modular components as opposed to the more rigid program call wrapping by higher level languages, as implemented in comparable published pipelining systems. Here we present Next Generation Sequencing ANalysis for Enterprises (NGSANE), a Linux-based, high-performance-computing-enabled framework that minimizes overhead for set up and processing of new projects, yet maintains full flexibility of custom scripting when processing raw sequence data. NGSANE is implemented in bash and publicly available under BSD (3-Clause) licence via GitHub at https://github.com/BauerLab/ngsane. Contact: Denis.Bauer@csiro.au. Supplementary data are available at Bioinformatics online.

  8. A survey of tools for variant analysis of next-generation genome sequencing data

    PubMed Central

    Pabinger, Stephan; Dander, Andreas; Fischer, Maria; Snajder, Rene; Sperk, Michael; Efremova, Mirjana; Krabichler, Birgit; Speicher, Michael R.; Zschocke, Johannes

    2014-01-01

    Recent advances in genome sequencing technologies provide unprecedented opportunities to characterize individual genomic landscapes and identify mutations relevant for diagnosis and therapy. Specifically, whole-exome sequencing using next-generation sequencing (NGS) technologies is gaining popularity in the human genetics community due to the moderate costs, manageable data amounts and straightforward interpretation of analysis results. While whole-exome and, in the near future, whole-genome sequencing are becoming commodities, data analysis still poses significant challenges and has led to the development of a plethora of tools supporting specific parts of the analysis workflow or providing a complete solution. Here, we surveyed 205 tools for whole-genome/whole-exome sequencing data analysis supporting five distinct analytical steps: quality assessment, alignment, variant identification, variant annotation and visualization. We report an overview of the functionality, features and specific requirements of the individual tools. We then selected 32 programs for variant identification, variant annotation and visualization, which were subjected to hands-on evaluation using four data sets: one set of exome data from two patients with a rare disease for testing identification of germline mutations, two cancer data sets for testing variant callers for somatic mutations, copy number variations and structural variations, and one semi-synthetic data set for testing identification of copy number variations. Our comprehensive survey and evaluation of NGS tools provides a valuable guideline for human geneticists working on Mendelian disorders, complex diseases and cancers. PMID:23341494

  9. CSReport: A New Computational Tool Designed for Automatic Analysis of Class Switch Recombination Junctions Sequenced by High-Throughput Sequencing.

    PubMed

    Boyer, François; Boutouil, Hend; Dalloul, Iman; Dalloul, Zeinab; Cook-Moreau, Jeanne; Aldigier, Jean-Claude; Carrion, Claire; Herve, Bastien; Scaon, Erwan; Cogné, Michel; Péron, Sophie

    2017-05-15

    B cells ensure humoral immune responses due to the production of Ag-specific memory B cells and Ab-secreting plasma cells. In secondary lymphoid organs, Ag-driven B cell activation induces terminal maturation and Ig isotype class switch (class switch recombination [CSR]). CSR creates a virtually unique IgH locus in every B cell clone by intrachromosomal recombination between two switch (S) regions upstream of each C region gene. Amount and structural features of CSR junctions reveal valuable information about the CSR mechanism, and analysis of CSR junctions is useful in basic and clinical research studies of B cell functions. To provide an automated tool able to analyze large data sets of CSR junction sequences produced by high-throughput sequencing (HTS), we designed CSReport, a software program dedicated to support analysis of CSR recombination junctions sequenced with a HTS-based protocol (Ion Torrent technology). CSReport was assessed using simulated data sets of CSR junctions and then used for analysis of Sμ-Sα and Sμ-Sγ1 junctions from CH12F3 cells and primary murine B cells, respectively. CSReport identifies junction segment breakpoints on reference sequences and junction structure (blunt-ended junctions or junctions with insertions or microhomology). Besides the ability to analyze unprecedentedly large libraries of junction sequences, CSReport will provide a unified framework for CSR junction studies. Our results show that CSReport is an accurate tool for analysis of sequences from our HTS-based protocol for CSR junctions, thereby facilitating and accelerating their study. Copyright © 2017 by The American Association of Immunologists, Inc.

  10. Clustalnet: the joining of Clustal and CORBA.

    PubMed

    Campagne, F

    2000-07-01

    Performing sequence alignment operations from a different program than the original sequence alignment code, and/or through a network connection, is often required. Interactive alignment editors and large-scale biological data analysis are common examples where such a flexibility is important. Interoperability between the alignment engine and the client should be obtained regardless of the architectures and programming languages of the server and client. Clustalnet, a Clustal alignment CORBA server is described, which was developed on the basis of Clustalw. This server brings the robustness of the algorithms and implementations of Clustal to a new level of reuse. A Clustalnet server object can be accessed from a program, transparently through the network. We present interfaces to perform the alignment operations and to control these operations via immutable contexts. The interfaces that select the contexts do not depend on the nature of the operation to be performed, making the design modular. The IDL interfaces presented here are not specific to Clustal and can be implemented on top of different sequence alignment algorithm implementations.

  11. STING Millennium: a web-based suite of programs for comprehensive and simultaneous analysis of protein structure and sequence

    PubMed Central

    Neshich, Goran; Togawa, Roberto C.; Mancini, Adauto L.; Kuser, Paula R.; Yamagishi, Michel E. B.; Pappas, Georgios; Torres, Wellington V.; Campos, Tharsis Fonseca e; Ferreira, Leonardo L.; Luna, Fabio M.; Oliveira, Adilton G.; Miura, Ronald T.; Inoue, Marcus K.; Horita, Luiz G.; de Souza, Dimas F.; Dominiquini, Fabiana; Álvaro, Alexandre; Lima, Cleber S.; Ogawa, Fabio O.; Gomes, Gabriel B.; Palandrani, Juliana F.; dos Santos, Gabriela F.; de Freitas, Esther M.; Mattiuz, Amanda R.; Costa, Ivan C.; de Almeida, Celso L.; Souza, Savio; Baudet, Christian; Higa, Roberto H.

    2003-01-01

    STING Millennium Suite (SMS) is a new web-based suite of programs and databases providing visualization and a complex analysis of molecular sequence and structure for the data deposited at the Protein Data Bank (PDB). SMS operates with a collection of both publicly available data (PDB, HSSP, Prosite) and its own data (contacts, interface contacts, surface accessibility). Biologists find SMS useful because it provides a variety of algorithms and validated data, wrapped-up in a user friendly web interface. Using SMS it is now possible to analyze sequence to structure relationships, the quality of the structure, nature and volume of atomic contacts of intra and inter chain type, relative conservation of amino acids at the specific sequence position based on multiple sequence alignment, indications of folding essential residue (FER) based on the relationship of the residue conservation to the intra-chain contacts and Cα–Cα and Cβ–Cβ distance geometry. Specific emphasis in SMS is given to interface forming residues (IFR)—amino acids that define the interactive portion of the protein surfaces. SMS may simultaneously display and analyze previously superimposed structures. PDB updates trigger SMS updates in a synchronized fashion. SMS is freely accessible for public data at http://www.cbi.cnptia.embrapa.br, http://mirrors.rcsb.org/SMS and http://trantor.bioc.columbia.edu/SMS. PMID:12824333

  12. Development of Integrated Programs for Aerospace-vehicle design (IPAD): Reference design process

    NASA Technical Reports Server (NTRS)

    Meyer, D. D.

    1979-01-01

    The airplane design process and its interfaces with manufacturing and customer operations are documented to be used as criteria for the development of integrated programs for the analysis, design, and testing of aerospace vehicles. Topics cover: design process management, general purpose support requirements, design networks, and technical program elements. Design activity sequences are given for both supersonic and subsonic commercial transports, naval hydrofoils, and military aircraft.

  13. SearchSmallRNA: a graphical interface tool for the assemblage of viral genomes using small RNA libraries data.

    PubMed

    de Andrade, Roberto R S; Vaslin, Maite F S

    2014-03-07

    Next-generation parallel sequencing (NGS) allows the identification of viral pathogens by sequencing the small RNAs of infected hosts. Thus, viral genomes may be assembled from host immune response products without prior virus enrichment, amplification or purification. However, mapping of the vast information obtained presents a bioinformatics challenge. In order to bypass the need for command-line use and basic bioinformatics knowledge, we developed mapping software with a graphical interface for the assembly of viral genomes from small RNA datasets obtained by NGS. SearchSmallRNA was developed in Java version 7 using NetBeans IDE 7.1. The program also allows the analysis of the viral small interfering RNA (vsRNA) profile, providing an overview of the size distribution and other features of the vsRNAs produced in infected cells. The program compares each sequenced read in a library with a chosen reference genome. Reads showing Hamming distances smaller than or equal to an allowed number of mismatches are selected as positives and used for the assembly of a long nucleotide genome sequence. To validate the software, distinct analyses using NGS datasets obtained from HIV and two plant viruses were used to reconstruct whole viral genomes. SearchSmallRNA was able to reconstruct viral genomes from NGS small RNA datasets with a high degree of reliability, so it will be a valuable tool for virus sequencing and discovery. It is accessible and free to all research communities and has the advantage of an easy-to-use graphical interface. SearchSmallRNA was written in Java and is freely available at http://www.microbiologia.ufrj.br/ssrna/.
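
    The read-selection step described in this abstract can be sketched as follows. This is a minimal Python illustration, not SearchSmallRNA's Java code: the function names (`hamming`, `map_reads`, `consensus`), the brute-force scan over every reference window, and the majority-vote consensus are all assumptions made for the example.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def map_reads(reads, reference, max_mismatches=1):
    """Return (position, read) for every reference window within the mismatch limit."""
    hits = []
    for read in reads:
        n = len(read)
        # Slide the read along the reference and keep windows that are close enough.
        for pos in range(len(reference) - n + 1):
            if hamming(read, reference[pos:pos + n]) <= max_mismatches:
                hits.append((pos, read))
    return hits


def consensus(hits, ref_len):
    """Majority-vote sequence from mapped reads; 'N' marks uncovered positions."""
    columns = [Counter() for _ in range(ref_len)]
    for pos, read in hits:
        for offset, base in enumerate(read):
            columns[pos + offset][base] += 1
    return "".join(c.most_common(1)[0][0] if c else "N" for c in columns)


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"           # hypothetical reference genome
    reads = ["ACGTA", "CGTTA", "TTAGC", "GGGGG"]  # hypothetical small RNA reads
    hits = map_reads(reads, reference, max_mismatches=0)
    print(hits)
    print(consensus(hits, len(reference)))
```

    The scan is quadratic in read count times genome length, which is acceptable for a toy input; a real mapper would presumably index the reference first.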

  14. SearchSmallRNA: a graphical interface tool for the assemblage of viral genomes using small RNA libraries data

    PubMed Central

    2014-01-01

    Background: Next-generation parallel sequencing (NGS) allows the identification of viral pathogens by sequencing the small RNAs of infected hosts. Thus, viral genomes may be assembled from host immune response products without prior virus enrichment, amplification or purification. However, mapping of the vast information obtained presents a bioinformatics challenge. Methods: In order to bypass the need for command-line use and basic bioinformatics knowledge, we developed mapping software with a graphical interface for the assembly of viral genomes from small RNA datasets obtained by NGS. SearchSmallRNA was developed in Java version 7 using NetBeans IDE 7.1. The program also allows the analysis of the viral small interfering RNA (vsRNA) profile, providing an overview of the size distribution and other features of the vsRNAs produced in infected cells. Results: The program compares each sequenced read in a library with a chosen reference genome. Reads showing Hamming distances smaller than or equal to an allowed number of mismatches are selected as positives and used for the assembly of a long nucleotide genome sequence. To validate the software, distinct analyses using NGS datasets obtained from HIV and two plant viruses were used to reconstruct whole viral genomes. Conclusions: SearchSmallRNA was able to reconstruct viral genomes from NGS small RNA datasets with a high degree of reliability, so it will be a valuable tool for virus sequencing and discovery. It is accessible and free to all research communities and has the advantage of an easy-to-use graphical interface. Availability and implementation: SearchSmallRNA was written in Java and is freely available at http://www.microbiologia.ufrj.br/ssrna/. PMID:24607237

  15. Phylogenetic Analysis of Rubella Viruses Identified in Uganda, 2003–2012

    PubMed Central

    Namuwulya, Prossy; Abernathy, Emily; Bukenya, Henry; Bwogi, Josephine; Tushabe, Phionah; Birungi, Molly; Seguya, Ronald; Kabaliisa, Theopista; Alibu, Vincent P.; Kayondo, Jonathan K.; Rivailler, Pierre; Icenogle, Joseph; Bakamutumaho, Barnabas

    2014-01-01

    Molecular data on rubella viruses are limited in Uganda despite the importance of congenital rubella syndrome (CRS). Routine rubella vaccination, while not administered currently in Uganda, is expected to begin by 2015. The World Health Organization recommends that countries without rubella vaccination programs assess the burden of rubella and CRS before starting a routine vaccination program. Uganda is already involved in integrated case-based surveillance, including laboratory testing to confirm measles and rubella, but molecular epidemiologic aspects of rubella circulation have so far not been documented in Uganda. Twenty throat swab or oral fluid samples collected from 12 districts during routine rash and fever surveillance between 2003 and 2012 were identified as rubella virus RNA positive and PCR products encompassing the region used for genotyping were sequenced. Phylogenetic analysis of the 20 sequences identified 19 genotype 1G viruses and 1 genotype 1E virus. Genotype-specific trees showed that the Uganda viruses belonged to specific clusters for both genotypes 1G and 1E and grouped with similar sequences from neighboring countries. Genotype 1G was predominant in Uganda. More epidemiological and molecular epidemiological data are required to determine if genotype 1E is also endemic in Uganda. The information obtained in this study will assist the immunization program in monitoring changes in circulating genotypes. PMID:24700073

  16. Phylogenetic analysis of rubella viruses identified in Uganda, 2003-2012.

    PubMed

    Namuwulya, Prossy; Abernathy, Emily; Bukenya, Henry; Bwogi, Josephine; Tushabe, Phionah; Birungi, Molly; Seguya, Ronald; Kabaliisa, Theopista; Alibu, Vincent P; Kayondo, Jonathan K; Rivailler, Pierre; Icenogle, Joseph; Bakamutumaho, Barnabas

    2014-12-01

    Molecular data on rubella viruses are limited in Uganda despite the importance of congenital rubella syndrome (CRS). Routine rubella vaccination, while not administered currently in Uganda, is expected to begin by 2015. The World Health Organization recommends that countries without rubella vaccination programs assess the burden of rubella and CRS before starting a routine vaccination program. Uganda is already involved in integrated case-based surveillance, including laboratory testing to confirm measles and rubella, but molecular epidemiologic aspects of rubella circulation have so far not been documented in Uganda. Twenty throat swab or oral fluid samples collected from 12 districts during routine rash and fever surveillance between 2003 and 2012 were identified as rubella virus RNA positive and PCR products encompassing the region used for genotyping were sequenced. Phylogenetic analysis of the 20 sequences identified 19 genotype 1G viruses and 1 genotype 1E virus. Genotype-specific trees showed that the Uganda viruses belonged to specific clusters for both genotypes 1G and 1E and grouped with similar sequences from neighboring countries. Genotype 1G was predominant in Uganda. More epidemiological and molecular epidemiological data are required to determine if genotype 1E is also endemic in Uganda. The information obtained in this study will assist the immunization program in monitoring changes in circulating genotypes. © 2014 Wiley Periodicals, Inc.

  17. Use of mutation spectra analysis software.

    PubMed

    Rogozin, I; Kondrashov, F; Glazko, G

    2001-02-01

    The study and comparison of mutation(al) spectra is an important problem in molecular biology, because these spectra often reflect on important features of mutations and their fixation. Such features include the interaction of DNA with various mutagens, the function of repair/replication enzymes, and properties of target proteins. It is known that mutability varies significantly along nucleotide sequences, such that mutations often concentrate at certain positions, called "hotspots," in a sequence. In this paper, we discuss in detail two approaches for mutation spectra analysis: the comparison of mutation spectra with a HG-PUBL program, (FTP: sunsite.unc.edu/pub/academic/biology/dna-mutations/hyperg) and hotspot prediction with the CLUSTERM program (www.itba.mi.cnr.it/webmutation; ftp.bionet.nsc.ru/pub/biology/dbms/clusterm.zip). Several other approaches for mutational spectra analysis, such as the analysis of a target protein structure, hotspot context revealing, multiple spectra comparisons, as well as a number of mutation databases are briefly described. Mutation spectra in the lacI gene of E. coli and the human p53 gene are used for illustration of various difficulties of such analysis. Copyright 2001 Wiley-Liss, Inc.

  18. RAMONA-3B application to Browns Ferry ATWS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Slovik, G.C.; Neymotin, L.Y.; Saha, P.

    1985-01-01

    The Anticipated Transient Without Scram (ATWS) is known to be a dominant accident sequence for possible core melt in a Boiling Water Reactor (BWR). A recent Probabilistic Risk Assessment (PRA) analysis for the Browns Ferry nuclear power plant indicates that ATWS is the second most dominant transient for core melt in BWR/4 with Mark I containment, the most dominant sequence being the failure of the long-term decay heat removal function of the Residual Heat Removal (RHR) system. Of all the various ATWS scenarios, the Main Steam Isolation Valve (MSIV) closure ATWS sequence was chosen for the present analysis because of its relatively high frequency of occurrence and its challenge to the residual heat removal system and containment integrity. The objective of this paper is to discuss four MSIV closure ATWS calculations using the RAMONA-3B code. The paper is a summary of a report being prepared for the USNRC Severe Accident Sequence Analysis (SASA) program, which should be referred to for details. 10 refs., 20 figs., 3 tabs.

  19. Phylogenetic analysis of human immunodeficiency virus type 2 isolated from Cuban individuals.

    PubMed

    Machado, Liuber Y; Díaz, Héctor M; Noa, Enrique; Martín, Dayamí; Blanco, Madeline; Díaz, Dervel F; Sánchez, Yordank R; Nibot, Carmen; Sánchez, Lourdes; Dubed, Marta

    2014-08-01

    The presence of infection by human immunodeficiency virus type 2 (HIV-2) in Cuba has been previously documented. However, genetic information on the strains that circulate in the Cuban people is still unknown. The present work constitutes the first study concerning the phylogenetic relationship of HIV-2 Cuban isolates conducted on 13 Cuban patients who were diagnosed with HIV-2. The env sequences were analyzed for the construction of a phylogenetic tree with reference sequences of HIV-2. Phylogenetic analysis of the env gene showed that all the Cuban sequences clustered in group A of HIV-2. The analysis indicated several independent introductions of HIV-2 into Cuba. The results of the study will reinforce the program on the epidemiological surveillance of the infection in Cuba and make possible further molecular evolutionary studies.

  20. Molecular Phylogenetics: Concepts for a Newcomer.

    PubMed

    Ajawatanawong, Pravech

    Molecular phylogenetics is the study of evolutionary relationships among organisms using molecular sequence data. The aim of this review is to introduce the important terminology and general concepts of tree reconstruction to biologists who lack a strong background in the field of molecular evolution. Some modern phylogenetic programs are easy to use because of their user-friendly interfaces, but understanding the phylogenetic algorithms and substitution models, which are based on advanced statistics, is still important for the analysis and interpretation without a guide. Briefly, there are five general steps in carrying out a phylogenetic analysis: (1) sequence data preparation, (2) sequence alignment, (3) choosing a phylogenetic reconstruction method, (4) identification of the best tree, and (5) evaluating the tree. Concepts in this review enable biologists to grasp the basic ideas behind phylogenetic analysis and also help provide a sound basis for discussions with expert phylogeneticists.

  1. Severe Accident Sequence Analysis Program: Anticipated transient without scram simulations for Browns Ferry Nuclear Plant Unit 1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dallman, R J; Gottula, R C; Holcomb, E E

    1987-05-01

    An analysis of five anticipated transients without scram (ATWS) was conducted at the Idaho National Engineering Laboratory (INEL). The five detailed deterministic simulations of postulated ATWS sequences were initiated from a main steamline isolation valve (MSIV) closure. The subject of the analysis was the Browns Ferry Nuclear Plant Unit 1, a boiling water reactor (BWR) of the BWR/4 product line with a Mark I containment. The simulations yielded insights to the possible consequences resulting from a MSIV closure ATWS. An evaluation of the effects of plant safety systems and operator actions on accident progression and mitigation is presented.

  2. TaxI: a software tool for DNA barcoding using distance methods

    PubMed Central

    Steinke, Dirk; Vences, Miguel; Salzburger, Walter; Meyer, Axel

    2005-01-01

    DNA barcoding is a promising approach to the diagnosis of biological diversity in which DNA sequences serve as the primary key for information retrieval. Most existing software for evolutionary analysis of DNA sequences was designed for phylogenetic analyses and, hence, those algorithms do not offer appropriate solutions for the rapid but precise analyses needed for DNA barcoding, and are also unable to process the often large comparative datasets. We developed a flexible software tool for DNA taxonomy, named TaxI. This program calculates sequence divergences between a query sequence (taxon to be barcoded) and each sequence of a dataset of reference sequences defined by the user. Because the analysis is based on separate pairwise alignments, this software is also able to work with sequences characterized by multiple insertions and deletions that are difficult to align in large sequence sets (i.e. thousands of sequences) by multiple alignment algorithms because of computational restrictions. Here, we demonstrate the utility of this approach with two datasets of fish larvae and juveniles from Lake Constance and juvenile land snails under different models of sequence evolution. Sets of ribosomal 16S rRNA sequences, characterized by multiple indels, performed as well as or better than cox1 sequence sets in assigning sequences to species, demonstrating the suitability of rRNA genes for DNA barcoding. PMID:16214755
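
    The query-versus-reference comparison described in this abstract can be illustrated with a short Python sketch. It is not TaxI's implementation: it assumes each pairwise alignment has already been produced by some external aligner, uses the uncorrected p-distance (one of many possible divergence measures), skips gapped columns, and the names `p_distance` and `rank_references` are hypothetical.

```python
def p_distance(aligned_query: str, aligned_ref: str) -> float:
    """Fraction of differing sites between two aligned sequences, ignoring gap columns."""
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    compared = mismatches = 0
    for q, r in zip(aligned_query, aligned_ref):
        if q == "-" or r == "-":
            continue  # indel column: not counted as a substitution
        compared += 1
        if q != r:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def rank_references(pairwise_alignments):
    """Sort references by divergence from the query, closest first.

    pairwise_alignments maps a reference name to (aligned_query, aligned_reference).
    """
    scored = [(p_distance(q, r), name) for name, (q, r) in pairwise_alignments.items()]
    return sorted(scored)


if __name__ == "__main__":
    # Hypothetical, hand-made alignments of one query against two references.
    alignments = {
        "species_A": ("ACGT-ACGT", "ACGTTACGT"),
        "species_B": ("ACGTACGT", "ACCTACGA"),
    }
    for distance, name in rank_references(alignments):
        print(f"{name}: {distance:.3f}")
```

    Because every reference is aligned to the query separately, each alignment may have a different length, which is why the input is a mapping of independent pairs rather than one multiple alignment.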

  3. Open discovery: An integrated live Linux platform of Bioinformatics tools

    PubMed Central

    Vetrivel, Umashankar; Pilla, Kalabharath

    2008-01-01

    Historically, live Linux distributions for bioinformatics have paved the way for portability of the bioinformatics workbench in a platform-independent manner. Moreover, most of the existing live Linux distributions limit their usage to sequence analysis and basic molecular visualization programs and are devoid of data persistence. Hence, Open Discovery, a live Linux distribution, has been developed with the capability to perform complex tasks like molecular modeling, docking and molecular dynamics in a swift manner. Furthermore, it is also equipped with a complete sequence analysis environment and is capable of running Windows executable programs in a Linux environment. Open Discovery portrays the advanced customizable configuration of Fedora, with data persistence accessible via USB drive or DVD. Availability: Open Discovery is distributed free under the Academic Free License (AFL) and can be downloaded from http://www.OpenDiscovery.org.in PMID:19238235

  4. A convenient and adaptable package of DNA sequence analysis programs for microcomputers.

    PubMed Central

    Pustell, J; Kafatos, F C

    1982-01-01

    We describe a package of DNA data handling and analysis programs designed for microcomputers. The package is convenient for immediate use by persons with little or no computer experience, and has been optimized by trial in our group for a year. By typing a single command, the user enters a system which asks questions or gives instructions in English. The system will enter, alter, and manage sequence files or a restriction enzyme library. It generates the reverse complement, translates, calculates codon usage, finds restriction sites, finds homologies with various degrees of mismatch, and graphs amino acid composition or base frequencies. A number of options for data handling and printing can be used to produce figures for publication. The package will be available in ANSI Standard FORTRAN for use with virtually any FORTRAN compiler. PMID:6278412
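
    Three of the operations listed in this abstract (reverse complement, restriction-site search and codon usage) are simple enough to sketch. The original package is in FORTRAN; the Python below is only an illustration, and the function names and the choice of the EcoRI recognition sequence GAATTC as the example site are assumptions made here.

```python
from collections import Counter

# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> list:
    """0-based start positions of every (possibly overlapping) exact match of site."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def codon_usage(seq: str, frame: int = 0) -> Counter:
    """Count complete codons read in the given frame (0, 1 or 2)."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(find_sites("GGAATTCCGAATTC", "GAATTC"))
    print(codon_usage("ATGAAAATGTAA"))
```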

  5. Identification of Human Lineage-Specific Transcriptional Coregulators Enabled by a Glossary of Binding Modules and Tunable Genomic Backgrounds.

    PubMed

    Mariani, Luca; Weinand, Kathryn; Vedenko, Anastasia; Barrera, Luis A; Bulyk, Martha L

    2017-09-27

    Transcription factors (TFs) control cellular processes by binding specific DNA motifs to modulate gene expression. Motif enrichment analysis of regulatory regions can identify direct and indirect TF binding sites. Here, we created a glossary of 108 non-redundant TF-8mer "modules" of shared specificity for 671 metazoan TFs from publicly available and new universal protein binding microarray data. Analysis of 239 ENCODE TF chromatin immunoprecipitation sequencing datasets and associated RNA sequencing profiles suggests the 8mer modules are more precise than position weight matrices in identifying indirect binding motifs and their associated tethering TFs. We also developed GENRE (genomically equivalent negative regions), a tunable tool for construction of matched genomic background sequences for analysis of regulatory regions. GENRE outperformed four state-of-the-art approaches to background sequence construction. We used our TF-8mer glossary and GENRE in the analysis of the indirect binding motifs for the co-occurrence of tethering factors, suggesting novel TF-TF interactions. We anticipate that these tools will aid in elucidating tissue-specific gene-regulatory programs. Copyright © 2017 Elsevier Inc. All rights reserved.

  6. The Transcriptome Analysis and Comparison Explorer--T-ACE: a platform-independent, graphical tool to process large RNAseq datasets of non-model organisms.

    PubMed

    Philipp, E E R; Kraemer, L; Mountfort, D; Schilhabel, M; Schreiber, S; Rosenstiel, P

    2012-03-15

    Next generation sequencing (NGS) technologies allow a rapid and cost-effective compilation of large RNA sequence datasets in model and non-model organisms. However, the storage and analysis of transcriptome information from different NGS platforms is still a significant bottleneck, leading to a delay in data dissemination and subsequent biological understanding. Especially database interfaces with transcriptome analysis modules going beyond mere read counts are missing. Here, we present the Transcriptome Analysis and Comparison Explorer (T-ACE), a tool designed for the organization and analysis of large sequence datasets, and especially suited for transcriptome projects of non-model organisms with little or no a priori sequence information. T-ACE offers a TCL-based interface, which accesses a PostgreSQL database via a php-script. Within T-ACE, information belonging to single sequences or contigs, such as annotation or read coverage, is linked to the respective sequence and immediately accessible. Sequences and assigned information can be searched via keyword- or BLAST-search. Additionally, T-ACE provides within and between transcriptome analysis modules on the level of expression, GO terms, KEGG pathways and protein domains. Results are visualized and can be easily exported for external analysis. We developed T-ACE for laboratory environments, which have only a limited amount of bioinformatics support, and for collaborative projects in which different partners work on the same dataset from different locations or platforms (Windows/Linux/MacOS). For laboratories with some experience in bioinformatics and programming, the low complexity of the database structure and open-source code provides a framework that can be customized according to the different needs of the user and transcriptome project.

  7. SONAR: A High-Throughput Pipeline for Inferring Antibody Ontogenies from Longitudinal Sequencing of B Cell Transcripts.

    PubMed

    Schramm, Chaim A; Sheng, Zizhang; Zhang, Zhenhai; Mascola, John R; Kwong, Peter D; Shapiro, Lawrence

    2016-01-01

    The rapid advance of massively parallel or next-generation sequencing technologies has made possible the characterization of B cell receptor repertoires in ever greater detail, and these developments have triggered a proliferation of software tools for processing and annotating these data. Of especial interest, however, is the capability to track the development of specific antibody lineages across time, which remains beyond the scope of most current programs. We have previously reported on the use of techniques such as inter- and intradonor analysis and CDR3 tracing to identify transcripts related to an antibody of interest. Here, we present Software for the Ontogenic aNalysis of Antibody Repertoires (SONAR), capable of automating both general repertoire analysis and specialized techniques for investigating specific lineages. SONAR annotates next-generation sequencing data, identifies transcripts in a lineage of interest, and tracks lineage development across multiple time points. SONAR also generates figures, such as identity-divergence plots and longitudinal phylogenetic "birthday" trees, and provides interfaces to other programs such as DNAML and BEAST. SONAR can be downloaded as a ready-to-run Docker image or manually installed on a local machine. In the latter case, it can also be configured to take advantage of a high-performance computing cluster for the most computationally intensive steps, if available. In summary, this software provides a useful new tool for the processing of large next-generation sequencing datasets and the ontogenic analysis of neutralizing antibody lineages. SONAR can be found at https://github.com/scharch/SONAR, and the Docker image can be obtained from https://hub.docker.com/r/scharch/sonar/.

  8. No evidence for the use of DIR, D–D fusions, chromosome 15 open reading frames or VH replacement in the peripheral repertoire was found on application of an improved algorithm, JointML, to 6329 human immunoglobulin H rearrangements

    PubMed Central

    Ohm-Laursen, Line; Nielsen, Morten; Larsen, Stine R; Barington, Torben

    2006-01-01

    Antibody diversity is created by imprecise joining of the variability (V), diversity (D) and joining (J) gene segments of the heavy and light chain loci. Analysis of rearrangements is complicated by somatic hypermutations and uncertainty concerning the sources of gene segments and the precise way in which they recombine. It has been suggested that D genes with irregular recombination signal sequences (DIR) and chromosome 15 open reading frames (OR15) can replace conventional D genes, that two D genes or inverted D genes may be used and that the repertoire can be further diversified by heavy chain V gene (VH) replacement. Safe conclusions require large, well-defined sequence samples and algorithms minimizing stochastic assignment of segments. Two computer programs were developed for analysis of heavy chain joints. JointHMM is a profile hidden Markov model, while JointML is a maximum-likelihood-based method taking the lengths of the joint and the mutational status of the VH gene into account. The programs were applied to a set of 6329 clonally unrelated rearrangements. A conventional D gene was found in 80% of unmutated sequences and 64% of mutated sequences, while D-gene assignment was kept below 5% in artificial (randomly permutated) rearrangements. No evidence for the use of DIR, OR15, multiple D genes or VH replacements was found, while inverted D genes were used in less than 1‰ of the sequences. JointML was shown to have a higher predictive performance for D-gene assignment in mutated and unmutated sequences than four other publicly available programs. An online version 1·0 of JointML is available at http://www.cbs.dtu.dk/services/VDJsolver. PMID:17005006

  9. Sarment: Python modules for HMM analysis and partitioning of sequences.

    PubMed

    Guéguen, Laurent

    2005-08-15

    Sarment is a package of Python modules for easy building and manipulation of sequence segmentations. It provides efficient implementation of usual algorithms for hidden Markov Model computation, as well as for maximal predictive partitioning. Owing to its very large variety of criteria for computing segmentations, Sarment can handle many kinds of models. Because of object-oriented programming, the results of the segmentation are very easy to manipulate.
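
    To make the idea of HMM-based sequence segmentation concrete, here is a small plain-Python sketch of the Viterbi algorithm followed by collapsing the state path into segments. It does not use Sarment's API, which is not described in this abstract; the two-state AT-rich/GC-rich model, its probabilities, and the function names `viterbi` and `segments` are all assumptions chosen for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for the observations (log-space Viterbi)."""
    if not observations:
        return []

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[i][s]: log-probability of the best path that ends in state s at position i
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for i in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[i - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[i][s] = score + log(emit_p[s][observations[i]])
            back[i][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for i in range(len(observations) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def segments(path):
    """Collapse a state path into (state, start, end) runs; end is exclusive."""
    runs, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            runs.append((path[start], start, i))
            start = i
    return runs


if __name__ == "__main__":
    states = ["AT", "GC"]  # hypothetical AT-rich and GC-rich regimes
    start_p = {"AT": 0.5, "GC": 0.5}
    trans_p = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
    emit_p = {
        "AT": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
        "GC": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
    }
    print(segments(viterbi("AATTATAGCGGCCG", states, start_p, trans_p, emit_p)))
```

    Working in log space avoids numerical underflow on long sequences, which matters once the input is longer than a few hundred symbols.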

  10. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jarocki, John Charles; Zage, David John; Fisher, Andrew N.

    LinkShop is a software tool for applying the method of Linkography to the analysis of time-sequence data. LinkShop provides command line, web, and application programming interfaces (API) for input and processing of time-sequence data, abstraction models, and ontologies. The software creates graph representations of the abstraction model, ontology, and derived linkograph. Finally, the tool allows the user to perform statistical measurements of the linkograph and refine the ontology through direct manipulation of the linkograph.

  11. BNL severe-accident sequence experiments and analysis program. [PWR; BWR

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Greene, G.A.; Ginsberg, T.; Tutu, N.K.

    1983-01-01

    In the analysis of degraded core accidents, the two major sources of pressure loading on light water reactor containments are: steam generation from core debris-water thermal interactions; and molten core-concrete interactions. Experiments are in progress at BNL in support of analytical model development related to aspects of the above containment loading mechanisms. The work supports development and evaluation of the CORCON (Muir, 1981) and MARCH (Wooton, 1980) computer codes. Progress in the two programs is described.

  12. Buying in to bioinformatics: an introduction to commercial sequence analysis software

    PubMed Central

    2015-01-01

    Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bioinformatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right software for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, features and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. If you are just beginning your foray into molecular sequence analysis or an experienced genomicist, I encourage you to explore proprietary software bundles. They have the potential to streamline your research, increase your productivity, energize your classroom and, if anything, add a bit of zest to the often dry detached world of bioinformatics. PMID:25183247

  13. Buying in to bioinformatics: an introduction to commercial sequence analysis software.

    PubMed

    Smith, David Roy

    2015-07-01

    Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bioinformatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right software for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, features and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. If you are just beginning your foray into molecular sequence analysis or an experienced genomicist, I encourage you to explore proprietary software bundles. They have the potential to streamline your research, increase your productivity, energize your classroom and, if anything, add a bit of zest to the often dry detached world of bioinformatics. © The Author 2014. Published by Oxford University Press.

  14. BioPig: a Hadoop-based analytic toolkit for large-scale sequence data.

    PubMed

    Nordberg, Henrik; Bhatia, Karan; Wang, Kai; Wang, Zhong

    2013-12-01

    The recent revolution in sequencing technologies has led to an exponential growth of sequence data. As a result, most of the current bioinformatics tools become obsolete as they fail to scale with data. To tackle this 'data deluge', here we introduce the BioPig sequence analysis toolkit as one of the solutions that scale to data and computation. We built BioPig on Apache's Hadoop MapReduce system and the Pig data flow language. Compared with traditional serial and MPI-based algorithms, BioPig has three major advantages: first, BioPig's programmability greatly reduces development time for parallel bioinformatics applications; second, testing BioPig with up to 500 Gb sequences demonstrates that it scales automatically with the size of the data; and finally, BioPig can be ported without modification to many Hadoop infrastructures, as tested with the Magellan system at the National Energy Research Scientific Computing Center and the Amazon Elastic Compute Cloud. In summary, BioPig represents a novel program framework with the potential to greatly accelerate data-intensive bioinformatics analysis.

  15. A powerful graphical pulse sequence programming tool for magnetic resonance imaging.

    PubMed

    Jie, Shen; Ying, Liu; Jianqi, Li; Gengying, Li

    2005-12-01

    A powerful graphical pulse sequence programming tool has been designed for creating magnetic resonance imaging (MRI) applications. It allows rapid development of pulse sequences in graphical mode (allowing for the visualization of sequences), and consists of three modules: a graphical sequence editor, a parameter management module and a sequence compiler. Its key features are ease of use, flexibility and hardware independence. When graphic elements are combined with certain text expressions, graphical pulse sequence programming is as flexible as a text-based programming tool. In addition, a hardware-independent design is implemented by using a two-step compilation strategy. To demonstrate the flexibility and capability of this graphical sequence programming tool, a multi-slice fast spin echo experiment is performed on our home-made 0.3 T permanent magnet MRI system.

  16. An A Priori Multiobjective Optimization Model of a Search and Rescue Network

    DTIC Science & Technology

    1992-03-01

    sequences. Classical sensitivity analysis and tolerance analysis were used to analyze the frequency assignments generated by the different weight...function for excess coverage of a frequency. Sensitivity analysis is used to investigate the robustness of the frequency assignments produced by the...interest. The linear program solution is used to produce classical sensitivity analysis for the weight ranges. 17 III. Model Formulation This chapter

  17. Exploring root symbiotic programs in the model legume Medicago truncatula using EST analysis.

    PubMed

    Journet, Etienne-Pascal; van Tuinen, Diederik; Gouzy, Jérome; Crespeau, Hervé; Carreau, Véronique; Farmer, Mary-Jo; Niebel, Andreas; Schiex, Thomas; Jaillon, Olivier; Chatagnier, Odile; Godiard, Laurence; Micheli, Fabienne; Kahn, Daniel; Gianinazzi-Pearson, Vivienne; Gamas, Pascal

    2002-12-15

    We report on a large-scale expressed sequence tag (EST) sequencing and analysis program aimed at characterizing the sets of genes expressed in roots of the model legume Medicago truncatula during interactions with either of two microsymbionts, the nitrogen-fixing bacterium Sinorhizobium meliloti or the arbuscular mycorrhizal fungus Glomus intraradices. We have designed specific tools for in silico analysis of EST data, in relation to chimeric cDNA detection, EST clustering, encoded protein prediction, and detection of differential expression. Our 21 473 5'- and 3'-ESTs could be grouped into 6359 EST clusters, corresponding to distinct virtual genes, along with 52 498 other M.truncatula ESTs available in the dbEST (NCBI) database that were recruited in the process. These clusters were manually annotated, using a specifically developed annotation interface. Analysis of EST cluster distribution in various M.truncatula cDNA libraries, supported by a refined R test to evaluate statistical significance and by 'electronic northern' representation, enabled us to identify a large number of novel genes predicted to be up- or down-regulated during either symbiotic root interaction. These in silico analyses provide a first global view of the genetic programs for root symbioses in M.truncatula. A searchable database has been built and can be accessed through a public interface.

  18. Net Metering | State, Local, and Tribal Governments | NREL

    Science.gov Websites

    research organizations have explored this question by conducting solar cost-benefit studies. Program Design Sequencing for State Distributed PV Policies: A Quantitative Analysis of Policy Impacts and Interactions

  19. TCGA's Pan-Cancer Efforts and Expansion to Include Whole Genome Sequence - TCGA

    Cancer.gov

    Carolyn Hutter, Ph.D., Program Director of NHGRI's Division of Genomic Medicine, discusses the expansion of TCGA's Pan-Cancer efforts to include the Pan-Cancer Analysis of Whole Genomes (PAWG) project.

  20. Determining protein function and interaction from genome analysis

    DOEpatents

    Eisenberg, David; Marcotte, Edward M.; Thompson, Michael J.; Pellegrini, Matteo; Yeates, Todd O.

    2004-08-03

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein, indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method, a combination of the above two methods is used to predict functional links.
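
    The phylogenetic-profile idea in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the protein names, genome names, presence data and the `max_mismatches` threshold are all hypothetical, and a simple mismatch count between profiles is assumed as the similarity measure.

```python
from itertools import combinations

def phylogenetic_profiles(presence, genomes):
    """Map each protein to a tuple of 0/1 flags, one flag per genome."""
    return {
        protein: tuple(1 if g in found_in else 0 for g in genomes)
        for protein, found_in in presence.items()
    }

def linked_pairs(profiles, max_mismatches=0):
    """Return protein pairs whose profiles differ in at most max_mismatches genomes."""
    pairs = []
    for (p, prof_p), (q, prof_q) in combinations(sorted(profiles.items()), 2):
        mismatches = sum(a != b for a, b in zip(prof_p, prof_q))
        if mismatches <= max_mismatches:
            pairs.append((p, q))
    return pairs

# Hypothetical toy data: the genomes in which a homolog of each protein is found.
genomes = ["G1", "G2", "G3", "G4"]
presence = {
    "P1": {"G1", "G3"},
    "P2": {"G1", "G3"},
    "P3": {"G2", "G4"},
    "P4": {"G1", "G2", "G3"},
}
profiles = phylogenetic_profiles(presence, genomes)
print(linked_pairs(profiles))  # proteins with identical presence/absence patterns
```

    Proteins that share a profile (here P1 and P2) are candidates for a functional link; relaxing `max_mismatches` trades specificity for sensitivity.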

  1. Assigning protein functions by comparative genome analysis protein phylogenetic profiles

    DOEpatents

    Pellegrini, Matteo; Marcotte, Edward M.; Thompson, Michael J.; Eisenberg, David; Grothe, Robert; Yeates, Todd O.

    2003-05-13

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein, indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method, a combination of the above two methods is used to predict functional links.

  2. Enhanced Modeling of First-Order Plant Equations of Motion for Aeroelastic and Aeroservoelastic Applications

    NASA Technical Reports Server (NTRS)

    Pototzky, Anthony S.

    2010-01-01

    A methodology is described for generating first-order plant equations of motion for aeroelastic and aeroservoelastic applications. The description begins with the process of generating data files representing specialized mode-shapes, such as rigid-body and control surface modes, using both PATRAN and NASTRAN analysis. NASTRAN executes the 146 solution sequence using numerous Direct Matrix Abstraction Program (DMAP) calls to import the mode-shape files and to perform the aeroelastic response analysis. The aeroelastic response analysis calculates and extracts structural frequencies, generalized masses, frequency-dependent generalized aerodynamic force (GAF) coefficients, sensor deflections and load coefficients data as text-formatted data files. The data files are then re-sequenced and re-formatted using a custom-written FORTRAN program. The text-formatted data files are stored and coefficients for s-plane equations are fitted to the frequency-dependent GAF coefficients using two Interactions of Structures, Aerodynamics and Controls (ISAC) programs. With tabular files from stored data created by ISAC, MATLAB generates the first-order aeroservoelastic plant equations of motion. These equations include control-surface actuator, turbulence, sensor and load modeling. Altitude-varying root-locus plot and PSD plot results for a model of the F-18 aircraft are presented to demonstrate the capability.

  3. cnvScan: a CNV screening and annotation tool to improve the clinical utility of computational CNV prediction from exome sequencing data.

    PubMed

    Samarakoon, Pubudu Saneth; Sorte, Hanne Sørmo; Stray-Pedersen, Asbjørg; Rødningen, Olaug Kristin; Rognes, Torbjørn; Lyle, Robert

    2016-01-14

    With advances in next generation sequencing technology and analysis methods, single nucleotide variants (SNVs) and indels can be detected with high sensitivity and specificity in exome sequencing data. Recent studies have demonstrated the ability to detect disease-causing copy number variants (CNVs) in exome sequencing data. However, exonic CNV prediction programs have shown high false positive CNV counts, which is the major limiting factor for the applicability of these programs in clinical studies. We have developed a tool (cnvScan) to improve the clinical utility of computational CNV prediction in exome data. cnvScan can accept input from any CNV prediction program. cnvScan consists of two steps: CNV screening and CNV annotation. CNV screening evaluates CNV prediction using quality scores and refines this using an in-house CNV database, which greatly reduces the false positive rate. The annotation step provides functionally and clinically relevant information using multiple source datasets. We assessed the performance of cnvScan on CNV predictions from five different prediction programs using 64 exomes from Primary Immunodeficiency (PIDD) patients, and identified PIDD-causing CNVs in three individuals from two different families. In summary, cnvScan reduces the time and effort required to detect disease-causing CNVs by reducing the false positive count and providing annotation. This improves the clinical utility of CNV detection in exome data.
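
    The two-stage screening described in this abstract (quality scores first, then an in-house CNV database) might be sketched in Python as follows. This is an assumption-laden illustration rather than cnvScan's actual logic: the record fields, the `min_quality` and `min_fraction` thresholds, and the rule that calls overlapping a database entry are discarded are all hypothetical.

```python
def overlaps(call, known, min_fraction=0.5):
    """True if `known` covers at least min_fraction of `call` on the same chromosome."""
    if call["chrom"] != known["chrom"]:
        return False
    shared = min(call["end"], known["end"]) - max(call["start"], known["start"])
    return shared > 0 and shared / (call["end"] - call["start"]) >= min_fraction

def screen_cnvs(calls, in_house_db, min_quality=30.0):
    """Keep calls that pass the quality cut and do not match an in-house entry."""
    kept = []
    for call in calls:
        if call["quality"] < min_quality:
            continue  # assumed rule: low-quality predictions are dropped
        if any(overlaps(call, known) for known in in_house_db):
            continue  # assumed rule: calls seen in the in-house database are dropped
        kept.append(call)
    return kept

# Hypothetical predictions and database entries (coordinates are made up).
calls = [
    {"chrom": "1", "start": 100, "end": 200, "quality": 50.0},
    {"chrom": "1", "start": 300, "end": 400, "quality": 10.0},
    {"chrom": "2", "start": 100, "end": 200, "quality": 60.0},
]
in_house_db = [{"chrom": "1", "start": 120, "end": 220}]
print(screen_cnvs(calls, in_house_db))
```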

  4. Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments

    DOE PAGES

    Yim, Won Cheol; Cushman, John C.

    2017-07-22

    Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs performs searches very rapidly, it has the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and it overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. Thus, this freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.
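
    A query sequence distribution approach of the kind mentioned above can be sketched in Python: the query FASTA is split into chunks, and each chunk would then be searched as a separate job. This is only an illustration under stated assumptions, not DCBLAST's code; the function names and the round-robin assignment are hypothetical, and job submission and BLAST invocation are deliberately left out.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, parts = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records

def split_queries(records, n_chunks):
    """Deal records round-robin into n_chunks lists and drop any empty list."""
    buckets = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        buckets[i % n_chunks].append(record)
    return [bucket for bucket in buckets if bucket]

# Hypothetical query file with three sequences, split for two workers.
fasta = """>q1
ACGT
>q2
GGCC
TTAA
>q3
ATAT
"""
chunks = split_queries(read_fasta(fasta), 2)
for index, chunk in enumerate(chunks):
    print(index, [name for name, _ in chunk])
```

    Because each query is searched independently, the per-chunk results can simply be concatenated afterwards.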

  5. Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yim, Won Cheol; Cushman, John C.

    Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs performs searches very rapidly, it has the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and it overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. Thus, this freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.

  6. Considerations in STS payload environmental verification

    NASA Technical Reports Server (NTRS)

    Keegan, W. B.

    1978-01-01

    Considerations regarding the Space Transportation System (STS) payload environmental verification are reviewed. It is noted that emphasis is placed on testing at the subassembly level and that the basic objective of structural dynamic payload verification is to ensure reliability in a cost-effective manner. Structural analyses consist of: (1) stress analysis for critical loading conditions, (2) model analysis for launch and orbital configurations, (3) flight loads analysis, (4) test simulation analysis to verify models, (5) kinematic analysis of deployment/retraction sequences, and (6) structural-thermal-optical program analysis. In addition to these approaches, payload verification programs are being developed in the thermal-vacuum area. These include the exposure to extreme temperatures, temperature cycling, thermal-balance testing and thermal-vacuum testing.

  7. Manned space flight nuclear system safety. Volume 3: Reactor system preliminary nuclear safety analysis. Part 2: Accident Model Document (AMD)

    NASA Technical Reports Server (NTRS)

    1972-01-01

    The Accident Model Document is one of three documents of the Preliminary Safety Analysis Report (PSAR) - Reactor System as applied to a Space Base Program. Potential terrestrial nuclear hazards involving the zirconium hydride reactor-Brayton power module are identified for all phases of the Space Base program. The accidents/events that give rise to the hazards are defined and abort sequence trees are developed to determine the sequence of events leading to the hazard and the associated probabilities of occurrence. Source terms are calculated to determine the magnitude of the hazards. The above data are used in the mission accident analysis to determine the most probable and significant accidents/events in each mission phase. The only significant hazards during the prelaunch and launch ascent phases of the mission are those which arise from criticality accidents. Fission product inventories during this time period were found to be very low due to very limited low power acceptance testing.

  8. PAQ: Partition Analysis of Quasispecies.

    PubMed

    Baccam, P; Thompson, R J; Fedrigo, O; Carpenter, S; Cornette, J L

    2001-01-01

    The complexities of genetic data may not be accurately described by any single analytical tool. Phylogenetic analysis is often used to study the genetic relationship among different sequences. Evolutionary models and assumptions are invoked to reconstruct trees that describe the phylogenetic relationship among sequences. Genetic databases are rapidly accumulating large amounts of sequences. Newly acquired sequences, which have not yet been characterized, may require preliminary genetic exploration in order to build models describing the evolutionary relationship among sequences. There are clustering techniques that rely less on models of evolution, and thus may provide nice exploratory tools for identifying genetic similarities. Some of the more commonly used clustering methods perform better when data can be grouped into mutually exclusive groups. Genetic data from viral quasispecies, which consist of closely related variants that differ by small changes, however, may best be partitioned by overlapping groups. We have developed an intuitive exploratory program, Partition Analysis of Quasispecies (PAQ), which utilizes a non-hierarchical technique to partition sequences that are genetically similar. PAQ was used to analyze a data set of human immunodeficiency virus type 1 (HIV-1) envelope sequences isolated from different regions of the brain and another data set consisting of the equine infectious anemia virus (EIAV) regulatory gene rev. Analysis of the HIV-1 data set by PAQ was consistent with phylogenetic analysis of the same data, and the EIAV rev variants were partitioned into two overlapping groups. PAQ provides an additional tool which can be used to glean information from genetic data and can be used in conjunction with other tools to study genetic similarities and genetic evolution of viral quasispecies.

  9. ANCAC: amino acid, nucleotide, and codon analysis of COGs--a tool for sequence bias analysis in microbial orthologs.

    PubMed

    Meiler, Arno; Klinger, Claudia; Kaufmann, Michael

    2012-09-08

    The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. By definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence, resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC's NUCOCOG dataset as the largest one available for that purpose thus far. Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming skills.
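
    The composition statistics named in this abstract (GC-content and codon frequencies) are simple to compute for a single coding sequence. The Python sketch below illustrates the quantities only; it is not ANCAC's implementation, and the function names and the example sequence are hypothetical.

```python
from collections import Counter

def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def codon_counts(cds):
    """Count in-frame codons; a trailing partial codon is ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))

cds = "ATGGCCGCCTAA"  # hypothetical coding sequence: ATG GCC GCC TAA
print(round(gc_content(cds), 3))
print(codon_counts(cds))
```

    Aggregating such counts over all members of a COG, grouped by species or function, gives the kind of bias profile the tool is described as reporting.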

  10. ANCAC: amino acid, nucleotide, and codon analysis of COGs – a tool for sequence bias analysis in microbial orthologs

    PubMed Central

    2012-01-01

    Background: The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. By definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence, resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Results: Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC’s NUCOCOG dataset as the largest one available for that purpose thus far. Conclusions: Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming skills. PMID:22958836

  11. A generalized global alignment algorithm.

    PubMed

    Huang, Xiaoqiu; Chao, Kun-Mao

    2003-01-22

    Homologous sequences are sometimes similar over some regions but different over other regions. Homologous sequences have a much lower global similarity if the different regions are much longer than the similar regions. We present a generalized global alignment algorithm for comparing sequences with intermittent similarities, an ordered list of similar regions separated by different regions. A generalized global alignment model is defined to handle sequences with intermittent similarities. A dynamic programming algorithm is designed to compute an optimal general alignment in time proportional to the product of sequence lengths and in space proportional to the sum of sequence lengths. The algorithm is implemented as a computer program named GAP3 (Global Alignment Program Version 3). The generalized global alignment model is validated by experimental results produced with GAP3 on both DNA and protein sequences. The GAP3 program extends the ability of standard global alignment programs to recognize homologous sequences of lower similarity. The GAP3 program is freely available for academic use at http://bioinformatics.iastate.edu/aat/align/align.html.
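
    For orientation, the standard global alignment recurrence that this work generalizes can be sketched in Python. The sketch is the classic Needleman-Wunsch score with a linear gap penalty, not the GAP3 algorithm; it keeps only two rows of the dynamic programming table, so it returns the optimal score in linear space but does not recover the alignment itself. The scoring parameters are assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of a and b with a linear gap penalty."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: substitute/match, gap in b, gap in a.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[len(b)]

print(global_alignment_score("ACGT", "ACGT"))
print(global_alignment_score("ACGT", "AGT"))
```

    Running time is proportional to the product of the sequence lengths, which matches the time bound quoted in the abstract; recovering the alignment in linear space needs a divide-and-conquer step that is omitted here.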

  12. PANGEA: pipeline for analysis of next generation amplicons

    PubMed Central

    Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz FW; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W

    2010-01-01

    High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including preprocessing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the χ2 step, are joined into one program called the ‘backbone’. PMID:20182525

  13. PANGEA: pipeline for analysis of next generation amplicons.

    PubMed

    Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz F W; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W

    2010-07-01

    High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including pre-processing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the chi(2) step, are joined into one program called the 'backbone'.

  14. Computer program for the IBM personal computer which searches for approximate matches to short oligonucleotide sequences in long target DNA sequences.

    PubMed Central

    Myers, E W; Mount, D W

    1986-01-01

    We describe a program which may be used to find approximate matches to a short predefined DNA sequence in a larger target DNA sequence. The program predicts the usefulness of specific DNA probes and sequencing primers and finds nearly identical sequences that might represent the same regulatory signal. The program is written in the C programming language and will run on virtually any computer system with a C compiler, such as the IBM/PC and other computers running under the MS/DOS and UNIX operating systems. The program has been integrated into an existing software package for the IBM personal computer (see article by Mount and Conrad, this volume). Some examples of its use are given. PMID:3753785
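
    The search task described here, finding near-matches to a short probe in a longer target, can be illustrated with a brute-force Python sketch. It is not the algorithm of the original C program: only substitutions are counted (no insertions or deletions), and the function name, sequences and `max_mismatches` threshold are hypothetical.

```python
def approximate_matches(probe, target, max_mismatches=1):
    """Return (start, mismatches) for every target window within the mismatch limit."""
    probe, target = probe.upper(), target.upper()
    hits = []
    for start in range(len(target) - len(probe) + 1):
        window = target[start:start + len(probe)]
        mismatches = sum(p != t for p, t in zip(probe, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits

# Hypothetical probe and target; positions are 0-based.
print(approximate_matches("GATTACA", "TTGATTACAGGGATCACAT", max_mismatches=2))
# -> [(2, 0), (11, 1)]
```

    A probe with several low-mismatch hits would be a poor choice as a sequencing primer, which is the kind of prediction the abstract mentions.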

  15. Flavivirus and Filovirus EvoPrinters: New alignment tools for the comparative analysis of viral evolution.

    PubMed

    Brody, Thomas; Yavatkar, Amarendra S; Park, Dong Sun; Kuzin, Alexander; Ross, Jermaine; Odenwald, Ward F

    2017-06-01

    Flavivirus and Filovirus infections are serious epidemic threats to human populations. Multi-genome comparative analysis of these evolving pathogens affords a view of their essential, conserved sequence elements as well as progressive evolutionary changes. While phylogenetic analysis has yielded important insights, the growing number of available genomic sequences makes comparisons between hundreds of viral strains challenging. We report here a new approach for the comparative analysis of these hemorrhagic fever viruses that can superimpose an unlimited number of one-on-one alignments to identify important features within genomes of interest. We have adapted EvoPrinter alignment algorithms for the rapid comparative analysis of Flavivirus or Filovirus sequences including Zika and Ebola strains. The user can input a full genome or partial viral sequence and then view either individual comparisons or generate color-coded readouts that superimpose hundreds of one-on-one alignments to identify unique or shared identity SNPs that reveal ancestral relationships between strains. The user can also opt to select a database genome in order to access a library of pre-aligned genomes of either 1,094 Flaviviruses or 460 Filoviruses for rapid comparative analysis with all database entries or a select subset. 
    Using EvoPrinter search and alignment programs, we show the following: 1) superimposing alignment data from many related strains identifies lineage identity SNPs, which enable the assessment of sublineage complexity within viral outbreaks; 2) whole-genome SNP profile screens uncover novel Dengue2 and Zika recombinant strains and their parental lineages; 3) differential SNP profiling identifies host cell A-to-I hyper-editing within Ebola and Marburg viruses, and 4) hundreds of superimposed one-on-one Ebola genome alignments highlight ultra-conserved regulatory sequences, invariant amino acid codons and evolutionarily variable protein-encoding domains within a single genome. EvoPrinter allows for the assessment of lineage complexity within Flavivirus or Filovirus outbreaks, identification of recombinant strains, highlights sequences that have undergone host cell A-to-I editing, and identifies unique input and database SNPs within highly conserved sequences. EvoPrinter's ability to superimpose alignment data from hundreds of strains onto a single genome has allowed us to identify unique Zika virus sublineages that are currently spreading in South, Central and North America, the Caribbean, and in China. This new set of integrated alignment programs should serve as a useful addition to existing tools for the comparative analysis of these viruses.

  16. Hal: an automated pipeline for phylogenetic analyses of genomic data.

    PubMed

    Robbertse, Barbara; Yoder, Ryan J; Boyd, Alex; Reeves, John; Spatafora, Joseph W

    2011-02-07

    The rapid increase in genomic and genome-scale data is resulting in unprecedented levels of discrete sequence data available for phylogenetic analyses. Major analytical impasses exist, however, prior to analyzing these data with existing phylogenetic software. Obstacles include the management of large data sets without standardized naming conventions, identification and filtering of orthologous clusters of proteins or genes, and the assembly of alignments of orthologous sequence data into individual and concatenated super alignments. Here we report the production of an automated pipeline, Hal, that produces multiple alignments and trees from genomic data. These alignments can be produced by a choice of four alignment programs and analyzed by a variety of phylogenetic programs. In short, the Hal pipeline connects the programs BLASTP, MCL, user-specified alignment programs, GBlocks, ProtTest and user-specified phylogenetic programs to produce species trees. The script is available at SourceForge (http://sourceforge.net/projects/bio-hal/). The results from an example analysis of Kingdom Fungi are briefly discussed.
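
    The "concatenated super alignment" step mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from Hal itself: the function name, the dictionary layout (taxon name mapped to aligned sequence) and the choice to pad missing taxa with gap characters are all assumptions made for illustration.

```python
def concatenate_alignments(gene_alignments):
    """Join per-gene alignments into one super alignment per taxon.

    gene_alignments: list of dicts mapping taxon name -> aligned sequence.
    Taxa missing from a gene are padded with '-' (an illustrative choice).
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            # Use the aligned sequence if present, otherwise a run of gaps.
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


genes = [
    {"sp1": "MKV", "sp2": "MRV"},
    {"sp1": "LL", "sp3": "LI"},
]
super_alignment = concatenate_alignments(genes)
print(super_alignment)
```

    Padding with gaps keeps every row of the super alignment the same length, which most tree-building programs require.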

  17. Converting CSV Files to RKSML Files

    NASA Technical Reports Server (NTRS)

    Trebi-Ollennu, Ashitey; Liebersbach, Robert

    2009-01-01

    A computer program converts, into a format suitable for processing on Earth, files of downlinked telemetric data pertaining to the operation of the Instrument Deployment Device (IDD), which is a robot arm on either of the Mars Exploration Rovers (MERs). The raw downlinked data files are in comma-separated-value (CSV) format. The present program converts the files into Rover Kinematics State Markup Language (RKSML), which is an Extensible Markup Language (XML) format that facilitates representation of operations of the IDD and enables analysis of the operations by means of the Rover Sequencing Validation Program (RSVP), which is used to build sequences of commanded operations for the MERs. After conversion by means of the present program, the downlinked data can be processed by RSVP, enabling the MER downlink operations team to play back the actual IDD activity represented by the telemetric data against the planned IDD activity. Thus, the present program enhances the diagnosis of anomalies that manifest themselves as differences between actual and planned IDD activities.
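
    The general shape of a CSV-to-XML conversion like the one described here can be sketched in a few lines. The abstract does not give the RKSML schema, so the element names (`States`, `State`) and the sample column names below are hypothetical placeholders, not the real format.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="States", row_tag="State"):
    """Convert CSV text (first row = column names) into an XML string.

    Tag names are placeholders; the real RKSML schema is not described
    in the abstract. Column names must be valid XML element names.
    """
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        node = ET.SubElement(root, row_tag)
        for column, value in row.items():
            # One child element per CSV column, holding the cell value.
            child = ET.SubElement(node, column)
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical telemetry sample: a time stamp and two joint angles.
sample = "time,joint1,joint2\n0.0,0.10,1.20\n0.5,0.15,1.18\n"
xml_text = csv_to_xml(sample)
print(xml_text)
```

    Each CSV row becomes one element with a child per column, so a downstream tool can walk the records in order.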

  18. Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets.

    PubMed

    Vishnevsky, Oleg V; Bocharnikov, Andrey V; Kolchanov, Nikolay A

    2018-02-01

    The development of chromatin immunoprecipitation sequencing (ChIP-seq) technology has revolutionized the genetic analysis of the basic mechanisms underlying transcription regulation and led to the accumulation of information about a huge number of DNA sequences. Many web services are currently available for de novo motif discovery in datasets containing information about DNA/protein binding. The enormous diversity of motifs makes finding them challenging. To avoid these difficulties, researchers use different stochastic approaches. Unfortunately, the efficiency of motif discovery programs declines dramatically as the query set size increases. As a result, only a fraction of the top ChIP-Seq "peak" segments can be analyzed, or the area of analysis must be narrowed. Thus, motif discovery in massive datasets remains a challenging issue. The Argo_CUDA (Compute Unified Device Architecture) web service is designed to process massive DNA data. It is a program for the detection of degenerate oligonucleotide motifs of fixed length written in the 15-letter IUPAC code. Argo_CUDA is a full-exhaustive approach based on high-performance GPU technologies. Compared with the existing motif discovery web services, Argo_CUDA shows good prediction quality on simulated sets. The analysis of ChIP-Seq sequences revealed motifs which correspond to known transcription factor binding sites.
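
    To make "degenerate motifs written in the 15-letter IUPAC code" concrete, here is a small CPU-only sketch of matching one such motif against a sequence. It is not Argo_CUDA's algorithm or code; the function names and the example motif are assumptions chosen for illustration, and only the IUPAC letter table is standard.

```python
# Standard IUPAC nucleotide codes: each letter stands for a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def motif_matches(motif, window):
    """True if every base in window is allowed by the motif letter at that position."""
    return len(motif) == len(window) and all(
        base in IUPAC[letter] for letter, base in zip(motif, window)
    )


def find_motif(motif, sequence):
    """Return the 0-based start positions where the degenerate motif matches."""
    k = len(motif)
    return [
        i for i in range(len(sequence) - k + 1)
        if motif_matches(motif, sequence[i:i + k])
    ]


def sequences_with_motif(motif, sequences):
    """Count how many sequences contain the motif at least once."""
    return sum(1 for seq in sequences if find_motif(motif, seq))


hits = find_motif("WGATAR", "CCAGATAAGGTGATAGCC")
print(hits)
```

    An exhaustive search would score every possible fixed-length IUPAC motif this way, which is why the number of candidates grows quickly with motif length and why a GPU implementation is attractive.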

  19. Optimized Next-Generation Sequencing Genotype-Haplotype Calling for Genome Variability Analysis

    PubMed Central

    Navarro, Javier; Nevado, Bruno; Hernández, Porfidio; Vera, Gonzalo; Ramos-Onsins, Sebastián E

    2017-01-01

    The accurate estimation of nucleotide variability using next-generation sequencing data is challenged by the high number of sequencing errors produced by new sequencing technologies, especially for nonmodel species, where reference sequences may not be available and the read depth may be low due to limited budgets. The most popular single-nucleotide polymorphism (SNP) callers are designed to obtain a high SNP recovery and low false discovery rate but are not designed to account appropriately for the frequency of the variants. In contrast, algorithms designed to account for the frequency of SNPs give precise results for estimating the levels and the patterns of variability. These algorithms are focused on the unbiased estimation of the variability and not on the high recovery of SNPs. Here, we implemented a fast and optimized parallel algorithm that includes the method developed by Roesti et al. and Lynch, which estimates the genotype of each individual at each site, considering the possibility of calling both bases from the genotype, a single one or none. This algorithm does not consider the reference and therefore is independent of biases related to the reference nucleotide specified. The pipeline starts from a BAM file converted to pileup or mpileup format and the software outputs a FASTA file. The new program not only reduces the running times but also, given the improved use of resources, allows its usage with smaller computers and large parallel computers, expanding its benefits to a wider range of researchers. The output file can be analyzed using software for population genetics analysis, such as the R library PopGenome, the software VariScan, and the program mstatspop for analysis considering positions with missing data. PMID:28894353

  20. Importance of databases of nucleic acids for bioinformatic analysis focused to genomics

    NASA Astrophysics Data System (ADS)

    Jimenez-Gutierrez, L. R.; Barrios-Hernández, C. J.; Pedraza-Ferreira, G. R.; Vera-Cala, L.; Martinez-Perez, F.

    2016-08-01

    Recently, bioinformatics has become a new field of science, indispensable in the analysis of the millions of nucleic acid sequences currently deposited in international databases (public or private); these databases contain information on genes, RNA, ORFs, proteins and intergenic regions, including entire genomes from some species. The analysis of this information requires computer programs, which have been renewed through the use of new mathematical methods and the introduction of artificial intelligence, in addition to the constant creation of supercomputing units designed to withstand the heavy workload of sequence analysis. However, innovation is still needed on platforms that allow faster and more effective genomic analyses, with a technological understanding of all biological processes.

  1. Program Criteria Specifications Document. Computer Program TWDA for Design and Analysis of Inverted-T Retaining Walls and Floodwalls.

    DTIC Science & Technology

    1981-02-01

    or analysis modules, each performing one specific step in the design or analysis process. These modules will be callable, in any logical sequence...tempt to place and cut off bars, but will show the required steel area and bond requirements per foot at suitable intervals across the base... bond strength) shall be as required in ACI 318-71 Chapter 12, except that computed shear V shall be multiplied by 2.0 and substituted for Vu.

  2. Quantitative statistical analysis of cis-regulatory sequences in ABA/VP1- and CBF/DREB1-regulated genes of Arabidopsis.

    PubMed

    Suzuki, Masaharu; Ketterling, Matthew G; McCarty, Donald R

    2005-09-01

    We have developed a simple quantitative computational approach for objective analysis of cis-regulatory sequences in promoters of coregulated genes. The program, designated MotifFinder, identifies oligo sequences that are overrepresented in promoters of coregulated genes. We used this approach to analyze promoter sequences of Viviparous1 (VP1)/abscisic acid (ABA)-regulated genes and cold-regulated genes, respectively, of Arabidopsis (Arabidopsis thaliana). We detected significantly enriched sequences in up-regulated genes but not in down-regulated genes. This result suggests that gene activation but not repression is mediated by specific and common sequence elements in promoters. The enriched motifs include several known cis-regulatory sequences as well as previously unidentified motifs. With respect to known cis-elements, we dissected the flanking nucleotides of the core sequences of Sph element, ABA response elements (ABREs), and the C repeat/dehydration-responsive element. This analysis identified the motif variants that may correlate with qualitative and quantitative differences in gene expression. While both VP1 and cold responses are mediated in part by ABA signaling via ABREs, these responses correlate with unique ABRE variants distinguished by nucleotides flanking the ACGT core. ABRE and Sph motifs are tightly associated uniquely in the coregulated set of genes showing a strict dependence on VP1 and ABA signaling. Finally, analysis of distribution of the enriched sequences revealed a striking concentration of enriched motifs in a proximal 200-base region of VP1/ABA and cold-regulated promoters. Overall, each class of coregulated genes possesses a discrete set of the enriched motifs with unique distributions in their promoters that may account for the specificity of gene regulation.
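
    The idea of finding oligo sequences that are overrepresented in promoters of coregulated genes can be sketched as a comparison of k-mer occurrence between a coregulated set and a background set. This is not MotifFinder's actual statistic, which the abstract does not specify; the presence-based ratio, the add-one smoothing, the threshold and the toy sequences are all assumptions.

```python
from collections import Counter


def kmer_presence(sequences, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for seq in sequences:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def enriched_kmers(foreground, background, k=6, min_ratio=2.0):
    """Rank k-mers by (fraction of foreground) / (smoothed fraction of background)."""
    fg = kmer_presence(foreground, k)
    bg = kmer_presence(background, k)
    results = []
    for kmer, n_fg in fg.items():
        fg_frac = n_fg / len(foreground)
        # Add-one smoothing so k-mers absent from the background
        # do not cause a division by zero.
        bg_frac = (bg.get(kmer, 0) + 1) / (len(background) + 1)
        ratio = fg_frac / bg_frac
        if ratio >= min_ratio:
            results.append((kmer, round(ratio, 2)))
    # Highest ratio first; ties broken alphabetically for stable output.
    return sorted(results, key=lambda item: (-item[1], item[0]))


# Toy promoters: the coregulated set shares an ACGTG core.
coregulated = ["TTACGTGGCA", "GGACGTGTCC", "CAACGTGGAT"]
background = ["TTTTTTTTTT", "GGGGGGGGGG", "CCCCCCCCCC"]
top = enriched_kmers(coregulated, background, k=5)
print(top)
```

    A real analysis would replace the simple ratio with a significance test and a genome-wide background, but the ranking step has the same structure.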

  3. Analyzing multiple data sets by interconnecting RSAT programs via SOAP Web services: an example with ChIP-chip data.

    PubMed

    Sand, Olivier; Thomas-Chollier, Morgane; Vervisch, Eric; van Helden, Jacques

    2008-01-01

    This protocol shows how to access the Regulatory Sequence Analysis Tools (RSAT) via a programmatic interface in order to automate the analysis of multiple data sets. We describe the steps for writing a Perl client that connects to the RSAT Web services and implements a workflow to discover putative cis-acting elements in promoters of gene clusters. In the presented example, we apply this workflow to lists of transcription factor target genes resulting from ChIP-chip experiments. For each factor, the protocol predicts the binding motifs by detecting significantly overrepresented hexanucleotides in the target promoters and generates a feature map that displays the positions of putative binding sites along the promoter sequences. This protocol is addressed to bioinformaticians and biologists with programming skills (notions of Perl). Running time is approximately 6 min on the example data set.

  4. AntiHunter 2.0: increased speed and sensitivity in searching BLAST output for EST antisense transcripts.

    PubMed

    Lavorgna, Giovanni; Triunfo, Riccardo; Santoni, Federico; Orfanelli, Ugo; Noci, Sara; Bulfone, Alessandro; Zanetti, Gianluigi; Casari, Giorgio

    2005-07-01

    An increasing number of eukaryotic and prokaryotic genes are being found to have natural antisense transcripts (NATs). There is also growing evidence to suggest that antisense transcription could play a key role in many human diseases. Consequently, there have been several recent attempts to set up computational procedures aimed at identifying novel NATs. Our group has developed the AntiHunter program for the identification of expressed sequence tag (EST) antisense transcripts from BLAST output. In order to perform an analysis, the program requires a genomic sequence plus an associated list of transcript names and coordinates of the genomic region. After masking the repeated regions, the program carries out a BLASTN search of this sequence in the selected EST database, reporting via email the EST entries that reveal an antisense transcript according to the user-supplied list. Here, we present the newly developed version 2.0 of the AntiHunter tool. Several improvements have been added to this version of the program in order to increase its ability to detect a larger number of antisense ESTs. As a result, AntiHunter can now detect, on average, >45% more antisense ESTs with little or no increase in the percentage of the false positives. We also raised the maximum query size to 3 Mb (previously 1 Mb). Moreover, we found that a reasonable trade-off between the program search sensitivity and the maximum allowed size of the input-query sequence could be obtained by querying the database with the MEGABLAST program, rather than by using the BLAST one. We now offer this new opportunity to users, i.e. if choosing the MEGABLAST option, users can input a query sequence up to 30 Mb long, thus considerably improving the possibility to analyze longer query regions. The AntiHunter tool is freely available at http://bioinfo.crs4.it/AH2.0.

  5. Rapid Detection of Rare Deleterious Variants by Next Generation Sequencing with Optional Microarray SNP Genotype Data

    PubMed Central

    Watson, Christopher M.; Crinnion, Laura A.; Gurgel‐Gianetti, Juliana; Harrison, Sally M.; Daly, Catherine; Antanavicuite, Agne; Lascelles, Carolina; Markham, Alexander F.; Pena, Sergio D. J.; Bonthron, David T.

    2015-01-01

    ABSTRACT Autozygosity mapping is a powerful technique for the identification of rare, autosomal recessive, disease‐causing genes. The ease with which this category of disease gene can be identified has greatly increased through the availability of genome‐wide SNP genotyping microarrays and subsequently of exome sequencing. Although these methods have simplified the generation of experimental data, its analysis, particularly when disparate data types must be integrated, remains time consuming. Moreover, the huge volume of sequence variant data generated from next generation sequencing experiments opens up the possibility of using these data instead of microarray genotype data to identify disease loci. To allow these two types of data to be used in an integrated fashion, we have developed AgileVCFMapper, a program that performs both the mapping of disease loci by SNP genotyping and the analysis of potentially deleterious variants using exome sequence variant data, in a single step. This method does not require microarray SNP genotype data, although analysis with a combination of microarray and exome genotype data enables more precise delineation of disease loci, due to superior marker density and distribution. PMID:26037133

  6. Complexity: an internet resource for analysis of DNA sequence complexity

    PubMed Central

    Orlov, Y. L.; Potapov, V. N.

    2004-01-01

    The search for DNA regions with low complexity is one of the pivotal tasks of modern structural analysis of complete genomes. The low complexity may be preconditioned by strong inequality in nucleotide content (biased composition), by tandem or dispersed repeats or by palindrome-hairpin structures, as well as by a combination of all these factors. Several numerical measures of textual complexity, including combinatorial and linguistic ones, together with complexity estimation using a modified Lempel–Ziv algorithm, have been implemented in a software tool called ‘Complexity’ (http://wwwmgs.bionet.nsc.ru/mgs/programs/low_complexity/). The software enables a user to search for low-complexity regions in long sequences, e.g. complete bacterial genomes or eukaryotic chromosomes. In addition, it estimates the complexity of groups of aligned sequences. PMID:15215465
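
    A Lempel-Ziv style complexity measure can be illustrated by counting the phrases in a dictionary parse of the sequence: repetitive sequence needs fewer phrases than varied sequence of the same length. The sketch below uses a plain LZ78-style parse, not the modified Lempel-Ziv algorithm implemented in the 'Complexity' tool, and the window size, step and threshold are arbitrary illustrative values.

```python
def lz_phrase_count(seq):
    """Number of phrases in an LZ78-style parse of seq (fewer = less complex)."""
    phrases = set()
    current = ""
    for symbol in seq:
        current += symbol
        if current not in phrases:
            # First time this phrase is seen: record it and start a new one.
            phrases.add(current)
            current = ""
    # A trailing, already-seen fragment still counts as one phrase.
    return len(phrases) + (1 if current else 0)


def low_complexity_windows(seq, window=16, step=4, max_phrases=7):
    """Start positions of windows whose phrase count is at most max_phrases."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        if lz_phrase_count(seq[start:start + window]) <= max_phrases:
            hits.append(start)
    return hits


sequence = "ACGTACGTACGTACGT" + "A" * 16
print(lz_phrase_count("A" * 16), lz_phrase_count("ACGTACGTACGTACGT"))
print(low_complexity_windows(sequence))
```

    Only the window made entirely of the homopolymer run falls under the threshold here; windows that overlap the mixed region need more phrases.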

  7. Looking for Trouble: Preventive Genomic Sequencing in the General Population and the Role of Patient Choice

    PubMed Central

    Lázaro-Muñoz, Gabriel; Conley, John M.; Davis, Arlene M.; Van Riper, Marcia; Walker, Rebecca L.; Juengst, Eric T.

    2015-01-01

    Advances in genomics have led to calls for developing population-based preventive genomic sequencing (PGS) programs with the goal of identifying genetic health risks in adults without known risk factors. One critical issue for minimizing the harms and maximizing the benefits of PGS is determining the kind and degree of control individuals should have over the generation, use, and handling of their genomic information. In this article we examine whether PGS programs should offer individuals the opportunity to selectively opt-out of the sequencing or analysis of specific genomic conditions (the menu approach) or whether PGS should be implemented using an all-or-nothing panel approach. We conclude that any responsible scale up of PGS will require a menu approach that may seem impractical to some, but which draws its justification from a rich mix of normative, legal, and practical considerations. PMID:26147254

  8. ProfileGrids: a sequence alignment visualization paradigm that avoids the limitations of Sequence Logos.

    PubMed

    Roca, Alberto I

    2014-01-01

    The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org.
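
    The matrix underlying a ProfileGrid, the residue frequency at every aligned position, is straightforward to compute. The sketch below is not JProfileGrid code; the function name, the list-of-dictionaries layout and the toy alignment are assumptions, and the gap character is counted as if it were a residue.

```python
from collections import Counter


def profile_grid(alignment):
    """Per-column residue frequencies for a list of equal-length aligned sequences.

    Returns one dict per alignment column mapping residue -> frequency.
    Gap characters ('-') are counted like any other symbol.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    n_rows = len(alignment)
    grid = []
    for column in zip(*alignment):
        counts = Counter(column)
        grid.append({residue: count / n_rows for residue, count in counts.items()})
    return grid


toy_alignment = ["MKV-L", "MKI-L", "MRVAL", "MKVAL"]
grid = profile_grid(toy_alignment)
for position, frequencies in enumerate(grid, start=1):
    print(position, frequencies)
```

    Coloring each cell by its frequency gives the color-coded matrix the abstract describes, with fully conserved columns showing a single residue at frequency 1.0.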

  9. On the Multilevel Nature of Meta-Analysis: A Tutorial, Comparison of Software Programs, and Discussion of Analytic Choices.

    PubMed

    Pastor, Dena A; Lazowski, Rory A

    2018-01-01

    The term "multilevel meta-analysis" is encountered not only in applied research studies, but in multilevel resources comparing traditional meta-analysis to multilevel meta-analysis. In this tutorial, we argue that the term "multilevel meta-analysis" is redundant since all meta-analysis can be formulated as a special kind of multilevel model. To clarify the multilevel nature of meta-analysis the four standard meta-analytic models are presented using multilevel equations and fit to an example data set using four software programs: two specific to meta-analysis (metafor in R and SPSS macros) and two specific to multilevel modeling (PROC MIXED in SAS and HLM). The same parameter estimates are obtained across programs underscoring that all meta-analyses are multilevel in nature. Despite the equivalent results, not all software programs are alike and differences are noted in the output provided and estimators available. This tutorial also recasts distinctions made in the literature between traditional and multilevel meta-analysis as differences between meta-analytic choices, not between meta-analytic models, and provides guidance to inform choices in estimators, significance tests, moderator analyses, and modeling sequence. The extent to which the software programs allow flexibility with respect to these decisions is noted, with metafor emerging as the most favorable program reviewed.

  10. Transcriptome Analysis of the Portunus trituberculatus: De Novo Assembly, Growth-Related Gene Identification and Marker Discovery

    PubMed Central

    Lv, Jianjian; Liu, Ping; Gao, Baoquan; Wang, Yu; Wang, Zheng; Chen, Ping; Li, Jian

    2014-01-01

    Background: The swimming crab, Portunus trituberculatus, an important farmed species in China, has been attracting extensive study, which requires more and more genomic background knowledge. To date, its whole-genome sequence is unavailable and transcriptomic information is also scarce for this species. In the present study, we performed de novo transcriptome sequencing to produce a comprehensive transcript dataset for major tissues of Portunus trituberculatus using Illumina paired-end sequencing technology. Results: Total RNA was isolated from eyestalk, gill, heart, hepatopancreas and muscle. Equal quantities of RNA from each tissue were pooled to construct a cDNA library. Using Illumina paired-end sequencing technology, we generated a total of 120,137 transcripts with an average length of 1037 bp. Further assembly analysis showed that all contigs contributed to 87,100 unigenes; of these, 16,029 unigenes (18.40% of the total) can be matched in the GenBank non-redundant database. Potential genes and their functions were predicted by GO, KEGG pathway mapping and COG analysis. Based on our sequence analysis and published literature, many putative genes with fundamental roles in growth and muscle development, including actin, myosin, tropomyosin, troponin and other potentially important candidate genes, were identified for the first time in this species. Furthermore, 22,673 SSRs and 66,191 high-confidence SNPs were identified in this EST dataset. Conclusion: The transcriptome provides an invaluable new functional genomics resource for future biological research in Portunus trituberculatus. The data will also inform future functional studies to manipulate or select for genes influencing growth that should find practical applications in aquaculture breeding programs.
The molecular markers identified in this study will provide a material basis for future genetic linkage and quantitative trait loci analyses, and will be essential for accelerating aquaculture breeding programs with this species. PMID:24722690

  11. Comparative Genome Sequence Analysis of the Bpa/Str Region in Mouse and Man

    PubMed Central

    Mallon, A.-M.; Platzer, M.; Bate, R.; Gloeckner, G.; Botcherby, M.R.M.; Nordsiek, G.; Strivens, M.A.; Kioschis, P.; Dangel, A.; Cunningham, D.; Straw, R.N.A.; Weston, P.; Gilbert, M.; Fernando, S.; Goodall, K.; Hunter, G.; Greystrong, J.S.; Clarke, D.; Kimberley, C.; Goerdes, M.; Blechschmidt, K.; Rump, A.; Hinzmann, B.; Mundy, C.R.; Miller, W.; Poustka, A.; Herman, G.E.; Rhodes, M.; Denny, P.; Rosenthal, A.; Brown, S.D.M.

    2000-01-01

    The progress of human and mouse genome sequencing programs presages the possibility of systematic cross-species comparison of the two genomes as a powerful tool for gene and regulatory element identification. As the opportunities to perform comparative sequence analysis emerge, it is important to develop parameters for such analyses and to examine the outcomes of cross-species comparison. Our analysis used gene prediction and a database search of 430 kb of genomic sequence covering the Bpa/Str region of the mouse X chromosome, and 745 kb of genomic sequence from the homologous human X chromosome region. We identified 11 genes in mouse and 13 genes and two pseudogenes in human. In addition, we compared the mouse and human sequences using pairwise alignment and searches for evolutionary conserved regions (ECRs) exceeding a defined threshold of sequence identity. This approach aided the identification of at least four further putative conserved genes in the region. Comparative sequencing revealed that this region is a mosaic in evolutionary terms, with considerably more rearrangement between the two species than realized previously from comparative mapping studies. Surprisingly, this region showed an extremely high LINE and low SINE content, low G+C content, and yet a relatively high gene density, in contrast to the low gene density usually associated with such regions. [The sequence data described in this paper have been submitted to EMBL under the following accession nos.: Mouse Genomic Sequence: Mouse contig A (AL021127), Mouse contig B (AL049866), BAC41M10 (AL136328), PAC303O11(AL136329). Human Genomic Sequence: Human contig 1 (U82671, U82670), Human contig 2 (U82695).] PMID:10854409
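
    Searching a pairwise alignment for evolutionary conserved regions (ECRs) above an identity threshold can be sketched as a sliding-window scan. The abstract does not state the window length or threshold that were used, so the values below, the function name and the toy sequences are illustrative assumptions only.

```python
def conserved_windows(aligned_a, aligned_b, window=10, min_identity=0.8):
    """Return (start, identity) for windows of a pairwise alignment
    whose fraction of identical, non-gap columns is at least min_identity."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    hits = []
    for start in range(len(aligned_a) - window + 1):
        pairs = zip(aligned_a[start:start + window], aligned_b[start:start + window])
        # A column counts as identical only if both bases agree and neither is a gap.
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        identity = matches / window
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


# Toy alignment: the first 10 columns are identical, the rest differ.
mouse = "ACGTACGTACTTTTTTTTTT"
human = "ACGTACGTACGGGGGGGGGG"
ecrs = conserved_windows(mouse, human)
print(ecrs)
```

    Overlapping hits would normally be merged into a single region before being reported as a candidate conserved element.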

  12. SONAR: A High-Throughput Pipeline for Inferring Antibody Ontogenies from Longitudinal Sequencing of B Cell Transcripts

    PubMed Central

    Schramm, Chaim A.; Sheng, Zizhang; Zhang, Zhenhai; Mascola, John R.; Kwong, Peter D.; Shapiro, Lawrence

    2016-01-01

    The rapid advance of massively parallel or next-generation sequencing technologies has made possible the characterization of B cell receptor repertoires in ever greater detail, and these developments have triggered a proliferation of software tools for processing and annotating these data. Of especial interest, however, is the capability to track the development of specific antibody lineages across time, which remains beyond the scope of most current programs. We have previously reported on the use of techniques such as inter- and intradonor analysis and CDR3 tracing to identify transcripts related to an antibody of interest. Here, we present Software for the Ontogenic aNalysis of Antibody Repertoires (SONAR), capable of automating both general repertoire analysis and specialized techniques for investigating specific lineages. SONAR annotates next-generation sequencing data, identifies transcripts in a lineage of interest, and tracks lineage development across multiple time points. SONAR also generates figures, such as identity–divergence plots and longitudinal phylogenetic “birthday” trees, and provides interfaces to other programs such as DNAML and BEAST. SONAR can be downloaded as a ready-to-run Docker image or manually installed on a local machine. In the latter case, it can also be configured to take advantage of a high-performance computing cluster for the most computationally intensive steps, if available. In summary, this software provides a useful new tool for the processing of large next-generation sequencing datasets and the ontogenic analysis of neutralizing antibody lineages. SONAR can be found at https://github.com/scharch/SONAR, and the Docker image can be obtained from https://hub.docker.com/r/scharch/sonar/. PMID:27708645

  13. VOE Computer Programming: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 3-year program in computer programming. The guide consists of a course description; general course…

  14. A new version of the RDP (Ribosomal Database Project)

    NASA Technical Reports Server (NTRS)

    Maidak, B. L.; Cole, J. R.; Parker, C. T. Jr; Garrity, G. M.; Larsen, N.; Li, B.; Lilburn, T. G.; McCaughey, M. J.; Olsen, G. J.; Overbeek, R.

    1999-01-01

    The Ribosomal Database Project (RDP-II), previously described by Maidak et al. [Nucleic Acids Res. (1997), 25, 109-111], is now hosted by the Center for Microbial Ecology at Michigan State University. RDP-II is a curated database that offers ribosomal RNA (rRNA) nucleotide sequence data in aligned and unaligned forms, analysis services, and associated computer programs. During the past two years, data alignments have been updated and now include >9700 small subunit rRNA sequences. The recent development of an ObjectStore database will provide more rapid updating of data, better data accuracy and increased user access. RDP-II includes phylogenetically ordered alignments of rRNA sequences, derived phylogenetic trees, rRNA secondary structure diagrams, and various software programs for handling, analyzing and displaying alignments and trees. The data are available via anonymous ftp (ftp.cme.msu.edu) and WWW (http://www.cme.msu.edu/RDP). The WWW server provides ribosomal probe checking, approximate phylogenetic placement of user-submitted sequences, screening for possible chimeric rRNA sequences, automated alignment, and a suggested placement of an unknown sequence on an existing phylogenetic tree. Additional utilities also exist at RDP-II, including distance matrix, T-RFLP, and a Java-based viewer of the phylogenetic trees that can be used to create subtrees.

  15. GobyWeb: Simplified Management and Analysis of Gene Expression and DNA Methylation Sequencing Data

    PubMed Central

    Dorff, Kevin C.; Chambwe, Nyasha; Zeno, Zachary; Simi, Manuele; Shaknovich, Rita; Campagne, Fabien

    2013-01-01

    We present GobyWeb, a web-based system that facilitates the management and analysis of high-throughput sequencing (HTS) projects. The software provides integrated support for a broad set of HTS analyses and offers a simple plugin extension mechanism. Analyses currently supported include quantification of gene expression for messenger and small RNA sequencing, estimation of DNA methylation (i.e., reduced bisulfite sequencing and whole genome methyl-seq), or the detection of pathogens in sequenced data. In contrast to previous analysis pipelines developed for analysis of HTS data, GobyWeb requires significantly less storage space, runs analyses efficiently on a parallel grid, scales gracefully to process tens or hundreds of multi-gigabyte samples, yet can be used effectively by researchers who are comfortable using a web browser. We conducted performance evaluations of the software and found it to either outperform or have similar performance to analysis programs developed for specialized analyses of HTS data. We found that most biologists who took a one-hour GobyWeb training session were readily able to analyze RNA-Seq data with state of the art analysis tools. GobyWeb can be obtained at http://gobyweb.campagnelab.org and is freely available for non-commercial use. GobyWeb plugins are distributed in source code and licensed under the open source LGPL3 license to facilitate code inspection, reuse and independent extensions http://github.com/CampagneLaboratory/gobyweb2-plugins. PMID:23936070

  16. Interim Reliability Evaluation Program: analysis of the Browns Ferry, Unit 1, nuclear plant. Main report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mays, S.E.; Poloski, J.P.; Sullivan, W.H.

    1982-07-01

    A probabilistic risk assessment (PRA) was made of the Browns Ferry, Unit 1, nuclear plant as part of the Nuclear Regulatory Commission's Interim Reliability Evaluation Program (IREP). Specific goals of the study were to identify the dominant contributors to core melt, develop a foundation for more extensive use of PRA methods, expand the cadre of experienced PRA practitioners, and apply procedures for extension of IREP analyses to other domestic light water reactors. Event tree and fault tree analyses were used to estimate the frequency of accident sequences initiated by transients and loss of coolant accidents. External events such as floods, fires, earthquakes, and sabotage were beyond the scope of this study and were, therefore, excluded. From these sequences, the dominant contributors to probable core melt frequency were chosen. Uncertainty and sensitivity analyses were performed on these sequences to better understand the limitations associated with the estimated sequence frequencies. Dominant sequences were grouped according to common containment failure modes and corresponding release categories on the basis of comparison with analyses of similar designs rather than on the basis of detailed plant-specific calculations.

  17. Empirical transfer functions for stations in the Central California seismological network

    USGS Publications Warehouse

    Bakun, W.H.; Dratler, Jay

    1976-01-01

    A sequence of calibration signals, composed of a station identification code, a transient from the release of the seismometer mass at rest from a known displacement from the equilibrium position, and a transient from a known step in voltage to the amplifier input, is generated by the automatic daily calibration system (ADCS) now operational in the U.S. Geological Survey central California seismographic network. Documentation is presented for a sequence of interactive programs that compute, from the calibration data, the complex transfer functions for the seismographic system (ground motion through digitizer), the electronics (amplifier through digitizer), and the seismometer alone. The analysis utilizes the Fourier transform technique originally suggested by Espinosa et al. (1962). Section I is a general description of seismographic calibration. Section II contrasts the 'Fourier transform' and the 'least-squares' techniques for analyzing transient calibration signals. Theoretical considerations for the Fourier transform technique used here are described in Section III. Section IV is a detailed description of the sequence of calibration signals generated by the ADCS. Section V is a brief 'cookbook description' of the calibration programs; Section VI contains a detailed sample program execution. Section VII suggests the uses of the resultant empirical transfer functions. Supplemental interactive programs by which smooth response functions, suitable for reducing seismic data to ground motion, are obtained are also documented in Section VII. Appendices A and B contain complete listings of the Fortran source codes, while Appendix C is an update containing preliminary results obtained from an analysis of some of the calibration signals from stations in the seismographic network near Oroville, California.

  18. Analysis of whole genome sequences of 16 strains of rubella virus from the United States, 1961-2009.

    PubMed

    Abernathy, Emily; Chen, Min-hsin; Bera, Jayati; Shrivastava, Susmita; Kirkness, Ewen; Zheng, Qi; Bellini, William; Icenogle, Joseph

    2013-01-25

    Rubella virus is the causative agent of rubella, a mild rash illness, and a potent teratogenic agent when contracted by a pregnant woman. Global rubella control programs target the reduction and elimination of congenital rubella syndrome. Phylogenetic analysis of partial sequences of rubella viruses has contributed to virus surveillance efforts and played an important role in demonstrating that indigenous rubella viruses have been eliminated in the United States. Sixteen wild-type rubella viruses were chosen for whole genome sequencing. All 16 viruses were collected in the United States from 1961 to 2009 and are from 8 of the 13 known rubella genotypes. Phylogenetic analysis of 30 whole genome sequences produced a maximum likelihood tree giving high bootstrap values for all genotypes except provisional genotype 1a. Comparison of the 16 new complete sequences and 14 previously sequenced wild-type viruses found regions with clusters of variable amino acids. The 5' 250 nucleotides of the genome are more conserved than any other part of the genome. Genotype specific deletions in the untranslated region between the non-structural and structural open reading frames were observed for genotypes 2B and genotype 1G. No evidence was seen for recombination events among the 30 viruses. The analysis presented here is consistent with previous reports on the genetic characterization of rubella virus genomes. Conserved and variable regions were identified and additional evidence for genotype specific nucleotide deletions in the intergenic region was found. Phylogenetic analysis confirmed genotype groupings originally based on structural protein coding region sequences, which provides support for the WHO nomenclature for genetic characterization of wild-type rubella viruses.

  19. Data compression of discrete sequence: A tree based approach using dynamic programming

    NASA Technical Reports Server (NTRS)

    Shivaram, Gurusrasad; Seetharaman, Guna; Rao, T. R. N.

    1994-01-01

    A dynamic programming based approach for data compression of a 1D sequence is presented. The compression of an input sequence of size N to that of a smaller size k is achieved by dividing the input sequence into k subsequences and replacing the subsequences by their respective average values. The partitioning of the input sequence is carried out with the intention of reducing the mean squared error in the reconstructed sequence. The complexity involved in finding the partitions which would result in such an optimal compressed sequence is reduced by using the dynamic programming approach, which is presented.
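
    Illustrative sketch (not taken from the cited report): the partitioning described above can be written as a small dynamic program over contiguous segments. The function name `compress`, the use of prefix sums, and the O(k N^2) recurrence are assumptions made for this example; the report's own formulation may differ.

```python
def compress(seq, k):
    """Split seq into k contiguous segments that minimise the total squared
    error when each segment is replaced by its mean.
    Returns (segment bounds as half-open index pairs, segment means)."""
    n = len(seq)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(seq)")

    # Prefix sums of values and squared values give O(1) segment errors.
    s, s2 = [0.0], [0.0]
    for v in seq:
        s.append(s[-1] + v)
        s2.append(s2[-1] + v * v)

    def sse(i, j):
        """Squared error of seq[i:j] around its own mean."""
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[p][e]: minimal error covering seq[:e] with p segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for parts in range(1, k + 1):
        for end in range(parts, n + 1):
            for start in range(parts - 1, end):
                cost = best[parts - 1][start] + sse(start, end)
                if cost < best[parts][end]:
                    best[parts][end] = cost
                    cut[parts][end] = start

    # Walk the stored cut points backwards to recover the segments.
    bounds, end = [], n
    for parts in range(k, 0, -1):
        start = cut[parts][end]
        bounds.append((start, end))
        end = start
    bounds.reverse()
    means = [(s[j] - s[i]) / (j - i) for i, j in bounds]
    return bounds, means


bounds, means = compress([1, 1, 1, 5, 5, 9], 3)
print(bounds)  # [(0, 3), (3, 5), (5, 6)]
print(means)   # [1.0, 5.0, 9.0]
```

    The compressed sequence is the list of means; the bounds are only needed to reconstruct a sequence of the original length.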

  20. Aspect-Oriented Subprogram Synthesizes UML Sequence Diagrams

    NASA Technical Reports Server (NTRS)

    Barry, Matthew R.; Osborne, Richard N.

    2006-01-01

    The Rational Sequence computer program described elsewhere includes a subprogram that utilizes the capability for aspect-oriented programming when that capability is present. This subprogram is denoted the Rational Sequence (AspectJ) component because it uses AspectJ, which is an extension of the Java programming language that introduces aspect-oriented programming techniques into the language

  1. MethylViewer: computational analysis and editing for bisulfite sequencing and methyltransferase accessibility protocol for individual templates (MAPit) projects.

    PubMed

    Pardo, Carolina E; Carr, Ian M; Hoffman, Christopher J; Darst, Russell P; Markham, Alexander F; Bonthron, David T; Kladde, Michael P

    2011-01-01

    Bisulfite sequencing is a widely-used technique for examining cytosine DNA methylation at nucleotide resolution along single DNA strands. Probing with cytosine DNA methyltransferases followed by bisulfite sequencing (MAPit) is an effective technique for mapping protein-DNA interactions. Here, MAPit methylation footprinting with M.CviPI, a GC methyltransferase we previously cloned and characterized, was used to probe hMLH1 chromatin in HCT116 and RKO colorectal cancer cells. Because M.CviPI-probed samples contain both CG and GC methylation, we developed a versatile, visually-intuitive program, called MethylViewer, for evaluating the bisulfite sequencing results. Uniquely, MethylViewer can simultaneously query cytosine methylation status in bisulfite-converted sequences at as many as four different user-defined motifs, e.g. CG, GC, etc., including motifs with degenerate bases. Data can also be exported for statistical analysis and as publication-quality images. Analysis of hMLH1 MAPit data with MethylViewer showed that endogenous CG methylation and accessible GC sites were both mapped on single molecules at high resolution. Disruption of positioned nucleosomes on single molecules of the PHO5 promoter was detected in budding yeast using M.CviPII, increasing the number of enzymes available for probing protein-DNA interactions. MethylViewer provides an integrated solution for primer design and rapid, accurate and detailed analysis of bisulfite sequencing or MAPit datasets from virtually any biological or biochemical system.
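
    Illustrative sketch (not MethylViewer's implementation): bisulfite treatment converts unmethylated cytosines so that they read as T, while methylated cytosines remain C, so the status of a motif's cytosine can be read off an aligned sequence. The function name `methylation_calls`, the gap-free same-strand alignment, and the restriction to non-degenerate motifs are assumptions made for this example.

```python
def methylation_calls(reference, read, motif, c_index):
    """For every occurrence of motif in reference, classify the motif's
    cytosine (at offset c_index within the motif) in the bisulfite read.
    Assumes read is aligned to reference without gaps, on the same strand."""
    calls = {}
    start = reference.find(motif)
    while start != -1:
        pos = start + c_index
        base = read[pos]
        if base == "C":
            calls[pos] = "methylated"      # C protected from conversion
        elif base == "T":
            calls[pos] = "unmethylated"    # C converted and read as T
        else:
            calls[pos] = "ambiguous"       # sequencing error or mismatch
        start = reference.find(motif, start + 1)
    return calls


reference = "ACGTGCAACG"
read = "ACGTGTAATG"  # hypothetical bisulfite-converted read
print(methylation_calls(reference, read, "CG", 0))
print(methylation_calls(reference, read, "GC", 1))
```

    Querying "CG" and "GC" separately on the same molecule mirrors the idea of scoring endogenous CG methylation and probe-accessible GC sites side by side.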

  2. ORFer--retrieval of protein sequences and open reading frames from GenBank and storage into relational databases or text files.

    PubMed

    Büssow, Konrad; Hoffmann, Steve; Sievert, Volker

    2002-12-19

    Functional genomics involves the parallel experimentation with large sets of proteins. This requires management of large sets of open reading frames as a prerequisite of the cloning and recombinant expression of these proteins. A Java program was developed for retrieval of protein and nucleic acid sequences and annotations from NCBI GenBank, using the XML sequence format. Annotations retrieved by ORFer include sequence name, organism and also the completeness of the sequence. The program has a graphical user interface, although it can be used in a non-interactive mode. For protein sequences, the program also extracts the open reading frame sequence, if available, and checks its correct translation. ORFer accepts user input in the form of single or lists of GenBank GI identifiers or accession numbers. It can be used to extract complete sets of open reading frames and protein sequences from any kind of GenBank sequence entry, including complete genomes or chromosomes. Sequences are either stored with their features in a relational database or can be exported as text files in Fasta or tabulator delimited format. The ORFer program is freely available at http://www.proteinstrukturfabrik.de/orfer. The ORFer program allows for fast retrieval of DNA sequences, protein sequences and their open reading frames and sequence annotations from GenBank. Furthermore, storage of sequences and features in a relational database is supported. Such a database can supplement a laboratory information system (LIMS) with appropriate sequence information.

  3. Enabling large-scale next-generation sequence assembly with Blacklight

    PubMed Central

    Couger, M. Brian; Pipes, Lenore; Squina, Fabio; Prade, Rolf; Siepel, Adam; Palermo, Robert; Katze, Michael G.; Mason, Christopher E.; Blood, Philip D.

    2014-01-01

    Summary: A variety of extremely challenging biological sequence analyses were conducted on the XSEDE large shared memory resource Blacklight, using current bioinformatics tools and encompassing a wide range of scientific applications. These include genomic sequence assembly, very large metagenomic sequence assembly, transcriptome assembly, and sequencing error correction. The data sets used in these analyses included uncategorized fungal species, reference microbial data, very large soil and human gut microbiome sequence data, and primate transcriptomes, composed of both short-read and long-read sequence data. A new parallel command execution program was developed on the Blacklight resource to handle some of these analyses. These results, initially reported at XSEDE13 and expanded here, represent significant advances for their respective scientific communities. The breadth and depth of the results achieved demonstrate the ease of use, versatility, and unique capabilities of the Blacklight XSEDE resource for scientific analysis of genomic and transcriptomic sequence data, and the power of these resources, together with XSEDE support, in meeting the most challenging scientific problems. PMID:25294974

  4. Generation and analysis of expressed sequence tags from the bone marrow of Chinese Sika deer.

    PubMed

    Yao, Baojin; Zhao, Yu; Zhang, Mei; Li, Juan

    2012-03-01

    Sika deer is one of the best-known and highly valued animals of China. Despite its economic, cultural, and biological importance, there has not been a large-scale sequencing project for Sika deer to date. With the ultimate goal of sequencing the complete genome of this organism, we first established a bone marrow cDNA library for Sika deer and generated a total of 2,025 reads. After processing the sequences, 2,017 high-quality expressed sequence tags (ESTs) were obtained. These ESTs were assembled into 1,157 unigenes, including 238 contigs and 919 singletons. Comparative analyses indicated that 888 (76.75%) of the unigenes had significant matches to sequences in the non-redundant protein database. In addition to highly expressed genes, such as stearoyl-CoA desaturase, cytochrome c oxidase, adipocyte-type fatty acid-binding protein, adiponectin and thymosin beta-4, we also obtained vascular endothelial growth factor-A and heparin-binding growth-associated molecule, both of which are of great importance for angiogenesis research. There were 244 (21.09%) unigenes with no significant match to any sequence in current protein or nucleotide databases, and these sequences may represent genes with unknown function in Sika deer. Open reading frame analysis of the sequences was performed using the getorf program. In addition, the sequences were functionally classified using the gene ontology hierarchy, clusters of orthologous groups of proteins and Kyoto encyclopedia of genes and genomes databases. Analysis of ESTs described in this paper provides an important resource for the transcriptome exploration of Sika deer, and will also facilitate further studies on functional genomics, gene discovery and genome annotation of Sika deer.

  5. Complete mitochondrial genome sequence of Indian medium carp, Labeo gonius (Hamilton, 1822) and its comparison with other related carp species.

    PubMed

    Behera, Bijay Kumar; Kumari, Kavita; Baisvar, Vishwamitra Singh; Rout, Ajaya Kumar; Pakrashi, Sudip; Paria, Prasenjet; Jena, J K

    2017-01-01

    The complete mitochondrial genome sequence of Labeo gonius, obtained with a PGM sequencer (Ion Torrent), is reported in the present study. The complete mitogenome of L. gonius, 16 614 bp in length, was obtained by de novo sequence assembly of genomic reads using the Torrent Mapping Alignment Program (TMAP). The mitogenome of L. gonius comprises 13 protein-coding genes, 22 tRNAs, 2 rRNA genes, and a D-loop control region, with gene order and organization similar to those of most other fish mitogenomes in the NCBI databases. The mitogenome in the present study has 99% similarity to the complete mitogenome sequence of Labeo fimbriatus, as reported earlier. The phylogenetic analysis of Cypriniformes indicated that their mitogenomes are closely related to each other. The complete mitogenome sequence of L. gonius would be helpful in understanding the population genetics, phylogenetics, and evolution of Indian carps.

  6. [Development of Staphylococcus Haemolyticus multilocus sequencing scheme and its use for molecular-epidemiologic analysis of strains isolated in hospitals in Russian federation in 2009-2010].

    PubMed

    Voronina, O L; Kunda, M S; Dmitrenko, O A; Lunin, V G; Gintsburg, A L

    2011-01-01

    The aim was to develop a Staphylococcus haemolyticus strain typing method based on multilocus sequencing for resolving problems of molecular epidemiology. A total of 102 strains of coagulase-negative staphylococci (CNS) isolated in hospitals of various specializations in N. Novgorod and Moscow were studied. Species identification was performed by sequencing a tuf gene fragment, and S. haemolyticus strains were differentiated by MLST. The eBURST approach was used for cluster analysis of MLST data; structural changes in tagatose-6-phosphate kinase were studied using the InterProScan platform and SWISS-MODEL site programs; allele variability of the MLST scheme genes was analyzed using the MEGA4.0 program package. Of the 102 CNS strains sampled, 28 were identified as S. haemolyticus. The MLST scheme, developed for the first time for S. haemolyticus and including the mvaK, rphE, tphK, gtr, arcC, triA, and aroE genes, differentiated the sampled strains into 11 genotypes. Strains with ST 3, 8, 6, 1, 4, 5 and 11 had the highest epidemiologic significance. Cluster and phylogenetic analysis of the data obtained showed a high adaptive ability of the nosocomial S. haemolyticus strains. Multiresistance to antibacterial preparations was detected in the analyzed strains. The MLST method developed was effective in the differentiation of S. haemolyticus strains that circulate in hospitals and threaten both neonates and hospitalized adult patients.

  7. Finding Protein and Nucleotide Similarities with FASTA

    PubMed Central

    Pearson, William R.

    2016-01-01

    The FASTA programs provide a comprehensive set of rapid similarity searching tools (fasta36, fastx36, tfastx36, fasty36, tfasty36), similar to those provided by the BLAST package, as well as programs for slower, optimal, local and global similarity searches (ssearch36, ggsearch36) and for searching with short peptides and oligonucleotides (fasts36, fastm36). The FASTA programs use an empirical strategy for estimating statistical significance that accommodates a range of similarity scoring matrices and gap penalties, improving alignment boundary accuracy and search sensitivity (Unit 3.5). The FASTA programs can produce “BLAST-like” alignment and tabular output, for ease of integration into existing analysis pipelines, and can search small, representative databases, and then report results for a larger set of sequences, using links from the smaller dataset. The FASTA programs work with a wide variety of database formats, including mySQL and postgreSQL databases (Unit 9.4). The programs also provide a strategy for integrating domain and active site annotations into alignments and highlighting the mutational state of functionally critical residues. These protocols describe how to use the FASTA programs to characterize protein and DNA sequences, using protein:protein, protein:DNA, and DNA:DNA comparisons. PMID:27010337

  8. Finding Protein and Nucleotide Similarities with FASTA.

    PubMed

    Pearson, William R

    2016-03-24

    The FASTA programs provide a comprehensive set of rapid similarity searching tools (fasta36, fastx36, tfastx36, fasty36, tfasty36), similar to those provided by the BLAST package, as well as programs for slower, optimal, local, and global similarity searches (ssearch36, ggsearch36), and for searching with short peptides and oligonucleotides (fasts36, fastm36). The FASTA programs use an empirical strategy for estimating statistical significance that accommodates a range of similarity scoring matrices and gap penalties, improving alignment boundary accuracy and search sensitivity. The FASTA programs can produce "BLAST-like" alignment and tabular output, for ease of integration into existing analysis pipelines, and can search small, representative databases, and then report results for a larger set of sequences, using links from the smaller dataset. The FASTA programs work with a wide variety of database formats, including mySQL and postgreSQL databases. The programs also provide a strategy for integrating domain and active site annotations into alignments and highlighting the mutational state of functionally critical residues. These protocols describe how to use the FASTA programs to characterize protein and DNA sequences, using protein:protein, protein:DNA, and DNA:DNA comparisons. Copyright © 2016 John Wiley & Sons, Inc.

  9. Physics First: Impact on SAT Math Scores

    NASA Astrophysics Data System (ADS)

    Bouma, Craig E.

    Improving science, technology, engineering, and mathematics (STEM) education has become a national priority and the call to modernize secondary science has been heard. A Physics First (PF) program with the curriculum sequence of physics, chemistry, and biology (PCB) driven by inquiry- and project-based learning offers a viable alternative to the traditional curricular sequence (BCP) and methods of teaching, but requires more empirical evidence. This study determined the impact of a PF program (PF-PCB) on math achievement (SAT math scores) after the first two cohorts of students completed the PF-PCB program at Matteo Ricci High School (MRHS) and provided more quantitative data to inform the PF debate and advance secondary science education. Statistical analysis (ANCOVA) determined the influence of covariates and revealed that the PF-PCB program had a significant (p < .05) impact on SAT math scores in the second cohort at MRHS. Statistically adjusted, the SAT math means for PF students were 21.4 points higher than their non-PF counterparts when controlling for prior math achievement (HSTP math), socioeconomic status (SES), and ethnicity/race.

  10. Ecological literacy through critical/place-based pedagogy in the environmental studies program at a small liberal arts college

    NASA Astrophysics Data System (ADS)

    Beeman-Cadwallader, Nicole

    In 2007 Pioneer High School, a public school in Whittier, California, changed the sequence of its science courses from the traditional Biology-Chemistry-Physics (B-C-P) to Biology-Physics-Chemistry (B-P-C), or "Physics Second." The California Standards Tests (CSTs) scores in Physics and Chemistry from 2004-2012 were used to determine if there were any effects of the Physics Second sequencing on student achievement in those courses. The data were also used to determine whether the Physics Second sequence had an effect on performance in Physics and Chemistry based on gender. Independent t tests and chi-square analysis of the data determined an improvement in student performance in Chemistry but not Physics. The 2x2 Factorial ANOVA analysis revealed that in Physics male students performed better on the CSTs than their female peers. In Chemistry, it was noted that male and female students performed equally well. Neither finding was a result of the change to the "Physics Second" sequencing.

  11. EGenBio: A Data Management System for Evolutionary Genomics and Biodiversity

    PubMed Central

    Nahum, Laila A; Reynolds, Matthew T; Wang, Zhengyuan O; Faith, Jeremiah J; Jonna, Rahul; Jiang, Zhi J; Meyer, Thomas J; Pollock, David D

    2006-01-01

    Background: Evolutionary genomics requires management and filtering of large numbers of diverse genomic sequences for accurate analysis and inference on evolutionary processes of genomic and functional change. We developed Evolutionary Genomics and Biodiversity (EGenBio; ) to begin to address this. Description: EGenBio is a system for manipulation and filtering of large numbers of sequences, integrating curated sequence alignments and phylogenetic trees, managing evolutionary analyses, and visualizing their output. EGenBio is organized into three conceptual divisions, Evolution, Genomics, and Biodiversity. The Genomics division includes tools for selecting pre-aligned sequences from different genes and species, and for modifying and filtering these alignments for further analysis. Species searches are handled through queries that can be modified based on a tree-based navigation system and saved. The Biodiversity division contains tools for analyzing individual sequences or sequence alignments, whereas the Evolution division contains tools involving phylogenetic trees. Alignments are annotated with analytical results and modification history using our PRAED format. A miscellaneous Tools section and Help framework are also available. EGenBio was developed around our comparative genomic research and a prototype database of mtDNA genomes. It utilizes MySQL relational databases and dynamic page generation, and calls numerous custom programs. Conclusion: EGenBio was designed to serve as a platform for tools and resources to ease combined analysis in evolution, genomics, and biodiversity. PMID:17118150

  12. Eigenproblem solution by a combined Sturm sequence and inverse iteration technique.

    NASA Technical Reports Server (NTRS)

    Gupta, K. K.

    1973-01-01

    Description of an efficient and numerically stable algorithm, along with a complete listing of the associated computer program, developed for the accurate computation of specified roots and associated vectors of the eigenvalue problem Aq = lambda Bq with band symmetric A and B, B being also positive-definite. The desired roots are first isolated by the Sturm sequence procedure; then a special variant of the inverse iteration technique is applied for the individual determination of each root along with its vector. The algorithm fully exploits the banded form of the relevant matrices, and the associated program, written in FORTRAN V for the JPL UNIVAC 1108 computer, proves significantly more economical than similar existing procedures. The program may be conveniently utilized for the efficient solution of practical engineering problems involving free vibration and buckling analysis of structures. Results of such analyses are presented for representative structures.
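
    Illustrative sketch (not the FORTRAN V program from the cited report): the root-isolation step can be shown on a simplified case. The example below assumes B is the identity and A is symmetric tridiagonal, counts the eigenvalues below a trial value with a Sturm sequence, and isolates a chosen root by bisection; the function names are hypothetical, and the inverse iteration stage that the report applies afterwards is not shown.

```python
def sturm_count(diag, off, x):
    """Number of eigenvalues below x for the symmetric tridiagonal matrix
    with main diagonal diag and off-diagonal off (len(off) == len(diag) - 1).
    Counts negative pivots of the LDL^T factorisation of (A - x I)."""
    count = 0
    d = 1.0
    for i, a in enumerate(diag):
        b2 = off[i - 1] ** 2 if i > 0 else 0.0
        d = (a - x) - b2 / d
        if d == 0.0:
            d = 1e-300  # nudge an exactly zero pivot to keep the recurrence defined
        if d < 0.0:
            count += 1
    return count


def kth_eigenvalue(diag, off, k, tol=1e-12):
    """Isolate the k-th smallest eigenvalue (k = 0, 1, ...) by bisection."""
    n = len(diag)
    # Gershgorin discs give an interval that contains every eigenvalue.
    radius = [
        (abs(off[i - 1]) if i > 0 else 0.0) + (abs(off[i]) if i < n - 1 else 0.0)
        for i in range(n)
    ]
    lo = min(a - r for a, r in zip(diag, radius))
    hi = max(a + r for a, r in zip(diag, radius))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sturm_count(diag, off, mid) > k:
            hi = mid  # at least k + 1 roots lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# 3x3 matrix with 2 on the diagonal and -1 off the diagonal.
diag, off = [2.0, 2.0, 2.0], [-1.0, -1.0]
print([round(kth_eigenvalue(diag, off, k), 6) for k in range(3)])
```

    For this matrix the exact eigenvalues are 2 - sqrt(2), 2, and 2 + sqrt(2), so the bisection results can be checked by hand.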

  13. Analysis of quality raw data of second generation sequencers with Quality Assessment Software.

    PubMed

    Ramos, Rommel Tj; Carneiro, Adriana R; Baumbach, Jan; Azevedo, Vasco; Schneider, Maria Pc; Silva, Artur

    2011-04-18

    Second generation technologies have advantages over Sanger; however, they have resulted in new challenges for the genome construction process, especially because of the small size of the reads, despite the high degree of coverage. Independent of the program chosen for the construction process, DNA sequences are superimposed, based on identity, to extend the reads, generating contigs; mismatches indicate a lack of homology and are not included. This process improves our confidence in the sequences that are generated. We developed Quality Assessment Software, with which one can review graphs showing the distribution of quality values from the sequencing reads. This software allows us to adopt more stringent quality standards for sequence data, based on quality-graph analysis and estimated coverage after applying the quality filter, providing acceptable sequence coverage for genome construction from short reads. Quality filtering is a fundamental step in the process of constructing genomes, as it reduces the frequency of incorrect alignments that are caused by measuring errors, which can occur during the construction process due to the size of the reads, provoking misassemblies. Application of quality filters to sequence data, using the software Quality Assessment, along with graphing analyses, provided greater precision in the definition of cutoff parameters, which increased the accuracy of genome construction.
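
    Illustrative sketch (not the Quality Assessment Software itself): a quality filter of the kind described above can be expressed as a cutoff on per-read quality values. The Phred+33 encoding, the mean-quality criterion, the default cutoff of 20, and the function names are assumptions made for this example; the published tool may apply different rules.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def filter_reads(reads, min_mean_quality=20.0):
    """Keep (name, sequence, quality) reads whose mean Phred score
    reaches the cutoff; empty reads are dropped."""
    kept = []
    for name, seq, qual in reads:
        scores = phred_scores(qual)
        if scores and sum(scores) / len(scores) >= min_mean_quality:
            kept.append((name, seq, qual))
    return kept


reads = [
    ("r1", "ACGT", "IIII"),  # all bases at Phred 40
    ("r2", "ACGT", "!!!!"),  # all bases at Phred 0
    ("r3", "ACGT", "I5+!"),  # Phred 40, 20, 10, 0 (mean 17.5)
]
print([name for name, _, _ in filter_reads(reads)])  # ['r1']
```

    Lowering the cutoff to 15 would also keep r3, which shows how the choice of cutoff trades read quality against the coverage that remains after filtering.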

  14. Cloning, sequencing, and analysis of the griseusin polyketide synthase gene cluster from Streptomyces griseus.

    PubMed Central

    Yu, T W; Bibb, M J; Revill, W P; Hopwood, D A

    1994-01-01

    A fragment of DNA was cloned from the Streptomyces griseus K-63 genome by using genes (act) for the actinorhodin polyketide synthase (PKS) of Streptomyces coelicolor as a probe. Sequencing of a 5.4-kb segment of the cloned DNA revealed a set of five gris open reading frames (ORFs), corresponding to the act PKS genes, in the following order: ORF1 for a ketosynthase, ORF2 for a chain length-determining factor, ORF3 for an acyl carrier protein, ORF5 for a ketoreductase, and ORF4 for a cyclase-dehydrase. Replacement of the gris genes with a marker gene in the S. griseus genome by using a single-stranded suicide vector propagated in Escherichia coli resulted in loss of the ability to produce griseusins A and B, showing that the five gris genes do indeed encode the type II griseusin PKS. These genes, encoding a PKS that is programmed differently from those for other aromatic PKSs so far available, will provide further valuable material for analysis of the programming mechanism by the construction and analysis of strains carrying hybrid PKS. Images PMID:8169211

  15. Detecting and Analyzing Genetic Recombination Using RDP4.

    PubMed

    Martin, Darren P; Murrell, Ben; Khoosal, Arjun; Muhire, Brejnev

    2017-01-01

    Recombination between nucleotide sequences is a major process influencing the evolution of most species on Earth. The evolutionary value of recombination has been widely debated and so too has its influence on evolutionary analysis methods that assume nucleotide sequences replicate without recombining. When nucleic acids recombine, the evolution of the daughter or recombinant molecule cannot be accurately described by a single phylogeny. This simple fact can seriously undermine the accuracy of any phylogenetics-based analytical approach which assumes that the evolutionary history of a set of recombining sequences can be adequately described by a single phylogenetic tree. There are presently a large number of available methods and associated computer programs for analyzing and characterizing recombination in various classes of nucleotide sequence datasets. Here we examine the use of some of these methods to derive and test recombination hypotheses using multiple sequence alignments.

  16. Operations analysis (study 2.1): Program manual and users guide for the LOVES computer code

    NASA Technical Reports Server (NTRS)

    Wray, S. T., Jr.

    1975-01-01

    Information necessary to use the LOVES Computer Program in its existing state, or to modify the program to include studies not properly handled by the basic model, is provided. The Users Guide defines the basic elements assembled together to form the model for servicing satellites in orbit. As the program is a simulation, the method of attack is to disassemble the problem into a sequence of events, each occurring instantaneously and each creating one or more other events in the future. The main driving force of the simulation is the deterministic launch schedule of satellites and the subsequent failure of the various modules which make up the satellites. The LOVES Computer Program uses a random number generator to simulate the failure of module elements and therefore operates over a long span of time, typically 10 to 15 years. The sequence of events is varied by making several runs in succession with different random numbers, resulting in a Monte Carlo technique to determine statistical parameters of minimum value, average value, and maximum value.
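
    Illustrative sketch (not the LOVES code): the event-driven Monte Carlo scheme described above can be outlined with a priority queue of timed events. The exponential module lifetimes, the immediate replacement of a failed module, and all function and parameter names are assumptions made for this example.

```python
import heapq
import random


def simulate(launch_times, modules_per_satellite, mean_life, horizon, seed):
    """Run one simulation and return the number of module failures
    that occur within the time horizon (same time unit as mean_life)."""
    rng = random.Random(seed)
    events = [(t, "launch") for t in launch_times]  # deterministic schedule
    heapq.heapify(events)
    failures = 0
    while events:
        time, kind = heapq.heappop(events)
        if time > horizon:
            break  # every remaining event is later still
        if kind == "launch":
            # Each launched module schedules its own future failure.
            for _ in range(modules_per_satellite):
                life = rng.expovariate(1.0 / mean_life)
                heapq.heappush(events, (time + life, "failure"))
        else:
            failures += 1
            # Assumed servicing rule: the module is replaced at once.
            life = rng.expovariate(1.0 / mean_life)
            heapq.heappush(events, (time + life, "failure"))
    return failures


def monte_carlo(runs, **scenario):
    """Repeat the simulation with different seeds; return (min, mean, max)."""
    counts = [simulate(seed=s, **scenario) for s in range(runs)]
    return min(counts), sum(counts) / len(counts), max(counts)


print(monte_carlo(20, launch_times=[0.0, 1.0, 2.0],
                  modules_per_satellite=4, mean_life=5.0, horizon=15.0))
```

    Each event is processed at a single instant and may push further events into the future, and varying the seed between runs yields the minimum, average, and maximum statistics.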

  17. Selective Exposure to Televised Violence.

    ERIC Educational Resources Information Center

    Atkin, Charles; And Others

    1979-01-01

    Presents the results of a study conducted to determine the correlation between children's selection of television programs and aggression. The regression analysis suggests that the relationship between viewing and aggression may be attributable to selective exposure rather than the reverse viewing-causes-aggression sequence. (Author/JVP)

  18. BIOPEP database and other programs for processing bioactive peptide sequences.

    PubMed

    Minkiewicz, Piotr; Dziuba, Jerzy; Iwaniak, Anna; Dziuba, Marta; Darewicz, Małgorzata

    2008-01-01

    This review presents the potential for application of computational tools in peptide science based on a sample BIOPEP database and program as well as other programs and databases available via the World Wide Web. The BIOPEP application contains a database of biologically active peptide sequences and a program enabling construction of profiles of the potential biological activity of protein fragments, calculation of quantitative descriptors as measures of the value of proteins as potential precursors of bioactive peptides, and prediction of bonds susceptible to hydrolysis by endopeptidases in a protein chain. Other bioactive and allergenic peptide sequence databases are also presented. Programs enabling the construction of binary and multiple alignments between peptide sequences, the construction of sequence motifs attributed to a given type of bioactivity, searching for potential precursors of bioactive peptides, and the prediction of sites susceptible to proteolytic cleavage in protein chains are available via the Internet as are other approaches concerning secondary structure prediction and calculation of physicochemical features based on amino acid sequence. Programs for prediction of allergenic and toxic properties have also been developed. This review explores the possibilities of cooperation between various programs.

  19. PRADA: pipeline for RNA sequencing data analysis.

    PubMed

    Torres-García, Wandaliz; Zheng, Siyuan; Sivachenko, Andrey; Vegesna, Rahulsimham; Wang, Qianghu; Yao, Rong; Berger, Michael F; Weinstein, John N; Getz, Gad; Verhaak, Roel G W

    2014-08-01

    Technological advances in high-throughput sequencing necessitate improved computational tools for processing and analyzing large-scale datasets in a systematic automated manner. For that purpose, we have developed PRADA (Pipeline for RNA-Sequencing Data Analysis), a flexible, modular and highly scalable software platform that provides many different types of information available by multifaceted analysis starting from raw paired-end RNA-seq data: gene expression levels, quality metrics, detection of unsupervised and supervised fusion transcripts, detection of intragenic fusion variants, homology scores and fusion frame classification. PRADA uses a dual-mapping strategy that increases sensitivity and refines the analytical endpoints. PRADA has been used extensively and successfully in the glioblastoma and renal clear cell projects of The Cancer Genome Atlas program. Availability: http://sourceforge.net/projects/prada/ Contact: gadgetz@broadinstitute.org or rverhaak@mdanderson.org Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  20. Streaming support for data intensive cloud-based sequence analysis.

    PubMed

    Issa, Shadi A; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of "resources-on-demand" and "pay-as-you-go", scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.
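    The idea of processing reads while they are still arriving can be sketched with a generator that consumes one record at a time. This is not the elastream package or its API; the FASTQ input format, the function names, and the per-read GC computation are assumptions chosen only to show that each read is handled independently of the others.

```python
import io
from typing import Iterator, TextIO, Tuple


def stream_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) as soon as each 4-line record is available."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of stream
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def gc_fraction(seq: str) -> float:
    """A per-read computation that needs no information from other reads."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# io.StringIO stands in for a network stream in this sketch
incoming = io.StringIO("@r1\nGGCC\n+\nIIII\n@r2\nATAT\n+\nIIII\n")
results = [(name, gc_fraction(seq)) for name, seq, _ in stream_fastq(incoming)]
```

    Because no record depends on another, work can start before the transfer finishes, which is the property the scheme relies on.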

  1. 2012 U.S. Department of Energy: Joint Genome Institute: Progress Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gilbert, David

    2013-01-01

    The mission of the U.S. Department of Energy Joint Genome Institute (DOE JGI) is to serve the diverse scientific community as a user facility, enabling the application of large-scale genomics and analysis of plants, microbes, and communities of microbes to address the DOE mission goals in bioenergy and the environment. The DOE JGI's sequencing efforts fall under the Eukaryote Super Program, which includes the Plant and Fungal Genomics Programs; and the Prokaryote Super Program, which includes the Microbial Genomics and Metagenomics Programs. In 2012, several projects made news for their contributions to energy and environment research.

  2. HIV-1 pol mutation frequency by subtype and treatment experience: extension of the HIVseq program to seven non-B subtypes.

    PubMed

    Rhee, Soo-Yon; Kantor, Rami; Katzenstein, David A; Camacho, Ricardo; Morris, Lynn; Sirivichayakul, Sunee; Jorgensen, Louise; Brigido, Luis F; Schapiro, Jonathan M; Shafer, Robert W

    2006-03-21

    HIVseq was developed in 2000 to make published data on the frequency of HIV-1 group M protease and reverse transcriptase (RT) mutations available in real time to laboratories and researchers sequencing these genes. Because most published protease and RT sequences belonged to subtype B, the initial version of HIVseq was based on this subtype. As additional non-B sequences from persons with well-characterized antiretroviral treatment histories have become available, the program has been extended to subtypes A, C, D, F, G, CRF01, and CRF02. The latest frequency of each protease and RT mutation according to subtype and drug-class exposure was calculated using published sequences in the Stanford HIV RT and Protease Sequence Database. Each mutation was hyperlinked to published reports of viruses containing the mutation. As of September 2005, the mean number of protease sequences per non-B subtype was 534 from protease inhibitor-naive persons and 133 from protease inhibitor-treated persons, representing 13.2% and 2.3%, respectively, of the data available for subtype B. The mean number of RT sequences per non-B subtype was 373 from RT inhibitor-naive persons and 288 from RT inhibitor-treated persons, representing 17.9% and 3.8%, respectively, of the data available for subtype B. HIVseq allows users to examine protease and RT mutations within the context of previously published sequences of these genes. The publication of additional non-B protease and RT sequences from persons with well-characterized treatment histories, however, will be required to perform the same types of analysis possible with the much larger number of subtype B sequences.

  3. The technology application process as applied to a firefighter's breathing system

    NASA Technical Reports Server (NTRS)

    Mclaughlan, P. B.

    1974-01-01

    The FBS Program indicated that applications of advanced technology can result in an improved FBS that will satisfy the requirements defined by municipal fire departments. To accomplish this technology transfer, a substantial commitment of resources over an extended period of time has been required. This program has indicated that the ability of NASA in terms of program management such as requirement definition, system analysis, and industry coordination may play as important a role as specific sources of hardware technology. As a result of the FBS program, a sequence of milestones was passed that may have applications as generalized milestones and objectives for any technical application program.

  4. Evaluation of sequence alignments and oligonucleotide probes with respect to three-dimensional structure of ribosomal RNA using ARB software package

    PubMed Central

    Kumar, Yadhu; Westram, Ralf; Kipfer, Peter; Meier, Harald; Ludwig, Wolfgang

    2006-01-01

    Background Availability of high-resolution RNA crystal structures for the 30S and 50S ribosomal subunits and the subsequent validation of comparative secondary structure models have prompted the biologists to use three-dimensional structure of ribosomal RNA (rRNA) for evaluating sequence alignments of rRNA genes. Furthermore, the secondary and tertiary structural features of rRNA are highly useful and successfully employed in designing rRNA targeted oligonucleotide probes intended for in situ hybridization experiments. RNA3D, a program to combine sequence alignment information with three-dimensional structure of rRNA was developed. Integration into ARB software package, which is used extensively by the scientific community for phylogenetic analysis and molecular probe designing, has substantially extended the functionality of ARB software suite with 3D environment. Results Three-dimensional structure of rRNA is visualized in OpenGL 3D environment with the abilities to change the display and overlay information onto the molecule, dynamically. Phylogenetic information derived from the multiple sequence alignments can be overlaid onto the molecule structure in a real time. Superimposition of both statistical and non-statistical sequence associated information onto the rRNA 3D structure can be done using customizable color scheme, which is also applied to a textual sequence alignment for reference. Oligonucleotide probes designed by ARB probe design tools can be mapped onto the 3D structure along with the probe accessibility models for evaluation with respect to secondary and tertiary structural conformations of rRNA. Conclusion Visualization of three-dimensional structure of rRNA in an intuitive display provides the biologists with the greater possibilities to carry out structure based phylogenetic analysis. Coupled with secondary structure models of rRNA, RNA3D program aids in validating the sequence alignments of rRNA genes and evaluating probe target sites. 
    Superimposition of the information derived from the multiple sequence alignment onto the molecule dynamically allows the researchers to observe any sequence inherited characteristics (phylogenetic information) in real-time environment. The extended ARB software package is made freely available for the scientific community via . PMID:16672074

  5. Sequence Diversity Diagram for comparative analysis of multiple sequence alignments.

    PubMed

    Sakai, Ryo; Aerts, Jan

    2014-01-01

    The sequence logo is a graphical representation of a set of aligned sequences, commonly used to depict conservation of amino acid or nucleotide sequences. Although it effectively communicates the amount of information present at every position, this visual representation falls short when the domain task is to compare between two or more sets of aligned sequences. We present a new visual presentation called a Sequence Diversity Diagram and validate our design choices with a case study. Our software was developed using the open-source program called Processing. It loads multiple sequence alignment FASTA files and a configuration file, which can be modified as needed to change the visualization. The redesigned figure improves on the visual comparison of two or more sets, and it additionally encodes information on sequential position conservation. In our case study of the adenylate kinase lid domain, the Sequence Diversity Diagram reveals unexpected patterns and new insights, for example the identification of subgroups within the protein subfamily. Our future work will integrate this visual encoding into interactive visualization tools to support higher level data exploration tasks.
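    For reference, the "amount of information present at every position" that a sequence logo displays is usually computed as the maximum entropy minus the observed entropy of the column. The sketch below assumes an ungapped DNA alignment and omits the small-sample correction; it is not the authors' software, and the alignment is made up.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 4) -> float:
    """Information content in bits: log2(alphabet size) minus Shannon entropy."""
    total = len(column)
    entropy = -sum(
        (n / total) * math.log2(n / total) for n in Counter(column).values()
    )
    return math.log2(alphabet_size) - entropy


# Hypothetical toy alignment; each string is one aligned sequence
alignment = ["ACGT", "ACGA", "ACTC", "AGAG"]
info = [column_information("".join(col)) for col in zip(*alignment)]
```

    A fully conserved column scores 2 bits and a uniformly mixed column scores 0, which is why a logo alone says little about how two alignments differ.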

  6. ProfileGrids: a sequence alignment visualization paradigm that avoids the limitations of Sequence Logos

    PubMed Central

    2014-01-01

    Background The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. Results The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. Conclusions The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org. PMID:25237393
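    The matrix underlying a ProfileGrid, the residue frequency at every aligned position, can be sketched in a few lines. This is not JProfileGrid code; the function name and the toy alignment are hypothetical, and gaps are not treated specially.

```python
from collections import Counter


def residue_frequencies(alignment: list[str]) -> list[dict[str, float]]:
    """For each alignment column, map each residue to its relative frequency."""
    n = len(alignment)
    return [
        {residue: count / n for residue, count in Counter(column).items()}
        for column in zip(*alignment)
    ]


# Hypothetical three-column protein alignment
grid = residue_frequencies(["MKV", "MRV", "MKL", "MKV"])
```

    Color-coding each cell of such a matrix by its frequency gives the kind of view the abstract describes.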

  7. RNAPattMatch: a web server for RNA sequence/structure motif detection based on pattern matching with flexible gaps

    PubMed Central

    Drory Retwitzer, Matan; Polishchuk, Maya; Churkin, Elena; Kifer, Ilona; Yakhini, Zohar; Barash, Danny

    2015-01-01

    Searching for RNA sequence-structure patterns is becoming an essential tool for RNA practitioners. Novel discoveries of regulatory non-coding RNAs in targeted organisms and the motivation to find them across a wide range of organisms have prompted the use of computational RNA pattern matching as an enhancement to sequence similarity. State-of-the-art programs differ by the flexibility of patterns allowed as queries and by their simplicity of use. In particular, no existing method is available as a user-friendly web server. A general program that searches for RNA sequence-structure patterns is RNA Structator. However, it is not available as a web server and does not provide the option to allow flexible gap pattern representation with an upper bound of the gap length being specified at any position in the sequence. Here, we introduce RNAPattMatch, a web-based application that is user friendly and makes sequence/structure RNA queries accessible to practitioners of various backgrounds and levels of proficiency. It also extends RNA Structator and allows a more flexible variable-gap representation, in addition to analysis of results using energy minimization methods. The RNAPattMatch service is available at http://www.cs.bgu.ac.il/rnapattmatch. A standalone version of the search tool is also available to download at the site. PMID:25940619

  8. Spreadsheet-based program for alignment of overlapping DNA sequences.

    PubMed

    Anbazhagan, R; Gabrielson, E

    1999-06-01

    Molecular biology laboratories frequently face the challenge of aligning small overlapping DNA sequences derived from a long DNA segment. Here, we present a short program that can be used to adapt Excel spreadsheets as a tool for aligning DNA sequences, regardless of their orientation. The program runs on any Windows or Macintosh operating system computer with Excel 97 or Excel 98. The program is available for use as an Excel file, which can be downloaded from the BioTechniques Web site. Upon execution, the program opens a specially designed customized workbook and is capable of identifying overlapping regions between two sequence fragments and displaying the sequence alignment. It also performs a number of specialized functions such as recognition of restriction enzyme cutting sites and CpG island mapping without costly specialized software.
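    Aligning two overlapping fragments "regardless of their orientation" amounts to testing the second fragment both as given and as its reverse complement. The sketch below is in Python rather than an Excel macro and is not the published program; it assumes exact matches only, checks only the case where the first fragment precedes the second, and uses hypothetical names and sequences.

```python
def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def suffix_prefix_overlap(a: str, b: str, min_len: int = 4) -> int:
    """Length of the longest suffix of a equal to a prefix of b, or 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def best_overlap(a: str, b: str, min_len: int = 4):
    """Try b in both orientations; return (overlap length, strand, merged sequence)."""
    best = (0, "+", None)
    for strand, cand in (("+", b), ("-", reverse_complement(b))):
        k = suffix_prefix_overlap(a, cand, min_len)
        if k > best[0]:
            best = (k, strand, a + cand[k:])
    return best


# The second fragment is supplied in the opposite orientation
result = best_overlap("ATGCGTACGT", "GCTAACGTA")
```

    Here the fragments only join after the second one is reverse-complemented, giving a 5-base overlap.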

  9. High-Throughput Tabular Data Processor - Platform independent graphical tool for processing large data sets.

    PubMed

    Madanecki, Piotr; Bałut, Magdalena; Buckley, Patrick G; Ochocka, J Renata; Bartoszewski, Rafał; Crossman, David K; Messiaen, Ludwine M; Piotrowski, Arkadiusz

    2018-01-01

    High-throughput technologies generate considerable amount of data which often requires bioinformatic expertise to analyze. Here we present High-Throughput Tabular Data Processor (HTDP), a platform independent Java program. HTDP works on any character-delimited column data (e.g. BED, GFF, GTF, PSL, WIG, VCF) from multiple text files and supports merging, filtering and converting of data that is produced in the course of high-throughput experiments. HTDP can also utilize itemized sets of conditions from external files for complex or repetitive filtering/merging tasks. The program is intended to aid global, real-time processing of large data sets using a graphical user interface (GUI). Therefore, no prior expertise in programming, regular expression, or command line usage is required of the user. Additionally, no a priori assumptions are imposed on the internal file composition. We demonstrate the flexibility and potential of HTDP in real-life research tasks including microarray and massively parallel sequencing, i.e. identification of disease predisposing variants in the next generation sequencing data as well as comprehensive concurrent analysis of microarray and sequencing results. We also show the utility of HTDP in technical tasks including data merge, reduction and filtering with external criteria files. HTDP was developed to address functionality that is missing or rudimentary in other GUI software for processing character-delimited column data from high-throughput technologies. Flexibility, in terms of input file handling, provides long term potential functionality in high-throughput analysis pipelines, as the program is not limited by the currently existing applications and data formats. HTDP is available as the Open Source software (https://github.com/pmadanecki/htdp).

  10. High-Throughput Tabular Data Processor – Platform independent graphical tool for processing large data sets

    PubMed Central

    Bałut, Magdalena; Buckley, Patrick G.; Ochocka, J. Renata; Bartoszewski, Rafał; Crossman, David K.; Messiaen, Ludwine M.; Piotrowski, Arkadiusz

    2018-01-01

    High-throughput technologies generate considerable amount of data which often requires bioinformatic expertise to analyze. Here we present High-Throughput Tabular Data Processor (HTDP), a platform independent Java program. HTDP works on any character-delimited column data (e.g. BED, GFF, GTF, PSL, WIG, VCF) from multiple text files and supports merging, filtering and converting of data that is produced in the course of high-throughput experiments. HTDP can also utilize itemized sets of conditions from external files for complex or repetitive filtering/merging tasks. The program is intended to aid global, real-time processing of large data sets using a graphical user interface (GUI). Therefore, no prior expertise in programming, regular expression, or command line usage is required of the user. Additionally, no a priori assumptions are imposed on the internal file composition. We demonstrate the flexibility and potential of HTDP in real-life research tasks including microarray and massively parallel sequencing, i.e. identification of disease predisposing variants in the next generation sequencing data as well as comprehensive concurrent analysis of microarray and sequencing results. We also show the utility of HTDP in technical tasks including data merge, reduction and filtering with external criteria files. HTDP was developed to address functionality that is missing or rudimentary in other GUI software for processing character-delimited column data from high-throughput technologies. Flexibility, in terms of input file handling, provides long term potential functionality in high-throughput analysis pipelines, as the program is not limited by the currently existing applications and data formats. HTDP is available as the Open Source software (https://github.com/pmadanecki/htdp). PMID:29432475

  11. iSeq: Web-Based RNA-seq Data Analysis and Visualization.

    PubMed

    Zhang, Chao; Fan, Caoqi; Gan, Jingbo; Zhu, Ping; Kong, Lei; Li, Cheng

    2018-01-01

    Transcriptome sequencing (RNA-seq) is becoming a standard experimental methodology for genome-wide characterization and quantification of transcripts at single base-pair resolution. However, downstream analysis of massive amount of sequencing data can be prohibitively technical for wet-lab researchers. A functionally integrated and user-friendly platform is required to meet this demand. Here, we present iSeq, an R-based Web server, for RNA-seq data analysis and visualization. iSeq is a streamlined Web-based R application under the Shiny framework, featuring a simple user interface and multiple data analysis modules. Users without programming and statistical skills can analyze their RNA-seq data and construct publication-level graphs through a standardized yet customizable analytical pipeline. iSeq is accessible via Web browsers on any operating system at http://iseq.cbi.pku.edu.cn .

  12. Pulse sequence programming in a dynamic visual environment: SequenceTree.

    PubMed

    Magland, Jeremy F; Li, Cheng; Langham, Michael C; Wehrli, Felix W

    2016-01-01

    This work describes SequenceTree, an open source, integrated software environment for implementing MRI pulse sequences and, ideally, exporting them to actual MRI scanners. The software is a user-friendly alternative to vendor-supplied pulse sequence design and editing tools and is suited for programmers and nonprogrammers alike. The integrated user interface was programmed using the Qt4/C++ toolkit. As parameters and code are modified, the pulse sequence diagram is automatically updated within the user interface. Several aspects of pulse programming are handled automatically, allowing users to focus on higher-level aspects of sequence design. Sequences can be simulated using a built-in Bloch equation solver and then exported for use on a Siemens MRI scanner. Ideally, other types of scanners will be supported in the future. SequenceTree has been used for 8 years in our laboratory and elsewhere and has contributed to more than 50 peer-reviewed publications in areas such as cardiovascular imaging, solid state and nonproton NMR, MR elastography, and high-resolution structural imaging. SequenceTree is an innovative, open source, visual pulse sequence environment for MRI combining simplicity with flexibility and is ideal both for advanced users and users with limited programming experience.

  13. G2S: a web-service for annotating genomic variants on 3D protein structures.

    PubMed

    Wang, Juexin; Sheridan, Robert; Sumer, S Onur; Schultz, Nikolaus; Xu, Dong; Gao, Jianjiong

    2018-06-01

    Accurately mapping and annotating genomic locations on 3D protein structures is a key step in structure-based analysis of genomic variants detected by recent large-scale sequencing efforts. There are several mapping resources currently available, but none of them provides a web API (Application Programming Interface) that supports programmatic access. We present G2S, a real-time web API that provides automated mapping of genomic variants on 3D protein structures. G2S can align genomic locations of variants, protein locations, or protein sequences to protein structures and retrieve the mapped residues from structures. The G2S API uses a REST-inspired design and can be used by various clients such as web browsers, command terminals, programming languages and other bioinformatics tools for bringing 3D structures into genomic variant analysis. The webserver and source code are freely available at https://g2s.genomenexus.org. Contact: g2s@genomenexus.org. Supplementary data are available at Bioinformatics online.

  14. JPL-ANTOPT antenna structure optimization program

    NASA Technical Reports Server (NTRS)

    Strain, D. M.

    1994-01-01

    New antenna path-length error and pointing-error structure optimization codes were recently added to the MSC/NASTRAN structural analysis computer program. Path-length and pointing errors are important measures of structure-related antenna performance. The path-length and pointing errors are treated as scalar displacements for statics loading cases. These scalar displacements can be subject to constraint during the optimization process. Path-length and pointing-error calculations supplement the other optimization and sensitivity capabilities of NASTRAN. The analysis and design functions were implemented as 'DMAP ALTERs' to the Design Optimization (SOL 200) Solution Sequence of MSC-NASTRAN, Version 67.5.

  15. Fast single-pass alignment and variant calling using sequencing data

    USDA-ARS's Scientific Manuscript database

    Sequencing research requires efficient computation. Few programs use already known information about DNA variants when aligning sequence data to the reference map. New program findmap.f90 reads the previous variant list before aligning sequence, calling variant alleles, and summing the allele counts...

  16. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics

    PubMed Central

    2010-01-01

    Background Bioinformatics researchers are now confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. Description An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBase project are defined, and current bioinformatics software that employ Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date. Conclusions Hadoop and the MapReduce programming paradigm already have a substantial base in the bioinformatics community, especially in the field of next-generation sequencing analysis, and such use is increasing. This is due to the cost-effectiveness of Hadoop-based analysis on commodity Linux clusters, and in the cloud via data upload to cloud vendors who have implemented Hadoop/HBase; and due to the effectiveness and ease-of-use of the MapReduce method in parallelization of many data analysis algorithms. PMID:21210976
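    The MapReduce programming style mentioned above can be illustrated without Hadoop itself. The sketch below counts k-mers in plain Python with explicit map, shuffle, and reduce steps; it does not use any Hadoop or HBase API, and the function names and reads are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_read(read: str, k: int = 3) -> list[tuple[str, int]]:
    """Map step: emit (k-mer, 1) for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs) -> dict[str, list[int]]:
    """Shuffle step: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: sum the values collected for one key."""
    return key, sum(values)


reads = ["ACGTA", "CGTAC"]  # made-up reads
mapped = chain.from_iterable(map_read(r) for r in reads)
counts = dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())
```

    Each read is mapped independently and each key is reduced independently, which is what lets a framework distribute both steps across a cluster.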

  17. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics.

    PubMed

    Taylor, Ronald C

    2010-12-21

    Bioinformatics researchers are now confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBase project are defined, and current bioinformatics software that employ Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date. Hadoop and the MapReduce programming paradigm already have a substantial base in the bioinformatics community, especially in the field of next-generation sequencing analysis, and such use is increasing. This is due to the cost-effectiveness of Hadoop-based analysis on commodity Linux clusters, and in the cloud via data upload to cloud vendors who have implemented Hadoop/HBase; and due to the effectiveness and ease-of-use of the MapReduce method in parallelization of many data analysis algorithms.

  18. Analysis of whole genome sequencing for the Escherichia coli O157:H7 typing phages.

    PubMed

    Cowley, Lauren A; Beckett, Stephen J; Chase-Topping, Margo; Perry, Neil; Dallman, Tim J; Gally, David L; Jenkins, Claire

    2015-04-08

    Shiga toxin producing Escherichia coli O157 can cause severe bloody diarrhea and haemolytic uraemic syndrome. Phage typing of E. coli O157 facilitates public health surveillance and outbreak investigations; certain phage types are more likely to occupy specific niches and are associated with specific age groups and disease severity. The aim of this study was to analyse the genome sequences of 16 (fourteen T4 and two T7) E. coli O157 typing phages and to determine the genes responsible for the subtle differences in phage type profiles. The typing phages were sequenced using paired-end Illumina sequencing at The Genome Analysis Centre and the Animal Health and Veterinary Laboratories Agency and bioinformatics programs including Velvet, Brig and Easyfig were used to analyse them. A two-way Euclidean cluster analysis highlighted the associations between groups of phage types and typing phages. The analysis showed that the T7 typing phages (9 and 10) differed by only three genes and that the T4 typing phages formed three distinct groups of similar genomic sequences: Group 1 (1, 8, 11, 12 and 15, 16), Group 2 (3, 6, 7 and 13) and Group 3 (2, 4, 5 and 14). The E. coli O157 phage typing scheme exhibited a significantly modular network linked to the genetic similarity of each group showing that these groups are specialised to infect a subset of phage types. Sequencing the typing phages has enabled us to identify the variable genes within each group and to determine how this corresponds to changes in phage type.

  19. MySSP: Non-stationary evolutionary sequence simulation, including indels

    PubMed Central

    Rosenberg, Michael S.

    2007-01-01

    MySSP is a new program for the simulation of DNA sequence evolution across a phylogenetic tree. Although many programs are available for sequence simulation, MySSP is unique in its inclusion of indels, flexibility in allowing for non-stationary patterns, and output of ancestral sequences. Some of these features can individually be found in existing programs, but they have not all been previously available in a single package. PMID:19325855

  20. A space-efficient algorithm for local similarities.

    PubMed

    Huang, X Q; Hardison, R C; Miller, W

    1990-10-01

    Existing dynamic-programming algorithms for identifying similar regions of two sequences require time and space proportional to the product of the sequence lengths. Often this space requirement is more limiting than the time requirement. We describe a dynamic-programming local-similarity algorithm that needs only space proportional to the sum of the sequence lengths. The method can also find repeats within a single long sequence. To illustrate the algorithm's potential, we discuss comparison of a 73,360 nucleotide sequence containing the human beta-like globin gene cluster and a corresponding 44,594 nucleotide sequence for rabbit, a problem well beyond the capabilities of other dynamic-programming software.
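    To see why the full quadratic table is not needed just to score a local alignment, the sketch below keeps only two rows of the Smith-Waterman recurrence. It is a simplification and not the published algorithm, which also recovers the alignments themselves in linear space; the linear gap penalty and the scoring values are assumptions.

```python
def local_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score using two rows, i.e. O(min(len(a), len(b))) space."""
    if len(b) > len(a):
        a, b = b, a  # keep the shorter sequence along the row
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])
        prev = curr
    return best


score = local_score("TTACGTAA", "GGACGTCC")  # shared region is ACGT
```

    Time remains proportional to the product of the lengths; only the memory footprint shrinks.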

  1. An adaptive, object oriented strategy for base calling in DNA sequence analysis.

    PubMed Central

    Giddings, M C; Brumley, R L; Haker, M; Smith, L M

    1993-01-01

    An algorithm has been developed for the determination of nucleotide sequence from data produced in fluorescence-based automated DNA sequencing instruments employing the four-color strategy. This algorithm takes advantage of object oriented programming techniques for modularity and extensibility. The algorithm is adaptive in that data sets from a wide variety of instruments and sequencing conditions can be used with good results. Confidence values are provided on the base calls as an estimate of accuracy. The algorithm iteratively employs confidence determinations from several different modules, each of which examines a different feature of the data for accurate peak identification. Modules within this system can be added or removed for increased performance or for application to a different task. In comparisons with commercial software, the algorithm performed well. Images PMID:8233787

  2. REPPER—repeats and their periodicities in fibrous proteins

    PubMed Central

    Gruber, Markus; Söding, Johannes; Lupas, Andrei N.

    2005-01-01

    REPPER (REPeats and their PERiodicities) is an integrated server that detects and analyzes regions with short gapless repeats in protein sequences or alignments. It finds periodicities by Fourier Transform (FTwin) and internal similarity analysis (REPwin). FTwin assigns numerical values to amino acids that reflect certain properties, for instance hydrophobicity, and gives information on corresponding periodicities. REPwin uses self-alignments and displays repeats that reveal significant internal similarities. Both programs use a sliding window to ensure that different periodic regions within the same protein are detected independently. FTwin and REPwin are complemented by secondary structure prediction (PSIPRED) and coiled coil prediction (COILS), making the server a versatile analysis tool for sequences of fibrous proteins. REPPER is available at . PMID:15980460
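    The Fourier-transform idea behind periodicity detection can be sketched as follows: assign each residue a number, then measure the spectral power at candidate periods. This is not FTwin itself. The binary hydrophobic/non-hydrophobic scale, the function names, and the test sequence are assumptions, and only a single window is analyzed rather than a sliding one.

```python
import cmath

HYDROPHOBIC = set("AVILMFWC")  # illustrative residue set, not REPPER's scale


def power_at_period(values: list[float], period: float) -> float:
    """Spectral power of the mean-centered signal at the given period (in residues)."""
    mean = sum(values) / len(values)
    total = sum(
        (v - mean) * cmath.exp(-2j * cmath.pi * n / period)
        for n, v in enumerate(values)
    )
    return abs(total) ** 2 / len(values)


def strongest_period(seq: str, periods: list[float]) -> float:
    """Return the candidate period with the highest spectral power."""
    values = [1.0 if residue in HYDROPHOBIC else 0.0 for residue in seq]
    return max(periods, key=lambda p: power_at_period(values, p))


# Idealized heptad repeat with hydrophobic residues at positions a and d
heptads = "LKKLKKK" * 4
period = strongest_period(heptads, [2, 3, 3.5, 4, 5, 7])
```

    For this idealized heptad pattern the strongest candidate is 3.5 residues, the periodicity typically associated with coiled coils.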

  3. Dr. Sanger's Apprentice: A Computer-Aided Instruction to Protein Sequencing.

    ERIC Educational Resources Information Center

    Schmidt, Thomas G.; Place, Allen R.

    1985-01-01

    Modeled after the program "Mastermind," this program teaches students the art of protein sequencing. The program (written in Turbo Pascal for the IBM PC, requiring 128K, a graphics adapter, and an 8087 mathematics coprocessor) generates a polypeptide whose sequence and length can be user-defined (for practice) or computer-generated (for…

  4. SPARTA: Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis.

    PubMed

    Johnson, Benjamin K; Scholz, Matthew B; Teal, Tracy K; Abramovitch, Robert B

    2016-02-04

    Many tools exist in the analysis of bacterial RNA sequencing (RNA-seq) transcriptional profiling experiments to identify differentially expressed genes between experimental conditions. Generally, the workflow includes quality control of reads, mapping to a reference, counting transcript abundance, and statistical tests for differentially expressed genes. In spite of the numerous tools developed for each component of an RNA-seq analysis workflow, easy-to-use bacterially oriented workflow applications to combine multiple tools and automate the process are lacking. With many tools to choose from for each step, the task of identifying a specific tool, adapting the input/output options to the specific use-case, and integrating the tools into a coherent analysis pipeline is not a trivial endeavor, particularly for microbiologists with limited bioinformatics experience. To make bacterial RNA-seq data analysis more accessible, we developed a Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis (SPARTA). SPARTA is a reference-based bacterial RNA-seq analysis workflow application for single-end Illumina reads. SPARTA is turnkey software that simplifies the process of analyzing RNA-seq data sets, making bacterial RNA-seq analysis a routine process that can be undertaken on a personal computer or in the classroom. The easy-to-install, complete workflow processes whole transcriptome shotgun sequencing data files by trimming reads and removing adapters, mapping reads to a reference, counting gene features, calculating differential gene expression, and, importantly, checking for potential batch effects within the data set. SPARTA outputs quality analysis reports, gene feature counts and differential gene expression tables and scatterplots. SPARTA provides an easy-to-use bacterial RNA-seq transcriptional profiling workflow to identify differentially expressed genes between experimental conditions. 
This software will enable microbiologists with limited bioinformatics experience to analyze their data and integrate next generation sequencing (NGS) technologies into the classroom. The SPARTA software and tutorial are available at sparta.readthedocs.org.

  5. GenBank

    PubMed Central

    Benson, Dennis A.; Karsch-Mizrachi, Ilene; Lipman, David J.; Ostell, James; Wheeler, David L.

    2007-01-01

    GenBank (R) is a comprehensive database that contains publicly available nucleotide sequences for more than 240 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage (). PMID:17202161

  6. Increasing Active Learning and End-Client Interaction in the Systems Analysis and Design and Capstone Courses

    ERIC Educational Resources Information Center

    Reinicke, Bryan A.; Janicki, Thomas N.

    2010-01-01

    Systems analysis and design (SAD) is one of the core courses offered in most IS programs, yet this class can be challenging for students and instructors alike. The concepts can be abstract, and getting students to appreciate their importance can be difficult. This paper discusses the implementation of a two semester sequence in which the students…

  7. Ray Wu as Fifth Business: Deconstructing collective memory in the history of DNA sequencing.

    PubMed

    Onaga, Lisa A

    2014-06-01

    The concept of 'Fifth Business' is used to analyze a minority standpoint and bring serious attention to the role of scientists who play a galvanizing role in a science but for multiple reasons appear less prominently in more common recounts of any particular development. Biochemist Ray Wu (1928-2008) published a DNA sequencing experiment in March 1970 using DNA polymerase catalysis and specific nucleotide labeling, both of which are foundational to general sequencing methods today. The scant mention of Wu's work from textbooks, research articles, and other accounts of DNA sequencing calls into question how scientific collective memory forms. This alternative history seeks to understand why a key figure in nucleic acid sequence analysis has remained less visibly connected or peripheral to solidifying narratives about the history of DNA sequencing. The study resists predictable dismissals of Wu's work in order to seriously examine the formation of his nucleic acid sequence analysis research program and how he shared his knowledge of sequencing during a period of rapid advancement in the field. An analysis of Wu's work on sequencing the cohesive ends of lambda bacteriophage in the 1960s and 1970s exemplifies how a variety of individuals and groups attempted to develop protocol for sequencing the order of nucleotide base pairs comprising DNA. This historical examination of the sociality of scientific research suggests a way to understand how Wu and others contributed to the very collective memory of DNA sequencing that Wu eventually tried to repair. The study of Wu, who was a Chinese immigrant to the United States, provides a foundation for further critical scholarship on the heterogeneous histories of Asian American bioscientists, the sociality of their scientific works, and how the resulting knowledge produced is preserved, if not evenly, in a scientific field's collective memory. Copyright © 2014 Elsevier Ltd. All rights reserved.

  8. Motor programming in apraxia of speech.

    PubMed

    Maas, Edwin; Robin, Donald A; Wright, David L; Ballard, Kirrie J

    2008-08-01

    Apraxia of Speech (AOS) is an impairment of motor programming. However, the exact nature of this deficit remains unclear. The present study examined motor programming in AOS in the context of a recent two-stage model [Klapp, S. T. (1995). Motor response programming during simple and choice reaction time: The role of practice. Journal of Experimental Psychology: Human Perception and Performance, 21, 1015-1027; Klapp, S. T. (2003). Reaction time analysis of two types of motor preparation for speech articulation: Action as a sequence of chunks. Journal of Motor Behavior, 35, 135-150] that proposes a preprogramming stage (INT) and a process that assigns serial order to multiple programs in a sequence (SEQ). The main hypothesis was that AOS involves a process-specific deficit in the INT (preprogramming) stage of processing, rather than in the on-line serial ordering (SEQ) and initiation of movement. In addition, we tested the hypothesis that AOS involves a central (i.e., modality-general) motor programming deficit. We used a reaction time paradigm that provides two dependent measures: study time (the amount of time for participants to ready a motor response; INT), and reaction time (time to initiate movement; SEQ). Two experiments were conducted to examine INT and SEQ in AOS: Experiment 1 involved finger movements, Experiment 2 involved speech movements analogous to the finger movements. Results showed longer preprogramming time for patients with AOS but normal sequencing and initiation times, relative to controls. Together, the findings are consistent with the hypothesis of a process-specific, but central (modality-independent) deficit in AOS; alternative explanations are also discussed.

  9. TRDistiller: a rapid filter for enrichment of sequence datasets with proteins containing tandem repeats.

    PubMed

    Richard, François D; Kajava, Andrey V

    2014-06-01

    The dramatic growth of sequencing data evokes an urgent need to improve bioinformatics tools for large-scale proteome analysis. Over the last two decades, the foremost efforts of computer scientists were devoted to proteins with aperiodic sequences having globular 3D structures. However, a large portion of proteins contain periodic sequences representing arrays of repeats that are directly adjacent to each other (so called tandem repeats or TRs). These proteins frequently fold into elongated fibrous structures carrying different fundamental functions. Algorithms specific to the analysis of these regions are urgently required since the conventional approaches developed for globular domains have had limited success when applied to the TR regions. The protein TRs are frequently not perfect, containing a number of mutations, and some of them cannot be easily identified. To detect such "hidden" repeats several algorithms have been developed. However, the most sensitive among them are time-consuming and, therefore, inappropriate for large scale proteome analysis. To speed up the TR detection we developed a rapid filter that is based on the comparison of composition and order of short strings in the adjacent sequence motifs. Tests show that our filter discards up to 22.5% of proteins which are known to be without TRs while keeping almost all (99.2%) TR-containing sequences. Thus, we are able to decrease the size of the initial sequence dataset enriching it with TR-containing proteins which allows a faster subsequent TR detection by other methods. The program is available upon request. Copyright © 2014 Elsevier Inc. All rights reserved.
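    The abstract describes the filter only in outline (comparing the composition and order of short strings in adjacent motifs), so the following is a minimal sketch of that general idea, not the TRDistiller algorithm itself. The function names, the k-mer length, the period range and the threshold are all hypothetical choices made for illustration.

```python
def kmer_self_match_fraction(seq, period, k=3):
    """Fraction of k-mers that recur exactly `period` residues downstream.

    Hypothetical helper: a high value for some period suggests that adjacent
    motifs of that length share short strings in the same order.
    """
    n = len(seq) - k - period + 1  # number of k-mer pairs that can be compared
    if n <= 0:
        return 0.0
    matches = sum(
        seq[i:i + k] == seq[i + period:i + period + k] for i in range(n)
    )
    return matches / n


def passes_tr_filter(seq, min_period=2, max_period=50, k=3, threshold=0.3):
    """Keep a sequence if any candidate period shows enough k-mer recurrence.

    The threshold and period range are illustrative assumptions, not values
    taken from the paper.
    """
    last_period = min(max_period, len(seq) - k)
    return any(
        kmer_self_match_fraction(seq, p, k) >= threshold
        for p in range(min_period, last_period + 1)
    )


# A perfect 9-residue repeat is kept; a sequence with no repeated 3-mer is dropped.
repeat_like = "GPPGAPGEP" * 6
aperiodic = "ACDEFGHIKLMNPQRSTVWY"
print(passes_tr_filter(repeat_like), passes_tr_filter(aperiodic))
```

    A cheap pass of this kind would only shrink the dataset; sequences that survive it would still need a sensitive TR detector, as the abstract notes.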

  10. Version VI of the ESTree db: an improved tool for peach transcriptome analysis

    PubMed Central

    Lazzari, Barbara; Caprera, Andrea; Vecchietti, Alberto; Merelli, Ivan; Barale, Francesca; Milanesi, Luciano; Stella, Alessandra; Pozzi, Carlo

    2008-01-01

    Background The ESTree database (db) is a collection of Prunus persica and Prunus dulcis EST sequences that in its current version encompasses 75,404 sequences from 3 almond and 19 peach libraries. Nine peach genotypes and four peach tissues are represented, from four fruit developmental stages. The aim of this work was to implement the already existing ESTree db by adding new sequences and analysis programs. Particular care was given to the implementation of the web interface, that allows querying each of the database features. Results A Perl modular pipeline is the backbone of sequence analysis in the ESTree db project. Outputs obtained during the pipeline steps are automatically arrayed into the fields of a MySQL database. Apart from standard clustering and annotation analyses, version VI of the ESTree db encompasses new tools for tandem repeat identification, annotation against genomic Rosaceae sequences, and positioning on the database of oligomer sequences that were used in a peach microarray study. Furthermore, known protein patterns and motifs were identified by comparison to PROSITE. Based on data retrieved from sequence annotation against the UniProtKB database, a script was prepared to track positions of homologous hits on the GO tree and build statistics on the ontologies distribution in GO functional categories. EST mapping data were also integrated in the database. The PHP-based web interface was upgraded and extended. The aim of the authors was to enable querying the database according to all the biological aspects that can be investigated from the analysis of data available in the ESTree db. This is achieved by allowing multiple searches on logical subsets of sequences that represent different biological situations or features. Conclusions The version VI of ESTree db offers a broad overview on peach gene expression. 
Sequence analyses results contained in the database, extensively linked to external related resources, represent a large amount of information that can be queried via the tools offered in the web interface. Flexibility and modularity of the ESTree analysis pipeline and of the web interface allowed the authors to set up similar structures for different datasets, with limited manual intervention. PMID:18387211

  11. Phylo-VISTA: Interactive visualization of multiple DNA sequence alignments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shah, Nameeta; Couronne, Olivier; Pennacchio, Len A.

    The power of multi-sequence comparison for biological discovery is well established. The need for new capabilities to visualize and compare cross-species alignment data is intensified by the growing number of genomic sequence datasets being generated for an ever-increasing number of organisms. To be efficient, these visualization algorithms must support the ability to accommodate consistently a wide range of evolutionary distances in a comparison framework based upon phylogenetic relationships. Results: We have developed Phylo-VISTA, an interactive tool for analyzing multiple alignments by visualizing a similarity measure for multiple DNA sequences. The complexity of visual presentation is effectively organized using a framework based upon interspecies phylogenetic relationships. The phylogenetic organization supports rapid, user-guided interspecies comparison. To aid in navigation through large sequence datasets, Phylo-VISTA leverages concepts from VISTA that provide a user with the ability to select and view data at varying resolutions. The combination of multiresolution data visualization and analysis, combined with the phylogenetic framework for interspecies comparison, produces a highly flexible and powerful tool for visual data analysis of multiple sequence alignments. Availability: Phylo-VISTA is available at http://www-gsd.lbl.gov/phylovista. It requires an Internet browser with Java Plugin 1.4.2 and it is integrated into the global alignment program LAGAN at http://lagan.stanford.edu.

  12. The complete genome sequence of a south Indian isolate of Rice tungro spherical virus reveals evidence of genetic recombination between distinct isolates.

    PubMed

    Sailaja, B; Anjum, Najreen; Patil, Yogesh K; Agarwal, Surekha; Malathi, P; Krishnaveni, D; Balachandran, S M; Viraktamath, B C; Mangrauthia, Satendra K

    2013-12-01

    In this study, complete genome of a south Indian isolate of Rice tungro spherical virus (RTSV) from Andhra Pradesh (AP) was sequenced, and the predicted amino acid sequence was analysed. The RTSV RNA genome consists of 12,171 nt without the poly(A) tail, encoding a putative typical polyprotein of 3,470 amino acids. Furthermore, cleavage sites and sequence motifs of the polyprotein were predicted. Multiple alignment with other RTSV isolates showed a nucleotide sequence identity of 95% to east Indian isolates and 90% to Philippines isolates. A phylogenetic tree based on complete genome sequence showed that Indian isolates clustered together, while Vt6 and PhilA isolates of Philippines formed two separate clusters. Twelve recombination events were detected in RNA genome of RTSV using the Recombination Detection Program version 3. Recombination analysis suggested significant role of 5' end and central region of genome in virus evolution. Further, AP and Odisha isolates appeared as important RTSV isolates involved in diversification of this virus in India through recombination phenomenon. The new addition of complete genome of first south Indian isolate provided an opportunity to establish the molecular evolution of RTSV through recombination analysis and phylogenetic relationship.

  13. Comparison of Ion Personal Genome Machine Platforms for the Detection of Variants in BRCA1 and BRCA2.

    PubMed

    Hwang, Sang Mee; Lee, Ki Chan; Lee, Min Seob; Park, Kyoung Un

    2018-01-01

    Transition to next generation sequencing (NGS) for BRCA1/BRCA2 analysis in clinical laboratories is ongoing, but different platforms and/or data analysis pipelines give different results, resulting in difficulties in implementation. We have evaluated the Ion Personal Genome Machine (PGM) Platforms (Ion PGM, Ion PGM Dx, Thermo Fisher Scientific) for the analysis of BRCA1/2. The results of Ion PGM with OTG-snpcaller, a pipeline based on Torrent mapping alignment program and Genome Analysis Toolkit, from 75 clinical samples and 14 reference DNA samples were compared with Sanger sequencing for BRCA1/BRCA2. Ten clinical samples and 14 reference DNA samples were additionally sequenced by Ion PGM Dx with Torrent Suite. Fifty types of variants including 18 pathogenic or variants of unknown significance were identified from 75 clinical samples and known variants of the reference samples were confirmed by Sanger sequencing and/or NGS. One false-negative result was present for Ion PGM/OTG-snpcaller for an indel variant misidentified as a single nucleotide variant. However, eight discordant results were present for Ion PGM Dx/Torrent Suite with both false-positive and -negative results. A 40-bp deletion, a 4-bp deletion and a 1-bp deletion variant were not called and a false-positive deletion was identified. Four other variants were misidentified as another variant. Ion PGM/OTG-snpcaller showed acceptable performance with good concordance with Sanger sequencing. However, Ion PGM Dx/Torrent Suite showed many discrepant results not suitable for use in a clinical laboratory, requiring further optimization of the data analysis for calling variants.

  14. VecScreen_plus_taxonomy: imposing a tax(onomy) increase on vector contamination screening.

    PubMed

    Schäffer, Alejandro A; Nawrocki, Eric P; Choi, Yoon; Kitts, Paul A; Karsch-Mizrachi, Ilene; McVeigh, Richard

    2018-03-01

    Nucleic acid sequences in public databases should not contain vector contamination, but many sequences in GenBank do (or did) contain vectors. The National Center for Biotechnology Information uses the program VecScreen to screen submitted sequences for contamination. Additional tools are needed to distinguish true-positive (contamination) from false-positive (not contamination) VecScreen matches. A principal reason for false-positive VecScreen matches is that the sequence and the matching vector subsequence originate from closely related or identical organisms (for example, both originate in Escherichia coli). We collected information on the taxonomy of sources of vector segments in the UniVec database used by VecScreen. We used that information in two overlapping software pipelines for retrospective analysis of contamination in GenBank and for prospective analysis of contamination in new sequence submissions. Using the retrospective pipeline, we identified and corrected over 8000 contaminated sequences in the nonredundant nucleotide database. The prospective analysis pipeline has been in production use since April 2017 to evaluate some new GenBank submissions. Data on the sources of UniVec entries were included in release 10.0 (ftp://ftp.ncbi.nih.gov/pub/UniVec/). The main software is freely available at https://github.com/aaschaffer/vecscreen_plus_taxonomy. Contact: aschaffe@helix.nih.gov. Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.

  15. eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughput RNA sequencing.

    PubMed

    Yuan, Tiezheng; Huang, Xiaoyi; Dittmar, Rachel L; Du, Meijun; Kohli, Manish; Boardman, Lisa; Thibodeau, Stephen N; Wang, Liang

    2014-03-05

    RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module "miRNA identification" includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module "mRNA identification" includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module "Target screening" provides expression profiling analyses and graphic visualization. The module "Self-testing" offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extends the program's functionality. eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory.

  16. Genetic analysis of tolerance to the root lesion nematode Pratylenchus neglectus in the legume Medicago littoralis.

    PubMed

    Oldach, Klaus H; Peck, David M; Nair, Ramakrishnan M; Sokolova, Maria; Harris, John; Bogacki, Paul; Ballard, Ross

    2014-04-17

    The nematode Pratylenchus neglectus has a wide host range and is able to feed on the root systems of cereals, oilseeds, grain and pasture legumes. Under the Mediterranean low rainfall environments of Australia, annual Medicago pasture legumes are used in rotation with cereals to fix atmospheric nitrogen and improve soil parameters. Considerable efforts are being made in breeding programs to improve resistance and tolerance to Pratylenchus neglectus in the major crops wheat and barley, which makes it vital to develop appropriate selection tools in medics. A strong source of tolerance to root damage by the root lesion nematode (RLN) Pratylenchus neglectus had previously been identified in line RH-1 (strand medic, M. littoralis). Using RH-1, we have developed a single seed descent (SSD) population of 138 lines by crossing it to the intolerant cultivar Herald. After inoculation, RLN-associated root damage clearly segregated in the population. Genetic analysis was performed by constructing a genetic map using simple sequence repeat (SSR) and gene-based SNP markers. A highly significant quantitative trait locus (QTL), QPnTolMl.1, was identified explaining 49% of the phenotypic variation in the SSD population. All SSRs and gene-based markers in the QTL region were derived from chromosome 1 of the sequenced genome of the closely related species M. truncatula. Gene-based markers were validated in advanced breeding lines derived from the RH-1 parent and also a second RLN tolerance source, RH-2 (M. truncatula ssp. tricycla). Comparative analysis to sequenced legume genomes showed that the physical QTL interval exists as a synteny block in Lotus japonicus, common bean, soybean and chickpea. Furthermore, using the sequenced genome information of M. truncatula, the QTL interval contains 55 genes out of which five are discussed as potential candidate genes responsible for the mapped tolerance. 
The closely linked set of SNP-based PCR markers is directly applicable to select for two different sources of RLN tolerance in breeding programs. Moreover, genome sequence information has allowed proposing candidate genes for further functional analysis and nominates QPnTolMl.1 as a target locus for RLN tolerance in economically important grain legumes, e.g. chickpea.

  17. Paternity analysis in Excel.

    PubMed

    Rocheta, Margarida; Dionísio, F Miguel; Fonseca, Luís; Pires, Ana M

    2007-12-01

    Paternity analysis using microsatellite information is a well-studied subject. These markers are ideal for parentage studies and fingerprinting, due to their high-discrimination power. This type of data is used to assign paternity, to compute the average selfing and outcrossing rates and to estimate the biparental inbreeding. There are several public domain programs that compute all this information from data. Most of the time, it is necessary to export data to some sort of format, feed it to the program and import the output to an Excel book for further processing. In this article we briefly describe a program referred from now on as Paternity Analysis in Excel (PAE), developed at IST and IBET (see the acknowledgments) that computes paternity candidates from data, and other information, from within Excel. In practice this means that the end user provides the data in an Excel sheet and, by pressing an appropriate button, obtains the results in another Excel sheet. For convenience PAE is divided into two modules. The first one is a filtering module that selects data from the sequencer and reorganizes it in a format appropriate to process paternity analysis, assuming certain conventions for the names of parents and offspring from the sequencer. The second module carries out the paternity analysis assuming that one parent is known. Both modules are written in Excel-VBA and can be obtained at the address (www.math.ist.utl.pt/~fmd/pa/pa.zip). They are free for non-commercial purposes and have been tested with different data and against different software (Cervus, FaMoz, and MLTR).

  18. AlignMe—a membrane protein sequence alignment web server

    PubMed Central

    Stamm, Marcus; Staritzbichler, René; Khafizov, Kamil; Forrest, Lucy R.

    2014-01-01

    We present a web server for pair-wise alignment of membrane protein sequences, using the program AlignMe. The server makes available two operational modes of AlignMe: (i) sequence to sequence alignment, taking two sequences in fasta format as input, combining information about each sequence from multiple sources and producing a pair-wise alignment (PW mode); and (ii) alignment of two multiple sequence alignments to create family-averaged hydropathy profile alignments (HP mode). For the PW sequence alignment mode, four different optimized parameter sets are provided, each suited to pairs of sequences with a specific similarity level. These settings utilize different types of inputs: (position-specific) substitution matrices, secondary structure predictions and transmembrane propensities from transmembrane predictions or hydrophobicity scales. In the second (HP) mode, each input multiple sequence alignment is converted into a hydrophobicity profile averaged over the provided set of sequence homologs; the two profiles are then aligned. The HP mode enables qualitative comparison of transmembrane topologies (and therefore potentially of 3D folds) of two membrane proteins, which can be useful if the proteins have low sequence similarity. In summary, the AlignMe web server provides user-friendly access to a set of tools for analysis and comparison of membrane protein sequences. Access is available at http://www.bioinfo.mpg.de/AlignMe PMID:24753425
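    The HP mode described above turns each multiple sequence alignment into a hydropathy profile averaged over the family. A minimal sketch of that averaging step follows. It is an illustration under stated assumptions, not AlignMe's implementation: the Kyte-Doolittle scale is one possible hydrophobicity scale (the abstract does not say which is used), gaps are simply skipped, no window smoothing is applied, and the names KYTE_DOOLITTLE and family_averaged_hydropathy are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the abstract only says
# "hydrophobicity scales").
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def family_averaged_hydropathy(msa):
    """Return one mean hydropathy value per alignment column.

    `msa` is a list of equal-length aligned sequences. Gap characters and
    unknown residues are ignored; a column with no scored residue yields None.
    """
    if not msa:
        return []
    width = len(msa[0])
    if any(len(row) != width for row in msa):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for col in range(width):
        values = [
            KYTE_DOOLITTLE[row[col].upper()]
            for row in msa
            if row[col].upper() in KYTE_DOOLITTLE
        ]
        profile.append(sum(values) / len(values) if values else None)
    return profile


# Toy alignment of three homologs; the first sequence has a gap in column 3.
print(family_averaged_hydropathy(["AI-", "AIL", "GIL"]))
```

    Two such profiles, one per family, would then be aligned to each other; that second step is not shown here.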

  19. Genomic sequencing of Pleistocene cave bears

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Noonan, James P.; Hofreiter, Michael; Smith, Doug

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of ~1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  20. MEGA-CC: computing core of molecular evolutionary genetics analysis program for automated and iterative data analysis.

    PubMed

    Kumar, Sudhir; Stecher, Glen; Peterson, Daniel; Tamura, Koichiro

    2012-10-15

    There is a growing need in the research community to apply the molecular evolutionary genetics analysis (MEGA) software tool for batch processing a large number of datasets and to integrate it into analysis workflows. Therefore, we now make available the computing core of the MEGA software as a stand-alone executable (MEGA-CC), along with an analysis prototyper (MEGA-Proto). MEGA-CC provides users with access to all the computational analyses available through MEGA's graphical user interface version. This includes methods for multiple sequence alignment, substitution model selection, evolutionary distance estimation, phylogeny inference, substitution rate and pattern estimation, tests of natural selection and ancestral sequence inference. Additionally, we have upgraded the source code for phylogenetic analysis using the maximum likelihood methods for parallel execution on multiple processors and cores. Here, we describe MEGA-CC and outline the steps for using MEGA-CC in tandem with MEGA-Proto for iterative and automated data analysis. http://www.megasoftware.net/.

  1. Development of Genomic Microsatellite Markers in Carthamus tinctorius L. (Safflower) Using Next Generation Sequencing and Assessment of Their Cross-Species Transferability and Utility for Diversity Analysis

    PubMed Central

    Variath, Murali Tottekkad; Joshi, Gopal; Bali, Sapinder; Agarwal, Manu; Kumar, Amar; Jagannath, Arun; Goel, Shailendra

    2015-01-01

    Background Safflower (Carthamus tinctorius L.), an Asteraceae member, yields high quality edible oil rich in unsaturated fatty acids and is resilient to dry conditions. The crop holds tremendous potential for improvement through concerted molecular breeding programs due to the availability of significant genetic and phenotypic diversity. Genomic resources that could facilitate such breeding programs remain largely underdeveloped in the crop. The present study was initiated to develop a large set of novel microsatellite markers for safflower using next generation sequencing. Principal Findings Low throughput genome sequencing of safflower was performed using Illumina paired-end technology providing ~3.5X coverage of the genome. Analysis of sequencing data allowed identification of 23,067 regions harboring perfect microsatellite loci. The safflower genome was found to be rich in dinucleotide repeats followed by tri-, tetra-, penta- and hexa-nucleotides. Primer pairs were designed for 5,716 novel microsatellite sequences with repeat length ≥ 20 bases and optimal flanking regions. A subset of 325 microsatellite loci was tested for amplification, of which 294 loci produced robust amplification. The validated primers were used for assessment of 23 safflower accessions belonging to diverse agro-climatic zones of the world leading to identification of 93 polymorphic primers (31.6%). The numbers of observed alleles at each locus ranged from two to four and mean polymorphism information content was found to be 0.3075. The polymorphic primers were tested for cross-species transferability on nine wild relatives of cultivated safflower. All primers except one showed amplification in at least two wild species while 25 primers amplified across all the nine species. The UPGMA dendrogram clustered C. tinctorius accessions and wild species separately into two major groups. The proposed progenitor species of safflower, C. oxyacantha and C. palaestinus, were genetically closer to cultivated safflower and formed a distinct cluster. The cluster analysis also distinguished diploid and tetraploid wild species of safflower. Conclusion Next generation sequencing of safflower genome generated a large set of microsatellite markers. The novel markers developed in this study will add to the existing repertoire of markers and can be used for diversity analysis, synteny studies, construction of linkage maps and marker-assisted selection. PMID:26287743
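    The abstract reports perfect microsatellite loci with di- to hexa-nucleotide units and a repeat length of at least 20 bases, but does not name the detection tool. The sketch below shows one simple way such loci could be located with regular expressions. The function names and the decision to reject units that are themselves repeats of a shorter string are assumptions made for illustration, not the authors' pipeline.

```python
import re
from math import ceil


def is_primitive(unit):
    """True if `unit` is not itself a repetition of a shorter string."""
    n = len(unit)
    return not any(
        n % d == 0 and unit == unit[:d] * (n // d) for d in range(1, n)
    )


def find_perfect_ssrs(seq, min_unit=2, max_unit=6, min_len=20):
    """Return (start, unit, copies) for perfect tandem repeats in `seq`.

    A locus is reported when a unit of min_unit..max_unit bases is repeated
    without interruption over at least `min_len` bases.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        min_copies = ceil(min_len / k)
        # One captured unit followed by at least (min_copies - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):  # skip e.g. "AGAG" reported as a tetramer
                hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


# Ten copies of AG (20 bases) between non-repetitive flanks.
print(find_perfect_ssrs("TTACC" + "AG" * 10 + "CCTGA"))
```

    The percentage of polymorphic primers quoted above follows directly from the reported counts: 93 / 294 ≈ 0.316, that is 31.6%.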

  2. A better sequence-read simulator program for metagenomics.

    PubMed

    Johnson, Stephen; Trost, Brett; Long, Jeffrey R; Pittet, Vanessa; Kusalik, Anthony

    2014-01-01

    There are many programs available for generating simulated whole-genome shotgun sequence reads. The data generated by many of these programs follow predefined models, which limits their use to the authors' original intentions. For example, many models assume that read lengths follow a uniform or normal distribution. Other programs generate models from actual sequencing data, but are limited to reads from single-genome studies. To our knowledge, there are no programs that allow a user to generate simulated data following non-parametric read-length distributions and quality profiles based on empirically-derived information from metagenomics sequencing data. We present BEAR (Better Emulation for Artificial Reads), a program that uses a machine-learning approach to generate reads with lengths and quality values that closely match empirically-derived distributions. BEAR can emulate reads from various sequencing platforms, including Illumina, 454, and Ion Torrent. BEAR requires minimal user input, as it automatically determines appropriate parameter settings from user-supplied data. BEAR also uses a unique method for deriving run-specific error rates, and extracts useful statistics from the metagenomic data itself, such as quality-error models. Many existing simulators are specific to a particular sequencing technology; however, BEAR is not restricted in this way. Because of its flexibility, BEAR is particularly useful for emulating the behaviour of technologies like Ion Torrent, for which no dedicated sequencing simulators are currently available. BEAR is also the first metagenomic sequencing simulator program that automates the process of generating abundances, which can be an arduous task. BEAR is useful for evaluating data processing tools in genomics. It has many advantages over existing comparable software, such as generating more realistic reads and being independent of sequencing technology, and has features particularly useful for metagenomics work.
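    To make the contrast with uniform or normal read-length models concrete, the sketch below draws read lengths from the empirical distribution of observed lengths. It only illustrates non-parametric sampling in general. BEAR itself is described as using a machine-learning approach and also models quality values and error rates, none of which is reproduced here, and the function name empirical_length_sampler is hypothetical.

```python
import random
from collections import Counter


def empirical_length_sampler(observed_lengths, seed=None):
    """Build a sampler that reproduces the observed read-length frequencies.

    No parametric form (uniform, normal, ...) is assumed: each distinct
    length is drawn with probability proportional to how often it was seen.
    """
    counts = Counter(observed_lengths)
    if not counts:
        raise ValueError("at least one observed read length is required")
    lengths = list(counts)
    weights = [counts[length] for length in lengths]
    rng = random.Random(seed)

    def sample(n):
        """Draw n simulated read lengths."""
        return rng.choices(lengths, weights=weights, k=n)

    return sample


# Lengths taken from a (made-up) set of real reads, then resampled.
sample = empirical_length_sampler([100, 100, 150, 250], seed=1)
simulated = sample(1000)
print(Counter(simulated))
```

    Because the sampler only replays lengths that were actually observed, it cannot extrapolate beyond the input data; that is the usual trade-off of an empirical model against a parametric one.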

  3. GenBank.

    PubMed

    Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Wheeler, David L

    2008-01-01

    GenBank (R) is a comprehensive database that contains publicly available nucleotide sequences for more than 260 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov.

  4. GenBank

    PubMed Central

    Benson, Dennis A.; Karsch-Mizrachi, Ilene; Lipman, David J.; Ostell, James; Wheeler, David L.

    2008-01-01

    GenBank (R) is a comprehensive database that contains publicly available nucleotide sequences for more than 260 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov PMID:18073190

  5. CpG PatternFinder: a Windows-based utility program for easy and rapid identification of the CpG methylation status of DNA.

    PubMed

    Xu, Yi-Hua; Manoharan, Herbert T; Pitot, Henry C

    2007-09-01

    The bisulfite genomic sequencing technique is one of the most widely used techniques for studying sequence-specific DNA methylation because it unambiguously reveals DNA methylation status at single-nucleotide resolution. One characteristic feature of the technique is that many sample sequence files are produced from a single DNA sample. The PCR products of bisulfite-treated DNA samples cannot be sequenced directly because they are heterogeneous; they must therefore be cloned into suitable plasmids and then sequenced. This procedure generates an enormous number of sample DNA sequence files and adds extra plasmid-derived bases to the sequences, which causes problems in the final sequence comparison. Finding the methylation status of each CpG in each sample sequence is therefore laborious, and CpG PatternFinder was developed for this purpose. The main functions of CpG PatternFinder are: (i) to analyze the reference sequence to obtain position information for CpG and non-CpG C residues; (ii) to tailor sample sequence files (delete insertions and mark deletions from the sample sequence files) based on a configuration of ClustalW multiple alignment; (iii) to align sample sequence files with a reference file to obtain bisulfite conversion efficiency and CpG methylation status; and (iv) to produce graphics, highlighted aligned sequence text and a summary report that can be easily exported to the Microsoft Office suite. CpG PatternFinder is designed to operate cooperatively with BioEdit, a freeware program available on the internet. It can handle up to 100 sample DNA sequence files simultaneously, and the complete CpG pattern analysis can be finished in minutes. CpG PatternFinder is an ideal software tool for DNA methylation studies that determine differential methylation patterns in a large number of individuals in a population. Previously we developed the CpG Analyzer program; CpG PatternFinder is our further effort to create software tools for DNA methylation studies.
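
    The comparison in step (iii) can be sketched as follows. This is a minimal illustration, not CpG PatternFinder's code: it assumes the read is already aligned to the reference at equal length on the top strand, and the function name `call_cpg_methylation` and the example sequences are hypothetical. After bisulfite treatment an unmethylated C reads as T, so a retained C at a CpG indicates methylation, and the fraction of non-CpG C residues read as T estimates conversion efficiency.

```python
def call_cpg_methylation(reference, read):
    """Compare an aligned bisulfite read with its reference (same length)."""
    if len(reference) != len(read):
        raise ValueError("sequences must be aligned to the same length")
    reference, read = reference.upper(), read.upper()
    cpg_status = {}  # 0-based position of the C in each CpG -> 'M', 'U' or '?'
    converted = total_non_cpg = 0
    for i, base in enumerate(reference):
        if base != "C":
            continue
        is_cpg = i + 1 < len(reference) and reference[i + 1] == "G"
        if is_cpg:
            # C retained = methylated, T = unmethylated, anything else unknown.
            cpg_status[i] = {"C": "M", "T": "U"}.get(read[i], "?")
        elif read[i] in ("C", "T"):
            # Non-CpG C residues are assumed unmethylated, so T = converted.
            total_non_cpg += 1
            converted += read[i] == "T"
    efficiency = converted / total_non_cpg if total_non_cpg else None
    return cpg_status, efficiency

reference = "ACGTCCATCGGCA"
read      = "ACGTTTATTGGTA"
status, efficiency = call_cpg_methylation(reference, read)
print(status, efficiency)
```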

  6. Spectral and cross-spectral analysis of uneven time series with the smoothed Lomb-Scargle periodogram and Monte Carlo evaluation of statistical significance

    NASA Astrophysics Data System (ADS)

    Pardo-Igúzquiza, Eulogio; Rodríguez-Tovar, Francisco J.

    2012-12-01

    Many spectral analysis techniques have been designed assuming sequences taken with a constant sampling interval. However, there are empirical time series in the geosciences (sediment cores, fossil abundance data, isotope analysis, …) that do not follow regular sampling because of missing data, gapped data, random sampling or incomplete sequences, among other reasons. In general, interpolating an uneven series in order to obtain a succession with a constant sampling interval alters the spectral content of the series. In such cases it is preferable to follow an approach that works with the uneven data directly, avoiding the need for an explicit interpolation step. The Lomb-Scargle periodogram is a popular choice in such circumstances, as there are programs available in the public domain for its computation. One new computer program for spectral analysis improves the standard Lomb-Scargle periodogram approach in two ways: (1) it explicitly adjusts the statistical significance for any bias introduced by variance-reduction smoothing, and (2) it uses a permutation test to evaluate confidence levels, which is better suited than parametric methods when neighbouring frequencies are highly correlated. Another novel program for cross-spectral analysis offers the advantage of estimating the Lomb-Scargle cross-periodogram of two uneven time series defined on the same interval, and it evaluates the confidence levels of the estimated cross-spectra by a non-parametric, computer-intensive permutation test. Thus, the cross-spectrum, the squared coherence spectrum, the phase spectrum, and the Monte Carlo statistical significance of the cross-spectrum and the squared-coherence spectrum can be obtained. Both programs are written in ANSI Fortran 77, in view of its simplicity and compatibility. The program code is in the public domain and is provided on the website of the journal (http://www.iamg.org/index.php/publisher/articleview/frmArticleID/112/). Different examples (with simulated and real data) are described in this paper to corroborate the methodology and the implementation of these two new programs.
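
    The combination of a Lomb-Scargle periodogram with a permutation test can be sketched in a few lines. This is a simplified illustration, not the authors' Fortran programs: it uses the classic unsmoothed periodogram, and the significance step is a single global threshold on the peak power rather than the per-frequency confidence levels described above. The function names, the simulated series and the choice of 100 permutations are assumptions.

```python
import math
import random

def lomb_scargle(t, y, freqs):
    """Classic normalized Lomb-Scargle periodogram for uneven sampling."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    power = []
    for f in freqs:  # frequencies must be positive
        w = 2 * math.pi * f
        # The time offset tau makes the sine and cosine terms orthogonal.
        tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                         sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        yc = sum((v - mean) * ci for v, ci in zip(y, c))
        ys = sum((v - mean) * si for v, si in zip(y, s))
        cc = sum(ci * ci for ci in c)
        ss = sum(si * si for si in s)
        power.append((yc * yc / cc + ys * ys / ss) / (2 * var))
    return power

def permutation_threshold(t, y, freqs, n_perm=100, level=0.95, seed=0):
    """Peak-power threshold from shuffling y over the fixed sampling times."""
    rng = random.Random(seed)
    shuffled = list(y)
    peaks = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # destroys any periodicity, keeps the values
        peaks.append(max(lomb_scargle(t, shuffled, freqs)))
    peaks.sort()
    return peaks[min(int(level * n_perm), n_perm - 1)]

# Simulated uneven series: a 0.1-cycle-per-unit sinusoid plus noise.
rng = random.Random(42)
t = sorted(rng.uniform(0, 100) for _ in range(80))
y = [math.sin(2 * math.pi * 0.1 * ti) + rng.gauss(0, 0.5) for ti in t]
freqs = [0.01 * k for k in range(1, 31)]
power = lomb_scargle(t, y, freqs)
best = freqs[power.index(max(power))]
threshold = permutation_threshold(t, y, freqs)
print(best, max(power) > threshold)
```

    Because the sampling times stay fixed while the values are shuffled, the threshold reflects the actual uneven sampling pattern without interpolation.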

  7. Processing sequence annotation data using the Lua programming language.

    PubMed

    Ueno, Yutaka; Arita, Masanori; Kumagai, Toshitaka; Asai, Kiyoshi

    2003-01-01

    The data processing language in a graphical software tool that manages sequence annotation data from genome databases should provide flexible functions for the tasks in molecular biology research. Among currently available languages we adopted the Lua programming language. It fulfills our requirements to perform computational tasks for sequence map layouts, i.e. the handling of data containers, symbolic reference to data, and a simple programming syntax. Upon importing a foreign file, the original data are first decomposed in the Lua language while maintaining the original data schema. The converted data are parsed by the Lua interpreter and the contents are stored in our data warehouse. Then, portions of annotations are selected and arranged into our catalog format to be depicted on the sequence map. Our sequence visualization program was successfully implemented, embedding the Lua language for processing of annotation data and layout script. The program is available at http://staff.aist.go.jp/yutaka.ueno/guppy/.

  8. The EMBL-EBI bioinformatics web and programmatic tools framework.

    PubMed

    Li, Weizhong; Cowley, Andrew; Uludag, Mahmut; Gur, Tamer; McWilliam, Hamish; Squizzato, Silvano; Park, Young Mi; Buso, Nicola; Lopez, Rodrigo

    2015-07-01

    Since 2009 the EMBL-EBI Job Dispatcher framework has provided free access to a range of mainstream sequence analysis applications. These include sequence similarity search services (https://www.ebi.ac.uk/Tools/sss/) such as BLAST, FASTA and PSI-Search, multiple sequence alignment tools (https://www.ebi.ac.uk/Tools/msa/) such as Clustal Omega, MAFFT and T-Coffee, and other sequence analysis tools (https://www.ebi.ac.uk/Tools/pfa/) such as InterProScan. Through these services users can search mainstream sequence databases such as ENA, UniProt and Ensembl Genomes, utilising a uniform web interface or systematically through Web Services interfaces (https://www.ebi.ac.uk/Tools/webservices/) using common programming languages, and obtain enriched results with novel visualisations. Integration with EBI Search (https://www.ebi.ac.uk/ebisearch/) and the dbfetch retrieval service (https://www.ebi.ac.uk/Tools/dbfetch/) further expands the usefulness of the framework. New tools and updates such as NCBI BLAST+, InterProScan 5 and PfamScan, new categories such as RNA analysis tools (https://www.ebi.ac.uk/Tools/rna/), new databases such as ENA non-coding, WormBase ParaSite, Pfam and Rfam, and new workflow methods, together with the retirement of deprecated services, ensure that the framework remains relevant to today's biological community. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Manned space flight nuclear system safety. Volume 3: Reactor system preliminary nuclear safety analysis. Part 2A: Accident model document, appendix

    NASA Technical Reports Server (NTRS)

    1972-01-01

    The detailed abort sequence trees for the reference zirconium hydride (ZrH) reactor power module that have been generated for each phase of the reference Space Base program mission are presented. The trees are graphical representations of causal sequences. Each tree begins with the phase identification and the dichotomy between success and failure. The success branch shows the mission phase objective as being achieved. The failure branch is subdivided, as conditions require, into various primary initiating abort conditions.

  10. Slip of grip of a molecular motor on a crowded track: Modeling shift of reading frame of ribosome on RNA template

    NASA Astrophysics Data System (ADS)

    Mishra, Bhavya; Schütz, Gunter M.; Chowdhury, Debashish

    2016-06-01

    We develop a stochastic model for the programmed frameshift of ribosomes synthesizing a protein while moving along an mRNA template. Normally the reading frame of a ribosome decodes successive triplets of nucleotides on the mRNA in a step-by-step manner. We focus on the programmed shift of the ribosomal reading frame, forward or backward, by only one nucleotide, which results in a fusion protein; it occurs when a ribosome temporarily loses its grip on its mRNA track. Special “slippery” sequences of nucleotides and also downstream secondary structures of the mRNA strand are believed to play key roles in programmed frameshift. Here we explore the role of a hitherto neglected parameter in regulating -1 programmed frameshift. Specifically, we demonstrate that the frameshift frequency can also be strongly regulated by the density of the ribosomes, all of which are engaged in simultaneous translation of the same mRNA, at and around the slippery sequence. Monte Carlo simulations support the analytical predictions obtained from a mean-field analysis of the stochastic dynamics.

  11. Cosmetology: Task Analyses. Competency-Based Education.

    ERIC Educational Resources Information Center

    Henrico County Public Schools, Glen Allen, VA. Virginia Vocational Curriculum Center.

    These task analyses are designed to be used in combination with the "Trade and Industrial Education Service Area Resource" in order to implement competency-based education in the cosmetology program in Virginia. The task analysis document contains the task inventory, suggested task sequence lists, and content outlines for the secondary…

  12. Commercial Photography: Task Analyses. Competency-Based Education.

    ERIC Educational Resources Information Center

    Endo, Paula; Morrell, Linda

    These task analyses are designed to be used in combination with the "Trade and Industrial Education Service Area Resource" in order to implement competency-based education in the commercial photography program in Virginia. The task analysis document contains the task inventory, suggested task sequence lists, and content outlines for the…

  13. Nurse's Assistant: Task Analyses. Competency-Based Education.

    ERIC Educational Resources Information Center

    Henrico County Public Schools, Glen Allen, VA. Virginia Vocational Curriculum Center.

    These task analyses are designed to be used in combination with the "Health Occupations Education Service Area Resource" in order to implement competency-based education in the nurse's assistant program in Virginia. The task analysis document contains the task inventory, suggested task sequence lists, and content outlines for Nursing…

  14. Masonry: Task Analyses. Competency-Based Education.

    ERIC Educational Resources Information Center

    Henrico County Public Schools, Glen Allen, VA. Virginia Vocational Curriculum Center.

    These task analyses are designed to be used in combination with the "Trade and Industrial Education Service Area Resource" in order to implement competency-based education in the masonry program in Virginia. The task analysis document contains the task inventory, suggested task sequence lists, and content outlines for the secondary…

  15. Scientific Software: How to Find What You Need and Get What You Pay for.

    ERIC Educational Resources Information Center

    Gabaldon, Diana J.

    1984-01-01

    Provides examples of software for the sciences, including: packages for pathology/toxicology laboratories (costing over $15,000), DNA sequencing, and data acquisition/analysis; general-purpose software for scientific uses; and "custom" packages, including a program to maintain a listing of "Escherichia coli" strains and a…

  16. Tech Prep Model for Marketing Education.

    ERIC Educational Resources Information Center

    Ruhland, Sheila K.; King, Binky M.

    A project was conducted to develop two tech prep models for marketing education (ME) in Missouri to provide a sequence of courses for skill-enhanced and time-shortened programs. First, labor market trends, employment growth projections, and business and industry labor needs in Missouri were researched and analyzed. The analysis results were used…

  17. Identification of sex-linked SNP markers using RAD sequencing suggests ZW/ZZ sex determination in Pistacia vera L.

    PubMed

    Kafkas, Salih; Khodaeiaminjan, Mortaza; Güney, Murat; Kafkas, Ebru

    2015-02-18

    Pistachio (Pistacia vera L.) is a dioecious species that has a long juvenility period. Therefore, development of marker-assisted selection (MAS) techniques would greatly facilitate pistachio cultivar-breeding programs. The sex determination mechanism is presently unknown in pistachio. The generation of sex-linked markers is likely to reduce time, labor, and costs associated with breeding programs, and will help to clarify the sex determination system in pistachio. Restriction site-associated DNA (RAD) markers were used to identify sex-linked markers and to elucidate the sex determination system in pistachio. Eight male and eight female F1 progenies from a Pistacia vera L. Siirt × Bağyolu cross, along with the parents, were subjected to RAD sequencing in two lanes of a HiSeq 2000 sequencing platform. This generated 449 million reads, comprising approximately 37.7 Gb of sequences. There were 33,757 polymorphic single nucleotide polymorphism (SNP) loci between the parents. Thirty-eight of these, from 28 RAD reads, were detected as putative sex-associated loci in pistachio. Validation was performed by SNaPshot analysis in 42 mature F1 progenies and in 124 cultivars and genotypes in a germplasm collection. Eight loci could distinguish sex with 100% accuracy in pistachio. To ascertain cost-effective application of markers in a breeding program, high-resolution melting (HRM) analysis was performed; four markers were found to perfectly separate sexes in pistachio. Because of the female heterogamety in all candidate SNP loci, we report for the first time that pistachio has a ZZ/ZW sex determination system. As the reported female-to-male segregation ratio is 1:1 in all known segregating populations and there is no previous report of super-female genotypes or female heteromorphic chromosomes in pistachio, it appears that the WW genotype is not viable. Sex-linked SNP markers were identified and validated in a large germplasm and proved their suitability for MAS in pistachio. HRM analysis successfully validated the sex-linked markers for MAS. For the first time in dioecious pistachio, a female heterogamety ZW/ZZ sex determination system is suggested.
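
    The reasoning from heterozygosity to a ZW/ZZ system can be expressed as a small check. This is a sketch under stated assumptions, not the authors' pipeline: it assumes that a fully sex-linked locus is heterozygous in every individual of the heterogametic sex and homozygous in every individual of the other sex. The function name `infer_heterogametic_sex` and the genotype calls are hypothetical.

```python
def infer_heterogametic_sex(genotypes_by_sex):
    """Classify one SNP locus from diploid genotype calls grouped by sex."""
    def all_het(calls):
        return bool(calls) and all(a != b for a, b in calls)

    def all_hom(calls):
        return bool(calls) and all(a == b for a, b in calls)

    females = genotypes_by_sex["female"]
    males = genotypes_by_sex["male"]
    if all_het(females) and all_hom(males):
        return "ZW/ZZ-like (female heterogametic)"
    if all_het(males) and all_hom(females):
        return "XX/XY-like (male heterogametic)"
    return "not sex-linked by this criterion"

# Hypothetical calls at one candidate locus (illustrative only).
locus = {
    "female": [("A", "G"), ("A", "G"), ("A", "G")],
    "male": [("A", "A"), ("A", "A"), ("A", "A")],
}
print(infer_heterogametic_sex(locus))
```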

  18. Identification of Genomic Insertion and Flanking Sequence of G2-EPSPS and GAT Transgenes in Soybean Using Whole Genome Sequencing Method.

    PubMed

    Guo, Bingfu; Guo, Yong; Hong, Huilong; Qiu, Li-Juan

    2016-01-01

    Molecular characterization of the sequences flanking exogenous fragment insertions is essential for the safety assessment and labeling of genetically modified organisms (GMOs). In this study, the T-DNA insertion sites and flanking sequences were identified in two newly developed transgenic glyphosate-tolerant soybeans, GE-J16 and ZH10-6, using whole genome sequencing (WGS). More than 22.4 Gb of sequence data (∼21× coverage) was generated for each line on the Illumina HiSeq 2500 platform. The junction reads mapped to the boundaries of T-DNA and flanking sequences in these two events were identified by comparing all sequencing reads with the soybean reference genome and the sequence of the transgenic vector. The putative insertion loci and flanking sequences were further confirmed by PCR amplification, Sanger sequencing, and co-segregation analysis. All these analyses supported integration of the exogenous T-DNA fragments at positions Chr19: 50543767-50543792 and Chr17: 7980527-7980541 in these two transgenic lines. Identification of the genomic insertion sites of the G2-EPSPS and GAT transgenes will facilitate the use of their glyphosate-tolerant traits in soybean breeding programs. These results also demonstrated that WGS was a cost-effective and rapid method for identifying sites of T-DNA insertions and flanking sequences in soybean.

  19. Fast multiclonal clusterization of V(D)J recombinations from high-throughput sequencing.

    PubMed

    Giraud, Mathieu; Salson, Mikaël; Duez, Marc; Villenet, Céline; Quief, Sabine; Caillault, Aurélie; Grardel, Nathalie; Roumier, Christophe; Preudhomme, Claude; Figeac, Martin

    2014-05-28

    V(D)J recombinations in lymphocytes are essential for immunological diversity. They are also useful markers of pathologies. In leukemia, they are used to quantify the minimal residual disease during patient follow-up. However, the full breadth of lymphocyte diversity is not fully understood. We propose new algorithms that process high-throughput sequencing (HTS) data to extract unnamed V(D)J junctions and gather them into clones for quantification. This analysis is based on a seed heuristic and is fast and scalable because in the first phase, no alignment is performed with germline database sequences. The algorithms were applied to TR γ HTS data from a patient with acute lymphoblastic leukemia, and also on data simulating hypermutations. Our methods identified the main clone, as well as additional clones that were not identified with standard protocols. The proposed algorithms provide new insight into the analysis of high-throughput sequencing data for leukemia, and also to the quantitative assessment of any immunological profile. The methods described here are implemented in a C++ open-source program called Vidjil.

  20. Use of the LUS in sequence allele designations to facilitate probabilistic genotyping of NGS-based STR typing results.

    PubMed

    Just, Rebecca S; Irwin, Jodi A

    2018-05-01

    Some of the expected advantages of next generation sequencing (NGS) for short tandem repeat (STR) typing include enhanced mixture detection and genotype resolution via sequence variation among non-homologous alleles of the same length. However, at the same time that NGS methods for forensic DNA typing have advanced in recent years, many caseworking laboratories have implemented or are transitioning to probabilistic genotyping to assist the interpretation of complex autosomal STR typing results. Current probabilistic software programs are designed for length-based data, and were not intended to accommodate sequence strings as the product input. Yet to leverage the benefits of NGS for enhanced genotyping and mixture deconvolution, the sequence variation among same-length products must be utilized in some form. Here, we propose use of the longest uninterrupted stretch (LUS) in allele designations as a simple method to represent sequence variation within the STR repeat regions and facilitate, in the near term, probabilistic interpretation of NGS-based typing results. An examination of published population data indicated that a reference LUS region is straightforward to define for most autosomal STR loci, and that using repeat unit plus LUS length as the allele designator can represent greater than 80% of the alleles detected by sequencing. A proof of concept study performed using a freely available probabilistic software program demonstrated that the LUS length can be used in allele designations when a program does not require alleles to be integers, and that utilizing sequence information improves interpretation of both single-source and mixed contributor STR typing results as compared to using repeat unit information alone. The LUS concept for allele designation maintains the repeat-based allele nomenclature that will permit backward compatibility to extant STR databases, and the LUS lengths themselves will be concordant regardless of the NGS assay or analysis tools employed. Further, these biologically based, easy-to-derive designations uphold clear relationships between parent alleles and their stutter products, enabling analysis in fully continuous probabilistic programs that model stutter while avoiding the algorithmic complexities that come with string-based searches. Though using repeat unit plus LUS length as the allele designator does not capture variation that occurs outside of the core repeat regions, this straightforward approach would permit the large majority of known STR sequence variation to be used for mixture deconvolution and, in turn, result in more informative mixture statistics in the near term. Ultimately, the method could bridge the gap from current length-based probabilistic systems to facilitate broader adoption of NGS by forensic DNA testing laboratories. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
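
    A minimal sketch of deriving a repeat-unit-plus-LUS designation from a sequence string is given below. The function names, the `total_LUS` output format with an underscore, and the example alleles are assumptions for illustration; the abstract does not specify a notation, and real loci need locus-specific reference regions.

```python
import re

def longest_uninterrupted_stretch(sequence, motif):
    """Largest number of consecutive copies of motif found in sequence."""
    # The lookahead tries every start position, so runs in any frame count.
    pattern = re.compile(f"(?=((?:{re.escape(motif)})+))")
    runs = (len(m.group(1)) // len(motif) for m in pattern.finditer(sequence))
    return max(runs, default=0)

def designate(sequence, motif):
    """Hypothetical designator: total repeat units, underscore, LUS length."""
    total_units = len(sequence) // len(motif)
    return f"{total_units}_{longest_uninterrupted_stretch(sequence, motif)}"

# Two illustrative alleles of equal length with different internal sequence.
allele_a = "TCTA" * 2 + "TCTG" + "TCTA" * 9
allele_b = "TCTA" * 5 + "TCTG" + "TCTA" * 6
print(designate(allele_a, "TCTA"), designate(allele_b, "TCTA"))
```

    Both alleles would be typed as 12 by length alone; the LUS component separates them without requiring the software to handle full sequence strings.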

  1. Radio-science performance analysis software

    NASA Astrophysics Data System (ADS)

    Morabito, D. D.; Asmar, S. W.

    1995-02-01

    The Radio Science Systems Group (RSSG) provides various support functions for several flight project radio-science teams. Among these support functions are uplink and sequence planning, real-time operations monitoring and support, data validation, archiving and distribution functions, and data processing and analysis. This article describes the support functions that encompass radio-science data performance analysis. The primary tool used by the RSSG to fulfill this support function is the STBLTY program set. STBLTY is used to reconstruct observable frequencies and calculate model frequencies, frequency residuals, frequency stability in terms of Allan deviation, reconstructed phase, frequency and phase power spectral density, and frequency drift rates. In the case of one-way data, using an ultrastable oscillator (USO) as a frequency reference, the program set computes the spacecraft transmitted frequency and maintains a database containing the in-flight history of the USO measurements. The program set also produces graphical displays. Some examples and discussions on operating the program set on Galileo and Ulysses data will be presented.
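
    Frequency stability in terms of Allan deviation can be computed from fractional-frequency samples with the standard non-overlapping estimator. The sketch below is not the STBLTY implementation; the function name and sample values are assumptions, and the samples are taken to be evenly spaced.

```python
import math

def allan_deviation(y, m=1):
    """Non-overlapping Allan deviation of fractional-frequency samples y
    at averaging time m * tau0, where tau0 is the sample spacing."""
    # Average the samples in consecutive, non-overlapping blocks of length m.
    blocks = [sum(y[i:i + m]) / m for i in range(0, len(y) - m + 1, m)]
    if len(blocks) < 2:
        raise ValueError("need at least two averaged blocks")
    # Half the mean squared difference of adjacent block averages.
    diffs = [(b - a) ** 2 for a, b in zip(blocks, blocks[1:])]
    return math.sqrt(sum(diffs) / (2 * len(diffs)))

# Hypothetical fractional-frequency residuals (illustrative values only).
residuals = [1.0e-13, 2.0e-13, 1.0e-13, 2.0e-13]
print(allan_deviation(residuals, m=1), allan_deviation(residuals, m=2))
```

    A constant frequency offset cancels in the differences, so the statistic measures stability rather than absolute accuracy.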

  2. Radio-Science Performance Analysis Software

    NASA Astrophysics Data System (ADS)

    Morabito, D. D.; Asmar, S. W.

    1994-10-01

    The Radio Science Systems Group (RSSG) provides various support functions for several flight project radio-science teams. Among these support functions are uplink and sequence planning, real-time operations monitoring and support, data validation, archiving and distribution functions, and data processing and analysis. This article describes the support functions that encompass radio science data performance analysis. The primary tool used by the RSSG to fulfill this support function is the STBLTY program set. STBLTY is used to reconstruct observable frequencies and calculate model frequencies, frequency residuals, frequency stability in terms of Allan deviation, reconstructed phase, frequency and phase power spectral density, and frequency drift rates. In the case of one-way data, using an ultrastable oscillator (USO) as a frequency reference, the program set computes the spacecraft transmitted frequency and maintains a database containing the in-flight history of the USO measurements. The program set also produces graphical displays. Some examples and discussion on operating the program set on Galileo and Ulysses data will be presented.

  3. Radio-science performance analysis software

    NASA Technical Reports Server (NTRS)

    Morabito, D. D.; Asmar, S. W.

    1995-01-01

    The Radio Science Systems Group (RSSG) provides various support functions for several flight project radio-science teams. Among these support functions are uplink and sequence planning, real-time operations monitoring and support, data validation, archiving and distribution functions, and data processing and analysis. This article describes the support functions that encompass radio-science data performance analysis. The primary tool used by the RSSG to fulfill this support function is the STBLTY program set. STBLTY is used to reconstruct observable frequencies and calculate model frequencies, frequency residuals, frequency stability in terms of Allan deviation, reconstructed phase, frequency and phase power spectral density, and frequency drift rates. In the case of one-way data, using an ultrastable oscillator (USO) as a frequency reference, the program set computes the spacecraft transmitted frequency and maintains a database containing the in-flight history of the USO measurements. The program set also produces graphical displays. Some examples and discussions on operating the program set on Galileo and Ulysses data will be presented.

  4. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences

    PubMed Central

    2012-01-01

    Background: The complete sequences of chloroplast genomes provide a wealth of information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, and powerful computational tools for annotating the genome sequences are urgently needed. Results: We have developed a web server, CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from the Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR) are identified using tRNAscan, ARAGORN and vmatch, respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extraction of protein and mRNA sequences for a given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tool. The edited annotations can then be uploaded to CPGAVAS repeatedly for updating and re-analysis. Using known chloroplast genome sequences as a test set, we show that CPGAVAS performs comparably to another application, DOGMA, while offering several superior functionalities. Conclusions: CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensable tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas. PMID:23256920

  5. Acquisition Management for Systems-of-Systems: Analysis of Alternatives via Computational Exploratory Model

    DTIC Science & Technology

    2012-02-03

    node to the analysis of eigenmodes (connected trees /networks) of disruption sequences. The identification of disruption eigenmodes is particularly...investment portfolio approach enables the identification of optimal SoS network topologies and provides a tool for acquisition professionals to...a program based on its ability to provide a new capability for a given cost, and not on its ability to meet specific performance requirements ( Spacy

  6. Analysis Commons, A Team Approach to Discovery in a Big-Data Environment for Genetic Epidemiology

    PubMed Central

    Brody, Jennifer A.; Morrison, Alanna C.; Bis, Joshua C.; O'Connell, Jeffrey R.; Brown, Michael R.; Huffman, Jennifer E.; Ames, Darren C.; Carroll, Andrew; Conomos, Matthew P.; Gabriel, Stacey; Gibbs, Richard A.; Gogarten, Stephanie M.; Gupta, Namrata; Jaquish, Cashell E.; Johnson, Andrew D.; Lewis, Joshua P.; Liu, Xiaoming; Manning, Alisa K.; Papanicolaou, George J.; Pitsillides, Achilleas N.; Rice, Kenneth M.; Salerno, William; Sitlani, Colleen M.; Smith, Nicholas L.; Heckbert, Susan R.; Laurie, Cathy C.; Mitchell, Braxton D.; Vasan, Ramachandran S.; Rich, Stephen S.; Rotter, Jerome I.; Wilson, James G.; Boerwinkle, Eric; Psaty, Bruce M.; Cupples, L. Adrienne

    2017-01-01

    The exploding volume of whole-genome sequence (WGS) and multi-omics data requires new approaches for analysis. As one solution, we have created a cloud-based Analysis Commons, which brings together genotype and phenotype data from multiple studies in a setting that is accessible by multiple investigators. This framework addresses many of the challenges of multi-center WGS analyses, including data sharing mechanisms, phenotype harmonization, integrated multi-omics analyses, annotation, and computational flexibility. In this setting, the computational pipeline facilitates a sequence-to-discovery analysis workflow illustrated here by an analysis of plasma fibrinogen levels in 3996 individuals from the National Heart, Lung, and Blood Institute (NHLBI) Trans-Omics for Precision Medicine (TOPMed) WGS program. The Analysis Commons represents a novel model for transforming WGS resources from a massive quantity of phenotypic and genomic data into knowledge of the determinants of health and disease risk in diverse human populations. PMID:29074945

  7. RoboOligo: software for mass spectrometry data to support manual and de novo sequencing of post-transcriptionally modified ribonucleic acids

    PubMed Central

    Sample, Paul J.; Gaston, Kirk W.; Alfonzo, Juan D.; Limbach, Patrick A.

    2015-01-01

    Ribosomal ribonucleic acid (RNA), transfer RNA and other biological or synthetic RNA polymers can contain nucleotides that have been modified by the addition of chemical groups. Traditional Sanger sequencing methods cannot establish the chemical nature and sequence of these modified-nucleotide containing oligomers. Mass spectrometry (MS) has become the conventional approach for determining the nucleotide composition, modification status and sequence of modified RNAs. Modified RNAs are analyzed by MS using collision-induced dissociation tandem mass spectrometry (CID MS/MS), which produces a complex dataset of oligomeric fragments that must be interpreted to identify and place modified nucleosides within the RNA sequence. Here we report the development of RoboOligo, an interactive software program for the robust analysis of data generated by CID MS/MS of RNA oligomers. There are three main functions of RoboOligo: (i) automated de novo sequencing via the local search paradigm. (ii) Manual sequencing with real-time spectrum labeling and cumulative intensity scoring. (iii) A hybrid approach, coined ‘variable sequencing’, which combines the user intuition of manual sequencing with the high-throughput sampling of automated de novo sequencing. PMID:25820423

  8. Kakusan4 and Aminosan: two programs for comparing nonpartitioned, proportional and separate models for combined molecular phylogenetic analyses of multilocus sequence data.

    PubMed

    Tanabe, Akifumi S

    2011-09-01

    Proportional and separate models, which can apply a different combination of substitution rate matrix (SRM) and among-site rate variation model (ASRVM) to each locus, are frequently used in phylogenetic studies of multilocus data. A proportional model assumes that branch lengths are proportional among partitions and a separate model assumes that each partition has an independent set of branch lengths. However, the selection from among nonpartitioned (i.e., a common combination of models is applied to all-loci concatenated sequences), proportional and separate models is usually based on the researcher's preference rather than on any information criteria. This study describes two programs, 'Kakusan4' (for DNA sequences) and 'Aminosan' (for amino-acid sequences), which allow the selection of evolutionary models based on several types of information criteria. The programs can handle both multilocus and single-locus data, in addition to providing an easy-to-use wizard interface and a noninteractive command line interface. In the case of multilocus data, SRMs and ASRVMs are compared at each locus and at all-loci concatenated sequences, after which nonpartitioned, proportional and separate models are compared based on information criteria. The programs also provide model configuration files for MrBayes, PAUP*, PhyML, RAxML and Treefinder to support further phylogenetic analysis using a selected model. When likelihoods were optimized by Treefinder, the best-fit models were found to differ depending on the data set. Furthermore, differences in the information criteria among nonpartitioned, proportional and separate models were much larger than those among the nonpartitioned models. These findings suggest that selecting from nonpartitioned, proportional and separate models results in a better phylogenetic tree. Kakusan4 and Aminosan are available at http://www.fifthdimension.jp/. They are licensed under the GNU GPL Ver. 2 and run on Windows, Mac OS X and Linux.
© 2011 Blackwell Publishing Ltd.
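
    As a rough illustration of selecting among nonpartitioned, proportional and separate models by information criteria, the sketch below scores candidates with AIC and BIC. The abstract does not say which criteria the programs use, so AIC and BIC are assumptions here, and the log-likelihoods and parameter counts are invented for the example.

```python
import math

def aic(log_likelihood, n_params):
    # Akaike information criterion: lower is better.
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_sites):
    # Bayesian information criterion: penalty grows with the number of sites.
    return n_params * math.log(n_sites) - 2 * log_likelihood

def best_model(candidates, n_sites, criterion="aic"):
    """candidates maps a model name to (log_likelihood, n_params)."""
    scores = {}
    for name, (lnl, k) in candidates.items():
        scores[name] = aic(lnl, k) if criterion == "aic" else bic(lnl, k, n_sites)
    # The preferred model is the one with the smallest criterion value.
    return min(scores, key=scores.get), scores

# Hypothetical values, for illustration only.
candidates = {
    "nonpartitioned": (-12500.0, 10),
    "proportional": (-12380.0, 22),
    "separate": (-12350.0, 60),
}
winner, scores = best_model(candidates, n_sites=3000)
```

    With these made-up numbers the proportional model has the lowest score under both criteria: the separate model fits slightly better but pays for its many extra branch-length parameters.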

  9. A new polymorphic and multicopy MHC gene family related to nonmammalian class I

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Leelayuwat, C.; Degli-Esposti, M.A.; Abraham, L.J.

    1994-12-31

    The authors have used genomic analysis to characterize a region of the central major histocompatibility complex (MHC) spanning approximately 300 kilobases (kb) between TNF and HLA-B. This region has been suggested to carry genetic factors relevant to the development of autoimmune diseases such as myasthenia gravis (MG) and insulin dependent diabetes mellitus (IDDM). Genomic sequence was analyzed for coding potential, using two neural network programs, GRAIL and GeneParser. A genomic probe, JAB, containing putative coding sequences (PERB11) located 60 kb centromeric of HLA-B, was used for northern analysis of human tissues. Multiple transcripts were detected. Southern analysis of genomic DNA and overlapping YAC clones, covering the region from BAT1 to HLA-F, indicated that there are at least five copies of PERB11, four of which are located within this region of the MHC. The partial cDNA sequence of PERB11 was obtained from poly-A RNA derived from skeletal muscle. The putative amino acid sequence of PERB11 shares approximately 30% identity to MHC class I molecules from various species, including reptiles, chickens, and frogs, as well as to other MHC class I-like molecules, such as the IgG FcR of the mouse and rat and the human Zn-α2-glycoprotein. From direct comparison of amino acid sequences, it is concluded that PERB11 is a distinct molecule more closely related to nonmammalian than known mammalian MHC class I molecules. Genomic sequence analysis of PERB11 from five MHC ancestral haplotypes (AH) indicated that the gene is polymorphic at both DNA and protein level. The results suggest that the authors have identified a novel polymorphic gene family with multiple copies within the MHC. 48 refs., 10 figs., 2 tabs.

  10. Effect of multimedia information sequencing on educational outcome in orthodontic training.

    PubMed

    Aly, Medhat; Willems, Guy; Van Den Noortgate, Wim; Elen, Jan

    2012-08-01

    The aim of this research was to compare the effectiveness of hierarchical sequencing (HS) versus elaboration sequencing (ES) models in improving educational outcome of clinical knowledge when using instructional multimedia programs in postgraduate orthodontic training. Twenty-four postgraduate and 24 undergraduate dental students participated in this study. The postgraduates were following an orthodontic speciality training programme. The undergraduates were fourth- and fifth-year dental students. Twelve instructional multimedia modules were developed, six logically sequenced (LS) discussing six different orthodontic topics. Another six modules on identical topics were sequenced according to one macro-sequencing (MS) model. The implemented MS model was either HS or ES. The only difference between LS and MS modules was the adopted sequencing model. All participants were assigned into consistent pairs of students and were randomly divided into a test and a control group. In each pair, one student studied the LS module (control group) while the other studied the MS version (test group). Pre- and post-evaluation tests of each pair of participants were performed to measure knowledge, understanding and application of each participant with regard to the discussed topic. A multilevel analysis was conducted to assess the estimated effect of the different sequencing models. The level of significance was set at 0.05. At baseline, no significant differences (P > 0.05) were found in pre-test scores between groups. The HS model showed a significant effect on the scores achieved (P = 0.05). The test group showed a significantly higher estimated probability of correct answers to the questions (P = 0.003) when applying the HS model. The HS model may improve educational outcome when using instructional multimedia programs in postgraduate orthodontic training.

  11. Single-Cell RNA-Sequencing Reveals a Continuous Spectrum of Differentiation in Hematopoietic Cells.

    PubMed

    Macaulay, Iain C; Svensson, Valentine; Labalette, Charlotte; Ferreira, Lauren; Hamey, Fiona; Voet, Thierry; Teichmann, Sarah A; Cvejic, Ana

    2016-02-02

    The transcriptional programs that govern hematopoiesis have been investigated primarily by population-level analysis of hematopoietic stem and progenitor cells, which cannot reveal the continuous nature of the differentiation process. Here we applied single-cell RNA-sequencing to a population of hematopoietic cells in zebrafish as they undergo thrombocyte lineage commitment. By reconstructing their developmental chronology computationally, we were able to place each cell along a continuum from stem cell to mature cell, refining the traditional lineage tree. The progression of cells along this continuum is characterized by a highly coordinated transcriptional program, displaying simultaneous suppression of genes involved in cell proliferation and ribosomal biogenesis as the expression of lineage specific genes increases. Within this program, there is substantial heterogeneity in the expression of the key lineage regulators. Overall, the total number of genes expressed, as well as the total mRNA content of the cell, decreases as the cells undergo lineage commitment. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  12. JVM: Java Visual Mapping tool for next generation sequencing read.

    PubMed

    Yang, Ye; Liu, Juan

    2015-01-01

    We developed a program, JVM (Java Visual Mapping), for mapping next-generation sequencing reads to a reference sequence. The program is implemented in Java and is designed to deal with millions of short reads generated by the Illumina sequencing technology. It employs a seed index strategy and octal encoding operations for sequence alignment. JVM is useful for DNA-Seq and RNA-Seq when dealing with single-end resequencing. JVM is a desktop application that supports read data sets from 1 MB to 10 GB.
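
    The seed index idea mentioned above can be sketched in a few lines. This is a generic seed-and-verify mapper written for illustration, not JVM's implementation: the function names, the seed length, the mismatch threshold and the toy sequences are all assumptions, and the octal encoding step is not shown.

```python
from collections import defaultdict

def build_seed_index(reference, k):
    # Map every k-mer in the reference to the positions where it starts.
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def map_read(read, reference, index, k, max_mismatches=1):
    hits = set()
    # Take non-overlapping seeds from the read and look each one up.
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        for pos in index.get(seed, ()):
            start = pos - offset  # implied start of the whole read
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            # Verify the candidate by counting mismatches (no gaps allowed).
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add(start)
    return sorted(hits)

reference = "ACGTACGTTAGCCGATAGGCTA"  # toy reference, illustration only
index = build_seed_index(reference, 4)
positions = map_read("TAGCCGTT", reference, index, 4)
```

    Here the read differs from the reference by one base, so the second seed points to a wrong location that fails verification, while the first seed still anchors the read at position 8.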

  13. BioRuby: bioinformatics software for the Ruby programming language.

    PubMed

    Goto, Naohisa; Prins, Pjotr; Nakao, Mitsuteru; Bonnal, Raoul; Aerts, Jan; Katayama, Toshiaki

    2010-10-15

    The BioRuby software toolkit contains a comprehensive set of free development tools and libraries for bioinformatics and molecular biology, written in the Ruby programming language. BioRuby has components for sequence analysis, pathway analysis, protein modelling and phylogenetic analysis; it supports many widely used data formats and provides easy access to databases, external programs and public web services, including BLAST, KEGG, GenBank, MEDLINE and GO. BioRuby comes with a tutorial, documentation and an interactive environment, which can be used in the shell, and in the web browser. BioRuby is free and open source software, made available under the Ruby license. BioRuby runs on all platforms that support Ruby, including Linux, Mac OS X and Windows. And, with JRuby, BioRuby runs on the Java Virtual Machine. The source code is available from http://www.bioruby.org/. Contact: katayama@bioruby.org

  14. Evaluation of a Secondary School Science Program Inversion: Moving from a Traditional to a Modified-PCB Sequence

    ERIC Educational Resources Information Center

    Gaubatz, Julie

    2013-01-01

    Studies of high-school science course sequences have been limited primarily to a small number of site-specific investigations comparing traditional science sequences (e.g., Biology-Chemistry-Physics: BCP) to various Physics First-influenced sequences (Physics-Chemistry-Biology: PCB). The present study summarizes a five-year program evaluation…

  15. VIZARD: analysis of Affymetrix Arabidopsis GeneChip data

    NASA Technical Reports Server (NTRS)

    Moseyko, Nick; Feldman, Lewis J.

    2002-01-01

    SUMMARY: The Affymetrix GeneChip Arabidopsis genome array has proved to be a very powerful tool for the analysis of gene expression in Arabidopsis thaliana, the most commonly studied plant model organism. VIZARD is a Java program created at the University of California, Berkeley, to facilitate analysis of Arabidopsis GeneChip data. It includes several integrated tools for filtering, sorting, clustering and visualization of gene expression data as well as tools for the discovery of regulatory motifs in upstream sequences. VIZARD also includes annotation and upstream sequence databases for the majority of genes represented on the Affymetrix Arabidopsis GeneChip array. AVAILABILITY: VIZARD is available free of charge for educational, research, and not-for-profit purposes, and can be downloaded at http://www.anm.f2s.com/research/vizard/ CONTACT: moseyko@uclink4.berkeley.edu.

  16. Music Program of Study: Educational Program Definition.

    ERIC Educational Resources Information Center

    West Virginia State Dept. of Education, Charleston.

    The West Virginia music study program is a public school K-12 curriculum sequence. This program is divided into the four principal areas of: (1) general classroom music; (2) string instrumental music; (3) wind and percussion instrumental music; and (4) choral music. The general classroom music program is an early and middle childhood sequence of…

  17. Closha: bioinformatics workflow system for the analysis of massive sequencing data.

    PubMed

    Ko, GunHwan; Kim, Pan-Gyu; Yoon, Jongcheol; Han, Gukhee; Park, Seong-Jin; Song, Wangho; Lee, Byungwook

    2018-02-19

    While next-generation sequencing (NGS) costs have fallen in recent years, the cost and complexity of computation remain substantial obstacles to the use of NGS in bio-medical care and genomic research. The rapidly increasing amounts of data available from the new high-throughput methods have made data processing infeasible without automated pipelines. The integration of data and analytic resources into workflow systems provides a solution to the problem by simplifying the task of data analysis. To address this challenge, we developed a cloud-based workflow management system, Closha, to provide fast and cost-effective analysis of massive genomic data. We implemented complex workflows making optimal use of high-performance computing clusters. Closha allows users to create multi-step analyses using drag and drop functionality and to modify the parameters of pipeline tools. Users can also import the Galaxy pipelines into Closha. Closha is a hybrid system that enables users to use both analysis programs providing traditional tools and MapReduce-based big data analysis programs simultaneously in a single pipeline. Thus, the execution of analytics algorithms can be parallelized, speeding up the whole process. We also developed a high-speed data transmission solution, KoDS, to transmit a large amount of data at a fast rate. KoDS has a file transfer speed of up to 10 times that of normal FTP and HTTP. The computer hardware for Closha is 660 CPU cores and 800 TB of disk storage, enabling 500 jobs to run at the same time. Closha is a scalable, cost-effective, and publicly available web service for large-scale genomic data analysis. Closha supports the reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Closha provides a user-friendly interface to all genomic scientists to try to derive accurate results from NGS platform data. The Closha cloud server is freely available for use from http://closha.kobic.re.kr/ .

  18. Spectral analysis of time series of categorical variables in earth sciences

    NASA Astrophysics Data System (ADS)

    Pardo-Igúzquiza, Eulogio; Rodríguez-Tovar, Francisco J.; Dorador, Javier

    2016-10-01

    Time series of categorical variables often appear in Earth Science disciplines and there is considerable interest in studying their cyclic behavior. This is true, for example, when the type of facies, petrofabric features, ichnofabrics, fossil assemblages or mineral compositions are measured continuously over a core or throughout a stratigraphic succession. Here we deal with the problem of applying spectral analysis to such sequences. A full indicator approach is proposed to complement the spectral envelope often used in other disciplines. Additionally, a stand-alone computer program is provided for calculating the spectral envelope, in this case implementing the permutation test to assess the statistical significance of the spectral peaks. We studied simulated sequences as well as real data in order to illustrate the methodology.
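
    One simple reading of an indicator approach is to turn each category into a 0/1 series and compute an ordinary periodogram for it. The sketch below does only that; it is not the authors' program, it omits the spectral envelope and the permutation test, and the facies codes are hypothetical.

```python
import cmath

def indicator_series(sequence, category):
    # 1.0 where the category is observed, 0.0 elsewhere.
    return [1.0 if s == category else 0.0 for s in sequence]

def periodogram(values):
    # Plain discrete Fourier periodogram of the mean-centered series.
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(centered))
        power.append((k / n, abs(coeff) ** 2 / n))
    return power  # list of (frequency in cycles per sample, power)

def dominant_frequency(sequence, category):
    spectrum = periodogram(indicator_series(sequence, category))
    return max(spectrum, key=lambda fp: fp[1])[0]

# Hypothetical core log: sand (S) and mud (M) alternating every two samples.
facies = list("SSMM" * 8)
freq = dominant_frequency(facies, "S")
```

    For this synthetic log the dominant frequency is 0.25 cycles per sample, i.e. a period of four samples, which matches how the sequence was built.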

  19. A two-step database search method improves sensitivity in peptide sequence matches for metaproteomics and proteogenomics studies.

    PubMed

    Jagtap, Pratik; Goslinga, Jill; Kooren, Joel A; McGowan, Thomas; Wroblewski, Matthew S; Seymour, Sean L; Griffin, Timothy J

    2013-04-01

    Large databases (>10^6 sequences) used in metaproteomic and proteogenomic studies present challenges in matching peptide sequences to MS/MS data using database-search programs. Most notably, strict filtering to avoid false-positive matches leads to more false negatives, thus constraining the number of peptide matches. To address this challenge, we developed a two-step method wherein matches derived from a primary search against a large database were used to create a smaller subset database. The second search was performed against a target-decoy version of this subset database merged with a host database. High confidence peptide sequence matches were then used to infer protein identities. Applying our two-step method for both metaproteomic and proteogenomic analysis resulted in twice the number of high confidence peptide sequence matches in each case, as compared to the conventional one-step method. The two-step method captured almost all of the same peptides matched by the one-step method, with a majority of the additional matches being false negatives from the one-step method. Furthermore, the two-step method improved results regardless of the database search program used. Our results show that our two-step method maximizes the peptide matching sensitivity for applications requiring large databases, especially valuable for proteogenomics and metaproteomics studies. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
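
    The data flow of the two-step method can be sketched as follows. The `search` function is a toy stand-in (exact substring matching) for a real database-search program, the decoys are simple reversed sequences, and building decoys for the host proteins as well is an assumption; the sequences are invented. The sketch shows only how the databases are assembled, not the scoring or false-discovery filtering that gives the real method its sensitivity gain.

```python
def search(peptides, database):
    # Toy stand-in for a database-search program: exact substring matching.
    matches = {}
    for pep in peptides:
        hits = [name for name, seq in database.items() if pep in seq]
        if hits:
            matches[pep] = hits
    return matches

def two_step_search(peptides, large_db, host_db):
    # Step 1: primary search against the large database.
    primary = search(peptides, large_db)
    # Keep only the proteins that received at least one match.
    subset = {name: large_db[name] for hits in primary.values() for name in hits}
    # Step 2: merge with the host database and add reversed-sequence decoys.
    targets = {**subset, **host_db}
    decoys = {"DECOY_" + name: seq[::-1] for name, seq in targets.items()}
    return search(peptides, {**targets, **decoys})

# Hypothetical sequences, for illustration only.
large_db = {"P1": "MKTAYIAKQR", "P2": "GGGSSSLLLP", "P3": "QQQWWWEEEK"}
host_db = {"H1": "MSTNPKPQRK"}
result = two_step_search(["TAYIAK", "NPKPQ", "ZZZZ"], large_db, host_db)
```

    In a real analysis the decoy hits from the second search would be used to estimate the false discovery rate on the much smaller combined database.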

  20. Multi-Harmony: detecting functional specificity from sequence alignment

    PubMed Central

    Brandt, Bernd W.; Feenstra, K. Anton; Heringa, Jaap

    2010-01-01

    Many protein families contain sub-families with functional specialization, such as binding different ligands or being involved in different protein–protein interactions. A small number of amino acids generally determine functional specificity. The identification of these residues can aid the understanding of protein function and help find targets for experimental analysis. Here, we present multi-Harmony, an interactive web server for detecting sub-type-specific sites in proteins starting from a multiple sequence alignment. Combining our Sequence Harmony (SH) and multi-Relief (mR) methods in one web server allows simultaneous analysis and comparison of specificity residues; furthermore, both methods have been significantly improved and extended. SH has been extended to cope with more than two sub-groups. mR has been changed from a sampling implementation to a deterministic one, making it more consistent and user friendly. For both methods Z-scores are reported. The multi-Harmony web server produces a dynamic output page, which includes interactive connections to the Jalview and Jmol applets, thereby allowing interactive analysis of the results. Multi-Harmony is available at http://www.ibi.vu.nl/programs/shmrwww. PMID:20525785

  1. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    PubMed Central

    Issa, Shadi A.; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J.; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation. PMID:23710461
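
    The idea of processing reads while they are still being transferred can be illustrated with a generator that emits each FASTQ record as soon as its four lines have arrived. This is a minimal sketch and not the elastream package: the function names are made up, the per-read task (GC content) is an arbitrary example of an analysis that treats reads independently, and the chunked list stands in for a network stream.

```python
def fastq_records(chunks):
    """Yield (header, sequence, quality) once four complete lines are buffered."""
    buffer = ""
    lines = []
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; keep the rest.
        *complete, buffer = buffer.split("\n")
        for line in complete:
            lines.append(line)
            if len(lines) == 4:
                yield lines[0], lines[1], lines[3]
                lines = []

def gc_content(seq):
    # Fraction of G and C bases in one read.
    return sum(base in "GC" for base in seq) / len(seq)

def stream_gc(chunks):
    # Each read is handled independently, so work can start before
    # the transfer has finished.
    for header, seq, _ in fastq_records(chunks):
        yield header, gc_content(seq)

data = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n"
chunks = [data[i:i + 5] for i in range(0, len(data), 5)]  # simulated transfer
results = list(stream_gc(chunks))
```

    Because the generator never needs the whole file, the first result is available after the first record arrives, regardless of how the stream is split into chunks.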

  2. An Interactive Visualization Framework to Support Exploration and Analysis of TBI/PTSD Clinical Data

    DTIC Science & Technology

    2017-05-01

    techniques to overcome some of the challenges and complexities of the data . Our approach uses a novel adaptive window-based frequency sequence mining ...AWARD NUMBER: W81XWH-15-2-0016 TITLE: An Interactive Visualization Framework to Support Exploration and Analysis of TBI/PTSD Clinical Data ...Analysis of TBI/PTSD Clinical Data 5a. CONTRACT NUMBER 5b. GRANT NUMBER W81XWH-15-2-0016 5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) Dr. Jesus Caban 5d

  3. m6aViewer: software for the detection, analysis, and visualization of N6-methyladenosine peaks from m6A-seq/ME-RIP sequencing data.

    PubMed

    Antanaviciute, Agne; Baquero-Perez, Belinda; Watson, Christopher M; Harrison, Sally M; Lascelles, Carolina; Crinnion, Laura; Markham, Alexander F; Bonthron, David T; Whitehouse, Adrian; Carr, Ian M

    2017-10-01

    Recent methods for transcriptome-wide N6-methyladenosine (m6A) profiling have facilitated investigations into the RNA methylome and established m6A as a dynamic modification that has critical regulatory roles in gene expression and may play a role in human disease. However, bioinformatics resources available for the analysis of m6A sequencing data are still limited. Here, we describe m6aViewer, a cross-platform application for analysis and visualization of m6A peaks from sequencing data. m6aViewer implements a novel m6A peak-calling algorithm that identifies high-confidence methylated residues with more precision than previously described approaches. The application enables data analysis through a graphical user interface, and thus, in contrast to other currently available tools, does not require the user to be skilled in computer programming. m6aViewer and test data can be downloaded here: http://dna2.leeds.ac.uk/m6a. © 2017 Antanaviciute et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  4. Dynamic assessment of microbial ecology (DAME): a web app for interactive analysis and visualization of microbial sequencing data.

    PubMed

    Piccolo, Brian D; Wankhade, Umesh D; Chintapalli, Sree V; Bhattacharyya, Sudeepa; Chunqiao, Luo; Shankar, Kartik

    2018-03-15

    Dynamic assessment of microbial ecology (DAME) is a Shiny-based web application for interactive analysis and visualization of microbial sequencing data. DAME provides researchers not familiar with R programming the ability to access the most current R functions utilized for ecology and gene sequencing data analyses. Currently, DAME supports group comparisons of several ecological estimates of α-diversity and β-diversity, along with differential abundance analysis of individual taxa. Using the Shiny framework, the user has complete control of all aspects of the data analysis, including sample/experimental group selection and filtering, estimate selection, statistical methods and visualization parameters. Furthermore, graphical and tabular outputs are supported by R packages using D3.js and are fully interactive. DAME was implemented in R but can be modified by Hypertext Markup Language (HTML), Cascading Style Sheets (CSS), and JavaScript. It is freely available on the web at https://acnc-shinyapps.shinyapps.io/DAME/. Local installation and source code are available through Github (https://github.com/bdpiccolo/ACNC-DAME). Any system with R can launch DAME locally provided the shiny package is installed. Contact: bdpiccolo@uams.edu.

  5. A De Novo-Assembly Based Data Analysis Pipeline for Plant Obligate Parasite Metatranscriptomic Studies.

    PubMed

    Guo, Li; Allen, Kelly S; Deiulio, Greg; Zhang, Yong; Madeiras, Angela M; Wick, Robert L; Ma, Li-Jun

    2016-01-01

    Current and emerging plant diseases caused by obligate parasitic microbes such as rusts, downy mildews, and powdery mildews threaten worldwide crop production and food safety. These obligate parasites are typically unculturable in the laboratory, posing technical challenges to characterize them at the genetic and genomic level. Here we have developed a data analysis pipeline integrating several bioinformatic software programs. This pipeline facilitates rapid gene discovery and expression analysis of a plant host and its obligate parasite simultaneously by next generation sequencing of mixed host and pathogen RNA (i.e., metatranscriptomics). We applied this pipeline to metatranscriptomic sequencing data of sweet basil (Ocimum basilicum) and its obligate downy mildew parasite Peronospora belbahrii, both lacking a sequenced genome. Even with a single data point, we were able to identify both candidate host defense genes and pathogen virulence genes that are highly expressed during infection. This demonstrates the power of this pipeline for identifying genes important in host-pathogen interactions without prior genomic information for either the plant host or the obligate biotrophic pathogen. The simplicity of this pipeline makes it accessible to researchers with limited computational skills and applicable to metatranscriptomic data analysis in a wide range of plant-obligate-parasite systems.

  6. SWPhylo - A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees.

    PubMed

    Yu, Xiaoyu; Reva, Oleg N

    2018-01-01

    Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, inappropriateness of standard phylogenetic tools to process genome-wide data, and lack of reliable substitution models which correlates with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, e.g., GyrA.

  7. SWPhylo – A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees

    PubMed Central

    Yu, Xiaoyu; Reva, Oleg N

    2018-01-01

    Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, inappropriateness of standard phylogenetic tools to process genome-wide data, and lack of reliable substitution models which correlates with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, e.g., GyrA. PMID:29511354
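
    An alignment-free comparison of tetranucleotide frequencies can be sketched as below. The abstract does not state how SWPhylo turns oligonucleotide usage patterns into distances, so the Euclidean distance used here is an assumption, as are the function names and the toy sequences; each input is assumed to contain at least one 4-mer made only of A, C, G and T.

```python
from collections import Counter
from itertools import product
import math

# All 256 possible tetranucleotides, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def tetranucleotide_frequencies(genome):
    # Count overlapping 4-mers; windows with other symbols (e.g. N) are ignored.
    counts = Counter(genome[i:i + 4] for i in range(len(genome) - 3))
    total = sum(counts[k] for k in KMERS)
    return [counts[k] / total for k in KMERS]

def oup_distance(genome_a, genome_b):
    # Euclidean distance between the two 256-dimensional frequency vectors.
    fa = tetranucleotide_frequencies(genome_a)
    fb = tetranucleotide_frequencies(genome_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))

# Toy sequences, for illustration only.
g1 = "ACGT" * 10
g2 = "AAAA" * 10
d = oup_distance(g1, g2)
```

    A matrix of such pairwise distances could then be passed to any distance-based tree-building method, such as neighbor joining.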

  8. CWDPRNP: A tool for cervid prion sequence analysis in program R

    USGS Publications Warehouse

    Miller, William L.; Walter, W. David

    2017-01-01

    Chronic wasting disease is a fatal, neurological disease caused by an infectious prion protein, which affects economically and ecologically important members of the family Cervidae. Single nucleotide polymorphisms within the prion protein gene have been linked to differential susceptibility to the disease in many species. Wildlife managers are seeking to determine the frequencies of disease-associated alleles and genotypes and delineate spatial genetic patterns. The CWDPRNP package, implemented in program R, provides a unified framework for analyzing prion protein gene variability and spatial structure.

  9. GeneBee-net: Internet-based server for analyzing biopolymers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brodsky, L.I.; Ivanov, V.V.; Nikolaev, V.K.

    This work describes a network server for searching databanks of biopolymer structures and performing other biocomputing procedures; it is available via direct Internet connection. Basic server procedures are dedicated to homology (similarity) search of sequence and 3D structure of proteins. The homologies found could be used to build multiple alignments, predict protein and RNA secondary structure, and construct phylogenetic trees. In addition to traditional methods of sequence similarity search, the authors propose "non-matrix" (correlational) search. An analogous approach is used to identify regions of similar tertiary structure of proteins. Algorithm concepts and usage examples are presented for new methods. Service logic is based upon interaction of a client program and server procedures. The client program allows the compilation of queries and the processing of results of an analysis.

  10. Genome-Wide Analysis of the Arabidopsis Replication Timing Program

    PubMed Central

    Brooks, Ashley M.; Wheeler, Emily; LeBlanc, Chantal; Lee, Tae-Jin; Martienssen, Robert A.; Thompson, William F.

    2018-01-01

    Eukaryotes use a temporally regulated process, known as the replication timing program, to ensure that their genomes are fully and accurately duplicated during S phase. Replication timing programs are predictive of genomic features and activity and are considered to be functional readouts of chromatin organization. Although replication timing programs have been described for yeast and animal systems, much less is known about the temporal regulation of plant DNA replication or its relationship to genome sequence and chromatin structure. We used the thymidine analog, 5-ethynyl-2′-deoxyuridine, in combination with flow sorting and Repli-Seq to describe, at high-resolution, the genome-wide replication timing program for Arabidopsis (Arabidopsis thaliana) Col-0 suspension cells. We identified genomic regions that replicate predominantly during early, mid, and late S phase, and correlated these regions with genomic features and with data for chromatin state, accessibility, and long-distance interaction. Arabidopsis chromosome arms tend to replicate early while pericentromeric regions replicate late. Early and mid-replicating regions are gene-rich and predominantly euchromatic, while late regions are rich in transposable elements and primarily heterochromatic. However, the distribution of chromatin states across the different times is complex, with each replication time corresponding to a mixture of states. Early and mid-replicating sequences interact with each other and not with late sequences, but early regions are more accessible than mid regions. The replication timing program in Arabidopsis reflects a bipartite genomic organization with early/mid-replicating regions and late regions forming separate, noninteracting compartments. The temporal order of DNA replication within the early/mid compartment may be modulated largely by chromatin accessibility. PMID:29301956

  11. Biosequence Similarity Search on the Mercury System

    PubMed Central

    Krishnamurthy, Praveen; Buhler, Jeremy; Chamberlain, Roger; Franklin, Mark; Gyang, Kwame; Jacob, Arpith; Lancaster, Joseph

    2007-01-01

    Biosequence similarity search is an important application in modern molecular biology. Search algorithms aim to identify sets of sequences whose extensional similarity suggests a common evolutionary origin or function. The most widely used similarity search tool for biosequences is BLAST, a program designed to compare query sequences to a database. Here, we present the design of BLASTN, the version of BLAST that searches DNA sequences, on the Mercury system, an architecture that supports high-volume, high-throughput data movement off a data store and into reconfigurable hardware. An important component of application deployment on the Mercury system is the functional decomposition of the application onto both the reconfigurable hardware and the traditional processor. Both the Mercury BLASTN application design and its performance analysis are described. PMID:18846267

  12. TRAC-P1: an advanced best estimate computer program for PWR LOCA analysis. I. Methods, models, user information, and programming details

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1978-05-01

    The Transient Reactor Analysis Code (TRAC) is being developed at the Los Alamos Scientific Laboratory (LASL) to provide an advanced "best estimate" predictive capability for the analysis of postulated accidents in light water reactors (LWRs). TRAC-P1 provides this analysis capability for pressurized water reactors (PWRs) and for a wide variety of thermal-hydraulic experimental facilities. It features a three-dimensional treatment of the pressure vessel and associated internals; two-phase nonequilibrium hydrodynamics models; flow-regime-dependent constitutive equation treatment; reflood tracking capability for both bottom flood and falling film quench fronts; and consistent treatment of entire accident sequences including the generation of consistent initial conditions. The TRAC-P1 User's Manual is composed of two separate volumes. Volume I gives a description of the thermal-hydraulic models and numerical solution methods used in the code. Detailed programming and user information is also provided. Volume II presents the results of the developmental verification calculations.

  13. DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.

    PubMed

    Schmollinger, Martin; Nieselt, Kay; Kaufmann, Michael; Morgenstern, Burkhard

    2004-09-09

    Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. By distributing sub-routines to multiple processors, the running time of DIALIGN can be substantially reduced. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
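
    Strategy (a) above relies on the fact that each sequence pair can be processed independently. The following sketch shows only that distribution pattern; it is not DIALIGN code. The function names are hypothetical, and the scoring function is a toy stand-in (a count of identical positions) rather than a real pairwise alignment.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def pairwise_score(pair):
    """Toy stand-in for a pairwise alignment: count identical positions."""
    (name_a, seq_a), (name_b, seq_b) = pair
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return (name_a, name_b), matches


def all_pairwise_scores(sequences, workers=4):
    """Score every sequence pair, spreading independent pairs over workers.

    sequences: dict mapping a sequence name to its sequence string.
    """
    pairs = list(combinations(sorted(sequences.items()), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the output does not
        # depend on how the work was scheduled across workers.
        return dict(pool.map(pairwise_score, pairs))
```

    Threads are used here only to keep the sketch short. For CPU-bound alignment work in Python, a process pool (or separate machines, as in the paper) would be the more realistic choice.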

  14. Hepatitis C infection among intravenous drug users attending therapy programs in Cyprus.

    PubMed

    Demetriou, Victoria L; van de Vijver, David A M C; Hezka, Johana; Kostrikis, Leondios G

    2010-02-01

    The most high-risk population for HCV transmission worldwide today are intravenous drug users. HCV genotypes in the general population in Cyprus demonstrate a polyphyletic infection and include subtypes associated with intravenous drug users. The prevalence of HCV, HBV, and HIV infection, HCV genotypes and risk factors among intravenous drug users in Cyprus were investigated here for the first time. Blood samples and interviews were obtained from 40 consenting users in treatment centers, and were tested for HCV, HBV, and HIV antibodies. On the HCV-positive samples, viral RNA extraction, RT-PCR and sequencing were performed. Phylogenetic analysis determined subtype and any relationships with database sequences and statistical analysis determined any correlation of risk factors with HCV infection. The prevalence of HCV infection was 50%, but no HBV or HIV infections were found. Of the PCR-positive samples, eight (57%) were genotype 3a, and six (43%) were 1b. No other subtypes, recombinant strains or mixed infections were observed. The phylogenetic analysis of the injecting drug users' strains against database sequences observed no clustering, which does not allow determination of transmission route, possibly due to a limitation of sequences in the database. However, three clusters were discovered among the drug users' sequences, revealing small groups who possibly share injecting equipment. Statistical analysis showed the risk factor associated with HCV infection is drug use duration. Overall, the polyphyletic nature of HCV infection in Cyprus is confirmed, but the transmission route remains unknown. These findings highlight the need for harm-reduction strategies to reduce HCV transmission. (c) 2009 Wiley-Liss, Inc.

  15. Mississippi Curriculum Framework for Dental Assisting Technology Programs (Program CIP: 51.0601--Dental Assistant). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the dental assisting technology program. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies. Section II…

  16. Mississippi Curriculum Framework for Welding and Cutting Programs (Program CIP: 48.0508--Welder/Welding Technologist). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the welding and cutting programs cluster. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies, and…

  17. Put Your Robot In, Put Your Robot Out: Sequencing through Programming Robots in Early Childhood

    ERIC Educational Resources Information Center

    Kazakoff, Elizabeth R.; Bers, Marina Umaschi

    2014-01-01

    This article examines the impact of programming robots on sequencing ability in early childhood. Thirty-four children (ages 4.5-6.5 years) participated in computer programming activities with a developmentally appropriate tool, CHERP, specifically designed to program a robot's behaviors. The children learned to build and program robots over three…

  18. Mississippi Curriculum Framework for Surgical Technology Programs (CIP: 51.0909--Surgical/Operating Room Tech.). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the surgical technology program. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies for the program,…

  19. Pulse Sequence Programming in a Dynamic Visual Environment: SequenceTree

    PubMed Central

    Magland, Jeremy F.; Li, Cheng; Langham, Michael C.; Wehrli, Felix W.

    2015-01-01

    Purpose To describe SequenceTree (ST), an open source, integrated software environment for implementing MRI pulse sequences and, ideally, exporting them to actual MRI scanners. The software is a user-friendly alternative to vendor-supplied pulse sequence design and editing tools and is suited for non-programmers and programmers alike. Methods The integrated user interface was programmed using the Qt4/C++ toolkit. As parameters and code are modified, the pulse sequence diagram is automatically updated within the user interface. Several aspects of pulse programming are handled automatically, allowing users to focus on higher-level aspects of sequence design. Sequences can be simulated using a built-in Bloch equation solver and then exported for use on a Siemens MRI scanner. Ideally other types of scanners will be supported in the future. Results The software has been used for eight years in the authors’ laboratory and elsewhere and has been utilized in more than fifty peer-reviewed publications in areas such as cardiovascular imaging, solid state and non-proton NMR, MR elastography, and high resolution structural imaging. Conclusion ST is an innovative, open source, visual pulse sequence environment for MRI combining simplicity with flexibility and is ideal for both advanced users and those with limited programming experience. PMID:25754837

  20. GUI and Object Oriented Programming in COBOL.

    ERIC Educational Resources Information Center

    Lorents, Alden C.

    Various schools are struggling with the introduction of Object Oriented (OO) programming concepts and GUI (graphical user interfaces) within the traditional COBOL sequence. OO programming has been introduced in some of the curricula with languages such as C++, Smalltalk, and Java. Introducing OO programming into a typical COBOL sequence presents…

  1. A suppression hierarchy among competing motor programs drives sequential grooming in Drosophila

    PubMed Central

    Seeds, Andrew M; Ravbar, Primoz; Chung, Phuong; Hampel, Stefanie; Midgley, Frank M; Mensh, Brett D; Simpson, Julie H

    2014-01-01

    Motor sequences are formed through the serial execution of different movements, but how nervous systems implement this process remains largely unknown. We determined the organizational principles governing how dirty fruit flies groom their bodies with sequential movements. Using genetically targeted activation of neural subsets, we drove distinct motor programs that clean individual body parts. This enabled competition experiments revealing that the motor programs are organized into a suppression hierarchy; motor programs that occur first suppress those that occur later. Cleaning one body part reduces the sensory drive to its motor program, which relieves suppression of the next movement, allowing the grooming sequence to progress down the hierarchy. A model featuring independently evoked cleaning movements activated in parallel, but selected serially through hierarchical suppression, was successful in reproducing the grooming sequence. This provides the first example of an innate motor sequence implemented by the prevailing model for generating human action sequences. DOI: http://dx.doi.org/10.7554/eLife.02951.001 PMID:25139955
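
    The model described in the abstract (cleaning programs activated in parallel, selected serially by hierarchical suppression, with cleaning reducing the sensory drive) can be sketched as a small simulation. This is an illustrative toy, not the authors' model: the function name, the threshold, the multiplicative cleaning rate, and any body-part labels passed in are all assumptions.

```python
def grooming_sequence(dust, order, threshold=0.2, cleaning_rate=0.5, max_steps=100):
    """Simulate serial selection of cleaning programs by hierarchical suppression.

    dust:  dict mapping a body part to its dust level (the sensory drive).
    order: body parts listed from the top of the hierarchy to the bottom.
    Returns the list of body parts cleaned, one entry per time step.
    """
    dust = dict(dust)  # work on a copy so the caller's dict is unchanged
    performed = []
    for _ in range(max_steps):
        # Every program with enough sensory drive is active in parallel ...
        active = [part for part in order if dust[part] > threshold]
        if not active:
            break
        # ... but the highest-ranked active program suppresses all the others.
        winner = active[0]
        performed.append(winner)
        # Cleaning lowers the drive, which eventually releases the next program.
        dust[winner] *= 1.0 - cleaning_rate
    return performed
```

    With equal starting dust on every body part, the output progresses down the hierarchy one part at a time, which is the qualitative behavior the abstract describes.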

  2. Power Processing, Part 1. Electric Machinery Analysis.

    ERIC Educational Resources Information Center

    Hamilton, Howard B.

    This publication was developed as a portion of a two-semester sequence commencing at either the sixth or seventh term of the undergraduate program in electrical engineering at the University of Pittsburgh. The materials of the two courses, produced by a National Science Foundation grant, are concerned with power conversion systems comprising power…

  3. Problem Manual for Power Processing, Part 1. Electric Machinery Analysis.

    ERIC Educational Resources Information Center

    Hamilton, Howard B.

    This publication was developed as a portion of a two-semester sequence commencing at either the sixth or seventh term of the undergraduate program in electrical engineering at the University of Pittsburgh. The materials of the two courses, produced by a National Science Foundation grant, are concerned with power conversion systems comprising power…

  4. Cooperative Interactions in Peer Tutoring: Patterns and Sequences in Paired Writing

    ERIC Educational Resources Information Center

    Duran, David

    2010-01-01

    The research analyzes the interaction of 24 students (12 pairs) of secondary students when using peer tutoring techniques to learn Catalan. Students worked together in a program to produce an authentic writing experience. Significant increases were observed in pre- and posttest Catalan attainment scores of students. An analysis of the…

  5. NECAP 4.1: NASA's Energy Cost Analysis Program thermal response factor routine

    NASA Astrophysics Data System (ADS)

    Weise, M. R.

    1982-08-01

    A thermal response factor is described and calculation sequences and flowcharts for RESFAC2 are provided. RESFAC is used by NASA's Energy Cost Analysis Program (NECAP) to calculate hourly heat transfer coefficients (thermal response factors) for each unique delayed surface. NECAP uses these response factors to compute each space's hourly heat gain/loss.

  6. Debugging and Analysis of Large-Scale Parallel Programs

    DTIC Science & Technology

    1989-09-01

    Przybylski, T. Riordan , C. Rowen, and D. Van’t Hof, "A CMOS RISC Processor with Integrated System Functions," In Proc. of the 1986 COMPCON. IEEE, March 1986...Sequencers," Communications of the ACM, 22(2):115-123, 1979. 115 [Richardson, 1988] Rick Richardson, "Dhrystone 2.1 Benchmark," Usenet Distribution

  7. Laboratory Manual for Power Processing, Part 1. Electric Machinery Analysis.

    ERIC Educational Resources Information Center

    Hamilton, Howard B.

    This publication was developed as a portion of a two-semester sequence commencing at either the sixth or seventh term of the undergraduate program in electrical engineering at the University of Pittsburgh. The materials of the two courses, produced by a National Science Foundation grant, are concerned with power conversion systems comprising power…

  8. A Quantitative Analysis of Methods Used for Avoidance and Acceleration of Developmental Mathematics Sequences in Community College

    ERIC Educational Resources Information Center

    Travers, Steven T.

    2017-01-01

    Many developmental mathematics programs at community colleges in recent years have undergone a process of redesign in an attempt to increase the historically poor rate of successful student completion of required developmental coursework. Various curriculum and instructional design models that incorporate methods of avoiding and accelerating the…

  9. Diagnosis of Lung Cancer by Fractal Analysis of Damaged DNA

    PubMed Central

    Namazi, Hamidreza; Kiminezhadmalaie, Mona

    2015-01-01

    Cancer starts when cells in a part of the body start to grow out of control. In fact, cells become cancer cells because of DNA damage. A DNA walk of a genome represents how the frequency of each nucleotide of a pairing nucleotide couple changes locally. In this research, in order to study cancer genes, DNA walk plots of genomes of patients with lung cancer were generated using a program written in MATLAB. The data so obtained were checked for fractal properties by computing the fractal dimension, also using a program written in MATLAB. In addition, the correlation of damaged DNA was studied using the Hurst exponent. We found that the damaged DNA sequences exhibit a higher degree of fractality and less correlation than normal DNA sequences. We therefore conclude that this method can be used for early detection of lung cancer. The method introduced in this research is not only useful for diagnosis of lung cancer but can also be applied to the detection and growth analysis of different types of cancer. PMID:26539245
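
    A DNA walk of the kind mentioned above can be sketched in a few lines. The version below (in Python rather than the MATLAB used by the authors) assumes one common convention: step up for one member of a pairing couple, step down for the other, and stay level otherwise. The abstract does not specify the exact stepping rule, so the function name and the rule itself are assumptions.

```python
def dna_walk(seq, up="A", down="T"):
    """Cumulative walk: +1 for `up`, -1 for `down`, 0 for any other base."""
    position = 0
    walk = []
    for base in seq.upper():
        if base == up:
            position += 1
        elif base == down:
            position -= 1
        walk.append(position)  # record the position after every base
    return walk
```

    For example, `dna_walk("AATTGC")` steps up twice, down twice, and then stays flat, giving `[1, 2, 1, 0, 0, 0]`. Fractal dimension and Hurst exponent estimates would be computed from such a series in a separate step not shown here.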

  10. Development of conditioning programs for dressage horses based on time-motion analysis of competitions.

    PubMed

    Clayton, H M

    1993-05-01

    The time-motion characteristics of Canadian basic- and medium-level dressage competitions are described, and the results are applied in formulating sport-specific conditioning programs. One competition was analyzed at the six levels from basic 1 to medium 3. Each test was divided into a series of sequences based on the type and speed of activity. The durations of the sequences were measured from videotapes. The basic-level tests had fewer sequences, and they were shorter in distance and duration than the medium tests (P < 0.10), but the average speed did not differ between the two levels. It is recommended that horses competing at the basic levels be conditioned using 5-min exercise periods, with short (10-s) bursts of lengthened trot and canter included at basic 2 and above. In preparation for medium-level competitions, the duration of the work periods increases to 7 min, 10- to 12-s bursts of medium or extended trot and canter are included, and transitions are performed frequently to simulate the energy expenditure in overcoming inertia.

  11. T-Reg Comparator: an analysis tool for the comparison of position weight matrices

    PubMed Central

    Roepcke, Stefan; Grossmann, Steffen; Rahmann, Sven; Vingron, Martin

    2005-01-01

    T-Reg Comparator is a novel software tool designed to support research into transcriptional regulation. Sequence motifs representing transcription factor binding sites are usually encoded as position weight matrices. The user inputs a set of such weight matrices or binding site sequences and our program matches them against the T-Reg database, which is presently built on data from the Transfac [E. Wingender (2004) In Silico Biol., 4, 55–61] and Jaspar [A. Sandelin, W. Alkema, P. Engstrom, W. W. Wasserman and B. Lenhard (2004) Nucleic Acids Res., 32, D91–D94]. Our tool delivers a detailed report on similarities between user-supplied motifs and motifs in the database. Apart from simple one-to-one relationships, T-Reg Comparator is also able to detect similarities between submatrices. In addition, we provide a user interface to a program for sequence scanning with weight matrices. Typical areas of application for T-Reg Comparator are motif and regulatory module finding and annotation of regulatory genomic regions. T-Reg Comparator is available at http://treg.molgen.mpg.de. PMID:15980506

  12. T-Reg Comparator: an analysis tool for the comparison of position weight matrices.

    PubMed

    Roepcke, Stefan; Grossmann, Steffen; Rahmann, Sven; Vingron, Martin

    2005-07-01

    T-Reg Comparator is a novel software tool designed to support research into transcriptional regulation. Sequence motifs representing transcription factor binding sites are usually encoded as position weight matrices. The user inputs a set of such weight matrices or binding site sequences and our program matches them against the T-Reg database, which is presently built on data from the Transfac [E. Wingender (2004) In Silico Biol., 4, 55-61] and Jaspar [A. Sandelin, W. Alkema, P. Engstrom, W. W. Wasserman and B. Lenhard (2004) Nucleic Acids Res., 32, D91-D94]. Our tool delivers a detailed report on similarities between user-supplied motifs and motifs in the database. Apart from simple one-to-one relationships, T-Reg Comparator is also able to detect similarities between submatrices. In addition, we provide a user interface to a program for sequence scanning with weight matrices. Typical areas of application for T-Reg Comparator are motif and regulatory module finding and annotation of regulatory genomic regions. T-Reg Comparator is available at http://treg.molgen.mpg.de.
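
    To make the idea of scanning a sequence with a position weight matrix concrete, here is a minimal sketch. It is not T-Reg Comparator's implementation: the function names, the pseudocount, the uniform background of 0.25 per base, and the log2-odds scoring are assumptions chosen because they are a common textbook formulation.

```python
from math import log2

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    counts: list of dicts, one per motif position, mapping base -> count.
    """
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + pseudocount * len(BASES)
        matrix.append({
            b: log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def scan(sequence, matrix):
    """Yield (offset, score) for every window as wide as the matrix."""
    width = len(matrix)
    sequence = sequence.upper()
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if all(base in BASES for base in window):  # skip ambiguous windows
            yield i, sum(col[base] for col, base in zip(matrix, window))
```

    A caller would typically keep only windows whose score exceeds a chosen threshold; comparing two matrices with each other, as the tool does, is a different problem that this sketch does not address.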

  13. Mississippi Curriculum Framework for Horticulture Technology Cluster (Program CIP: 01.0601--Horticulture Serv. Op. & Mgmt., Gen.) (Program CIP: 01.0605--Landscaping Op. & Mgmt.). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the horticulture technology programs cluster. Presented in the introductory section are a framework of programs and courses, description of the programs, and suggested course sequences for…

  14. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ruggles, Kelly V.; Tang, Zuojian; Wang, Xuya

    Improvements in mass spectrometry (MS)-based peptide sequencing provide a new opportunity to determine whether polymorphisms, mutations and splice variants identified in cancer cells are translated. Herein we therefore describe a proteogenomic data integration tool (QUILTS) and illustrate its application to whole genome, transcriptome and global MS peptide sequence datasets generated from a pair of luminal and basal-like breast cancer patient derived xenografts (PDX). The sensitivity of proteogenomic analysis for single nucleotide variant (SNV) expression and novel splice junction (NSJ) detection was probed using multiple MS/MS process replicates. Despite over thirty sample replicates, only about 10% of all SNV (somatic and germline) detected by both DNA and RNA sequencing were observed as peptides. An even smaller proportion of peptides corresponding to NSJ observed by RNA sequencing were detected (<0.1%). Peptides mapping to DNA-detected SNV without a detectable mRNA transcript were also observed, demonstrating that the transcriptome coverage was also incomplete (~80%). In contrast to germ-line variants, somatic variants were less likely to be detected at the peptide level in the basal-like tumor than the luminal tumor, raising the possibility of differential translation or protein degradation effects. In conclusion, the QUILTS program integrates DNA, RNA and peptide sequencing to assess the degree to which somatic mutations are translated and therefore biologically active. By identifying gaps in sequence coverage QUILTS benchmarks current technology and assesses progress towards whole cancer proteome and transcriptome analysis.

  15. Development of cleaved amplified polymorphic sequence markers and a CAPS-based genetic linkage map in watermelon (Citrullus lanatus [Thunb.] Matsum. and Nakai) constructed using whole-genome re-sequencing data

    PubMed Central

    Liu, Shi; Gao, Peng; Zhu, Qianglong; Luan, Feishi; Davis, Angela R.; Wang, Xiaolu

    2016-01-01

    Cleaved amplified polymorphic sequence (CAPS) markers are useful tools for detecting single nucleotide polymorphisms (SNPs). This study detected and converted SNP sites into CAPS markers based on high-throughput re-sequencing data in watermelon, for linkage map construction and quantitative trait locus (QTL) analysis. Two inbred lines, Cream of Saskatchewan (COS) and LSW-177, were re-sequenced and analyzed with a custom Perl script for CAPS marker development. 88.7% and 78.5% of the assembled sequences of the two parental materials could map to the reference watermelon genome, respectively. Comparative assembled genome data analysis provided 225,693 and 19,268 SNPs and indels between the two materials. 532 pairs of CAPS markers were designed with 16 restriction enzymes, among which 271 pairs of primers gave distinct bands of the expected length and polymorphic bands, via PCR and enzyme digestion, with a polymorphic rate of 50.94%. Using the new CAPS markers, an initial CAPS-based genetic linkage map was constructed with the F2 population, spanning 1836.51 cM with 11 linkage groups and 301 markers. 12 QTLs were detected related to fruit flesh color, length, width, shape index, and brix content. These new CAPS markers will be a valuable resource for breeding programs and genetic studies of watermelon. PMID:27162496
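
    Converting a SNP into a CAPS marker depends on the SNP creating or destroying a restriction enzyme recognition site, so that the two alleles digest differently. The sketch below illustrates that check only; it is written in Python rather than the Perl used in the study, the function names are hypothetical, and the three enzymes listed are well-known examples, not the 16 enzymes the authors used (which the abstract does not name).

```python
# Recognition sites of three common enzymes (all palindromic, so only one
# strand needs to be searched). Illustrative choice, not the study's set.
ENZYME_SITES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "BamHI": "GGATCC"}


def count_sites(seq, site):
    """Count (possibly overlapping) occurrences of site in seq."""
    n = len(site)
    return sum(1 for i in range(len(seq) - n + 1) if seq[i:i + n] == site)


def caps_enzymes(flank_left, allele_a, allele_b, flank_right, sites=ENZYME_SITES):
    """Return enzymes whose site count differs between the two alleles."""
    seq_a = (flank_left + allele_a + flank_right).upper()
    seq_b = (flank_left + allele_b + flank_right).upper()
    return sorted(
        name for name, site in sites.items()
        if count_sites(seq_a, site) != count_sites(seq_b, site)
    )
```

    For instance, `caps_enzymes("ACGTGA", "A", "G", "TTCACG")` reports `["EcoRI"]`, because only the A allele completes a GAATTC site. Primer design around the site is a separate step not covered here.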

  16. Novel species including Mycobacterium fukienense sp. is found from tuberculosis patients in Fujian Province, China, using phylogenetic analysis of Mycobacterium chelonae/abscessus complex.

    PubMed

    Zhang, Yuan Yuan; Li, Yan Bing; Huang, Ming Xiang; Zhao, Xiu Qin; Zhang, Li Shui; Liu, Wen En; Wan, Kang Lin

    2013-11-01

    To identify the novel species 'Mycobacterium fukienense' sp. nov of Mycobacterium chelonae/abscessus complex from tuberculosis patients in Fujian Province, China. Five of 27 clinical Mycobacterium isolates (Cls) were previously identified as M. chelonae/abscessus complex by sequencing the hsp65, rpoB, 16S-23S rRNA internal transcribed spacer region (its), recA and sodA house-keeping genes commonly used to describe the molecular characteristics of Mycobacterium. Clinical Mycobacterium isolates were classified according to the gene sequence using a clustering analysis program. Sequence similarity within clusters and diversity between clusters were analyzed. The 5 isolates were identified with distinct sequences exhibiting 99.8% homology in the hsp65 gene. However, a complete lack of homology was observed among the sequences of the rpoB, 16S-23S rRNA internal transcribed spacer region (its), sodA, and recA genes as compared with M. abscessus. Furthermore, no match for rpoB, sodA, and recA genes was identified among the published sequences. The novel species, Mycobacterium fukienense, is identified from tuberculosis patients in Fujian Province, China, and does not belong to any existing subspecies of M. chelonae/abscessus complex. Copyright © 2013 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.

  17. Interactive computer graphics system for structural sizing and analysis of aircraft structures

    NASA Technical Reports Server (NTRS)

    Bendavid, D.; Pipano, A.; Raibstein, A.; Somekh, E.

    1975-01-01

    A computerized system for preliminary sizing and analysis of aircraft wing and fuselage structures was described. The system is based upon repeated application of analytical program modules, which are interactively interfaced and sequence-controlled during the iterative design process with the aid of design-oriented graphics software modules. The entire process is initiated and controlled via low-cost interactive graphics terminals driven by a remote computer in a time-sharing mode.

  18. Station blackout transient at the Browns Ferry Unit 1 Plant: a severe accident sequence analysis (SASA) program study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schultz, R.R.

    1982-01-01

    Operating plant transients are of great interest for many reasons, not the least of which is the potential for a mild transient to degenerate to a severe transient yielding core damage. Using the Browns Ferry (BF) Unit-1 plant as a basis of study, the station blackout sequence was investigated by the Severe Accident Sequence Analysis (SASA) Program in support of the Nuclear Regulatory Commission's Unresolved Safety Issue A-44: Station Blackout. A station blackout transient occurs when the plant's AC power from a commercial power grid is lost and cannot be restored by the diesel generators. Under normal operating conditions, if a loss of offsite power (LOSP) occurs (i.e., a complete severance of the BF plants from the Tennessee Valley Authority (TVA) power grid), the eight diesel generators at the three BF units would quickly start and power the emergency AC buses. Of the eight diesel generators, only six are needed to safely shut down all three units. Examination of BF-specific data shows that LOSP frequency is low at Unit 1. The station blackout frequency is even lower (5.7 x 10^-4 events per year) and hinges on whether the diesel generators start. The frequency of diesel generator failure is dictated in large measure by the emergency equipment cooling water (EECW) system that cools the diesel generators.

  19. Software for rapid time dependent ChIP-sequencing analysis (TDCA).

    PubMed

    Myschyshyn, Mike; Farren-Dai, Marco; Chuang, Tien-Jui; Vocadlo, David

    2017-11-25

    Chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) and associated methods are widely used to define the genome wide distribution of chromatin associated proteins, post-translational epigenetic marks, and modifications found on DNA bases. An area of emerging interest is to study time dependent changes in the distribution of such proteins and marks by using serial ChIP-seq experiments performed in a time resolved manner. Despite such time resolved studies becoming increasingly common, software to facilitate analysis of such data in a robust automated manner is limited. We have designed software called Time-Dependent ChIP-Sequencing Analyser (TDCA), which is the first program to automate analysis of time-dependent ChIP-seq data by fitting to sigmoidal curves. We provide users with guidance for experimental design of TDCA for modeling of time course (TC) ChIP-seq data using two simulated data sets. Furthermore, we demonstrate that this fitting strategy is widely applicable by showing that automated analysis of three previously published TC data sets accurately recapitulates key findings reported in these studies. Using each of these data sets, we highlight how biologically relevant findings can be readily obtained by exploiting TDCA to yield intuitive parameters that describe behavior at either a single locus or sets of loci. TDCA enables customizable analysis of user input aligned DNA sequencing data, coupled with graphical outputs in the form of publication-ready figures that describe behavior at either individual loci or sets of loci sharing common traits defined by the user. TDCA accepts sequencing data as standard binary alignment map (BAM) files and loci of interest in browser extensible data (BED) file format. TDCA accurately models the number of sequencing reads, or coverage, at loci from TC ChIP-seq studies or conceptually related TC sequencing experiments. 
TC experiments are reduced to intuitive parametric values that facilitate biologically relevant data analysis, and the uncovering of variations in the time-dependent behavior of chromatin. TDCA automates the analysis of TC ChIP-seq experiments, permitting researchers to easily obtain raw and modeled data for specific loci or groups of loci with similar behavior while also enhancing consistency of data analysis of TC data within the genomics field.
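
    The curve-fitting idea in the abstract above can be illustrated with a minimal sketch. It assumes a generic four-parameter logistic model (bottom, top, slope k, half-time t_half) and a coarse grid search; the abstract does not give TDCA's actual parameterization or fitting routine, so the function names and the grid below are hypothetical.

```python
import math

def sigmoid(t, bottom, top, k, t_half):
    """Four-parameter logistic: rises from `bottom` to `top` around `t_half`."""
    return bottom + (top - bottom) / (1.0 + math.exp(-k * (t - t_half)))

def fit_sigmoid(times, coverage, k_grid, t_half_grid):
    """Grid-search k and t_half; solve bottom/top by linear least squares.

    This is an illustrative fitting strategy, not the one used by TDCA.
    """
    n = len(times)
    mean_y = sum(coverage) / n
    best = None
    for k in k_grid:
        for t_half in t_half_grid:
            # Unit sigmoid evaluated at each time point for this (k, t_half).
            s = [1.0 / (1.0 + math.exp(-k * (t - t_half))) for t in times]
            mean_s = sum(s) / n
            var_s = sum((v - mean_s) ** 2 for v in s)
            if var_s < 1e-12:
                continue  # flat curve: amplitude is not identifiable
            amp = sum((v - mean_s) * (y - mean_y)
                      for v, y in zip(s, coverage)) / var_s
            bottom = mean_y - amp * mean_s
            sse = sum((bottom + amp * v - y) ** 2 for v, y in zip(s, coverage))
            if best is None or sse < best[0]:
                best = (sse, bottom, bottom + amp, k, t_half)
    sse, bottom, top, k, t_half = best
    return {"bottom": bottom, "top": top, "k": k, "t_half": t_half, "sse": sse}

# Synthetic read coverage at one locus over a 60-minute time course.
times = [0, 10, 20, 30, 40, 50, 60]
coverage = [sigmoid(t, 5.0, 100.0, 0.2, 30.0) for t in times]
fit = fit_sigmoid(times, coverage,
                  k_grid=[0.05 * i for i in range(1, 41)],
                  t_half_grid=range(0, 61))
```

    The fitted half-time and slope are the kind of per-locus parameters that could then be compared across loci; a real analysis would use a proper nonlinear optimizer and normalized coverage.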

  20. Design automation techniques for custom LSI arrays

    NASA Technical Reports Server (NTRS)

    Feller, A.

    1975-01-01

    The standard cell design automation technique is described as an approach for generating random logic PMOS, CMOS or CMOS/SOS custom large scale integration arrays with low initial nonrecurring costs and quick turnaround time or design cycle. The system is composed of predesigned circuit functions or cells and computer programs capable of automatic placement and interconnection of the cells in accordance with an input data net list. The program generates a set of instructions to drive an automatic precision artwork generator. A series of support design automation and simulation programs are described, including programs for verifying correctness of the logic on the arrays, performing dc and dynamic analysis of MOS devices, and generating test sequences.

  1. Preparing and Analyzing Expressed Sequence Tags (ESTs) Library for the Mammary Tissue of Local Turkish Kivircik Sheep

    PubMed Central

    Omeroglu Ulu, Zehra; Ulu, Salih; Un, Cemal; Ozdem Oztabak, Kemal; Altunatmaz, Kemal

    2017-01-01

    Kivircik sheep is an important local Turkish breed, valued for its meat quality and milk productivity. The aim of this study was to analyze gene expression profiles at both the prenatal and postnatal stages in Kivircik sheep. To this end, two cDNA libraries were constructed from mammary gland tissue taken from the same Kivircik sheep at the prenatal and postnatal stages. A total of 3072 colonies, randomly selected from the two libraries, were sequenced to develop a sheep EST collection. We used the Phred/Phrap computer programs to analyze the raw ESTs, and readable EST sequences were assembled with the CAP3 software. Putative functions of all unique sequences and statistical analysis were determined with Geneious software. A total of 422 ESTs showed over 80% similarity to known sequences of other organisms in NCBI and were classified by Gene Ontology (GO) category using the Panther database. By comparing gene expression profiles, we observed some putative genes that may be related to reproductive performance or play important roles in milk synthesis and secretion. A total of 2414 ESTs have been deposited in the NCBI GenBank database (GW996847–GW999260). The EST data from this study provide a new source of information for functional genome studies of sheep. PMID:28239610

  2. Cosmetology: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for a cosmetology vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  3. Aircraft Mechanics: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for an aircraft mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and…

  4. Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity

    PubMed Central

    Pope, Welkin H; Bowman, Charles A; Russell, Daniel A; Jacobs-Sera, Deborah; Asai, David J; Cresawn, Steven G; Jacobs, William R; Hendrix, Roger W; Lawrence, Jeffrey G; Hatfull, Graham F; Abbazia, Patrick; Ababio, Amma; Adam, Naazneen

    2015-01-01

    The bacteriophage population is large, dynamic, ancient, and genetically diverse. Limited genomic information shows that phage genomes are mosaic, and the genetic architecture of phage populations remains ill-defined. To understand the population structure of phages infecting a single host strain, we isolated, sequenced, and compared 627 phages of Mycobacterium smegmatis. Their genetic diversity is considerable, and there are 28 distinct genomic types (clusters) with related nucleotide sequences. However, amino acid sequence comparisons show pervasive genomic mosaicism, and quantification of inter-cluster and intra-cluster relatedness reveals a continuum of genetic diversity, albeit with uneven representation of different phages. Furthermore, rarefaction analysis shows that the mycobacteriophage population is not closed, and there is a constant influx of genes from other sources. Phage isolation and analysis was performed by a large consortium of academic institutions, illustrating the substantial benefits of a disseminated, structured program involving large numbers of freshman undergraduates in scientific discovery. DOI: http://dx.doi.org/10.7554/eLife.06416.001 PMID:25919952

  5. Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity.

    PubMed

    Pope, Welkin H; Bowman, Charles A; Russell, Daniel A; Jacobs-Sera, Deborah; Asai, David J; Cresawn, Steven G; Jacobs, William R; Hendrix, Roger W; Lawrence, Jeffrey G; Hatfull, Graham F

    2015-04-28

    The bacteriophage population is large, dynamic, ancient, and genetically diverse. Limited genomic information shows that phage genomes are mosaic, and the genetic architecture of phage populations remains ill-defined. To understand the population structure of phages infecting a single host strain, we isolated, sequenced, and compared 627 phages of Mycobacterium smegmatis. Their genetic diversity is considerable, and there are 28 distinct genomic types (clusters) with related nucleotide sequences. However, amino acid sequence comparisons show pervasive genomic mosaicism, and quantification of inter-cluster and intra-cluster relatedness reveals a continuum of genetic diversity, albeit with uneven representation of different phages. Furthermore, rarefaction analysis shows that the mycobacteriophage population is not closed, and there is a constant influx of genes from other sources. Phage isolation and analysis was performed by a large consortium of academic institutions, illustrating the substantial benefits of a disseminated, structured program involving large numbers of freshman undergraduates in scientific discovery.

  6. Apple Macintosh programs for nucleic and protein sequence analyses.

    PubMed Central

    Bellon, B

    1988-01-01

    This paper describes a package of programs for handling and analyzing nucleic acid and protein sequences using the Apple Macintosh microcomputer. There are three important features of these programs: first, because of the now classical Macintosh interface the programs can be easily used by persons with little or no computer experience. Second, it is possible to save all the data, written in an editable scrolling text window or drawn in a graphic window, as files that can be directly used either as word processing documents or as picture documents. Third, sequences can be easily exchanged with any other computer. The package is composed of thirteen programs, written in Pascal programming language. PMID:2832832

  7. Massive programmed translational jumping in mitochondria

    PubMed Central

    Lang, B. Franz; Jakubkova, Michaela; Hegedusova, Eva; Daoud, Rachid; Forget, Lise; Brejova, Brona; Vinar, Tomas; Kosa, Peter; Fricova, Dominika; Nebohacova, Martina; Griac, Peter; Tomaska, Lubomir; Burger, Gertraud; Nosek, Jozef

    2014-01-01

    Programmed translational bypassing is a process whereby ribosomes “ignore” a substantial interval of mRNA sequence. Although discovered 25 y ago, the only experimentally confirmed example of this puzzling phenomenon is expression of the bacteriophage T4 gene 60. Bypassing requires translational blockage at a “takeoff codon” immediately upstream of a stop codon followed by a hairpin, which causes peptidyl-tRNA dissociation and reassociation with a matching “landing triplet” 50 nt downstream, where translation resumes. Here, we report 81 translational bypassing elements (byps) in mitochondria of the yeast Magnusiomyces capitatus and demonstrate in three cases, by transcript analysis and proteomics, that byps are retained in mitochondrial mRNAs but not translated. Although mitochondrial byps resemble the bypass sequence in the T4 gene 60, they utilize unused codons instead of stops for translational blockage and have relaxed matching rules for takeoff/landing sites. We detected byp-like sequences also in mtDNAs of several Saccharomycetales, indicating that byps are mobile genetic elements. These byp-like sequences lack bypassing activity and are tolerated when inserted in-frame in variable protein regions. We hypothesize that byp-like elements have the potential to contribute to evolutionary diversification of proteins by adding new domains that allow exploration of new structures and functions. PMID:24711422

  8. IVisTMSA: Interactive Visual Tools for Multiple Sequence Alignments.

    PubMed

    Pervez, Muhammad Tariq; Babar, Masroor Ellahi; Nadeem, Asif; Aslam, Naeem; Naveed, Nasir; Ahmad, Sarfraz; Muhammad, Shah; Qadri, Salman; Shahid, Muhammad; Hussain, Tanveer; Javed, Maryam

    2015-01-01

    IVisTMSA is a software package of seven graphical tools for multiple sequence alignments. MSApad is an editing and analysis tool. It can load 409% more data than Jalview, STRAP, CINEMA, and Base-by-Base. MSA comparator allows the user to visualize consistent and inconsistent regions of reference and test alignments of more than 21-MB size in less than 12 seconds. MSA comparator is 5,200% efficient and more than 40% efficient as compared to the BALiBASE C program and FastSP, respectively. MSA reconstruction tool provides graphical user interfaces for four popular aligners and allows the user to load several sequence files at a time. FASTA generator converts seven formats of alignments of unlimited size into FASTA format in a few seconds. MSA ID calculator calculates the identity matrix of more than 11,000 sequences with a sequence length of 2,696 base pairs in less than 100 seconds. Tree and Distance Matrix calculation tools generate a phylogenetic tree and a distance matrix, respectively, using neighbor joining with % identity and the BLOSUM 62 matrix.
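
    As an illustration of the identity-matrix calculation mentioned above, here is a minimal sketch. It assumes pre-aligned sequences of equal length and one common definition of percent identity (matching columns divided by columns in which at least one sequence is not a gap); the abstract does not say which definition the MSA ID calculator uses, and the function names are hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity of two aligned sequences, ignoring gap-gap columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue  # column carries no information for this pair
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

def identity_matrix(aligned):
    """Symmetric matrix of pairwise percent identities."""
    n = len(aligned)
    matrix = [[100.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            value = percent_identity(aligned[i], aligned[j])
            matrix[i][j] = matrix[j][i] = value
    return matrix

alignment = ["ACGT-A", "ACGTTA", "AC-TTA"]
matrix = identity_matrix(alignment)
```

    The pairwise loop is quadratic in the number of sequences, which is why timings such as those quoted in the abstract depend heavily on implementation details.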

  9. FastaValidator: an open-source Java library to parse and validate FASTA formatted sequences.

    PubMed

    Waldmann, Jost; Gerken, Jan; Hankeln, Wolfgang; Schweer, Timmy; Glöckner, Frank Oliver

    2014-06-14

    Advances in sequencing technologies challenge the efficient importing and validation of FASTA formatted sequence data which is still a prerequisite for most bioinformatic tools and pipelines. Comparative analysis of commonly used Bio*-frameworks (BioPerl, BioJava and Biopython) shows that their scalability and accuracy is hampered. FastaValidator represents a platform-independent, standardized, light-weight software library written in the Java programming language. It targets computer scientists and bioinformaticians writing software which needs to parse quickly and accurately large amounts of sequence data. For end-users FastaValidator includes an interactive out-of-the-box validation of FASTA formatted files, as well as a non-interactive mode designed for high-throughput validation in software pipelines. The accuracy and performance of the FastaValidator library qualifies it for large data sets such as those commonly produced by massive parallel (NGS) technologies. It offers scientists a fast, accurate and standardized method for parsing and validating FASTA formatted sequence data.
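
    To make the parse-and-validate task concrete, the following is a small sketch in Python. It is not the FastaValidator API (that library is written in Java); the rule set, alphabet, and function name are assumptions chosen for illustration. It streams the input line by line so memory use does not grow with file size.

```python
import io

# IUPAC nucleotide codes plus the gap character (assumed alphabet).
DNA_ALPHABET = frozenset("ACGTUNRYSWKMBDHV-")

def validate_fasta(handle, alphabet=DNA_ALPHABET):
    """Return a list of (line_number, message); an empty list means valid."""
    problems = []
    seen_header = False
    header_line = None
    residues = 0
    for lineno, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # blank lines are tolerated in this sketch
        if line.startswith(">"):
            if seen_header and residues == 0:
                problems.append((header_line, "record has no sequence"))
            if not line[1:].strip():
                problems.append((lineno, "empty header"))
            seen_header = True
            header_line = lineno
            residues = 0
        else:
            if not seen_header:
                problems.append((lineno, "sequence data before first header"))
            bad = sorted(set(line.upper()) - alphabet)
            if bad:
                problems.append((lineno, "invalid characters: " + "".join(bad)))
            residues += len(line)
    if seen_header and residues == 0:
        problems.append((header_line, "record has no sequence"))
    if not seen_header:
        problems.append((0, "no FASTA header found"))
    return problems

good = validate_fasta(io.StringIO(">s1\nACGT\nNNAC\n>s2\nacgu\n"))
bad = validate_fasta(io.StringIO(">s1\nACGT\n>s2\n>s3\nAC!T\n"))
```

    Reporting line numbers with each problem, rather than failing on the first error, is a design choice that suits the non-interactive pipeline mode described above.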

  10. Evolution and Diversity of the Human Hepatitis D Virus Genome

    PubMed Central

    Huang, Chi-Ruei; Lo, Szecheng J.

    2010-01-01

    Human hepatitis delta virus (HDV) is the smallest RNA virus in terms of genome size. The HDV genome is divided into a viroid-like sequence and a protein-coding sequence, which could have originated from different sources, and the HDV genome was eventually constituted through RNA recombination. The genome subsequently diversified through the accumulation of mutations, selected by interactions of the mutated RNA and proteins with host factors, to successfully form infectious virions. Therefore, we propose that the conservation of the HDV nucleotide sequence is highly related to its functionality. Genome analysis of known HDV isolates shows that the C-terminal coding sequences of the large delta antigen (LDAg) have higher diversity than other regions of the protein-coding sequence, yet variants that retain the biological function of interacting with the heavy chain of clathrin can be selected and maintained. Since viruses interact with many host factors, including when escaping the host immune response, designing a program to predict RNA genome evolution is a great challenge. PMID:20204073

  11. GenBank.

    PubMed

    Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Wheeler, David L

    2007-01-01

    GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 240 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage (www.ncbi.nlm.nih.gov).

  12. GenBank.

    PubMed

    Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Wheeler, David L

    2005-01-01

    GenBank is a comprehensive database that contains publicly available DNA sequences for more than 165,000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in the UK and the DNA Data Bank of Japan helps to ensure worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, go to the NCBI Homepage at http://www.ncbi.nlm.nih.gov.

  13. GenBank.

    PubMed

    Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Wheeler, David L

    2006-01-01

    GenBank® is a comprehensive database that contains publicly available DNA sequences for more than 205 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the Web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, go to the NCBI Homepage at www.ncbi.nlm.nih.gov.

  14. Self-Organizing Hidden Markov Model Map (SOHMMM).

    PubMed

    Ferles, Christos; Stafylopatis, Andreas

    2013-12-01

    A hybrid approach combining the Self-Organizing Map (SOM) and the Hidden Markov Model (HMM) is presented. The Self-Organizing Hidden Markov Model Map (SOHMMM) establishes a cross-section between the theoretic foundations and algorithmic realizations of its constituents. The respective architectures and learning methodologies are fused in an attempt to meet the increasing requirements imposed by the properties of deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and protein chain molecules. The fusion and synergy of the SOM unsupervised training and the HMM dynamic programming algorithms bring forth a novel on-line gradient descent unsupervised learning algorithm, which is fully integrated into the SOHMMM. Since the SOHMMM carries out probabilistic sequence analysis with little or no prior knowledge, it can have a variety of applications in clustering, dimensionality reduction and visualization of large-scale sequence spaces, and also, in sequence discrimination, search and classification. Two series of experiments based on artificial sequence data and splice junction gene sequences demonstrate the SOHMMM's characteristics and capabilities. Copyright © 2013 Elsevier Ltd. All rights reserved.

  15. Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules.

    PubMed

    Turatsinze, Jean-Valery; Thomas-Chollier, Morgane; Defrance, Matthieu; van Helden, Jacques

    2008-01-01

    This protocol shows how to detect putative cis-regulatory elements and regions enriched in such elements with the regulatory sequence analysis tools (RSAT) web server (http://rsat.ulb.ac.be/rsat/). The approach applies to known transcription factors, whose binding specificity is represented by position-specific scoring matrices, using the program matrix-scan. The detection of individual binding sites is known to return many false predictions. However, results can be strongly improved by estimating P value, and by searching for combinations of sites (homotypic and heterotypic models). We illustrate the detection of sites and enriched regions with a study case, the upstream sequence of the Drosophila melanogaster gene even-skipped. This protocol is also tested on random control sequences to evaluate the reliability of the predictions. Each task requires a few minutes of computation time on the server. The complete protocol can be executed in about one hour.
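
    The scanning step described above can be sketched as follows. This is a simplified illustration of scoring a sequence with a position-specific scoring matrix, assuming log-odds weights, a uniform background, and a fixed score threshold; it is not the matrix-scan implementation, and it omits the P value estimation and site-combination models that the protocol relies on. All names and the toy motif are hypothetical.

```python
import math

def counts_to_weights(counts, background, pseudo=1.0):
    """Convert per-position base counts into log-odds weights.

    counts: list of dicts (one per motif position) mapping base -> count.
    The pseudocount is distributed according to the background frequencies.
    """
    weights = []
    for column in counts:
        total = sum(column.values()) + pseudo
        weights.append({
            base: math.log(
                ((column.get(base, 0) + pseudo * background[base]) / total)
                / background[base]
            )
            for base in background
        })
    return weights

def scan(sequence, weights, threshold):
    """Return (start, site, score) for every window scoring >= threshold."""
    width = len(weights)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in weights[0] for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(weights[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits

background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
toy_counts = [{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}]  # toy TATA motif
weights = counts_to_weights(toy_counts, background)
hits = scan("GGTATAGG", weights, threshold=3.0)
```

    A fixed threshold like this one is exactly what tends to produce many false predictions on long sequences, which is the motivation the abstract gives for P value estimates and for searching combinations of sites.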

  16. Rapid and Easy Protocol for Quantification of Next-Generation Sequencing Libraries.

    PubMed

    Hawkins, Steve F C; Guest, Paul C

    2018-01-01

    The emergence of next-generation sequencing (NGS) over the last 10 years has increased the efficiency of DNA sequencing in terms of speed, ease, and price. However, the exact quantification of a NGS library is crucial in order to obtain good data on sequencing platforms developed by the current market leader Illumina. Different approaches for DNA quantification are available currently and the most commonly used are based on analysis of the physical properties of the DNA through spectrophotometric or fluorometric methods. Although these methods are technically simple, they do not allow exact quantification as can be achieved using a real-time quantitative PCR (qPCR) approach. A qPCR protocol for DNA quantification with applications in NGS library preparation studies is presented here. This can be applied in various fields of study such as medical disorders resulting from nutritional programming disturbances.

  17. Mississippi Curriculum Framework for Postsecondary Funeral Services Technology Programs (Program CIP: 12.0301--Funeral Service and Mortuary Science). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the state's funeral services technology program. Presented in the introduction are a program description and suggested course sequence. Section I lists baseline competencies for the funeral…

  18. Mississippi Curriculum Framework for Emergency Medical Technology--Basic (Program CIP: 51.0904). Emergency Medical Technology--Paramedic (Program CIP: 51.0904). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the emergency medical technology (EMT) programs cluster. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline…

  19. Mississippi Curriculum Framework for Banking & Finance Technology (Program CIP: 52.0803--Banking and Related Financial Programs, Other). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the banking and finance technology program. Presented in the introduction are a program description and suggested course sequence. Section I is a curriculum guide consisting of outlines for…

  20. A Shellcode Detection Method Based on Full Native API Sequence and Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Cheng, Yixuan; Fan, Wenqing; Huang, Wei; An, Jing

    2017-09-01

    Dynamic monitoring of program behavior is widely used to discriminate between benign programs and malware. The judgment is usually based on dynamic characteristics of a program, such as its API call sequence or API call frequency. The key innovation of this paper is to consider the full Native API sequence and use a support vector machine to detect shellcode. We also use a Markov chain to extract and digitize Native API sequence features. Our experimental results show that the method proposed in this paper has high accuracy and low detection rate.
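
    One plausible reading of the Markov chain feature extraction is sketched below: estimate first-order transition probabilities between API calls and flatten them into a fixed-length numeric vector that a classifier could consume. The abstract does not specify the paper's exact feature construction, so this is an assumption; the call names are used only as illustrative labels, and the SVM step is omitted.

```python
from collections import Counter

def transition_features(calls, vocabulary):
    """Flattened first-order transition probabilities over `vocabulary`.

    Entry (i, j) is the estimated probability that vocabulary[j] follows
    vocabulary[i] in the observed call trace; unseen sources give 0.0.
    """
    pair_counts = Counter(zip(calls, calls[1:]))
    out_counts = Counter(calls[:-1])  # how often each call has a successor
    features = []
    for src in vocabulary:
        total = out_counts[src]
        for dst in vocabulary:
            features.append(pair_counts[(src, dst)] / total if total else 0.0)
    return features

# Illustrative trace; real traces would come from a dynamic monitor.
vocabulary = ["NtAllocateVirtualMemory", "NtProtectVirtualMemory", "NtCreateThreadEx"]
trace = [
    "NtAllocateVirtualMemory",
    "NtProtectVirtualMemory",
    "NtAllocateVirtualMemory",
    "NtProtectVirtualMemory",
    "NtCreateThreadEx",
]
vector = transition_features(trace, vocabulary)
```

    Because the vector length depends only on the vocabulary size, traces of different lengths map to comparable inputs, which is what a support vector machine requires.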

  1. GENESUS: a two-step sequence design program for DNA nanostructure self-assembly.

    PubMed

    Tsutsumi, Takanobu; Asakawa, Takeshi; Kanegami, Akemi; Okada, Takao; Tahira, Tomoko; Hayashi, Kenshi

    2014-01-01

    DNA has been recognized as an ideal material for bottom-up construction of nanometer scale structures by self-assembly. The generation of sequences optimized for unique self-assembly (GENESUS) program reported here is a straightforward method for generating sets of strand sequences optimized for self-assembly of arbitrarily designed DNA nanostructures by a generate-candidates-and-choose-the-best strategy. A scalable procedure to prepare single-stranded DNA having arbitrary sequences is also presented. Strands for the assembly of various structures were designed and successfully constructed, validating both the program and the procedure.

  2. Mississippi Curriculum Framework for Automotive Technology Programs (CIP: 47.0604--Automotive Mechanic/Tech.). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the automotive technology programs cluster. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies, and…

  3. Mississippi Curriculum Framework for Automotive Machinist (Program CIP: 47.0690--Auto Machinist). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the automotive machinist programs cluster. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies, and…

  4. Mississippi Curriculum Framework for Medical Assisting Technology Programs (CIP: 51.0801--Medical Assistant). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the medical assisting technology program. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies, and…

  5. Health Occupations: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 3-year program in health occupations. The guide consists of a course description; general course…

  6. Auto Mechanics: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for an auto mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  7. Diesel Mechanics: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for a diesel mechanics vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  8. Air Conditioning, Heating, and Refrigeration: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for an air conditioning, heating, and refrigeration vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed…

  9. Commercial Art: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This scope and sequence guide, developed for a commercial art vocational education program, represents an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System. It was developed as a result of needs expressed by teachers, parents, and the…

  10. Urban Horticulture: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 4-year program in urban horticulture. The guide consists of a course description; general course…

  11. VOE Accounting: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 2-year program in accounting. The guide consists of a course description; general course objectives;…

  12. Agriculture: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 3-year program in agriculture. The guide consists of a course description; general course objectives;…

  13. Bi-PROF

    PubMed Central

    Gries, Jasmin; Schumacher, Dirk; Arand, Julia; Lutsik, Pavlo; Markelova, Maria Rivera; Fichtner, Iduna; Walter, Jörn; Sers, Christine; Tierling, Sascha

    2013-01-01

    The use of next generation sequencing has expanded our view on whole mammalian methylome patterns. In particular, it provides genome-wide insight into local DNA methylation diversity at the single nucleotide level and enables the examination of single chromosome sequence sections with sufficient statistical power. We describe a bisulfite-based sequence profiling pipeline, Bi-PROF, which is based on the 454 GS-FLX Titanium technology and yields up to one million sequence stretches at single base pair resolution without laborious subcloning. To illustrate the performance of the experimental workflow connected to a bioinformatics program pipeline (BiQ Analyzer HT), we present a test analysis set of 68 different epigenetic marker regions (amplicons) in five individual patient-derived xenograft tissue samples of colorectal cancer and one healthy colon epithelium sample as a control. After the 454 GS-FLX Titanium run, sequence read processing and sample decoding, the obtained alignments are quality controlled and statistically evaluated. Comprehensive methylation pattern interpretation (profiling), assessed by analyzing 10²–10⁴ sequence reads per amplicon, allows an unprecedentedly deep view of pattern formation and methylation marker heterogeneity in tissues affected by complex diseases like cancer. PMID:23803588
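
    The principle behind bisulfite-based methylation profiling can be shown with a short sketch: after bisulfite conversion, an unmethylated cytosine is read as T while a methylated cytosine remains C, so the methylation level at a CpG is the fraction of informative reads showing C. The code assumes reads already aligned to the amplicon with no indels; it is not the BiQ Analyzer HT algorithm, and the function names and toy data are hypothetical.

```python
def cpg_positions(reference):
    """0-based positions of the C in every CG dinucleotide of the reference."""
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]

def methylation_levels(reference, reads):
    """Map each CpG position to the fraction of reads that retained a C.

    Reads showing anything other than C or T at a CpG (for example N) are
    treated as uninformative; positions with no informative reads map to None.
    """
    levels = {}
    for pos in cpg_positions(reference):
        methylated = sum(1 for read in reads if read[pos] == "C")
        unmethylated = sum(1 for read in reads if read[pos] == "T")
        informative = methylated + unmethylated
        levels[pos] = methylated / informative if informative else None
    return levels

reference = "ACGTCGA"  # toy amplicon with CpGs at positions 1 and 4
reads = ["ACGTTGA", "ATGTTGA", "ACGTCGA", "ACGTNGA"]
levels = methylation_levels(reference, reads)
```

    With hundreds to thousands of reads per amplicon, the same per-read calls can also be kept as patterns rather than averaged, which is what allows heterogeneity within a tissue sample to be examined.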

  14. GenBank.

    PubMed

    Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W

    2010-01-01

    GenBank is a comprehensive database that contains publicly available nucleotide sequences for more than 300,000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bi-monthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI homepage: www.ncbi.nlm.nih.gov.

  15. GenBank.

    PubMed

    Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W

    2009-01-01

    GenBank is a comprehensive database that contains publicly available nucleotide sequences for more than 300,000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank® staff upon receipt. Daily data exchange with the European Molecular Biology Laboratory Nucleotide Sequence Database in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the National Center for Biotechnology Information (NCBI) Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov.

  16. Identifying functionally informative evolutionary sequence profiles.

    PubMed

    Gil, Nelson; Fiser, Andras

    2018-04-15

    Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. Contact: andras.fiser@einstein.yu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
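
    The quantity the abstract builds on, mutual information between MSA columns, can be illustrated with a minimal sketch. This is not the SAMMI implementation; the function names, the toy alignment, and the choice to average over all column pairs are assumptions made purely for illustration.

```python
from collections import Counter
from math import log2


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `msa` is a list of equal-length aligned sequences (hypothetical input format).
    """
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        # Compare the joint frequency with the product of the marginals.
        mi += p_ab * log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


def mean_pairwise_mi(msa):
    """Average MI over all column pairs (an illustrative summary, not SAMMI's score)."""
    length = len(msa[0])
    pairs = [(i, j) for i in range(length) for j in range(i + 1, length)]
    if not pairs:
        raise ValueError("alignment needs at least two columns")
    return sum(column_mutual_information(msa, i, j) for i, j in pairs) / len(pairs)
```

    In the toy alignment ["AC", "AC", "GT", "GT"] the two columns covary perfectly and the function returns 1 bit; in ["AC", "AT", "GC", "GT"] they are independent and it returns 0.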

  17. The CNO Bi-cycle in the Open Cluster NGC 752

    NASA Astrophysics Data System (ADS)

    Hawkins, Keith; Schuler, S.; King, J.; The, L.

    2011-01-01

    The CNO bi-cycle is the primary energy source for main sequence stars more massive than the Sun. To test our understanding of stellar evolution models using the CNO bi-cycle, we have undertaken light-element (CNO) abundance analysis of three main sequence dwarf stars and three red giant stars in the open cluster NGC 752 utilizing high resolution (R ~ 50,000) spectroscopy from the Keck Observatory. Preliminary results indicate, as expected, there is a depletion of carbon in the giants relative to the dwarfs. Additional analysis is needed to determine if the amount of depletion is in line with model predictions, as seen in the Hyades open cluster. Oxygen abundances are derived from the high-excitation O I triplet, and there is a 0.19 dex offset in the [O/H] abundances between the giants and dwarfs which may be explained by non-local thermodynamic equilibrium (NLTE), although further analysis is needed to verify this. The standard procedure for spectroscopically determining stellar parameters used here allows for a measurement of the cluster metallicity, [Fe/H] = 0.04 ± 0.02. In addition to the Fe abundances we have determined Na, Mg, and Al abundances to determine the status of other nucleosynthesis processes. The Na, Mg and Al abundances of the giants are enhanced relative to the dwarfs, which is consistent with similar findings in giants of other open clusters. Support for K. Hawkins was provided by the NOAO/KPNO Research Experiences for Undergraduates (REU) Program which is funded by the National Science Foundation Research Experiences for Undergraduates Program and the Department of Defense ASSURE program through Scientific Program Order No. 13 (AST-0754223) of the Cooperative Agreement No. AST-0132798 between the Association of Universities for Research in Astronomy (AURA) and the NSF.

  18. A trace display and editing program for data from fluorescence based sequencing machines.

    PubMed

    Gleeson, T; Hillier, L

    1991-12-11

    'Ted' (Trace editor) is a graphical editor for sequence and trace data from automated fluorescence sequencing machines. It provides facilities for viewing sequence and trace data (in top or bottom strand orientation), for editing the base sequence, for automated or manual trimming of the head (vector) and tail (uncertain data) from the sequence, for vertical and horizontal trace scaling, for keeping a history of sequence editing, and for output of the edited sequence. Ted has been used extensively in the C. elegans genome sequencing project, both as a stand-alone program and integrated into the Staden sequence assembly package, and has greatly aided in the efficiency and accuracy of sequence editing. It runs in the X windows environment on Sun workstations and is available from the authors. Ted currently supports sequence and trace data from the ABI 373A and Pharmacia A.L.F. sequencers.

  19. Rapid microsatellite marker development for African mahogany (Khaya senegalensis, Meliaceae) using next-generation sequencing and assessment of its intra-specific genetic diversity.

    PubMed

    Karan, M; Evans, D S; Reilly, D; Schulte, K; Wright, C; Innes, D; Holton, T A; Nikles, D G; Dickinson, G R

    2012-03-01

    Khaya senegalensis (African mahogany or dry-zone mahogany) is a high-value hardwood timber species with great potential for forest plantations in northern Australia. The species is distributed across the sub-Saharan belt from Senegal to Sudan and Uganda. Because of heavy exploitation and constraints on natural regeneration and sustainable planting, it is now classified as a vulnerable species. Here, we describe the development of microsatellite markers for K. senegalensis using next-generation sequencing to assess its intra-specific diversity across its natural range, which is a key for successful breeding programs and effective conservation management of the species. Next-generation sequencing yielded 93,943 sequences with an average read length of 234 bp. The assembled sequences contained 1030 simple sequence repeats, with primers designed for 522 microsatellite loci. Twenty-one microsatellite loci were tested with 11 showing reliable amplification and polymorphism in K. senegalensis. The 11 novel microsatellites, together with one previously published, were used to assess 73 accessions belonging to the Australian K. senegalensis domestication program, sampled from across the natural range of the species. STRUCTURE analysis shows two major clusters, one comprising mainly accessions from west Africa (Senegal to Benin) and the second based in the far eastern limits of the range in Sudan and Uganda. Higher levels of genetic diversity were found in material from western Africa. This suggests that new seed collections from this region may yield more diverse genotypes than those originating from Sudan and Uganda in eastern Africa. © 2011 Blackwell Publishing Ltd.
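
    Detecting simple sequence repeats in assembled reads, the step the abstract reports as yielding 1030 repeats, can be sketched with a regular expression. The abstract does not name the tool or thresholds the authors used; the function name, the unit-length range and the minimum repeat count below are illustrative assumptions.

```python
import re


def find_ssrs(sequence, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, copies) for each tandem repeat found in `sequence`.

    Thresholds are hypothetical defaults, not those of the published study.
    Limitation: a homopolymer run such as AAAAAAAA is reported as unit "AA".
    """
    # Lazy unit match followed by at least (min_repeats - 1) further copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits
```

    For example, find_ssrs("TTACACACACACGG") returns [(2, 'AC', 5)]: a dinucleotide repeat of five AC copies starting at zero-based position 2.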

  20. Sequence search on a supercomputer.

    PubMed

    Gotoh, O; Tagashira, Y

    1986-01-10

    A set of programs was developed for searching nucleic acid and protein sequence data bases for sequences similar to a given sequence. The programs, written in FORTRAN 77, were optimized for vector processing on a Hitachi S810-20 supercomputer. A search of a 500-residue protein sequence against the entire PIR data base Ver. 1.0 (1) (0.5 M residues) is carried out in a CPU time of 45 sec. About 4 min is required for an exhaustive search of a 1500-base nucleotide sequence against all mammalian sequences (1.2M bases) in Genbank Ver. 29.0. The CPU time is reduced to about a quarter with a faster version.

  1. Rover Sequencing and Visualization Program

    NASA Technical Reports Server (NTRS)

    Cooper, Brian; Hartman, Frank; Maxwell, Scott; Yen, Jeng; Wright, John; Balacuit, Carlos

    2005-01-01

    The Rover Sequencing and Visualization Program (RSVP) is the software tool for use in the Mars Exploration Rover (MER) mission for planning rover operations and generating command sequences for accomplishing those operations. RSVP combines three-dimensional (3D) visualization for immersive exploration of the operations area, stereoscopic image display for high-resolution examination of the downlinked imagery, and a sophisticated command-sequence editing tool for analysis and completion of the sequences. RSVP is linked with actual flight-code modules for operations rehearsal to provide feedback on the expected behavior of the rover prior to committing to a particular sequence. Playback tools allow for review of both rehearsed rover behavior and downlinked results of actual rover operations. These can be displayed simultaneously for comparison of rehearsed and actual activities for verification. The primary inputs to RSVP are downlink data products from the Operations Storage Server (OSS) and activity plans generated by the science team. The activity plans are high-level goals for the next day's activities. The downlink data products include imagery, terrain models, and telemetered engineering data on rover activities and state. The Rover Sequence Editor (RoSE) component of RSVP performs activity expansion to command sequences, command creation and editing with setting of command parameters, and viewing and management of rover resources. The HyperDrive component of RSVP performs 2D and 3D visualization of the rover's environment, graphical and animated review of rover-predicted and telemetered state, and creation and editing of command sequences related to mobility and Instrument Deployment Device (IDD) operations. Additionally, RoSE and HyperDrive together evaluate command sequences for potential violations of flight and safety rules. The products of RSVP include command sequences for uplink that are stored in the Distributed Object Manager (DOM) and predicted rover state histories stored in the OSS for comparison and validation of downlinked telemetry. The majority of components comprising RSVP utilize the MER command and activity dictionaries to automatically customize the system for MER activities. Thus, RSVP, being highly data driven, may be tailored to other missions with minimal effort. In addition, RSVP uses a distributed, message-passing architecture to allow multitasking, and collaborative visualization and sequence development by scattered team members.

  2. ATLAS, an integrated structural analysis and design system. Volume 3: User's manual, input and execution data

    NASA Technical Reports Server (NTRS)

    Dreisbach, R. L. (Editor)

    1979-01-01

    The input data and execution control statements for the ATLAS integrated structural analysis and design system are described. It is operational on the Control Data Corporation (CDC) 6600/CYBER computers in a batch mode or in a time-shared mode via interactive graphic or text terminals. ATLAS is a modular system of computer codes with common executive and data base management components. The system provides an extensive set of general-purpose technical programs with analytical capabilities including stiffness, stress, loads, mass, substructuring, strength design, unsteady aerodynamics, vibration, and flutter analyses. The sequence and mode of execution of selected program modules are controlled via a common user-oriented language.

  3. Project Report: Automatic Sequence Processor Software Analysis

    NASA Technical Reports Server (NTRS)

    Benjamin, Brandon

    2011-01-01

    The Mission Planning and Sequencing (MPS) element of Multi-Mission Ground System and Services (MGSS) provides space missions with multi-purpose software to plan spacecraft activities, sequence spacecraft commands, and then integrate these products and execute them on spacecraft. Jet Propulsion Laboratory (JPL) is currently flying many missions. The processes for building, integrating, and testing the multi-mission uplink software need to be improved to meet the needs of the missions and the operations teams that command the spacecraft. The Multi-Mission Sequencing Team is responsible for collecting and processing the observations, experiments and engineering activities that are to be performed on a selected spacecraft. The collection of these activities is called a sequence and ultimately a sequence becomes a sequence of spacecraft commands. The operations teams check the sequence to make sure that no constraints are violated. The workflow process involves sending a program start command, which activates the Automatic Sequence Processor (ASP). The ASP is currently a file-based system composed of scripts written in Perl, C shell and awk. Once this start process is complete, the system checks for errors and aborts if there are any; otherwise the system converts the commands to binary, and then sends the resultant information to be radiated to the spacecraft.

  4. ABI Base Recall: Automatic Correction and Ends Trimming of DNA Sequences.

    PubMed

    Elyazghi, Zakaria; Yazouli, Loubna El; Sadki, Khalid; Radouani, Fouzia

    2017-12-01

    Automated DNA sequencers produce chromatogram files in ABI format. When viewing chromatograms, some ambiguities are shown at various sites along the DNA sequences, because the program implemented in the sequencing machine and used to call bases cannot always precisely determine the right nucleotide, especially when it is represented by either a broad peak or a set of overlaying peaks. In such cases, a letter other than A, C, G, or T is recorded, most commonly N. Thus, DNA sequencing chromatograms need manual examination: checking for mis-calls and truncating the sequence when errors become too frequent. The purpose of this paper is to develop a program allowing the automatic correction of these ambiguities. This application is a Web-based program powered by Shiny that runs on the R platform for ease of use. As part of the interface, we added automatic end clipping, alignment against reference sequences, and BLAST. To develop and test our tool, we collected several bacterial DNA sequences from different laboratories within Institut Pasteur du Maroc and performed both manual and automatic correction, then compared the two methods. We find that our program, ABI base recall, corrects sequences with high accuracy: it increases the rate of identity and coverage and minimizes the number of mismatches and gaps, thereby providing a solution to sequencing ambiguities and saving biologists' time and labor.
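
    End trimming of a called sequence, which the abstract describes as truncating where errors become too frequent, can be sketched with a sliding window over ambiguity codes. This is not the algorithm of ABI base recall, which the abstract does not specify; the function name, window size and threshold are hypothetical, and the sketch works on the base string only rather than on chromatogram peaks.

```python
def trim_ambiguous_ends(seq, window=10, max_ambiguous=1):
    """Keep the span from the first to the last window with few ambiguous calls.

    Any letter other than A, C, G or T counts as ambiguous. Returns an empty
    string when no window passes or the sequence is shorter than one window.
    """
    unambiguous = set("ACGT")
    seq = seq.upper()

    def is_clean(start):
        chunk = seq[start:start + window]
        return sum(base not in unambiguous for base in chunk) <= max_ambiguous

    clean_starts = [s for s in range(len(seq) - window + 1) if is_clean(s)]
    if not clean_starts:
        return ""
    # Ambiguous calls inside the retained span are left for a later correction step.
    return seq[clean_starts[0]:clean_starts[-1] + window]
```

    With window=4 and max_ambiguous=0, the input "NNACGTACGTNN" is trimmed to "ACGTACGT".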

  5. NDE detectability of fatigue-type cracks in high-strength alloys: NDI reliability assessments

    NASA Technical Reports Server (NTRS)

    Christner, Brent K.; Long, Donald L.; Rummel, Ward D.

    1988-01-01

    This program was conducted to generate quantitative flaw detection capability data for the nondestructive evaluation (NDE) techniques typically practiced by aerospace contractors. Inconel 718 and Haynes 188 alloy test specimens containing fatigue flaws with a wide distribution of sizes were used to assess the flaw detection capabilities at a number of contractor and government facilities. During this program 85 inspection sequences were completed presenting a total of 20,994 fatigue cracks to 53 different inspectors. The inspection sequences completed included 78 liquid penetrant, 4 eddy current, and 3 ultrasonic evaluations. The results of the assessment inspections are presented and discussed. In generating the flaw detection capability data base, procedures for data collection, data analysis, and specimen care and maintenance were developed, demonstrated, and validated. The data collection procedures and methods that evolved during this program for the measurement of flaw detection capabilities and the effects of inspection variables on performance are discussed. The Inconel 718 and Haynes 188 test specimens that were used in conducting this program and the NDE assessment procedures that were demonstrated, provide NASA with the capability to accurately assess the flaw detection capabilities of specific inspection procedures being applied or proposed for use on current and future fracture control hardware program.

  6. GénoPlante-Info (GPI): a collection of databases and bioinformatics resources for plant genomics

    PubMed Central

    Samson, Delphine; Legeai, Fabrice; Karsenty, Emmanuelle; Reboux, Sébastien; Veyrieras, Jean-Baptiste; Just, Jeremy; Barillot, Emmanuel

    2003-01-01

    Génoplante is a partnership program between public French institutes (INRA, CIRAD, IRD and CNRS) and private companies (Biogemma, Bayer CropScience and Bioplante) that aims at developing genome analysis programs for crop species (corn, wheat, rapeseed, sunflower and pea) and model plants (Arabidopsis and rice). The outputs of these programs form a wealth of information (genomic sequence, transcriptome, proteome, allelic variability, mapping and synteny, and mutation data) and tools (databases, interfaces, analysis software), that are being integrated and made public at the public bioinformatics resource centre of Génoplante: GénoPlante-Info (GPI). This continuous flood of data and tools is regularly updated and will grow continuously during the coming two years. Access to the GPI databases and tools is available at http://genoplante-info.infobiogen.fr/. PMID:12519976

  7. Unlimited Thirst for Genome Sequencing, Data Interpretation, and Database Usage in Genomic Era: The Road towards Fast-Track Crop Plant Improvement

    PubMed Central

    Govindaraj, Mahalingam

    2015-01-01

    The number of sequenced crop genomes and associated genomic resources is growing rapidly with the advent of inexpensive next generation sequencing methods. Databases have become an integral part of all aspects of science research, including basic and applied plant and animal sciences. The importance of databases keeps increasing as the volume of datasets from direct and indirect genomics, as well as other omics approaches, keeps expanding in recent years. The databases and associated web portals provide at a minimum a uniform set of tools and automated analysis across a wide range of crop plant genomes. This paper reviews some basic terms and considerations in dealing with crop plant databases utilization in advancing genomic era. The utilization of databases for variation analysis with other comparative genomics tools, and data interpretation platforms are well described. The major focus of this review is to provide knowledge on platforms and databases for genome-based investigations of agriculturally important crop plants. The utilization of these databases in applied crop improvement program is still being achieved widely; otherwise, the end for sequencing is not far away. PMID:25874133

  8. Using comparative genome analysis to identify problems in annotated microbial genomes.

    PubMed

    Poptsova, Maria S; Gogarten, J Peter

    2010-07-01

    Genome annotation is a tedious task that is mostly done by automated methods; however, the accuracy of these approaches has been questioned since the beginning of the sequencing era. Genome annotation is a multilevel process, and errors can emerge at different stages: during sequencing, as a result of gene-calling procedures, and in the process of assigning gene functions. Missed or wrongly annotated genes differentially impact different types of analyses. Here we discuss and demonstrate how the methods of comparative genome analysis can refine annotations by locating missing orthologues. We also discuss possible reasons for errors and show that the second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than the first annotation tools. Since old errors may propagate to the newly sequenced genomes, we emphasize that the problem of continuously updating popular public databases is an urgent and unresolved one. Due to the progress in genome-sequencing technologies, automated annotation techniques will remain the main approach in the future. Researchers need to be aware of the existing errors in the annotation of even well-studied genomes, such as Escherichia coli, and consider additional quality control for their results.

  9. Genetic analysis of tolerance to the root lesion nematode Pratylenchus neglectus in the legume Medicago littoralis

    PubMed Central

    2014-01-01

    Background: The nematode Pratylenchus neglectus has a wide host range and is able to feed on the root systems of cereals, oilseeds, grain and pasture legumes. Under the Mediterranean low rainfall environments of Australia, annual Medicago pasture legumes are used in rotation with cereals to fix atmospheric nitrogen and improve soil parameters. Considerable efforts are being made in breeding programs to improve resistance and tolerance to Pratylenchus neglectus in the major crops wheat and barley, which makes it vital to develop appropriate selection tools in medics. Results: A strong source of tolerance to root damage by the root lesion nematode (RLN) Pratylenchus neglectus had previously been identified in line RH-1 (strand medic, M. littoralis). Using RH-1, we have developed a single seed descent (SSD) population of 138 lines by crossing it to the intolerant cultivar Herald. After inoculation, RLN-associated root damage clearly segregated in the population. Genetic analysis was performed by constructing a genetic map using simple sequence repeat (SSR) and gene-based SNP markers. A highly significant quantitative trait locus (QTL), QPnTolMl.1, was identified explaining 49% of the phenotypic variation in the SSD population. All SSRs and gene-based markers in the QTL region were derived from chromosome 1 of the sequenced genome of the closely related species M. truncatula. Gene-based markers were validated in advanced breeding lines derived from the RH-1 parent and also a second RLN tolerance source, RH-2 (M. truncatula ssp. tricycla). Comparative analysis to sequenced legume genomes showed that the physical QTL interval exists as a synteny block in Lotus japonicus, common bean, soybean and chickpea. Furthermore, using the sequenced genome information of M. truncatula, the QTL interval contains 55 genes out of which five are discussed as potential candidate genes responsible for the mapped tolerance. Conclusion: The closely linked set of SNP-based PCR markers is directly applicable to select for two different sources of RLN tolerance in breeding programs. Moreover, genome sequence information has allowed proposing candidate genes for further functional analysis and nominates QPnTolMl.1 as a target locus for RLN tolerance in economically important grain legumes, e.g. chickpea. PMID:24742262

  10. PERFILS: a program for the quantitative treatment of footprinting data.

    PubMed

    Salas, X; Portugal, J

    1993-10-01

    PERFILS, a computer program written in Borland TurboPascal, performs quantitative analysis of footprinting experiments using any IBM PC or compatible microcomputer. The program uses the height of the bands obtained from densitometric scanning of footprinting autoradiographs to calculate a differential cleavage plot. Such a plot displays, on a logarithmic scale, the difference of susceptibility of a DNA fragment to DNase I, or any other cleaving agent, in the presence of any ligand versus the sequence. PERFILS calculates the fractional cleavage values for control and ligand, giving a table of values for each internucleotidic bond and rendering the differential cleavage plot in only a few seconds.
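
    The calculation described above can be sketched as follows, assuming the conventional definitions: fractional cleavage is a band's height divided by the summed heights of its lane, and differential cleavage is ln(f_ligand) - ln(f_control). The abstract does not state PERFILS's exact normalization, so these definitions and the function names are assumptions rather than a port of the original TurboPascal program.

```python
from math import log


def fractional_cleavage(heights):
    """Fraction of total lane intensity found in each band (assumed definition)."""
    total = sum(heights)
    return [h / total for h in heights]


def differential_cleavage(control_heights, ligand_heights):
    """ln(f_ligand) - ln(f_control) for each internucleotidic bond.

    Negative values indicate bonds protected in the presence of the ligand,
    positive values indicate enhanced cleavage. All band heights must be
    positive, since the logarithm of zero is undefined.
    """
    f_control = fractional_cleavage(control_heights)
    f_ligand = fractional_cleavage(ligand_heights)
    return [log(fl) - log(fc) for fl, fc in zip(f_ligand, f_control)]
```

    For a control lane with heights [10, 10, 10, 10] and a ligand lane with [5, 15, 10, 10], the first bond gets a negative value (protection), the second a positive value (enhancement), and the last two are zero.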

  11. AVID - A design system for technology studies of advanced transportation concepts. [Aerospace Vehicle Interactive Design

    NASA Technical Reports Server (NTRS)

    Wilhite, A. W.; Rehder, J. J.

    1979-01-01

    The basic AVID (Aerospace Vehicle Interactive Design) is a general system for conceptual and preliminary design currently being applied to a broad range of future space transportation and spacecraft vehicle concepts. AVID hardware includes a minicomputer allowing rapid designer interaction. AVID software includes (1) an executive program and communication data base which provide the automated capability to couple individual programs, either individually in an interactive mode or chained together in an automatic sequence mode; and (2) the individual technology and utility programs which provide analysis capability in areas such as graphics, aerodynamics, propulsion, flight performance, weights, sizing, and costs.

  12. Mississippi Curriculum Framework for Veterinary Technology (Program CIP: 51.0808--Veterinarian Asst./Animal Health). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the veterinary technology program. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies, and section II consists of…

  13. Mississippi Curriculum Framework for Collision Repair Technology (Program CIP: 47.0603--Auto/Autobody Repair). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the collision repair technology programs cluster. Presented in the introductory section are a description of the program and suggested course sequences for 1- and 2-year certificates. Section…

  14. Mississippi Curriculum Framework for Forestry Technology (Program CIP: 03.0401--Forest Harvesting and Production Technology). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the forestry technology program cluster. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies for the…

  15. Mississippi Curriculum Framework for Ophthalmic Technology (Program CIP: 51.1801--Opticianry/Dispensing Optician). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the ophthalmic technology program. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies, and section II…

  16. Mississippi Curriculum Framework for Health Care Assistant (Program CIP: 51.1614--Nursing Assistant/Aide). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the health care assistant program. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies for the nurse…

  17. Mississippi Curriculum Framework for Plumber and Pipefitter/Steamfitter (Program CIP: 46.0501--Plumber and Pipefitter). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the plumber and pipefitter/steamfitter cluster. Presented in the introductory section are program descriptions and suggested course sequences for the plumbing and pipefitting programs. Section…

  18. Mississippi Curriculum Framework for Medical Radiologic Technology (Radiography) (CIP: 51.0907--Medical Radiologic Technology). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the radiologic technology program. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies for the program,…

  19. Mississippi Curriculum Framework for Civil Technology (Program CIP: 15.0201--Civil Engineering/Civil Technology). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the civil technology programs cluster. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies, and section…

  20. Mississippi Curriculum Framework for Marketing Management Technology (Program CIP: 52.1401--Business Mkt. & Mkt. Mgmt.). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the state's marketing management technology program. Presented in the introduction are a program description and suggested course sequence. Section I lists baseline competencies for the…

  1. The Effect of a Classroom-Based Intensive Robotics and Programming Workshop on Sequencing Ability in Early Childhood

    ERIC Educational Resources Information Center

    Kazakoff, Elizabeth R.; Sullivan, Amanda; Bers, Marina U.

    2013-01-01

    This paper examines the impact of programming robots on sequencing ability during a 1-week intensive robotics workshop at an early childhood STEM magnet school in the Harlem area of New York City. Children participated in computer programming activities using a developmentally appropriate tangible programming language CHERP, specifically designed…

  2. Mississippi Curriculum Framework for Medical Laboratory Technology Programs (CIP: 51.1004--Medical Laboratory Technology). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the medical laboratory technology program. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies, and…

  3. A Model Program for Translational Medicine in Epilepsy Genetics

    PubMed Central

    Smith, Lacey A.; Ullmann, Jeremy F. P.; Olson, Heather E.; El Achkar, Christelle M.; Truglio, Gessica; Kelly, McKenna; Rosen-Sheidley, Beth; Poduri, Annapurna

    2017-01-01

    Recent technological advances in gene sequencing have led to a rapid increase in gene discovery in epilepsy. However, the ability to assess pathogenicity of variants, provide functional analysis, and develop targeted therapies has not kept pace with rapid advances in sequencing technology. Thus, although clinical genetic testing may lead to a specific molecular diagnosis for some patients, test results often lead to more questions than answers. As the field begins to focus on therapeutic applications of genetic diagnoses using precision medicine, developing processes that offer more than equivocal test results is essential. The success of precision medicine in epilepsy relies on establishing a correct genetic diagnosis, analyzing functional consequences of genetic variants, screening potential therapeutics in the preclinical laboratory setting, and initiating targeted therapy trials for patients. We describe the structure of a comprehensive, pediatric Epilepsy Genetics Program that can serve as a model for translational medicine in epilepsy. PMID:28056630

  4. A Web-Based System for Monitoring and Controlling Multidisciplinary Design Projects

    NASA Technical Reports Server (NTRS)

    Salas, Andrea O.; Rogers, James L.

    1997-01-01

    In today's competitive environment, both industry and government agencies are under enormous pressure to reduce the time and cost of multidisciplinary design projects. A number of frameworks have been introduced to assist in this process by facilitating the integration of and communication among diverse disciplinary codes. An examination of current frameworks reveals weaknesses in various areas such as sequencing, displaying, monitoring, and controlling the design process. The objective of this research is to explore how Web technology, in conjunction with an existing framework, can improve these areas of weakness. This paper describes a system that executes a sequence of programs, monitors and controls the design process through a Web-based interface, and visualizes intermediate and final results through the use of Java(TM) applets. A small sample problem, which includes nine processes with two analysis programs that are coupled to an optimizer, is used to demonstrate the feasibility of this approach.

  5. Interim reliability evaluation program, Browns Ferry 1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mays, S.E.; Poloski, J.P.; Sullivan, W.H.

    1981-01-01

    Probabilistic risk analysis techniques, i.e., event tree and fault tree analysis, were utilized to provide a risk assessment of the Browns Ferry Nuclear Plant Unit 1. Browns Ferry 1 is a General Electric boiling water reactor of the BWR 4 product line with a Mark 1 (drywell and torus) containment. Within the guidelines of the IREP Procedure and Schedule Guide, dominant accident sequences that contribute to public health and safety risks were identified and grouped according to release categories.

  6. PanWeb: A web interface for pan-genomic analysis.

    PubMed

    Pantoja, Yan; Pinheiro, Kenny; Veras, Allan; Araújo, Fabrício; Lopes de Sousa, Ailton; Guimarães, Luis Carlos; Silva, Artur; Ramos, Rommel T J

    2017-01-01

    With increased production of genomic data since the advent of next-generation sequencing (NGS), there has been a need to develop new bioinformatics tools and areas, such as comparative genomics. In comparative genomics, the genetic material of an organism is directly compared to that of another organism to better understand biological species. Moreover, the exponentially growing number of deposited prokaryote genomes has enabled the investigation of several genomic characteristics that are intrinsic to certain species. Thus, a new approach to comparative genomics, termed pan-genomics, was developed. In pan-genomics, various organisms of the same species or genus are compared. Currently, there are many tools that can perform pan-genomic analyses, such as PGAP (Pan-Genome Analysis Pipeline), Panseq (Pan-Genome Sequence Analysis Program) and PGAT (Prokaryotic Genome Analysis Tool). Among these software tools, PGAP was developed in the Perl scripting language and its reliance on UNIX platform terminals and its requirement for an extensive parameterized command line can become a problem for users without previous computational knowledge. Thus, the aim of this study was to develop a web application, known as PanWeb, that serves as a graphical interface for PGAP. In addition, using the output files of the PGAP pipeline, the application generates graphics using custom-developed scripts in the R programming language. PanWeb is freely available at http://www.computationalbiology.ufpa.br/panweb.

  7. PhyloTreePruner: A Phylogenetic Tree-Based Approach for Selection of Orthologous Sequences for Phylogenomics.

    PubMed

    Kocot, Kevin M; Citarella, Mathew R; Moroz, Leonid L; Halanych, Kenneth M

    2013-01-01

    Molecular phylogenetics relies on accurate identification of orthologous sequences among the taxa of interest. Most orthology inference programs available for use in phylogenomics rely on small sets of pre-defined orthologs from model organisms or phenetic approaches such as all-versus-all sequence comparisons followed by Markov graph-based clustering. Such approaches have high sensitivity but may erroneously include paralogous sequences. We developed PhyloTreePruner, a software utility that uses a phylogenetic approach to refine orthology inferences made using phenetic methods. PhyloTreePruner checks single-gene trees for evidence of paralogy and generates a new alignment for each group containing only sequences inferred to be orthologs. Importantly, PhyloTreePruner takes into account support values on the tree and avoids unnecessarily deleting sequences in cases where a weakly supported tree topology incorrectly indicates paralogy. A test of PhyloTreePruner on a dataset generated from 11 completely sequenced arthropod genomes identified 2,027 orthologous groups sampled for all taxa. Phylogenetic analysis of the concatenated supermatrix yielded a generally well-supported topology that was consistent with the current understanding of arthropod phylogeny. PhyloTreePruner is freely available from http://sourceforge.net/projects/phylotreepruner/.

  8. TraceContract

    NASA Technical Reports Server (NTRS)

    Havelund, Klaus; Barringer, Howard

    2012-01-01

    TraceContract is an API (Application Programming Interface) for trace analysis. A trace is a sequence of events, and can, for example, be generated by a running program, instrumented appropriately to generate events. An event can be any data object. An example of a trace is a log file containing events that a programmer has found important to record during a program execution. TraceContract takes as input such a trace together with a specification formulated using the API and reports on any violations of the specification, potentially calling code (reactions) to be executed when violations are detected. The software is developed as an internal DSL (Domain Specific Language) in the Scala programming language. Scala is a relatively new programming language that is specifically convenient for defining such internal DSLs due to a number of language characteristics. This includes Scala's elegant combination of object-oriented and functional programming, a succinct notation, and an advanced type system. The DSL offers a combination of data-parameterized state machines and temporal logic, which is novel. As an extension of Scala, it is a very expressive and convenient log file analysis framework.
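
    The idea of checking a trace against a specification with a data-parameterized state machine can be sketched as follows. This is a Python illustration only, not the TraceContract API (which is a Scala DSL); the Event type, the "open"/"close" event kinds, and the property being checked are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str  # hypothetical event kinds: "open" or "close"
    name: str  # data carried by the event (e.g. a file name)


def check_trace(trace):
    """Check a finite trace against the illustrative property
    'every open(x) is eventually followed by close(x), and no
    close(x) occurs without a preceding open(x)'.
    Returns a list of violation messages."""
    open_names = set()  # monitor state, parameterized by event data
    violations = []
    for i, ev in enumerate(trace):
        if ev.kind == "open":
            if ev.name in open_names:
                violations.append(f"event {i}: {ev.name} opened twice")
            open_names.add(ev.name)
        elif ev.kind == "close":
            if ev.name not in open_names:
                violations.append(f"event {i}: {ev.name} closed but not open")
            open_names.discard(ev.name)
    # At the end of the trace, anything still open was never closed.
    for name in sorted(open_names):
        violations.append(f"end of trace: {name} never closed")
    return violations
```

    A log file would first be parsed into Event objects; a reaction, in the sense of the abstract, could be a callback invoked wherever a violation is appended.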

  9. Identification of Two Novel Amalgaviruses in the Common Eelgrass (Zostera marina) and in Silico Analysis of the Amalgavirus +1 Programmed Ribosomal Frameshifting Sites.

    PubMed

    Park, Dongbin; Goh, Chul Jun; Kim, Hyein; Hahn, Yoonsoo

    2018-04-01

    The genome sequences of two novel monopartite RNA viruses were identified in a common eelgrass (Zostera marina) transcriptome dataset. Sequence comparison and phylogenetic analyses revealed that these two novel viruses belong to the genus Amalgavirus in the family Amalgaviridae. They were named Zostera marina amalgavirus 1 (ZmAV1) and Zostera marina amalgavirus 2 (ZmAV2). Genomes of both ZmAV1 and ZmAV2 contain two overlapping open reading frames (ORFs). ORF1 encodes a putative replication factory matrix-like protein, while ORF2 encodes a RNA-dependent RNA polymerase (RdRp) domain. The fusion protein (ORF1+2) of ORF1 and ORF2, which mediates RNA replication, was produced using the +1 programmed ribosomal frameshifting (PRF) mechanism. The +1 PRF motif sequence, UUU_CGN, which is highly conserved among known amalgaviruses, was also found in ZmAV1 and ZmAV2. Multiple sequence alignment of the ORF1+2 fusion proteins from 24 amalgaviruses revealed that +1 PRF occurred only at three different positions within the 13-amino acid-long segment, which was surrounded by highly conserved regions on both sides. This suggested that the +1 PRF may be constrained by the structure of fusion proteins. Genome sequences of ZmAV1 and ZmAV2, which are the first viruses to be identified in common eelgrass, will serve as useful resources for studying evolution and diversity of amalgaviruses.
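
    A search for the UUU_CGN motif can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline. It assumes that the underscore marks a codon boundary in the ORF1 reading frame, so only hits whose UUU is in frame with a given ORF1 start are reported; the function name and the orf1_start parameter are hypothetical.

```python
import re

# UUU followed by CGN (N = any nucleotide). A lookahead is used so that
# overlapping occurrences are all reported.
MOTIF = re.compile(r"(?=UUUCG[ACGU])")


def find_prf_motifs(rna, orf1_start=0):
    """Return 0-based positions of UUU_CGN motifs whose UUU codon is
    in frame with ORF1 (assumed to start at index orf1_start)."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [
        m.start()
        for m in MOTIF.finditer(rna)
        if m.start() >= orf1_start and (m.start() - orf1_start) % 3 == 0
    ]
```

    Candidate positions found this way would still need to be checked against the ORF1 and ORF2 coordinates of each genome.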

  10. Identification of Two Novel Amalgaviruses in the Common Eelgrass (Zostera marina) and in Silico Analysis of the Amalgavirus +1 Programmed Ribosomal Frameshifting Sites

    PubMed Central

    Park, Dongbin; Goh, Chul Jun; Kim, Hyein; Hahn, Yoonsoo

    2018-01-01

    The genome sequences of two novel monopartite RNA viruses were identified in a common eelgrass (Zostera marina) transcriptome dataset. Sequence comparison and phylogenetic analyses revealed that these two novel viruses belong to the genus Amalgavirus in the family Amalgaviridae. They were named Zostera marina amalgavirus 1 (ZmAV1) and Zostera marina amalgavirus 2 (ZmAV2). Genomes of both ZmAV1 and ZmAV2 contain two overlapping open reading frames (ORFs). ORF1 encodes a putative replication factory matrix-like protein, while ORF2 encodes a RNA-dependent RNA polymerase (RdRp) domain. The fusion protein (ORF1+2) of ORF1 and ORF2, which mediates RNA replication, was produced using the +1 programmed ribosomal frameshifting (PRF) mechanism. The +1 PRF motif sequence, UUU_CGN, which is highly conserved among known amalgaviruses, was also found in ZmAV1 and ZmAV2. Multiple sequence alignment of the ORF1+2 fusion proteins from 24 amalgaviruses revealed that +1 PRF occurred only at three different positions within the 13-amino acid-long segment, which was surrounded by highly conserved regions on both sides. This suggested that the +1 PRF may be constrained by the structure of fusion proteins. Genome sequences of ZmAV1 and ZmAV2, which are the first viruses to be identified in common eelgrass, will serve as useful resources for studying evolution and diversity of amalgaviruses. PMID:29628822

  11. A filtering method to generate high quality short reads using illumina paired-end technology.

    PubMed

    Eren, A Murat; Vineis, Joseph H; Morrison, Hilary G; Sogin, Mitchell L

    2013-01-01

    Consensus between independent reads improves the accuracy of genome and transcriptome analyses; however, lack of consensus between very similar sequences in metagenomic studies can and often does represent natural variation of biological significance. The common use of machine-assigned quality scores on next generation platforms does not necessarily correlate with accuracy. Here, we describe using the overlap of paired-end, short sequence reads to identify error-prone reads in marker gene analyses and their contribution to spurious OTUs following clustering analysis using QIIME. Our approach can also reduce error in shotgun sequencing data generated from libraries with small, tightly constrained insert sizes. The open-source implementation of this algorithm in Python programming language with user instructions can be obtained from https://github.com/meren/illumina-utils.
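
    The overlap idea can be illustrated with a short Python sketch. It is not the illumina-utils implementation: the function names, the minimum overlap length, and the zero-mismatch threshold are assumptions chosen for illustration, and quality scores are ignored.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def overlap_mismatches(forward, reverse, min_overlap=10):
    """Align the end of the forward read with the start of the
    reverse-complemented reverse read. Return (overlap_length,
    mismatches) for the overlap with the lowest mismatch rate,
    or None if no overlap of at least min_overlap is possible."""
    rc = reverse_complement(reverse)
    best = None
    # Try the longest overlap first so that ties favour longer overlaps.
    for n in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        mism = sum(1 for x, y in zip(forward[-n:], rc[:n]) if x != y)
        if best is None or mism / n < best[1] / best[0]:
            best = (n, mism)
    return best


def keep_pair(forward, reverse, max_mismatches=0, min_overlap=10):
    """Keep a read pair only if its best overlap has few enough mismatches."""
    hit = overlap_mismatches(forward, reverse, min_overlap)
    return hit is not None and hit[1] <= max_mismatches
```

    Pairs that disagree within the overlap are treated as error-prone and discarded; the threshold trades read retention against accuracy.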

  12. SNP-VISTA: An Interactive SNPs Visualization Tool

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shah, Nameeta; Teplitsky, Michael V.; Pennacchio, Len A.

    2005-07-05

    Recent advances in sequencing technologies promise better diagnostics for many diseases as well as better understanding of evolution of microbial populations. Single Nucleotide Polymorphisms (SNPs) are established genetic markers that aid in the identification of loci affecting quantitative traits and/or disease in a wide variety of eukaryotic species. With today's technological capabilities, it is possible to re-sequence a large set of appropriate candidate genes in individuals with a given disease and then screen for causative mutations. In addition, SNPs have been used extensively in efforts to study the evolution of microbial populations, and the recent application of random shotgun sequencing to environmental samples makes possible more extensive SNP analysis of co-occurring and co-evolving microbial populations. The program is available at http://genome.lbl.gov/vista/snpvista.

  13. Information resources at the National Center for Biotechnology Information.

    PubMed Central

    Woodsmall, R M; Benson, D A

    1993-01-01

    The National Center for Biotechnology Information (NCBI), part of the National Library of Medicine, was established in 1988 to perform basic research in the field of computational molecular biology as well as build and distribute molecular biology databases. The basic research has led to new algorithms and analysis tools for interpreting genomic data and has been instrumental in the discovery of human disease genes for neurofibromatosis and Kallmann syndrome. The principal database responsibility is the National Institutes of Health (NIH) genetic sequence database, GenBank. NCBI, in collaboration with international partners, builds, distributes, and provides online and CD-ROM access to over 112,000 DNA sequences. Another major program is the integration of multiple sequences databases and related bibliographic information and the development of network-based retrieval systems for Internet access. PMID:8374583

  14. Marketing and Distributive Education: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 2-year program in marketing and distributive education. The guide consists of a course description;…

  15. Industrial Cooperative Education Co-op: Scope and Sequence.

    ERIC Educational Resources Information Center

    Nashville - Davidson County Metropolitan Public Schools, TN.

    This guide, which was written as an initial step in the development of a systemwide articulated curriculum sequence for all vocational programs within the Metropolitan Nashville Public School System, outlines the suggested scope and sequence of a 2-year cooperative program in industrial education. The guide consists of a course description;…

  16. Multiple and substitute addictions involving prescription drugs misuse among 12th graders: gateway theory revisited with Market Basket Analysis.

    PubMed

    Jayawardene, Wasantha Parakrama; YoussefAgha, Ahmed Hassan

    2014-01-01

    This study aimed to identify the sequential patterns of drug use initiation, which included prescription drugs misuse (PDM), among 12th-grade students in Indiana. The study also tested the suitability of the data mining method Market Basket Analysis (MBA) to detect common drug use initiation sequences in large-scale surveys. Data from the 2007 to 2009 Annual Surveys of Alcohol, Tobacco, and Other Drug Use by Indiana Children and Adolescents were used for this study. A close-ended, self-administered questionnaire was used to ask adolescents about the use of 21 substance categories and the age of first use. "Support%" and "confidence%" statistics of Market Basket Analysis detected multiple and substitute addictions, respectively. The lifetime prevalence of using any addictive substance was 73.3%, and it has been decreasing during the past few years. Although the lifetime prevalence of PDM was 19.2%, it has been increasing. Males and whites were more likely to use drugs and engage in multiple addictions. Market Basket Analysis identified common drug use initiation sequences that involved 11 drugs. High levels of support existed for associations among alcohol, cigarettes, and marijuana, whereas associations that included prescription drugs had medium levels of support. Market Basket Analysis is useful for the detection of common substance use initiation sequences in large-scale surveys. Before initiation of prescription drugs, physicians should consider the adolescents' risk of addiction. Prevention programs should address multiple addictions, substitute addictions, common sequences in drug use initiation, sex and racial differences in PDM, and normative beliefs of parents and adolescents in relation to PDM.
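
    The "support" and "confidence" statistics of Market Basket Analysis have standard definitions that can be sketched as below. The code is a generic illustration, not the study's analysis; each "transaction" stands for the set of substances one respondent reports having used, and any data passed in would be hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent):
    support(antecedent and consequent) / support(antecedent)."""
    ante = support(transactions, antecedent)
    if ante == 0:
        return 0.0  # rule is undefined when the antecedent never occurs
    both = support(transactions, set(antecedent) | set(consequent))
    return both / ante
```

    For example, with four hypothetical respondents reporting {alcohol, cigarettes}, {alcohol, cigarettes, marijuana}, {alcohol}, and {cigarettes}, the support of {alcohol, cigarettes} is 2/4 and the confidence of the rule alcohol -> cigarettes is (2/4)/(3/4) = 2/3.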

  17. Program Calculates Forces in Bolted Structural Joints

    NASA Technical Reports Server (NTRS)

    Buder, Daniel A.

    2005-01-01

    FORTRAN 77 computer program calculates forces in bolts in the joints of structures. This program is used in conjunction with the NASTRAN finite-element structural-analysis program. A mathematical model of a structure is first created by approximating its load-bearing members with representative finite elements, then NASTRAN calculates the forces and moments that each finite element contributes to grid points located throughout the structure. The user selects the finite elements that correspond to structural members that contribute loads to the joints of interest, and identifies the grid point nearest to each such joint. This program reads the pertinent NASTRAN output, combines the forces and moments from the contributing elements to determine the resultant force and moment acting at each proximate grid point, then transforms the forces and moments from these grid points to the centroids of the affected joints. Then the program uses these joint loads to obtain the axial and shear forces in the individual bolts. The program identifies which bolts bear the greatest axial and/or shear loads. The program also performs a fail-safe analysis in which the foregoing calculations are repeated for a sequence of cases in which each fastener, in turn, is assumed not to transmit an axial force.
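
    The step of transforming forces and moments from a grid point to a joint centroid follows the usual statics rule: the force is unchanged and the moment picks up an r x F term, where r points from the centroid to the grid point. The sketch below illustrates only that rule in Python; it is not the FORTRAN 77 program, and the function names and tuple-based vector representation are assumptions.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def transfer_to_centroid(force, moment, grid_point, centroid):
    """Move a force/moment pair acting at grid_point to the joint centroid.
    The force is unchanged; the moment becomes M + r x F, where r is the
    vector from the centroid to the grid point."""
    r = tuple(g - c for g, c in zip(grid_point, centroid))
    r_cross_f = cross(r, force)
    new_moment = tuple(m + x for m, x in zip(moment, r_cross_f))
    return force, new_moment
```

    Distributing the resulting joint loads to individual bolts depends on the bolt pattern and on conventions that the abstract does not specify, so that step is not sketched here.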

  18. FrameD: A flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences.

    PubMed

    Schiex, Thomas; Gouzy, Jérôme; Moisan, Annick; de Oliveira, Yannick

    2003-07-01

    We describe FrameD, a program that predicts coding regions in prokaryotic and matured eukaryotic sequences. Initially targeted at gene prediction in bacterial GC-rich genomes, the gene model used in FrameD also allows genes to be predicted in the presence of frameshifts and partially undetermined sequences, which makes it very suitable for gene prediction and frameshift correction in unfinished sequences such as EST and EST cluster sequences. Like recent eukaryotic gene prediction programs, FrameD also includes the ability to take into account protein similarity information both in its prediction and its graphical output. Its performance is evaluated on different bacterial genomes. The web site (http://genopole.toulouse.inra.fr/bioinfo/FrameD/FD) allows direct prediction, sequence correction and translation and the ability to learn new models for new organisms.

  19. SeqHBase: a big data toolset for family based sequencing data analysis.

    PubMed

    He, Min; Person, Thomas N; Hebbring, Scott J; Heinzen, Ethan; Ye, Zhan; Schrodi, Steven J; McPherson, Elizabeth W; Lin, Simon M; Peissig, Peggy L; Brilliant, Murray H; O'Rawe, Jason; Robison, Reid J; Lyon, Gholson J; Wang, Kai

    2015-04-01

    Whole-genome sequencing (WGS) and whole-exome sequencing (WES) technologies are increasingly used to identify disease-contributing mutations in human genomic studies. It can be a significant challenge to process such data, especially when a large family or cohort is sequenced. Our objective was to develop a big data toolset to efficiently manipulate genome-wide variants, functional annotations and coverage, together with conducting family based sequencing data analysis. Hadoop is a framework for reliable, scalable, distributed processing of large data sets using MapReduce programming models. Based on Hadoop and HBase, we developed SeqHBase, a big data-based toolset for analysing family based sequencing data to detect de novo, inherited homozygous, or compound heterozygous mutations that may contribute to disease manifestations. SeqHBase takes as input BAM files (for coverage at every site), variant call format (VCF) files (for variant calls) and functional annotations (for variant prioritisation). We applied SeqHBase to a 5-member nuclear family and a 10-member 3-generation family with WGS data, as well as a 4-member nuclear family with WES data. Analysis times were almost linearly scalable with number of data nodes. With 20 data nodes, SeqHBase took about 5 secs to analyse WES familial data and approximately 1 min to analyse WGS familial data. These results demonstrate SeqHBase's high efficiency and scalability, which is necessary as WGS and WES are rapidly becoming standard methods to study the genetics of familial disorders. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
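
    The three inheritance patterns named above can be illustrated with simplified Mendelian rules for a child and two parents. This sketch is not SeqHBase code and involves no Hadoop or HBase; genotypes are encoded as the number of alternate alleles (0, 1, or 2), and the function names and the exact rules are assumptions that ignore coverage, sex chromosomes, and genotype quality.

```python
def classify_variant(child, father, mother):
    """Classify one variant from alternate-allele counts (0, 1, or 2)."""
    if child > 0 and father == 0 and mother == 0:
        return "de novo"  # present in the child, absent in both parents
    if child == 2 and father == 1 and mother == 1:
        return "inherited homozygous"  # one copy from each carrier parent
    return "other"


def is_compound_heterozygous(gene_variants):
    """gene_variants: list of (child, father, mother) genotypes for the
    variants in one gene. True if the child carries at least one
    heterozygous variant inherited only from the father and at least
    one inherited only from the mother."""
    from_father = any(c == 1 and f == 1 and m == 0 for c, f, m in gene_variants)
    from_mother = any(c == 1 and f == 0 and m == 1 for c, f, m in gene_variants)
    return from_father and from_mother
```

    In practice the coverage information from the BAM files matters here: a parent genotype of 0 is only trustworthy if the site was actually covered.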

  20. FindGDPs: fast identification of primers for labeling microbial transcriptomes for DNA microarray analysis

    PubMed Central

    Blick, Robert J.; Revel, Andrew T.; Hansen, Eric J.

    2008-01-01

    Summary: FindGDPs is a program that uses a greedy algorithm to quickly identify a set of genome-directed primers that specifically anneal to all of the open reading frames in a genome and that do not exhibit full-length complementarity to the members of another user-supplied set of nucleotide sequences. Availability: The program code is distributed under the GNU General Public License at http://www8.utsouthwestern.edu/utsw/cda/dept131456/files/159331.html Contact: eric.hansen@utsouthwestern.edu PMID:15593406
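
    A greedy selection of this kind can be sketched as a set-cover loop. The code below is an illustration under stated assumptions, not the FindGDPs source: primers are taken to be reverse complements of fixed-length sites (k-mers) in the ORFs, a site is rejected if it occurs anywhere in the excluded sequences, and the function names and the default k are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def greedy_primers(orfs, excluded, k=6):
    """orfs: dict mapping ORF name to sequence; excluded: list of sequences.
    Returns (primers, uncovered_orf_names)."""
    # Sites that would give a primer full-length complementarity
    # to an excluded sequence.
    banned = {s[i:i + k] for s in excluded for i in range(len(s) - k + 1)}
    # Map each allowed site to the set of ORFs that contain it.
    sites = {}
    for name, seq in orfs.items():
        for i in range(len(seq) - k + 1):
            site = seq[i:i + k]
            if site not in banned:
                sites.setdefault(site, set()).add(name)
    uncovered = set(orfs)
    primers = []
    while uncovered and sites:
        # Greedy step: take the site covering the most uncovered ORFs
        # (sorted first so that ties are broken deterministically).
        site = max(sorted(sites), key=lambda s: len(sites[s] & uncovered))
        gained = sites[site] & uncovered
        if not gained:
            break  # remaining ORFs cannot be covered by any allowed site
        primers.append(reverse_complement(site))
        uncovered -= gained
    return primers, uncovered
```

    Greedy set cover does not guarantee the smallest possible primer set, but it is fast, which matches the emphasis on quick identification in the summary.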

  1. Development of CACTA transposon derived SCAR markers and their use in population structure analysis in Zea mays.

    PubMed

    Roy, Neha Samir; Park, Kyong-Cheul; Lee, Sung-Il; Im, Min-Ji; Ramekar, Rahul Vasudeo; Kim, Nam-Soo

    2018-02-01

    Molecular marker technologies have proven to be an important breakthrough for genetic studies, construction of linkage maps and population genetics analysis. Transposable elements (TEs) constitute major fractions of repetitive sequences in plants and offer a wide range of possible areas to be explored as molecular markers. Sequence characterized amplified region (SCAR) marker development provides us with a simple and time-saving alternative approach for marker development. We employed CACTA-TD to develop SCARs, integrated them into a linkage map, and used them for population structure and genetic diversity analysis of a corn inbred population. A total of 108 dominant SCAR markers were designed, of which 32 were successfully integrated into the linkage map of a maize RIL population; the remaining markers were added to a physical map as references to check the distribution throughout all chromosomes. Moreover, 76 polymorphic SCARs were used for diversity analysis of corn accessions being used in the Korean corn breeding program. The overall average polymorphic information content (PIC) was 0.34, expected heterozygosity was 0.324 and Shannon's information index was 0.491 with a percentage of polymorphism of 98.67%. Further analysis by associating with desirable traits may also provide some accurate trait specific tagged SCAR markers. TE linked SCARs can provide an added level of polymorphism as well as improved discriminating ability and therefore can be useful in further breeding programs to develop high yielding germplasm.
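
    Two of the diversity statistics reported above have common textbook forms that can be sketched as follows. The abstract does not say which formulas or software were used, so these definitions (expected heterozygosity as 1 minus the sum of squared allele frequencies, and Shannon's index with the natural logarithm) are assumptions; PIC is omitted because its definition varies between marker types.

```python
import math


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) over the allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def shannon_index(freqs):
    """I = -sum(p_i * ln p_i); alleles with zero frequency contribute 0."""
    return -sum(p * math.log(p) for p in freqs if p > 0)
```

    For a two-allele locus with equal frequencies these give He = 0.5 and I = ln 2; per-locus values would then be averaged across markers.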

  2. Knee cartilage extraction and bone-cartilage interface analysis from 3D MRI data sets

    NASA Astrophysics Data System (ADS)

    Tamez-Pena, Jose G.; Barbu-McInnis, Monica; Totterman, Saara

    2004-05-01

    This work presents a robust methodology for the analysis of the knee joint cartilage and the knee bone-cartilage interface from fused MRI sets. The proposed approach starts by fusing a set of two 3D MR images of the knee. Although the proposed method is not pulse sequence dependent, the first sequence should be programmed to achieve good contrast between bone and cartilage. The recommended second pulse sequence is one that maximizes the contrast between cartilage and surrounding soft tissues. Once both pulse sequences are fused, the proposed bone-cartilage analysis is done in four major steps. First, an unsupervised segmentation algorithm is used to extract the femur, the tibia, and the patella. Second, a knowledge-based feature extraction algorithm is used to extract the femoral, tibial, and patellar cartilages. Third, a trained user corrects cartilage misclassifications made by the automated cartilage extraction. Finally, the segmentation is revisited using an unsupervised MAP voxel relaxation algorithm. This final segmentation has the property that it includes the extracted bone tissue as well as all the cartilage tissue. This is an improvement over previous approaches where only the cartilage was segmented. Furthermore, this approach yields very reproducible segmentation results in a set of scan-rescan experiments. When these segmentations were coupled with a partial volume compensated surface extraction algorithm, the volume, area, and thickness measurements showed precisions around 2.6%.

  3. Analysis Tool Web Services from the EMBL-EBI.

    PubMed

    McWilliam, Hamish; Li, Weizhong; Uludag, Mahmut; Squizzato, Silvano; Park, Young Mi; Buso, Nicola; Cowley, Andrew Peter; Lopez, Rodrigo

    2013-07-01

    Since 2004 the European Bioinformatics Institute (EMBL-EBI) has provided access to a wide range of databases and analysis tools via Web Services interfaces. This comprises services to search across the databases available from the EMBL-EBI and to explore the network of cross-references present in the data (e.g. EB-eye), services to retrieve entry data in various data formats and to access the data in specific fields (e.g. dbfetch), and analysis tool services, for example, sequence similarity search (e.g. FASTA and NCBI BLAST), multiple sequence alignment (e.g. Clustal Omega and MUSCLE), pairwise sequence alignment and protein functional analysis (e.g. InterProScan and Phobius). The REST/SOAP Web Services (http://www.ebi.ac.uk/Tools/webservices/) interfaces to these databases and tools allow their integration into other tools, applications, web sites, pipeline processes and analytical workflows. To get users started using the Web Services, sample clients are provided covering a range of programming languages and popular Web Service tool kits, and a brief guide to Web Services technologies, including a set of tutorials, is available for those wishing to learn more and develop their own clients. Users of the Web Services are informed of improvements and updates via a range of methods.

  4. Analysis Tool Web Services from the EMBL-EBI

    PubMed Central

    McWilliam, Hamish; Li, Weizhong; Uludag, Mahmut; Squizzato, Silvano; Park, Young Mi; Buso, Nicola; Cowley, Andrew Peter; Lopez, Rodrigo

    2013-01-01

    Since 2004 the European Bioinformatics Institute (EMBL-EBI) has provided access to a wide range of databases and analysis tools via Web Services interfaces. This comprises services to search across the databases available from the EMBL-EBI and to explore the network of cross-references present in the data (e.g. EB-eye), services to retrieve entry data in various data formats and to access the data in specific fields (e.g. dbfetch), and analysis tool services, for example, sequence similarity search (e.g. FASTA and NCBI BLAST), multiple sequence alignment (e.g. Clustal Omega and MUSCLE), pairwise sequence alignment and protein functional analysis (e.g. InterProScan and Phobius). The REST/SOAP Web Services (http://www.ebi.ac.uk/Tools/webservices/) interfaces to these databases and tools allow their integration into other tools, applications, web sites, pipeline processes and analytical workflows. To get users started using the Web Services, sample clients are provided covering a range of programming languages and popular Web Service tool kits, and a brief guide to Web Services technologies, including a set of tutorials, is available for those wishing to learn more and develop their own clients. Users of the Web Services are informed of improvements and updates via a range of methods. PMID:23671338

  5. A De Novo-Assembly Based Data Analysis Pipeline for Plant Obligate Parasite Metatranscriptomic Studies

    PubMed Central

    Guo, Li; Allen, Kelly S.; Deiulio, Greg; Zhang, Yong; Madeiras, Angela M.; Wick, Robert L.; Ma, Li-Jun

    2016-01-01

    Current and emerging plant diseases caused by obligate parasitic microbes such as rusts, downy mildews, and powdery mildews threaten worldwide crop production and food safety. These obligate parasites are typically unculturable in the laboratory, posing technical challenges to characterize them at the genetic and genomic level. Here we have developed a data analysis pipeline integrating several bioinformatic software programs. This pipeline facilitates rapid gene discovery and expression analysis of a plant host and its obligate parasite simultaneously by next generation sequencing of mixed host and pathogen RNA (i.e., metatranscriptomics). We applied this pipeline to metatranscriptomic sequencing data of sweet basil (Ocimum basilicum) and its obligate downy mildew parasite Peronospora belbahrii, both lacking a sequenced genome. Even with a single data point, we were able to identify both candidate host defense genes and pathogen virulence genes that are highly expressed during infection. This demonstrates the power of this pipeline for identifying genes important in host–pathogen interactions without prior genomic information for either the plant host or the obligate biotrophic pathogen. The simplicity of this pipeline makes it accessible to researchers with limited computational skills and applicable to metatranscriptomic data analysis in a wide range of plant-obligate-parasite systems. PMID:27462318

  6. Identification of estrogen-responsive genes using a genome-wide analysis of promoter elements for transcription factor binding sites.

    PubMed

    Kamalakaran, Sitharthan; Radhakrishnan, Senthil K; Beck, William T

    2005-06-03

    We developed a pipeline to identify novel genes regulated by the steroid hormone-dependent transcription factor, estrogen receptor, through a systematic analysis of upstream regions of all human and mouse genes. We built a data base of putative promoter regions for 23,077 human and 19,984 mouse transcripts from National Center for Biotechnology Information annotation and 8793 human and 6785 mouse promoters from the Data Base of Transcriptional Start Sites. We used this data base of putative promoters to identify potential targets of estrogen receptor by identifying estrogen response elements (EREs) in their promoters. Our program correctly identified EREs in genes known to be regulated by estrogen in addition to several new genes whose putative promoters contained EREs. We validated six genes (KIAA1243, NRIP1, MADH9, NME3, TPD52L, and ABCG2) to be estrogen-responsive in MCF7 cells using reverse transcription PCR. To allow for extensibility of our program in identifying targets of other transcription factors, we have built a Web interface to access our data base and programs. Our Web-based program for Promoter Analysis of Genome, PAGen@UIC, allows a user to identify putative target genes for vertebrate transcription factors through the analysis of their upstream sequences. The interface allows the user to search the human and mouse promoter data bases for potential target genes containing one or more listed transcription factor binding sites (TFBSs) in their upstream elements, using either regular expression-based consensus or position weight matrices. The data base can also be searched for promoters harboring user-defined TFBSs given as a consensus or a position weight matrix. Furthermore, the user can retrieve putative promoter sequences for any given gene together with identified TFBSs located on its promoter. Orthologous promoters are also analyzed to determine conserved elements.
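
    The regular expression-based consensus search mentioned above can be sketched as follows. This is a minimal illustration, not the PAGen@UIC implementation: the function names and the toy promoter string are hypothetical, and the consensus used is the commonly cited palindromic ERE, GGTCAnnnTGACC.

```python
import re

# Commonly cited consensus estrogen response element (ERE): an inverted
# repeat separated by a 3-bp spacer. It is its own reverse complement,
# so scanning the forward strand is sufficient for this particular motif.
ERE_CONSENSUS = "GGTCANNNTGACC"

# Subset of IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]"}


def find_sites(promoter, consensus=ERE_CONSENSUS):
    """Return (0-based start, matched sequence) for every consensus hit.

    A lookahead is used so that overlapping hits are also reported.
    """
    body = "".join(IUPAC[base] for base in consensus.upper())
    pattern = re.compile("(?=(" + body + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(promoter.upper())]


# Hypothetical promoter fragment: one valid ERE and one with a 2-bp spacer.
toy_promoter = "ttacGGTCAgcgTGACCttaaGGTCAaaTGACC"
print(find_sites(toy_promoter))  # → [(4, 'GGTCAGCGTGACC')]
```

    A position weight matrix search would replace the exact pattern match with a per-position score and a threshold; that variant is not shown here.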

  7. Presentation Extensions of the SOAP

    NASA Technical Reports Server (NTRS)

    Carnright, Robert; Stodden, David; Coggi, John

    2009-01-01

    A set of extensions of the Satellite Orbit Analysis Program (SOAP) enables simultaneous and/or sequential presentation of information from multiple sources. SOAP is used in the aerospace community as a means of collaborative visualization and analysis of data on planned spacecraft missions. The following definitions of terms also describe the display modalities of SOAP as now extended: "View" signifies an animated three-dimensional (3D) scene, two-dimensional still image, plot of numerical data, or any other visible display derived from a computational simulation or other data source; "Viewport" signifies a rectangular portion of a computer-display window containing a view; "Palette" signifies a collection of one or more viewports configured for simultaneous (split-screen) display in the same window; "Slide" signifies a palette with a beginning and ending time and an animation time step; and "Presentation" signifies a prescribed sequence of slides. For example, multiple 3D views from different locations can be crafted for simultaneous display and combined with numerical plots and other representations of data for both qualitative and quantitative analysis. The resulting sets of views can be temporally sequenced to convey visual impressions of a sequence of events for a planned mission.
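
    The nesting of these terms can be made concrete with a small data-structure sketch. It only mirrors the definitions in the abstract; all class and field names are hypothetical and do not reflect SOAP's actual internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Viewport:
    """Rectangular portion of a display window containing one view."""
    view_name: str  # hypothetical label for the view being shown
    x: int
    y: int
    width: int
    height: int


@dataclass
class Palette:
    """One or more viewports shown simultaneously in the same window."""
    viewports: List[Viewport]


@dataclass
class Slide:
    """A palette with a beginning time, ending time, and animation step."""
    palette: Palette
    start_time: float
    end_time: float
    time_step: float

    def duration(self) -> float:
        return self.end_time - self.start_time


@dataclass
class Presentation:
    """A prescribed sequence of slides."""
    slides: List[Slide]

    def total_duration(self) -> float:
        return sum(slide.duration() for slide in self.slides)


# Hypothetical split-screen palette reused by two consecutive slides.
split = Palette([Viewport("3D orbit view", 0, 0, 640, 480),
                 Viewport("range plot", 640, 0, 640, 480)])
show = Presentation([Slide(split, 0.0, 10.0, 0.5), Slide(split, 10.0, 15.0, 1.0)])
print(show.total_duration())  # → 15.0
```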

  8. Mississippi Curriculum Framework for Brick, Block, and Stonemasonry (Program CIP: 46.0101--Mason and Tile Setter). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the brick, block, and stonemasonry program. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies for the…

  9. Mississippi Curriculum Framework for Postsecondary Child Development Technology Programs (CIP: 20.0201--Child Care & Guidance Workers & Mgr). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the child development technology programs cluster. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies,…

  10. Mississippi Curriculum Framework for Fashion Marketing Technology (Program CIP: 08.0101--Apparel and Accessories Mkt. Op., Gen.). Postsecondary Programs.

    ERIC Educational Resources Information Center

    Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.

    This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the course sequences in the fashion marketing technology programs cluster. Presented in the introductory section are a description of the program and suggested course sequence. Section I lists baseline competencies,…

  11. The genome sequence of sweet cherry (Prunus avium) for use in genomics-assisted breeding.

    PubMed

    Shirasawa, Kenta; Isuzugawa, Kanji; Ikenaga, Mitsunobu; Saito, Yutaro; Yamamoto, Toshiya; Hirakawa, Hideki; Isobe, Sachiko

    2017-10-01

    We determined the genome sequence of sweet cherry (Prunus avium) using next-generation sequencing technology. The total length of the assembled sequences was 272.4 Mb, consisting of 10,148 scaffold sequences with an N50 length of 219.6 kb. The sequences covered 77.8% of the 352.9 Mb sweet cherry genome, as estimated by k-mer analysis, and included >96.0% of the core eukaryotic genes. We predicted 43,349 complete and partial protein-encoding genes. A high-density consensus map with 2,382 loci was constructed using double-digest restriction site-associated DNA sequencing. Comparing the genetic maps of sweet cherry and peach revealed high synteny between the two genomes; thus the scaffolds were integrated into pseudomolecules using map- and synteny-based strategies. Whole-genome resequencing of six modern cultivars found 1,016,866 SNPs and 162,402 insertions/deletions, out of which 0.7% were deleterious. The sequence variants, as well as simple sequence repeats, can be used as DNA markers. The genomic information helps us to identify agronomically important genes and will accelerate genetic studies and breeding programs for sweet cherries. Further information on the genomic sequences and DNA markers is available in DBcherry (http://cherry.kazusa.or.jp (8 May 2017, date last accessed)). © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
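
    For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed. The function name and the toy scaffold lengths are illustrative only and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total
    assembly length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy scaffold lengths (arbitrary units): the total is 290, so the
# half-way point of 145 is passed once the 80 and 70 scaffolds are summed.
print(n50([80, 70, 50, 40, 30, 20]))  # → 70
```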

  12. An alternative splicing program promotes adipose tissue thermogenesis

    PubMed Central

    Vernia, Santiago; Edwards, Yvonne JK; Han, Myoung Sook; Cavanagh-Kyros, Julie; Barrett, Tamera; Kim, Jason K; Davis, Roger J

    2016-01-01

    Alternative pre-mRNA splicing expands the complexity of the transcriptome and controls isoform-specific gene expression. Whether alternative splicing contributes to metabolic regulation is largely unknown. Here we investigated the contribution of alternative splicing to the development of diet-induced obesity. We found that obesity-induced changes in adipocyte gene expression include alternative pre-mRNA splicing. Bioinformatics analysis associated part of this alternative splicing program with sequence specific NOVA splicing factors. This conclusion was confirmed by studies of mice with NOVA deficiency in adipocytes. Phenotypic analysis of the NOVA-deficient mice demonstrated increased adipose tissue thermogenesis and improved glycemia. We show that NOVA proteins mediate a splicing program that suppresses adipose tissue thermogenesis. Together, these data provide quantitative analysis of gene expression at exon-level resolution in obesity and identify a novel mechanism that contributes to the regulation of adipose tissue function and the maintenance of normal glycemia. DOI: http://dx.doi.org/10.7554/eLife.17672.001 PMID:27635635

  13. MEGANTE: A Web-Based System for Integrated Plant Genome Annotation

    PubMed Central

    Numa, Hisataka; Itoh, Takeshi

    2014-01-01

    The recent advancement of high-throughput genome sequencing technologies has resulted in a considerable increase in demand for large-scale genome annotation. While annotation is a crucial step for downstream data analyses and experimental studies, this process requires substantial expertise and knowledge of bioinformatics. Here we present MEGANTE, a web-based annotation system that makes plant genome annotation easy for researchers unfamiliar with bioinformatics. Without any complicated configuration, users can perform genomic sequence annotations simply by uploading a sequence and selecting the species to query. MEGANTE automatically runs several analysis programs and integrates the results to select the appropriate consensus exon–intron structures and to predict open reading frames (ORFs) at each locus. Functional annotation, including a similarity search against known proteins and a functional domain search, is also performed for the predicted ORFs. The resultant annotation information is visualized with a widely used genome browser, GBrowse. For ease of analysis, the results can be downloaded in Microsoft Excel format. All of the query sequences and annotation results are stored on the server side so that users can access their own data from virtually anywhere on the web. The current release of MEGANTE targets 24 plant species from the Brassicaceae, Fabaceae, Musaceae, Poaceae, Salicaceae, Solanaceae, Rosaceae and Vitaceae families, and it allows users to submit a sequence up to 10 Mb in length and to save up to 100 sequences with the annotation information on the server. The MEGANTE web service is available at https://megante.dna.affrc.go.jp/. PMID:24253915

  14. BayesPI-BAR: a new biophysical model for characterization of regulatory sequence variations

    PubMed Central

    Wang, Junbai; Batmanov, Kirill

    2015-01-01

    Sequence variations in regulatory DNA regions are known to cause functionally important consequences for gene expression. DNA sequence variations may have an essential role in determining phenotypes and may be linked to disease; however, their identification through analysis of massive genome-wide sequencing data is a great challenge. In this work, a new computational pipeline, a Bayesian method for protein–DNA interaction with binding affinity ranking (BayesPI-BAR), is proposed for quantifying the effect of sequence variations on protein binding. BayesPI-BAR uses biophysical modeling of protein–DNA interactions to predict single nucleotide polymorphisms (SNPs) that cause significant changes in the binding affinity of a regulatory region for transcription factors (TFs). The method includes two new parameters (TF chemical potentials or protein concentrations and direct TF binding targets) that are neglected by previous methods. The new method is verified on 67 known human regulatory SNPs, of which 47 (70%) have predicted true TFs ranked in the top 10. Importantly, the performance of BayesPI-BAR, which uses principal component analysis to integrate multiple predictions from various TF chemical potentials, is found to be better than that of existing programs, such as sTRAP and is-rSNP, when evaluated on the same SNPs. BayesPI-BAR is a publicly available tool and is able to carry out parallelized computation, which helps to investigate a large number of TFs or SNPs and to detect disease-associated regulatory sequence variations in the sea of genome-wide noncoding regions. PMID:26202972

  15. rpoB Gene Sequence-Based Identification of Aerobic Gram-Positive Cocci of the Genera Streptococcus, Enterococcus, Gemella, Abiotrophia, and Granulicatella

    PubMed Central

    Drancourt, Michel; Roux, Véronique; Fournier, Pierre-Edouard; Raoult, Didier

    2004-01-01

    We developed a new molecular tool based on rpoB gene (encoding the beta subunit of RNA polymerase) sequencing to identify streptococci. We first sequenced the complete rpoB gene for Streptococcus anginosus, S. equinus, and Abiotrophia defectiva. Sequences were aligned with those of S. pyogenes, S. agalactiae, and S. pneumoniae available in GenBank. Using an in-house analysis program (SVARAP), we identified a 740-bp variable region flanked by conserved 20-bp zones and, by using these conserved zones as PCR primer targets, we amplified and sequenced this variable region in an additional 30 Streptococcus, Enterococcus, Gemella, Granulicatella, and Abiotrophia species. This region exhibited 71.2 to 99.3% interspecies homology. We therefore applied our identification system by PCR amplification and sequencing to a collection of 102 streptococci and 60 bacterial isolates belonging to other genera. Amplicons were obtained in streptococci and Bacillus cereus, and sequencing allowed us to make a correct identification of streptococci. Molecular signatures were determined for the discrimination of closely related species within the S. pneumoniae-S. oralis-S. mitis group and the S. agalactiae-S. difficile group. These signatures allowed us to design an S. pneumoniae-specific PCR and sequencing primer pair. PMID:14766807

  16. Wheat EST resources for functional genomics of abiotic stress

    PubMed Central

    Houde, Mario; Belcaid, Mahdi; Ouellet, François; Danyluk, Jean; Monroy, Antonio F; Dryanova, Ani; Gulick, Patrick; Bergeron, Anne; Laroche, André; Links, Matthew G; MacCarthy, Luke; Crosby, William L; Sarhan, Fathey

    2006-01-01

    Background: Wheat is an excellent species to study freezing tolerance and other abiotic stresses. However, the sequence of the wheat genome has not been completely characterized due to its complexity and large size. To circumvent this obstacle and identify genes involved in cold acclimation and associated stresses, a large scale EST sequencing approach was undertaken by the Functional Genomics of Abiotic Stress (FGAS) project. Results: We generated 73,521 quality-filtered ESTs from eleven cDNA libraries constructed from wheat plants exposed to various abiotic stresses and at different developmental stages. In addition, 196,041 ESTs for which tracefiles were available from the National Science Foundation wheat EST sequencing program and DuPont were also quality-filtered and used in the analysis. Clustering of the combined ESTs with d2_cluster and TGICL yielded a few large clusters containing several thousand ESTs that were refractory to routine clustering techniques. To resolve this problem, the sequence proximity and "bridges" were identified by an e-value distance graph to manually break clusters into smaller groups. Assembly of the resolved ESTs generated a 75,488 unique sequence set (31,580 contigs and 43,908 singletons/singlets). Digital expression analyses indicated that the FGAS dataset is enriched in stress-regulated genes compared to the other public datasets. Over 43% of the unique sequence set was annotated and classified into functional categories according to Gene Ontology. Conclusion: We have annotated 29,556 different sequences, an almost 5-fold increase in annotated sequences compared to the available wheat public databases. Digital expression analysis combined with gene annotation helped in the identification of several pathways associated with abiotic stress. The genomic resources and knowledge developed by this project will contribute to a better understanding of the different mechanisms that govern stress tolerance in wheat and other cereals. PMID:16772040

  17. Analysis of the distal gut bacterial community by 454-pyrosequencing in captive giraffes (Giraffa camelopardalis).

    PubMed

    AlZahal, Ousama; Valdes, Eduardo V; McBride, Brian W

    2016-01-01

    The objective of this study was to characterize the structure of the fecal bacterial community of five giraffes (Giraffa camelopardalis) at Disney's Animal Kingdom, FL. Fecal genomic DNA was extracted, and variable regions 1-3 of the 16S rRNA gene were PCR-amplified and then sequenced. The MOTHUR software program was used for sequence processing, diversity analysis, and classification. A total of 181,689 non-chimeric bacterial sequences were obtained, and the average number of sequences per sample was 36,338 ± 8,818. Sequences were assigned to 8,284 operational taxonomic units (OTUs) at 95% genetic similarity, which included 2,942 singletons (36%). The number of OTUs per sample was 2,554 ± 264. Samples were normalized, and the alpha (intra-sample) diversity indices Chao1, Inverse Simpson, Shannon, and coverage were estimated as 3,712 ± 430, 116 ± 70, 6.1 ± 0.4, and 96 ± 1%, respectively. Thirteen phyla were detected; Firmicutes, Bacteroidetes, and Spirochaetes were the most dominant phyla (more than 2% of total sequences) and constituted 92% of the classified sequences, 66% of total sequences, and 43% of total OTUs. Our computation predicted that three OTUs were likely to be present in at least three of the five samples at greater than 1% dominance rate. These OTUs were Treponema, an unidentified OTU belonging to the order Bacteroidales, and Ruminococcus. This report was the first to characterize the bacterial community of the distal gut in giraffes utilizing fecal samples, and it demonstrated that the distal gut of giraffes is likely a potential reservoir for a number of undocumented species of bacteria. © 2015 Wiley Periodicals, Inc.
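
    The alpha diversity indices named above can be illustrated with a short sketch based on textbook formulas. This is not the MOTHUR implementation, whose calculators may use different (for example bias-corrected) variants; the function name and the toy OTU counts are hypothetical. Chao1 is shown in its bias-corrected form and coverage as Good's coverage.

```python
import math


def alpha_diversity(otu_counts):
    """Compute simple alpha diversity indices from per-OTU read counts."""
    counts = [c for c in otu_counts if c > 0]
    n = sum(counts)
    if n == 0:
        raise ValueError("at least one non-zero OTU count is required")
    props = [c / n for c in counts]

    # Shannon index: H = -sum(p_i * ln p_i)
    shannon = -sum(p * math.log(p) for p in props)
    # Inverse Simpson index: 1 / sum(p_i^2)
    inv_simpson = 1.0 / sum(p * p for p in props)

    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    # Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))
    chao1 = len(counts) + singletons * (singletons - 1) / (2 * (doubletons + 1))
    # Good's coverage: 1 - F1 / N
    coverage = 1 - singletons / n

    return {"shannon": shannon, "inverse_simpson": inv_simpson,
            "chao1": chao1, "coverage": coverage}


# Toy sample: five OTUs, three of them singletons.
print(alpha_diversity([5, 2, 1, 1, 1]))
```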

  18. [Molecular epidemiological study on HIV/AIDS under the follow-up program in Zhejiang province in 2009].

    PubMed

    Zhang, Jia-feng; Pan, Xiao-hong; Ding, Xiao-bei; Chen, Lin; Guo, Zhi-hong; Xu, Yun; Huang, Jing-jing

    2013-01-01

    To analyze the molecular epidemiological characteristics of HIV-infected individuals/AIDS patients (HIV/AIDS) under a follow-up program in Zhejiang province in 2009. A total of 303 cases were randomly sampled. Information on the cases was collected, followed by genomic DNA extraction. Gag gene fragments were amplified by nested PCR, followed by sequencing and bioinformatic analysis. The success rate for sequence acquisition was 74.3% (225/303). The distribution of HIV subtypes was as follows: CRF01_AE (58.7%), CRF07_BC (13.8%), CRF08_BC (9.8%), B' (15.1%), C (1.8%), G (0.4%) and unassigned BC (unique recombinant form, 0.4%). Results from the HIV BLAST analysis showed that the strains with the highest homology originated from 10 provinces/municipalities (Liaoning, Guangxi, Yunnan, Henan, etc.) and five other countries (Thailand, Vietnam, India, South Africa and Libya). The CRF01_AE phylogenetic tree was divided into four clusters. Sequences from cases with homosexual transmission gathered in cluster 1, where they were mixed with sequences from cases infected through heterosexual contact. Circulating recombinant forms of HIV seemed to play a dominant role in Zhejiang province. A unique recombinant form and a new subtype of HIV were found. Strains from people infected through homosexual transmission and heterosexual transmission tended to be interwoven with each other. Increases in both the diversity and complexity of HIV strains were also noted in Zhejiang province.

  19. Air Force Dynamic Mechanical Analysis of NATO Round Robin Propellant Testing for Development of AOP-4717

    DTIC Science & Technology

    2015-09-23

    ...the clamps are tight at the coldest temperature. • Long tests such as the frequency sweep sequences prescribed in this round robin may be

  20. Structural system reliability calculation using a probabilistic fault tree analysis method

    NASA Technical Reports Server (NTRS)

    Torng, T. Y.; Wu, Y.-T.; Millwater, H. R.

    1992-01-01

    The development of a new probabilistic fault tree analysis (PFTA) method for calculating structural system reliability is summarized. The proposed PFTA procedure includes: developing a fault tree to represent the complex structural system, constructing an approximation function for each bottom event, determining a dominant sampling sequence for all bottom events, and calculating the system reliability using an adaptive importance sampling method. PFTA is suitable for complicated structural problems that require computer-intensive calculations. A computer program has been developed to implement the PFTA.

  1. Development of Genomic Simple Sequence Repeats (SSR) by Enrichment Libraries in Date Palm.

    PubMed

    Al-Faifi, Sulieman A; Migdadi, Hussein M; Algamdi, Salem S; Khan, Mohammad Altaf; Al-Obeed, Rashid S; Ammar, Megahed H; Jakse, Jerenj

    2017-01-01

    Development of highly informative markers such as simple sequence repeats (SSRs) for cultivar identification and germplasm characterization and management is essential for date palm genetic studies. The present study documents the development of SSR markers and assesses genetic relationships of commonly grown date palm (Phoenix dactylifera L.) cultivars in different geographical regions of Saudi Arabia. A total of 93 novel SSR markers were screened for their ability to detect polymorphism in date palm. Around 71% of the genomic SSRs are dinucleotide, 25% trinucleotide, 3% tetranucleotide, and 1% pentanucleotide motifs, and they show 100% polymorphism. The Unweighted Pair Group Method with Arithmetic Mean (UPGMA) cluster analysis illustrates that cultivars tend to group according to their class of maturity, region of cultivation, and fruit color. Analysis of molecular variance (AMOVA) reveals genetic variation among and within cultivars of 27% and 73%, respectively, according to the geographical distribution of the cultivars. The developed microsatellite markers add value to date palm characterization and can be used by researchers in population genetics, cultivar identification, and genetic resource exploration and management. The cultivars tested exhibited a significant amount of genetic diversity and could be suitable for successful breeding programs. Genomic sequences generated from this study are available at the National Center for Biotechnology Information (NCBI) Sequence Read Archive (accession number LIBGSS_039019).

  2. Terminal Restriction Fragment Length Polymorphism Analysis Program, a Web-Based Research Tool for Microbial Community Analysis

    PubMed Central

    Marsh, Terence L.; Saxman, Paul; Cole, James; Tiedje, James

    2000-01-01

    Rapid analysis of microbial communities has proven to be a difficult task. This is due, in part, to both the tremendous diversity of the microbial world and the high complexity of many microbial communities. Several techniques for community analysis have emerged over the past decade, and most take advantage of the molecular phylogeny derived from 16S rRNA comparative sequence analysis. We describe a web-based research tool located at the Ribosomal Database Project web site (http://www.cme.msu.edu/RDP/html/analyses.html) that facilitates microbial community analysis using terminal restriction fragment length polymorphism of 16S ribosomal DNA. The analysis function (designated TAP T-RFLP) permits the user to perform in silico restriction digestions of the entire 16S sequence database and derive terminal restriction fragment sizes, measured in base pairs, from the 5′ terminus of the user-specified primer to the 3′ terminus of the restriction endonuclease target site. The output can be sorted and viewed either phylogenetically or by size. It is anticipated that the site will guide experimental design as well as provide insight into interpreting results of community analysis with terminal restriction fragment length polymorphisms. PMID:10919828
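
    The in silico digestion described above can be sketched in a few lines. This is an illustration under stated assumptions, not the TAP T-RFLP code: the function name and toy sequence are hypothetical, the primer must match exactly, and the search for the recognition site is assumed to start downstream of the primer. Following the abstract, the size is measured from the 5′ end of the primer to the 3′ end of the recognition site.

```python
def terminal_fragment_size(sequence, primer, site):
    """Return the terminal restriction fragment size in base pairs.

    The size runs from the first base of the primer match to the last
    base of the first downstream recognition site. Returns None when the
    primer or the site is not found.
    """
    sequence, primer, site = sequence.upper(), primer.upper(), site.upper()
    start = sequence.find(primer)
    if start < 0:
        return None
    # Assumption: only sites located after the primer are considered.
    hit = sequence.find(site, start + len(primer))
    if hit < 0:
        return None
    return hit + len(site) - start


# Toy 16S-like fragment: 2 leading bases, a 20-nt forward primer,
# 5 spacer bases, then a CCGG (MspI) recognition site.
primer = "AGAGTTTGATCCTGGCTCAG"
toy_sequence = "TT" + primer + "AAAAA" + "CCGG" + "TTTT"
print(terminal_fragment_size(toy_sequence, primer, "CCGG"))  # → 29
```

    Running the same function over every sequence in a database and sorting the results by size would give a table comparable in spirit to the size-sorted output the abstract describes.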

  3. Performance comparison of the Prophecy (forecasting) Algorithm in FFT form for unseen feature and time-series prediction

    NASA Astrophysics Data System (ADS)

    Jaenisch, Holger; Handley, James

    2013-06-01

    We introduce a generalized numerical prediction and forecasting algorithm. We have previously published it for malware byte sequence feature prediction and generalized distribution modeling for disparate test article analysis. We show how non-trivial non-periodic extrapolation of a numerical sequence (forecast and backcast) from the starting data is possible. Our ancestor-progeny prediction can yield new options for evolutionary programming. Our equations enable analytical integrals and derivatives to any order. Interpolation is controllable from smooth continuous to fractal structure estimation. We show how our generalized trigonometric polynomial can be derived using a Fourier transform.

  4. Knowledge-based decision support for Space Station assembly sequence planning

    NASA Astrophysics Data System (ADS)

    1991-04-01

    A complete Personal Analysis Assistant (PAA) for Space Station Freedom (SSF) assembly sequence planning consists of three software components: the system infrastructure, intra-flight value added, and inter-flight value added. The system infrastructure is the substrate on which software elements providing inter-flight and intra-flight value-added functionality are built. It provides the capability for building representations of assembly sequence plans and specification of constraints and analysis options. Intra-flight value-added provides functionality that will, given the manifest for each flight, define cargo elements, place them in the National Space Transportation System (NSTS) cargo bay, compute performance measure values, and identify violated constraints. Inter-flight value-added provides functionality that will, given major milestone dates and capability requirements, determine the number and dates of required flights and develop a manifest for each flight. The current project is Phase 1 of a projected two phase program and delivers the system infrastructure. Intra- and inter-flight value-added were to be developed in Phase 2, which has not been funded. Based on experience derived from hundreds of projects conducted over the past seven years, ISX developed an Intelligent Systems Engineering (ISE) methodology that combines the methods of systems engineering and knowledge engineering to meet the special systems development requirements posed by intelligent systems, systems that blend artificial intelligence and other advanced technologies with more conventional computing technologies. The ISE methodology defines a phased program process that begins with an application assessment designed to provide a preliminary determination of the relative technical risks and payoffs associated with a potential application, and then moves through requirements analysis, system design, and development.

  5. Knowledge-based decision support for Space Station assembly sequence planning

    NASA Technical Reports Server (NTRS)

    1991-01-01

    A complete Personal Analysis Assistant (PAA) for Space Station Freedom (SSF) assembly sequence planning consists of three software components: the system infrastructure, intra-flight value added, and inter-flight value added. The system infrastructure is the substrate on which software elements providing inter-flight and intra-flight value-added functionality are built. It provides the capability for building representations of assembly sequence plans and specification of constraints and analysis options. Intra-flight value-added provides functionality that will, given the manifest for each flight, define cargo elements, place them in the National Space Transportation System (NSTS) cargo bay, compute performance measure values, and identify violated constraints. Inter-flight value-added provides functionality that will, given major milestone dates and capability requirements, determine the number and dates of required flights and develop a manifest for each flight. The current project is Phase 1 of a projected two phase program and delivers the system infrastructure. Intra- and inter-flight value-added were to be developed in Phase 2, which has not been funded. Based on experience derived from hundreds of projects conducted over the past seven years, ISX developed an Intelligent Systems Engineering (ISE) methodology that combines the methods of systems engineering and knowledge engineering to meet the special systems development requirements posed by intelligent systems, systems that blend artificial intelligence and other advanced technologies with more conventional computing technologies. The ISE methodology defines a phased program process that begins with an application assessment designed to provide a preliminary determination of the relative technical risks and payoffs associated with a potential application, and then moves through requirements analysis, system design, and development.

  6. Whole exome sequencing is an efficient, sensitive and specific method of mutation detection in osteogenesis imperfecta and Marfan syndrome

    PubMed Central

    McInerney-Leo, Aideen M; Marshall, Mhairi S; Gardiner, Brooke; Coucke, Paul J; Van Laer, Lut; Loeys, Bart L; Summers, Kim M; Symoens, Sofie; West, Jennifer A; West, Malcolm J; Paul Wordsworth, B; Zankl, Andreas; Leo, Paul J; Brown, Matthew A; Duncan, Emma L

    2013-01-01

    Osteogenesis imperfecta (OI) and Marfan syndrome (MFS) are common Mendelian disorders. Both conditions are usually diagnosed clinically, as genetic testing is expensive due to the size and number of potentially causative genes and mutations. However, genetic testing may benefit patients, at-risk family members and individuals with borderline phenotypes, as well as improving genetic counseling and allowing critical differential diagnoses. We assessed whether whole exome sequencing (WES) is a sensitive method for mutation detection in OI and MFS. WES was performed on genomic DNA from 13 participants with OI and 10 participants with MFS who had known mutations, with exome capture followed by massive parallel sequencing of multiplexed samples. Single nucleotide polymorphisms (SNPs) and small indels were called using Genome Analysis Toolkit (GATK) and annotated with ANNOVAR. CREST, exomeCopy and exomeDepth were used for large deletion detection. Results were compared with the previous data. Specificity was calculated by screening WES data from a control population of 487 individuals for mutations in COL1A1, COL1A2 and FBN1. The target capture of five exome capture platforms was compared. All 13 mutations in the OI cohort and 9/10 in the MFS cohort were detected (sensitivity=95.6%) including non-synonymous SNPs, small indels (<10 bp), and a large UTR5/exon 1 deletion. One mutation was not detected by GATK due to strand bias. Specificity was 99.5%. Capture platforms and analysis programs differed considerably in their ability to detect mutations. Consumable costs for WES were low. WES is an efficient, sensitive, specific and cost-effective method for mutation detection in patients with OI and MFS. Careful selection of platform and analysis programs is necessary to maximize success. PMID:24501682

  7. Comparative genomics of phylogenetically diverse unicellular eukaryotes provide new insights into the genetic basis for the evolution of the programmed cell death machinery.

    PubMed

    Nedelcu, Aurora M

    2009-03-01

    Programmed cell death (PCD) represents a significant component of normal growth and development in multicellular organisms. Recently, PCD-like processes have been reported in single-celled eukaryotes, implying that some components of the PCD machinery existed early in eukaryotic evolution. This study provides a comparative analysis of PCD-related sequences across more than 50 unicellular genera from four eukaryotic supergroups: Unikonts, Excavata, Chromalveolata, and Plantae. A complex set of PCD-related sequences that correspond to domains or proteins associated with all main functional classes--from ligands and receptors to executors of PCD--was found in many unicellular lineages. Several PCD domains and proteins previously thought to be restricted to animals or land plants are also present in unicellular species. Notably, the yeast Saccharomyces cerevisiae, used as an experimental model system for PCD research, has a rather reduced set of PCD-related sequences relative to other unicellular species. The phylogenetic distribution of the PCD-related sequences identified in unicellular lineages suggests that the genetic basis for the evolution of the complex PCD machinery present in extant multicellular lineages was established early in the evolution of eukaryotes. The shaping of the PCD machinery in multicellular lineages involved the duplication, co-option, recruitment, and shuffling of domains already present in their unicellular ancestors.

  8. HRV Analysis to Identify Stages of Home-based Telerehabilitation Exercise.

    PubMed

    Jeong, In Cheol; Finkelstein, Joseph

    2014-01-01

    Spectral analysis of heart rate variability (HRV) has been widely used to investigate activity of the autonomic nervous system. Previous studies demonstrated the potential of analyzing short-term sequences of heart rate data in the time domain for continuous monitoring of levels of physiological stress; however, the value of HRV parameters in the frequency domain for monitoring cycling exercise has not been established. The goal of this study was to assess whether HRV parameters in the frequency domain differ depending on the stage of cycling exercise. We compared major HRV parameters in high, low and very low frequency ranges during rest, height of exercise, and recovery during cycling exercise. Our results indicated responsiveness of frequency-domain indices to different phases of a cycling exercise program and their potential in monitoring autonomic balance and stress levels as a part of a tailored home-based telerehabilitation program.

  9. Developmental assessment of the Fort St. Vrain version of the Composite HTGR Analysis Program (CHAP-2)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stroh, K.R.

    1980-01-01

    The Composite HTGR Analysis Program (CHAP) consists of a model-independent systems analysis mainframe named LASAN and model-dependent linked code modules, each representing a component, subsystem, or phenomenon of an HTGR plant. The Fort St. Vrain (FSV) version (CHAP-2) includes 21 coded modules that model the neutron kinetics and thermal response of the core; the thermal-hydraulics of the reactor primary coolant system, secondary steam supply system, and balance-of-plant; the actions of the control system and plant protection system; the response of the reactor building; and the relative hazard resulting from fuel particle failure. FSV steady-state and transient plant data are being used to partially verify the component modeling and dynamic simulation techniques used to predict plant response to postulated accident sequences.

  10. MaxAlign: maximizing usable data in an alignment.

    PubMed

    Gouveia-Oliveira, Rodrigo; Sackett, Peter W; Pedersen, Anders G

    2007-08-28

    The presence of gaps in an alignment of nucleotide or protein sequences is often an inconvenience for bioinformatical studies. In phylogenetic and other analyses, for instance, gapped columns are often discarded entirely from the alignment. MaxAlign is a program that optimizes the alignment prior to such analyses. Specifically, it maximizes the number of nucleotide (or amino acid) symbols that are present in gap-free columns - the alignment area - by selecting the optimal subset of sequences to exclude from the alignment. MaxAlign can be used prior to phylogenetic and bioinformatical analyses as well as in other situations where this form of alignment improvement is useful. In this work we test MaxAlign's performance in these tasks and compare the accuracy of phylogenetic estimates including and excluding gapped columns from the analysis, with and without processing with MaxAlign. In this paper we also introduce a new simple measure of tree similarity, Normalized Symmetric Similarity (NSS) that we consider useful for comparing tree topologies. We demonstrate how MaxAlign is helpful in detecting misaligned or defective sequences without requiring manual inspection. We also show that it is not advisable to exclude gapped columns from phylogenetic analyses unless MaxAlign is used first. Finally, we find that the sequences removed by MaxAlign from an alignment tend to be those that would otherwise be associated with low phylogenetic accuracy, and that the presence of gaps in any given sequence does not seem to disturb the phylogenetic estimates of other sequences. The MaxAlign web-server is freely available online at http://www.cbs.dtu.dk/services/MaxAlign where supplementary information can also be found. The program is also freely available as a Perl stand-alone package.
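
    The "alignment area" described above (symbols lying in gap-free columns) can be illustrated with a short sketch. The greedy removal loop below is an assumption made for illustration only and is not claimed to be MaxAlign's actual search procedure; the function names and the "-" gap character are likewise hypothetical.

```python
def alignment_area(seqs):
    """Symbols in gap-free columns: (number of gap-free columns) x (number of sequences)."""
    if not seqs:
        return 0
    gap_free_columns = sum(1 for column in zip(*seqs) if "-" not in column)
    return gap_free_columns * len(seqs)


def greedy_exclude(alignment):
    """Greedy illustration (not MaxAlign itself): drop one sequence at a time
    while doing so strictly increases the alignment area.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Returns (sorted names kept, resulting alignment area).
    """
    kept = dict(alignment)
    best = alignment_area(list(kept.values()))
    while len(kept) > 1:
        # Evaluate the area obtained by removing each remaining sequence in turn.
        candidates = []
        for name in kept:
            rest = [seq for other, seq in kept.items() if other != name]
            candidates.append((alignment_area(rest), name))
        area, name = max(candidates)
        if area <= best:
            break  # no single removal improves the area
        best = area
        del kept[name]
    return sorted(kept), best


if __name__ == "__main__":
    toy = {"a": "ACGTACGT", "b": "ACGTACGT", "c": "AC--ACGT", "d": "--------"}
    print(greedy_exclude(toy))
```

    In the toy input, the all-gap sequence "d" is excluded first; removing the partially gapped sequence "c" would then lower the area, so it is kept.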

  11. The Integration of Nutrition Education in the Basic Biomedical Sciences

    ERIC Educational Resources Information Center

    Raw, Isaias

    1977-01-01

    At the Center for Biomedical Education at the City University of New York, nutrition is integrated into the chemistry-biochemistry sequence of a six-year B.S.-M.D. program. Students perform an actual analysis of a sample of their own food, learning basic techniques and concepts, and also carry on experiments with rats on other diets. (Editor/LBH)

  12. A computer program for fast and easy typing of partial endoglucanase gene sequence into phylotypes and sequevars 1&2 (select agents) of Ralstonia solanacearum

    USDA-ARS's Scientific Manuscript database

    The phytopathogen Ralstonia solanacearum is a species complex that contains a subset of strains that are quarantined or select agent pathogens. An unidentified R. solanacearum strain is considered a select agent in the US until proven otherwise, which can be done by phylogenetic analysis of a partia...

  13. Use of vectors in sequence analysis.

    PubMed

    Ishikawa, T; Yamamoto, K; Yoshikura, H

    1987-10-01

    Applications of the vector diagram, a new type of representation of protein structure, in homology search of various proteins including oncogene products are presented. The method takes account of various kinds of information concerning the properties of amino acids, such as Chou and Fasman's probability data. The method can detect conformational similarities of proteins which may not be detected by the conventional programs.

  14. A genome-wide screening of BEL-Pao like retrotransposons in Anopheles gambiae by the LTR_STRUC program.

    PubMed

    Marsano, Renè Massimiliano; Caizzi, Ruggiero

    2005-09-12

    The advanced status of assembly of the nematoceran Anopheles gambiae genomic sequence allowed us to perform a genome-wide analysis looking for the presence of Long Terminal Repeats (LTRs) in the range of 10 kb by means of the LTR_STRUC tool. More than three hundred sequences were retrieved and 210 were treated as putative complete retrotransposons that were individually analysed with respect to known retrotransposons of A. gambiae and D. melanogaster. The results show that the vast majority of the retrotransposons analysed belong to the Ty3/gypsy class and only 8% to the Ty1/copia class. In addition, phylogenetic analysis allowed us to characterize in more detail the relationship of a large BEL-Pao lineage in which a single family was shown to harbour an additional env gene.

  15. Genetic diversity and population structure analysis of spinach by single-nucleotide polymorphisms identified through genotyping-by-sequencing.

    PubMed

    Shi, Ainong; Qin, Jun; Mou, Beiquan; Correll, James; Weng, Yuejin; Brenner, David; Feng, Chunda; Motes, Dennis; Yang, Wei; Dong, Lingdi; Bhattarai, Gehendra; Ravelombola, Waltram

    2017-01-01

    Spinach (Spinacia oleracea L., 2n = 2x = 12) is an economically important vegetable crop worldwide and one of the healthiest vegetables due to its high concentrations of nutrients and minerals. The objective of this research was to conduct genetic diversity and population structure analysis of a collection of world-wide spinach genotypes using single nucleotide polymorphism (SNP) markers. Genotyping by sequencing (GBS) was used to discover SNPs in spinach genotypes. Three sets of spinach genotypes were used: 1) 268 USDA GRIN spinach germplasm accessions originally collected from 30 countries; 2) 45 commercial spinach F1 hybrids from three countries; and 3) 30 US Arkansas spinach cultivars/breeding lines. The results from this study indicated that there was genetic diversity among the 343 spinach genotypes tested. Furthermore, the improved commercial F1 hybrids and the Arkansas cultivars/lines had population structures different from that of the USDA germplasm. In addition, the genetic diversity and population structures were associated with geographic origin, and germplasm from the US Arkansas breeding program had a unique genetic background. These data could provide genetic diversity information and the molecular markers for selecting parents in spinach breeding programs.

  16. Genetic diversity and population structure analysis of spinach by single-nucleotide polymorphisms identified through genotyping-by-sequencing

    PubMed Central

    Qin, Jun; Mou, Beiquan; Correll, James; Weng, Yuejin; Brenner, David; Feng, Chunda; Motes, Dennis; Yang, Wei; Dong, Lingdi; Bhattarai, Gehendra; Ravelombola, Waltram

    2017-01-01

    Spinach (Spinacia oleracea L., 2n = 2x = 12) is an economically important vegetable crop worldwide and one of the healthiest vegetables due to its high concentrations of nutrients and minerals. The objective of this research was to conduct genetic diversity and population structure analysis of a collection of world-wide spinach genotypes using single nucleotide polymorphism (SNP) markers. Genotyping by sequencing (GBS) was used to discover SNPs in spinach genotypes. Three sets of spinach genotypes were used: 1) 268 USDA GRIN spinach germplasm accessions originally collected from 30 countries; 2) 45 commercial spinach F1 hybrids from three countries; and 3) 30 US Arkansas spinach cultivars/breeding lines. The results from this study indicated that there was genetic diversity among the 343 spinach genotypes tested. Furthermore, the improved commercial F1 hybrids and the Arkansas cultivars/lines had population structures different from that of the USDA germplasm. In addition, the genetic diversity and population structures were associated with geographic origin, and germplasm from the US Arkansas breeding program had a unique genetic background. These data could provide genetic diversity information and the molecular markers for selecting parents in spinach breeding programs. PMID:29190770

  17. Refining and end use study of coal liquids II - linear programming analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lowe, C.; Tam, S.

    1995-12-31

    A DOE-funded study is underway to determine the optimum refinery processing schemes for producing transportation fuels that will meet CAAA regulations from direct and indirect coal liquids. The study consists of three major parts: pilot plant testing of critical upgrading processes, linear programming analysis of different processing schemes, and engine emission testing of final products. Currently, fractions of a direct coal liquid produced from bituminous coal are being tested in a sequence of pilot plant upgrading processes. This work is discussed in a separate paper. The linear programming model, which is the subject of this paper, has been completed for the petroleum refinery and is being modified to handle coal liquids based on the pilot plant test results. Preliminary coal liquid evaluation studies indicate that, if a refinery expansion scenario is adopted, then the marginal value of the coal liquid (over the base petroleum crude) is $3-4/bbl.

  18. JavaScript DNA translator: DNA-aligned protein translations.

    PubMed

    Perry, William L

    2002-12-01

    There are many instances in molecular biology when it is necessary to identify ORFs in a DNA sequence. While programs exist for displaying protein translations in multiple ORFs in alignment with a DNA sequence, they are often expensive, exist as add-ons to software that must be purchased, or are only compatible with a particular operating system. JavaScript DNA Translator is a shareware application written in JavaScript, a scripting language interpreted by the Netscape Communicator and Internet Explorer Web browsers, which makes it compatible with several different operating systems. While the program uses a familiar Web page interface, it requires no connection to the Internet since calculations are performed on the user's own computer. The program analyzes one or multiple DNA sequences and generates translations in up to six reading frames aligned to a DNA sequence, in addition to displaying translations as separate sequences in FASTA format. ORFs within a reading frame can also be displayed as separate sequences. Flexible formatting options are provided, including the ability to hide ORFs below a minimum size specified by the user. The program is available free of charge at the BioTechniques Software Library (www.Biotechniques.com).
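
    As an illustration of the six-reading-frame translation described above, here is a minimal sketch in Python rather than the program's own JavaScript. It assumes the standard genetic code, and all function names are hypothetical; it is not the JavaScript DNA Translator's implementation.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order for each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Complement each base, then reverse the strand."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return translations for frames +1..+3 (forward) and -1..-3 (reverse strand)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frames("ATGGCCTAA").items():
        print(frame, protein)
```

    Splitting each translation at "*" would give candidate ORFs per frame, which could then be filtered by a minimum length in the spirit of the formatting option mentioned in the abstract.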

  19. Genetic diversity analysis in Malaysian giant prawns using expressed sequence tag microsatellite markers for stock improvement program.

    PubMed

    Atin, K H; Christianus, A; Fatin, N; Lutas, A C; Shabanimofrad, M; Subha, B

    2017-08-17

    The Malaysian giant prawn is among the most commonly cultured species of the genus Macrobrachium. Stocks of giant prawns from four rivers in Peninsular Malaysia have been used for aquaculture over the past 25 years, which has led to repeated harvesting, restocking, and transplantation between rivers. Consequently, a stock improvement program is now important to avoid the depletion of wild stocks and the loss of genetic diversity. However, the success of such an improvement program depends on our knowledge of the genetic variation of these base populations. The aim of the current study was to estimate genetic variation and differentiation of these riverine sources using novel expressed sequence tag-microsatellite (EST-SSR) markers, which not only are informative on genetic diversity but also provide information on immune and metabolic traits. Our findings indicated that the tested stocks have inbreeding depression due to a significant deficiency in heterozygotes, and F_IS was estimated as 0.15538 to 0.31938. An F-statistics analysis suggested that the stocks are composed of one large panmictic population. Among the four locations, stocks from Johor, in the southern region of the peninsula, showed higher allelic and genetic diversity than the other stocks. To overcome inbreeding problems, the Johor population could be used as a base population in a stock improvement program by crossing to the other populations. The study demonstrated that EST-SSR markers can be incorporated in future marker assisted breeding to aid the proper management of the stocks by breeders and stakeholders in Malaysia.

  20. Pulseq: A rapid and hardware-independent pulse sequence prototyping framework.

    PubMed

    Layton, Kelvin J; Kroboth, Stefan; Jia, Feng; Littin, Sebastian; Yu, Huijun; Leupold, Jochen; Nielsen, Jon-Fredrik; Stöcker, Tony; Zaitsev, Maxim

    2017-04-01

    Implementing new magnetic resonance experiments, or sequences, often involves extensive programming on vendor-specific platforms, which can be time consuming and costly. This situation is exacerbated when research sequences need to be implemented on several platforms simultaneously, for example, at different field strengths. This work presents an alternative programming environment that is hardware-independent, open-source, and promotes rapid sequence prototyping. A novel file format is described to efficiently store the hardware events and timing information required for an MR pulse sequence. Platform-dependent interpreter modules convert the file to appropriate instructions to run the sequence on MR hardware. Sequences can be designed in high-level languages, such as MATLAB, or with a graphical interface. Spin physics simulation tools are incorporated into the framework, allowing for comparison between real and virtual experiments. Minimal effort is required to implement relatively advanced sequences using the tools provided. Sequences are executed on three different MR platforms, demonstrating the flexibility of the approach. A high-level, flexible and hardware-independent approach to sequence programming is ideal for the rapid development of new sequences. The framework is currently not suitable for large patient studies or routine scanning although this would be possible with deeper integration into existing workflows. Magn Reson Med 77:1544-1552, 2017. © 2016 International Society for Magnetic Resonance in Medicine.

  1. Monitoring Error Rates In Illumina Sequencing.

    PubMed

    Manley, Leigh J; Ma, Duanduan; Levine, Stuart S

    2016-12-01

    Guaranteeing high-quality next-generation sequencing data in a rapidly changing environment is an ongoing challenge. The introduction of the Illumina NextSeq 500 and the depreciation of specific metrics from Illumina's Sequencing Analysis Viewer (SAV; Illumina, San Diego, CA, USA) have made it more difficult to determine directly the baseline error rate of sequencing runs. To improve our ability to measure base quality, we have created an open-source tool to construct the Percent Perfect Reads (PPR) plot, previously provided by the Illumina sequencers. The PPR program is compatible with HiSeq 2000/2500, MiSeq, and NextSeq 500 instruments and provides an alternative to Illumina's quality value (Q) scores for determining run quality. Whereas Q scores are representative of run quality, they are often overestimated and are sourced from different look-up tables for each platform. The PPR's unique capabilities as a cross-instrument comparison device, as a troubleshooting tool, and as a tool for monitoring instrument performance can provide an increase in clarity over SAV metrics that is often crucial for maintaining instrument health. These capabilities are highlighted.

  2. Whole Wiskott‑Aldrich syndrome protein gene deletion identified by high throughput sequencing.

    PubMed

    He, Xiangling; Zou, Runying; Zhang, Bing; You, Yalan; Yang, Yang; Tian, Xin

    2017-11-01

    Wiskott‑Aldrich syndrome (WAS) is a rare X‑linked recessive immunodeficiency disorder, characterized by thrombocytopenia, small platelets, eczema and recurrent infections associated with increased risk of autoimmunity and malignancy disorders. Mutations in the WAS protein (WASP) gene are responsible for WAS. To date, WASP mutations, including missense/nonsense, splicing, small deletions, small insertions, gross deletions, and gross insertions have been identified in patients with WAS. In addition, WASP‑interacting proteins are suspected in patients with clinical features of WAS, in whom the WASP gene sequence and mRNA levels are normal. The present study aimed to investigate the application of next generation sequencing in definitive diagnosis and clinical therapy for WAS. A 5 month‑old child with WAS who displayed symptoms of thrombocytopenia was examined. Whole exome sequence analysis of genomic DNA showed that the coverage and depth of WASP were extremely low. Quantitative polymerase chain reaction indicated total WASP gene deletion in the proband. In conclusion, high throughput sequencing is useful for the verification of WAS on the genetic profile, and has implications for family planning guidance and establishment of clinical programs.

  3. GenBank.

    PubMed

    Benson, Dennis A; Karsch-Mizrachi, Ilene; Lipman, David J; Ostell, James; Sayers, Eric W

    2011-01-01

    GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 380,000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system that integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, begin at the NCBI Homepage: www.ncbi.nlm.nih.gov.

  4. The complete mitochondrial genome of the Anabas testudineus (Perciformes, Anabantidae) and its comparison with other related fish species.

    PubMed

    Behera, Bijay Kumar; Baisvar, Vishwamitra Singh; Kumari, Kavita; Rout, Ajaya Kumar; Pakrashi, Sudip; Paria, Prasenjet; Rao, A R; Rai, Anil

    2017-03-01

    In the present study, the complete mitochondrial genome sequence of Anabas testudineus is reported using a PGM sequencer (Ion Torrent, Life Technologies, La Jolla, CA). The complete mitogenome of the climbing perch, A. testudineus, which is 16,603 bp in length, was obtained by de novo sequence assembly of genomic reads using the Torrent Mapping Alignment Program (TMAP). The mitogenome of A. testudineus is composed of 13 protein-coding genes, two rRNAs, and 22 tRNAs. Here, 20 tRNA genes showed the typical cloverleaf model, with the D-loop as the control region; gene order and organization were closely similar to those of Osphronemidae and most other Perciformes fish mitogenomes in NCBI databases. The mitogenome in the present study has 99% similarity to the complete mitogenome sequence of the earlier reported A. testudineus. The phylogenetic analysis of Anabantidae showed that their mitogenomes are closely related to each other. The complete mitogenome sequence of A. testudineus would be helpful in understanding the population genetics, phylogenetics, and evolution of Anabantidae.

  5. A multiple-alignment based primer design algorithm for genetically highly variable DNA targets

    PubMed Central

    2013-01-01

    Background: Primer design for highly variable DNA sequences is difficult, and experimental success requires attention to many interacting constraints. The advent of next-generation sequencing methods allows the investigation of rare variants otherwise hidden deep in large populations, but requires attention to population diversity and primer localization in relatively conserved regions, in addition to recognized constraints typically considered in primer design. Results: Design constraints include degenerate sites to maximize population coverage, matching of melting temperatures, optimizing de novo sequence length, finding optimal bio-barcodes to allow efficient downstream analyses, and minimizing risk of dimerization. To facilitate primer design addressing these and other constraints, we created a novel computer program (PrimerDesign) that automates this complex procedure. We show its powers and limitations and give examples of successful designs for the analysis of HIV-1 populations. Conclusions: PrimerDesign is useful for researchers who want to design DNA primers and probes for analyzing highly variable DNA populations. It can be used to design primers for PCR, RT-PCR, Sanger sequencing, next-generation sequencing, and other experimental protocols targeting highly variable DNA samples. PMID:23965160

  6. DECIPHER, a Search-Based Approach to Chimera Identification for 16S rRNA Sequences

    PubMed Central

    Wright, Erik S.; Yilmaz, L. Safak

    2012-01-01

    DECIPHER is a new method for finding 16S rRNA chimeric sequences by the use of a search-based approach. The method is based upon detecting short fragments that are uncommon in the phylogenetic group where a query sequence is classified but frequently found in another phylogenetic group. The algorithm was calibrated for full sequences (fs_DECIPHER) and short sequences (ss_DECIPHER) and benchmarked against WigeoN (Pintail), ChimeraSlayer, and Uchime using artificially generated chimeras. Overall, ss_DECIPHER and Uchime provided the highest chimera detection for sequences 100 to 600 nucleotides long (79% and 81%, respectively), but Uchime's performance deteriorated for longer sequences, while ss_DECIPHER maintained a high detection rate (89%). Both methods had low false-positive rates (1.3% and 1.6%). The more conservative fs_DECIPHER, benchmarked only for sequences longer than 600 nucleotides, had an overall detection rate lower than that of ss_DECIPHER (75%) but higher than those of the other programs. In addition, fs_DECIPHER had the lowest false-positive rate among all the benchmarked programs (<0.20%). DECIPHER was outperformed only by ChimeraSlayer and Uchime when chimeras were formed from closely related parents (less than 10% divergence). Given the differences in the programs, it was possible to detect over 89% of all chimeras with just the combination of ss_DECIPHER and Uchime. Using fs_DECIPHER, we detected between 1% and 2% additional chimeras in the RDP, SILVA, and Greengenes databases from which chimeras had already been removed with Pintail or Bellerophon. DECIPHER was implemented in the R programming language and is directly accessible through a webpage or by downloading the program as an R package (http://DECIPHER.cee.wisc.edu). PMID:22101057

  7. obitools: a unix-inspired software package for DNA metabarcoding.

    PubMed

    Boyer, Frédéric; Mercier, Céline; Bonin, Aurélie; Le Bras, Yvan; Taberlet, Pierre; Coissac, Eric

    2016-01-01

    DNA metabarcoding offers new perspectives in biodiversity research. This recently developed approach to ecosystem study relies heavily on the use of next-generation sequencing (NGS) and thus calls upon the ability to deal with huge sequence data sets. The obitools package satisfies this requirement thanks to a set of programs specifically designed for analysing NGS data in a DNA metabarcoding context. Their capacity to filter and edit sequences while taking into account taxonomic annotation helps to set up tailor-made analysis pipelines for a broad range of DNA metabarcoding applications, including biodiversity surveys or diet analyses. The obitools package is distributed as an open source software available on the following website: http://metabarcoding.org/obitools. A Galaxy wrapper is available on the GenOuest core facility toolshed: http://toolshed.genouest.org. © 2015 John Wiley & Sons Ltd.

  8. Whole Genome Sequence Analysis of Salmonella Typhi Isolated in Thailand before and after the Introduction of a National Immunization Program

    PubMed Central

    Thanh, Duy Pham; Bodhidatta, Ladaporn; Mason, Carl Jeffries; Srijan, Apichai; Rabaa, Maia A.; Vinh, Phat Voong; Thanh, Tuyen Ha; Thwaites, Guy E.; Baker, Stephen; Holt, Kathryn E.

    2017-01-01

    Vaccines against Salmonella Typhi, the causative agent of typhoid fever, are commonly used by travellers; however, there are few examples of national immunization programs in endemic areas. There is therefore a paucity of data on the impact of typhoid immunization programs on localised populations of S. Typhi. Here we have used whole genome sequencing (WGS) to characterise 44 historical bacterial isolates collected before and after a national typhoid immunization program that was implemented in Thailand in 1977 in response to a large outbreak; the program was highly effective in reducing typhoid case numbers. Thai isolates were highly diverse, including 10 distinct phylogenetic lineages or genotypes. Novel prophage and plasmids were also detected, including examples that were previously only reported in Shigella sonnei and Escherichia coli. The majority of S. Typhi genotypes observed prior to the immunization program were not observed following it. Post-vaccine era isolates were more closely related to S. Typhi isolated from neighbouring countries than to earlier Thai isolates, providing no evidence for the local persistence of endemic S. Typhi following the national immunization program. Rather, later cases of typhoid appeared to be caused by the occasional importation of common genotypes from neighbouring Vietnam, Laos, and Cambodia. These data show the value of WGS in understanding the impacts of vaccination on pathogen populations and provide support for the proposal that large-scale typhoid immunization programs in endemic areas could result in lasting local disease elimination, although larger prospective studies are needed to test this directly. PMID:28060810

  9. The NASTRAN user's manual

    NASA Technical Reports Server (NTRS)

    1983-01-01

    All information directly associated with problem solving using the NASTRAN program is presented. This structural analysis program uses the finite element approach to structural modeling wherein the distributed physical properties of a structure are represented by a finite number of structural elements which are interconnected at a finite number of grid points, to which loads are applied and for which displacements are calculated. Procedures are described for defining and loading a structural model. Functional references for every card used for structural modeling, the NASTRAN data deck and control cards, problem solution sequences (rigid formats), using the plotting capability, writing a direct matrix abstraction program, and diagnostic messages are explained. A dictionary of mnemonics, acronyms, phrases, and other commonly used NASTRAN terms is included.

  10. Molecular phylogeny of some avian species using Cytochrome b gene sequence analysis

    PubMed Central

    Awad, A; Khalil, S. R; Abd-Elhakim, Y. M

    2015-01-01

    Reliable identification and differentiation of avian species is a vital step in conservation, taxonomic, forensic, legal and other ornithological interventions. Therefore, this study involved the application of a molecular approach to identify some avian species, i.e. Chicken (Gallus gallus), Muscovy duck (Cairina moschata), Japanese quail (Coturnix japonica), Laughing dove (Streptopelia senegalensis), and Rock pigeon (Columba livia). Genomic DNA was extracted from blood samples and a partial sequence of the mitochondrial cytochrome b gene (358 bp) was amplified and sequenced using universal primers. Sequence alignment and phylogenetic analyses were performed with the CLC Main Workbench program. The obtained five sequences were deposited in GenBank and compared with those previously registered in GenBank. The similarity percentage was 88.60% between Gallus gallus and Coturnix japonica and 80.46% between Gallus gallus and Columba livia. The percentage of identity between the studied species and GenBank species ranged from 77.20% (Columba oenas and Anas platyrhynchos) to 100% (Gallus gallus and Gallus sonneratii, Coturnix coturnix and Coturnix japonica, Meleagris gallopavo and Columba livia). Amplification of the partial sequence of the mitochondrial cytochrome b gene proved to be practical for unambiguous identification of avian species. PMID:27175180
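
    Percent identity figures such as those quoted above are computed from aligned sequences. The sketch below shows one common convention, assumed here for illustration: columns containing a gap in either sequence are skipped. The abstract does not state which convention CLC Main Workbench applies, and the function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap.

    Assumption: gapped columns are excluded from both numerator and denominator;
    other tools may count gaps as mismatches instead.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Made-up toy sequences, not data from the study.
    print(percent_identity("ACGT-A", "ACCTTA"))
```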

  11. Fast alignment-free sequence comparison using spaced-word frequencies.

    PubMed

    Leimeister, Chris-Andre; Boden, Marcus; Horwege, Sebastian; Lindner, Sebastian; Morgenstern, Burkhard

    2014-07-15

    Alignment-free methods for sequence comparison are increasingly used for genome analysis and phylogeny reconstruction; they circumvent various difficulties of traditional alignment-based approaches. In particular, alignment-free methods are much faster than pairwise or multiple alignments. They are, however, less accurate than methods based on sequence alignment. Most alignment-free approaches work by comparing the word composition of sequences. A well-known problem with these methods is that neighbouring word matches are far from independent. To reduce the statistical dependency between adjacent word matches, we propose to use 'spaced words', defined by patterns of 'match' and 'don't care' positions, for alignment-free sequence comparison. We describe a fast implementation of this approach using recursive hashing and bit operations, and we show that further improvements can be achieved by using multiple patterns instead of single patterns. To evaluate our approach, we use spaced-word frequencies as a basis for fast phylogeny reconstruction. Using real-world and simulated sequence data, we demonstrate that our multiple-pattern approach produces better phylogenies than approaches relying on contiguous words. Our program is freely available at http://spaced.gobics.de/. © The Author 2014. Published by Oxford University Press.
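
    The spaced-word idea described above can be illustrated with a short sketch. It assumes a pattern written as a string of '1' (match) and '0' (don't care) characters and uses a plain Euclidean distance between relative word frequencies as an illustrative choice. It is a naive implementation for clarity and does not reproduce the recursive hashing, bit operations, or multiple-pattern scheme of the authors' program. All function names are hypothetical.

```python
import math
from collections import Counter


def spaced_word_counts(seq: str, pattern: str) -> Counter:
    """Count spaced words: keep characters at '1' positions, ignore '0' positions."""
    if not pattern or set(pattern) - {"0", "1"}:
        raise ValueError("pattern must be a non-empty string of '0' and '1'")
    keep = [i for i, flag in enumerate(pattern) if flag == "1"]
    span = len(pattern)
    counts = Counter()
    for start in range(len(seq) - span + 1):
        window = seq[start:start + span]
        counts["".join(window[i] for i in keep)] += 1
    return counts


def frequency_distance(seq_a: str, seq_b: str, pattern: str) -> float:
    """Euclidean distance between relative spaced-word frequencies (illustrative)."""
    counts_a = spaced_word_counts(seq_a, pattern)
    counts_b = spaced_word_counts(seq_b, pattern)
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("sequences must be at least as long as the pattern")
    words = set(counts_a) | set(counts_b)
    return math.sqrt(
        sum((counts_a[w] / total_a - counts_b[w] / total_b) ** 2 for w in words)
    )


if __name__ == "__main__":
    print(spaced_word_counts("ACGTACGT", "101"))
    print(frequency_distance("ACGTACGT", "ACGTTCGT", "101"))
```

    A matrix of such pairwise distances could then be passed to any distance-based tree-building method. Setting the pattern to all '1' characters reduces the sketch to ordinary contiguous word counting.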

  12. MendeLIMS: a web-based laboratory information management system for clinical genome sequencing.

    PubMed

    Grimes, Susan M; Ji, Hanlee P

    2014-08-27

    Large clinical genomics studies using next generation DNA sequencing require the ability to select and track samples from a large population of patients through many experimental steps. With the number of clinical genome sequencing studies increasing, it is critical to maintain adequate laboratory information management systems to manage the thousands of patient samples that are subject to this type of genetic analysis. To meet the needs of clinical population studies using genome sequencing, we developed a web-based laboratory information management system (LIMS) with a flexible configuration that is adaptable to continuously evolving experimental protocols of next generation DNA sequencing technologies. Our system, referred to as MendeLIMS, is easily implemented with open source tools and is highly configurable and extensible. MendeLIMS has been invaluable in the management of our clinical genome sequencing studies. We maintain a publicly available demonstration version of the application for evaluation purposes at http://mendelims.stanford.edu. MendeLIMS is programmed in Ruby on Rails (RoR) and accesses data stored in SQL-compliant relational databases. Software is freely available for non-commercial use at http://dna-discovery.stanford.edu/software/mendelims/.

  13. Common Bolted Joint Analysis Tool

    NASA Technical Reports Server (NTRS)

    Imtiaz, Kauser

    2011-01-01

    Common Bolted Joint Analysis Tool (comBAT) is an Excel/VB-based bolted joint analysis/optimization program that lays out a systematic foundation for an inexperienced or seasoned analyst to determine fastener size, material, and assembly torque for a given design. Analysts are able to perform numerous what-if scenarios within minutes to arrive at an optimal solution. The program evaluates input design parameters, performs joint assembly checks, and steps through numerous calculations to arrive at several key margins of safety for each member in a joint. It also checks for joint gapping, provides fatigue calculations, and generates joint diagrams for a visual reference. Optimum fastener size and material, as well as correct torque, can then be provided. Analysis methodology, equations, and guidelines are provided throughout the solution sequence so that this program does not become a "black box" for the analyst. There are built-in databases that reduce the legwork required by the analyst. Each step is clearly identified and results are provided in number format, as well as color-coded spelled-out words to draw user attention. The three key features of the software are robust technical content, innovative and user-friendly I/O, and a large database. The program addresses every aspect of bolted joint analysis and proves to be an instructional tool at the same time. It saves analysis time, has intelligent messaging features, and catches operator errors in real time.

  14. LabVIEW-based sequential-injection analysis system for the determination of trace metals by square-wave anodic and adsorptive stripping voltammetry on mercury-film electrodes.

    PubMed

    Economou, Anastasios; Voulgaropoulos, Anastasios

    2003-01-01

    The development of a dedicated automated sequential-injection analysis apparatus for anodic stripping voltammetry (ASV) and adsorptive stripping voltammetry (AdSV) is reported. The instrument comprised a peristaltic pump, a multiposition selector valve and a home-made potentiostat and used a mercury-film electrode as the working electrode in a thin-layer electrochemical detector. Programming of the experimental sequence was performed in LabVIEW 5.1. The sequence of operations included formation of the mercury film, electrolytic or adsorptive accumulation of the analyte on the electrode surface, recording of the voltammetric current-potential response, and cleaning of the electrode. The stripping step was carried out by applying a square-wave (SW) potential-time excitation signal to the working electrode. The instrument allowed unattended operation since multiple-step sequences could be readily implemented through the purpose-built software. The utility of the analyser was tested for the determination of copper(II), cadmium(II), lead(II) and zinc(II) by SWASV and of nickel(II), cobalt(II) and uranium(VI) by SWAdSV.

  15. LabVIEW-based sequential-injection analysis system for the determination of trace metals by square-wave anodic and adsorptive stripping voltammetry on mercury-film electrodes

    PubMed Central

    Economou, Anastasios; Voulgaropoulos, Anastasios

    2003-01-01

    The development of a dedicated automated sequential-injection analysis apparatus for anodic stripping voltammetry (ASV) and adsorptive stripping voltammetry (AdSV) is reported. The instrument comprised a peristaltic pump, a multiposition selector valve and a home-made potentiostat and used a mercury-film electrode as the working electrode in a thin-layer electrochemical detector. Programming of the experimental sequence was performed in LabVIEW 5.1. The sequence of operations included formation of the mercury film, electrolytic or adsorptive accumulation of the analyte on the electrode surface, recording of the voltammetric current-potential response, and cleaning of the electrode. The stripping step was carried out by applying a square-wave (SW) potential-time excitation signal to the working electrode. The instrument allowed unattended operation since multiple-step sequences could be readily implemented through the purpose-built software. The utility of the analyser was tested for the determination of copper(II), cadmium(II), lead(II) and zinc(II) by SWASV and of nickel(II), cobalt(II) and uranium(VI) by SWAdSV. PMID:18924623

  16. RNAstructure: software for RNA secondary structure prediction and analysis.

    PubMed

    Reuter, Jessica S; Mathews, David H

    2010-03-15

    To understand an RNA sequence's mechanism of action, the structure must be known. Furthermore, target RNA structure is an important consideration in the design of small interfering RNAs and antisense DNA oligonucleotides. RNA secondary structure prediction, using thermodynamics, can be used to develop hypotheses about the structure of an RNA sequence. RNAstructure is a software package for RNA secondary structure prediction and analysis. It uses thermodynamics and utilizes the most recent set of nearest neighbor parameters from the Turner group. It includes methods for secondary structure prediction (using several algorithms), prediction of base pair probabilities, bimolecular structure prediction, and prediction of a structure common to two sequences. This contribution describes new extensions to the package, including a library of C++ classes for incorporation into other programs, a user-friendly graphical user interface written in JAVA, and new Unix-style text interfaces. The original graphical user interface for Microsoft Windows is still maintained. The extensions to RNAstructure serve to make RNA secondary structure prediction user-friendly. The package is available for download from the Mathews lab homepage at http://rna.urmc.rochester.edu/RNAstructure.html.

  17. Dynamic programming algorithms for biological sequence comparison.

    PubMed

    Pearson, W R; Miller, W

    1992-01-01

    Efficient dynamic programming algorithms are available for a broad class of protein and DNA sequence comparison problems. These algorithms require computer time proportional to the product of the lengths of the two sequences being compared [O(N^2)] but require memory space proportional only to the sum of these lengths [O(N)]. Although the requirement for O(N^2) time limits use of the algorithms to the largest computers when searching protein and DNA sequence databases, many other applications of these algorithms, such as calculation of distances for evolutionary trees and comparison of a new sequence to a library of sequence profiles, are well within the capabilities of desktop computers. In particular, the results of library searches with rapid searching programs, such as FASTA or BLAST, should be confirmed by performing a rigorous optimal alignment. Whereas rapid methods do not overlook significant sequence similarities, FASTA limits the number of gaps that can be inserted into an alignment, so that a rigorous alignment may extend the alignment substantially in some cases. BLAST does not allow gaps in the local regions that it reports; a calculation that allows gaps is very likely to extend the alignment substantially. Although a Monte Carlo evaluation of the statistical significance of a similarity score with a rigorous algorithm is much slower than the heuristic approach used by the RDF2 program, the dynamic programming approach should take less than 1 hr on a 386-based PC or desktop Unix workstation. For descriptive purposes, we have limited our discussion to methods for calculating similarity scores and distances that use gap penalties of the form g = rk. Nevertheless, programs for the more general case (g = q+rk) are readily available. Versions of these programs that run either on Unix workstations, IBM-PC class computers, or the Macintosh can be obtained from either of the authors.
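
    The time and memory behaviour described above can be seen in a minimal sketch of a global similarity score with a gap penalty of the form g = rk. It keeps only two rows of the dynamic programming matrix, so time grows with the product of the sequence lengths while memory grows linearly. The function name and the match, mismatch and r values are illustrative assumptions, not the authors' programs or parameters. The sketch returns the score only; recovering the alignment itself in linear space needs an additional divide-and-conquer step that is not shown.

```python
def global_similarity(a: str, b: str, match: int = 1, mismatch: int = -1, r: int = 2) -> int:
    """Global similarity score where a gap of length k costs r * k.

    Time is proportional to len(a) * len(b); only two rows are kept in memory.
    """
    if len(b) > len(a):
        a, b = b, a  # keep the shorter sequence along the row to save memory
    # Row 0: aligning the empty prefix of `a` against prefixes of `b`.
    prev = [-r * j for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [-r * i]  # column 0: prefix of `a` aligned against nothing
        for j, y in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if x == y else mismatch)
            gap_in_b = prev[j] - r       # x aligned to a gap
            gap_in_a = curr[j - 1] - r   # y aligned to a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_similarity("ACGT", "AGT"))
```

    The more general penalty g = q + rk mentioned in the abstract needs two extra rows that track alignments ending in a gap; that extension is omitted here.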

  18. Motor programming when sequencing multiple elements of the same duration.

    PubMed

    Magnuson, Curt E; Robin, Donald A; Wright, David L

    2008-11-01

    The self-select paradigm for studying motor programming was adopted in 2 experiments to examine the processing demands of independent processes. One process (INT) is responsible for organizing the internal features of the individual elements in a movement (e.g., response duration). The 2nd process (SEQ) is responsible for placing the elements into the proper serial order before execution. Participants in Experiment 1 performed tasks involving 1 key press or sequences of 4 key presses of the same duration. Implementing INT and SEQ was more time consuming for key-pressing sequences than for single key-press tasks. Experiment 2 examined whether the INT costs resulting from the increase in sequence length observed in Experiment 1 resulted from independent planning of each sequence element or via a separate "multiplier" process that handled repetitions of elements of the same duration. Findings from Experiment 2, in which participants performed single key presses or double or triple key sequences of the same duration, suggested that INT is involved with the independent organization of each element contained in the sequence. Researchers offer an elaboration of the 2-process account of motor programming to incorporate the present findings and the findings from other recent sequence-learning research.

  19. The Integrated Sequence: An Innovative Component of Four Courses in the General Education Program at Davis & Elkins College. A Digest of Program Elements, Developmental Background, and Faculty Dynamics.

    ERIC Educational Resources Information Center

    Gartmann, Will

    The Integrated Sequence Program at Davis & Elkins College, which consists of four team-taught, interdisciplinary courses, is described, along with the origins and philosophy of the program. The courses are as follows: Human Freedom and the Counterforces (freshman year); World Culture (sophomore year); Comparative Ideas (junior year); and The…

  20. The COG database: a tool for genome-scale analysis of protein functions and evolution

    PubMed Central

    Tatusov, Roman L.; Galperin, Michael Y.; Natale, Darren A.; Koonin, Eugene V.

    2000-01-01

    Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56–83% of the gene products from each of the complete bacterial and archaeal genomes and ~35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program that is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes. PMID:10592175
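
    The abstract does not spell out how the consistency criterion is applied. One simplified reading, sketched below, is to look for triples of proteins from three different genomes in which every pair is a reciprocal genome-specific best hit. This is an assumption made for illustration and not the published COG procedure. The data layout (proteins as (genome, identifier) tuples and a best_hit dictionary) and all names are hypothetical.

```python
from itertools import combinations

Protein = tuple[str, str]  # (genome, protein identifier), hypothetical layout


def is_mutual(p: Protein, q: Protein, best_hit: dict) -> bool:
    """True if p's best hit in q's genome is q and q's best hit in p's genome is p."""
    return best_hit.get((p, q[0])) == q and best_hit.get((q, p[0])) == p


def consistent_triangles(proteins: list, best_hit: dict) -> list:
    """Triples from three distinct genomes whose pairs are all reciprocal best hits."""
    triangles = []
    for a, b, c in combinations(sorted(proteins), 3):
        if len({a[0], b[0], c[0]}) < 3:
            continue  # all three proteins must come from different genomes
        if is_mutual(a, b, best_hit) and is_mutual(b, c, best_hit) and is_mutual(a, c, best_hit):
            triangles.append((a, b, c))
    return triangles


if __name__ == "__main__":
    p, q, s, t = ("E", "p1"), ("B", "q1"), ("Y", "r1"), ("Y", "r2")
    hits = {
        (p, "B"): q, (q, "E"): p,
        (p, "Y"): s, (s, "E"): p,
        (q, "Y"): s, (s, "B"): q,
        (t, "E"): p, (t, "B"): q,  # r2 hits p1 and q1, but is not their best hit
    }
    print(consistent_triangles([p, q, s, t], hits))
```

    The brute-force loop over all triples is only workable for toy inputs. A genome-scale comparison would start from each protein's best hits instead of enumerating every combination.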
