source code freely: Topics by Science.gov

Sample records for source code freely

VLF Source Localization with a Freely Drifting Sensor Array

DTIC Science & Technology

1992-09-01

Simultaneous Measurement of Infra - sonic Acoustic Particle Velocity and Acoustic Pressure in the Ocean by F-ely Drifting Swallow Floats," IEEEJ. Ocean. Eng., vol...Pacific. Marine Physical Laboratory’s set of nine freely drifting, infrasonic sensors, capable of recording ocean ambient noise in the 1- to 25-Hz range...Terms. 15. Number of Pages, Swallow float, matched-field processing, infrasonic sensor, vlf source localization 153 16. Price Code. 17. Seorlity
Cloudy's Journey from FORTRAN to C, Why and How

NASA Astrophysics Data System (ADS)

Ferland, G. J.

Cloudy is a large-scale plasma simulation code that is widely used across the astronomical community as an aid in the interpretation of spectroscopic data. The cover of the ADAS VI book featured predictions of the code. The FORTRAN 77 source code has always been freely available on the Internet, contributing to its widespread use. The coming of PCs and Linux has fundamentally changed the computing environment. Modern Fortran compilers (F90 and F95) are not freely available. A common-use code must be written in either FORTRAN 77 or C to be Open Source/GNU/Linux friendly. F77 has serious drawbacks - modern language constructs cannot be used, students do not have skills in this language, and it does not contribute to their future employability. It became clear that the code would have to be ported to C to have a viable future. I describe the approach I used to convert Cloudy from FORTRAN 77 with MILSPEC extensions to ANSI/ISO 89 C. Cloudy is now openly available as a C code, and will evolve to C++ as gcc and standard C++ mature. Cloudy looks to a bright future with a modern language.
Biopython: freely available Python tools for computational molecular biology and bioinformatics.

PubMed

Cock, Peter J A; Antao, Tiago; Chang, Jeffrey T; Chapman, Brad A; Cox, Cymon J; Dalke, Andrew; Friedberg, Iddo; Hamelryck, Thomas; Kauff, Frank; Wilczynski, Bartek; de Hoon, Michiel J L

2009-06-01

The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Biopython includes modules for reading and writing different sequence file formats and multiple sequence alignments, dealing with 3D macro molecular structures, interacting with common tools such as BLAST, ClustalW and EMBOSS, accessing key online databases, as well as providing numerical methods for statistical learning. Biopython is freely available, with documentation and source code at (www.biopython.org) under the Biopython license.
Practices in Code Discoverability: Astrophysics Source Code Library

NASA Astrophysics Data System (ADS)

Allen, A.; Teuben, P.; Nemiroff, R. J.; Shamir, L.

2012-09-01

Here we describe the Astrophysics Source Code Library (ASCL), which takes an active approach to sharing astrophysics source code. ASCL's editor seeks out both new and old peer-reviewed papers that describe methods or experiments that involve the development or use of source code, and adds entries for the found codes to the library. This approach ensures that source codes are added without requiring authors to actively submit them, resulting in a comprehensive listing that covers a significant number of the astrophysics source codes used in peer-reviewed studies. The ASCL now has over 340 codes in it and continues to grow. In 2011, the ASCL has on average added 19 codes per month. An advisory committee has been established to provide input and guide the development and expansion of the new site, and a marketing plan has been developed and is being executed. All ASCL source codes have been used to generate results published in or submitted to a refereed journal and are freely available either via a download site or from an identified source. This paper provides the history and description of the ASCL. It lists the requirements for including codes, examines the advantages of the ASCL, and outlines some of its future plans.
DeNovoGUI: An Open Source Graphical User Interface for de Novo Sequencing of Tandem Mass Spectra

PubMed Central

2013-01-01

De novo sequencing is a popular technique in proteomics for identifying peptides from tandem mass spectra without having to rely on a protein sequence database. Despite the strong potential of de novo sequencing algorithms, their adoption threshold remains quite high. We here present a user-friendly and lightweight graphical user interface called DeNovoGUI for running parallelized versions of the freely available de novo sequencing software PepNovo+, greatly simplifying the use of de novo sequencing in proteomics. Our platform-independent software is freely available under the permissible Apache2 open source license. Source code, binaries, and additional documentation are available at http://denovogui.googlecode.com. PMID:24295440
DeNovoGUI: an open source graphical user interface for de novo sequencing of tandem mass spectra.

PubMed

Muth, Thilo; Weilnböck, Lisa; Rapp, Erdmann; Huber, Christian G; Martens, Lennart; Vaudel, Marc; Barsnes, Harald

2014-02-07

De novo sequencing is a popular technique in proteomics for identifying peptides from tandem mass spectra without having to rely on a protein sequence database. Despite the strong potential of de novo sequencing algorithms, their adoption threshold remains quite high. We here present a user-friendly and lightweight graphical user interface called DeNovoGUI for running parallelized versions of the freely available de novo sequencing software PepNovo+, greatly simplifying the use of de novo sequencing in proteomics. Our platform-independent software is freely available under the permissible Apache2 open source license. Source code, binaries, and additional documentation are available at http://denovogui.googlecode.com .
Saint: a lightweight integration environment for model annotation.

PubMed

Lister, Allyson L; Pocock, Matthew; Taschuk, Morgan; Wipat, Anil

2009-11-15

Saint is a web application which provides a lightweight annotation integration environment for quantitative biological models. The system enables modellers to rapidly mark up models with biological information derived from a range of data sources. Saint is freely available for use on the web at http://www.cisban.ac.uk/saint. The web application is implemented in Google Web Toolkit and Tomcat, with all major browsers supported. The Java source code is freely available for download at http://saint-annotate.sourceforge.net. The Saint web server requires an installation of libSBML and has been tested on Linux (32-bit Ubuntu 8.10 and 9.04).
The Astrophysics Source Code Library: An Update

NASA Astrophysics Data System (ADS)

Allen, Alice; Nemiroff, R. J.; Shamir, L.; Teuben, P. J.

2012-01-01

The Astrophysics Source Code Library (ASCL), founded in 1999, takes an active approach to sharing astrophysical source code. ASCL's editor seeks out both new and old peer-reviewed papers that describe methods or experiments that involve the development or use of source code, and adds entries for the found codes to the library. This approach ensures that source codes are added without requiring authors to actively submit them, resulting in a comprehensive listing that covers a significant number of the astrophysics source codes used in peer-reviewed studies. The ASCL moved to a new location in 2010, and has over 300 codes in it and continues to grow. In 2011, the ASCL (http://asterisk.apod.com/viewforum.php?f=35) has on average added 19 new codes per month; we encourage scientists to submit their codes for inclusion. An advisory committee has been established to provide input and guide the development and expansion of its new site, and a marketing plan has been developed and is being executed. All ASCL source codes have been used to generate results published in or submitted to a refereed journal and are freely available either via a download site or from an identified source. This presentation covers the history of the ASCL and examines the current state and benefits of the ASCL, the means of and requirements for including codes, and outlines its future plans.
Software for Computing, Archiving, and Querying Semisimple Braided Monoidal Category Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

This software package collects various open source and freely available codes and algorithms to compute and archive the categorical data for certain semisimple braided monoidal categories. In particular, it computes the data for of group theoretical categories for academic research.
Open Source and These United States

DTIC Science & Technology

1999-04-01

the ability of all participants to freely access the source code and keep abreast of progress. There can be no information hoarding on an open source... developed in this way depends upon ready and reliable communications. Just as the internet has increased the ability of people to exchange information...investment is maximized through long use and reuse. This process results in systems which harnesses the collaborative abilities of its user developers
Installing a Local Copy of the Reactome Web Site and Knowledgebase

PubMed Central

McKay, Sheldon J; Weiser, Joel

2015-01-01

The Reactome project builds, maintains, and publishes a knowledgebase of biological pathways. The information in the knowledgebase is gathered from the experts in the field, peer reviewed, and edited by Reactome editorial staff and then published to the Reactome Web site, http://www.reactome.org (see UNIT 8.7; Croft et al., 2013). The Reactome software is open source and builds on top of other open-source or freely available software. Reactome data and code can be freely downloaded in its entirety and the Web site installed locally. This allows for more flexible interrogation of the data and also makes it possible to add one’s own information to the knowledgebase. PMID:26087747
Biopython: freely available Python tools for computational molecular biology and bioinformatics

PubMed Central

Cock, Peter J. A.; Antao, Tiago; Chang, Jeffrey T.; Chapman, Brad A.; Cox, Cymon J.; Dalke, Andrew; Friedberg, Iddo; Hamelryck, Thomas; Kauff, Frank; Wilczynski, Bartek; de Hoon, Michiel J. L.

2009-01-01

Summary: The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Biopython includes modules for reading and writing different sequence file formats and multiple sequence alignments, dealing with 3D macro molecular structures, interacting with common tools such as BLAST, ClustalW and EMBOSS, accessing key online databases, as well as providing numerical methods for statistical learning. Availability: Biopython is freely available, with documentation and source code at www.biopython.org under the Biopython license. Contact: All queries should be directed to the Biopython mailing lists, see www.biopython.org/wiki/_Mailing_listspeter.cock@scri.ac.uk. PMID:19304878
HippDB: a database of readily targeted helical protein-protein interactions.

PubMed

Bergey, Christina M; Watkins, Andrew M; Arora, Paramjit S

2013-11-01

HippDB catalogs every protein-protein interaction whose structure is available in the Protein Data Bank and which exhibits one or more helices at the interface. The Web site accepts queries on variables such as helix length and sequence, and it provides computational alanine scanning and change in solvent-accessible surface area values for every interfacial residue. HippDB is intended to serve as a starting point for structure-based small molecule and peptidomimetic drug development. HippDB is freely available on the web at http://www.nyu.edu/projects/arora/hippdb. The Web site is implemented in PHP, MySQL and Apache. Source code freely available for download at http://code.google.com/p/helidb, implemented in Perl and supported on Linux. arora@nyu.edu.
SPOCS: software for predicting and visualizing orthology/paralogy relationships among genomes.

PubMed

Curtis, Darren S; Phillips, Aaron R; Callister, Stephen J; Conlan, Sean; McCue, Lee Ann

2013-10-15

At the rate that prokaryotic genomes can now be generated, comparative genomics studies require a flexible method for quickly and accurately predicting orthologs among the rapidly changing set of genomes available. SPOCS implements a graph-based ortholog prediction method to generate a simple tab-delimited table of orthologs and in addition, html files that provide a visualization of the predicted ortholog/paralog relationships to which gene/protein expression metadata may be overlaid. A SPOCS web application is freely available at http://cbb.pnnl.gov/portal/tools/spocs.html. Source code for Linux systems is also freely available under an open source license at http://cbb.pnnl.gov/portal/software/spocs.html; the Boost C++ libraries and BLAST are required.
AMIDE: a free software tool for multimodality medical image analysis.

PubMed

Loening, Andreas Markus; Gambhir, Sanjiv Sam

2003-07-01

Amide's a Medical Image Data Examiner (AMIDE) has been developed as a user-friendly, open-source software tool for displaying and analyzing multimodality volumetric medical images. Central to the package's abilities to simultaneously display multiple data sets (e.g., PET, CT, MRI) and regions of interest is the on-demand data reslicing implemented within the program. Data sets can be freely shifted, rotated, viewed, and analyzed with the program automatically handling interpolation as needed from the original data. Validation has been performed by comparing the output of AMIDE with that of several existing software packages. AMIDE runs on UNIX, Macintosh OS X, and Microsoft Windows platforms, and it is freely available with source code under the terms of the GNU General Public License.
QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials.

PubMed

Giannozzi, Paolo; Baroni, Stefano; Bonini, Nicola; Calandra, Matteo; Car, Roberto; Cavazzoni, Carlo; Ceresoli, Davide; Chiarotti, Guido L; Cococcioni, Matteo; Dabo, Ismaila; Dal Corso, Andrea; de Gironcoli, Stefano; Fabris, Stefano; Fratesi, Guido; Gebauer, Ralph; Gerstmann, Uwe; Gougoussis, Christos; Kokalj, Anton; Lazzeri, Michele; Martin-Samos, Layla; Marzari, Nicola; Mauri, Francesco; Mazzarello, Riccardo; Paolini, Stefano; Pasquarello, Alfredo; Paulatto, Lorenzo; Sbraccia, Carlo; Scandolo, Sandro; Sclauzero, Gabriele; Seitsonen, Ari P; Smogunov, Alexander; Umari, Paolo; Wentzcovitch, Renata M

2009-09-30

QUANTUM ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling, based on density-functional theory, plane waves, and pseudopotentials (norm-conserving, ultrasoft, and projector-augmented wave). The acronym ESPRESSO stands for opEn Source Package for Research in Electronic Structure, Simulation, and Optimization. It is freely available to researchers around the world under the terms of the GNU General Public License. QUANTUM ESPRESSO builds upon newly-restructured electronic-structure codes that have been developed and tested by some of the original authors of novel electronic-structure algorithms and applied in the last twenty years by some of the leading materials modeling groups worldwide. Innovation and efficiency are still its main focus, with special attention paid to massively parallel architectures, and a great effort being devoted to user friendliness. QUANTUM ESPRESSO is evolving towards a distribution of independent and interoperable codes in the spirit of an open-source project, where researchers active in the field of electronic-structure calculations are encouraged to participate in the project by contributing their own codes or by implementing their own ideas into existing codes.
General Mission Analysis Tool (GMAT): Mission, Vision, and Business Case

NASA Technical Reports Server (NTRS)

Hughes, Steven P.

2007-01-01

The Goal of the GMAT project is to develop new space trajectory optimization and mission design technology by working inclusively with ordinary people, universities businesses and other government organizations; and to share that technology in an open and unhindered way. GMAT's a free and open source software system; free for anyone to use in development of new mission concepts or to improve current missions, freely available in source code form for enhancement or future technology development.
A database of natural products and chemical entities from marine habitat

PubMed Central

Babu, Padavala Ajay; Puppala, Suma Sree; Aswini, Satyavarapu Lakshmi; Vani, Metta Ramya; Kumar, Chinta Narasimha; Prasanna, Tallapragada

2008-01-01

Marine compound database consists of marine natural products and chemical entities, collected from various literature sources, which are known to possess bioactivity against human diseases. The database is constructed using html code. The 12 categories of 182 compounds are provided with the source, compound name, 2-dimensional structure, bioactivity and clinical trial information. The database is freely available online and can be accessed at http://www.progenebio.in/mcdb/index.htm PMID:19238254
Open source tools and toolkits for bioinformatics: significance, and where are we?

PubMed

Stajich, Jason E; Lapp, Hilmar

2006-09-01

This review summarizes important work in open-source bioinformatics software that has occurred over the past couple of years. The survey is intended to illustrate how programs and toolkits whose source code has been developed or released under an Open Source license have changed informatics-heavy areas of life science research. Rather than creating a comprehensive list of all tools developed over the last 2-3 years, we use a few selected projects encompassing toolkit libraries, analysis tools, data analysis environments and interoperability standards to show how freely available and modifiable open-source software can serve as the foundation for building important applications, analysis workflows and resources.
InterProScan 5: genome-scale protein function classification

PubMed Central

Jones, Philip; Binns, David; Chang, Hsin-Yu; Fraser, Matthew; Li, Weizhong; McAnulla, Craig; McWilliam, Hamish; Maslen, John; Mitchell, Alex; Nuka, Gift; Pesseat, Sebastien; Quinn, Antony F.; Sangrador-Vegas, Amaia; Scheremetjew, Maxim; Yong, Siew-Yit; Lopez, Rodrigo; Hunter, Sarah

2014-01-01

Motivation: Robust large-scale sequence analysis is a major challenge in modern genomic science, where biologists are frequently trying to characterize many millions of sequences. Here, we describe a new Java-based architecture for the widely used protein function prediction software package InterProScan. Developments include improvements and additions to the outputs of the software and the complete reimplementation of the software framework, resulting in a flexible and stable system that is able to use both multiprocessor machines and/or conventional clusters to achieve scalable distributed data analysis. InterProScan is freely available for download from the EMBl-EBI FTP site and the open source code is hosted at Google Code. Availability and implementation: InterProScan is distributed via FTP at ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/ and the source code is available from http://code.google.com/p/interproscan/. Contact: http://www.ebi.ac.uk/support or interhelp@ebi.ac.uk or mitchell@ebi.ac.uk PMID:24451626

JADAMILU: a software code for computing selected eigenvalues of large sparse symmetric matrices

NASA Astrophysics Data System (ADS)

Bollhöfer, Matthias; Notay, Yvan

2007-12-01

A new software code for computing selected eigenvalues and associated eigenvectors of a real symmetric matrix is described. The eigenvalues are either the smallest or those closest to some specified target, which may be in the interior of the spectrum. The underlying algorithm combines the Jacobi-Davidson method with efficient multilevel incomplete LU (ILU) preconditioning. Key features are modest memory requirements and robust convergence to accurate solutions. Parameters needed for incomplete LU preconditioning are automatically computed and may be updated at run time depending on the convergence pattern. The software is easy to use by non-experts and its top level routines are written in FORTRAN 77. Its potentialities are demonstrated on a few applications taken from computational physics. Program summaryProgram title: JADAMILU Catalogue identifier: ADZT_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADZT_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 101 359 No. of bytes in distributed program, including test data, etc.: 7 493 144 Distribution format: tar.gz Programming language: Fortran 77 Computer: Intel or AMD with g77 and pgf; Intel EM64T or Itanium with ifort; AMD Opteron with g77, pgf and ifort; Power (IBM) with xlf90. Operating system: Linux, AIX RAM: problem dependent Word size: real:8; integer: 4 or 8, according to user's choice Classification: 4.8 Nature of problem: Any physical problem requiring the computation of a few eigenvalues of a symmetric matrix. Solution method: Jacobi-Davidson combined with multilevel ILU preconditioning. Additional comments: We supply binaries rather than source code because JADAMILU uses the following external packages: MC64. This software is copyrighted software and not freely available. COPYRIGHT (c) 1999 Council for the Central Laboratory of the Research Councils. AMD. Copyright (c) 2004-2006 by Timothy A. Davis, Patrick R. Amestoy, and Iain S. Duff. Source code is distributed by the authors under the GNU LGPL licence. BLAS. The reference BLAS is a freely-available software package. It is available from netlib via anonymous ftp and the World Wide Web. LAPACK. The complete LAPACK package or individual routines from LAPACK are freely available on netlib and can be obtained via the World Wide Web or anonymous ftp. For maximal benefit to the community, we added the sources we are proprietary of to the tar.gz file submitted for inclusion in the CPC library. However, as explained in the README file, users willing to compile the code instead of using binaries should first obtain the sources for the external packages mentioned above (email and/or web addresses are provided). Running time: Problem dependent; the test examples provided with the code only take a few seconds to run; timing results for large scale problems are given in Section 5.
GANDALF - Graphical Astrophysics code for N-body Dynamics And Lagrangian Fluids

NASA Astrophysics Data System (ADS)

Hubber, D. A.; Rosotti, G. P.; Booth, R. A.

2018-01-01

GANDALF is a new hydrodynamics and N-body dynamics code designed for investigating planet formation, star formation and star cluster problems. GANDALF is written in C++, parallelized with both OPENMP and MPI and contains a PYTHON library for analysis and visualization. The code has been written with a fully object-oriented approach to easily allow user-defined implementations of physics modules or other algorithms. The code currently contains implementations of smoothed particle hydrodynamics, meshless finite-volume and collisional N-body schemes, but can easily be adapted to include additional particle schemes. We present in this paper the details of its implementation, results from the test suite, serial and parallel performance results and discuss the planned future development. The code is freely available as an open source project on the code-hosting website github at https://github.com/gandalfcode/gandalf and is available under the GPLv2 license.
Performance of Compiler-Assisted Memory Safety Checking

DTIC Science & Technology

2014-08-01

software developer has in mind a particular object to which the pointer should point, the intended referent. A memory access error occurs when an ac...Performance of Compiler-Assisted Memory Safety Checking David Keaton Robert C. Seacord August 2014 TECHNICAL NOTE CMU/SEI-2014-TN...based memory safety checking tool and the performance that can be achieved with two such tools whose source code is freely available. The note then
Proteomics to go: Proteomatic enables the user-friendly creation of versatile MS/MS data evaluation workflows.

PubMed

Specht, Michael; Kuhlgert, Sebastian; Fufezan, Christian; Hippler, Michael

2011-04-15

We present Proteomatic, an operating system independent and user-friendly platform that enables the construction and execution of MS/MS data evaluation pipelines using free and commercial software. Required external programs such as for peptide identification are downloaded automatically in the case of free software. Due to a strict separation of functionality and presentation, and support for multiple scripting languages, new processing steps can be added easily. Proteomatic is implemented in C++/Qt, scripts are implemented in Ruby, Python and PHP. All source code is released under the LGPL. Source code and installers for Windows, Mac OS X, and Linux are freely available at http://www.proteomatic.org. michael.specht@uni-muenster.de Supplementary data are available at Bioinformatics online.
GenomeDiagram: a python package for the visualization of large-scale genomic data.

PubMed

Pritchard, Leighton; White, Jennifer A; Birch, Paul R J; Toth, Ian K

2006-03-01

We present GenomeDiagram, a flexible, open-source Python module for the visualization of large-scale genomic, comparative genomic and other data with reference to a single chromosome or other biological sequence. GenomeDiagram may be used to generate publication-quality vector graphics, rastered images and in-line streamed graphics for webpages. The package integrates with datatypes from the BioPython project, and is available for Windows, Linux and Mac OS X systems. GenomeDiagram is freely available as source code (under GNU Public License) at http://bioinf.scri.ac.uk/lp/programs.html, and requires Python 2.3 or higher, and recent versions of the ReportLab and BioPython packages. A user manual, example code and images are available at http://bioinf.scri.ac.uk/lp/programs.html.
ChemoPy: freely available python package for computational biology and chemoinformatics.

PubMed

Cao, Dong-Sheng; Xu, Qing-Song; Hu, Qian-Nan; Liang, Yi-Zeng

2013-04-15

Molecular representation for small molecules has been routinely used in QSAR/SAR, virtual screening, database search, ranking, drug ADME/T prediction and other drug discovery processes. To facilitate extensive studies of drug molecules, we developed a freely available, open-source python package called chemoinformatics in python (ChemoPy) for calculating the commonly used structural and physicochemical features. It computes 16 drug feature groups composed of 19 descriptors that include 1135 descriptor values. In addition, it provides seven types of molecular fingerprint systems for drug molecules, including topological fingerprints, electro-topological state (E-state) fingerprints, MACCS keys, FP4 keys, atom pairs fingerprints, topological torsion fingerprints and Morgan/circular fingerprints. By applying a semi-empirical quantum chemistry program MOPAC, ChemoPy can also compute a large number of 3D molecular descriptors conveniently. The python package, ChemoPy, is freely available via http://code.google.com/p/pychem/downloads/list, and it runs on Linux and MS-Windows. Supplementary data are available at Bioinformatics online.
Phylowood: interactive web-based animations of biogeographic and phylogeographic histories.

PubMed

Landis, Michael J; Bedford, Trevor

2014-01-01

Phylowood is a web service that uses JavaScript to generate in-browser animations of biogeographic and phylogeographic histories from annotated phylogenetic input. The animations are interactive, allowing the user to adjust spatial and temporal resolution, and highlight phylogenetic lineages of interest. All documentation and source code for Phylowood is freely available at https://github.com/mlandis/phylowood, and a live web application is available at https://mlandis.github.io/phylowood.
PyPanda: a Python package for gene regulatory network reconstruction

PubMed Central

van IJzendoorn, David G.P.; Glass, Kimberly; Quackenbush, John; Kuijjer, Marieke L.

2016-01-01

Summary: PANDA (Passing Attributes between Networks for Data Assimilation) is a gene regulatory network inference method that uses message-passing to integrate multiple sources of ‘omics data. PANDA was originally coded in C ++. In this application note we describe PyPanda, the Python version of PANDA. PyPanda runs considerably faster than the C ++ version and includes additional features for network analysis. Availability and implementation: The open source PyPanda Python package is freely available at http://github.com/davidvi/pypanda. Contact: mkuijjer@jimmy.harvard.edu or d.g.p.van_ijzendoorn@lumc.nl PMID:27402905
PyPanda: a Python package for gene regulatory network reconstruction.

PubMed

van IJzendoorn, David G P; Glass, Kimberly; Quackenbush, John; Kuijjer, Marieke L

2016-11-01

PANDA (Passing Attributes between Networks for Data Assimilation) is a gene regulatory network inference method that uses message-passing to integrate multiple sources of 'omics data. PANDA was originally coded in C ++. In this application note we describe PyPanda, the Python version of PANDA. PyPanda runs considerably faster than the C ++ version and includes additional features for network analysis. The open source PyPanda Python package is freely available at http://github.com/davidvi/pypanda CONTACT: mkuijjer@jimmy.harvard.edu or d.g.p.van_ijzendoorn@lumc.nl. © The Author 2016. Published by Oxford University Press.
Rosetta3: An Object-Oriented Software Suite for the Simulation and Design of Macromolecules

PubMed Central

Leaver-Fay, Andrew; Tyka, Michael; Lewis, Steven M.; Lange, Oliver F.; Thompson, James; Jacak, Ron; Kaufman, Kristian; Renfrew, P. Douglas; Smith, Colin A.; Sheffler, Will; Davis, Ian W.; Cooper, Seth; Treuille, Adrien; Mandell, Daniel J.; Richter, Florian; Ban, Yih-En Andrew; Fleishman, Sarel J.; Corn, Jacob E.; Kim, David E.; Lyskov, Sergey; Berrondo, Monica; Mentzer, Stuart; Popović, Zoran; Havranek, James J.; Karanicolas, John; Das, Rhiju; Meiler, Jens; Kortemme, Tanja; Gray, Jeffrey J.; Kuhlman, Brian; Baker, David; Bradley, Philip

2013-01-01

We have recently completed a full re-architecturing of the Rosetta molecular modeling program, generalizing and expanding its existing functionality. The new architecture enables the rapid prototyping of novel protocols by providing easy to use interfaces to powerful tools for molecular modeling. The source code of this rearchitecturing has been released as Rosetta3 and is freely available for academic use. At the time of its release, it contained 470,000 lines of code. Counting currently unpublished protocols at the time of this writing, the source includes 1,285,000 lines. Its rapid growth is a testament to its ease of use. This document describes the requirements for our new architecture, justifies the design decisions, sketches out central classes, and highlights a few of the common tasks that the new software can perform. PMID:21187238
ProteoCloud: a full-featured open source proteomics cloud computing pipeline.

PubMed

Muth, Thilo; Peters, Julian; Blackburn, Jonathan; Rapp, Erdmann; Martens, Lennart

2013-08-02

We here present the ProteoCloud pipeline, a freely available, full-featured cloud-based platform to perform computationally intensive, exhaustive searches in a cloud environment using five different peptide identification algorithms. ProteoCloud is entirely open source, and is built around an easy to use and cross-platform software client with a rich graphical user interface. This client allows full control of the number of cloud instances to initiate and of the spectra to assign for identification. It also enables the user to track progress, and to visualize and interpret the results in detail. Source code, binaries and documentation are all available at http://proteocloud.googlecode.com. Copyright © 2012 Elsevier B.V. All rights reserved.
ANT: Software for Generating and Evaluating Degenerate Codons for Natural and Expanded Genetic Codes.

PubMed

Engqvist, Martin K M; Nielsen, Jens

2015-08-21

The Ambiguous Nucleotide Tool (ANT) is a desktop application that generates and evaluates degenerate codons. Degenerate codons are used to represent DNA positions that have multiple possible nucleotide alternatives. This is useful for protein engineering and directed evolution, where primers specified with degenerate codons are used as a basis for generating libraries of protein sequences. ANT is intuitive and can be used in a graphical user interface or by interacting with the code through a defined application programming interface. ANT comes with full support for nonstandard, user-defined, or expanded genetic codes (translation tables), which is important because synthetic biology is being applied to an ever widening range of natural and engineered organisms. The Python source code for ANT is freely distributed so that it may be used without restriction, modified, and incorporated in other software or custom data pipelines.
MRMPlus: an open source quality control and assessment tool for SRM/MRM assay development.

PubMed

Aiyetan, Paul; Thomas, Stefani N; Zhang, Zhen; Zhang, Hui

2015-12-12

Selected and multiple reaction monitoring involves monitoring a multiplexed assay of proteotypic peptides and associated transitions in mass spectrometry runs. To describe peptide and associated transitions as stable, quantifiable, and reproducible representatives of proteins of interest, experimental and analytical validation is required. However, inadequate and disparate analytical tools and validation methods predispose assay performance measures to errors and inconsistencies. Implemented as a freely available, open-source tool in the platform independent Java programing language, MRMPlus computes analytical measures as recommended recently by the Clinical Proteomics Tumor Analysis Consortium Assay Development Working Group for "Tier 2" assays - that is, non-clinical assays sufficient enough to measure changes due to both biological and experimental perturbations. Computed measures include; limit of detection, lower limit of quantification, linearity, carry-over, partial validation of specificity, and upper limit of quantification. MRMPlus streamlines assay development analytical workflow and therefore minimizes error predisposition. MRMPlus may also be used for performance estimation for targeted assays not described by the Assay Development Working Group. MRMPlus' source codes and compiled binaries can be freely downloaded from https://bitbucket.org/paiyetan/mrmplusgui and https://bitbucket.org/paiyetan/mrmplusgui/downloads respectively.
PyCorrFit-generic data evaluation for fluorescence correlation spectroscopy.

PubMed

Müller, Paul; Schwille, Petra; Weidemann, Thomas

2014-09-01

We present a graphical user interface (PyCorrFit) for the fitting of theoretical model functions to experimental data obtained by fluorescence correlation spectroscopy (FCS). The program supports many data file formats and features a set of tools specialized in FCS data evaluation. The Python source code is freely available for download from the PyCorrFit web page at http://pycorrfit.craban.de. We offer binaries for Ubuntu Linux, Mac OS X and Microsoft Windows. © The Author 2014. Published by Oxford University Press.
CHROMA: consensus-based colouring of multiple alignments for publication.

PubMed

Goodstadt, L; Ponting, C P

2001-09-01

CHROMA annotates multiple protein sequence alignments by consensus to produce formatted and coloured text suitable for incorporation into other documents for publication. The package is designed to be flexible and reliable, and has a simple-to-use graphical user interface running under Microsoft Windows. Both the executables and source code for CHROMA running under Windows and Linux (portable command-line only) are freely available at http://www.lg.ndirect.co.uk/chroma. Software enquiries should be directed to CHROMA@lg.ndirect.co.uk.
IgSimulator: a versatile immunosequencing simulator.

PubMed

Safonova, Yana; Lapidus, Alla; Lill, Jennie

2015-10-01

The recent introduction of next-generation sequencing technologies to antibody studies have resulted in a growing number of immunoinformatics tools for antibody repertoire analysis. However, benchmarking these newly emerging tools remains problematic since the gold standard datasets that are needed to validate these tools are typically not available. Since simulating antibody repertoires is often the only feasible way to benchmark new immunoinformatics tools, we developed the IgSimulator tool that addresses various complications in generating realistic antibody repertoires. IgSimulator's code has modular structure and can be easily adapted to new requirements to simulation. IgSimulator is open source and freely available as a C++ and Python program running on all Unix-compatible platforms. The source code is available from yana-safonova.github.io/ig_simulator. safonova.yana@gmail.com Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
BioCIDER: a Contextualisation InDEx for biological Resources discovery

PubMed Central

Horro, Carlos; Cook, Martin; Attwood, Teresa K.; Brazas, Michelle D.; Hancock, John M.; Palagi, Patricia; Corpas, Manuel; Jimenez, Rafael

2017-01-01

Abstract Summary The vast, uncoordinated proliferation of bioinformatics resources (databases, software tools, training materials etc.) makes it difficult for users to find them. To facilitate their discovery, various services are being developed to collect such resources into registries. We have developed BioCIDER, which, rather like online shopping ‘recommendations’, provides a contextualization index to help identify biological resources relevant to the content of the sites in which it is embedded. Availability and Implementation BioCIDER (www.biocider.org) is an open-source platform. Documentation is available online (https://goo.gl/Klc51G), and source code is freely available via GitHub (https://github.com/BioCIDER). The BioJS widget that enables websites to embed contextualization is available from the BioJS registry (http://biojs.io/). All code is released under an MIT licence. Contact carlos.horro@earlham.ac.uk or rafael.jimenez@elixir-europe.org or manuel@repositive.io PMID:28407033
Improvement of Mishchenko's T-matrix code for absorbing particles.

PubMed

Moroz, Alexander

2005-06-10

The use of Gaussian elimination with backsubstitution for matrix inversion in scattering theories is discussed. Within the framework of the T-matrix method (the state-of-the-art code by Mishchenko is freely available at http://www.giss.nasa.gov/-crmim), it is shown that the domain of applicability of Mishchenko's FORTRAN 77 (F77) code can be substantially expanded in the direction of strongly absorbing particles where the current code fails to converge. Such an extension is especially important if the code is to be used in nanoplasmonic or nanophotonic applications involving metallic particles. At the same time, convergence can also be achieved for large nonabsorbing particles, in which case the non-Numerical Algorithms Group option of Mishchenko's code diverges. Computer F77 implementation of Mishchenko's code supplemented with Gaussian elimination with backsubstitution is freely available at http://www.wave-scattering.com.
SBEToolbox: A Matlab Toolbox for Biological Network Analysis

PubMed Central

Konganti, Kranti; Wang, Gang; Yang, Ence; Cai, James J.

2013-01-01

We present SBEToolbox (Systems Biology and Evolution Toolbox), an open-source Matlab toolbox for biological network analysis. It takes a network file as input, calculates a variety of centralities and topological metrics, clusters nodes into modules, and displays the network using different graph layout algorithms. Straightforward implementation and the inclusion of high-level functions allow the functionality to be easily extended or tailored through developing custom plugins. SBEGUI, a menu-driven graphical user interface (GUI) of SBEToolbox, enables easy access to various network and graph algorithms for programmers and non-programmers alike. All source code and sample data are freely available at https://github.com/biocoder/SBEToolbox/releases. PMID:24027418
SBEToolbox: A Matlab Toolbox for Biological Network Analysis.

PubMed

Konganti, Kranti; Wang, Gang; Yang, Ence; Cai, James J

2013-01-01

We present SBEToolbox (Systems Biology and Evolution Toolbox), an open-source Matlab toolbox for biological network analysis. It takes a network file as input, calculates a variety of centralities and topological metrics, clusters nodes into modules, and displays the network using different graph layout algorithms. Straightforward implementation and the inclusion of high-level functions allow the functionality to be easily extended or tailored through developing custom plugins. SBEGUI, a menu-driven graphical user interface (GUI) of SBEToolbox, enables easy access to various network and graph algorithms for programmers and non-programmers alike. All source code and sample data are freely available at https://github.com/biocoder/SBEToolbox/releases.

Nmrglue: an open source Python package for the analysis of multidimensional NMR data.

PubMed

Helmus, Jonathan J; Jaroniec, Christopher P

2013-04-01

Nmrglue, an open source Python package for working with multidimensional NMR data, is described. When used in combination with other Python scientific libraries, nmrglue provides a highly flexible and robust environment for spectral processing, analysis and visualization and includes a number of common utilities such as linear prediction, peak picking and lineshape fitting. The package also enables existing NMR software programs to be readily tied together, currently facilitating the reading, writing and conversion of data stored in Bruker, Agilent/Varian, NMRPipe, Sparky, SIMPSON, and Rowland NMR Toolkit file formats. In addition to standard applications, the versatility offered by nmrglue makes the package particularly suitable for tasks that include manipulating raw spectrometer data files, automated quantitative analysis of multidimensional NMR spectra with irregular lineshapes such as those frequently encountered in the context of biomacromolecular solid-state NMR, and rapid implementation and development of unconventional data processing methods such as covariance NMR and other non-Fourier approaches. Detailed documentation, install files and source code for nmrglue are freely available at http://nmrglue.com. The source code can be redistributed and modified under the New BSD license.
Nmrglue: An Open Source Python Package for the Analysis of Multidimensional NMR Data

PubMed Central

Helmus, Jonathan J.; Jaroniec, Christopher P.

2013-01-01

Nmrglue, an open source Python package for working with multidimensional NMR data, is described. When used in combination with other Python scientific libraries, nmrglue provides a highly flexible and robust environment for spectral processing, analysis and visualization and includes a number of common utilities such as linear prediction, peak picking and lineshape fitting. The package also enables existing NMR software programs to be readily tied together, currently facilitating the reading, writing and conversion of data stored in Bruker, Agilent/Varian, NMRPipe, Sparky, SIMPSON, and Rowland NMR Toolkit file formats. In addition to standard applications, the versatility offered by nmrglue makes the package particularly suitable for tasks that include manipulating raw spectrometer data files, automated quantitative analysis of multidimensional NMR spectra with irregular lineshapes such as those frequently encountered in the context of biomacromolecular solid-state NMR, and rapid implementation and development of unconventional data processing methods such as covariance NMR and other non-Fourier approaches. Detailed documentation, install files and source code for nmrglue are freely available at http://nmrglue.com. The source code can be redistributed and modified under the New BSD license. PMID:23456039
RANGER-DTL 2.0: Rigorous Reconstruction of Gene-Family Evolution by Duplication, Transfer, and Loss.

PubMed

Bansal, Mukul S; Kellis, Manolis; Kordi, Misagh; Kundu, Soumya

2018-04-24

RANGER-DTL 2.0 is a software program for inferring gene family evolution using Duplication-Transfer-Loss reconciliation. This new software is highly scalable and easy to use, and offers many new features not currently available in any other reconciliation program. RANGER-DTL 2.0 has a particular focus on reconciliation accuracy and can account for many sources of reconciliation uncertainty including uncertain gene tree rooting, gene tree topological uncertainty, multiple optimal reconciliations, and alternative event cost assignments. RANGER-DTL 2.0 is open-source and written in C ++ and Python. Pre-compiled executables, source code (open-source under GNU GPL), and a detailed manual are freely available from http://compbio.engr.uconn.edu/software/RANGER-DTL/. mukul.bansal@uconn.edu.
M2Lite: An Open-source, Light-weight, Pluggable and Fast Proteome Discoverer MSF to mzIdentML Tool.

PubMed

Aiyetan, Paul; Zhang, Bai; Chen, Lily; Zhang, Zhen; Zhang, Hui

2014-04-28

Proteome Discoverer is one of many tools used for protein database search and peptide to spectrum assignment in mass spectrometry-based proteomics. However, the inadequacy of conversion tools makes it challenging to compare and integrate its results to those of other analytical tools. Here we present M2Lite, an open-source, light-weight, easily pluggable and fast conversion tool. M2Lite converts proteome discoverer derived MSF files to the proteomics community defined standard - the mzIdentML file format. M2Lite's source code is available as open-source at https://bitbucket.org/paiyetan/m2lite/src and its compiled binaries and documentation can be freely downloaded at https://bitbucket.org/paiyetan/m2lite/downloads.
General Mission Analysis Tool (GMAT)

NASA Technical Reports Server (NTRS)

Hughes, Steven P.

2007-01-01

The General Mission Analysis Tool (GMAT) is a space trajectory optimization and mission analysis system developed by NASA and private industry in the spirit of the NASA Mission. GMAT contains new technology and is a testbed for future technology development. The goal of the GMAT project is to develop new space trajectory optimization and mission design technology by working inclusively with ordinary people, universities, businesses, and other government organizations, and to share that technology in an open and unhindered way. GMAT is a free and open source software system licensed under the NASA Open Source Agreement: free for anyone to use in development of new mission concepts or to improve current missions, freely available in source code form for enhancement or further technology development.
Design and validation of Segment--freely available software for cardiovascular image analysis.

PubMed

Heiberg, Einar; Sjögren, Jane; Ugander, Martin; Carlsson, Marcus; Engblom, Henrik; Arheden, Håkan

2010-01-11

Commercially available software for cardiovascular image analysis often has limited functionality and frequently lacks the careful validation that is required for clinical studies. We have already implemented a cardiovascular image analysis software package and released it as freeware for the research community. However, it was distributed as a stand-alone application and other researchers could not extend it by writing their own custom image analysis algorithms. We believe that the work required to make a clinically applicable prototype can be reduced by making the software extensible, so that researchers can develop their own modules or improvements. Such an initiative might then serve as a bridge between image analysis research and cardiovascular research. The aim of this article is therefore to present the design and validation of a cardiovascular image analysis software package (Segment) and to announce its release in a source code format. Segment can be used for image analysis in magnetic resonance imaging (MRI), computed tomography (CT), single photon emission computed tomography (SPECT) and positron emission tomography (PET). Some of its main features include loading of DICOM images from all major scanner vendors, simultaneous display of multiple image stacks and plane intersections, automated segmentation of the left ventricle, quantification of MRI flow, tools for manual and general object segmentation, quantitative regional wall motion analysis, myocardial viability analysis and image fusion tools. Here we present an overview of the validation results and validation procedures for the functionality of the software. We describe a technique to ensure continued accuracy and validity of the software by implementing and using a test script that tests the functionality of the software and validates the output. The software has been made freely available for research purposes in a source code format on the project home page http://segment.heiberg.se. Segment is a well-validated comprehensive software package for cardiovascular image analysis. It is freely available for research purposes provided that relevant original research publications related to the software are cited.
NSDann2BS, a neutron spectrum unfolding code based on neural networks technology and two bonner spheres

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ortiz-Rodriguez, J. M.; Reyes Alfaro, A.; Reyes Haro, A.

In this work a neutron spectrum unfolding code, based on artificial intelligence technology is presented. The code called ''Neutron Spectrometry and Dosimetry with Artificial Neural Networks and two Bonner spheres'', (NSDann2BS), was designed in a graphical user interface under the LabVIEW programming environment. The main features of this code are to use an embedded artificial neural network architecture optimized with the ''Robust design of artificial neural networks methodology'' and to use two Bonner spheres as the only piece of information. In order to build the code here presented, once the net topology was optimized and properly trained, knowledge stored atmore » synaptic weights was extracted and using a graphical framework build on the LabVIEW programming environment, the NSDann2BS code was designed. This code is friendly, intuitive and easy to use for the end user. The code is freely available upon request to authors. To demonstrate the use of the neural net embedded in the NSDann2BS code, the rate counts of {sup 252}Cf, {sup 241}AmBe and {sup 239}PuBe neutron sources measured with a Bonner spheres system.« less
NSDann2BS, a neutron spectrum unfolding code based on neural networks technology and two bonner spheres

NASA Astrophysics Data System (ADS)

Ortiz-Rodríguez, J. M.; Reyes Alfaro, A.; Reyes Haro, A.; Solís Sánches, L. O.; Miranda, R. Castañeda; Cervantes Viramontes, J. M.; Vega-Carrillo, H. R.

2013-07-01

In this work a neutron spectrum unfolding code, based on artificial intelligence technology is presented. The code called "Neutron Spectrometry and Dosimetry with Artificial Neural Networks and two Bonner spheres", (NSDann2BS), was designed in a graphical user interface under the LabVIEW programming environment. The main features of this code are to use an embedded artificial neural network architecture optimized with the "Robust design of artificial neural networks methodology" and to use two Bonner spheres as the only piece of information. In order to build the code here presented, once the net topology was optimized and properly trained, knowledge stored at synaptic weights was extracted and using a graphical framework build on the LabVIEW programming environment, the NSDann2BS code was designed. This code is friendly, intuitive and easy to use for the end user. The code is freely available upon request to authors. To demonstrate the use of the neural net embedded in the NSDann2BS code, the rate counts of 252Cf, 241AmBe and 239PuBe neutron sources measured with a Bonner spheres system.
SynergyFinder: a web application for analyzing drug combination dose-response matrix data.

PubMed

Ianevski, Aleksandr; He, Liye; Aittokallio, Tero; Tang, Jing

2017-08-01

Rational design of drug combinations has become a promising strategy to tackle the drug sensitivity and resistance problem in cancer treatment. To systematically evaluate the pre-clinical significance of pairwise drug combinations, functional screening assays that probe combination effects in a dose-response matrix assay are commonly used. To facilitate the analysis of such drug combination experiments, we implemented a web application that uses key functions of R-package SynergyFinder, and provides not only the flexibility of using multiple synergy scoring models, but also a user-friendly interface for visualizing the drug combination landscapes in an interactive manner. The SynergyFinder web application is freely accessible at https://synergyfinder.fimm.fi ; The R-package and its source-code are freely available at http://bioconductor.org/packages/release/bioc/html/synergyfinder.html . jing.tang@helsinki.fi. © The Author(s) 2017. Published by Oxford University Press.
PhamDB: a web-based application for building Phamerator databases.

PubMed

Lamine, James G; DeJong, Randall J; Nelesen, Serita M

2016-07-01

PhamDB is a web application which creates databases of bacteriophage genes, grouped by gene similarity. It is backwards compatible with the existing Phamerator desktop software while providing an improved database creation workflow. Key features include a graphical user interface, validation of uploaded GenBank files, and abilities to import phages from existing databases, modify existing databases and queue multiple jobs. Source code and installation instructions for Linux, Windows and Mac OSX are freely available at https://github.com/jglamine/phage PhamDB is also distributed as a docker image which can be managed via Kitematic. This docker image contains the application and all third party software dependencies as a pre-configured system, and is freely available via the installation instructions provided. snelesen@calvin.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
PDB file parser and structure class implemented in Python.

PubMed

Hamelryck, Thomas; Manderick, Bernard

2003-11-22

The biopython project provides a set of bioinformatics tools implemented in Python. Recently, biopython was extended with a set of modules that deal with macromolecular structure. Biopython now contains a parser for PDB files that makes the atomic information available in an easy-to-use but powerful data structure. The parser and data structure deal with features that are often left out or handled inadequately by other packages, e.g. atom and residue disorder (if point mutants are present in the crystal), anisotropic B factors, multiple models and insertion codes. In addition, the parser performs some sanity checking to detect obvious errors. The Biopython distribution (including source code and documentation) is freely available (under the Biopython license) from http://www.biopython.org
Training alignment parameters for arbitrary sequencers with LAST-TRAIN.

PubMed

Hamada, Michiaki; Ono, Yukiteru; Asai, Kiyoshi; Frith, Martin C

2017-03-15

LAST-TRAIN improves sequence alignment accuracy by inferring substitution and gap scores that fit the frequencies of substitutions, insertions, and deletions in a given dataset. We have applied it to mapping DNA reads from IonTorrent and PacBio RS, and we show that it reduces reference bias for Oxford Nanopore reads. the source code is freely available at http://last.cbrc.jp/. mhamada@waseda.jp or mcfrith@edu.k.u-tokyo.ac.jp. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
A new JAVA interface implementation of THESIAS: testing haplotype effects in association studies.

PubMed

Tregouet, D A; Garelle, V

2007-04-15

THESIAS (Testing Haplotype EffectS In Association Studies) is a popular software for carrying haplotype association analysis in unrelated individuals. In addition to the command line interface, a graphical JAVA interface is now proposed allowing one to run THESIAS in a user-friendly manner. Besides, new functionalities have been added to THESIAS including the possibility to analyze polychotomous phenotype and X-linked polymorphisms. The software package including documentation and example data files is freely available at http://genecanvas.ecgene.net. The source codes are also available upon request.
Interactive web-based identification and visualization of transcript shared sequences.

PubMed

Azhir, Alaleh; Merino, Louis-Henri; Nauen, David W

2018-05-12

We have developed TraC (Transcript Consensus), a web-based tool for detecting and visualizing shared sequences among two or more mRNA transcripts such as splice variants. Results including exon-exon boundaries are returned in a highly intuitive, data-rich, interactive plot that permits users to explore the similarities and differences of multiple transcript sequences. The online tool (http://labs.pathology.jhu.edu/nauen/trac/) is free to use. The source code is freely available for download (https://github.com/nauenlab/TraC). Copyright © 2018 Elsevier Inc. All rights reserved.
MARE2DEM: a 2-D inversion code for controlled-source electromagnetic and magnetotelluric data

NASA Astrophysics Data System (ADS)

Key, Kerry

2016-10-01

This work presents MARE2DEM, a freely available code for 2-D anisotropic inversion of magnetotelluric (MT) data and frequency-domain controlled-source electromagnetic (CSEM) data from onshore and offshore surveys. MARE2DEM parametrizes the inverse model using a grid of arbitrarily shaped polygons, where unstructured triangular or quadrilateral grids are typically used due to their ease of construction. Unstructured grids provide significantly more geometric flexibility and parameter efficiency than the structured rectangular grids commonly used by most other inversion codes. Transmitter and receiver components located on topographic slopes can be tilted parallel to the boundary so that the simulated electromagnetic fields accurately reproduce the real survey geometry. The forward solution is implemented with a goal-oriented adaptive finite-element method that automatically generates and refines unstructured triangular element grids that conform to the inversion parameter grid, ensuring accurate responses as the model conductivity changes. This dual-grid approach is significantly more efficient than the conventional use of a single grid for both the forward and inverse meshes since the more detailed finite-element meshes required for accurate responses do not increase the memory requirements of the inverse problem. Forward solutions are computed in parallel with a highly efficient scaling by partitioning the data into smaller independent modeling tasks consisting of subsets of the input frequencies, transmitters and receivers. Non-linear inversion is carried out with a new Occam inversion approach that requires fewer forward calls. Dense matrix operations are optimized for memory and parallel scalability using the ScaLAPACK parallel library. Free parameters can be bounded using a new non-linear transformation that leaves the transformed parameters nearly the same as the original parameters within the bounds, thereby reducing non-linear smoothing effects. Data balancing normalization weights for the joint inversion of two or more data sets encourages the inversion to fit each data type equally well. A synthetic joint inversion of marine CSEM and MT data illustrates the algorithm's performance and parallel scaling on up to 480 processing cores. CSEM inversion of data from the Middle America Trench offshore Nicaragua demonstrates a real world application. The source code and MATLAB interface tools are freely available at http://mare2dem.ucsd.edu.
ASA24 enables multiple automatically coded self-administered 24-hour recalls and food records

Cancer.gov

A freely available web-based tool for epidemiologic, interventional, behavioral, or clinical research from NCI that enables multiple automatically coded self-administered 24-hour recalls and food records.
OntoBrowser: a collaborative tool for curation of ontologies by subject matter experts.

PubMed

Ravagli, Carlo; Pognan, Francois; Marc, Philippe

2017-01-01

The lack of controlled terminology and ontology usage leads to incomplete search results and poor interoperability between databases. One of the major underlying challenges of data integration is curating data to adhere to controlled terminologies and/or ontologies. Finding subject matter experts with the time and skills required to perform data curation is often problematic. In addition, existing tools are not designed for continuous data integration and collaborative curation. This results in time-consuming curation workflows that often become unsustainable. The primary objective of OntoBrowser is to provide an easy-to-use online collaborative solution for subject matter experts to map reported terms to preferred ontology (or code list) terms and facilitate ontology evolution. Additional features include web service access to data, visualization of ontologies in hierarchical/graph format and a peer review/approval workflow with alerting. The source code is freely available under the Apache v2.0 license. Source code and installation instructions are available at http://opensource.nibr.com This software is designed to run on a Java EE application server and store data in a relational database. philippe.marc@novartis.com. © The Author 2016. Published by Oxford University Press.
OntoBrowser: a collaborative tool for curation of ontologies by subject matter experts

PubMed Central

Ravagli, Carlo; Pognan, Francois

2017-01-01

Summary: The lack of controlled terminology and ontology usage leads to incomplete search results and poor interoperability between databases. One of the major underlying challenges of data integration is curating data to adhere to controlled terminologies and/or ontologies. Finding subject matter experts with the time and skills required to perform data curation is often problematic. In addition, existing tools are not designed for continuous data integration and collaborative curation. This results in time-consuming curation workflows that often become unsustainable. The primary objective of OntoBrowser is to provide an easy-to-use online collaborative solution for subject matter experts to map reported terms to preferred ontology (or code list) terms and facilitate ontology evolution. Additional features include web service access to data, visualization of ontologies in hierarchical/graph format and a peer review/approval workflow with alerting. Availability and implementation: The source code is freely available under the Apache v2.0 license. Source code and installation instructions are available at http://opensource.nibr.com. This software is designed to run on a Java EE application server and store data in a relational database. Contact: philippe.marc@novartis.com PMID:27605099
The SAMI Galaxy Survey: A prototype data archive for Big Science exploration

NASA Astrophysics Data System (ADS)

Konstantopoulos, I. S.; Green, A. W.; Foster, C.; Scott, N.; Allen, J. T.; Fogarty, L. M. R.; Lorente, N. P. F.; Sweet, S. M.; Hopkins, A. M.; Bland-Hawthorn, J.; Bryant, J. J.; Croom, S. M.; Goodwin, M.; Lawrence, J. S.; Owers, M. S.; Richards, S. N.

2015-11-01

We describe the data archive and database for the SAMI Galaxy Survey, an ongoing observational program that will cover ≈3400 galaxies with integral-field (spatially-resolved) spectroscopy. Amounting to some three million spectra, this is the largest sample of its kind to date. The data archive and built-in query engine use the versatile Hierarchical Data Format (HDF5), which precludes the need for external metadata tables and hence the setup and maintenance overhead those carry. The code produces simple outputs that can easily be translated to plots and tables, and the combination of these tools makes for a light system that can handle heavy data. This article acts as a contextual companion to the SAMI Survey Database source code repository, samiDB, which is freely available online and written entirely in Python. We also discuss the decisions related to the selection of tools and the creation of data visualisation modules. It is our aim that the work presented in this article-descriptions, rationale, and source code-will be of use to scientists looking to set up a maintenance-light data archive for a Big Science data load.
Mod3DMT and EMTF: Free Software for MT Data Processing and Inversion

NASA Astrophysics Data System (ADS)

Egbert, G. D.; Kelbert, A.; Meqbel, N. M.

2017-12-01

"ModEM" was developed at Oregon State University as a modular system for inversion of electromagnetic (EM) geophysical data (Egbert and Kelbert, 2012; Kelbert et al., 2014). Although designed for more general (frequency domain) EM applications, and originally intended as a testbed for exploring inversion search and regularization strategies, our own initial uses of ModEM were for 3-D imaging of the deep crust and upper mantle at large scales. Since 2013 we have offered a version of the source code suitable for 3D magnetotelluric (MT) inversion on an "as is, user beware" basis for free for non-commercial applications. This version, which we refer to as Mod3DMT, has since been widely used by the international MT community. Over 250 users have registered to download the source code, and at least 50 MT studies in the refereed literature, covering locations around the globe at a range of spatial scales, cite use of ModEM for 3D inversion. For over 30 years I have also made MT processing software available for free use. In this presentation, I will discuss my experience with these freely available (but perhaps not truly open-source) computer codes. Although users are allowed to make modifications to the codes (on conditions that they provide a copy of the modified version) only a handful of users have tried to make any modification, and only rarely are modifications even reported, much less provided back to the developers.

A convolutional neural network approach to calibrating the rotation axis for X-ray computed tomography.

PubMed

Yang, Xiaogang; De Carlo, Francesco; Phatak, Charudatta; Gürsoy, Dogˇa

2017-03-01

This paper presents an algorithm to calibrate the center-of-rotation for X-ray tomography by using a machine learning approach, the Convolutional Neural Network (CNN). The algorithm shows excellent accuracy from the evaluation of synthetic data with various noise ratios. It is further validated with experimental data of four different shale samples measured at the Advanced Photon Source and at the Swiss Light Source. The results are as good as those determined by visual inspection and show better robustness than conventional methods. CNN has also great potential for reducing or removing other artifacts caused by instrument instability, detector non-linearity, etc. An open-source toolbox, which integrates the CNN methods described in this paper, is freely available through GitHub at tomography/xlearn and can be easily integrated into existing computational pipelines available at various synchrotron facilities. Source code, documentation and information on how to contribute are also provided.
SPV: a JavaScript Signaling Pathway Visualizer.

PubMed

Calderone, Alberto; Cesareni, Gianni

2018-03-24

The visualization of molecular interactions annotated in web resources is useful to offer to users such information in a clear intuitive layout. These interactions are frequently represented as binary interactions that are laid out in free space where, different entities, cellular compartments and interaction types are hardly distinguishable. SPV (Signaling Pathway Visualizer) is a free open source JavaScript library which offers a series of pre-defined elements, compartments and interaction types meant to facilitate the representation of signaling pathways consisting of causal interactions without neglecting simple protein-protein interaction networks. freely available under Apache version 2 license; Source code: https://github.com/Sinnefa/SPV_Signaling_Pathway_Visualizer_v1.0. Language: JavaScript; Web technology: Scalable Vector Graphics; Libraries: D3.js. sinnefa@gmail.com.
The new on-line Czech Food Composition Database.

PubMed

Machackova, Marie; Holasova, Marie; Maskova, Eva

2013-10-01

The new on-line Czech Food Composition Database (FCDB) was launched on http://www.czfcdb.cz in December 2010 as a main freely available channel for dissemination of Czech food composition data. The application is based on a complied FCDB documented according to the EuroFIR standardised procedure for full value documentation and indexing of foods by the LanguaL™ Thesaurus. A content management system was implemented for administration of the website and performing data export (comma-separated values or EuroFIR XML transport package formats) by a compiler. Reference/s are provided for each published value with linking to available freely accessible on-line sources of data (e.g. full texts, EuroFIR Document Repository, on-line national FCDBs). LanguaL™ codes are displayed within each food record as searchable keywords of the database. A photo (or a photo gallery) is used as a visual descriptor of a food item. The application is searchable on foods, components, food groups, alphabet and a multi-field advanced search. Copyright © 2013 Elsevier Ltd. All rights reserved.
SCRAM: a pipeline for fast index-free small RNA read alignment and visualization.

PubMed

Fletcher, Stephen J; Boden, Mikael; Mitter, Neena; Carroll, Bernard J

2018-03-15

Small RNAs play key roles in gene regulation, defense against viral pathogens and maintenance of genome stability, though many aspects of their biogenesis and function remain to be elucidated. SCRAM (Small Complementary RNA Mapper) is a novel, simple-to-use short read aligner and visualization suite that enhances exploration of small RNA datasets. The SCRAM pipeline is implemented in Go and Python, and is freely available under MIT license. Source code, multiplatform binaries and a Docker image can be accessed via https://sfletc.github.io/scram/. s.fletcher@uq.edu.au. Supplementary data are available at Bioinformatics online.
CytoSPADE: high-performance analysis and visualization of high-dimensional cytometry data

PubMed Central

Linderman, Michael D.; Simonds, Erin F.; Qiu, Peng; Bruggner, Robert V.; Sheode, Ketaki; Meng, Teresa H.; Plevritis, Sylvia K.; Nolan, Garry P.

2012-01-01

Motivation: Recent advances in flow cytometry enable simultaneous single-cell measurement of 30+ surface and intracellular proteins. CytoSPADE is a high-performance implementation of an interface for the Spanning-tree Progression Analysis of Density-normalized Events algorithm for tree-based analysis and visualization of this high-dimensional cytometry data. Availability: Source code and binaries are freely available at http://cytospade.org and via Bioconductor version 2.10 onwards for Linux, OSX and Windows. CytoSPADE is implemented in R, C++ and Java. Contact: michael.linderman@mssm.edu Supplementary Information: Additional documentation available at http://cytospade.org. PMID:22782546
LIGSIFT: an open-source tool for ligand structural alignment and virtual screening.

PubMed

Roy, Ambrish; Skolnick, Jeffrey

2015-02-15

Shape-based alignment of small molecules is a widely used approach in computer-aided drug discovery. Most shape-based ligand structure alignment applications, both commercial and freely available ones, use the Tanimoto coefficient or similar functions for evaluating molecular similarity. Major drawbacks of using such functions are the size dependence of the score and the fact that the statistical significance of the molecular match using such metrics is not reported. We describe a new open-source ligand structure alignment and virtual screening (VS) algorithm, LIGSIFT, that uses Gaussian molecular shape overlay for fast small molecule alignment and a size-independent scoring function for efficient VS based on the statistical significance of the score. LIGSIFT was tested against the compounds for 40 protein targets available in the Directory of Useful Decoys and the performance was evaluated using the area under the ROC curve (AUC), the Enrichment Factor (EF) and Hit Rate (HR). LIGSIFT-based VS shows an average AUC of 0.79, average EF values of 20.8 and a HR of 59% in the top 1% of the screened library. LIGSIFT software, including the source code, is freely available to academic users at http://cssb.biology.gatech.edu/LIGSIFT. Supplementary data are available at Bioinformatics online. skolnick@gatech.edu. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Unipro UGENE: a unified bioinformatics toolkit.

PubMed

Okonechnikov, Konstantin; Golosova, Olga; Fursov, Mikhail

2012-04-15

Unipro UGENE is a multiplatform open-source software with the main goal of assisting molecular biologists without much expertise in bioinformatics to manage, analyze and visualize their data. UGENE integrates widely used bioinformatics tools within a common user interface. The toolkit supports multiple biological data formats and allows the retrieval of data from remote data sources. It provides visualization modules for biological objects such as annotated genome sequences, Next Generation Sequencing (NGS) assembly data, multiple sequence alignments, phylogenetic trees and 3D structures. Most of the integrated algorithms are tuned for maximum performance by the usage of multithreading and special processor instructions. UGENE includes a visual environment for creating reusable workflows that can be launched on local resources or in a High Performance Computing (HPC) environment. UGENE is written in C++ using the Qt framework. The built-in plugin system and structured UGENE API make it possible to extend the toolkit with new functionality. UGENE binaries are freely available for MS Windows, Linux and Mac OS X at http://ugene.unipro.ru/download.html. UGENE code is licensed under the GPLv2; the information about the code licensing and copyright of integrated tools can be found in the LICENSE.3rd_party file provided with the source bundle.
pyOpenMS: a Python-based interface to the OpenMS mass-spectrometry algorithm library.

PubMed

Röst, Hannes L; Schmitt, Uwe; Aebersold, Ruedi; Malmström, Lars

2014-01-01

pyOpenMS is an open-source, Python-based interface to the C++ OpenMS library, providing facile access to a feature-rich, open-source algorithm library for MS-based proteomics analysis. It contains Python bindings that allow raw access to the data structures and algorithms implemented in OpenMS, specifically those for file access (mzXML, mzML, TraML, mzIdentML among others), basic signal processing (smoothing, filtering, de-isotoping, and peak-picking) and complex data analysis (including label-free, SILAC, iTRAQ, and SWATH analysis tools). pyOpenMS thus allows fast prototyping and efficient workflow development in a fully interactive manner (using the interactive Python interpreter) and is also ideally suited for researchers not proficient in C++. In addition, our code to wrap a complex C++ library is completely open-source, allowing other projects to create similar bindings with ease. The pyOpenMS framework is freely available at https://pypi.python.org/pypi/pyopenms while the autowrap tool to create Cython code automatically is available at https://pypi.python.org/pypi/autowrap (both released under the 3-clause BSD licence). © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
SU-E-T-103: Development and Implementation of Web Based Quality Control Software

DOE Office of Scientific and Technical Information (OSTI.GOV)

Studinski, R; Taylor, R; Angers, C

Purpose: Historically many radiation medicine programs have maintained their Quality Control (QC) test results in paper records or Microsoft Excel worksheets. Both these approaches represent significant logistical challenges, and are not predisposed to data review and approval. It has been our group's aim to develop and implement web based software designed not just to record and store QC data in a centralized database, but to provide scheduling and data review tools to help manage a radiation therapy clinics Equipment Quality control program. Methods: The software was written in the Python programming language using the Django web framework. In order tomore » promote collaboration and validation from other centres the code was made open source and is freely available to the public via an online source code repository. The code was written to provide a common user interface for data entry, formalize the review and approval process, and offer automated data trending and process control analysis of test results. Results: As of February 2014, our installation of QAtrack+ has 180 tests defined in its database and has collected ∼22 000 test results, all of which have been reviewed and approved by a physicist via QATrack+'s review tools. These results include records for quality control of Elekta accelerators, CT simulators, our brachytherapy programme, TomoTherapy and Cyberknife units. Currently at least 5 other centres are known to be running QAtrack+ clinically, forming the start of an international user community. Conclusion: QAtrack+ has proven to be an effective tool for collecting radiation therapy QC data, allowing for rapid review and trending of data for a wide variety of treatment units. As free and open source software, all source code, documentation and a bug tracker are available to the public at https://bitbucket.org/tohccmedphys/qatrackplus/.« less
EPA Remote Sensing Information Gateway

NASA Astrophysics Data System (ADS)

Paulsen, H. K.; Szykman, J. J.; Plessel, T.; Freeman, M.; Dimmick, F.

2009-12-01

The Remote Sensing Information Gateway was developed by the U.S. Environmental Protection Agency (EPA) to assist researchers in easily obtaining and combining a variety of environmental datasets related to air quality research. Current datasets available include, but are not limited to surface PM2.5 and O3 data, satellite derived aerosol optical depth , and 3-dimensional output from U.S. EPA's Models 3/Community Multi-scale Air Quality (CMAQ) modeling system. The presentation will include a demonstration that illustrates several scenarios of how researchers use the tool to help them visualize and obtain data for their work; with a particular focus on episode analysis related to biomass burning impacts on air quality. The presentation will provide an overview on how RSIG works and how the code has been—and can be—adapted for other projects. One example is the Virtual Estuary, which focuses on automating the retrieval and pre-processing of a variety of data needed for estuarine research. RSIG’s source codes are freely available to researchers with permission from the EPA principal investigator, Dr. Jim Szykman. RSIG is available to the community and can be accessed online at http://www.epa.gov/rsig. Once the JAVA policy file is configured on your computer you can run the RSIG applet on your computer and connect to the RSIG server to visualize and retrieve available data sets. The applet allows the user to specify the temporal/spatial areas of interest, and the types of data to retrieve. The applet then communicates with RSIG subsetter codes located on the data owners’ remote servers; the subsetter codes assemble and transfer via ordinary Internet protocols only the specified data to the researcher’s computer. This is much faster than the usual method of transferring large files via FTP and greatly reduces network traffic. The RSIG applet then visualizes the transferred data on a latitude-longitude map, automatically locating the data in the correct geographic position. Images, animations, and aggregated data can be saved or exported in a variety of data formats: Binary External Data Representation (XDR) format (simple, efficient), NetCDF-COARDS format, NetCDF-IOAPI format (regridding the data to a CMAQ grid), HDF (unsubsetted satellite files), ASCII tab-delimited spreadsheet, MCMC (used for input into HB model), PNG images, MPG movies, KMZ movies (for display in Google Earth and similar applications), GeoTIFF RGB format and 32-bit float format. RSIG’s source codes are freely available to researchers with permission from the EPA. Contacts for obtaining RSIG code are available at the RSIG website.
LakeMetabolizer: An R package for estimating lake metabolism from free-water oxygen using diverse statistical models

USGS Publications Warehouse

Winslow, Luke; Zwart, Jacob A.; Batt, Ryan D.; Dugan, Hilary; Woolway, R. Iestyn; Corman, Jessica; Hanson, Paul C.; Read, Jordan S.

2016-01-01

Metabolism is a fundamental process in ecosystems that crosses multiple scales of organization from individual organisms to whole ecosystems. To improve sharing and reuse of published metabolism models, we developed LakeMetabolizer, an R package for estimating lake metabolism from in situ time series of dissolved oxygen, water temperature, and, optionally, additional environmental variables. LakeMetabolizer implements 5 different metabolism models with diverse statistical underpinnings: bookkeeping, ordinary least squares, maximum likelihood, Kalman filter, and Bayesian. Each of these 5 metabolism models can be combined with 1 of 7 models for computing the coefficient of gas exchange across the air–water interface (k). LakeMetabolizer also features a variety of supporting functions that compute conversions and implement calculations commonly applied to raw data prior to estimating metabolism (e.g., oxygen saturation and optical conversion models). These tools have been organized into an R package that contains example data, example use-cases, and function documentation. The release package version is available on the Comprehensive R Archive Network (CRAN), and the full open-source GPL-licensed code is freely available for examination and extension online. With this unified, open-source, and freely available package, we hope to improve access and facilitate the application of metabolism in studies and management of lentic ecosystems.
FRAGS: estimation of coding sequence substitution rates from fragmentary data

PubMed Central

Swart, Estienne C; Hide, Winston A; Seoighe, Cathal

2004-01-01

Background Rates of substitution in protein-coding sequences can provide important insights into evolutionary processes that are of biomedical and theoretical interest. Increased availability of coding sequence data has enabled researchers to estimate more accurately the coding sequence divergence of pairs of organisms. However the use of different data sources, alignment protocols and methods to estimate substitution rates leads to widely varying estimates of key parameters that define the coding sequence divergence of orthologous genes. Although complete genome sequence data are not available for all organisms, fragmentary sequence data can provide accurate estimates of substitution rates provided that an appropriate and consistent methodology is used and that differences in the estimates obtainable from different data sources are taken into account. Results We have developed FRAGS, an application framework that uses existing, freely available software components to construct in-frame alignments and estimate coding substitution rates from fragmentary sequence data. Coding sequence substitution estimates for human and chimpanzee sequences, generated by FRAGS, reveal that methodological differences can give rise to significantly different estimates of important substitution parameters. The estimated substitution rates were also used to infer upper-bounds on the amount of sequencing error in the datasets that we have analysed. Conclusion We have developed a system that performs robust estimation of substitution rates for orthologous sequences from a pair of organisms. Our system can be used when fragmentary genomic or transcript data is available from one of the organisms and the other is a completely sequenced genome within the Ensembl database. As well as estimating substitution statistics our system enables the user to manage and query alignment and substitution data. PMID:15005802
YNOGK: A New Public Code for Calculating Null Geodesics in the Kerr Spacetime

NASA Astrophysics Data System (ADS)

Yang, Xiaolin; Wang, Jiancheng

2013-07-01

Following the work of Dexter & Agol, we present a new public code for the fast calculation of null geodesics in the Kerr spacetime. Using Weierstrass's and Jacobi's elliptic functions, we express all coordinates and affine parameters as analytical and numerical functions of a parameter p, which is an integral value along the geodesic. This is the main difference between our code and previous similar ones. The advantage of this treatment is that the information about the turning points does not need to be specified in advance by the user, and many applications such as imaging, the calculation of line profiles, and the observer-emitter problem, become root-finding problems. All elliptic integrations are computed by Carlson's elliptic integral method as in Dexter & Agol, which guarantees the fast computational speed of our code. The formulae to compute the constants of motion given by Cunningham & Bardeen have been extended, which allow one to readily handle situations in which the emitter or the observer has an arbitrary distance from, and motion state with respect to, the central compact object. The validation of the code has been extensively tested through applications to toy problems from the literature. The source FORTRAN code is freely available for download on our Web site http://www1.ynao.ac.cn/~yangxl/yxl.html.
The ZPIC educational code suite

NASA Astrophysics Data System (ADS)

Calado, R.; Pardal, M.; Ninhos, P.; Helm, A.; Mori, W. B.; Decyk, V. K.; Vieira, J.; Silva, L. O.; Fonseca, R. A.

2017-10-01

Particle-in-Cell (PIC) codes are used in almost all areas of plasma physics, such as fusion energy research, plasma accelerators, space physics, ion propulsion, and plasma processing, and many other areas. In this work, we present the ZPIC educational code suite, a new initiative to foster training in plasma physics using computer simulations. Leveraging on our expertise and experience from the development and use of the OSIRIS PIC code, we have developed a suite of 1D/2D fully relativistic electromagnetic PIC codes, as well as 1D electrostatic. These codes are self-contained and require only a standard laptop/desktop computer with a C compiler to be run. The output files are written in a new file format called ZDF that can be easily read using the supplied routines in a number of languages, such as Python, and IDL. The code suite also includes a number of example problems that can be used to illustrate several textbook and advanced plasma mechanisms, including instructions for parameter space exploration. We also invite contributions to this repository of test problems that will be made freely available to the community provided the input files comply with the format defined by the ZPIC team. The code suite is freely available and hosted on GitHub at https://github.com/zambzamb/zpic. Work partially supported by PICKSC.
A new free and open source tool for space plasma modeling.

NASA Astrophysics Data System (ADS)

Honkonen, I. J.

2014-12-01

I will present a new distributed memory parallel, free and open source computational model for studying space plasma. The model is written in C++ with emphasis on good software development practices and code readability without sacrificing serial or parallel performance. As such the model could be especially useful for education, for learning both (magneto)hydrodynamics (MHD) and computational model development. By using latest features of the C++ standard (2011) it has been possible to develop a very modular program which improves not only the readability of code but also the testability of the model and decreases the effort required to make changes to various parts of the program. Major parts of the model, functionality not directly related to (M)HD, have been outsourced to other freely available libraries which has reduced the development time of the model significantly. I will present an overview of the code architecture as well as details of different parts of the model and will show examples of using the model including preparing input files and plotting results. A multitude of 1-, 2- and 3-dimensional test cases are included in the software distribution and the results of, for example, Kelvin-Helmholtz, bow shock, blast wave and reconnection tests, will be presented.
AMPA: an automated web server for prediction of protein antimicrobial regions.

PubMed

Torrent, Marc; Di Tommaso, Paolo; Pulido, David; Nogués, M Victòria; Notredame, Cedric; Boix, Ester; Andreu, David

2012-01-01

AMPA is a web application for assessing the antimicrobial domains of proteins, with a focus on the design on new antimicrobial drugs. The application provides fast discovery of antimicrobial patterns in proteins that can be used to develop new peptide-based drugs against pathogens. Results are shown in a user-friendly graphical interface and can be downloaded as raw data for later examination. AMPA is freely available on the web at http://tcoffee.crg.cat/apps/ampa. The source code is also available in the web. marc.torrent@upf.edu; david.andreu@upf.edu Supplementary data are available at Bioinformatics online.
The Ensembl REST API: Ensembl Data for Any Language.

PubMed

Yates, Andrew; Beal, Kathryn; Keenan, Stephen; McLaren, William; Pignatelli, Miguel; Ritchie, Graham R S; Ruffier, Magali; Taylor, Kieron; Vullo, Alessandro; Flicek, Paul

2015-01-01

We present a Web service to access Ensembl data using Representational State Transfer (REST). The Ensembl REST server enables the easy retrieval of a wide range of Ensembl data by most programming languages, using standard formats such as JSON and FASTA while minimizing client work. We also introduce bindings to the popular Ensembl Variant Effect Predictor tool permitting large-scale programmatic variant analysis independent of any specific programming language. The Ensembl REST API can be accessed at http://rest.ensembl.org and source code is freely available under an Apache 2.0 license from http://github.com/Ensembl/ensembl-rest. © The Author 2014. Published by Oxford University Press.
Thermo-msf-parser: an open source Java library to parse and visualize Thermo Proteome Discoverer msf files.

PubMed

Colaert, Niklaas; Barsnes, Harald; Vaudel, Marc; Helsens, Kenny; Timmerman, Evy; Sickmann, Albert; Gevaert, Kris; Martens, Lennart

2011-08-05

The Thermo Proteome Discoverer program integrates both peptide identification and quantification into a single workflow for peptide-centric proteomics. Furthermore, its close integration with Thermo mass spectrometers has made it increasingly popular in the field. Here, we present a Java library to parse the msf files that constitute the output of Proteome Discoverer. The parser is also implemented as a graphical user interface allowing convenient access to the information found in the msf files, and in Rover, a program to analyze and validate quantitative proteomics information. All code, binaries, and documentation is freely available at http://thermo-msf-parser.googlecode.com.
A simple system for detection of EEG artifacts in polysomnographic recordings.

PubMed

Durka, P J; Klekowicz, H; Blinowska, K J; Szelenberger, W; Niemcewicz, Sz

2003-04-01

We present an efficient parametric system for automatic detection of electroencephalogram (EEG) artifacts in polysomnographic recordings. For each of the selected types of artifacts, a relevant parameter was calculated for a given epoch. If any of these parameters exceeded a threshold, the epoch was marked as an artifact. Performance of the system, evaluated on 18 overnight polysomnographic recordings, revealed concordance with decisions of human experts close to the interexpert agreement and the repeatability of expert's decisions, assessed via a double-blind test. Complete software (Matlab source code) for the presented system is freely available from the Internet at http://brain.fuw.edu.pl/artifacts.
clusterProfiler: an R package for comparing biological themes among gene clusters.

PubMed

Yu, Guangchuang; Wang, Li-Gen; Han, Yanyan; He, Qing-Yu

2012-05-01

Increasing quantitative data generated from transcriptomics and proteomics require integrative strategies for analysis. Here, we present an R package, clusterProfiler that automates the process of biological-term classification and the enrichment analysis of gene clusters. The analysis module and visualization module were combined into a reusable workflow. Currently, clusterProfiler supports three species, including humans, mice, and yeast. Methods provided in this package can be easily extended to other species and ontologies. The clusterProfiler package is released under Artistic-2.0 License within Bioconductor project. The source code and vignette are freely available at http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html.

A convolutional neural network approach to calibrating the rotation axis for X-ray computed tomography

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yang, Xiaogang; De Carlo, Francesco; Phatak, Charudatta

This paper presents an algorithm to calibrate the center-of-rotation for X-ray tomography by using a machine learning approach, the Convolutional Neural Network (CNN). The algorithm shows excellent accuracy from the evaluation of synthetic data with various noise ratios. It is further validated with experimental data of four different shale samples measured at the Advanced Photon Source and at the Swiss Light Source. The results are as good as those determined by visual inspection and show better robustness than conventional methods. CNN has also great potential forreducing or removingother artifacts caused by instrument instability, detector non-linearity,etc. An open-source toolbox, which integratesmore » the CNN methods described in this paper, is freely available through GitHub at tomography/xlearn and can be easily integrated into existing computational pipelines available at various synchrotron facilities. Source code, documentation and information on how to contribute are also provided.« less
BASiNET-BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification.

PubMed

Ito, Eric Augusto; Katahira, Isaque; Vicente, Fábio Fernandes da Rocha; Pereira, Luiz Filipe Protasio; Lopes, Fabrício Martins

2018-06-05

With the emergence of Next Generation Sequencing (NGS) technologies, a large volume of sequence data in particular de novo sequencing was rapidly produced at relatively low costs. In this context, computational tools are increasingly important to assist in the identification of relevant information to understand the functioning of organisms. This work introduces BASiNET, an alignment-free tool for classifying biological sequences based on the feature extraction from complex network measurements. The method initially transform the sequences and represents them as complex networks. Then it extracts topological measures and constructs a feature vector that is used to classify the sequences. The method was evaluated in the classification of coding and non-coding RNAs of 13 species and compared to the CNCI, PLEK and CPC2 methods. BASiNET outperformed all compared methods in all adopted organisms and datasets. BASiNET have classified sequences in all organisms with high accuracy and low standard deviation, showing that the method is robust and non-biased by the organism. The proposed methodology is implemented in open source in R language and freely available for download at https://cran.r-project.org/package=BASiNET.
BioNetFit: a fitting tool compatible with BioNetGen, NFsim and distributed computing environments

PubMed Central

Thomas, Brandon R.; Chylek, Lily A.; Colvin, Joshua; Sirimulla, Suman; Clayton, Andrew H.A.; Hlavacek, William S.; Posner, Richard G.

2016-01-01

Summary: Rule-based models are analyzed with specialized simulators, such as those provided by the BioNetGen and NFsim open-source software packages. Here, we present BioNetFit, a general-purpose fitting tool that is compatible with BioNetGen and NFsim. BioNetFit is designed to take advantage of distributed computing resources. This feature facilitates fitting (i.e. optimization of parameter values for consistency with data) when simulations are computationally expensive. Availability and implementation: BioNetFit can be used on stand-alone Mac, Windows/Cygwin, and Linux platforms and on Linux-based clusters running SLURM, Torque/PBS, or SGE. The BioNetFit source code (Perl) is freely available (http://bionetfit.nau.edu). Supplementary information: Supplementary data are available at Bioinformatics online. Contact: bionetgen.help@gmail.com PMID:26556387
CHiCP: a web-based tool for the integrative and interactive visualization of promoter capture Hi-C datasets.

PubMed

Schofield, E C; Carver, T; Achuthan, P; Freire-Pritchett, P; Spivakov, M; Todd, J A; Burren, O S

2016-08-15

Promoter capture Hi-C (PCHi-C) allows the genome-wide interrogation of physical interactions between distal DNA regulatory elements and gene promoters in multiple tissue contexts. Visual integration of the resultant chromosome interaction maps with other sources of genomic annotations can provide insight into underlying regulatory mechanisms. We have developed Capture HiC Plotter (CHiCP), a web-based tool that allows interactive exploration of PCHi-C interaction maps and integration with both public and user-defined genomic datasets. CHiCP is freely accessible from www.chicp.org and supports most major HTML5 compliant web browsers. Full source code and installation instructions are available from http://github.com/D-I-L/django-chicp ob219@cam.ac.uk. © The Author 2016. Published by Oxford University Press. All rights reserved.
The IntAct molecular interaction database in 2012

PubMed Central

Kerrien, Samuel; Aranda, Bruno; Breuza, Lionel; Bridge, Alan; Broackes-Carter, Fiona; Chen, Carol; Duesbury, Margaret; Dumousseau, Marine; Feuermann, Marc; Hinz, Ursula; Jandrasits, Christine; Jimenez, Rafael C.; Khadake, Jyoti; Mahadevan, Usha; Masson, Patrick; Pedruzzi, Ivo; Pfeiffenberger, Eric; Porras, Pablo; Raghunath, Arathi; Roechert, Bernd; Orchard, Sandra; Hermjakob, Henning

2012-01-01

IntAct is an open-source, open data molecular interaction database populated by data either curated from the literature or from direct data depositions. Two levels of curation are now available within the database, with both IMEx-level annotation and less detailed MIMIx-compatible entries currently supported. As from September 2011, IntAct contains approximately 275 000 curated binary interaction evidences from over 5000 publications. The IntAct website has been improved to enhance the search process and in particular the graphical display of the results. New data download formats are also available, which will facilitate the inclusion of IntAct's data in the Semantic Web. IntAct is an active contributor to the IMEx consortium (http://www.imexconsortium.org). IntAct source code and data are freely available at http://www.ebi.ac.uk/intact. PMID:22121220
Development of an IHE MRRT-compliant open-source web-based reporting platform.

PubMed

Pinto Dos Santos, Daniel; Klos, G; Kloeckner, R; Oberle, R; Dueber, C; Mildenberger, P

2017-01-01

To develop a platform that uses structured reporting templates according to the IHE Management of Radiology Report Templates (MRRT) profile, and to implement this platform into clinical routine. The reporting platform uses standard web technologies (HTML / JavaScript and PHP / MySQL) only. Several freely available external libraries were used to simplify the programming. The platform runs on a standard web server, connects with the radiology information system (RIS) and PACS, and is easily accessible via a standard web browser. A prototype platform that allows structured reporting to be easily incorporated into the clinical routine was developed and successfully tested. To date, 797 reports were generated using IHE MRRT-compliant templates (many of them downloaded from the RSNA's radreport.org website). Reports are stored in a MySQL database and are easily accessible for further analyses. Development of an IHE MRRT-compliant platform for structured reporting is feasible using only standard web technologies. All source code will be made available upon request under a free license, and the participation of other institutions in further development is welcome. • A platform for structured reporting using IHE MRRT-compliant templates is presented. • Incorporating structured reporting into clinical routine is feasible. • Full source code will be provided upon request under a free license.
"Just Google It"--The Scope of Freely Available Information Sources for Doctoral Thesis Writing

ERIC Educational Resources Information Center

Grigas, Vincas; Juzeniene, Simona; Velickaite, Jone

2017-01-01

Introduction: Recent developments in the field of scientific information resource provision lead us to the key research question, namely,what is the coverage of freely available information sources when writing doctoral theses, and whether the academic library can assume the leading role as a direct intermediator for information users. Method:…
A Python package for parsing, validating, mapping and formatting sequence variants using HGVS nomenclature.

PubMed

Hart, Reece K; Rico, Rudolph; Hare, Emily; Garcia, John; Westbrook, Jody; Fusaro, Vincent A

2015-01-15

Biological sequence variants are commonly represented in scientific literature, clinical reports and databases of variation using the mutation nomenclature guidelines endorsed by the Human Genome Variation Society (HGVS). Despite the widespread use of the standard, no freely available and comprehensive programming libraries are available. Here we report an open-source and easy-to-use Python library that facilitates the parsing, manipulation, formatting and validation of variants according to the HGVS specification. The current implementation focuses on the subset of the HGVS recommendations that precisely describe sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics. The package is released under the Apache 2.0 open-source license. Source code, documentation and issue tracking are available at http://bitbucket.org/hgvs/hgvs/. Python packages are available at PyPI (https://pypi.python.org/pypi/hgvs). Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
A Python package for parsing, validating, mapping and formatting sequence variants using HGVS nomenclature

PubMed Central

Hart, Reece K.; Rico, Rudolph; Hare, Emily; Garcia, John; Westbrook, Jody; Fusaro, Vincent A.

2015-01-01

Summary: Biological sequence variants are commonly represented in scientific literature, clinical reports and databases of variation using the mutation nomenclature guidelines endorsed by the Human Genome Variation Society (HGVS). Despite the widespread use of the standard, no freely available and comprehensive programming libraries are available. Here we report an open-source and easy-to-use Python library that facilitates the parsing, manipulation, formatting and validation of variants according to the HGVS specification. The current implementation focuses on the subset of the HGVS recommendations that precisely describe sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics. Availability and implementation: The package is released under the Apache 2.0 open-source license. Source code, documentation and issue tracking are available at http://bitbucket.org/hgvs/hgvs/. Python packages are available at PyPI (https://pypi.python.org/pypi/hgvs). Contact: reecehart@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25273102
DATA-MEAns: an open source tool for the classification and management of neural ensemble recordings.

PubMed

Bonomini, María P; Ferrandez, José M; Bolea, Jose Angel; Fernandez, Eduardo

2005-10-30

The number of laboratories using techniques that allow to acquire simultaneous recordings of as many units as possible is considerably increasing. However, the development of tools used to analyse this multi-neuronal activity is generally lagging behind the development of the tools used to acquire these data. Moreover, the data exchange between research groups using different multielectrode acquisition systems is hindered by commercial constraints such as exclusive file structures, high priced licenses and hard policies on intellectual rights. This paper presents a free open-source software for the classification and management of neural ensemble data. The main goal is to provide a graphical user interface that links the experimental data to a basic set of routines for analysis, visualization and classification in a consistent framework. To facilitate the adaptation and extension as well as the addition of new routines, tools and algorithms for data analysis, the source code and documentation are freely available.
[MapDraw: a microsoft excel macro for drawing genetic linkage maps based on given genetic linkage data].

PubMed

Liu, Ren-Hu; Meng, Jin-Ling

2003-05-01

MAPMAKER is one of the most widely used computer software package for constructing genetic linkage maps.However, the PC version, MAPMAKER 3.0 for PC, could not draw the genetic linkage maps that its Macintosh version, MAPMAKER 3.0 for Macintosh,was able to do. Especially in recent years, Macintosh computer is much less popular than PC. Most of the geneticists use PC to analyze their genetic linkage data. So a new computer software to draw the same genetic linkage maps on PC as the MAPMAKER for Macintosh to do on Macintosh has been crying for. Microsoft Excel,one component of Microsoft Office package, is one of the most popular software in laboratory data processing. Microsoft Visual Basic for Applications (VBA) is one of the most powerful functions of Microsoft Excel. Using this program language, we can take creative control of Excel, including genetic linkage map construction, automatic data processing and more. In this paper, a Microsoft Excel macro called MapDraw is constructed to draw genetic linkage maps on PC computer based on given genetic linkage data. Use this software,you can freely construct beautiful genetic linkage map in Excel and freely edit and copy it to Word or other application. This software is just an Excel format file. You can freely copy it from ftp://211.69.140.177 or ftp://brassica.hzau.edu.cn and the source code can be found in Excel's Visual Basic Editor.
CRITICA: coding region identification tool invoking comparative analysis

NASA Technical Reports Server (NTRS)

Badger, J. H.; Olsen, G. J.; Woese, C. R. (Principal Investigator)

1999-01-01

Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in/pub/critica) and on the World Wide Web (http:/(/)rdpwww.life.uiuc.edu).
CHASM and SNVBox: toolkit for detecting biologically important single nucleotide mutations in cancer.

PubMed

Wong, Wing Chung; Kim, Dewey; Carter, Hannah; Diekhans, Mark; Ryan, Michael C; Karchin, Rachel

2011-08-01

Thousands of cancer exomes are currently being sequenced, yielding millions of non-synonymous single nucleotide variants (SNVs) of possible relevance to disease etiology. Here, we provide a software toolkit to prioritize SNVs based on their predicted contribution to tumorigenesis. It includes a database of precomputed, predictive features covering all positions in the annotated human exome and can be used either stand-alone or as part of a larger variant discovery pipeline. MySQL database, source code and binaries freely available for academic/government use at http://wiki.chasmsoftware.org, Source in Python and C++. Requires 32 or 64-bit Linux system (tested on Fedora Core 8,10,11 and Ubuntu 10), 2.5*≤ Python <3.0*, MySQL server >5.0, 60 GB available hard disk space (50 MB for software and data files, 40 GB for MySQL database dump when uncompressed), 2 GB of RAM.
BioNetFit: a fitting tool compatible with BioNetGen, NFsim and distributed computing environments.

PubMed

Thomas, Brandon R; Chylek, Lily A; Colvin, Joshua; Sirimulla, Suman; Clayton, Andrew H A; Hlavacek, William S; Posner, Richard G

2016-03-01

Rule-based models are analyzed with specialized simulators, such as those provided by the BioNetGen and NFsim open-source software packages. Here, we present BioNetFit, a general-purpose fitting tool that is compatible with BioNetGen and NFsim. BioNetFit is designed to take advantage of distributed computing resources. This feature facilitates fitting (i.e. optimization of parameter values for consistency with data) when simulations are computationally expensive. BioNetFit can be used on stand-alone Mac, Windows/Cygwin, and Linux platforms and on Linux-based clusters running SLURM, Torque/PBS, or SGE. The BioNetFit source code (Perl) is freely available (http://bionetfit.nau.edu). Supplementary data are available at Bioinformatics online. bionetgen.help@gmail.com. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
eQuilibrator--the biochemical thermodynamics calculator.

PubMed

Flamholz, Avi; Noor, Elad; Bar-Even, Arren; Milo, Ron

2012-01-01

The laws of thermodynamics constrain the action of biochemical systems. However, thermodynamic data on biochemical compounds can be difficult to find and is cumbersome to perform calculations with manually. Even simple thermodynamic questions like 'how much Gibbs energy is released by ATP hydrolysis at pH 5?' are complicated excessively by the search for accurate data. To address this problem, eQuilibrator couples a comprehensive and accurate database of thermodynamic properties of biochemical compounds and reactions with a simple and powerful online search and calculation interface. The web interface to eQuilibrator (http://equilibrator.weizmann.ac.il) enables easy calculation of Gibbs energies of compounds and reactions given arbitrary pH, ionic strength and metabolite concentrations. The eQuilibrator code is open-source and all thermodynamic source data are freely downloadable in standard formats. Here we describe the database characteristics and implementation and demonstrate its use.
HitWalker2: visual analytics for precision medicine and beyond.

PubMed

Bottomly, Daniel; McWeeney, Shannon K; Wilmot, Beth

2016-04-15

The lack of visualization frameworks to guide interpretation and facilitate discovery is a potential bottleneck for precision medicine, systems genetics and other studies. To address this we have developed an interactive, reproducible, web-based prioritization approach that builds on our earlier work. HitWalker2 is highly flexible and can utilize many data types and prioritization methods based upon available data and desired questions, allowing it to be utilized in a diverse range of studies such as cancer, infectious disease and psychiatric disorders. Source code is freely available at https://github.com/biodev/HitWalker2 and implemented using Python/Django, Neo4j and Javascript (D3.js and jQuery). We support major open source browsers (e.g. Firefox and Chromium/Chrome). wilmotb@ohsu.edu Supplementary data are available at Bioinformatics online. Additional information/instructions are available at https://github.com/biodev/HitWalker2/wiki. © The Author 2015. Published by Oxford University Press.
eQuilibrator—the biochemical thermodynamics calculator

PubMed Central

Flamholz, Avi; Noor, Elad; Bar-Even, Arren; Milo, Ron

2012-01-01

The laws of thermodynamics constrain the action of biochemical systems. However, thermodynamic data on biochemical compounds can be difficult to find and is cumbersome to perform calculations with manually. Even simple thermodynamic questions like ‘how much Gibbs energy is released by ATP hydrolysis at pH 5?’ are complicated excessively by the search for accurate data. To address this problem, eQuilibrator couples a comprehensive and accurate database of thermodynamic properties of biochemical compounds and reactions with a simple and powerful online search and calculation interface. The web interface to eQuilibrator (http://equilibrator.weizmann.ac.il) enables easy calculation of Gibbs energies of compounds and reactions given arbitrary pH, ionic strength and metabolite concentrations. The eQuilibrator code is open-source and all thermodynamic source data are freely downloadable in standard formats. Here we describe the database characteristics and implementation and demonstrate its use. PMID:22064852
Microsoft Biology Initiative: .NET Bioinformatics Platform and Tools

PubMed Central

Diaz Acosta, B.

2011-01-01

The Microsoft Biology Initiative (MBI) is an effort in Microsoft Research to bring new technology and tools to the area of bioinformatics and biology. This initiative is comprised of two primary components, the Microsoft Biology Foundation (MBF) and the Microsoft Biology Tools (MBT). MBF is a language-neutral bioinformatics toolkit built as an extension to the Microsoft .NET Framework—initially aimed at the area of Genomics research. Currently, it implements a range of parsers for common bioinformatics file formats; a range of algorithms for manipulating DNA, RNA, and protein sequences; and a set of connectors to biological web services such as NCBI BLAST. MBF is available under an open source license, and executables, source code, demo applications, documentation and training materials are freely downloadable from http://research.microsoft.com/bio. MBT is a collection of tools that enable biology and bioinformatics researchers to be more productive in making scientific discoveries.
MetaQuant: a tool for the automatic quantification of GC/MS-based metabolome data.

PubMed

Bunk, Boyke; Kucklick, Martin; Jonas, Rochus; Münch, Richard; Schobert, Max; Jahn, Dieter; Hiller, Karsten

2006-12-01

MetaQuant is a Java-based program for the automatic and accurate quantification of GC/MS-based metabolome data. In contrast to other programs MetaQuant is able to quantify hundreds of substances simultaneously with minimal manual intervention. The integration of a self-acting calibration function allows the parallel and fast calibration for several metabolites simultaneously. Finally, MetaQuant is able to import GC/MS data in the common NetCDF format and to export the results of the quantification into Systems Biology Markup Language (SBML), Comma Separated Values (CSV) or Microsoft Excel (XLS) format. MetaQuant is written in Java and is available under an open source license. Precompiled packages for the installation on Windows or Linux operating systems are freely available for download. The source code as well as the installation packages are available at http://bioinformatics.org/metaquant
Software support for SBGN maps: SBGN-ML and LibSBGN.

PubMed

van Iersel, Martijn P; Villéger, Alice C; Czauderna, Tobias; Boyd, Sarah E; Bergmann, Frank T; Luna, Augustin; Demir, Emek; Sorokin, Anatoly; Dogrusoz, Ugur; Matsuoka, Yukiko; Funahashi, Akira; Aladjem, Mirit I; Mi, Huaiyu; Moodie, Stuart L; Kitano, Hiroaki; Le Novère, Nicolas; Schreiber, Falk

2012-08-01

LibSBGN is a software library for reading, writing and manipulating Systems Biology Graphical Notation (SBGN) maps stored using the recently developed SBGN-ML file format. The library (available in C++ and Java) makes it easy for developers to add SBGN support to their tools, whereas the file format facilitates the exchange of maps between compatible software applications. The library also supports validation of maps, which simplifies the task of ensuring compliance with the detailed SBGN specifications. With this effort we hope to increase the adoption of SBGN in bioinformatics tools, ultimately enabling more researchers to visualize biological knowledge in a precise and unambiguous manner. Milestone 2 was released in December 2011. Source code, example files and binaries are freely available under the terms of either the LGPL v2.1+ or Apache v2.0 open source licenses from http://libsbgn.sourceforge.net. sbgn-libsbgn@lists.sourceforge.net.

LSST communications middleware implementation

NASA Astrophysics Data System (ADS)

Mills, Dave; Schumacher, German; Lotz, Paul

2016-07-01

The LSST communications middleware is based on a set of software abstractions; which provide standard interfaces for common communications services. The observatory requires communication between diverse subsystems, implemented by different contractors, and comprehensive archiving of subsystem status data. The Service Abstraction Layer (SAL) is implemented using open source packages that implement open standards of DDS (Data Distribution Service1) for data communication, and SQL (Standard Query Language) for database access. For every subsystem, abstractions for each of the Telemetry datastreams, along with Command/Response and Events, have been agreed with the appropriate component vendor (such as Dome, TMA, Hexapod), and captured in ICD's (Interface Control Documents).The OpenSplice (Prismtech) Community Edition of DDS provides an LGPL licensed distribution which may be freely redistributed. The availability of the full source code provides assurances that the project will be able to maintain it over the full 10 year survey, independent of the fortunes of the original providers.
CellTracker (not only) for dummies.

PubMed

Piccinini, Filippo; Kiss, Alexa; Horvath, Peter

2016-03-15

Time-lapse experiments play a key role in studying the dynamic behavior of cells. Single-cell tracking is one of the fundamental tools for such analyses. The vast majority of the recently introduced cell tracking methods are limited to fluorescently labeled cells. An equally important limitation is that most software cannot be effectively used by biologists without reasonable expertise in image processing. Here we present CellTracker, a user-friendly open-source software tool for tracking cells imaged with various imaging modalities, including fluorescent, phase contrast and differential interference contrast (DIC) techniques. CellTracker is written in MATLAB (The MathWorks, Inc., USA). It works with Windows, Macintosh and UNIX-based systems. Source code and graphical user interface (GUI) are freely available at: http://celltracker.website/ horvath.peter@brc.mta.hu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Cloudy - simulating the non-equilibrium microphysics of gas and dust, and its observed spectrum

NASA Astrophysics Data System (ADS)

Ferland, Gary J.

2014-01-01

Cloudy is an open-source plasma/spectral simulation code, last described in the open-access journal Revista Mexicana (Ferland et al. 2013, 2013RMxAA..49..137F). The project goal is a complete simulation of the microphysics of gas and dust over the full range of density, temperature, and ionization that we encounter in astrophysics, together with a prediction of the observed spectrum. Cloudy is one of the more widely used theory codes in astrophysics with roughly 200 papers citing its documentation each year. It is developed by graduate students, postdocs, and an international network of collaborators. Cloudy is freely available on the web at trac.nublado.org, the user community can post questions on http://groups.yahoo.com/neo/groups/cloudy_simulations/info, and summer schools are organized to learn more about Cloudy and its use (http://cloud9.pa.uky.edu gary/cloudy/CloudySummerSchool/). The code’s widespread use is possible because of extensive automatic testing. It is exercised over its full range of applicability whenever the source is changed. Changes in predicted quantities are automatically detected along with any newly introduced problems. The code is designed to be autonomous and self-aware. It generates a report at the end of a calculation that summarizes any problems encountered along with suggestions of potentially incorrect boundary conditions. This self-monitoring is a core feature since the code is now often used to generate large MPI grids of simulations, making it impossible for a user to verify each calculation by hand. I will describe some challenges in developing a large physics code, with its many interconnected physical processes, many at the frontier of research in atomic or molecular physics, all in an open environment.
Web3DMol: interactive protein structure visualization based on WebGL.

PubMed

Shi, Maoxiang; Gao, Juntao; Zhang, Michael Q

2017-07-03

A growing number of web-based databases and tools for protein research are being developed. There is now a widespread need for visualization tools to present the three-dimensional (3D) structure of proteins in web browsers. Here, we introduce our 3D modeling program-Web3DMol-a web application focusing on protein structure visualization in modern web browsers. Users submit a PDB identification code or select a PDB archive from their local disk, and Web3DMol will display and allow interactive manipulation of the 3D structure. Featured functions, such as sequence plot, fragment segmentation, measure tool and meta-information display, are offered for users to gain a better understanding of protein structure. Easy-to-use APIs are available for developers to reuse and extend Web3DMol. Web3DMol can be freely accessed at http://web3dmol.duapp.com/, and the source code is distributed under the MIT license. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
MOCAT: A Metagenomics Assembly and Gene Prediction Toolkit

PubMed Central

Li, Junhua; Chen, Weineng; Chen, Hua; Mende, Daniel R.; Arumugam, Manimozhiyan; Pan, Qi; Liu, Binghang; Qin, Junjie; Wang, Jun; Bork, Peer

2012-01-01

MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs that are utilized in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at http://www.bork.embl.de/mocat/. PMID:23082188
MOCAT: a metagenomics assembly and gene prediction toolkit.

PubMed

Kultima, Jens Roat; Sunagawa, Shinichi; Li, Junhua; Chen, Weineng; Chen, Hua; Mende, Daniel R; Arumugam, Manimozhiyan; Pan, Qi; Liu, Binghang; Qin, Junjie; Wang, Jun; Bork, Peer

2012-01-01

MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs that are utilized in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at http://www.bork.embl.de/mocat/.
Software Tool for Researching Annotations of Proteins (STRAP): Open-Source Protein Annotation Software with Data Visualization

PubMed Central

Bhatia, Vivek N.; Perlman, David H.; Costello, Catherine E.; McComb, Mark E.

2009-01-01

In order that biological meaning may be derived and testable hypotheses may be built from proteomics experiments, assignments of proteins identified by mass spectrometry or other techniques must be supplemented with additional notation, such as information on known protein functions, protein-protein interactions, or biological pathway associations. Collecting, organizing, and interpreting this data often requires the input of experts in the biological field of study, in addition to the time-consuming search for and compilation of information from online protein databases. Furthermore, visualizing this bulk of information can be challenging due to the limited availability of easy-to-use and freely available tools for this process. In response to these constraints, we have undertaken the design of software to automate annotation and visualization of proteomics data in order to accelerate the pace of research. Here we present the Software Tool for Researching Annotations of Proteins (STRAP) – a user-friendly, open-source C# application. STRAP automatically obtains gene ontology (GO) terms associated with proteins in a proteomics results ID list using the freely accessible UniProtKB and EBI GOA databases. Summarized in an easy-to-navigate tabular format, STRAP includes meta-information on the protein in addition to complimentary GO terminology. Additionally, this information can be edited by the user so that in-house expertise on particular proteins may be integrated into the larger dataset. STRAP provides a sortable tabular view for all terms, as well as graphical representations of GO-term association data in pie (biological process, cellular component and molecular function) and bar charts (cross comparison of sample sets) to aid in the interpretation of large datasets and differential analyses experiments. Furthermore, proteins of interest may be exported as a unique FASTA-formatted file to allow for customizable re-searching of mass spectrometry data, and gene names corresponding to the proteins in the lists may be encoded in the Gaggle microformat for further characterization, including pathway analysis. STRAP, a tutorial, and the C# source code are freely available from http://cpctools.sourceforge.net. PMID:19839595
STELLAR: fast and exact local alignments

PubMed Central

2011-01-01

Background Large-scale comparison of genomic sequences requires reliable tools for the search of local alignments. Practical local aligners are in general fast, but heuristic, and hence sometimes miss significant matches. Results We present here the local pairwise aligner STELLAR that has full sensitivity for ε-alignments, i.e. guarantees to report all local alignments of a given minimal length and maximal error rate. The aligner is composed of two steps, filtering and verification. We apply the SWIFT algorithm for lossless filtering, and have developed a new verification strategy that we prove to be exact. Our results on simulated and real genomic data confirm and quantify the conjecture that heuristic tools like BLAST or BLAT miss a large percentage of significant local alignments. Conclusions STELLAR is very practical and fast on very long sequences which makes it a suitable new tool for finding local alignments between genomic sequences under the edit distance model. Binaries are freely available for Linux, Windows, and Mac OS X at http://www.seqan.de/projects/stellar. The source code is freely distributed with the SeqAn C++ library version 1.3 and later at http://www.seqan.de. PMID:22151882
PLIP: fully automated protein-ligand interaction profiler.

PubMed

Salentin, Sebastian; Schreiber, Sven; Haupt, V Joachim; Adasme, Melissa F; Schroeder, Michael

2015-07-01

The characterization of interactions in protein-ligand complexes is essential for research in structural bioinformatics, drug discovery and biology. However, comprehensive tools are not freely available to the research community. Here, we present the protein-ligand interaction profiler (PLIP), a novel web service for fully automated detection and visualization of relevant non-covalent protein-ligand contacts in 3D structures, freely available at projects.biotec.tu-dresden.de/plip-web. The input is either a Protein Data Bank structure, a protein or ligand name, or a custom protein-ligand complex (e.g. from docking). In contrast to other tools, the rule-based PLIP algorithm does not require any structure preparation. It returns a list of detected interactions on single atom level, covering seven interaction types (hydrogen bonds, hydrophobic contacts, pi-stacking, pi-cation interactions, salt bridges, water bridges and halogen bonds). PLIP stands out by offering publication-ready images, PyMOL session files to generate custom images and parsable result files to facilitate successive data processing. The full python source code is available for download on the website. PLIP's command-line mode allows for high-throughput interaction profiling. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
cit: hypothesis testing software for mediation analysis in genomic applications.

PubMed

Millstein, Joshua; Chen, Gary K; Breton, Carrie V

2016-08-01

The challenges of successfully applying causal inference methods include: (i) satisfying underlying assumptions, (ii) limitations in data/models accommodated by the software and (iii) low power of common multiple testing approaches. The causal inference test (CIT) is based on hypothesis testing rather than estimation, allowing the testable assumptions to be evaluated in the determination of statistical significance. A user-friendly software package provides P-values and optionally permutation-based FDR estimates (q-values) for potential mediators. It can handle single and multiple binary and continuous instrumental variables, binary or continuous outcome variables and adjustment covariates. Also, the permutation-based FDR option provides a non-parametric implementation. Simulation studies demonstrate the validity of the cit package and show a substantial advantage of permutation-based FDR over other common multiple testing strategies. The cit open-source R package is freely available from the CRAN website (https://cran.r-project.org/web/packages/cit/index.html) with embedded C ++ code that utilizes the GNU Scientific Library, also freely available (http://www.gnu.org/software/gsl/). joshua.millstein@usc.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
ICT: isotope correction toolbox.

PubMed

Jungreuthmayer, Christian; Neubauer, Stefan; Mairinger, Teresa; Zanghellini, Jürgen; Hann, Stephan

2016-01-01

Isotope tracer experiments are an invaluable technique to analyze and study the metabolism of biological systems. However, isotope labeling experiments are often affected by naturally abundant isotopes especially in cases where mass spectrometric methods make use of derivatization. The correction of these additive interferences--in particular for complex isotopic systems--is numerically challenging and still an emerging field of research. When positional information is generated via collision-induced dissociation, even more complex calculations for isotopic interference correction are necessary. So far, no freely available tools can handle tandem mass spectrometry data. We present isotope correction toolbox, a program that corrects tandem mass isotopomer data from tandem mass spectrometry experiments. Isotope correction toolbox is written in the multi-platform programming language Perl and, therefore, can be used on all commonly available computer platforms. Source code and documentation can be freely obtained under the Artistic License or the GNU General Public License from: https://github.com/jungreuc/isotope_correction_toolbox/ {christian.jungreuthmayer@boku.ac.at,juergen.zanghellini@boku.ac.at} Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
PIXiE: an algorithm for automated ion mobility arrival time extraction and collision cross section calculation using global data association

PubMed Central

Ma, Jian; Casey, Cameron P.; Zheng, Xueyun; Ibrahim, Yehia M.; Wilkins, Christopher S.; Renslow, Ryan S.; Thomas, Dennis G.; Payne, Samuel H.; Monroe, Matthew E.; Smith, Richard D.; Teeguarden, Justin G.; Baker, Erin S.; Metz, Thomas O.

2017-01-01

Abstract Motivation: Drift tube ion mobility spectrometry coupled with mass spectrometry (DTIMS-MS) is increasingly implemented in high throughput omics workflows, and new informatics approaches are necessary for processing the associated data. To automatically extract arrival times for molecules measured by DTIMS at multiple electric fields and compute their associated collisional cross sections (CCS), we created the PNNL Ion Mobility Cross Section Extractor (PIXiE). The primary application presented for this algorithm is the extraction of data that can then be used to create a reference library of experimental CCS values for use in high throughput omics analyses. Results: We demonstrate the utility of this approach by automatically extracting arrival times and calculating the associated CCSs for a set of endogenous metabolites and xenobiotics. The PIXiE-generated CCS values were within error of those calculated using commercially available instrument vendor software. Availability and implementation: PIXiE is an open-source tool, freely available on Github. The documentation, source code of the software, and a GUI can be found at https://github.com/PNNL-Comp-Mass-Spec/PIXiE and the source code of the backend workflow library used by PIXiE can be found at https://github.com/PNNL-Comp-Mass-Spec/IMS-Informed-Library. Contact: erin.baker@pnnl.gov or thomas.metz@pnnl.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28505286
PIXiE: an algorithm for automated ion mobility arrival time extraction and collision cross section calculation using global data association.

PubMed

Ma, Jian; Casey, Cameron P; Zheng, Xueyun; Ibrahim, Yehia M; Wilkins, Christopher S; Renslow, Ryan S; Thomas, Dennis G; Payne, Samuel H; Monroe, Matthew E; Smith, Richard D; Teeguarden, Justin G; Baker, Erin S; Metz, Thomas O

2017-09-01

Drift tube ion mobility spectrometry coupled with mass spectrometry (DTIMS-MS) is increasingly implemented in high throughput omics workflows, and new informatics approaches are necessary for processing the associated data. To automatically extract arrival times for molecules measured by DTIMS at multiple electric fields and compute their associated collisional cross sections (CCS), we created the PNNL Ion Mobility Cross Section Extractor (PIXiE). The primary application presented for this algorithm is the extraction of data that can then be used to create a reference library of experimental CCS values for use in high throughput omics analyses. We demonstrate the utility of this approach by automatically extracting arrival times and calculating the associated CCSs for a set of endogenous metabolites and xenobiotics. The PIXiE-generated CCS values were within error of those calculated using commercially available instrument vendor software. PIXiE is an open-source tool, freely available on Github. The documentation, source code of the software, and a GUI can be found at https://github.com/PNNL-Comp-Mass-Spec/PIXiE and the source code of the backend workflow library used by PIXiE can be found at https://github.com/PNNL-Comp-Mass-Spec/IMS-Informed-Library . erin.baker@pnnl.gov or thomas.metz@pnnl.gov. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
VAC: Versatile Advection Code

NASA Astrophysics Data System (ADS)

Tóth, Gábor; Keppens, Rony

2012-07-01

The Versatile Advection Code (VAC) is a freely available general hydrodynamic and magnetohydrodynamic simulation software that works in 1, 2 or 3 dimensions on Cartesian and logically Cartesian grids. VAC runs on any Unix/Linux system with a Fortran 90 (or 77) compiler and Perl interpreter. VAC can run on parallel machines using either the Message Passing Interface (MPI) library or a High Performance Fortran (HPF) compiler.
A Validated Open-Source Multisolver Fourth-Generation Composite Femur Model.

PubMed

MacLeod, Alisdair R; Rose, Hannah; Gill, Harinderjit S

2016-12-01

Synthetic biomechanical test specimens are frequently used for preclinical evaluation of implant performance, often in combination with numerical modeling, such as finite-element (FE) analysis. Commercial and freely available FE packages are widely used with three FE packages in particular gaining popularity: abaqus (Dassault Systèmes, Johnston, RI), ansys (ANSYS, Inc., Canonsburg, PA), and febio (University of Utah, Salt Lake City, UT). To the best of our knowledge, no study has yet made a comparison of these three commonly used solvers. Additionally, despite the femur being the most extensively studied bone in the body, no freely available validated model exists. The primary aim of the study was primarily to conduct a comparison of mesh convergence and strain prediction between the three solvers (abaqus, ansys, and febio) and to provide validated open-source models of a fourth-generation composite femur for use with all the three FE packages. Second, we evaluated the geometric variability around the femoral neck region of the composite femurs. Experimental testing was conducted using fourth-generation Sawbones® composite femurs instrumented with strain gauges at four locations. A generic FE model and four specimen-specific FE models were created from CT scans. The study found that the three solvers produced excellent agreement, with strain predictions being within an average of 3.0% for all the solvers (r2 > 0.99) and 1.4% for the two commercial codes. The average of the root mean squared error against the experimental results was 134.5% (r2 = 0.29) for the generic model and 13.8% (r2 = 0.96) for the specimen-specific models. It was found that composite femurs had variations in cortical thickness around the neck of the femur of up to 48.4%. For the first time, an experimentally validated, finite-element model of the femur is presented for use in three solvers. This model is freely available online along with all the supporting validation data.
The MIMIC Code Repository: enabling reproducibility in critical care research.

PubMed

Johnson, Alistair Ew; Stone, David J; Celi, Leo A; Pollard, Tom J

2018-01-01

Lack of reproducibility in medical studies is a barrier to the generation of a robust knowledge base to support clinical decision-making. In this paper we outline the Medical Information Mart for Intensive Care (MIMIC) Code Repository, a centralized code base for generating reproducible studies on an openly available critical care dataset. Code is provided to load the data into a relational structure, create extractions of the data, and reproduce entire analysis plans including research studies. Concepts extracted include severity of illness scores, comorbid status, administrative definitions of sepsis, physiologic criteria for sepsis, organ failure scores, treatment administration, and more. Executable documents are used for tutorials and reproduce published studies end-to-end, providing a template for future researchers to replicate. The repository's issue tracker enables community discussion about the data and concepts, allowing users to collaboratively improve the resource. The centralized repository provides a platform for users of the data to interact directly with the data generators, facilitating greater understanding of the data. It also provides a location for the community to collaborate on necessary concepts for research progress and share them with a larger audience. Consistent application of the same code for underlying concepts is a key step in ensuring that research studies on the MIMIC database are comparable and reproducible. By providing open source code alongside the freely accessible MIMIC-III database, we enable end-to-end reproducible analysis of electronic health records. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association.
CombiMotif: A new algorithm for network motifs discovery in protein-protein interaction networks

NASA Astrophysics Data System (ADS)

Luo, Jiawei; Li, Guanghui; Song, Dan; Liang, Cheng

2014-12-01

Discovering motifs in protein-protein interaction networks is becoming a current major challenge in computational biology, since the distribution of the number of network motifs can reveal significant systemic differences among species. However, this task can be computationally expensive because of the involvement of graph isomorphic detection. In this paper, we present a new algorithm (CombiMotif) that incorporates combinatorial techniques to count non-induced occurrences of subgraph topologies in the form of trees. The efficiency of our algorithm is demonstrated by comparing the obtained results with the current state-of-the art subgraph counting algorithms. We also show major differences between unicellular and multicellular organisms. The datasets and source code of CombiMotif are freely available upon request.
BioSmalltalk: a pure object system and library for bioinformatics.

PubMed

Morales, Hernán F; Giovambattista, Guillermo

2013-09-15

We have developed BioSmalltalk, a new environment system for pure object-oriented bioinformatics programming. Adaptive end-user programming systems tend to become more important for discovering biological knowledge, as is demonstrated by the emergence of open-source programming toolkits for bioinformatics in the past years. Our software is intended to bridge the gap between bioscientists and rapid software prototyping while preserving the possibility of scaling to whole-system biology applications. BioSmalltalk performs better in terms of execution time and memory usage than Biopython and BioPerl for some classical situations. BioSmalltalk is cross-platform and freely available (MIT license) through the Google Project Hosting at http://code.google.com/p/biosmalltalk hernan.morales@gmail.com Supplementary data are available at Bioinformatics online.
The Ensembl REST API: Ensembl Data for Any Language

PubMed Central

Yates, Andrew; Beal, Kathryn; Keenan, Stephen; McLaren, William; Pignatelli, Miguel; Ritchie, Graham R. S.; Ruffier, Magali; Taylor, Kieron; Vullo, Alessandro; Flicek, Paul

2015-01-01

Motivation: We present a Web service to access Ensembl data using Representational State Transfer (REST). The Ensembl REST server enables the easy retrieval of a wide range of Ensembl data by most programming languages, using standard formats such as JSON and FASTA while minimizing client work. We also introduce bindings to the popular Ensembl Variant Effect Predictor tool permitting large-scale programmatic variant analysis independent of any specific programming language. Availability and implementation: The Ensembl REST API can be accessed at http://rest.ensembl.org and source code is freely available under an Apache 2.0 license from http://github.com/Ensembl/ensembl-rest. Contact: ayates@ebi.ac.uk or flicek@ebi.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25236461
Transimulation - protein biosynthesis web service.

PubMed

Siwiak, Marlena; Zielenkiewicz, Piotr

2013-01-01

Although translation is the key step during gene expression, it remains poorly characterized at the level of individual genes. For this reason, we developed Transimulation - a web service measuring translational activity of genes in three model organisms: Escherichia coli, Saccharomyces cerevisiae and Homo sapiens. The calculations are based on our previous computational model of translation and experimental data sets. Transimulation quantifies mean translation initiation and elongation time (expressed in SI units), and the number of proteins produced per transcript. It also approximates the number of ribosomes that typically occupy a transcript during translation, and simulates their propagation. The simulation of ribosomes' movement is interactive and allows modifying the coding sequence on the fly. It also enables uploading any coding sequence and simulating its translation in one of three model organisms. In such a case, ribosomes propagate according to mean codon elongation times of the host organism, which may prove useful for heterologous expression. Transimulation was used to examine evolutionary conservation of translational parameters of orthologous genes. Transimulation may be accessed at http://nexus.ibb.waw.pl/Transimulation (requires Java version 1.7 or higher). Its manual and source code, distributed under the GPL-2.0 license, is freely available at the website.

GNormPlus: An Integrative Approach for Tagging Genes, Gene Families, and Protein Domains

PubMed Central

Lu, Zhiyong

2015-01-01

The automatic recognition of gene names and their associated database identifiers from biomedical text has been widely studied in recent years, as these tasks play an important role in many downstream text-mining applications. Despite significant previous research, only a small number of tools are publicly available and these tools are typically restricted to detecting only mention level gene names or only document level gene identifiers. In this work, we report GNormPlus: an end-to-end and open source system that handles both gene mention and identifier detection. We created a new corpus of 694 PubMed articles to support our development of GNormPlus, containing manual annotations for not only gene names and their identifiers, but also closely related concepts useful for gene name disambiguation, such as gene families and protein domains. GNormPlus integrates several advanced text-mining techniques, including SimConcept for resolving composite gene names. As a result, GNormPlus compares favorably to other state-of-the-art methods when evaluated on two widely used public benchmarking datasets, achieving 86.7% F1-score on the BioCreative II Gene Normalization task dataset and 50.1% F1-score on the BioCreative III Gene Normalization task dataset. The GNormPlus source code and its annotated corpus are freely available, and the results of applying GNormPlus to the entire PubMed are freely accessible through our web-based tool PubTator. PMID:26380306
Resources for comparing the speed and performance of medical autocoders.

PubMed

Berman, Jules J

2004-06-15

Concept indexing is a popular method for characterizing medical text, and is one of the most important early steps in many data mining efforts. Concept indexing differs from simple word or phrase indexing because concepts are typically represented by a nomenclature code that binds a medical concept to all equivalent representations. A concept search on the term renal cell carcinoma would be expected to find occurrences of hypernephroma, and renal carcinoma (concept equivalents). The purpose of this study is to provide freely available resources to compare speed and performance among different autocoders. These tools consist of: 1) a public domain autocoder written in Perl (a free and open source programming language that installs on any operating system); 2) a nomenclature database derived from the unencumbered subset of the publicly available Unified Medical Language System; 3) a large corpus of autocoded output derived from a publicly available medical text. A simple lexical autocoder was written that parses plain-text into a listing of all 1,2,3, and 4-word strings contained in text, assigning a nomenclature code for text strings that match terms in the nomenclature. The nomenclature used is the unencumbered subset of the 2003 Unified Medical Language System (UMLS). The unencumbered subset of UMLS was reduced to exclude homonymous one-word terms and proper names, resulting in a term/code data dictionary containing about a half million medical terms. The Online Mendelian Inheritance in Man (OMIM), a 92+ Megabyte publicly available medical opus, was used as sample medical text for the autocoder. The autocoding Perl script is remarkably short, consisting of just 38 command lines. The 92+ Megabyte OMIM file was completely autocoded in 869 seconds on a 2.4 GHz processor (less than 10 seconds per Megabyte of text). The autocoded output file (9,540,442 bytes) contains 367,963 coded terms from OMIM and is distributed with this manuscript. A public domain Perl script is provided that can parse through plain-text files of any length, matching concepts against an external nomenclature. The script and associated files can be used freely to compare the speed and performance of autocoding software.
XGlycScan: An Open-source Software For N-linked Glycosite Assignment, Quantification and Quality Assessment of Data from Mass Spectrometry-based Glycoproteomic Analysis.

PubMed

Aiyetan, Paul; Zhang, Bai; Zhang, Zhen; Zhang, Hui

2014-01-01

Mass spectrometry based glycoproteomics has become a major means of identifying and characterizing previously N-linked glycan attached loci (glycosites). In the bottom-up approach, several factors which include but not limited to sample preparation, mass spectrometry analyses, and protein sequence database searches result in previously N-linked peptide spectrum matches (PSMs) of varying lengths. Given that multiple PSM scan map to a glycosite, we reason that identified PSMs are varying length peptide species of a unique set of glycosites. Because associated spectra of these PSMs are typically summed separately, true glycosite associated spectra counts are lost or complicated. Also, these varying length peptide species complicate protein inference as smaller sized peptide sequences are more likely to map to more proteins than larger sized peptides or actual glycosite sequences. Here, we present XGlycScan. XGlycScan maps varying length peptide species to glycosites to facilitate an accurate quantification of glycosite associated spectra counts. We observed that this reduced the variability in reported identifications of mass spectrometry technical replicates of our sample dataset. We also observed that mapping identified peptides to glycosites provided an assessment of search-engine identification. Inherently, XGlycScan reported glycosites reduce the complexity in protein inference. We implemented XGlycScan in the platform independent Java programing language and have made it available as open source. XGlycScan's source code is freely available at https://bitbucket.org/paiyetan/xglycscan/src and its compiled binaries and documentation can be freely downloaded at https://bitbucket.org/paiyetan/xglycscan/downloads. The graphical user interface version can also be found at https://bitbucket.org/paiyetan/xglycscangui/src and https://bitbucket.org/paiyetan/xglycscangui/downloads respectively.
Implementation of GenePattern within the Stanford Microarray Database.

PubMed

Hubble, Jeremy; Demeter, Janos; Jin, Heng; Mao, Maria; Nitzberg, Michael; Reddy, T B K; Wymore, Farrell; Zachariah, Zachariah K; Sherlock, Gavin; Ball, Catherine A

2009-01-01

Hundreds of researchers across the world use the Stanford Microarray Database (SMD; http://smd.stanford.edu/) to store, annotate, view, analyze and share microarray data. In addition to providing registered users at Stanford access to their own data, SMD also provides access to public data, and tools with which to analyze those data, to any public user anywhere in the world. Previously, the addition of new microarray data analysis tools to SMD has been limited by available engineering resources, and in addition, the existing suite of tools did not provide a simple way to design, execute and share analysis pipelines, or to document such pipelines for the purposes of publication. To address this, we have incorporated the GenePattern software package directly into SMD, providing access to many new analysis tools, as well as a plug-in architecture that allows users to directly integrate and share additional tools through SMD. In this article, we describe our implementation of the GenePattern microarray analysis software package into the SMD code base. This extension is available with the SMD source code that is fully and freely available to others under an Open Source license, enabling other groups to create a local installation of SMD with an enriched data analysis capability.
SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.

PubMed

Chu, Annie; Cui, Jenny; Dinov, Ivo D

2009-03-01

The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test.The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website.In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models.
Dynamics of cortical dendritic membrane potential and spikes in freely behaving rats.

PubMed

Moore, Jason J; Ravassard, Pascal M; Ho, David; Acharya, Lavanya; Kees, Ashley L; Vuong, Cliff; Mehta, Mayank R

2017-03-24

Neural activity in vivo is primarily measured using extracellular somatic spikes, which provide limited information about neural computation. Hence, it is necessary to record from neuronal dendrites, which can generate dendritic action potentials (DAPs) in vitro, which can profoundly influence neural computation and plasticity. We measured neocortical sub- and suprathreshold dendritic membrane potential (DMP) from putative distal-most dendrites using tetrodes in freely behaving rats over multiple days with a high degree of stability and submillisecond temporal resolution. DAP firing rates were several-fold larger than somatic rates. DAP rates were also modulated by subthreshold DMP fluctuations, which were far larger than DAP amplitude, indicating hybrid, analog-digital coding in the dendrites. Parietal DAP and DMP exhibited egocentric spatial maps comparable to pyramidal neurons. These results have important implications for neural coding and plasticity. Copyright © 2017, American Association for the Advancement of Science.
Clinical laboratory sciences data transmission : the NPU coding system

PubMed Central

PONTET, Françoise; PETERSEN, Ulla MAGDAL; FUENTES-ARDERIU, Xavier; NORDIN, Gunnar; BRUUNSHUUS, Ivan; IHALAINEN, Jarkko; KARLSSON, Daniel; FORSUM, Urban; DYBKAER, René; SCHADOW, Gunther; KUELPMANN, Wolf; FÉRARD, Georges; KANG, Dongchon; McDONALD, Clement; HILL, Gilbert

2011-01-01

Introduction In health care services, technology requires that correct information be duly available to professionals, citizens and authorities, worldwide. Thus, clinical laboratory sciences require standardized electronic exchanges for results of laboratory examinations. Methods. The NPU (Nomenclature, Properties and Units) coding system provides a terminology for identification of result values (property values). It is structured according to BIPM, ISO, IUPAC and IFCC recommendations. It uses standard terms for established concepts and structured definitions describing: which part of the universe is examined, which component of relevance in that part, which kind-of-property is relevant. Unit and specifications can be added where relevant [System(spec) Component(spec); kind-of-property(spec) = ? unit]. Results. The English version of this terminology is freely accessible at http://dior.imt.liu.se/cnpu/ and http://www.labterm.dk, directly or through the IFCC and IUPAC websites. It has been nationally used for more than 10 years in Denmark and Sweden and has been translated into 6 other languages. Conclusions. The NPU coding system provides a terminology for dedicated kinds-of-property following the international recommendations. It fits well in the health network and is freely accessible. Clinical laboratory professionals worldwide will find many advantages in using the NPU coding system, notably with regards to an accreditation process. PMID:19745311
Clinical laboratory sciences data transmission: the NPU coding system.

PubMed

Pontet, Françoise; Magdal Petersen, Ulla; Fuentes-Arderiu, Xavier; Nordin, Gunnar; Bruunshuus, Ivan; Ihalainen, Jarkko; Karlsson, Daniel; Forsum, Urban; Dybkaer, René; Schadow, Gunther; Kuelpmann, Wolf; Férard, Georges; Kang, Dongchon; McDonald, Clement; Hill, Gilbert

2009-01-01

In health care services, technology requires that correct information be duly available to professionals, citizens and authorities, worldwide. Thus, clinical laboratory sciences require standardized electronic exchanges for results of laboratory examinations. The NPU (Nomenclature, Properties and Units) coding system provides a terminology for identification of result values (property values). It is structured according to BIPM, ISO, IUPAC and IFCC recommendations. It uses standard terms for established concepts and structured definitions describing: which part of the universe is examined, which component of relevance in that part, which kind-of-property is relevant. Unit and specifications can be added where relevant [System(spec)-Component(spec); kind-of-property(spec) = ? unit]. The English version of this terminology is freely accessible at http://dior.imt.liu.se/cnpu/ and http://www.labterm.dk, directly or through the IFCC and IUPAC websites. It has been nationally used for more than 10 years in Denmark and Sweden and has been translated into 6 other languages. The NPU coding system provides a terminology for dedicated kinds-of-property following the international recommendations. It fits well in the health network and is freely accessible. Clinical laboratory professionals worldwide will find many advantages in using the NPU coding system, notably with regards to an accreditation process.
nRC: non-coding RNA Classifier based on structural features.

PubMed

Fiannaca, Antonino; La Rosa, Massimo; La Paglia, Laura; Rizzo, Riccardo; Urso, Alfonso

2017-01-01

Non-coding RNA (ncRNA) are small non-coding sequences involved in gene expression regulation of many biological processes and diseases. The recent discovery of a large set of different ncRNAs with biologically relevant roles has opened the way to develop methods able to discriminate between the different ncRNA classes. Moreover, the lack of knowledge about the complete mechanisms in regulative processes, together with the development of high-throughput technologies, has required the help of bioinformatics tools in addressing biologists and clinicians with a deeper comprehension of the functional roles of ncRNAs. In this work, we introduce a new ncRNA classification tool, nRC (non-coding RNA Classifier). Our approach is based on features extraction from the ncRNA secondary structure together with a supervised classification algorithm implementing a deep learning architecture based on convolutional neural networks. We tested our approach for the classification of 13 different ncRNA classes. We obtained classification scores, using the most common statistical measures. In particular, we reach an accuracy and sensitivity score of about 74%. The proposed method outperforms other similar classification methods based on secondary structure features and machine learning algorithms, including the RNAcon tool that, to date, is the reference classifier. nRC tool is freely available as a docker image at https://hub.docker.com/r/tblab/nrc/. The source code of nRC tool is also available at https://github.com/IcarPA-TBlab/nrc.
Kangaroo – A pattern-matching program for biological sequences

PubMed Central

2002-01-01

Background Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward or readily available query tools to perform simple searches, such as identifying transcription binding sites, protein motifs, or repetitive DNA sequences. However, in many cases simple pattern-matching searches can reveal a wealth of information. We present in this paper a regular expression pattern-matching tool that was used to identify short repetitive DNA sequences in human coding regions for the purpose of identifying potential mutation sites in mismatch repair deficient cells. Results Kangaroo is a web-based regular expression pattern-matching program that can search for patterns in DNA, protein, or coding region sequences in ten different organisms. The program is implemented to facilitate a wide range of queries with no restriction on the length or complexity of the query expression. The program is accessible on the web at http://bioinfo.mshri.on.ca/kangaroo/ and the source code is freely distributed at http://sourceforge.net/projects/slritools/. Conclusion A low-level simple pattern-matching application can prove to be a useful tool in many research settings. For example, Kangaroo was used to identify potential genetic targets in a human colorectal cancer variant that is characterized by a high frequency of mutations in coding regions containing mononucleotide repeats. PMID:12150718
BSDWormer; an Open Source Implementation of a Poisson Wavelet Multiscale Analysis for Potential Fields

NASA Astrophysics Data System (ADS)

Horowitz, F. G.; Gaede, O.

2014-12-01

Wavelet multiscale edge analysis of potential fields (a.k.a. "worms") has been known since Moreau et al. (1997) and was independently derived by Hornby et al. (1999). The technique is useful for producing a scale-explicit overview of the structures beneath a gravity or magnetic survey, including establishing the location and estimating the attitude of surface features, as well as incorporating information about the geometric class (point, line, surface, volume, fractal) of the underlying sources — in a fashion much like traditional structural indices from Euler solutions albeit with better areal coverage. Hornby et al. (2002) show that worms form the locally highest concentration of horizontal edges of a given strike — which in conjunction with the results from Mallat and Zhong (1992) induces a (non-unique!) inversion where the worms are physically interpretable as lateral boundaries in a source distribution that produces a close approximation of the observed potential field. The technique has enjoyed widespread adoption and success in the Australian mineral exploration community — including "ground truth" via successfully drilling structures indicated by the worms. Unfortunately, to our knowledge, all implementations of the code to calculate the worms/multiscale edges (including Horowitz' original research code) are either part of commercial software packages, or have copyright restrictions that impede the use of the technique by the wider community. The technique is completely described mathematically in Hornby et al. (1999) along with some later publications. This enables us to re-implement from scratch the code required to calculate and visualize the worms. We are freely releasing the results under an (open source) BSD two-clause software license. A git repository is available at . We will give an overview of the technique, show code snippets using the codebase, and present visualization results for example datasets (including the Surat basin of Australia, and the Lake Ontario region of North America). We invite you to join us in creating and using the best worming software for potential fields in existence — as both gratis and libre software!
Jwalk and MNXL Web Server: Model Validation using Restraints from Crosslinking Mass Spectrometry.

PubMed

Bullock, J M A; Thalassinos, K; Topf, M

2018-05-07

Crosslinking Mass Spectrometry generates restraints that can be used to model proteins and protein complexes. Previously, we have developed two methods, to help users achieve better modelling performance from their crosslinking restraints: Jwalk, to estimate solvent accessible distances between crosslinked residues and MNXL, to assess the quality of the models based on these distances. Here we present the Jwalk and MNXL webservers, which streamline the process of validating monomeric protein models using restraints from crosslinks. We demonstrate this by using the MNXL server to filter models made of varying quality, selecting the most native-like. The webserver and source code are freely available from jwalk.ismb.lon.ac.uk and mnxl.ismb.lon.ac.uk. m.topf@cryst.bbk.ac.uk, j.bullock@cryst.bbk.ac.uk.
BamTools: a C++ API and toolkit for analyzing and managing BAM files.

PubMed

Barnett, Derek W; Garrison, Erik K; Quinlan, Aaron R; Strömberg, Michael P; Marth, Gabor T

2011-06-15

Analysis of genomic sequencing data requires efficient, easy-to-use access to alignment results and flexible data management tools (e.g. filtering, merging, sorting, etc.). However, the enormous amount of data produced by current sequencing technologies is typically stored in compressed, binary formats that are not easily handled by the text-based parsers commonly used in bioinformatics research. We introduce a software suite for programmers and end users that facilitates research analysis and data management using BAM files. BamTools provides both the first C++ API publicly available for BAM file support as well as a command-line toolkit. BamTools was written in C++, and is supported on Linux, Mac OSX and MS Windows. Source code and documentation are freely available at http://github.org/pezmaster31/bamtools.
webMGR: an online tool for the multiple genome rearrangement problem.

PubMed

Lin, Chi Ho; Zhao, Hao; Lowcay, Sean Harry; Shahab, Atif; Bourque, Guillaume

2010-02-01

The algorithm MGR enables the reconstruction of rearrangement phylogenies based on gene or synteny block order in multiple genomes. Although MGR has been successfully applied to study the evolution of different sets of species, its utilization has been hampered by the prohibitive running time for some applications. In the current work, we have designed new heuristics that significantly speed up the tool without compromising its accuracy. Moreover, we have developed a web server (webMGR) that includes elaborate web output to facilitate navigation through the results. webMGR can be accessed via http://www.gis.a-star.edu.sg/~bourque. The source code of the improved standalone version of MGR is also freely available from the web site. Supplementary data are available at Bioinformatics online.
PconsD: ultra rapid, accurate model quality assessment for protein structure prediction.

PubMed

Skwark, Marcin J; Elofsson, Arne

2013-07-15

Clustering methods are often needed for accurately assessing the quality of modeled protein structures. Recent blind evaluation of quality assessment methods in CASP10 showed that there is little difference between many different methods as far as ranking models and selecting best model are concerned. When comparing many models, the computational cost of the model comparison can become significant. Here, we present PconsD, a fast, stream-computing method for distance-driven model quality assessment that runs on consumer hardware. PconsD is at least one order of magnitude faster than other methods of comparable accuracy. The source code for PconsD is freely available at http://d.pcons.net/. Supplementary benchmarking data are also available there. arne@bioinfo.se Supplementary data are available at Bioinformatics online.
Psynteract: A flexible, cross-platform, open framework for interactive experiments.

PubMed

Henninger, Felix; Kieslich, Pascal J; Hilbig, Benjamin E

2017-10-01

We introduce a novel platform for interactive studies, that is, any form of study in which participants' experiences depend not only on their own responses, but also on those of other participants who complete the same study in parallel, for example a prisoner's dilemma or an ultimatum game. The software thus especially serves the rapidly growing field of strategic interaction research within psychology and behavioral economics. In contrast to all available software packages, our platform does not handle stimulus display and response collection itself. Instead, we provide a mechanism to extend existing experimental software to incorporate interactive functionality. This approach allows us to draw upon the capabilities already available, such as accuracy of temporal measurement, integration with auxiliary hardware such as eye-trackers or (neuro-)physiological apparatus, and recent advances in experimental software, for example capturing response dynamics through mouse-tracking. Through integration with OpenSesame, an open-source graphical experiment builder, studies can be assembled via a drag-and-drop interface requiring little or no further programming skills. In addition, by using the same communication mechanism across software packages, we also enable interoperability between systems. Our source code, which provides support for all major operating systems and several popular experimental packages, can be freely used and distributed under an open source license. The communication protocols underlying its functionality are also well documented and easily adapted to further platforms. Code and documentation are available at https://github.com/psynteract/ .
Ground motion simulations in Marmara (Turkey) region from 3D finite difference method

NASA Astrophysics Data System (ADS)

Aochi, Hideo; Ulrich, Thomas; Douglas, John

2016-04-01

In the framework of the European project MARSite (2012-2016), one of the main contributions from our research team was to provide ground-motion simulations for the Marmara region from various earthquake source scenarios. We adopted a 3D finite difference code, taking into account the 3D structure around the Sea of Marmara (including the bathymetry) and the sea layer. We simulated two moderate earthquakes (about Mw4.5) and found that the 3D structure improves significantly the waveforms compared to the 1D layer model. Simulations were carried out for different earthquakes (moderate point sources and large finite sources) in order to provide shake maps (Aochi and Ulrich, BSSA, 2015), to study the variability of ground-motion parameters (Douglas & Aochi, BSSA, 2016) as well as to provide synthetic seismograms for the blind inversion tests (Diao et al., GJI, 2016). The results are also planned to be integrated in broadband ground-motion simulations, tsunamis generation and simulations of triggered landslides (in progress by different partners). The simulations are freely shared among the partners via the internet and the visualization of the results is diffused on the project's homepage. All these simulations should be seen as a reference for this region, as they are based on the latest knowledge that obtained during the MARSite project, although their refinement and validation of the model parameters and the simulations are a continuing research task relying on continuing observations. The numerical code used, the models and the simulations are available on demand.
SIMA: Python software for analysis of dynamic fluorescence imaging data.

PubMed

Kaifosh, Patrick; Zaremba, Jeffrey D; Danielson, Nathan B; Losonczy, Attila

2014-01-01

Fluorescence imaging is a powerful method for monitoring dynamic signals in the nervous system. However, analysis of dynamic fluorescence imaging data remains burdensome, in part due to the shortage of available software tools. To address this need, we have developed SIMA, an open source Python package that facilitates common analysis tasks related to fluorescence imaging. Functionality of this package includes correction of motion artifacts occurring during in vivo imaging with laser-scanning microscopy, segmentation of imaged fields into regions of interest (ROIs), and extraction of signals from the segmented ROIs. We have also developed a graphical user interface (GUI) for manual editing of the automatically segmented ROIs and automated registration of ROIs across multiple imaging datasets. This software has been designed with flexibility in mind to allow for future extension with different analysis methods and potential integration with other packages. Software, documentation, and source code for the SIMA package and ROI Buddy GUI are freely available at http://www.losonczylab.org/sima/.
CHASM and SNVBox: toolkit for detecting biologically important single nucleotide mutations in cancer

PubMed Central

Carter, Hannah; Diekhans, Mark; Ryan, Michael C.; Karchin, Rachel

2011-01-01

Summary: Thousands of cancer exomes are currently being sequenced, yielding millions of non-synonymous single nucleotide variants (SNVs) of possible relevance to disease etiology. Here, we provide a software toolkit to prioritize SNVs based on their predicted contribution to tumorigenesis. It includes a database of precomputed, predictive features covering all positions in the annotated human exome and can be used either stand-alone or as part of a larger variant discovery pipeline. Availability and Implementation: MySQL database, source code and binaries freely available for academic/government use at http://wiki.chasmsoftware.org, Source in Python and C++. Requires 32 or 64-bit Linux system (tested on Fedora Core 8,10,11 and Ubuntu 10), 2.5*≤ Python <3.0*, MySQL server >5.0, 60 GB available hard disk space (50 MB for software and data files, 40 GB for MySQL database dump when uncompressed), 2 GB of RAM. Contact: karchin@jhu.edu Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:21685053
MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud.

PubMed

Expósito, Roberto R; Veiga, Jorge; González-Domínguez, Jorge; Touriño, Juan

2017-09-01

This article presents MarDRe, a de novo cloud-ready duplicate and near-duplicate removal tool that can process single- and paired-end reads from FASTQ/FASTA datasets. MarDRe takes advantage of the widely adopted MapReduce programming model to fully exploit Big Data technologies on cloud-based infrastructures. Written in Java to maximize cross-platform compatibility, MarDRe is built upon the open-source Apache Hadoop project, the most popular distributed computing framework for scalable Big Data processing. On a 16-node cluster deployed on the Amazon EC2 cloud platform, MarDRe is up to 8.52 times faster than a representative state-of-the-art tool. Source code in Java and Hadoop as well as a user's guide are freely available under the GNU GPLv3 license at http://mardre.des.udc.es . rreye@udc.es. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

UltraTrack: Software for semi-automated tracking of muscle fascicles in sequences of B-mode ultrasound images.

PubMed

Farris, Dominic James; Lichtwark, Glen A

2016-05-01

Dynamic measurements of human muscle fascicle length from sequences of B-mode ultrasound images have become increasingly prevalent in biomedical research. Manual digitisation of these images is time consuming and algorithms for automating the process have been developed. Here we present a freely available software implementation of a previously validated algorithm for semi-automated tracking of muscle fascicle length in dynamic ultrasound image recordings, "UltraTrack". UltraTrack implements an affine extension to an optic flow algorithm to track movement of the muscle fascicle end-points throughout dynamically recorded sequences of images. The underlying algorithm has been previously described and its reliability tested, but here we present the software implementation with features for: tracking multiple fascicles in multiple muscles simultaneously; correcting temporal drift in measurements; manually adjusting tracking results; saving and re-loading of tracking results and loading a range of file formats. Two example runs of the software are presented detailing the tracking of fascicles from several lower limb muscles during a squatting and walking activity. We have presented a software implementation of a validated fascicle-tracking algorithm and made the source code and standalone versions freely available for download. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Evaluating bacterial gene-finding HMM structures as probabilistic logic programs.

PubMed

Mørk, Søren; Holmes, Ian

2012-03-01

Probabilistic logic programming offers a powerful way to describe and evaluate structured statistical models. To investigate the practicality of probabilistic logic programming for structure learning in bioinformatics, we undertook a simplified bacterial gene-finding benchmark in PRISM, a probabilistic dialect of Prolog. We evaluate Hidden Markov Model structures for bacterial protein-coding gene potential, including a simple null model structure, three structures based on existing bacterial gene finders and two novel model structures. We test standard versions as well as ADPH length modeling and three-state versions of the five model structures. The models are all represented as probabilistic logic programs and evaluated using the PRISM machine learning system in terms of statistical information criteria and gene-finding prediction accuracy, in two bacterial genomes. Neither of our implementations of the two currently most used model structures are best performing in terms of statistical information criteria or prediction performances, suggesting that better-fitting models might be achievable. The source code of all PRISM models, data and additional scripts are freely available for download at: http://github.com/somork/codonhmm. Supplementary data are available at Bioinformatics online.
Jannovar: a java library for exome annotation.

PubMed

Jäger, Marten; Wang, Kai; Bauer, Sebastian; Smedley, Damian; Krawitz, Peter; Robinson, Peter N

2014-05-01

Transcript-based annotation and pedigree analysis are two basic steps in the computational analysis of whole-exome sequencing experiments in genetic diagnostics and disease-gene discovery projects. Here, we present Jannovar, a stand-alone Java application as well as a Java library designed to be used in larger software frameworks for exome and genome analysis. Jannovar uses an interval tree to identify all transcripts affected by a given variant, and provides Human Genome Variation Society-compliant annotations both for variants affecting coding sequences and splice junctions as well as untranslated regions and noncoding RNA transcripts. Jannovar can also perform family-based pedigree analysis with Variant Call Format (VCF) files with data from members of a family segregating a Mendelian disorder. Using a desktop computer, Jannovar requires a few seconds to annotate a typical VCF file with exome data. Jannovar is freely available under the BSD2 license. Source code as well as the Java application and library file can be downloaded from http://compbio.charite.de (with tutorial) and https://github.com/charite/jannovar. © 2014 WILEY PERIODICALS, INC.
The 2006 NESCent Phyloinformatics Hackathon: A Field Report

PubMed Central

Lapp, Hilmar; Bala, Sendu; Balhoff, James P.; Bouck, Amy; Goto, Naohisa; Holder, Mark; Holland, Richard; Holloway, Alisha; Katayama, Toshiaki; Lewis, Paul O.; Mackey, Aaron J.; Osborne, Brian I.; Piel, William H.; Kosakovsky Pond, Sergei L.; Poon, Art F.Y.; Qiu, Wei-Gang; Stajich, Jason E.; Stoltzfus, Arlin; Thierer, Tobias; Vilella, Albert J.; Vos, Rutger A.; Zmasek, Christian M.; Zwickl, Derrick J.; Vision, Todd J.

2007-01-01

In December, 2006, a group of 26 software developers from some of the most widely used life science programming toolkits and phylogenetic software projects converged on Durham, North Carolina, for a Phyloinformatics Hackathon, an intense five-day collaborative software coding event sponsored by the National Evolutionary Synthesis Center (NESCent). The goal was to help researchers to integrate multiple phylogenetic software tools into automated workflows. Participants addressed deficiencies in interoperability between programs by implementing “glue code” and improving support for phylogenetic data exchange standards (particularly NEXUS) across the toolkits. The work was guided by use-cases compiled in advance by both developers and users, and the code was documented as it was developed. The resulting software is freely available for both users and developers through incorporation into the distributions of several widely-used open-source toolkits. We explain the motivation for the hackathon, how it was organized, and discuss some of the outcomes and lessons learned. We conclude that hackathons are an effective mode of solving problems in software interoperability and usability, and are underutilized in scientific software development.
Pathogen metadata platform: software for accessing and analyzing pathogen strain information.

PubMed

Chang, Wenling E; Peterson, Matthew W; Garay, Christopher D; Korves, Tonia

2016-09-15

Pathogen metadata includes information about where and when a pathogen was collected and the type of environment it came from. Along with genomic nucleotide sequence data, this metadata is growing rapidly and becoming a valuable resource not only for research but for biosurveillance and public health. However, current freely available tools for analyzing this data are geared towards bioinformaticians and/or do not provide summaries and visualizations needed to readily interpret results. We designed a platform to easily access and summarize data about pathogen samples. The software includes a PostgreSQL database that captures metadata useful for disease outbreak investigations, and scripts for downloading and parsing data from NCBI BioSample and BioProject into the database. The software provides a user interface to query metadata and obtain standardized results in an exportable, tab-delimited format. To visually summarize results, the user interface provides a 2D histogram for user-selected metadata types and mapping of geolocated entries. The software is built on the LabKey data platform, an open-source data management platform, which enables developers to add functionalities. We demonstrate the use of the software in querying for a pathogen serovar and for genome sequence identifiers. This software enables users to create a local database for pathogen metadata, populate it with data from NCBI, easily query the data, and obtain visual summaries. Some of the components, such as the database, are modular and can be incorporated into other data platforms. The source code is freely available for download at https://github.com/wchangmitre/bioattribution .
mGrid: A load-balanced distributed computing environment for the remote execution of the user-defined Matlab code

PubMed Central

Karpievitch, Yuliya V; Almeida, Jonas S

2006-01-01

Background Matlab, a powerful and productive language that allows for rapid prototyping, modeling and simulation, is widely used in computational biology. Modeling and simulation of large biological systems often require more computational resources then are available on a single computer. Existing distributed computing environments like the Distributed Computing Toolbox, MatlabMPI, Matlab*G and others allow for the remote (and possibly parallel) execution of Matlab commands with varying support for features like an easy-to-use application programming interface, load-balanced utilization of resources, extensibility over the wide area network, and minimal system administration skill requirements. However, all of these environments require some level of access to participating machines to manually distribute the user-defined libraries that the remote call may invoke. Results mGrid augments the usual process distribution seen in other similar distributed systems by adding facilities for user code distribution. mGrid's client-side interface is an easy-to-use native Matlab toolbox that transparently executes user-defined code on remote machines (i.e. the user is unaware that the code is executing somewhere else). Run-time variables are automatically packed and distributed with the user-defined code and automated load-balancing of remote resources enables smooth concurrent execution. mGrid is an open source environment. Apart from the programming language itself, all other components are also open source, freely available tools: light-weight PHP scripts and the Apache web server. Conclusion Transparent, load-balanced distribution of user-defined Matlab toolboxes and rapid prototyping of many simple parallel applications can now be done with a single easy-to-use Matlab command. Because mGrid utilizes only Matlab, light-weight PHP scripts and the Apache web server, installation and configuration are very simple. Moreover, the web-based infrastructure of mGrid allows for it to be easily extensible over the Internet. PMID:16539707
mGrid: a load-balanced distributed computing environment for the remote execution of the user-defined Matlab code.

PubMed

Karpievitch, Yuliya V; Almeida, Jonas S

2006-03-15

Matlab, a powerful and productive language that allows for rapid prototyping, modeling and simulation, is widely used in computational biology. Modeling and simulation of large biological systems often require more computational resources then are available on a single computer. Existing distributed computing environments like the Distributed Computing Toolbox, MatlabMPI, Matlab*G and others allow for the remote (and possibly parallel) execution of Matlab commands with varying support for features like an easy-to-use application programming interface, load-balanced utilization of resources, extensibility over the wide area network, and minimal system administration skill requirements. However, all of these environments require some level of access to participating machines to manually distribute the user-defined libraries that the remote call may invoke. mGrid augments the usual process distribution seen in other similar distributed systems by adding facilities for user code distribution. mGrid's client-side interface is an easy-to-use native Matlab toolbox that transparently executes user-defined code on remote machines (i.e. the user is unaware that the code is executing somewhere else). Run-time variables are automatically packed and distributed with the user-defined code and automated load-balancing of remote resources enables smooth concurrent execution. mGrid is an open source environment. Apart from the programming language itself, all other components are also open source, freely available tools: light-weight PHP scripts and the Apache web server. Transparent, load-balanced distribution of user-defined Matlab toolboxes and rapid prototyping of many simple parallel applications can now be done with a single easy-to-use Matlab command. Because mGrid utilizes only Matlab, light-weight PHP scripts and the Apache web server, installation and configuration are very simple. Moreover, the web-based infrastructure of mGrid allows for it to be easily extensible over the Internet.
NetMiner-an ensemble pipeline for building genome-wide and high-quality gene co-expression network using massive-scale RNA-seq samples.

PubMed

Yu, Hua; Jiao, Bingke; Lu, Lu; Wang, Pengfei; Chen, Shuangcheng; Liang, Chengzhi; Liu, Wei

2018-01-01

Accurately reconstructing gene co-expression network is of great importance for uncovering the genetic architecture underlying complex and various phenotypes. The recent availability of high-throughput RNA-seq sequencing has made genome-wide detecting and quantifying of the novel, rare and low-abundance transcripts practical. However, its potential merits in reconstructing gene co-expression network have still not been well explored. Using massive-scale RNA-seq samples, we have designed an ensemble pipeline, called NetMiner, for building genome-scale and high-quality Gene Co-expression Network (GCN) by integrating three frequently used inference algorithms. We constructed a RNA-seq-based GCN in one species of monocot rice. The quality of network obtained by our method was verified and evaluated by the curated gene functional association data sets, which obviously outperformed each single method. In addition, the powerful capability of network for associating genes with functions and agronomic traits was shown by enrichment analysis and case studies. In particular, we demonstrated the potential value of our proposed method to predict the biological roles of unknown protein-coding genes, long non-coding RNA (lncRNA) genes and circular RNA (circRNA) genes. Our results provided a valuable and highly reliable data source to select key candidate genes for subsequent experimental validation. To facilitate identification of novel genes regulating important biological processes and phenotypes in other plants or animals, we have published the source code of NetMiner, making it freely available at https://github.com/czllab/NetMiner.
Rcount: simple and flexible RNA-Seq read counting.

PubMed

Schmid, Marc W; Grossniklaus, Ueli

2015-02-01

Analysis of differential gene expression by RNA sequencing (RNA-Seq) is frequently done using feature counts, i.e. the number of reads mapping to a gene. However, commonly used count algorithms (e.g. HTSeq) do not address the problem of reads aligning with multiple locations in the genome (multireads) or reads aligning with positions where two or more genes overlap (ambiguous reads). Rcount specifically addresses these issues. Furthermore, Rcount allows the user to assign priorities to certain feature types (e.g. higher priority for protein-coding genes compared to rRNA-coding genes) or to add flanking regions. Rcount provides a fast and easy-to-use graphical user interface requiring no command line or programming skills. It is implemented in C++ using the SeqAn (www.seqan.de) and the Qt libraries (qt-project.org). Source code and 64 bit binaries for (Ubuntu) Linux, Windows (7) and MacOSX are released under the GPLv3 license and are freely available on github.com/MWSchmid/Rcount. marcschmid@gmx.ch Test data, genome annotation files, useful Python and R scripts and a step-by-step user guide (including run-time and memory usage tests) are available on github.com/MWSchmid/Rcount. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Accessible and informative sectioned images, color-coded images, and surface models of the ear.

PubMed

Park, Hyo Seok; Chung, Min Suk; Shin, Dong Sun; Jung, Yong Wook; Park, Jin Seo

2013-08-01

In our previous research, we created state-of-the-art sectioned images, color-coded images, and surface models of the human ear. Our ear data would be more beneficial and informative if they were more easily accessible. Therefore, the purpose of this study was to distribute the browsing software and the PDF file in which ear images are to be readily obtainable and freely explored. Another goal was to inform other researchers of our methods for establishing the browsing software and the PDF file. To achieve this, sectioned images and color-coded images of ear were prepared (voxel size 0.1 mm). In the color-coded images, structures related to hearing, equilibrium, and structures originated from the first and second pharyngeal arches were segmented supplementarily. The sectioned and color-coded images of right ear were added to the browsing software, which displayed the images serially along with structure names. The surface models were reconstructed to be combined into the PDF file where they could be freely manipulated. Using the browsing software and PDF file, sectional and three-dimensional shapes of ear structures could be comprehended in detail. Furthermore, using the PDF file, clinical knowledge could be identified through virtual otoscopy. Therefore, the presented educational tools will be helpful to medical students and otologists by improving their knowledge of ear anatomy. The browsing software and PDF file can be downloaded without charge and registration at our homepage (http://anatomy.dongguk.ac.kr/ear/). Copyright © 2013 Wiley Periodicals, Inc.
SMASH - semi-automatic muscle analysis using segmentation of histology: a MATLAB application.

PubMed

Smith, Lucas R; Barton, Elisabeth R

2014-01-01

Histological assessment of skeletal muscle tissue is commonly applied to many areas of skeletal muscle physiological research. Histological parameters including fiber distribution, fiber type, centrally nucleated fibers, and capillary density are all frequently quantified measures of skeletal muscle. These parameters reflect functional properties of muscle and undergo adaptation in many muscle diseases and injuries. While standard operating procedures have been developed to guide analysis of many of these parameters, the software to freely, efficiently, and consistently analyze them is not readily available. In order to provide this service to the muscle research community we developed an open source MATLAB script to analyze immunofluorescent muscle sections incorporating user controls for muscle histological analysis. The software consists of multiple functions designed to provide tools for the analysis selected. Initial segmentation and fiber filter functions segment the image and remove non-fiber elements based on user-defined parameters to create a fiber mask. Establishing parameters set by the user, the software outputs data on fiber size and type, centrally nucleated fibers, and other structures. These functions were evaluated on stained soleus muscle sections from 1-year-old wild-type and mdx mice, a model of Duchenne muscular dystrophy. In accordance with previously published data, fiber size was not different between groups, but mdx muscles had much higher fiber size variability. The mdx muscle had a significantly greater proportion of type I fibers, but type I fibers did not change in size relative to type II fibers. Centrally nucleated fibers were highly prevalent in mdx muscle and were significantly larger than peripherally nucleated fibers. The MATLAB code described and provided along with this manuscript is designed for image processing of skeletal muscle immunofluorescent histological sections. The program allows for semi-automated fiber detection along with user correction. The output of the code provides data in accordance with established standards of practice. The results of the program have been validated using a small set of wild-type and mdx muscle sections. This program is the first freely available and open source image processing program designed to automate analysis of skeletal muscle histological sections.
A Low-Cost Multielectrode System for Data Acquisition Enabling Real-Time Closed-Loop Processing with Rapid Recovery from Stimulation Artifacts

PubMed Central

Rolston, John D.; Gross, Robert E.; Potter, Steve M.

2009-01-01

Commercially available data acquisition systems for multielectrode recording from freely moving animals are expensive, often rely on proprietary software, and do not provide detailed, modifiable circuit schematics. When used in conjunction with electrical stimulation, they are prone to prolonged, saturating stimulation artifacts that prevent the recording of short-latency evoked responses. Yet electrical stimulation is integral to many experimental designs, and critical for emerging brain-computer interfacing and neuroprosthetic applications. To address these issues, we developed an easy-to-use, modifiable, and inexpensive system for multielectrode neural recording and stimulation. Setup costs are less than US$10,000 for 64 channels, an order of magnitude lower than comparable commercial systems. Unlike commercial equipment, the system recovers rapidly from stimulation and allows short-latency action potentials (<1 ms post-stimulus) to be detected, facilitating closed-loop applications and exposing neural activity that would otherwise remain hidden. To illustrate this capability, evoked activity from microstimulation of the rodent hippocampus is presented. System noise levels are similar to existing platforms, and extracellular action potentials and local field potentials can be recorded simultaneously. The system is modular, in banks of 16 channels, and flexible in usage: while primarily designed for in vivo use, it can be combined with commercial preamplifiers to record from in vitro multielectrode arrays. The system's open-source control software, NeuroRighter, is implemented in C#, with an easy-to-use graphical interface. As C# functions in a managed code environment, which may impact performance, analysis was conducted to ensure comparable speed to C++ for this application. Hardware schematics, layout files, and software are freely available. Since maintaining wired headstage connections with freely moving animals is difficult, we describe a new method of electrode-headstage coupling using neodymium magnets. PMID:19668698
Open Science: A Zealot's View

EPA Science Inventory

Open science encompasses many concepts, but most agree that for science to be truly open four things must be true. First, all components of the scientific project must be freely available including manuscripts, code, and data. Second, others must be able to repeat your work and ...
Sources of Free and Open Source Spatial Data for Natural Disasters and Principles for Use in Developing Country Contexts

NASA Astrophysics Data System (ADS)

Taylor, Faith E.; Malamud, Bruce D.; Millington, James D. A.

2016-04-01

Access to reliable spatial and quantitative datasets (e.g., infrastructure maps, historical observations, environmental variables) at regional and site specific scales can be a limiting factor for understanding hazards and risks in developing country settings. Here we present a 'living database' of >75 freely available data sources relevant to hazard and risk in Africa (and more globally). Data sources include national scientific foundations, non-governmental bodies, crowd-sourced efforts, academic projects, special interest groups and others. The database is available at http://tinyurl.com/africa-datasets and is continually being updated, particularly in the context of broader natural hazards research we are doing in the context of Malawi and Kenya. For each data source, we review the spatiotemporal resolution and extent and make our own assessments of reliability and usability of datasets. Although such freely available datasets are sometimes presented as a panacea to improving our understanding of hazards and risk in developing countries, there are both pitfalls and opportunities unique to using this type of freely available data. These include factors such as resolution, homogeneity, uncertainty, access to metadata and training for usage. Based on our experience, use in the field and grey/peer-review literature, we present a suggested set of guidelines for using these free and open source data in developing country contexts.
Condor-COPASI: high-throughput computing for biochemical networks

PubMed Central

2012-01-01

Background Mathematical modelling has become a standard technique to improve our understanding of complex biological systems. As models become larger and more complex, simulations and analyses require increasing amounts of computational power. Clusters of computers in a high-throughput computing environment can help to provide the resources required for computationally expensive model analysis. However, exploiting such a system can be difficult for users without the necessary expertise. Results We present Condor-COPASI, a server-based software tool that integrates COPASI, a biological pathway simulation tool, with Condor, a high-throughput computing environment. Condor-COPASI provides a web-based interface, which makes it extremely easy for a user to run a number of model simulation and analysis tasks in parallel. Tasks are transparently split into smaller parts, and submitted for execution on a Condor pool. Result output is presented to the user in a number of formats, including tables and interactive graphical displays. Conclusions Condor-COPASI can effectively use a Condor high-throughput computing environment to provide significant gains in performance for a number of model simulation and analysis tasks. Condor-COPASI is free, open source software, released under the Artistic License 2.0, and is suitable for use by any institution with access to a Condor pool. Source code is freely available for download at http://code.google.com/p/condor-copasi/, along with full instructions on deployment and usage. PMID:22834945
Neurovascular Network Explorer 2.0: A Simple Tool for Exploring and Sharing a Database of Optogenetically-evoked Vasomotion in Mouse Cortex In Vivo.

PubMed

Uhlirova, Hana; Tian, Peifang; Kılıç, Kıvılcım; Thunemann, Martin; Sridhar, Vishnu B; Chmelik, Radim; Bartsch, Hauke; Dale, Anders M; Devor, Anna; Saisan, Payam A

2018-05-04

The importance of sharing experimental data in neuroscience grows with the amount and complexity of data acquired and various techniques used to obtain and process these data. However, the majority of experimental data, especially from individual studies of regular-sized laboratories never reach wider research community. A graphical user interface (GUI) engine called Neurovascular Network Explorer 2.0 (NNE 2.0) has been created as a tool for simple and low-cost sharing and exploring of vascular imaging data. NNE 2.0 interacts with a database containing optogenetically-evoked dilation/constriction time-courses of individual vessels measured in mice somatosensory cortex in vivo by 2-photon microscopy. NNE 2.0 enables selection and display of the time-courses based on different criteria (subject, branching order, cortical depth, vessel diameter, arteriolar tree) as well as simple mathematical manipulation (e.g. averaging, peak-normalization) and data export. It supports visualization of the vascular network in 3D and enables localization of the individual functional vessel diameter measurements within vascular trees. NNE 2.0, its source code, and the corresponding database are freely downloadable from UCSD Neurovascular Imaging Laboratory website 1 . The source code can be utilized by the users to explore the associated database or as a template for databasing and sharing their own experimental results provided the appropriate format.
FRETBursts: An Open Source Toolkit for Analysis of Freely-Diffusing Single-Molecule FRET

PubMed Central

Lerner, Eitan; Chung, SangYoon; Weiss, Shimon; Michalet, Xavier

2016-01-01

Single-molecule Förster Resonance Energy Transfer (smFRET) allows probing intermolecular interactions and conformational changes in biomacromolecules, and represents an invaluable tool for studying cellular processes at the molecular scale. smFRET experiments can detect the distance between two fluorescent labels (donor and acceptor) in the 3-10 nm range. In the commonly employed confocal geometry, molecules are free to diffuse in solution. When a molecule traverses the excitation volume, it emits a burst of photons, which can be detected by single-photon avalanche diode (SPAD) detectors. The intensities of donor and acceptor fluorescence can then be related to the distance between the two fluorophores. While recent years have seen a growing number of contributions proposing improvements or new techniques in smFRET data analysis, rarely have those publications been accompanied by software implementation. In particular, despite the widespread application of smFRET, no complete software package for smFRET burst analysis is freely available to date. In this paper, we introduce FRETBursts, an open source software for analysis of freely-diffusing smFRET data. FRETBursts allows executing all the fundamental steps of smFRET bursts analysis using state-of-the-art as well as novel techniques, while providing an open, robust and well-documented implementation. Therefore, FRETBursts represents an ideal platform for comparison and development of new methods in burst analysis. We employ modern software engineering principles in order to minimize bugs and facilitate long-term maintainability. Furthermore, we place a strong focus on reproducibility by relying on Jupyter notebooks for FRETBursts execution. Notebooks are executable documents capturing all the steps of the analysis (including data files, input parameters, and results) and can be easily shared to replicate complete smFRET analyzes. Notebooks allow beginners to execute complex workflows and advanced users to customize the analysis for their own needs. By bundling analysis description, code and results in a single document, FRETBursts allows to seamless share analysis workflows and results, encourages reproducibility and facilitates collaboration among researchers in the single-molecule community. PMID:27532626
The PLUTO code for astrophysical gasdynamics .

NASA Astrophysics Data System (ADS)

Mignone, A.

Present numerical codes appeal to a consolidated theory based on finite difference and Godunov-type schemes. In this context we have developed a versatile numerical code, PLUTO, suitable for the solution of high-mach number flow in 1, 2 and 3 spatial dimensions and different systems of coordinates. Different hydrodynamic modules and algorithms may be independently selected to properly describe Newtonian, relativistic, MHD, or relativistic MHD fluids. The modular structure exploits a general framework for integrating a system of conservation laws, built on modern Godunov-type shock-capturing schemes. The code is freely distributed under the GNU public license and it is available for download to the astrophysical community at the URL http://plutocode.to.astro.it.
BamTools: a C++ API and toolkit for analyzing and managing BAM files

PubMed Central

Barnett, Derek W.; Garrison, Erik K.; Quinlan, Aaron R.; Strömberg, Michael P.; Marth, Gabor T.

2011-01-01

Motivation: Analysis of genomic sequencing data requires efficient, easy-to-use access to alignment results and flexible data management tools (e.g. filtering, merging, sorting, etc.). However, the enormous amount of data produced by current sequencing technologies is typically stored in compressed, binary formats that are not easily handled by the text-based parsers commonly used in bioinformatics research. Results: We introduce a software suite for programmers and end users that facilitates research analysis and data management using BAM files. BamTools provides both the first C++ API publicly available for BAM file support as well as a command-line toolkit. Availability: BamTools was written in C++, and is supported on Linux, Mac OSX and MS Windows. Source code and documentation are freely available at http://github.org/pezmaster31/bamtools. Contact: barnetde@bc.edu PMID:21493652
JDet: interactive calculation and visualization of function-related conservation patterns in multiple sequence alignments and structures.

PubMed

Muth, Thilo; García-Martín, Juan A; Rausell, Antonio; Juan, David; Valencia, Alfonso; Pazos, Florencio

2012-02-15

We have implemented in a single package all the features required for extracting, visualizing and manipulating fully conserved positions as well as those with a family-dependent conservation pattern in multiple sequence alignments. The program allows, among other things, to run different methods for extracting these positions, combine the results and visualize them in protein 3D structures and sequence spaces. JDet is a multiplatform application written in Java. It is freely available, including the source code, at http://csbg.cnb.csic.es/JDet. The package includes two of our recently developed programs for detecting functional positions in protein alignments (Xdet and S3Det), and support for other methods can be added as plug-ins. A help file and a guided tutorial for JDet are also available.

NGL Viewer: Web-based molecular graphics for large complexes.

PubMed

Rose, Alexander S; Bradley, Anthony R; Valasatava, Yana; Duarte, Jose M; Prlic, Andreas; Rose, Peter W

2018-05-29

The interactive visualization of very large macromolecular complexes on the web is becoming a challenging problem as experimental techniques advance at an unprecedented rate and deliver structures of increasing size. We have tackled this problem by developing highly memory-efficient and scalable extensions for the NGL WebGL-based molecular viewer and by using MMTF, a binary and compressed Macromolecular Transmission Format. These enable NGL to download and render molecular complexes with millions of atoms interactively on desktop computers and smartphones alike, making it a tool of choice for web-based molecular visualization in research and education. The source code is freely available under the MIT license at github.com/arose/ngl and distributed on NPM (npmjs.com/package/ngl). MMTF-JavaScript encoders and decoders are available at github.com/rcsb/mmtf-javascript. asr.moin@gmail.com.
Calypso: a user-friendly web-server for mining and visualizing microbiome-environment interactions.

PubMed

Zakrzewski, Martha; Proietti, Carla; Ellis, Jonathan J; Hasan, Shihab; Brion, Marie-Jo; Berger, Bernard; Krause, Lutz

2017-03-01

Calypso is an easy-to-use online software suite that allows non-expert users to mine, interpret and compare taxonomic information from metagenomic or 16S rDNA datasets. Calypso has a focus on multivariate statistical approaches that can identify complex environment-microbiome associations. The software enables quantitative visualizations, statistical testing, multivariate analysis, supervised learning, factor analysis, multivariable regression, network analysis and diversity estimates. Comprehensive help pages, tutorials and videos are provided via a wiki page. The web-interface is accessible via http://cgenome.net/calypso/ . The software is programmed in Java, PERL and R and the source code is available from Zenodo ( https://zenodo.org/record/50931 ). The software is freely available for non-commercial users. l.krause@uq.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
ADOMA: A Command Line Tool to Modify ClustalW Multiple Alignment Output.

PubMed

Zaal, Dionne; Nota, Benjamin

2016-01-01

We present ADOMA, a command line tool that produces alternative outputs from ClustalW multiple alignments of nucleotide or protein sequences. ADOMA can simplify the output of alignments by showing only the different residues between sequences, which is often desirable when only small differences such as single nucleotide polymorphisms are present (e.g., between different alleles). Another feature of ADOMA is that it can enhance the ClustalW output by coloring the residues in the alignment. This tool is easily integrated into automated Linux pipelines for next-generation sequencing data analysis, and may be useful for researchers in a broad range of scientific disciplines including evolutionary biology and biomedical sciences. The source code is freely available at https://sourceforge. net/projects/adoma/. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
BioCichlid: central dogma-based 3D visualization system of time-course microarray data on a hierarchical biological network.

PubMed

Ishiwata, Ryosuke R; Morioka, Masaki S; Ogishima, Soichi; Tanaka, Hiroshi

2009-02-15

BioCichlid is a 3D visualization system of time-course microarray data on molecular networks, aiming at interpretation of gene expression data by transcriptional relationships based on the central dogma with physical and genetic interactions. BioCichlid visualizes both physical (protein) and genetic (regulatory) network layers, and provides animation of time-course gene expression data on the genetic network layer. Transcriptional regulations are represented to bridge the physical network (transcription factors) and genetic network (regulated genes) layers, thus integrating promoter analysis into the pathway mapping. BioCichlid enhances the interpretation of microarray data and allows for revealing the underlying mechanisms causing differential gene expressions. BioCichlid is freely available and can be accessed at http://newton.tmd.ac.jp/. Source codes for both biocichlid server and client are also available.
The geospatial data quality REST API for primary biodiversity data

PubMed Central

Otegui, Javier; Guralnick, Robert P.

2016-01-01

Summary: We present a REST web service to assess the geospatial quality of primary biodiversity data. It enables access to basic and advanced functions to detect completeness and consistency issues as well as general errors in the provided record or set of records. The API uses JSON for data interchange and efficient parallelization techniques for fast assessments of large datasets. Availability and implementation: The Geospatial Data Quality API is part of the VertNet set of APIs. It can be accessed at http://api-geospatial.vertnet-portal.appspot.com/geospatial and is already implemented in the VertNet data portal for quality reporting. Source code is freely available under GPL license from http://www.github.com/vertnet/api-geospatial. Contact: javier.otegui@gmail.com or rguralnick@flmnh.ufl.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26833340
MICA: Multiple interval-based curve alignment

NASA Astrophysics Data System (ADS)

Mann, Martin; Kahle, Hans-Peter; Beck, Matthias; Bender, Bela Johannes; Spiecker, Heinrich; Backofen, Rolf

2018-01-01

MICA enables the automatic synchronization of discrete data curves. To this end, characteristic points of the curves' shapes are identified. These landmarks are used within a heuristic curve registration approach to align profile pairs by mapping similar characteristics onto each other. In combination with a progressive alignment scheme, this enables the computation of multiple curve alignments. Multiple curve alignments are needed to derive meaningful representative consensus data of measured time or data series. MICA was already successfully applied to generate representative profiles of tree growth data based on intra-annual wood density profiles or cell formation data. The MICA package provides a command-line and graphical user interface. The R interface enables the direct embedding of multiple curve alignment computation into larger analyses pipelines. Source code, binaries and documentation are freely available at https://github.com/BackofenLab/MICA
The Microsoft Biology Foundation Applications for High-Throughput Sequencing

PubMed Central

Mercer, S.

2010-01-01

w9-2 The need for reusable libraries of bioinformatics functions has been recognized for many years and a number of language-specific toolkits have been constructed. Such toolkits have served as valuable nucleation points for the community, promoting the sharing of code and establishing standards. The majority of DNA sequencing machines and many other standard pieces of lab equipment are controlled by PCs using Windows, and a Microsoft genomics toolkit would enable initial processing and quality control to happen closer to the instrumentation and provide opportunities for added-value services within core facilities. The Microsoft Biology Foundation (MBF) is an open source software library, freely available for both commercial and academic use, available as an early-stage betafrom mbf.codeplex.com. This presentation will describe the structure and goals of MBF and demonstrate some of its uses.
An infrastructure for ontology-based information systems in biomedicine: RICORDO case study.

PubMed

Wimalaratne, Sarala M; Grenon, Pierre; Hoehndorf, Robert; Gkoutos, Georgios V; de Bono, Bernard

2012-02-01

The article presents an infrastructure for supporting the semantic interoperability of biomedical resources based on the management (storing and inference-based querying) of their ontology-based annotations. This infrastructure consists of: (i) a repository to store and query ontology-based annotations; (ii) a knowledge base server with an inference engine to support the storage of and reasoning over ontologies used in the annotation of resources; (iii) a set of applications and services allowing interaction with the integrated repository and knowledge base. The infrastructure is being prototyped and developed and evaluated by the RICORDO project in support of the knowledge management of biomedical resources, including physiology and pharmacology models and associated clinical data. The RICORDO toolkit and its source code are freely available from http://ricordo.eu/relevant-resources. sarala@ebi.ac.uk.
ThunderSTORM: a comprehensive ImageJ plug-in for PALM and STORM data analysis and super-resolution imaging

PubMed Central

Ovesný, Martin; Křížek, Pavel; Borkovec, Josef; Švindrych, Zdeněk; Hagen, Guy M.

2014-01-01

Summary: ThunderSTORM is an open-source, interactive and modular plug-in for ImageJ designed for automated processing, analysis and visualization of data acquired by single-molecule localization microscopy methods such as photo-activated localization microscopy and stochastic optical reconstruction microscopy. ThunderSTORM offers an extensive collection of processing and post-processing methods so that users can easily adapt the process of analysis to their data. ThunderSTORM also offers a set of tools for creation of simulated data and quantitative performance evaluation of localization algorithms using Monte Carlo simulations. Availability and implementation: ThunderSTORM and the online documentation are both freely accessible at https://code.google.com/p/thunder-storm/ Contact: guy.hagen@lf1.cuni.cz Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24771516
The geospatial data quality REST API for primary biodiversity data.

PubMed

Otegui, Javier; Guralnick, Robert P

2016-06-01

We present a REST web service to assess the geospatial quality of primary biodiversity data. It enables access to basic and advanced functions to detect completeness and consistency issues as well as general errors in the provided record or set of records. The API uses JSON for data interchange and efficient parallelization techniques for fast assessments of large datasets. The Geospatial Data Quality API is part of the VertNet set of APIs. It can be accessed at http://api-geospatial.vertnet-portal.appspot.com/geospatial and is already implemented in the VertNet data portal for quality reporting. Source code is freely available under GPL license from http://www.github.com/vertnet/api-geospatial javier.otegui@gmail.com or rguralnick@flmnh.ufl.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
phylo-node: A molecular phylogenetic toolkit using Node.js.

PubMed

O'Halloran, Damien M

2017-01-01

Node.js is an open-source and cross-platform environment that provides a JavaScript codebase for back-end server-side applications. JavaScript has been used to develop very fast and user-friendly front-end tools for bioinformatic and phylogenetic analyses. However, no such toolkits are available using Node.js to conduct comprehensive molecular phylogenetic analysis. To address this problem, I have developed, phylo-node, which was developed using Node.js and provides a stable and scalable toolkit that allows the user to perform diverse molecular and phylogenetic tasks. phylo-node can execute the analysis and process the resulting outputs from a suite of software options that provides tools for read processing and genome alignment, sequence retrieval, multiple sequence alignment, primer design, evolutionary modeling, and phylogeny reconstruction. Furthermore, phylo-node enables the user to deploy server dependent applications, and also provides simple integration and interoperation with other Node modules and languages using Node inheritance patterns, and a customized piping module to support the production of diverse pipelines. phylo-node is open-source and freely available to all users without sign-up or login requirements. All source code and user guidelines are openly available at the GitHub repository: https://github.com/dohalloran/phylo-node.
CAMPAIGN: an open-source library of GPU-accelerated data clustering algorithms.

PubMed

Kohlhoff, Kai J; Sosnick, Marc H; Hsu, William T; Pande, Vijay S; Altman, Russ B

2011-08-15

Data clustering techniques are an essential component of a good data analysis toolbox. Many current bioinformatics applications are inherently compute-intense and work with very large datasets. Sequential algorithms are inadequate for providing the necessary performance. For this reason, we have created Clustering Algorithms for Massively Parallel Architectures, Including GPU Nodes (CAMPAIGN), a central resource for data clustering algorithms and tools that are implemented specifically for execution on massively parallel processing architectures. CAMPAIGN is a library of data clustering algorithms and tools, written in 'C for CUDA' for Nvidia GPUs. The library provides up to two orders of magnitude speed-up over respective CPU-based clustering algorithms and is intended as an open-source resource. New modules from the community will be accepted into the library and the layout of it is such that it can easily be extended to promising future platforms such as OpenCL. Releases of the CAMPAIGN library are freely available for download under the LGPL from https://simtk.org/home/campaign. Source code can also be obtained through anonymous subversion access as described on https://simtk.org/scm/?group_id=453. kjk33@cantab.net.
Inexpensive Open-Source Data Logging in the Field

NASA Astrophysics Data System (ADS)

Wickert, A. D.

2013-12-01

I present a general-purpose open-source field-capable data logger, which provides a mechanism to develop dense networks of inexpensive environmental sensors. This data logger was developed as a low-power variant of the Arduino open-source development system, and is named the ALog ("Arduino Logger") BottleLogger (it is slim enough to fit inside a Nalgene water bottle) version 1.0. It features an integrated high-precision real-time clock, SD card slot for high-volume data storage, and integrated power switching. The ALog can interface with sensors via six analog/digital pins, two digital pins, and one digital interrupt pin that can read event-based inputs, such as those from a tipping-bucket rain gauge. We have successfully tested the ALog BottleLogger with ultrasonic rangefinders (for water stage and snow accumulation and melt), temperature sensors, tipping-bucket rain gauges, soil moisture and water potential sensors, resistance-based tools to measure frost heave, and cameras that it triggers based on events. The source code for the ALog, including functions to interface with a a range of commercially-available sensors, is provided as an Arduino C++ library with example implementations. All schematics, circuit board layouts, and source code files are open-source and freely available under GNU GPL v3.0 and Creative Commons Attribution-ShareAlike 3.0 Unported licenses. Through this work, we hope to foster a community-driven movement to collect field environmental data on a budget that permits citizen-scientists and researchers from low-income countries to collect the same high-quality data as researchers in wealthy countries. These data can provide information about global change to managers, governments, scientists, and interested citizens worldwide. Watertight box with ALog BottleLogger data logger on the left and battery pack with 3 D cells on the right. Data can be collected for 3-5 years on one set of batteries.
DensToolKit: A comprehensive open-source package for analyzing the electron density and its derivative scalar and vector fields

NASA Astrophysics Data System (ADS)

Solano-Altamirano, J. M.; Hernández-Pérez, Julio M.

2015-11-01

DensToolKit is a suite of cross-platform, optionally parallelized, programs for analyzing the molecular electron density (ρ) and several fields derived from it. Scalar and vector fields, such as the gradient of the electron density (∇ρ), electron localization function (ELF) and its gradient, localized orbital locator (LOL), region of slow electrons (RoSE), reduced density gradient, localized electrons detector (LED), information entropy, molecular electrostatic potential, kinetic energy densities K and G, among others, can be evaluated on zero, one, two, and three dimensional grids. The suite includes a program for searching critical points and bond paths of the electron density, under the framework of Quantum Theory of Atoms in Molecules. DensToolKit also evaluates the momentum space electron density on spatial grids, and the reduced density matrix of order one along lines joining two arbitrary atoms of a molecule. The source code is distributed under the GNU-GPLv3 license, and we release the code with the intent of establishing an open-source collaborative project. The style of DensToolKit's code follows some of the guidelines of an object-oriented program. This allows us to supply the user with a simple manner for easily implement new scalar or vector fields, provided they are derived from any of the fields already implemented in the code. In this paper, we present some of the most salient features of the programs contained in the suite, some examples of how to run them, and the mathematical definitions of the implemented fields along with hints of how we optimized their evaluation. We benchmarked our suite against both a freely-available program and a commercial package. Speed-ups of ˜2×, and up to 12× were obtained using a non-parallel compilation of DensToolKit for the evaluation of fields. DensToolKit takes similar times for finding critical points, compared to a commercial package. Finally, we present some perspectives for the future development and growth of the suite.
caGrid 1.0 : an enterprise Grid infrastructure for biomedical research.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Oster, S.; Langella, S.; Hastings, S.

To develop software infrastructure that will provide support for discovery, characterization, integrated access, and management of diverse and disparate collections of information sources, analysis methods, and applications in biomedical research. Design: An enterprise Grid software infrastructure, called caGrid version 1.0 (caGrid 1.0), has been developed as the core Grid architecture of the NCI-sponsored cancer Biomedical Informatics Grid (caBIG{trademark}) program. It is designed to support a wide range of use cases in basic, translational, and clinical research, including (1) discovery, (2) integrated and large-scale data analysis, and (3) coordinated study. Measurements: The caGrid is built as a Grid software infrastructure andmore » leverages Grid computing technologies and the Web Services Resource Framework standards. It provides a set of core services, toolkits for the development and deployment of new community provided services, and application programming interfaces for building client applications. Results: The caGrid 1.0 was released to the caBIG community in December 2006. It is built on open source components and caGrid source code is publicly and freely available under a liberal open source license. The core software, associated tools, and documentation can be downloaded from the following URL: .« less
SarcOptiM for ImageJ: high-frequency online sarcomere length computing on stimulated cardiomyocytes.

PubMed

Pasqualin, Côme; Gannier, François; Yu, Angèle; Malécot, Claire O; Bredeloux, Pierre; Maupoil, Véronique

2016-08-01

Accurate measurement of cardiomyocyte contraction is a critical issue for scientists working on cardiac physiology and physiopathology of diseases implying contraction impairment. Cardiomyocytes contraction can be quantified by measuring sarcomere length, but few tools are available for this, and none is freely distributed. We developed a plug-in (SarcOptiM) for the ImageJ/Fiji image analysis platform developed by the National Institutes of Health. SarcOptiM computes sarcomere length via fast Fourier transform analysis of video frames captured or displayed in ImageJ and thus is not tied to a dedicated video camera. It can work in real time or offline, the latter overcoming rotating motion or displacement-related artifacts. SarcOptiM includes a simulator and video generator of cardiomyocyte contraction. Acquisition parameters, such as pixel size and camera frame rate, were tested with both experimental recordings of rat ventricular cardiomyocytes and synthetic videos. It is freely distributed, and its source code is available. It works under Windows, Mac, or Linux operating systems. The camera speed is the limiting factor, since the algorithm can compute online sarcomere shortening at frame rates >10 kHz. In conclusion, SarcOptiM is a free and validated user-friendly tool for studying cardiomyocyte contraction in all species, including human. Copyright © 2016 the American Physiological Society.
A Grassroots Remote Sensing Toolkit Using Live Coding, Smartphones, Kites and Lightweight Drones

PubMed Central

Anderson, K.; Griffiths, D.; DeBell, L.; Hancock, S.; Duffy, J. P.; Shutler, J. D.; Reinhardt, W. J.; Griffiths, A.

2016-01-01

This manuscript describes the development of an android-based smartphone application for capturing aerial photographs and spatial metadata automatically, for use in grassroots mapping applications. The aim of the project was to exploit the plethora of on-board sensors within modern smartphones (accelerometer, GPS, compass, camera) to generate ready-to-use spatial data from lightweight aerial platforms such as drones or kites. A visual coding ‘scheme blocks’ framework was used to build the application (‘app’), so that users could customise their own data capture tools in the field. The paper reports on the coding framework, then shows the results of test flights from kites and lightweight drones and finally shows how open-source geospatial toolkits were used to generate geographical information system (GIS)-ready GeoTIFF images from the metadata stored by the app. Two Android smartphones were used in testing–a high specification OnePlus One handset and a lower cost Acer Liquid Z3 handset, to test the operational limits of the app on phones with different sensor sets. We demonstrate that best results were obtained when the phone was attached to a stable single line kite or to a gliding drone. Results show that engine or motor vibrations from powered aircraft required dampening to ensure capture of high quality images. We demonstrate how the products generated from the open-source processing workflow are easily used in GIS. The app can be downloaded freely from the Google store by searching for ‘UAV toolkit’ (UAV toolkit 2016), and used wherever an Android smartphone and aerial platform are available to deliver rapid spatial data (e.g. in supporting decision-making in humanitarian disaster-relief zones, in teaching or for grassroots remote sensing and democratic mapping). PMID:27144310
A Grassroots Remote Sensing Toolkit Using Live Coding, Smartphones, Kites and Lightweight Drones.

PubMed

Anderson, K; Griffiths, D; DeBell, L; Hancock, S; Duffy, J P; Shutler, J D; Reinhardt, W J; Griffiths, A

2016-01-01

This manuscript describes the development of an android-based smartphone application for capturing aerial photographs and spatial metadata automatically, for use in grassroots mapping applications. The aim of the project was to exploit the plethora of on-board sensors within modern smartphones (accelerometer, GPS, compass, camera) to generate ready-to-use spatial data from lightweight aerial platforms such as drones or kites. A visual coding 'scheme blocks' framework was used to build the application ('app'), so that users could customise their own data capture tools in the field. The paper reports on the coding framework, then shows the results of test flights from kites and lightweight drones and finally shows how open-source geospatial toolkits were used to generate geographical information system (GIS)-ready GeoTIFF images from the metadata stored by the app. Two Android smartphones were used in testing-a high specification OnePlus One handset and a lower cost Acer Liquid Z3 handset, to test the operational limits of the app on phones with different sensor sets. We demonstrate that best results were obtained when the phone was attached to a stable single line kite or to a gliding drone. Results show that engine or motor vibrations from powered aircraft required dampening to ensure capture of high quality images. We demonstrate how the products generated from the open-source processing workflow are easily used in GIS. The app can be downloaded freely from the Google store by searching for 'UAV toolkit' (UAV toolkit 2016), and used wherever an Android smartphone and aerial platform are available to deliver rapid spatial data (e.g. in supporting decision-making in humanitarian disaster-relief zones, in teaching or for grassroots remote sensing and democratic mapping).
Academic Software Downloads from Google Code: Useful Usage Indicators?

ERIC Educational Resources Information Center

Thelwall, Mike; Kousha, Kayvan

2016-01-01

Introduction: Computer scientists and other researchers often make their programs freely available online. If this software makes a valuable contribution inside or outside of academia then its creators may want to demonstrate this with a suitable indicator, such as download counts. Methods: Download counts, citation counts, labels and licenses…
SearchGUI: An open-source graphical user interface for simultaneous OMSSA and X!Tandem searches.

PubMed

Vaudel, Marc; Barsnes, Harald; Berven, Frode S; Sickmann, Albert; Martens, Lennart

2011-03-01

The identification of proteins by mass spectrometry is a standard technique in the field of proteomics, relying on search engines to perform the identifications of the acquired spectra. Here, we present a user-friendly, lightweight and open-source graphical user interface called SearchGUI (http://searchgui.googlecode.com), for configuring and running the freely available OMSSA (open mass spectrometry search algorithm) and X!Tandem search engines simultaneously. Freely available under the permissible Apache2 license, SearchGUI is supported on Windows, Linux and OSX. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

Landlab: an Open-Source Python Library for Modeling Earth Surface Dynamics

NASA Astrophysics Data System (ADS)

Gasparini, N. M.; Adams, J. M.; Hobley, D. E. J.; Hutton, E.; Nudurupati, S. S.; Istanbulluoglu, E.; Tucker, G. E.

2016-12-01

Landlab is an open-source Python modeling library that enables users to easily build unique models to explore earth surface dynamics. The Landlab library provides a number of tools and functionalities that are common to many earth surface models, thus eliminating the need for a user to recode fundamental model elements each time she explores a new problem. For example, Landlab provides a gridding engine so that a user can build a uniform or nonuniform grid in one line of code. The library has tools for setting boundary conditions, adding data to a grid, and performing basic operations on the data, such as calculating gradients and curvature. The library also includes a number of process components, which are numerical implementations of physical processes. To create a model, a user creates a grid and couples together process components that act on grid variables. The current library has components for modeling a diverse range of processes, from overland flow generation to bedrock river incision, from soil wetting and drying to vegetation growth, succession and death. The code is freely available for download (https://github.com/landlab/landlab) or can be installed as a Python package. Landlab models can also be built and run on Hydroshare (www.hydroshare.org), an online collaborative environment for sharing hydrologic data, models, and code. Tutorials illustrating a wide range of Landlab capabilities such as building a grid, setting boundary conditions, reading in data, plotting, using components and building models are also available (https://github.com/landlab/tutorials). The code is also comprehensively documented both online and natively in Python. In this presentation, we illustrate the diverse capabilities of Landlab. We highlight existing functionality by illustrating outcomes from a range of models built with Landlab - including applications that explore landscape evolution and ecohydrology. Finally, we describe the range of resources available for new users.
SOCR Analyses – an Instructional Java Web-based Statistical Analysis Toolkit

PubMed Central

Chu, Annie; Cui, Jenny; Dinov, Ivo D.

2011-01-01

The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models. PMID:21546994
Tentacle: distributed quantification of genes in metagenomes.

PubMed

Boulund, Fredrik; Sjögren, Anders; Kristiansson, Erik

2015-01-01

In metagenomics, microbial communities are sequenced at increasingly high resolution, generating datasets with billions of DNA fragments. Novel methods that can efficiently process the growing volumes of sequence data are necessary for the accurate analysis and interpretation of existing and upcoming metagenomes. Here we present Tentacle, which is a novel framework that uses distributed computational resources for gene quantification in metagenomes. Tentacle is implemented using a dynamic master-worker approach in which DNA fragments are streamed via a network and processed in parallel on worker nodes. Tentacle is modular, extensible, and comes with support for six commonly used sequence aligners. It is easy to adapt Tentacle to different applications in metagenomics and easy to integrate into existing workflows. Evaluations show that Tentacle scales very well with increasing computing resources. We illustrate the versatility of Tentacle on three different use cases. Tentacle is written for Linux in Python 2.7 and is published as open source under the GNU General Public License (v3). Documentation, tutorials, installation instructions, and the source code are freely available online at: http://bioinformatics.math.chalmers.se/tentacle.
GeneXplorer: an interactive web application for microarray data visualization and analysis.

PubMed

Rees, Christian A; Demeter, Janos; Matese, John C; Botstein, David; Sherlock, Gavin

2004-10-01

When publishing large-scale microarray datasets, it is of great value to create supplemental websites where either the full data, or selected subsets corresponding to figures within the paper, can be browsed. We set out to create a CGI application containing many of the features of some of the existing standalone software for the visualization of clustered microarray data. We present GeneXplorer, a web application for interactive microarray data visualization and analysis in a web environment. GeneXplorer allows users to browse a microarray dataset in an intuitive fashion. It provides simple access to microarray data over the Internet and uses only HTML and JavaScript to display graphic and annotation information. It provides radar and zoom views of the data, allows display of the nearest neighbors to a gene expression vector based on their Pearson correlations and provides the ability to search gene annotation fields. The software is released under the permissive MIT Open Source license, and the complete documentation and the entire source code are freely available for download from CPAN http://search.cpan.org/dist/Microarray-GeneXplorer/.
Genoviz Software Development Kit: Java tool kit for building genomics visualization applications.

PubMed

Helt, Gregg A; Nicol, John W; Erwin, Ed; Blossom, Eric; Blanchard, Steven G; Chervitz, Stephen A; Harmon, Cyrus; Loraine, Ann E

2009-08-25

Visualization software can expose previously undiscovered patterns in genomic data and advance biological science. The Genoviz Software Development Kit (SDK) is an open source, Java-based framework designed for rapid assembly of visualization software applications for genomics. The Genoviz SDK framework provides a mechanism for incorporating adaptive, dynamic zooming into applications, a desirable feature of genome viewers. Visualization capabilities of the Genoviz SDK include automated layout of features along genetic or genomic axes; support for user interactions with graphical elements (Glyphs) in a map; a variety of Glyph sub-classes that promote experimentation with new ways of representing data in graphical formats; and support for adaptive, semantic zooming, whereby objects change their appearance depending on zoom level and zooming rate adapts to the current scale. Freely available demonstration and production quality applications, including the Integrated Genome Browser, illustrate Genoviz SDK capabilities. Separation between graphics components and genomic data models makes it easy for developers to add visualization capability to pre-existing applications or build new applications using third-party data models. Source code, documentation, sample applications, and tutorials are available at http://genoviz.sourceforge.net/.
IgRepertoireConstructor: a novel algorithm for antibody repertoire construction and immunoproteogenomics analysis

PubMed Central

Safonova, Yana; Bonissone, Stefano; Kurpilyansky, Eugene; Starostina, Ekaterina; Lapidus, Alla; Stinson, Jeremy; DePalatis, Laura; Sandoval, Wendy; Lill, Jennie; Pevzner, Pavel A.

2015-01-01

The analysis of concentrations of circulating antibodies in serum (antibody repertoire) is a fundamental, yet poorly studied, problem in immunoinformatics. The two current approaches to the analysis of antibody repertoires [next generation sequencing (NGS) and mass spectrometry (MS)] present difficult computational challenges since antibodies are not directly encoded in the germline but are extensively diversified by somatic recombination and hypermutations. Therefore, the protein database required for the interpretation of spectra from circulating antibodies is custom for each individual. Although such a database can be constructed via NGS, the reads generated by NGS are error-prone and even a single nucleotide error precludes identification of a peptide by the standard proteomics tools. Here, we present the IgRepertoireConstructor algorithm that performs error-correction of immunosequencing reads and uses mass spectra to validate the constructed antibody repertoires. Availability and implementation: IgRepertoireConstructor is open source and freely available as a C++ and Python program running on all Unix-compatible platforms. The source code is available from http://bioinf.spbau.ru/igtools. Contact: ppevzner@ucsd.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26072509
Condenser: a statistical aggregation tool for multi-sample quantitative proteomic data from Matrix Science Mascot Distiller™.

PubMed

Knudsen, Anders Dahl; Bennike, Tue; Kjeldal, Henrik; Birkelund, Svend; Otzen, Daniel Erik; Stensballe, Allan

2014-05-30

We describe Condenser, a freely available, comprehensive open-source tool for merging multidimensional quantitative proteomics data from the Matrix Science Mascot Distiller Quantitation Toolbox into a common format ready for subsequent bioinformatic analysis. A number of different relative quantitation technologies, such as metabolic (15)N and amino acid stable isotope incorporation, label-free and chemical-label quantitation are supported. The program features multiple options for curative filtering of the quantified peptides, allowing the user to choose data quality thresholds appropriate for the current dataset, and ensure the quality of the calculated relative protein abundances. Condenser also features optional global normalization, peptide outlier removal, multiple testing and calculation of t-test statistics for highlighting and evaluating proteins with significantly altered relative protein abundances. Condenser provides an attractive addition to the gold-standard quantitative workflow of Mascot Distiller, allowing easy handling of larger multi-dimensional experiments. Source code, binaries, test data set and documentation are available at http://condenser.googlecode.com/. Copyright © 2014 Elsevier B.V. All rights reserved.
CONNJUR spectrum translator: an open source application for reformatting NMR spectral data.

PubMed

Nowling, Ronald J; Vyas, Jay; Weatherby, Gerard; Fenwick, Matthew W; Ellis, Heidi J C; Gryk, Michael R

2011-05-01

NMR spectroscopists are hindered by the lack of standardization for spectral data among the file formats for various NMR data processing tools. This lack of standardization is cumbersome as researchers must perform their own file conversion in order to switch between processing tools and also restricts the combination of tools employed if no conversion option is available. The CONNJUR Spectrum Translator introduces a new, extensible architecture for spectrum translation and introduces two key algorithmic improvements. This first is translation of NMR spectral data (time and frequency domain) to a single in-memory data model to allow addition of new file formats with two converter modules, a reader and a writer, instead of writing a separate converter to each existing format. Secondly, the use of layout descriptors allows a single fid data translation engine to be used for all formats. For the end user, sophisticated metadata readers allow conversion of the majority of files with minimum user configuration. The open source code is freely available at http://connjur.sourceforge.net for inspection and extension.
Geodesy: Modeling Earth's Post-Glacial Rebound

NASA Astrophysics Data System (ADS)

Spada, Giorgio; Antonioli, Andrea; Boschi, Lapo; Brandi, Valter; Cianetti, Spina; Galvani, Gabriele; Giunchi, Carlo; Perniola, Bruna; Agostinetti, Nicola Piana; Piersanti, Antonio; Stocchi, Paolo

2004-02-01

Efforts to mathematically model the Earth's post-glacial rebound, or, in general, long-term planetary-scale viscoelastic deformations, have been ongoing for several decades. Unfortunately, research in the post-glacial rebound community has not been characterized by much exchange of knowledge. Groups around the world have developed their code independently, sometimes with profoundly different approaches, occasionally leading to inconsistent results [e.g., Boschi et al., 1999]. Postglacial Rebound Calculator (TABOO) is a post-glacial rebound software that is being made freely available (through Samizdat Press at http://samizdat.mines.edu/taboo/)in the hope that it might become a common reference for all post-glacial rebound researchers. TABOO is portable and has been tested on Unix, Linux, and Windows systems; all it requires is a Fortran90 compiler supporting quadruple precision. The software is easy to use. It comes with a detailed guide that can work as a quick reference cookbook, and it is also accompanied by a textbook, The Theory Behind TABOO, collecting the most significant theoretical results from post-glacial rebound literature. TABOO is not a ``black-box,'' although it may easily be used as such. The entire source code is provided and should be easy to understand for intermediate-level Fortran programmers.
NetCoffee: a fast and accurate global alignment approach to identify functionally conserved proteins in multiple networks.

PubMed

Hu, Jialu; Kehr, Birte; Reinert, Knut

2014-02-15

Owing to recent advancements in high-throughput technologies, protein-protein interaction networks of more and more species become available in public databases. The question of how to identify functionally conserved proteins across species attracts a lot of attention in computational biology. Network alignments provide a systematic way to solve this problem. However, most existing alignment tools encounter limitations in tackling this problem. Therefore, the demand for faster and more efficient alignment tools is growing. We present a fast and accurate algorithm, NetCoffee, which allows to find a global alignment of multiple protein-protein interaction networks. NetCoffee searches for a global alignment by maximizing a target function using simulated annealing on a set of weighted bipartite graphs that are constructed using a triplet approach similar to T-Coffee. To assess its performance, NetCoffee was applied to four real datasets. Our results suggest that NetCoffee remedies several limitations of previous algorithms, outperforms all existing alignment tools in terms of speed and nevertheless identifies biologically meaningful alignments. The source code and data are freely available for download under the GNU GPL v3 license at https://code.google.com/p/netcoffee/.
rSeqNP: a non-parametric approach for detecting differential expression and splicing from RNA-Seq data.

PubMed

Shi, Yang; Chinnaiyan, Arul M; Jiang, Hui

2015-07-01

High-throughput sequencing of transcriptomes (RNA-Seq) has become a powerful tool to study gene expression. Here we present an R package, rSeqNP, which implements a non-parametric approach to test for differential expression and splicing from RNA-Seq data. rSeqNP uses permutation tests to access statistical significance and can be applied to a variety of experimental designs. By combining information across isoforms, rSeqNP is able to detect more differentially expressed or spliced genes from RNA-Seq data. The R package with its source code and documentation are freely available at http://www-personal.umich.edu/∼jianghui/rseqnp/. jianghui@umich.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
i-ADHoRe 2.0: an improved tool to detect degenerated genomic homology using genomic profiles.

PubMed

Simillion, Cedric; Janssens, Koen; Sterck, Lieven; Van de Peer, Yves

2008-01-01

i-ADHoRe is a software tool that combines gene content and gene order information of homologous genomic segments into profiles to detect highly degenerated homology relations within and between genomes. The new version offers, besides a significant increase in performance, several optimizations to the algorithm, most importantly to the profile alignment routine. As a result, the annotations of multiple genomes, or parts thereof, can be fed simultaneously into the program, after which it will report all regions of homology, both within and between genomes. The i-ADHoRe 2.0 package contains the C++ source code for the main program as well as various Perl scripts and a fully documented Perl API to facilitate post-processing. The software runs on any Linux- or -UNIX based platform. The package is freely available for academic users and can be downloaded from http://bioinformatics.psb.ugent.be/
fluff: exploratory analysis and visualization of high-throughput sequencing data

PubMed Central

Georgiou, Georgios

2016-01-01

Summary. In this article we describe fluff, a software package that allows for simple exploration, clustering and visualization of high-throughput sequencing data mapped to a reference genome. The package contains three command-line tools to generate publication-quality figures in an uncomplicated manner using sensible defaults. Genome-wide data can be aggregated, clustered and visualized in a heatmap, according to different clustering methods. This includes a predefined setting to identify dynamic clusters between different conditions or developmental stages. Alternatively, clustered data can be visualized in a bandplot. Finally, fluff includes a tool to generate genomic profiles. As command-line tools, the fluff programs can easily be integrated into standard analysis pipelines. The installation is straightforward and documentation is available at http://fluff.readthedocs.org. Availability. fluff is implemented in Python and runs on Linux. The source code is freely available for download at https://github.com/simonvh/fluff. PMID:27547532
Semantic Body Browser: graphical exploration of an organism and spatially resolved expression data visualization.

PubMed

Lekschas, Fritz; Stachelscheid, Harald; Seltmann, Stefanie; Kurtz, Andreas

2015-03-01

Advancing technologies generate large amounts of molecular and phenotypic data on cells, tissues and organisms, leading to an ever-growing detail and complexity while information retrieval and analysis becomes increasingly time-consuming. The Semantic Body Browser is a web application for intuitively exploring the body of an organism from the organ to the subcellular level and visualising expression profiles by means of semantically annotated anatomical illustrations. It is used to comprehend biological and medical data related to the different body structures while relying on the strong pattern recognition capabilities of human users. The Semantic Body Browser is a JavaScript web application that is freely available at http://sbb.cellfinder.org. The source code is provided on https://github.com/flekschas/sbb. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
PROVAT: a tool for Voronoi tessellation analysis of protein structures and complexes.

PubMed

Gore, Swanand P; Burke, David F; Blundell, Tom L

2005-08-01

Voronoi tessellation has proved to be a useful tool in protein structure analysis. We have developed PROVAT, a versatile public domain software that enables computation and visualization of Voronoi tessellations of proteins and protein complexes. It is a set of Python scripts that integrate freely available specialized software (Qhull, Pymol etc.) into a pipeline. The calculation component of the tool computes Voronoi tessellation of a given protein system in a way described by a user-supplied XML recipe and stores resulting neighbourhood information as text files with various styles. The Python pickle file generated in the process is used by the visualization component, a Pymol plug-in, that offers a GUI to explore the tessellation visually. PROVAT source code can be downloaded from http://raven.bioc.cam.ac.uk/~swanand/Provat1, which also provides a webserver for its calculation component, documentation and examples.
MUSCLE: multiple sequence alignment with high accuracy and high throughput.

PubMed

Edgar, Robert C

2004-01-01

We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5. com/muscle.
MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity

PubMed Central

Wang, Yupeng; Tang, Haibao; DeBarry, Jeremy D.; Tan, Xu; Li, Jingping; Wang, Xiyin; Lee, Tae-ho; Jin, Huizhe; Marler, Barry; Guo, Hui; Kissinger, Jessica C.; Paterson, Andrew H.

2012-01-01

MCScan is an algorithm able to scan multiple genomes or subgenomes in order to identify putative homologous chromosomal regions, and align these regions using genes as anchors. The MCScanX toolkit implements an adjusted MCScan algorithm for detection of synteny and collinearity that extends the original software by incorporating 14 utility programs for visualization of results and additional downstream analyses. Applications of MCScanX to several sequenced plant genomes and gene families are shown as examples. MCScanX can be used to effectively analyze chromosome structural changes, and reveal the history of gene family expansions that might contribute to the adaptation of lineages and taxa. An integrated view of various modes of gene duplication can supplement the traditional gene tree analysis in specific families. The source code and documentation of MCScanX are freely available at http://chibba.pgml.uga.edu/mcscan2/. PMID:22217600
OXSA: An open-source magnetic resonance spectroscopy analysis toolbox in MATLAB.

PubMed

Purvis, Lucian A B; Clarke, William T; Biasiolli, Luca; Valkovič, Ladislav; Robson, Matthew D; Rodgers, Christopher T

2017-01-01

In vivo magnetic resonance spectroscopy provides insight into metabolism in the human body. New acquisition protocols are often proposed to improve the quality or efficiency of data collection. Processing pipelines must also be developed to use these data optimally. Current fitting software is either targeted at general spectroscopy fitting, or for specific protocols. We therefore introduce the MATLAB-based OXford Spectroscopy Analysis (OXSA) toolbox to allow researchers to rapidly develop their own customised processing pipelines. The toolbox aims to simplify development by: being easy to install and use; seamlessly importing Siemens Digital Imaging and Communications in Medicine (DICOM) standard data; allowing visualisation of spectroscopy data; offering a robust fitting routine; flexibly specifying prior knowledge when fitting; and allowing batch processing of spectra. This article demonstrates how each of these criteria have been fulfilled, and gives technical details about the implementation in MATLAB. The code is freely available to download from https://github.com/oxsatoolbox/oxsa.
MOCCASIN: converting MATLAB ODE models to SBML.

PubMed

Gómez, Harold F; Hucka, Michael; Keating, Sarah M; Nudelman, German; Iber, Dagmar; Sealfon, Stuart C

2016-06-15

MATLAB is popular in biological research for creating and simulating models that use ordinary differential equations (ODEs). However, sharing or using these models outside of MATLAB is often problematic. A community standard such as Systems Biology Markup Language (SBML) can serve as a neutral exchange format, but translating models from MATLAB to SBML can be challenging-especially for legacy models not written with translation in mind. We developed MOCCASIN (Model ODE Converter for Creating Automated SBML INteroperability) to help. MOCCASIN can convert ODE-based MATLAB models of biochemical reaction networks into the SBML format. MOCCASIN is available under the terms of the LGPL 2.1 license (http://www.gnu.org/licenses/lgpl-2.1.html). Source code, binaries and test cases can be freely obtained from https://github.com/sbmlteam/moccasin : mhucka@caltech.edu More information is available at https://github.com/sbmlteam/moccasin. © The Author 2016. Published by Oxford University Press.
Cox-nnet: An artificial neural network method for prognosis prediction of high-throughput omics data.

PubMed

Ching, Travers; Zhu, Xun; Garmire, Lana X

2018-04-01

Artificial neural networks (ANN) are computing architectures with many interconnections of simple neural-inspired computing elements, and have been applied to biomedical fields such as imaging analysis and diagnosis. We have developed a new ANN framework called Cox-nnet to predict patient prognosis from high throughput transcriptomics data. In 10 TCGA RNA-Seq data sets, Cox-nnet achieves the same or better predictive accuracy compared to other methods, including Cox-proportional hazards regression (with LASSO, ridge, and mimimax concave penalty), Random Forests Survival and CoxBoost. Cox-nnet also reveals richer biological information, at both the pathway and gene levels. The outputs from the hidden layer node provide an alternative approach for survival-sensitive dimension reduction. In summary, we have developed a new method for accurate and efficient prognosis prediction on high throughput data, with functional biological insights. The source code is freely available at https://github.com/lanagarmire/cox-nnet.

regioneR: an R/Bioconductor package for the association analysis of genomic regions based on permutation tests.

PubMed

Gel, Bernat; Díez-Villanueva, Anna; Serra, Eduard; Buschbeck, Marcus; Peinado, Miguel A; Malinverni, Roberto

2016-01-15

Statistically assessing the relation between a set of genomic regions and other genomic features is a common challenging task in genomic and epigenomic analyses. Randomization based approaches implicitly take into account the complexity of the genome without the need of assuming an underlying statistical model. regioneR is an R package that implements a permutation test framework specifically designed to work with genomic regions. In addition to the predefined randomization and evaluation strategies, regioneR is fully customizable allowing the use of custom strategies to adapt it to specific questions. Finally, it also implements a novel function to evaluate the local specificity of the detected association. regioneR is an R package released under Artistic-2.0 License. The source code and documents are freely available through Bioconductor (http://www.bioconductor.org/packages/regioneR). rmalinverni@carrerasresearch.org. © The Author 2015. Published by Oxford University Press.
GobyWeb: Simplified Management and Analysis of Gene Expression and DNA Methylation Sequencing Data

PubMed Central

Dorff, Kevin C.; Chambwe, Nyasha; Zeno, Zachary; Simi, Manuele; Shaknovich, Rita; Campagne, Fabien

2013-01-01

We present GobyWeb, a web-based system that facilitates the management and analysis of high-throughput sequencing (HTS) projects. The software provides integrated support for a broad set of HTS analyses and offers a simple plugin extension mechanism. Analyses currently supported include quantification of gene expression for messenger and small RNA sequencing, estimation of DNA methylation (i.e., reduced bisulfite sequencing and whole genome methyl-seq), or the detection of pathogens in sequenced data. In contrast to previous analysis pipelines developed for analysis of HTS data, GobyWeb requires significantly less storage space, runs analyses efficiently on a parallel grid, scales gracefully to process tens or hundreds of multi-gigabyte samples, yet can be used effectively by researchers who are comfortable using a web browser. We conducted performance evaluations of the software and found it to either outperform or have similar performance to analysis programs developed for specialized analyses of HTS data. We found that most biologists who took a one-hour GobyWeb training session were readily able to analyze RNA-Seq data with state of the art analysis tools. GobyWeb can be obtained at http://gobyweb.campagnelab.org and is freely available for non-commercial use. GobyWeb plugins are distributed in source code and licensed under the open source LGPL3 license to facilitate code inspection, reuse and independent extensions http://github.com/CampagneLaboratory/gobyweb2-plugins. PMID:23936070
Real-Time Motion Capture Toolbox (RTMocap): an open-source code for recording 3-D motion kinematics to study action-effect anticipations during motor and social interactions.

PubMed

Lewkowicz, Daniel; Delevoye-Turrell, Yvonne

2016-03-01

We present here a toolbox for the real-time motion capture of biological movements that runs in the cross-platform MATLAB environment (The MathWorks, Inc., Natick, MA). It provides instantaneous processing of the 3-D movement coordinates of up to 20 markers at a single instant. Available functions include (1) the setting of reference positions, areas, and trajectories of interest; (2) recording of the 3-D coordinates for each marker over the trial duration; and (3) the detection of events to use as triggers for external reinforcers (e.g., lights, sounds, or odors). Through fast online communication between the hardware controller and RTMocap, automatic trial selection is possible by means of either a preset or an adaptive criterion. Rapid preprocessing of signals is also provided, which includes artifact rejection, filtering, spline interpolation, and averaging. A key example is detailed, and three typical variations are developed (1) to provide a clear understanding of the importance of real-time control for 3-D motion in cognitive sciences and (2) to present users with simple lines of code that can be used as starting points for customizing experiments using the simple MATLAB syntax. RTMocap is freely available (http://sites.google.com/site/RTMocap/) under the GNU public license for noncommercial use and open-source development, together with sample data and extensive documentation.
XSEOS: An Open Software for Chemical Engineering Thermodynamics

ERIC Educational Resources Information Center

Castier, Marcelo

2008-01-01

An Excel add-in--XSEOS--that implements several excess Gibbs free energy models and equations of state has been developed for educational use. Several traditional and modern thermodynamic models are available in the package with a user-friendly interface. XSEOS has open code, is freely available, and should be useful for instructors and students…
Gravitational waves from rotating and precessing rigid bodies. 2: General solutions and computationally useful formulae

NASA Technical Reports Server (NTRS)

Zimmerman, M.

1979-01-01

The classical mechanics results for free precession which are needed in order to calculate the weak field, slow-motion, quadrupole-moment gravitational waves are reviewed. Within that formalism, algorithms are given for computing the exact gravitational power radiated and waveforms produced by arbitrary rigid-body freely-precessing sources. The dominant terms are presented in series expansions of the waveforms for the case of an almost spherical object precessing with a small wobble angle. These series expansions, which retain the precise frequency dependence of the waves, may be useful for gravitational astronomers when freely-precessing sources begin to be observed.
Skyline: an open source document editor for creating and analyzing targeted proteomics experiments.

PubMed

MacLean, Brendan; Tomazela, Daniela M; Shulman, Nicholas; Chambers, Matthew; Finney, Gregory L; Frewen, Barbara; Kern, Randall; Tabb, David L; Liebler, Daniel C; MacCoss, Michael J

2010-04-01

Skyline is a Windows client application for targeted proteomics method creation and quantitative data analysis. It is open source and freely available for academic and commercial use. The Skyline user interface simplifies the development of mass spectrometer methods and the analysis of data from targeted proteomics experiments performed using selected reaction monitoring (SRM). Skyline supports using and creating MS/MS spectral libraries from a wide variety of sources to choose SRM filters and verify results based on previously observed ion trap data. Skyline exports transition lists to and imports the native output files from Agilent, Applied Biosystems, Thermo Fisher Scientific and Waters triple quadrupole instruments, seamlessly connecting mass spectrometer output back to the experimental design document. The fast and compact Skyline file format is easily shared, even for experiments requiring many sample injections. A rich array of graphs displays results and provides powerful tools for inspecting data integrity as data are acquired, helping instrument operators to identify problems early. The Skyline dynamic report designer exports tabular data from the Skyline document model for in-depth analysis with common statistical tools. Single-click, self-updating web installation is available at http://proteome.gs.washington.edu/software/skyline. This web site also provides access to instructional videos, a support board, an issues list and a link to the source code project.
The Athena Astrophysical MHD Code in Cylindrical Geometry

NASA Astrophysics Data System (ADS)

Skinner, M. A.; Ostriker, E. C.

2011-10-01

We have developed a method for implementing cylindrical coordinates in the Athena MHD code (Skinner & Ostriker 2010). The extension has been designed to alter the existing Cartesian-coordinates code (Stone et al. 2008) as minimally and transparently as possible. The numerical equations in cylindrical coordinates are formulated to maintain consistency with constrained transport, a central feature of the Athena algorithm, while making use of previously implemented code modules such as the eigensystems and Riemann solvers. Angular-momentum transport, which is critical in astrophysical disk systems dominated by rotation, is treated carefully. We describe modifications for cylindrical coordinates of the higher-order spatial reconstruction and characteristic evolution steps as well as the finite-volume and constrained transport updates. Finally, we have developed a test suite of standard and novel problems in one-, two-, and three-dimensions designed to validate our algorithms and implementation and to be of use to other code developers. The code is suitable for use in a wide variety of astrophysical applications and is freely available for download on the web.
cloudPEST - A python module for cloud-computing deployment of PEST, a program for parameter estimation

USGS Publications Warehouse

Fienen, Michael N.; Kunicki, Thomas C.; Kester, Daniel E.

2011-01-01

This report documents cloudPEST-a Python module with functions to facilitate deployment of the model-independent parameter estimation code PEST on a cloud-computing environment. cloudPEST makes use of low-level, freely available command-line tools that interface with the Amazon Elastic Compute Cloud (EC2(TradeMark)) that are unlikely to change dramatically. This report describes the preliminary setup for both Python and EC2 tools and subsequently describes the functions themselves. The code and guidelines have been tested primarily on the Windows(Registered) operating system but are extensible to Linux(Registered).
Gravitational tree-code on graphics processing units: implementation in CUDA

NASA Astrophysics Data System (ADS)

Gaburov, Evghenii; Bédorf, Jeroen; Portegies Zwart, Simon

2010-05-01

We present a new very fast tree-code which runs on massively parallel Graphical Processing Units (GPU) with NVIDIA CUDA architecture. The tree-construction and calculation of multipole moments is carried out on the host CPU, while the force calculation which consists of tree walks and evaluation of interaction list is carried out on the GPU. In this way we achieve a sustained performance of about 100GFLOP/s and data transfer rates of about 50GB/s. It takes about a second to compute forces on a million particles with an opening angle of θ ≈ 0.5. The code has a convenient user interface and is freely available for use. http://castle.strw.leidenuniv.nl/software/octgrav.html
CAMPways: constrained alignment framework for the comparative analysis of a pair of metabolic pathways.

PubMed

Abaka, Gamze; Bıyıkoğlu, Türker; Erten, Cesim

2013-07-01

Given a pair of metabolic pathways, an alignment of the pathways corresponds to a mapping between similar substructures of the pair. Successful alignments may provide useful applications in phylogenetic tree reconstruction, drug design and overall may enhance our understanding of cellular metabolism. We consider the problem of providing one-to-many alignments of reactions in a pair of metabolic pathways. We first provide a constrained alignment framework applicable to the problem. We show that the constrained alignment problem even in a primitive setting is computationally intractable, which justifies efforts for designing efficient heuristics. We present our Constrained Alignment of Metabolic Pathways (CAMPways) algorithm designed for this purpose. Through extensive experiments involving a large pathway database, we demonstrate that when compared with a state-of-the-art alternative, the CAMPways algorithm provides better alignment results on metabolic networks as far as measures based on same-pathway inclusion and biochemical significance are concerned. The execution speed of our algorithm constitutes yet another important improvement over alternative algorithms. Open source codes, executable binary, useful scripts, all the experimental data and the results are freely available as part of the Supplementary Material at http://code.google.com/p/campways/. Supplementary data are available at Bioinformatics online.
MAGI: many-component galaxy initializer

NASA Astrophysics Data System (ADS)

Miki, Yohei; Umemura, Masayuki

2018-04-01

Providing initial conditions is an essential procedure for numerical simulations of galaxies. The initial conditions for idealized individual galaxies in N-body simulations should resemble observed galaxies and be dynamically stable for time-scales much longer than their characteristic dynamical times. However, generating a galaxy model ab initio as a system in dynamical equilibrium is a difficult task, since a galaxy contains several components, including a bulge, disc, and halo. Moreover, it is desirable that the initial-condition generator be fast and easy to use. We have now developed an initial-condition generator for galactic N-body simulations that satisfies these requirements. The developed generator adopts a distribution-function-based method, and it supports various kinds of density models, including custom-tabulated inputs and the presence of more than one disc. We tested the dynamical stability of systems generated by our code, representing early- and late-type galaxies, with N = 2097 152 and 8388 608 particles, respectively, and we found that the model galaxies maintain their initial distributions for at least 1 Gyr. The execution times required to generate the two models were 8.5 and 221.7 seconds, respectively, which is negligible compared to typical execution times for N-body simulations. The code is provided as open-source software and is publicly and freely available at https://bitbucket.org/ymiki/magi.
Easily extensible unix software for spectral analysis, display, modification, and synthesis of musical sounds

NASA Astrophysics Data System (ADS)

Beauchamp, James W.

2002-11-01

Software has been developed which enables users to perform time-varying spectral analysis of individual musical tones or successions of them and to perform further processing of the data. The package, called sndan, is freely available in source code, uses EPS graphics for display, and is written in ansi c for ease of code modification and extension. Two analyzers, a fixed-filter-bank phase vocoder (''pvan'') and a frequency-tracking analyzer (''mqan'') constitute the analysis front end of the package. While pvan's output consists of continuous amplitudes and frequencies of harmonics, mqan produces disjoint ''tracks.'' However, another program extracts a fundamental frequency and separates harmonics from the tracks, resulting in a continuous harmonic output. ''monan'' is a program used to display harmonic data in a variety of formats, perform various spectral modifications, and perform additive resynthesis of the harmonic partials, including possible pitch-shifting and time-scaling. Sounds can also be synthesized according to a musical score using a companion synthesis language, Music 4C. Several other programs in the sndan suite can be used for specialized tasks, such as signal display and editing. Applications of the software include producing specialized sounds for music compositions or psychoacoustic experiments or as a basis for developing new synthesis algorithms.
PENTACLE: Parallelized particle-particle particle-tree code for planet formation

NASA Astrophysics Data System (ADS)

Iwasawa, Masaki; Oshino, Shoichi; Fujii, Michiko S.; Hori, Yasunori

2017-10-01

We have newly developed a parallelized particle-particle particle-tree code for planet formation, PENTACLE, which is a parallelized hybrid N-body integrator executed on a CPU-based (super)computer. PENTACLE uses a fourth-order Hermite algorithm to calculate gravitational interactions between particles within a cut-off radius and a Barnes-Hut tree method for gravity from particles beyond. It also implements an open-source library designed for full automatic parallelization of particle simulations, FDPS (Framework for Developing Particle Simulator), to parallelize a Barnes-Hut tree algorithm for a memory-distributed supercomputer. These allow us to handle 1-10 million particles in a high-resolution N-body simulation on CPU clusters for collisional dynamics, including physical collisions in a planetesimal disc. In this paper, we show the performance and the accuracy of PENTACLE in terms of \\tilde{R}_cut and a time-step Δt. It turns out that the accuracy of a hybrid N-body simulation is controlled through Δ t / \\tilde{R}_cut and Δ t / \\tilde{R}_cut ˜ 0.1 is necessary to simulate accurately the accretion process of a planet for ≥106 yr. For all those interested in large-scale particle simulations, PENTACLE, customized for planet formation, will be freely available from https://github.com/PENTACLE-Team/PENTACLE under the MIT licence.
SNPversity: a web-based tool for visualizing diversity

PubMed Central

Schott, David A; Vinnakota, Abhinav G; Portwood, John L; Andorf, Carson M

2018-01-01

Abstract Many stand-alone desktop software suites exist to visualize single nucleotide polymorphism (SNP) diversity, but web-based software that can be easily implemented and used for biological databases is absent. SNPversity was created to answer this need by building an open-source visualization tool that can be implemented on a Unix-like machine and served through a web browser that can be accessible worldwide. SNPversity consists of a HDF5 database back-end for SNPs, a data exchange layer powered by TASSEL libraries that represent data in JSON format, and an interface layer using PHP to visualize SNP information. SNPversity displays data in real-time through a web browser in grids that are color-coded according to a given SNP’s allelic status and mutational state. SNPversity is currently available at MaizeGDB, the maize community’s database, and will be soon available at GrainGenes, the clade-oriented database for Triticeae and Avena species, including wheat, barley, rye, and oat. The code and documentation are uploaded onto github, and they are freely available to the public. We expect that the tool will be highly useful for other biological databases with a similar need to display SNP diversity through their web interfaces. Database URL: https://www.maizegdb.org/snpversity PMID:29688387
IgRepertoireConstructor: a novel algorithm for antibody repertoire construction and immunoproteogenomics analysis.

PubMed

Safonova, Yana; Bonissone, Stefano; Kurpilyansky, Eugene; Starostina, Ekaterina; Lapidus, Alla; Stinson, Jeremy; DePalatis, Laura; Sandoval, Wendy; Lill, Jennie; Pevzner, Pavel A

2015-06-15

The analysis of concentrations of circulating antibodies in serum (antibody repertoire) is a fundamental, yet poorly studied, problem in immunoinformatics. The two current approaches to the analysis of antibody repertoires [next generation sequencing (NGS) and mass spectrometry (MS)] present difficult computational challenges since antibodies are not directly encoded in the germline but are extensively diversified by somatic recombination and hypermutations. Therefore, the protein database required for the interpretation of spectra from circulating antibodies is custom for each individual. Although such a database can be constructed via NGS, the reads generated by NGS are error-prone and even a single nucleotide error precludes identification of a peptide by the standard proteomics tools. Here, we present the IgRepertoireConstructor algorithm that performs error-correction of immunosequencing reads and uses mass spectra to validate the constructed antibody repertoires. IgRepertoireConstructor is open source and freely available as a C++ and Python program running on all Unix-compatible platforms. The source code is available from http://bioinf.spbau.ru/igtools. ppevzner@ucsd.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
A collection of open source applications for mass spectrometry data mining.

PubMed

Gallardo, Óscar; Ovelleiro, David; Gay, Marina; Carrascal, Montserrat; Abian, Joaquin

2014-10-01

We present several bioinformatics applications for the identification and quantification of phosphoproteome components by MS. These applications include a front-end graphical user interface that combines several Thermo RAW formats to MASCOT™ Generic Format extractors (EasierMgf), two graphical user interfaces for search engines OMSSA and SEQUEST (OmssaGui and SequestGui), and three applications, one for the management of databases in FASTA format (FastaTools), another for the integration of search results from up to three search engines (Integrator), and another one for the visualization of mass spectra and their corresponding database search results (JsonVisor). These applications were developed to solve some of the common problems found in proteomic and phosphoproteomic data analysis and were integrated in the workflow for data processing and feeding on our LymPHOS database. Applications were designed modularly and can be used standalone. These tools are written in Perl and Python programming languages and are supported on Windows platforms. They are all released under an Open Source Software license and can be freely downloaded from our software repository hosted at GoogleCode. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
IPeak: An open source tool to combine results from multiple MS/MS search engines.

PubMed

Wen, Bo; Du, Chaoqin; Li, Guilin; Ghali, Fawaz; Jones, Andrew R; Käll, Lukas; Xu, Shaohang; Zhou, Ruo; Ren, Zhe; Feng, Qiang; Xu, Xun; Wang, Jun

2015-09-01

Liquid chromatography coupled tandem mass spectrometry (LC-MS/MS) is an important technique for detecting peptides in proteomics studies. Here, we present an open source software tool, termed IPeak, a peptide identification pipeline that is designed to combine the Percolator post-processing algorithm and multi-search strategy to enhance the sensitivity of peptide identifications without compromising accuracy. IPeak provides a graphical user interface (GUI) as well as a command-line interface, which is implemented in JAVA and can work on all three major operating system platforms: Windows, Linux/Unix and OS X. IPeak has been designed to work with the mzIdentML standard from the Proteomics Standards Initiative (PSI) as an input and output, and also been fully integrated into the associated mzidLibrary project, providing access to the overall pipeline, as well as modules for calling Percolator on individual search engine result files. The integration thus enables IPeak (and Percolator) to be used in conjunction with any software packages implementing the mzIdentML data standard. IPeak is freely available and can be downloaded under an Apache 2.0 license at https://code.google.com/p/mzidentml-lib/. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
The Reconstruction Toolkit (RTK), an open-source cone-beam CT reconstruction toolkit based on the Insight Toolkit (ITK)

NASA Astrophysics Data System (ADS)

Rit, S.; Vila Oliva, M.; Brousmiche, S.; Labarbe, R.; Sarrut, D.; Sharp, G. C.

2014-03-01

We propose the Reconstruction Toolkit (RTK, http://www.openrtk.org), an open-source toolkit for fast cone-beam CT reconstruction, based on the Insight Toolkit (ITK) and using GPU code extracted from Plastimatch. RTK is developed by an open consortium (see affiliations) under the non-contaminating Apache 2.0 license. The quality of the platform is daily checked with regression tests in partnership with Kitware, the company supporting ITK. Several features are already available: Elekta, Varian and IBA inputs, multi-threaded Feldkamp-David-Kress reconstruction on CPU and GPU, Parker short scan weighting, multi-threaded CPU and GPU forward projectors, etc. Each feature is either accessible through command line tools or C++ classes that can be included in independent software. A MIDAS community has been opened to share CatPhan datasets of several vendors (Elekta, Varian and IBA). RTK will be used in the upcoming cone-beam CT scanner developed by IBA for proton therapy rooms. Many features are under development: new input format support, iterative reconstruction, hybrid Monte Carlo / deterministic CBCT simulation, etc. RTK has been built to freely share tomographic reconstruction developments between researchers and is open for new contributions.
SMART-on-FHIR implemented over i2b2

PubMed Central

Mandel, Joshua C; Klann, Jeffery G; Wattanasin, Nich; Mendis, Michael; Chute, Christopher G; Mandl, Kenneth D; Murphy, Shawn N

2017-01-01

We have developed an interface to serve patient data from Informatics for Integrating Biology and the Bedside (i2b2) repositories in the Fast Healthcare Interoperability Resources (FHIR) format, referred to as a SMART-on-FHIR cell. The cell serves FHIR resources on a per-patient basis, and supports the “substitutable” modular third-party applications (SMART) OAuth2 specification for authorization of client applications. It is implemented as an i2b2 server plug-in, consisting of 6 modules: authentication, REST, i2b2-to-FHIR converter, resource enrichment, query engine, and cache. The source code is freely available as open source. We tested the cell by accessing resources from a test i2b2 installation, demonstrating that a SMART app can be launched from the cell that accesses patient data stored in i2b2. We successfully retrieved demographics, medications, labs, and diagnoses for test patients. The SMART-on-FHIR cell will enable i2b2 sites to provide simplified but secure data access in FHIR format, and will spur innovation and interoperability. Further, it transforms i2b2 into an apps platform. PMID:27274012
Do people differentiate between intrinsic and extrinsic goals for physical activity?

PubMed

McLachlan, Sarah; Hagger, Martin S

2011-04-01

The distinction between intrinsic and extrinsic goals, and between goal pursuit for intrinsically and extrinsically motivated reasons, is a central premise of self-determination theory. Proponents of the theory have proposed that the pursuit of intrinsic goals and intrinsically motivated goal striving each predict adaptive psychological and behavioral outcomes relative to the pursuit of extrinsic goals and extrinsically motivated goal striving. Despite evidence to support these predictions, research has not explored whether individuals naturally differentiate between intrinsic and extrinsic goals. Two studies tested whether people make this differentiation when recalling goals for leisure-time physical activity. Using memory-recall methods, participants in Study 1 were asked to freely generate physical activity goals. A subsample (N = 43) was asked to code their freely generated goals as intrinsic or extrinsic. In Study 2, participants were asked to recall intrinsic and extrinsic goals after making a decision regarding their future physical activity. Results of these studies revealed that individuals' goal generation and recall exhibited significant clustering by goal type. Participants encountered some difficulties when explicitly coding goals. Findings support self-determination theory and indicate that individuals discriminate between intrinsic and extrinsic goals.

Bottled SAFT: A Web App Providing SAFT-γ Mie Force Field Parameters for Thousands of Molecular Fluids.

PubMed

Ervik, Åsmund; Mejía, Andrés; Müller, Erich A

2016-09-26

Coarse-grained molecular simulation has become a popular tool for modeling simple and complex fluids alike. The defining aspects of a coarse grained model are the force field parameters, which must be determined for each particular fluid. Because the number of molecular fluids of interest in nature and in engineering processes is immense, constructing force field parameter tables by individually fitting to experimental data is a futile task. A step toward solving this challenge was taken recently by Mejía et al., who proposed a correlation that provides SAFT-γ Mie force field parameters for a fluid provided one knows the critical temperature, the acentric factor and a liquid density, all relatively accessible properties. Building on this, we have applied the correlation to more than 6000 fluids, and constructed a web application, called "Bottled SAFT", which makes this data set easily searchable by CAS number, name or chemical formula. Alternatively, the application allows the user to calculate parameters for components not present in the database. Once the intermolecular potential has been found through Bottled SAFT, code snippets are provided for simulating the desired substance using the "raaSAFT" framework, which leverages established molecular dynamics codes to run the simulations. The code underlying the web application is written in Python using the Flask microframework; this allows us to provide a modern high-performance web app while also making use of the scientific libraries available in Python. Bottled SAFT aims at taking the complexity out of obtaining force field parameters for a wide range of molecular fluids, and facilitates setting up and running coarse-grained molecular simulations. The web application is freely available at http://www.bottledsaft.org . The underlying source code is available on Bitbucket under a permissive license.
Fgf8 morphogen gradient forms by a source-sink mechanism with freely diffusing molecules.

PubMed

Yu, Shuizi Rachel; Burkhardt, Markus; Nowak, Matthias; Ries, Jonas; Petrásek, Zdenek; Scholpp, Steffen; Schwille, Petra; Brand, Michael

2009-09-24

It is widely accepted that tissue differentiation and morphogenesis in multicellular organisms are regulated by tightly controlled concentration gradients of morphogens. How exactly these gradients are formed, however, remains unclear. Here we show that Fgf8 morphogen gradients in living zebrafish embryos are established and maintained by two essential factors: fast, free diffusion of single molecules away from the source through extracellular space, and a sink function of the receiving cells, regulated by receptor-mediated endocytosis. Evidence is provided by directly examining single molecules of Fgf8 in living tissue by fluorescence correlation spectroscopy, quantifying their local mobility and concentration with high precision. By changing the degree of uptake of Fgf8 into its target cells, we are able to alter the shape of the Fgf8 gradient. Our results demonstrate that a freely diffusing morphogen can set up concentration gradients in a complex multicellular tissue by a simple source-sink mechanism.
VLF Source Localization with a Freely Drifting Acoustic Sensor Array

DTIC Science & Technology

1992-09-01

A,’. vol. 89, no. 3, pp. 1134-1158, March 1991. D’Spain, G. L., W. S. Hodgkiss, and G. L. Edmonds, "The Simultaneous Measurement of Infra - sonic ...RESULTS Marine Physical Laboratory’s set of nine freely drifting, infrasonic sensors, capable of record- ing ocean ambient noise in the I- to 25-Hz range...provide thc ship’s thrust, are a well-known contributor to the infrasonic sound field [Ross, 1976; D’Spain et. al., 1991]. The Swallow float deployment
Losing a jewel—Rapid declines in Myanmar’s intact forests from 2002-2014

PubMed Central

Horning, Ned; Khaing, Thiri; Thein, Zaw Min; Aung, Kyaw Moe; Aung, Kyaw Htet; Phyo, Paing; Tun, Ye Lin; Oo, Aung Htat; Neil, Anthony; Thu, Win Myo; Songer, Melissa; Huang, Qiongyu; Connette, Grant; Leimgruber, Peter

2017-01-01

New and rapid political and economic changes in Myanmar are increasing the pressures on the country’s forests. Yet, little is known about the past and current condition of these forests and how fast they are declining. We mapped forest cover in Myanmar through a consortium of international organizations and environmental non-governmental groups, using freely-available public domain data and open source software tools. We used Landsat satellite imagery to assess the condition and spatial distribution of Myanmar’s intact and degraded forests with special focus on changes in intact forest between 2002 and 2014. We found that forests cover 42,365,729 ha or 63% of Myanmar, making it one of the most forested countries in the region. However, severe logging, expanding plantations, and degradation pose increasing threats. Only 38% of the country’s forests can be considered intact with canopy cover >80%. Between 2002 and 2014, intact forests declined at a rate of 0.94% annually, totaling more than 2 million ha forest loss. Losses can be extremely high locally and we identified 9 townships as forest conversion hotspots. We also delineated 13 large (>100,000 ha) and contiguous intact forest landscapes, which are dispersed across Myanmar. The Northern Forest Complex supports four of these landscapes, totaling over 6.1 million ha of intact forest, followed by the Southern Forest Complex with three landscapes, comprising 1.5 million ha. These remaining contiguous forest landscape should have high priority for protection. Our project demonstrates how open source data and software can be used to develop and share critical information on forests when such data are not readily available elsewhere. We provide all data, code, and outputs freely via the internet at (for scripts: https://bitbucket.org/rsbiodiv/; for the data: http://geonode.themimu.info/layers/geonode%3Amyan_lvl2_smoothed_dec2015_resamp) PMID:28520726
Losing a jewel-Rapid declines in Myanmar's intact forests from 2002-2014.

PubMed

Bhagwat, Tejas; Hess, Andrea; Horning, Ned; Khaing, Thiri; Thein, Zaw Min; Aung, Kyaw Moe; Aung, Kyaw Htet; Phyo, Paing; Tun, Ye Lin; Oo, Aung Htat; Neil, Anthony; Thu, Win Myo; Songer, Melissa; LaJeunesse Connette, Katherine; Bernd, Asja; Huang, Qiongyu; Connette, Grant; Leimgruber, Peter

2017-01-01

New and rapid political and economic changes in Myanmar are increasing the pressures on the country's forests. Yet, little is known about the past and current condition of these forests and how fast they are declining. We mapped forest cover in Myanmar through a consortium of international organizations and environmental non-governmental groups, using freely-available public domain data and open source software tools. We used Landsat satellite imagery to assess the condition and spatial distribution of Myanmar's intact and degraded forests with special focus on changes in intact forest between 2002 and 2014. We found that forests cover 42,365,729 ha or 63% of Myanmar, making it one of the most forested countries in the region. However, severe logging, expanding plantations, and degradation pose increasing threats. Only 38% of the country's forests can be considered intact with canopy cover >80%. Between 2002 and 2014, intact forests declined at a rate of 0.94% annually, totaling more than 2 million ha forest loss. Losses can be extremely high locally and we identified 9 townships as forest conversion hotspots. We also delineated 13 large (>100,000 ha) and contiguous intact forest landscapes, which are dispersed across Myanmar. The Northern Forest Complex supports four of these landscapes, totaling over 6.1 million ha of intact forest, followed by the Southern Forest Complex with three landscapes, comprising 1.5 million ha. These remaining contiguous forest landscape should have high priority for protection. Our project demonstrates how open source data and software can be used to develop and share critical information on forests when such data are not readily available elsewhere. We provide all data, code, and outputs freely via the internet at (for scripts: https://bitbucket.org/rsbiodiv/; for the data: http://geonode.themimu.info/layers/geonode%3Amyan_lvl2_smoothed_dec2015_resamp).
METHES: A Monte Carlo collision code for the simulation of electron transport in low temperature plasmas

NASA Astrophysics Data System (ADS)

Rabie, M.; Franck, C. M.

2016-06-01

We present a freely available MATLAB code for the simulation of electron transport in arbitrary gas mixtures in the presence of uniform electric fields. For steady-state electron transport, the program provides the transport coefficients, reaction rates and the electron energy distribution function. The program uses established Monte Carlo techniques and is compatible with the electron scattering cross section files from the open-access Plasma Data Exchange Project LXCat. The code is written in object-oriented design, allowing the tracing and visualization of the spatiotemporal evolution of electron swarms and the temporal development of the mean energy and the electron number due to attachment and/or ionization processes. We benchmark our code with well-known model gases as well as the real gases argon, N2, O2, CF4, SF6 and mixtures of N2 and O2.
An Ada programming support environment

NASA Technical Reports Server (NTRS)

Tyrrill, AL; Chan, A. David

1986-01-01

The toolset of an Ada Programming Support Environment (APSE) being developed at North American Aircraft Operations (NAAO) of Rockwell International, is described. The APSE is resident on three different hosts and must support developments for the hosts and for embedded targets. Tools and developed software must be freely portable between the hosts. The toolset includes the usual editors, compilers, linkers, debuggers, configuration magnagers, and documentation tools. Generally, these are being supplied by the host computer vendors. Other tools, for example, pretty printer, cross referencer, compilation order tool, and management tools were obtained from public-domain sources, are implemented in Ada and are being ported to the hosts. Several tools being implemented in-house are of interest, these include an Ada Design Language processor based on compilable Ada. A Standalone Test Environment Generator facilitates test tool construction and partially automates unit level testing. A Code Auditor/Static Analyzer permits the Ada programs to be evaluated against measures of quality. An Ada Comment Box Generator partially automates generation of header comment boxes.
PathwayAccess: CellDesigner plugins for pathway databases.

PubMed

Van Hemert, John L; Dickerson, Julie A

2010-09-15

CellDesigner provides a user-friendly interface for graphical biochemical pathway description. Many pathway databases are not directly exportable to CellDesigner models. PathwayAccess is an extensible suite of CellDesigner plugins, which connect CellDesigner directly to pathway databases using respective Java application programming interfaces. The process is streamlined for creating new PathwayAccess plugins for specific pathway databases. Three PathwayAccess plugins, MetNetAccess, BioCycAccess and ReactomeAccess, directly connect CellDesigner to the pathway databases MetNetDB, BioCyc and Reactome. PathwayAccess plugins enable CellDesigner users to expose pathway data to analytical CellDesigner functions, curate their pathway databases and visually integrate pathway data from different databases using standard Systems Biology Markup Language and Systems Biology Graphical Notation. Implemented in Java, PathwayAccess plugins run with CellDesigner version 4.0.1 and were tested on Ubuntu Linux, Windows XP and 7, and MacOSX. Source code, binaries, documentation and video walkthroughs are freely available at http://vrac.iastate.edu/~jlv.
Fluidica CFD software for fluids instruction

NASA Astrophysics Data System (ADS)

Colonius, Tim

2008-11-01

Fluidica is an open-source freely available Matlab graphical user interface (GUI) to to an immersed-boundary Navier- Stokes solver. The algorithm is programmed in Fortran and compiled into Matlab as mex-function. The user can create external flows about arbitrarily complex bodies and collections of free vortices. The code runs fast enough for complex 2D flows to be computed and visualized in real-time on the screen. This facilitates its use in homework and in the classroom for demonstrations of various potential-flow and viscous flow phenomena. The GUI has been written with the goal of allowing the student to learn how to use the software as she goes along. The user can select which quantities are viewed on the screen, including contours of various scalars, velocity vectors, streamlines, particle trajectories, streaklines, and finite-time Lyapunov exponents. In this talk, we demonstrate the software in the context of worked classroom examples demonstrating lift and drag, starting vortices, separation, and vortex dynamics.
Primer3_masker: integrating masking of template sequence with primer design software.

PubMed

Kõressaar, Triinu; Lepamets, Maarja; Kaplinski, Lauris; Raime, Kairi; Andreson, Reidar; Remm, Maido

2018-06-01

Designing PCR primers for amplifying regions of eukaryotic genomes is a complicated task because the genomes contain a large number of repeat sequences and other regions unsuitable for amplification by PCR. We have developed a novel k-mer based masking method that uses a statistical model to detect and mask failure-prone regions on the DNA template prior to primer design. We implemented the software as a standalone software primer3_masker and integrated it into the primer design program Primer3. The standalone version of primer3_masker is implemented in C. The source code is freely available at https://github.com/bioinfo-ut/primer3_masker/ (standalone version for Linux and macOS) and at https://github.com/primer3-org/primer3/ (integrated version). Primer3 web application that allows masking sequences of 196 animal and plant genomes is available at http://primer3.ut.ee/. maido.remm@ut.ee. Supplementary data are available at Bioinformatics online.
BigWig and BigBed: enabling browsing of large distributed datasets.

PubMed

Kent, W J; Zweig, A S; Barber, G; Hinrichs, A S; Karolchik, D

2010-09-01

BigWig and BigBed files are compressed binary indexed files containing data at several resolutions that allow the high-performance display of next-generation sequencing experiment results in the UCSC Genome Browser. The visualization is implemented using a multi-layered software approach that takes advantage of specific capabilities of web-based protocols and Linux and UNIX operating systems files, R trees and various indexing and compression tricks. As a result, only the data needed to support the current browser view is transmitted rather than the entire file, enabling fast remote access to large distributed data sets. Binaries for the BigWig and BigBed creation and parsing utilities may be downloaded at http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/. Source code for the creation and visualization software is freely available for non-commercial use at http://hgdownload.cse.ucsc.edu/admin/jksrc.zip, implemented in C and supported on Linux. The UCSC Genome Browser is available at http://genome.ucsc.edu.
Fast scaffolding with small independent mixed integer programs

PubMed Central

Salmela, Leena; Mäkinen, Veli; Välimäki, Niko; Ylinen, Johannes; Ukkonen, Esko

2011-01-01

Motivation: Assembling genomes from short read data has become increasingly popular, but the problem remains computationally challenging especially for larger genomes. We study the scaffolding phase of sequence assembly where preassembled contigs are ordered based on mate pair data. Results: We present MIP Scaffolder that divides the scaffolding problem into smaller subproblems and solves these with mixed integer programming. The scaffolding problem can be represented as a graph and the biconnected components of this graph can be solved independently. We present a technique for restricting the size of these subproblems so that they can be solved accurately with mixed integer programming. We compare MIP Scaffolder to two state of the art methods, SOPRA and SSPACE. MIP Scaffolder is fast and produces better or as good scaffolds as its competitors on large genomes. Availability: The source code of MIP Scaffolder is freely available at http://www.cs.helsinki.fi/u/lmsalmel/mip-scaffolder/. Contact: leena.salmela@cs.helsinki.fi PMID:21998153
Biological knowledge bases using Wikis: combining the flexibility of Wikis with the structure of databases.

PubMed

Brohée, Sylvain; Barriot, Roland; Moreau, Yves

2010-09-01

In recent years, the number of knowledge bases developed using Wiki technology has exploded. Unfortunately, next to their numerous advantages, classical Wikis present a critical limitation: the invaluable knowledge they gather is represented as free text, which hinders their computational exploitation. This is in sharp contrast with the current practice for biological databases where the data is made available in a structured way. Here, we present WikiOpener an extension for the classical MediaWiki engine that augments Wiki pages by allowing on-the-fly querying and formatting resources external to the Wiki. Those resources may provide data extracted from databases or DAS tracks, or even results returned by local or remote bioinformatics analysis tools. This also implies that structured data can be edited via dedicated forms. Hence, this generic resource combines the structure of biological databases with the flexibility of collaborative Wikis. The source code and its documentation are freely available on the MediaWiki website: http://www.mediawiki.org/wiki/Extension:WikiOpener.
Azahar: a PyMOL plugin for construction, visualization and analysis of glycan molecules

NASA Astrophysics Data System (ADS)

Arroyuelo, Agustina; Vila, Jorge A.; Martin, Osvaldo A.

2016-08-01

Glycans are key molecules in many physiological and pathological processes. As with other molecules, like proteins, visualization of the 3D structures of glycans adds valuable information for understanding their biological function. Hence, here we introduce Azahar, a computing environment for the creation, visualization and analysis of glycan molecules. Azahar is implemented in Python and works as a plugin for the well known PyMOL package (Schrodinger in The PyMOL molecular graphics system, version 1.3r1, 2010). Besides the already available visualization and analysis options provided by PyMOL, Azahar includes 3 cartoon-like representations and tools for 3D structure caracterization such as a comformational search using a Monte Carlo with minimization routine and also tools to analyse single glycans or trajectories/ensembles including the calculation of radius of gyration, Ramachandran plots and hydrogen bonds. Azahar is freely available to download from http://www.pymolwiki.org/index.php/Azahar and the source code is available at https://github.com/agustinaarroyuelo/Azahar.
An Analysis of Scalable GPU-Based Ray-Guided Volume Rendering

PubMed Central

Fogal, Thomas; Schiewe, Alexander; Krüger, Jens

2014-01-01

Volume rendering continues to be a critical method for analyzing large-scale scalar fields, in disciplines as diverse as biomedical engineering and computational fluid dynamics. Commodity desktop hardware has struggled to keep pace with data size increases, challenging modern visualization software to deliver responsive interactions for O(N3) algorithms such as volume rendering. We target the data type common in these domains: regularly-structured data. In this work, we demonstrate that the major limitation of most volume rendering approaches is their inability to switch the data sampling rate (and thus data size) quickly. Using a volume renderer inspired by recent work, we demonstrate that the actual amount of visualizable data for a scene is typically bound considerably lower than the memory available on a commodity GPU. Our instrumented renderer is used to investigate design decisions typically swept under the rug in volume rendering literature. The renderer is freely available, with binaries for all major platforms as well as full source code, to encourage reproduction and comparison with future research. PMID:25506079
Excel2Genie: A Microsoft Excel application to improve the flexibility of the Genie-2000 Spectroscopic software.

PubMed

Forgács, Attila; Balkay, László; Trón, Lajos; Raics, Péter

2014-12-01

Excel2Genie, a simple and user-friendly Microsoft Excel interface, has been developed to the Genie-2000 Spectroscopic Software of Canberra Industries. This Excel application can directly control Canberra Multichannel Analyzer (MCA), process the acquired data and visualize them. Combination of Genie-2000 with Excel2Genie results in remarkably increased flexibility and a possibility to carry out repetitive data acquisitions even with changing parameters and more sophisticated analysis. The developed software package comprises three worksheets: display parameters and results of data acquisition, data analysis and mathematical operations carried out on the measured gamma spectra. At the same time it also allows control of these processes. Excel2Genie is freely available to assist gamma spectrum measurements and data evaluation by the interested Canberra users. With access to the Visual Basic Application (VBA) source code of this application users are enabled to modify the developed interface according to their intentions. Copyright © 2014 Elsevier Ltd. All rights reserved.
Phyx: phylogenetic tools for unix.

PubMed

Brown, Joseph W; Walker, Joseph F; Smith, Stephen A

2017-06-15

The ease with which phylogenomic data can be generated has drastically escalated the computational burden for even routine phylogenetic investigations. To address this, we present phyx : a collection of programs written in C ++ to explore, manipulate, analyze and simulate phylogenetic objects (alignments, trees and MCMC logs). Modelled after Unix/GNU/Linux command line tools, individual programs perform a single task and operate on standard I/O streams that can be piped to quickly and easily form complex analytical pipelines. Because of the stream-centric paradigm, memory requirements are minimized (often only a single tree or sequence in memory at any instance), and hence phyx is capable of efficiently processing very large datasets. phyx runs on POSIX-compliant operating systems. Source code, installation instructions, documentation and example files are freely available under the GNU General Public License at https://github.com/FePhyFoFum/phyx. eebsmith@umich.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
Phyx: phylogenetic tools for unix

PubMed Central

Brown, Joseph W.; Walker, Joseph F.; Smith, Stephen A.

2017-01-01

Abstract Summary: The ease with which phylogenomic data can be generated has drastically escalated the computational burden for even routine phylogenetic investigations. To address this, we present phyx: a collection of programs written in C ++ to explore, manipulate, analyze and simulate phylogenetic objects (alignments, trees and MCMC logs). Modelled after Unix/GNU/Linux command line tools, individual programs perform a single task and operate on standard I/O streams that can be piped to quickly and easily form complex analytical pipelines. Because of the stream-centric paradigm, memory requirements are minimized (often only a single tree or sequence in memory at any instance), and hence phyx is capable of efficiently processing very large datasets. Availability and Implementation: phyx runs on POSIX-compliant operating systems. Source code, installation instructions, documentation and example files are freely available under the GNU General Public License at https://github.com/FePhyFoFum/phyx Contact: eebsmith@umich.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28174903
G2S: a web-service for annotating genomic variants on 3D protein structures.

PubMed

Wang, Juexin; Sheridan, Robert; Sumer, S Onur; Schultz, Nikolaus; Xu, Dong; Gao, Jianjiong

2018-06-01

Accurately mapping and annotating genomic locations on 3D protein structures is a key step in structure-based analysis of genomic variants detected by recent large-scale sequencing efforts. There are several mapping resources currently available, but none of them provides a web API (Application Programming Interface) that supports programmatic access. We present G2S, a real-time web API that provides automated mapping of genomic variants on 3D protein structures. G2S can align genomic locations of variants, protein locations, or protein sequences to protein structures and retrieve the mapped residues from structures. G2S API uses REST-inspired design and it can be used by various clients such as web browsers, command terminals, programming languages and other bioinformatics tools for bringing 3D structures into genomic variant analysis. The webserver and source codes are freely available at https://g2s.genomenexus.org. g2s@genomenexus.org. Supplementary data are available at Bioinformatics online.
Faster sequence homology searches by clustering subsequences.

PubMed

Suzuki, Shuji; Kakuta, Masanori; Ishida, Takashi; Akiyama, Yutaka

2015-04-15

Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis. We developed a fast homology search method based on database subsequence clustering, and implemented it as GHOSTZ. This method clusters similar subsequences from a database to perform an efficient seed search and ungapped extension by reducing alignment candidates based on triangle inequality. The database subsequence clustering technique achieved an ∼2-fold increase in speed without a large decrease in search sensitivity. When we measured with metagenomic data, GHOSTZ is ∼2.2-2.8 times faster than RAPSearch and is ∼185-261 times faster than BLASTX. The source code is freely available for download at http://www.bi.cs.titech.ac.jp/ghostz/ akiyama@cs.titech.ac.jp Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.

DASS-GUI: a user interface for identification and analysis of significant patterns in non-sequential data.

PubMed

Hollunder, Jens; Friedel, Maik; Kuiper, Martin; Wilhelm, Thomas

2010-04-01

Many large 'omics' datasets have been published and many more are expected in the near future. New analysis methods are needed for best exploitation. We have developed a graphical user interface (GUI) for easy data analysis. Our discovery of all significant substructures (DASS) approach elucidates the underlying modularity, a typical feature of complex biological data. It is related to biclustering and other data mining approaches. Importantly, DASS-GUI also allows handling of multi-sets and calculation of statistical significances. DASS-GUI contains tools for further analysis of the identified patterns: analysis of the pattern hierarchy, enrichment analysis, module validation, analysis of additional numerical data, easy handling of synonymous names, clustering, filtering and merging. Different export options allow easy usage of additional tools such as Cytoscape. Source code, pre-compiled binaries for different systems, a comprehensive tutorial, case studies and many additional datasets are freely available at http://www.ifr.ac.uk/dass/gui/. DASS-GUI is implemented in Qt.
An interactive environment for the analysis of large Earth observation and model data sets

NASA Technical Reports Server (NTRS)

Bowman, Kenneth P.; Walsh, John E.; Wilhelmson, Robert B.

1994-01-01

Envision is an interactive environment that provides researchers in the earth sciences convenient ways to manage, browse, and visualize large observed or model data sets. Its main features are support for the netCDF and HDF file formats, an easy to use X/Motif user interface, a client-server configuration, and portability to many UNIX workstations. The Envision package also provides new ways to view and change metadata in a set of data files. It permits a scientist to conveniently and efficiently manage large data sets consisting of many data files. It also provides links to popular visualization tools so that data can be quickly browsed. Envision is a public domain package, freely available to the scientific community. Envision software (binaries and source code) and documentation can be obtained from either of these servers: ftp://vista.atmos.uiuc.edu/pub/envision/ and ftp://csrp.tamu.edu/pub/envision/. Detailed descriptions of Envision capabilities and operations can be found in the User's Guide and Reference Manuals distributed with Envision software.
Genome assembly from synthetic long read clouds

PubMed Central

Kuleshov, Volodymyr; Snyder, Michael P.; Batzoglou, Serafim

2016-01-01

Motivation: Despite rapid progress in sequencing technology, assembling de novo the genomes of new species as well as reconstructing complex metagenomes remains major technological challenges. New synthetic long read (SLR) technologies promise significant advances towards these goals; however, their applicability is limited by high sequencing requirements and the inability of current assembly paradigms to cope with combinations of short and long reads. Results: Here, we introduce Architect, a new de novo scaffolder aimed at SLR technologies. Unlike previous assembly strategies, Architect does not require a costly subassembly step; instead it assembles genomes directly from the SLR’s underlying short reads, which we refer to as read clouds. This enables a 4- to 20-fold reduction in sequencing requirements and a 5-fold increase in assembly contiguity on both genomic and metagenomic datasets relative to state-of-the-art assembly strategies aimed directly at fully subassembled long reads. Availability and Implementation: Our source code is freely available at https://github.com/kuleshov/architect. Contact: kuleshov@stanford.edu PMID:27307620
Brain Tumor Database, a free relational database for collection and analysis of brain tumor patient information.

PubMed

Bergamino, Maurizio; Hamilton, David J; Castelletti, Lara; Barletta, Laura; Castellan, Lucio

2015-03-01

In this study, we describe the development and utilization of a relational database designed to manage the clinical and radiological data of patients with brain tumors. The Brain Tumor Database was implemented using MySQL v.5.0, while the graphical user interface was created using PHP and HTML, thus making it easily accessible through a web browser. This web-based approach allows for multiple institutions to potentially access the database. The BT Database can record brain tumor patient information (e.g. clinical features, anatomical attributes, and radiological characteristics) and be used for clinical and research purposes. Analytic tools to automatically generate statistics and different plots are provided. The BT Database is a free and powerful user-friendly tool with a wide range of possible clinical and research applications in neurology and neurosurgery. The BT Database graphical user interface source code and manual are freely available at http://tumorsdatabase.altervista.org. © The Author(s) 2013.
libSRES: a C library for stochastic ranking evolution strategy for parameter estimation.

PubMed

Ji, Xinglai; Xu, Ying

2006-01-01

Estimation of kinetic parameters in a biochemical pathway or network represents a common problem in systems studies of biological processes. We have implemented a C library, named libSRES, to facilitate a fast implementation of computer software for study of non-linear biochemical pathways. This library implements a (mu, lambda)-ES evolutionary optimization algorithm that uses stochastic ranking as the constraint handling technique. Considering the amount of computing time it might require to solve a parameter-estimation problem, an MPI version of libSRES is provided for parallel implementation, as well as a simple user interface. libSRES is freely available and could be used directly in any C program as a library function. We have extensively tested the performance of libSRES on various pathway parameter-estimation problems and found its performance to be satisfactory. The source code (in C) is free for academic users at http://csbl.bmb.uga.edu/~jix/science/libSRES/
iFORM: Incorporating Find Occurrence of Regulatory Motifs.

PubMed

Ren, Chao; Chen, Hebing; Yang, Bite; Liu, Feng; Ouyang, Zhangyi; Bo, Xiaochen; Shu, Wenjie

2016-01-01

Accurately identifying the binding sites of transcription factors (TFs) is crucial to understanding the mechanisms of transcriptional regulation and human disease. We present incorporating Find Occurrence of Regulatory Motifs (iFORM), an easy-to-use and efficient tool for scanning DNA sequences with TF motifs described as position weight matrices (PWMs). Both performance assessment with a receiver operating characteristic (ROC) curve and a correlation-based approach demonstrated that iFORM achieves higher accuracy and sensitivity by integrating five classical motif discovery programs using Fisher's combined probability test. We have used iFORM to provide accurate results on a variety of data in the ENCODE Project and the NIH Roadmap Epigenomics Project, and the tool has demonstrated its utility in further elucidating individual roles of functional elements. Both the source and binary codes for iFORM can be freely accessed at https://github.com/wenjiegroup/iFORM. The identified TF binding sites across human cell and tissue types using iFORM have been deposited in the Gene Expression Omnibus under the accession ID GSE53962.
SVGMap: configurable image browser for experimental data.

PubMed

Rafael-Palou, Xavier; Schroeder, Michael P; Lopez-Bigas, Nuria

2012-01-01

Spatial data visualization is very useful to represent biological data and quickly interpret the results. For instance, to show the expression pattern of a gene in different tissues of a fly, an intuitive approach is to draw the fly with the corresponding tissues and color the expression of the gene in each of them. However, the creation of these visual representations may be a burdensome task. Here we present SVGMap, a java application that automatizes the generation of high-quality graphics for singular data items (e.g. genes) and biological conditions. SVGMap contains a browser that allows the user to navigate the different images created and can be used as a web-based results publishing tool. SVGMap is freely available as precompiled java package as well as source code at http://bg.upf.edu/svgmap. It requires Java 6 and any recent web browser with JavaScript enabled. The software can be run on Linux, Mac OS X and Windows systems. nuria.lopez@upf.edu
JASPAR RESTful API: accessing JASPAR data from any programming language.

PubMed

Khan, Aziz; Mathelier, Anthony

2018-05-01

JASPAR is a widely used open-access database of curated, non-redundant transcription factor binding profiles. Currently, data from JASPAR can be retrieved as flat files or by using programming language-specific interfaces. Here, we present a programming language-independent application programming interface (API) to access JASPAR data using the Representational State Transfer (REST) architecture. The REST API enables programmatic access to JASPAR by most programming languages and returns data in eight widely used formats. Several endpoints are available to access the data and an endpoint is available to infer the TF binding profile(s) likely bound by a given DNA binding domain protein sequence. Additionally, it provides an interactive browsable interface for bioinformatics tool developers. This REST API is implemented in Python using the Django REST Framework. It is accessible at http://jaspar.genereg.net/api/ and the source code is freely available at https://bitbucket.org/CBGR/jaspar under GPL v3 license. aziz.khan@ncmm.uio.no or anthony.mathelier@ncmm.uio.no. Supplementary data are available at Bioinformatics online.
Sigma: Strain-level inference of genomes from metagenomic analysis for biosurveillance

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ahn, Tae-Hyuk; Chai, Juanjuan; Pan, Chongle

Motivation: Metagenomic sequencing of clinical samples provides a promising technique for direct pathogen detection and characterization in biosurveillance. Taxonomic analysis at the strain level can be used to resolve serotypes of a pathogen in biosurveillance. Sigma was developed for strain-level identification and quantification of pathogens using their reference genomes based on metagenomic analysis. Results: Sigma provides not only accurate strain-level inferences, but also three unique capabilities: (i) Sigma quantifies the statistical uncertainty of its inferences, which includes hypothesis testing of identified genomes and confidence interval estimation of their relative abundances; (ii) Sigma enables strain variant calling by assigning metagenomic readsmore » to their most likely reference genomes; and (iii) Sigma supports parallel computing for fast analysis of large datasets. In conclusion, the algorithm performance was evaluated using simulated mock communities and fecal samples with spike-in pathogen strains. Availability and Implementation: Sigma was implemented in C++ with source codes and binaries freely available at http://sigma.omicsbio.org.« less
Sigma: Strain-level inference of genomes from metagenomic analysis for biosurveillance

DOE PAGES

Ahn, Tae-Hyuk; Chai, Juanjuan; Pan, Chongle

2014-09-29

Motivation: Metagenomic sequencing of clinical samples provides a promising technique for direct pathogen detection and characterization in biosurveillance. Taxonomic analysis at the strain level can be used to resolve serotypes of a pathogen in biosurveillance. Sigma was developed for strain-level identification and quantification of pathogens using their reference genomes based on metagenomic analysis. Results: Sigma provides not only accurate strain-level inferences, but also three unique capabilities: (i) Sigma quantifies the statistical uncertainty of its inferences, which includes hypothesis testing of identified genomes and confidence interval estimation of their relative abundances; (ii) Sigma enables strain variant calling by assigning metagenomic readsmore » to their most likely reference genomes; and (iii) Sigma supports parallel computing for fast analysis of large datasets. In conclusion, the algorithm performance was evaluated using simulated mock communities and fecal samples with spike-in pathogen strains. Availability and Implementation: Sigma was implemented in C++ with source codes and binaries freely available at http://sigma.omicsbio.org.« less
bwtool: a tool for bigWig files

PubMed Central

Pohl, Andy; Beato, Miguel

2014-01-01

BigWig files are a compressed, indexed, binary format for genome-wide signal data for calculations (e.g. GC percent) or experiments (e.g. ChIP-seq/RNA-seq read depth). bwtool is a tool designed to read bigWig files rapidly and efficiently, providing functionality for extracting data and summarizing it in several ways, globally or at specific regions. Additionally, the tool enables the conversion of the positions of signal data from one genome assembly to another, also known as ‘lifting’. We believe bwtool can be useful for the analyst frequently working with bigWig data, which is becoming a standard format to represent functional signals along genomes. The article includes supplementary examples of running the software. Availability and implementation: The C source code is freely available under the GNU public license v3 at http://cromatina.crg.eu/bwtool. Contact: andrew.pohl@crg.eu, andypohl@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24489365
TreSpEx—Detection of Misleading Signal in Phylogenetic Reconstructions Based on Tree Information

PubMed Central

Struck, Torsten H

2014-01-01

Phylogenies of species or genes are commonplace nowadays in many areas of comparative biological studies. However, for phylogenetic reconstructions one must refer to artificial signals such as paralogy, long-branch attraction, saturation, or conflict between different datasets. These signals might eventually mislead the reconstruction even in phylogenomic studies employing hundreds of genes. Unfortunately, there has been no program allowing the detection of such effects in combination with an implementation into automatic process pipelines. TreSpEx (Tree Space Explorer) now combines different approaches (including statistical tests), which utilize tree-based information like nodal support or patristic distances (PDs) to identify misleading signals. The program enables the parallel analysis of hundreds of trees and/or predefined gene partitions, and being command-line driven, it can be integrated into automatic process pipelines. TreSpEx is implemented in Perl and supported on Linux, Mac OS X, and MS Windows. Source code, binaries, and additional material are freely available at http://www.annelida.de/research/bioinformatics/software.html. PMID:24701118
Cox-nnet: An artificial neural network method for prognosis prediction of high-throughput omics data

PubMed Central

Ching, Travers; Zhu, Xun

2018-01-01

Artificial neural networks (ANN) are computing architectures with many interconnections of simple neural-inspired computing elements, and have been applied to biomedical fields such as imaging analysis and diagnosis. We have developed a new ANN framework called Cox-nnet to predict patient prognosis from high throughput transcriptomics data. In 10 TCGA RNA-Seq data sets, Cox-nnet achieves the same or better predictive accuracy compared to other methods, including Cox-proportional hazards regression (with LASSO, ridge, and mimimax concave penalty), Random Forests Survival and CoxBoost. Cox-nnet also reveals richer biological information, at both the pathway and gene levels. The outputs from the hidden layer node provide an alternative approach for survival-sensitive dimension reduction. In summary, we have developed a new method for accurate and efficient prognosis prediction on high throughput data, with functional biological insights. The source code is freely available at https://github.com/lanagarmire/cox-nnet. PMID:29634719
DOSE: an R/Bioconductor package for disease ontology semantic and enrichment analysis.

PubMed

Yu, Guangchuang; Wang, Li-Gen; Yan, Guang-Rong; He, Qing-Yu

2015-02-15

Disease ontology (DO) annotates human genes in the context of disease. DO is important annotation in translating molecular findings from high-throughput data to clinical relevance. DOSE is an R package providing semantic similarity computations among DO terms and genes which allows biologists to explore the similarities of diseases and of gene functions in disease perspective. Enrichment analyses including hypergeometric model and gene set enrichment analysis are also implemented to support discovering disease associations of high-throughput biological data. This allows biologists to verify disease relevance in a biological experiment and identify unexpected disease associations. Comparison among gene clusters is also supported. DOSE is released under Artistic-2.0 License. The source code and documents are freely available through Bioconductor (http://www.bioconductor.org/packages/release/bioc/html/DOSE.html). Supplementary data are available at Bioinformatics online. gcyu@connect.hku.hk or tqyhe@jnu.edu.cn. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Charged particle tracking through electrostatic wire meshes using the finite element method

DOE Office of Scientific and Technical Information (OSTI.GOV)

Devlin, L. J.; Karamyshev, O.; Welsch, C. P., E-mail: carsten.welsch@cockcroft.ac.uk

Wire meshes are used across many disciplines to accelerate and focus charged particles, however, analytical solutions are non-exact and few codes exist which simulate the exact fields around a mesh with physical sizes. A tracking code based in Matlab-Simulink using field maps generated using finite element software has been developed which tracks electrons or ions through electrostatic wire meshes. The fields around such a geometry are presented as an analytical expression using several basic assumptions, however, it is apparent that computational calculations are required to obtain realistic values of electric potential and fields, particularly when multiple wire meshes are deployed.more » The tracking code is flexible in that any quantitatively describable particle distribution can be used for both electrons and ions as well as other benefits such as ease of export to other programs for analysis. The code is made freely available and physical examples are highlighted where this code could be beneficial for different applications.« less
LabKey Server: an open source platform for scientific data integration, analysis and collaboration.

PubMed

Nelson, Elizabeth K; Piehler, Britt; Eckels, Josh; Rauch, Adam; Bellew, Matthew; Hussey, Peter; Ramsay, Sarah; Nathe, Cory; Lum, Karl; Krouse, Kevin; Stearns, David; Connolly, Brian; Skillman, Tom; Igra, Mark

2011-03-09

Broad-based collaborations are becoming increasingly common among disease researchers. For example, the Global HIV Enterprise has united cross-disciplinary consortia to speed progress towards HIV vaccines through coordinated research across the boundaries of institutions, continents and specialties. New, end-to-end software tools for data and specimen management are necessary to achieve the ambitious goals of such alliances. These tools must enable researchers to organize and integrate heterogeneous data early in the discovery process, standardize processes, gain new insights into pooled data and collaborate securely. To meet these needs, we enhanced the LabKey Server platform, formerly known as CPAS. This freely available, open source software is maintained by professional engineers who use commercially proven practices for software development and maintenance. Recent enhancements support: (i) Submitting specimens requests across collaborating organizations (ii) Graphically defining new experimental data types, metadata and wizards for data collection (iii) Transitioning experimental results from a multiplicity of spreadsheets to custom tables in a shared database (iv) Securely organizing, integrating, analyzing, visualizing and sharing diverse data types, from clinical records to specimens to complex assays (v) Interacting dynamically with external data sources (vi) Tracking study participants and cohorts over time (vii) Developing custom interfaces using client libraries (viii) Authoring custom visualizations in a built-in R scripting environment. Diverse research organizations have adopted and adapted LabKey Server, including consortia within the Global HIV Enterprise. Atlas is an installation of LabKey Server that has been tailored to serve these consortia. It is in production use and demonstrates the core capabilities of LabKey Server. Atlas now has over 2,800 active user accounts originating from approximately 36 countries and 350 organizations. It tracks roughly 27,000 assay runs, 860,000 specimen vials and 1,300,000 vial transfers. Sharing data, analysis tools and infrastructure can speed the efforts of large research consortia by enhancing efficiency and enabling new insights. The Atlas installation of LabKey Server demonstrates the utility of the LabKey platform for collaborative research. Stable, supported builds of LabKey Server are freely available for download at http://www.labkey.org. Documentation and source code are available under the Apache License 2.0.
LabKey Server: An open source platform for scientific data integration, analysis and collaboration

PubMed Central

2011-01-01

Background Broad-based collaborations are becoming increasingly common among disease researchers. For example, the Global HIV Enterprise has united cross-disciplinary consortia to speed progress towards HIV vaccines through coordinated research across the boundaries of institutions, continents and specialties. New, end-to-end software tools for data and specimen management are necessary to achieve the ambitious goals of such alliances. These tools must enable researchers to organize and integrate heterogeneous data early in the discovery process, standardize processes, gain new insights into pooled data and collaborate securely. Results To meet these needs, we enhanced the LabKey Server platform, formerly known as CPAS. This freely available, open source software is maintained by professional engineers who use commercially proven practices for software development and maintenance. Recent enhancements support: (i) Submitting specimens requests across collaborating organizations (ii) Graphically defining new experimental data types, metadata and wizards for data collection (iii) Transitioning experimental results from a multiplicity of spreadsheets to custom tables in a shared database (iv) Securely organizing, integrating, analyzing, visualizing and sharing diverse data types, from clinical records to specimens to complex assays (v) Interacting dynamically with external data sources (vi) Tracking study participants and cohorts over time (vii) Developing custom interfaces using client libraries (viii) Authoring custom visualizations in a built-in R scripting environment. Diverse research organizations have adopted and adapted LabKey Server, including consortia within the Global HIV Enterprise. Atlas is an installation of LabKey Server that has been tailored to serve these consortia. It is in production use and demonstrates the core capabilities of LabKey Server. Atlas now has over 2,800 active user accounts originating from approximately 36 countries and 350 organizations. It tracks roughly 27,000 assay runs, 860,000 specimen vials and 1,300,000 vial transfers. Conclusions Sharing data, analysis tools and infrastructure can speed the efforts of large research consortia by enhancing efficiency and enabling new insights. The Atlas installation of LabKey Server demonstrates the utility of the LabKey platform for collaborative research. Stable, supported builds of LabKey Server are freely available for download at http://www.labkey.org. Documentation and source code are available under the Apache License 2.0. PMID:21385461
AphasiaBank: a resource for clinicians.

PubMed

Forbes, Margaret M; Fromm, Davida; Macwhinney, Brian

2012-08-01

AphasiaBank is a shared, multimedia database containing videos and transcriptions of ~180 aphasic individuals and 140 nonaphasic controls performing a uniform set of discourse tasks. The language in the videos is transcribed in Codes for the Human Analysis of Transcripts (CHAT) format and coded for analysis with Computerized Language ANalysis (CLAN) programs, which can perform a wide variety of language analyses. The database and the CLAN programs are freely available to aphasia researchers and clinicians for educational, clinical, and scholarly uses. This article describes the database, suggests some ways in which clinicians and clinician researchers might find these materials useful, and introduces a new language analysis program, EVAL, designed to streamline the transcription and coding processes, while still producing an extensive and useful language profile. Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.
Inventory of Data Sources for Estimating Health Care Costs in the United States

PubMed Central

Lund, Jennifer L.; Yabroff, K. Robin; Ibuka, Yoko; Russell, Louise B.; Barnett, Paul G.; Lipscomb, Joseph; Lawrence, William F.; Brown, Martin L.

2011-01-01

Objective To develop an inventory of data sources for estimating health care costs in the United States and provide information to aid researchers in identifying appropriate data sources for their specific research questions. Methods We identified data sources for estimating health care costs using 3 approaches: (1) a review of the 18 articles included in this supplement, (2) an evaluation of websites of federal government agencies, non profit foundations, and related societies that support health care research or provide health care services, and (3) a systematic review of the recently published literature. Descriptive information was abstracted from each data source, including sponsor, website, lowest level of data aggregation, type of data source, population included, cross-sectional or longitudinal data capture, source of diagnosis information, and cost of obtaining the data source. Details about the cost elements available in each data source were also abstracted. Results We identified 88 data sources that can be used to estimate health care costs in the United States. Most data sources were sponsored by government agencies, national or nationally representative, and cross-sectional. About 40% were surveys, followed by administrative or linked administrative data, fee or cost schedules, discharges, and other types of data. Diagnosis information was available in most data sources through procedure or diagnosis codes, self-report, registry, or chart review. Cost elements included inpatient hospitalizations (42.0%), physician and other outpatient services (45.5%), outpatient pharmacy or laboratory (28.4%), out-of-pocket (22.7%), patient time and other direct nonmedical costs (35.2%), and wages (13.6%). About half were freely available for downloading or available for a nominal fee, and the cost of obtaining the remaining data sources varied by the scope of the project. Conclusions Available data sources vary in population included, type of data source, scope, and accessibility, and have different strengths and weaknesses for specific research questions. PMID:19536009
Syndrome-source-coding and its universal generalization. [error correcting codes for data compression

NASA Technical Reports Server (NTRS)

Ancheta, T. C., Jr.

1976-01-01

A method of using error-correcting codes to obtain data compression, called syndrome-source-coding, is described in which the source sequence is treated as an error pattern whose syndrome forms the compressed data. It is shown that syndrome-source-coding can achieve arbitrarily small distortion with the number of compressed digits per source digit arbitrarily close to the entropy of a binary memoryless source. A 'universal' generalization of syndrome-source-coding is formulated which provides robustly effective distortionless coding of source ensembles. Two examples are given, comparing the performance of noiseless universal syndrome-source-coding to (1) run-length coding and (2) Lynch-Davisson-Schalkwijk-Cover universal coding for an ensemble of binary memoryless sources.

Skyline: an open source document editor for creating and analyzing targeted proteomics experiments

PubMed Central

MacLean, Brendan; Tomazela, Daniela M.; Shulman, Nicholas; Chambers, Matthew; Finney, Gregory L.; Frewen, Barbara; Kern, Randall; Tabb, David L.; Liebler, Daniel C.; MacCoss, Michael J.

2010-01-01

Summary: Skyline is a Windows client application for targeted proteomics method creation and quantitative data analysis. It is open source and freely available for academic and commercial use. The Skyline user interface simplifies the development of mass spectrometer methods and the analysis of data from targeted proteomics experiments performed using selected reaction monitoring (SRM). Skyline supports using and creating MS/MS spectral libraries from a wide variety of sources to choose SRM filters and verify results based on previously observed ion trap data. Skyline exports transition lists to and imports the native output files from Agilent, Applied Biosystems, Thermo Fisher Scientific and Waters triple quadrupole instruments, seamlessly connecting mass spectrometer output back to the experimental design document. The fast and compact Skyline file format is easily shared, even for experiments requiring many sample injections. A rich array of graphs displays results and provides powerful tools for inspecting data integrity as data are acquired, helping instrument operators to identify problems early. The Skyline dynamic report designer exports tabular data from the Skyline document model for in-depth analysis with common statistical tools. Availability: Single-click, self-updating web installation is available at http://proteome.gs.washington.edu/software/skyline. This web site also provides access to instructional videos, a support board, an issues list and a link to the source code project. Contact: brendanx@u.washington.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20147306
Advantages and Disadvantages in Image Processing with Free Software in Radiology.

PubMed

Mujika, Katrin Muradas; Méndez, Juan Antonio Juanes; de Miguel, Andrés Framiñan

2018-01-15

Currently, there are sophisticated applications that make it possible to visualize medical images and even to manipulate them. These software applications are of great interest, both from a teaching and a radiological perspective. In addition, some of these applications are known as Free Open Source Software because they are free and the source code is freely available, and therefore it can be easily obtained even on personal computers. Two examples of free open source software are Osirix Lite® and 3D Slicer®. However, this last group of free applications have limitations in its use. For the radiological field, manipulating and post-processing images is increasingly important. Consequently, sophisticated computing tools that combine software and hardware to process medical images are needed. In radiology, graphic workstations allow their users to process, review, analyse, communicate and exchange multidimensional digital images acquired with different image-capturing radiological devices. These radiological devices are basically CT (Computerised Tomography), MRI (Magnetic Resonance Imaging), PET (Positron Emission Tomography), etc. Nevertheless, the programs included in these workstations have a high cost which always depends on the software provider and is always subject to its norms and requirements. With this study, we aim to present the advantages and disadvantages of these radiological image visualization systems in the advanced management of radiological studies. We will compare the features of the VITREA2® and AW VolumeShare 5® radiology workstation with free open source software applications like OsiriX® and 3D Slicer®, with examples from specific studies.
iCAVE: an open source tool for visualizing biomolecular networks in 3D, stereoscopic 3D and immersive 3D

PubMed Central

Liluashvili, Vaja; Kalayci, Selim; Fluder, Eugene; Wilson, Manda; Gabow, Aaron

2017-01-01

Abstract Visualizations of biomolecular networks assist in systems-level data exploration in many cellular processes. Data generated from high-throughput experiments increasingly inform these networks, yet current tools do not adequately scale with concomitant increase in their size and complexity. We present an open source software platform, interactome-CAVE (iCAVE), for visualizing large and complex biomolecular interaction networks in 3D. Users can explore networks (i) in 3D using a desktop, (ii) in stereoscopic 3D using 3D-vision glasses and a desktop, or (iii) in immersive 3D within a CAVE environment. iCAVE introduces 3D extensions of known 2D network layout, clustering, and edge-bundling algorithms, as well as new 3D network layout algorithms. Furthermore, users can simultaneously query several built-in databases within iCAVE for network generation or visualize their own networks (e.g., disease, drug, protein, metabolite). iCAVE has modular structure that allows rapid development by addition of algorithms, datasets, or features without affecting other parts of the code. Overall, iCAVE is the first freely available open source tool that enables 3D (optionally stereoscopic or immersive) visualizations of complex, dense, or multi-layered biomolecular networks. While primarily designed for researchers utilizing biomolecular networks, iCAVE can assist researchers in any field. PMID:28814063
iCAVE: an open source tool for visualizing biomolecular networks in 3D, stereoscopic 3D and immersive 3D.

PubMed

Liluashvili, Vaja; Kalayci, Selim; Fluder, Eugene; Wilson, Manda; Gabow, Aaron; Gümüs, Zeynep H

2017-08-01

Visualizations of biomolecular networks assist in systems-level data exploration in many cellular processes. Data generated from high-throughput experiments increasingly inform these networks, yet current tools do not adequately scale with concomitant increase in their size and complexity. We present an open source software platform, interactome-CAVE (iCAVE), for visualizing large and complex biomolecular interaction networks in 3D. Users can explore networks (i) in 3D using a desktop, (ii) in stereoscopic 3D using 3D-vision glasses and a desktop, or (iii) in immersive 3D within a CAVE environment. iCAVE introduces 3D extensions of known 2D network layout, clustering, and edge-bundling algorithms, as well as new 3D network layout algorithms. Furthermore, users can simultaneously query several built-in databases within iCAVE for network generation or visualize their own networks (e.g., disease, drug, protein, metabolite). iCAVE has modular structure that allows rapid development by addition of algorithms, datasets, or features without affecting other parts of the code. Overall, iCAVE is the first freely available open source tool that enables 3D (optionally stereoscopic or immersive) visualizations of complex, dense, or multi-layered biomolecular networks. While primarily designed for researchers utilizing biomolecular networks, iCAVE can assist researchers in any field. © The Authors 2017. Published by Oxford University Press.
CoNekT: an open-source framework for comparative genomic and transcriptomic network analyses.

PubMed

Proost, Sebastian; Mutwil, Marek

2018-05-01

The recent accumulation of gene expression data in the form of RNA sequencing creates unprecedented opportunities to study gene regulation and function. Furthermore, comparative analysis of the expression data from multiple species can elucidate which functional gene modules are conserved across species, allowing the study of the evolution of these modules. However, performing such comparative analyses on raw data is not feasible for many biologists. Here, we present CoNekT (Co-expression Network Toolkit), an open source web server, that contains user-friendly tools and interactive visualizations for comparative analyses of gene expression data and co-expression networks. These tools allow analysis and cross-species comparison of (i) gene expression profiles; (ii) co-expression networks; (iii) co-expressed clusters involved in specific biological processes; (iv) tissue-specific gene expression; and (v) expression profiles of gene families. To demonstrate these features, we constructed CoNekT-Plants for green alga, seed plants and flowering plants (Picea abies, Chlamydomonas reinhardtii, Vitis vinifera, Arabidopsis thaliana, Oryza sativa, Zea mays and Solanum lycopersicum) and thus provide a web-tool with the broadest available collection of plant phyla. CoNekT-Plants is freely available from http://conekt.plant.tools, while the CoNekT source code and documentation can be found at https://github.molgen.mpg.de/proost/CoNekT/.
Identification of common, unique and polymorphic microsatellites among 73 cyanobacterial genomes.

PubMed

Kabra, Ritika; Kapil, Aditi; Attarwala, Kherunnisa; Rai, Piyush Kant; Shanker, Asheesh

2016-04-01

Microsatellites also known as Simple Sequence Repeats are short tandem repeats of 1-6 nucleotides. These repeats are found in coding as well as non-coding regions of both prokaryotic and eukaryotic genomes and play a significant role in the study of gene regulation, genetic mapping, DNA fingerprinting and evolutionary studies. The availability of 73 complete genome sequences of cyanobacteria enabled us to mine and statistically analyze microsatellites in these genomes. The cyanobacterial microsatellites identified through bioinformatics analysis were stored in a user-friendly database named CyanoSat, which is an efficient data representation and query system designed using ASP.net. The information in CyanoSat comprises of perfect, imperfect and compound microsatellites found in coding, non-coding and coding-non-coding regions. Moreover, it contains PCR primers with 200 nucleotides long flanking region. The mined cyanobacterial microsatellites can be freely accessed at www.compubio.in/CyanoSat/home.aspx. In addition to this 82 polymorphic, 13,866 unique and 2390 common microsatellites were also detected. These microsatellites will be useful in strain identification and genetic diversity studies of cyanobacteria.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Gabriele, Fatuzzo; Michele, Mangiameli, E-mail: amichele.mangiameli@dica.unict.it; Giuseppe, Mussumeci

The laser scanning is a technology that allows in a short time to run the relief geometric objects with a high level of detail and completeness, based on the signal emitted by the laser and the corresponding return signal. When the incident laser radiation hits the object to detect, then the radiation is reflected. The purpose is to build a three-dimensional digital model that allows to reconstruct the reality of the object and to conduct studies regarding the design, restoration and/or conservation. When the laser scanner is equipped with a digital camera, the result of the measurement process is amore » set of points in XYZ coordinates showing a high density and accuracy with radiometric and RGB tones. In this case, the set of measured points is called “point cloud” and allows the reconstruction of the Digital Surface Model. Even the post-processing is usually performed by closed source software, which is characterized by Copyright restricting the free use, free and open source software can increase the performance by far. Indeed, this latter can be freely used providing the possibility to display and even custom the source code. The experience started at the Faculty of Engineering in Catania is aimed at finding a valuable free and open source tool, MeshLab (Italian Software for data processing), to be compared with a reference closed source software for data processing, i.e. RapidForm. In this work, we compare the results obtained with MeshLab and Rapidform through the planning of the survey and the acquisition of the point cloud of a morphologically complex statue.« less
[Drug consumption and risks of polypharmacotherapy in elderly population].

PubMed

Kubesová, H; Holík, J; Weber, P; Polcarová, V; Matejovský, J; Mazalová, K; Slapák, J

2006-01-01

One of our previous studies was aimed at the consumption of prescribed drugs by the elderly population. The average per day number of drugs was 4.6 (maximum 13). Existence of freely obtainable drugs with massive advertisements brings a question, how many of those drugs it is necessary to add in order to estimate probability of interaction and undesirable drug effects. In order to achieve valid information, students of the sixth year of General medicine program during their practical course at general practitioners were asked to interview randomly selected senior patients. They asked on the number, type, and price of freely obtainable drugs which they use. Data were evaluated from interviews accomplished during academic years 2001/2002 and 2004/2005. Our cohort included 252 men and 148 women with average age of 78.7 years. Average number of freely obtainable drugs was 2.26 at the beginning and 2.32 at the end of study. Only 34% of questioned did not buy any of those drugs at all or only exceptionally, 66% reported buying once a month or weekly. 44% of seniors buy analgetics, 58% buy vitamins, 37% food supplements, 36% non steroid antirheumatics, 46% cold prevention drugs, 30% anti-constipation drugs. Contrary to our expectation, positive correlation between the sums given for the personal participation on the drug costs and that given for freely obtainable drugs was found. It is not possible to expect, that polymorbidic patient with several prescribed drugs would buy less of freely obtainable drugs even due to the financial requirements. Freely obtainable drugs, many of them composites, can represent significant source of interactions and undesirable drug effects. They can also significantly modulate compliance of the senior. The high percentage of seniors buying freely obtainable drugs requires aimed questions on the pharmacological history.
Comparison of two freely available software packages for mass spectrometry imaging data analysis using brains from morphine addicted rats.

PubMed

Bodzon-Kulakowska, Anna; Marszalek-Grabska, Marta; Antolak, Anna; Drabik, Anna; Kotlinska, Jolanta H; Suder, Piotr

Data analysis from mass spectrometry imaging (MSI) imaging experiments is a very complex task. Most of the software packages devoted to this purpose are designed by the mass spectrometer manufacturers and, thus, are not freely available. Laboratories developing their own MS-imaging sources usually do not have access to the commercial software, and they must rely on the freely available programs. The most recognized ones are BioMap, developed by Novartis under Interactive Data Language (IDL), and Datacube, developed by the Dutch Foundation for Fundamental Research of Matter (FOM-Amolf). These two systems were used here for the analysis of images received from rat brain tissues subjected to morphine influence and their capabilities were compared in terms of ease of use and the quality of obtained results.
QuantWorm: a comprehensive software package for Caenorhabditis elegans phenotypic assays.

PubMed

Jung, Sang-Kyu; Aleman-Meza, Boanerges; Riepe, Celeste; Zhong, Weiwei

2014-01-01

Phenotypic assays are crucial in genetics; however, traditional methods that rely on human observation are unsuitable for quantitative, large-scale experiments. Furthermore, there is an increasing need for comprehensive analyses of multiple phenotypes to provide multidimensional information. Here we developed an automated, high-throughput computer imaging system for quantifying multiple Caenorhabditis elegans phenotypes. Our imaging system is composed of a microscope equipped with a digital camera and a motorized stage connected to a computer running the QuantWorm software package. Currently, the software package contains one data acquisition module and four image analysis programs: WormLifespan, WormLocomotion, WormLength, and WormEgg. The data acquisition module collects images and videos. The WormLifespan software counts the number of moving worms by using two time-lapse images; the WormLocomotion software computes the velocity of moving worms; the WormLength software measures worm body size; and the WormEgg software counts the number of eggs. To evaluate the performance of our software, we compared the results of our software with manual measurements. We then demonstrated the application of the QuantWorm software in a drug assay and a genetic assay. Overall, the QuantWorm software provided accurate measurements at a high speed. Software source code, executable programs, and sample images are available at www.quantworm.org. Our software package has several advantages over current imaging systems for C. elegans. It is an all-in-one package for quantifying multiple phenotypes. The QuantWorm software is written in Java and its source code is freely available, so it does not require use of commercial software or libraries. It can be run on multiple platforms and easily customized to cope with new methods and requirements.
Track-A-Worm, An Open-Source System for Quantitative Assessment of C. elegans Locomotory and Bending Behavior

PubMed Central

Wang, Sijie Jason; Wang, Zhao-Wen

2013-01-01

A major challenge of neuroscience is to understand the circuit and gene bases of behavior. C. elegans is commonly used as a model system to investigate how various gene products function at specific tissue, cellular, and synaptic foci to produce complicated locomotory and bending behavior. The investigation generally requires quantitative behavioral analyses using an automated single-worm tracker, which constantly records and analyzes the position and body shape of a freely moving worm at a high magnification. Many single-worm trackers have been developed to meet lab-specific needs, but none has been widely implemented for various reasons, such as hardware difficult to assemble, and software lacking sufficient functionality, having closed source code, or using a programming language that is not broadly accessible. The lack of a versatile system convenient for wide implementation makes data comparisons difficult and compels other labs to develop new worm trackers. Here we describe Track-A-Worm, a system rich in functionality, open in source code, and easy to use. The system includes plug-and-play hardware (a stereomicroscope, a digital camera and a motorized stage), custom software written to run with Matlab in Windows 7, and a detailed user manual. Grayscale images are automatically converted to binary images followed by head identification and placement of 13 markers along a deduced spline. The software can extract and quantify a variety of parameters, including distance traveled, average speed, distance/time/speed of forward and backward locomotion, frequency and amplitude of dominant bends, overall bending activities measured as root mean square, and sum of all bends. It also plots worm travel path, bend trace, and bend frequency spectrum. All functionality is performed through graphical user interfaces and data is exported to clearly-annotated and documented Excel files. These features make Track-A-Worm a good candidate for implementation in other labs. PMID:23922769
Fast tandem mass spectra-based protein identification regardless of the number of spectra or potential modifications examined.

PubMed

Falkner, Jayson; Andrews, Philip

2005-05-15

Comparing tandem mass spectra (MSMS) against a known dataset of protein sequences is a common method for identifying unknown proteins; however, the processing of MSMS by current software often limits certain applications, including comprehensive coverage of post-translational modifications, non-specific searches and real-time searches to allow result-dependent instrument control. This problem deserves attention as new mass spectrometers provide the ability for higher throughput and as known protein datasets rapidly grow in size. New software algorithms need to be devised in order to address the performance issues of conventional MSMS protein dataset-based protein identification. This paper describes a novel algorithm based on converting a collection of monoisotopic, centroided spectra to a new data structure, named 'peptide finite state machine' (PFSM), which may be used to rapidly search a known dataset of protein sequences, regardless of the number of spectra searched or the number of potential modifications examined. The algorithm is verified using a set of commercially available tryptic digest protein standards analyzed using an ABI 4700 MALDI TOFTOF mass spectrometer, and a free, open source PFSM implementation. It is illustrated that a PFSM can accurately search large collections of spectra against large datasets of protein sequences (e.g. NCBI nr) using a regular desktop PC; however, this paper only details the method for identifying peptide and subsequently protein candidates from a dataset of known protein sequences. The concept of using a PFSM as a peptide pre-screening technique for MSMS-based search engines is validated by using PFSM with Mascot and XTandem. Complete source code, documentation and examples for the reference PFSM implementation are freely available at the Proteome Commons, http://www.proteomecommons.org and source code may be used both commercially and non-commercially as long as the original authors are credited for their work.
A suite of exercises for verifying dynamic earthquake rupture codes

USGS Publications Warehouse

Harris, Ruth A.; Barall, Michael; Aagaard, Brad T.; Ma, Shuo; Roten, Daniel; Olsen, Kim B.; Duan, Benchun; Liu, Dunyu; Luo, Bin; Bai, Kangchen; Ampuero, Jean-Paul; Kaneko, Yoshihiro; Gabriel, Alice-Agnes; Duru, Kenneth; Ulrich, Thomas; Wollherr, Stephanie; Shi, Zheqiang; Dunham, Eric; Bydlon, Sam; Zhang, Zhenguo; Chen, Xiaofei; Somala, Surendra N.; Pelties, Christian; Tago, Josue; Cruz-Atienza, Victor Manuel; Kozdon, Jeremy; Daub, Eric; Aslam, Khurram; Kase, Yuko; Withers, Kyle; Dalguer, Luis

2018-01-01

We describe a set of benchmark exercises that are designed to test if computer codes that simulate dynamic earthquake rupture are working as intended. These types of computer codes are often used to understand how earthquakes operate, and they produce simulation results that include earthquake size, amounts of fault slip, and the patterns of ground shaking and crustal deformation. The benchmark exercises examine a range of features that scientists incorporate in their dynamic earthquake rupture simulations. These include implementations of simple or complex fault geometry, off‐fault rock response to an earthquake, stress conditions, and a variety of formulations for fault friction. Many of the benchmarks were designed to investigate scientific problems at the forefronts of earthquake physics and strong ground motions research. The exercises are freely available on our website for use by the scientific community.
Use of Generalized Fluid System Simulation Program (GFSSP) for Teaching and Performing Senior Design Projects at the Educational Institutions

NASA Technical Reports Server (NTRS)

Majumdar, A. K.; Hedayat, A.

2015-01-01

This paper describes the experience of the authors in using the Generalized Fluid System Simulation Program (GFSSP) in teaching Design of Thermal Systems class at University of Alabama in Huntsville. GFSSP is a finite volume based thermo-fluid system network analysis code, developed at NASA/Marshall Space Flight Center, and is extensively used in NASA, Department of Defense, and aerospace industries for propulsion system design, analysis, and performance evaluation. The educational version of GFSSP is freely available to all US higher education institutions. The main purpose of the paper is to illustrate the utilization of this user-friendly code for the thermal systems design and fluid engineering courses and to encourage the instructors to utilize the code for the class assignments as well as senior design projects.
Dedicated vertical wind tunnel for the study of sedimentation of non-spherical particles.

PubMed

Bagheri, G H; Bonadonna, C; Manzella, I; Pontelandolfo, P; Haas, P

2013-05-01

A dedicated 4-m-high vertical wind tunnel has been designed and constructed at the University of Geneva in collaboration with the Groupe de compétence en mécanique des fluides et procédés énergétiques. With its diverging test section, the tunnel is designed to study the aero-dynamical behavior of non-spherical particles with terminal velocities between 5 and 27 ms(-1). A particle tracking velocimetry (PTV) code is developed to calculate drag coefficient of particles in standard conditions based on the real projected area of the particles. Results of our wind tunnel and PTV code are validated by comparing drag coefficient of smooth spherical particles and cylindrical particles to existing literature. Experiments are repeatable with average relative standard deviation of 1.7%. Our preliminary experiments on the effect of particle to fluid density ratio on drag coefficient of cylindrical particles show that the drag coefficient of freely suspended particles in air is lower than those measured in water or in horizontal wind tunnels. It is found that increasing aspect ratio of cylindrical particles reduces their secondary motions and they tend to be suspended with their maximum area normal to the airflow. The use of the vertical wind tunnel in combination with the PTV code provides a reliable and precise instrument for measuring drag coefficient of freely moving particles of various shapes. Our ultimate goal is the study of sedimentation and aggregation of volcanic particles (density between 500 and 2700 kgm(-3)) but the wind tunnel can be used in a wide range of applications.
Syndrome source coding and its universal generalization

NASA Technical Reports Server (NTRS)

Ancheta, T. C., Jr.

1975-01-01

A method of using error-correcting codes to obtain data compression, called syndrome-source-coding, is described in which the source sequence is treated as an error pattern whose syndrome forms the compressed data. It is shown that syndrome-source-coding can achieve arbitrarily small distortion with the number of compressed digits per source digit arbitrarily close to the entropy of a binary memoryless source. A universal generalization of syndrome-source-coding is formulated which provides robustly-effective, distortionless, coding of source ensembles.
Random sampling of elementary flux modes in large-scale metabolic networks.

PubMed

Machado, Daniel; Soons, Zita; Patil, Kiran Raosaheb; Ferreira, Eugénio C; Rocha, Isabel

2012-09-15

The description of a metabolic network in terms of elementary (flux) modes (EMs) provides an important framework for metabolic pathway analysis. However, their application to large networks has been hampered by the combinatorial explosion in the number of modes. In this work, we develop a method for generating random samples of EMs without computing the whole set. Our algorithm is an adaptation of the canonical basis approach, where we add an additional filtering step which, at each iteration, selects a random subset of the new combinations of modes. In order to obtain an unbiased sample, all candidates are assigned the same probability of getting selected. This approach avoids the exponential growth of the number of modes during computation, thus generating a random sample of the complete set of EMs within reasonable time. We generated samples of different sizes for a metabolic network of Escherichia coli, and observed that they preserve several properties of the full EM set. It is also shown that EM sampling can be used for rational strain design. A well distributed sample, that is representative of the complete set of EMs, should be suitable to most EM-based methods for analysis and optimization of metabolic networks. Source code for a cross-platform implementation in Python is freely available at http://code.google.com/p/emsampler. dmachado@deb.uminho.pt Supplementary data are available at Bioinformatics online.
Vfold: a web server for RNA structure and folding thermodynamics prediction.

PubMed

Xu, Xiaojun; Zhao, Peinan; Chen, Shi-Jie

2014-01-01

The ever increasing discovery of non-coding RNAs leads to unprecedented demand for the accurate modeling of RNA folding, including the predictions of two-dimensional (base pair) and three-dimensional all-atom structures and folding stabilities. Accurate modeling of RNA structure and stability has far-reaching impact on our understanding of RNA functions in human health and our ability to design RNA-based therapeutic strategies. The Vfold server offers a web interface to predict (a) RNA two-dimensional structure from the nucleotide sequence, (b) three-dimensional structure from the two-dimensional structure and the sequence, and (c) folding thermodynamics (heat capacity melting curve) from the sequence. To predict the two-dimensional structure (base pairs), the server generates an ensemble of structures, including loop structures with the different intra-loop mismatches, and evaluates the free energies using the experimental parameters for the base stacks and the loop entropy parameters given by a coarse-grained RNA folding model (the Vfold model) for the loops. To predict the three-dimensional structure, the server assembles the motif scaffolds using structure templates extracted from the known PDB structures and refines the structure using all-atom energy minimization. The Vfold-based web server provides a user friendly tool for the prediction of RNA structure and stability. The web server and the source codes are freely accessible for public use at "http://rna.physics.missouri.edu".
A method for closed-loop presentation of sensory stimuli conditional on the internal brain-state of awake animals

PubMed Central

Rutishauser, Ueli; Kotowicz, Andreas; Laurent, Gilles

2013-01-01

Brain activity often consists of interactions between internal—or on-going—and external—or sensory—activity streams, resulting in complex, distributed patterns of neural activity. Investigation of such interactions could benefit from closed-loop experimental protocols in which one stream can be controlled depending on the state of the other. We describe here methods to present rapid and precisely timed visual stimuli to awake animals, conditional on features of the animal’s on-going brain state; those features are the presence, power and phase of oscillations in local field potentials (LFP). The system can process up to 64 channels in real time. We quantified its performance using simulations, synthetic data and animal experiments (chronic recordings in the dorsal cortex of awake turtles). The delay from detection of an oscillation to the onset of a visual stimulus on an LCD screen was 47.5 ms and visual-stimulus onset could be locked to the phase of ongoing oscillations at any frequency ≤40 Hz. Our software’s architecture is flexible, allowing on-the-fly modifications by experimenters and the addition of new closed-loop control and analysis components through plugins. The source code of our system “StimOMatic” is available freely as open-source. PMID:23473800
detectIR: a novel program for detecting perfect and imperfect inverted repeats using complex numbers and vector calculation.

PubMed

Ye, Congting; Ji, Guoli; Li, Lei; Liang, Chun

2014-01-01

Inverted repeats are present in abundance in both prokaryotic and eukaryotic genomes and can form DNA secondary structures--hairpins and cruciforms that are involved in many important biological processes. Bioinformatics tools for efficient and accurate detection of inverted repeats are desirable, because existing tools are often less accurate and time consuming, sometimes incapable of dealing with genome-scale input data. Here, we present a MATLAB-based program called detectIR for the perfect and imperfect inverted repeat detection that utilizes complex numbers and vector calculation and allows genome-scale data inputs. A novel algorithm is adopted in detectIR to convert the conventional sequence string comparison in inverted repeat detection into vector calculation of complex numbers, allowing non-complementary pairs (mismatches) in the pairing stem and a non-palindromic spacer (loop or gaps) in the middle of inverted repeats. Compared with existing popular tools, our program performs with significantly higher accuracy and efficiency. Using genome sequence data from HIV-1, Arabidopsis thaliana, Homo sapiens and Zea mays for comparison, detectIR can find lots of inverted repeats missed by existing tools whose outputs often contain many invalid cases. detectIR is open source and its source code is freely available at: https://sourceforge.net/projects/detectir.

SmartR: an open-source platform for interactive visual analytics for translational research data

PubMed Central

Herzinger, Sascha; Gu, Wei; Satagopam, Venkata; Eifes, Serge; Rege, Kavita; Barbosa-Silva, Adriano; Schneider, Reinhard

2017-01-01

Abstract Summary: In translational research, efficient knowledge exchange between the different fields of expertise is crucial. An open platform that is capable of storing a multitude of data types such as clinical, pre-clinical or OMICS data combined with strong visual analytical capabilities will significantly accelerate the scientific progress by making data more accessible and hypothesis generation easier. The open data warehouse tranSMART is capable of storing a variety of data types and has a growing user community including both academic institutions and pharmaceutical companies. tranSMART, however, currently lacks interactive and dynamic visual analytics and does not permit any post-processing interaction or exploration. For this reason, we developed SmartR, a plugin for tranSMART, that equips the platform not only with several dynamic visual analytical workflows, but also provides its own framework for the addition of new custom workflows. Modern web technologies such as D3.js or AngularJS were used to build a set of standard visualizations that were heavily improved with dynamic elements. Availability and Implementation: The source code is licensed under the Apache 2.0 License and is freely available on GitHub: https://github.com/transmart/SmartR. Contact: reinhard.schneider@uni.lu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28334291
Optimal simultaneous superpositioning of multiple structures with missing data.

PubMed

Theobald, Douglas L; Steindel, Phillip A

2012-08-01

Superpositioning is an essential technique in structural biology that facilitates the comparison and analysis of conformational differences among topologically similar structures. Performing a superposition requires a one-to-one correspondence, or alignment, of the point sets in the different structures. However, in practice, some points are usually 'missing' from several structures, for example, when the alignment contains gaps. Current superposition methods deal with missing data simply by superpositioning a subset of points that are shared among all the structures. This practice is inefficient, as it ignores important data, and it fails to satisfy the common least-squares criterion. In the extreme, disregarding missing positions prohibits the calculation of a superposition altogether. Here, we present a general solution for determining an optimal superposition when some of the data are missing. We use the expectation-maximization algorithm, a classic statistical technique for dealing with incomplete data, to find both maximum-likelihood solutions and the optimal least-squares solution as a special case. The methods presented here are implemented in THESEUS 2.0, a program for superpositioning macromolecular structures. ANSI C source code and selected compiled binaries for various computing platforms are freely available under the GNU open source license from http://www.theseus3d.org. dtheobald@brandeis.edu Supplementary data are available at Bioinformatics online.
MASCOT HTML and XML parser: an implementation of a novel object model for protein identification data.

PubMed

Yang, Chunguang G; Granite, Stephen J; Van Eyk, Jennifer E; Winslow, Raimond L

2006-11-01

Protein identification using MS is an important technique in proteomics as well as a major generator of proteomics data. We have designed the protein identification data object model (PDOM) and developed a parser based on this model to facilitate the analysis and storage of these data. The parser works with HTML or XML files saved or exported from MASCOT MS/MS ions search in peptide summary report or MASCOT PMF search in protein summary report. The program creates PDOM objects, eliminates redundancy in the input file, and has the capability to output any PDOM object to a relational database. This program facilitates additional analysis of MASCOT search results and aids the storage of protein identification information. The implementation is extensible and can serve as a template to develop parsers for other search engines. The parser can be used as a stand-alone application or can be driven by other Java programs. It is currently being used as the front end for a system that loads HTML and XML result files of MASCOT searches into a relational database. The source code is freely available at http://www.ccbm.jhu.edu and the program uses only free and open-source Java libraries.
iGC-an integrated analysis package of gene expression and copy number alteration.

PubMed

Lai, Yi-Pin; Wang, Liang-Bo; Wang, Wei-An; Lai, Liang-Chuan; Tsai, Mong-Hsun; Lu, Tzu-Pin; Chuang, Eric Y

2017-01-14

With the advancement in high-throughput technologies, researchers can simultaneously investigate gene expression and copy number alteration (CNA) data from individual patients at a lower cost. Traditional analysis methods analyze each type of data individually and integrate their results using Venn diagrams. Challenges arise, however, when the results are irreproducible and inconsistent across multiple platforms. To address these issues, one possible approach is to concurrently analyze both gene expression profiling and CNAs in the same individual. We have developed an open-source R/Bioconductor package (iGC). Multiple input formats are supported and users can define their own criteria for identifying differentially expressed genes driven by CNAs. The analysis of two real microarray datasets demonstrated that the CNA-driven genes identified by the iGC package showed significantly higher Pearson correlation coefficients with their gene expression levels and copy numbers than those genes located in a genomic region with CNA. Compared with the Venn diagram approach, the iGC package showed better performance. The iGC package is effective and useful for identifying CNA-driven genes. By simultaneously considering both comparative genomic and transcriptomic data, it can provide better understanding of biological and medical questions. The iGC package's source code and manual are freely available at https://www.bioconductor.org/packages/release/bioc/html/iGC.html .
SmartR: an open-source platform for interactive visual analytics for translational research data.

PubMed

Herzinger, Sascha; Gu, Wei; Satagopam, Venkata; Eifes, Serge; Rege, Kavita; Barbosa-Silva, Adriano; Schneider, Reinhard

2017-07-15

In translational research, efficient knowledge exchange between the different fields of expertise is crucial. An open platform that is capable of storing a multitude of data types such as clinical, pre-clinical or OMICS data combined with strong visual analytical capabilities will significantly accelerate the scientific progress by making data more accessible and hypothesis generation easier. The open data warehouse tranSMART is capable of storing a variety of data types and has a growing user community including both academic institutions and pharmaceutical companies. tranSMART, however, currently lacks interactive and dynamic visual analytics and does not permit any post-processing interaction or exploration. For this reason, we developed SmartR , a plugin for tranSMART, that equips the platform not only with several dynamic visual analytical workflows, but also provides its own framework for the addition of new custom workflows. Modern web technologies such as D3.js or AngularJS were used to build a set of standard visualizations that were heavily improved with dynamic elements. The source code is licensed under the Apache 2.0 License and is freely available on GitHub: https://github.com/transmart/SmartR . reinhard.schneider@uni.lu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
pgRNAFinder: a web-based tool to design distance independent paired-gRNA.

PubMed

Xiong, Yuanyan; Xie, Xiaowei; Wang, Yanzhi; Ma, Wenbing; Liang, Puping; Songyang, Zhou; Dai, Zhiming

2017-11-15

The CRISPR/Cas System has been shown to be an efficient and accurate genome-editing technique. There exist a number of tools to design the guide RNA sequences and predict potential off-target sites. However, most of the existing computational tools on gRNA design are restricted to small deletions. To address this issue, we present pgRNAFinder, with an easy-to-use web interface, which enables researchers to design single or distance-free paired-gRNA sequences. The web interface of pgRNAFinder contains both gRNA search and scoring system. After users input query sequences, it searches gRNA by 3' protospacer-adjacent motif (PAM), and possible off-targets, and scores the conservation of the deleted sequences rapidly. Filters can be applied to identify high-quality CRISPR sites. PgRNAFinder offers gRNA design functionality for 8 vertebrate genomes. Furthermore, to keep pgRNAFinder open, extensible to any organism, we provide the source package for local use. The pgRNAFinder is freely available at http://songyanglab.sysu.edu.cn/wangwebs/pgRNAFinder/, and the source code and user manual can be obtained from https://github.com/xiexiaowei/pgRNAFinder. songyang@bcm.edu or daizhim@mail.sysu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
An architecture for genomics analysis in a clinical setting using Galaxy and Docker

PubMed Central

Digan, W; Countouris, H; Barritault, M; Baudoin, D; Laurent-Puig, P; Blons, H; Burgun, A

2017-01-01

Abstract Next-generation sequencing is used on a daily basis to perform molecular analysis to determine subtypes of disease (e.g., in cancer) and to assist in the selection of the optimal treatment. Clinical bioinformatics handles the manipulation of the data generated by the sequencer, from the generation to the analysis and interpretation. Reproducibility and traceability are crucial issues in a clinical setting. We have designed an approach based on Docker container technology and Galaxy, the popular bioinformatics analysis support open-source software. Our solution simplifies the deployment of a small-size analytical platform and simplifies the process for the clinician. From the technical point of view, the tools embedded in the platform are isolated and versioned through Docker images. Along the Galaxy platform, we also introduce the AnalysisManager, a solution that allows single-click analysis for biologists and leverages standardized bioinformatics application programming interfaces. We added a Shiny/R interactive environment to ease the visualization of the outputs. The platform relies on containers and ensures the data traceability by recording analytical actions and by associating inputs and outputs of the tools to EDAM ontology through ReGaTe. The source code is freely available on Github at https://github.com/CARPEM/GalaxyDocker. PMID:29048555
An architecture for genomics analysis in a clinical setting using Galaxy and Docker.

PubMed

Digan, W; Countouris, H; Barritault, M; Baudoin, D; Laurent-Puig, P; Blons, H; Burgun, A; Rance, B

2017-11-01

Next-generation sequencing is used on a daily basis to perform molecular analysis to determine subtypes of disease (e.g., in cancer) and to assist in the selection of the optimal treatment. Clinical bioinformatics handles the manipulation of the data generated by the sequencer, from the generation to the analysis and interpretation. Reproducibility and traceability are crucial issues in a clinical setting. We have designed an approach based on Docker container technology and Galaxy, the popular bioinformatics analysis support open-source software. Our solution simplifies the deployment of a small-size analytical platform and simplifies the process for the clinician. From the technical point of view, the tools embedded in the platform are isolated and versioned through Docker images. Along the Galaxy platform, we also introduce the AnalysisManager, a solution that allows single-click analysis for biologists and leverages standardized bioinformatics application programming interfaces. We added a Shiny/R interactive environment to ease the visualization of the outputs. The platform relies on containers and ensures the data traceability by recording analytical actions and by associating inputs and outputs of the tools to EDAM ontology through ReGaTe. The source code is freely available on Github at https://github.com/CARPEM/GalaxyDocker. © The Author 2017. Published by Oxford University Press.
Computations of steady-state and transient premixed turbulent flames using pdf methods

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hulek, T.; Lindstedt, R.P.

1996-03-01

Premixed propagating turbulent flames are modeled using a one-point, single time, joint velocity-composition probability density function (pdf) closure. The pdf evolution equation is solved using a Monte Carlo method. The unclosed terms in the pdf equation are modeled using a modified version of the binomial Langevin model for scalar mixing of Valino and Dopazo, and the Haworth and Pope (HP) and Lagrangian Speziale-Sarkar-Gatski (LSSG) models for the viscous dissipation of velocity and the fluctuating pressure gradient. The source terms for the presumed one-step chemical reaction are extracted from the rate of fuel consumption in laminar premixed hydrocarbon flames, computed usingmore » a detailed chemical kinetic mechanism. Steady-state and transient solutions are obtained for planar turbulent methane-air and propane-air flames. The transient solution method features a coupling with a Finite Volume (FV) code to obtain the mean pressure field. The results are compared with the burning velocity measurements of Abdel-Gayed et al. and with velocity measurements obtained in freely propagating propane-air flames by Videto and Santavicca. The effects of different upstream turbulence fields, chemical source terms (different fuels and strained/unstrained laminar flames) and the influence of the velocity statistics models (HP and LSSG) are assessed.« less
Mutation Update of ARSA and PSAP Genes Causing Metachromatic Leukodystrophy.

PubMed

Cesani, Martina; Lorioli, Laura; Grossi, Serena; Amico, Giulia; Fumagalli, Francesca; Spiga, Ivana; Filocamo, Mirella; Biffi, Alessandra

2016-01-01

Metachromatic leukodystrophy is a neurodegenerative disorder characterized by progressive demyelination. The disease is caused by variants in the ARSA gene, which codes for the lysosomal enzyme arylsulfatase A, or, more rarely, in the PSAP gene, which codes for the activator protein saposin B. In this Mutation Update, an extensive review of all the ARSA- and PSAP-causative variants published in the literature to date, accounting for a total of 200 ARSA and 10 PSAP allele types, is presented. The detailed ARSA and PSAP variant lists are freely available on the Leiden Online Variation Database (LOVD) platform at http://www.LOVD.nl/ARSA and http://www.LOVD.nl/PSAP, respectively. © 2015 WILEY PERIODICALS, INC.
A Fast Healthcare Interoperability Resources (FHIR) layer implemented over i2b2.

PubMed

Boussadi, Abdelali; Zapletal, Eric

2017-08-14

Standards and technical specifications have been developed to define how the information contained in Electronic Health Records (EHRs) should be structured, semantically described, and communicated. Current trends rely on differentiating the representation of data instances from the definition of clinical information models. The dual model approach, which combines a reference model (RM) and a clinical information model (CIM), sets in practice this software design pattern. The most recent initiative, proposed by HL7, is called Fast Health Interoperability Resources (FHIR). The aim of our study was to investigate the feasibility of applying the FHIR standard to modeling and exposing EHR data of the Georges Pompidou European Hospital (HEGP) integrating biology and the bedside (i2b2) clinical data warehouse (CDW). We implemented a FHIR server over i2b2 to expose EHR data in relation with five FHIR resources: DiagnosisReport, MedicationOrder, Patient, Encounter, and Medication. The architecture of the server combines a Data Access Object design pattern and FHIR resource providers, implemented using the Java HAPI FHIR API. Two types of queries were tested: query type #1 requests the server to display DiagnosticReport resources, for which the diagnosis code is equal to a given ICD-10 code. A total of 80 DiagnosticReport resources, corresponding to 36 patients, were displayed. Query type #2, requests the server to display MedicationOrder, for which the FHIR Medication identification code is equal to a given code expressed in a French coding system. A total of 503 MedicationOrder resources, corresponding to 290 patients, were displayed. Results were validated by manually comparing the results of each request to the results displayed by an ad-hoc SQL query. We showed the feasibility of implementing a Java layer over the i2b2 database model to expose data of the CDW as a set of FHIR resources. An important part of this work was the structural and semantic mapping between the i2b2 model and the FHIR RM. To accomplish this, developers must manually browse the specifications of the FHIR standard. Our source code is freely available and can be adapted for use in other i2b2 sites.
Interpolating between random walks and optimal transportation routes: Flow with multiple sources and targets

NASA Astrophysics Data System (ADS)

Guex, Guillaume

2016-05-01

In recent articles about graphs, different models proposed a formalism to find a type of path between two nodes, the source and the target, at crossroads between the shortest-path and the random-walk path. These models include a freely adjustable parameter, allowing to tune the behavior of the path toward randomized movements or direct routes. This article presents a natural generalization of these models, namely a model with multiple sources and targets. In this context, source nodes can be viewed as locations with a supply of a certain good (e.g. people, money, information) and target nodes as locations with a demand of the same good. An algorithm is constructed to display the flow of goods in the network between sources and targets. With again a freely adjustable parameter, this flow can be tuned to follow routes of minimum cost, thus displaying the flow in the context of the optimal transportation problem or, by contrast, a random flow, known to be similar to the electrical current flow if the random-walk is reversible. Moreover, a source-targetcoupling can be retrieved from this flow, offering an optimal assignment to the transportation problem. This algorithm is described in the first part of this article and then illustrated with case studies.
Earth-Observation based mapping and monitoring of exposure change in the megacity of Istanbul: open-source tools from the MARSITE project

NASA Astrophysics Data System (ADS)

De Vecchi, Daniele; Dell'Acqua, Fabio

2016-04-01

The EU FP7 MARSITE project aims at assessing the "state of the art" of seismic risk evaluation and management at European level, as a starting point to move a "step forward" towards new concepts of risk mitigation and management by long-term monitoring activities carried out both on land and at sea. Spaceborne Earth Observation (EO) is one of the means through which MARSITE is accomplishing this commitment, whose importance is growing as a consequence of the operational unfolding of the Copernicus initiative. Sentinel-2 data, with its open-data policy, represents an unprecedented opportunity to access global spaceborne multispectral data for various purposes including risk monitoring. In the framework of EU FP7 projects MARSITE, RASOR and SENSUM, our group has developed a suite of geospatial software tools to automatically extract risk-related features from EO data, especially on the exposure and vulnerability side of the "risk equation" [1]. These are for example the extension of a built-up area or the distribution of building density. These tools are available open-source as QGIS plug-ins [2] and their source code can be freely downloaded from GitHub [3]. A test case on the risk-prone mega city of Istanbul has been set up, and preliminary results will be presented in this paper. The output of the algorithms can be incorporated into a risk modeling process, whose output is very useful to stakeholders and decision makers who intend to assess and mitigate the risk level across the giant urban agglomerate. Keywords - Remote Sensing, Copernicus, Istanbul megacity, seismic risk, multi-risk, exposure, open-source References [1] Harb, M.M.; De Vecchi, D.; Dell'Acqua, F., "Physical Vulnerability Proxies from Remotes Sensing: Reviewing, Implementing and Disseminating Selected Techniques," Geoscience and Remote Sensing Magazine, IEEE , vol.3, no.1, pp.20,33, March 2015. doi: 10.1109/MGRS.2015.2398672 [2] SENSUM QGIS plugin, 2016, available online at: https://plugins.qgis.org/plugins/sensum_eo_tools/ [3] SENSUM QGIS code repository, 2016, available online at: https://github.com/SENSUM-project/sensum_rs_qgis
caGrid 1.0: An Enterprise Grid Infrastructure for Biomedical Research

PubMed Central

Oster, Scott; Langella, Stephen; Hastings, Shannon; Ervin, David; Madduri, Ravi; Phillips, Joshua; Kurc, Tahsin; Siebenlist, Frank; Covitz, Peter; Shanbhag, Krishnakant; Foster, Ian; Saltz, Joel

2008-01-01

Objective To develop software infrastructure that will provide support for discovery, characterization, integrated access, and management of diverse and disparate collections of information sources, analysis methods, and applications in biomedical research. Design An enterprise Grid software infrastructure, called caGrid version 1.0 (caGrid 1.0), has been developed as the core Grid architecture of the NCI-sponsored cancer Biomedical Informatics Grid (caBIG™) program. It is designed to support a wide range of use cases in basic, translational, and clinical research, including 1) discovery, 2) integrated and large-scale data analysis, and 3) coordinated study. Measurements The caGrid is built as a Grid software infrastructure and leverages Grid computing technologies and the Web Services Resource Framework standards. It provides a set of core services, toolkits for the development and deployment of new community provided services, and application programming interfaces for building client applications. Results The caGrid 1.0 was released to the caBIG community in December 2006. It is built on open source components and caGrid source code is publicly and freely available under a liberal open source license. The core software, associated tools, and documentation can be downloaded from the following URL: https://cabig.nci.nih.gov/workspaces/Architecture/caGrid. Conclusions While caGrid 1.0 is designed to address use cases in cancer research, the requirements associated with discovery, analysis and integration of large scale data, and coordinated studies are common in other biomedical fields. In this respect, caGrid 1.0 is the realization of a framework that can benefit the entire biomedical community. PMID:18096909
caGrid 1.0: an enterprise Grid infrastructure for biomedical research.

PubMed

Oster, Scott; Langella, Stephen; Hastings, Shannon; Ervin, David; Madduri, Ravi; Phillips, Joshua; Kurc, Tahsin; Siebenlist, Frank; Covitz, Peter; Shanbhag, Krishnakant; Foster, Ian; Saltz, Joel

2008-01-01

To develop software infrastructure that will provide support for discovery, characterization, integrated access, and management of diverse and disparate collections of information sources, analysis methods, and applications in biomedical research. An enterprise Grid software infrastructure, called caGrid version 1.0 (caGrid 1.0), has been developed as the core Grid architecture of the NCI-sponsored cancer Biomedical Informatics Grid (caBIG) program. It is designed to support a wide range of use cases in basic, translational, and clinical research, including 1) discovery, 2) integrated and large-scale data analysis, and 3) coordinated study. The caGrid is built as a Grid software infrastructure and leverages Grid computing technologies and the Web Services Resource Framework standards. It provides a set of core services, toolkits for the development and deployment of new community provided services, and application programming interfaces for building client applications. The caGrid 1.0 was released to the caBIG community in December 2006. It is built on open source components and caGrid source code is publicly and freely available under a liberal open source license. The core software, associated tools, and documentation can be downloaded from the following URL: https://cabig.nci.nih.gov/workspaces/Architecture/caGrid. While caGrid 1.0 is designed to address use cases in cancer research, the requirements associated with discovery, analysis and integration of large scale data, and coordinated studies are common in other biomedical fields. In this respect, caGrid 1.0 is the realization of a framework that can benefit the entire biomedical community.
RUCS: rapid identification of PCR primers for unique core sequences.

PubMed

Thomsen, Martin Christen Frølund; Hasman, Henrik; Westh, Henrik; Kaya, Hülya; Lund, Ole

2017-12-15

Designing PCR primers to target a specific selection of whole genome sequenced strains can be a long, arduous and sometimes impractical task. Such tasks would benefit greatly from an automated tool to both identify unique targets, and to validate the vast number of potential primer pairs for the targets in silico. Here we present RUCS, a program that will find PCR primer pairs and probes for the unique core sequences of a positive genome dataset complement to a negative genome dataset. The resulting primer pairs and probes are in addition to simple selection also validated through a complex in silico PCR simulation. We compared our method, which identifies the unique core sequences, against an existing tool called ssGeneFinder, and found that our method was 6.5-20 times more sensitive. We used RUCS to design primer pairs that would target a set of genomes known to contain the mcr-1 colistin resistance gene. Three of the predicted pairs were chosen for experimental validation using PCR and gel electrophoresis. All three pairs successfully produced an amplicon with the target length for the samples containing mcr-1 and no amplification products were produced for the negative samples. The novel methods presented in this manuscript can reduce the time needed to identify target sequences, and provide a quick virtual PCR validation to eliminate time wasted on ambiguously binding primers. Source code is freely available on https://bitbucket.org/genomicepidemiology/rucs. Web service is freely available on https://cge.cbs.dtu.dk/services/RUCS. mcft@cbs.dtu.dk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
A suite of R packages for web-enabled modeling and analysis of surface waters

NASA Astrophysics Data System (ADS)

Read, J. S.; Winslow, L. A.; Nüst, D.; De Cicco, L.; Walker, J. I.

2014-12-01

Researchers often create redundant methods for downloading, manipulating, and analyzing data from online resources. Moreover, the reproducibility of science can be hampered by complicated and voluminous data, lack of time for documentation and long-term maintenance of software, and fear of exposing programming skills. The combination of these factors can encourage unshared one-off programmatic solutions instead of openly provided reusable methods. Federal and academic researchers in the water resources and informatics domains have collaborated to address these issues. The result of this collaboration is a suite of modular R packages that can be used independently or as elements in reproducible analytical workflows. These documented and freely available R packages were designed to fill basic needs for the effective use of water data: the retrieval of time-series and spatial data from web resources (dataRetrieval, geoknife), performing quality assurance and quality control checks of these data with robust statistical methods (sensorQC), the creation of useful data derivatives (including physically- and biologically-relevant indices; GDopp, LakeMetabolizer), and the execution and evaluation of models (glmtools, rLakeAnalyzer). Here, we share details and recommendations for the collaborative coding process, and highlight the benefits of an open-source tool development pattern with a popular programming language in the water resources discipline (such as R). We provide examples of reproducible science driven by large volumes of web-available data using these tools, explore benefits of accessing packages as standardized web processing services (WPS) and present a working platform that allows domain experts to publish scientific algorithms in a service-oriented architecture (WPS4R). We assert that in the era of open data, tools that leverage these data should also be freely shared, transparent, and developed in an open innovation environment.
CrossQuery: a web tool for easy associative querying of transcriptome data.

PubMed

Wagner, Toni U; Fischer, Andreas; Thoma, Eva C; Schartl, Manfred

2011-01-01

Enormous amounts of data are being generated by modern methods such as transcriptome or exome sequencing and microarray profiling. Primary analyses such as quality control, normalization, statistics and mapping are highly complex and need to be performed by specialists. Thereafter, results are handed back to biomedical researchers, who are then confronted with complicated data lists. For rather simple tasks like data filtering, sorting and cross-association there is a need for new tools which can be used by non-specialists. Here, we describe CrossQuery, a web tool that enables straight forward, simple syntax queries to be executed on transcriptome sequencing and microarray datasets. We provide deep-sequencing data sets of stem cell lines derived from the model fish Medaka and microarray data of human endothelial cells. In the example datasets provided, mRNA expression levels, gene, transcript and sample identification numbers, GO-terms and gene descriptions can be freely correlated, filtered and sorted. Queries can be saved for later reuse and results can be exported to standard formats that allow copy-and-paste to all widespread data visualization tools such as Microsoft Excel. CrossQuery enables researchers to quickly and freely work with transcriptome and microarray data sets requiring only minimal computer skills. Furthermore, CrossQuery allows growing association of multiple datasets as long as at least one common point of correlated information, such as transcript identification numbers or GO-terms, is shared between samples. For advanced users, the object-oriented plug-in and event-driven code design of both server-side and client-side scripts allow easy addition of new features, data sources and data types.
M-Track: A New Software for Automated Detection of Grooming Trajectories in Mice

PubMed Central

Zhang, Lin

2016-01-01

Grooming is a complex and robust innate behavior, commonly performed by most vertebrate species. In mice, grooming consists of a series of stereotyped patterned strokes, performed along the rostro-caudal axis of the body. The frequency and duration of each grooming episode is sensitive to changes in stress levels, social interactions and pharmacological manipulations, and is therefore used in behavioral studies to gain insights into the function of brain regions that control movement execution and anxiety. Traditional approaches to analyze grooming rely on manually scoring the time of onset and duration of each grooming episode, and are often performed on grooming episodes triggered by stress exposure, which may not be entirely representative of spontaneous grooming in freely-behaving mice. This type of analysis is time-consuming and provides limited information about finer aspects of grooming behaviors, which are important to understand movement stereotypy and bilateral coordination in mice. Currently available commercial and freeware video-tracking software allow automated tracking of the whole body of a mouse or of its head and tail, not of individual forepaws. Here we describe a simple experimental set-up and a novel open-source code, named M-Track, for simultaneously tracking the movement of individual forepaws during spontaneous grooming in multiple freely-behaving mice. This toolbox provides a simple platform to perform trajectory analysis of forepaw movement during distinct grooming episodes. By using M-track we show that, in C57BL/6 wild type mice, the speed and bilateral coordination of the left and right forepaws remain unaltered during the execution of distinct grooming episodes. Stress exposure induces a profound increase in the length of the forepaw grooming trajectories. M-Track provides a valuable and user-friendly interface to streamline the analysis of spontaneous grooming in biomedical research studies. PMID:27636358
WebProtégé: a collaborative Web-based platform for editing biomedical ontologies.

PubMed

Horridge, Matthew; Tudorache, Tania; Nuylas, Csongor; Vendetti, Jennifer; Noy, Natalya F; Musen, Mark A

2014-08-15

WebProtégé is an open-source Web application for editing OWL 2 ontologies. It contains several features to aid collaboration, including support for the discussion of issues, change notification and revision-based change tracking. WebProtégé also features a simple user interface, which is geared towards editing the kinds of class descriptions and annotations that are prevalent throughout biomedical ontologies. Moreover, it is possible to configure the user interface using views that are optimized for editing Open Biomedical Ontology (OBO) class descriptions and metadata. Some of these views are shown in the Supplementary Material and can be seen in WebProtégé itself by configuring the project as an OBO project. WebProtégé is freely available for use on the Web at http://webprotege.stanford.edu. It is implemented in Java and JavaScript using the OWL API and the Google Web Toolkit. All major browsers are supported. For users who do not wish to host their ontologies on the Stanford servers, WebProtégé is available as a Web app that can be run locally using a Servlet container such as Tomcat. Binaries, source code and documentation are available under an open-source license at http://protegewiki.stanford.edu/wiki/WebProtege. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

Plasma theory and simulation

NASA Astrophysics Data System (ADS)

Birdsall, Charles K.

1986-12-01

The Pierce diode linear behavior with external R, C, or L was verified very accurately by particle simulation. The Pierce diode non-linear equilibria with R, C, or L are described theoretically and explored via computer simulation. A simple model of the sheath outside the separatrix of an FRC was modeled electrostatically in 2d and large potentials due to the magnetic well and peak which were found. These may explain the anomalously high ion confinement in the FRC edge layer. A planar plasma source with cold ions and warm electrons produces a source sheath with sufficient potential drop to accelerate ions to sound velocity, which obviates the need for a Bohm pre-collector-sheath electric field. Final reports were prepared for collector sheath, presheath, and source sheath in a collisionless, finite ion temperature plasma; potential drop and transport in a bounded plasma with ion reflection at the collector; potential drop and transport in a bounded plasma with secondary electron emission at the collector. A movie has been made displaying the long-lived vortices resulting from the Kelvin-Helmholtz instability in a magnetized sheath. A relativistic Monte Carlo binary (Coulomb) collision model has been developed and tested for inclusion into the electrostatic particle simulation code TESS. Two direct implicit time integration schemes are tested for self-heating and self-cooling and regions of neither are found as a function of delta t and delta x for the model of a freely expanding plasma slab.
ReNE: A Cytoscape Plugin for Regulatory Network Enhancement

PubMed Central

Politano, Gianfranco; Benso, Alfredo; Savino, Alessandro; Di Carlo, Stefano

2014-01-01

One of the biggest challenges in the study of biological regulatory mechanisms is the integration, americanmodeling, and analysis of the complex interactions which take place in biological networks. Despite post transcriptional regulatory elements (i.e., miRNAs) are widely investigated in current research, their usage and visualization in biological networks is very limited. Regulatory networks are commonly limited to gene entities. To integrate networks with post transcriptional regulatory data, researchers are therefore forced to manually resort to specific third party databases. In this context, we introduce ReNE, a Cytoscape 3.x plugin designed to automatically enrich a standard gene-based regulatory network with more detailed transcriptional, post transcriptional, and translational data, resulting in an enhanced network that more precisely models the actual biological regulatory mechanisms. ReNE can automatically import a network layout from the Reactome or KEGG repositories, or work with custom pathways described using a standard OWL/XML data format that the Cytoscape import procedure accepts. Moreover, ReNE allows researchers to merge multiple pathways coming from different sources. The merged network structure is normalized to guarantee a consistent and uniform description of the network nodes and edges and to enrich all integrated data with additional annotations retrieved from genome-wide databases like NCBI, thus producing a pathway fully manageable through the Cytoscape environment. The normalized network is then analyzed to include missing transcription factors, miRNAs, and proteins. The resulting enhanced network is still a fully functional Cytoscape network where each regulatory element (transcription factor, miRNA, gene, protein) and regulatory mechanism (up-regulation/down-regulation) is clearly visually identifiable, thus enabling a better visual understanding of its role and the effect in the network behavior. The enhanced network produced by ReNE is exportable in multiple formats for further analysis via third party applications. ReNE can be freely installed from the Cytoscape App Store (http://apps.cytoscape.org/apps/rene) and the full source code is freely available for download through a SVN repository accessible at http://www.sysbio.polito.it/tools_svn/BioInformatics/Rene/releases/. ReNE enhances a network by only integrating data from public repositories, without any inference or prediction. The reliability of the introduced interactions only depends on the reliability of the source data, which is out of control of ReNe developers. PMID:25541727
ObsPy: A Python toolbox for seismology - Current state, applications, and ecosystem around it

NASA Astrophysics Data System (ADS)

Lecocq, Thomas; Megies, Tobias; Krischer, Lion; Sales de Andrade, Elliott; Barsch, Robert; Beyreuther, Moritz

2016-04-01

ObsPy (http://www.obspy.org) is a community-driven, open-source project offering a bridge for seismology into the scientific Python ecosystem. It provides * read and write support for essentially all commonly used waveform, station, and event metadata formats with a unified interface, * a comprehensive signal processing toolbox tuned to the needs of seismologists, * integrated access to all large data centers, web services and databases, and * convenient wrappers to third party codes like libmseed and evalresp. Python, in contrast to many other languages and tools, is simple enough to enable an exploratory and interactive coding style desired by many scientists. At the same time it is a full-fledged programming language usable by software engineers to build complex and large programs. This combination makes it very suitable for use in seismology where research code often has to be translated to stable and production ready environments. It furthermore offers many freely available high quality scientific modules covering most needs in developing scientific software. ObsPy has been in constant development for more than 5 years and nowadays enjoys a large rate of adoption in the community with thousands of users. Successful applications include time-dependent and rotational seismology, big data processing, event relocations, and synthetic studies about attenuation kernels and full-waveform inversions to name a few examples. Additionally it sparked the development of several more specialized packages slowly building a modern seismological ecosystem around it. This contribution will give a short introduction and overview of ObsPy and highlight a number of use cases and software built around it. We will furthermore discuss the issue of sustainability of scientific software.
ObsPy: A Python toolbox for seismology - Current state, applications, and ecosystem around it

NASA Astrophysics Data System (ADS)

Krischer, L.; Megies, T.; Sales de Andrade, E.; Barsch, R.; Beyreuther, M.

2015-12-01

ObsPy (http://www.obspy.org) is a community-driven, open-source project offering a bridge for seismology into the scientific Python ecosystem. It provides read and write support for essentially all commonly used waveform, station, and event metadata formats with a unified interface, a comprehensive signal processing toolbox tuned to the needs of seismologists, integrated access to all large data centers, web services and databases, and convenient wrappers to third party codes like libmseed and evalresp. Python, in contrast to many other languages and tools, is simple enough to enable an exploratory and interactive coding style desired by many scientists. At the same time it is a full-fledged programming language usable by software engineers to build complex and large programs. This combination makes it very suitable for use in seismology where research code often has to be translated to stable and production ready environments. It furthermore offers many freely available high quality scientific modules covering most needs in developing scientific software.ObsPy has been in constant development for more than 5 years and nowadays enjoys a large rate of adoption in the community with thousands of users. Successful applications include time-dependent and rotational seismology, big data processing, event relocations, and synthetic studies about attenuation kernels and full-waveform inversions to name a few examples. Additionally it sparked the development of several more specialized packages slowly building a modern seismological ecosystem around it.This contribution will give a short introduction and overview of ObsPy and highlight a number of us cases and software built around it. We will furthermore discuss the issue of sustainability of scientific software.
OpenCMISS: a multi-physics & multi-scale computational infrastructure for the VPH/Physiome project.

PubMed

Bradley, Chris; Bowery, Andy; Britten, Randall; Budelmann, Vincent; Camara, Oscar; Christie, Richard; Cookson, Andrew; Frangi, Alejandro F; Gamage, Thiranja Babarenda; Heidlauf, Thomas; Krittian, Sebastian; Ladd, David; Little, Caton; Mithraratne, Kumar; Nash, Martyn; Nickerson, David; Nielsen, Poul; Nordbø, Oyvind; Omholt, Stig; Pashaei, Ali; Paterson, David; Rajagopal, Vijayaraghavan; Reeve, Adam; Röhrle, Oliver; Safaei, Soroush; Sebastián, Rafael; Steghöfer, Martin; Wu, Tim; Yu, Ting; Zhang, Heye; Hunter, Peter

2011-10-01

The VPH/Physiome Project is developing the model encoding standards CellML (cellml.org) and FieldML (fieldml.org) as well as web-accessible model repositories based on these standards (models.physiome.org). Freely available open source computational modelling software is also being developed to solve the partial differential equations described by the models and to visualise results. The OpenCMISS code (opencmiss.org), described here, has been developed by the authors over the last six years to replace the CMISS code that has supported a number of organ system Physiome projects. OpenCMISS is designed to encompass multiple sets of physical equations and to link subcellular and tissue-level biophysical processes into organ-level processes. In the Heart Physiome project, for example, the large deformation mechanics of the myocardial wall need to be coupled to both ventricular flow and embedded coronary flow, and the reaction-diffusion equations that govern the propagation of electrical waves through myocardial tissue need to be coupled with equations that describe the ion channel currents that flow through the cardiac cell membranes. In this paper we discuss the design principles and distributed memory architecture behind the OpenCMISS code. We also discuss the design of the interfaces that link the sets of physical equations across common boundaries (such as fluid-structure coupling), or between spatial fields over the same domain (such as coupled electromechanics), and the concepts behind CellML and FieldML that are embodied in the OpenCMISS data structures. We show how all of these provide a flexible infrastructure for combining models developed across the VPH/Physiome community. Copyright © 2011 Elsevier Ltd. All rights reserved.
Using National Drug Codes and drug knowledge bases to organize prescription records from multiple sources.

PubMed

Simonaitis, Linas; McDonald, Clement J

2009-10-01

The utility of National Drug Codes (NDCs) and drug knowledge bases (DKBs) in the organization of prescription records from multiple sources was studied. The master files of most pharmacy systems include NDCs and local codes to identify the products they dispense. We obtained a large sample of prescription records from seven different sources. These records carried a national product code or a local code that could be translated into a national product code via their formulary master. We obtained mapping tables from five DKBs. We measured the degree to which the DKB mapping tables covered the national product codes carried in or associated with the sample of prescription records. Considering the total prescription volume, DKBs covered 93.0-99.8% of the product codes from three outpatient sources and 77.4-97.0% of the product codes from four inpatient sources. Among the in-patient sources, invented codes explained 36-94% of the noncoverage. Outpatient pharmacy sources rarely invented codes, which comprised only 0.11-0.21% of their total prescription volume, compared with inpatient pharmacy sources for which invented codes comprised 1.7-7.4% of their prescription volume. The distribution of prescribed products was highly skewed, with 1.4-4.4% of codes accounting for 50% of the message volume and 10.7-34.5% accounting for 90% of the message volume. DKBs cover the product codes used by outpatient sources sufficiently well to permit automatic mapping. Changes in policies and standards could increase coverage of product codes used by inpatient sources.
THE AUTOMATED GEOSPATIAL WATERSHED ASSESSMENT TOOL

EPA Science Inventory

A toolkit for distributed hydrologic modeling at multiple scales using a geographic information system is presented. This open-source, freely available software was developed through a collaborative endeavor involving two Universities and two government agencies. Called the Auto...
The Healthcare Complaints Analysis Tool: development and reliability testing of a method for service monitoring and organisational learning

PubMed Central

Gillespie, Alex; Reader, Tom W

2016-01-01

Background Letters of complaint written by patients and their advocates reporting poor healthcare experiences represent an under-used data source. The lack of a method for extracting reliable data from these heterogeneous letters hinders their use for monitoring and learning. To address this gap, we report on the development and reliability testing of the Healthcare Complaints Analysis Tool (HCAT). Methods HCAT was developed from a taxonomy of healthcare complaints reported in a previously published systematic review. It introduces the novel idea that complaints should be analysed in terms of severity. Recruiting three groups of educated lay participants (n=58, n=58, n=55), we refined the taxonomy through three iterations of discriminant content validity testing. We then supplemented this refined taxonomy with explicit coding procedures for seven problem categories (each with four levels of severity), stage of care and harm. These combined elements were further refined through iterative coding of a UK national sample of healthcare complaints (n= 25, n=80, n=137, n=839). To assess reliability and accuracy for the resultant tool, 14 educated lay participants coded a referent sample of 125 healthcare complaints. Results The seven HCAT problem categories (quality, safety, environment, institutional processes, listening, communication, and respect and patient rights) were found to be conceptually distinct. On average, raters identified 1.94 problems (SD=0.26) per complaint letter. Coders exhibited substantial reliability in identifying problems at four levels of severity; moderate and substantial reliability in identifying stages of care (except for ‘discharge/transfer’ that was only fairly reliable) and substantial reliability in identifying overall harm. Conclusions HCAT is not only the first reliable tool for coding complaints, it is the first tool to measure the severity of complaints. It facilitates service monitoring and organisational learning and it enables future research examining whether healthcare complaints are a leading indicator of poor service outcomes. HCAT is freely available to download and use. PMID:26740496
deepTools2: a next generation web server for deep-sequencing data analysis.

PubMed

Ramírez, Fidel; Ryan, Devon P; Grüning, Björn; Bhardwaj, Vivek; Kilpert, Fabian; Richter, Andreas S; Heyne, Steffen; Dündar, Friederike; Manke, Thomas

2016-07-08

We present an update to our Galaxy-based web server for processing and visualizing deeply sequenced data. Its core tool set, deepTools, allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches. Since we first described our deepTools Galaxy server in 2014, we have implemented new solutions for many requests from the community and our users. Here, we introduce significant enhancements and new tools to further improve data visualization and interpretation. deepTools continue to be open to all users and freely available as a web service at deeptools.ie-freiburg.mpg.de The new deepTools2 suite can be easily deployed within any Galaxy framework via the toolshed repository, and we also provide source code for command line usage under Linux and Mac OS X. A public and documented API for access to deepTools functionality is also available. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
jMetalCpp: optimizing molecular docking problems with a C++ metaheuristic framework.

PubMed

López-Camacho, Esteban; García Godoy, María Jesús; Nebro, Antonio J; Aldana-Montes, José F

2014-02-01

Molecular docking is a method for structure-based drug design and structural molecular biology, which attempts to predict the position and orientation of a small molecule (ligand) in relation to a protein (receptor) to produce a stable complex with a minimum binding energy. One of the most widely used software packages for this purpose is AutoDock, which incorporates three metaheuristic techniques. We propose the integration of AutoDock with jMetalCpp, an optimization framework, thereby providing both single- and multi-objective algorithms that can be used to effectively solve docking problems. The resulting combination of AutoDock + jMetalCpp allows users of the former to easily use the metaheuristics provided by the latter. In this way, biologists have at their disposal a richer set of optimization techniques than those already provided in AutoDock. Moreover, designers of metaheuristic techniques can use molecular docking for case studies, which can lead to more efficient algorithms oriented to solving the target problems. jMetalCpp software adapted to AutoDock is freely available as a C++ source code at http://khaos.uma.es/AutodockjMetal/.
Fast simulation tool for ultraviolet radiation at the earth's surface

NASA Astrophysics Data System (ADS)

Engelsen, Ola; Kylling, Arve

2005-04-01

FastRT is a fast, yet accurate, UV simulation tool that computes downward surface UV doses, UV indices, and irradiances in the spectral range 290 to 400 nm with a resolution as small as 0.05 nm. It computes a full UV spectrum within a few milliseconds on a standard PC, and enables the user to convolve the spectrum with user-defined and built-in spectral response functions including the International Commission on Illumination (CIE) erythemal response function used for UV index calculations. The program accounts for the main radiative input parameters, i.e., instrumental characteristics, solar zenith angle, ozone column, aerosol loading, clouds, surface albedo, and surface altitude. FastRT is based on look-up tables of carefully selected entries of atmospheric transmittances and spherical albedos, and exploits the smoothness of these quantities with respect to atmospheric, surface, geometrical, and spectral parameters. An interactive site, http://nadir.nilu.no/~olaeng/fastrt/fastrt.html, enables the public to run the FastRT program with most input options. This page also contains updated information about FastRT and links to freely downloadable source codes and binaries.
A multivariate distance-based analytic framework for microbial interdependence association test in longitudinal study.

PubMed

Zhang, Yilong; Han, Sung Won; Cox, Laura M; Li, Huilin

2017-12-01

Human microbiome is the collection of microbes living in and on the various parts of our body. The microbes living on our body in nature do not live alone. They act as integrated microbial community with massive competing and cooperating and contribute to our human health in a very important way. Most current analyses focus on examining microbial differences at a single time point, which do not adequately capture the dynamic nature of the microbiome data. With the advent of high-throughput sequencing and analytical tools, we are able to probe the interdependent relationship among microbial species through longitudinal study. Here, we propose a multivariate distance-based test to evaluate the association between key phenotypic variables and microbial interdependence utilizing the repeatedly measured microbiome data. Extensive simulations were performed to evaluate the validity and efficiency of the proposed method. We also demonstrate the utility of the proposed test using a well-designed longitudinal murine experiment and a longitudinal human study. The proposed methodology has been implemented in the freely distributed open-source R package and Python code. © 2017 WILEY PERIODICALS, INC.
CircularLogo: A lightweight web application to visualize intra-motif dependencies.

PubMed

Ye, Zhenqing; Ma, Tao; Kalmbach, Michael T; Dasari, Surendra; Kocher, Jean-Pierre A; Wang, Liguo

2017-05-22

The sequence logo has been widely used to represent DNA or RNA motifs for more than three decades. Despite its intelligibility and intuitiveness, the traditional sequence logo is unable to display the intra-motif dependencies and therefore is insufficient to fully characterize nucleotide motifs. Many methods have been developed to quantify the intra-motif dependencies, but fewer tools are available for visualization. We developed CircularLogo, a web-based interactive application, which is able to not only visualize the position-specific nucleotide consensus and diversity but also display the intra-motif dependencies. Applying CircularLogo to HNF6 binding sites and tRNA sequences demonstrated its ability to show intra-motif dependencies and intuitively reveal biomolecular structure. CircularLogo is implemented in JavaScript and Python based on the Django web framework. The program's source code and user's manual are freely available at http://circularlogo.sourceforge.net . CircularLogo web server can be accessed from http://bioinformaticstools.mayo.edu/circularlogo/index.html . CircularLogo is an innovative web application that is specifically designed to visualize and interactively explore intra-motif dependencies.
CellMiner Companion: an interactive web application to explore CellMiner NCI-60 data.

PubMed

Wang, Sufang; Gribskov, Michael; Hazbun, Tony R; Pascuzzi, Pete E

2016-08-01

The NCI-60 human tumor cell line panel is an invaluable resource for cancer researchers, providing drug sensitivity, molecular and phenotypic data for a range of cancer types. CellMiner is a web resource that provides tools for the acquisition and analysis of quality-controlled NCI-60 data. CellMiner supports queries of up to 150 drugs or genes, but the output is an Excel file for each drug or gene. This output format makes it difficult for researchers to explore the data from large queries. CellMiner Companion is a web application that facilitates the exploration and visualization of output from CellMiner, further increasing the accessibility of NCI-60 data. The web application is freely accessible at https://pul-bioinformatics.shinyapps.io/CellMinerCompanion The R source code can be downloaded at https://github.com/pepascuzzi/CellMinerCompanion.git ppascuzz@purdue.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Defining Geodetic Reference Frame using Matlab®: PlatEMotion 2.0

NASA Astrophysics Data System (ADS)

Cannavò, Flavio; Palano, Mimmo

2016-03-01

We describe the main features of the developed software tool, namely PlatE-Motion 2.0 (PEM2), which allows inferring the Euler pole parameters by inverting the observed velocities at a set of sites located on a rigid block (inverse problem). PEM2 allows also calculating the expected velocity value for any point located on the Earth providing an Euler pole (direct problem). PEM2 is the updated version of a previous software tool initially developed for easy-to-use file exchange with the GAMIT/GLOBK software package. The software tool is developed in Matlab® framework and, as the previous version, includes a set of MATLAB functions (m-files), GUIs (fig-files), map data files (mat-files) and user's manual as well as some example input files. New changes in PEM2 include (1) some bugs fixed, (2) improvements in the code, (3) improvements in statistical analysis, (4) new input/output file formats. In addition, PEM2 can be now run under the majority of operating systems. The tool is open source and freely available for the scientific community.
Protein-protein interaction specificity is captured by contact preferences and interface composition.

PubMed

Nadalin, Francesca; Carbone, Alessandra

2018-02-01

Large-scale computational docking will be increasingly used in future years to discriminate protein-protein interactions at the residue resolution. Complete cross-docking experiments make in silico reconstruction of protein-protein interaction networks a feasible goal. They ask for efficient and accurate screening of the millions structural conformations issued by the calculations. We propose CIPS (Combined Interface Propensity for decoy Scoring), a new pair potential combining interface composition with residue-residue contact preference. CIPS outperforms several other methods on screening docking solutions obtained either with all-atom or with coarse-grain rigid docking. Further testing on 28 CAPRI targets corroborates CIPS predictive power over existing methods. By combining CIPS with atomic potentials, discrimination of correct conformations in all-atom structures reaches optimal accuracy. The drastic reduction of candidate solutions produced by thousands of proteins docked against each other makes large-scale docking accessible to analysis. CIPS source code is freely available at http://www.lcqb.upmc.fr/CIPS. alessandra.carbone@lip6.fr. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Django Remote Submission

DOE Office of Scientific and Technical Information (OSTI.GOV)

Doucet, Mathieu; Hobson, Tanner C.; Ferraz Leal, Ricardo Miguel

The Django Remote Submission (DRS) is a Django (Django, n.d.) application to manage long running job submission, including starting the job, saving logs, and storing results. It is an independent project available as a standalone pypi package (PyPi, n.d.). It can be easily integrated in any Django project. The source code is freely available as a GitHub repository (django-remote-submission, n.d.). To run the jobs in background, DRS takes advantage of Celery (Celery, n.d.), a powerful asynchronous job queue used for running tasks in the background, and the Redis Server (Redis, n.d.), an in-memory data structure store. Celery uses brokers tomore » pass messages between a Django Project and the Celery workers. Redis is the message broker of DRS. In addition DRS provides real time monitoring of the progress of Jobs and associated logs. Through the Django Channels project (Channels, n.d.), and the usage of Web Sockets, it is possible to asynchronously display the Job Status and the live Job output (standard output and standard error) on a web page.« less
MFCompress: a compression tool for FASTA and multi-FASTA data.

PubMed

Pinho, Armando J; Pratas, Diogo

2014-01-01

The data deluge phenomenon is becoming a serious problem in most genomic centers. To alleviate it, general purpose tools, such as gzip, are used to compress the data. However, although pervasive and easy to use, these tools fall short when the intention is to reduce as much as possible the data, for example, for medium- and long-term storage. A number of algorithms have been proposed for the compression of genomics data, but unfortunately only a few of them have been made available as usable and reliable compression tools. In this article, we describe one such tool, MFCompress, specially designed for the compression of FASTA and multi-FASTA files. In comparison to gzip and applied to multi-FASTA files, MFCompress can provide additional average compression gains of almost 50%, i.e. it potentially doubles the available storage, although at the cost of some more computation time. On highly redundant datasets, and in comparison with gzip, 8-fold size reductions have been obtained. Both source code and binaries for several operating systems are freely available for non-commercial use at http://bioinformatics.ua.pt/software/mfcompress/.
CaFE: a tool for binding affinity prediction using end-point free energy methods.

PubMed

Liu, Hui; Hou, Tingjun

2016-07-15

Accurate prediction of binding free energy is of particular importance to computational biology and structure-based drug design. Among those methods for binding affinity predictions, the end-point approaches, such as MM/PBSA and LIE, have been widely used because they can achieve a good balance between prediction accuracy and computational cost. Here we present an easy-to-use pipeline tool named Calculation of Free Energy (CaFE) to conduct MM/PBSA and LIE calculations. Powered by the VMD and NAMD programs, CaFE is able to handle numerous static coordinate and molecular dynamics trajectory file formats generated by different molecular simulation packages and supports various force field parameters. CaFE source code and documentation are freely available under the GNU General Public License via GitHub at https://github.com/huiliucode/cafe_plugin It is a VMD plugin written in Tcl and the usage is platform-independent. tingjunhou@zju.edu.cn. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
ProGeRF: Proteome and Genome Repeat Finder Utilizing a Fast Parallel Hash Function

PubMed Central

Moraes, Walas Jhony Lopes; Rodrigues, Thiago de Souza; Bartholomeu, Daniella Castanheira

2015-01-01

Repetitive element sequences are adjacent, repeating patterns, also called motifs, and can be of different lengths; repetitions can involve their exact or approximate copies. They have been widely used as molecular markers in population biology. Given the sizes of sequenced genomes, various bioinformatics tools have been developed for the extraction of repetitive elements from DNA sequences. However, currently available tools do not provide options for identifying repetitive elements in the genome or proteome, displaying a user-friendly web interface, and performing-exhaustive searches. ProGeRF is a web site for extracting repetitive regions from genome and proteome sequences. It was designed to be efficient, fast, and accurate and primarily user-friendly web tool allowing many ways to view and analyse the results. ProGeRF (Proteome and Genome Repeat Finder) is freely available as a stand-alone program, from which the users can download the source code, and as a web tool. It was developed using the hash table approach to extract perfect and imperfect repetitive regions in a (multi)FASTA file, while allowing a linear time complexity. PMID:25811026

JSBML: a flexible Java library for working with SBML.

PubMed

Dräger, Andreas; Rodriguez, Nicolas; Dumousseau, Marine; Dörr, Alexander; Wrzodek, Clemens; Le Novère, Nicolas; Zell, Andreas; Hucka, Michael

2011-08-01

The specifications of the Systems Biology Markup Language (SBML) define standards for storing and exchanging computer models of biological processes in text files. In order to perform model simulations, graphical visualizations and other software manipulations, an in-memory representation of SBML is required. We developed JSBML for this purpose. In contrast to prior implementations of SBML APIs, JSBML has been designed from the ground up for the Java programming language, and can therefore be used on all platforms supported by a Java Runtime Environment. This offers important benefits for Java users, including the ability to distribute software as Java Web Start applications. JSBML supports all SBML Levels and Versions through Level 3 Version 1, and we have strived to maintain the highest possible degree of compatibility with the popular library libSBML. JSBML also supports modules that can facilitate the development of plugins for end user applications, as well as ease migration from a libSBML-based backend. Source code, binaries and documentation for JSBML can be freely obtained under the terms of the LGPL 2.1 from the website http://sbml.org/Software/JSBML.
Identifying functionally informative evolutionary sequence profiles.

PubMed

Gil, Nelson; Fiser, Andras

2018-04-15

Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. andras.fiser@einstein.yu.edu. Supplementary data are available at Bioinformatics online.
Simulation of networks of spiking neurons: A review of tools and strategies

PubMed Central

Brette, Romain; Rudolph, Michelle; Carnevale, Ted; Hines, Michael; Beeman, David; Bower, James M.; Diesmann, Markus; Morrison, Abigail; Goodman, Philip H.; Harris, Frederick C.; Zirpe, Milind; Natschläger, Thomas; Pecevski, Dejan; Ermentrout, Bard; Djurfeldt, Mikael; Lansner, Anders; Rochel, Olivier; Vieville, Thierry; Muller, Eilif; Davison, Andrew P.; El Boustani, Sami

2009-01-01

We review different aspects of the simulation of spiking neural networks. We start by reviewing the different types of simulation strategies and algorithms that are currently implemented. We next review the precision of those simulation strategies, in particular in cases where plasticity depends on the exact timing of the spikes. We overview different simulators and simulation environments presently available (restricted to those freely available, open source and documented). For each simulation tool, its advantages and pitfalls are reviewed, with an aim to allow the reader to identify which simulator is appropriate for a given task. Finally, we provide a series of benchmark simulations of different types of networks of spiking neurons, including Hodgkin–Huxley type, integrate-and-fire models, interacting with current-based or conductance-based synapses, using clock-driven or event-driven integration strategies. The same set of models are implemented on the different simulators, and the codes are made available. The ultimate goal of this review is to provide a resource to facilitate identifying the appropriate integration strategy and simulation tool to use for a given modeling problem related to spiking neural networks. PMID:17629781
Delta: a new web-based 3D genome visualization and analysis platform.

PubMed

Tang, Bixia; Li, Feifei; Li, Jing; Zhao, Wenming; Zhang, Zhihua

2018-04-15

Delta is an integrative visualization and analysis platform to facilitate visually annotating and exploring the 3D physical architecture of genomes. Delta takes Hi-C or ChIA-PET contact matrix as input and predicts the topologically associating domains and chromatin loops in the genome. It then generates a physical 3D model which represents the plausible consensus 3D structure of the genome. Delta features a highly interactive visualization tool which enhances the integration of genome topology/physical structure with extensive genome annotation by juxtaposing the 3D model with diverse genomic assay outputs. Finally, by visually comparing the 3D model of the β-globin gene locus and its annotation, we speculated a plausible transitory interaction pattern in the locus. Experimental evidence was found to support this speculation by literature survey. This served as an example of intuitive hypothesis testing with the help of Delta. Delta is freely accessible from http://delta.big.ac.cn, and the source code is available at https://github.com/zhangzhwlab/delta. zhangzhihua@big.ac.cn. Supplementary data are available at Bioinformatics online.
Data collection and storage in long-term ecological and evolutionary studies: The Mongoose 2000 system.

PubMed

Marshall, Harry H; Griffiths, David J; Mwanguhya, Francis; Businge, Robert; Griffiths, Amber G F; Kyabulima, Solomon; Mwesige, Kenneth; Sanderson, Jennifer L; Thompson, Faye J; Vitikainen, Emma I K; Cant, Michael A

2018-01-01

Studying ecological and evolutionary processes in the natural world often requires research projects to follow multiple individuals in the wild over many years. These projects have provided significant advances but may also be hampered by needing to accurately and efficiently collect and store multiple streams of the data from multiple individuals concurrently. The increase in the availability and sophistication of portable computers (smartphones and tablets) and the applications that run on them has the potential to address many of these data collection and storage issues. In this paper we describe the challenges faced by one such long-term, individual-based research project: the Banded Mongoose Research Project in Uganda. We describe a system we have developed called Mongoose 2000 that utilises the potential of apps and portable computers to meet these challenges. We discuss the benefits and limitations of employing such a system in a long-term research project. The app and source code for the Mongoose 2000 system are freely available and we detail how it might be used to aid data collection and storage in other long-term individual-based projects.
Gene Graphics: a genomic neighborhood data visualization web application.

PubMed

Harrison, Katherine J; Crécy-Lagard, Valérie de; Zallot, Rémi

2018-04-15

The examination of gene neighborhood is an integral part of comparative genomics but no tools to produce publication quality graphics of gene clusters are available. Gene Graphics is a straightforward web application for creating such visuals. Supported inputs include National Center for Biotechnology Information gene and protein identifiers with automatic fetching of neighboring information, GenBank files and data extracted from the SEED database. Gene representations can be customized for many parameters including gene and genome names, colors and sizes. Gene attributes can be copied and pasted for rapid and user-friendly customization of homologous genes between species. In addition to Portable Network Graphics and Scalable Vector Graphics, produced representations can be exported as Tagged Image File Format or Encapsulated PostScript, formats that are standard for publication. Hands-on tutorials with real life examples inspired from publications are available for training. Gene Graphics is freely available at https://katlabs.cc/genegraphics/ and source code is hosted at https://github.com/katlabs/genegraphics. katherinejh@ufl.edu or remizallot@ufl.edu. Supplementary data are available at Bioinformatics online.
Django Remote Submission

DOE PAGES

Doucet, Mathieu; Hobson, Tanner C.; Ferraz Leal, Ricardo Miguel

2017-08-01

The Django Remote Submission (DRS) is a Django (Django, n.d.) application to manage long running job submission, including starting the job, saving logs, and storing results. It is an independent project available as a standalone pypi package (PyPi, n.d.). It can be easily integrated in any Django project. The source code is freely available as a GitHub repository (django-remote-submission, n.d.). To run the jobs in background, DRS takes advantage of Celery (Celery, n.d.), a powerful asynchronous job queue used for running tasks in the background, and the Redis Server (Redis, n.d.), an in-memory data structure store. Celery uses brokers tomore » pass messages between a Django Project and the Celery workers. Redis is the message broker of DRS. In addition DRS provides real time monitoring of the progress of Jobs and associated logs. Through the Django Channels project (Channels, n.d.), and the usage of Web Sockets, it is possible to asynchronously display the Job Status and the live Job output (standard output and standard error) on a web page.« less
Surface-Source Downhole Seismic Analysis in R

USGS Publications Warehouse

Thompson, Eric M.

2007-01-01

This report discusses a method for interpreting a layered slowness or velocity model from surface-source downhole seismic data originally presented by Boore (2003). I have implemented this method in the statistical computing language R (R Development Core Team, 2007), so that it is freely and easily available to researchers and practitioners that may find it useful. I originally applied an early version of these routines to seismic cone penetration test data (SCPT) to analyze the horizontal variability of shear-wave velocity within the sediments in the San Francisco Bay area (Thompson et al., 2006). A more recent version of these codes was used to analyze the influence of interface-selection and model assumptions on velocity/slowness estimates and the resulting differences in site amplification (Boore and Thompson, 2007). The R environment has many benefits for scientific and statistical computation; I have chosen R to disseminate these routines because it is versatile enough to program specialized routines, is highly interactive which aids in the analysis of data, and is freely and conveniently available to install on a wide variety of computer platforms. These scripts are useful for the interpretation of layered velocity models from surface-source downhole seismic data such as deep boreholes and SCPT data. The inputs are the travel-time data and the offset of the source at the surface. The travel-time arrivals for the P- and S-waves must already be picked from the original data. An option in the inversion is to include estimates of the standard deviation of the travel-time picks for a weighted inversion of the velocity profile. The standard deviation of each travel-time pick is defined relative to the standard deviation of the best pick in a profile and is based on the accuracy with which the travel-time measurement could be determined from the seismogram. The analysis of the travel-time data consists of two parts: the identification of layer-interfaces, and the inversion for the velocity of each layer. The analyst usually picks layer-interfaces by visual inspection of the travel-time data. I have also developed an algorithm that automatically finds boundaries which can save a significant amount of the time when analyzing a large number of sites. The results of the automatic routines should be reviewed to check that they are reasonable. The interactivity of these scripts allows the user to add and to remove layers quickly, thus allowing rapid feedback on how the residuals are affected by each additional parameter in the inversion. In addition, the script allows many models to be compared at the same time.
Azimuthal Dependence of the Ground Motion Variability from Scenario Modeling of the 2014 Mw6.0 South Napa, California, Earthquake Using an Advanced Kinematic Source Model

NASA Astrophysics Data System (ADS)

Gallovič, F.

2017-09-01

Strong ground motion simulations require physically plausible earthquake source model. Here, I present the application of such a kinematic model introduced originally by Ruiz et al. (Geophys J Int 186:226-244, 2011). The model is constructed to inherently provide synthetics with the desired omega-squared spectral decay in the full frequency range. The source is composed of randomly distributed overlapping subsources with fractal number-size distribution. The position of the subsources can be constrained by prior knowledge of major asperities (stemming, e.g., from slip inversions), or can be completely random. From earthquake physics point of view, the model includes positive correlation between slip and rise time as found in dynamic source simulations. Rupture velocity and rise time follows local S-wave velocity profile, so that the rupture slows down and rise times increase close to the surface, avoiding unrealistically strong ground motions. Rupture velocity can also have random variations, which result in irregular rupture front while satisfying the causality principle. This advanced kinematic broadband source model is freely available and can be easily incorporated into any numerical wave propagation code, as the source is described by spatially distributed slip rate functions, not requiring any stochastic Green's functions. The source model has been previously validated against the observed data due to the very shallow unilateral 2014 Mw6 South Napa, California, earthquake; the model reproduces well the observed data including the near-fault directivity (Seism Res Lett 87:2-14, 2016). The performance of the source model is shown here on the scenario simulations for the same event. In particular, synthetics are compared with existing ground motion prediction equations (GMPEs), emphasizing the azimuthal dependence of the between-event ground motion variability. I propose a simple model reproducing the azimuthal variations of the between-event ground motion variability, providing an insight into possible refinement of GMPEs' functional forms.
PSAT: A web tool to compare genomic neighborhoods of multiple prokaryotic genomes

PubMed Central

Fong, Christine; Rohmer, Laurence; Radey, Matthew; Wasnick, Michael; Brittnacher, Mitchell J

2008-01-01

Background The conservation of gene order among prokaryotic genomes can provide valuable insight into gene function, protein interactions, or events by which genomes have evolved. Although some tools are available for visualizing and comparing the order of genes between genomes of study, few support an efficient and organized analysis between large numbers of genomes. The Prokaryotic Sequence homology Analysis Tool (PSAT) is a web tool for comparing gene neighborhoods among multiple prokaryotic genomes. Results PSAT utilizes a database that is preloaded with gene annotation, BLAST hit results, and gene-clustering scores designed to help identify regions of conserved gene order. Researchers use the PSAT web interface to find a gene of interest in a reference genome and efficiently retrieve the sequence homologs found in other bacterial genomes. The tool generates a graphic of the genomic neighborhood surrounding the selected gene and the corresponding regions for its homologs in each comparison genome. Homologs in each region are color coded to assist users with analyzing gene order among various genomes. In contrast to common comparative analysis methods that filter sequence homolog data based on alignment score cutoffs, PSAT leverages gene context information for homologs, including those with weak alignment scores, enabling a more sensitive analysis. Features for constraining or ordering results are designed to help researchers browse results from large numbers of comparison genomes in an organized manner. PSAT has been demonstrated to be useful for helping to identify gene orthologs and potential functional gene clusters, and detecting genome modifications that may result in loss of function. Conclusion PSAT allows researchers to investigate the order of genes within local genomic neighborhoods of multiple genomes. A PSAT web server for public use is available for performing analyses on a growing set of reference genomes through any web browser with no client side software setup or installation required. Source code is freely available to researchers interested in setting up a local version of PSAT for analysis of genomes not available through the public server. Access to the public web server and instructions for obtaining source code can be found at . PMID:18366802
The Wageningen Lowland Runoff Simulator (WALRUS): a Novel Open Source Rainfall-Runoff Model for Areas with Shallow Groundwater

NASA Astrophysics Data System (ADS)

Brauer, C.; Teuling, R.; Torfs, P.; Uijlenhoet, R.

2014-12-01

Recently, we developed the Wageningen Lowland Runoff Simulator (WALRUS) to fill the gap between complex, spatially distributed models which are often used in lowland regions and simple, parametric models which have mostly been developed for mountainous catchments. This parametric rainfall-runoff model can be used all over the world, both in freely draining lowland catchments and polders with controlled water levels. Here, we present the model implementation and our recent experience in training students and practitioners to use the model. WALRUS has several advantages that facilitate practical application. Firstly, WALRUS is computationally efficient, which allows for operational forecasting and uncertainty estimation by running ensembles. Secondly, the code is set-up such that it can be used by both practitioners and researchers. For direct use by practitioners, defaults are implemented for relations between model variables and for the computation of initial conditions based on discharge only, leaving only four parameters which require calibration. For research purposes, the defaults can easily be changed. Finally, an approach for flexible time steps increases numerical stability and makes model parameter values independent of time step size, which facilitates use of the model with the same parameter set for multi-year water balance studies as well as detailed analyses of individual flood peaks. The open source model code is currently implemented in R and compiled into a package. This package will be made available through the R CRAN server. A small massive open online course (MOOC) is being developed to give students, researchers and practitioners a step-by-step WALRUS-training. This course contains explanations about model elements and its advantages and limitations, as well as hands-on exercises to learn how to use WALRUS. All code, course, literature and examples will be collected on a dedicated website, which can be found via www.wageningenur.nl/hwm. References C.C. Brauer, et al. (2014a). Geosci. Model Dev. Discuss., 7, 1357—1411. C.C. Brauer, et al. (2014b). Hydrol. Earth Syst. Sci. Discuss., 11, 2091—2148.
Facilitating Internet-Scale Code Retrieval

ERIC Educational Resources Information Center

Bajracharya, Sushil Krishna

2010-01-01

Internet-Scale code retrieval deals with the representation, storage, and access of relevant source code from a large amount of source code available on the Internet. Internet-Scale code retrieval systems support common emerging practices among software developers related to finding and reusing source code. In this dissertation we focus on some…
Enabling cost-effective multimodal trip planners through open transit data.

DOT National Transportation Integrated Search

2011-05-01

This study examined whether multimodal trip planners can be developed using opensource software and open data sources. : OpenStreetMap (OSM), maintained by the nonprofit OpenStreetMap Foundation, is an open, freely available international : rep...
Enabling cost-effective multimodal trip planners through open transit data.

DOT National Transportation Integrated Search

2011-05-01

This study examined whether multimodal trip planners can be developed using opensource software and open data sources. OpenStreetMap (OSM), maintained by the nonprofit OpenStreetMap Foundation, is an open, freely available international reposit...
Maestro and Castro: Simulation Codes for Astrophysical Flows

NASA Astrophysics Data System (ADS)

Zingale, Michael; Almgren, Ann; Beckner, Vince; Bell, John; Friesen, Brian; Jacobs, Adam; Katz, Maximilian P.; Malone, Christopher; Nonaka, Andrew; Zhang, Weiqun

2017-01-01

Stellar explosions are multiphysics problems—modeling them requires the coordinated input of gravity solvers, reaction networks, radiation transport, and hydrodynamics together with microphysics recipes to describe the physics of matter under extreme conditions. Furthermore, these models involve following a wide range of spatial and temporal scales, which puts tough demands on simulation codes. We developed the codes Maestro and Castro to meet the computational challenges of these problems. Maestro uses a low Mach number formulation of the hydrodynamics to efficiently model convection. Castro solves the fully compressible radiation hydrodynamics equations to capture the explosive phases of stellar phenomena. Both codes are built upon the BoxLib adaptive mesh refinement library, which prepares them for next-generation exascale computers. Common microphysics shared between the codes allows us to transfer a problem from the low Mach number regime in Maestro to the explosive regime in Castro. Importantly, both codes are freely available (https://github.com/BoxLib-Codes). We will describe the design of the codes and some of their science applications, as well as future development directions.Support for development was provided by NSF award AST-1211563 and DOE/Office of Nuclear Physics grant DE-FG02-87ER40317 to Stony Brook and by the Applied Mathematics Program of the DOE Office of Advance Scientific Computing Research under US DOE contract DE-AC02-05CH11231 to LBNL.
A smooth particle hydrodynamics code to model collisions between solid, self-gravitating objects

NASA Astrophysics Data System (ADS)

Schäfer, C.; Riecker, S.; Maindl, T. I.; Speith, R.; Scherrer, S.; Kley, W.

2016-05-01

Context. Modern graphics processing units (GPUs) lead to a major increase in the performance of the computation of astrophysical simulations. Owing to the different nature of GPU architecture compared to traditional central processing units (CPUs) such as x86 architecture, existing numerical codes cannot be easily migrated to run on GPU. Here, we present a new implementation of the numerical method smooth particle hydrodynamics (SPH) using CUDA and the first astrophysical application of the new code: the collision between Ceres-sized objects. Aims: The new code allows for a tremendous increase in speed of astrophysical simulations with SPH and self-gravity at low costs for new hardware. Methods: We have implemented the SPH equations to model gas, liquids and elastic, and plastic solid bodies and added a fragmentation model for brittle materials. Self-gravity may be optionally included in the simulations and is treated by the use of a Barnes-Hut tree. Results: We find an impressive performance gain using NVIDIA consumer devices compared to our existing OpenMP code. The new code is freely available to the community upon request. If you are interested in our CUDA SPH code miluphCUDA, please write an email to Christoph Schäfer. miluphCUDA is the CUDA port of miluph. miluph is pronounced [maßl2v]. We do not support the use of the code for military purposes.
The Chandra X-ray Observatory: An Astronomical Facility Available to the World

NASA Technical Reports Server (NTRS)

Smith, Randall K.

2006-01-01

The Chandra X-ray observatory, one of NASA's "Great Observatories," provides high angular and spectral resolution X-ray data which is freely available to all. In this review I describe the instruments on chandra along with their current calibration, as well as the chandra proposal system, the freely-available Chandra analysis software package CIAO, and the Chandra archive. As Chandra is in its 6th year of operation, the archive already contains calibrated observations of a large range of X-ray sources. The Chandra X-ray Center is committed to assisting astronomers from any country who wish to use data from the archive or propose for observations
Joint source-channel coding for motion-compensated DCT-based SNR scalable video.

PubMed

Kondi, Lisimachos P; Ishtiaq, Faisal; Katsaggelos, Aggelos K

2002-01-01

In this paper, we develop an approach toward joint source-channel coding for motion-compensated DCT-based scalable video coding and transmission. A framework for the optimal selection of the source and channel coding rates over all scalable layers is presented such that the overall distortion is minimized. The algorithm utilizes universal rate distortion characteristics which are obtained experimentally and show the sensitivity of the source encoder and decoder to channel errors. The proposed algorithm allocates the available bit rate between scalable layers and, within each layer, between source and channel coding. We present the results of this rate allocation algorithm for video transmission over a wireless channel using the H.263 Version 2 signal-to-noise ratio (SNR) scalable codec for source coding and rate-compatible punctured convolutional (RCPC) codes for channel coding. We discuss the performance of the algorithm with respect to the channel conditions, coding methodologies, layer rates, and number of layers.
Implementation of a flexible and scalable particle-in-cell method for massively parallel computations in the mantle convection code ASPECT

NASA Astrophysics Data System (ADS)

Gassmöller, Rene; Bangerth, Wolfgang

2016-04-01

Particle-in-cell methods have a long history and many applications in geodynamic modelling of mantle convection, lithospheric deformation and crustal dynamics. They are primarily used to track material information, the strain a material has undergone, the pressure-temperature history a certain material region has experienced, or the amount of volatiles or partial melt present in a region. However, their efficient parallel implementation - in particular combined with adaptive finite-element meshes - is complicated due to the complex communication patterns and frequent reassignment of particles to cells. Consequently, many current scientific software packages accomplish this efficient implementation by specifically designing particle methods for a single purpose, like the advection of scalar material properties that do not evolve over time (e.g., for chemical heterogeneities). Design choices for particle integration, data storage, and parallel communication are then optimized for this single purpose, making the code relatively rigid to changing requirements. Here, we present the implementation of a flexible, scalable and efficient particle-in-cell method for massively parallel finite-element codes with adaptively changing meshes. Using a modular plugin structure, we allow maximum flexibility of the generation of particles, the carried tracer properties, the advection and output algorithms, and the projection of properties to the finite-element mesh. We present scaling tests ranging up to tens of thousands of cores and tens of billions of particles. Additionally, we discuss efficient load-balancing strategies for particles in adaptive meshes with their strengths and weaknesses, local particle-transfer between parallel subdomains utilizing existing communication patterns from the finite element mesh, and the use of established parallel output algorithms like the HDF5 library. Finally, we show some relevant particle application cases, compare our implementation to a modern advection-field approach, and demonstrate under which conditions which method is more efficient. We implemented the presented methods in ASPECT (aspect.dealii.org), a freely available open-source community code for geodynamic simulations. The structure of the particle code is highly modular, and segregated from the PDE solver, and can thus be easily transferred to other programs, or adapted for various application cases.
Multiclass cancer classification using a feature subset-based ensemble from microRNA expression profiles.

PubMed

Piao, Yongjun; Piao, Minghao; Ryu, Keun Ho

2017-01-01

Cancer classification has been a crucial topic of research in cancer treatment. In the last decade, messenger RNA (mRNA) expression profiles have been widely used to classify different types of cancers. With the discovery of a new class of small non-coding RNAs; known as microRNAs (miRNAs), various studies have shown that the expression patterns of miRNA can also accurately classify human cancers. Therefore, there is a great demand for the development of machine learning approaches to accurately classify various types of cancers using miRNA expression data. In this article, we propose a feature subset-based ensemble method in which each model is learned from a different projection of the original feature space to classify multiple cancers. In our method, the feature relevance and redundancy are considered to generate multiple feature subsets, the base classifiers are learned from each independent miRNA subset, and the average posterior probability is used to combine the base classifiers. To test the performance of our method, we used bead-based and sequence-based miRNA expression datasets and conducted 10-fold and leave-one-out cross validations. The experimental results show that the proposed method yields good results and has higher prediction accuracy than popular ensemble methods. The Java program and source code of the proposed method and the datasets in the experiments are freely available at https://sourceforge.net/projects/mirna-ensemble/. Copyright © 2016 Elsevier Ltd. All rights reserved.

REDItools: high-throughput RNA editing detection made easy.

PubMed

Picardi, Ernesto; Pesole, Graziano

2013-07-15

The reliable detection of RNA editing sites from massive sequencing data remains challenging and, although several methodologies have been proposed, no computational tools have been released to date. Here, we introduce REDItools a suite of python scripts to perform high-throughput investigation of RNA editing using next-generation sequencing data. REDItools are in python programming language and freely available at http://code.google.com/p/reditools/. ernesto.picardi@uniba.it or graziano.pesole@uniba.it Supplementary data are available at Bioinformatics online.
AQUIFEM-SALT; a finite-element model for aquifers containing a seawater interface

USGS Publications Warehouse

Voss, C.I.

1984-01-01

Described are modifications to AQUIFEM, a finite element areal ground-water flow model for aquifer evaluation. The modified model, AQUIFEM-SALT, simulates an aquifer containing a freshwater body that freely floats on seawater. Parts of the freshwater lens may be confined above and below by less permeable units. Theory, code modifications, and model verification are discussed. A modified input data list is included. This report is intended as a companion to the original AQUIFEM documentation. (USGS)
Report on the Service Academy Sexual Assault and Leadership Survey

DTIC Science & Technology

2005-03-04

decayed into something more idealistic than practical , which is troubling since we will all come out of here as 2nd Lt’s.” • Comment two: “Honor...cadets who practice the toleration portion of the honor code. Especially in the IC locker rooms it is almost impossible to turn someone in for honor...freely talk about porn, masturbation , sex, strip clubs, hookers, and various other things which I consider to be personal issues that I would rather
Small Business Innovations (Helicopters)

NASA Technical Reports Server (NTRS)

1992-01-01

The amount of engine power required for a helicopter to hover is an important, but difficult, consideration in helicopter design. The EHPIC program model produces converged, freely distorted wake geometries that generate accurate analysis of wake-induced downwash, allowing good predictions of rotor thrust and power requirements. Continuum Dynamics, Inc., the Small Business Innovation Research (SBIR) company that developed EHPIC, also produces RotorCRAFT, a program for analysis of aerodynamic loading of helicopter blades in forward flight. Both helicopter codes have been licensed to commercial manufacturers.
Authorship Attribution of Source Code

ERIC Educational Resources Information Center

Tennyson, Matthew F.

2013-01-01

Authorship attribution of source code is the task of deciding who wrote a program, given its source code. Applications include software forensics, plagiarism detection, and determining software ownership. A number of methods for the authorship attribution of source code have been presented in the past. A review of those existing methods is…
SpecBit, DecayBit and PrecisionBit: GAMBIT modules for computing mass spectra, particle decay rates and precision observables

NASA Astrophysics Data System (ADS)

Athron, Peter; Balázs, Csaba; Dal, Lars A.; Edsjö, Joakim; Farmer, Ben; Gonzalo, Tomás E.; Kvellestad, Anders; McKay, James; Putze, Antje; Rogan, Chris; Scott, Pat; Weniger, Christoph; White, Martin

2018-01-01

We present the GAMBIT modules SpecBit, DecayBit and PrecisionBit. Together they provide a new framework for linking publicly available spectrum generators, decay codes and other precision observable calculations in a physically and statistically consistent manner. This allows users to automatically run various combinations of existing codes as if they are a single package. The modular design allows software packages fulfilling the same role to be exchanged freely at runtime, with the results presented in a common format that can easily be passed to downstream dark matter, collider and flavour codes. These modules constitute an essential part of the broader GAMBIT framework, a major new software package for performing global fits. In this paper we present the observable calculations, data, and likelihood functions implemented in the three modules, as well as the conventions and assumptions used in interfacing them with external codes. We also present 3-BIT-HIT, a command-line utility for computing mass spectra, couplings, decays and precision observables in the MSSM, which shows how the three modules can easily be used independently of GAMBIT.
Spatial trends, sources, and air-water exchange of organochlorine pesticides in the Great Lakes basin using low density polyethylene passive samplers.

PubMed

Khairy, Mohammed; Muir, Derek; Teixeira, Camilla; Lohmann, Rainer

2014-08-19

Polyethylene passive samplers were deployed during summer and fall of 2011 in the lower Great Lakes to assess the spatial distribution and sources of gaseous and freely dissolved organochlorine pesticides (OCPs) and their air-water exchange. Average gaseous OCP concentrations ranged from nondetect to 133 pg/m(3). Gaseous concentrations of hexachlorobenzene, dieldrin, and chlordanes were significantly greater (Mann-Whitney test, p < 0.05) at Lake Erie than Lake Ontario. A multiple linear regression implied that both cropland and urban areas within 50 and 10 km buffer zones, respectively, were critical parameters to explain the total variability in atmospheric concentrations. Freely dissolved OCP concentrations (nondetect to 114 pg/L) were lower than previously reported. Aqueous half-lives generally ranged from 1.7 to 6.7 years. Nonetheless, concentrations of p,p'-DDE and chlordanes were higher than New York State Ambient Water Quality Standards for the protection of human health from the consumption of fish. Spatial distributions of freely dissolved OCPs in both lakes were influenced by loadings from areas of concern and the water circulation patterns. Flux calculations indicated net deposition of γ-hexachlorocyclohexane, heptachlor-epoxide, and α- and β-endosulfan (-0.02 to -33 ng/m(2)/day) and net volatilization of heptachlor, aldrin, trans-chlordane, and trans-nonachlor (0.0 to 9.0 ng/m(2)/day) in most samples.
An open annotation ontology for science on web 3.0

PubMed Central

2011-01-01

Background There is currently a gap between the rich and expressive collection of published biomedical ontologies, and the natural language expression of biomedical papers consumed on a daily basis by scientific researchers. The purpose of this paper is to provide an open, shareable structure for dynamic integration of biomedical domain ontologies with the scientific document, in the form of an Annotation Ontology (AO), thus closing this gap and enabling application of formal biomedical ontologies directly to the literature as it emerges. Methods Initial requirements for AO were elicited by analysis of integration needs between biomedical web communities, and of needs for representing and integrating results of biomedical text mining. Analysis of strengths and weaknesses of previous efforts in this area was also performed. A series of increasingly refined annotation tools were then developed along with a metadata model in OWL, and deployed for feedback and additional requirements the ontology to users at a major pharmaceutical company and a major academic center. Further requirements and critiques of the model were also elicited through discussions with many colleagues and incorporated into the work. Results This paper presents Annotation Ontology (AO), an open ontology in OWL-DL for annotating scientific documents on the web. AO supports both human and algorithmic content annotation. It enables “stand-off” or independent metadata anchored to specific positions in a web document by any one of several methods. In AO, the document may be annotated but is not required to be under update control of the annotator. AO contains a provenance model to support versioning, and a set model for specifying groups and containers of annotation. AO is freely available under open source license at http://purl.org/ao/, and extensive documentation including screencasts is available on AO’s Google Code page: http://code.google.com/p/annotation-ontology/ . Conclusions The Annotation Ontology meets critical requirements for an open, freely shareable model in OWL, of annotation metadata created against scientific documents on the Web. We believe AO can become a very useful common model for annotation metadata on Web documents, and will enable biomedical domain ontologies to be used quite widely to annotate the scientific literature. Potential collaborators and those with new relevant use cases are invited to contact the authors. PMID:21624159
An open annotation ontology for science on web 3.0.

PubMed

Ciccarese, Paolo; Ocana, Marco; Garcia Castro, Leyla Jael; Das, Sudeshna; Clark, Tim

2011-05-17

There is currently a gap between the rich and expressive collection of published biomedical ontologies, and the natural language expression of biomedical papers consumed on a daily basis by scientific researchers. The purpose of this paper is to provide an open, shareable structure for dynamic integration of biomedical domain ontologies with the scientific document, in the form of an Annotation Ontology (AO), thus closing this gap and enabling application of formal biomedical ontologies directly to the literature as it emerges. Initial requirements for AO were elicited by analysis of integration needs between biomedical web communities, and of needs for representing and integrating results of biomedical text mining. Analysis of strengths and weaknesses of previous efforts in this area was also performed. A series of increasingly refined annotation tools were then developed along with a metadata model in OWL, and deployed for feedback and additional requirements the ontology to users at a major pharmaceutical company and a major academic center. Further requirements and critiques of the model were also elicited through discussions with many colleagues and incorporated into the work. This paper presents Annotation Ontology (AO), an open ontology in OWL-DL for annotating scientific documents on the web. AO supports both human and algorithmic content annotation. It enables "stand-off" or independent metadata anchored to specific positions in a web document by any one of several methods. In AO, the document may be annotated but is not required to be under update control of the annotator. AO contains a provenance model to support versioning, and a set model for specifying groups and containers of annotation. AO is freely available under open source license at http://purl.org/ao/, and extensive documentation including screencasts is available on AO's Google Code page: http://code.google.com/p/annotation-ontology/ . The Annotation Ontology meets critical requirements for an open, freely shareable model in OWL, of annotation metadata created against scientific documents on the Web. We believe AO can become a very useful common model for annotation metadata on Web documents, and will enable biomedical domain ontologies to be used quite widely to annotate the scientific literature. Potential collaborators and those with new relevant use cases are invited to contact the authors.
Modularized seismic full waveform inversion based on waveform sensitivity kernels - The software package ASKI

NASA Astrophysics Data System (ADS)

Schumacher, Florian; Friederich, Wolfgang; Lamara, Samir; Gutt, Phillip; Paffrath, Marcel

2015-04-01

We present a seismic full waveform inversion concept for applications ranging from seismological to enineering contexts, based on sensitivity kernels for full waveforms. The kernels are derived from Born scattering theory as the Fréchet derivatives of linearized frequency-domain full waveform data functionals, quantifying the influence of elastic earth model parameters and density on the data values. For a specific source-receiver combination, the kernel is computed from the displacement and strain field spectrum originating from the source evaluated throughout the inversion domain, as well as the Green function spectrum and its strains originating from the receiver. By storing the wavefield spectra of specific sources/receivers, they can be re-used for kernel computation for different specific source-receiver combinations, optimizing the total number of required forward simulations. In the iterative inversion procedure, the solution of the forward problem, the computation of sensitivity kernels and the derivation of a model update is held completely separate. In particular, the model description for the forward problem and the description of the inverted model update are kept independent. Hence, the resolution of the inverted model as well as the complexity of solving the forward problem can be iteratively increased (with increasing frequency content of the inverted data subset). This may regularize the overall inverse problem and optimizes the computational effort of both, solving the forward problem and computing the model update. The required interconnection of arbitrary unstructured volume and point grids is realized by generalized high-order integration rules and 3D-unstructured interpolation methods. The model update is inferred solving a minimization problem in a least-squares sense, resulting in Gauss-Newton convergence of the overall inversion process. The inversion method was implemented in the modularized software package ASKI (Analysis of Sensitivity and Kernel Inversion), which provides a generalized interface to arbitrary external forward modelling codes. So far, the 3D spectral-element code SPECFEM3D (Tromp, Komatitsch and Liu, 2008) and the 1D semi-analytical code GEMINI (Friederich and Dalkolmo, 1995) in both, Cartesian and spherical framework are supported. The creation of interfaces to further forward codes is planned in the near future. ASKI is freely available under the terms of the GPL at www.rub.de/aski . Since the independent modules of ASKI must communicate via file output/input, large storage capacities need to be accessible conveniently. Storing the complete sensitivity matrix to file, however, permits the scientist full manual control over each step in a customized procedure of sensitivity/resolution analysis and full waveform inversion. In the presentation, we will show some aspects of the theory behind the full waveform inversion method and its practical realization by the software package ASKI, as well as synthetic and real-data applications from different scales and geometries.
Optimal simultaneous superpositioning of multiple structures with missing data

PubMed Central

Theobald, Douglas L.; Steindel, Phillip A.

2012-01-01

Motivation: Superpositioning is an essential technique in structural biology that facilitates the comparison and analysis of conformational differences among topologically similar structures. Performing a superposition requires a one-to-one correspondence, or alignment, of the point sets in the different structures. However, in practice, some points are usually ‘missing’ from several structures, for example, when the alignment contains gaps. Current superposition methods deal with missing data simply by superpositioning a subset of points that are shared among all the structures. This practice is inefficient, as it ignores important data, and it fails to satisfy the common least-squares criterion. In the extreme, disregarding missing positions prohibits the calculation of a superposition altogether. Results: Here, we present a general solution for determining an optimal superposition when some of the data are missing. We use the expectation–maximization algorithm, a classic statistical technique for dealing with incomplete data, to find both maximum-likelihood solutions and the optimal least-squares solution as a special case. Availability and implementation: The methods presented here are implemented in THESEUS 2.0, a program for superpositioning macromolecular structures. ANSI C source code and selected compiled binaries for various computing platforms are freely available under the GNU open source license from http://www.theseus3d.org. Contact: dtheobald@brandeis.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22543369
Evaluation of hierarchical models for integrative genomic analyses.

PubMed

Denis, Marie; Tadesse, Mahlet G

2016-03-01

Advances in high-throughput technologies have led to the acquisition of various types of -omic data on the same biological samples. Each data type gives independent and complementary information that can explain the biological mechanisms of interest. While several studies performing independent analyses of each dataset have led to significant results, a better understanding of complex biological mechanisms requires an integrative analysis of different sources of data. Flexible modeling approaches, based on penalized likelihood methods and expectation-maximization (EM) algorithms, are studied and tested under various biological relationship scenarios between the different molecular features and their effects on a clinical outcome. The models are applied to genomic datasets from two cancer types in the Cancer Genome Atlas project: glioblastoma multiforme and ovarian serous cystadenocarcinoma. The integrative models lead to improved model fit and predictive performance. They also provide a better understanding of the biological mechanisms underlying patients' survival. Source code implementing the integrative models is freely available at https://github.com/mgt000/IntegrativeAnalysis along with example datasets and sample R script applying the models to these data. The TCGA datasets used for analysis are publicly available at https://tcga-data.nci.nih.gov/tcga/tcgaDownload.jsp marie.denis@cirad.fr or mgt26@georgetown.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Pgltools: a genomic arithmetic tool suite for manipulation of Hi-C peak and other chromatin interaction data.

PubMed

Greenwald, William W; Li, He; Smith, Erin N; Benaglio, Paola; Nariai, Naoki; Frazer, Kelly A

2017-04-07

Genomic interaction studies use next-generation sequencing (NGS) to examine the interactions between two loci on the genome, with subsequent bioinformatics analyses typically including annotation, intersection, and merging of data from multiple experiments. While many file types and analysis tools exist for storing and manipulating single locus NGS data, there is currently no file standard or analysis tool suite for manipulating and storing paired-genomic-loci: the data type resulting from "genomic interaction" studies. As genomic interaction sequencing data are becoming prevalent, a standard file format and tools for working with these data conveniently and efficiently are needed. This article details a file standard and novel software tool suite for working with paired-genomic-loci data. We present the paired-genomic-loci (PGL) file standard for genomic-interactions data, and the accompanying analysis tool suite "pgltools": a cross platform, pypy compatible python package available both as an easy-to-use UNIX package, and as a python module, for integration into pipelines of paired-genomic-loci analyses. Pgltools is a freely available, open source tool suite for manipulating paired-genomic-loci data. Source code, an in-depth manual, and a tutorial are available publicly at www.github.com/billgreenwald/pgltools , and a python module of the operations can be installed from PyPI via the PyGLtools module.
The BioGRID Interaction Database: 2011 update

PubMed Central

Stark, Chris; Breitkreutz, Bobby-Joe; Chatr-aryamontri, Andrew; Boucher, Lorrie; Oughtred, Rose; Livstone, Michael S.; Nixon, Julie; Van Auken, Kimberly; Wang, Xiaodong; Shi, Xiaoqi; Reguly, Teresa; Rust, Jennifer M.; Winter, Andrew; Dolinski, Kara; Tyers, Mike

2011-01-01

The Biological General Repository for Interaction Datasets (BioGRID) is a public database that archives and disseminates genetic and protein interaction data from model organisms and humans (http://www.thebiogrid.org). BioGRID currently holds 347 966 interactions (170 162 genetic, 177 804 protein) curated from both high-throughput data sets and individual focused studies, as derived from over 23 000 publications in the primary literature. Complete coverage of the entire literature is maintained for budding yeast (Saccharomyces cerevisiae), fission yeast (Schizosaccharomyces pombe) and thale cress (Arabidopsis thaliana), and efforts to expand curation across multiple metazoan species are underway. The BioGRID houses 48 831 human protein interactions that have been curated from 10 247 publications. Current curation drives are focused on particular areas of biology to enable insights into conserved networks and pathways that are relevant to human health. The BioGRID 3.0 web interface contains new search and display features that enable rapid queries across multiple data types and sources. An automated Interaction Management System (IMS) is used to prioritize, coordinate and track curation across international sites and projects. BioGRID provides interaction data to several model organism databases, resources such as Entrez-Gene and other interaction meta-databases. The entire BioGRID 3.0 data collection may be downloaded in multiple file formats, including PSI MI XML. Source code for BioGRID 3.0 is freely available without any restrictions. PMID:21071413
MAPPI-DAT: data management and analysis for protein-protein interaction data from the high-throughput MAPPIT cell microarray platform.

PubMed

Gupta, Surya; De Puysseleyr, Veronic; Van der Heyden, José; Maddelein, Davy; Lemmens, Irma; Lievens, Sam; Degroeve, Sven; Tavernier, Jan; Martens, Lennart

2017-05-01

Protein-protein interaction (PPI) studies have dramatically expanded our knowledge about cellular behaviour and development in different conditions. A multitude of high-throughput PPI techniques have been developed to achieve proteome-scale coverage for PPI studies, including the microarray based Mammalian Protein-Protein Interaction Trap (MAPPIT) system. Because such high-throughput techniques typically report thousands of interactions, managing and analysing the large amounts of acquired data is a challenge. We have therefore built the MAPPIT cell microArray Protein Protein Interaction-Data management & Analysis Tool (MAPPI-DAT) as an automated data management and analysis tool for MAPPIT cell microarray experiments. MAPPI-DAT stores the experimental data and metadata in a systematic and structured way, automates data analysis and interpretation, and enables the meta-analysis of MAPPIT cell microarray data across all stored experiments. MAPPI-DAT is developed in Python, using R for data analysis and MySQL as data management system. MAPPI-DAT is cross-platform and can be ran on Microsoft Windows, Linux and OS X/macOS. The source code and a Microsoft Windows executable are freely available under the permissive Apache2 open source license at https://github.com/compomics/MAPPI-DAT. jan.tavernier@vib-ugent.be or lennart.martens@vib-ugent.be. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
jsPhyloSVG: a javascript library for visualizing interactive and vector-based phylogenetic trees on the web.

PubMed

Smits, Samuel A; Ouverney, Cleber C

2010-08-18

Many software packages have been developed to address the need for generating phylogenetic trees intended for print. With an increased use of the web to disseminate scientific literature, there is a need for phylogenetic trees to be viewable across many types of devices and feature some of the interactive elements that are integral to the browsing experience. We propose a novel approach for publishing interactive phylogenetic trees. We present a javascript library, jsPhyloSVG, which facilitates constructing interactive phylogenetic trees from raw Newick or phyloXML formats directly within the browser in Scalable Vector Graphics (SVG) format. It is designed to work across all major browsers and renders an alternative format for those browsers that do not support SVG. The library provides tools for building rectangular and circular phylograms with integrated charting. Interactive features may be integrated and made to respond to events such as clicks on any element of the tree, including labels. jsPhyloSVG is an open-source solution for rendering dynamic phylogenetic trees. It is capable of generating complex and interactive phylogenetic trees across all major browsers without the need for plugins. It is novel in supporting the ability to interpret the tree inference formats directly, exposing the underlying markup to data-mining services. The library source code, extensive documentation and live examples are freely accessible at www.jsphylosvg.com.
RevEcoR: an R package for the reverse ecology analysis of microbiomes.

PubMed

Cao, Yang; Wang, Yuanyuan; Zheng, Xiaofei; Li, Fei; Bo, Xiaochen

2016-07-29

All species live in complex ecosystems. The structure and complexity of a microbial community reflects not only diversity and function, but also the environment in which it occurs. However, traditional ecological methods can only be applied on a small scale and for relatively well-understood biological systems. Recently, a graph-theory-based algorithm called the reverse ecology approach has been developed that can analyze the metabolic networks of all the species in a microbial community, and predict the metabolic interface between species and their environment. Here, we present RevEcoR, an R package and a Shiny Web application that implements the reverse ecology algorithm for determining microbe-microbe interactions in microbial communities. This software allows users to obtain large-scale ecological insights into species' ecology directly from high-throughput metagenomic data. The software has great potential for facilitating the study of microbiomes. RevEcoR is open source software for the study of microbial community ecology. The RevEcoR R package is freely available under the GNU General Public License v. 2.0 at http://cran.r-project.org/web/packages/RevEcoR/ with the vignette and typical usage examples, and the interactive Shiny web application is available at http://yiluheihei.shinyapps.io/shiny-RevEcoR , or can be installed locally with the source code accessed from https://github.com/yiluheihei/shiny-RevEcoR .
SANDS: A Service-Oriented Architecture for Clinical Decision Support in a National Health Information Network

PubMed Central

Wright, Adam; Sittig, Dean F.

2008-01-01

In this paper we describe and evaluate a new distributed architecture for clinical decision support called SANDS (Service-oriented Architecture for NHIN Decision Support), which leverages current health information exchange efforts and is based on the principles of a service-oriented architecture. The architecture allows disparate clinical information systems and clinical decision support systems to be seamlessly integrated over a network according to a set of interfaces and protocols described in this paper. The architecture described is fully defined and developed, and six use cases have been developed and tested using a prototype electronic health record which links to one of the existing prototype National Health Information Networks (NHIN): drug interaction checking, syndromic surveillance, diagnostic decision support, inappropriate prescribing in older adults, information at the point of care and a simple personal health record. Some of these use cases utilize existing decision support systems, which are either commercially or freely available at present, and developed outside of the SANDS project, while other use cases are based on decision support systems developed specifically for the project. Open source code for many of these components is available, and an open source reference parser is also available for comparison and testing of other clinical information systems and clinical decision support systems that wish to implement the SANDS architecture. PMID:18434256
Extracting and standardizing medication information in clinical text - the MedEx-UIMA system.

PubMed

Jiang, Min; Wu, Yonghui; Shah, Anushi; Priyanka, Priyanka; Denny, Joshua C; Xu, Hua

2014-01-01

Extraction of medication information embedded in clinical text is important for research using electronic health records (EHRs). However, most of current medication information extraction systems identify drug and signature entities without mapping them to standard representation. In this study, we introduced the open source Java implementation of MedEx, an existing high-performance medication information extraction system, based on the Unstructured Information Management Architecture (UIMA) framework. In addition, we developed new encoding modules in the MedEx-UIMA system, which mapped an extracted drug name/dose/form to both generalized and specific RxNorm concepts and translated drug frequency information to ISO standard. We processed 826 documents by both systems and verified that MedEx-UIMA and MedEx (the Python version) performed similarly by comparing both results. Using two manually annotated test sets that contained 300 drug entries from medication list and 300 drug entries from narrative reports, the MedEx-UIMA system achieved F-measures of 98.5% and 97.5% respectively for encoding drug names to corresponding RxNorm generic drug ingredients, and F-measures of 85.4% and 88.1% respectively for mapping drug names/dose/form to the most specific RxNorm concepts. It also achieved an F-measure of 90.4% for normalizing frequency information to ISO standard. The open source MedEx-UIMA system is freely available online at http://code.google.com/p/medex-uima/.
EVEREST: Pixel Level Decorrelation of K2 Light Curves

NASA Astrophysics Data System (ADS)

Luger, Rodrigo; Agol, Eric; Kruse, Ethan; Barnes, Rory; Becker, Andrew; Foreman-Mackey, Daniel; Deming, Drake

2016-10-01

We present EPIC Variability Extraction and Removal for Exoplanet Science Targets (EVEREST), an open-source pipeline for removing instrumental noise from K2 light curves. EVEREST employs a variant of pixel level decorrelation to remove systematics introduced by the spacecraft’s pointing error and a Gaussian process to capture astrophysical variability. We apply EVEREST to all K2 targets in campaigns 0-7, yielding light curves with precision comparable to that of the original Kepler mission for stars brighter than {K}p≈ 13, and within a factor of two of the Kepler precision for fainter targets. We perform cross-validation and transit injection and recovery tests to validate the pipeline, and compare our light curves to the other de-trended light curves available for download at the MAST High Level Science Products archive. We find that EVEREST achieves the highest average precision of any of these pipelines for unsaturated K2 stars. The improved precision of these light curves will aid in exoplanet detection and characterization, investigations of stellar variability, asteroseismology, and other photometric studies. The EVEREST pipeline can also easily be applied to future surveys, such as the TESS mission, to correct for instrumental systematics and enable the detection of low signal-to-noise transiting exoplanets. The EVEREST light curves and the source code used to generate them are freely available online.

Comet: an open-source MS/MS sequence database search tool.

PubMed

Eng, Jimmy K; Jahan, Tahmina A; Hoopmann, Michael R

2013-01-01

Proteomics research routinely involves identifying peptides and proteins via MS/MS sequence database search. Thus the database search engine is an integral tool in many proteomics research groups. Here, we introduce the Comet search engine to the existing landscape of commercial and open-source database search tools. Comet is open source, freely available, and based on one of the original sequence database search tools that has been widely used for many years. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
OCTGRAV: Sparse Octree Gravitational N-body Code on Graphics Processing Units

NASA Astrophysics Data System (ADS)

Gaburov, Evghenii; Bédorf, Jeroen; Portegies Zwart, Simon

2010-10-01

Octgrav is a very fast tree-code which runs on massively parallel Graphical Processing Units (GPU) with NVIDIA CUDA architecture. The algorithms are based on parallel-scan and sort methods. The tree-construction and calculation of multipole moments is carried out on the host CPU, while the force calculation which consists of tree walks and evaluation of interaction list is carried out on the GPU. In this way, a sustained performance of about 100GFLOP/s and data transfer rates of about 50GB/s is achieved. It takes about a second to compute forces on a million particles with an opening angle of heta approx 0.5. To test the performance and feasibility, we implemented the algorithms in CUDA in the form of a gravitational tree-code which completely runs on the GPU. The tree construction and traverse algorithms are portable to many-core devices which have support for CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree-construction and shows a performance improvement of more than a factor 20 overall, resulting in a processing rate of more than 2.8 million particles per second. The code has a convenient user interface and is freely available for use.
The Utility of Free Software for Gravity and Magnetic Advanced Data Processing

NASA Astrophysics Data System (ADS)

Grandis, Hendra; Dahrin, Darharta

2017-04-01

The lack of computational tools, i.e. software, often hinders the proper teaching and application of geophysical data processing in academic institutions in Indonesia. Although there are academic licensing options for commercial software, such options are still way beyond the financial capability of some academic institutions. Academic community members (both lecturers and students) are supposed to be creative and resourceful to overcome such situation. Therefore, capability for writing computer programs or codes is a necessity. However, there are also many computer programs and even software that are freely available on the internet. Generally, the utility of the freely distributed software is limited for demonstration only or for visualizing and exchanging data. The paper discusses the utility of Geosoft’s Oasis Montaj Viewer along with USGS GX programs that are available for free. Useful gravity and magnetic advanced data processing (i.e. gradient calculation, spectral analysis etc.) can be performed “correctly” without any approximation that sometimes leads to dubious results and interpretation.
Schroedinger’s code: Source code availability and transparency in astrophysics

NASA Astrophysics Data System (ADS)

Ryan, PW; Allen, Alice; Teuben, Peter

2018-01-01

Astronomers use software for their research, but how many of the codes they use are available as source code? We examined a sample of 166 papers from 2015 for clearly identified software use, then searched for source code for the software packages mentioned in these research papers. We categorized the software to indicate whether source code is available for download and whether there are restrictions to accessing it, and if source code was not available, whether some other form of the software, such as a binary, was. Over 40% of the source code for the software used in our sample was not available for download.As URLs have often been used as proxy citations for software, we also extracted URLs from one journal’s 2015 research articles, removed those from certain long-term, reliable domains, and tested the remainder to determine what percentage of these URLs were still accessible in September and October, 2017.
Literature-based cheminformatics for research in chemical toxicity

EPA Science Inventory

PubMed is the largest freely available source of published literature available online with access to 27 million citations (as of October 2017). Contained within the literature is an abundance of information about the activity of chemicals in biological systems. Literature inform...
Sensory Coordination of Insect Flight

DTIC Science & Technology

2009-12-29

begun to study how fruit flies pinpoint the location of an odor source ( banana mash placed within a black pole, a strong visual landmark against a...hover feeding, flower tracking, odor tracking etc. Figure 4: Extracting wing and body kinematics from freely flying Drosophila melanogaster. (A
Adoption of Test Driven Development and Continuous Integration for the Development of the Trick Simulation Toolkit

NASA Technical Reports Server (NTRS)

Penn, John M.

2013-01-01

This paper describes the adoption of a Test Driven Development approach and a Continuous Integration System in the development of the Trick Simulation Toolkit, a generic simulation development environment for creating high fidelity training and engineering simulations at the NASA/Johnson Space Center and many other NASA facilities. It describes what was learned and the significant benefits seen, such as fast, thorough, and clear test feedback every time code is checked-in to the code repository. It also describes a system that encourages development of code that is much more flexible, maintainable, and reliable. The Trick Simulation Toolkit development environment provides a common architecture for user-defined simulations. Trick builds executable simulations using user-supplied simulation-definition files (S_define) and user supplied "model code". For each Trick-based simulation, Trick automatically provides job scheduling, checkpoint / restore, data-recording, interactive variable manipulation (variable server), and an input-processor. Also included are tools for plotting recorded data and various other supporting tools and libraries. Trick is written in C/C++ and Java and supports both Linux and MacOSX. Prior to adopting this new development approach, Trick testing consisted primarily of running a few large simulations, with the hope that their complexity and scale would exercise most of Trick's code and expose any recently introduced bugs. Unsurprising, this approach yielded inconsistent results. It was obvious that a more systematic, thorough approach was required. After seeing examples of some Java-based projects that used the JUnit test framework, similar test frameworks for C and C++ were sought. Several were found, all clearly inspired by JUnit. Googletest, a freely available Open source testing framework, was selected as the most appropriate and capable. The new approach was implemented while rewriting the Trick memory management component, to eliminate a fundamental design flaw. The benefits became obvious almost immediately, not just in the correctness of the individual functions and classes but also in the correctness and flexibility being added to the overall design. Creating code to be testable, and testing as it was created resulted not only in better working code, but also in better-organized, flexible, and readable (i.e., articulate) code. This was, in essence the Test-driven development (TDD) methodology created by Kent Beck. Seeing the benefits of Test Driven Development, other Trick components were refactored to make them more testable and tests were designed and implemented for them.
DaMoScope and its internet graphics for the visual control of adjusting mathematical models describing experimental data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Belousov, V. I.; Ezhela, V. V.; Kuyanov, Yu. V., E-mail: Yu.Kuyanov@gmail.com

The experience of using the dynamic atlas of the experimental data and mathematical models of their description in the problems of adjusting parametric models of observable values depending on kinematic variables is presented. The functional possibilities of an image of a large number of experimental data and the models describing them are shown by examples of data and models of observable values determined by the amplitudes of elastic scattering of hadrons. The Internet implementation of an interactive tool DaMoScope and its interface with the experimental data and codes of adjusted parametric models with the parameters of the best description ofmore » data are schematically shown. The DaMoScope codes are freely available.« less
The hacker ethic

DOE Office of Scientific and Technical Information (OSTI.GOV)

Granger, S.

The hacker ethic can be a peculiar concept to those unfamiliar with hacking and what really is. In fact, the entire definition of hacking is somewhat obscure. Hacking originated as a challenge between programmers. Programmers at MIT are known for coining the term. Individuals would hack at code meaning that they would work at programming problems until they could maniuplate their computers into doing exactly what they wanted. The MIT hackers began with simple programs and moved on to fidding with UNIX machines, especially those on the Arpanet. Hackers started freely distributing their code to their friends and eventually tomore » their friends across the network. This gave rise to a notion that software should be free. Eventually this was taken to the extreme information and network access should also be free.« less
A benchmark study of scoring methods for non-coding mutations.

PubMed

Drubay, Damien; Gautheret, Daniel; Michiels, Stefan

2018-05-15

Detailed knowledge of coding sequences has led to different candidate models for pathogenic variant prioritization. Several deleteriousness scores have been proposed for the non-coding part of the genome, but no large-scale comparison has been realized to date to assess their performance. We compared the leading scoring tools (CADD, FATHMM-MKL, Funseq2 and GWAVA) and some recent competitors (DANN, SNP and SOM scores) for their ability to discriminate assumed pathogenic variants from assumed benign variants (using the ClinVar, COSMIC and 1000 genomes project databases). Using the ClinVar benchmark, CADD was the best tool for detecting the pathogenic variants that are mainly located in protein coding gene regions. Using the COSMIC benchmark, FATHMM-MKL, GWAVA and SOMliver outperformed the other tools for pathogenic variants that are typically located in lincRNAs, pseudogenes and other parts of the non-coding genome. However, all tools had low precision, which could potentially be improved by future non-coding genome feature discoveries. These results may have been influenced by the presence of potential benign variants in the COSMIC database. The development of a gold standard as consistent as ClinVar for these regions will be necessary to confirm our tool ranking. The Snakemake, C++ and R codes are freely available from https://github.com/Oncostat/BenchmarkNCVTools and supported on Linux. damien.drubay@gustaveroussy.fr or stefan.michiels@gustaveroussy.fr. Supplementary data are available at Bioinformatics online.
An Open Source Model for Open Access Journal Publication

PubMed Central

Blesius, Carl R.; Williams, Michael A.; Holzbach, Ana; Huntley, Arthur C.; Chueh, Henry

2005-01-01

We describe an electronic journal publication infrastructure that allows a flexible publication workflow, academic exchange around different forms of user submissions, and the exchange of articles between publishers and archives using a common XML based standard. This web-based application is implemented on a freely available open source software stack. This publication demonstrates the Dermatology Online Journal's use of the platform for non-biased independent open access publication. PMID:16779183
Amino acid fermentation at the origin of the genetic code.

PubMed

de Vladar, Harold P

2012-02-10

There is evidence that the genetic code was established prior to the existence of proteins, when metabolism was powered by ribozymes. Also, early proto-organisms had to rely on simple anaerobic bioenergetic processes. In this work I propose that amino acid fermentation powered metabolism in the RNA world, and that this was facilitated by proto-adapters, the precursors of the tRNAs. Amino acids were used as carbon sources rather than as catalytic or structural elements. In modern bacteria, amino acid fermentation is known as the Stickland reaction. This pathway involves two amino acids: the first undergoes oxidative deamination, and the second acts as an electron acceptor through reductive deamination. This redox reaction results in two keto acids that are employed to synthesise ATP via substrate-level phosphorylation. The Stickland reaction is the basic bioenergetic pathway of some bacteria of the genus Clostridium. Two other facts support Stickland fermentation in the RNA world. First, several Stickland amino acid pairs are synthesised in abiotic amino acid synthesis. This suggests that amino acids that could be used as an energy substrate were freely available. Second, anticodons that have complementary sequences often correspond to amino acids that form Stickland pairs. The main hypothesis of this paper is that pairs of complementary proto-adapters were assigned to Stickland amino acids pairs. There are signatures of this hypothesis in the genetic code. Furthermore, it is argued that the proto-adapters formed double strands that brought amino acid pairs into proximity to facilitate their mutual redox reaction, structurally constraining the anticodon pairs that are assigned to these amino acid pairs. Significance tests which randomise the code are performed to study the extent of the variability of the energetic (ATP) yield. Random assignments can lead to a substantial yield of ATP and maintain enough variability, thus selection can act and refine the assignments into a proto-code that optimises the energetic yield. Monte Carlo simulations are performed to evaluate the establishment of these simple proto-codes, based on amino acid substitutions and codon swapping. In all cases, donor amino acids are assigned to anticodons composed of U+G, and have low redundancy (1-2 codons), whereas acceptor amino acids are assigned to the the remaining codons. These bioenergetic and structural constraints allow for a metabolic role for amino acids before their co-option as catalyst cofactors.
Measuring Diagnoses: ICD Code Accuracy

PubMed Central

O'Malley, Kimberly J; Cook, Karon F; Price, Matt D; Wildes, Kimberly Raiford; Hurdle, John F; Ashton, Carol M

2005-01-01

Objective To examine potential sources of errors at each step of the described inpatient International Classification of Diseases (ICD) coding process. Data Sources/Study Setting The use of disease codes from the ICD has expanded from classifying morbidity and mortality information for statistical purposes to diverse sets of applications in research, health care policy, and health care finance. By describing a brief history of ICD coding, detailing the process for assigning codes, identifying where errors can be introduced into the process, and reviewing methods for examining code accuracy, we help code users more systematically evaluate code accuracy for their particular applications. Study Design/Methods We summarize the inpatient ICD diagnostic coding process from patient admission to diagnostic code assignment. We examine potential sources of errors at each step and offer code users a tool for systematically evaluating code accuracy. Principle Findings Main error sources along the “patient trajectory” include amount and quality of information at admission, communication among patients and providers, the clinician's knowledge and experience with the illness, and the clinician's attention to detail. Main error sources along the “paper trail” include variance in the electronic and written records, coder training and experience, facility quality-control efforts, and unintentional and intentional coder errors, such as misspecification, unbundling, and upcoding. Conclusions By clearly specifying the code assignment process and heightening their awareness of potential error sources, code users can better evaluate the applicability and limitations of codes for their particular situations. ICD codes can then be used in the most appropriate ways. PMID:16178999
ReGaTE: Registration of Galaxy Tools in Elixir

PubMed Central

Mareuil, Fabien; Deveaud, Eric; Kalaš, Matúš; Soranzo, Nicola; van den Beek, Marius; Grüning, Björn; Ison, Jon; Ménager, Hervé

2017-01-01

Abstract Background: Bioinformaticians routinely use multiple software tools and data sources in their day-to-day work and have been guided in their choices by a number of cataloguing initiatives. The ELIXIR Tools and Data Services Registry (bio.tools) aims to provide a central information point, independent of any specific scientific scope within bioinformatics or technological implementation. Meanwhile, efforts to integrate bioinformatics software in workbench and workflow environments have accelerated to enable the design, automation, and reproducibility of bioinformatics experiments. One such popular environment is the Galaxy framework, with currently more than 80 publicly available Galaxy servers around the world. In the context of a generic registry for bioinformatics software, such as bio.tools, Galaxy instances constitute a major source of valuable content. Yet there has been, to date, no convenient mechanism to register such services en masse. Findings: We present ReGaTE (Registration of Galaxy Tools in Elixir), a software utility that automates the process of registering the services available in a Galaxy instance. This utility uses the BioBlend application program interface to extract service metadata from a Galaxy server, enhance the metadata with the scientific information required by bio.tools, and push it to the registry. Conclusions: ReGaTE provides a fast and convenient way to publish Galaxy services in bio.tools. By doing so, service providers may increase the visibility of their services while enriching the software discovery function that bio.tools provides for its users. The source code of ReGaTE is freely available on Github at https://github.com/C3BI-pasteur-fr/ReGaTE. PMID:28402416
Inducible Knockout of the Cyclin-Dependent Kinase 5 Activator p35 Alters Hippocampal Spatial Coding and Neuronal Excitability

PubMed Central

Kamiki, Eriko; Boehringer, Roman; Polygalov, Denis; Ohshima, Toshio; McHugh, Thomas J.

2018-01-01

p35 is an activating co-factor of Cyclin-dependent kinase 5 (Cdk5), a protein whose dysfunction has been implicated in a wide-range of neurological disorders including cognitive impairment and disease. Inducible deletion of the p35 gene in adult mice results in profound deficits in hippocampal-dependent spatial learning and synaptic physiology, however the impact of the loss of p35 function on hippocampal in vivo physiology and spatial coding remains unknown. Here, we recorded CA1 pyramidal cell activity in freely behaving p35 cKO and control mice and found that place cells in the mutant mice have elevated firing rates and impaired spatial coding, accompanied by changes in the temporal organization of spiking both during exploration and rest. These data shed light on the role of p35 in maintaining cellular and network excitability and provide a physiological correlate of the spatial learning deficits in these mice. PMID:29867369
Localizer: fast, accurate, open-source, and modular software package for superresolution microscopy

PubMed Central

Duwé, Sam; Neely, Robert K.; Zhang, Jin

2012-01-01

Abstract. We present Localizer, a freely available and open source software package that implements the computational data processing inherent to several types of superresolution fluorescence imaging, such as localization (PALM/STORM/GSDIM) and fluctuation imaging (SOFI/pcSOFI). Localizer delivers high accuracy and performance and comes with a fully featured and easy-to-use graphical user interface but is also designed to be integrated in higher-level analysis environments. Due to its modular design, Localizer can be readily extended with new algorithms as they become available, while maintaining the same interface and performance. We provide front-ends for running Localizer from Igor Pro, Matlab, or as a stand-alone program. We show that Localizer performs favorably when compared with two existing superresolution packages, and to our knowledge is the only freely available implementation of SOFI/pcSOFI microscopy. By dramatically improving the analysis performance and ensuring the easy addition of current and future enhancements, Localizer strongly improves the usability of superresolution imaging in a variety of biomedical studies. PMID:23208219
Operational rate-distortion performance for joint source and channel coding of images.

PubMed

Ruf, M J; Modestino, J W

1999-01-01

This paper describes a methodology for evaluating the operational rate-distortion behavior of combined source and channel coding schemes with particular application to images. In particular, we demonstrate use of the operational rate-distortion function to obtain the optimum tradeoff between source coding accuracy and channel error protection under the constraint of a fixed transmission bandwidth for the investigated transmission schemes. Furthermore, we develop information-theoretic bounds on performance for specific source and channel coding systems and demonstrate that our combined source-channel coding methodology applied to different schemes results in operational rate-distortion performance which closely approach these theoretical limits. We concentrate specifically on a wavelet-based subband source coding scheme and the use of binary rate-compatible punctured convolutional (RCPC) codes for transmission over the additive white Gaussian noise (AWGN) channel. Explicit results for real-world images demonstrate the efficacy of this approach.
Open source libraries and frameworks for mass spectrometry based proteomics: A developer's perspective☆

PubMed Central

Perez-Riverol, Yasset; Wang, Rui; Hermjakob, Henning; Müller, Markus; Vesada, Vladimir; Vizcaíno, Juan Antonio

2014-01-01

Data processing, management and visualization are central and critical components of a state of the art high-throughput mass spectrometry (MS)-based proteomics experiment, and are often some of the most time-consuming steps, especially for labs without much bioinformatics support. The growing interest in the field of proteomics has triggered an increase in the development of new software libraries, including freely available and open-source software. From database search analysis to post-processing of the identification results, even though the objectives of these libraries and packages can vary significantly, they usually share a number of features. Common use cases include the handling of protein and peptide sequences, the parsing of results from various proteomics search engines output files, and the visualization of MS-related information (including mass spectra and chromatograms). In this review, we provide an overview of the existing software libraries, open-source frameworks and also, we give information on some of the freely available applications which make use of them. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan. PMID:23467006
PLUS: open-source toolkit for ultrasound-guided intervention systems.

PubMed

Lasso, Andras; Heffter, Tamas; Rankin, Adam; Pinter, Csaba; Ungi, Tamas; Fichtinger, Gabor

2014-10-01

A variety of advanced image analysis methods have been under the development for ultrasound-guided interventions. Unfortunately, the transition from an image analysis algorithm to clinical feasibility trials as part of an intervention system requires integration of many components, such as imaging and tracking devices, data processing algorithms, and visualization software. The objective of our paper is to provide a freely available open-source software platform-PLUS: Public software Library for Ultrasound-to facilitate rapid prototyping of ultrasound-guided intervention systems for translational clinical research. PLUS provides a variety of methods for interventional tool pose and ultrasound image acquisition from a wide range of tracking and imaging devices, spatial and temporal calibration, volume reconstruction, simulated image generation, and recording and live streaming of the acquired data. This paper introduces PLUS, explains its functionality and architecture, and presents typical uses and performance in ultrasound-guided intervention systems. PLUS fulfills the essential requirements for the development of ultrasound-guided intervention systems and it aspires to become a widely used translational research prototyping platform. PLUS is freely available as open source software under BSD license and can be downloaded from http://www.plustoolkit.org.
Open source libraries and frameworks for mass spectrometry based proteomics: a developer's perspective.

PubMed

Perez-Riverol, Yasset; Wang, Rui; Hermjakob, Henning; Müller, Markus; Vesada, Vladimir; Vizcaíno, Juan Antonio

2014-01-01

Data processing, management and visualization are central and critical components of a state of the art high-throughput mass spectrometry (MS)-based proteomics experiment, and are often some of the most time-consuming steps, especially for labs without much bioinformatics support. The growing interest in the field of proteomics has triggered an increase in the development of new software libraries, including freely available and open-source software. From database search analysis to post-processing of the identification results, even though the objectives of these libraries and packages can vary significantly, they usually share a number of features. Common use cases include the handling of protein and peptide sequences, the parsing of results from various proteomics search engines output files, and the visualization of MS-related information (including mass spectra and chromatograms). In this review, we provide an overview of the existing software libraries, open-source frameworks and also, we give information on some of the freely available applications which make use of them. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan. Copyright © 2013 Elsevier B.V. All rights reserved.

Data processing with Pymicra, the Python tool for Micrometeorological Analyses

NASA Astrophysics Data System (ADS)

Chor, T. L.; Dias, N. L.

2017-12-01

With the ever-increasing capability of instrumentation of collecting high-frequency turbulence data, micrometeorological experiments are now generating significant amounts of data. Clearly, data processing -- and not data collection anymore -- has become the limiting factor for those very large data sets. The ability of extracting useful scientific information from those experiments, therefore, hinges on tools that (i) are able to process those data effectively and accurately, (ii) are flexible enough to be adapted to the specific requirements of each investigation, and (iii) are robust enough to make data analysis easily reproducible over different sets of large data sets. We have developed a framework for micrometeorological data analysis called Pymicra which does deliver such capabilities while maintaining proximity of the investigator with the data. It is fully written in an open-source, very high level language, Python, which has been gaining widespread acceptance as a scientific tool. It follows the philosophy of "not reinventing the wheel" and, as a result, relies on existing well-established open-source Python packages such as Numpy and Pandas. Thus, minimum effort is needed to program statistics, array processing, Fourier analysis, etc. Among the things that Pymicra does are reading and organizing data from virtually any format, applying common quality control procedures, extracting fluctuations in a number of ways, correcting for sensor drift, automatic calculation of fluid properties (such as air and dry air density), handling of units, calculation of cross-spectra, calculation of turbulent fluxes and scales, and all other features already provided by Pandas (interpolation, statistical tests, handling of missing data, etc.). Pymicra is freely available on Github and the fact that it makes heavy use of high-level programming makes adding and modifying code considerably easy for any scientific programmer, making it straightforward for other scientists to contribute with new functionality and point out room for improvements. Because of that, Pymicra is a candidate to be a community-developed code in the future and to centralize part of the data processing aimed at micrometeorology.
Technical Note: FreeCT_wFBP: A robust, efficient, open-source implementation of weighted filtered backprojection for helical, fan-beam CT.

PubMed

Hoffman, John; Young, Stefano; Noo, Frédéric; McNitt-Gray, Michael

2016-03-01

With growing interest in quantitative imaging, radiomics, and CAD using CT imaging, the need to explore the impacts of acquisition and reconstruction parameters has grown. This usually requires extensive access to the scanner on which the data were acquired and its workflow is not designed for large-scale reconstruction projects. Therefore, the authors have developed a freely available, open-source software package implementing a common reconstruction method, weighted filtered backprojection (wFBP), for helical fan-beam CT applications. FreeCT_wFBP is a low-dependency, GPU-based reconstruction program utilizing c for the host code and Nvidia CUDA C for GPU code. The software is capable of reconstructing helical scans acquired with arbitrary pitch-values, and sampling techniques such as flying focal spots and a quarter-detector offset. In this work, the software has been described and evaluated for reconstruction speed, image quality, and accuracy. Speed was evaluated based on acquisitions of the ACR CT accreditation phantom under four different flying focal spot configurations. Image quality was assessed using the same phantom by evaluating CT number accuracy, uniformity, and contrast to noise ratio (CNR). Finally, reconstructed mass-attenuation coefficient accuracy was evaluated using a simulated scan of a FORBILD thorax phantom and comparing reconstructed values to the known phantom values. The average reconstruction time evaluated under all flying focal spot configurations was found to be 17.4 ± 1.0 s for a 512 row × 512 column × 32 slice volume. Reconstructions of the ACR phantom were found to meet all CT Accreditation Program criteria including CT number, CNR, and uniformity tests. Finally, reconstructed mass-attenuation coefficient values of water within the FORBILD thorax phantom agreed with original phantom values to within 0.0001 mm(2)/g (0.01%). FreeCT_wFBP is a fast, highly configurable reconstruction package for third-generation CT available under the GNU GPL. It shows good performance with both clinical and simulated data.
Continuation of research into language concepts for the mission support environment: Source code

NASA Technical Reports Server (NTRS)

Barton, Timothy J.; Ratner, Jeremiah M.

1991-01-01

Research into language concepts for the Mission Control Center is presented. A computer code for source codes is presented. The file contains the routines which allow source code files to be created and compiled. The build process assumes that all elements and the COMP exist in the current directory. The build process places as much code generation as possible on the preprocessor as possible. A summary is given of the source files as used and/or manipulated by the build routine.
Measuring diagnoses: ICD code accuracy.

PubMed

O'Malley, Kimberly J; Cook, Karon F; Price, Matt D; Wildes, Kimberly Raiford; Hurdle, John F; Ashton, Carol M

2005-10-01

To examine potential sources of errors at each step of the described inpatient International Classification of Diseases (ICD) coding process. The use of disease codes from the ICD has expanded from classifying morbidity and mortality information for statistical purposes to diverse sets of applications in research, health care policy, and health care finance. By describing a brief history of ICD coding, detailing the process for assigning codes, identifying where errors can be introduced into the process, and reviewing methods for examining code accuracy, we help code users more systematically evaluate code accuracy for their particular applications. We summarize the inpatient ICD diagnostic coding process from patient admission to diagnostic code assignment. We examine potential sources of errors at each step and offer code users a tool for systematically evaluating code accuracy. Main error sources along the "patient trajectory" include amount and quality of information at admission, communication among patients and providers, the clinician's knowledge and experience with the illness, and the clinician's attention to detail. Main error sources along the "paper trail" include variance in the electronic and written records, coder training and experience, facility quality-control efforts, and unintentional and intentional coder errors, such as misspecification, unbundling, and upcoding. By clearly specifying the code assignment process and heightening their awareness of potential error sources, code users can better evaluate the applicability and limitations of codes for their particular situations. ICD codes can then be used in the most appropriate ways.
Accessible Collaborative Learning Using Mobile Devices

ERIC Educational Resources Information Center

Wald, Mike; Li, Yunjia; Draffan, E. A.

2014-01-01

This paper describes accessible collaborative learning using mobile devices with mobile enhancements to Synote, the freely available, award winning, open source, web based application that makes web hosted recordings easier to access, search, manage, and exploit for all learners, teachers and other users. Notes taken live during lectures using…
MetaRanker 2.0: a web server for prioritization of genetic variation data

PubMed Central

Pers, Tune H.; Dworzyński, Piotr; Thomas, Cecilia Engel; Lage, Kasper; Brunak, Søren

2013-01-01

MetaRanker 2.0 is a web server for prioritization of common and rare frequency genetic variation data. Based on heterogeneous data sets including genetic association data, protein–protein interactions, large-scale text-mining data, copy number variation data and gene expression experiments, MetaRanker 2.0 prioritizes the protein-coding part of the human genome to shortlist candidate genes for targeted follow-up studies. MetaRanker 2.0 is made freely available at www.cbs.dtu.dk/services/MetaRanker-2.0. PMID:23703204
MetaRanker 2.0: a web server for prioritization of genetic variation data.

PubMed

Pers, Tune H; Dworzyński, Piotr; Thomas, Cecilia Engel; Lage, Kasper; Brunak, Søren

2013-07-01

MetaRanker 2.0 is a web server for prioritization of common and rare frequency genetic variation data. Based on heterogeneous data sets including genetic association data, protein-protein interactions, large-scale text-mining data, copy number variation data and gene expression experiments, MetaRanker 2.0 prioritizes the protein-coding part of the human genome to shortlist candidate genes for targeted follow-up studies. MetaRanker 2.0 is made freely available at www.cbs.dtu.dk/services/MetaRanker-2.0.
Schroedinger’s Code: A Preliminary Study on Research Source Code Availability and Link Persistence in Astrophysics

NASA Astrophysics Data System (ADS)

Allen, Alice; Teuben, Peter J.; Ryan, P. Wesley

2018-05-01

We examined software usage in a sample set of astrophysics research articles published in 2015 and searched for the source codes for the software mentioned in these research papers. We categorized the software to indicate whether the source code is available for download and whether there are restrictions to accessing it, and if the source code is not available, whether some other form of the software, such as a binary, is. We also extracted hyperlinks from one journal’s 2015 research articles, as links in articles can serve as an acknowledgment of software use and lead to the data used in the research, and tested them to determine which of these URLs are still accessible. For our sample of 715 software instances in the 166 articles we examined, we were able to categorize 418 records as according to whether source code was available and found that 285 unique codes were used, 58% of which offered the source code for download. Of the 2558 hyperlinks extracted from 1669 research articles, at best, 90% of them were available over our testing period.
RiboMaker: computational design of conformation-based riboregulation.

PubMed

Rodrigo, Guillermo; Jaramillo, Alfonso

2014-09-01

The ability to engineer control systems of gene expression is instrumental for synthetic biology. Thus, bioinformatic methods that assist such engineering are appealing because they can guide the sequence design and prevent costly experimental screening. In particular, RNA is an ideal substrate to de novo design regulators of protein expression by following sequence-to-function models. We have implemented a novel algorithm, RiboMaker, aimed at the computational, automated design of bacterial riboregulation. RiboMaker reads the sequence and structure specifications, which codify for a gene regulatory behaviour, and optimizes the sequences of a small regulatory RNA and a 5'-untranslated region for an efficient intermolecular interaction. To this end, it implements an evolutionary design strategy, where random mutations are selected according to a physicochemical model based on free energies. The resulting sequences can then be tested experimentally, providing a new tool for synthetic biology, and also for investigating the riboregulation principles in natural systems. Web server is available at http://ribomaker.jaramillolab.org/. Source code, instructions and examples are freely available for download at http://sourceforge.net/projects/ribomaker/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Data collection and storage in long-term ecological and evolutionary studies: The Mongoose 2000 system

PubMed Central

Griffiths, David J.; Mwanguhya, Francis; Businge, Robert; Griffiths, Amber G. F.; Kyabulima, Solomon; Mwesige, Kenneth; Sanderson, Jennifer L.; Thompson, Faye J.; Vitikainen, Emma I. K.; Cant, Michael A.

2018-01-01

Studying ecological and evolutionary processes in the natural world often requires research projects to follow multiple individuals in the wild over many years. These projects have provided significant advances but may also be hampered by needing to accurately and efficiently collect and store multiple streams of the data from multiple individuals concurrently. The increase in the availability and sophistication of portable computers (smartphones and tablets) and the applications that run on them has the potential to address many of these data collection and storage issues. In this paper we describe the challenges faced by one such long-term, individual-based research project: the Banded Mongoose Research Project in Uganda. We describe a system we have developed called Mongoose 2000 that utilises the potential of apps and portable computers to meet these challenges. We discuss the benefits and limitations of employing such a system in a long-term research project. The app and source code for the Mongoose 2000 system are freely available and we detail how it might be used to aid data collection and storage in other long-term individual-based projects. PMID:29315317
Identifying biologically relevant differences between metagenomic communities.

PubMed

Parks, Donovan H; Beiko, Robert G

2010-03-15

Metagenomics is the study of genetic material recovered directly from environmental samples. Taxonomic and functional differences between metagenomic samples can highlight the influence of ecological factors on patterns of microbial life in a wide range of habitats. Statistical hypothesis tests can help us distinguish ecological influences from sampling artifacts, but knowledge of only the P-value from a statistical hypothesis test is insufficient to make inferences about biological relevance. Current reporting practices for pairwise comparative metagenomics are inadequate, and better tools are needed for comparative metagenomic analysis. We have developed a new software package, STAMP, for comparative metagenomics that supports best practices in analysis and reporting. Examination of a pair of iron mine metagenomes demonstrates that deeper biological insights can be gained using statistical techniques available in our software. An analysis of the functional potential of 'Candidatus Accumulibacter phosphatis' in two enhanced biological phosphorus removal metagenomes identified several subsystems that differ between the A.phosphatis stains in these related communities, including phosphate metabolism, secretion and metal transport. Python source code and binaries are freely available from our website at http://kiwi.cs.dal.ca/Software/STAMP CONTACT: beiko@cs.dal.ca Supplementary data are available at Bioinformatics online.
LiGRO: a graphical user interface for protein-ligand molecular dynamics.

PubMed

Kagami, Luciano Porto; das Neves, Gustavo Machado; da Silva, Alan Wilter Sousa; Caceres, Rafael Andrade; Kawano, Daniel Fábio; Eifler-Lima, Vera Lucia

2017-10-04

To speed up the drug-discovery process, molecular dynamics (MD) calculations performed in GROMACS can be coupled to docking simulations for the post-screening analyses of large compound libraries. This requires generating the topology of the ligands in different software, some basic knowledge of Linux command lines, and a certain familiarity in handling the output files. LiGRO-the python-based graphical interface introduced here-was designed to overcome these protein-ligand parameterization challenges by allowing the graphical (non command line-based) control of GROMACS (MD and analysis), ACPYPE (ligand topology builder) and PLIP (protein-binder interactions monitor)-programs that can be used together to fully perform and analyze the outputs of complex MD simulations (including energy minimization and NVT/NPT equilibration). By allowing the calculation of linear interaction energies in a simple and quick fashion, LiGRO can be used in the drug-discovery pipeline to select compounds with a better protein-binding interaction profile. The design of LiGRO allows researchers to freely download and modify the software, with the source code being available under the terms of a GPLv3 license from http://www.ufrgs.br/lasomfarmacia/ligro/ .
SUPER-FOCUS: a tool for agile functional analysis of shotgun metagenomic data

PubMed Central

Green, Kevin T.; Dutilh, Bas E.; Edwards, Robert A.

2016-01-01

Summary: Analyzing the functional profile of a microbial community from unannotated shotgun sequencing reads is one of the important goals in metagenomics. Functional profiling has valuable applications in biological research because it identifies the abundances of the functional genes of the organisms present in the original sample, answering the question what they can do. Currently, available tools do not scale well with increasing data volumes, which is important because both the number and lengths of the reads produced by sequencing platforms keep increasing. Here, we introduce SUPER-FOCUS, SUbsystems Profile by databasE Reduction using FOCUS, an agile homology-based approach using a reduced reference database to report the subsystems present in metagenomic datasets and profile their abundances. SUPER-FOCUS was tested with over 70 real metagenomes, the results showing that it accurately predicts the subsystems present in the profiled microbial communities, and is up to 1000 times faster than other tools. Availability and implementation: SUPER-FOCUS was implemented in Python, and its source code and the tool website are freely available at https://edwards.sdsu.edu/SUPERFOCUS. Contact: redwards@mail.sdsu.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26454280
Mycofier: a new machine learning-based classifier for fungal ITS sequences.

PubMed

Delgado-Serrano, Luisa; Restrepo, Silvia; Bustos, Jose Ricardo; Zambrano, Maria Mercedes; Anzola, Juan Manuel

2016-08-11

The taxonomic and phylogenetic classification based on sequence analysis of the ITS1 genomic region has become a crucial component of fungal ecology and diversity studies. Nowadays, there is no accurate alignment-free classification tool for fungal ITS1 sequences for large environmental surveys. This study describes the development of a machine learning-based classifier for the taxonomical assignment of fungal ITS1 sequences at the genus level. A fungal ITS1 sequence database was built using curated data. Training and test sets were generated from it. A Naïve Bayesian classifier was built using features from the primary sequence with an accuracy of 87 % in the classification at the genus level. The final model was based on a Naïve Bayes algorithm using ITS1 sequences from 510 fungal genera. This classifier, denoted as Mycofier, provides similar classification accuracy compared to BLASTN, but the database used for the classification contains curated data and the tool, independent of alignment, is more efficient and contributes to the field, given the lack of an accurate classification tool for large data from fungal ITS1 sequences. The software and source code for Mycofier are freely available at https://github.com/ldelgado-serrano/mycofier.git .
Phylo.io: Interactive Viewing and Comparison of Large Phylogenetic Trees on the Web.

PubMed

Robinson, Oscar; Dylus, David; Dessimoz, Christophe

2016-08-01

Phylogenetic trees are pervasively used to depict evolutionary relationships. Increasingly, researchers need to visualize large trees and compare multiple large trees inferred for the same set of taxa (reflecting uncertainty in the tree inference or genuine discordance among the loci analyzed). Existing tree visualization tools are however not well suited to these tasks. In particular, side-by-side comparison of trees can prove challenging beyond a few dozen taxa. Here, we introduce Phylo.io, a web application to visualize and compare phylogenetic trees side-by-side. Its distinctive features are: highlighting of similarities and differences between two trees, automatic identification of the best matching rooting and leaf order, scalability to large trees, high usability, multiplatform support via standard HTML5 implementation, and possibility to store and share visualizations. The tool can be freely accessed at http://phylo.io and can easily be embedded in other web servers. The code for the associated JavaScript library is available at https://github.com/DessimozLab/phylo-io under an MIT open source license. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
SPRINT: ultrafast protein-protein interaction prediction of the entire human interactome.

PubMed

Li, Yiwei; Ilie, Lucian

2017-11-15

Proteins perform their functions usually by interacting with other proteins. Predicting which proteins interact is a fundamental problem. Experimental methods are slow, expensive, and have a high rate of error. Many computational methods have been proposed among which sequence-based ones are very promising. However, so far no such method is able to predict effectively the entire human interactome: they require too much time or memory. We present SPRINT (Scoring PRotein INTeractions), a new sequence-based algorithm and tool for predicting protein-protein interactions. We comprehensively compare SPRINT with state-of-the-art programs on seven most reliable human PPI datasets and show that it is more accurate while running orders of magnitude faster and using very little memory. SPRINT is the only sequence-based program that can effectively predict the entire human interactome: it requires between 15 and 100 min, depending on the dataset. Our goal is to transform the very challenging problem of predicting the entire human interactome into a routine task. The source code of SPRINT is freely available from https://github.com/lucian-ilie/SPRINT/ and the datasets and predicted PPIs from www.csd.uwo.ca/faculty/ilie/SPRINT/ .
MIAQuant, a novel system for automatic segmentation, measurement, and localization comparison of different biomarkers from serialized histological slices.

PubMed

Casiraghi, Elena; Cossa, Mara; Huber, Veronica; Rivoltini, Licia; Tozzi, Matteo; Villa, Antonello; Vergani, Barbara

2017-11-02

In the clinical practice, automatic image analysis methods quickly quantizing histological results by objective and replicable methods are getting more and more necessary and widespread. Despite several commercial software products are available for this task, they are very little flexible, and provided as black boxes without modifiable source code. To overcome the aforementioned problems, we employed the commonly used MATLAB platform to develop an automatic method, MIAQuant, for the analysis of histochemical and immunohistochemical images, stained with various methods and acquired by different tools. It automatically extracts and quantifies markers characterized by various colors and shapes; furthermore, it aligns contiguous tissue slices stained by different markers and overlaps them with differing colors for visual comparison of their localization. Application of MIAQuant for clinical research fields, such as oncology and cardiovascular disease studies, has proven its efficacy, robustness and flexibility with respect to various problems; we highlight that, the flexibility of MIAQuant makes it an important tool to be exploited for basic researches where needs are constantly changing. MIAQuant software and its user manual are freely available for clinical studies, pathological research, and diagnosis.
designGG: an R-package and web tool for the optimal design of genetical genomics experiments.

PubMed

Li, Yang; Swertz, Morris A; Vera, Gonzalo; Fu, Jingyuan; Breitling, Rainer; Jansen, Ritsert C

2009-06-18

High-dimensional biomolecular profiling of genetically different individuals in one or more environmental conditions is an increasingly popular strategy for exploring the functioning of complex biological systems. The optimal design of such genetical genomics experiments in a cost-efficient and effective way is not trivial. This paper presents designGG, an R package for designing optimal genetical genomics experiments. A web implementation for designGG is available at http://gbic.biol.rug.nl/designGG. All software, including source code and documentation, is freely available. DesignGG allows users to intelligently select and allocate individuals to experimental units and conditions such as drug treatment. The user can maximize the power and resolution of detecting genetic, environmental and interaction effects in a genome-wide or local mode by giving more weight to genome regions of special interest, such as previously detected phenotypic quantitative trait loci. This will help to achieve high power and more accurate estimates of the effects of interesting factors, and thus yield a more reliable biological interpretation of data. DesignGG is applicable to linkage analysis of experimental crosses, e.g. recombinant inbred lines, as well as to association analysis of natural populations.
Dynamic assessment of microbial ecology (DAME): a web app for interactive analysis and visualization of microbial sequencing data.

PubMed

Piccolo, Brian D; Wankhade, Umesh D; Chintapalli, Sree V; Bhattacharyya, Sudeepa; Chunqiao, Luo; Shankar, Kartik

2018-03-15

Dynamic assessment of microbial ecology (DAME) is a Shiny-based web application for interactive analysis and visualization of microbial sequencing data. DAME provides researchers not familiar with R programming the ability to access the most current R functions utilized for ecology and gene sequencing data analyses. Currently, DAME supports group comparisons of several ecological estimates of α-diversity and β-diversity, along with differential abundance analysis of individual taxa. Using the Shiny framework, the user has complete control of all aspects of the data analysis, including sample/experimental group selection and filtering, estimate selection, statistical methods and visualization parameters. Furthermore, graphical and tabular outputs are supported by R packages using D3.js and are fully interactive. DAME was implemented in R but can be modified by Hypertext Markup Language (HTML), Cascading Style Sheets (CSS), and JavaScript. It is freely available on the web at https://acnc-shinyapps.shinyapps.io/DAME/. Local installation and source code are available through Github (https://github.com/bdpiccolo/ACNC-DAME). Any system with R can launch DAME locally provided the shiny package is installed. bdpiccolo@uams.edu.
CsSNP: A Web-Based Tool for the Detecting of Comparative Segments SNPs.

PubMed

Wang, Yi; Wang, Shuangshuang; Zhou, Dongjie; Yang, Shuai; Xu, Yongchao; Yang, Chao; Yang, Long

2016-07-01

SNP (single nucleotide polymorphism) is a popular tool for the study of genetic diversity, evolution, and other areas. Therefore, it is necessary to develop a convenient, utility, robust, rapid, and open source detecting-SNP tool for all researchers. Since the detection of SNPs needs special software and series steps including alignment, detection, analysis and present, the study of SNPs is limited for nonprofessional users. CsSNP (Comparative segments SNP, http://biodb.sdau.edu.cn/cssnp/ ) is a freely available web tool based on the Blat, Blast, and Perl programs to detect comparative segments SNPs and to show the detail information of SNPs. The results are filtered and presented in the statistics figure and a Gbrowse map. This platform contains the reference genomic sequences and coding sequences of 60 plant species, and also provides new opportunities for the users to detect SNPs easily. CsSNP is provided a convenient tool for nonprofessional users to find comparative segments SNPs in their own sequences, and give the users the information and the analysis of SNPs, and display these data in a dynamic map. It provides a new method to detect SNPs and may accelerate related studies.

Shape component analysis: structure-preserving dimension reduction on biological shape spaces.

PubMed

Lee, Hao-Chih; Liao, Tao; Zhang, Yongjie Jessica; Yang, Ge

2016-03-01

Quantitative shape analysis is required by a wide range of biological studies across diverse scales, ranging from molecules to cells and organisms. In particular, high-throughput and systems-level studies of biological structures and functions have started to produce large volumes of complex high-dimensional shape data. Analysis and understanding of high-dimensional biological shape data require dimension-reduction techniques. We have developed a technique for non-linear dimension reduction of 2D and 3D biological shape representations on their Riemannian spaces. A key feature of this technique is that it preserves distances between different shapes in an embedded low-dimensional shape space. We demonstrate an application of this technique by combining it with non-linear mean-shift clustering on the Riemannian spaces for unsupervised clustering of shapes of cellular organelles and proteins. Source code and data for reproducing results of this article are freely available at https://github.com/ccdlcmu/shape_component_analysis_Matlab The implementation was made in MATLAB and supported on MS Windows, Linux and Mac OS. geyang@andrew.cmu.edu. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
BISQUE: locus- and variant-specific conversion of genomic, transcriptomic and proteomic database identifiers.

PubMed

Meyer, Michael J; Geske, Philip; Yu, Haiyuan

2016-05-15

Biological sequence databases are integral to efforts to characterize and understand biological molecules and share biological data. However, when analyzing these data, scientists are often left holding disparate biological currency-molecular identifiers from different databases. For downstream applications that require converting the identifiers themselves, there are many resources available, but analyzing associated loci and variants can be cumbersome if data is not given in a form amenable to particular analyses. Here we present BISQUE, a web server and customizable command-line tool for converting molecular identifiers and their contained loci and variants between different database conventions. BISQUE uses a graph traversal algorithm to generalize the conversion process for residues in the human genome, genes, transcripts and proteins, allowing for conversion across classes of molecules and in all directions through an intuitive web interface and a URL-based web service. BISQUE is freely available via the web using any major web browser (http://bisque.yulab.org/). Source code is available in a public GitHub repository (https://github.com/hyulab/BISQUE). haiyuan.yu@cornell.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Basin Hopping Graph: a computational framework to characterize RNA folding landscapes

PubMed Central

Kucharík, Marcel; Hofacker, Ivo L.; Stadler, Peter F.; Qin, Jing

2014-01-01

Motivation: RNA folding is a complicated kinetic process. The minimum free energy structure provides only a static view of the most stable conformational state of the system. It is insufficient to give detailed insights into the dynamic behavior of RNAs. A sufficiently sophisticated analysis of the folding free energy landscape, however, can provide the relevant information. Results: We introduce the Basin Hopping Graph (BHG) as a novel coarse-grained model of folding landscapes. Each vertex of the BHG is a local minimum, which represents the corresponding basin in the landscape. Its edges connect basins when the direct transitions between them are ‘energetically favorable’. Edge weights endcode the corresponding saddle heights and thus measure the difficulties of these favorable transitions. BHGs can be approximated accurately and efficiently for RNA molecules well beyond the length range accessible to enumerative algorithms. Availability and implementation: The algorithms described here are implemented in C++ as standalone programs. Its source code and supplemental material can be freely downloaded from http://www.tbi.univie.ac.at/bhg.html. Contact: qin@bioinf.uni-leipzig.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24648041
Ligand Binding Site Detection by Local Structure Alignment and Its Performance Complementarity

PubMed Central

Lee, Hui Sun; Im, Wonpil

2013-01-01

Accurate determination of potential ligand binding sites (BS) is a key step for protein function characterization and structure-based drug design. Despite promising results of template-based BS prediction methods using global structure alignment (GSA), there is a room to improve the performance by properly incorporating local structure alignment (LSA) because BS are local structures and often similar for proteins with dissimilar global folds. We present a template-based ligand BS prediction method using G-LoSA, our LSA tool. A large benchmark set validation shows that G-LoSA predicts drug-like ligands’ positions in single-chain protein targets more precisely than TM-align, a GSA-based method, while the overall success rate of TM-align is better. G-LoSA is particularly efficient for accurate detection of local structures conserved across proteins with diverse global topologies. Recognizing the performance complementarity of G-LoSA to TM-align and a non-template geometry-based method, fpocket, a robust consensus scoring method, CMCS-BSP (Complementary Methods and Consensus Scoring for ligand Binding Site Prediction), is developed and shows improvement on prediction accuracy. The G-LoSA source code is freely available at http://im.bioinformatics.ku.edu/GLoSA. PMID:23957286
MAPI: a software framework for distributed biomedical applications

PubMed Central

2013-01-01

Background The amount of web-based resources (databases, tools etc.) in biomedicine has increased, but the integrated usage of those resources is complex due to differences in access protocols and data formats. However, distributed data processing is becoming inevitable in several domains, in particular in biomedicine, where researchers face rapidly increasing data sizes. This big data is difficult to process locally because of the large processing, memory and storage capacity required. Results This manuscript describes a framework, called MAPI, which provides a uniform representation of resources available over the Internet, in particular for Web Services. The framework enhances their interoperability and collaborative use by enabling a uniform and remote access. The framework functionality is organized in modules that can be combined and configured in different ways to fulfil concrete development requirements. Conclusions The framework has been tested in the biomedical application domain where it has been a base for developing several clients that are able to integrate different web resources. The MAPI binaries and documentation are freely available at http://www.bitlab-es.com/mapi under the Creative Commons Attribution-No Derivative Works 2.5 Spain License. The MAPI source code is available by request (GPL v3 license). PMID:23311574
ChIPWig: a random access-enabling lossless and lossy compression method for ChIP-seq data.

PubMed

Ravanmehr, Vida; Kim, Minji; Wang, Zhiying; Milenkovic, Olgica

2018-03-15

Chromatin immunoprecipitation sequencing (ChIP-seq) experiments are inexpensive and time-efficient, and result in massive datasets that introduce significant storage and maintenance challenges. To address the resulting Big Data problems, we propose a lossless and lossy compression framework specifically designed for ChIP-seq Wig data, termed ChIPWig. ChIPWig enables random access, summary statistics lookups and it is based on the asymptotic theory of optimal point density design for nonuniform quantizers. We tested the ChIPWig compressor on 10 ChIP-seq datasets generated by the ENCODE consortium. On average, lossless ChIPWig reduced the file sizes to merely 6% of the original, and offered 6-fold compression rate improvement compared to bigWig. The lossy feature further reduced file sizes 2-fold compared to the lossless mode, with little or no effects on peak calling and motif discovery using specialized NarrowPeaks methods. The compression and decompression speed rates are of the order of 0.2 sec/MB using general purpose computers. The source code and binaries are freely available for download at https://github.com/vidarmehr/ChIPWig-v2, implemented in C ++. milenkov@illinois.edu. Supplementary data are available at Bioinformatics online.
nala: text mining natural language mutation mentions

PubMed Central

Cejuela, Juan Miguel; Bojchevski, Aleksandar; Uhlig, Carsten; Bekmukhametov, Rustem; Kumar Karn, Sanjeev; Mahmuti, Shpend; Baghudana, Ashish; Dubey, Ankit; Satagopam, Venkata P.; Rost, Burkhard

2017-01-01

Abstract Motivation: The extraction of sequence variants from the literature remains an important task. Existing methods primarily target standard (ST) mutation mentions (e.g. ‘E6V’), leaving relevant mentions natural language (NL) largely untapped (e.g. ‘glutamic acid was substituted by valine at residue 6’). Results: We introduced three new corpora suggesting named-entity recognition (NER) to be more challenging than anticipated: 28–77% of all articles contained mentions only available in NL. Our new method nala captured NL and ST by combining conditional random fields with word embedding features learned unsupervised from the entire PubMed. In our hands, nala substantially outperformed the state-of-the-art. For instance, we compared all unique mentions in new discoveries correctly detected by any of three methods (SETH, tmVar, or nala). Neither SETH nor tmVar discovered anything missed by nala, while nala uniquely tagged 33% mentions. For NL mentions the corresponding value shot up to 100% nala-only. Availability and Implementation: Source code, API and corpora freely available at: http://tagtog.net/-corpora/IDP4+. Contact: nala@rostlab.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28200120
SUPER-FOCUS: a tool for agile functional analysis of shotgun metagenomic data.

PubMed

Silva, Genivaldo Gueiros Z; Green, Kevin T; Dutilh, Bas E; Edwards, Robert A

2016-02-01

Analyzing the functional profile of a microbial community from unannotated shotgun sequencing reads is one of the important goals in metagenomics. Functional profiling has valuable applications in biological research because it identifies the abundances of the functional genes of the organisms present in the original sample, answering the question what they can do. Currently, available tools do not scale well with increasing data volumes, which is important because both the number and lengths of the reads produced by sequencing platforms keep increasing. Here, we introduce SUPER-FOCUS, SUbsystems Profile by databasE Reduction using FOCUS, an agile homology-based approach using a reduced reference database to report the subsystems present in metagenomic datasets and profile their abundances. SUPER-FOCUS was tested with over 70 real metagenomes, the results showing that it accurately predicts the subsystems present in the profiled microbial communities, and is up to 1000 times faster than other tools. SUPER-FOCUS was implemented in Python, and its source code and the tool website are freely available at https://edwards.sdsu.edu/SUPERFOCUS. redwards@mail.sdsu.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
AlloPred: prediction of allosteric pockets on proteins using normal mode perturbation analysis.

PubMed

Greener, Joe G; Sternberg, Michael J E

2015-10-23

Despite being hugely important in biological processes, allostery is poorly understood and no universal mechanism has been discovered. Allosteric drugs are a largely unexplored prospect with many potential advantages over orthosteric drugs. Computational methods to predict allosteric sites on proteins are needed to aid the discovery of allosteric drugs, as well as to advance our fundamental understanding of allostery. AlloPred, a novel method to predict allosteric pockets on proteins, was developed. AlloPred uses perturbation of normal modes alongside pocket descriptors in a machine learning approach that ranks the pockets on a protein. AlloPred ranked an allosteric pocket top for 23 out of 40 known allosteric proteins, showing comparable and complementary performance to two existing methods. In 28 of 40 cases an allosteric pocket was ranked first or second. The AlloPred web server, freely available at http://www.sbg.bio.ic.ac.uk/allopred/home, allows visualisation and analysis of predictions. The source code and dataset information are also available from this site. Perturbation of normal modes can enhance our ability to predict allosteric sites on proteins. Computational methods such as AlloPred assist drug discovery efforts by suggesting sites on proteins for further experimental study.
Automated Concurrent Blackboard System Generation in C++

NASA Technical Reports Server (NTRS)

Kaplan, J. A.; McManus, J. W.; Bynum, W. L.

1999-01-01

In his 1992 Ph.D. thesis, "Design and Analysis Techniques for Concurrent Blackboard Systems", John McManus defined several performance metrics for concurrent blackboard systems and developed a suite of tools for creating and analyzing such systems. These tools allow a user to analyze a concurrent blackboard system design and predict the performance of the system before any code is written. The design can be modified until simulated performance is satisfactory. Then, the code generator can be invoked to generate automatically all of the code required for the concurrent blackboard system except for the code implementing the functionality of each knowledge source. We have completed the port of the source code generator and a simulator for a concurrent blackboard system. The source code generator generates the necessary C++ source code to implement the concurrent blackboard system using Parallel Virtual Machine (PVM) running on a heterogeneous network of UNIX(trademark) workstations. The concurrent blackboard simulator uses the blackboard specification file to predict the performance of the concurrent blackboard design. The only part of the source code for the concurrent blackboard system that the user must supply is the code implementing the functionality of the knowledge sources.
Molecular dynamics simulations of metallic friction and of its dependence on electric currents - development and first results

NASA Astrophysics Data System (ADS)

Meintanis, Evangelos Anastasios

We have extended the HOLA molecular dynamics (MD) code to run slider-on-block friction experiments for Al and Cu. Both objects are allowed to evolve freely and show marked deformation despite the hardness difference. We recover realistic coefficients of friction and verify the importance of cold-welding and plastic deformations in dry sliding friction. Our first data also show a mechanism for decoupling between load and friction at high velocities. Such a mechanism can explain an increase in the coefficient of friction of metals with velocity. The study of the effects of currents on our system required the development of a suitable electrodynamic (ED) solver, as the disparity of MD and ED time scales threatened the efficiency of our code. Our first simulations combining ED and MD are presented.
The Virtual Genetics Lab: A Freely-Available Open-Source Genetics Simulation

ERIC Educational Resources Information Center

White, Brian; Bolker, Ethan; Koolar, Nikunj; Ma, Wei; Maw, Naing Naing; Yu, Chung Ying

2007-01-01

This lab is a computer simulation of transmission genetics. It presents students with a genetic phenomenon--the inheritance of a randomly--selected trait. The students' task is to determine how this trait is inherited by designing their own crosses and analyzing the results produced by the software.
Decision-Theoretic Methods in Simulation Optimization

DTIC Science & Technology

2014-09-24

freely available, and is also being considered for use by Netflix . • [Scott et al., 2011] provides an easily computed approximation to the knowledge...Area, such as Netflix . It has received a great deal of attention from the com- munity, and is already quite popular on the open-source software
Stealing the Goose: Copyright and Learning

ERIC Educational Resources Information Center

McGreal, Rory

2004-01-01

The Internet is the world's largest knowledge common and the information source of first resort. Much of this information is open and freely available. However, there are organizations and companies today that are trying to close off the Internet commons and make it proprietary. These are the "copyright controllers." The preservation of the…
Public Data Tools for Project Managers: Helpful Websites for this Region

EPA Science Inventory

A side variety of tools are publically and freely available from the Internet. The presentation describes sources of tools from EPA, USGS, and the private sector. A demo was given over the on-line calculators on EPA's web site at http://www.epa.gov/athens/onsite.
You've Written a Cool Astronomy Code! Now What Do You Do with It?

NASA Astrophysics Data System (ADS)

Allen, Alice; Accomazzi, A.; Berriman, G. B.; DuPrie, K.; Hanisch, R. J.; Mink, J. D.; Nemiroff, R. J.; Shamir, L.; Shortridge, K.; Taylor, M. B.; Teuben, P. J.; Wallin, J. F.

2014-01-01

Now that you've written a useful astronomy code for your soon-to-be-published research, you have to figure out what you want to do with it. Our suggestion? Share it! This presentation highlights the means and benefits of sharing your code. Make your code citable -- submit it to the Astrophysics Source Code Library and have it indexed by ADS! The Astrophysics Source Code Library (ASCL) is a free online registry of source codes of interest to astronomers and astrophysicists. With over 700 codes, it is continuing its rapid growth, with an average of 17 new codes a month. The editors seek out codes for inclusion; indexing by ADS improves the discoverability of codes and provides a way to cite codes as separate entries, especially codes without papers that describe them.
MTpy - Python Tools for Magnetotelluric Data Processing and Analysis

NASA Astrophysics Data System (ADS)

Krieger, Lars; Peacock, Jared; Thiel, Stephan; Inverarity, Kent; Kirkby, Alison; Robertson, Kate; Soeffky, Paul; Didana, Yohannes

2014-05-01

We present the Python package MTpy, which provides functions for the processing, analysis, and handling of magnetotelluric (MT) data sets. MT is a relatively immature and not widely applied geophysical method in comparison to other geophysical techniques such as seismology. As a result, the data processing within the academic MT community is not thoroughly standardised and is often based on a loose collection of software, adapted to the respective local specifications. We have developed MTpy to overcome problems that arise from missing standards, and to provide a simplification of the general handling of MT data. MTpy is written in Python, and the open-source code is freely available from a GitHub repository. The setup follows the modular approach of successful geoscience software packages such as GMT or Obspy. It contains sub-packages and modules for the various tasks within the standard work-flow of MT data processing and interpretation. In order to allow the inclusion of already existing and well established software, MTpy does not only provide pure Python classes and functions, but also wrapping command-line scripts to run standalone tools, e.g. modelling and inversion codes. Our aim is to provide a flexible framework, which is open for future dynamic extensions. MTpy has the potential to promote the standardisation of processing procedures and at same time be a versatile supplement for existing algorithms. Here, we introduce the concept and structure of MTpy, and we illustrate the workflow of MT data processing, interpretation, and visualisation utilising MTpy on example data sets collected over different regions of Australia and the USA.
SIBIS: a Bayesian model for inconsistent protein sequence estimation.

PubMed

Khenoussi, Walyd; Vanhoutrève, Renaud; Poch, Olivier; Thompson, Julie D

2014-09-01

The prediction of protein coding genes is a major challenge that depends on the quality of genome sequencing, the accuracy of the model used to elucidate the exonic structure of the genes and the complexity of the gene splicing process leading to different protein variants. As a consequence, today's protein databases contain a huge amount of inconsistency, due to both natural variants and sequence prediction errors. We have developed a new method, called SIBIS, to detect such inconsistencies based on the evolutionary information in multiple sequence alignments. A Bayesian framework, combined with Dirichlet mixture models, is used to estimate the probability of observing specific amino acids and to detect inconsistent or erroneous sequence segments. We evaluated the performance of SIBIS on a reference set of protein sequences with experimentally validated errors and showed that the sensitivity is significantly higher than previous methods, with only a small loss of specificity. We also assessed a large set of human sequences from the UniProt database and found evidence of inconsistency in 48% of the previously uncharacterized sequences. We conclude that the integration of quality control methods like SIBIS in automatic analysis pipelines will be critical for the robust inference of structural, functional and phylogenetic information from these sequences. Source code, implemented in C on a linux system, and the datasets of protein sequences are freely available for download at http://www.lbgi.fr/∼julie/SIBIS. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Genometa--a fast and accurate classifier for short metagenomic shotgun reads.

PubMed

Davenport, Colin F; Neugebauer, Jens; Beckmann, Nils; Friedrich, Benedikt; Kameri, Burim; Kokott, Svea; Paetow, Malte; Siekmann, Björn; Wieding-Drewes, Matthias; Wienhöfer, Markus; Wolf, Stefan; Tümmler, Burkhard; Ahlers, Volker; Sprengel, Frauke

2012-01-01

Metagenomic studies use high-throughput sequence data to investigate microbial communities in situ. However, considerable challenges remain in the analysis of these data, particularly with regard to speed and reliable analysis of microbial species as opposed to higher level taxa such as phyla. We here present Genometa, a computationally undemanding graphical user interface program that enables identification of bacterial species and gene content from datasets generated by inexpensive high-throughput short read sequencing technologies. Our approach was first verified on two simulated metagenomic short read datasets, detecting 100% and 94% of the bacterial species included with few false positives or false negatives. Subsequent comparative benchmarking analysis against three popular metagenomic algorithms on an Illumina human gut dataset revealed Genometa to attribute the most reads to bacteria at species level (i.e. including all strains of that species) and demonstrate similar or better accuracy than the other programs. Lastly, speed was demonstrated to be many times that of BLAST due to the use of modern short read aligners. Our method is highly accurate if bacteria in the sample are represented by genomes in the reference sequence but cannot find species absent from the reference. This method is one of the most user-friendly and resource efficient approaches and is thus feasible for rapidly analysing millions of short reads on a personal computer. The Genometa program, a step by step tutorial and Java source code are freely available from http://genomics1.mh-hannover.de/genometa/ and on http://code.google.com/p/genometa/. This program has been tested on Ubuntu Linux and Windows XP/7.
osFP: a web server for predicting the oligomeric states of fluorescent proteins.

PubMed

Simeon, Saw; Shoombuatong, Watshara; Anuwongcharoen, Nuttapat; Preeyanon, Likit; Prachayasittikul, Virapong; Wikberg, Jarl E S; Nantasenamat, Chanin

2016-01-01

Currently, monomeric fluorescent proteins (FP) are ideal markers for protein tagging. The prediction of oligomeric states is helpful for enhancing live biomedical imaging. Computational prediction of FP oligomeric states can accelerate the effort of protein engineering efforts of creating monomeric FPs. To the best of our knowledge, this study represents the first computational model for predicting and analyzing FP oligomerization directly from the amino acid sequence. After data curation, an exhaustive data set consisting of 397 non-redundant FP oligomeric states was compiled from the literature. Results from benchmarking of the protein descriptors revealed that the model built with amino acid composition descriptors was the top performing model with accuracy, sensitivity and specificity in excess of 80% and MCC greater than 0.6 for all three data subsets (e.g. training, tenfold cross-validation and external sets). The model provided insights on the important residues governing the oligomerization of FP. To maximize the benefit of the generated predictive model, it was implemented as a web server under the R programming environment. osFP affords a user-friendly interface that can be used to predict the oligomeric state of FP using the protein sequence. The advantage of osFP is that it is platform-independent meaning that it can be accessed via a web browser on any operating system and device. osFP is freely accessible at http://codes.bio/osfp/ while the source code and data set is provided on GitHub at https://github.com/chaninn/osFP/.Graphical Abstract.

Proposal and evaluation of FASDIM, a Fast And Simple De-Identification Method for unstructured free-text clinical records.

PubMed

Chazard, Emmanuel; Mouret, Capucine; Ficheur, Grégoire; Schaffar, Aurélien; Beuscart, Jean-Baptiste; Beuscart, Régis

2014-04-01

Medical free-text records enable to get rich information about the patients, but often need to be de-identified by removing the Protected Health Information (PHI), each time the identification of the patient is not mandatory. Pattern matching techniques require pre-defined dictionaries, and machine learning techniques require an extensive training set. Methods exist in French, but either bring weak results or are not freely available. The objective is to define and evaluate FASDIM, a Fast And Simple De-Identification Method for French medical free-text records. FASDIM consists in removing all the words that are not present in the authorized word list, and in removing all the numbers except those that match a list of protection patterns. The corresponding lists are incremented in the course of the iterations of the method. For the evaluation, the workload is estimated in the course of records de-identification. The efficiency of the de-identification is assessed by independent medical experts on 508 discharge letters that are randomly selected and de-identified by FASDIM. Finally, the letters are encoded after and before de-identification according to 3 terminologies (ATC, ICD10, CCAM) and the codes are compared. The construction of the list of authorized words is progressive: 12h for the first 7000 letters, 16 additional hours for 20,000 additional letters. The Recall (proportion of removed Protected Health Information, PHI) is 98.1%, the Precision (proportion of PHI within the removed token) is 79.6% and the F-measure (harmonic mean) is 87.9%. In average 30.6 terminology codes are encoded per letter, and 99.02% of those codes are preserved despite the de-identification. FASDIM gets good results in French and is freely available. It is easy to implement and does not require any predefined dictionary. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Identification of Long Bone Fractures in Radiology Reports Using Natural Language Processing to support Healthcare Quality Improvement.

PubMed

Grundmeier, Robert W; Masino, Aaron J; Casper, T Charles; Dean, Jonathan M; Bell, Jamie; Enriquez, Rene; Deakyne, Sara; Chamberlain, James M; Alpern, Elizabeth R

2016-11-09

Important information to support healthcare quality improvement is often recorded in free text documents such as radiology reports. Natural language processing (NLP) methods may help extract this information, but these methods have rarely been applied outside the research laboratories where they were developed. To implement and validate NLP tools to identify long bone fractures for pediatric emergency medicine quality improvement. Using freely available statistical software packages, we implemented NLP methods to identify long bone fractures from radiology reports. A sample of 1,000 radiology reports was used to construct three candidate classification models. A test set of 500 reports was used to validate the model performance. Blinded manual review of radiology reports by two independent physicians provided the reference standard. Each radiology report was segmented and word stem and bigram features were constructed. Common English "stop words" and rare features were excluded. We used 10-fold cross-validation to select optimal configuration parameters for each model. Accuracy, recall, precision and the F1 score were calculated. The final model was compared to the use of diagnosis codes for the identification of patients with long bone fractures. There were 329 unique word stems and 344 bigrams in the training documents. A support vector machine classifier with Gaussian kernel performed best on the test set with accuracy=0.958, recall=0.969, precision=0.940, and F1 score=0.954. Optimal parameters for this model were cost=4 and gamma=0.005. The three classification models that we tested all performed better than diagnosis codes in terms of accuracy, precision, and F1 score (diagnosis code accuracy=0.932, recall=0.960, precision=0.896, and F1 score=0.927). NLP methods using a corpus of 1,000 training documents accurately identified acute long bone fractures from radiology reports. Strategic use of straightforward NLP methods, implemented with freely available software, offers quality improvement teams new opportunities to extract information from narrative documents.
On initial Brain Activity Mapping of episodic and semantic memory code in the hippocampus.

PubMed

Tsien, Joe Z; Li, Meng; Osan, Remus; Chen, Guifen; Lin, Longian; Wang, Phillip Lei; Frey, Sabine; Frey, Julietta; Zhu, Dajiang; Liu, Tianming; Zhao, Fang; Kuang, Hui

2013-10-01

It has been widely recognized that the understanding of the brain code would require large-scale recording and decoding of brain activity patterns. In 2007 with support from Georgia Research Alliance, we have launched the Brain Decoding Project Initiative with the basic idea which is now similarly advocated by BRAIN project or Brain Activity Map proposal. As the planning of the BRAIN project is currently underway, we share our insights and lessons from our efforts in mapping real-time episodic memory traces in the hippocampus of freely behaving mice. We show that appropriate large-scale statistical methods are essential to decipher and measure real-time memory traces and neural dynamics. We also provide an example of how the carefully designed, sometime thinking-outside-the-box, behavioral paradigms can be highly instrumental to the unraveling of memory-coding cell assembly organizing principle in the hippocampus. Our observations to date have led us to conclude that the specific-to-general categorical and combinatorial feature-coding cell assembly mechanism represents an emergent property for enabling the neural networks to generate and organize not only episodic memory, but also semantic knowledge and imagination. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
On Initial Brain Activity Mapping of Associative Memory Code in the Hippocampus

PubMed Central

Tsien, Joe Z.; Li, Meng; Osan, Remus; Chen, Guifen; Lin, Longian; Lei Wang, Phillip; Frey, Sabine; Frey, Julietta; Zhu, Dajiang; Liu, Tianming; Zhao, Fang; Kuang, Hui

2013-01-01

It has been widely recognized that the understanding of the brain code would require large-scale recording and decoding of brain activity patterns. In 2007 with support from Georgia Research Alliance, we have launched the Brain Decoding Project Initiative with the basic idea which is now similarly advocated by BRAIN project or Brain Activity Map proposal. As the planning of the BRAIN project is currently underway, we share our insights and lessons from our efforts in mapping real-time episodic memory traces in the hippocampus of freely behaving mice. We show that appropriate large-scale statistical methods are essential to decipher and measure real-time memory traces and neural dynamics. We also provide an example of how the carefully designed, sometime thinking-outside-the-box, behavioral paradigms can be highly instrumental to the unraveling of memory-coding cell assembly organizing principle in the hippocampus. Our observations to date have led us to conclude that the specific-to-general categorical and combinatorial feature-coding cell assembly mechanism represents an emergent property for enabling the neural networks to generate and organize not only episodic memory, but also semantic knowledge and imagination. PMID:23838072
Numerical implementation and oceanographic application of the thermodynamic potentials of water, vapour, ice, seawater and air - Part 2: The library routines

NASA Astrophysics Data System (ADS)

Wright, D. G.; Feistel, R.; Reissmann, J. H.; Miyagawa, K.; Jackett, D. R.; Wagner, W.; Overhoff, U.; Guder, C.; Feistel, A.; Marion, G. M.

2010-03-01

The SCOR/IAPSO1 Working Group 127 on Thermodynamics and Equation of State of Seawater has prepared recommendations for new methods and algorithms for numerical estimation of the thermophysical properties of seawater. As an outcome of this work, a new International Thermodynamic Equation of Seawater (TEOS-10) was endorsed by IOC/UNESCO2 in June 2009 as the official replacement and extension of the 1980 International Equation of State, EOS-80. As part of this new standard a source code package has been prepared that is now made freely available to users via the World Wide Web. This package includes two libraries referred to as the SIA (Sea-Ice-Air) library and the GSW (Gibbs SeaWater) library. Information on the GSW library may be found on the TEOS-10 web site (http://www.TEOS-10.org). This publication provides an introduction to the SIA library which contains routines to calculate various thermodynamic properties as discussed in the companion paper. The SIA library is very comprehensive, including routines to deal with fluid water, ice, seawater and humid air as well as equilibrium states involving various combinations of these, with equivalent code developed in different languages. The code is hierachically structured in modules that support (i) almost unlimited extension with respect to additional properties or relations, (ii) an extraction of self-contained sub-libraries, (iii) separate updating of the empirical thermodynamic potentials, and (iv) code verification on different platforms and between different languages. Error trapping is implemented to identify when one or more of the primary routines are accessed significantly beyond their established range of validity. The initial version of the SIA library is available in Visual Basic and FORTRAN as a supplement to this publication and updates will be maintained on the TEOS-10 web site. 1 SCOR/IAPSO: Scientific Committee on Oceanic Research/International Association for the Physical Sciences of the Oceans 2 IOC/UNESCO: Intergovernmental Oceanographic Commission/United Nations Educational, Scientific and Cultural Organization
Numerical implementation and oceanographic application of the thermodynamic potentials of liquid water, water vapour, ice, seawater and humid air - Part 2: The library routines

NASA Astrophysics Data System (ADS)

Wright, D. G.; Feistel, R.; Reissmann, J. H.; Miyagawa, K.; Jackett, D. R.; Wagner, W.; Overhoff, U.; Guder, C.; Feistel, A.; Marion, G. M.

2010-07-01

The SCOR/IAPSO1 Working Group 127 on Thermodynamics and Equation of State of Seawater has prepared recommendations for new methods and algorithms for numerical estimation of the the thermophysical properties of seawater. As an outcome of this work, a new International Thermodynamic Equation of Seawater (TEOS-10) was endorsed by IOC/UNESCO2 in June 2009 as the official replacement and extension of the 1980 International Equation of State, EOS-80. As part of this new standard a source code package has been prepared that is now made freely available to users via the World Wide Web. This package includes two libraries referred to as the SIA (Sea-Ice-Air) library and the GSW (Gibbs SeaWater) library. Information on the GSW library may be found on the TEOS-10 web site (http://www.TEOS-10.org). This publication provides an introduction to the SIA library which contains routines to calculate various thermodynamic properties as discussed in the companion paper. The SIA library is very comprehensive, including routines to deal with fluid water, ice, seawater and humid air as well as equilibrium states involving various combinations of these, with equivalent code developed in different languages. The code is hierachically structured in modules that support (i) almost unlimited extension with respect to additional properties or relations, (ii) an extraction of self-contained sub-libraries, (iii) separate updating of the empirical thermodynamic potentials, and (iv) code verification on different platforms and between different languages. Error trapping is implemented to identify when one or more of the primary routines are accessed significantly beyond their established range of validity. The initial version of the SIA library is available in Visual Basic and FORTRAN as a supplement to this publication and updates will be maintained on the TEOS-10 web site. 1SCOR/IAPSO: Scientific Committee on Oceanic Research/International Association for the Physical Sciences of the Oceans 2IOC/UNESCO: Intergovernmental Oceanographic Commission/United Nations Educational, Scientific and Cultural Organization
Comprehensive Identification of Long Non-coding RNAs in Purified Cell Types from the Brain Reveals Functional LncRNA in OPC Fate Determination.

PubMed

Dong, Xiaomin; Chen, Kenian; Cuevas-Diaz Duran, Raquel; You, Yanan; Sloan, Steven A; Zhang, Ye; Zong, Shan; Cao, Qilin; Barres, Ben A; Wu, Jia Qian

2015-12-01

Long non-coding RNAs (lncRNAs) (> 200 bp) play crucial roles in transcriptional regulation during numerous biological processes. However, it is challenging to comprehensively identify lncRNAs, because they are often expressed at low levels and with more cell-type specificity than are protein-coding genes. In the present study, we performed ab initio transcriptome reconstruction using eight purified cell populations from mouse cortex and detected more than 5000 lncRNAs. Predicting the functions of lncRNAs using cell-type specific data revealed their potential functional roles in Central Nervous System (CNS) development. We performed motif searches in ENCODE DNase I digital footprint data and Mouse ENCODE promoters to infer transcription factor (TF) occupancy. By integrating TF binding and cell-type specific transcriptomic data, we constructed a novel framework that is useful for systematically identifying lncRNAs that are potentially essential for brain cell fate determination. Based on this integrative analysis, we identified lncRNAs that are regulated during Oligodendrocyte Precursor Cell (OPC) differentiation from Neural Stem Cells (NSCs) and that are likely to be involved in oligodendrogenesis. The top candidate, lnc-OPC, shows highly specific expression in OPCs and remarkable sequence conservation among placental mammals. Interestingly, lnc-OPC is significantly up-regulated in glial progenitors from experimental autoimmune encephalomyelitis (EAE) mouse models compared to wild-type mice. OLIG2-binding sites in the upstream regulatory region of lnc-OPC were identified by ChIP (chromatin immunoprecipitation)-Sequencing and validated by luciferase assays. Loss-of-function experiments confirmed that lnc-OPC plays a functional role in OPC genesis. Overall, our results substantiated the role of lncRNA in OPC fate determination and provided an unprecedented data source for future functional investigations in CNS cell types. We present our datasets and analysis results via the interactive genome browser at our laboratory website that is freely accessible to the research community. This is the first lncRNA expression database of collective populations of glia, vascular cells, and neurons. We anticipate that these studies will advance the knowledge of this major class of non-coding genes and their potential roles in neurological development and diseases.
The random energy model in a magnetic field and joint source channel coding

NASA Astrophysics Data System (ADS)

Merhav, Neri

2008-09-01

We demonstrate that there is an intimate relationship between the magnetic properties of Derrida’s random energy model (REM) of spin glasses and the problem of joint source-channel coding in Information Theory. In particular, typical patterns of erroneously decoded messages in the coding problem have “magnetization” properties that are analogous to those of the REM in certain phases, where the non-uniformity of the distribution of the source in the coding problem plays the role of an external magnetic field applied to the REM. We also relate the ensemble performance (random coding exponents) of joint source-channel codes to the free energy of the REM in its different phases.
User Manual and Source Code for a LAMMPS Implementation of Constant Energy Dissipative Particle Dynamics (DPD-E)

DTIC Science & Technology

2014-06-01

User Manual and Source Code for a LAMMPS Implementation of Constant Energy Dissipative Particle Dynamics (DPD-E) by James P. Larentzos...Laboratory Aberdeen Proving Ground, MD 21005-5069 ARL-SR-290 June 2014 User Manual and Source Code for a LAMMPS Implementation of Constant...3. DATES COVERED (From - To) September 2013–February 2014 4. TITLE AND SUBTITLE User Manual and Source Code for a LAMMPS Implementation of
Reactome diagram viewer: data structures and strategies to boost performance.

PubMed

Fabregat, Antonio; Sidiropoulos, Konstantinos; Viteri, Guilherme; Marin-Garcia, Pablo; Ping, Peipei; Stein, Lincoln; D'Eustachio, Peter; Hermjakob, Henning

2018-04-01

Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. For web-based pathway visualization, Reactome uses a custom pathway diagram viewer that has been evolved over the past years. Here, we present comprehensive enhancements in usability and performance based on extensive usability testing sessions and technology developments, aiming to optimize the viewer towards the needs of the community. The pathway diagram viewer version 3 achieves consistently better performance, loading and rendering of 97% of the diagrams in Reactome in less than 1 s. Combining the multi-layer html5 canvas strategy with a space partitioning data structure minimizes CPU workload, enabling the introduction of new features that further enhance user experience. Through the use of highly optimized data structures and algorithms, Reactome has boosted the performance and usability of the new pathway diagram viewer, providing a robust, scalable and easy-to-integrate solution to pathway visualization. As graph-based visualization of complex data is a frequent challenge in bioinformatics, many of the individual strategies presented here are applicable to a wide range of web-based bioinformatics resources. Reactome is available online at: https://reactome.org. The diagram viewer is part of the Reactome pathway browser (https://reactome.org/PathwayBrowser/) and also available as a stand-alone widget at: https://reactome.org/dev/diagram/. The source code is freely available at: https://github.com/reactome-pwp/diagram. fabregat@ebi.ac.uk or hhe@ebi.ac.uk. Supplementary data are available at Bioinformatics online.
NeuroSeg: automated cell detection and segmentation for in vivo two-photon Ca2+ imaging data.

PubMed

Guan, Jiangheng; Li, Jingcheng; Liang, Shanshan; Li, Ruijie; Li, Xingyi; Shi, Xiaozhe; Huang, Ciyu; Zhang, Jianxiong; Pan, Junxia; Jia, Hongbo; Zhang, Le; Chen, Xiaowei; Liao, Xiang

2018-01-01

Two-photon Ca 2+ imaging has become a popular approach for monitoring neuronal population activity with cellular or subcellular resolution in vivo. This approach allows for the recording of hundreds to thousands of neurons per animal and thus leads to a large amount of data to be processed. In particular, manually drawing regions of interest is the most time-consuming aspect of data analysis. However, the development of automated image analysis pipelines, which will be essential for dealing with the likely future deluge of imaging data, remains a major challenge. To address this issue, we developed NeuroSeg, an open-source MATLAB program that can facilitate the accurate and efficient segmentation of neurons in two-photon Ca 2+ imaging data. We proposed an approach using a generalized Laplacian of Gaussian filter to detect cells and weighting-based segmentation to separate individual cells from the background. We tested this approach on an in vivo two-photon Ca 2+ imaging dataset obtained from mouse cortical neurons with differently sized view fields. We show that this approach exhibits superior performance for cell detection and segmentation compared with the existing published tools. In addition, we integrated the previously reported, activity-based segmentation into our approach and found that this combined method was even more promising. The NeuroSeg software, including source code and graphical user interface, is freely available and will be a useful tool for in vivo brain activity mapping.
An algorithm to detect and communicate the differences in computational models describing biological systems.

PubMed

Scharm, Martin; Wolkenhauer, Olaf; Waltemath, Dagmar

2016-02-15

Repositories support the reuse of models and ensure transparency about results in publications linked to those models. With thousands of models available in repositories, such as the BioModels database or the Physiome Model Repository, a framework to track the differences between models and their versions is essential to compare and combine models. Difference detection not only allows users to study the history of models but also helps in the detection of errors and inconsistencies. Existing repositories lack algorithms to track a model's development over time. Focusing on SBML and CellML, we present an algorithm to accurately detect and describe differences between coexisting versions of a model with respect to (i) the models' encoding, (ii) the structure of biological networks and (iii) mathematical expressions. This algorithm is implemented in a comprehensive and open source library called BiVeS. BiVeS helps to identify and characterize changes in computational models and thereby contributes to the documentation of a model's history. Our work facilitates the reuse and extension of existing models and supports collaborative modelling. Finally, it contributes to better reproducibility of modelling results and to the challenge of model provenance. The workflow described in this article is implemented in BiVeS. BiVeS is freely available as source code and binary from sems.uni-rostock.de. The web interface BudHat demonstrates the capabilities of BiVeS at budhat.sems.uni-rostock.de. © The Author 2015. Published by Oxford University Press.
SBSI: an extensible distributed software infrastructure for parameter estimation in systems biology.

PubMed

Adams, Richard; Clark, Allan; Yamaguchi, Azusa; Hanlon, Neil; Tsorman, Nikos; Ali, Shakir; Lebedeva, Galina; Goltsov, Alexey; Sorokin, Anatoly; Akman, Ozgur E; Troein, Carl; Millar, Andrew J; Goryanin, Igor; Gilmore, Stephen

2013-03-01

Complex computational experiments in Systems Biology, such as fitting model parameters to experimental data, can be challenging to perform. Not only do they frequently require a high level of computational power, but the software needed to run the experiment needs to be usable by scientists with varying levels of computational expertise, and modellers need to be able to obtain up-to-date experimental data resources easily. We have developed a software suite, the Systems Biology Software Infrastructure (SBSI), to facilitate the parameter-fitting process. SBSI is a modular software suite composed of three major components: SBSINumerics, a high-performance library containing parallelized algorithms for performing parameter fitting; SBSIDispatcher, a middleware application to track experiments and submit jobs to back-end servers; and SBSIVisual, an extensible client application used to configure optimization experiments and view results. Furthermore, we have created a plugin infrastructure to enable project-specific modules to be easily installed. Plugin developers can take advantage of the existing user-interface and application framework to customize SBSI for their own uses, facilitated by SBSI's use of standard data formats. All SBSI binaries and source-code are freely available from http://sourceforge.net/projects/sbsi under an Apache 2 open-source license. The server-side SBSINumerics runs on any Unix-based operating system; both SBSIVisual and SBSIDispatcher are written in Java and are platform independent, allowing use on Windows, Linux and Mac OS X. The SBSI project website at http://www.sbsi.ed.ac.uk provides documentation and tutorials.
SANDS: a service-oriented architecture for clinical decision support in a National Health Information Network.

PubMed

Wright, Adam; Sittig, Dean F

2008-12-01

In this paper, we describe and evaluate a new distributed architecture for clinical decision support called SANDS (Service-oriented Architecture for NHIN Decision Support), which leverages current health information exchange efforts and is based on the principles of a service-oriented architecture. The architecture allows disparate clinical information systems and clinical decision support systems to be seamlessly integrated over a network according to a set of interfaces and protocols described in this paper. The architecture described is fully defined and developed, and six use cases have been developed and tested using a prototype electronic health record which links to one of the existing prototype National Health Information Networks (NHIN): drug interaction checking, syndromic surveillance, diagnostic decision support, inappropriate prescribing in older adults, information at the point of care and a simple personal health record. Some of these use cases utilize existing decision support systems, which are either commercially or freely available at present, and developed outside of the SANDS project, while other use cases are based on decision support systems developed specifically for the project. Open source code for many of these components is available, and an open source reference parser is also available for comparison and testing of other clinical information systems and clinical decision support systems that wish to implement the SANDS architecture. The SANDS architecture for decision support has several significant advantages over other architectures for clinical decision support. The most salient of these are:
Scuba: scalable kernel-based gene prioritization.

PubMed

Zampieri, Guido; Tran, Dinh Van; Donini, Michele; Navarin, Nicolò; Aiolli, Fabio; Sperduti, Alessandro; Valle, Giorgio

2018-01-25

The uncovering of genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can help to cope with these problems. In particular, kernel-based methods are a powerful resource for the integration of heterogeneous biological knowledge, however, their practical implementation is often precluded by their limited scalability. We propose Scuba, a scalable kernel-based method for gene prioritization. It implements a novel multiple kernel learning approach, based on a semi-supervised perspective and on the optimization of the margin distribution. Scuba is optimized to cope with strongly unbalanced settings where known disease genes are few and large scale predictions are required. Importantly, it is able to efficiently deal both with a large amount of candidate genes and with an arbitrary number of data sources. As a direct consequence of scalability, Scuba integrates also a new efficient strategy to select optimal kernel parameters for each data source. We performed cross-validation experiments and simulated a realistic usage setting, showing that Scuba outperforms a wide range of state-of-the-art methods. Scuba achieves state-of-the-art performance and has enhanced scalability compared to existing kernel-based approaches for genomic data. This method can be useful to prioritize candidate genes, particularly when their number is large or when input data is highly heterogeneous. The code is freely available at https://github.com/gzampieri/Scuba .
Extracting and standardizing medication information in clinical text – the MedEx-UIMA system

PubMed Central

Jiang, Min; Wu, Yonghui; Shah, Anushi; Priyanka, Priyanka; Denny, Joshua C.; Xu, Hua

2014-01-01

Extraction of medication information embedded in clinical text is important for research using electronic health records (EHRs). However, most of current medication information extraction systems identify drug and signature entities without mapping them to standard representation. In this study, we introduced the open source Java implementation of MedEx, an existing high-performance medication information extraction system, based on the Unstructured Information Management Architecture (UIMA) framework. In addition, we developed new encoding modules in the MedEx-UIMA system, which mapped an extracted drug name/dose/form to both generalized and specific RxNorm concepts and translated drug frequency information to ISO standard. We processed 826 documents by both systems and verified that MedEx-UIMA and MedEx (the Python version) performed similarly by comparing both results. Using two manually annotated test sets that contained 300 drug entries from medication list and 300 drug entries from narrative reports, the MedEx-UIMA system achieved F-measures of 98.5% and 97.5% respectively for encoding drug names to corresponding RxNorm generic drug ingredients, and F-measures of 85.4% and 88.1% respectively for mapping drug names/dose/form to the most specific RxNorm concepts. It also achieved an F-measure of 90.4% for normalizing frequency information to ISO standard. The open source MedEx-UIMA system is freely available online at http://code.google.com/p/medex-uima/. PMID:25954575
RGG: A general GUI Framework for R scripts

PubMed Central

Visne, Ilhami; Dilaveroglu, Erkan; Vierlinger, Klemens; Lauss, Martin; Yildiz, Ahmet; Weinhaeusel, Andreas; Noehammer, Christa; Leisch, Friedrich; Kriegner, Albert

2009-01-01

Background R is the leading open source statistics software with a vast number of biostatistical and bioinformatical analysis packages. To exploit the advantages of R, extensive scripting/programming skills are required. Results We have developed a software tool called R GUI Generator (RGG) which enables the easy generation of Graphical User Interfaces (GUIs) for the programming language R by adding a few Extensible Markup Language (XML) – tags. RGG consists of an XML-based GUI definition language and a Java-based GUI engine. GUIs are generated in runtime from defined GUI tags that are embedded into the R script. User-GUI input is returned to the R code and replaces the XML-tags. RGG files can be developed using any text editor. The current version of RGG is available as a stand-alone software (RGGRunner) and as a plug-in for JGR. Conclusion RGG is a general GUI framework for R that has the potential to introduce R statistics (R packages, built-in functions and scripts) to users with limited programming skills and helps to bridge the gap between R developers and GUI-dependent users. RGG aims to abstract the GUI development from individual GUI toolkits by using an XML-based GUI definition language. Thus RGG can be easily integrated in any software. The RGG project further includes the development of a web-based repository for RGG-GUIs. RGG is an open source project licensed under the Lesser General Public License (LGPL) and can be downloaded freely at PMID:19254356
JCoDA: a tool for detecting evolutionary selection.

PubMed

Steinway, Steven N; Dannenfelser, Ruth; Laucius, Christopher D; Hayes, James E; Nayak, Sudhir

2010-05-27

The incorporation of annotated sequence information from multiple related species in commonly used databases (Ensembl, Flybase, Saccharomyces Genome Database, Wormbase, etc.) has increased dramatically over the last few years. This influx of information has provided a considerable amount of raw material for evaluation of evolutionary relationships. To aid in the process, we have developed JCoDA (Java Codon Delimited Alignment) as a simple-to-use visualization tool for the detection of site specific and regional positive/negative evolutionary selection amongst homologous coding sequences. JCoDA accepts user-inputted unaligned or pre-aligned coding sequences, performs a codon-delimited alignment using ClustalW, and determines the dN/dS calculations using PAML (Phylogenetic Analysis Using Maximum Likelihood, yn00 and codeml) in order to identify regions and sites under evolutionary selection. The JCoDA package includes a graphical interface for Phylip (Phylogeny Inference Package) to generate phylogenetic trees, manages formatting of all required file types, and streamlines passage of information between underlying programs. The raw data are output to user configurable graphs with sliding window options for straightforward visualization of pairwise or gene family comparisons. Additionally, codon-delimited alignments are output in a variety of common formats and all dN/dS calculations can be output in comma-separated value (CSV) format for downstream analysis. To illustrate the types of analyses that are facilitated by JCoDA, we have taken advantage of the well studied sex determination pathway in nematodes as well as the extensive sequence information available to identify genes under positive selection, examples of regional positive selection, and differences in selection based on the role of genes in the sex determination pathway. JCoDA is a configurable, open source, user-friendly visualization tool for performing evolutionary analysis on homologous coding sequences. JCoDA can be used to rapidly screen for genes and regions of genes under selection using PAML. It can be freely downloaded at http://www.tcnj.edu/~nayaklab/jcoda.
JCoDA: a tool for detecting evolutionary selection

PubMed Central

2010-01-01

Background The incorporation of annotated sequence information from multiple related species in commonly used databases (Ensembl, Flybase, Saccharomyces Genome Database, Wormbase, etc.) has increased dramatically over the last few years. This influx of information has provided a considerable amount of raw material for evaluation of evolutionary relationships. To aid in the process, we have developed JCoDA (Java Codon Delimited Alignment) as a simple-to-use visualization tool for the detection of site specific and regional positive/negative evolutionary selection amongst homologous coding sequences. Results JCoDA accepts user-inputted unaligned or pre-aligned coding sequences, performs a codon-delimited alignment using ClustalW, and determines the dN/dS calculations using PAML (Phylogenetic Analysis Using Maximum Likelihood, yn00 and codeml) in order to identify regions and sites under evolutionary selection. The JCoDA package includes a graphical interface for Phylip (Phylogeny Inference Package) to generate phylogenetic trees, manages formatting of all required file types, and streamlines passage of information between underlying programs. The raw data are output to user configurable graphs with sliding window options for straightforward visualization of pairwise or gene family comparisons. Additionally, codon-delimited alignments are output in a variety of common formats and all dN/dS calculations can be output in comma-separated value (CSV) format for downstream analysis. To illustrate the types of analyses that are facilitated by JCoDA, we have taken advantage of the well studied sex determination pathway in nematodes as well as the extensive sequence information available to identify genes under positive selection, examples of regional positive selection, and differences in selection based on the role of genes in the sex determination pathway. Conclusions JCoDA is a configurable, open source, user-friendly visualization tool for performing evolutionary analysis on homologous coding sequences. JCoDA can be used to rapidly screen for genes and regions of genes under selection using PAML. It can be freely downloaded at http://www.tcnj.edu/~nayaklab/jcoda. PMID:20507581
High fidelity kinetic modeling of magnetic reconnection in laboratory plasma

NASA Astrophysics Data System (ADS)

Stanier, A.; Daughton, W. S.

2017-12-01

Over the past decade, a great deal of progress has been made towards understanding the physics of magnetic reconnection in weakly collisional regimes of relevance to both fusion devices, and to space and astrophysical plasmas. However, there remain some outstanding unsolved problems in reconnection physics, such as the generation and influence of plasmoids (flux ropes) within reconnection layers, the development of magnetic turbulence, the role of current driven and streaming instabilities, and the influence of electron pressure anisotropy on the layer structure. Due to the importance of these questions, new laboratory reconnection experiments are being built to allow controlled and reproducible study of such questions with the simultaneous acquisition of high time resolution measurements at a large number of spatial points. These experiments include the FLARE facility at Princeton University and the T-REX experiment at the University of Wisconsin. To guide and interpret these new experiments, and to extrapolate the results to space applications, new investments in kinetic modeling tools are required. We have recently developed a cylindrical version of the VPIC Particle-In-Cell code with the capability to perform first-principles kinetic simulations that approach experimental device size with more realistic geometry and drive coils. This cylindrical version inherits much of the optimization work that has been done recently for the next generation many-cores architectures with wider vector registers, and achieves comparable conservation properties as the Cartesian code. Namely it features exact discrete charge conservation, and a so-called "energy-conserving" scheme where the energy is conserved in the limit of continuous time, i.e. without contribution from spatial discretization (Lewis, 1970). We will present initial results of modeling magnetic reconnection in the experiments mentioned above. Since the VPIC code is open source (https://github.com/losalamos/vpic), this new cylindrical version will also be freely available to the community.

Zoo agent's measure in applying the five freedoms principles for animal welfare.

PubMed

Demartoto, Argyo; Soemanto, Robertus Bellarminus; Zunariyah, Siti

2017-09-01

Animal welfare should be prioritized not only for the animal's life sustainability but also for supporting the sustainability of living organism's life on the earth. However, Indonesian people have not understood it yet, thereby still treating animals arbitrarily and not appreciating either domesticated or wild animals. This research aimed to analyze the zoo agent's action in applying the five freedoms principle for animal welfare in Taman Satwa Taru Jurug (thereafter called TSTJ) or Surakarta Zoo and Gembira Loka Zoo (GLZ) of Yogyakarta Indonesia using Giddens structuration theory. The informants in this comparative study with explorative were organizers, visitors, and stakeholders of zoos selected using purposive sampling technique. The informants consisted of 19 persons: 8 from TSTJ (Code T) and 10 from GLZ (Code G) and representatives from Natural Resource Conservation Center of Central Java (Code B). Data were collected through observation, in-depth interview, and Focus Group Discussion and Documentation. Data were analyzed using an interactive model of analysis consisting of three components: Data reduction, data display, and conclusion drawing. Data validation was carried out using method and data source triangulations. Food, nutrition, and nutrition level have been given consistent with the animals' habit and natural behavior. Animal keepers always maintain their self-cleanliness. GLZ has provided cages according to the technical instruction of constructing ideal cages, but the cages in TSTJ are worrying as they are not consistent with standard, rusty, and damaged, and animals have no partner. Some animals in GLZ are often sick, whereas some animals in TSTJ are dead due to poor maintenance. The iron pillars of cages restrict animal behavior in TSTJ so that they have not had freedom to behave normally yet, whereas, in GLZ, they can move freely in original habitat. The animals in the two zoos have not been free from disruption, stress, and pressure due to the passing over vehicles. There should be strategic communication, information, and education, community development, and law enforcement for the animal welfare.
Astronomy education and the Astrophysics Source Code Library

NASA Astrophysics Data System (ADS)

Allen, Alice; Nemiroff, Robert J.

2016-01-01

The Astrophysics Source Code Library (ASCL) is an online registry of source codes used in refereed astrophysics research. It currently lists nearly 1,200 codes and covers all aspects of computational astrophysics. How can this resource be of use to educators and to the graduate students they mentor? The ASCL serves as a discovery tool for codes that can be used for one's own research. Graduate students can also investigate existing codes to see how common astronomical problems are approached numerically in practice, and use these codes as benchmarks for their own solutions to these problems. Further, they can deepen their knowledge of software practices and techniques through examination of others' codes.
The Reporter's Privilege under Fire: Is the American Press Still Free?

ERIC Educational Resources Information Center

West, Natalie

2009-01-01

The First Amendment's guarantee of an independent press that may freely collect and disseminate news is often considered the bedrock of American democracy. Yet more than a century and a half after the "New York Herald's" John Nugent became the first American reporter jailed for refusing to identify a confidential source, reporters…
The Web as a Reference Tool: Comparisons with Traditional Sources.

ERIC Educational Resources Information Center

Janes, Joseph; McClure, Charles R.

1999-01-01

This preliminary study suggests that the same level of timeliness and accuracy can be obtained for answers to reference questions using resources in freely available World Wide Web sites as with traditional print-based resources. Discusses implications for library collection development, new models of consortia, training needs, and costing and…
Caught in the Network

ERIC Educational Resources Information Center

Cesarini, Paul

2007-01-01

This article describes The Onion Router (TOR). It is a freely available, open-source program developed by the U.S. Navy about a decade ago. A browser plug-in, it thwarts online traffic analysis and related forms of Internet surveillance by sending your data packets through different routers around the world. As each packet moves from one router to…
Unified EDGE

DOE Office of Scientific and Technical Information (OSTI.GOV)

2007-06-18

UEDGE is an interactive suite of physics packages using the Python or BASIS scripting systems. The plasma is described by time-dependent 2D plasma fluid equations that include equations for density, velocity, ion temperature, electron temperature, electrostatic potential, and gas density in the edge region of a magnetic fusion energy confinement device. Slab, cylindrical, and toroidal geometries are allowed, and closed and open magnetic field-line regions are included. Classical transport is assumed along magnetic field lines, and anomalous transport is assumed across field lines. Multi-charge state impurities can be included with the corresponding line-radiation energy loss. Although UEDGE is written inmore » Fortran, for efficient execution and analysis of results, it utilizes either Python or BASIS scripting shells. Python is easily available for many platforms (http://www.Python.org/). The features and availability of BASIS are described in "Basis Manual Set" by P.F. Dubois, Z.C. Motteler, et al., Lawrence Livermore National Laboratory report UCRL-MA-1 18541, June, 2002 and http://basis.llnl.gov. BASIS has been reviewed and released by LLNL for unlimited distribution. The Python version utilizes PYBASIS scripts developed by D.P. Grote, LLNL. The Python version also uses MPPL code and MAC Perl script, available from the public-domain BASIS source above. The Forthon version of UEDGE uses the same source files, but utilizes Forthon to produce a Python-compatible source. Forthon has been developed by D.P. Grote at LBL (see http://hifweb.lbl.gov/Forthon/ and Grote et al. in the references below), and it is freely available. The graphics can be performed by any package importable to Python, such as PYGIST.« less
ReGaTE: Registration of Galaxy Tools in Elixir.

PubMed

Doppelt-Azeroual, Olivia; Mareuil, Fabien; Deveaud, Eric; Kalaš, Matúš; Soranzo, Nicola; van den Beek, Marius; Grüning, Björn; Ison, Jon; Ménager, Hervé

2017-06-01

Bioinformaticians routinely use multiple software tools and data sources in their day-to-day work and have been guided in their choices by a number of cataloguing initiatives. The ELIXIR Tools and Data Services Registry (bio.tools) aims to provide a central information point, independent of any specific scientific scope within bioinformatics or technological implementation. Meanwhile, efforts to integrate bioinformatics software in workbench and workflow environments have accelerated to enable the design, automation, and reproducibility of bioinformatics experiments. One such popular environment is the Galaxy framework, with currently more than 80 publicly available Galaxy servers around the world. In the context of a generic registry for bioinformatics software, such as bio.tools, Galaxy instances constitute a major source of valuable content. Yet there has been, to date, no convenient mechanism to register such services en masse. We present ReGaTE (Registration of Galaxy Tools in Elixir), a software utility that automates the process of registering the services available in a Galaxy instance. This utility uses the BioBlend application program interface to extract service metadata from a Galaxy server, enhance the metadata with the scientific information required by bio.tools, and push it to the registry. ReGaTE provides a fast and convenient way to publish Galaxy services in bio.tools. By doing so, service providers may increase the visibility of their services while enriching the software discovery function that bio.tools provides for its users. The source code of ReGaTE is freely available on Github at https://github.com/C3BI-pasteur-fr/ReGaTE . © The Author 2017. Published by Oxford University Press.
SCIFIO: an extensible framework to support scientific image formats.

PubMed

Hiner, Mark C; Rueden, Curtis T; Eliceiri, Kevin W

2016-12-07

No gold standard exists in the world of scientific image acquisition; a proliferation of instruments each with its own proprietary data format has made out-of-the-box sharing of that data nearly impossible. In the field of light microscopy, the Bio-Formats library was designed to translate such proprietary data formats to a common, open-source schema, enabling sharing and reproduction of scientific results. While Bio-Formats has proved successful for microscopy images, the greater scientific community was lacking a domain-independent framework for format translation. SCIFIO (SCientific Image Format Input and Output) is presented as a freely available, open-source library unifying the mechanisms of reading and writing image data. The core of SCIFIO is its modular definition of formats, the design of which clearly outlines the components of image I/O to encourage extensibility, facilitated by the dynamic discovery of the SciJava plugin framework. SCIFIO is structured to support coexistence of multiple domain-specific open exchange formats, such as Bio-Formats' OME-TIFF, within a unified environment. SCIFIO is a freely available software library developed to standardize the process of reading and writing scientific image formats.
Case study of open-source enterprise resource planning implementation in a small business

NASA Astrophysics Data System (ADS)

Olson, David L.; Staley, Jesse

2012-02-01

Enterprise resource planning (ERP) systems have been recognised as offering great benefit to some organisations, although they are expensive and problematic to implement. The cost and risk make well-developed proprietorial systems unaffordable to small businesses. Open-source software (OSS) has become a viable means of producing ERP system products. The question this paper addresses is the feasibility of OSS ERP systems for small businesses. A case is reported involving two efforts to implement freely distributed ERP software products in a small US make-to-order engineering firm. The case emphasises the potential of freely distributed ERP systems, as well as some of the hurdles involved in their implementation. The paper briefly reviews highlights of OSS ERP systems, with the primary focus on reporting the case experiences for efforts to implement ERPLite software and xTuple software. While both systems worked from a technical perspective, both failed due to economic factors. While these economic conditions led to imperfect results, the case demonstrates the feasibility of OSS ERP for small businesses. Both experiences are evaluated in terms of risk dimension.
Spatial and temporal variation of freely dissolved PAHs in an urban river undergoing Superfund remediation

PubMed Central

Sower, GJ; Anderson, K.A.

2014-01-01

Urban rivers with a history of industrial use can exhibit spatial and temporal variations in contaminant concentrations that may significantly affect risk evaluations and even the assessment of remediation efforts. Concentrations of 15 biologically available priority pollutant polycyclic aromatic hydrocarbons (PAHs) were measured over five years along 18.5 miles of the lower Willamette River using passive sampling devices and HPLC. The study area includes the Portland Harbor Superfund megasite with several PAH sources including remediation operations for coal tar at RM 6.3 west and an additional Superfund site, McCormick and Baxter, at RM 7 east consisting largely of creosote contamination. Study results show that organoclay capping at the McCormick and Baxter Superfund Site reduced PAHs from a pre-cap average of 440 ± 422 ng/L to 8 ± 3 ng/L post-capping. Results also reveal that dredging of submerged coal tar nearly tripled nearby freely dissolved PAH concentrations. For apportioning sources, fluoranthene/ pyrene and phenanthrene/anthracene diagnostic ratios from passive sampling devices were established for creosote and coal tar contamination and compared to published sediment values. PMID:19174872
Spatial and temporal variation of freely dissolved polycyclic aromatic hydrocarbons in an urban river undergoing Superfund remediation.

PubMed

Sower, Gregory James; Anderson, Kim A

2008-12-15

Urban rivers with a history of industrial use can exhibit spatial and temporal variations in contaminant concentrations that may significantly affect risk evaluations and even the assessment of remediation efforts. Concentrations of 15 biologically available priority pollutant polycyclic aromatic hydrocarbons (PAHs) were measured over five years along 18.5 miles of the lower Willamette River using passive sampling devices and HPLC. The study area includes the Portland Harbor Superfund megasite with several PAH sources including remediation operations for coal tar at RM 6.3 west and an additional Superfund site, McCormick and Baxter, at RM 7 east consisting largely of creosote contamination. Study results show that organoclay capping at the McCormick and Baxter Superfund Site reduced PAHs from a precap average of 440 +/- 422 ng/L to 8 +/- 3 ng/L postcapping. Results also reveal that dredging of submerged coal tar nearly tripled nearby freely dissolved PAH concentrations. For apportioning sources, fluoranthene/pyrene and phenanthrene/anthracene diagnostic ratios from passive sampling devices were established for creosote and coal tar contamination and compared to published sediment values.
Data processing with microcode designed with source coding

DOEpatents

McCoy, James A; Morrison, Steven E

2013-05-07

Programming for a data processor to execute a data processing application is provided using microcode source code. The microcode source code is assembled to produce microcode that includes digital microcode instructions with which to signal the data processor to execute the data processing application.
ms_lims, a simple yet powerful open source laboratory information management system for MS-driven proteomics.

PubMed

Helsens, Kenny; Colaert, Niklaas; Barsnes, Harald; Muth, Thilo; Flikka, Kristian; Staes, An; Timmerman, Evy; Wortelkamp, Steffi; Sickmann, Albert; Vandekerckhove, Joël; Gevaert, Kris; Martens, Lennart

2010-03-01

MS-based proteomics produces large amounts of mass spectra that require processing, identification and possibly quantification before interpretation can be undertaken. High-throughput studies require automation of these various steps, and management of the data in association with the results obtained. We here present ms_lims (http://genesis.UGent.be/ms_lims), a freely available, open-source system based on a central database to automate data management and processing in MS-driven proteomics analyses.
Adaptive distributed source coding.

PubMed

Varodayan, David; Lin, Yao-Chung; Girod, Bernd

2012-05-01

We consider distributed source coding in the presence of hidden variables that parameterize the statistical dependence among sources. We derive the Slepian-Wolf bound and devise coding algorithms for a block-candidate model of this problem. The encoder sends, in addition to syndrome bits, a portion of the source to the decoder uncoded as doping bits. The decoder uses the sum-product algorithm to simultaneously recover the source symbols and the hidden statistical dependence variables. We also develop novel techniques based on density evolution (DE) to analyze the coding algorithms. We experimentally confirm that our DE analysis closely approximates practical performance. This result allows us to efficiently optimize parameters of the algorithms. In particular, we show that the system performs close to the Slepian-Wolf bound when an appropriate doping rate is selected. We then apply our coding and analysis techniques to a reduced-reference video quality monitoring system and show a bit rate saving of about 75% compared with fixed-length coding.
Phase II evaluation of clinical coding schemes: completeness, taxonomy, mapping, definitions, and clarity. CPRI Work Group on Codes and Structures.

PubMed

Campbell, J R; Carpenter, P; Sneiderman, C; Cohn, S; Chute, C G; Warren, J

1997-01-01

To compare three potential sources of controlled clinical terminology (READ codes version 3.1, SNOMED International, and Unified Medical Language System (UMLS) version 1.6) relative to attributes of completeness, clinical taxonomy, administrative mapping, term definitions and clarity (duplicate coding rate). The authors assembled 1929 source concept records from a variety of clinical information taken from four medical centers across the United States. The source data included medical as well as ample nursing terminology. The source records were coded in each scheme by an investigator and checked by the coding scheme owner. The codings were then scored by an independent panel of clinicians for acceptability. Codes were checked for definitions provided with the scheme. Codes for a random sample of source records were analyzed by an investigator for "parent" and "child" codes within the scheme. Parent and child pairs were scored by an independent panel of medical informatics specialists for clinical acceptability. Administrative and billing code mapping from the published scheme were reviewed for all coded records and analyzed by independent reviewers for accuracy. The investigator for each scheme exhaustively searched a sample of coded records for duplications. SNOMED was judged to be significantly more complete in coding the source material than the other schemes (SNOMED* 70%; READ 57%; UMLS 50%; *p < .00001). SNOMED also had a richer clinical taxonomy judged by the number of acceptable first-degree relatives per coded concept (SNOMED* 4.56, UMLS 3.17; READ 2.14, *p < .005). Only the UMLS provided any definitions; these were found for 49% of records which had a coding assignment. READ and UMLS had better administrative mappings (composite score: READ* 40.6%; UMLS* 36.1%; SNOMED 20.7%, *p < .00001), and SNOMED had substantially more duplications of coding assignments (duplication rate: READ 0%; UMLS 4.2%; SNOMED* 13.9%, *p < .004) associated with a loss of clarity. No major terminology source can lay claim to being the ideal resource for a computer-based patient record. However, based upon this analysis of releases for April 1995, SNOMED International is considerably more complete, has a compositional nature and a richer taxonomy. Is suffers from less clarity, resulting from a lack of syntax and evolutionary changes in its coding scheme. READ has greater clarity and better mapping to administrative schemes (ICD-10 and OPCS-4), is rapidly changing and is less complete. UMLS is a rich lexical resource, with mappings to many source vocabularies. It provides definitions for many of its terms. However, due to the varying granularities and purposes of its source schemes, it has limitations for representation of clinical concepts within a computer-based patient record.
The Particle Accelerator Simulation Code PyORBIT

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gorlov, Timofey V; Holmes, Jeffrey A; Cousineau, Sarah M

2015-01-01

The particle accelerator simulation code PyORBIT is presented. The structure, implementation, history, parallel and simulation capabilities, and future development of the code are discussed. The PyORBIT code is a new implementation and extension of algorithms of the original ORBIT code that was developed for the Spallation Neutron Source accelerator at the Oak Ridge National Laboratory. The PyORBIT code has a two level structure. The upper level uses the Python programming language to control the flow of intensive calculations performed by the lower level code implemented in the C++ language. The parallel capabilities are based on MPI communications. The PyORBIT ismore » an open source code accessible to the public through the Google Open Source Projects Hosting service.« less
Numerical Studies of Friction Between Metallic Surfaces and of its Dependence on Electric Currents

NASA Astrophysics Data System (ADS)

Meintanis, Evangelos; Marder, Michael

2009-03-01

We will present molecular dynamics simulations that explore the frictional mechanisms between clean metallic surfaces. We employ the HOLA molecular dynamics code to run slider-on-block experiments. Both objects are allowed to evolve freely. We recover realistic coefficients of friction and verify the importance of cold-welding and plastic deformations in dry sliding friction. We also find that plastic deformations can significantly affect both objects, despite a difference in hardness. Metallic contacts have significant technological applications in the transmission of electric currents. To explore the effects of the latter to sliding, we had to integrate an electrodynamics solver into the molecular dynamics code. The disparate time scales involved posed a challenge, but we have developed an efficient scheme for such an integration. A limited electrodynamic solver has been implemented and we are currently exploring the effects of currents in the friction and wear of metallic contacts.
The large-scale environment from cosmological simulations - I. The baryonic cosmic web

NASA Astrophysics Data System (ADS)

Cui, Weiguang; Knebe, Alexander; Yepes, Gustavo; Yang, Xiaohu; Borgani, Stefano; Kang, Xi; Power, Chris; Staveley-Smith, Lister

2018-01-01

Using a series of cosmological simulations that includes one dark-matter-only (DM-only) run, one gas cooling-star formation-supernova feedback (CSF) run and one that additionally includes feedback from active galactic nuclei (AGNs), we classify the large-scale structures with both a velocity-shear-tensor code (VWEB) and a tidal-tensor code (PWEB). We find that the baryonic processes have almost no impact on large-scale structures - at least not when classified using aforementioned techniques. More importantly, our results confirm that the gas component alone can be used to infer the filamentary structure of the universe practically un-biased, which could be applied to cosmology constraints. In addition, the gas filaments are classified with its velocity (VWEB) and density (PWEB) fields, which can theoretically connect to the radio observations, such as H I surveys. This will help us to bias-freely link the radio observations with dark matter distributions at large scale.
Impacts of DNAPL Source Treatment: Experimental and Modeling Assessment of the Benefits of Partial DNAPL Source Removal

DTIC Science & Technology

2009-09-01

nuclear industry for conducting performance assessment calculations. The analytical FORTRAN code for the DNAPL source function, REMChlor, was...project. The first was to apply existing deterministic codes , such as T2VOC and UTCHEM, to the DNAPL source zone to simulate the remediation processes...but describe the spatial variability of source zones unlike one-dimensional flow and transport codes that assume homogeneity. The Lagrangian models
NeisseriaBase: a specialised Neisseria genomic resource and analysis platform.

PubMed

Zheng, Wenning; Mutha, Naresh V R; Heydari, Hamed; Dutta, Avirup; Siow, Cheuk Chuen; Jakubovics, Nicholas S; Wee, Wei Yee; Tan, Shi Yang; Ang, Mia Yang; Wong, Guat Jah; Choo, Siew Woh

2016-01-01

Background. The gram-negative Neisseria is associated with two of the most potent human epidemic diseases: meningococcal meningitis and gonorrhoea. In both cases, disease is caused by bacteria colonizing human mucosal membrane surfaces. Overall, the genus shows great diversity and genetic variation mainly due to its ability to acquire and incorporate genetic material from a diverse range of sources through horizontal gene transfer. Although a number of databases exist for the Neisseria genomes, they are mostly focused on the pathogenic species. In this present study we present the freely available NeisseriaBase, a database dedicated to the genus Neisseria encompassing the complete and draft genomes of 15 pathogenic and commensal Neisseria species. Methods. The genomic data were retrieved from National Center for Biotechnology Information (NCBI) and annotated using the RAST server which were then stored into the MySQL database. The protein-coding genes were further analyzed to obtain information such as calculation of GC content (%), predicted hydrophobicity and molecular weight (Da) using in-house Perl scripts. The web application was developed following the secure four-tier web application architecture: (1) client workstation, (2) web server, (3) application server, and (4) database server. The web interface was constructed using PHP, JavaScript, jQuery, AJAX and CSS, utilizing the model-view-controller (MVC) framework. The in-house developed bioinformatics tools implemented in NeisseraBase were developed using Python, Perl, BioPerl and R languages. Results. Currently, NeisseriaBase houses 603,500 Coding Sequences (CDSs), 16,071 RNAs and 13,119 tRNA genes from 227 Neisseria genomes. The database is equipped with interactive web interfaces. Incorporation of the JBrowse genome browser in the database enables fast and smooth browsing of Neisseria genomes. NeisseriaBase includes the standard BLAST program to facilitate homology searching, and for Virulence Factor Database (VFDB) specific homology searches, the VFDB BLAST is also incorporated into the database. In addition, NeisseriaBase is equipped with in-house designed tools such as the Pairwise Genome Comparison tool (PGC) for comparative genomic analysis and the Pathogenomics Profiling Tool (PathoProT) for the comparative pathogenomics analysis of Neisseria strains. Discussion. This user-friendly database not only provides access to a host of genomic resources on Neisseria but also enables high-quality comparative genome analysis, which is crucial for the expanding scientific community interested in Neisseria research. This database is freely available at http://neisseria.um.edu.my.

NeisseriaBase: a specialised Neisseria genomic resource and analysis platform

PubMed Central

Zheng, Wenning; Mutha, Naresh V.R.; Heydari, Hamed; Dutta, Avirup; Siow, Cheuk Chuen; Jakubovics, Nicholas S.; Wee, Wei Yee; Tan, Shi Yang; Ang, Mia Yang; Wong, Guat Jah

2016-01-01

Background. The gram-negative Neisseria is associated with two of the most potent human epidemic diseases: meningococcal meningitis and gonorrhoea. In both cases, disease is caused by bacteria colonizing human mucosal membrane surfaces. Overall, the genus shows great diversity and genetic variation mainly due to its ability to acquire and incorporate genetic material from a diverse range of sources through horizontal gene transfer. Although a number of databases exist for the Neisseria genomes, they are mostly focused on the pathogenic species. In this present study we present the freely available NeisseriaBase, a database dedicated to the genus Neisseria encompassing the complete and draft genomes of 15 pathogenic and commensal Neisseria species. Methods. The genomic data were retrieved from National Center for Biotechnology Information (NCBI) and annotated using the RAST server which were then stored into the MySQL database. The protein-coding genes were further analyzed to obtain information such as calculation of GC content (%), predicted hydrophobicity and molecular weight (Da) using in-house Perl scripts. The web application was developed following the secure four-tier web application architecture: (1) client workstation, (2) web server, (3) application server, and (4) database server. The web interface was constructed using PHP, JavaScript, jQuery, AJAX and CSS, utilizing the model-view-controller (MVC) framework. The in-house developed bioinformatics tools implemented in NeisseraBase were developed using Python, Perl, BioPerl and R languages. Results. Currently, NeisseriaBase houses 603,500 Coding Sequences (CDSs), 16,071 RNAs and 13,119 tRNA genes from 227 Neisseria genomes. The database is equipped with interactive web interfaces. Incorporation of the JBrowse genome browser in the database enables fast and smooth browsing of Neisseria genomes. NeisseriaBase includes the standard BLAST program to facilitate homology searching, and for Virulence Factor Database (VFDB) specific homology searches, the VFDB BLAST is also incorporated into the database. In addition, NeisseriaBase is equipped with in-house designed tools such as the Pairwise Genome Comparison tool (PGC) for comparative genomic analysis and the Pathogenomics Profiling Tool (PathoProT) for the comparative pathogenomics analysis of Neisseria strains. Discussion. This user-friendly database not only provides access to a host of genomic resources on Neisseria but also enables high-quality comparative genome analysis, which is crucial for the expanding scientific community interested in Neisseria research. This database is freely available at http://neisseria.um.edu.my. PMID:27017950
Phase II Evaluation of Clinical Coding Schemes

PubMed Central

Campbell, James R.; Carpenter, Paul; Sneiderman, Charles; Cohn, Simon; Chute, Christopher G.; Warren, Judith

1997-01-01

Abstract Objective: To compare three potential sources of controlled clinical terminology (READ codes version 3.1, SNOMED International, and Unified Medical Language System (UMLS) version 1.6) relative to attributes of completeness, clinical taxonomy, administrative mapping, term definitions and clarity (duplicate coding rate). Methods: The authors assembled 1929 source concept records from a variety of clinical information taken from four medical centers across the United States. The source data included medical as well as ample nursing terminology. The source records were coded in each scheme by an investigator and checked by the coding scheme owner. The codings were then scored by an independent panel of clinicians for acceptability. Codes were checked for definitions provided with the scheme. Codes for a random sample of source records were analyzed by an investigator for “parent” and “child” codes within the scheme. Parent and child pairs were scored by an independent panel of medical informatics specialists for clinical acceptability. Administrative and billing code mapping from the published scheme were reviewed for all coded records and analyzed by independent reviewers for accuracy. The investigator for each scheme exhaustively searched a sample of coded records for duplications. Results: SNOMED was judged to be significantly more complete in coding the source material than the other schemes (SNOMED* 70%; READ 57%; UMLS 50%; *p <.00001). SNOMED also had a richer clinical taxonomy judged by the number of acceptable first-degree relatives per coded concept (SNOMED* 4.56; UMLS 3.17; READ 2.14, *p <.005). Only the UMLS provided any definitions; these were found for 49% of records which had a coding assignment. READ and UMLS had better administrative mappings (composite score: READ* 40.6%; UMLS* 36.1%; SNOMED 20.7%, *p <. 00001), and SNOMED had substantially more duplications of coding assignments (duplication rate: READ 0%; UMLS 4.2%; SNOMED* 13.9%, *p <. 004) associated with a loss of clarity. Conclusion: No major terminology source can lay claim to being the ideal resource for a computer-based patient record. However, based upon this analysis of releases for April 1995, SNOMED International is considerably more complete, has a compositional nature and a richer taxonomy. It suffers from less clarity, resulting from a lack of syntax and evolutionary changes in its coding scheme. READ has greater clarity and better mapping to administrative schemes (ICD-10 and OPCS-4), is rapidly changing and is less complete. UMLS is a rich lexical resource, with mappings to many source vocabularies. It provides definitions for many of its terms. However, due to the varying granularities and purposes of its source schemes, it has limitations for representation of clinical concepts within a computer-based patient record. PMID:9147343
Amino acid fermentation at the origin of the genetic code

PubMed Central

2012-01-01

There is evidence that the genetic code was established prior to the existence of proteins, when metabolism was powered by ribozymes. Also, early proto-organisms had to rely on simple anaerobic bioenergetic processes. In this work I propose that amino acid fermentation powered metabolism in the RNA world, and that this was facilitated by proto-adapters, the precursors of the tRNAs. Amino acids were used as carbon sources rather than as catalytic or structural elements. In modern bacteria, amino acid fermentation is known as the Stickland reaction. This pathway involves two amino acids: the first undergoes oxidative deamination, and the second acts as an electron acceptor through reductive deamination. This redox reaction results in two keto acids that are employed to synthesise ATP via substrate-level phosphorylation. The Stickland reaction is the basic bioenergetic pathway of some bacteria of the genus Clostridium. Two other facts support Stickland fermentation in the RNA world. First, several Stickland amino acid pairs are synthesised in abiotic amino acid synthesis. This suggests that amino acids that could be used as an energy substrate were freely available. Second, anticodons that have complementary sequences often correspond to amino acids that form Stickland pairs. The main hypothesis of this paper is that pairs of complementary proto-adapters were assigned to Stickland amino acids pairs. There are signatures of this hypothesis in the genetic code. Furthermore, it is argued that the proto-adapters formed double strands that brought amino acid pairs into proximity to facilitate their mutual redox reaction, structurally constraining the anticodon pairs that are assigned to these amino acid pairs. Significance tests which randomise the code are performed to study the extent of the variability of the energetic (ATP) yield. Random assignments can lead to a substantial yield of ATP and maintain enough variability, thus selection can act and refine the assignments into a proto-code that optimises the energetic yield. Monte Carlo simulations are performed to evaluate the establishment of these simple proto-codes, based on amino acid substitutions and codon swapping. In all cases, donor amino acids are assigned to anticodons composed of U+G, and have low redundancy (1-2 codons), whereas acceptor amino acids are assigned to the the remaining codons. These bioenergetic and structural constraints allow for a metabolic role for amino acids before their co-option as catalyst cofactors. Reviewers: this article was reviewed by Prof. William Martin, Prof. Eörs Szathmáry (nominated by Dr. Gáspár Jékely) and Dr. Ádám Kun (nominated by Dr. Sandor Pongor) PMID:22325238
Representation of DNA sequences with virtual potentials and their processing by (SEQREP) Kohonen self-organizing maps.

PubMed

Aires-de-Sousa, João; Aires-de-Sousa, Luisa

2003-01-01

We propose representing individual positions in DNA sequences by virtual potentials generated by other bases of the same sequence. This is a compact representation of the neighbourhood of a base. The distribution of the virtual potentials over the whole sequence can be used as a representation of the entire sequence (SEQREP code). It is a flexible code, with a length independent of the sequence size, does not require previous alignment, and is convenient for processing by neural networks or statistical techniques. To evaluate its biological significance, the SEQREP code was used for training Kohonen self-organizing maps (SOMs) in two applications: (a) detection of Alu sequences, and (b) classification of sequences encoding for HIV-1 envelope glycoprotein (env) into subtypes A-G. It was demonstrated that SOMs clustered sequences belonging to different classes into distinct regions. For independent test sets, very high rates of correct predictions were obtained (97% in the first application, 91% in the second). Possible areas of application of SEQREP codes include functional genomics, phylogenetic analysis, detection of repetitions, database retrieval, and automatic alignment. Software for representing sequences by SEQREP code, and for training Kohonen SOMs is made freely available from http://www.dq.fct.unl.pt/qoa/jas/seqrep. Supplementary material is available at http://www.dq.fct.unl.pt/qoa/jas/seqrep/bioinf2002
Doclet To Synthesize UML

NASA Technical Reports Server (NTRS)

Barry, Matthew R.; Osborne, Richard N.

2005-01-01

The RoseDoclet computer program extends the capability of Java doclet software to automatically synthesize Unified Modeling Language (UML) content from Java language source code. [Doclets are Java-language programs that use the doclet application programming interface (API) to specify the content and format of the output of Javadoc. Javadoc is a program, originally designed to generate API documentation from Java source code, now also useful as an extensible engine for processing Java source code.] RoseDoclet takes advantage of Javadoc comments and tags already in the source code to produce a UML model of that code. RoseDoclet applies the doclet API to create a doclet passed to Javadoc. The Javadoc engine applies the doclet to the source code, emitting the output format specified by the doclet. RoseDoclet emits a Rose model file and populates it with fully documented packages, classes, methods, variables, and class diagrams identified in the source code. The way in which UML models are generated can be controlled by use of new Javadoc comment tags that RoseDoclet provides. The advantage of using RoseDoclet is that Javadoc documentation becomes leveraged for two purposes: documenting the as-built API and keeping the design documentation up to date.
shiftNMFk 1.1: Robust Nonnegative matrix factorization with kmeans clustering and signal shift, for allocation of unknown physical sources, toy version for open sourcing with publications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Alexandrov, Boian S.; Lliev, Filip L.; Stanev, Valentin G.

This code is a toy (short) version of CODE-2016-83. From a general perspective, the code represents an unsupervised adaptive machine learning algorithm that allows efficient and high performance de-mixing and feature extraction of a multitude of non-negative signals mixed and recorded by a network of uncorrelated sensor arrays. The code identifies the number of the mixed original signals and their locations. Further, the code also allows deciphering of signals that have been delayed in regards to the mixing process in each sensor. This code is high customizable and it can be efficiently used for a fast macro-analyses of data. Themore » code is applicable to a plethora of distinct problems: chemical decomposition, pressure transient decomposition, unknown sources/signal allocation, EM signal decomposition. An additional procedure for allocation of the unknown sources is incorporated in the code.« less
Joint Source-Channel Decoding of Variable-Length Codes with Soft Information: A Survey

NASA Astrophysics Data System (ADS)

Guillemot, Christine; Siohan, Pierre

2005-12-01

Multimedia transmission over time-varying wireless channels presents a number of challenges beyond existing capabilities conceived so far for third-generation networks. Efficient quality-of-service (QoS) provisioning for multimedia on these channels may in particular require a loosening and a rethinking of the layer separation principle. In that context, joint source-channel decoding (JSCD) strategies have gained attention as viable alternatives to separate decoding of source and channel codes. A statistical framework based on hidden Markov models (HMM) capturing dependencies between the source and channel coding components sets the foundation for optimal design of techniques of joint decoding of source and channel codes. The problem has been largely addressed in the research community, by considering both fixed-length codes (FLC) and variable-length source codes (VLC) widely used in compression standards. Joint source-channel decoding of VLC raises specific difficulties due to the fact that the segmentation of the received bitstream into source symbols is random. This paper makes a survey of recent theoretical and practical advances in the area of JSCD with soft information of VLC-encoded sources. It first describes the main paths followed for designing efficient estimators for VLC-encoded sources, the key component of the JSCD iterative structure. It then presents the main issues involved in the application of the turbo principle to JSCD of VLC-encoded sources as well as the main approaches to source-controlled channel decoding. This survey terminates by performance illustrations with real image and video decoding systems.
A Repository of Codes of Ethics and Technical Standards in Health Informatics

PubMed Central

Zaïane, Osmar R.

2014-01-01

We present a searchable repository of codes of ethics and standards in health informatics. It is built using state-of-the-art search algorithms and technologies. The repository will be potentially beneficial for public health practitioners, researchers, and software developers in finding and comparing ethics topics of interest. Public health clinics, clinicians, and researchers can use the repository platform as a one-stop reference for various ethics codes and standards. In addition, the repository interface is built for easy navigation, fast search, and side-by-side comparative reading of documents. Our selection criteria for codes and standards are two-fold; firstly, to maintain intellectual property rights, we index only codes and standards freely available on the internet. Secondly, major international, regional, and national health informatics bodies across the globe are surveyed with the aim of understanding the landscape in this domain. We also look at prevalent technical standards in health informatics from major bodies such as the International Standards Organization (ISO) and the U. S. Food and Drug Administration (FDA). Our repository contains codes of ethics from the International Medical Informatics Association (IMIA), the iHealth Coalition (iHC), the American Health Information Management Association (AHIMA), the Australasian College of Health Informatics (ACHI), the British Computer Society (BCS), and the UK Council for Health Informatics Professions (UKCHIP), with room for adding more in the future. Our major contribution is enhancing the findability of codes and standards related to health informatics ethics by compilation and unified access through the health informatics ethics repository. PMID:25422725
Open-Source Development of the Petascale Reactive Flow and Transport Code PFLOTRAN

NASA Astrophysics Data System (ADS)

Hammond, G. E.; Andre, B.; Bisht, G.; Johnson, T.; Karra, S.; Lichtner, P. C.; Mills, R. T.

2013-12-01

Open-source software development has become increasingly popular in recent years. Open-source encourages collaborative and transparent software development and promotes unlimited free redistribution of source code to the public. Open-source development is good for science as it reveals implementation details that are critical to scientific reproducibility, but generally excluded from journal publications. In addition, research funds that would have been spent on licensing fees can be redirected to code development that benefits more scientists. In 2006, the developers of PFLOTRAN open-sourced their code under the U.S. Department of Energy SciDAC-II program. Since that time, the code has gained popularity among code developers and users from around the world seeking to employ PFLOTRAN to simulate thermal, hydraulic, mechanical and biogeochemical processes in the Earth's surface/subsurface environment. PFLOTRAN is a massively-parallel subsurface reactive multiphase flow and transport simulator designed from the ground up to run efficiently on computing platforms ranging from the laptop to leadership-class supercomputers, all from a single code base. The code employs domain decomposition for parallelism and is founded upon the well-established and open-source parallel PETSc and HDF5 frameworks. PFLOTRAN leverages modern Fortran (i.e. Fortran 2003-2008) in its extensible object-oriented design. The use of this progressive, yet domain-friendly programming language has greatly facilitated collaboration in the code's software development. Over the past year, PFLOTRAN's top-level data structures were refactored as Fortran classes (i.e. extendible derived types) to improve the flexibility of the code, ease the addition of new process models, and enable coupling to external simulators. For instance, PFLOTRAN has been coupled to the parallel electrical resistivity tomography code E4D to enable hydrogeophysical inversion while the same code base can be used as a third-party library to provide hydrologic flow, energy transport, and biogeochemical capability to the community land model, CLM, part of the open-source community earth system model (CESM) for climate. In this presentation, the advantages and disadvantages of open source software development in support of geoscience research at government laboratories, universities, and the private sector are discussed. Since the code is open-source (i.e. it's transparent and readily available to competitors), the PFLOTRAN team's development strategy within a competitive research environment is presented. Finally, the developers discuss their approach to object-oriented programming and the leveraging of modern Fortran in support of collaborative geoscience research as the Fortran standard evolves among compiler vendors.
Multidimensional incremental parsing for universal source coding.

PubMed

Bae, Soo Hyun; Juang, Biing-Hwang

2008-10-01

A multidimensional incremental parsing algorithm (MDIP) for multidimensional discrete sources, as a generalization of the Lempel-Ziv coding algorithm, is investigated. It consists of three essential component schemes, maximum decimation matching, hierarchical structure of multidimensional source coding, and dictionary augmentation. As a counterpart of the longest match search in the Lempel-Ziv algorithm, two classes of maximum decimation matching are studied. Also, an underlying behavior of the dictionary augmentation scheme for estimating the source statistics is examined. For an m-dimensional source, m augmentative patches are appended into the dictionary at each coding epoch, thus requiring the transmission of a substantial amount of information to the decoder. The property of the hierarchical structure of the source coding algorithm resolves this issue by successively incorporating lower dimensional coding procedures in the scheme. In regard to universal lossy source coders, we propose two distortion functions, the local average distortion and the local minimax distortion with a set of threshold levels for each source symbol. For performance evaluation, we implemented three image compression algorithms based upon the MDIP; one is lossless and the others are lossy. The lossless image compression algorithm does not perform better than the Lempel-Ziv-Welch coding, but experimentally shows efficiency in capturing the source structure. The two lossy image compression algorithms are implemented using the two distortion functions, respectively. The algorithm based on the local average distortion is efficient at minimizing the signal distortion, but the images by the one with the local minimax distortion have a good perceptual fidelity among other compression algorithms. Our insights inspire future research on feature extraction of multidimensional discrete sources.
An Efficient Variable Length Coding Scheme for an IID Source

NASA Technical Reports Server (NTRS)

Cheung, K. -M.

1995-01-01

A scheme is examined for using two alternating Huffman codes to encode a discrete independent and identically distributed source with a dominant symbol. This combined strategy, or alternating runlength Huffman (ARH) coding, was found to be more efficient than ordinary coding in certain circumstances.
Egocentric and allocentric representations in auditory cortex

PubMed Central

Brimijoin, W. Owen; Bizley, Jennifer K.

2017-01-01

A key function of the brain is to provide a stable representation of an object’s location in the world. In hearing, sound azimuth and elevation are encoded by neurons throughout the auditory system, and auditory cortex is necessary for sound localization. However, the coordinate frame in which neurons represent sound space remains undefined: classical spatial receptive fields in head-fixed subjects can be explained either by sensitivity to sound source location relative to the head (egocentric) or relative to the world (allocentric encoding). This coordinate frame ambiguity can be resolved by studying freely moving subjects; here we recorded spatial receptive fields in the auditory cortex of freely moving ferrets. We found that most spatially tuned neurons represented sound source location relative to the head across changes in head position and direction. In addition, we also recorded a small number of neurons in which sound location was represented in a world-centered coordinate frame. We used measurements of spatial tuning across changes in head position and direction to explore the influence of sound source distance and speed of head movement on auditory cortical activity and spatial tuning. Modulation depth of spatial tuning increased with distance for egocentric but not allocentric units, whereas, for both populations, modulation was stronger at faster movement speeds. Our findings suggest that early auditory cortex primarily represents sound source location relative to ourselves but that a minority of cells can represent sound location in the world independent of our own position. PMID:28617796
Source Code Plagiarism--A Student Perspective

ERIC Educational Resources Information Center

Joy, M.; Cosma, G.; Yau, J. Y.-K.; Sinclair, J.

2011-01-01

This paper considers the problem of source code plagiarism by students within the computing disciplines and reports the results of a survey of students in Computing departments in 18 institutions in the U.K. This survey was designed to investigate how well students understand the concept of source code plagiarism and to discover what, if any,…
Recent advances in coding theory for near error-free communications

NASA Technical Reports Server (NTRS)

Cheung, K.-M.; Deutsch, L. J.; Dolinar, S. J.; Mceliece, R. J.; Pollara, F.; Shahshahani, M.; Swanson, L.

1991-01-01

Channel and source coding theories are discussed. The following subject areas are covered: large constraint length convolutional codes (the Galileo code); decoder design (the big Viterbi decoder); Voyager's and Galileo's data compression scheme; current research in data compression for images; neural networks for soft decoding; neural networks for source decoding; finite-state codes; and fractals for data compression.
Passive Sampling Provides Evidence for Neward Bay as a Source of Polychlorinated Dibenzo-p-Dioxins and Furans to the New York/New Jersey, USA, Atmosphere

EPA Science Inventory

Freely dissolved and gas phase polychlorinated dibenzo-p-dioxins (PCDDs) and polychlorinated dibenzofurans (PCDFs) were measured in the water column and atmosphere at five locations within Newark Bay (New Jersey, USA) from May 2008 to August 2009 with polyethylene (PE) passive ...
Opportunity is Knocking: Will Education Open the Door? Carnegie Perspectives

ERIC Educational Resources Information Center

Iiyoshi, Toru

2006-01-01

This essay presents a discussion of how the tools and resources of open source education may demonstrably improve education quality. The main tenet of open education is to make educational assets freely available to the public. This is becoming easier and less expensive as network and multimedia technology evolves. Obstacles may stand in the way…
Hybrid concatenated codes and iterative decoding

NASA Technical Reports Server (NTRS)

Divsalar, Dariush (Inventor); Pollara, Fabrizio (Inventor)

2000-01-01

Several improved turbo code apparatuses and methods. The invention encompasses several classes: (1) A data source is applied to two or more encoders with an interleaver between the source and each of the second and subsequent encoders. Each encoder outputs a code element which may be transmitted or stored. A parallel decoder provides the ability to decode the code elements to derive the original source information d without use of a received data signal corresponding to d. The output may be coupled to a multilevel trellis-coded modulator (TCM). (2) A data source d is applied to two or more encoders with an interleaver between the source and each of the second and subsequent encoders. Each of the encoders outputs a code element. In addition, the original data source d is output from the encoder. All of the output elements are coupled to a TCM. (3) At least two data sources are applied to two or more encoders with an interleaver between each source and each of the second and subsequent encoders. The output may be coupled to a TCM. (4) At least two data sources are applied to two or more encoders with at least two interleavers between each source and each of the second and subsequent encoders. (5) At least one data source is applied to one or more serially linked encoders through at least one interleaver. The output may be coupled to a TCM. The invention includes a novel way of terminating a turbo coder.
Performing aggressive code optimization with an ability to rollback changes made by the aggressive optimizations

DOEpatents

Gschwind, Michael K

2013-07-23

Mechanisms for aggressively optimizing computer code are provided. With these mechanisms, a compiler determines an optimization to apply to a portion of source code and determines if the optimization as applied to the portion of source code will result in unsafe optimized code that introduces a new source of exceptions being generated by the optimized code. In response to a determination that the optimization is an unsafe optimization, the compiler generates an aggressively compiled code version, in which the unsafe optimization is applied, and a conservatively compiled code version in which the unsafe optimization is not applied. The compiler stores both versions and provides them for execution. Mechanisms are provided for switching between these versions during execution in the event of a failure of the aggressively compiled code version. Moreover, predictive mechanisms are provided for predicting whether such a failure is likely.
A neutron spectrum unfolding computer code based on artificial neural networks

NASA Astrophysics Data System (ADS)

Ortiz-Rodríguez, J. M.; Reyes Alfaro, A.; Reyes Haro, A.; Cervantes Viramontes, J. M.; Vega-Carrillo, H. R.

2014-02-01

The Bonner Spheres Spectrometer consists of a thermal neutron sensor placed at the center of a number of moderating polyethylene spheres of different diameters. From the measured readings, information can be derived about the spectrum of the neutron field where measurements were made. Disadvantages of the Bonner system are the weight associated with each sphere and the need to sequentially irradiate the spheres, requiring long exposure periods. Provided a well-established response matrix and adequate irradiation conditions, the most delicate part of neutron spectrometry, is the unfolding process. The derivation of the spectral information is not simple because the unknown is not given directly as a result of the measurements. The drawbacks associated with traditional unfolding procedures have motivated the need of complementary approaches. Novel methods based on Artificial Intelligence, mainly Artificial Neural Networks, have been widely investigated. In this work, a neutron spectrum unfolding code based on neural nets technology is presented. This code is called Neutron Spectrometry and Dosimetry with Artificial Neural networks unfolding code that was designed in a graphical interface. The core of the code is an embedded neural network architecture previously optimized using the robust design of artificial neural networks methodology. The main features of the code are: easy to use, friendly and intuitive to the user. This code was designed for a Bonner Sphere System based on a 6LiI(Eu) neutron detector and a response matrix expressed in 60 energy bins taken from an International Atomic Energy Agency compilation. The main feature of the code is that as entrance data, for unfolding the neutron spectrum, only seven rate counts measured with seven Bonner spheres are required; simultaneously the code calculates 15 dosimetric quantities as well as the total flux for radiation protection purposes. This code generates a full report with all information of the unfolding in the HTML format. NSDann unfolding code is freely available, upon request to the authors.
SU-E-T-212: Comparison of TG-43 Dosimetric Parameters of Low and High Energy Brachytherapy Sources Obtained by MCNP Code Versions of 4C, X and 5

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zehtabian, M; Zaker, N; Sina, S

2015-06-15

Purpose: Different versions of MCNP code are widely used for dosimetry purposes. The purpose of this study is to compare different versions of the MCNP codes in dosimetric evaluation of different brachytherapy sources. Methods: The TG-43 parameters such as dose rate constant, radial dose function, and anisotropy function of different brachytherapy sources, i.e. Pd-103, I-125, Ir-192, and Cs-137 were calculated in water phantom. The results obtained by three versions of Monte Carlo codes (MCNP4C, MCNPX, MCNP5) were compared for low and high energy brachytherapy sources. Then the cross section library of MCNP4C code was changed to ENDF/B-VI release 8 whichmore » is used in MCNP5 and MCNPX codes. Finally, the TG-43 parameters obtained using the MCNP4C-revised code, were compared with other codes. Results: The results of these investigations indicate that for high energy sources, the differences in TG-43 parameters between the codes are less than 1% for Ir-192 and less than 0.5% for Cs-137. However for low energy sources like I-125 and Pd-103, large discrepancies are observed in the g(r) values obtained by MCNP4C and the two other codes. The differences between g(r) values calculated using MCNP4C and MCNP5 at the distance of 6cm were found to be about 17% and 28% for I-125 and Pd-103 respectively. The results obtained with MCNP4C-revised and MCNPX were similar. However, the maximum difference between the results obtained with the MCNP5 and MCNP4C-revised codes was 2% at 6cm. Conclusion: The results indicate that using MCNP4C code for dosimetry of low energy brachytherapy sources can cause large errors in the results. Therefore it is recommended not to use this code for low energy sources, unless its cross section library is changed. Since the results obtained with MCNP4C-revised and MCNPX were similar, it is concluded that the difference between MCNP4C and MCNPX is their cross section libraries.« less

Pydpiper: a flexible toolkit for constructing novel registration pipelines.

PubMed

Friedel, Miriam; van Eede, Matthijs C; Pipitone, Jon; Chakravarty, M Mallar; Lerch, Jason P

2014-01-01

Using neuroimaging technologies to elucidate the relationship between genotype and phenotype and brain and behavior will be a key contribution to biomedical research in the twenty-first century. Among the many methods for analyzing neuroimaging data, image registration deserves particular attention due to its wide range of applications. Finding strategies to register together many images and analyze the differences between them can be a challenge, particularly given that different experimental designs require different registration strategies. Moreover, writing software that can handle different types of image registration pipelines in a flexible, reusable and extensible way can be challenging. In response to this challenge, we have created Pydpiper, a neuroimaging registration toolkit written in Python. Pydpiper is an open-source, freely available software package that provides multiple modules for various image registration applications. Pydpiper offers five key innovations. Specifically: (1) a robust file handling class that allows access to outputs from all stages of registration at any point in the pipeline; (2) the ability of the framework to eliminate duplicate stages; (3) reusable, easy to subclass modules; (4) a development toolkit written for non-developers; (5) four complete applications that run complex image registration pipelines "out-of-the-box." In this paper, we will discuss both the general Pydpiper framework and the various ways in which component modules can be pieced together to easily create new registration pipelines. This will include a discussion of the core principles motivating code development and a comparison of Pydpiper with other available toolkits. We also provide a comprehensive, line-by-line example to orient users with limited programming knowledge and highlight some of the most useful features of Pydpiper. In addition, we will present the four current applications of the code.
Pydpiper: a flexible toolkit for constructing novel registration pipelines

PubMed Central

Friedel, Miriam; van Eede, Matthijs C.; Pipitone, Jon; Chakravarty, M. Mallar; Lerch, Jason P.

2014-01-01

Using neuroimaging technologies to elucidate the relationship between genotype and phenotype and brain and behavior will be a key contribution to biomedical research in the twenty-first century. Among the many methods for analyzing neuroimaging data, image registration deserves particular attention due to its wide range of applications. Finding strategies to register together many images and analyze the differences between them can be a challenge, particularly given that different experimental designs require different registration strategies. Moreover, writing software that can handle different types of image registration pipelines in a flexible, reusable and extensible way can be challenging. In response to this challenge, we have created Pydpiper, a neuroimaging registration toolkit written in Python. Pydpiper is an open-source, freely available software package that provides multiple modules for various image registration applications. Pydpiper offers five key innovations. Specifically: (1) a robust file handling class that allows access to outputs from all stages of registration at any point in the pipeline; (2) the ability of the framework to eliminate duplicate stages; (3) reusable, easy to subclass modules; (4) a development toolkit written for non-developers; (5) four complete applications that run complex image registration pipelines “out-of-the-box.” In this paper, we will discuss both the general Pydpiper framework and the various ways in which component modules can be pieced together to easily create new registration pipelines. This will include a discussion of the core principles motivating code development and a comparison of Pydpiper with other available toolkits. We also provide a comprehensive, line-by-line example to orient users with limited programming knowledge and highlight some of the most useful features of Pydpiper. In addition, we will present the four current applications of the code. PMID:25126069
SnowyOwl: accurate prediction of fungal genes by using RNA-Seq and homology information to select among ab initio models

PubMed Central

2014-01-01

Background Locating the protein-coding genes in novel genomes is essential to understanding and exploiting the genomic information but it is still difficult to accurately predict all the genes. The recent availability of detailed information about transcript structure from high-throughput sequencing of messenger RNA (RNA-Seq) delineates many expressed genes and promises increased accuracy in gene prediction. Computational gene predictors have been intensively developed for and tested in well-studied animal genomes. Hundreds of fungal genomes are now or will soon be sequenced. The differences of fungal genomes from animal genomes and the phylogenetic sparsity of well-studied fungi call for gene-prediction tools tailored to them. Results SnowyOwl is a new gene prediction pipeline that uses RNA-Seq data to train and provide hints for the generation of Hidden Markov Model (HMM)-based gene predictions and to evaluate the resulting models. The pipeline has been developed and streamlined by comparing its predictions to manually curated gene models in three fungal genomes and validated against the high-quality gene annotation of Neurospora crassa; SnowyOwl predicted N. crassa genes with 83% sensitivity and 65% specificity. SnowyOwl gains sensitivity by repeatedly running the HMM gene predictor Augustus with varied input parameters and selectivity by choosing the models with best homology to known proteins and best agreement with the RNA-Seq data. Conclusions SnowyOwl efficiently uses RNA-Seq data to produce accurate gene models in both well-studied and novel fungal genomes. The source code for the SnowyOwl pipeline (in Python) and a web interface (in PHP) is freely available from http://sourceforge.net/projects/snowyowl/. PMID:24980894
Exact Bayesian Inference for Phylogenetic Birth-Death Models.

PubMed

Parag, K V; Pybus, O G

2018-04-26

Inferring the rates of change of a population from a reconstructed phylogeny of genetic sequences is a central problem in macro-evolutionary biology, epidemiology, and many other disciplines. A popular solution involves estimating the parameters of a birth-death process (BDP), which links the shape of the phylogeny to its birth and death rates. Modern BDP estimators rely on random Markov chain Monte Carlo (MCMC) sampling to infer these rates. Such methods, while powerful and scalable, cannot be guaranteed to converge, leading to results that may be hard to replicate or difficult to validate. We present a conceptually and computationally different parametric BDP inference approach using flexible and easy to implement Snyder filter (SF) algorithms. This method is deterministic so its results are provable, guaranteed, and reproducible. We validate the SF on constant rate BDPs and find that it solves BDP likelihoods known to produce robust estimates. We then examine more complex BDPs with time-varying rates. Our estimates compare well with a recently developed parametric MCMC inference method. Lastly, we performmodel selection on an empirical Agamid species phylogeny, obtaining results consistent with the literature. The SF makes no approximations, beyond those required for parameter quantisation and numerical integration, and directly computes the posterior distribution of model parameters. It is a promising alternative inference algorithm that may serve either as a standalone Bayesian estimator or as a useful diagnostic reference for validating more involved MCMC strategies. The Snyder filter is implemented in Matlab and the time-varying BDP models are simulated in R. The source code and data are freely available at https://github.com/kpzoo/snyder-birth-death-code. kris.parag@zoo.ox.ac.uk. Supplementary material is available at Bioinformatics online.
Spatio Temporal EEG Source Imaging with the Hierarchical Bayesian Elastic Net and Elitist Lasso Models

PubMed Central

Paz-Linares, Deirel; Vega-Hernández, Mayrim; Rojas-López, Pedro A.; Valdés-Hernández, Pedro A.; Martínez-Montes, Eduardo; Valdés-Sosa, Pedro A.

2017-01-01

The estimation of EEG generating sources constitutes an Inverse Problem (IP) in Neuroscience. This is an ill-posed problem due to the non-uniqueness of the solution and regularization or prior information is needed to undertake Electrophysiology Source Imaging. Structured Sparsity priors can be attained through combinations of (L1 norm-based) and (L2 norm-based) constraints such as the Elastic Net (ENET) and Elitist Lasso (ELASSO) models. The former model is used to find solutions with a small number of smooth nonzero patches, while the latter imposes different degrees of sparsity simultaneously along different dimensions of the spatio-temporal matrix solutions. Both models have been addressed within the penalized regression approach, where the regularization parameters are selected heuristically, leading usually to non-optimal and computationally expensive solutions. The existing Bayesian formulation of ENET allows hyperparameter learning, but using the computationally intensive Monte Carlo/Expectation Maximization methods, which makes impractical its application to the EEG IP. While the ELASSO have not been considered before into the Bayesian context. In this work, we attempt to solve the EEG IP using a Bayesian framework for ENET and ELASSO models. We propose a Structured Sparse Bayesian Learning algorithm based on combining the Empirical Bayes and the iterative coordinate descent procedures to estimate both the parameters and hyperparameters. Using realistic simulations and avoiding the inverse crime we illustrate that our methods are able to recover complicated source setups more accurately and with a more robust estimation of the hyperparameters and behavior under different sparsity scenarios than classical LORETA, ENET and LASSO Fusion solutions. We also solve the EEG IP using data from a visual attention experiment, finding more interpretable neurophysiological patterns with our methods. The Matlab codes used in this work, including Simulations, Methods, Quality Measures and Visualization Routines are freely available in a public website. PMID:29200994
Spatio Temporal EEG Source Imaging with the Hierarchical Bayesian Elastic Net and Elitist Lasso Models.

PubMed

Paz-Linares, Deirel; Vega-Hernández, Mayrim; Rojas-López, Pedro A; Valdés-Hernández, Pedro A; Martínez-Montes, Eduardo; Valdés-Sosa, Pedro A

2017-01-01

The estimation of EEG generating sources constitutes an Inverse Problem (IP) in Neuroscience. This is an ill-posed problem due to the non-uniqueness of the solution and regularization or prior information is needed to undertake Electrophysiology Source Imaging. Structured Sparsity priors can be attained through combinations of (L1 norm-based) and (L2 norm-based) constraints such as the Elastic Net (ENET) and Elitist Lasso (ELASSO) models. The former model is used to find solutions with a small number of smooth nonzero patches, while the latter imposes different degrees of sparsity simultaneously along different dimensions of the spatio-temporal matrix solutions. Both models have been addressed within the penalized regression approach, where the regularization parameters are selected heuristically, leading usually to non-optimal and computationally expensive solutions. The existing Bayesian formulation of ENET allows hyperparameter learning, but using the computationally intensive Monte Carlo/Expectation Maximization methods, which makes impractical its application to the EEG IP. While the ELASSO have not been considered before into the Bayesian context. In this work, we attempt to solve the EEG IP using a Bayesian framework for ENET and ELASSO models. We propose a Structured Sparse Bayesian Learning algorithm based on combining the Empirical Bayes and the iterative coordinate descent procedures to estimate both the parameters and hyperparameters. Using realistic simulations and avoiding the inverse crime we illustrate that our methods are able to recover complicated source setups more accurately and with a more robust estimation of the hyperparameters and behavior under different sparsity scenarios than classical LORETA, ENET and LASSO Fusion solutions. We also solve the EEG IP using data from a visual attention experiment, finding more interpretable neurophysiological patterns with our methods. The Matlab codes used in this work, including Simulations, Methods, Quality Measures and Visualization Routines are freely available in a public website.
Process Model Improvement for Source Code Plagiarism Detection in Student Programming Assignments

ERIC Educational Resources Information Center

Kermek, Dragutin; Novak, Matija

2016-01-01

In programming courses there are various ways in which students attempt to cheat. The most commonly used method is copying source code from other students and making minimal changes in it, like renaming variable names. Several tools like Sherlock, JPlag and Moss have been devised to detect source code plagiarism. However, for larger student…
Finding Resolution for the Responsible Transparency of Economic Models in Health and Medicine.

PubMed

Padula, William V; McQueen, Robert Brett; Pronovost, Peter J

2017-11-01

The Second Panel on Cost-Effectiveness in Health and Medicine recommendations for conduct, methodological practices, and reporting of cost-effectiveness analyses has a number of questions unanswered with respect to the implementation of transparent, open source code interface for economic models. The possibility of making economic model source code could be positive and progressive for the field; however, several unintended consequences of this system should be first considered before complete implementation of this model. First, there is the concern regarding intellectual property rights that modelers have to their analyses. Second, the open source code could make analyses more accessible to inexperienced modelers, leading to inaccurate or misinterpreted results. We propose several resolutions to these concerns. The field should establish a licensing system of open source code such that the model originators maintain control of the code use and grant permissions to other investigators who wish to use it. The field should also be more forthcoming towards the teaching of cost-effectiveness analysis in medical and health services education so that providers and other professionals are familiar with economic modeling and able to conduct analyses with open source code. These types of unintended consequences need to be fully considered before the field's preparedness to move forward into an era of model transparency with open source code.
OpenCFU, a new free and open-source software to count cell colonies and other circular objects.

PubMed

Geissmann, Quentin

2013-01-01

Counting circular objects such as cell colonies is an important source of information for biologists. Although this task is often time-consuming and subjective, it is still predominantly performed manually. The aim of the present work is to provide a new tool to enumerate circular objects from digital pictures and video streams. Here, I demonstrate that the created program, OpenCFU, is very robust, accurate and fast. In addition, it provides control over the processing parameters and is implemented in an intuitive and modern interface. OpenCFU is a cross-platform and open-source software freely available at http://opencfu.sourceforge.net.
Freeing Crop Genetics through the Open Source Seed Initiative

PubMed Central

Luby, Claire H.; Goldman, Irwin L.

2016-01-01

For millennia, seeds have been freely available to use for farming and plant breeding without restriction. Within the past century, however, intellectual property rights (IPRs) have threatened this tradition. In response, a movement has emerged to counter the trend toward increasing consolidation of control and ownership of plant germplasm. One effort, the Open Source Seed Initiative (OSSI, www.osseeds.org), aims to ensure access to crop genetic resources by embracing an open source mechanism that fosters exchange and innovation among farmers, plant breeders, and seed companies. Plant breeders across many sectors have taken the OSSI Pledge to create a protected commons of plant germplasm for future generations. PMID:27093567
Freeing Crop Genetics through the Open Source Seed Initiative.

PubMed

Luby, Claire H; Goldman, Irwin L

2016-04-01

For millennia, seeds have been freely available to use for farming and plant breeding without restriction. Within the past century, however, intellectual property rights (IPRs) have threatened this tradition. In response, a movement has emerged to counter the trend toward increasing consolidation of control and ownership of plant germplasm. One effort, the Open Source Seed Initiative (OSSI, www.osseeds.org), aims to ensure access to crop genetic resources by embracing an open source mechanism that fosters exchange and innovation among farmers, plant breeders, and seed companies. Plant breeders across many sectors have taken the OSSI Pledge to create a protected commons of plant germplasm for future generations.
Evaluation of an open source tool for indexing and searching enterprise radiology and pathology reports

NASA Astrophysics Data System (ADS)

Kim, Woojin; Boonn, William

2010-03-01

Data mining of existing radiology and pathology reports within an enterprise health system can be used for clinical decision support, research, education, as well as operational analyses. In our health system, the database of radiology and pathology reports exceeds 13 million entries combined. We are building a web-based tool to allow search and data analysis of these combined databases using freely available and open source tools. This presentation will compare performance of an open source full-text indexing tool to MySQL's full-text indexing and searching and describe implementation procedures to incorporate these capabilities into a radiology-pathology search engine.
40 CFR 51.50 - What definitions apply to this subpart?

Code of Federal Regulations, 2010 CFR

2010-07-01

... accuracy description (MAD) codes means a set of six codes used to define the accuracy of latitude/longitude data for point sources. The six codes and their definitions are: (1) Coordinate Data Source Code: The... physical piece of or a closely related set of equipment. The EPA's reporting format for a given inventory...
A suite of MATLAB-based computational tools for automated analysis of COPAS Biosort data

PubMed Central

Morton, Elizabeth; Lamitina, Todd

2010-01-01

Complex Object Parametric Analyzer and Sorter (COPAS) devices are large-object, fluorescence-capable flow cytometers used for high-throughput analysis of live model organisms, including Drosophila melanogaster, Caenorhabditis elegans, and zebrafish. The COPAS is especially useful in C. elegans high-throughput genome-wide RNA interference (RNAi) screens that utilize fluorescent reporters. However, analysis of data from such screens is relatively labor-intensive and time-consuming. Currently, there are no computational tools available to facilitate high-throughput analysis of COPAS data. We used MATLAB to develop algorithms (COPAquant, COPAmulti, and COPAcompare) to analyze different types of COPAS data. COPAquant reads single-sample files, filters and extracts values and value ratios for each file, and then returns a summary of the data. COPAmulti reads 96-well autosampling files generated with the ReFLX adapter, performs sample filtering, graphs features across both wells and plates, performs some common statistical measures for hit identification, and outputs results in graphical formats. COPAcompare performs a correlation analysis between replicate 96-well plates. For many parameters, thresholds may be defined through a simple graphical user interface (GUI), allowing our algorithms to meet a variety of screening applications. In a screen for regulators of stress-inducible GFP expression, COPAquant dramatically accelerated data analysis and allowed us to rapidly move from raw data to hit identification. Because the COPAS file structure is standardized and our MATLAB code is freely available, our algorithms should be extremely useful for analysis of COPAS data from multiple platforms and organisms. The MATLAB code is freely available at our web site (www.med.upenn.edu/lamitinalab/downloads.shtml). PMID:20569218
The Astrophysics Source Code Library by the numbers

NASA Astrophysics Data System (ADS)

Allen, Alice; Teuben, Peter; Berriman, G. Bruce; DuPrie, Kimberly; Mink, Jessica; Nemiroff, Robert; Ryan, PW; Schmidt, Judy; Shamir, Lior; Shortridge, Keith; Wallin, John; Warmels, Rein

2018-01-01

The Astrophysics Source Code Library (ASCL, ascl.net) was founded in 1999 by Robert Nemiroff and John Wallin. ASCL editors seek both new and old peer-reviewed papers that describe methods or experiments that involve the development or use of source code, and add entries for the found codes to the library. Software authors can submit their codes to the ASCL as well. This ensures a comprehensive listing covering a significant number of the astrophysics source codes used in peer-reviewed studies. The ASCL is indexed by both NASA’s Astrophysics Data System (ADS) and Web of Science, making software used in research more discoverable. This presentation covers the growth in the ASCL’s number of entries, the number of citations to its entries, and in which journals those citations appear. It also discusses what changes have been made to the ASCL recently, and what its plans are for the future.
Astrophysics Source Code Library: Incite to Cite!

NASA Astrophysics Data System (ADS)

DuPrie, K.; Allen, A.; Berriman, B.; Hanisch, R. J.; Mink, J.; Nemiroff, R. J.; Shamir, L.; Shortridge, K.; Taylor, M. B.; Teuben, P.; Wallen, J. F.

2014-05-01

The Astrophysics Source Code Library (ASCl,http://ascl.net/) is an on-line registry of over 700 source codes that are of interest to astrophysicists, with more being added regularly. The ASCL actively seeks out codes as well as accepting submissions from the code authors, and all entries are citable and indexed by ADS. All codes have been used to generate results published in or submitted to a refereed journal and are available either via a download site or from an identified source. In addition to being the largest directory of scientist-written astrophysics programs available, the ASCL is also an active participant in the reproducible research movement with presentations at various conferences, numerous blog posts and a journal article. This poster provides a description of the ASCL and the changes that we are starting to see in the astrophysics community as a result of the work we are doing.
Astrophysics Source Code Library

NASA Astrophysics Data System (ADS)

Allen, A.; DuPrie, K.; Berriman, B.; Hanisch, R. J.; Mink, J.; Teuben, P. J.

2013-10-01

The Astrophysics Source Code Library (ASCL), founded in 1999, is a free on-line registry for source codes of interest to astronomers and astrophysicists. The library is housed on the discussion forum for Astronomy Picture of the Day (APOD) and can be accessed at http://ascl.net. The ASCL has a comprehensive listing that covers a significant number of the astrophysics source codes used to generate results published in or submitted to refereed journals and continues to grow. The ASCL currently has entries for over 500 codes; its records are citable and are indexed by ADS. The editors of the ASCL and members of its Advisory Committee were on hand at a demonstration table in the ADASS poster room to present the ASCL, accept code submissions, show how the ASCL is starting to be used by the astrophysics community, and take questions on and suggestions for improving the resource.
Generating code adapted for interlinking legacy scalar code and extended vector code

DOEpatents

Gschwind, Michael K

2013-06-04

Mechanisms for intermixing code are provided. Source code is received for compilation using an extended Application Binary Interface (ABI) that extends a legacy ABI and uses a different register configuration than the legacy ABI. First compiled code is generated based on the source code, the first compiled code comprising code for accommodating the difference in register configurations used by the extended ABI and the legacy ABI. The first compiled code and second compiled code are intermixed to generate intermixed code, the second compiled code being compiled code that uses the legacy ABI. The intermixed code comprises at least one call instruction that is one of a call from the first compiled code to the second compiled code or a call from the second compiled code to the first compiled code. The code for accommodating the difference in register configurations is associated with the at least one call instruction.
Optimal power allocation and joint source-channel coding for wireless DS-CDMA visual sensor networks

NASA Astrophysics Data System (ADS)

Pandremmenou, Katerina; Kondi, Lisimachos P.; Parsopoulos, Konstantinos E.

2011-01-01

In this paper, we propose a scheme for the optimal allocation of power, source coding rate, and channel coding rate for each of the nodes of a wireless Direct Sequence Code Division Multiple Access (DS-CDMA) visual sensor network. The optimization is quality-driven, i.e. the received quality of the video that is transmitted by the nodes is optimized. The scheme takes into account the fact that the sensor nodes may be imaging scenes with varying levels of motion. Nodes that image low-motion scenes will require a lower source coding rate, so they will be able to allocate a greater portion of the total available bit rate to channel coding. Stronger channel coding will mean that such nodes will be able to transmit at lower power. This will both increase battery life and reduce interference to other nodes. Two optimization criteria are considered. One that minimizes the average video distortion of the nodes and one that minimizes the maximum distortion among the nodes. The transmission powers are allowed to take continuous values, whereas the source and channel coding rates can assume only discrete values. Thus, the resulting optimization problem lies in the field of mixed-integer optimization tasks and is solved using Particle Swarm Optimization. Our experimental results show the importance of considering the characteristics of the video sequences when determining the transmission power, source coding rate and channel coding rate for the nodes of the visual sensor network.
mmpdb: An Open-Source Matched Molecular Pair Platform for Large Multiproperty Data Sets.

PubMed

Dalke, Andrew; Hert, Jérôme; Kramer, Christian

2018-05-29

Matched molecular pair analysis (MMPA) enables the automated and systematic compilation of medicinal chemistry rules from compound/property data sets. Here we present mmpdb, an open-source matched molecular pair (MMP) platform to create, compile, store, retrieve, and use MMP rules. mmpdb is suitable for the large data sets typically found in pharmaceutical and agrochemical companies and provides new algorithms for fragment canonicalization and stereochemistry handling. The platform is written in Python and based on the RDKit toolkit. It is freely available from https://github.com/rdkit/mmpdb .

Some links on this page may take you to non-federal websites. Their policies may differ from this site.