NASA Astrophysics Data System (ADS)
Wang, Youwei; Zhang, Wenqing; Chen, Lidong; Shi, Siqi; Liu, Jianjun
2017-12-01
Li-ion batteries are a key technology for addressing the global challenges of clean renewable energy and environmental pollution. Their contemporary applications, in portable electronic devices, electric vehicles, and large-scale power grids, stimulate the development of high-performance battery materials with high energy density, high power, good safety, and long lifetime. High-throughput calculations provide a practical strategy for discovering new battery materials and optimizing the performance of known ones. Most cathode materials screened by previous high-throughput calculations cannot meet the requirements of practical applications because only the capacity, voltage, and volume change of the bulk were considered. It is important to include more structure-property relationships, such as point defects, surface and interface effects, doping and metal mixing, and nanosize effects, in high-throughput calculations. In this review, we establish a quantitative description of structure-property relationships in Li-ion battery materials in terms of intrinsic bulk parameters, which can be applied in future high-throughput calculations to screen Li-ion battery materials. Based on these parameterized structure-property relationships, a possible high-throughput computational screening flow path is proposed to obtain high-performance battery materials.
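As a minimal sketch (not taken from the review) of the first-pass screening quantity mentioned above, the average intercalation voltage can be derived from DFT total energies; all energies below are placeholder values.

```python
# Hypothetical sketch: average intercalation voltage of a cathode from
# DFT total energies, the standard first screening quantity in such workflows.
# All numbers below are placeholders, not results from the review.

def average_voltage(e_lithiated, e_delithiated, n_li, e_li_metal=-1.90):
    """Average voltage (V) vs. Li/Li+ between two lithiation limits.

    e_lithiated / e_delithiated: total energies (eV) of Li_n(Host) and Host.
    n_li: number of Li transferred; e_li_metal: placeholder energy per atom
    of bcc Li (eV).
    """
    # V = -[E(Li_n Host) - E(Host) - n*E(Li)] / (n*e); with energies in eV
    # per electron, the numerical value is already in volts.
    return -(e_lithiated - e_delithiated - n_li * e_li_metal) / n_li

print(average_voltage(-55.3, -47.8, 2))  # placeholder energies -> ~1.85 V
```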
Accelerating the design of solar thermal fuel materials through high throughput simulations.
Liu, Yun; Grossman, Jeffrey C
2014-12-10
Solar thermal fuels (STF) store the energy of sunlight, which can then be released later in the form of heat, offering an emission-free and renewable solution for both solar energy conversion and storage. However, this approach is currently limited by the lack of low-cost materials with high energy density and high stability. In this Letter, we present an ab initio high-throughput computational approach to accelerate the design process and allow for searches over a broad class of materials. The high-throughput screening platform we have developed can run through large numbers of molecules composed of earth-abundant elements and identifies possible metastable structures of a given material. Corresponding isomerization enthalpies associated with the metastable structures are then computed. Using this high-throughput simulation approach, we have discovered molecular structures with high isomerization enthalpies that have the potential to be new candidates for high-energy density STF. We have also discovered physical principles to guide further STF materials design through structural analysis. More broadly, our results illustrate the potential of using high-throughput ab initio simulations to design materials that undergo targeted structural transitions.
Nagasaki, Hideki; Mochizuki, Takako; Kodama, Yuichi; Saruhashi, Satoshi; Morizaki, Shota; Sugawara, Hideaki; Ohyanagi, Hajime; Kurata, Nori; Okubo, Kousaku; Takagi, Toshihisa; Kaminuma, Eli; Nakamura, Yasukazu
2013-08-01
High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets via decentralized processing on NIG supercomputers, currently free of charge. The pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly, and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. The pipeline will facilitate research by providing unified analytical workflows for NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/.
High-throughput determination of RNA structure by proximity ligation.
Ramani, Vijay; Qiu, Ruolan; Shendure, Jay
2015-09-01
We present an unbiased method to globally resolve RNA structures through pairwise contact measurements between interacting regions. RNA proximity ligation (RPL) uses proximity ligation of native RNA followed by deep sequencing to yield chimeric reads with ligation junctions in the vicinity of structurally proximate bases. We apply RPL in both baker's yeast (Saccharomyces cerevisiae) and human cells and generate contact probability maps for ribosomal and other abundant RNAs, including yeast snoRNAs, the RNA subunit of the signal recognition particle and the yeast U2 spliceosomal RNA homolog. RPL measurements correlate with established secondary structures for these RNA molecules, including stem-loop structures and long-range pseudoknots. We anticipate that RPL will complement the current repertoire of computational and experimental approaches in enabling the high-throughput determination of secondary and tertiary RNA structures.
Combinatorial and High Throughput Discovery of High Temperature Piezoelectric Ceramics
2011-10-10
…the known candidate piezoelectric ferroelectric perovskites. Unlike most computational studies on crystal chemistry, where the starting point is some form of electronic structure calculation, we use a data-driven approach to initiate our… experimental measurements reported in the literature. Given that our models are based solely on crystal and electronic structure data and did not…
High-Throughput Thermodynamic Modeling and Uncertainty Quantification for ICME
NASA Astrophysics Data System (ADS)
Otis, Richard A.; Liu, Zi-Kui
2017-05-01
One foundational component of integrated computational materials engineering (ICME) and the Materials Genome Initiative is computational thermodynamics based on the calculation of phase diagrams (CALPHAD) method. The CALPHAD method, pioneered by Kaufman, has enabled the development of thermodynamic, atomic mobility, and molar volume databases of individual phases in the full space of temperature, composition, and sometimes pressure for technologically important multicomponent engineering materials, along with sophisticated computational tools for using the databases. In this article, we present our recent efforts in developing new computational tools for high-throughput modeling and uncertainty quantification based on high-throughput first-principles calculations and the CALPHAD method, along with their potential propagation to downstream ICME modeling and simulations.
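For context, a standard CALPHAD description of a binary substitutional solution phase (the generic Redlich-Kister form, not an equation taken from this article) writes the molar Gibbs energy as:

```latex
G_m = \sum_i x_i \, {}^{0}G_i
    + RT \sum_i x_i \ln x_i
    + x_A x_B \sum_{v=0}^{n} {}^{v}L_{AB}\,(x_A - x_B)^{v}
```

Uncertainty quantification in this setting then amounts to propagating uncertainty in the fitted interaction parameters ${}^{v}L_{AB}$ through to computed phase boundaries and downstream ICME predictions.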
Yun, Kyungwon; Lee, Hyunjae; Bang, Hyunwoo; Jeon, Noo Li
2016-02-21
This study proposes a novel way to achieve high-throughput image acquisition based on a computer-recognizable micro-pattern implemented on a microfluidic device. We integrated the QR code, a two-dimensional barcode system, onto the microfluidic device to simplify imaging of multiple ROIs (regions of interest). A standard QR code pattern was modified into arrays of cylindrical structures of polydimethylsiloxane (PDMS). Utilizing recognition of the micro-pattern, the proposed system enables: (1) device identification, which allows referencing additional information about the device, such as device imaging sequences or the ROIs, and (2) composing a coordinate system for an arbitrarily located microfluidic device with respect to the stage. Based on these functionalities, the proposed method performs one-step high-throughput imaging for data acquisition in microfluidic devices without further manual exploration and localization of the desired ROIs. In our experience, the proposed method significantly reduced acquisition preparation time. We expect the method to substantially improve data acquisition and analysis for prototype devices.
Computational Methods in Drug Discovery
Sliwoski, Gregory; Kothiwale, Sandeepkumar; Meiler, Jens
2014-01-01
Computer-aided drug discovery/design methods have played a major role in the development of therapeutically important small molecules for over three decades. These methods are broadly classified as either structure-based or ligand-based. Structure-based methods are in principle analogous to high-throughput screening in that both target and ligand structure information is imperative. Structure-based approaches include ligand docking, pharmacophore, and ligand design methods. The article discusses the theory behind the most important methods and recent successful applications. Ligand-based methods use only ligand information, predicting activity based on a molecule's similarity/dissimilarity to previously known active ligands. We review widely used ligand-based methods such as ligand-based pharmacophores, molecular descriptors, and quantitative structure-activity relationships. In addition, important tools such as target/ligand databases, homology modeling, ligand fingerprint methods, etc., necessary for successful implementation of various computer-aided drug discovery/design methods in a drug discovery campaign, are discussed. Finally, computational methods for toxicity prediction and optimization for favorable physiologic properties are discussed with successful examples from the literature. PMID:24381236
A high-throughput approach to profile RNA structure.
Delli Ponti, Riccardo; Marti, Stefanie; Armaos, Alexandros; Tartaglia, Gian Gaetano
2017-03-17
Here we introduce the Computational Recognition of Secondary Structure (CROSS) method to calculate the structural profile of an RNA sequence (single- or double-stranded state) at single-nucleotide resolution and without sequence length restrictions. We trained CROSS using data from high-throughput experiments such as Selective 2′-Hydroxyl Acylation analyzed by Primer Extension (SHAPE; mouse and HIV transcriptomes) and Parallel Analysis of RNA Structure (PARS; human and yeast transcriptomes) as well as high-quality NMR/X-ray structures (PDB database). The algorithm uses primary structure information alone to predict experimental structural profiles with >80% accuracy, showing high performance on large RNAs such as Xist (17,900 nucleotides; area under the ROC curve (AUC) of 0.75 on dimethyl sulfate (DMS) experiments). We integrated CROSS into thermodynamics-based methods to predict secondary structure and observed an increase in their predictive power by up to 30%.
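As a hedged illustration of the evaluation quoted above (AUC against chemical probing data), any per-nucleotide structure predictor can be scored against binarized SHAPE/DMS reactivities; the scores and labels below are random placeholders, not CROSS output.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Sketch of the kind of AUC evaluation described in the abstract;
# predicted scores and structure labels are random placeholders.

rng = np.random.default_rng(0)
predicted = rng.random(500)                    # per-nucleotide structure scores
labels = (rng.random(500) > 0.5).astype(int)   # 1 = double-stranded (placeholder)

print("AUC:", roc_auc_score(labels, predicted))
```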
Masoudi-Nejad, Ali; Asgari, Yazdan
2015-02-01
The discovery of cancer cell metabolism, or the Warburg effect, goes back to 1924, when Otto Warburg first observed that, in contrast to normal cells, cancer cells have a different metabolism. With the advent of high-throughput technologies and computational systems biology, cancer cell metabolism has undergone a renaissance, and many attempts have been made to revisit the Warburg effect. The development of experimental and analytical tools that generate information-rich, high-throughput biological data could enable the application of computational models in biological discovery and clinical medicine, especially for cancer. The recent availability of tissue-specific reconstructed models opens up new opportunities for studying metabolic alterations in various kinds of cancer. Structural approaches at the genome scale seem suitable for developing diagnostic and prognostic molecular signatures, as well as for identifying new drug targets. In this review, we consider these recent advances in structure-based analysis of cancer viewed as a metabolic disease. Two different structural approaches are described here: topological and constraint-based methods. The ultimate goal of this type of systems analysis is not only the discovery of novel drug targets but also the development of new systems-based therapy strategies.
MrGrid: A Portable Grid Based Molecular Replacement Pipeline
Reboul, Cyril F.; Androulakis, Steve G.; Phan, Jennifer M. N.; Whisstock, James C.; Goscinski, Wojtek J.; Abramson, David; Buckle, Ashley M.
2010-01-01
Background The crystallographic determination of protein structures can be computationally demanding, and difficult cases can benefit from user-friendly interfaces to high-performance computing resources. Molecular replacement (MR) is a popular protein crystallographic technique that exploits the structural similarity between proteins that share some sequence similarity. But the need to trial permutations of search models, space-group symmetries and other parameters makes MR time- and labour-intensive. However, MR calculations are embarrassingly parallel and thus ideally suited to distributed computing. In order to address this problem we have developed MrGrid, web-based software that allows multiple MR calculations to be executed across a grid of networked computers, allowing high-throughput MR. Methodology/Principal Findings MrGrid is a portable web-based application written in Java/JSP and Ruby that takes advantage of Apple Xgrid technology. Designed to interface with a user-defined Xgrid resource, the package manages the distribution of multiple MR runs to the available nodes on the Xgrid. We evaluated MrGrid using 10 different protein test cases on a network of 13 computers, and achieved an average speed-up factor of 5.69. Conclusions MrGrid enables the user to retrieve and manage the results of tens to hundreds of MR calculations quickly and via a single web interface, as well as broadening the range of strategies that can be attempted. This high-throughput approach allows parameter sweeps to be performed in parallel, improving the chances of MR success. PMID:20386612
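Since the abstract notes that MR trials over search models and space groups are embarrassingly parallel, a minimal Python sketch of that pattern might look like the following; the scoring function is a stand-in, not MrGrid's actual Xgrid code.

```python
from itertools import product
from multiprocessing import Pool

# Sketch of why MR is "embarrassingly parallel": each (search model,
# space group) combination is an independent trial. run_mr is a stand-in
# for invoking a real MR program; its score is a placeholder.

def run_mr(args):
    model, space_group = args
    score = hash((model, space_group)) % 100   # placeholder for an MR score
    return model, space_group, score

if __name__ == "__main__":
    models = ["model_A.pdb", "model_B.pdb"]            # hypothetical files
    space_groups = ["P212121", "C2", "P21"]
    with Pool() as pool:
        results = pool.map(run_mr, product(models, space_groups))
    best = max(results, key=lambda r: r[2])
    print("best trial:", best)
```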
High quality chemical structure inventories provide the foundation of the U.S. EPA’s ToxCast and Tox21 projects, which are employing high-throughput technologies to screen thousands of chemicals in hundreds of biochemical and cell-based assays, probing a wide diversity of targets...
G-LoSA for Prediction of Protein-Ligand Binding Sites and Structures.
Lee, Hui Sun; Im, Wonpil
2017-01-01
Recent advances in high-throughput structure determination and computational protein structure prediction have significantly enriched the universe of protein structures. However, there is still a large gap between the number of available protein structures and the number of proteins with accurately annotated function. Computational structure-based protein function prediction has emerged to reduce this knowledge gap. The identification of a ligand binding site and its structure is critical to the determination of a protein's molecular function. We present a computational methodology for predicting small-molecule ligand binding sites and ligand structures using G-LoSA, our protein local structure alignment and similarity measurement tool. All the computational procedures described here can be easily implemented using G-LoSA Toolkit, a package of standalone software programs and preprocessed PDB structure libraries. G-LoSA and G-LoSA Toolkit are freely available to academic users at http://compbio.lehigh.edu/GLoSA . We also illustrate a case study to show the potential of our template-based approach harnessing G-LoSA for protein function prediction.
AELAS: Automatic ELAStic property derivations via high-throughput first-principles computation
NASA Astrophysics Data System (ADS)
Zhang, S. H.; Zhang, R. F.
2017-11-01
The elastic properties are fundamental and important for crystalline materials as they relate to other mechanical properties, various thermodynamic qualities as well as some critical physical properties. However, a complete set of experimentally determined elastic properties is only available for a small subset of known materials, and an automatic scheme for the derivation of elastic properties that is adapted to high-throughput computation is in high demand. In this paper, we present the AELAS code, an automated program for calculating second-order elastic constants of both two-dimensional and three-dimensional single-crystal materials with any symmetry, which is designed mainly for high-throughput first-principles computation. Other derivations of general elastic properties such as Young's, bulk and shear moduli as well as Poisson's ratio of polycrystal materials, Pugh ratio, Cauchy pressure, elastic anisotropy and elastic stability criterion, are also implemented in this code. The implementation of the code has been critically validated by extensive evaluations and tests on a broad class of materials, including two-dimensional and three-dimensional materials, demonstrating its efficiency and capability for high-throughput screening of specific materials with targeted mechanical properties. Program Files doi:http://dx.doi.org/10.17632/f8fwg4j9tw.1 Licensing provisions: BSD 3-Clause Programming language: Fortran Nature of problem: To automate the calculation of second-order elastic constants and the derivation of other elastic properties for two-dimensional and three-dimensional materials with any symmetry via high-throughput first-principles computation. Solution method: The space-group number is first determined by the SPGLIB code [1] and the structure is then redefined to a unit cell in IEEE format [2]. Secondly, based on the determined space-group number, a set of distortion modes is automatically specified and the distorted structure files are generated. Afterwards, the total energy for each distorted structure is calculated by a first-principles code, e.g. VASP [3]. Finally, the second-order elastic constants are determined from the quadratic coefficients of polynomial fits of the energy vs. strain relationships, and other elastic properties are derived accordingly. References [1] http://atztogo.github.io/spglib/. [2] A. Meitzler, H.F. Tiersten, A.W. Warner, D. Berlincourt, G.A. Couqin, F.S. Welsh III, IEEE standard on piezoelectricity, Society, 1988. [3] G. Kresse, J. Furthmüller, Phys. Rev. B 54 (1996) 11169.
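A minimal sketch of the final step in the stated solution method: fit the energy-strain curve with a quadratic and read an elastic constant off its curvature. The cell volume, energies, and the assumed 250 GPa constant are synthetic placeholders, not AELAS output.

```python
import numpy as np

EV_PER_A3_TO_GPA = 160.2176621  # 1 eV/Angstrom^3 expressed in GPa

# Sketch: for a strain mode where E(d) = E0 + (V0/2)*C*d^2, the elastic
# constant C follows from the quadratic coefficient of a polynomial fit.
# The energy data below are synthetic, generated from an assumed C.

v0 = 40.0                                    # unit-cell volume (A^3), placeholder
strains = np.linspace(-0.02, 0.02, 9)
c_true = 250.0 / EV_PER_A3_TO_GPA            # assumed constant, in eV/A^3
energies = -35.0 + 0.5 * c_true * v0 * strains**2

coeffs = np.polyfit(strains, energies, 2)    # quadratic fit: a*d^2 + b*d + c
c_fit = 2.0 * coeffs[0] / v0 * EV_PER_A3_TO_GPA
print(f"fitted elastic constant: {c_fit:.1f} GPa")   # recovers ~250 GPa
```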
Das, Abhiram; Schneider, Hannah; Burridge, James; Ascanio, Ana Karine Martinez; Wojciechowski, Tobias; Topp, Christopher N; Lynch, Jonathan P; Weitz, Joshua S; Bucksch, Alexander
2015-01-01
Plant root systems are key drivers of plant function and yield. They are also under-explored targets to meet global food and energy demands. Many new technologies have been developed to characterize crop root system architecture (CRSA). These technologies have the potential to accelerate progress in understanding the genetic control and environmental response of CRSA. Putting this potential into practice requires new methods and algorithms to analyze CRSA in digital images. Most prior approaches have focused solely on the estimation of root traits from images, yet no integrated platform exists that allows easy and intuitive access to trait extraction and analysis methods from images combined with storage solutions linked to metadata. Automated high-throughput phenotyping methods are increasingly used in laboratory-based efforts to link plant genotype with phenotype, whereas similar field-based studies remain predominantly manual and low-throughput. Here, we present an open-source phenomics platform, "DIRT", as a means to integrate scalable supercomputing architectures into field experiments and analysis pipelines. DIRT is an online platform that enables researchers to store images of plant roots, measure dicot and monocot root traits under field conditions, and share data and results within collaborative teams and the broader community. The DIRT platform seamlessly connects end-users with large-scale compute "commons", enabling the estimation and analysis of root phenotypes from field experiments of unprecedented size. DIRT is an automated high-throughput computing and collaboration platform for field-based crop root phenomics. The platform is accessible at http://www.dirt.iplantcollaborative.org/ and hosted on the iPlant cyber-infrastructure using high-throughput grid computing resources of the Texas Advanced Computing Center (TACC). DIRT is a high-volume central repository and high-throughput RSA trait computation platform for plant scientists working on crop roots. It enables scientists to store, manage and share crop root images with metadata and compute RSA traits from thousands of images in parallel. It makes high-throughput RSA trait computation available to the community with just a few button clicks. As such it enables plant scientists to spend more time on science rather than on technology. All stored and computed data are easily accessible to the public and the broader scientific community. We hope that easy data accessibility will attract new tool developers and spur creative data usage that may even be applied to other fields of science.
Theoretical Investigation of oxides for batteries and fuel cell applications
NASA Astrophysics Data System (ADS)
Ganesh, Panchapakesan; Lubimtsev, Andrew A.; Balachandran, Janakiraman
I will present theoretical studies of Li-ion and proton-conducting oxides using a combination of theory and computations that involve density functional theory based atomistic modeling, cluster-expansion studies, global optimization, high-throughput computations, and machine-learning-based investigation of ionic transport in oxide materials. In Li-ion intercalated oxides, we explain the experimentally observed 'intercalation pseudocapacitance' phenomenon (Nature Materials 12, 518-522 (2013)) and why Nb2O5 is special in showing this behavior when Li ions are intercalated (J. Mater. Chem. A, 2013, 1, 14951-14956) but not when Na ions are used. In addition, we explore Li-ion intercalation theoretically in the VO2 (B) phase, which is somewhat structurally similar to Nb2O5, and predict an interesting role of site trapping on the voltage and capacity of the material, validated by ongoing experiments. Computations on proton-conducting oxides explain why Y-doped BaZrO3, one of the fastest proton-conducting oxides, shows a decrease in conductivity above 20% Y-doping. Further, using high-throughput computations and machine-learning tools, we discover general principles to improve proton conductivity. Acknowledgements: LDRD at ORNL and CNMS at ORNL
Argueta, Edwin; Shaji, Jeena; Gopalan, Arun; Liao, Peilin; Snurr, Randall Q; Gómez-Gualdrón, Diego A
2018-01-09
Metal-organic frameworks (MOFs) are porous crystalline materials with attractive properties for gas separation and storage. Their remarkable tunability makes it possible to create millions of MOF variations but creates the need for fast material screening to identify promising structures. Computational high-throughput screening (HTS) is a possible solution, but its usefulness is tied to accurate predictions of MOF adsorption properties. Accurate adsorption simulations often require an accurate description of electrostatic interactions, which depend on the electronic charges of the MOF atoms. HTS-compatible methods to assign charges to MOF atoms need to accurately reproduce electrostatic potentials (ESPs) and be computationally affordable, but current methods present an unsatisfactory trade-off between computational cost and accuracy. We illustrate a method to assign charges to MOF atoms based on ab initio calculations on MOF molecular building blocks. A library of building blocks with built-in charges is thus created and used by an automated MOF construction code to create hundreds of MOFs with charges "inherited" from the constituent building blocks. The molecular building block-based (MBBB) charges are similar to REPEAT charges (charges that reproduce ESPs obtained from ab initio calculations on crystallographic unit cells of nanoporous crystals), and thus similar predictions of adsorption loadings, heats of adsorption, and Henry's constants are obtained with either method. The presented results indicate that the MBBB method to assign charges to MOF atoms is suitable for use in computational high-throughput screening of MOFs for applications that involve adsorption of molecules such as carbon dioxide.
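The building-block idea can be illustrated with a toy sketch: each atom of an assembled framework simply inherits the charge computed once for its building block. The block names and charge values below are invented placeholders, not published MBBB values.

```python
# Toy sketch of "inherited" charges: a library maps each building-block
# atom label to a precomputed charge; assembled frameworks just look up
# their atoms. All entries are hypothetical placeholders.

BLOCK_CHARGES = {
    "Zn4O_node":  {"Zn": 1.28, "O_central": -1.61, "O_carboxyl": -0.64},
    "BDC_linker": {"C_ring": -0.10, "C_carboxyl": 0.66, "H": 0.12},
}

def assign_charges(framework_atoms):
    """framework_atoms: list of (block_name, atom_label) for each atom."""
    return [BLOCK_CHARGES[block][label] for block, label in framework_atoms]

atoms = [("Zn4O_node", "Zn"), ("BDC_linker", "C_carboxyl")]
print(assign_charges(atoms))   # -> [1.28, 0.66]
```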
NASA Astrophysics Data System (ADS)
Gómez-Bombarelli, Rafael; Aguilera-Iparraguirre, Jorge; Hirzel, Timothy D.; Ha, Dong-Gwang; Einzinger, Markus; Wu, Tony; Baldo, Marc A.; Aspuru-Guzik, Alán
2016-09-01
Discovering new OLED emitters requires many experiments to synthesize candidates and test performance in devices. Large-scale computer simulation can greatly speed this search process, but the problem remains challenging enough that brute-force application of massive computing power is not enough to successfully identify novel structures. We report a successful high-throughput virtual screening study that leveraged a range of methods to optimize the search process. The generation of candidate structures was constrained to contain the combinatorial explosion. Simulations were tuned to the specific problem and calibrated with experimental results. Experimentalists and theorists actively collaborated such that experimental feedback was regularly utilized to update and shape the computational search. Supervised machine learning methods prioritized candidate structures prior to quantum chemistry simulation to avoid wasting compute on likely poor performers. With this combination of techniques, each multiplying the strength of the search, this effort managed to navigate an area of molecular space and identify hundreds of promising OLED candidate structures. An experimentally validated selection of this set shows emitters with external quantum efficiencies as high as 22%.
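A hedged sketch of the supervised prioritization step described above: train a regressor on molecules already scored by quantum chemistry, then rank the untested candidates so expensive simulations run on likely good emitters first. The descriptors, targets, and model choice are placeholders, not the study's actual features or model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Sketch: rank unsimulated candidates by a model trained on molecules
# that already have quantum-chemistry scores. All data are random
# placeholders standing in for molecular descriptors and figures of merit.

rng = np.random.default_rng(1)
X_done = rng.random((200, 32))   # descriptors of already-simulated molecules
y_done = rng.random(200)         # e.g. a computed emitter figure of merit
X_new = rng.random((1000, 32))   # candidates awaiting simulation

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_done, y_done)
priority = np.argsort(model.predict(X_new))[::-1]   # best predicted first
print("simulate first:", priority[:5])
```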
Large-scale high-throughput computer-aided discovery of advanced materials using cloud computing
NASA Astrophysics Data System (ADS)
Bazhirov, Timur; Mohammadi, Mohammad; Ding, Kevin; Barabash, Sergey
Recent advances in cloud computing have made it possible to access large-scale computational resources completely on-demand in a rapid and efficient manner. When combined with high-fidelity simulations, they serve as an alternative pathway to enable computational discovery and design of new materials through large-scale high-throughput screening. Here, we present a case study for a cloud platform implemented at Exabyte Inc. We perform calculations to screen lightweight ternary alloys for thermodynamic stability. Due to the lack of experimental data for most such systems, we rely on theoretical approaches based on first-principles pseudopotential density functional theory. We calculate the formation energies for a set of ternary compounds approximated by special quasirandom structures. During an example run we were able to scale to 10,656 CPUs within 7 minutes from the start and obtain results for 296 compounds within 38 hours. The results indicate that the ultimate formation enthalpy of ternary systems can be negative for some lightweight alloys, including Li and Mg compounds. We conclude that, compared to the traditional capital-intensive approach of investing in on-premises hardware resources, cloud computing is agile and cost-effective, yet scalable and delivers similar performance.
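The screening quantity here is the formation enthalpy relative to the elemental references; a small sketch with placeholder energies (not values from the study) follows.

```python
# Sketch: formation enthalpy per atom of a ternary compound from DFT total
# energies. The element set, energies, and counts are placeholders.

def formation_enthalpy_per_atom(e_compound, n_atoms, elem_energies, counts):
    """e_compound: total energy (eV) of a compound cell with n_atoms atoms.
    elem_energies: eV/atom of each elemental reference; counts: atoms/cell."""
    e_ref = sum(elem_energies[el] * n for el, n in counts.items())
    return (e_compound - e_ref) / n_atoms

elems = {"Li": -1.90, "Mg": -1.51, "Al": -3.74}          # placeholder eV/atom
dh = formation_enthalpy_per_atom(-58.1, 16, elems, {"Li": 4, "Mg": 4, "Al": 8})
print(f"{dh:.3f} eV/atom (negative = stable against the elements)")
```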
Condor-COPASI: high-throughput computing for biochemical networks
2012-01-01
Background Mathematical modelling has become a standard technique to improve our understanding of complex biological systems. As models become larger and more complex, simulations and analyses require increasing amounts of computational power. Clusters of computers in a high-throughput computing environment can help to provide the resources required for computationally expensive model analysis. However, exploiting such a system can be difficult for users without the necessary expertise. Results We present Condor-COPASI, a server-based software tool that integrates COPASI, a biological pathway simulation tool, with Condor, a high-throughput computing environment. Condor-COPASI provides a web-based interface, which makes it extremely easy for a user to run a number of model simulation and analysis tasks in parallel. Tasks are transparently split into smaller parts, and submitted for execution on a Condor pool. Result output is presented to the user in a number of formats, including tables and interactive graphical displays. Conclusions Condor-COPASI can effectively use a Condor high-throughput computing environment to provide significant gains in performance for a number of model simulation and analysis tasks. Condor-COPASI is free, open source software, released under the Artistic License 2.0, and is suitable for use by any institution with access to a Condor pool. Source code is freely available for download at http://code.google.com/p/condor-copasi/, along with full instructions on deployment and usage. PMID:22834945
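A minimal sketch of the task-splitting idea described above (not Condor-COPASI's actual code): a many-repeat stochastic simulation is divided into independent chunks, one per Condor job, and the partial results are merged afterwards.

```python
# Sketch: split a large stochastic-simulation task into independent work
# units sized for individual Condor jobs. Totals and chunk size are
# illustrative placeholders.

def split_repeats(total_repeats, chunk_size):
    """Yield (start, n) work units covering total_repeats simulations."""
    start = 0
    while start < total_repeats:
        n = min(chunk_size, total_repeats - start)
        yield start, n
        start += n

jobs = list(split_repeats(total_repeats=10_000, chunk_size=250))
print(len(jobs), "jobs, e.g.", jobs[0])   # 40 jobs, e.g. (0, 250)
```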
Hattrick-Simpers, Jason R.; Gregoire, John M.; Kusne, A. Gilad
2016-05-26
With their ability to rapidly elucidate composition-structure-property relationships, high-throughput experimental studies have revolutionized how materials are discovered, optimized, and commercialized. It is now possible to synthesize and characterize high-throughput libraries that systematically address thousands of individual cuts of fabrication parameter space. An unresolved issue remains transforming structural characterization data into phase mappings. This difficulty is related to the complex information present in diffraction and spectroscopic data and its variation with composition and processing. Here, we review the field of automated phase diagram attribution and discuss the impact that emerging computational approaches will have in the generation of phase diagrams and beyond.
NASA Astrophysics Data System (ADS)
Buongiorno Nardelli, Marco
High-Throughput Quantum-Mechanics computation of materials properties by ab initio methods has become the foundation of an effective approach to materials design, discovery and characterization. This data-driven approach to materials science currently presents the most promising path to the development of advanced technological materials that could solve or mitigate important social and economic challenges of the 21st century. In particular, the rapid proliferation of computational data on materials properties presents the possibility to complement and extend materials property databases where experimental data are lacking and difficult to obtain. Enhanced repositories such as AFLOWLIB open novel opportunities for structure discovery and optimization, including the uncovering of unsuspected compounds, metastable structures and correlations between various properties. The practical realization of these opportunities depends almost exclusively on the design of efficient algorithms for electronic structure simulations of realistic material systems beyond the limitations of the current standard theories. In this talk, I will review recent progress in theoretical and computational tools and, in particular, discuss the development and validation of novel functionals within Density Functional Theory and of local basis representations for effective ab initio tight-binding schemes. Marco Buongiorno Nardelli is a pioneer in the development of computational platforms for theory/data/applications integration, rooted in his profound and extensive expertise in the design of electronic structure codes and in his vision for sustainable and innovative software development for high-performance materials simulations. His research activities range from the design and discovery of novel materials for 21st-century applications in renewable energy, environment, nano-electronics and devices, and the development of advanced electronic structure theories and high-throughput techniques in materials genomics and computational materials design, to an active role as a community scientific software developer (QUANTUM ESPRESSO, WanT, AFLOWpi).
New Toxico-Cheminformatics & Computational Toxicology ...
EPA's National Center for Computational Toxicology is building capabilities to support a new paradigm for toxicity screening and prediction. The DSSTox project is improving public access to quality structure-annotated chemical toxicity information in less summarized forms than traditionally employed in SAR modeling, and in ways that facilitate data mining and data read-across. The DSSTox Structure-Browser provides structure searchability across the entire published DSSTox toxicity-related inventory, and is enabling linkages between previously isolated toxicity data resources. As of early March 2008, the public DSSTox inventory has been integrated into PubChem, allowing a user to take full advantage of PubChem structure-activity and bioassay clustering features. The most recent DSSTox version of the Carcinogenic Potency Database file (CPDBAS) illustrates ways in which various summary definitions of carcinogenic activity can be employed in modeling and data mining. Phase I of the ToxCast™ project is generating high-throughput screening data from several hundred biochemical and cell-based assays for a set of 320 chemicals, mostly pesticide actives, with rich toxicology profiles. Incorporating and expanding traditional SAR concepts into this new high-throughput and data-rich world pose conceptual and practical challenges, but also hold great promise for improving predictive capabilities.
Efficient and accurate adverse outcome pathway (AOP) based high-throughput screening (HTS) methods use a systems biology based approach to computationally model in vitro cellular and molecular data for rapid chemical prioritization; however, not all HTS assays are grounded by rel...
NASA Astrophysics Data System (ADS)
Zhang, Ruizhi; Du, Baoli; Chen, Kan; Reece, Mike; Materials Research Institute Team
With increasing computational power and reliable databases, high-throughput screening is playing a more and more important role in the search for new thermoelectric materials. Rather than the well-established density functional theory (DFT) based methods, we propose an alternative approach to screen for new TE materials: using crystal-structure features as 'descriptors'. We show that a non-distorted transition-metal sulphide polyhedral network can be a good descriptor for high power factor according to crystal field theory. Using Cu/S-containing compounds as an example, 1600+ Cu/S-containing entries in the Inorganic Crystal Structure Database (ICSD) were screened, and of those 84 phases were identified as promising thermoelectric materials. The screening results are validated by both electronic structure calculations and experimental results from the literature. We also fabricated some new compounds to test our screening results. Another advantage of using crystal-structure features as descriptors is that we can easily establish structural relationships between the identified phases. Based on this, two material design approaches are discussed: (1) high-pressure synthesis of metastable phases; (2) in-situ two-phase composites with coherent interfaces. This work was supported by a Marie Curie International Incoming Fellowship of the European Community Human Potential Program.
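As one concrete example of a crystal-structure descriptor in this spirit (our illustration, not necessarily the authors' exact metric), Baur's bond-length distortion index quantifies how distorted a metal-sulphur polyhedron is; the bond lengths and screening threshold below are invented placeholders.

```python
import numpy as np

# Sketch: Baur's bond-length distortion index, a simple stand-in for the
# "non-distorted polyhedral network" feature. Bond lengths are placeholders.

def distortion_index(bond_lengths):
    """Mean absolute deviation of bond lengths, relative to the mean length."""
    d = np.asarray(bond_lengths, dtype=float)
    return float(np.mean(np.abs(d - d.mean())) / d.mean())

cu_s_bonds = [2.28, 2.31, 2.30, 2.33]            # placeholder Cu-S lengths (A)
print("distortion index:", round(distortion_index(cu_s_bonds), 4))
# hypothetical screening rule: keep phases with index below some cutoff
```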
High-throughput determination of structural phase diagram and constituent phases using GRENDEL
NASA Astrophysics Data System (ADS)
Kusne, A. G.; Keller, D.; Anderson, A.; Zaban, A.; Takeuchi, I.
2015-11-01
Advances in high-throughput materials fabrication and characterization techniques have resulted in faster rates of data collection and rapidly growing volumes of experimental data. To convert this mass of information into actionable knowledge of material process-structure-property relationships requires high-throughput data analysis techniques. This work explores the use of the graph-based endmember extraction and labeling (GRENDEL) algorithm as a high-throughput method for analyzing structural data from combinatorial libraries, specifically, to determine phase diagrams and constituent phases from both X-ray diffraction and Raman spectral data. The GRENDEL algorithm utilizes a set of physical constraints to optimize results and provides a framework by which additional physics-based constraints can be easily incorporated. GRENDEL also permits the integration of database data, as shown by the use of critically evaluated data from the Inorganic Crystal Structure Database in the X-ray diffraction analysis. The Sunburst radial tree map is also demonstrated as a tool to visualize material structure-property relationships found through graph-based analysis.
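For orientation, a much simpler baseline for the same endmember-extraction task (explicitly not the graph-based GRENDEL algorithm) is non-negative matrix factorization of the diffraction library; the data below are random placeholders.

```python
import numpy as np
from sklearn.decomposition import NMF

# Sketch: separate a library of XRD patterns into a few non-negative
# "endmember" patterns plus per-sample weights. Synthetic placeholder data.

rng = np.random.default_rng(2)
true_phases = rng.random((3, 400))        # 3 endmember patterns, 400 2-theta bins
weights = rng.random((120, 3))            # 120 library compositions
library = weights @ true_phases + 0.01 * rng.random((120, 400))

model = NMF(n_components=3, init="nndsvd", max_iter=500)
abundances = model.fit_transform(library)  # per-sample phase fractions
endmembers = model.components_             # candidate constituent phases
print(abundances.shape, endmembers.shape)  # (120, 3) (3, 400)
```

GRENDEL's advantage over such a baseline, per the abstract, is its graph structure and physical constraints, which this sketch omits.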
AOPs and Biomarkers: Bridging High Throughput Screening and Regulatory Decision Making
As high-throughput screening (HTS) plays a larger role in toxicity testing, computational toxicology has emerged as a critical component in interpreting the large volume of data produced. Computational models designed to quantify potential adverse effects based on HTS data will b...
Lee, Hyun; Mittal, Anuradha; Patel, Kavankumar; Gatuz, Joseph L; Truong, Lena; Torres, Jaime; Mulhearn, Debbie C; Johnson, Michael E
2014-01-01
We have used a combination of virtual screening (VS) and high-throughput screening (HTS) techniques to identify novel, non-peptidic small-molecule inhibitors against human SARS-CoV 3CLpro. A structure-based VS approach integrating docking and pharmacophore-based methods was employed to computationally screen 621,000 compounds from the ZINC library. The screening protocol was validated using known 3CLpro inhibitors and was optimized for speed, improved selectivity, and accommodation of receptor flexibility. Subsequently, a fluorescence-based enzymatic HTS assay was developed and optimized to experimentally screen approximately 41,000 compounds from four structurally diverse libraries chosen mainly based on the VS results. False positives from initial HTS hits were eliminated by a secondary orthogonal binding analysis using surface plasmon resonance (SPR). The campaign identified a reversible small-molecule inhibitor exhibiting mixed-type inhibition with a Ki value of 11.1 μM. Together, these results validate our protocols as suitable approaches to screen virtual and chemical libraries, and the newly identified compound reported in our study represents a promising structural scaffold to pursue for further SARS-CoV 3CLpro inhibitor development.
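The reported mixed-type inhibition corresponds to the standard rate law in which the inhibitor binds both the free enzyme (with constant K_i) and the enzyme-substrate complex (with constant αK_i); this generic form is given for context, not taken from the paper:

```latex
v = \frac{V_{\max}[S]}
         {K_m\left(1 + \dfrac{[I]}{K_i}\right)
          + [S]\left(1 + \dfrac{[I]}{\alpha K_i}\right)}
```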
Use of High-Throughput Testing and Approaches for Evaluating Chemical Risk-Relevance to Humans
ToxCast is profiling the bioactivity of thousands of chemicals based on high-throughput screening (HTS) and computational models that integrate knowledge of biological systems and in vivo toxicities. Many of these assays probe signaling pathways and cellular processes critical to...
EPA Project Updates: DSSTox and ToxCast Generating New ...
EPA's National Center for Computational Toxicology is building capabilities to support a new paradigm for toxicity screening and prediction. The DSSTox project is improving public access to quality structure-annotated chemical toxicity information in less summarized forms than traditionally employed in SAR modeling, and in ways that facilitate data mining and data read-across. The DSSTox Structure-Browser, launched in September 2007, provides structure searchability across all published DSSTox toxicity-related inventory, and is enabling linkages between previously isolated toxicity data resources. As of early March 2008, the public DSSTox inventory has been integrated into PubChem, allowing a user to take full advantage of PubChem structure-activity and bioassay clustering features. The most recent DSSTox version of the Carcinogenic Potency Database file (CPDBAS) illustrates ways in which various summary definitions of carcinogenic activity can be employed in modeling and data mining. Phase I of the ToxCast project is generating high-throughput screening data from several hundred biochemical and cell-based assays for a set of 320 chemicals, mostly pesticide actives, with rich toxicology profiles. Incorporating and expanding traditional SAR concepts into this new high-throughput and data-rich world pose conceptual and practical challenges, but also hold great promise for improving predictive capabilities.
SeqAPASS to evaluate conservation of high-throughput screening targets across non-mammalian species
Cell-based high-throughput screening (HTS) and computational technologies are being applied as tools for toxicity testing in the 21st century. The U.S. Environmental Protection Agency (EPA) embraced these technologies and created the ToxCast Program in 2007, which has served as a...
DockoMatic: automated peptide analog creation for high throughput virtual screening.
Jacob, Reed B; Bullock, Casey W; Andersen, Tim; McDougal, Owen M
2011-10-01
The purpose of this manuscript is threefold: (1) to describe an update to DockoMatic that allows the user to generate cyclic peptide analog structure files based on Protein Data Bank (PDB) files, (2) to test the accuracy of the peptide analog structure generation utility, and (3) to evaluate the high-throughput capacity of DockoMatic. The DockoMatic graphical user interface invokes the Treepack software to create user-defined peptide analogs. To validate this approach, DockoMatic-produced cyclic peptide analogs were tested for three-dimensional structure consistency and binding affinity against four experimentally determined peptide structure files available in the Research Collaboratory for Structural Bioinformatics database. The peptides used to evaluate this new functionality were alpha-conotoxins ImI, PnIA, and their published analogs. Peptide analogs were generated by DockoMatic and tested for their ability to bind to X-ray crystal structure models of the acetylcholine binding protein originating from Aplysia californica. The results, consisting of more than 300 simulations, demonstrate that DockoMatic predicts the binding energy of peptide structures to within 3.5 kcal/mol, and the orientation of the bound ligand agrees with experimental data to within 1.8 Å root mean square deviation. Evaluation of high-throughput virtual screening capacity demonstrated that DockoMatic can collect, evaluate, and summarize the output of 10,000 AutoDock jobs in less than 2 hours of computational time, while 100,000 jobs require approximately 15 hours and 1,000,000 jobs are estimated to take up to a week.
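The 1.8 Å figure refers to ligand root mean square deviation against the experimental pose; a minimal sketch of that comparison follows, using random placeholder coordinates and assuming matched atom ordering in a common reference frame.

```python
import numpy as np

# Sketch: RMSD between a docked pose and the crystal pose, assuming the
# two coordinate sets share atom ordering and frame. Placeholder data.

def rmsd(coords_a, coords_b):
    a, b = np.asarray(coords_a), np.asarray(coords_b)
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

rng = np.random.default_rng(3)
crystal = rng.random((42, 3)) * 10                 # placeholder heavy atoms
docked = crystal + rng.normal(0, 0.8, (42, 3))     # perturbed "docked" pose
print(f"RMSD: {rmsd(docked, crystal):.2f} A")
```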
Aggregating Data for Computational Toxicology Applications ...
Computational toxicology combines data from high-throughput test methods, chemical structure analyses and other biological domains (e.g., genes, proteins, cells, tissues) with the goals of predicting and understanding the underlying mechanistic causes of chemical toxicity and for predicting toxicity of new chemicals and products. A key feature of such approaches is their reliance on knowledge extracted from large collections of data and data sets in computable formats. The U.S. Environmental Protection Agency (EPA) has developed a large data resource called ACToR (Aggregated Computational Toxicology Resource) to support these data-intensive efforts. ACToR comprises four main repositories: core ACToR (chemical identifiers and structures, and summary data on hazard, exposure, use, and other domains), ToxRefDB (Toxicity Reference Database, a compilation of detailed in vivo toxicity data from guideline studies), ExpoCastDB (detailed human exposure data from observational studies of selected chemicals), and ToxCastDB (data from high-throughput screening programs, including links to underlying biological information related to genes and pathways). The EPA DSSTox (Distributed Structure-Searchable Toxicity) program provides expert-reviewed chemical structures and associated information for these and other high-interest public inventories. Overall, the ACToR system contains information on about 400,000 chemicals from 1100 different sources. The entire system is built usi
Computational methods in drug discovery.
Leelananda, Sumudu P; Lindert, Steffen
2016-01-01
The process for drug discovery and development is challenging, time consuming and expensive. Computer-aided drug discovery (CADD) tools can act as a virtual shortcut, assisting in the expedition of this long process and potentially reducing the cost of research and development. Today CADD has become an effective and indispensable tool in therapeutic development. The human genome project has made available a substantial amount of sequence data that can be used in various drug discovery projects. Additionally, increasing knowledge of biological structures, as well as increasing computer power have made it possible to use computational methods effectively in various phases of the drug discovery and development pipeline. The importance of in silico tools is greater than ever before and has advanced pharmaceutical research. Here we present an overview of computational methods used in different facets of drug discovery and highlight some of the recent successes. In this review, both structure-based and ligand-based drug discovery methods are discussed. Advances in virtual high-throughput screening, protein structure prediction methods, protein-ligand docking, pharmacophore modeling and QSAR techniques are reviewed.
Developing Hypothetical Inhibition Mechanism of Novel Urea Transporter B Inhibitor
NASA Astrophysics Data System (ADS)
Li, Min; Tou, Weng Ieong; Zhou, Hong; Li, Fei; Ren, Huiwen; Chen, Calvin Yu-Chian; Yang, Baoxue
2014-07-01
Urea transporter B (UT-B) is a membrane channel protein that specifically transports urea. UT-B null mice exhibit a urea-selective deficiency in urine concentrating ability, which suggests potential clinical applications of UT-B inhibitors as novel diuretics. Primary high-throughput virtual screening (HTVS) of 50,000 small-molecule drug-like compounds identified 2319 hits. These 2319 compounds were then screened by high-throughput screening using an erythrocyte osmotic lysis assay. Based on the pharmacological data, putative UT-B binding sites were identified by structure-based drug design and validated by ligand-based and QSAR models. Additionally, UT-B structural and functional characteristics under inhibitor-treated and untreated conditions were simulated by molecular dynamics (MD). As a result, we identified four classes of compounds with UT-B inhibitory activity and predicted a human UT-B model, based on which putative binding sites were identified and validated. A novel potential mechanism of UT-B inhibition was discovered by comparing UT-B from different species. Results suggest that residue PHE198 in rat and mouse UT-B might block the inhibitor migration pathway. Inhibitory mechanisms of UT-B inhibitors and the functions of key residues in UT-B were proposed. The binding site analysis provides a structural basis for lead identification and optimization of UT-B inhibitors.
Aryee, Martin J.; Jaffe, Andrew E.; Corrada-Bravo, Hector; Ladd-Acosta, Christine; Feinberg, Andrew P.; Hansen, Kasper D.; Irizarry, Rafael A.
2014-01-01
Motivation: The recently released Infinium HumanMethylation450 array (the ‘450k’ array) provides a high-throughput assay to quantify DNA methylation (DNAm) at ∼450 000 loci across a range of genomic features. Although less comprehensive than high-throughput sequencing-based techniques, this product is more cost-effective and promises to be the most widely used DNAm high-throughput measurement technology over the next several years. Results: Here we describe a suite of computational tools that incorporate state-of-the-art statistical techniques for the analysis of DNAm data. The software is structured to easily adapt to future versions of the technology. We include methods for preprocessing, quality assessment and detection of differentially methylated regions from the kilobase to the megabase scale. We show how our software provides a powerful and flexible development platform for future methods. We also illustrate how our methods empower the technology to make discoveries previously thought to be possible only with sequencing-based methods. Availability and implementation: http://bioconductor.org/packages/release/bioc/html/minfi.html. Contact: khansen@jhsph.edu; rafa@jimmy.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24478339
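minfi itself is an R/Bioconductor package; purely as an illustration of the basic quantity such 450k pipelines report, here is a Python sketch of the conventional Illumina beta value, with placeholder intensities and the conventional offset of 100.

```python
import numpy as np

# Sketch (not minfi code): the beta value of a CpG site from methylated
# and unmethylated probe intensities. Intensities are placeholders.

def beta_values(meth, unmeth, offset=100.0):
    """Beta in [0, 1): ~0 means unmethylated, ~1 means fully methylated."""
    meth, unmeth = np.asarray(meth, float), np.asarray(unmeth, float)
    return meth / (meth + unmeth + offset)

print(beta_values([12000, 300], [900, 14000]))  # ~[0.92, 0.02]
```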
Zhong, Qing; Rüschoff, Jan H.; Guo, Tiannan; Gabrani, Maria; Schüffler, Peter J.; Rechsteiner, Markus; Liu, Yansheng; Fuchs, Thomas J.; Rupp, Niels J.; Fankhauser, Christian; Buhmann, Joachim M.; Perner, Sven; Poyet, Cédric; Blattner, Miriam; Soldini, Davide; Moch, Holger; Rubin, Mark A.; Noske, Aurelia; Rüschoff, Josef; Haffner, Michael C.; Jochum, Wolfram; Wild, Peter J.
2016-01-01
Recent large-scale genome analyses of human tissue samples have uncovered a high degree of genetic alterations and tumour heterogeneity in most tumour entities, independent of morphological phenotypes and histopathological characteristics. Assessment of genetic copy-number variation (CNV) and tumour heterogeneity by fluorescence in situ hybridization (ISH) provides additional tissue morphology at single-cell resolution, but it is labour intensive with limited throughput and high inter-observer variability. We present an integrative method combining bright-field dual-colour chromogenic and silver ISH assays with an image-based computational workflow (ISHProfiler), for accurate detection of molecular signals, high-throughput evaluation of CNV, expressive visualization of multi-level heterogeneity (cellular, inter- and intra-tumour heterogeneity), and objective quantification of heterogeneous genetic deletions (PTEN) and amplifications (19q12, HER2) in diverse human tumours (prostate, endometrial, ovarian and gastric), using various tissue sizes and different scanners, with unprecedented throughput and reproducibility. PMID:27052161
Zhong, Qing; Rüschoff, Jan H; Guo, Tiannan; Gabrani, Maria; Schüffler, Peter J; Rechsteiner, Markus; Liu, Yansheng; Fuchs, Thomas J; Rupp, Niels J; Fankhauser, Christian; Buhmann, Joachim M; Perner, Sven; Poyet, Cédric; Blattner, Miriam; Soldini, Davide; Moch, Holger; Rubin, Mark A; Noske, Aurelia; Rüschoff, Josef; Haffner, Michael C; Jochum, Wolfram; Wild, Peter J
2016-04-07
Recent large-scale genome analyses of human tissue samples have uncovered a high degree of genetic alterations and tumour heterogeneity in most tumour entities, independent of morphological phenotypes and histopathological characteristics. Assessment of genetic copy-number variation (CNV) and tumour heterogeneity by fluorescence in situ hybridization (ISH) provides additional tissue morphology at single-cell resolution, but it is labour intensive with limited throughput and high inter-observer variability. We present an integrative method combining bright-field dual-colour chromogenic and silver ISH assays with an image-based computational workflow (ISHProfiler), for accurate detection of molecular signals, high-throughput evaluation of CNV, expressive visualization of multi-level heterogeneity (cellular, inter- and intra-tumour heterogeneity), and objective quantification of heterogeneous genetic deletions (PTEN) and amplifications (19q12, HER2) in diverse human tumours (prostate, endometrial, ovarian and gastric), using various tissue sizes and different scanners, with unprecedented throughput and reproducibility.
High Performance Computing Modernization Program Kerberos Throughput Test Report
2017-10-26
functionality as Kerberos plugins. The pre-release production kit was used in these tests to compare against the current release kit. YubiKey support... Throughput testing was done to determine the benefits of the pre... both the current release kit and the pre-release production kit for a total of 378 individual tests in order to note any improvements. Based on work
High-Throughput Bit-Serial LDPC Decoder LSI Based on Multiple-Valued Asynchronous Interleaving
NASA Astrophysics Data System (ADS)
Onizawa, Naoya; Hanyu, Takahiro; Gaudet, Vincent C.
This paper presents a high-throughput bit-serial low-density parity-check (LDPC) decoder that uses an asynchronous interleaver. Since consecutive log-likelihood message values on the interleaver are similar, node computations are continuously performed by using the most recently arrived messages without significantly affecting bit-error rate (BER) performance. In the asynchronous interleaver, each message's arrival rate is based on the delay due to the wire length, so that the decoding throughput is not restricted by the worst-case latency, which results in a higher average rate of computation. Moreover, the use of a multiple-valued data representation makes it possible to multiplex control signals and data from mutual nodes, thus minimizing the number of handshaking steps in the asynchronous interleaver and eliminating the clock signal entirely. As a result, the decoding throughput becomes 1.3 times faster than that of a bit-serial synchronous decoder under a 90nm CMOS technology, at a comparable BER.
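The node computations that such decoder hardware parallelizes can be illustrated in software. The sketch below runs min-sum message passing over a tiny parity-check matrix; the asynchronous interleaving described above changes when messages are consumed, not the check-node and variable-node updates themselves. The matrix and LLR values are toy examples.

```python
# Software sketch of the message passing that LDPC decoder hardware
# parallelizes: min-sum decoding over a tiny toy parity-check matrix.
import numpy as np

H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 0, 1, 1],
              [0, 0, 1, 1, 0, 1]])

def min_sum_decode(llr, H, iters=20):
    m, n = H.shape
    msg_vc = np.where(H, llr, 0.0)            # variable-to-check messages
    for _ in range(iters):
        msg_cv = np.zeros_like(msg_vc)
        for i in range(m):                    # check-node update
            cols = np.flatnonzero(H[i])
            for j in cols:
                others = msg_vc[i, cols[cols != j]]
                msg_cv[i, j] = np.prod(np.sign(others)) * np.min(np.abs(others))
        total = llr + msg_cv.sum(axis=0)
        hard = (total < 0).astype(int)
        if not np.any(H @ hard % 2):          # all parity checks satisfied
            return hard
        for j in range(n):                    # variable-node update
            rows = np.flatnonzero(H[:, j])
            for i in rows:
                msg_vc[i, j] = llr[j] + msg_cv[rows[rows != i], j].sum()
    return hard

# Noisy LLRs for the all-zero codeword (positive = bit 0 more likely).
llr = np.array([2.1, -0.4, 1.5, 0.8, 1.9, 1.2])
print(min_sum_decode(llr, H))
```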
Nir, Oaz; Bakal, Chris; Perrimon, Norbert; Berger, Bonnie
2010-03-01
Biological networks are highly complex systems, consisting largely of enzymes that act as molecular switches to activate or inhibit downstream targets via post-translational modification. Computational techniques have been developed to perform signaling network inference from some high-throughput data sources, such as those generated by transcriptional and proteomic studies, but comparable methods have not been developed for high-content morphological data, which are emerging principally from large-scale RNAi screens. Here, we describe a systematic computational framework based on a classification model for identifying genetic interactions using high-dimensional single-cell morphological data from genetic screens, apply it to RhoGAP/GTPase regulation in Drosophila, and evaluate its efficacy. Augmented by knowledge of the basic structure of RhoGAP/GTPase signaling, namely, that GAPs act directly upstream of GTPases, we apply our framework to predict signaling relationships between these proteins. We find that our method makes mediocre predictions using only RhoGAP single-knockdown morphological data, yet achieves vastly improved accuracy by including original data from a double-knockdown RhoGAP genetic screen, which likely reflects the redundant network structure of RhoGAP/GTPase signaling. We consider other possible methods for inference and show that our primary model outperforms the alternatives. This work demonstrates that high-throughput morphological data can be used in a systematic, successful fashion to identify genetic interactions and, using additional elementary knowledge of network structure, to infer signaling relations.
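The classification idea can be made concrete with a small sketch: summarize each knockdown condition by a vector of single-cell morphological statistics and train a classifier to flag interactions. The features, labels, and model choice below are invented for illustration and do not reproduce the authors' framework.

```python
# Hypothetical sketch: represent each gene-pair knockdown by summary
# statistics of single-cell morphological features, then classify pairs
# as interacting or not. All data and labels here are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_pairs = 200
# e.g. per-pair means/variances of cell area, protrusions, eccentricity...
X = rng.normal(size=(n_pairs, 12))
y = rng.integers(0, 2, n_pairs)      # 1 = interacting pair (toy labels)
X[y == 1] += 0.8                     # interacting pairs shift the morphology

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print("cross-validated accuracy:", scores.mean().round(2))
```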
Spitzer, James D; Hupert, Nathaniel; Duckart, Jonathan; Xiong, Wei
2007-01-01
Community-based mass prophylaxis is a core public health operational competency, but staffing needs may overwhelm the local trained health workforce. Just-in-time (JIT) training of emergency staff and computer modeling of workforce requirements represent two complementary approaches to address this logistical problem. Multnomah County, Oregon, conducted a high-throughput point of dispensing (POD) exercise to test JIT training and computer modeling to validate POD staffing estimates. The POD had 84% non-health-care worker staff and processed 500 patients per hour. Post-exercise modeling replicated observed staff utilization levels and queue formation, including development and amelioration of a large medical evaluation queue caused by lengthy processing times and understaffing in the first half-hour of the exercise. The exercise confirmed the feasibility of using JIT training for high-throughput antibiotic dispensing clinics staffed largely by nonmedical professionals. Patient processing times varied over the course of the exercise, with important implications for both staff reallocation and future POD modeling efforts. Overall underutilization of staff revealed the opportunity for greater efficiencies and even higher future throughputs.
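A minimal time-stepped queue model captures the dynamic the exercise observed: an understaffed opening period lets a queue build that later staffing ameliorates. All rates and staffing numbers below are illustrative, not the exercise's calibrated values.

```python
# Toy time-stepped model of a point-of-dispensing (POD) station, in the
# spirit of the post-exercise modeling described above. Patients arrive
# at a fixed rate; an understaffed first half-hour builds a queue that
# later staffing drains. All numbers are illustrative.
arrival_rate = 500 / 60          # patients per minute
service_time = 2.0               # minutes per patient per staffer
queue = 0.0
for minute in range(120):
    staff = 8 if minute < 30 else 20          # understaffed early on
    queue += arrival_rate
    queue = max(0.0, queue - staff / service_time)
    if minute % 30 == 29:
        print(f"minute {minute + 1:3d}: queue = {queue:5.1f} patients")
```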
Faridi, Mohd Hafeez; Maiguel, Dony; Brown, Brock T.; Suyama, Eigo; Barth, Constantinos J.; Hedrick, Michael; Vasile, Stefan; Sergienko, Eduard; Schürer, Stephan; Gupta, Vineet
2010-01-01
Binding of leukocyte specific integrin CD11b/CD18 to its physiologic ligands is important for the development of normal immune response in vivo. Integrin CD11b/CD18 is also a key cellular effector of various inflammatory and autoimmune diseases. However, small molecules selectively inhibiting the function of integrin CD11b/CD18 are currently lacking. We used a newly described cell-based high throughput screening assay to identify a number of highly potent antagonists of integrin CD11b/CD18 from chemical libraries containing >100,000 unique compounds. Computational analyses suggest that the identified compounds cluster into several different chemical classes. A number of the newly identified compounds blocked adhesion of wild-type mouse neutrophils to CD11b/CD18 ligand fibrinogen. Mapping the most active compounds against chemical fingerprints of known antagonists of related integrin CD11a/CD18 shows little structural similarity, suggesting that the newly identified compounds are novel and unique. PMID:20188705
Collaborative Core Research Program for Chemical-Biological Warfare Defense
2015-01-04
Discovery through High Throughput Screening (HTS) and Fragment-Based Drug Design (FBDD)... Current pharmaceutical approaches involving drug discovery... structural analysis and docking program generally known as fragment-based drug design (FBDD). The main advantage of using these approaches is that
Klukas, Christian; Chen, Dijun; Pape, Jean-Michel
2014-01-01
High-throughput phenotyping is emerging as an important technology to dissect phenotypic components in plants. Efficient image processing and feature extraction are prerequisites to quantify plant growth and performance based on phenotypic traits. Issues include data management, image analysis, and result visualization of large-scale phenotypic data sets. Here, we present the Integrated Analysis Platform (IAP), an open-source framework for high-throughput plant phenotyping. IAP provides user-friendly interfaces, and its core functions are highly adaptable. Our system supports image data transfer from different acquisition environments and large-scale image analysis for different plant species based on real-time imaging data obtained from different spectra. Due to the huge amount of data to manage, we utilized a common data structure for efficient storage and organization of both input and result data. We implemented a block-based method for automated image processing to extract a representative list of plant phenotypic traits. We also provide tools for built-in data plotting and result export. For validation of IAP, we performed an example experiment that contains 33 maize (Zea mays 'Fernandez') plants, which were grown for 9 weeks in an automated greenhouse with nondestructive imaging. Subsequently, the image data were subjected to automated analysis with the maize pipeline implemented in our system. We found that the computed digital volume and number of leaves correlate with our manually measured data with high accuracy, up to 0.98 and 0.95, respectively. In summary, IAP provides a multiple set of functionalities for import/export, management, and automated analysis of high-throughput plant phenotyping data, and its analysis results are highly reliable. PMID:24760818
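One elementary step in such a pipeline, segmenting plant pixels and counting them as a proxy for projected area, can be sketched as follows. The excess-green index and threshold are illustrative; IAP's actual block-based pipeline is far more robust and calibrated.

```python
# Minimal sketch of one image-analysis step in a phenotyping pipeline:
# segment plant pixels by a simple greenness index and use the pixel
# count as a proxy for projected area (one ingredient of digital volume).
import numpy as np

def projected_area(rgb: np.ndarray, threshold: float = 20.0) -> int:
    """rgb: H x W x 3 array; counts pixels where green dominates red/blue."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    greenness = 2 * g - r - b                 # excess-green index
    return int(np.count_nonzero(greenness > threshold))

rng = np.random.default_rng(2)
gray = rng.integers(0, 60, (100, 100, 1))
img = np.repeat(gray, 3, axis=2)              # neutral gray background
img[30:70, 40:60, 1] += 120                   # synthetic "plant" region
print("plant pixels:", projected_area(img))   # -> 800
```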
Ultra-Structure database design methodology for managing systems biology data and analyses
Maier, Christopher W; Long, Jeffrey G; Hemminger, Bradley M; Giddings, Morgan C
2009-01-01
Background Modern, high-throughput biological experiments generate copious, heterogeneous, interconnected data sets. Research is dynamic, with frequently changing protocols, techniques, instruments, and file formats. Because of these factors, systems designed to manage and integrate modern biological data sets often end up as large, unwieldy databases that become difficult to maintain or evolve. The novel rule-based approach of the Ultra-Structure design methodology presents a potential solution to this problem. By representing both data and processes as formal rules within a database, an Ultra-Structure system constitutes a flexible framework that enables users to explicitly store domain knowledge in both a machine- and human-readable form. End users themselves can change the system's capabilities without programmer intervention, simply by altering database contents; no computer code or schemas need be modified. This provides flexibility in adapting to change, and allows integration of disparate, heterogeneous data sets within a small core set of database tables, facilitating joint analysis and visualization without becoming unwieldy. Here, we examine the application of Ultra-Structure to our ongoing research program for the integration of large proteomic and genomic data sets (proteogenomic mapping). Results We transitioned our proteogenomic mapping information system from a traditional entity-relationship design to one based on Ultra-Structure. Our system integrates tandem mass spectrum data, genomic annotation sets, and spectrum/peptide mappings, all within a small, general framework implemented within a standard relational database system. General software procedures driven by user-modifiable rules can perform tasks such as logical deduction and location-based computations. The system is not tied specifically to proteogenomic research, but is rather designed to accommodate virtually any kind of biological research. Conclusion We find Ultra-Structure offers substantial benefits for biological information systems, the largest being the integration of diverse information sources into a common framework. This facilitates systems biology research by integrating data from disparate high-throughput techniques. It also enables us to readily incorporate new data types, sources, and domain knowledge with no change to the database structure or associated computer code. Ultra-Structure may be a significant step towards solving the hard problem of data management and integration in the systems biology era. PMID:19691849
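The core Ultra-Structure notion, rules stored as ordinary table rows and interpreted by a small generic engine, can be sketched as below. The table schema, rule syntax, and toy interpreter are invented for illustration and are much simpler than a real ruleform design.

```python
# Conceptual sketch of the Ultra-Structure idea: both data and process
# rules live as rows in ordinary tables, and a small generic engine reads
# the rules instead of hard-coding the logic. Schema and rule syntax are
# invented for illustration.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE rules (context TEXT, condition TEXT, action TEXT)")
db.executemany("INSERT INTO rules VALUES (?, ?, ?)", [
    ("peptide_mapping", "score >= 0.9", "accept"),
    ("peptide_mapping", "score < 0.9",  "review"),
])

def apply_rules(context: str, score: float) -> str:
    # The engine interprets rule rows; changing behavior means editing
    # table contents, not code.
    for condition, action in db.execute(
            "SELECT condition, action FROM rules WHERE context = ?", (context,)):
        if eval(condition, {}, {"score": score}):   # toy rule interpreter
            return action
    return "no rule matched"

print(apply_rules("peptide_mapping", 0.95))  # -> accept
```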
Machine learning in computational biology to accelerate high-throughput protein expression.
Sastry, Anand; Monk, Jonathan; Tegel, Hanna; Uhlen, Mathias; Palsson, Bernhard O; Rockberg, Johan; Brunk, Elizabeth
2017-08-15
The Human Protein Atlas (HPA) enables the simultaneous characterization of thousands of proteins across various tissues to pinpoint their spatial location in the human body. This has been achieved through transcriptomics and high-throughput immunohistochemistry-based approaches, where over 40 000 unique human protein fragments have been expressed in E. coli. These datasets enable quantitative tracking of entire cellular proteomes and present new avenues for understanding molecular-level properties influencing expression and solubility. Combining computational biology and machine learning identifies protein properties that hinder the HPA high-throughput antibody production pipeline. We predict protein expression and solubility with accuracies of 70% and 80%, respectively, based on a subset of key properties (aromaticity, hydropathy and isoelectric point). We guide the selection of protein fragments based on these characteristics to optimize high-throughput experimentation. We present the machine learning workflow as a series of IPython notebooks hosted on GitHub (https://github.com/SBRG/Protein_ML). The workflow can be used as a template for analysis of further expression and solubility datasets. ebrunk@ucsd.edu or johanr@biotech.kth.se. Supplementary data are available at Bioinformatics online.
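The featurization behind such a predictor can be sketched with Biopython's ProtParam module, which provides the three key properties named above. The toy sequences, labels, and logistic model below only mirror the idea; the authors' actual workflow is in the GitHub notebooks linked in the abstract.

```python
# Sketch of the featurization step: aromaticity, hydropathy (GRAVY) and
# isoelectric point computed with Biopython, feeding a simple classifier.
# Sequences and labels are toy values, not HPA data.
import numpy as np
from Bio.SeqUtils.ProtParam import ProteinAnalysis
from sklearn.linear_model import LogisticRegression

def featurize(seq: str) -> list:
    pa = ProteinAnalysis(seq)
    return [pa.aromaticity(), pa.gravy(), pa.isoelectric_point()]

fragments = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", "MFWFWFYYWLLLIVVAA"]
X = np.array([featurize(s) for s in fragments * 10])
y = np.array([1, 0] * 10)                 # toy labels: 1 = expressed well
model = LogisticRegression().fit(X, y)
print("predicted expression:", model.predict(X[:2]))
```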
High-Throughput Characterization of Porous Materials Using Graphics Processing Units
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Jihan; Martin, Richard L.; Rübel, Oliver
We have developed a high-throughput graphics processing units (GPU) code that can characterize a large database of crystalline porous materials. In our algorithm, the GPU is utilized to accelerate energy grid calculations where the grid values represent interactions (i.e., Lennard-Jones + Coulomb potentials) between gas molecules (i.e., CH4 and CO2) and the material's framework atoms. Using a parallel flood fill CPU algorithm, inaccessible regions inside the framework structures are identified and blocked based on their energy profiles. Finally, we compute the Henry coefficients and heats of adsorption through statistical Widom insertion Monte Carlo moves in the domain restricted to the accessible space. The code offers significant speedup over a single-core CPU code and allows us to characterize a set of porous materials at least an order of magnitude larger than ones considered in earlier studies. For structures selected from such a prescreening algorithm, full adsorption isotherms can be calculated by conducting multiple grand canonical Monte Carlo simulations concurrently within the GPU.
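The Widom insertion step can be illustrated in simplified CPU form: sample random points of a precomputed energy grid, mask out blocked regions, and Boltzmann-average the insertion energies. Grid values, temperature, and the blocking criterion below are toy stand-ins for the paper's GPU computation.

```python
# Simplified CPU sketch of Widom insertion on an energy grid: the
# Boltzmann-averaged insertion energy is proportional to the Henry
# coefficient (up to constant factors). All numbers are toy values.
import numpy as np

rng = np.random.default_rng(3)
kT = 0.6                                      # reduced temperature
grid = rng.normal(loc=2.0, scale=3.0, size=(32, 32, 32))  # toy energy grid
accessible = grid < 15.0                      # crude analogue of blocking

n_insert = 100_000
idx = rng.integers(0, 32, size=(n_insert, 3))
energies = grid[idx[:, 0], idx[:, 1], idx[:, 2]]
mask = accessible[idx[:, 0], idx[:, 1], idx[:, 2]]
boltzmann = np.where(mask, np.exp(-energies / kT), 0.0)

henry_like = boltzmann.mean()                 # proportional to Henry coefficient
print(f"<exp(-E/kT)> = {henry_like:.4f}")
```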
Wan, Cuihong; Liu, Jian; Fong, Vincent; Lugowski, Andrew; Stoilova, Snejana; Bethune-Waddell, Dylan; Borgeson, Blake; Havugimana, Pierre C; Marcotte, Edward M; Emili, Andrew
2013-04-09
The experimental isolation and characterization of stable multi-protein complexes are essential to understanding the molecular systems biology of a cell. To this end, we have developed a high-throughput proteomic platform for the systematic identification of native protein complexes based on extensive fractionation of soluble protein extracts by multi-bed ion exchange high performance liquid chromatography (IEX-HPLC) combined with exhaustive label-free LC/MS/MS shotgun profiling. To support these studies, we have built a companion data analysis software pipeline, termed ComplexQuant. Proteins present in the hundreds of fractions typically collected per experiment are first identified by exhaustively interrogating MS/MS spectra using multiple database search engines within an integrative probabilistic framework, while accounting for possible post-translation modifications. Protein abundance is then measured across the fractions based on normalized total spectral counts and precursor ion intensities using a dedicated tool, PepQuant. This analysis allows co-complex membership to be inferred based on the similarity of extracted protein co-elution profiles. Each computational step has been optimized for processing large-scale biochemical fractionation datasets, and the reliability of the integrated pipeline has been benchmarked extensively. This article is part of a Special Issue entitled: From protein structures to clinical applications.
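The co-elution inference at the end of this pipeline reduces to profile similarity: proteins whose abundances rise and fall together across fractions are candidate co-complex members. The sketch below scores toy elution profiles with Pearson correlation; ComplexQuant's actual scoring is more elaborate.

```python
# Sketch of the co-elution idea: proteins with highly correlated
# abundance profiles across chromatographic fractions are candidate
# co-complex members. Profiles here are synthetic.
import numpy as np

rng = np.random.default_rng(4)
fractions = 60
base = np.exp(-0.5 * ((np.arange(fractions) - 25) / 4.0) ** 2)  # elution peak
profiles = {
    "subunit_A": base + rng.normal(0, 0.05, fractions),
    "subunit_B": base + rng.normal(0, 0.05, fractions),
    "unrelated": rng.random(fractions),
}

def coelution(p, q):
    return float(np.corrcoef(p, q)[0, 1])     # Pearson correlation

print("A vs B:", round(coelution(profiles["subunit_A"], profiles["subunit_B"]), 2))
print("A vs unrelated:", round(coelution(profiles["subunit_A"], profiles["unrelated"]), 2))
```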
Hayden, Eric J
2016-08-15
RNA molecules provide a realistic but tractable model of a genotype to phenotype relationship. This relationship has been extensively investigated computationally using secondary structure prediction algorithms. Enzymatic RNA molecules, or ribozymes, offer access to genotypic and phenotypic information in the laboratory. Advancements in high-throughput sequencing technologies have enabled the analysis of sequences in the lab that now rivals what can be accomplished computationally. This has motivated a resurgence of in vitro selection experiments and opened new doors for the analysis of the distribution of RNA functions in genotype space. A body of computational experiments has investigated the persistence of specific RNA structures despite changes in the primary sequence, and how this mutational robustness can promote adaptations. This article summarizes recent approaches that were designed to investigate the role of mutational robustness during the evolution of RNA molecules in the laboratory, and presents theoretical motivations, experimental methods and approaches to data analysis.
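A common computational robustness measure is the fraction of single-point mutants whose predicted minimum-free-energy structure matches the wild type. The sketch below assumes the ViennaRNA Python bindings (import RNA) are available; the sequence is arbitrary.

```python
# Sketch of a mutational-robustness measure for an RNA sequence: the
# fraction of single-point mutants whose predicted MFE secondary
# structure equals the wild-type structure. Assumes ViennaRNA bindings.
import RNA

def robustness(seq: str) -> float:
    wt_struct, _ = RNA.fold(seq)
    neutral = total = 0
    for i in range(len(seq)):
        for base in "ACGU":
            if base == seq[i]:
                continue
            mut = seq[:i] + base + seq[i + 1:]
            total += 1
            neutral += (RNA.fold(mut)[0] == wt_struct)
    return neutral / total

seq = "GGGAAACGCUCCCAAAGGGAGCGUUUCCC"   # arbitrary example sequence
print(f"mutational robustness: {robustness(seq):.2f}")
```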
Kračun, Stjepan Krešimir; Fangel, Jonatan Ulrik; Rydahl, Maja Gro; Pedersen, Henriette Lodberg; Vidal-Melgosa, Silvia; Willats, William George Tycho
2017-01-01
Cell walls are an important feature of plant cells and a major component of the plant glycome. They have both structural and physiological functions and are critical for plant growth and development. The diversity and complexity of these structures demand advanced high-throughput techniques to answer questions about their structure, functions and roles in both fundamental and applied scientific fields. Microarray technology provides both the high-throughput and the feasibility aspects required to meet that demand. In this chapter, some of the most recent microarray-based techniques relating to plant cell walls are described together with an overview of related contemporary techniques applied to carbohydrate microarrays and their general potential in glycoscience. A detailed experimental procedure for high-throughput mapping of plant cell wall glycans using the comprehensive microarray polymer profiling (CoMPP) technique is included in the chapter and provides a good example of both the robust and high-throughput nature of microarrays as well as their applicability to plant glycomics.
Wu, Cuichen; Wan, Shuo; Hou, Weijia; Zhang, Liqin; Xu, Jiehua; Cui, Cheng; Wang, Yanyue; Hu, Jun; Tan, Weihong
2015-03-04
Nucleic acid-based logic devices were first introduced in 1994. Since then, science has seen the emergence of new logic systems for mimicking mathematical functions, diagnosing disease and even imitating biological systems. The unique features of nucleic acids, such as facile and high-throughput synthesis, Watson-Crick complementary base pairing, and predictable structures, together with the aid of programming design, have led to the widespread applications of nucleic acids (NA) for logic gates and computing in biotechnology and biomedicine. In this feature article, the development of in vitro NA logic systems will be discussed, as well as the expansion of such systems using various input molecules for potential cellular, or even in vivo, applications.
Wu, Cuichen; Wan, Shuo; Hou, Weijia; Zhang, Liqin; Xu, Jiehua; Cui, Cheng; Wang, Yanyue; Hu, Jun
2015-01-01
Nucleic acid-based logic devices were first introduced in 1994. Since then, science has seen the emergence of new logic systems for mimicking mathematical functions, diagnosing disease and even imitating biological systems. The unique features of nucleic acids, such as facile and high-throughput synthesis, Watson-Crick complementary base pairing, and predictable structures, together with the aid of programming design, have led to the widespread applications of nucleic acids (NA) for logic gating and computing in biotechnology and biomedicine. In this feature article, the development of in vitro NA logic systems will be discussed, as well as the expansion of such systems using various input molecules for potential cellular, or even in vivo, applications. PMID:25597946
Judson, Richard S.; Martin, Matthew T.; Egeghy, Peter; Gangwal, Sumit; Reif, David M.; Kothiya, Parth; Wolf, Maritja; Cathey, Tommy; Transue, Thomas; Smith, Doris; Vail, James; Frame, Alicia; Mosher, Shad; Cohen Hubal, Elaine A.; Richard, Ann M.
2012-01-01
Computational toxicology combines data from high-throughput test methods, chemical structure analyses and other biological domains (e.g., genes, proteins, cells, tissues) with the goals of predicting and understanding the underlying mechanistic causes of chemical toxicity and for predicting toxicity of new chemicals and products. A key feature of such approaches is their reliance on knowledge extracted from large collections of data and data sets in computable formats. The U.S. Environmental Protection Agency (EPA) has developed a large data resource called ACToR (Aggregated Computational Toxicology Resource) to support these data-intensive efforts. ACToR comprises four main repositories: core ACToR (chemical identifiers and structures, and summary data on hazard, exposure, use, and other domains), ToxRefDB (Toxicity Reference Database, a compilation of detailed in vivo toxicity data from guideline studies), ExpoCastDB (detailed human exposure data from observational studies of selected chemicals), and ToxCastDB (data from high-throughput screening programs, including links to underlying biological information related to genes and pathways). The EPA DSSTox (Distributed Structure-Searchable Toxicity) program provides expert-reviewed chemical structures and associated information for these and other high-interest public inventories. Overall, the ACToR system contains information on about 400,000 chemicals from 1100 different sources. The entire system is built using open source tools and is freely available to download. This review describes the organization of the data repository and provides selected examples of use cases. PMID:22408426
Judson, Richard S; Martin, Matthew T; Egeghy, Peter; Gangwal, Sumit; Reif, David M; Kothiya, Parth; Wolf, Maritja; Cathey, Tommy; Transue, Thomas; Smith, Doris; Vail, James; Frame, Alicia; Mosher, Shad; Cohen Hubal, Elaine A; Richard, Ann M
2012-01-01
Computational toxicology combines data from high-throughput test methods, chemical structure analyses and other biological domains (e.g., genes, proteins, cells, tissues) with the goals of predicting and understanding the underlying mechanistic causes of chemical toxicity and for predicting toxicity of new chemicals and products. A key feature of such approaches is their reliance on knowledge extracted from large collections of data and data sets in computable formats. The U.S. Environmental Protection Agency (EPA) has developed a large data resource called ACToR (Aggregated Computational Toxicology Resource) to support these data-intensive efforts. ACToR comprises four main repositories: core ACToR (chemical identifiers and structures, and summary data on hazard, exposure, use, and other domains), ToxRefDB (Toxicity Reference Database, a compilation of detailed in vivo toxicity data from guideline studies), ExpoCastDB (detailed human exposure data from observational studies of selected chemicals), and ToxCastDB (data from high-throughput screening programs, including links to underlying biological information related to genes and pathways). The EPA DSSTox (Distributed Structure-Searchable Toxicity) program provides expert-reviewed chemical structures and associated information for these and other high-interest public inventories. Overall, the ACToR system contains information on about 400,000 chemicals from 1100 different sources. The entire system is built using open source tools and is freely available to download. This review describes the organization of the data repository and provides selected examples of use cases.
Kawaguchi, Risa; Kiryu, Hisanori
2016-05-06
RNA secondary structure around splice sites is known to assist normal splicing by promoting spliceosome recognition. However, analyzing the structural properties of entire intronic regions or pre-mRNA sequences has hitherto been difficult, owing to serious experimental and computational limitations, such as low read coverage and numerical problems. Our novel software, "ParasoR", is designed to run on a computer cluster and enables the exact computation of various structural features of long RNA sequences under the constraint of maximal base-pairing distance. ParasoR divides dynamic programming (DP) matrices into smaller pieces, such that each piece can be computed by a separate computer node without losing the connectivity information between the pieces. ParasoR directly computes the ratios of DP variables to avoid the reduction of numerical precision caused by the cancellation of a large number of Boltzmann factors. The structural preferences of mRNAs computed by ParasoR show a high concordance with those determined by high-throughput sequencing analyses. Using ParasoR, we investigated the global structural preferences of transcribed regions in the human genome. A genome-wide folding simulation indicated that transcribed regions are significantly more structured than intergenic regions after removing repeat sequences and k-mer frequency bias. In particular, we observed a highly significant preference for base pairing over entire intronic regions as compared to their antisense sequences, as well as to intergenic regions. A comparison between pre-mRNAs and mRNAs showed that coding regions become more accessible after splicing, indicating constraints for translational efficiency. Such changes are correlated with gene expression levels, as well as GC content, and are enriched among genes associated with cytoskeleton and kinase functions. We have shown that ParasoR is very useful for analyzing the structural properties of long RNA sequences such as mRNAs, pre-mRNAs, and long non-coding RNAs whose lengths can be more than a million bases in the human genome. In our analyses, transcribed regions including introns are indicated to be subject to various types of structural constraints that cannot be explained by simple sequence composition biases. ParasoR is freely available at https://github.com/carushi/ParasoR.
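The numerical problem ParasoR sidesteps by computing ratios of DP variables can be seen in miniature: summing large Boltzmann factors naively overflows, while the standard log-sum-exp rearrangement stays finite. The sketch below is a generic demonstration of that trick, not ParasoR's actual ratio scheme.

```python
# Generic demonstration of the numerical issue with Boltzmann factors:
# naive summation overflows, while log-sum-exp stays stable.
import numpy as np

energies = np.array([-900.0, -899.5, -850.0])    # toy energy values
beta = 1.0

naive = np.exp(-beta * energies).sum()           # overflows to inf
m = np.max(-beta * energies)
log_Z = m + np.log(np.exp(-beta * energies - m).sum())  # stable log-partition
print("naive:", naive)          # inf (overflow)
print("log Z:", log_Z)          # finite, ~900.5
```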
Materials Databases Infrastructure Constructed by First Principles Calculations: A Review
Lin, Lianshan
2015-10-13
The First Principles calculations, especially calculations based on High-Throughput Density Functional Theory, have been widely accepted as major tools in atom-scale materials design. The emerging supercomputers, along with the powerful First Principles calculations, have accumulated hundreds of thousands of crystal and compound records. The exponential growth of computational materials information urges the development of materials databases, which not only provide unlimited storage for the daily increasing data, but also keep efficiency in data storage, management, query, presentation and manipulation. This review covers the most cutting-edge materials databases in materials design, and their hot applications, such as in fuel cells. By comparing the advantages and drawbacks of these high-throughput First Principles materials databases, the optimized computational framework can be identified to fit the needs of fuel cell applications. The further development of high-throughput DFT materials databases, which in essence accelerates materials innovation, is discussed in the summary as well.
NASA Astrophysics Data System (ADS)
Hayasaki, Yoshio
2017-02-01
Femtosecond laser processing is a promising tool for fabricating novel and useful structures on the surfaces of and inside materials. An enormous number of pulse irradiation points is required to fabricate actual structures at the millimeter scale, and therefore the throughput of femtosecond laser processing must be improved for practical adoption of this technique. One promising method to improve throughput is parallel pulse generation based on a computer-generated hologram (CGH) displayed on a spatial light modulator (SLM), a technique called holographic femtosecond laser processing. The holographic method has advantages such as high throughput, high light-use efficiency, and variable, instantaneous, and 3D patterning. Furthermore, the use of an SLM makes it possible to correct unknown imperfections of the optical system and inhomogeneity in a sample through in-system optimization of the CGH. The CGH can also adaptively compensate for dynamic, unpredictable mechanical movements, air and liquid disturbances, and shape variation and deformation of the target sample, as well as provide adaptive wavefront control for environmental changes. It is therefore a powerful tool for processing biological cells and tissues, because they have free-form, variable, and deformable structures. In this paper, we present the principle and experimental setup of holographic femtosecond laser processing, and an effective way to process biological samples. We demonstrate femtosecond laser processing of biological materials and its processing properties.
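CGH computation for such systems is often introduced via the Gerchberg-Saxton algorithm, which iterates Fourier transforms between the SLM and focal planes while enforcing phase-only modulation on one side and the target intensity on the other. The sketch below shows that basic loop on a toy spot pattern; real holographic processing adds the in-system optimization and adaptive correction described above.

```python
# Minimal Gerchberg-Saxton sketch for computing a phase-only CGH:
# alternate FFTs between SLM and focal planes, imposing the target
# amplitude in the focal plane and unit amplitude at the SLM.
import numpy as np

target = np.zeros((64, 64))
target[20, 20] = target[20, 44] = target[44, 32] = 1.0  # desired spots

rng = np.random.default_rng(5)
field_slm = np.exp(2j * np.pi * rng.random(target.shape))  # random phase start
for _ in range(50):
    focal = np.fft.fft2(field_slm)
    focal = np.sqrt(target) * np.exp(1j * np.angle(focal))  # impose target amplitude
    field_slm = np.fft.ifft2(focal)
    field_slm = np.exp(1j * np.angle(field_slm))            # keep phase only

cgh = np.angle(field_slm)                                   # phase mask for the SLM
recon = np.abs(np.fft.fft2(np.exp(1j * cgh))) ** 2
print("energy concentrated at target spots:", recon[20, 20] > recon.mean())
```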
Dragas, Jelena; Jäckel, David; Hierlemann, Andreas; Franke, Felix
2017-01-01
Reliable real-time low-latency spike sorting with large data throughput is essential for studies of neural network dynamics and for brain-machine interfaces (BMIs), in which the stimulation of neural networks is based on the networks' most recent activity. However, the majority of existing multi-electrode spike-sorting algorithms are unsuited for processing high quantities of simultaneously recorded data. Recording from large neuronal networks using large high-density electrode sets (thousands of electrodes) imposes high demands on the data-processing hardware regarding computational complexity and data transmission bandwidth; this, in turn, entails demanding requirements in terms of chip area, memory resources and processing latency. This paper presents computational complexity optimization techniques, which facilitate the use of spike-sorting algorithms in large multi-electrode-based recording systems. The techniques are then applied to a previously published algorithm, on its own, unsuited for large electrode set recordings. Further, a real-time low-latency high-performance VLSI hardware architecture of the modified algorithm is presented, featuring a folded structure capable of processing the activity of hundreds of neurons simultaneously. The hardware is reconfigurable “on-the-fly” and adaptable to the nonstationarities of neuronal recordings. By transmitting exclusively spike time stamps and/or spike waveforms, its real-time processing offers the possibility of data bandwidth and data storage reduction. PMID:25415989
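The per-channel front end of spike sorting, detection of threshold crossings and extraction of aligned waveforms, indicates the kind of computation such hardware must sustain across thousands of electrodes. The trace, threshold rule, and window sizes below are illustrative.

```python
# Sketch of the spike-sorting front end: detect threshold crossings on a
# trace and cut out aligned waveforms. Signal here is synthetic.
import numpy as np

rng = np.random.default_rng(6)
fs = 20000                                    # samples per second
trace = rng.normal(0, 1.0, fs)                # 1 s of noise
for t in (3000, 9000, 15000):                 # inject three "spikes"
    trace[t:t + 20] -= 8 * np.hanning(20)

thresh = -4.5 * np.median(np.abs(trace)) / 0.6745   # robust noise estimate
crossings = np.flatnonzero((trace[1:] < thresh) & (trace[:-1] >= thresh))
waveforms = [trace[c - 10:c + 30] for c in crossings
             if 10 <= c < len(trace) - 30]
print(f"detected {len(waveforms)} spikes at samples {list(crossings)}")
```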
Dragas, Jelena; Jackel, David; Hierlemann, Andreas; Franke, Felix
2015-03-01
Reliable real-time low-latency spike sorting with large data throughput is essential for studies of neural network dynamics and for brain-machine interfaces (BMIs), in which the stimulation of neural networks is based on the networks' most recent activity. However, the majority of existing multi-electrode spike-sorting algorithms are unsuited for processing high quantities of simultaneously recorded data. Recording from large neuronal networks using large high-density electrode sets (thousands of electrodes) imposes high demands on the data-processing hardware regarding computational complexity and data transmission bandwidth; this, in turn, entails demanding requirements in terms of chip area, memory resources and processing latency. This paper presents computational complexity optimization techniques, which facilitate the use of spike-sorting algorithms in large multi-electrode-based recording systems. The techniques are then applied to a previously published algorithm, on its own, unsuited for large electrode set recordings. Further, a real-time low-latency high-performance VLSI hardware architecture of the modified algorithm is presented, featuring a folded structure capable of processing the activity of hundreds of neurons simultaneously. The hardware is reconfigurable “on-the-fly” and adaptable to the nonstationarities of neuronal recordings. By transmitting exclusively spike time stamps and/or spike waveforms, its real-time processing offers the possibility of data bandwidth and data storage reduction.
GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit
Pronk, Sander; Páll, Szilárd; Schulz, Roland; Larsson, Per; Bjelkmar, Pär; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik
2013-01-01
Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on a massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including Windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. Availability: GROMACS is an open source and free software available from http://www.gromacs.org. Contact: erik.lindahl@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23407358
Translational bioinformatics in the cloud: an affordable alternative
2010-01-01
With the continued exponential expansion of publicly available genomic data and access to low-cost, high-throughput molecular technologies for profiling patient populations, computational technologies and informatics are becoming vital considerations in genomic medicine. Although cloud computing technology is being heralded as a key enabling technology for the future of genomic research, available case studies are limited to applications in the domain of high-throughput sequence data analysis. The goal of this study was to evaluate the computational and economic characteristics of cloud computing in performing a large-scale data integration and analysis representative of research problems in genomic medicine. We find that the cloud-based analysis compares favorably in both performance and cost in comparison to a local computational cluster, suggesting that cloud computing technologies might be a viable resource for facilitating large-scale translational research in genomic medicine. PMID:20691073
Lee, Chankyun; Cao, Xiaoyuan; Yoshikane, Noboru; Tsuritani, Takehiro; Rhee, June-Koo Kevin
2015-10-19
The feasibility of software-defined optical networking (SDON) for practical applications depends critically on the scalability of centralized control performance. In this paper, highly scalable routing and wavelength assignment (RWA) algorithms are investigated on an OpenFlow-based SDON testbed for proof-of-concept demonstration. Efficient RWA algorithms are proposed that achieve high network capacity with reduced computation cost, a significant attribute in a scalable centralized-control SDON. The proposed heuristic RWA algorithms differ in the order in which requests are processed and in the procedures for routing table updates. Combined with a shortest-path-based routing algorithm, a hottest-request-first processing policy that considers demand intensity and end-to-end distance information offers both the highest network throughput and acceptable computation scalability. We further investigate the trade-off between network throughput and computation complexity in the routing table update procedure through a simulation study.
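The hottest-request-first policy can be sketched in a few lines: rank requests by demand intensity and path length, route each over a shortest path, and assign the first wavelength free on every hop. The sketch below assumes networkx and a toy ring topology; the paper's algorithms and testbed are considerably richer.

```python
# Sketch of hottest-request-first RWA: rank requests by intensity times
# path length, route over shortest paths, assign first-fit wavelengths.
# Topology, demands, and wavelength count are toy values.
import networkx as nx

G = nx.cycle_graph(6)                          # 6-node ring network
W = 4                                          # wavelengths per link
used = {tuple(sorted(e)): set() for e in G.edges}

requests = [(0, 3, 5), (1, 4, 2), (2, 5, 9)]   # (src, dst, intensity)
requests.sort(key=lambda r: r[2] * len(nx.shortest_path(G, r[0], r[1])),
              reverse=True)                    # hottest-request-first

for src, dst, _ in requests:
    path = nx.shortest_path(G, src, dst)
    hops = [tuple(sorted((a, b))) for a, b in zip(path, path[1:])]
    for wl in range(W):                        # first-fit wavelength
        if all(wl not in used[h] for h in hops):
            for h in hops:
                used[h].add(wl)
            print(f"{src}->{dst}: path {path}, wavelength {wl}")
            break
    else:
        print(f"{src}->{dst}: blocked")
```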
High-throughput search for caloric materials: the CaloriCool approach
NASA Astrophysics Data System (ADS)
Zarkevich, N. A.; Johnson, D. D.; Pecharsky, V. K.
2018-01-01
The high-throughput search paradigm adopted by the newly established caloric materials consortium—CaloriCool®—with the goal of substantially accelerating the discovery and design of novel caloric materials is briefly discussed. We begin by describing material selection criteria based on known properties, which are then followed by fast heuristic estimates and ab initio calculations, all of which have been implemented in a set of automated computational tools and measurements. We also demonstrate how theoretical and computational methods serve as a guide for experimental efforts by considering a representative example from the field of magnetocaloric materials.
High-throughput search for caloric materials: the CaloriCool approach
Zarkevich, Nikolai A.; Johnson, Duane D.; Pecharsky, V. K.
2017-12-13
The high-throughput search paradigm adopted by the newly established caloric materials consortium—CaloriCool®—with the goal of substantially accelerating the discovery and design of novel caloric materials is briefly discussed. Here, we begin by describing material selection criteria based on known properties, which are then followed by fast heuristic estimates and ab initio calculations, all of which have been implemented in a set of automated computational tools and measurements. We also demonstrate how theoretical and computational methods serve as a guide for experimental efforts by considering a representative example from the field of magnetocaloric materials.
Making big sense from big data in toxicology by read-across.
Hartung, Thomas
2016-01-01
Modern information technologies have made big data available in the safety sciences, i.e., extremely large data sets that may be analyzed only computationally to reveal patterns, trends and associations. This happens by (1) compilation of large sets of existing data, e.g., as a result of the European REACH regulation, (2) the use of omics technologies and (3) systematic robotized testing in a high-throughput manner. All three approaches, and some other high-content technologies, leave us with big data; the challenge is now to make big sense of these data. Read-across, i.e., the local similarity-based intrapolation of properties, is gaining momentum with increasing data availability and consensus on how to process and report it. It is predominantly applied to in vivo test data as a gap-filling approach, but can similarly complement other incomplete datasets. Big data are first of all repositories for finding similar substances and ensure that the available data are fully exploited. High-content and high-throughput approaches similarly require focusing on clusters, in this case formed by underlying mechanisms such as pathways of toxicity. The closely connected properties, i.e., structural and biological similarity, create the confidence needed for predictions of toxic properties. Here, among other tools, a new web-based tool under development called REACH-across, which aims to support and automate structure-based read-across, is presented.
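The similarity-based core of read-across can be sketched with chemical fingerprints: predict a query compound's property as a similarity-weighted average over its nearest structural neighbors. The sketch below assumes RDKit; the compounds and "toxicity" values are invented, and REACH-across itself works quite differently at scale.

```python
# Hedged sketch of similarity-based read-across: predict a property of a
# query chemical from its Tanimoto-nearest neighbors over Morgan
# fingerprints. Compounds and values are toy examples.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

knowns = {"CCO": 0.2, "CCCO": 0.25, "c1ccccc1": 0.8}  # SMILES -> toy toxicity

def fingerprint(smiles):
    mol = Chem.MolFromSmiles(smiles)
    return AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)

def read_across(query_smiles, k=2):
    qfp = fingerprint(query_smiles)
    sims = sorted(((DataStructs.TanimotoSimilarity(qfp, fingerprint(s)), v)
                   for s, v in knowns.items()), reverse=True)[:k]
    # similarity-weighted average of the nearest neighbors' values
    wsum = sum(s for s, _ in sims)
    return sum(s * v for s, v in sims) / wsum if wsum else None

print("predicted toxicity for CCCCO:", round(read_across("CCCCO"), 2))
```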
Handheld Fluorescence Microscopy based Flow Analyzer.
Saxena, Manish; Jayakumar, Nitin; Gorthi, Sai Siva
2016-03-01
Fluorescence microscopy has the intrinsic advantages of favourable contrast characteristics and a high degree of specificity. Consequently, it has been a mainstay in modern biological inquiry and clinical diagnostics. Despite its reliable nature, fluorescence-based clinical microscopy and diagnostics is a manual, labour-intensive and time-consuming procedure. This article outlines a cost-effective, high-throughput alternative to conventional fluorescence imaging techniques. With system-level integration of custom-designed microfluidics and optics, we demonstrate a fluorescence-microscopy-based imaging flow analyzer. Using this system we have imaged more than 2900 FITC-labeled fluorescent beads per minute, demonstrating the high-throughput characteristics of our flow analyzer in comparison to conventional fluorescence microscopy. The issue of motion blur at high flow rates limits the achievable throughput in image-based flow analyzers. Here we address the issue by computationally deblurring the images and show that this restores the morphological features otherwise affected by motion blur. By further optimizing the concentration of the sample solution and the flow speeds, along with imaging multiple channels simultaneously, the system is capable of providing a throughput of about 480 beads per second.
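The deblurring step can be illustrated with a frequency-domain Wiener filter: model motion blur as convolution with a streak-shaped point-spread function and approximately invert it. PSF length, noise level, and the regularization constant below are illustrative choices, not the flow analyzer's actual parameters.

```python
# Sketch of computational motion deblurring: blur is modeled as
# convolution with a horizontal point-spread function (PSF) and a Wiener
# filter approximately inverts it. Parameters are illustrative.
import numpy as np

def motion_psf(shape, length=9):
    psf = np.zeros(shape)
    psf[0, :length] = 1.0 / length            # horizontal streak
    return psf

def wiener_deblur(blurred, psf, k=0.01):
    H = np.fft.fft2(psf)
    G = np.fft.fft2(blurred)
    W = np.conj(H) / (np.abs(H) ** 2 + k)     # Wiener filter
    return np.real(np.fft.ifft2(W * G))

rng = np.random.default_rng(7)
img = np.zeros((64, 64)); img[28:36, 28:36] = 1.0     # a "bead"
psf = motion_psf(img.shape)
blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(psf)))
restored = wiener_deblur(blurred + rng.normal(0, 0.01, img.shape), psf)
print("peak intensity recovered:", restored.max() > blurred.max())
```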
A high-throughput exploration of magnetic materials by using structure predicting methods
NASA Astrophysics Data System (ADS)
Arapan, S.; Nieves, P.; Cuesta-López, S.
2018-02-01
We study the capability of a structure-predicting method based on a genetic/evolutionary algorithm for the high-throughput exploration of magnetic materials. We use the USPEX and VASP codes to predict stable structures and to generate low-energy metastable structures for a set of representative magnetic systems comprising intermetallic alloys, oxides, interstitial compounds, and systems containing rare-earth elements, for both ferromagnetic and antiferromagnetic ordering. We have modified the interface between the USPEX and VASP codes to improve the performance of structural optimization and to perform calculations in a high-throughput manner. We show that exploring the structural phase space with a structure-predicting technique reveals large sets of low-energy metastable structures, which not only enrich currently existing databases, but may also provide understanding and solutions to stabilize and synthesize magnetic materials suitable for permanent magnet applications.
High-Performance Mixed Models Based Genome-Wide Association Analysis with omicABEL software
Fabregat-Traver, Diego; Sharapov, Sodbo Zh.; Hayward, Caroline; Rudan, Igor; Campbell, Harry; Aulchenko, Yurii; Bientinesi, Paolo
2014-01-01
To raise the power of genome-wide association studies (GWAS) and avoid false-positive results in structured populations, one can rely on mixed-model based tests. When large samples are used, and when multiple traits are to be studied in the 'omics' context, this approach becomes computationally challenging. Here we consider the problem of mixed-model based GWAS for an arbitrary number of traits, and demonstrate that different computational algorithms are optimal for the single-trait and multiple-trait scenarios. We implement these optimal algorithms in a high-performance computing framework that uses state-of-the-art linear algebra kernels, incorporates optimizations, and avoids redundant computations, increasing throughput while reducing memory usage and energy consumption. We show that, compared to existing libraries, our algorithms and software achieve considerable speed-ups. The OmicABEL software described in this manuscript is available under the GNU GPL v. 3 license as part of the GenABEL project for statistical genomics at http://www.genabel.org/packages/OmicABEL. PMID:25717363
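The standard computational trick behind fast single-trait mixed-model GWAS is to eigendecompose the kinship matrix once, rotate phenotype and genotypes into the eigenbasis where the covariance is diagonal, and then run cheap weighted regressions per SNP. The sketch below illustrates that rotation on toy data; omicABEL's optimal multi-trait algorithms go beyond it.

```python
# Sketch of eigendecomposition-based mixed-model GWAS (single trait):
# rotate into the kinship eigenbasis, then run fast weighted per-SNP
# regressions. Data and the variance ratio h2 are toy values.
import numpy as np

rng = np.random.default_rng(8)
n, snps = 200, 50
K = np.cov(rng.normal(size=(n, 400)))          # toy kinship matrix
y = rng.normal(size=n)
G = rng.integers(0, 3, size=(n, snps)).astype(float)

h2 = 0.5                                       # assumed variance ratio
vals, vecs = np.linalg.eigh(K)
w = 1.0 / (h2 * vals + (1 - h2))               # inverse rotated covariance
y_r, G_r = vecs.T @ y, vecs.T @ G

beta = (G_r * w[:, None] * y_r[:, None]).sum(0) / (G_r**2 * w[:, None]).sum(0)
se = np.sqrt(1.0 / (G_r**2 * w[:, None]).sum(0))
print("strongest association: SNP", int(np.argmax(np.abs(beta / se))))
```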
High-Performance Mixed Models Based Genome-Wide Association Analysis with omicABEL software.
Fabregat-Traver, Diego; Sharapov, Sodbo Zh; Hayward, Caroline; Rudan, Igor; Campbell, Harry; Aulchenko, Yurii; Bientinesi, Paolo
2014-01-01
To raise the power of genome-wide association studies (GWAS) and avoid false-positive results in structured populations, one can rely on mixed-model based tests. When large samples are used, and when multiple traits are to be studied in the 'omics' context, this approach becomes computationally challenging. Here we consider the problem of mixed-model based GWAS for an arbitrary number of traits, and demonstrate that different computational algorithms are optimal for the single-trait and multiple-trait scenarios. We implement these optimal algorithms in a high-performance computing framework that uses state-of-the-art linear algebra kernels, incorporates optimizations, and avoids redundant computations, increasing throughput while reducing memory usage and energy consumption. We show that, compared to existing libraries, our algorithms and software achieve considerable speed-ups. The OmicABEL software described in this manuscript is available under the GNU GPL v. 3 license as part of the GenABEL project for statistical genomics at http://www.genabel.org/packages/OmicABEL.
OSG-GEM: Gene Expression Matrix Construction Using the Open Science Grid.
Poehlman, William L; Rynge, Mats; Branton, Chris; Balamurugan, D; Feltus, Frank A
2016-01-01
High-throughput DNA sequencing technology has revolutionized the study of gene expression while introducing significant computational challenges for biologists. These computational challenges include access to sufficient computer hardware and functional data processing workflows. Both these challenges are addressed with our scalable, open-source Pegasus workflow for processing high-throughput DNA sequence datasets into a gene expression matrix (GEM) using computational resources available to U.S.-based researchers on the Open Science Grid (OSG). We describe the usage of the workflow (OSG-GEM), discuss workflow design, inspect performance data, and assess accuracy in mapping paired-end sequencing reads to a reference genome. A target OSG-GEM user is proficient with the Linux command line and possesses basic bioinformatics experience. The user may run this workflow directly on the OSG or adapt it to novel computing environments.
OSG-GEM: Gene Expression Matrix Construction Using the Open Science Grid
Poehlman, William L.; Rynge, Mats; Branton, Chris; Balamurugan, D.; Feltus, Frank A.
2016-01-01
High-throughput DNA sequencing technology has revolutionized the study of gene expression while introducing significant computational challenges for biologists. These computational challenges include access to sufficient computer hardware and functional data processing workflows. Both these challenges are addressed with our scalable, open-source Pegasus workflow for processing high-throughput DNA sequence datasets into a gene expression matrix (GEM) using computational resources available to U.S.-based researchers on the Open Science Grid (OSG). We describe the usage of the workflow (OSG-GEM), discuss workflow design, inspect performance data, and assess accuracy in mapping paired-end sequencing reads to a reference genome. A target OSG-GEM user is proficient with the Linux command line and possesses basic bioinformatics experience. The user may run this workflow directly on the OSG or adapt it to novel computing environments. PMID:27499617
Perspectives on pathway perturbation: Focused research to enhance 3R objectives
In vitro high-throughput screening (HTS) and in silico technologies are emerging as 21st century tools for hazard identification. Computational methods that strategically examine cross-species conservation of protein sequence/structural information for chemical molecular targets ...
LOCATE: a mouse protein subcellular localization database
Fink, J. Lynn; Aturaliya, Rajith N.; Davis, Melissa J.; Zhang, Fasheng; Hanson, Kelly; Teasdale, Melvena S.; Kai, Chikatoshi; Kawai, Jun; Carninci, Piero; Hayashizaki, Yoshihide; Teasdale, Rohan D.
2006-01-01
We present here LOCATE, a curated, web-accessible database that houses data describing the membrane organization and subcellular localization of proteins from the FANTOM3 Isoform Protein Sequence set. Membrane organization is predicted by the high-throughput, computational pipeline MemO. The subcellular locations of selected proteins from this set were determined by a high-throughput, immunofluorescence-based assay and by manually reviewing >1700 peer-reviewed publications. LOCATE represents the first effort to catalogue the experimentally verified subcellular location and membrane organization of mammalian proteins using a high-throughput approach and provides localization data for ∼40% of the mouse proteome. It is available online. PMID:16381849
High-throughput GPU-based LDPC decoding
NASA Astrophysics Data System (ADS)
Chang, Yang-Lang; Chang, Cheng-Chun; Huang, Min-Yu; Huang, Bormin
2010-08-01
Low-density parity-check (LDPC) codes are linear block codes known to approach the Shannon limit via the iterative sum-product algorithm. LDPC codes have been adopted in most current communication systems such as DVB-S2, WiMAX, Wi-Fi and 10GBASE-T. The need for reliable and flexible communication links across a wide variety of communication standards and configurations has inspired demand for high-performance, flexible computing. Accordingly, finding a fast and reconfigurable development platform for designing high-throughput LDPC decoders has become important, especially for rapidly changing communication standards and configurations. In this paper, a new graphics-processing-unit (GPU) LDPC decoding platform with asynchronous data transfer is proposed to realize this practical implementation. Experimental results showed that the proposed GPU-based decoder achieved a 271x speedup compared to its CPU-based counterpart. It can serve as a high-throughput LDPC decoder.
High-throughput sample adaptive offset hardware architecture for high-efficiency video coding
NASA Astrophysics Data System (ADS)
Zhou, Wei; Yan, Chang; Zhang, Jingzhi; Zhou, Xin
2018-03-01
A high-throughput hardware architecture for a sample adaptive offset (SAO) filter in the High Efficiency Video Coding (HEVC) standard is presented. First, an implementation-friendly and simplified bitrate estimation method for rate-distortion cost calculation is proposed to reduce the computational complexity of the SAO mode decision. Then, a high-throughput VLSI architecture for SAO is presented based on the proposed bitrate estimation method. Furthermore, a multiparallel VLSI architecture for in-loop filters, which integrates both the deblocking filter and the SAO filter, is proposed. Six parallel strategies are applied in the proposed in-loop filter architecture to improve the system throughput and filtering speed. Experimental results show that the proposed in-loop filter architecture can achieve up to 48% higher throughput in comparison with prior work. The proposed architecture reaches a high operating clock frequency of 297 MHz with a TSMC 65-nm library and meets the real-time requirement of the in-loop filters for the 8K × 4K video format at 132 fps.
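The SAO edge-offset classification that such architectures parallelize compares each sample with two neighbors along a chosen direction and assigns one of four categories (or none) before adding a per-category offset. The sketch below implements that classification on a toy row of samples; the offsets are illustrative rather than rate-distortion-optimized.

```python
# Sketch of SAO edge-offset classification along one direction: each
# sample is compared with its two neighbors and assigned one of four
# edge categories, then a per-category offset is added. Toy offsets.
import numpy as np

def sao_edge_offset(row: np.ndarray, offsets=(2, 1, -1, -2)) -> np.ndarray:
    out = row.astype(int).copy()
    for i in range(1, len(row) - 1):
        a, c, b = int(row[i - 1]), int(row[i]), int(row[i + 1])
        if c < a and c < b:
            cat = 1    # local valley
        elif (c < a and c == b) or (c == a and c < b):
            cat = 2    # concave corner
        elif (c > a and c == b) or (c == a and c > b):
            cat = 3    # convex corner
        elif c > a and c > b:
            cat = 4    # local peak
        else:
            cat = 0    # monotone or flat: no offset
        if cat:
            out[i] += offsets[cat - 1]
    return out

row = np.array([10, 8, 10, 12, 12, 15, 11])
print(sao_edge_offset(row))
```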
High-throughput screening based on label-free detection of small molecule microarrays
NASA Astrophysics Data System (ADS)
Zhu, Chenggang; Fei, Yiyan; Zhu, Xiangdong
2017-02-01
Based on small-molecule microarrays (SMMs) and an oblique-incidence reflectivity difference (OI-RD) scanner, we have developed a novel high-throughput preliminary drug screening platform built on label-free monitoring of direct interactions between target proteins and immobilized small molecules. The screening platform is especially attractive for screening compounds against targets of unknown function and/or structure that are not compatible with functional assay development. In this platform, the OI-RD scanner serves as a label-free detection instrument that is able to monitor about 15,000 biomolecular interactions in a single experiment without the need to label any biomolecule. In addition, SMMs serve as a novel format for high-throughput screening through the immobilization of tens of thousands of different compounds on a single phenyl-isocyanate-functionalized glass slide. Using this high-throughput screening platform, we sequentially screened five target proteins (purified target proteins or cell lysates containing the target protein) in a high-throughput, label-free mode. We found hits for each target protein, and the inhibitory effects of some hits were confirmed by follow-up functional assays. Compared to traditional high-throughput screening assays, this platform has many advantages, including minimal sample consumption, minimal distortion of interactions through label-free detection, and multi-target screening analysis, and it has great potential to be a complementary screening platform in the field of drug discovery.
Software Voting in Asynchronous NMR (N-Modular Redundancy) Computer Structures.
1983-05-06
added reliability is exchanged for increased system cost and decreased throughput. Some applications require extremely reliable systems, so the only... not the other way around. Although no systems provide abstract voting yet, as more applications are written for NMR systems, the programmers are going... throughput goes down, the overhead goes up. Mathematically: Overhead = Non-redundant Throughput - Actual Throughput (1). In this section, the actual throughput
Predicting organ toxicity using in vitro bioactivity data and chemical structure
Animal testing alone cannot practically evaluate the health hazard posed by tens of thousands of environmental chemicals. Computational approaches together with high-throughput experimental data may provide more efficient means to predict chemical toxicity. Here, we use a superv...
The US EPA ToxCast Program: Moving from Data Generation ...
The U.S. EPA ToxCast program is entering its tenth year. Significant learning and progress have occurred in the collection, analysis, and interpretation of the data. The library of ~1,800 chemicals has been subject to ongoing characterization (e.g., identity, purity, stability) and is unique in its scope, structural diversity, and use scenarios, making it ideally suited to investigate the underlying molecular mechanisms of toxicity. The ~700 high-throughput in vitro assay endpoints cover 327 genes and 293 pathways as well as other integrated cellular processes and responses. The integrated analysis of high-throughput screening data has shown that most environmental and industrial chemicals are very non-selective in the biological targets they perturb, while a small subset of chemicals is relatively selective for specific biological targets. The selectivity of a chemical informs interpretation of the screening results while also guiding future mode-of-action or adverse outcome pathway approaches. Coupling the high-throughput in vitro assays with medium-throughput pharmacokinetic assays and reverse dosimetry allows conversion of the potency estimates to an administered dose. Comparison of the administered dose to human exposure provides a risk-based context. The lessons learned from this effort will be presented and discussed with a view towards application to chemical safety decision making and the future of the computational toxicology program at the U.S. EPA. SOT pr
Algorithm for fast event parameters estimation on GEM acquired data
NASA Astrophysics Data System (ADS)
Linczuk, Paweł; Krawczyk, Rafał D.; Poźniak, Krzysztof T.; Kasprowicz, Grzegorz; Wojeński, Andrzej; Chernyshova, Maryna; Czarski, Tomasz
2016-09-01
We present a study of a software-hardware environment for developing fast computation methods with high throughput and low latency, which can be used as a back-end in High Energy Physics (HEP) and other High Performance Computing (HPC) systems fed by large volumes of input from electronic sensor-based front-ends. We discuss and test parallelization possibilities on Intel HPC solutions, with consideration of applications in Gas Electron Multiplier (GEM) measurement systems.
High-throughput Crystallography for Structural Genomics
Joachimiak, Andrzej
2009-01-01
Protein X-ray crystallography recently celebrated its 50th anniversary. The structures of myoglobin and hemoglobin determined by Kendrew and Perutz provided the first glimpses into complex protein architecture and chemistry. Since then, the field of structural molecular biology has experienced extraordinary progress, and now over 53,000 protein structures have been deposited into the Protein Data Bank. In the past decade many advances in macromolecular crystallography have been driven by world-wide structural genomics efforts. This was made possible by third-generation synchrotron sources, structure phasing approaches using anomalous signal, and cryo-crystallography. Complementary progress in molecular biology, proteomics, hardware and software for crystallographic data collection, structure determination and refinement, computer science, databases, robotics and automation has improved and accelerated many processes. These advancements provide a robust foundation for structural molecular biology and assure a strong contribution to science in the future. In this report we focus mainly on reviewing structural genomics high-throughput X-ray crystallography technologies and their impact. PMID:19765976
Computational Approaches to Phenotyping
Lussier, Yves A.; Liu, Yang
2007-01-01
The recent completion of the Human Genome Project has made possible a high-throughput “systems approach” for accelerating the elucidation of molecular underpinnings of human diseases, and subsequent derivation of molecular-based strategies to more effectively prevent, diagnose, and treat these diseases. Although altered phenotypes are among the most reliable manifestations of altered gene functions, research using systematic analysis of phenotype relationships to study human biology is still in its infancy. This article focuses on the emerging field of high-throughput phenotyping (HTP) phenomics research, which aims to capitalize on novel high-throughput computation and informatics technology developments to derive genomewide molecular networks of genotype–phenotype associations, or “phenomic associations.” The HTP phenomics research field faces the challenge of technological research and development to generate novel tools in computation and informatics that will allow researchers to amass, access, integrate, organize, and manage phenotypic databases across species and enable genomewide analysis to associate phenotypic information with genomic data at different scales of biology. Key state-of-the-art technological advancements critical for HTP phenomics research are covered in this review. In particular, we highlight the power of computational approaches to conduct large-scale phenomics studies. PMID:17202287
You, Zhu-Hong; Li, Shuai; Gao, Xin; Luo, Xin; Ji, Zhen
2014-01-01
Protein-protein interactions (PPIs) are the basis of biological functions, and studying these interactions at the molecular level is of crucial importance for understanding the functionality of a living cell. During the past decade, biosensors have emerged as an important tool for the high-throughput identification of proteins and their interactions. However, high-throughput experimental methods for identifying PPIs are both time-consuming and expensive, and high-throughput PPI data are often associated with high false-positive and high false-negative rates. To address these problems, we propose a method for PPI detection that integrates biosensor-based PPI data with a novel computational model. The method is based on the extreme learning machine algorithm combined with a novel representation of protein sequence descriptors. When applied to a large-scale human protein interaction dataset, the proposed method achieved 84.8% prediction accuracy with 84.08% sensitivity at a specificity of 85.53%. We conducted more extensive experiments to compare the proposed method with a state-of-the-art technique, the support vector machine. The results demonstrate that our approach is very promising for detecting new PPIs and can be a helpful supplement to biosensor-based PPI detection.
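The core of an extreme learning machine is simple: hidden-layer weights are drawn at random and frozen, and only the output weights are solved in closed form by least squares. The sketch below illustrates that idea under stated assumptions; the feature dimensions, labels, and the `ELM` class itself are illustrative placeholders, not the authors' implementation.

```python
# Minimal sketch of an extreme learning machine (ELM) classifier, the family
# of model the abstract describes. All sizes and data here are toy values.
import numpy as np

class ELM:
    def __init__(self, n_hidden=512, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        # Random input weights are fixed; only output weights are learned.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)           # hidden-layer activations
        self.beta = np.linalg.pinv(H) @ y          # least-squares output weights
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return (H @ self.beta > 0.5).astype(int)   # threshold for binary labels

# Usage with random stand-in "sequence descriptor" features:
X = np.random.rand(200, 400)                       # 200 protein pairs, 400-d descriptors
y = np.random.randint(0, 2, 200)                   # 1 = interacting, 0 = not
model = ELM().fit(X[:150], y[:150])
print((model.predict(X[150:]) == y[150:]).mean())  # held-out accuracy
```

Because training reduces to one pseudoinverse, an ELM fits in a fraction of the time an iteratively trained network or kernel SVM needs, which is the practical appeal for large PPI datasets.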
A Primer on High-Throughput Computing for Genomic Selection
Wu, Xiao-Lin; Beissinger, Timothy M.; Bauck, Stewart; Woodward, Brent; Rosa, Guilherme J. M.; Weigel, Kent A.; Gatti, Natalia de Leon; Gianola, Daniel
2011-01-01
High-throughput computing (HTC) uses computer clusters to solve advanced computational problems, with the goal of accomplishing high throughput over relatively long periods of time. In genomic selection, for example, a set of markers covering the entire genome is used to train a model based on known data, and the resulting model is used to predict the genetic merit of selection candidates. Sophisticated models are very computationally demanding and, with several traits to be evaluated sequentially, computing time is long and output is low. In this paper, we present scenarios and basic principles of how HTC can be used in genomic selection, implemented using various techniques from simple batch processing to pipelining in distributed computer clusters. Various scripting languages, such as shell scripting, Perl, and R, are also very useful for devising pipelines. By pipelining, we can reduce total computing time and consequently increase throughput. In comparison to the traditional data processing pipeline residing on central processors, performing general-purpose computation on a graphics processing unit provides a new-generation approach to massively parallel computing in genomic selection. While the concept of HTC may still be new to many researchers in animal breeding, plant breeding, and genetics, HTC infrastructures have already been built in many institutions, such as the University of Wisconsin–Madison, which can be leveraged for genomic selection in terms of central processing unit capacity, network connectivity, storage availability, and middleware connectivity. Exploring existing HTC infrastructures as well as general-purpose computing environments will further expand our capability to meet the increasing computing demands posed by the unprecedented genomic data available today. We anticipate that HTC will impact genomic selection via better statistical models, faster solutions, and more competitive products (e.g., from design of marker panels to realized genetic gain). Eventually, HTC may change our view of data analysis as well as decision-making in the post-genomic era of selection programs in animals and plants, or in the study of complex diseases in humans. PMID:22303303
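The batch-processing pattern the primer describes, independent per-trait jobs farmed out to a pool of workers instead of run sequentially, can be sketched in a few lines. The trait names and the stand-in work function below are invented for illustration; in a real HTC setting each task would be a model fit submitted to a cluster scheduler.

```python
# Illustrative sketch of batch parallelism over traits in genomic selection.
from concurrent.futures import ProcessPoolExecutor
import time

def evaluate_trait(trait):
    time.sleep(1.0)  # stand-in for fitting a genomic-prediction model
    return trait, f"predicted breeding values for {trait}"

if __name__ == "__main__":
    traits = ["milk_yield", "fertility", "longevity", "somatic_cell_score"]
    start = time.time()
    with ProcessPoolExecutor(max_workers=4) as pool:
        for trait, result in pool.map(evaluate_trait, traits):
            print(trait, "->", result)
    print(f"4 traits in {time.time() - start:.1f}s rather than ~4s sequentially")
```

The same structure scales from one multicore machine to a cluster by swapping the executor for a job-submission layer; the per-trait code does not change.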
Gore, Brooklin
2018-02-01
This presentation includes a brief background on High Throughput Computing, correlating gene transcription factors, optical mapping, genotype to phenotype mapping via QTL analysis, and current work on next gen sequencing.
Sharlow, Elizabeth R.; Lyda, Todd A.; Dodson, Heidi C.; Mustata, Gabriela; Morris, Meredith T.; Leimgruber, Stephanie S.; Lee, Kuo-Hsiung; Kashiwada, Yoshiki; Close, David; Lazo, John S.; Morris, James C.
2010-01-01
Background The parasitic protozoan Trypanosoma brucei utilizes glycolysis exclusively for ATP production during infection of the mammalian host. The first step in this metabolic pathway is mediated by hexokinase (TbHK), an enzyme essential to the parasite that transfers the γ-phosphate of ATP to a hexose. Here we describe the identification and confirmation of novel small molecule inhibitors of bacterially expressed TbHK1, one of two TbHKs expressed by T. brucei, using a high throughput screening assay. Methodology/Principal Findings Exploiting optimized high throughput screening assay procedures, we interrogated 220,233 unique compounds and identified 239 active compounds, from which ten small molecules were further characterized. Computational chemical cluster analyses indicated that six compounds were structurally related, while the remaining four compounds were classified as unrelated or singletons. All ten compounds were ∼20-17,000-fold more potent than lonidamine, a previously identified TbHK1 inhibitor. Seven compounds inhibited T. brucei bloodstream-form parasite growth (0.03≤EC50<3 µM), with parasite specificity of the compounds being demonstrated using insect stage T. brucei parasites, Leishmania promastigotes, and mammalian cell lines. Analysis of two structurally related compounds, ebselen and SID 17387000, revealed that both were mixed inhibitors of TbHK1 with respect to ATP. Additionally, both compounds inhibited parasite lysate-derived HK activity. None of the compounds displayed structural similarity to known hexokinase inhibitors or human African trypanosomiasis therapeutics. Conclusions/Significance The novel chemotypes identified here could represent leads for future therapeutic development against the African trypanosome. PMID:20405000
High-throughput literature mining to support read-across ...
Building scientific confidence in the development and evaluation of read-across remains an ongoing challenge. Approaches include establishing systematic frameworks to identify sources of uncertainty and ways to address them. One source of uncertainty is related to characterizing biological similarity. Many research efforts are underway, such as structuring mechanistic data in adverse outcome pathways and investigating the utility of high throughput (HT)/high content (HC) screening data. A largely untapped resource for read-across to date is the biomedical literature. This information has the potential to support read-across by facilitating the identification of valid source analogues with similar biological and toxicological profiles as well as providing the mechanistic understanding for any prediction made. A key challenge in using biomedical literature is to convert and translate its unstructured form into a computable format that can be linked to chemical structure. We developed a novel text-mining strategy to represent literature information for read-across. Keywords were used to organize literature into toxicity signatures at the chemical level. These signatures were integrated with HT in vitro data and curated chemical structures. A rule-based algorithm assessed the strength of the literature relationship, providing a mechanism to rank and visualize the signature as literature ToxPIs (LitToxPIs). LitToxPIs were developed for over 6,000 chemicals for a varie
2011-06-01
4. Conclusion The Web-based AGeS system described in this paper is a computationally efficient and scalable system for high-throughput genome...method for protecting web services involves making them more resilient to attack using autonomic computing techniques. This paper presents our initial...20–23, 2011 2011 DoD High Performance Computing Modernization Program Users Group Conference HPCMP UGC 2011 The papers in this book comprise the
Monleón, Daniel; Colson, Kimberly; Moseley, Hunter N B; Anklin, Clemens; Oswald, Robert; Szyperski, Thomas; Montelione, Gaetano T
2002-01-01
Rapid data collection, spectral referencing, processing by time domain deconvolution, peak picking and editing, and assignment of NMR spectra are necessary components of any efficient integrated system for protein NMR structure analysis. We have developed a set of software tools designated AutoProc, AutoPeak, and AutoAssign, which function together with the data processing and peak-picking programs NMRPipe and Sparky, to provide an integrated software system for rapid analysis of protein backbone resonance assignments. In this paper we demonstrate that these tools, together with high-sensitivity triple resonance NMR cryoprobes for data collection and a Linux-based computer cluster architecture, can be combined to provide nearly complete backbone resonance assignments and secondary structures (based on chemical shift data) for a 59-residue protein in less than 30 hours of data collection and processing time. In this optimum case of a small protein providing excellent spectra, extensive backbone resonance assignments could also be obtained using less than 6 hours of data collection and processing time. These results demonstrate the feasibility of high throughput triple resonance NMR for determining resonance assignments and secondary structures of small proteins, and the potential for applying NMR in large scale structural proteomics projects.
Kavlock, Robert; Dix, David
2010-02-01
Computational toxicology is the application of mathematical and computer models to help assess chemical hazards and risks to human health and the environment. Supported by advances in informatics, high-throughput screening (HTS) technologies, and systems biology, the U.S. Environmental Protection Agency (EPA) is developing robust and flexible computational tools that can be applied to the thousands of chemicals in commerce, and contaminant mixtures found in air, water, and hazardous-waste sites. The Office of Research and Development (ORD) Computational Toxicology Research Program (CTRP) is composed of three main elements. The largest component is the National Center for Computational Toxicology (NCCT), which was established in 2005 to coordinate research on chemical screening and prioritization, informatics, and systems modeling. The second element consists of related activities in the National Health and Environmental Effects Research Laboratory (NHEERL) and the National Exposure Research Laboratory (NERL). The third and final component consists of academic centers working on various aspects of computational toxicology and funded by the U.S. EPA Science to Achieve Results (STAR) program. Together these elements form the key components in the implementation of both the initial strategy, A Framework for a Computational Toxicology Research Program (U.S. EPA, 2003), and the newly released The U.S. Environmental Protection Agency's Strategic Plan for Evaluating the Toxicity of Chemicals (U.S. EPA, 2009a). Key intramural projects of the CTRP include digitizing legacy toxicity testing information into the toxicity reference database (ToxRefDB), predicting toxicity (ToxCast) and exposure (ExpoCast), and creating virtual liver (v-Liver) and virtual embryo (v-Embryo) systems models. U.S. EPA-funded STAR centers are also providing bioinformatics, computational toxicology data and models, and developmental toxicity data and models. The models and underlying data are being made publicly available through the Aggregated Computational Toxicology Resource (ACToR), the Distributed Structure-Searchable Toxicity (DSSTox) Database Network, and other U.S. EPA websites. While initially focused on improving the hazard identification process, the CTRP is placing increasing emphasis on using high-throughput bioactivity profiling data in systems modeling to support quantitative risk assessments, and on developing complementary higher-throughput exposure models. This integrated approach will enable analysis of life-stage susceptibility, and understanding of the exposures, pathways, and key events by which chemicals exert their toxicity in developing systems (e.g., endocrine-related pathways). The CTRP will be a critical component in next-generation risk assessments utilizing quantitative high-throughput data and providing a much higher capacity for assessing chemical toxicity than is currently available.
High-throughput bioinformatics with the Cyrille2 pipeline system
Fiers, Mark WEJ; van der Burgt, Ate; Datema, Erwin; de Groot, Joost CW; van Ham, Roeland CHJ
2008-01-01
Background Modern omics research involves the application of high-throughput technologies that generate vast volumes of data. These data need to be pre-processed, analyzed and integrated with existing knowledge through the use of diverse sets of software tools, models and databases. The analyses are often interdependent and chained together to form complex workflows or pipelines. Given the volume of the data used and the multitude of computational resources available, specialized pipeline software is required to make high-throughput analysis of large-scale omics datasets feasible. Results We have developed a generic pipeline system called Cyrille2. The system is modular in design and consists of three functionally distinct parts: 1) a web-based graphical user interface (GUI) that enables a pipeline operator to manage the system; 2) the Scheduler, which forms the functional core of the system and which tracks what data enter the system and determines what jobs must be scheduled for execution; and 3) the Executor, which searches for scheduled jobs and executes them on a compute cluster. Conclusion The Cyrille2 system is an extensible, modular system implementing the stated requirements. Cyrille2 enables easy creation and execution of high-throughput, flexible bioinformatics pipelines. PMID:18269742
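The Scheduler/Executor split described above is a general pattern: one component notices unprocessed inputs and queues jobs, another drains the queue and runs them. A toy sketch follows; Cyrille2 itself tracks this state in a database and dispatches to a cluster, so the file names and in-memory structures here are purely illustrative.

```python
# Toy sketch of a scheduler/executor pipeline core.
from queue import Queue

processed, job_queue = set(), Queue()

def scheduler(available_inputs):
    for item in available_inputs:
        if item not in processed:           # dependency/state tracking
            job_queue.put(("annotate", item))

def executor():
    while not job_queue.empty():
        tool, item = job_queue.get()
        print(f"running {tool} on {item}")  # would be a cluster submission
        processed.add(item)

scheduler(["genome_A.fasta", "genome_B.fasta"])
executor()
scheduler(["genome_A.fasta", "genome_C.fasta"])  # only genome_C is new
executor()
```

Decoupling "what should run" from "what is running" is what lets such systems resume after failures and add new inputs without recomputing finished jobs.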
AOPs & Biomarkers: Bridging High Throughput Screening and Regulatory Decision Making.
As high throughput screening (HTS) approaches play a larger role in toxicity testing, computational toxicology has emerged as a critical component in interpreting the large volume of data produced. Computational models for this purpose are becoming increasingly more sophisticated...
Ab initio structure prediction of silicon and germanium sulfides for lithium-ion battery materials
NASA Astrophysics Data System (ADS)
Hsueh, Connie; Mayo, Martin; Morris, Andrew J.
Conventional experiment-based approaches to materials discovery, which can rely heavily on trial and error, are time-intensive and costly. We discuss approaches to coupling experimental and computational techniques in order to systematize, automate, and accelerate the process of materials discovery, which is of particular relevance to developing new battery materials. We use the ab initio random structure searching (AIRSS) method to conduct a systematic investigation of Si-S and Ge-S binary compounds in order to search for novel materials for lithium-ion battery (LIB) anodes. AIRSS is a high-throughput, density functional theory-based approach to structure prediction that has been successful at predicting the structures of LIB materials containing sulfur, silicon, and germanium. We propose a lithiation mechanism for Li-GeS2 anodes and report new, theoretically stable, layered and porous structures in the Si-S and Ge-S systems that may pique experimental interest.
Gai, Jiading; Obeid, Nady; Holtrop, Joseph L.; Wu, Xiao-Long; Lam, Fan; Fu, Maojing; Haldar, Justin P.; Hwu, Wen-mei W.; Liang, Zhi-Pei; Sutton, Bradley P.
2013-01-01
Several recent methods have been proposed to obtain significant speed-ups in MRI image reconstruction by leveraging the computational power of GPUs. Previously, we implemented a GPU-based image reconstruction technique called the Illinois Massively Parallel Acquisition Toolkit for Image reconstruction with ENhanced Throughput in MRI (IMPATIENT MRI) for reconstructing data collected along arbitrary 3D trajectories. In this paper, we improve IMPATIENT by removing computational bottlenecks, using a gridding approach to accelerate the computation of various data structures needed by the previous routine. Further, we enhance the routine with capabilities for off-resonance correction and multi-sensor parallel imaging reconstruction. Through implementation of optimized gridding into our iterative reconstruction scheme, speed-ups of more than a factor of 200 are provided in the improved GPU implementation compared to the previous accelerated GPU code. PMID:23682203
Computational tool for the early screening of monoclonal antibodies for their viscosities
Agrawal, Neeraj J; Helk, Bernhard; Kumar, Sandeep; Mody, Neil; Sathish, Hasige A.; Samra, Hardeep S.; Buck, Patrick M; Li, Li; Trout, Bernhardt L
2016-01-01
Highly concentrated antibody solutions often exhibit high viscosities, which present a number of challenges for antibody-drug development, manufacturing and administration. The antibody sequence is a key determinant for high viscosity of highly concentrated solutions; therefore, a sequence- or structure-based tool that can identify highly viscous antibodies from their sequence would be effective in ensuring that only antibodies with low viscosity progress to the development phase. Here, we present a spatial charge map (SCM) tool that can accurately identify highly viscous antibodies from their sequence alone (using homology modeling to determine the 3-dimensional structures). The SCM tool has been extensively validated at 3 different organizations, and has proved successful in correctly identifying highly viscous antibodies. As a quantitative tool, SCM is amenable to high-throughput automated analysis, and can be effectively implemented during the antibody screening or engineering phase for the selection of low-viscosity antibodies. PMID:26399600
FPGA-based architecture for motion recovering in real-time
NASA Astrophysics Data System (ADS)
Arias-Estrada, Miguel; Maya-Rueda, Selene E.; Torres-Huitzil, Cesar
2002-03-01
A key problem in the computer vision field is the measurement of object motion in a scene. The main goal is to compute an approximation of the 3D motion from the analysis of an image sequence. Once computed, this information can be used as a basis to reach higher-level goals in different applications. Motion estimation algorithms pose a significant computational load for sequential processors, limiting their use in practical applications. In this work we propose a hardware architecture for real-time motion estimation based on FPGA technology. The technique used for motion estimation is optical flow, chosen for its accuracy and the density of its velocity estimates; however, other techniques are being explored. The architecture is composed of parallel modules working in a pipeline scheme to reach high throughput rates, near gigaflops. The modules are organized in a regular structure to provide a high degree of flexibility to cover different applications. Some results are presented, and the real-time performance is discussed and analyzed. The architecture is prototyped on an FPGA board with a Virtex device interfaced to a digital imager.
Microreactor Cells for High-Throughput X-ray Absorption Spectroscopy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beesley, Angela; Tsapatsaris, Nikolaos; Weiher, Norbert
2007-01-19
High-throughput experimentation has been applied to X-ray absorption spectroscopy as a novel route for increasing research productivity in the catalysis community. Suitable instrumentation has been developed for the rapid determination of the local structure in the metal component of precursors for supported catalysts. An automated analytical workflow was implemented that is much faster than traditional individual spectrum analysis. It allows the generation of structural data in quasi-real time. We describe initial results obtained from the automated high-throughput (HT) data reduction and analysis of a sample library implemented through the 96-well-plate industrial standard. The results show that a fully automated HT-XAS technology based on existing industry standards is feasible and useful for the rapid elucidation of the geometric and electronic structure of materials.
An Efficient Semi-supervised Learning Approach to Predict SH2 Domain Mediated Interactions.
Kundu, Kousik; Backofen, Rolf
2017-01-01
Src homology 2 (SH2) domains are an important subclass of modular protein domains that play an indispensable role in several biological processes in eukaryotes. SH2 domains specifically bind to the phosphotyrosine residues of their binding peptides to facilitate various molecular functions. For determining the subtle binding specificities of SH2 domains, it is very important to understand the intriguing mechanisms by which these domains recognize their target peptides in a complex cellular environment. Several attempts have been made to predict SH2-peptide interactions using high-throughput data. However, these high-throughput data are often affected by a low signal-to-noise ratio. Furthermore, the prediction methods have several additional shortcomings, such as the linearity problem and high computational complexity. Thus, computational identification of SH2-peptide interactions using high-throughput data remains challenging. Here, we propose a machine learning approach based on an efficient semi-supervised learning technique for the prediction of 51 SH2 domain mediated interactions in the human proteome. In our study, we have successfully employed several strategies to tackle the major problems in computational identification of SH2-peptide interactions.
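Semi-supervised learning is useful here precisely because labeled binding data are scarce and noisy while unlabeled peptides are plentiful. The sketch below shows the general setup with scikit-learn's self-training wrapper; it is not the paper's own algorithm, and the features and labels are random placeholders.

```python
# Hedged sketch of a semi-supervised classifier in the spirit of the approach
# described above (the authors' technique differs; data here are toy values).
import numpy as np
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X = np.random.rand(300, 50)            # peptide descriptors (placeholder)
y = np.random.randint(0, 2, 300)       # 1 = binds the SH2 domain, 0 = does not
y[100:] = -1                           # most examples unlabeled (-1 by convention)

base = SVC(probability=True)           # confident predictions become pseudo-labels
model = SelfTrainingClassifier(base).fit(X, y)
print(model.predict(X[:5]))
```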
Integrative interactive visualization of crystal structure, band structure, and Brillouin zone
NASA Astrophysics Data System (ADS)
Hanson, Robert; Hinke, Ben; van Koevering, Matthew; Oses, Corey; Toher, Cormac; Hicks, David; Gossett, Eric; Plata Ramos, Jose; Curtarolo, Stefano; Aflow Collaboration
The AFLOW library is an open-access database of high-throughput ab initio calculations that serves as a resource for the dissemination of computational results in the area of materials science. Our project aims to create an interactive web-based visualization of any structure in the AFLOW database that has associated band-structure data, in a way that allows novel simultaneous exploration of the crystal structure, band structure, and Brillouin zone. Interactivity is obtained using two synchronized JSmol implementations, one for the crystal structure and one for the Brillouin zone, along with a D3-based band-structure diagram produced on the fly from data obtained from the AFLOW database. The current website portal (http://aflowlib.mems.duke.edu/users/jmolers/matt/website) allows interactive access and visualization of the crystal structure, Brillouin zone and band structure for more than 55,000 inorganic crystal structures. This work was supported by the US Navy Office of Naval Research through a Broad Area Announcement administered by Duke University.
Developing science gateways for drug discovery in a grid environment.
Pérez-Sánchez, Horacio; Rezaei, Vahid; Mezhuyev, Vitaliy; Man, Duhu; Peña-García, Jorge; den-Haan, Helena; Gesing, Sandra
2016-01-01
Methods for in silico screening of large databases of molecules increasingly complement and replace experimental techniques to discover novel compounds to combat diseases. As these techniques become more complex and computationally costly, we face an increasing need to provide the life sciences research community with a convenient tool for high-throughput virtual screening on distributed computing resources. To this end, we recently integrated the biophysics-based drug-screening program FlexScreen into a service, applicable for large-scale parallel screening and reusable in the context of scientific workflows. Our implementation is based on Pipeline Pilot and the Simple Object Access Protocol and provides an easy-to-use graphical user interface to construct complex workflows, which can be executed on distributed computing resources, thus accelerating throughput by several orders of magnitude.
Leaf-rolling in maize crops: from leaf scoring to canopy-level measurements for phenotyping
Madec, Simon; Irfan, Kamran; Lopez, Jeremy; Comar, Alexis; Hemmerlé, Matthieu; Dutartre, Dan; Praud, Sebastien; Tixier, Marie Helene
2018-01-01
Leaf rolling in maize crops is one of the main plant reactions to water stress that can be visually scored in the field. However, leaf-scoring techniques do not meet the high-throughput requirements needed by breeders for efficient phenotyping. Consequently, this study investigated the relationship between leaf-rolling scores and changes in canopy structure that can be determined by high-throughput remote-sensing techniques. Experiments were conducted in 2015 and 2016 on maize genotypes subjected to water stress. Leaf-rolling was scored visually over the whole day around the flowering stage. Concurrent digital hemispherical photographs were taken to evaluate the impact of leaf-rolling on canopy structure using the computed fraction of intercepted diffuse photosynthetically active radiation, FIPARdif. The results showed that leaves started to roll due to water stress around 09:00 h and leaf-rolling reached its maximum around 15:00 h (the photoperiod was about 05:00–20:00 h). In contrast, plants maintained under well-watered conditions did not show any significant rolling during the same day. A canopy-level index of rolling (CLIR) is proposed to quantify the diurnal changes in canopy structure induced by leaf-rolling. It normalizes for the differences in FIPARdif between genotypes observed in the early morning when leaves are unrolled, as well as for yearly effects linked to environmental conditions. The leaf-level rolling score was very strongly correlated with changes in canopy structure as described by the CLIR (r2=0.86, n=370). The daily time course of rolling was characterized using the amplitude of variation and the rate and timing of development, computed at both the leaf and canopy levels. Results obtained from eight genotypes common to the two years of experiments showed that the amplitude of variation of the CLIR was the most repeatable trait (Spearman coefficient ρ=0.62) compared with the rate (ρ=0.29) and the timing of development (ρ=0.33). The potential of these findings for the development of a high-throughput method for determining leaf-rolling based on aerial drone observations is considered. PMID:29617837
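The abstract does not reproduce the CLIR formula; one plausible form consistent with the stated idea (normalizing each measurement by the genotype's unrolled early-morning baseline) is sketched below. It is purely illustrative, not the paper's definition.

```python
# Illustrative only: encodes the stated idea of normalizing FIPARdif by its
# early-morning (unrolled) baseline for each genotype.
def clir(fipar_t, fipar_morning):
    """Relative drop in intercepted diffuse PAR as leaves roll."""
    return (fipar_morning - fipar_t) / fipar_morning

# A hypothetical stressed plot: interception falls through the day as leaves roll.
for hour, f in [(9, 0.62), (12, 0.51), (15, 0.43)]:
    print(hour, round(clir(f, fipar_morning=0.62), 2))
```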
Nanomanufacturing : nano-structured materials made layer-by-layer.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cox, James V.; Cheng, Shengfeng; Grest, Gary Stephen
Large-scale, high-throughput production of nano-structured materials (i.e. nanomanufacturing) is a strategic area in manufacturing, with markets projected to exceed $1T by 2015. Nanomanufacturing is still in its infancy; process/product developments are costly and only touch on potential opportunities enabled by growing nanoscience discoveries. The greatest promise for high-volume manufacturing lies in age-old coating and imprinting operations. For materials with tailored nm-scale structure, imprinting/embossing must be achieved at high speeds (roll-to-roll) and/or over large areas (batch operation) with feature sizes less than 100 nm. Dispersion coatings with nanoparticles can also tailor structure through self- or directed-assembly. Layering films structured with these processes has tremendous potential for efficient manufacturing of microelectronics, photovoltaics and other topical nano-structured devices. This project is designed to perform the requisite R and D to bring Sandia's technology base in computational mechanics to bear on this scale-up problem. Project focus is enforced by addressing a promising imprinting process currently being commercialized.
Kuhn, Alexandre; Ong, Yao Min; Quake, Stephen R; Burkholder, William F
2015-07-08
Like other structural variants, transposable element insertions can be highly polymorphic across individuals. Their functional impact, however, remains poorly understood. Current genome-wide approaches for genotyping insertion-site polymorphisms based on targeted or whole-genome sequencing remain very expensive and can lack accuracy, hence new large-scale genotyping methods are needed. We describe a high-throughput method for genotyping transposable element insertions and other types of structural variants that can be assayed by breakpoint PCR. The method relies on next-generation sequencing of multiplex, site-specific PCR amplification products and read count-based genotype calls. We show that this method is flexible, efficient (it does not require rounds of optimization), cost-effective and highly accurate. This method can benefit a wide range of applications from the routine genotyping of animal and plant populations to the functional study of structural variants in humans.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pan, Jian-Bo; Ji, Nan; Pan, Wen
2014-01-01
Drugs may induce adverse drug reactions (ADRs) when they unexpectedly bind to proteins other than their therapeutic targets. Identification of these undesired protein binding partners, called off-targets, can facilitate toxicity assessment in the early stages of drug development. In this study, a computational framework was introduced for the exploration of idiosyncratic mechanisms underlying analgesic-induced severe adverse drug reactions (SADRs). The putative analgesic-target interactions were predicted by performing reverse docking of analgesics or their active metabolites against human/mammal protein structures in a high-throughput manner. Subsequently, bioinformatics analyses were undertaken to identify ADR-associated proteins (ADRAPs) and pathways. Using the pathways and ADRAPs that this analysis identified, the mechanisms of SADRs such as cardiac disorders were explored. For instance, 53 putative ADRAPs and 24 pathways were linked with cardiac disorders, of which 10 ADRAPs were confirmed by previous experiments. Moreover, it was inferred that pathways such as base excision repair, glycolysis/glyconeogenesis, ErbB signaling, calcium signaling, and phosphatidyl inositol signaling likely play pivotal roles in drug-induced cardiac disorders. In conclusion, our framework offers an opportunity to globally understand SADRs at the molecular level, which has been difficult to realize through experiments. It also provides some valuable clues for drug repurposing. - Highlights: • A novel computational framework was developed for mechanistic study of SADRs. • Off-targets of drugs were identified in large scale and in a high-throughput manner. • SADRs like cardiac disorders were systematically explored in molecular networks. • A number of ADR-associated proteins were identified.
Hu, E; Liao, T. W.; Tiersch, T. R.
2013-01-01
Emerging commercial-level technology for aquatic sperm cryopreservation has not been modeled by computer simulation. Commercially available software (ARENA, Rockwell Automation, Inc. Milwaukee, WI) was applied to simulate high-throughput sperm cryopreservation of blue catfish (Ictalurus furcatus) based on existing processing capabilities. The goal was to develop a simulation model suitable for production planning and decision making. The objectives were to: 1) predict the maximum output for 8-hr workday; 2) analyze the bottlenecks within the process, and 3) estimate operational costs when run for daily maximum output. High-throughput cryopreservation was divided into six major steps modeled with time, resources and logic structures. The modeled production processed 18 fish and produced 1164 ± 33 (mean ± SD) 0.5-ml straws containing one billion cryopreserved sperm. Two such production lines could support all hybrid catfish production in the US and 15 such lines could support the entire channel catfish industry if it were to adopt artificial spawning techniques. Evaluations were made to improve efficiency, such as increasing scale, optimizing resources, and eliminating underutilized equipment. This model can serve as a template for other aquatic species and assist decision making in industrial application of aquatic germplasm in aquaculture, stock enhancement, conservation, and biomedical model fishes. PMID:25580079
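The study above uses commercial ARENA software, but the same discrete-event idea can be sketched with the open-source SimPy library: entities flow through timed steps and compete for limited station capacity, and the simulation reveals the bottleneck. The step names, durations, and capacities below are invented for illustration, not the paper's calibrated values.

```python
# Discrete-event sketch of a cryopreservation line, in the spirit of the
# ARENA model above. All numbers here are placeholders.
import simpy

def process_fish(env, name, freezer):
    yield env.timeout(10)                     # sperm collection + dilution (min)
    with freezer.request() as req:            # freezing capacity is the bottleneck
        yield req
        yield env.timeout(15)                 # straw filling + freezing
    print(f"{env.now:5.0f} min: {name} done")

env = simpy.Environment()
freezer = simpy.Resource(env, capacity=2)     # two freezing stations
for i in range(18):                           # the 18 fish of the modeled day
    env.process(process_fish(env, f"fish_{i}", freezer))
env.run(until=480)                            # one 8-hour workday
```

Varying `capacity` and the step durations is exactly the kind of what-if analysis (scale-up, resource optimization, idle-equipment removal) the abstract describes.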
Role of Open Source Tools and Resources in Virtual Screening for Drug Discovery.
Karthikeyan, Muthukumarasamy; Vyas, Renu
2015-01-01
Advances in chemoinformatics research, in parallel with the availability of high-performance computing platforms, have made the handling of large-scale, multi-dimensional scientific data for high-throughput drug discovery easier. In this study we have explored publicly available molecular databases with the help of open-source based, integrated in-house molecular informatics tools for virtual screening. The virtual screening literature of the past decade has been extensively investigated and thoroughly analyzed to reveal interesting patterns with respect to the drug, target, scaffold and disease space. The review also focuses on integrated chemoinformatics tools that are capable of harvesting chemical data from textual literature information and transforming it into truly computable chemical structures, identifying unique fragments and scaffolds from a class of compounds, automatically generating focused virtual libraries, computing molecular descriptors for structure-activity relationship studies, and applying conventional filters used in lead discovery along with in-house developed exhaustive PTC (Pharmacophore, Toxicophores and Chemophores) filters and machine learning tools for the design of potential disease-specific inhibitors. A case study on kinase inhibitors is provided as an example.
Capturing anharmonicity in a lattice thermal conductivity model for high-throughput predictions
Miller, Samuel A.; Gorai, Prashun; Ortiz, Brenden R.; ...
2017-01-06
High-throughput, low-cost, and accurate predictions of thermal properties of new materials would be beneficial in fields ranging from thermal barrier coatings and thermoelectrics to integrated circuits. To date, computational efforts for predicting lattice thermal conductivity (κL) have been hampered by the complexity associated with computing multiple phonon interactions. In this work, we develop and validate a semiempirical model for κL by fitting density functional theory calculations to experimental data. Experimental values for κL come from new measurements on SrIn2O4, Ba2SnO4, Cu2ZnSiTe4, MoTe2, Ba3In2O6, Cu3TaTe4, SnO, and InI as well as 55 compounds from across the published literature. To capture the anharmonicity in phonon interactions, we incorporate a structural parameter that allows the model to predict κL within a factor of 1.5 of the experimental value across 4 orders of magnitude in κL values and over a diverse chemical and structural phase space, with accuracy similar to or better than that of computationally more expensive models.
Notredame, Cedric
2018-05-02
Cedric Notredame from the Centre for Genomic Regulation gives a presentation on New Challenges of the Computation of Multiple Sequence Alignments in the High-Throughput Era at the JGI/Argonne HPC Workshop on January 26, 2010.
HPC AND GRID COMPUTING FOR INTEGRATIVE BIOMEDICAL RESEARCH
Kurc, Tahsin; Hastings, Shannon; Kumar, Vijay; Langella, Stephen; Sharma, Ashish; Pan, Tony; Oster, Scott; Ervin, David; Permar, Justin; Narayanan, Sivaramakrishnan; Gil, Yolanda; Deelman, Ewa; Hall, Mary; Saltz, Joel
2010-01-01
Integrative biomedical research projects query, analyze, and integrate many different data types and make use of datasets obtained from measurements or simulations of structure and function at multiple biological scales. With the increasing availability of high-throughput and high-resolution instruments, integrative biomedical research imposes many challenging requirements on software middleware systems. In this paper, we look at some of these requirements using example research pattern templates. We then discuss how middleware systems, which incorporate Grid and high-performance computing, could be employed to address the requirements. PMID:20107625
High-throughput Molecular Simulations of MOFs for CO2 Separation: Opportunities and Challenges
NASA Astrophysics Data System (ADS)
Erucar, Ilknur; Keskin, Seda
2018-02-01
Metal organic frameworks (MOFs) have emerged as great alternatives to traditional nanoporous materials for CO2 separation applications. MOFs are porous materials that are formed by self-assembly of transition metals and organic ligands. The most important advantage of MOFs over well-known porous materials is the possibility to generate multiple materials with varying structural properties and chemical functionalities by changing the combination of metal centers and organic linkers during the synthesis. This leads to a large diversity of materials with various pore sizes and shapes that can be efficiently used for CO2 separations. Since the number of synthesized MOFs has already reached several thousand, experimental investigation of each MOF at the lab scale is not practical. High-throughput computational screening of MOFs is a great opportunity to identify the best materials for CO2 separation and to gain molecular-level insights into the structure-performance relationships. This type of knowledge can be used to design new materials with the desired structural features that can lead to extraordinarily high CO2 selectivities. In this mini-review, we focused on developments in high-throughput molecular simulations of MOFs for CO2 separations. After reviewing the current studies on this topic, we discussed the opportunities and challenges in the field and addressed the potential future developments.
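Screening studies of this kind commonly rank MOFs by adsorption selectivity computed from simulated mixture uptakes. The standard definition is sketched below; the uptake numbers and the flue-gas composition are made-up illustration values, not results from the review.

```python
# Adsorption selectivity as used to rank MOFs in high-throughput screening:
# q = simulated uptake (e.g., mol/kg from GCMC), y = bulk-phase mole fraction.
def adsorption_selectivity(q_co2, q_other, y_co2, y_other):
    """S = (q_CO2 / q_other) / (y_CO2 / y_other)."""
    return (q_co2 / q_other) / (y_co2 / y_other)

# Hypothetical CO2/N2 flue-gas case with a 15:85 bulk composition:
print(adsorption_selectivity(q_co2=2.4, q_other=0.11, y_co2=0.15, y_other=0.85))
```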
Performance-scalable volumetric data classification for online industrial inspection
NASA Astrophysics Data System (ADS)
Abraham, Aby J.; Sadki, Mustapha; Lea, R. M.
2002-03-01
Non-intrusive inspection and non-destructive testing of manufactured objects with complex internal structures typically requires the enhancement, analysis and visualization of high-resolution volumetric data. Given the increasing availability of fast 3D scanning technology (e.g. cone-beam CT) enabling on-line detection and accurate discrimination of components or sub-structures, the inherent complexity of classification algorithms inevitably leads to throughput bottlenecks. Indeed, whereas typical inspection throughput requirements range from 1 to 1000 volumes per hour, depending on density and resolution, current computational capability is one to two orders of magnitude less. Accordingly, speeding up classification algorithms requires both reduction of algorithm complexity and acceleration of computer performance. A shape-based classification algorithm, offering algorithm complexity reduction by using ellipses as generic descriptors of solids-of-revolution, and supporting performance-scalability by exploiting the inherent parallelism of volumetric data, is presented. A two-stage variant of the classical Hough transform is used for ellipse detection, and correlation of the detected ellipses facilitates position-, scale- and orientation-invariant component classification. Performance-scalability is achieved cost-effectively by accelerating a PC host with one or more COTS (Commercial-Off-The-Shelf) PCI multiprocessor cards. Experimental results are reported to demonstrate the feasibility and cost-effectiveness of the data-parallel classification algorithm for on-line industrial inspection applications.
Alginate Immobilization of Metabolic Enzymes (AIME) for High-Throughput Screening Assays (SOT)
Alginate Immobilization of Metabolic Enzymes (AIME) for High-Throughput Screening Assays. DE DeGroot, RS Thomas, and SO Simmons. National Center for Computational Toxicology, US EPA, Research Triangle Park, NC, USA. The EPA’s ToxCast program utilizes a wide variety of high-throughput s...
DIVE: A Graph-based Visual Analytics Framework for Big Data
Rysavy, Steven J.; Bromley, Dennis; Daggett, Valerie
2014-01-01
The need for data-centric scientific tools is growing; domains like biology, chemistry, and physics are increasingly adopting computational approaches. As a result, scientists must now deal with the challenges of big data. To address these challenges, we built a visual analytics platform named DIVE: Data Intensive Visualization Engine. DIVE is a data-agnostic, ontologically-expressive software framework capable of streaming large datasets at interactive speeds. Here we present the technical details of the DIVE platform, multiple usage examples, and a case study from the Dynameomics molecular dynamics project. We specifically highlight our novel contributions to structured data model manipulation and high-throughput streaming of large, structured datasets. PMID:24808197
Computational approaches to define a human milk metaglycome
Agravat, Sanjay B.; Song, Xuezheng; Rojsajjakul, Teerapat; Cummings, Richard D.; Smith, David F.
2016-01-01
Motivation: The goal of deciphering the human glycome has been hindered by the lack of high-throughput sequencing methods for glycans. Although mass spectrometry (MS) is a key technology in glycan sequencing, MS alone provides limited information about the identification of monosaccharide constituents, their anomericity and their linkages. These features of individual, purified glycans can be partly identified using well-defined glycan-binding proteins, such as lectins and antibodies that recognize specific determinants within glycan structures. Results: We present a novel computational approach to automate the sequencing of glycans using metadata-assisted glycan sequencing, which combines MS analyses with glycan structural information from glycan microarray technology. Success in this approach was aided by the generation of a ‘virtual glycome’ to represent all potential glycan structures that might exist within a metaglycome, based on a set of biosynthetic assumptions using known structural information. We exploited this approach to deduce the structures of soluble glycans within the human milk glycome by matching predicted structures based on experimental data against the virtual glycome. This represents the first metaglycome to be defined using this method, and we provide a publicly available web-based application to aid in sequencing milk glycans. Availability and implementation: http://glycomeseq.emory.edu Contact: sagravat@bidmc.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26803164
AmpliVar: mutation detection in high-throughput sequence from amplicon-based libraries.
Hsu, Arthur L; Kondrashova, Olga; Lunke, Sebastian; Love, Clare J; Meldrum, Cliff; Marquis-Nicholson, Renate; Corboy, Greg; Pham, Kym; Wakefield, Matthew; Waring, Paul M; Taylor, Graham R
2015-04-01
Conventional means of identifying variants in high-throughput sequencing align each read against a reference sequence and then call variants at each position. Here, we demonstrate an orthogonal means of identifying sequence variation by grouping the reads as amplicons prior to any alignment. We used AmpliVar to make key-value hashes of sequence reads and group reads as individual amplicons using a table of flanking sequences. Low-abundance reads were removed according to a selectable threshold, and reads above this threshold were aligned as groups, rather than as individual reads, permitting the use of sensitive alignment tools. We show that this approach is more sensitive, more specific, and more computationally efficient than comparable methods for the analysis of amplicon-based high-throughput sequencing data. The method can be extended to enable alignment-free confirmation of variants seen in hybridization capture target-enrichment data.
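The grouping idea is easy to see in miniature: hash identical reads, bin them by known flanking sequences, and drop low-abundance bins before any alignment. The sketch below is not AmpliVar's code; the flank table, amplicon name, and threshold are toy values.

```python
# Minimal sketch of pre-alignment amplicon grouping with an abundance filter.
from collections import Counter

flanks = {"BRAF_ex15": ("TGATTTTGGTCTAGCTACAG", "GACCCACTCCATCGAGATTT")}

def group_amplicons(reads, flanks, min_count=5):
    counts = Counter(reads)                      # key-value hash of identical reads
    groups = {}
    for name, (left, right) in flanks.items():
        groups[name] = {seq: n for seq, n in counts.items()
                        if seq.startswith(left) and seq.endswith(right)
                        and n >= min_count}      # discard low-abundance reads
    return groups

reads = ["TGATTTTGGTCTAGCTACAG" + "AAAC" + "GACCCACTCCATCGAGATTT"] * 8
print(group_amplicons(reads, flanks))
```

Because each surviving group is aligned once rather than read by read, slower but more sensitive aligners become affordable, which is the efficiency argument the abstract makes.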
ASIC-based architecture for the real-time computation of 2D convolution with large kernel size
NASA Astrophysics Data System (ADS)
Shao, Rui; Zhong, Sheng; Yan, Luxin
2015-12-01
Bidimensional convolution is a low-level processing algorithm of interest in many areas, but its high computational cost constrains the size of the kernels, especially in real-time embedded systems. This paper presents a hardware architecture for the ASIC-based implementation of 2-D convolution with medium-large kernels. To improve the efficiency of on-chip storage resources and reduce off-chip bandwidth, a data-cache reuse scheme is proposed: multi-block SPRAM cross-caches image data, and an on-chip ping-pong operation takes full advantage of data reuse in the convolution calculation, leading to a new ASIC data-scheduling scheme and overall architecture. Experimental results show that the structure can perform real-time convolution with kernels up to 40 × 32 in size, improves the utilization of on-chip memory bandwidth and on-chip memory resources, maximizes data throughput, and reduces the off-chip memory bandwidth requirement.
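The caching strategy can be mimicked in software: process the image in tiles, load each tile (plus a halo) once into a local buffer, and reuse every cached pixel across all kernel positions. The sketch below is a software analogy under stated assumptions, not the paper's hardware design; tile size and test data are arbitrary.

```python
# Tiled 2D filtering: each tile + halo plays the role of the on-chip block cache.
import numpy as np

def conv2d_tiled(image, kernel, tile=64):
    """Correlation-style filtering computed tile by tile (no kernel flip)."""
    kh, kw = kernel.shape
    py, px = kh // 2, kw // 2
    padded = np.pad(image, ((py, py), (px, px)))
    out = np.zeros_like(image, dtype=float)
    for y0 in range(0, image.shape[0], tile):
        for x0 in range(0, image.shape[1], tile):
            oy = min(tile, image.shape[0] - y0)   # handle edge tiles
            ox = min(tile, image.shape[1] - x0)
            block = padded[y0:y0 + oy + 2 * py, x0:x0 + ox + 2 * px]
            for dy in range(kh):
                for dx in range(kw):              # each cached pixel reused kh*kw times
                    out[y0:y0 + oy, x0:x0 + ox] += kernel[dy, dx] * block[dy:dy + oy, dx:dx + ox]
    return out

img, k = np.random.rand(128, 96), np.ones((5, 5)) / 25.0
print(np.allclose(conv2d_tiled(img, k), conv2d_tiled(img, k, tile=32)))
```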
Wu, Bainan; Barile, Elisa; De, Surya K; Wei, Jun; Purves, Angela; Pellecchia, Maurizio
2015-01-01
In recent years the ever so complex field of drug discovery has embraced novel design strategies based on biophysical fragment screening (fragment-based drug design; FBDD) using nuclear magnetic resonance spectroscopy (NMR) and/or structure-guided approaches, most often using X-ray crystallography and computer modeling. Experience from recent years unveiled that these methods are more effective and less prone to artifacts compared to biochemical high-throughput screening (HTS) of large collection of compounds in designing protein inhibitors. Hence these strategies are increasingly becoming the most utilized in the modern pharmaceutical industry. Nonetheless, there is still an impending need to develop innovative and effective strategies to tackle other more challenging targets such as those involving protein-protein interactions (PPIs). While HTS strategies notoriously fail to identify viable hits against such targets, few successful examples of PPIs antagonists derived by FBDD strategies exist. Recently, we reported on a new strategy that combines some of the basic principles of fragment-based screening with combinatorial chemistry and NMR-based screening. The approach, termed HTS by NMR, combines the advantages of combinatorial chemistry and NMR-based screening to rapidly and unambiguously identify bona fide inhibitors of PPIs. This review will reiterate the critical aspects of the approach with examples of possible applications.
Wu, Bainan; Barile, Elisa; De, Surya K.; Wei, Jun; Purves, Angela; Pellecchia, Maurizio
2015-01-01
In recent years the ever so complex field of drug discovery has embraced novel design strategies based on biophysical fragment screening (fragment-based drug design; FBDD) using nuclear magnetic resonance spectroscopy (NMR) and/or structure-guided approaches, most often using X-ray crystallography and computer modeling. Experience from recent years unveiled that these methods are more effective and less prone to artifacts compared to biochemical high-throughput screening (HTS) of large collection of compounds in designing protein inhibitors. Hence these strategies are increasingly becoming the most utilized in the modern pharmaceutical industry. Nonetheless, there is still an impending need to develop innovative and effective strategies to tackle other more challenging targets such as those involving protein-protein interactions (PPIs). While HTS strategies notoriously fail to identify viable hits against such targets, few successful examples of PPIs antagonists derived by FBDD strategies exist. Recently, we reported on a new strategy that combines some of the basic principles of fragment-based screening with combinatorial chemistry and NMR-based screening. The approach, termed HTS by NMR, combines the advantages of combinatorial chemistry and NMR-based screening to rapidly and unambiguously identify bona fide inhibitors of PPIs. This review will reiterate the critical aspects of the approach with examples of possible applications. PMID:25986689
The Proteome Folding Project: Proteome-scale prediction of structure and function
Drew, Kevin; Winters, Patrick; Butterfoss, Glenn L.; Berstis, Viktors; Uplinger, Keith; Armstrong, Jonathan; Riffle, Michael; Schweighofer, Erik; Bovermann, Bill; Goodlett, David R.; Davis, Trisha N.; Shasha, Dennis; Malmström, Lars; Bonneau, Richard
2011-01-01
The incompleteness of proteome structure and function annotation is a critical problem for biologists and, in particular, severely limits interpretation of high-throughput and next-generation experiments. We have developed a proteome annotation pipeline based on structure prediction, where function and structure annotations are generated using an integration of sequence comparison, fold recognition, and grid-computing-enabled de novo structure prediction. We predict protein domain boundaries and three-dimensional (3D) structures for protein domains from 94 genomes (including human, Arabidopsis, rice, mouse, fly, yeast, Escherichia coli, and worm). De novo structure predictions were distributed on a grid of more than 1.5 million CPUs worldwide (World Community Grid). We generated significant numbers of new confident fold annotations (9% of domains that are otherwise unannotated in these genomes). We demonstrate that predicted structures can be combined with annotations from the Gene Ontology database to predict new and more specific molecular functions. PMID:21824995
ClusCo: clustering and comparison of protein models.
Jamroz, Michal; Kolinski, Andrzej
2013-02-22
The development, optimization and validation of protein modeling methods require efficient tools for structural comparison. Frequently, a large number of models need to be compared with the target native structure. The main reason for developing the ClusCo software was to create a high-throughput tool for all-versus-all comparison, because calculating the similarity matrix is one of the bottlenecks in the protein modeling pipeline. ClusCo is fast and easy-to-use software for high-throughput comparison of protein models with different similarity measures (cRMSD, dRMSD, GDT_TS, TM-Score, MaxSub, Contact Map Overlap) and for clustering the comparison results with standard methods: K-means clustering or hierarchical agglomerative clustering. The application is highly optimized and written in C/C++, including code for parallel execution on CPU and GPU, which results in a significant speedup over similar clustering and scoring programs.
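The workflow ClusCo accelerates, build an all-versus-all distance matrix, then cluster it, can be sketched in a few lines. The version below uses a naive coordinate RMSD over models assumed to be pre-aligned (true cRMSD requires optimal superposition) and SciPy clustering; model data are random placeholders.

```python
# Sketch of all-versus-all comparison + clustering, the pattern ClusCo
# implements in optimized C/C++/GPU code. Toy data throughout.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

models = np.random.rand(10, 50, 3)         # 10 models, 50 CA atoms each (toy)

def crmsd(a, b):
    return np.sqrt(((a - b) ** 2).sum(axis=1).mean())

n = len(models)
dist = np.zeros((n, n))
for i in range(n):                         # the O(n^2) similarity matrix --
    for j in range(i + 1, n):              # the pipeline bottleneck noted above
        dist[i, j] = dist[j, i] = crmsd(models[i], models[j])

Z = linkage(squareform(dist), method="average")
print(fcluster(Z, t=2, criterion="maxclust"))  # split models into 2 clusters
```

The quadratic loop is exactly what parallel CPU/GPU execution targets: each pairwise distance is independent, so the matrix fill parallelizes trivially.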
The ``Missing Compounds'' affair in functionality-driven material discovery
NASA Astrophysics Data System (ADS)
Zunger, Alex
2014-03-01
In the paradigm of "data-driven discovery," underlying one of the leading streams of the Materials Genome Initiative (MGI), one attempts to compute, high-throughput style, as many properties as possible of the N (about 10^5-10^6) compounds listed in databases of previously known compounds. One then inspects the ensuing Big Data, searching for useful trends. The alternative and complementary paradigm of "functionality-directed search and optimization" used here searches instead for the n ≪ N configurations and compositions that have the desired value of the target functionality. Examples include the use of genetic and other search methods that optimize the structure or identity of atoms on lattice sites, using atomistic electronic structure (such as first-principles) approaches in search of a given electronic property. This addresses a few of the bottlenecks that have faced the alternative data-driven/high-throughput/Big Data philosophy: (i) When the configuration space is theoretically of infinite size, building a complete database as in data-driven discovery is impossible, yet searching for the optimum functionality is still a well-posed problem. (ii) The configuration space that we explore might include artificially grown, kinetically stabilized systems (such as 2D layer stacks, superlattices, colloidal nanostructures, and fullerenes) that are not listed in compound databases (used by data-driven approaches). (iii) A large fraction of chemically plausible compounds have not been experimentally synthesized, so in the data-driven approach these are often skipped. In our approach we search explicitly for such "Missing Compounds." It is likely that many interesting material properties will be found in cases (i)-(iii) that elude high-throughput searches based on databases encapsulating existing knowledge. I will illustrate (a) functionality-driven discovery of topological insulators and valley-split quantum-computer semiconductors, as well as (b) use of "first-principles thermodynamics" to discern which of the previously "missing compounds" should in fact exist, and in which structure. Synthesis efforts by the Poeppelmeier group at NU realized 20 never-before-made half-Heusler compounds out of the 20 predicted ones, in our predicted space groups. This type of theory-led experimental search for designed materials with target functionalities may shorten the current process of discovery of interesting functional materials. Supported by DOE, Office of Science, Energy Frontier Research Center for Inverse Design.
Annotare--a tool for annotating high-throughput biomedical investigations and resulting data.
Shankar, Ravi; Parkinson, Helen; Burdett, Tony; Hastings, Emma; Liu, Junmin; Miller, Michael; Srinivasa, Rashmi; White, Joseph; Brazma, Alvis; Sherlock, Gavin; Stoeckert, Christian J; Ball, Catherine A
2010-10-01
Computational methods in molecular biology will increasingly depend on standards-based annotations that describe biological experiments in an unambiguous manner. Annotare is a software tool that enables biologists to easily annotate their high-throughput experiments, biomaterials and data in a standards-compliant way that facilitates meaningful search and analysis. Annotare is available from http://code.google.com/p/annotare/ under the terms of the open-source MIT License (http://www.opensource.org/licenses/mit-license.php). It has been tested on both Mac and Windows.
Hu, Xihao; Wu, Yang; Lu, Zhi John; Yip, Kevin Y
2016-11-01
High-throughput sequencing has been used to study posttranscriptional regulation, where the identification of protein-RNA binding is a major and fast-developing sub-area, which in turn benefits from sequencing methods for whole-transcriptome probing of RNA secondary structures. In the study of RNA secondary structures using high-throughput sequencing, bases are modified or cleaved according to their structural features, which alters the resulting composition of sequencing reads. In the study of protein-RNA binding, methods have been proposed to immunoprecipitate (IP) protein-bound RNA transcripts in vitro or in vivo. By sequencing these transcripts, the protein-RNA interactions and the binding locations can be identified. For both types of data, read counts are affected by a combination of confounding factors, including expression levels of transcripts, sequence biases, mapping errors and the probing or IP efficiency of the experimental protocols. Careful processing of the sequencing data and proper extraction of important features are fundamentally important to a successful analysis. Here we review and compare different experimental methods for probing RNA secondary structures and binding sites of RNA-binding proteins (RBPs), and the computational methods proposed for analyzing the corresponding sequencing data. We suggest how these two types of data should be integrated to study the structural properties of RBP binding sites as a systematic way to better understand posttranscriptional regulation.
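As a hedged sketch of the kind of read-count processing discussed above, the snippet below turns treated/control stop counts into coverage-normalized per-base reactivities; the function names and the normalization scheme are one common convention chosen for illustration, not a specific published pipeline:

```python
# Sketch: per-base structure-probing reactivity from treated/control stop
# counts, normalized for coverage. Parameters are illustrative.
import numpy as np

def reactivity(treated_stops, control_stops, coverage):
    """Per-base reactivity from stop counts and local read coverage."""
    treated_rate = treated_stops / np.maximum(coverage, 1)
    control_rate = control_stops / np.maximum(coverage, 1)
    raw = np.maximum(treated_rate - control_rate, 0.0)
    # "2-8%" style normalization: discard the top 2% as outliers, scale by
    # the mean of the next band of high reactivities.
    hi = np.sort(raw)[::-1]
    cut = max(1, int(0.02 * raw.size))
    norm = hi[cut:cut + max(1, int(0.06 * raw.size))].mean()
    return raw / norm if norm > 0 else raw

cov = np.full(100, 500.0)
treated = np.random.default_rng(1).poisson(20, 100)
control = np.random.default_rng(2).poisson(5, 100)
print(reactivity(treated, control, cov)[:5])
```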
Langó, Tamás; Róna, Gergely; Hunyadi-Gulyás, Éva; Turiák, Lilla; Varga, Julia; Dobson, László; Várady, György; Drahos, László; Vértessy, Beáta G; Medzihradszky, Katalin F; Szakács, Gergely; Tusnády, Gábor E
2017-02-13
Transmembrane proteins play a crucial role in signaling, ion transport, and nutrient uptake, as well as in maintaining the dynamic equilibrium between the internal and external environment of cells. Despite their important biological functions and abundance, less than 2% of all determined structures are transmembrane proteins. Given the persisting technical difficulties associated with high-resolution structure determination of transmembrane proteins, additional methods, including computational and experimental techniques, remain vital in promoting our understanding of their topologies, 3D structures, functions and interactions. Here we report a method for the high-throughput determination of extracellular segments of transmembrane proteins based on the identification of surface-labeled and biotin-captured peptide fragments by LC/MS/MS. We show that reliable identification of extracellular protein segments increases the accuracy and reliability of existing topology prediction algorithms. Using the experimental topology data as constraints, our improved prediction tool provides accurate and reliable topology models for hundreds of human transmembrane proteins.
Crystal MD: The massively parallel molecular dynamics software for metal with BCC structure
NASA Astrophysics Data System (ADS)
Hu, Changjun; Bai, He; He, Xinfu; Zhang, Boyao; Nie, Ningming; Wang, Xianmeng; Ren, Yingwen
2017-02-01
Material irradiation effects are one of the key issues for the use of nuclear power. However, the lack of high-throughput irradiation facilities and limited knowledge of the evolution process have left these issues poorly understood. High-performance computing offers a way to deepen our understanding of materials at the micro level. In this paper, a new data structure is proposed for the massively parallel simulation of the evolution of metal materials in irradiation environments. Based on the proposed data structure, we developed a new molecular dynamics code named Crystal MD. Simulations with Crystal MD achieved over 90% parallel efficiency in test cases, and on multi-core clusters Crystal MD uses more than 25% less memory than LAMMPS and IMD, two popular molecular dynamics packages. Using Crystal MD, a two-trillion-particle simulation has been performed on the Tianhe-2 cluster.
High-throughput Bayesian Network Learning using Heterogeneous Multicore Computers
Linderman, Michael D.; Athalye, Vivek; Meng, Teresa H.; Asadi, Narges Bani; Bruggner, Robert; Nolan, Garry P.
2017-01-01
Aberrant intracellular signaling plays an important role in many diseases. The causal structure of signal transduction networks can be modeled as Bayesian Networks (BNs) and computationally learned from experimental data. However, learning the structure of BNs is an NP-hard problem that, even with fast heuristics, is too time-consuming for large, clinically important networks (20-50 nodes). In this paper, we present a novel graphics processing unit (GPU)-accelerated implementation of a Markov chain Monte Carlo-based algorithm for learning BNs that is up to 7.5-fold faster than current general-purpose processor (GPP)-based implementations. The GPU-based implementation is just one of several implementations within the larger application, each optimized for a different input or machine configuration. We describe the methodology we use to build an extensible application, assembled from these variants, that can target a broad range of heterogeneous systems, e.g., GPUs, multicore GPPs. Specifically, we show how we use the Merge programming model to efficiently integrate, test and intelligently select among the different potential implementations. PMID:28819655
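The following CPU-only toy illustrates the Markov chain Monte Carlo structure-search loop that the paper accelerates on GPUs; the score function is a stand-in penalized regression score, not the scoring metric used in the paper:

```python
# Minimal sketch of MCMC structure learning over DAGs. In the real system,
# the expensive score evaluations are what the GPU kernels parallelize.
import numpy as np

rng = np.random.default_rng(0)
n = 6                                   # number of network nodes
data = rng.normal(size=(200, n))

def is_dag(adj):
    """Check acyclicity by repeatedly removing sink nodes."""
    a = adj.copy()
    nodes = list(range(len(a)))
    while nodes:
        sinks = [v for v in nodes if not a[v, nodes].any()]
        if not sinks:
            return False
        nodes = [v for v in nodes if v not in sinks]
    return True

def score(adj):
    # Stand-in network score: penalized residual error of regressing each
    # node on its parents (a real system would use BDe/BIC or similar).
    total = 0.0
    for v in range(n):
        parents = np.flatnonzero(adj[:, v])
        y = data[:, v]
        if parents.size:
            X = data[:, parents]
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            y = y - X @ beta
        total += -len(data) * np.log(y.var() + 1e-9) - 2.0 * parents.size
    return total

adj = np.zeros((n, n), dtype=bool)
current = score(adj)
for step in range(2000):                # Metropolis-Hastings over edge flips
    i, j = rng.integers(n, size=2)
    if i == j:
        continue
    proposal = adj.copy()
    proposal[i, j] = ~proposal[i, j]
    if not is_dag(proposal):
        continue
    s = score(proposal)
    if np.log(rng.random()) < s - current:
        adj, current = proposal, s
print("edges:", int(adj.sum()), "score:", round(current, 1))
```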
Li, Xiaolan; Milan Bonotto, Rafaela; No, Joo Hwan; Kim, Keum Hyun; Baek, Sungmin; Kim, Hee Young; Windisch, Marc Peter; Pamplona Mosimann, Ana Luiza; de Borba, Luana; Liuzzi, Michel; Hansen, Michael Adsetts Edberg; Nunes Duarte dos Santos, Claudia; Freitas-Junior, Lucio Holanda
2013-01-01
Dengue virus is a mosquito-borne flavivirus that has a large impact on global health. It is considered one of the medically important arboviruses, and developing a preventive or therapeutic solution remains a top priority in the medical and scientific community. Drug discovery programs for potential dengue antivirals have increased dramatically over the last decade, due in large part to the introduction of high-throughput assays. In this study, we have developed an image-based dengue high-throughput/high-content assay (HT/HCA) using an innovative computer vision approach to screen a kinase-focused library for anti-dengue compounds. Using this dengue HT/HCA, we identified a group of compounds with a 4-(1-aminoethyl)-N-methylthiazol-2-amine as a common core structure that inhibits dengue viral infection in a human liver-derived cell line (Huh-7.5 cells). Compounds CND1201, CND1203 and CND1243 exhibited strong antiviral activities against all four dengue serotypes. Plaque reduction and time-of-addition assays suggest that these compounds interfere with a late stage of the viral infection cycle. These findings demonstrate that our image-based dengue HT/HCA is a reliable tool that can be used to screen various chemical libraries for potential dengue antiviral candidates. PMID:23437413
Prediction of Chemical Function: Model Development and Application
The United States Environmental Protection Agency’s Exposure Forecaster (ExpoCast) project is developing both statistical and mechanism-based computational models for predicting exposures to thousands of chemicals, including those in consumer products. The high-throughput (...
Ontology based heterogeneous materials database integration and semantic query
NASA Astrophysics Data System (ADS)
Zhao, Shuai; Qian, Quan
2017-10-01
Materials digital data, high-throughput experiments and high-throughput computations are regarded as the three key pillars of materials genome initiatives. With the fast growth of materials data, the integration and sharing of data have become urgent needs and a hot topic in materials informatics. Due to the lack of semantic description, it is difficult to integrate data deeply at the semantic level when adopting conventional heterogeneous database integration approaches such as federated databases or data warehouses. In this paper, a semantic integration method is proposed that creates a semantic ontology by extracting the database schema semi-automatically. Other heterogeneous databases are integrated into the ontology by means of relational algebra and the rooted graph. Based on the integrated ontology, semantic queries can be expressed in SPARQL. In our experiments, two well-known first-principles computation databases, OQMD and the Materials Project, are used as the integration targets, demonstrating the feasibility and effectiveness of our method.
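A minimal sketch of a semantic query over such an integrated ontology, using Python's rdflib; the namespace, triples, and property names are invented for illustration:

```python
# Sketch of a SPARQL query over a toy materials ontology with rdflib.
from rdflib import Graph, Literal, Namespace, RDF

MAT = Namespace("http://example.org/materials#")  # hypothetical namespace
g = Graph()

# Toy triples standing in for records integrated from databases such as
# OQMD and the Materials Project under one ontology.
g.add((MAT.LiFePO4, RDF.type, MAT.Compound))
g.add((MAT.LiFePO4, MAT.bandGapEV, Literal(3.7)))
g.add((MAT.LiCoO2, RDF.type, MAT.Compound))
g.add((MAT.LiCoO2, MAT.bandGapEV, Literal(2.7)))

query = """
PREFIX mat: <http://example.org/materials#>
SELECT ?c ?gap WHERE {
    ?c a mat:Compound ;
       mat:bandGapEV ?gap .
    FILTER (?gap > 3.0)
}
"""
for compound, gap in g.query(query):
    print(compound, gap)
```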
SVS: data and knowledge integration in computational biology.
Zycinski, Grzegorz; Barla, Annalisa; Verri, Alessandro
2011-01-01
In this paper we present a framework for structured variable selection (SVS). The main concept of the proposed schema is to take a step towards the integration of two different aspects of data mining: the database and machine learning perspectives. The framework is flexible enough to use not only microarray data, but other high-throughput data of choice (e.g., from mass spectrometry or next-generation sequencing). Moreover, the feature selection phase incorporates prior biological knowledge in a modular way from various repositories and is ready to host different statistical learning techniques. We present a proof of concept of SVS, illustrating some implementation details and describing current results on high-throughput microarray data.
Awan, Muaaz Gul; Saeed, Fahad
2016-05-15
Modern proteomics studies utilize high-throughput mass spectrometers which can produce data at an astonishing rate. These big mass spectrometry (MS) datasets can easily reach the petabyte scale, creating storage and analytic problems for large-scale systems biology studies. Each spectrum consists of thousands of peaks, which have to be processed to deduce the peptide. However, only a small percentage of peaks in a spectrum are useful for peptide deduction, as most of the peaks are either noise or not useful for a given spectrum. This redundant processing of non-useful peaks is a bottleneck for streaming high-throughput processing of big MS data. One way to reduce the amount of computation required in a high-throughput environment is to eliminate non-useful peaks. Existing noise-removal algorithms are limited in their data-reduction capability and are compute-intensive, making them unsuitable for big-data, high-throughput environments. In this paper we introduce a novel low-complexity technique based on classification, quantization and sampling of MS peaks, and present a novel data-reductive strategy for the analysis of big MS data. Our algorithm, called MS-REDUCE, is capable of eliminating noisy peaks as well as peaks that do not contribute to peptide deduction before any peptide deduction is attempted. Our experiments have shown up to 100× speedup over existing state-of-the-art noise elimination algorithms while maintaining comparably high-quality matches. Using our approach we were able to process a million spectra in just under an hour on a moderate server. The developed tool and strategy have been made available to the wider proteomics and parallel computing community; the code can be found at https://github.com/pcdslab/MSREDUCE
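The classify/quantize/sample idea can be sketched as follows; this is a rough illustration with invented class boundaries and retention fractions, not the MS-REDUCE source:

```python
# Illustrative take on classify/quantize/sample peak reduction: keep a fixed
# fraction of peaks from each intensity class so that low-information peaks
# are dropped before any peptide deduction is attempted.
import numpy as np

def reduce_spectrum(mz, intensity, n_classes=4,
                    keep_frac=(1.0, 0.5, 0.2, 0.05)):
    """Quantize peaks into intensity classes, then sample each class."""
    order = np.argsort(intensity)[::-1]            # classify by intensity rank
    classes = np.array_split(order, n_classes)     # quantization into bands
    rng = np.random.default_rng(0)
    kept = []
    for band, frac in zip(classes, keep_frac):     # sampling step
        k = max(1, int(frac * band.size))
        kept.append(rng.choice(band, size=k, replace=False))
    kept = np.concatenate(kept)
    return mz[kept], intensity[kept]

rng = np.random.default_rng(42)
mz = np.sort(rng.uniform(100, 2000, 5000))
inten = rng.exponential(1.0, 5000)
mz_r, int_r = reduce_spectrum(mz, inten)
print(f"{mz.size} peaks -> {mz_r.size} peaks")
```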
Novel approach of fragment-based lead discovery applied to renin inhibitors.
Tawada, Michiko; Suzuki, Shinkichi; Imaeda, Yasuhiro; Oki, Hideyuki; Snell, Gyorgy; Behnke, Craig A; Kondo, Mitsuyo; Tarui, Naoki; Tanaka, Toshimasa; Kuroita, Takanobu; Tomimoto, Masaki
2016-11-15
A novel approach was conducted for fragment-based lead discovery and applied to renin inhibitors. Biochemical screening of a fragment library against renin provided a hit fragment that showed a characteristic interaction pattern with the target protein. The hit fragment bound only to the S1, S3, and S3SP (S3 subpocket) sites, without any interactions with the catalytic aspartate residues (Asp32 and Asp215, pepsin numbering). Prior to making chemical modifications to the hit fragment, we first identified its essential binding sites by utilizing the hit fragment's substructures. Second, we created a new, smaller scaffold that better occupied the identified essential S3 and S3SP sites, using library synthesis with high-throughput chemistry. We then revisited the S1 site and efficiently explored a good building block to attach to the scaffold via library synthesis. In the library syntheses, the binding modes of each pivotal compound were determined and confirmed by X-ray crystallography, and the libraries were strategically designed by a structure-based computational approach, not only to obtain more active compounds but also to obtain an informative structure-activity relationship (SAR). As a result, we obtained a lead compound offering synthetic accessibility as well as improved in vitro ADMET profiles. The fragments and compounds possessing a characteristic interaction pattern provided new structural insights into renin's active site and the potential to create a new generation of renin inhibitors. In addition, we demonstrated that our FBDD strategy, integrating a highly sensitive biochemical assay, X-ray crystallography, and high-throughput synthesis with in silico library design aimed at fragment morphing at the initial stage, was effective in elucidating a pocket profile and delivering a promising lead compound.
A suite of MATLAB-based computational tools for automated analysis of COPAS Biosort data
Morton, Elizabeth; Lamitina, Todd
2010-01-01
Complex Object Parametric Analyzer and Sorter (COPAS) devices are large-object, fluorescence-capable flow cytometers used for high-throughput analysis of live model organisms, including Drosophila melanogaster, Caenorhabditis elegans, and zebrafish. The COPAS is especially useful in C. elegans high-throughput genome-wide RNA interference (RNAi) screens that utilize fluorescent reporters. However, analysis of data from such screens is relatively labor-intensive and time-consuming. Currently, there are no computational tools available to facilitate high-throughput analysis of COPAS data. We used MATLAB to develop algorithms (COPAquant, COPAmulti, and COPAcompare) to analyze different types of COPAS data. COPAquant reads single-sample files, filters and extracts values and value ratios for each file, and then returns a summary of the data. COPAmulti reads 96-well autosampling files generated with the ReFLX adapter, performs sample filtering, graphs features across both wells and plates, performs some common statistical measures for hit identification, and outputs results in graphical formats. COPAcompare performs a correlation analysis between replicate 96-well plates. For many parameters, thresholds may be defined through a simple graphical user interface (GUI), allowing our algorithms to meet a variety of screening applications. In a screen for regulators of stress-inducible GFP expression, COPAquant dramatically accelerated data analysis and allowed us to rapidly move from raw data to hit identification. Because the COPAS file structure is standardized and our MATLAB code is freely available, our algorithms should be extremely useful for analysis of COPAS data from multiple platforms and organisms. The MATLAB code is freely available at our web site (www.med.upenn.edu/lamitinalab/downloads.shtml). PMID:20569218
Benchmarking Ligand-Based Virtual High-Throughput Screening with the PubChem Database
Butkiewicz, Mariusz; Lowe, Edward W.; Mueller, Ralf; Mendenhall, Jeffrey L.; Teixeira, Pedro L.; Weaver, C. David; Meiler, Jens
2013-01-01
With the rapidly increasing availability of High-Throughput Screening (HTS) data in the public domain, such as the PubChem database, methods for ligand-based computer-aided drug discovery (LB-CADD) have the potential to accelerate and reduce the cost of probe development and drug discovery efforts in academia. We assemble nine data sets from realistic HTS campaigns representing major families of drug target proteins for benchmarking LB-CADD methods. Each data set is public domain through PubChem and carefully collated through confirmation screens validating active compounds. These data sets provide the foundation for benchmarking a new cheminformatics framework BCL::ChemInfo, which is freely available for non-commercial use. Quantitative structure activity relationship (QSAR) models are built using Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Decision Trees (DTs), and Kohonen networks (KNs). Problem-specific descriptor optimization protocols are assessed including Sequential Feature Forward Selection (SFFS) and various information content measures. Measures of predictive power and confidence are evaluated through cross-validation, and a consensus prediction scheme is tested that combines orthogonal machine learning algorithms into a single predictor. Enrichments ranging from 15 to 101 for a TPR cutoff of 25% are observed. PMID:23299552
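The benchmarking recipe, cross-validated models scored by enrichment at a fixed true-positive-rate cutoff, can be sketched with scikit-learn standing in for BCL::ChemInfo; the descriptors and activity labels below are synthetic:

```python
# Hedged sketch of cross-validated QSAR benchmarking scored by enrichment.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))               # stand-in molecular descriptors
y = (X[:, :4].sum(axis=1) + rng.normal(scale=0.5, size=1000) > 2).astype(int)

scores = cross_val_predict(SVC(probability=True), X, y, cv=5,
                           method="predict_proba")[:, 1]

def enrichment_at_tpr(y_true, y_score, tpr_cutoff=0.25):
    """Actives-vs-random enrichment in the top of the ranked list that
    captures `tpr_cutoff` of all actives."""
    order = np.argsort(y_score)[::-1]
    ranked = y_true[order]
    n_actives = ranked.sum()
    hits_needed = int(np.ceil(tpr_cutoff * n_actives))
    depth = np.searchsorted(np.cumsum(ranked), hits_needed) + 1
    return (hits_needed / depth) / (n_actives / len(y_true))

print("enrichment @ 25% TPR:", round(enrichment_at_tpr(y, scores), 1))
```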
Information-based management mode based on value network analysis for livestock enterprises
NASA Astrophysics Data System (ADS)
Liu, Haoqi; Lee, Changhoon; Han, Mingming; Su, Zhongbin; Padigala, Varshinee Anu; Shen, Weizheng
2018-01-01
With the development of computer and IT technologies, enterprise management has gradually become information-based. Moreover, due to poor technical competence and non-uniform management, most breeding enterprises show a lack of organisation in data collection and management. In addition, low levels of efficiency result in increasing production costs. This paper adopts the Struts2 framework to construct an information-based management system for standardised and normalised management of the production process in beef cattle breeding enterprises. We present a radio-frequency identification (RFID) system that addresses multiple-tag anti-collision via a dynamic grouping ALOHA algorithm. Building on the existing ALOHA algorithm, the improved dynamic grouping scheme is characterised by a high throughput rate, reaching a throughput 42% higher than that of the general ALOHA algorithm. With a change in the number of tags, the system throughput remains relatively stable.
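A back-of-the-envelope simulation shows why grouping helps: framed slotted ALOHA throughput peaks when the load is near one tag per slot (about 1/e), so splitting tags into frame-sized groups raises the per-slot success rate. All parameters below are illustrative, not taken from the paper:

```python
# Toy simulation of framed slotted ALOHA with and without tag grouping.
import random

def framed_aloha_throughput(n_tags, frame_size, rounds=200):
    """Fraction of slots with exactly one responder (successful reads)."""
    successes = slots = 0
    for _ in range(rounds):
        counts = [0] * frame_size
        for _ in range(n_tags):
            counts[random.randrange(frame_size)] += 1
        successes += sum(1 for c in counts if c == 1)
        slots += frame_size
    return successes / slots

random.seed(1)
n_tags, frame = 256, 64
print("ungrouped:", round(framed_aloha_throughput(n_tags, frame), 3))
# Dynamic grouping: only one frame-sized group of tags responds per frame,
# keeping the per-slot load near the slotted-ALOHA optimum of ~1.
groups = max(1, n_tags // frame)
print("grouped:  ", round(framed_aloha_throughput(n_tags // groups, frame), 3))
```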
Asif, Muhammad; Guo, Xiangzhou; Zhang, Jing; Miao, Jungang
2018-04-17
Digital cross-correlation is central to many applications, including but not limited to digital image processing, satellite navigation and remote sensing. With recent advancements in digital technology, the computational demands of such applications have increased enormously. In this paper we present a high-throughput digital cross-correlator capable of processing 1-bit digitized streams at rates of up to 2 GHz simultaneously on 64 channels, i.e., approximately 4 trillion correlation-and-accumulation operations per second. To achieve higher throughput, we focused on frequency-based partitioning of the design and tried to minimize and localize high-frequency operations. This correlator is designed for a passive millimeter-wave imager intended for the detection of contraband items concealed on the human body. The goals are to increase the system bandwidth, achieve video-rate imaging, improve sensitivity and reduce the size. The design methodology is detailed in subsequent sections, elaborating the techniques enabling high throughput. The design is verified for a Xilinx Kintex UltraScale device in simulation, and the implementation results are given in terms of device utilization and power consumption estimates. Our results show considerable improvement in throughput as compared to our baseline design, while the correlator successfully meets the functional requirements.
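A software model of the core 1-bit correlation operation (the FPGA implements this with XNOR gates and accumulators; sizes here are scaled down for illustration):

```python
# Software model of 1-bit cross-correlation across many channels.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_channels = 1 << 16, 64
signal = rng.normal(size=n_samples)

# Each channel sees the common signal plus independent noise, then is
# quantized to 1 bit (sign only), as in the hardware front end.
channels = np.sign(signal + rng.normal(size=(n_channels, n_samples)))
reference = np.sign(signal + rng.normal(size=n_samples))

bits = channels > 0
ref_bits = reference > 0
# 1-bit correlation = agreements minus disagreements per channel,
# an XNOR followed by a popcount-style accumulation in hardware.
agree = (bits == ref_bits).sum(axis=1)
corr = (2 * agree - n_samples) / n_samples
print("mean 1-bit correlation:", corr.mean().round(3))
```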
Adverse outcome pathway networks II: Network analytics
The US EPA is developing more cost effective and efficient ways to evaluate chemical safety using high throughput and computationally based testing strategies. An important component of this approach is the ability to translate chemical effects on fundamental biological processes...
Screening Chemicals for Estrogen Receptor Bioactivity Using a Computational Model.
Browne, Patience; Judson, Richard S; Casey, Warren M; Kleinstreuer, Nicole C; Thomas, Russell S
2015-07-21
The U.S. Environmental Protection Agency (EPA) is considering high-throughput and computational methods to evaluate the endocrine bioactivity of environmental chemicals. Here we describe a multistep, performance-based validation of new methods and demonstrate that these new tools are sufficiently robust to be used in the Endocrine Disruptor Screening Program (EDSP). Results from 18 estrogen receptor (ER) ToxCast high-throughput screening assays were integrated into a computational model that can discriminate bioactivity from assay-specific interference and cytotoxicity. Model scores range from 0 (no activity) to 1 (bioactivity of 17β-estradiol). ToxCast ER model performance was evaluated for reference chemicals, as well as results of EDSP Tier 1 screening assays in current practice. The ToxCast ER model accuracy was 86% to 93% when compared to reference chemicals and predicted results of EDSP Tier 1 guideline and other uterotrophic studies with 84% to 100% accuracy. The performance of high-throughput assays and ToxCast ER model predictions demonstrates that these methods correctly identify active and inactive reference chemicals, provide a measure of relative ER bioactivity, and rapidly identify chemicals with potential endocrine bioactivities for additional screening and testing. EPA is accepting ToxCast ER model data for 1812 chemicals as alternatives for EDSP Tier 1 ER binding, ER transactivation, and uterotrophic assays.
NASA Astrophysics Data System (ADS)
Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.
2017-11-01
The current trend in processor manufacturing focuses on multi-core architectures rather than increasing clock speed for performance improvement. Graphics processors have become commodity hardware for fast co-processing in computer systems. Developments in IoT, social networking web applications and big data have created huge demand for data-processing activities, and such throughput-intensive applications inherently contain data-level parallelism, which is well suited to SIMD-based GPU architectures. This paper reviews the architectural aspects of multi-/many-core processors and graphics processors. Different case studies are taken to compare the performance of throughput-computing applications using shared-memory programming in OpenMP and CUDA API-based programming.
A Low-Power High-Speed Smart Sensor Design for Space Exploration Missions
NASA Technical Reports Server (NTRS)
Fang, Wai-Chi
1997-01-01
A low-power high-speed smart sensor system based on a large-format active pixel sensor (APS) integrated with a programmable neural processor for space exploration missions is presented. The concept of building an advanced smart sensing system is demonstrated by a system-level microchip design composed of an APS sensor, a programmable neural processor, and an embedded microprocessor in a SOI CMOS technology. This ultra-fast smart sensor system-on-a-chip design mimics what is inherent in biological vision systems. Moreover, it is programmable and capable of performing ultra-fast machine vision processing at all levels, such as image acquisition, image fusion, image analysis, scene interpretation, and control functions. The system provides about one tera-operation per second of computing power, a two-order-of-magnitude increase over state-of-the-art microcomputers. Its high performance is due to massively parallel computing structures, high data throughput rates, fast learning capabilities, and advanced VLSI system-on-a-chip implementation.
Accelerating Time Integration for the Shallow Water Equations on the Sphere Using GPUs
Archibald, R.; Evans, K. J.; Salinger, A.
2015-06-01
The push towards larger and larger computational platforms has made it possible for climate simulations to resolve climate dynamics across multiple spatial and temporal scales. This direction in climate simulation has created a strong need to develop scalable timestepping methods capable of accelerating throughput on high performance computing. This study details the recent advances in the implementation of implicit time stepping of the spectral element dynamical core within the United States Department of Energy (DOE) Accelerated Climate Model for Energy (ACME) on graphical processing unit (GPU) based machines. We demonstrate how solvers in the Trilinos project are interfaced with ACME and GPU kernels to increase the computational speed of the residual calculations in the implicit time stepping method for the atmosphere dynamics. We demonstrate the optimization gains and data structure reorganization that facilitate the performance improvements.
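The numerical pattern can be illustrated in miniature: an implicit backward Euler step solved by Newton iteration on a single stiff ODE standing in for the shallow water equations. In the real code, the residual evaluation is the GPU-accelerated kernel and the solves are handled by Trilinos; everything below is a toy:

```python
# Toy implicit (backward Euler) step solved with Newton's method.
import numpy as np

def f(u):
    return -50.0 * (u - np.cos(u))     # stiff right-hand side (illustrative)

def dfdu(u):
    return -50.0 * (1.0 + np.sin(u))

def backward_euler_step(u_old, dt, iters=10):
    u = u_old
    for _ in range(iters):             # Newton iteration on the residual
        residual = u - u_old - dt * f(u)
        u -= residual / (1.0 - dt * dfdu(u))
    return u

u, dt = 1.0, 0.1                       # step far larger than explicit limit
for _ in range(10):
    u = backward_euler_step(u, dt)
print("u after 10 implicit steps:", round(u, 5))
```

The payoff of the implicit scheme is exactly what the abstract describes: stable steps far beyond the explicit stability limit, at the cost of repeated residual evaluations, which is why accelerating those on GPUs matters.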
Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter
2015-01-01
Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in a rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. To run these analyses quickly, automated workflows implemented on high-performance computers are the state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high-performance computing (HPC) systems require special care when utilized for high-throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high-throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438
High throughput on-chip analysis of high-energy charged particle tracks using lensfree imaging
DOE Office of Scientific and Technical Information (OSTI.GOV)
Luo, Wei; Shabbir, Faizan; Gong, Chao
2015-04-13
We demonstrate a high-throughput charged particle analysis platform, which is based on lensfree on-chip microscopy for rapid ion track analysis using allyl diglycol carbonate, i.e., CR-39 plastic polymer, as the sensing medium. By adopting a wide-area opto-electronic image sensor together with a source-shifting based pixel super-resolution technique, a large CR-39 sample volume (i.e., 4 cm × 4 cm × 0.1 cm) can be imaged in less than 1 min using a compact lensfree on-chip microscope, which detects partially coherent in-line holograms of the ion tracks recorded within the CR-39 detector. After the image capture, using highly parallelized reconstruction and ion track analysis algorithms running on graphics processing units, we reconstruct and analyze the entire volume of a CR-39 detector within ∼1.5 min. This significant reduction in the entire imaging and ion track analysis time not only increases our throughput but also allows us to perform time-resolved analysis of the etching process to monitor and optimize the growth of ion tracks during etching. This computational lensfree imaging platform can provide a much higher throughput and more cost-effective alternative to traditional lens-based scanning optical microscopes for ion track analysis using CR-39 and other passive high energy particle detectors.
Kusne, Aaron Gilad; Gao, Tieren; Mehta, Apurva; Ke, Liqin; Nguyen, Manh Cuong; Ho, Kai-Ming; Antropov, Vladimir; Wang, Cai-Zhuang; Kramer, Matthew J.; Long, Christian; Takeuchi, Ichiro
2014-01-01
Advanced materials characterization techniques with ever-growing data acquisition speed and storage capabilities represent a challenge in modern materials science, and new procedures to quickly assess and analyze the data are needed. Machine learning approaches are effective in reducing the complexity of data and rapidly homing in on the underlying trend in multi-dimensional data. Here, we show that by applying the mean shift algorithm to a large amount of diffraction data in high-throughput experimentation, one can streamline the process of delineating the structural evolution across compositional variations mapped on combinatorial libraries with minimal computational cost. Data collected at a synchrotron beamline are analyzed on the fly, and by integrating experimental data with the inorganic crystal structure database (ICSD), we can substantially enhance the accuracy of classifying the structural phases across ternary phase spaces. We have used this approach to identify a novel magnetic phase with enhanced magnetic anisotropy which is a candidate for a rare-earth-free permanent magnet. PMID:25220062
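A minimal sketch of mean-shift phase clustering with scikit-learn, with synthetic feature vectors standing in for measured diffraction patterns:

```python
# Sketch of on-the-fly diffraction-pattern clustering with mean shift.
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

rng = np.random.default_rng(0)
# Three structural phases across a composition spread, each yielding
# patterns clustered around a characteristic feature vector.
phases = rng.normal(size=(3, 20))
patterns = np.vstack([p + 0.1 * rng.normal(size=(50, 20)) for p in phases])

bandwidth = estimate_bandwidth(patterns, quantile=0.2)
labels = MeanShift(bandwidth=bandwidth).fit_predict(patterns)
print("phases found:", len(set(labels)))
```

Mean shift needs no preset cluster count, which is what makes it attractive for delineating an unknown number of phases across a combinatorial library.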
Adverse outcome pathway networks: Development, analytics and applications
The US EPA is developing more cost effective and efficient ways to evaluate chemical safety using high throughput and computationally based testing strategies. An important component of this approach is the ability to translate chemical effects on fundamental biological processes...
Adverse outcome pathway networks I: Development and applications
The US EPA is developing more cost effective and efficient ways to evaluate chemical safety using high throughput and computationally based testing strategies. An important component of this approach is the ability to translate chemical effects on fundamental biological processes...
Adverse outcome pathway networks: Development, analytics, and applications
The US EPA is developing more cost effective and efficient ways to evaluate chemical safety using high throughput and computationally based testing strategies. An important component of this approach is the ability to translate chemical effects on fundamental ...
Congenital limb malformations are among the most frequent malformations in humans, occurring with a frequency of about 1 in 500 to 1 in 1000 live births. ToxCast is profiling the bioactivity of thousands of chemicals based on high-throughput screening (HTS) and computational methods that...
Optimization and high-throughput screening of antimicrobial peptides.
Blondelle, Sylvie E; Lohner, Karl
2010-01-01
While a well-established process for lead compound discovery in for-profit companies, high-throughput screening is becoming more popular in basic and applied research settings in academia. The development of combinatorial libraries, combined with easy and less expensive access to new technologies, has greatly contributed to the implementation of high-throughput screening in academic laboratories. While such techniques were earlier applied to simple assays involving single targets or based on binding affinity, they have now been extended to more complex systems such as whole cell-based assays. In particular, the urgent need for new antimicrobial compounds that would overcome the rapid rise of drug-resistant microorganisms, where multiple-target assays or cell-based assays are often required, has pushed scientists toward high-throughput technologies. Based on their existence in natural host defense systems and their different mode of action relative to commercial antibiotics, antimicrobial peptides represent a new hope for discovering novel antibiotics against multi-resistant bacteria. The ease of generating peptide libraries in different formats has allowed rapid adaptation of high-throughput assays to the search for novel antimicrobial peptides. Similarly, the availability nowadays of high-quantity and high-quality antimicrobial peptide data has permitted the development of predictive algorithms to facilitate the optimization process. This review summarizes the various library formats that lead to de novo antimicrobial peptide sequences, as well as the latest structural knowledge and optimization processes aimed at improving the peptides' selectivity.
O'Donnell, Michael
2015-01-01
State-and-transition simulation modeling relies on knowledge of vegetation composition and structure (states) that describe community conditions, mechanistic feedbacks such as fire that can affect vegetation establishment, and ecological processes that drive community conditions as well as the transitions between these states. However, as the need to model larger and more complex landscapes increases, a more advanced awareness of computing resources becomes essential. The objectives of this study include identifying challenges of executing state-and-transition simulation models, identifying common bottlenecks in computing resources, developing a workflow and software that enable parallel processing of Monte Carlo simulations, and identifying the advantages and disadvantages of different computing resources. To address these objectives, this study used the ApexRMS® SyncroSim software and embarrassingly parallel tasks of Monte Carlo simulations on a single multicore computer and on distributed computing systems. The results demonstrated that state-and-transition simulation models scale best in distributed computing environments, such as high-throughput and high-performance computing, because these environments disseminate the workloads across many compute nodes, thereby supporting analysis of larger landscapes, higher spatial resolution vegetation products, and more complex models. Using a case study and five different computing environments, the top result (high-throughput computing versus serial computation) showed a decrease in computing time of approximately 96.6%. With a single multicore compute node (the bottom result), computing time decreased by 81.8% relative to serial computation. These results provide insight into the tradeoffs of using different computing resources when research necessitates advanced integration of ecoinformatics incorporating large and complicated data inputs and models.
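In miniature, the embarrassingly parallel pattern is a map of independent Monte Carlo replicates over a worker pool; the state-and-transition body below is a deliberately simplistic placeholder with invented transition probabilities:

```python
# Minimal pattern for embarrassingly parallel Monte Carlo replicates: each
# run is independent, so they map cleanly onto local cores or HTC jobs.
import multiprocessing as mp
import random

def run_replicate(seed):
    """One Monte Carlo realization of a toy state-and-transition model."""
    rng = random.Random(seed)
    state, fires = "shrubland", 0
    for _year in range(100):
        if state == "shrubland" and rng.random() < 0.02:   # fire transition
            state, fires = "grassland", fires + 1
        elif state == "grassland" and rng.random() < 0.10:  # recovery
            state = "shrubland"
    return fires

if __name__ == "__main__":
    with mp.Pool() as pool:
        results = pool.map(run_replicate, range(1000))
    print("mean fires per 100-yr run:", sum(results) / len(results))
```

On an HTC system, the same decomposition applies, with each replicate (or batch of replicates) submitted as a separate job rather than a pool task.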
NASA Astrophysics Data System (ADS)
van Setten, M. J.; Giantomassi, M.; Gonze, X.; Rignanese, G.-M.; Hautier, G.
2017-10-01
The search for new materials based on computational screening relies on methods that accurately predict, in an automatic manner, total energy, atomic-scale geometries, and other fundamental characteristics of materials. Many technologically important material properties directly stem from the electronic structure of a material, but the usual workhorse for total energies, namely density-functional theory, is plagued by fundamental shortcomings and errors from approximate exchange-correlation functionals in its prediction of the electronic structure. At variance, the GW method is currently the state-of-the-art ab initio approach for accurate electronic structure. It is mostly used to perturbatively correct density-functional theory results, but is, however, computationally demanding and also requires expert knowledge to give accurate results. Accordingly, it is not presently used in high-throughput screening: fully automatized algorithms for setting up the calculations and determining convergence are lacking. In this paper, we develop such a method and, as a first application, use it to validate the accuracy of G0W0 using the PBE starting point and the Godby-Needs plasmon-pole model (G0W0(GN)@PBE) on a set of about 80 solids. The results of the automatic convergence study provide valuable insights. Indeed, we find correlations between computational parameters that can be used to further improve the automatization of GW calculations. Moreover, we find that the correlation between the PBE and G0W0(GN)@PBE gaps is much stronger than that between GW and experimental gaps. However, the G0W0(GN)@PBE gaps still describe the experimental gaps more accurately than a linear model based on the PBE gaps. With this paper, we hence show that GW can be made automatic and is more accurate than using an empirical correction of the PBE gap, but that, for accurate predictive results for a broad class of materials, an improved starting point or some type of self-consistency is necessary.
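The linear-model baseline mentioned above amounts to a least-squares fit of experimental gaps against PBE gaps; a sketch with invented gap values:

```python
# Sketch of the "empirical correction" baseline: a linear map from PBE gaps
# to experimental gaps, fit by least squares. Gap values are made up.
import numpy as np

pbe_gap = np.array([0.6, 1.1, 1.8, 2.4, 3.3])   # eV, hypothetical solids
exp_gap = np.array([1.2, 1.9, 2.9, 3.6, 4.8])   # eV, matching experiments

slope, intercept = np.polyfit(pbe_gap, exp_gap, 1)
predicted = slope * pbe_gap + intercept
rmse = np.sqrt(np.mean((predicted - exp_gap) ** 2))
print(f"exp_gap ~ {slope:.2f} * pbe_gap + {intercept:.2f}, RMSE = {rmse:.2f} eV")
```

The paper's point is that automated G0W0(GN)@PBE beats this kind of one-parameter-pair correction, though neither reaches full predictive accuracy without a better starting point or self-consistency.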
Windows .NET Network Distributed Basic Local Alignment Search Toolkit (W.ND-BLAST)
Dowd, Scot E; Zaragoza, Joaquin; Rodriguez, Javier R; Oliver, Melvin J; Payton, Paxton R
2005-01-01
Background BLAST is one of the most common and useful tools for genetic research. This paper describes a software application we have termed Windows .NET Distributed Basic Local Alignment Search Toolkit (W.ND-BLAST), which enhances the BLAST utility by improving usability, fault recovery, and scalability in a Windows desktop environment. Our goal was to develop an easy-to-use, fault-tolerant, high-throughput BLAST solution that incorporates a comprehensive BLAST result viewer with curation and annotation functionality. Results W.ND-BLAST is a comprehensive Windows-based software toolkit that targets researchers, including those with minimal computer skills, and provides the ability to increase the performance of BLAST by distributing BLAST queries to any number of Windows-based machines across local area networks (LANs). W.ND-BLAST provides intuitive Graphic User Interfaces (GUIs) for BLAST database creation, BLAST execution, BLAST output evaluation and BLAST result exportation. This software also provides several layers of fault tolerance and fault recovery to prevent loss of data if nodes or master machines fail. This paper lays out the functionality of W.ND-BLAST. W.ND-BLAST displays close to 100% performance efficiency when distributing tasks to 12 remote computers of the same performance class. A high-throughput BLAST job which took 662.68 minutes (11 hours) on one average machine was completed in 44.97 minutes when distributed to 17 nodes, which included lower-performance-class machines. Finally, there are comprehensive high-throughput BLAST Output Viewer (BOV) and Annotation Engine components, which provide comprehensive exportation of BLAST hits to text files, annotated FASTA files, tables, or association files. Conclusion W.ND-BLAST provides an interactive tool that allows scientists to easily utilize their available computing resources for high-throughput and comprehensive sequence analyses. The install package for W.ND-BLAST is freely downloadable. With registration the software is free; installation, networking, and usage instructions are provided, as well as a support forum. PMID:15819992
Morphology control in polymer blend fibers—a high throughput computing approach
NASA Astrophysics Data System (ADS)
Sesha Sarath Pokuri, Balaji; Ganapathysubramanian, Baskar
2016-08-01
Fibers made from polymer blends have conventionally enjoyed wide use, particularly in textiles. This wide applicability is primarily aided by the ease of manufacturing such fibers. More recently, the ability to tailor the internal morphology of polymer blend fibers by carefully designing processing conditions has enabled such fibers to be used in technologically relevant applications. Some examples include anisotropic insulating properties for heat and anisotropic wicking of moisture, coaxial morphologies for optical applications, as well as fibers with high internal surface area for filtration and catalysis applications. However, identifying the appropriate processing conditions from the large space of possibilities using conventional trial-and-error approaches is a tedious and resource-intensive process. Here, we illustrate a high-throughput computational approach to rapidly explore and characterize how processing conditions (specifically blend ratio and evaporation rate) affect the internal morphology of polymer blends during solvent-based fabrication. We focus on a PS:PMMA system and identify two distinct classes of morphologies formed due to variations in the processing conditions. We subsequently map the processing conditions to the morphology class, thus constructing a ‘phase diagram’ that enables rapid identification of processing parameters for a specific morphology class. We finally demonstrate the potential of time-dependent processing conditions to obtain desired morphology features. This opens up the possibility of rational, stage-wise design of processing pathways for tailored fiber morphology using high-throughput computing.
Annotare—a tool for annotating high-throughput biomedical investigations and resulting data
Shankar, Ravi; Parkinson, Helen; Burdett, Tony; Hastings, Emma; Liu, Junmin; Miller, Michael; Srinivasa, Rashmi; White, Joseph; Brazma, Alvis; Sherlock, Gavin; Stoeckert, Christian J.; Ball, Catherine A.
2010-01-01
Summary: Computational methods in molecular biology will increasingly depend on standards-based annotations that describe biological experiments in an unambiguous manner. Annotare is a software tool that enables biologists to easily annotate their high-throughput experiments, biomaterials and data in a standards-compliant way that facilitates meaningful search and analysis. Availability and Implementation: Annotare is available from http://code.google.com/p/annotare/ under the terms of the open-source MIT License (http://www.opensource.org/licenses/mit-license.php). It has been tested on both Mac and Windows. Contact: rshankar@stanford.edu PMID:20733062
Trade-Offs in Thin Film Solar Cells with Layered Chalcostibite Photovoltaic Absorbers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Welch, Adam W.; Baranowski, Lauryn L.; Peng, Haowei
Discovery of novel semiconducting materials is needed for solar energy conversion and other optoelectronic applications. However, emerging low-dimensional solar absorbers often have unconventional crystal structures and unusual combinations of optical absorption and electrical transport properties, which considerably slows down research and development progress. Here, the effects of stronger absorption and weaker carrier collection in 2D-like absorber materials are studied using a high-throughput combinatorial experimental approach, complemented by advanced characterization and computations. It is found that photoexcited charge carrier collection in CuSbSe2 solar cells is enhanced by drift in an electric field, addressing a different absorption/collection balance. The resulting drift solar cell efficiency is <5% due to an inherent JSC/VOC trade-off, suggesting that improved carrier diffusion and better contacts are needed to further increase CuSbSe2 performance. Furthermore, this study also illustrates the advantages of high-throughput experimental methods for fast optimization of optoelectronic devices based on emerging low-dimensional semiconductor materials.
Trade-Offs in Thin Film Solar Cells with Layered Chalcostibite Photovoltaic Absorbers
Welch, Adam W.; Baranowski, Lauryn L.; Peng, Haowei; ...
2017-01-25
Discovery of novel semiconducting materials is needed for solar energy conversion and other optoelectronic applications. However, emerging low-dimensional solar absorbers often have unconventional crystal structures and unusual combinations of optical absorption and electrical transport properties, which considerably slows down research and development progress. Here, the effects of stronger absorption and weaker carrier collection in 2D-like absorber materials are studied using a high-throughput combinatorial experimental approach, complemented by advanced characterization and computations. It is found that photoexcited charge carrier collection in CuSbSe2 solar cells is enhanced by drift in an electric field, addressing a different absorption/collection balance. The resulting drift solar cell efficiency is <5% due to an inherent JSC/VOC trade-off, suggesting that improved carrier diffusion and better contacts are needed to further increase CuSbSe2 performance. Furthermore, this study also illustrates the advantages of high-throughput experimental methods for fast optimization of optoelectronic devices based on emerging low-dimensional semiconductor materials.
Adaptations in Electronic Structure Calculations in Heterogeneous Environments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Talamudupula, Sai
Modern quantum chemistry deals with electronic structure calculations of unprecedented complexity and accuracy. They demand the full power of high-performance computing and must be in tune with the given architecture for superior efficiency. To make such applications resource-aware, it is desirable to enable their static and dynamic adaptations using some external software (middleware), which may monitor both system availability and application needs, rather than mix science with system-related calls inside the application. The present work investigates scientific application interlinking with middleware based on the example of the computational chemistry package GAMESS and the middleware NICAN. The existing synchronous model is limited by possible delays due to the middleware processing time under sustainable runtime system conditions. Proposed asynchronous and hybrid models aim at overcoming this limitation. When linked with NICAN, the fragment molecular orbital (FMO) method is capable of adapting its fragment scheduling policy statically and dynamically based on the computing platform conditions. Significant execution time and throughput gains have been obtained due to such static adaptations when the compute nodes have very different core counts. Dynamic adaptations are based on main memory availability at run time. NICAN prompts FMO to postpone scheduling certain fragments if there is not enough memory for their immediate execution. Hence, FMO may be able to complete the calculations, whereas without such adaptations it aborts.
Computational Lipidomics and Lipid Bioinformatics: Filling In the Blanks.
Pauling, Josch; Klipp, Edda
2016-12-22
Lipids are highly diverse metabolites of pronounced importance in health and disease. While metabolomics is a broad field under the omics umbrella that may also relate to lipids, lipidomics is an emerging field which specializes in the identification, quantification and functional interpretation of complex lipidomes. Today, it is possible to identify and distinguish lipids in a high-resolution, high-throughput manner, with considerable structural detail. However, doing so may produce thousands of mass spectra in a single experiment, which has created a high demand for specialized computational support to analyze these spectral libraries. The computational biology and bioinformatics community has so far established methodology in genomics, transcriptomics and proteomics, but there are many (combinatorial) challenges when it comes to the structural diversity of lipids and their identification, quantification and interpretation. This review gives an overview and outlook on lipidomics research and illustrates ongoing computational and bioinformatics efforts. These efforts are important and necessary steps to advance the lipidomics field alongside the analytical, biochemistry, biomedical and biology communities and to close the gap in available computational methodology between lipidomics and other omics sub-branches.
A High-Throughput Processor for Flight Control Research Using Small UAVs
NASA Technical Reports Server (NTRS)
Klenke, Robert H.; Sleeman, W. C., IV; Motter, Mark A.
2006-01-01
There are numerous autopilot systems commercially available for small (<100 lb) UAVs. However, they all share several key disadvantages for conducting aerodynamic research, chief amongst which is the fact that most utilize older, slower, 8- or 16-bit microcontroller technologies. This paper describes the development and testing of a flight control system (FCS) for small UAVs based on a modern, high-throughput, embedded processor. In addition, this FCS platform contains user-configurable hardware resources in the form of a Field Programmable Gate Array (FPGA) that can be used to implement custom, application-specific hardware. This hardware can be used to off-load routine tasks, such as sensor data collection, from the FCS processor, thereby further increasing the computational throughput of the system.
Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi
NASA Astrophysics Data System (ADS)
Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; Eulisse, Giulio; Knight, Robert; Muzaffar, Shahzad
2015-05-01
Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost-efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. We report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).
High throughput computing: a solution for scientific analysis
O'Donnell, M.
2011-01-01
handle job failures due to hardware, software, or network interruptions (obviating the need to manually resubmit the job after each stoppage); be affordable; and most importantly, allow us to complete very large, complex analyses that otherwise would not even be possible. In short, we envisioned a job-management system that would take advantage of unused FORT CPUs within a local area network (LAN) to effectively distribute and run highly complex analytical processes. What we found was a solution that uses High Throughput Computing (HTC) and High Performance Computing (HPC) systems to do exactly that (Figure 1).
Reddy, Jithender G; Kumar, Dinesh; Hosur, Ramakrishna V
2015-02-01
Protein NMR spectroscopy has expanded dramatically over the last decade into a powerful tool for the study of protein structure, dynamics, and interactions. The primary requirement for all such investigations is sequence-specific resonance assignment. The demand now is to obtain this information as rapidly as possible and in all types of protein systems, stable/unstable, soluble/insoluble, small/big, structured/unstructured, and so on. In this context, we introduce here two reduced dimensionality experiments – (3,2)D-hNCOcanH and (3,2)D-hNcoCAnH – which enhance the previously described 2D NMR-based assignment methods quite significantly. Both experiments can be recorded in just about 2-3 h each and hence would be of immense value for high-throughput structural proteomics and drug discovery research. The applicability of the method has been demonstrated using the alpha-helical bovine apo calbindin-D9k P43M mutant (75 aa) protein. Automated assignment of these data using AUTOBA is presented, which enhances the utility of these experiments. The backbone resonance assignments so derived are utilized to estimate secondary structures and the backbone fold using Web-based algorithms. Taken together, we believe that the method and the protocol proposed here can be used for routine high-throughput structural studies of proteins.
Enabling Large-Scale Biomedical Analysis in the Cloud
Lin, Ying-Chih; Yu, Chin-Sheng; Lin, Yen-Jen
2013-01-01
Recent progress in high-throughput instrumentation has led to an astonishing growth in both the volume and complexity of biomedical data collected from various sources. These planet-scale data bring serious challenges to storage and computing technologies. Cloud computing is an attractive alternative because it jointly addresses scalable storage and high-performance computing for large-scale data. This work briefly introduces data-intensive computing systems and summarizes existing cloud-based resources in bioinformatics. These developments and applications should facilitate biomedical research by making the vast amounts of diverse data meaningful and usable. PMID:24288665
Stepping into the omics era: Opportunities and challenges for biomaterials science and engineering.
Groen, Nathalie; Guvendiren, Murat; Rabitz, Herschel; Welsh, William J; Kohn, Joachim; de Boer, Jan
2016-04-01
The research paradigm in biomaterials science and engineering is evolving from using low-throughput and iterative experimental designs towards high-throughput experimental designs for materials optimization and the evaluation of materials properties. Computational science plays an important role in this transition. With the emergence of the omics approach in the biomaterials field, referred to as materiomics, high-throughput approaches hold the promise of tackling the complexity of materials and understanding correlations between material properties and their effects on complex biological systems. The intrinsic complexity of biological systems is an important factor that is often oversimplified when characterizing biological responses to materials and establishing property-activity relationships. Indeed, in vitro tests designed to predict in vivo performance of a given biomaterial are largely lacking as we are not able to capture the biological complexity of whole tissues in an in vitro model. In this opinion paper, we explain how we reached our opinion that converging genomics and materiomics into a new field would enable a significant acceleration of the development of new and improved medical devices. The use of computational modeling to correlate high-throughput gene expression profiling with high throughput combinatorial material design strategies would add power to the analysis of biological effects induced by material properties. We believe that this extra layer of complexity on top of high-throughput material experimentation is necessary to tackle the biological complexity and further advance the biomaterials field.
Analysis of high-throughput biological data using their rank values.
Dembélé, Doulaye
2018-01-01
High-throughput biological technologies are routinely used to generate gene expression profiling or cytogenetics data. To achieve high performance, methods available in the literature have become more specialized and often require high computational resources. Here, we propose a new versatile method based on data-ordering rank values. We use linear algebra and the Perron-Frobenius theorem, and also extend a method presented earlier for detecting differentially expressed genes to the detection of recurrent copy number aberrations. A result derived from the proposed method is a one-sample Student's t-test based on rank values. The proposed method is, to our knowledge, the only one that applies to both gene expression profiling and cytogenetics data sets. This new method is fast, deterministic, and requires a low computational load. Probabilities are associated with genes to allow statistically significant subset selection in the data set. Stability scores are also introduced as quality parameters. Performance and comparative analyses were carried out using real data sets. The proposed method can be accessed through an R package available from the CRAN (Comprehensive R Archive Network) website: https://cran.r-project.org/web/packages/fcros
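A hedged sketch of a one-sample t-test on rank values, in the spirit of the method described above (the reference implementation is the fcros R package; the rescaling below is an illustrative choice):

```python
# Sketch: rank each sample's genes, rescale ranks to (0, 1), then test per
# gene whether its rank values are centered away from 0.5.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_genes, n_samples = 1000, 8
expr = rng.normal(size=(n_genes, n_samples))
expr[:20] += 2.0                      # a few truly changed genes

# Unchanged genes hover around 0.5; consistently extreme genes drift
# toward 0 or 1 across samples.
ranks = stats.rankdata(expr, axis=0) / (n_genes + 1)

t_stat, p_val = stats.ttest_1samp(ranks, popmean=0.5, axis=1)
print("top hits:", np.argsort(p_val)[:5])
```

Working on ranks rather than raw intensities is what makes this kind of statistic robust to scale differences between samples and cheap to compute.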
Structure-based design of combinatorial mutagenesis libraries
Verma, Deeptak; Grigoryan, Gevorg; Bailey-Kellogg, Chris
2015-01-01
The development of protein variants with improved properties (thermostability, binding affinity, catalytic activity, etc.) has greatly benefited from the application of high-throughput screens evaluating large, diverse combinatorial libraries. At the same time, since only a very limited portion of sequence space can be experimentally constructed and tested, an attractive possibility is to use computational protein design to focus libraries on a productive portion of the space. We present a general-purpose method, called “Structure-based Optimization of Combinatorial Mutagenesis” (SOCoM), which can optimize arbitrarily large combinatorial mutagenesis libraries directly based on structural energies of their constituents. SOCoM chooses both positions and substitutions, employing a combinatorial optimization framework based on library-averaged energy potentials in order to avoid explicitly modeling every variant in every possible library. In case study applications to green fluorescent protein, β-lactamase, and lipase A, SOCoM optimizes relatively small, focused libraries whose variants achieve energies comparable to or better than previous library design efforts, as well as larger libraries (previously not designable by structure-based methods) whose variants cover greater diversity while still maintaining substantially better energies than would be achieved by representative random library approaches. By allowing the creation of large-scale combinatorial libraries based on structural calculations, SOCoM promises to increase the scope of applicability of computational protein design and improve the hit rate of discovering beneficial variants. While designs presented here focus on variant stability (predicted by total energy), SOCoM can readily incorporate other structure-based assessments, such as the energy gap between alternative conformational or bound states. PMID:25611189
Structure-based design of combinatorial mutagenesis libraries.
Verma, Deeptak; Grigoryan, Gevorg; Bailey-Kellogg, Chris
2015-05-01
The development of protein variants with improved properties (thermostability, binding affinity, catalytic activity, etc.) has greatly benefited from the application of high-throughput screens evaluating large, diverse combinatorial libraries. At the same time, since only a very limited portion of sequence space can be experimentally constructed and tested, an attractive possibility is to use computational protein design to focus libraries on a productive portion of the space. We present a general-purpose method, called "Structure-based Optimization of Combinatorial Mutagenesis" (SOCoM), which can optimize arbitrarily large combinatorial mutagenesis libraries directly based on structural energies of their constituents. SOCoM chooses both positions and substitutions, employing a combinatorial optimization framework based on library-averaged energy potentials in order to avoid explicitly modeling every variant in every possible library. In case study applications to green fluorescent protein, β-lactamase, and lipase A, SOCoM optimizes relatively small, focused libraries whose variants achieve energies comparable to or better than previous library design efforts, as well as larger libraries (previously not designable by structure-based methods) whose variants cover greater diversity while still maintaining substantially better energies than would be achieved by representative random library approaches. By allowing the creation of large-scale combinatorial libraries based on structural calculations, SOCoM promises to increase the scope of applicability of computational protein design and improve the hit rate of discovering beneficial variants. While designs presented here focus on variant stability (predicted by total energy), SOCoM can readily incorporate other structure-based assessments, such as the energy gap between alternative conformational or bound states. © 2015 The Protein Society.
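A minimal sketch of the library-averaging idea behind SOCoM, under the simplifying assumption of a purely additive, position-wise energy model with random stand-in energies: the mean energy over every variant in a combinatorial library then equals the sum of per-position averages, so no variant needs to be enumerated. The real method uses structure-based potentials and a combinatorial optimizer, neither of which is reproduced here.

```python
# Hedged sketch: library-averaged energy without enumerating variants,
# assuming an additive (position-wise) energy model with placeholder values.
import itertools
import numpy as np

rng = np.random.default_rng(1)
AA = "ACDEFGHIKLMNPQRSTVWY"
n_positions = 5
energy = {(i, a): rng.normal() for i in range(n_positions) for a in AA}

def library_avg_energy(choices):
    """Mean energy over the whole combinatorial library, via per-position averages."""
    return sum(np.mean([energy[(i, a)] for a in allowed])
               for i, allowed in enumerate(choices))

choices = [set(rng.choice(list(AA), size=3, replace=False)) for _ in range(n_positions)]
fast = library_avg_energy(choices)

# Brute-force check: enumerate every variant (3**5 = 243 here).
slow = np.mean([sum(energy[(i, a)] for i, a in enumerate(v))
                for v in itertools.product(*choices)])
assert np.isclose(fast, slow)
print(f"library-averaged energy: {fast:.3f}")
```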
The Crystal Structure of TAL Effector PthXo1 Bound to Its DNA Target
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mak, Amanda Nga-Sze; Bradley, Philip; Cernadas, Raul A.
2012-02-10
DNA recognition by TAL effectors is mediated by tandem repeats, each 33 to 35 residues in length, that specify nucleotides via unique repeat-variable diresidues (RVDs). The crystal structure of PthXo1 bound to its DNA target was determined by high-throughput computational structure prediction and validated by heavy-atom derivatization. Each repeat forms a left-handed, two-helix bundle that presents an RVD-containing loop to the DNA. The repeats self-associate to form a right-handed superhelix wrapped around the DNA major groove. The first RVD residue forms a stabilizing contact with the protein backbone, while the second makes a base-specific contact to the DNA sense strand. Two degenerate amino-terminal repeats also interact with the DNA. Containing several RVDs and noncanonical associations, the structure illustrates the basis of TAL effector-DNA recognition.
Xi-cam: a versatile interface for data visualization and analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pandolfi, Ronald J.; Allan, Daniel B.; Arenholz, Elke
Xi-cam is an extensible platform for data management, analysis and visualization. Xi-cam aims to provide a flexible and extensible approach to synchrotron data treatment as a solution to rising demands for high-volume/high-throughput processing pipelines. The core of Xi-cam is an extensible plugin-based graphical user interface platform which provides users with an interactive interface to processing algorithms. Plugins are available for SAXS/WAXS/GISAXS/GIWAXS, tomography and NEXAFS data. With Xi-cam's 'advanced' mode, data processing steps are designed as a graph-based workflow, which can be executed live, locally or remotely. Remote execution utilizes high-performance computing or de-localized resources, allowing for the effective reduction of high-throughput data. Xi-cam's plugin-based architecture targets cross-facility and cross-technique collaborative development, in support of multi-modal analysis. Xi-cam is open-source and cross-platform, and available for download on GitHub.
Xi-cam: a versatile interface for data visualization and analysis
Pandolfi, Ronald J.; Allan, Daniel B.; Arenholz, Elke; ...
2018-05-31
Xi-cam is an extensible platform for data management, analysis and visualization. Xi-cam aims to provide a flexible and extensible approach to synchrotron data treatment as a solution to rising demands for high-volume/high-throughput processing pipelines. The core of Xi-cam is an extensible plugin-based graphical user interface platform which provides users with an interactive interface to processing algorithms. Plugins are available for SAXS/WAXS/GISAXS/GIWAXS, tomography and NEXAFS data. With Xi-cam's 'advanced' mode, data processing steps are designed as a graph-based workflow, which can be executed live, locally or remotely. Remote execution utilizes high-performance computing or de-localized resources, allowing for the effective reduction of high-throughput data. Xi-cam's plugin-based architecture targets cross-facility and cross-technique collaborative development, in support of multi-modal analysis. Xi-cam is open-source and cross-platform, and available for download on GitHub.
Prediction of physical protein protein interactions
NASA Astrophysics Data System (ADS)
Szilágyi, András; Grimm, Vera; Arakaki, Adrián K.; Skolnick, Jeffrey
2005-06-01
Many essential cellular processes such as signal transduction, transport, cellular motion and most regulatory mechanisms are mediated by protein-protein interactions. In recent years, new experimental techniques have been developed to discover the protein-protein interaction networks of several organisms. However, the accuracy and coverage of these techniques have proven to be limited, and computational approaches remain essential both to assist in the design and validation of experimental studies and for the prediction of interaction partners and detailed structures of protein complexes. Here, we provide a critical overview of existing structure-independent and structure-based computational methods. Although these techniques have significantly advanced in the past few years, we find that most of them are still in their infancy. We also provide an overview of experimental techniques for the detection of protein-protein interactions. Although the developments are promising, false positive and false negative results are common, and reliable detection is possible only by taking a consensus of different experimental approaches. The shortcomings of experimental techniques affect both the further development and the fair evaluation of computational prediction methods. For an adequate comparative evaluation of prediction and high-throughput experimental methods, an appropriately large benchmark set of biophysically characterized protein complexes would be needed, but is sorely lacking.
Computer applications making rapid advances in high throughput microbial proteomics (HTMP).
Anandkumar, Balakrishna; Haga, Steve W; Wu, Hui-Fen
2014-02-01
The last few decades have seen the rise of widely available proteomics tools. From new data acquisition devices, such as MALDI-MS and 2DE, to new database-searching software, these new products have paved the way for high throughput microbial proteomics (HTMP). These tools are enabling researchers to gain new insights into microbial metabolism, and are opening up new areas of study, such as protein-protein interaction (interactomics) discovery. Computer software is a key part of these emerging fields. This current review considers: 1) software tools for identifying the proteome, such as MASCOT or PDQuest, 2) online databases of proteomes, such as SWISS-PROT, Proteome Web, or the Proteomics Facility of the Pathogen Functional Genomics Resource Center, and 3) software tools for applying proteomic data, such as PSI-BLAST or VESPA. These tools allow for research in network biology, protein identification, functional annotation, target identification/validation, protein expression, protein structural analysis, metabolic pathway engineering and drug discovery.
Hu, Jun; Liu, Zi; Yu, Dong-Jun; Zhang, Yang
2018-02-15
Sequence-order independent structural comparison, also called structural alignment, of small ligand molecules is often needed for computer-aided virtual drug screening. Although many ligand structure alignment programs have been proposed, most of them build alignments based on rigid-body shape comparison, which can neither provide atom-specific alignment information nor allow structural variation; both abilities are critical to efficient high-throughput virtual screening. We propose a novel ligand comparison algorithm, LS-align, to generate fast and accurate atom-level structural alignments of ligand molecules, through an iterative heuristic search of a target function that combines inter-atom distance with mass and chemical bond comparisons. LS-align contains two modules, Rigid-LS-align and Flexi-LS-align, designed for rigid-body and flexible alignments, respectively, where a ligand-size-independent, statistics-based scoring function is developed to evaluate the similarity of ligand molecules relative to random ligand pairs. Large-scale benchmark tests are performed on prioritizing chemical ligands of 102 protein targets involving 1,415,871 candidate compounds from the DUD-E (Database of Useful Decoys: Enhanced) database, where LS-align achieves an average enrichment factor (EF) of 22.0 at the 1% cutoff and an AUC score of 0.75, significantly higher than other state-of-the-art methods. Detailed data analyses show that the advanced performance is mainly attributable to the design of the target function, which combines structural and chemical information to enhance the sensitivity of recognizing subtle differences between ligand molecules, and to the introduction of structural flexibility, which helps capture the conformational changes induced by ligand-receptor binding interactions. These data demonstrate a new avenue to improve virtual screening efficiency through the development of sensitive ligand structural alignments. http://zhanglab.ccmb.med.umich.edu/LS-align/. Contact: njyudj@njust.edu.cn or zhng@umich.edu. Supplementary data are available at Bioinformatics online.
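As a hedged illustration of the rigid-body step only: the sketch below superposes two atom sets with the classic Kabsch algorithm and scores them with a simple distance-based, size-normalized similarity. LS-align's actual target function additionally searches the atom correspondence and mixes in mass and chemical-bond terms; here the correspondence and the d0 scale are assumed.

```python
# Minimal rigid superposition (Kabsch) plus a distance-based similarity score.
import numpy as np

def kabsch_align(P, Q):
    """Rotate/translate P onto Q (both n x 3, rows assumed in correspondence)."""
    Pc, Qc = P - P.mean(0), Q - Q.mean(0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    d = np.sign(np.linalg.det(U @ Vt))        # guard against reflections
    return Pc @ U @ np.diag([1.0, 1.0, d]) @ Vt + Q.mean(0)

def similarity(P, Q, d0=2.0):
    """Size-normalized score in (0, 1]; larger means more similar."""
    dist2 = np.sum((P - Q) ** 2, axis=1)
    return float(np.mean(1.0 / (1.0 + dist2 / d0**2)))

rng = np.random.default_rng(2)
Q = rng.normal(size=(12, 3))                  # "template" ligand atoms
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])
P = Q @ Rz.T + 5.0                            # rotated and translated copy
print("score before:", round(similarity(P, Q), 3))
print("score after: ", round(similarity(kabsch_align(P, Q), Q), 3))
```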
Ekins, Sean; Olechno, Joe; Williams, Antony J.
2013-01-01
Dispensing and dilution processes may profoundly influence estimates of biological activity of compounds. Published data show Ephrin type-B receptor 4 IC50 values obtained via tip-based serial dilution and dispensing versus acoustic dispensing with direct dilution differ by orders of magnitude with no correlation or ranking of datasets. We generated computational 3D pharmacophores based on data derived by both acoustic and tip-based transfer. The computed pharmacophores differ significantly depending upon dispensing and dilution methods. The acoustic dispensing-derived pharmacophore correctly identified active compounds in a subsequent test set where the tip-based method failed. Data from acoustic dispensing generates a pharmacophore containing two hydrophobic features, one hydrogen bond donor and one hydrogen bond acceptor. This is consistent with X-ray crystallography studies of ligand-protein interactions and automatically generated pharmacophores derived from this structural data. In contrast, the tip-based data suggest a pharmacophore with two hydrogen bond acceptors, one hydrogen bond donor and no hydrophobic features. This pharmacophore is inconsistent with the X-ray crystallographic studies and automatically generated pharmacophores. In short, traditional dispensing processes are another important source of error in high-throughput screening that impacts computational and statistical analyses. These findings have far-reaching implications in biological research. PMID:23658723
High-Resolution Melt Analysis for Rapid Comparison of Bacterial Community Compositions
Hjelmsø, Mathis Hjort; Hansen, Lars Hestbjerg; Bælum, Jacob; Feld, Louise; Holben, William E.
2014-01-01
In the study of bacterial community composition, 16S rRNA gene amplicon sequencing is today among the preferred methods of analysis. The cost of nucleotide sequence analysis, including requisite computational and bioinformatic steps, however, takes up a large part of many research budgets. High-resolution melt (HRM) analysis is the study of the melt behavior of specific PCR products. Here we describe a novel high-throughput approach in which we used HRM analysis targeting the 16S rRNA gene to rapidly screen multiple complex samples for differences in bacterial community composition. We hypothesized that HRM analysis of amplified 16S rRNA genes from a soil ecosystem could be used as a screening tool to identify changes in bacterial community structure. This hypothesis was tested using a soil microcosm setup exposed to a total of six treatments representing different combinations of pesticide and fertilization treatments. The HRM analysis identified a shift in the bacterial community composition in two of the treatments, both including the soil fumigant Basamid GR. These results were confirmed with both denaturing gradient gel electrophoresis (DGGE) analysis and 454-based 16S rRNA gene amplicon sequencing. HRM analysis was shown to be a fast, high-throughput technique that can serve as an effective alternative to gel-based screening methods to monitor microbial community composition. PMID:24610853
Interoperability of GADU in using heterogeneous Grid resources for bioinformatics applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sulakhe, D.; Rodriguez, A.; Wilde, M.
2008-03-01
Bioinformatics tools used for efficient and computationally intensive analysis of genetic sequences require large-scale computational resources to accommodate the growing data. Grid computational resources such as the Open Science Grid and TeraGrid have proved useful for scientific discovery. The genome analysis and database update system (GADU) is a high-throughput computational system developed to automate the steps involved in accessing the Grid resources for running bioinformatics applications. This paper describes the requirements for building an automated scalable system such as GADU that can run jobs on different Grids. The paper describes the resource-independent configuration of GADU using the Pegasus-based virtual data system that makes high-throughput computational tools interoperable on heterogeneous Grid resources. The paper also highlights the features implemented to make GADU a gateway to computationally intensive bioinformatics applications on the Grid. The paper will not go into the details of the problems involved or the lessons learned in using individual Grid resources, as these have already been published in our paper on the genome analysis research environment (GNARE); it focuses primarily on the architecture that makes GADU resource independent and interoperable across heterogeneous Grid resources.
Machine learning of molecular electronic properties in chemical compound space
NASA Astrophysics Data System (ADS)
Montavon, Grégoire; Rupp, Matthias; Gobre, Vivekanand; Vazquez-Mayagoitia, Alvaro; Hansen, Katja; Tkatchenko, Alexandre; Müller, Klaus-Robert; Anatole von Lilienfeld, O.
2013-09-01
The combination of modern scientific computing with electronic structure theory can lead to an unprecedented amount of data amenable to intelligent data analysis for the identification of meaningful, novel and predictive structure-property relationships. Such relationships enable high-throughput screening for relevant properties in an exponentially growing pool of virtual compounds that are synthetically accessible. Here, we present a machine learning model, trained on a database of ab initio calculation results for thousands of organic molecules, that simultaneously predicts multiple electronic ground- and excited-state properties. The properties include atomization energy, polarizability, frontier orbital eigenvalues, ionization potential, electron affinity and excitation energies. The machine learning model is based on a deep multi-task artificial neural network, exploiting the underlying correlations between various molecular properties. The input is identical to ab initio methods, i.e. nuclear charges and Cartesian coordinates of all atoms. For small organic molecules, the accuracy of such a ‘quantum machine’ is similar, and sometimes superior, to modern quantum-chemical methods—at negligible computational cost.
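A toy, hedged stand-in for the multi-task idea (the published model is a deep multi-task network trained on nuclear charges and coordinates; the synthetic descriptor vectors and scikit-learn's generic MLP below are assumptions for illustration): one shared network predicts four correlated "properties" at once.

```python
# Multi-output regression with a shared network, on synthetic stand-in data.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 30))                          # toy molecular descriptors
W = rng.normal(size=(30, 4))
Y = np.tanh(X @ W) + 0.05 * rng.normal(size=(2000, 4))   # 4 correlated "properties"

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                     random_state=0).fit(X_tr, Y_tr)
print("held-out R^2 of the shared model:", round(model.score(X_te, Y_te), 3))
```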
Modeling limb-bud dysmorphogenesis in a predictive virtual embryo model
ToxCast is profiling the bioactivity of thousands of chemicals based on high-throughput screening (HTS) and computational methods that integrate knowledge of biological systems and in vivo toxicities (www.epa.gov/ncct/toxcast/). Many ToxCast assays assess signaling pathways and c...
Identification of functional modules using network topology and high-throughput data.
Ulitsky, Igor; Shamir, Ron
2007-01-26
With the advent of systems biology, biological knowledge is often represented today by networks. These include regulatory and metabolic networks, protein-protein interaction networks, and many others. At the same time, high-throughput genomics and proteomics techniques generate very large data sets, which require sophisticated computational analysis. Usually, separate and different analysis methodologies are applied to each of the two data types. An integrated investigation of network and high-throughput information together can improve the quality of the analysis by accounting simultaneously for topological network properties alongside intrinsic features of the high-throughput data. We describe a novel algorithmic framework for this challenge. We first transform the high-throughput data into similarity values (e.g., by computing pairwise similarity of gene expression patterns from microarray data). Then, given a network of genes or proteins and similarity values between some of them, we seek connected sub-networks (or modules) that manifest high similarity. We develop algorithms for this problem and evaluate their performance on the osmotic shock response network in S. cerevisiae and on the human cell cycle network. We demonstrate that focused, biologically meaningful and relevant functional modules are obtained. In comparison with extant algorithms, our approach has higher sensitivity and higher specificity. We have demonstrated that our method can accurately identify functional modules. Hence, it carries the promise to be highly useful in the analysis of high-throughput data.
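The following sketch illustrates the general seed-and-extend idea in the authors' framework, not their actual algorithm: grow a connected sub-network from a seed, adding the neighbor that most improves average internal edge similarity. The random graph and similarity values are placeholders.

```python
# Greedy growth of a connected module on a similarity-weighted network.
import networkx as nx
import numpy as np

rng = np.random.default_rng(4)
G = nx.erdos_renyi_graph(60, 0.1, seed=4)
for u, v in G.edges:                       # similarity from (toy) expression profiles
    G[u][v]["sim"] = rng.uniform(0, 1)

def grow_module(G, seed):
    module = {seed}
    def avg_sim(nodes):
        edges = [d["sim"] for u, v, d in G.edges(nodes, data=True)
                 if u in nodes and v in nodes]
        return np.mean(edges) if edges else 0.0
    while True:
        frontier = {n for m in module for n in G[m]} - module
        best = max(frontier, key=lambda n: avg_sim(module | {n}), default=None)
        if best is None or avg_sim(module | {best}) <= avg_sim(module):
            break                          # no neighbor improves the module
        module.add(best)
    return module

print(sorted(grow_module(G, seed=0)))
```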
Heterogeneous high throughput scientific computing with APM X-Gene and Intel Xeon Phi
Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; ...
2015-05-22
Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost-efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. As a result, we report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).
Ahmed, Wamiq M; Lenz, Dominik; Liu, Jia; Paul Robinson, J; Ghafoor, Arif
2008-03-01
High-throughput biological imaging uses automated imaging devices to collect a large number of microscopic images for analysis of biological systems and validation of scientific hypotheses. Efficient manipulation of these datasets for knowledge discovery requires high-performance computational resources, efficient storage, and automated tools for extracting and sharing such knowledge among different research sites. Newly emerging grid technologies provide powerful means for exploiting the full potential of these imaging techniques. Efficient utilization of grid resources requires the development of knowledge-based tools and services that combine domain knowledge with analysis algorithms. In this paper, we first investigate how grid infrastructure can facilitate high-throughput biological imaging research, and present an architecture for providing knowledge-based grid services for this field. We identify two levels of knowledge-based services. The first level provides tools for extracting spatiotemporal knowledge from image sets and the second level provides high-level knowledge management and reasoning services. We then present cellular imaging markup language, an XML-based language for the modeling of biological images and representation of spatiotemporal knowledge. This scheme can be used for spatiotemporal event composition, matching, and automated knowledge extraction and representation for large biological imaging datasets. We demonstrate the expressive power of this formalism by means of different examples and extensive experimental results.
Computational design of molecules for an all-quinone redox flow battery.
Er, Süleyman; Suh, Changwon; Marshak, Michael P; Aspuru-Guzik, Alán
2015-02-01
Inspired by the electron transfer properties of quinones in biological systems, we recently showed that quinones are also very promising electroactive materials for stationary energy storage applications. Due to the practically infinite chemical space of organic molecules, the discovery of additional quinones or other redox-active organic molecules for energy storage applications is an open field of inquiry. Here, we introduce a high-throughput computational screening approach that we applied to an accelerated study of a total of 1710 quinone (Q) and hydroquinone (QH2) (i.e., two-electron two-proton) redox couples. We identified promising candidates for both the negative and positive sides of organic-based aqueous flow batteries, thus enabling an all-quinone battery. To further aid the development of additional interesting electroactive small molecules we also provide emerging quantitative structure-property relationships.
Multi-scale structural community organisation of the human genome.
Boulos, Rasha E; Tremblay, Nicolas; Arneodo, Alain; Borgnat, Pierre; Audit, Benjamin
2017-04-11
Structural interaction frequency matrices between all genome loci are now experimentally achievable thanks to high-throughput chromosome conformation capture technologies. This raises a new methodological challenge for computational biology: objectively extracting from these data the structural motifs characteristic of genome organisation. We deployed the fast multi-scale community mining algorithm based on spectral graph wavelets to characterise the networks of intra-chromosomal interactions in human cell lines. We observed that there exist structural domains of all sizes up to chromosome length and demonstrated that the set of structural communities forms a hierarchy of chromosome segments. Hence, at all scales, chromosome folding predominantly involves interactions between neighbouring sites rather than the formation of links between distant loci. Multi-scale structural decomposition of human chromosomes provides an original framework to question structural organisation and its relationship to functional regulation across the scales. By construction, the proposed methodology is independent of the precise assembly of the reference genome and is thus directly applicable to genomes whose assembly is not fully determined.
High Throughput Genotoxicity Profiling of the US EPA ToxCast Chemical Library
A key aim of the ToxCast project is to investigate modern molecular and genetic high content and high throughput screening (HTS) assays, along with various computational tools to supplement and perhaps replace traditional assays for evaluating chemical toxicity. Genotoxicity is a...
Accelerating evaluation of converged lattice thermal conductivity
NASA Astrophysics Data System (ADS)
Qin, Guangzhao; Hu, Ming
2018-01-01
High-throughput computational materials design is an emerging area in materials science based on the fast evaluation of physics-related properties. The lattice thermal conductivity (κ) is a key materials property with broad implications. However, the high-throughput evaluation of κ remains a challenge due to its large resource costs and time-consuming procedures. In this paper, we propose a concise strategy to efficiently accelerate the process of obtaining accurate and converged κ. The strategy operates in the framework of the phonon Boltzmann transport equation (BTE) coupled with first-principles calculations. Based on an analysis of the harmonic interatomic force constants (IFCs), a sufficiently large cutoff radius (r_cutoff), a critical parameter involved in calculating the anharmonic IFCs, can be determined directly to obtain satisfactory results. Moreover, we find a simple way to substantially (~10 times) accelerate the computations by rapidly reconstructing the anharmonic IFCs in the convergence test of κ with respect to r_cutoff, which finally confirms that the chosen r_cutoff is appropriate. Two-dimensional graphene and phosphorene, along with bulk SnSe, are presented to validate our approach, and the long-debated divergence problem of thermal conductivity in low-dimensional systems is studied. The quantitative strategy proposed herein is a good candidate for fast evaluation of reliable κ and thus provides a useful tool for high-throughput materials screening and design with targeted thermal transport properties.
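A minimal sketch of the convergence test over the anharmonic-IFC cutoff radius described above. `compute_kappa` is a hypothetical placeholder for a full phonon-BTE calculation (in practice a first-principles workflow); here it just mimics a property that saturates as r_cutoff grows.

```python
# Convergence scan of kappa versus cutoff radius, with a mock BTE "solver".
import numpy as np

def compute_kappa(r_cutoff):
    """Placeholder: a saturating 'thermal conductivity' in W/(m K)."""
    return 150.0 * (1.0 - np.exp(-r_cutoff / 3.0))

def converge_kappa(radii, rel_tol=0.01):
    prev = None
    for r in radii:
        kappa = compute_kappa(r)
        if prev is not None and abs(kappa - prev) / prev < rel_tol:
            return r, kappa                 # change below tolerance: converged
        prev = kappa
    raise RuntimeError("kappa not converged over the scanned cutoffs")

r, kappa = converge_kappa(np.arange(2.0, 12.0, 1.0))
print(f"converged at r_cutoff = {r:.1f} Angstrom, kappa = {kappa:.1f} W/(m K)")
```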
Predicting structural properties of fluids by thermodynamic extrapolation
NASA Astrophysics Data System (ADS)
Mahynski, Nathan A.; Jiao, Sally; Hatch, Harold W.; Blanco, Marco A.; Shen, Vincent K.
2018-05-01
We describe a methodology for extrapolating the structural properties of multicomponent fluids from one thermodynamic state to another. These properties generally include features of a system that may be computed from an individual configuration such as radial distribution functions, cluster size distributions, or a polymer's radius of gyration. This approach is based on the principle of using fluctuations in a system's extensive thermodynamic variables, such as energy, to construct an appropriate Taylor series expansion for these structural properties in terms of intensive conjugate variables, such as temperature. Thus, one may extrapolate these properties from one state to another when the series is truncated to some finite order. We demonstrate this extrapolation for simple and coarse-grained fluids in both the canonical and grand canonical ensembles, in terms of both temperatures and the chemical potentials of different components. The results show that this method is able to reasonably approximate structural properties of such fluids over a broad range of conditions. Consequently, this methodology may be employed to increase the computational efficiency of molecular simulations used to measure the structural properties of certain fluid systems, especially those used in high-throughput or data-driven investigations.
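A worked toy example of the fluctuation-based extrapolation, assuming the canonical ensemble, where the first-order term is d⟨X⟩/dβ = −Cov(X, U). Samples drawn at β0 for a 1D harmonic "system" predict ⟨x²⟩ at a nearby β1; the residual error is the truncation error of the first-order Taylor expansion.

```python
# First-order thermodynamic extrapolation from canonical-ensemble samples.
import numpy as np

rng = np.random.default_rng(5)
beta0, beta1 = 1.0, 1.2
x = rng.normal(0.0, np.sqrt(1.0 / beta0), size=200_000)  # x ~ exp(-beta0 x^2 / 2)
U = 0.5 * x**2                                           # potential energy
X = x**2                                                 # observable, <x^2> = 1/beta

dX_dbeta = -np.cov(X, U)[0, 1]                           # fluctuation formula
X_extrap = X.mean() + dX_dbeta * (beta1 - beta0)
print(f"extrapolated <x^2> at beta={beta1}: {X_extrap:.4f} (exact {1/beta1:.4f})")
```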
Mathematical and Computational Modeling in Complex Biological Systems
Li, Wenyang; Zhu, Xiaoliang
2017-01-01
The biological process and molecular functions involved in the cancer progression remain difficult to understand for biologists and clinical doctors. Recent developments in high-throughput technologies urge the systems biology to achieve more precise models for complex diseases. Computational and mathematical models are gradually being used to help us understand the omics data produced by high-throughput experimental techniques. The use of computational models in systems biology allows us to explore the pathogenesis of complex diseases, improve our understanding of the latent molecular mechanisms, and promote treatment strategy optimization and new drug discovery. Currently, it is urgent to bridge the gap between the developments of high-throughput technologies and systemic modeling of the biological process in cancer research. In this review, we firstly studied several typical mathematical modeling approaches of biological systems in different scales and deeply analyzed their characteristics, advantages, applications, and limitations. Next, three potential research directions in systems modeling were summarized. To conclude, this review provides an update of important solutions using computational modeling approaches in systems biology. PMID:28386558
A high-throughput screening approach for the optoelectronic properties of conjugated polymers.
Wilbraham, Liam; Berardo, Enrico; Turcani, Lukas; Jelfs, Kim E; Zwijnenburg, Martijn A
2018-06-25
We propose a general high-throughput virtual screening approach for the optical and electronic properties of conjugated polymers. This approach makes use of the recently developed xTB family of low-computational-cost density functional tight-binding methods from Grimme and co-workers, calibrated here to (TD-)DFT data computed for a representative diverse set of (co-)polymers. Parameters drawn from the resulting calibration using a linear model can then be applied to the xTB derived results for new polymers, thus generating near DFT-quality data with orders of magnitude reduction in computational cost. As a result, after an initial computational investment for calibration, this approach can be used to quickly and accurately screen on the order of thousands of polymers for target applications. We also demonstrate that the (opto)electronic properties of the conjugated polymers show only a very minor variation when considering different conformers and that the results of high-throughput screening are therefore expected to be relatively insensitive with respect to the conformer search methodology applied.
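A minimal sketch of the calibration step, under the assumption that a linear model maps cheap tight-binding estimates onto (TD-)DFT reference values; the synthetic "optical gaps" below stand in for real calculations on the training set of (co-)polymers.

```python
# Linear calibration of a cheap method against reference data, then screening.
import numpy as np

rng = np.random.default_rng(6)
dft = rng.uniform(1.0, 4.0, size=50)                  # reference gaps (eV)
xtb = 0.8 * dft + 0.4 + rng.normal(0, 0.08, size=50)  # cheap-method estimates

slope, intercept = np.polyfit(xtb, dft, deg=1)        # fit the linear model

def calibrated(gap_xtb):
    return slope * gap_xtb + intercept

new_xtb = np.array([1.9, 2.6, 3.3])                   # new screening candidates
print("calibrated gap predictions (eV):", np.round(calibrated(new_xtb), 2))
```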
Mathematical and Computational Modeling in Complex Biological Systems.
Ji, Zhiwei; Yan, Ke; Li, Wenyang; Hu, Haigen; Zhu, Xiaoliang
2017-01-01
The biological process and molecular functions involved in the cancer progression remain difficult to understand for biologists and clinical doctors. Recent developments in high-throughput technologies urge the systems biology to achieve more precise models for complex diseases. Computational and mathematical models are gradually being used to help us understand the omics data produced by high-throughput experimental techniques. The use of computational models in systems biology allows us to explore the pathogenesis of complex diseases, improve our understanding of the latent molecular mechanisms, and promote treatment strategy optimization and new drug discovery. Currently, it is urgent to bridge the gap between the developments of high-throughput technologies and systemic modeling of the biological process in cancer research. In this review, we firstly studied several typical mathematical modeling approaches of biological systems in different scales and deeply analyzed their characteristics, advantages, applications, and limitations. Next, three potential research directions in systems modeling were summarized. To conclude, this review provides an update of important solutions using computational modeling approaches in systems biology.
DOE Office of Scientific and Technical Information (OSTI.GOV)
He, Zhili; Deng, Ye; Nostrand, Joy Van
2010-05-17
Microarray-based genomic technology has been widely used for microbial community analysis, and it is expected that microarray-based genomic technologies will revolutionize the analysis of microbial community structure, function and dynamics. A new generation of functional gene arrays (GeoChip 3.0) has been developed, with 27,812 probes covering 56,990 gene variants from 292 functional gene families involved in carbon, nitrogen, phosphorus and sulfur cycles, energy metabolism, antibiotic resistance, metal resistance, and organic contaminant degradation. Those probes were derived from 2,744, 140, and 262 species for bacteria, archaea, and fungi, respectively. GeoChip 3.0 has several other distinct features, such as a common oligo reference standard (CORS) for data normalization and comparison, a software package for data management and future updating, and the gyrB gene for phylogenetic analysis. Our computational evaluation of probe specificity indicated that all designed probes had a high specificity to their corresponding targets. Also, experimental analysis with synthesized oligonucleotides and genomic DNAs showed that only 0.0036%-0.025% false positive rates were observed, suggesting that the designed probes are highly specific under the experimental conditions examined. In addition, GeoChip 3.0 was applied to analyze soil microbial communities in a multifactor grassland ecosystem in Minnesota, USA, which demonstrated that the structure, composition, and potential activity of soil microbial communities changed significantly with plant species diversity. All results indicate that GeoChip 3.0 is a powerful high-throughput tool for studying microbial community functional structure, and for linking microbial communities to ecosystem processes and functioning. To our knowledge, GeoChip 3.0 is the most comprehensive microarray currently available for studying microbial communities associated with geobiochemical cycling, global climate change, bioenergy, agriculture, land use, ecosystem management, environmental cleanup and restoration, bioreactor systems, and human health.
microRNAs Databases: Developmental Methodologies, Structural and Functional Annotations.
Singh, Nagendra Kumar
2017-09-01
microRNA (miRNA) is an endogenous and evolutionarily conserved non-coding RNA involved in post-transcriptional regulation as a gene repressor and in mRNA cleavage through RNA-induced silencing complex (RISC) formation. In RISC, miRNA binds its target mRNA through complementary base pairing, together with an Argonaute protein complex, causing gene repression or endonucleolytic cleavage of the mRNA; its dysregulation results in many diseases and syndromes. After the discovery of the miRNAs lin-4 and let-7, large numbers of miRNAs were subsequently discovered by low-throughput and high-throughput experimental techniques, along with computational approaches, across various biological and metabolic processes. miRNAs are important non-coding RNAs for understanding the complex biological phenomena of organisms because they control gene regulation. This paper reviews miRNA databases, with structural and functional annotations, developed by various researchers. These databases contain structural and functional information on animal, plant and virus miRNAs, including miRNA-associated diseases, stress resistance in plants, miRNAs involved in various biological processes, effects of miRNA interactions on drugs and the environment, effects of variants on miRNAs, miRNA gene expression analysis, miRNA sequences, and miRNA structures. This review focuses on the developmental methodology of miRNA databases, such as the computational tools and methods used to extract miRNA annotations from different resources or through experiment. It also discusses the efficiency of each database's user interface design, along with current entries and annotations (pathways, gene ontology, disease ontology, etc.). An integrated schematic diagram of the database construction process is also drawn, along with tabular and graphical comparisons of the entries in different databases. The aim of this paper is to present the importance of miRNA-related resources in a single place.
Taxonomic relevance of an adverse outcome pathway network considering Apis and non-Apis bees
Product Description: The US EPA is developing more cost effective and efficient ways to evaluate chemical safety using high throughput and computationally based testing strategies. An important component of this approach is the ability to translate chemical effects on fundamental...
High-throughput screening, predictive modeling and computational embryology - Abstract
High-throughput screening (HTS) studies are providing a rich source of data that can be applied to chemical profiling to address sensitivity and specificity of molecular targets, biological pathways, cellular and developmental processes. EPA’s ToxCast project is testing 960 uniq...
A common feature pharmacophore for FDA-approved drugs inhibiting the Ebola virus.
Ekins, Sean; Freundlich, Joel S; Coffee, Megan
2014-01-01
We are currently faced with a global infectious disease crisis which has been anticipated for decades. While many promising biotherapeutics are being tested, the search for a small molecule has yet to deliver an approved drug or therapeutic for the Ebola or similar filoviruses that cause haemorrhagic fever. Two recent high-throughput screens published in 2013 did, however, identify several hits that progressed to animal studies and that are FDA-approved drugs used for other indications. The current computational analysis uses these molecules from two different structural classes to construct a common features pharmacophore. This ligand-based pharmacophore implicates a possible common target or mechanism that could be further explored. A recent structure-based design project yielded nine co-crystal structures of pyrrolidinone inhibitors bound to the viral protein 35 (VP35). When receptor-ligand pharmacophores based on the analogs of these molecules and the protein structures were constructed, the molecular features partially overlapped with the common features of the solely ligand-based pharmacophore models based on FDA-approved drugs. These previously identified FDA-approved drugs with activity against Ebola were therefore docked into this protein. The antimalarials chloroquine and amodiaquine docked favorably in VP35. We propose that these drugs identified to date as inhibitors of the Ebola virus may be targeting VP35. These computational models may provide preliminary insights into the molecular features that are responsible for their activity against Ebola virus in vitro and in vivo, and we propose that this hypothesis could be readily tested.
A common feature pharmacophore for FDA-approved drugs inhibiting the Ebola virus
Ekins, Sean; Freundlich, Joel S.; Coffee, Megan
2014-01-01
We are currently faced with a global infectious disease crisis which has been anticipated for decades. While many promising biotherapeutics are being tested, the search for a small molecule has yet to deliver an approved drug or therapeutic for the Ebola or similar filoviruses that cause haemorrhagic fever. Two recent high-throughput screens published in 2013 did, however, identify several hits that progressed to animal studies and that are FDA-approved drugs used for other indications. The current computational analysis uses these molecules from two different structural classes to construct a common features pharmacophore. This ligand-based pharmacophore implicates a possible common target or mechanism that could be further explored. A recent structure-based design project yielded nine co-crystal structures of pyrrolidinone inhibitors bound to the viral protein 35 (VP35). When receptor-ligand pharmacophores based on the analogs of these molecules and the protein structures were constructed, the molecular features partially overlapped with the common features of the solely ligand-based pharmacophore models based on FDA-approved drugs. These previously identified FDA-approved drugs with activity against Ebola were therefore docked into this protein. The antimalarials chloroquine and amodiaquine docked favorably in VP35. We propose that these drugs identified to date as inhibitors of the Ebola virus may be targeting VP35. These computational models may provide preliminary insights into the molecular features that are responsible for their activity against Ebola virus in vitro and in vivo, and we propose that this hypothesis could be readily tested. PMID:25653841
Bahrami-Samani, Emad; Vo, Dat T.; de Araujo, Patricia Rosa; Vogel, Christine; Smith, Andrew D.; Penalva, Luiz O. F.; Uren, Philip J.
2014-01-01
Co- and post-transcriptional regulation of gene expression is complex and multi-faceted, spanning the complete RNA lifecycle from genesis to decay. High-throughput profiling of the constituent events and processes is achieved through a range of technologies that continue to expand and evolve. Fully leveraging the resulting data is non-trivial, and requires the use of computational methods and tools carefully crafted for specific data sources and often intended to probe particular biological processes. Drawing upon databases of information pre-compiled by other researchers can further elevate analyses. Within this review, we describe the major co- and post-transcriptional events in the RNA lifecycle that are amenable to high-throughput profiling. We place specific emphasis on the analysis of the resulting data, in particular the computational tools and resources available, as well as looking towards future challenges that remain to be addressed. PMID:25515586
NASA Astrophysics Data System (ADS)
Yamada, Yusuke; Hiraki, Masahiko; Sasajima, Kumiko; Matsugaki, Naohiro; Igarashi, Noriyuki; Amano, Yasushi; Warizaya, Masaichi; Sakashita, Hitoshi; Kikuchi, Takashi; Mori, Takeharu; Toyoshima, Akio; Kishimoto, Shunji; Wakatsuki, Soichi
2010-06-01
Recent advances in high-throughput techniques for macromolecular crystallography have highlighted the importance of structure-based drug design (SBDD), and the demand for synchrotron use by pharmaceutical researchers has increased. Thus, in collaboration with Astellas Pharma Inc., we have constructed a new high-throughput macromolecular crystallography beamline, AR-NE3A, which is dedicated to SBDD. At AR-NE3A, a photon flux up to three times higher than those at the Photon Factory's existing high-throughput beamlines, AR-NW12A and BL-5A, can be realized at the same sample position. Installed in the experimental hutch are a high-precision diffractometer, a fast-readout high-gain CCD detector, and a sample-exchange robot capable of handling more than two hundred cryo-cooled samples stored in a Dewar. To facilitate the high-throughput data collection required for pharmaceutical research, fully automated data collection and processing systems have been developed. Thus, sample exchange, centering, data collection, and data processing are carried out automatically based on the user's pre-defined schedule. Although Astellas Pharma Inc. has priority access to AR-NE3A, the remaining beam time is allocated to general academic and other industrial users.
Ryan, Natalia; Chorley, Brian; Tice, Raymond R.; Judson, Richard; Corton, J. Christopher
2016-01-01
Microarray profiling of chemical-induced effects is being increasingly used in medium- and high-throughput formats. Computational methods are described here to identify molecular targets from whole-genome microarray data using as an example the estrogen receptor α (ERα), often modulated by potential endocrine disrupting chemicals. ERα biomarker genes were identified by their consistent expression after exposure to 7 structurally diverse ERα agonists and 3 ERα antagonists in ERα-positive MCF-7 cells. Most of the biomarker genes were shown to be directly regulated by ERα as determined by ESR1 gene knockdown using siRNA as well as through chromatin immunoprecipitation coupled with DNA sequencing analysis of ERα-DNA interactions. The biomarker was evaluated as a predictive tool using the fold-change rank-based Running Fisher algorithm by comparison to annotated gene expression datasets from experiments using MCF-7 cells, including those evaluating the transcriptional effects of hormones and chemicals. Using 141 comparisons from chemical- and hormone-treated cells, the biomarker gave a balanced accuracy for prediction of ERα activation or suppression of 94% and 93%, respectively. The biomarker was able to correctly classify 18 out of 21 (86%) ER reference chemicals including “very weak” agonists. Importantly, the biomarker predictions accurately replicated predictions based on 18 in vitro high-throughput screening assays that queried different steps in ERα signaling. For 114 chemicals, the balanced accuracies were 95% and 98% for activation or suppression, respectively. These results demonstrate that the ERα gene expression biomarker can accurately identify ERα modulators in large collections of microarray data derived from MCF-7 cells. PMID:26865669
LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads.
El-Metwally, Sara; Zakaria, Magdi; Hamza, Taher
2016-11-01
The deluge of sequenced data has exceeded Moore's Law, more than doubling every 2 years since next-generation sequencing (NGS) technologies were invented. Accordingly, we will be able to generate more and more data at high speed and fixed cost, but lack the computational resources to store, process and analyze them. With error-prone high-throughput NGS reads and genomic repeats, the assembly graph contains a massive number of redundant nodes and branching edges. Most assembly pipelines require this large graph to reside in memory to start their workflows, which is intractable for mammalian genomes. Resource-efficient genome assemblers combine the power of advanced computing techniques with innovative data structures to encode the assembly graph efficiently in computer memory. LightAssembler is a lightweight assembly algorithm designed to be executed on a desktop machine. It uses a pair of cache-oblivious Bloom filters, one holding a uniform sample of [Formula: see text]-spaced sequenced [Formula: see text]-mers and the other holding [Formula: see text]-mers classified as likely correct, using a simple statistical test. LightAssembler contains a light implementation of the graph traversal and simplification modules that achieves assembly accuracy and contiguity comparable to other competing tools. Our method reduces memory usage by [Formula: see text] compared to resource-efficient assemblers on benchmark datasets from the GAGE and Assemblathon projects. While LightAssembler can be considered a gap-based sequence assembler, different gap sizes result in an almost constant assembly size and genome coverage. https://github.com/SaraEl-Metwally/LightAssembler. Contact: sarah_almetwally4@mans.edu.eg. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
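Since the abstract's central data structure is the Bloom filter, here is a minimal, hedged Python version for k-mers showing insertion and membership queries with their one-sided false positives. LightAssembler's filters are cache-oblivious and paired with a statistical test for "likely correct" k-mers, none of which is reproduced here; the sizes and hash choices are arbitrary.

```python
# A tiny Bloom filter for k-mers: set membership with false positives only.
import hashlib

class BloomFilter:
    def __init__(self, n_bits=1 << 20, n_hashes=3):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray(n_bits // 8)

    def _positions(self, kmer):
        for i in range(self.n_hashes):       # derive k independent hash positions
            h = hashlib.blake2b(kmer.encode(), salt=bytes([i] * 8)).digest()
            yield int.from_bytes(h[:8], "little") % self.n_bits

    def add(self, kmer):
        for p in self._positions(kmer):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, kmer):
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(kmer))

bf = BloomFilter()
read, k = "ACGTACGTGGTACCA", 5
for i in range(len(read) - k + 1):           # insert every k-mer of one read
    bf.add(read[i:i + k])
print("ACGTA" in bf, "TTTTT" in bf)          # True, (almost certainly) False
```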
pmx: Automated protein structure and topology generation for alchemical perturbations
Gapsys, Vytautas; Michielssens, Servaas; Seeliger, Daniel; de Groot, Bert L
2015-01-01
Computational protein design requires methods to accurately estimate free energy changes in protein stability or binding upon an amino acid mutation. Among the different approaches available, molecular dynamics-based alchemical free energy calculations are unique in their accuracy and solid theoretical basis. The challenge in using these methods lies in the need to generate hybrid structures and topologies representing two physical states of a system. A custom-made hybrid topology may prove useful for a particular mutation of interest; however, a high-throughput mutation analysis calls for a more general approach. In this work, we present an automated procedure to generate hybrid structures and topologies for amino acid mutations in all commonly used force fields. The described software is compatible with the Gromacs simulation package. Mutation libraries are readily supported for five force fields, namely Amber99SB, Amber99SB*-ILDN, OPLS-AA/L, Charmm22*, and Charmm36. PMID:25487359
Computational biology of RNA interactions.
Dieterich, Christoph; Stadler, Peter F
2013-01-01
The biodiversity of the RNA world has been underestimated for decades. RNA molecules are key building blocks, sensors, and regulators of modern cells. The biological function of RNA molecules cannot be separated from their ability to bind to and interact with a wide space of chemical species, including small molecules, nucleic acids, and proteins. Computational chemists, physicists, and biologists have developed a rich tool set for modeling and predicting RNA interactions. These interactions are to some extent determined by the binding conformation of the RNA molecule. RNA binding conformations are approximated with often acceptable accuracy by sequence and secondary structure motifs. Secondary structure ensembles of a given RNA molecule can be efficiently computed in many relevant situations by employing a standard energy model for base pair interactions and dynamic programming techniques. The case of bi-molecular RNA-RNA interactions can be seen as an extension of this approach. However, unbiased transcriptome-wide scans for local RNA-RNA interactions are computationally challenging yet become efficient if the binding motif/mode is known and other external information can be used to confine the search space. Computational methods are less developed for proteins and small molecules, which bind to RNA with very high specificity. Binding descriptors of proteins are usually determined by in vitro high-throughput assays (e.g., microarrays or sequencing). Intriguingly, recent experimental advances, which are mostly based on light-induced cross-linking of binding partners, render in vivo binding patterns accessible yet require new computational methods for careful data interpretation. The grand challenge is to model the in vivo situation where a complex interplay of RNA binders competes for the same target RNA molecule. Evidently, bioinformaticians are just catching up with the impressive pace of these developments. Copyright © 2012 John Wiley & Sons, Ltd.
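As a worked example of the base-pair dynamic programming this abstract alludes to, the sketch below implements the classic Nussinov recursion, which maximizes base-pair counts. Production tools instead use full nearest-neighbor energy models and partition functions; this is only an illustration of the recursion style.

```python
# Nussinov base-pair maximization: dp[i][j] = best pairing of seq[i..j].
def nussinov(seq, min_loop=3):
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]                       # case: i left unpaired
            for k in range(i + min_loop + 1, j + 1):  # case: i pairs with k
                if (seq[i], seq[k]) in pairs:
                    left = dp[i + 1][k - 1]
                    right = dp[k + 1][j] if k + 1 <= j else 0
                    best = max(best, 1 + left + right)
            dp[i][j] = best
    return dp[0][n - 1]

print(nussinov("GGGAAAUCC"))  # maximum number of base pairs -> 3
```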
The impact of computer science in molecular medicine: enabling high-throughput research.
de la Iglesia, Diana; García-Remesal, Miguel; de la Calle, Guillermo; Kulikowski, Casimir; Sanz, Ferran; Maojo, Víctor
2013-01-01
The Human Genome Project and the explosion of high-throughput data have transformed the areas of molecular and personalized medicine, which are producing a wide range of studies and experimental results and providing new insights for developing medical applications. Research in many interdisciplinary fields is resulting in data repositories and computational tools that support a wide diversity of tasks: genome sequencing, genome-wide association studies, analysis of genotype-phenotype interactions, drug toxicity and side effects assessment, prediction of protein interactions and diseases, development of computational models, biomarker discovery, and many others. The authors of the present paper have developed several inventories covering tools, initiatives and studies in different computational fields related to molecular medicine: medical informatics, bioinformatics, clinical informatics and nanoinformatics. With these inventories, created by mining the scientific literature, we have carried out several reviews of these fields, providing researchers with a useful framework to locate, discover, search and integrate resources. In this paper we present an analysis of the state of the art in computational resources for molecular medicine, based on results compiled in our inventories as well as results extracted from a systematic review of the literature and other scientific media. The review considers the impact of the related publications and the data and software resources available for molecular medicine, and aims to provide information useful for supporting ongoing research and for improving diagnostics and therapeutics based on molecular-level insights.
Machine learning and computer vision approaches for phenotypic profiling.
Grys, Ben T; Lo, Dara S; Sahin, Nil; Kraus, Oren Z; Morris, Quaid; Boone, Charles; Andrews, Brenda J
2017-01-02
With recent advances in high-throughput, automated microscopy, there has been an increased demand for effective computational strategies to analyze large-scale, image-based data. To this end, computer vision approaches have been applied to cell segmentation and feature extraction, whereas machine-learning approaches have been developed to aid in phenotypic classification and clustering of data acquired from biological images. Here, we provide an overview of the commonly used computer vision and machine-learning methods for generating and categorizing phenotypic profiles, highlighting the general biological utility of each approach. © 2017 Grys et al.
Machine learning and computer vision approaches for phenotypic profiling
Morris, Quaid
2017-01-01
With recent advances in high-throughput, automated microscopy, there has been an increased demand for effective computational strategies to analyze large-scale, image-based data. To this end, computer vision approaches have been applied to cell segmentation and feature extraction, whereas machine-learning approaches have been developed to aid in phenotypic classification and clustering of data acquired from biological images. Here, we provide an overview of the commonly used computer vision and machine-learning methods for generating and categorizing phenotypic profiles, highlighting the general biological utility of each approach. PMID:27940887
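A hedged toy version of the profiling pipeline described in these two records: segment objects in a synthetic binary image, extract per-object shape features, and cluster the resulting profiles. Real pipelines use learned segmentation and far richer feature sets; every step here is a simple stand-in.

```python
# Segmentation -> feature extraction -> clustering, on synthetic "cells".
import numpy as np
from skimage import data, measure
from sklearn.cluster import KMeans

blobs = data.binary_blobs(length=256, volume_fraction=0.2)  # fake "cells"
labels = measure.label(blobs)                               # connected components
props = measure.regionprops(labels)

# Per-object shape features as a crude phenotypic profile.
features = np.array([[p.area, p.eccentricity, p.solidity] for p in props])
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
print(f"{len(props)} objects grouped into {len(set(clusters))} phenotype clusters")
```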
High-throughput screening, predictive modeling and computational embryology
High-throughput screening (HTS) studies are providing a rich source of data that can be applied to profile thousands of chemical compounds for biological activity and potential toxicity. EPA’s ToxCast™ project, and the broader Tox21 consortium, in addition to projects worldwide,...
High-throughput screening of dye-ligands for chromatography.
Kumar, Sunil; Punekar, Narayan S
2014-01-01
Dye-ligand-based chromatography has become popular after Cibacron Blue, the first reactive textile dye, found application for protein purification. Many other textile dyes have since been successfully used to purify a number of proteins and enzymes. While the exact nature of their interaction with target proteins is often unclear, dye-ligands are thought to mimic the structural features of their corresponding substrates, cofactors, etc. The dye-ligand affinity matrices are therefore considered pseudo-affinity matrices. In addition, dye-ligands may simply bind with proteins due to electrostatic, hydrophobic, and hydrogen-bonding interactions. Because of their low cost, ready availability, and structural stability, dye-ligand affinity matrices have gained much popularity. Choice of a large number of dye structures offers a range of matrices to be prepared and tested. When presented in the high-throughput screening mode, these dye-ligand matrices provide a formidable tool for protein purification. One could pick from the list of dye-ligands already available or build a systematic library of such structures for use. A high-throughput screen may be set up to choose best dye-ligand matrix as well as ideal conditions for binding and elution, for a given protein. The mode of operation could be either manual or automated. The technology is available to test the performance of dye-ligand matrices in small volumes in an automated liquid-handling workstation. Screening a systematic library of dye-ligand structures can help establish a structure-activity relationship. While the origins of dye-ligand chromatography lay in exploiting pseudo-affinity, it is now possible to design very specific biomimetic dye structures. High-throughput screening will be of value in this endeavor as well.
[Simulation and data analysis of stereological modeling based on virtual slices].
Wang, Hao; Shen, Hong; Bai, Xiao-yan
2008-05-01
To establish a computer-assisted stereological model simulating the process of slice sectioning, and to evaluate the relationship between the section surface and the estimated three-dimensional structure. The model was designed mathematically and implemented as a Win32 application based on MFC, using Microsoft Visual Studio as the IDE, to simulate an unlimited number of sections and to analyze the data derived from the model. The linearity of the model's fit was evaluated by comparison with the traditional formula. The Win32 software based on this algorithm allowed random sectioning of particles distributed randomly in an ideal virtual cube. The stereological parameters passed homogeneity and independence tests at high rates (>94.5% and 92%, respectively). The density, shape, and size data of the sections conformed to a normal distribution. The output of the model and that of the image analysis system showed statistical correlation and consistency. The algorithm described can be used for evaluating the stereological parameters of the structure of tissue slices.
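As an editorial illustration of the virtual-sectioning idea described above, here is a minimal Python sketch (not the authors' Win32/MFC code; the particle count, radius, and section count are arbitrary assumptions) that sections randomly placed spheres in a unit cube and recovers the particle density from the profile counts:

```python
import random

def simulate_sections(n_particles=1000, radius=0.02, n_sections=200, seed=1):
    """Place spheres uniformly at random in a unit cube and count the
    circular profiles cut by random horizontal section planes."""
    random.seed(seed)
    centers_z = [random.random() for _ in range(n_particles)]  # only z matters
    hits = []
    for _ in range(n_sections):
        z = random.random()
        hits.append(sum(1 for c in centers_z if abs(c - z) < radius))
    mean_hits = sum(hits) / n_sections
    # Classical stereology for spheres: profile density N_A = N_V * 2r
    # (2r is the mean caliper diameter), so the number density follows as:
    n_v_estimate = mean_hits / (2 * radius)   # particles per unit volume
    return mean_hits, n_v_estimate

mean_hits, n_v = simulate_sections()
print(f"mean profiles per section: {mean_hits:.1f}, estimated N_V: {n_v:.0f}")
```

With 1000 spheres in the unit cube the estimate returns close to 1000, with the shortfall coming from spheres truncated at the cube boundary, the same edge effect a real slice series must correct for.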
DOE Office of Scientific and Technical Information (OSTI.GOV)
Painter, J.; McCormick, P.; Krogh, M.
This paper presents the ACL (Advanced Computing Lab) Message Passing Library. It is a high-throughput, low-latency communications library, based on Thinking Machines Corp.'s CMMD, upon which message passing applications can be built. The library has been implemented on the Cray T3D, Thinking Machines CM-5, SGI workstations, and on top of PVM.
Hardcastle, Thomas J
2016-01-15
High-throughput data are now commonplace in biological research. Rapidly changing technologies and applications mean that novel methods for detecting differential behaviour that account for a 'large P, small n' setting are required at an increasing rate. The development of such methods is, in general, being done on an ad hoc basis, leading to repeated development cycles and a lack of standardization between analyses. We present here a generalized method for identifying differential behaviour within high-throughput biological data through empirical Bayesian methods. This approach is based on our baySeq algorithm for identification of differential expression in RNA-seq data based on a negative binomial distribution, and in paired data based on a beta-binomial distribution. Here we show how the same empirical Bayesian approach can be applied to any parametric distribution, removing the need for lengthy development of novel methods for differently distributed data. Comparisons with existing methods developed to address specific problems in high-throughput biological data show that these generic methods can achieve equivalent or better performance. A number of enhancements to the basic algorithm are also presented to increase flexibility and reduce computational costs. The methods are implemented in the R baySeq (v2) package, available on Bioconductor http://www.bioconductor.org/packages/release/bioc/html/baySeq.html. tjh48@cam.ac.uk Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
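baySeq itself is an R/Bioconductor package; the toy Python sketch below only illustrates the negative binomial model underlying such differential-behaviour scores, as a likelihood ratio between separate and shared group means (the fixed dispersion and the example counts are assumptions, and the package's empirical Bayesian machinery is not reproduced):

```python
import numpy as np
from scipy.stats import nbinom

def nb_loglik(counts, mean, dispersion):
    """Negative binomial log-likelihood with var = mean + dispersion * mean**2."""
    n = 1.0 / dispersion          # NB 'size' parameter
    p = n / (n + mean)
    return nbinom.logpmf(counts, n, p).sum()

def differential_score(group_a, group_b, dispersion=0.1):
    """Log-likelihood ratio of separate group means vs one shared mean;
    large values suggest differential behaviour between conditions."""
    a, b = np.asarray(group_a), np.asarray(group_b)
    pooled = np.concatenate([a, b])
    shared = nb_loglik(pooled, pooled.mean(), dispersion)
    separate = nb_loglik(a, a.mean(), dispersion) + nb_loglik(b, b.mean(), dispersion)
    return separate - shared

print(differential_score([10, 12, 9, 14], [40, 35, 52, 47]))  # clearly positive
```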
Leulliot, Nicolas; Trésaugues, Lionel; Bremang, Michael; Sorel, Isabelle; Ulryck, Nathalie; Graille, Marc; Aboulfath, Ilham; Poupon, Anne; Liger, Dominique; Quevillon-Cheruel, Sophie; Janin, Joël; van Tilbeurgh, Herman
2005-06-01
Crystallization has long been regarded as one of the major bottlenecks in high-throughput structural determination by X-ray crystallography. Structural genomics projects have addressed this issue by using robots to set up automated crystal screens using nanodrop technology. This has moved the bottleneck from obtaining the first crystal hit to obtaining diffraction-quality crystals, as crystal optimization is a notoriously slow process that is difficult to automate. This article describes the high-throughput optimization strategies used in the Yeast Structural Genomics project, with selected successful examples.
Mounet, Nicolas; Gibertini, Marco; Schwaller, Philippe; Campi, Davide; Merkys, Andrius; Marrazzo, Antimo; Sohier, Thibault; Castelli, Ivano Eligio; Cepellotti, Andrea; Pizzi, Giovanni; Marzari, Nicola
2018-03-01
Two-dimensional (2D) materials have emerged as promising candidates for next-generation electronic and optoelectronic applications. Yet, only a few dozen 2D materials have been successfully synthesized or exfoliated. Here, we search for 2D materials that can be easily exfoliated from their parent compounds. Starting from 108,423 unique, experimentally known 3D compounds, we identify a subset of 5,619 compounds that appear layered according to robust geometric and bonding criteria. High-throughput calculations using van der Waals density functional theory, validated against experimental structural data and calculated random phase approximation binding energies, further allowed the identification of 1,825 compounds that are either easily or potentially exfoliable. In particular, the subset of 1,036 easily exfoliable cases provides novel structural prototypes and simple ternary compounds as well as a large portfolio of materials to search from for optimal properties. For a subset of 258 compounds, we explore vibrational, electronic, magnetic and topological properties, identifying 56 ferromagnetic and antiferromagnetic systems, including half-metals and half-semiconductors.
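The robust geometric and bonding criteria of this study are far more sophisticated, but a toy Python sketch conveys the spirit of a layeredness screen: flag a structure when an empty slab wider than a typical bonding distance spans the cell along the stacking axis (the 3 Å cutoff and the two demo structures are illustrative assumptions):

```python
def looks_layered(z_fractional, c_length, gap_threshold=3.0):
    """Crude geometric test: a structure is flagged as possibly layered if,
    along the stacking axis, there is an empty slab wider than a typical
    bonding distance (gap_threshold, in angstroms)."""
    zs = sorted(z * c_length for z in z_fractional)
    gaps = [b - a for a, b in zip(zs, zs[1:])]
    # include the wrap-around gap across the periodic boundary
    gaps.append(zs[0] + c_length - zs[-1])
    return max(gaps) >= gap_threshold

# graphite-like: two atomic planes ~3.35 Å apart in a 6.7 Å cell
print(looks_layered([0.0, 0.5], 6.7))                # True
# diamond-like: atoms spread evenly through a 3.57 Å cell
print(looks_layered([0.0, 0.25, 0.5, 0.75], 3.57))   # False
```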
LXtoo: an integrated live Linux distribution for the bioinformatics community.
Yu, Guangchuang; Wang, Li-Gen; Meng, Xiao-Hua; He, Qing-Yu
2012-07-19
Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Unlike most of the existing live Linux distributions for bioinformatics limiting their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. LXtoo aims to provide well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo.
A Survey of Computational Intelligence Techniques in Protein Function Prediction
Tiwari, Arvind Kumar; Srivastava, Rajeev
2014-01-01
With the advance of high-throughput microarray technologies, knowledge of proteins whose functions remain unknown has grown massively. Protein function prediction is thus one of the most challenging problems in bioinformatics. In the past, homology-based approaches were used to predict protein function, but they fail when a new protein is dissimilar to previously characterized ones. To alleviate the problems associated with traditional homology-based approaches, numerous computational intelligence techniques have been proposed in recent years. This paper presents a state-of-the-art comprehensive review of computational intelligence techniques for protein function prediction using sequence, structure, protein-protein interaction network, and gene expression data, across application areas such as prediction of DNA and RNA binding sites, subcellular localization, enzyme functions, signal peptides, catalytic residues, nuclear/G-protein coupled receptors, membrane proteins, and pathway analysis from gene expression datasets. This paper also summarizes the results obtained by many researchers who have addressed these problems using computational intelligence techniques with appropriate datasets to improve prediction performance. The summary shows that ensemble classifiers and the integration of multiple heterogeneous data are useful for protein function prediction. PMID:25574395
CORDIC-based digital signal processing (DSP) element for adaptive signal processing
NASA Astrophysics Data System (ADS)
Bolstad, Gregory D.; Neeld, Kenneth B.
1995-04-01
The High Performance Adaptive Weight Computation (HAWC) processing element is a CORDIC-based, application-specific DSP element that, when connected in a linear array, can perform extremely high-throughput (hundreds of GFLOPS) matrix arithmetic operations on linear systems of equations in real time. In particular, it very efficiently performs the numerically intense computation of optimal least-squares solutions for large, over-determined linear systems. Most techniques for computing solutions to these types of problems have used either a hard-wired, non-programmable systolic array approach or, more commonly, programmable DSP or microprocessor approaches. The custom logic methods can be efficient but are generally inflexible. Approaches using multiple programmable generic DSP devices are very flexible but suffer from poor efficiency and high computation latencies, primarily due to the large number of DSP devices that must be utilized to achieve the necessary arithmetic throughput. The HAWC processor is implemented as a highly optimized systolic array, yet retains some of the flexibility of a programmable data-flow system, allowing efficient implementation of algorithm variations. This provides flexible matrix processing capabilities that are one to three orders of magnitude less expensive and more dense than the current state of the art and, more importantly, allows a realizable solution to matrix processing problems that were previously considered impractical to implement physically. HAWC has direct applications in RADAR, SONAR, communications, and image processing, as well as in many other types of systems.
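For readers unfamiliar with CORDIC, the sketch below shows the primitive such processing elements build on: a plane rotation computed with shift-and-add style updates and a final gain correction. It illustrates the algorithm only and implies nothing about the HAWC systolic implementation:

```python
import math

def cordic_rotate(x, y, angle, iterations=32):
    """Rotate (x, y) by `angle` radians using only additions and
    scalings by powers of two, the CORDIC primitive operation."""
    # precomputed arctangents of 2**-i and the constant CORDIC gain
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for a in angles:
        gain *= math.cos(a)           # K ~ 0.60725 for many iterations
    z = angle
    for i, a in enumerate(angles):
        d = 1.0 if z >= 0 else -1.0   # rotate toward the residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * a
    return x * gain, y * gain         # undo the pseudo-rotation growth

print(cordic_rotate(1.0, 0.0, math.pi / 4))  # ~ (0.7071, 0.7071)
```

In hardware the `2.0 ** -i` scalings become bit shifts, which is why a linear array of such elements can sustain very high matrix-arithmetic throughput without multipliers.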
Automated crystallographic system for high-throughput protein structure determination.
Brunzelle, Joseph S; Shafaee, Padram; Yang, Xiaojing; Weigand, Steve; Ren, Zhong; Anderson, Wayne F
2003-07-01
High-throughput structural genomic efforts require software that is highly automated, distributive and requires minimal user intervention to determine protein structures. Preliminary experiments were set up to test whether automated scripts could utilize a minimum set of input parameters and produce a set of initial protein coordinates. From this starting point, a highly distributive system was developed that could determine macromolecular structures at a high throughput rate, warehouse and harvest the associated data. The system uses a web interface to obtain input data and display results. It utilizes a relational database to store the initial data needed to start the structure-determination process as well as generated data. A distributive program interface administers the crystallographic programs which determine protein structures. Using a test set of 19 protein targets, 79% were determined automatically.
Automated sample area definition for high-throughput microscopy.
Zeder, M; Ellrott, A; Amann, R
2011-04-01
High-throughput screening platforms based on epifluorescence microscopy are powerful tools in a variety of scientific fields. Although some applications are based on imaging geometrically defined samples such as microtiter plates, multiwell slides, or spotted gene arrays, others need to cope with inhomogeneously located samples on glass slides. The analysis of microbial communities in aquatic systems by sample filtration on membrane filters followed by multiple fluorescent staining, or the investigation of tissue sections are examples. Therefore, we developed a strategy for flexible and fast definition of sample locations by the acquisition of whole slide overview images and automated sample recognition by image analysis. Our approach was tested on different microscopes and the computer programs are freely available (http://www.technobiology.ch). Copyright © 2011 International Society for Advancement of Cytometry.
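A rough Python sketch of the sample-recognition step (not the published code from technobiology.ch; the threshold heuristic and minimum area are assumptions) that thresholds an overview image and returns bounding boxes of candidate sample regions:

```python
import numpy as np
from scipy import ndimage

def find_sample_regions(overview, threshold=None, min_area=500):
    """Locate candidate sample areas in a slide overview image by global
    thresholding and connected-component labeling; returns bounding boxes
    of regions large enough to be imaged at high magnification."""
    img = np.asarray(overview, dtype=float)
    if threshold is None:
        threshold = img.mean() + img.std()    # crude global threshold
    mask = img > threshold
    labels, n = ndimage.label(mask)
    areas = ndimage.sum(mask, labels, range(1, n + 1))
    boxes = ndimage.find_objects(labels)      # (row_slice, col_slice) pairs
    return [box for box, area in zip(boxes, areas) if area >= min_area]

# synthetic overview: dark background with one bright sample spot
demo = np.zeros((200, 200))
demo[60:140, 60:140] = 1.0
print(find_sample_regions(demo, min_area=100))
```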
Defect Genome of Cubic Perovskites for Fuel Cell Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balachandran, Janakiraman; Lin, Lianshan; Anchell, Jonathan S.
Heterogeneities such as point defects, inherent to material systems, can profoundly influence material functionalities critical for numerous energy applications. This influence can in principle be identified and quantified through the development of large defect data sets, which we call the defect genome, employing high-throughput ab initio calculations. However, high-throughput screening of material models with point defects dramatically increases the computational complexity and chemical search space, creating major impediments toward developing a defect genome. In this paper, we overcome these impediments by employing computationally tractable ab initio models driven by highly scalable workflows to study the formation and interaction of various point defects (e.g., O vacancies, H interstitials, and Y substitutional dopants) in over 80 cubic perovskites, for potential proton-conducting ceramic fuel cell (PCFC) applications. The resulting defect data sets identify several promising perovskite compounds that can exhibit high proton conductivity. Furthermore, the data sets enable us to identify and explain insightful and novel correlations among defect energies, material identities, and defect-induced local structural distortions. Finally, such defect data sets and the resultant correlations are necessary to build statistical machine learning models, which are required to accelerate the discovery of new materials.
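Defect energetics in such data sets are conventionally obtained from the standard supercell formation-energy expression; the short Python sketch below encodes the bookkeeping with made-up numbers for an oxygen vacancy (all values are hypothetical and only illustrate the formula):

```python
def defect_formation_energy(e_defect, e_perfect, added_atoms, chem_potentials,
                            charge=0, e_fermi=0.0, e_vbm=0.0):
    """Standard supercell expression for a point-defect formation energy:
    E_f = E_def - E_perf - sum_i n_i * mu_i + q * (E_F + E_VBM),
    where `added_atoms` maps species -> net atoms added (negative for a
    vacancy) and mu_i are the chemical potentials (all energies in eV)."""
    exchange = sum(n * chem_potentials[sp] for sp, n in added_atoms.items())
    return e_defect - e_perfect - exchange + charge * (e_fermi + e_vbm)

# hypothetical numbers: a doubly charged O vacancy in a perovskite supercell
print(defect_formation_energy(
    e_defect=-1021.3, e_perfect=-1030.1,
    added_atoms={"O": -1},            # one O atom removed
    chem_potentials={"O": -8.0},      # from an O2 reference, hypothetical
    charge=2, e_fermi=0.5))           # -> 1.8 eV
```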
NASA Astrophysics Data System (ADS)
Fuentes-Cabrera, Miguel; Anderson, John D.; Wilmoth, Jared; Ginovart, Marta; Prats, Clara; Portell-Canal, Xavier; Retterer, Scott
Microbial interactions are critical for governing community behavior and structure in natural environments. Examination of microbial interactions in the lab involves growth under ideal conditions in batch culture; conditions that occur in nature are, however, characterized by disequilibrium. Of particular interest is the role that system variables play in shaping cell-to-cell interactions and organization at ultrafine spatial scales. We seek to use experiments and agent-based modeling to help discover mechanisms relevant to microbial dynamics and interactions in the environment. Currently, we are using an agent-based model to simulate microbial growth, dynamics and interactions that occur on a microwell-array device developed in our lab. Bacterial cells growing in the microwells of this platform can be studied with high-throughput and high-content image analyses using brightfield and fluorescence microscopy. The agent-based model is written in the language NetLogo, which in turn is "plugged into" a computational framework that allows submitting many calculations in parallel for different initial parameters; visualizing the outcomes in an interactive phase-like diagram; and searching, with a genetic algorithm, for the parameters that lead to the most optimal simulation outcome.
Er, Süleyman; Suh, Changwon; Marshak, Michael P.
2015-01-01
Inspired by the electron transfer properties of quinones in biological systems, we recently showed that quinones are also very promising electroactive materials for stationary energy storage applications. Due to the practically infinite chemical space of organic molecules, the discovery of additional quinones or other redox-active organic molecules for energy storage applications is an open field of inquiry. Here, we introduce a high-throughput computational screening approach that we applied to an accelerated study of a total of 1710 quinone (Q) and hydroquinone (QH2) (i.e., two-electron two-proton) redox couples. We identified the promising candidates for both the negative and positive sides of organic-based aqueous flow batteries, thus enabling an all-quinone battery. To further aid the development of additional interesting electroactive small molecules we also provide emerging quantitative structure-property relationships. PMID:29560173
Design and function of biomimetic multilayer water purification membranes
Ling, Shengjie; Qin, Zhao; Huang, Wenwen; Cao, Sufeng; Kaplan, David L.; Buehler, Markus J.
2017-01-01
Multilayer architectures in water purification membranes enable increased water throughput, high filter efficiency, and high molecular loading capacity. However, the preparation of membranes with well-organized multilayer structures, starting from the nanoscale to maximize filtration efficiency, remains a challenge. We report a complete strategy to fully realize a novel biomaterial-based multilayer nanoporous membrane via the integration of computational simulation and experimental fabrication. Our comparative computational simulations, based on coarse-grained models of protein nanofibrils and mineral plates, reveal that the multilayer structure can only form with weak interactions between nanofibrils and mineral plates. We demonstrate experimentally that silk nanofibril (SNF) and hydroxyapatite (HAP) can be used to fabricate highly ordered multilayer membranes with nanoporous features by combining protein self-assembly and in situ biomineralization. The production is optimized to be a simple and highly repeatable process that does not require sophisticated equipment and is suitable for scaled production of low-cost water purification membranes. These membranes not only show ultrafast water penetration but also exhibit broad utility and high efficiency of removal and even reuse (in some cases) of contaminants, including heavy metal ions, dyes, proteins, and other nanoparticles in water. Our biomimetic design and synthesis of these functional SNF/HAP materials have established a paradigm that could lead to the large-scale, low-cost production of multilayer materials with broad spectrum and efficiency for water purification, with applications in wastewater treatment, biomedicine, food industry, and the life sciences. PMID:28435877
Cheng, Jerome; Hipp, Jason; Monaco, James; Lucas, David R; Madabhushi, Anant; Balis, Ulysses J
2011-01-01
Spatially invariant vector quantization (SIVQ) is a texture and color-based image matching algorithm that queries the image space through the use of ring vectors. In prior studies, the selection of one or more optimal vectors for a particular feature of interest required a manual process, with the user initially stochastically selecting candidate vectors and subsequently testing them upon other regions of the image to verify the vector's sensitivity and specificity properties (typically by reviewing a resultant heat map). In carrying out the prior efforts, the SIVQ algorithm was noted to exhibit highly scalable computational properties, where each region of analysis can take place independently of others, making a compelling case for the exploration of its deployment on high-throughput computing platforms, with the hypothesis that such an exercise will result in performance gains that scale linearly with increasing processor count. An automated process was developed for the selection of optimal ring vectors to serve as the predicate matching operator in defining histopathological features of interest. Briefly, candidate vectors were generated from every possible coordinate origin within a user-defined vector selection area (VSA) and subsequently compared against user-identified positive and negative "ground truth" regions on the same image. Each vector from the VSA was assessed for its goodness-of-fit to both the positive and negative areas via the use of the receiver operating characteristic (ROC) transfer function, with each assessment resulting in an associated area-under-the-curve (AUC) figure of merit. Use of the above-mentioned automated vector selection process was demonstrated in two use cases: first, to identify malignant colonic epithelium, and second, to identify soft tissue sarcoma. For both examples, a very satisfactory optimized vector was identified, as defined by the AUC metric. Finally, as an additional effort directed towards attaining high-throughput capability for the SIVQ algorithm, we demonstrated its successful incorporation with the MATrix LABoratory (MATLAB™) application interface. The SIVQ algorithm is suitable for automated vector selection settings and high-throughput computation.
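The AUC figure of merit used to rank candidate vectors can be computed directly from the rank-sum identity; a minimal Python sketch follows (the scores are illustrative, not the SIVQ implementation):

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity:
    the probability that a random positive outscores a random negative."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# candidate-vector match scores on ground-truth positive/negative regions
positives = [0.91, 0.84, 0.77, 0.88]
negatives = [0.35, 0.52, 0.48, 0.61]
print(roc_auc(positives, negatives))  # 1.0 -> perfectly discriminating vector
```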
Tempest: GPU-CPU computing for high-throughput database spectral matching.
Milloy, Jeffrey A; Faherty, Brendan K; Gerber, Scott A
2012-07-06
Modern mass spectrometers are now capable of producing hundreds of thousands of tandem (MS/MS) spectra per experiment, making the translation of these fragmentation spectra into peptide matches a common bottleneck in proteomics research. When coupled with experimental designs that enrich for post-translational modifications such as phosphorylation and/or include isotopically labeled amino acids for quantification, additional burdens are placed on this computational infrastructure by shotgun sequencing. To address this issue, we have developed a new database searching program that utilizes the massively parallel compute capabilities of a graphical processing unit (GPU) to produce peptide spectral matches in a very high throughput fashion. Our program, named Tempest, combines efficient database digestion and MS/MS spectral indexing on a CPU with fast similarity scoring on a GPU. In our implementation, the entire similarity score, including the generation of full theoretical peptide candidate fragmentation spectra and its comparison to experimental spectra, is conducted on the GPU. Although Tempest uses the classical SEQUEST XCorr score as a primary metric for evaluating similarity for spectra collected at unit resolution, we have developed a new "Accelerated Score" for MS/MS spectra collected at high resolution that is based on a computationally inexpensive dot product but exhibits scoring accuracy similar to that of the classical XCorr. In our experience, Tempest provides compute-cluster level performance in an affordable desktop computer.
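Tempest's Accelerated Score is described as built on a computationally inexpensive dot product; the following Python sketch shows a generic binned-spectrum dot product of that kind (the bin width, peak lists, and normalization are illustrative assumptions, not Tempest's code):

```python
import math

def bin_spectrum(peaks, bin_width=1.0005, max_mz=2000.0):
    """Convert (m/z, intensity) pairs into a fixed-length vector."""
    bins = [0.0] * (int(max_mz / bin_width) + 1)
    for mz, inten in peaks:
        idx = int(mz / bin_width)
        if idx < len(bins):
            bins[idx] = max(bins[idx], inten)
    return bins

def dot_score(experimental, theoretical):
    """Normalized dot product between an observed spectrum and a
    theoretical peptide fragmentation spectrum."""
    num = sum(a * b for a, b in zip(experimental, theoretical))
    na = math.sqrt(sum(a * a for a in experimental))
    nb = math.sqrt(sum(b * b for b in theoretical))
    return num / (na * nb) if na and nb else 0.0

obs = bin_spectrum([(147.11, 40.0), (276.15, 80.0), (375.22, 100.0)])
theo = bin_spectrum([(147.11, 1.0), (276.15, 1.0), (375.22, 1.0), (489.30, 1.0)])
print(f"{dot_score(obs, theo):.3f}")
```

On a GPU this score maps naturally onto one thread block per candidate peptide, which is the parallelism the abstract exploits.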
Lessons from high-throughput protein crystallization screening: 10 years of practical experience
JR, Luft; EH, Snell; GT, DeTitta
2011-01-01
Introduction X-ray crystallography provides the majority of our structural biological knowledge at a molecular level and in terms of pharmaceutical design is a valuable tool to accelerate discovery. It is the premier technique in the field, but its usefulness is significantly limited by the need to grow well-diffracting crystals. It is for this reason that high-throughput crystallization has become a key technology that has matured over the past 10 years through the field of structural genomics. Areas covered The authors describe their experiences in high-throughput crystallization screening in the context of structural genomics and the general biomedical community. They focus on the lessons learnt from the operation of a high-throughput crystallization screening laboratory, which to date has screened over 12,500 biological macromolecules. They also describe the approaches taken to maximize the success while minimizing the effort. Through this, the authors hope that the reader will gain an insight into the efficient design of a laboratory and protocols to accomplish high-throughput crystallization on a single-, multiuser-laboratory or industrial scale. Expert Opinion High-throughput crystallization screening is readily available but, despite the power of the crystallographic technique, getting crystals is still not a solved problem. High-throughput approaches can help when used skillfully; however, they still require human input in the detailed analysis and interpretation of results to be more successful. PMID:22646073
Deep sequencing in library selection projects: what insight does it bring?
Glanville, J; D'Angelo, S; Khan, T A; Reddy, S T; Naranjo, L; Ferrara, F; Bradbury, A R M
2015-08-01
High throughput sequencing is poised to change all aspects of the way antibodies and other binders are discovered and engineered. Millions of available sequence reads provide an unprecedented sampling depth able to guide the design and construction of effective, high quality naïve libraries containing tens of billions of unique molecules. Furthermore, during selections, high throughput sequencing enables quantitative tracing of enriched clones and position-specific guidance to amino acid variation under positive selection during antibody engineering. Successful application of the technologies relies on specific PCR reagent design, correct sequencing platform selection, and effective use of computational tools and statistical measures to remove error, identify antibodies, estimate diversity, and extract signatures of selection from the clone down to individual structural positions. Here we review these considerations and discuss some of the remaining challenges to the widespread adoption of the technology. Copyright © 2015 Elsevier Ltd. All rights reserved.
High-throughput measurement of rice tillers using a conveyor equipped with x-ray computed tomography
NASA Astrophysics Data System (ADS)
Yang, Wanneng; Xu, Xiaochun; Duan, Lingfeng; Luo, Qingming; Chen, Shangbin; Zeng, Shaoqun; Liu, Qian
2011-02-01
Tillering is one of the most important agronomic traits because the number of shoots per plant determines panicle number, a key component of grain yield. The conventional method of counting tillers is still manual. In mass measurements, accuracy and efficiency gradually degrade as experienced staff fatigue. Thus, manual measurement, including counting and recording, is not only time-consuming but also lacks objectivity. To automate this process, we developed a high-throughput facility, dubbed high-throughput system for measuring automatically rice tillers (H-SMART), for measuring rice tillers based on a conventional x-ray computed tomography (CT) system and an industrial conveyor. Each pot-grown rice plant was delivered into the CT system for scanning via the conveyor equipment. A filtered back-projection algorithm was used to reconstruct the transverse section image of the rice culms. The number of tillers was then automatically extracted by image segmentation. To evaluate the accuracy of this system, three batches of rice at different growth stages (tillering, heading, or filling) were tested, yielding mean absolute errors of 0.22, 0.36, and 0.36, respectively. Subsequently, the complete machine was used under industry conditions to estimate its efficiency, which was 4320 pots per continuous 24 h workday. Thus, the H-SMART can determine the number of tillers of pot-grown rice plants, providing three advantages over the manual tillering method: absence of human disturbance, automation, and high throughput. This facility expands the application of agricultural photonics in plant phenomics.
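As an illustration of the segmentation-and-count step, a toy Python sketch follows that labels connected components in one reconstructed cross-section (the synthetic slice, threshold, and size cutoff are assumptions; H-SMART's upstream filtered back-projection is not reproduced):

```python
import numpy as np
from scipy import ndimage

def count_tillers(ct_slice, threshold=0.5, min_pixels=20):
    """Count culms in one reconstructed CT cross-section by thresholding
    and labeling connected components; tiny components are discarded as
    reconstruction noise."""
    mask = np.asarray(ct_slice) > threshold
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    return int((sizes >= min_pixels).sum())

# synthetic slice: three circular culm profiles on a dark background
yy, xx = np.mgrid[0:128, 0:128]
slice_img = np.zeros((128, 128))
for cy, cx in [(40, 40), (40, 90), (90, 64)]:
    slice_img[(yy - cy) ** 2 + (xx - cx) ** 2 < 36] = 1.0
print(count_tillers(slice_img))  # 3
```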
Development and operation of a high-throughput accurate-wavelength lens-based spectrometer
Bell, Ronald E.
2014-07-11
A high-throughput spectrometer for the 400-820 nm wavelength range has been developed for charge exchange recombination spectroscopy or general spectroscopy. A large 2160 mm⁻¹ grating is matched with fast f/1.8 200 mm lenses, which provide stigmatic imaging. A precision optical encoder measures the grating angle with an accuracy ≤ 0.075 arc seconds. A high-quantum-efficiency, low-etaloning CCD detector allows operation at longer wavelengths. A patch panel allows input fibers to interface with interchangeable fiber holders that attach to a kinematic mount behind the entrance slit. The computer-controlled hardware allows automated control of wavelength, timing, and f-number, as well as automated data collection and wavelength calibration.
We demonstrate a computational network model that integrates 18 in vitro, high-throughput screening assays measuring estrogen receptor (ER) binding, dimerization, chromatin binding, transcriptional activation and ER-dependent cell proliferation. The network model uses activity pa...
Jamal, Salma; Scaria, Vinod
2013-11-19
Leishmaniasis is a neglected tropical disease that affects approximately 12 million individuals worldwide and is caused by parasites of the genus Leishmania. The current drugs used to treat leishmaniasis are highly toxic, and the widespread emergence of drug-resistant strains necessitates the development of new therapeutic options. The available high-throughput screen data have made it possible to generate computational predictive models that can assess the active scaffolds in a chemical library, followed by their ADME/toxicity properties, ahead of biological trials. In the present study, we used publicly available high-throughput screen datasets of chemical moieties adjudged to target the pyruvate kinase enzyme of L. mexicana (LmPK). A machine learning approach was used to create computational models capable of predicting the biological activity of novel antileishmanial compounds. Further, we evaluated the molecules using a substructure-based approach to identify the common substructures contributing to their activity. We generated computational models based on machine learning methods and evaluated their performance using various statistical figures of merit. The random forest approach was determined to be the most sensitive, with better accuracy as well as ROC performance. We further applied a substructure-based approach to identify potentially enriched substructures in the active dataset. We believe that the models developed in the present study would reduce the cost and length of clinical studies, so that newer drugs appear on the market faster, providing better healthcare options to patients.
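A minimal scikit-learn sketch of a random forest workflow of the kind described, run on hypothetical stand-in fingerprints rather than the LmPK screening data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in data: rows are binary molecular fingerprints
# (e.g., substructure key bits); labels mark actives vs inactives.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 64))
y = (X[:, :8].sum(axis=1) > 4).astype(int)   # toy "activity" rule

model = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated ROC AUC: {scores.mean():.2f}")
```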
Emerging Computational Methods for the Rational Discovery of Allosteric Drugs.
Wagner, Jeffrey R; Lee, Christopher T; Durrant, Jacob D; Malmstrom, Robert D; Feher, Victoria A; Amaro, Rommie E
2016-06-08
Allosteric drug development holds promise for delivering medicines that are more selective and less toxic than those that target orthosteric sites. To date, the discovery of allosteric binding sites and lead compounds has been mostly serendipitous, achieved through high-throughput screening. Over the past decade, structural data has become more readily available for larger protein systems and more membrane protein classes (e.g., GPCRs and ion channels), which are common allosteric drug targets. In parallel, improved simulation methods now provide better atomistic understanding of the protein dynamics and cooperative motions that are critical to allosteric mechanisms. As a result of these advances, the field of predictive allosteric drug development is now on the cusp of a new era of rational structure-based computational methods. Here, we review algorithms that predict allosteric sites based on sequence data and molecular dynamics simulations, describe tools that assess the druggability of these pockets, and discuss how Markov state models and topology analyses provide insight into the relationship between protein dynamics and allosteric drug binding. In each section, we first provide an overview of the various method classes before describing relevant algorithms and software packages.
Classified one-step high-radix signed-digit arithmetic units
NASA Astrophysics Data System (ADS)
Cherri, Abdallah K.
1998-08-01
High-radix number systems enable higher information storage density, less complexity, fewer system components, and fewer cascaded gates and operations. A simple one-step fully parallel high-radix signed-digit arithmetic is proposed for parallel optical computing based on new joint spatial encodings. This reduces hardware requirements and improves throughput by reducing the space-bandwidth product needed. The high-radix signed-digit arithmetic operations are based on classifying the neighboring input digit pairs into various groups to reduce the computation rules. A new joint spatial encoding technique is developed to present both the operands and the computation rules. This technique increases the spatial bandwidth product of the spatial light modulators of the system. An optical implementation of the proposed high-radix signed-digit arithmetic operations is also presented. It is shown that our one-step trinary signed-digit and quaternary signed-digit arithmetic units are much simpler and better than all previously reported high-radix signed-digit techniques.
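To make the carry-free property concrete, here is a Python sketch of radix-4 signed-digit addition in the generic Avizienis style (the digit-classification rules may differ in detail from the paper's optimized grouping):

```python
def sd_radix4_add(a, b):
    """Carry-free addition of two radix-4 signed-digit numbers.
    Digits lie in {-3..3}, least-significant first. Each position emits a
    transfer digit t in {-1, 0, 1}, so no carry ripples more than one
    place -- the property parallel optical arithmetic units exploit."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    transfers = [0] * (n + 1)
    interim = [0] * n
    for i in range(n):
        p = a[i] + b[i]                      # position sum in [-6, 6]
        t = 1 if p >= 3 else -1 if p <= -3 else 0
        transfers[i + 1] = t
        interim[i] = p - 4 * t               # interim sum in {-2..2}
    # final digits combine interim sums with incoming transfers, in parallel
    return [w + t for w, t in zip(interim + [0], transfers)]

def sd_to_int(digits):
    return sum(d * 4 ** i for i, d in enumerate(digits))

x, y = [3, 2, 1], [2, -1, 3]                 # 27 and 46 in signed-digit radix 4
print(sd_to_int(sd_radix4_add(x, y)) == sd_to_int(x) + sd_to_int(y))  # True
```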
Predicting hepatotoxicity using ToxCast in vitro bioactivity and chemical structure
Background: The U.S. EPA ToxCastTM program is screening thousands of environmental chemicals for bioactivity using hundreds of high-throughput in vitro assays to build predictive models of toxicity. We represented chemicals based on bioactivity and chemical structure descriptors ...
Robust, high-throughput solution structural analyses by small angle X-ray scattering (SAXS)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hura, Greg L.; Menon, Angeli L.; Hammel, Michal
2009-07-20
We present an efficient pipeline enabling high-throughput analysis of protein structure in solution with small angle X-ray scattering (SAXS). Our SAXS pipeline combines automated sample handling of microliter volumes, temperature and anaerobic control, rapid data collection and data analysis, and couples structural analysis with automated archiving. We subjected 50 representative proteins, mostly from Pyrococcus furiosus, to this pipeline and found that 30 were multimeric structures in solution. SAXS analysis allowed us to distinguish aggregated and unfolded proteins, define global structural parameters and oligomeric states for most samples, identify shapes and similar structures for 25 unknown structures, and determine envelopes for 41 proteins. We believe that high-throughput SAXS is an enabling technology that may change the way that structural genomics research is done.
Toxicokinetic and Dosimetry Modeling Tools for Exposure ...
New technologies and in vitro testing approaches have been valuable additions to risk assessments that have historically relied solely on in vivo test results. Compared to in vivo methods, in vitro high throughput screening (HTS) assays are less expensive, faster and can provide mechanistic insights on chemical action. However, extrapolating from in vitro chemical concentrations to target tissue or blood concentrations in vivo is fraught with uncertainties, and modeling is dependent upon pharmacokinetic variables not measured in in vitro assays. To address this need, new tools have been created for characterizing, simulating, and evaluating chemical toxicokinetics. Physiologically-based pharmacokinetic (PBPK) models provide estimates of chemical exposures that produce potentially hazardous tissue concentrations, while tissue microdosimetry PK models relate whole-body chemical exposures to cell-scale concentrations. These tools rely on high-throughput in vitro measurements, and successful methods exist for pharmaceutical compounds that determine PK from limited in vitro measurements and chemical structure-derived property predictions. These high throughput (HT) methods provide a more rapid and less resource-intensive alternative to traditional PK model development. We have augmented these in vitro data with chemical structure-based descriptors and mechanistic tissue partitioning models to construct HTPBPK models for over three hundred environmental and pharmaceutical compounds.
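As a minimal illustration of the reverse-dosimetry arithmetic that such high-throughput toxicokinetic tools build on, a one-compartment steady-state sketch follows (all parameter values are hypothetical):

```python
def css_oral(dose_mg_per_day, clearance_l_per_h, bioavailability=1.0):
    """Steady-state plasma concentration for repeated oral dosing in a
    one-compartment model: Css = F * dose rate / clearance (mg/L)."""
    dose_rate = dose_mg_per_day / 24.0                        # mg/h
    return bioavailability * dose_rate / clearance_l_per_h

# hypothetical chemical: 1 mg/kg/day for a 70 kg adult, CL = 5 L/h
print(f"Css ~ {css_oral(70.0, 5.0):.3f} mg/L")
```

Inverting this relationship, dividing an in vitro bioactive concentration by Css per unit dose, yields the oral-equivalent doses commonly compared against exposure estimates in HTS-based risk screening.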
Computational Tools for Stem Cell Biology
Bian, Qin; Cahan, Patrick
2016-01-01
For over half a century, the field of developmental biology has leveraged computation to explore mechanisms of developmental processes. More recently, computational approaches have been critical in the translation of high throughput data into knowledge of both developmental and stem cell biology. In the last several years, a new sub-discipline of computational stem cell biology has emerged that synthesizes the modeling of systems-level aspects of stem cells with high-throughput molecular data. In this review, we provide an overview of this new field and pay particular attention to the impact that single-cell transcriptomics is expected to have on our understanding of development and our ability to engineer cell fate. PMID:27318512
Burdick, David B; Cavnor, Chris C; Handcock, Jeremy; Killcoyne, Sarah; Lin, Jake; Marzolf, Bruz; Ramsey, Stephen A; Rovira, Hector; Bressler, Ryan; Shmulevich, Ilya; Boyle, John
2010-07-14
High throughput sequencing has become an increasingly important tool for biological research. However, the existing software systems for managing and processing these data have not provided the flexible infrastructure that research requires. Existing software solutions provide static and well-established algorithms in a restrictive package. However as high throughput sequencing is a rapidly evolving field, such static approaches lack the ability to readily adopt the latest advances and techniques which are often required by researchers. We have used a loosely coupled, service-oriented infrastructure to develop SeqAdapt. This system streamlines data management and allows for rapid integration of novel algorithms. Our approach also allows computational biologists to focus on developing and applying new methods instead of writing boilerplate infrastructure code. The system is based around the Addama service architecture and is available at our website as a demonstration web application, an installable single download and as a collection of individual customizable services.
Nuclear Magnetic Resonance Spectroscopy-Based Identification of Yeast.
Himmelreich, Uwe; Sorrell, Tania C; Daniel, Heide-Marie
2017-01-01
Rapid and robust high-throughput identification of environmental, industrial, or clinical yeast isolates is important whenever relatively large numbers of samples need to be processed in a cost-efficient way. Nuclear magnetic resonance (NMR) spectroscopy generates complex data based on metabolite profiles, chemical composition, and possibly medium consumption, which can be used not only for the assessment of metabolic pathways but also for accurate identification of yeast down to the subspecies level. Initial results on NMR-based yeast identification were comparable with conventional and DNA-based identification. Potential advantages of NMR spectroscopy in mycological laboratories include not only accurate identification but also automated sample delivery, automated analysis using computer-based methods, rapid turnaround time, high throughput, and low running costs. We describe here the sample preparation, data acquisition, and analysis for NMR-based yeast identification. In addition, a roadmap for the development of classification strategies is given that will result in the acquisition of a database and analysis algorithms for yeast identification in different environments.
Eljarrat, A; López-Conesa, L; Estradé, S; Peiró, F
2016-05-01
In this work, we present characterization methods for the analysis of nanometer-sized devices based on silicon and III-V nitride semiconductor materials. These methods are devised to take advantage of the aberration-corrected scanning transmission electron microscope equipped with a monochromator. This set-up ensures the high spatial and energy resolution necessary for the characterization of the smallest structures. As these experiments aim to obtain chemical and structural information, we use electron energy loss spectroscopy (EELS). The low-loss region of EELS is exploited, which features fundamental electronic properties of semiconductor materials and facilitates a high data throughput. We show how the detailed analysis of these spectra, using theoretical models and computational tools, can enhance the analytical power of EELS. In this sense, results from the model-based fit of the plasmon peak are presented first. Moreover, the application of multivariate analysis algorithms to low-loss EELS is explored. Finally, some physical limitations of the technique, such as spatial delocalization, are mentioned. © 2015 The Authors Journal of Microscopy © 2015 Royal Microscopical Society.
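A minimal sketch of a model-based plasmon-peak fit of the kind mentioned, fitting a Lorentzian line shape to a synthetic low-loss spectrum with scipy (the 16.7 eV peak position and noise level are illustrative assumptions, not the authors' fitting model):

```python
import numpy as np
from scipy.optimize import curve_fit

def lorentzian(e, e_p, gamma, amp):
    """Simple Lorentzian line shape, a common model for the bulk
    plasmon peak in low-loss EELS."""
    return amp * gamma / ((e - e_p) ** 2 + gamma ** 2)

# synthetic low-loss spectrum: a 16.7 eV plasmon (silicon-like) plus noise
energies = np.linspace(5.0, 35.0, 300)
rng = np.random.default_rng(1)
spectrum = lorentzian(energies, 16.7, 1.8, 100.0) + rng.normal(0, 0.3, 300)

popt, _ = curve_fit(lorentzian, energies, spectrum, p0=[15.0, 2.0, 50.0])
print(f"fitted plasmon energy: {popt[0]:.2f} eV")
```

Mapping the fitted plasmon energy pixel by pixel across a spectrum image is one way such model-based analysis turns low-loss EELS into a compositional or strain map.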
JNSViewer—A JavaScript-based Nucleotide Sequence Viewer for DNA/RNA secondary structures
Dong, Min; Graham, Mitchell; Yadav, Nehul
2017-01-01
Many tools are available for visualizing RNA or DNA secondary structures, but there is scarce implementation in JavaScript that provides seamless integration with the increasingly popular web computational platforms. We have developed JNSViewer, a highly interactive web service, which is bundled with several popular tools for DNA/RNA secondary structure prediction and can provide precise and interactive correspondence among nucleotides, dot-bracket data, secondary structure graphs, and genic annotations. In JNSViewer, users can perform RNA secondary structure predictions with different programs and settings, add customized genic annotations in GFF format to structure graphs, search for specific linear motifs, and extract relevant structure graphs of sub-sequences. JNSViewer also allows users to choose a transcript or specific segment of Arabidopsis thaliana genome sequences and predict the corresponding secondary structure. Popular genome browsers (i.e., JBrowse and BrowserGenome) were integrated into JNSViewer to provide powerful visualizations of chromosomal locations, genic annotations, and secondary structures. In addition, we used StructureFold with default settings to predict some RNA structures for Arabidopsis by incorporating in vivo high-throughput RNA structure profiling data and stored the results in our web server, which might be a useful resource for RNA secondary structure studies in plants. JNSViewer is available at http://bioinfolab.miamioh.edu/jnsviewer/index.html. PMID:28582416
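Dot-bracket strings of the kind JNSViewer maps to structure graphs can be paired with a simple stack; a short Python sketch follows (JNSViewer itself is JavaScript; this is only an illustration of the correspondence):

```python
def parse_dot_bracket(structure):
    """Map each '(' to its matching ')' in dot-bracket notation,
    returning a dict of base-pair positions (0-indexed)."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()
            pairs[j] = i
            pairs[i] = j
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs

# a small hairpin: stem of 3 bp, loop of 4 nt
print(parse_dot_bracket("(((....)))"))
```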
Lai, Yiu Wai; Krause, Michael; Savan, Alan; Thienhaus, Sigurd; Koukourakis, Nektarios; Hofmann, Martin R; Ludwig, Alfred
2011-10-01
A high-throughput characterization technique based on digital holography for mapping film thickness in thin-film materials libraries was developed. Digital holographic microscopy is used for fully automatic measurements of the thickness of patterned films with nanometer resolution. The method has several significant advantages over conventional stylus profilometry: it is contactless and fast, substrate bending is compensated, and the experimental setup is simple. Patterned films prepared by different combinatorial thin-film approaches were characterized to investigate and demonstrate this method. The results show that this technique is valuable for the quick, reliable and high-throughput determination of the film thickness distribution in combinatorial materials research. Importantly, it can also be applied to thin films that have been structured by shadow masking.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yamada, Yusuke; Hiraki, Masahiko; Sasajima, Kumiko
2010-06-23
Recent advances in high-throughput techniques for macromolecular crystallography have highlighted the importance of structure-based drug design (SBDD), and the demand for synchrotron use by pharmaceutical researchers has increased. Thus, in collaboration with Astellas Pharma Inc., we have constructed a new high-throughput macromolecular crystallography beamline, AR-NE3A, which is dedicated to SBDD. At AR-NE3A, a photon flux up to three times higher than those at the existing high-throughput beamlines at the Photon Factory, AR-NW12A and BL-5A, can be realized at the same sample positions. Installed in the experimental hutch are a high-precision diffractometer, a fast-readout high-gain CCD detector, and a sample exchange robot capable of handling more than two hundred cryo-cooled samples stored in a Dewar. To facilitate the high-throughput data collection required for pharmaceutical research, fully automated data collection and processing systems have been developed. Thus, sample exchange, centering, data collection, and data processing are carried out automatically based on the user's pre-defined schedule. Although Astellas Pharma Inc. has priority access to AR-NE3A, the remaining beam time is allocated to general academic and other industrial users.
Burns, Randal; Roncal, William Gray; Kleissas, Dean; Lillaney, Kunal; Manavalan, Priya; Perlman, Eric; Berger, Daniel R.; Bock, Davi D.; Chung, Kwanghun; Grosenick, Logan; Kasthuri, Narayanan; Weiler, Nicholas C.; Deisseroth, Karl; Kazhdan, Michael; Lichtman, Jeff; Reid, R. Clay; Smith, Stephen J.; Szalay, Alexander S.; Vogelstein, Joshua T.; Vogelstein, R. Jacob
2013-01-01
We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes—neural connectivity maps of the brain—using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems—reads to parallel disk arrays and writes to solid-state storage—to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization. PMID:24401992
Jiang, Guangli; Liu, Leibo; Zhu, Wenping; Yin, Shouyi; Wei, Shaojun
2015-09-04
This paper proposes a real-time feature extraction VLSI architecture for high-resolution images based on the accelerated KAZE algorithm. Firstly, a new system architecture is proposed. It increases the system throughput, provides flexibility in image resolution, and offers trade-offs between speed and scaling robustness. The architecture consists of a two-dimensional pipeline array that fully utilizes computational similarities in octaves. Secondly, a substructure (block-serial discrete-time cellular neural network) that can realize a nonlinear filter is proposed. This structure decreases the memory demand through the removal of data dependency. Thirdly, a hardware-friendly descriptor is introduced to overcome the hardware design bottleneck through the polar sample pattern; a simplified method to realize rotation invariance is also presented. Finally, the proposed architecture is implemented in TSMC 65 nm CMOS technology. The experimental results show a performance of 127 fps at full HD resolution at a 200 MHz clock frequency. The peak performance reaches 181 GOPS, and the throughput is double that of other state-of-the-art architectures.
Boyer, François; Boutouil, Hend; Dalloul, Iman; Dalloul, Zeinab; Cook-Moreau, Jeanne; Aldigier, Jean-Claude; Carrion, Claire; Herve, Bastien; Scaon, Erwan; Cogné, Michel; Péron, Sophie
2017-05-15
B cells ensure humoral immune responses due to the production of Ag-specific memory B cells and Ab-secreting plasma cells. In secondary lymphoid organs, Ag-driven B cell activation induces terminal maturation and Ig isotype class switching (class switch recombination [CSR]). CSR creates a virtually unique IgH locus in every B cell clone by intrachromosomal recombination between two switch (S) regions upstream of each C region gene. The abundance and structural features of CSR junctions reveal valuable information about the CSR mechanism, and analysis of CSR junctions is useful in basic and clinical research studies of B cell functions. To provide an automated tool able to analyze large data sets of CSR junction sequences produced by high-throughput sequencing (HTS), we designed CSReport, a software program dedicated to the analysis of CSR recombination junctions sequenced with an HTS-based protocol (Ion Torrent technology). CSReport was assessed using simulated data sets of CSR junctions and then used for analysis of Sμ-Sα and Sμ-Sγ1 junctions from CH12F3 cells and primary murine B cells, respectively. CSReport identifies junction segment breakpoints on reference sequences and junction structure (blunt-ended junctions or junctions with insertions or microhomology). Besides the ability to analyze unprecedentedly large libraries of junction sequences, CSReport will provide a unified framework for CSR junction studies. Our results show that CSReport is an accurate tool for analysis of sequences from our HTS-based protocol for CSR junctions, thereby facilitating and accelerating their study. Copyright © 2017 by The American Association of Immunologists, Inc.
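The junction-structure call described above (blunt ends, insertions, or microhomology) comes down to comparing the two segment breakpoints once they have been mapped onto the read. A minimal sketch of that classification logic in Python, with illustrative coordinates and a hypothetical function name rather than CSReport's actual code:

```python
def classify_junction(upstream_end, downstream_start):
    """Classify a switch junction from 0-based read coordinates.

    upstream_end     -- last read position aligned to the donor S region (e.g., Smu)
    downstream_start -- first read position aligned to the acceptor S region (e.g., Salpha)
    """
    gap = downstream_start - (upstream_end + 1)
    if gap == 0:
        return ("blunt", 0)            # segments abut exactly
    if gap > 0:
        return ("insertion", gap)      # untemplated nucleotides between segments
    return ("microhomology", -gap)     # alignments overlap: shared sequence

for breakpoints in [(99, 100), (99, 104), (99, 97)]:
    print(breakpoints, "->", classify_junction(*breakpoints))
```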
Design and implementation of a high performance network security processor
NASA Astrophysics Data System (ADS)
Wang, Haixin; Bai, Guoqiang; Chen, Hongyi
2010-03-01
The last few years have seen significant progress in the field of application-specific processors. One example is network security processors (NSPs), which perform various cryptographic operations specified by network security protocols and help to offload the computation-intensive burdens from network processors (NPs). This article presents a high performance NSP system architecture implementation intended for both internet protocol security (IPSec) and secure socket layer (SSL) protocol acceleration, which are widely employed in virtual private network (VPN) and e-commerce applications. The efficient dual one-way pipelined data transfer skeleton and optimised integration scheme of the heterogeneous parallel crypto engine arrays lead to a Gbps-rate NSP, which is programmable with domain-specific descriptor-based instructions. The descriptor-based control flow fragments large data packets and distributes them to the crypto engine arrays, which fully utilises the parallel computation resources and improves the overall system data throughput. A prototyping platform for this NSP design is implemented with a Xilinx XC3S5000-based FPGA chip set. Results show that the design gives a peak throughput for the IPSec ESP tunnel mode of 2.85 Gbps, with over 2100 full SSL handshakes per second, at a clock rate of 95 MHz.
High Throughput Sequence Analysis for Disease Resistance in Maize
USDA-ARS?s Scientific Manuscript database
Preliminary results of a computational analysis of high throughput sequencing data from Zea mays and the fungus Aspergillus are reported. The Illumina Genome Analyzer was used to sequence RNA samples from two strains of Z. mays (Va35 and Mp313) collected over a time course as well as several specie...
The focus of this meeting is the SAP's review and comment on the Agency's proposed high-throughput computational model of androgen receptor pathway activity as an alternative to the current Tier 1 androgen receptor assay (OCSPP 890.1150: Androgen Receptor Binding Rat Prostate Cyt...
The US EPA’s ToxCast™ program seeks to combine advances in high-throughput screening technology with methodologies from statistics and computer science to develop high-throughput decision support tools for assessing chemical hazard and risk. To develop new methods of analysis of...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Allaire, Marc, E-mail: allaire@bnl.gov; Moiseeva, Natalia; Botez, Cristian E.
The correlation coefficients calculated between raw powder diffraction profiles can be used to identify ligand-bound/unbound states of lysozyme. The discovery of ligands that bind specifically to a targeted protein benefits from the development of generic assays for high-throughput screening of a library of chemicals. Protein powder diffraction (PPD) has been proposed as a potential method for use as a structure-based assay for high-throughput screening applications. Building on this effort, powder samples of bound/unbound states of soluble hen-egg white lysozyme precipitated with sodium chloride were compared. The correlation coefficients calculated between the raw diffraction profiles were consistent with the known binding properties of the ligands and suggested that the PPD approach can be used even prior to a full description using stereochemically restrained Rietveld refinement.
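The screening statistic here is simple enough to sketch: a Pearson correlation between raw one-dimensional diffraction profiles, with replicates of the same state correlating strongly and bound versus unbound states correlating less. A toy illustration with synthetic profiles (the peak positions and noise levels are invented, not the lysozyme data):

```python
import numpy as np

rng = np.random.default_rng(0)
two_theta = np.linspace(5, 35, 600)

def profile(peaks):
    """Synthetic intensity-vs-2theta profile: a sum of Gaussian peaks plus noise."""
    y = sum(h * np.exp(-((two_theta - c) ** 2) / w) for c, h, w in peaks)
    return y + rng.normal(0, 0.01, two_theta.size)

apo = [(12.0, 1.0, 0.1), (20.0, 0.5, 0.2)]      # unbound stand-in
holo = [(12.4, 1.0, 0.1), (21.0, 0.5, 0.2)]     # ligand-bound stand-in

r_same = np.corrcoef(profile(apo), profile(apo))[0, 1]
r_diff = np.corrcoef(profile(apo), profile(holo))[0, 1]
print(f"apo vs apo: {r_same:.3f}, apo vs holo: {r_diff:.3f}")
```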
Computational methods for evaluation of cell-based data assessment--Bioconductor.
Le Meur, Nolwenn
2013-02-01
Recent advances in miniaturization and automation of technologies have enabled high-throughput screening of cell-based assays, bringing along new challenges in data analysis. Automation, standardization, and reproducibility have become requirements for qualitative research. The Bioconductor community has worked in that direction, proposing several R packages to handle high-throughput data, including flow cytometry (FCM) experiments. Altogether, these packages cover the main steps of an FCM analysis workflow, that is, data management, quality assessment, normalization, outlier detection, automated gating, cluster labeling, and feature extraction. Additionally, the open-source philosophy of R and Bioconductor, which offers room for new development, continuously drives research and improvement of these analysis methods, especially in the field of clustering and data mining. This review presents the principal FCM packages currently available in R and Bioconductor, with their advantages and their limits. Copyright © 2012 Elsevier Ltd. All rights reserved.
Stepping into the omics era: Opportunities and challenges for biomaterials science and engineering
Rabitz, Herschel; Welsh, William J.; Kohn, Joachim; de Boer, Jan
2016-01-01
The research paradigm in biomaterials science and engineering is evolving from using low-throughput and iterative experimental designs towards high-throughput experimental designs for materials optimization and the evaluation of materials properties. Computational science plays an important role in this transition. With the emergence of the omics approach in the biomaterials field, referred to as materiomics, high-throughput approaches hold the promise of tackling the complexity of materials and understanding correlations between material properties and their effects on complex biological systems. The intrinsic complexity of biological systems is an important factor that is often oversimplified when characterizing biological responses to materials and establishing property-activity relationships. Indeed, in vitro tests designed to predict in vivo performance of a given biomaterial are largely lacking as we are not able to capture the biological complexity of whole tissues in an in vitro model. In this opinion paper, we explain how we reached our opinion that converging genomics and materiomics into a new field would enable a significant acceleration of the development of new and improved medical devices. The use of computational modeling to correlate high-throughput gene expression profiling with high-throughput combinatorial material design strategies would add power to the analysis of biological effects induced by material properties. We believe that this extra layer of complexity on top of high-throughput material experimentation is necessary to tackle the biological complexity and further advance the biomaterials field. PMID:26876875
Tepper, Naama; Shlomi, Tomer
2011-01-21
Combinatorial approaches in metabolic engineering work by generating genetic diversity in a microbial population followed by screening for strains with improved phenotypes. One of the most common goals in this field is the generation of a high-rate chemical-producing strain. A major hurdle with this approach is that many chemicals do not have easy-to-recognize attributes, making their screening expensive and time consuming. To address this problem, it was previously suggested to use microbial biosensors to facilitate the detection and quantification of chemicals of interest. Here, we present novel computational methods to: (i) rationally design microbial biosensors for chemicals of interest, based on substrate auxotrophy, that would enable their high-throughput screening; (ii) predict engineering strategies for coupling the synthesis of a chemical of interest with the production of a proxy metabolite for which high-throughput screening is possible via a designed biosensor. The biosensor design method is validated based on known genetic modifications in an array of E. coli strains auxotrophic to various amino acids. Predicted chemical production rates achievable via the biosensor-based approach are shown to potentially improve upon those predicted by current rational strain design approaches. (A Matlab implementation of the biosensor design method is available via http://www.cs.technion.ac.il/~tomersh/tools).
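The coupling test at the heart of strategy (ii) can be pictured with an off-the-shelf constraint-based modeling library. The sketch below uses cobrapy with an assumed local SBML file ("e_coli_core.xml") and assumed reaction IDs ("EX_ac_e", "ACKr") from the E. coli core model; it reports the flux through a product exchange in one growth-optimal solution (a rigorous coupling test would additionally fix the growth rate and minimize that flux), and it is not the paper's own algorithm:

```python
from cobra.io import read_sbml_model

model = read_sbml_model("e_coli_core.xml")   # assumed local model file
TARGET = "EX_ac_e"                           # assumed product exchange (acetate)

def production_at_max_growth(model, target_id, knockouts=()):
    """Apply knockouts, maximize growth, and report the target exchange flux."""
    with model:                              # all changes reverted on exit
        for rid in knockouts:
            model.reactions.get_by_id(rid).knock_out()
        sol = model.optimize()               # biomass objective by default
        if sol.status != "optimal":
            return None                      # design is lethal
        return sol.fluxes[target_id]

print(production_at_max_growth(model, TARGET, knockouts=("ACKr",)))
```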
MultiPhyl: a high-throughput phylogenomics webserver using distributed computing
Keane, Thomas M.; Naughton, Thomas J.; McInerney, James O.
2007-01-01
With the number of fully sequenced genomes increasing steadily, there is greater interest in performing large-scale phylogenomic analyses from large numbers of individual gene families. Maximum likelihood (ML) has been shown repeatedly to be one of the most accurate methods for phylogenetic construction. Recently, there have been a number of algorithmic improvements in maximum-likelihood-based tree search methods. However, it can still take a long time to analyse the evolutionary history of many gene families using a single computer. Distributed computing refers to a method of combining the computing power of multiple computers in order to perform some larger overall calculation. In this article, we present the first high-throughput implementation of a distributed phylogenetics platform, MultiPhyl, capable of using the idle computational resources of many heterogeneous non-dedicated machines to form a phylogenetics supercomputer. MultiPhyl allows a user to upload hundreds or thousands of amino acid or nucleotide alignments simultaneously and perform computationally intensive tasks such as model selection, tree searching and bootstrapping of each of the alignments using many desktop machines. The program implements a set of 88 amino acid models and 56 nucleotide maximum likelihood models and a variety of statistical methods for choosing between alternative models. A MultiPhyl webserver is available for public use at: http://www.cs.nuim.ie/distributed/multiphyl.php. PMID:17553837
p3d--Python module for structural bioinformatics.
Fufezan, Christian; Specht, Michael
2009-08-21
High-throughput bioinformatic analysis tools are needed to mine the large amount of structural data via knowledge-based approaches. The development of such tools requires a robust interface to access the structural data in an easy way. For this, the Python scripting language is an optimal choice, since its philosophy is to encourage understandable source code. p3d is an object-oriented Python module that adds a simple yet powerful interface to the Python interpreter to process and analyse three-dimensional protein structure files (PDB files). p3d's strength arises from the combination of (a) very fast spatial access to the structural data due to the implementation of a binary space partitioning (BSP) tree, (b) set theory, and (c) functions that combine (a) and (b) and use human-readable language in search queries rather than complex computer language. All these factors combined facilitate the rapid development of bioinformatic tools that can perform quick and complex analyses of protein structures. p3d is well suited to the rapid development of structural bioinformatics tools in the Python scripting language.
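The BSP tree is what makes queries like "all atoms within r of this point" fast. The snippet below illustrates that access pattern with a k-d tree from SciPy, an analogous spatial index, rather than p3d's own classes and query language:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
coords = rng.uniform(0, 50, size=(5000, 3))   # stand-in atom coordinates (angstroms)
tree = cKDTree(coords)                        # build the spatial index once

probe = np.array([25.0, 25.0, 25.0])          # e.g., a ligand atom position
contacts = tree.query_ball_point(probe, r=4.5)
print(f"{len(contacts)} atoms within 4.5 A of the probe")
```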
A Fully Automated High-Throughput Zebrafish Behavioral Ototoxicity Assay.
Todd, Douglas W; Philip, Rohit C; Niihori, Maki; Ringle, Ryan A; Coyle, Kelsey R; Zehri, Sobia F; Zabala, Leanne; Mudery, Jordan A; Francis, Ross H; Rodriguez, Jeffrey J; Jacob, Abraham
2017-08-01
Zebrafish animal models lend themselves to behavioral assays that can facilitate rapid screening of ototoxic, otoprotective, and otoregenerative drugs. Structurally similar to human inner ear hair cells, the mechanosensory hair cells on their lateral line allow the zebrafish to sense water flow and orient head-to-current in a behavior called rheotaxis. This rheotaxis behavior deteriorates in a dose-dependent manner with increased exposure to the ototoxin cisplatin, thereby establishing itself as an excellent biomarker for anatomic damage to lateral line hair cells. Building on work by our group and others, we have built a new, fully automated high-throughput behavioral assay system that uses automated image analysis techniques to quantify rheotaxis behavior. This novel system consists of a custom-designed swimming apparatus and an imaging system of network-controlled Raspberry Pi microcomputers capturing infrared video. Automated analysis techniques detect individual zebrafish, compute their orientation, and quantify the rheotaxis behavior of a zebrafish test population, producing a powerful, high-throughput behavioral assay. Using our fully automated biological assay to test a standardized ototoxic dose of cisplatin against varying doses of compounds that protect or regenerate hair cells may facilitate rapid translation of candidate drugs into preclinical mammalian models of hearing loss.
Preparation of Protein Samples for NMR Structure, Function, and Small Molecule Screening Studies
Acton, Thomas B.; Xiao, Rong; Anderson, Stephen; Aramini, James; Buchwald, William A.; Ciccosanti, Colleen; Conover, Ken; Everett, John; Hamilton, Keith; Huang, Yuanpeng Janet; Janjua, Haleema; Kornhaber, Gregory; Lau, Jessica; Lee, Dong Yup; Liu, Gaohua; Maglaqui, Melissa; Ma, Lichung; Mao, Lei; Patel, Dayaban; Rossi, Paolo; Sahdev, Seema; Shastry, Ritu; Swapna, G.V.T.; Tang, Yeufeng; Tong, Saichiu; Wang, Dongyan; Wang, Huang; Zhao, Li; Montelione, Gaetano T.
2014-01-01
In this chapter, we concentrate on the production of high quality protein samples for NMR studies. In particular, we provide an in-depth description of recent advances in the production of NMR samples and their synergistic use with recent advancements in NMR hardware. We describe the protein production platform of the Northeast Structural Genomics Consortium, and outline our high-throughput strategies for producing high quality protein samples for nuclear magnetic resonance (NMR) studies. Our strategy is based on the cloning, expression and purification of 6X-His-tagged proteins using T7-based Escherichia coli systems and isotope enrichment in minimal media. We describe 96-well ligation-independent cloning and analytical expression systems, parallel preparative scale fermentation, and high-throughput purification protocols. The 6X-His affinity tag allows for a similar two-step purification procedure implemented in a parallel high-throughput fashion that routinely results in purity levels sufficient for NMR studies (> 97% homogeneity). Using this platform, the protein open reading frames of over 17,500 different targeted proteins (or domains) have been cloned as over 28,000 constructs. Nearly 5,000 of these proteins have been purified to homogeneity in tens of milligram quantities (see Summary Statistics, http://nesg.org/statistics.html), resulting in more than 950 new protein structures, including more than 400 NMR structures, deposited in the Protein Data Bank. The Northeast Structural Genomics Consortium pipeline has been effective in producing protein samples of both prokaryotic and eukaryotic origin. Although this paper describes our entire pipeline for producing isotope-enriched protein samples, it focuses on the major updates introduced during the last 5 years (Phase 2 of the National Institute of General Medical Sciences Protein Structure Initiative). Our advanced automated and/or parallel cloning, expression, purification, and biophysical screening technologies are suitable for implementation in a large individual laboratory or by a small group of collaborating investigators for structural biology, functional proteomics, ligand screening and structural genomics research. PMID:21371586
NASA Astrophysics Data System (ADS)
Menichetti, Roberto; Kanekal, Kiran H.; Kremer, Kurt; Bereau, Tristan
2017-09-01
The partitioning of small molecules in cell membranes—a key parameter for pharmaceutical applications—typically relies on experimentally available bulk partitioning coefficients. Computer simulations provide a structural resolution of the insertion thermodynamics via the potential of mean force but require significant sampling at the atomistic level. Here, we introduce high-throughput coarse-grained molecular dynamics simulations to screen thermodynamic properties. This application of physics-based models in a large-scale study of small molecules establishes linear relationships between partitioning coefficients and key features of the potential of mean force. This allows us to predict the structure of the insertion from bulk experimental measurements for more than 400 000 compounds. The potential of mean force hereby becomes an easily accessible quantity—already recognized for its high predictability of certain properties, e.g., passive permeation. Further, we demonstrate how coarse graining helps reduce the size of chemical space, enabling a hierarchical approach to screening small molecules.
Evaluation of the OpenCL AES Kernel using the Intel FPGA SDK for OpenCL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Zheming; Yoshii, Kazutomo; Finkel, Hal
The OpenCL standard is an open programming model for accelerating algorithms on heterogeneous computing systems. OpenCL extends the C-based programming language for developing portable code across platforms such as CPUs, graphics processing units (GPUs), digital signal processors (DSPs) and field-programmable gate arrays (FPGAs). The Intel FPGA SDK for OpenCL is a suite of tools that allows developers to abstract away the complex FPGA-based development flow for a high-level software development flow. Users can focus on the design of hardware-accelerated kernel functions in OpenCL and then direct the tools to generate the low-level FPGA implementations. The approach makes FPGA-based development more accessible to software users as the need for hybrid computing using CPUs and FPGAs increases. It can also significantly reduce the hardware development time, as users can evaluate different ideas in a high-level language without deep FPGA domain knowledge. In this report, we evaluate the performance of the AES kernel using the Intel FPGA SDK for OpenCL and a Nallatech 385A FPGA board. Compared to the M506 module, the board provides more hardware resources for a larger design exploration space. The kernel performance is measured with the compute kernel throughput, an upper bound to the FPGA throughput. The report presents the experimental results in detail. The Appendix lists the kernel source code.
Yuan, Tiezheng; Huang, Xiaoyi; Dittmar, Rachel L; Du, Meijun; Kohli, Manish; Boardman, Lisa; Thibodeau, Stephen N; Wang, Liang
2014-03-05
RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high-throughput sequencers. We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module "miRNA identification" includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module "mRNA identification" includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module "Target screening" provides expression profiling analyses and graphic visualization. The module "Self-testing" offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extends the program's functionality. eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory.
Bunn, Jonathan Kenneth; Fang, Randy L; Albing, Mark R; Mehta, Apurva; Kramer, Matthew J; Besser, Matthew F; Hattrick-Simpers, Jason R
2015-07-10
High-temperature alloy coatings that can resist oxidation are urgently needed as nuclear cladding materials to mitigate the danger of hydrogen explosions during meltdown. Here we apply a combination of computationally guided materials synthesis, high-throughput structural characterization, and data analysis tools to investigate the feasibility of coatings from the Fe–Cr–Al alloy system. Composition-spread samples were synthesized to cover the region of the phase diagram that previous bulk studies have identified as forming protective oxides. The metallurgical and oxide phase evolution was studied via in situ synchrotron glancing-incidence x-ray diffraction at temperatures up to 690 K. A composition region with an Al concentration greater than 3.08 at%, and between 20.0 at% and 32.9 at% Cr, showed the least overall oxide growth. Subsequently, a series of samples was deposited on stubs and their oxidation behavior at 1373 K was observed. The continued presence of a passivating oxide was confirmed in this region over a period of 6 h.
Budavari, Tamas; Langmead, Ben; Wheelan, Sarah J.; Salzberg, Steven L.; Szalay, Alexander S.
2015-01-01
When computing alignments of DNA sequences to a large genome, a key element in achieving high processing throughput is to prioritize locations in the genome where high-scoring mappings might be expected. We formulated this task as a series of list-processing operations that can be efficiently performed on graphics processing unit (GPU) hardware. We followed this approach in implementing a read aligner called Arioc that uses GPU-based parallel sort and reduction techniques to identify high-priority locations where potential alignments may be found. We then carried out a read-by-read comparison of Arioc's reported alignments with the alignments found by several leading read aligners. With simulated reads, Arioc has comparable or better accuracy than the other read aligners we tested. With human sequencing reads, Arioc demonstrates significantly greater throughput than the other aligners we evaluated across a wide range of sensitivity settings. The Arioc software is available at https://github.com/RWilton/Arioc. It is released under a BSD open-source license. PMID:25780763
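The prioritization step can be pictured without any GPU code: pool the candidate genome positions produced by seed matches, bucket them into windows, and rank windows by how many seeds agree; on the GPU this becomes a parallel sort plus a segmented reduction. A CPU sketch with invented numbers, not Arioc's CUDA implementation:

```python
import numpy as np

# Genome positions suggested by seed hits for one read (illustrative values).
seed_hits = np.array([10_004, 993_201, 10_004, 55_310, 10_004, 993_201, 10_008])

WINDOW = 16                                   # bucket width is an assumption
windows, counts = np.unique(seed_hits // WINDOW, return_counts=True)
for w in np.argsort(counts)[::-1]:            # most-supported loci first
    print(f"window near position {windows[w] * WINDOW}: {counts[w]} supporting seeds")
```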
ERIC Educational Resources Information Center
da Silveira, Pedro Rodrigo Castro
2014-01-01
This thesis describes the development and deployment of a cyberinfrastructure for distributed high-throughput computations of materials properties at high pressures and/or temperatures--the Virtual Laboratory for Earth and Planetary Materials--VLab. VLab was developed to leverage the aggregated computational power of grid systems to solve…
Microarray profiling of chemical-induced effects is being increasingly used in medium and high-throughput formats. In this study, we describe computational methods to identify molecular targets from whole-genome microarray data using as an example the estrogen receptor α (ERα), ...
BarraCUDA - a fast short read sequence aligner using graphics processing units
2012-01-01
Background With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General-purpose computing on graphics processing units (GPGPU) extracts computing power from the hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy-efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computation-intensive alignment component of BWA to the GPU to take advantage of its massive parallelism. As a result, BarraCUDA offers an order of magnitude performance boost in alignment throughput compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate alignment throughput. Conclusions BarraCUDA is designed to take advantage of the parallelism of GPUs to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net PMID:22244497
The topology of metabolic isotope labeling networks.
Weitzel, Michael; Wiechert, Wolfgang; Nöh, Katharina
2007-08-29
Metabolic Flux Analysis (MFA) based on isotope labeling experiments (ILEs) is a widely established tool for determining fluxes in metabolic pathways. Isotope labeling networks (ILNs) contain all essential information required to describe the flow of labeled material in an ILE. Whereas recent experimental progress paves the way for high-throughput MFA, large network investigations and exact statistical methods, these developments are still limited by the poor performance of computational routines used for the evaluation and design of ILEs. In this context, the global analysis of ILN topology turns out to be a clue for realizing large speedup factors in all required computational procedures. With a strong focus on the speedup of algorithms, the topology of ILNs is investigated using graph-theoretic concepts and algorithms. A rigorous determination of all cyclic and isomorphic subnetworks, accompanied by the global analysis of ILN connectivity, is performed. In particular, it is proven that ILNs always break up into a large number of small strongly connected components (SCCs) and, moreover, that there are natural isomorphisms between many of these SCCs. All presented techniques are universal, i.e. they do not require special assumptions on the network structure, bidirectionality of fluxes, measurement configuration, or label input. The general results are exemplified with a practically relevant metabolic network which describes the central metabolism of E. coli comprising 10390 isotopomer pools. Exploiting the topological features of ILNs leads to a significant speedup of all universal algorithms for ILE evaluation. It is proven in theory and exemplified with the E. coli example that a speedup factor of about 1000 compared to standard algorithms is achieved. This opens the door to new high-performance algorithms suitable for high-throughput applications and large ILNs. Moreover, for the first time the global topological analysis of ILNs makes it possible to comprehensively describe and understand the general patterns of label flow in complex networks. This is an invaluable tool for the structural design of new experiments and the interpretation of measured data.
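The central structural claim, that an ILN decomposes into many small strongly connected components, is easy to reproduce on a toy graph with standard tools. A sketch using NetworkX on an invented miniature network (real ILNs have thousands of isotopomer pools):

```python
import networkx as nx

G = nx.DiGraph([
    ("A0", "B0"), ("B0", "A0"),   # a 2-cycle: one non-trivial SCC
    ("B0", "C0"), ("C0", "D0"),   # acyclic tail: singleton SCCs
    ("C1", "D1"), ("D1", "C1"),   # an isomorphic copy of the 2-cycle
])
sccs = sorted(nx.strongly_connected_components(G), key=len, reverse=True)
print(len(sccs), "SCCs:", sccs)
# Each small SCC can be evaluated independently (and isomorphic ones can share
# work), which is where the reported speedup over monolithic evaluation comes from.
```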
Gene Ontology annotations at SGD: new data sources and annotation methods
Hong, Eurie L.; Balakrishnan, Rama; Dong, Qing; Christie, Karen R.; Park, Julie; Binkley, Gail; Costanzo, Maria C.; Dwight, Selina S.; Engel, Stacia R.; Fisk, Dianna G.; Hirschman, Jodi E.; Hitz, Benjamin C.; Krieger, Cynthia J.; Livstone, Michael S.; Miyasato, Stuart R.; Nash, Robert S.; Oughtred, Rose; Skrzypek, Marek S.; Weng, Shuai; Wong, Edith D.; Zhu, Kathy K.; Dolinski, Kara; Botstein, David; Cherry, J. Michael
2008-01-01
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) collects and organizes biological information about the chromosomal features and gene products of the budding yeast Saccharomyces cerevisiae. Although published data from traditional experimental methods are the primary sources of evidence supporting Gene Ontology (GO) annotations for a gene product, high-throughput experiments and computational predictions can also provide valuable insights in the absence of an extensive body of literature. Therefore, GO annotations available at SGD now include high-throughput data as well as computational predictions provided by the GO Annotation Project (GOA UniProt; http://www.ebi.ac.uk/GOA/). Because the annotation method used to assign GO annotations varies by data source, GO resources at SGD have been modified to distinguish data sources and annotation methods. In addition to providing information for genes that have not been experimentally characterized, GO annotations from independent sources can be compared to those made by SGD to help keep the literature-based GO annotations current. PMID:17982175
Micro-patterned agarose gel devices for single-cell high-throughput microscopy of E. coli cells.
Priest, David G; Tanaka, Nobuyuki; Tanaka, Yo; Taniguchi, Yuichi
2017-12-21
High-throughput microscopy of bacterial cells has elucidated fundamental cellular processes including cellular heterogeneity and cell division homeostasis. Polydimethylsiloxane (PDMS)-based microfluidic devices provide advantages including precise positioning of cells and throughput; however, device fabrication is time-consuming and requires specialised skills. Agarose pads are a popular alternative, but cells often clump together, which hinders single-cell quantitation. Here, we imprint agarose pads with micro-patterned 'capsules' to trap individual cells and 'lines' to direct cellular growth outwards in a straight line. We implement this micro-patterning into multi-pad devices called CapsuleHotel and LineHotel for high-throughput imaging. CapsuleHotel provides ~65,000 capsule structures per mm² that isolate individual Escherichia coli cells. In contrast, LineHotel provides ~300 line structures per mm that direct the growth of micro-colonies. With CapsuleHotel, a quantitative single-cell dataset of ~10,000 cells across 24 samples can be acquired and analysed in under 1 hour. LineHotel allows tracking the growth of >10 micro-colonies across 24 samples simultaneously for up to 4 generations. These easy-to-use devices can be provided in kit format and will accelerate discoveries in diverse fields ranging from microbiology to systems and synthetic biology.
Image Harvest: an open-source platform for high-throughput plant image processing and analysis
Knecht, Avi C.; Campbell, Malachy T.; Caprez, Adam; Swanson, David R.; Walia, Harkamal
2016-01-01
High-throughput plant phenotyping is an effective approach to bridge the genotype-to-phenotype gap in crops. Phenomics experiments typically result in large-scale image datasets, which are not amenable to processing on desktop computers, thus creating a bottleneck in the image-analysis pipeline. Here, we present an open-source, flexible image-analysis framework, called Image Harvest (IH), for processing images originating from high-throughput plant phenotyping platforms. Image Harvest is developed to perform parallel processing on computing grids and provides an integrated feature for metadata extraction from large-scale file organization. Moreover, the integration of IH with the Open Science Grid provides academic researchers with the computational resources required for processing large image datasets at no cost. Image Harvest also offers functionalities to extract digital traits from images to interpret plant architecture-related characteristics. To demonstrate the applications of these digital traits, a rice (Oryza sativa) diversity panel was phenotyped and genome-wide association mapping was performed using digital traits that are used to describe different plant ideotypes. Three major quantitative trait loci were identified on rice chromosomes 4 and 6, which co-localize with quantitative trait loci known to regulate agronomically important traits in rice. Image Harvest is an open-source software for high-throughput image processing that requires a minimal learning curve for plant biologists to analyze phenomics datasets. PMID:27141917
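As a flavor of the "digital traits" mentioned above, a trait extractor can be as simple as a greenness threshold followed by pixel bookkeeping. A toy sketch on a synthetic image (Image Harvest's own feature set is far richer than this):

```python
import numpy as np

rgb = np.zeros((120, 80, 3))                 # synthetic stand-in for a camera image
rgb[40:110, 30:50, 1] = 0.8                  # a green rectangle plays the plant

greenness = rgb[..., 1] - 0.5 * (rgb[..., 0] + rgb[..., 2])
plant = greenness > 0.2                      # boolean shoot mask

rows = np.where(plant.any(axis=1))[0]
print("projected shoot area (px):", int(plant.sum()))
print("plant height (px):", int(rows.max() - rows.min() + 1))
```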
Oulas, Anastasis; Karathanasis, Nestoras; Louloupi, Annita; Pavlopoulos, Georgios A; Poirazi, Panayiota; Kalantidis, Kriton; Iliopoulos, Ioannis
2015-01-01
Computational methods for miRNA target prediction are currently undergoing extensive review and evaluation. There is still a great need for improvement of these tools and bioinformatics approaches are looking towards high-throughput experiments in order to validate predictions. The combination of large-scale techniques with computational tools will not only provide greater credence to computational predictions but also lead to the better understanding of specific biological questions. Current miRNA target prediction tools utilize probabilistic learning algorithms, machine learning methods and even empirical biologically defined rules in order to build models based on experimentally verified miRNA targets. Large-scale protein downregulation assays and next-generation sequencing (NGS) are now being used to validate methodologies and compare the performance of existing tools. Tools that exhibit greater correlation between computational predictions and protein downregulation or RNA downregulation are considered the state of the art. Moreover, efficiency in prediction of miRNA targets that are concurrently verified experimentally provides additional validity to computational predictions and further highlights the competitive advantage of specific tools and their efficacy in extracting biologically significant results. In this review paper, we discuss the computational methods for miRNA target prediction and provide a detailed comparison of methodologies and features utilized by each specific tool. Moreover, we provide an overview of current state-of-the-art high-throughput methods used in miRNA target prediction.
lncRNATargets: A platform for lncRNA target prediction based on nucleic acid thermodynamics.
Hu, Ruifeng; Sun, Xiaobo
2016-08-01
Many studies have supported that long noncoding RNAs (lncRNAs) perform various functions in critical biological processes. Advanced experimental and computational technologies allow access to more information on lncRNAs, and determining the functions and action mechanisms of these RNAs on a large scale is urgently needed. We provide lncRNATargets, a web-based platform for lncRNA target prediction based on nucleic acid thermodynamics. The nearest-neighbor (NN) model is used to calculate binding free energy; its main principle is that the identity and orientation of neighboring base pairs determine the stability of a given base pair. lncRNATargets features the following options: setting a specific temperature, which allows use not only for human but also for other animals or plants; processing all lncRNAs in high throughput without RNA size limitation, which is superior to any other existing tool; and a web-based, user-friendly interface with colored result displays that allows easy access for nonskilled computer operators and provides a better understanding of the results. The technique provides an accurate calculation of the binding free energy of lncRNA-target dimers to predict whether these structures are targeted together. lncRNATargets offers high-accuracy calculations, and this user-friendly program is available for free at http://www.herbbol.org:8001/lrt/.
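The nearest-neighbor calculation itself is a short sum over dinucleotide stacks. A sketch of that bookkeeping with placeholder parameters (the real server uses measured NN tables with initiation and temperature terms, dG = dH - T*dS; the numbers below are illustrative only):

```python
ILLUSTRATIVE_DG = {   # kcal/mol per stack; placeholder numbers, not measured values
    "AU/UA": -1.1, "UA/AU": -1.3, "GC/CG": -3.4, "CG/GC": -2.4,
    "GU/UG": -0.5, "AC/UG": -2.1, "CA/GU": -2.1, "GA/CU": -2.4,
}

def duplex_dg(strand, partner):
    """Sum stack energies over a perfectly paired duplex (5'->3' vs 3'->5')."""
    total = 0.0
    for i in range(len(strand) - 1):
        stack = f"{strand[i]}{strand[i+1]}/{partner[i]}{partner[i+1]}"
        total += ILLUSTRATIVE_DG.get(stack, -1.0)   # fallback for missing stacks
    return total

print(duplex_dg("GCAU", "CGUA"))   # more negative = more stable lncRNA-target dimer
```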
Overview of the LINCS architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fletcher, J.G.; Watson, R.W.
1982-01-13
Computing at the Lawrence Livermore National Laboratory (LLNL) has evolved over the past 15 years into a computer-network-based resource sharing environment. The increasing use of low-cost, high-performance micro, mini and midi computers and commercially available local networking systems will accelerate this trend. Further, even the large-scale computer systems, on which much of the LLNL scientific computing depends, are evolving into multiprocessor systems. It is our belief that the most cost-effective use of this environment will depend on the development of application systems structured into cooperating concurrent program modules (processes) distributed appropriately over different nodes of the environment. A node is defined as one or more processors with a local (shared) high-speed memory. Given the latter view, the environment can be characterized as consisting of: multiple nodes communicating over noisy channels with arbitrary delays and throughput, heterogeneous base resources and information encodings, no single administration controlling all resources, distributed system state, and no uniform time base. The system design problem is how to turn the heterogeneous base hardware/firmware/software resources of this environment into a coherent set of resources that facilitate development of cost-effective, reliable, and human-engineered applications. We believe the answer lies in developing a layered, communication-oriented distributed system architecture; layered and modular to support ease of understanding, reconfiguration, extensibility, and hiding of implementation or nonessential local details; communication-oriented because that is a central feature of the environment. The Livermore Interactive Network Communication System (LINCS) is a hierarchical architecture designed to meet the above needs. While having characteristics in common with other architectures, it differs in several respects.
Detection of IgG aggregation by a high throughput method based on extrinsic fluorescence.
He, Feng; Phan, Duke H; Hogan, Sabine; Bailey, Robert; Becker, Gerald W; Narhi, Linda O; Razinkov, Vladimir I
2010-06-01
The utility of extrinsic fluorescence as a tool for high-throughput detection of monoclonal antibody aggregates was explored. Several IgG molecules were thermally stressed and the high molecular weight species were fractionated using size-exclusion chromatography (SEC). The isolated aggregates and monomers were studied by following the fluorescence of an extrinsic probe, SYPRO Orange. The dye displayed high sensitivity to structurally altered, aggregated IgG species, whereas the native form produced very low fluorescence in the presence of the dye. An example application is presented to demonstrate the properties of this detection method. The fluorescence assay was shown to correlate with the SEC method in quantifying IgG aggregates. The fluorescent probe method also appears to have the potential to detect protein particles that cannot be analyzed by SEC. This method may become a powerful high-throughput tool to detect IgG aggregates in pharmaceutical solutions and to study other protein properties involving aggregation. It can also be used to study the kinetics of antibody particle formation, and perhaps allow identification of the species that are the early building blocks of protein particles. (c) 2009 Wiley-Liss, Inc. and the American Pharmacists Association
NASA Technical Reports Server (NTRS)
Prevot, Thomas
2012-01-01
This paper describes the underlying principles and algorithms for computing the primary controller-managed spacing (CMS) tools developed at NASA for precisely spacing aircraft along efficient descent paths. The trajectory-based CMS tools include slot markers, delay indications and speed advisories. These tools are one of three core NASA technologies integrated in NASA's ATM Technology Demonstration-1 (ATD-1), which will operationally demonstrate the feasibility of fuel-efficient, high-throughput arrival operations using Automatic Dependent Surveillance Broadcast (ADS-B) and ground-based and airborne NASA technologies for precision scheduling and spacing.
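In outline, the delay indication is the gap between the trajectory-predicted ETA and the scheduled time of arrival (STA), and the speed advisory is a speed that closes that gap over the remaining path. A deliberately simplified sketch with invented numbers, not NASA's CMS algorithms:

```python
def cms_advisory(eta_s, sta_s, dist_remaining_nm):
    """Delay to absorb (s) and a ground speed (kt) that meets the STA from now (t=0)."""
    delay_s = sta_s - eta_s                      # > 0: predicted early, must slow down
    advised_gs_kt = dist_remaining_nm / (sta_s / 3600.0)
    return delay_s, advised_gs_kt

delay, gs = cms_advisory(eta_s=1180.0, sta_s=1240.0, dist_remaining_nm=80.0)
print(f"delay to absorb: {delay:+.0f} s; advised ground speed: {gs:.0f} kt")
```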
A real-time spike sorting method based on the embedded GPU.
Zelan Yang; Kedi Xu; Xiang Tian; Shaomin Zhang; Xiaoxiang Zheng
2017-07-01
Microelectrode arrays with hundreds of channels have been widely used to acquire neuron population signals in neuroscience studies. Online spike sorting is becoming one of the most important challenges for high-throughput neural signal acquisition systems. Graphics processing units (GPUs), with their high parallel computing capability, may provide an alternative solution for the increasing real-time computational demands of spike sorting. This study reports a method for real-time spike sorting through the compute unified device architecture (CUDA), implemented on an embedded GPU (NVIDIA Jetson Tegra K1, TK1). The sorting approach is based on principal component analysis (PCA) and K-means. By analyzing the parallelism of each process, the method was further optimized in the thread memory model of the GPU. Our results showed that the GPU-based classifier on the TK1 is 37.92 times faster than the MATLAB-based classifier on a PC while achieving the same accuracy. The high-performance computing features of embedded GPUs demonstrated in our study suggest that they provide a promising platform for real-time neural signal processing.
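A CPU reference for the pipeline the paper ports to CUDA, PCA for feature extraction followed by K-means for clustering, fits in a few lines; the synthetic waveforms below stand in for detected spike snippets:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 48)
unit_a = -np.exp(-((t - 0.30) ** 2) / 0.002)          # spike template, unit A
unit_b = -0.6 * np.exp(-((t - 0.45) ** 2) / 0.004)    # spike template, unit B
waveforms = np.vstack(
    [unit_a + 0.05 * rng.standard_normal(48) for _ in range(200)]
    + [unit_b + 0.05 * rng.standard_normal(48) for _ in range(200)]
)

features = PCA(n_components=3).fit_transform(waveforms)   # reduce 48 -> 3 dims
labels = KMeans(n_clusters=2, n_init=10).fit_predict(features)
print("cluster sizes:", np.bincount(labels))              # expect two ~200-spike units
```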
Ching, Travers; Zhu, Xun; Garmire, Lana X
2018-04-01
Artificial neural networks (ANN) are computing architectures with many interconnections of simple neural-inspired computing elements, and have been applied to biomedical fields such as imaging analysis and diagnosis. We have developed a new ANN framework called Cox-nnet to predict patient prognosis from high-throughput transcriptomics data. In 10 TCGA RNA-Seq data sets, Cox-nnet achieves the same or better predictive accuracy compared to other methods, including Cox proportional hazards regression (with LASSO, ridge, and minimax concave penalty), Random Forests Survival and CoxBoost. Cox-nnet also reveals richer biological information, at both the pathway and gene levels. The outputs from the hidden layer nodes provide an alternative approach for survival-sensitive dimension reduction. In summary, we have developed a new method for accurate and efficient prognosis prediction on high-throughput data, with functional biological insights. The source code is freely available at https://github.com/lanagarmire/cox-nnet.
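The objective such a network minimizes is the negative Cox partial log-likelihood, with the network's scalar output playing the role of the log-risk. A NumPy sketch of that standard loss (not Cox-nnet's exact implementation, which adds hidden layers and regularization, and ignoring tie handling):

```python
import numpy as np

def neg_partial_log_likelihood(log_risk, time, event):
    """log_risk: (n,) model outputs; time: (n,) follow-up times; event: (n,) 0/1."""
    order = np.argsort(-time)                 # descending time: risk sets are prefixes
    lr, ev = log_risk[order], event[order]
    log_cum = np.logaddexp.accumulate(lr)     # log sum(exp(lr)) over each risk set
    return -np.sum((lr - log_cum)[ev == 1])   # sum over observed events only

log_risk = np.array([0.2, -0.1, 1.3, 0.5])
time = np.array([5.0, 8.0, 2.0, 4.0])
event = np.array([1, 0, 1, 1])
print(neg_partial_log_likelihood(log_risk, time, event))
```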
Ensembler: Enabling High-Throughput Molecular Simulations at the Superfamily Scale.
Parton, Daniel L; Grinaway, Patrick B; Hanson, Sonya M; Beauchamp, Kyle A; Chodera, John D
2016-06-01
The rapidly expanding body of available genomic and protein structural data provides a rich resource for understanding protein dynamics with biomolecular simulation. While computational infrastructure has grown rapidly, simulations on an omics scale are not yet widespread, primarily because software infrastructure to enable simulations at this scale has not kept pace. It should now be possible to study protein dynamics across entire (super)families, exploiting both available structural biology data and conformational similarities across homologous proteins. Here, we present a new tool for enabling high-throughput simulation in the genomics era. Ensembler takes any set of sequences, from a single sequence to an entire superfamily, and shepherds them through various stages of modeling and refinement to produce simulation-ready structures. This includes comparative modeling to all relevant PDB structures (which may span multiple conformational states of interest), reconstruction of missing loops, addition of missing atoms, culling of nearly identical structures, assignment of appropriate protonation states, solvation in explicit solvent, and refinement and filtering with molecular simulation to ensure stable simulation. The output of this pipeline is an ensemble of structures ready for subsequent molecular simulations using computer clusters, supercomputers, or distributed computing projects like Folding@home. Ensembler thus automates much of the time-consuming process of preparing protein models suitable for simulation, while allowing scalability up to entire superfamilies. A particular advantage of this approach can be found in the construction of kinetic models of conformational dynamics, such as Markov state models (MSMs), which benefit from a diverse array of initial configurations that span the accessible conformational states to aid sampling. We demonstrate the power of this approach by constructing models for all catalytic domains in the human tyrosine kinase family, using all available kinase catalytic domain structures from any organism as structural templates. Ensembler is free and open source software licensed under the GNU General Public License (GPL) v2. It is compatible with Linux and OS X. The latest release can be installed via the conda package manager, and the latest source can be downloaded from https://github.com/choderalab/ensembler.
Pietiainen, Vilja; Saarela, Jani; von Schantz, Carina; Turunen, Laura; Ostling, Paivi; Wennerberg, Krister
2014-05-01
The High Throughput Biomedicine (HTB) unit at the Institute for Molecular Medicine Finland FIMM was established in 2010 to serve as a national and international academic screening unit, providing access to state-of-the-art instrumentation for chemical and RNAi-based high-throughput screening. The initial focus of the unit was multiwell-plate-based chemical screening and high-content microarray-based siRNA screening. However, over the first four years of operation, the unit has moved to a more flexible service platform where both chemical and siRNA screening are performed at different scales, primarily in multiwell-plate-based assays with a wide range of readout possibilities, and with a focus on ultraminiaturization to allow for affordable screening for academic users. In addition to high-throughput screening, the equipment of the unit is also used to support miniaturized, multiplexed and high-throughput applications for other types of research, such as genomics, sequencing and biobanking operations. Importantly, given the translational research goals at FIMM, an increasing part of the operations at the HTB unit is focused on high-throughput systems biology platforms for functional profiling of patient cells in personalized and precision medicine projects.
FPGA cluster for high-performance AO real-time control system
NASA Astrophysics Data System (ADS)
Geng, Deli; Goodsell, Stephen J.; Basden, Alastair G.; Dipper, Nigel A.; Myers, Richard M.; Saunter, Chris D.
2006-06-01
Whilst the high-throughput and low-latency requirements of next-generation AO real-time control systems have posed a significant challenge to von Neumann architecture processor systems, the Field Programmable Gate Array (FPGA) has emerged as a long-term solution, with high performance on throughput and excellent predictability on latency. Moreover, FPGA devices have highly capable programmable interfacing, which leads to more highly integrated systems. Nevertheless, a single FPGA is still not enough: multiple FPGA devices need to be clustered to perform the required subaperture processing and the reconstruction computation. In an AO real-time control system, memory bandwidth is often the bottleneck, simply because a vast amount of supporting data, e.g. pixel calibration maps and the reconstruction matrix, must be accessed within a short period. The cluster, as a general computing architecture, has excellent scalability in processing throughput, memory bandwidth, memory capacity, and communication bandwidth. Problems such as task distribution, node communication, and system verification are discussed.
CellCognition: time-resolved phenotype annotation in high-throughput live cell imaging.
Held, Michael; Schmitz, Michael H A; Fischer, Bernd; Walter, Thomas; Neumann, Beate; Olma, Michael H; Peter, Matthias; Ellenberg, Jan; Gerlich, Daniel W
2010-09-01
Fluorescence time-lapse imaging has become a powerful tool to investigate complex dynamic processes such as cell division or intracellular trafficking. Automated microscopes generate time-resolved imaging data at high throughput, yet tools for quantification of large-scale movie data are largely missing. Here we present CellCognition, a computational framework to annotate complex cellular dynamics. We developed a machine-learning method that combines state-of-the-art classification with hidden Markov modeling for annotation of the progression through morphologically distinct biological states. Incorporation of time information into the annotation scheme was essential to suppress classification noise at state transitions and confusion between different functional states with similar morphology. We demonstrate generic applicability in different assays and perturbation conditions, including a candidate-based RNA interference screen for regulators of mitotic exit in human cells. CellCognition is published as open source software, enabling live-cell imaging-based screening with assays that directly score cellular dynamics.
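The temporal piece of this scheme can be illustrated compactly: per-frame classifier probabilities are decoded through an HMM so that biologically implausible one-frame state flips are suppressed. A Viterbi sketch with an invented two-state model (the published tool learns its transition and emission terms from annotated data):

```python
import numpy as np

def viterbi(log_emit, log_trans, log_start):
    """Most probable state path given per-frame log-emissions (T x S)."""
    n_t, n_s = log_emit.shape
    score = log_start + log_emit[0]
    back = np.zeros((n_t, n_s), dtype=int)
    for t in range(1, n_t):
        cand = score[:, None] + log_trans      # cand[i, j]: come from i, go to j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = [int(score.argmax())]
    for t in range(n_t - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

probs = np.array([[.9, .1], [.8, .2], [.3, .7], [.9, .1], [.2, .8], [.1, .9]])
trans = np.array([[.95, .05], [.05, .95]])     # states tend to persist
print(viterbi(np.log(probs), np.log(trans), np.log([.5, .5])))
# -> [0, 0, 0, 0, 1, 1]: the single-frame flip at index 2 is smoothed away
```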
Tan, Wui Siew; Lewis, Christina L; Horelik, Nicholas E; Pregibon, Daniel C; Doyle, Patrick S; Yi, Hyunmin
2008-11-04
We demonstrate hierarchical assembly of tobacco mosaic virus (TMV)-based nanotemplates with hydrogel-based encoded microparticles via nucleic acid hybridization. TMV nanotemplates possess a highly defined structure and a genetically engineered high density thiol functionality. The encoded microparticles are produced in a high throughput microfluidic device via stop-flow lithography (SFL) and consist of spatially discrete regions containing encoded identity information, an internal control, and capture DNAs. For the hybridization-based assembly, partially disassembled TMVs were programmed with linker DNAs that contain sequences complementary to both the virus 5' end and a selected capture DNA. Fluorescence microscopy, atomic force microscopy (AFM), and confocal microscopy results clearly indicate facile assembly of TMV nanotemplates onto microparticles with high spatial and sequence selectivity. We anticipate that our hybridization-based assembly strategy could be employed to create multifunctional viral-synthetic hybrid materials in a rapid and high-throughput manner. Additionally, we believe that these viral-synthetic hybrid microparticles may find broad applications in high capacity, multiplexed target sensing.
Computational Toxicology at the US EPA
Computational toxicology is the application of mathematical and computer models to help assess chemical hazards and risks to human health and the environment. Supported by advances in informatics, high-throughput screening (HTS) technologies, and systems biology, EPA is developin...
NASA Astrophysics Data System (ADS)
Ward, Logan; Liu, Ruoqian; Krishna, Amar; Hegde, Vinay I.; Agrawal, Ankit; Choudhary, Alok; Wolverton, Chris
2017-07-01
While high-throughput density functional theory (DFT) has become a prevalent tool for materials discovery, it is limited by the relatively large computational cost. In this paper, we explore using DFT data from high-throughput calculations to create faster, surrogate models with machine learning (ML) that can be used to guide new searches. Our method works by using decision tree models to map DFT-calculated formation enthalpies to a set of attributes consisting of two distinct types: (i) composition-dependent attributes of elemental properties (as have been used in previous ML models of DFT formation energies), combined with (ii) attributes derived from the Voronoi tessellation of the compound's crystal structure. The ML models created using this method have half the cross-validation error and similar training and evaluation speeds to models created with the Coulomb matrix and partial radial distribution function methods. For a dataset of 435 000 formation energies taken from the Open Quantum Materials Database (OQMD), our model achieves a mean absolute error of 80 meV/atom in cross validation, which is lower than the approximate error between DFT-computed and experimentally measured formation enthalpies and below 15% of the mean absolute deviation of the training set. We also demonstrate that our method can accurately estimate the formation energy of materials outside of the training set and be used to identify materials with especially large formation enthalpies. We propose that our models can be used to accelerate the discovery of new materials by identifying the most promising materials to study with DFT at little additional computational cost.
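The modeling strategy, mapping precomputed attributes to DFT formation enthalpies with tree-based models and judging them by cross-validated mean absolute error, can be outlined as follows. This is a schematic in Python with random stand-in data, not the authors' pipeline; a random forest substitutes for their decision-tree models, and the attribute matrix is a placeholder for the composition and Voronoi-tessellation features.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# X: one row per compound; columns are composition statistics plus
# Voronoi-tessellation-derived structure attributes (placeholders here).
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 40))     # stand-in attribute matrix
y = rng.normal(size=1000)           # stand-in formation enthalpies (eV/atom)

model = RandomForestRegressor(n_estimators=200, n_jobs=-1, random_state=0)
mae = -cross_val_score(model, X, y, cv=10,
                       scoring="neg_mean_absolute_error").mean()
print(f"cross-validated MAE: {mae:.3f} eV/atom")
```

Once trained, such a surrogate scores candidate structures in microseconds, which is what makes it practical to triage large search spaces before committing DFT time.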
Jowhar, Ziad; Gudla, Prabhakar R; Shachar, Sigal; Wangsa, Darawalee; Russ, Jill L; Pegoraro, Gianluca; Ried, Thomas; Raznahan, Armin; Misteli, Tom
2018-06-01
The spatial organization of chromosomes in the nuclear space is an extensively studied field that relies on measurements of structural features and 3D positions of chromosomes with high precision and robustness. However, no tools are currently available to image and analyze chromosome territories in a high-throughput format. Here, we have developed High-throughput Chromosome Territory Mapping (HiCTMap), a method for the robust and rapid analysis of 2D and 3D chromosome territory positioning in mammalian cells. HiCTMap is a high-throughput imaging-based chromosome detection method which enables routine analysis of chromosome structure and nuclear position. Using an optimized FISH staining protocol in a 384-well plate format in conjunction with a bespoke automated image analysis workflow, HiCTMap faithfully detects chromosome territories and their position in 2D and 3D in a large population of cells per experimental condition. We apply this novel technique to visualize chromosomes 18, X, and Y in male and female primary human skin fibroblasts, and show accurate detection of the correct number of chromosomes in the respective genotypes. Given the ability to visualize and quantitatively analyze large numbers of nuclei, we use HiCTMap to measure chromosome territory area and volume with high precision and determine the radial position of chromosome territories using either centroid or equidistant-shell analysis. The HiCTMap protocol is also compatible with RNA FISH as demonstrated by simultaneous labeling of X chromosomes and Xist RNA in female cells. We suggest HiCTMap will be a useful tool for routine precision mapping of chromosome territories in a wide range of cell types and tissues. Published by Elsevier Inc.
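As a rough illustration of the kind of measurements HiCTMap automates, the sketch below segments one nucleus and its FISH-labeled territories and reports territory area together with a normalized, centroid-based radial position. It is a minimal 2D stand-in using scikit-image, not the published workflow; the Otsu thresholds and the equivalent-radius normalization are assumptions.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops

def territory_stats(dapi, fish):
    """Per-territory area and normalized radial position in one nucleus.

    dapi: 2D nuclear stain image; fish: 2D FISH channel of the same field.
    """
    nucleus = dapi > threshold_otsu(dapi)
    territories = (fish > threshold_otsu(fish)) & nucleus
    ny, nx = ndi.center_of_mass(nucleus)
    # Equivalent radius of the nucleus, used to normalize radial distance.
    r_nuc = np.sqrt(nucleus.sum() / np.pi)
    stats = []
    for region in regionprops(label(territories)):
        cy, cx = region.centroid
        radial = np.hypot(cy - ny, cx - nx) / r_nuc   # 0 = center, ~1 = periphery
        stats.append({"area_px": region.area, "radial": radial})
    return stats
```

The equidistant-shell variant mentioned above would bin `radial` into concentric shells instead of reporting the raw centroid distance.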
Computational discovery of picomolar Q(o) site inhibitors of cytochrome bc1 complex.
Hao, Ge-Fei; Wang, Fu; Li, Hui; Zhu, Xiao-Lei; Yang, Wen-Chao; Huang, Li-Shar; Wu, Jia-Wei; Berry, Edward A; Yang, Guang-Fu
2012-07-11
A critical challenge to fragment-based drug discovery (FBDD) is its low-throughput nature due to the necessity of biophysical method-based fragment screening. Herein, a method of pharmacophore-linked fragment virtual screening (PFVS) was successfully developed. Its application yielded the first picomolar-range Q(o) site inhibitors of the cytochrome bc(1) complex, an important membrane protein for drug and fungicide discovery. Compared with the original hit compound 4 (K(i) = 881.80 nM, porcine bc(1)), the most potent compound 4f displayed 20 507-fold improved binding affinity (K(i) = 43.00 pM). Compound 4f was proved to be a noncompetitive inhibitor with respect to the substrate cytochrome c, but a competitive inhibitor with respect to the substrate ubiquinol. Additionally, we determined the crystal structure of compound 4e (K(i) = 83.00 pM) bound to the chicken bc(1) at 2.70 Å resolution, providing a molecular basis for understanding its ultrapotency. To our knowledge, this study is the first application of the FBDD method in the discovery of picomolar inhibitors of a membrane protein. This work demonstrates that the novel PFVS approach is a high-throughput drug discovery method, independent of biophysical screening techniques.
Computational biology in the cloud: methods and new insights from computing at scale.
Kasson, Peter M
2013-01-01
The past few years have seen both an explosion in the size of biological data sets and the proliferation of new, highly flexible on-demand computing capabilities. The sheer amount of information available from genomic and metagenomic sequencing, high-throughput proteomics, and experimental and simulation datasets on molecular structure and dynamics affords an opportunity for greatly expanded insight, but it creates new challenges of scale for the computation, storage, and interpretation of petascale data. Cloud computing resources have the potential to help solve these problems by offering a utility model of computing and storage: near-unlimited capacity, the ability to burst usage, and cheap and flexible payment models. Effective use of cloud computing on large biological datasets requires dealing with non-trivial problems of scale and robustness, since performance-limiting factors can change substantially when a dataset grows by a factor of 10,000 or more. New computing paradigms are thus often needed. The use of cloud platforms also creates new opportunities to share data, reduce duplication, and provide easy reproducibility by making the datasets and computational methods easily available.
The use of high-throughput in vitro assays has been proposed to play a significant role in the future of toxicity testing. In this study, rat hepatic metabolic clearance and plasma protein binding were measured for 59 ToxCast phase I chemicals. Computational in vitro-to-in vivo e...
Subnuclear foci quantification using high-throughput 3D image cytometry
NASA Astrophysics Data System (ADS)
Wadduwage, Dushan N.; Parrish, Marcus; Choi, Heejin; Engelward, Bevin P.; Matsudaira, Paul; So, Peter T. C.
2015-07-01
Ionising radiation causes various types of DNA damage, including double strand breaks (DSBs). DSBs are often recognized by the DNA repair protein ATM, which forms gamma-H2AX foci at the sites of the DSBs that can be visualized using immunohistochemistry. However, most such experiments are of low throughput in terms of imaging and image analysis techniques. Most studies still use manual counting or classification, and are hence limited to counting a low number of foci per nucleus (around 5), as the quantification process is extremely labour intensive. Therefore, we have developed a high-throughput instrumentation and computational pipeline specialized for gamma-H2AX foci quantification. A population of cells with highly clustered foci inside nuclei was imaged, in 3D with submicron resolution, using an in-house developed high-throughput image cytometer. Imaging speeds as high as 800 cells/second in 3D were achieved by using HiLo wide-field depth-resolved imaging and a remote z-scanning technique. The number of foci per cell nucleus was then quantified using a 3D extended maxima transform based algorithm. Our results suggest that while most other 2D imaging and manual quantification studies can count only up to about 5 foci per nucleus, our method is capable of counting more than 100. Moreover, we show that 3D analysis is significantly superior to 2D techniques.
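The counting step can be illustrated with scikit-image, whose h-maxima transform underlies the extended-maxima idea: suppress peaks whose prominence is below a height h, then label and count what remains. This is a minimal 3D sketch, not the authors' pipeline; h and the intensity cutoff are assumed tuning parameters.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.morphology import h_maxima

def count_foci(stack, h, min_intensity):
    """Count foci in a 3D nucleus image via the h-maxima transform.

    stack         : 3D numpy array (z, y, x) covering one nucleus
    h             : minimum peak prominence separating adjacent foci
    min_intensity : background cutoff to suppress spurious maxima
    """
    peaks = h_maxima(stack, h) > 0        # binary mask of prominent maxima
    peaks &= stack > min_intensity        # drop dim, background-level peaks
    _, n_foci = ndi.label(peaks)          # connected components = foci
    return n_foci
```

Because the transform keys on peak prominence rather than a single global threshold, it can separate touching foci inside densely clustered nuclei, which is what defeats simple thresholding at high damage levels.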
Jia, Kun; Bijeon, Jean Louis; Adam, Pierre Michel; Ionescu, Rodica Elena
2013-02-21
A commercial TEM grid was used as a mask for the creation of extremely well-organized gold micro-/nano-structures on a glass substrate via a high temperature annealing process at 500 °C. The structured substrate was (bio)functionalized and used for the high throughput LSPR immunosensing of different concentrations of a model protein named bovine serum albumin.
Emergence of a catalytic tetrad during evolution of a highly active artificial aldolase.
Obexer, Richard; Godina, Alexei; Garrabou, Xavier; Mittl, Peer R E; Baker, David; Griffiths, Andrew D; Hilvert, Donald
2017-01-01
Designing catalysts that achieve the rates and selectivities of natural enzymes is a long-standing goal in protein chemistry. Here, we show that an ultrahigh-throughput droplet-based microfluidic screening platform can be used to improve a previously optimized artificial aldolase by an additional factor of 30 to give a >10^9 rate enhancement that rivals the efficiency of class I aldolases. The resulting enzyme catalyses a reversible aldol reaction with high stereoselectivity and tolerates a broad range of substrates. Biochemical and structural studies show that catalysis depends on a Lys-Tyr-Asn-Tyr tetrad that emerged adjacent to a computationally designed hydrophobic pocket during directed evolution. This constellation of residues is poised to activate the substrate by Schiff base formation, promote mechanistically important proton transfers and stabilize multiple transition states along a complex reaction coordinate. The emergence of such a sophisticated catalytic centre shows that there is nothing magical about the catalytic activities or mechanisms of naturally occurring enzymes, or the evolutionary process that gave rise to them.
Musi, Valeria; Birdsall, Berry; Fernandez-Ballester, Gregorio; Guerrini, Remo; Salvatori, Severo; Serrano, Luis; Pastore, Annalisa
2006-04-01
SH3 domains are small protein modules that are involved in protein-protein interactions in several essential metabolic pathways. The availability of the complete genome and the limited number of clearly identifiable SH3 domains make the yeast Saccharomyces cerevisiae an ideal proteomic-based model system to investigate the structural rules dictating SH3-mediated protein interactions and to develop new tools to assist these studies. In the present work, we have determined the solution structure of the SH3 domain from Myo3 and modeled by homology that of the highly homologous Myo5, two myosins implicated in actin polymerization. We have then implemented an integrated approach that makes use of experimental and computational methods to characterize their binding properties. While accommodating their targets in the classical groove, the two domains show selectivity in both the orientation and the sequence specificity of their target peptides. From our study, we propose a consensus sequence that may provide a useful guideline to identify new natural partners, and suggest a strategy of more general applicability that may be of use in other structural proteomic studies.
A high-throughput two channel discrete wavelet transform architecture for the JPEG2000 standard
NASA Astrophysics Data System (ADS)
Badakhshannoory, Hossein; Hashemi, Mahmoud R.; Aminlou, Alireza; Fatemi, Omid
2005-07-01
The Discrete Wavelet Transform (DWT) is increasingly adopted in image and video compression standards, as indicated by its use in JPEG2000. The lifting scheme algorithm is an alternative DWT implementation with lower computational complexity and reduced resource requirements. The JPEG2000 standard introduces two lifting-scheme-based filter banks: the 5/3 and the 9/7. In this paper a high-throughput, two-channel DWT architecture for both of the JPEG2000 DWT filters is presented. The proposed pipelined architecture has two separate input channels that process the incoming samples simultaneously with minimum memory requirements for each channel. The architecture has been implemented in VHDL and synthesized on a Xilinx Virtex2 XCV1000. The proposed architecture applies the DWT to a 2K by 1K image at 33 fps with a 75 MHz clock frequency. This performance is achieved with 70% fewer resources than two independent single-channel modules. The high throughput and reduced resource requirement make this architecture a suitable choice for real-time applications such as Digital Cinema.
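The reversible 5/3 lifting filter itself is compact: a predict step forms the detail (high-pass) samples from neighbouring even samples, and an update step smooths the even samples into the approximation (low-pass) band. Below is a minimal 1-D Python sketch of the standard JPEG2000 5/3 lifting equations; it illustrates the filter only and does not reproduce the paper's pipelined two-channel hardware design.

```python
import numpy as np

def dwt53_forward(x):
    """One level of the JPEG2000 reversible 5/3 lifting transform (1-D).

    x must have even length; returns (low-pass s, high-pass d).
    Integer-to-integer and exactly invertible.
    """
    x = np.asarray(x, dtype=np.int64)
    assert x.size % 2 == 0
    even, odd = x[0::2], x[1::2]
    even_next = np.append(even[1:], even[-1])   # symmetric extension, right edge
    d = odd - ((even + even_next) >> 1)         # predict: floor((a + b) / 2)
    d_prev = np.insert(d[:-1], 0, d[0])         # symmetric extension, left edge
    s = even + ((d_prev + d + 2) >> 2)          # update: floor((a + b + 2) / 4)
    return s, d

def dwt53_inverse(s, d):
    """Exact inverse of dwt53_forward (undo update, then undo predict)."""
    d_prev = np.insert(d[:-1], 0, d[0])
    even = s - ((d_prev + d + 2) >> 2)
    even_next = np.append(even[1:], even[-1])
    odd = d + ((even + even_next) >> 1)
    x = np.empty(2 * len(s), dtype=np.int64)
    x[0::2], x[1::2] = even, odd
    return x
```

A round trip `np.array_equal(x, dwt53_inverse(*dwt53_forward(x)))` holds for any even-length integer signal, which is what makes the 5/3 filter suitable for lossless coding.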
Lewis, Michelle; Weaver, Charles David; McClain, Mark S
2010-07-01
The Clostridium perfringens epsilon toxin, a select agent, is responsible for a severe, often fatal enterotoxemia characterized by edema in the heart, lungs, kidney, and brain. The toxin is believed to be an oligomeric pore-forming toxin. Currently, there is no effective therapy for countering the cytotoxic activity of the toxin in exposed individuals. Using a robust cell-based high-throughput screening (HTS) assay, we screened a 151,616-compound library for the ability to inhibit ε-toxin-induced cytotoxicity. Survival of MDCK cells exposed to the toxin was assessed by addition of resazurin to detect metabolic activity in surviving cells. The hit rate for this screen was 0.6%. Following a secondary screen of each hit in triplicate and assays to eliminate false positives, we focused on three structurally-distinct compounds: an N-cycloalkylbenzamide, a furo[2,3-b]quinoline, and a 6H-anthra[1,9-cd]isoxazol. None of the three compounds appeared to inhibit toxin binding to cells or the ability of the toxin to form oligomeric complexes. Additional assays demonstrated that two of the inhibitory compounds inhibited ε-toxin-induced permeabilization of MDCK cells to propidium iodide. Furthermore, the two compounds exhibited inhibitory effects on cells pre-treated with toxin. Structural analogs of one of the inhibitors identified through the high-throughput screen were analyzed and provided initial structure-activity data. These compounds should serve as the basis for further structure-activity refinement that may lead to the development of effective anti-ε-toxin therapeutics.
Lewis, Michelle; Weaver, Charles David; McClain, Mark S.
2010-01-01
The Clostridium perfringens epsilon toxin, a select agent, is responsible for a severe, often fatal enterotoxemia characterized by edema in the heart, lungs, kidney, and brain. The toxin is believed to be an oligomeric pore-forming toxin. Currently, there is no effective therapy for countering the cytotoxic activity of the toxin in exposed individuals. Using a robust cell-based high-throughput screening (HTS) assay, we screened a 151,616-compound library for the ability to inhibit ε-toxin-induced cytotoxicity. Survival of MDCK cells exposed to the toxin was assessed by addition of resazurin to detect metabolic activity in surviving cells. The hit rate for this screen was 0.6%. Following a secondary screen of each hit in triplicate and assays to eliminate false positives, we focused on three structurally-distinct compounds: an N-cycloalkylbenzamide, a furo[2,3-b]quinoline, and a 6H-anthra[1,9-cd]isoxazol. None of the three compounds appeared to inhibit toxin binding to cells or the ability of the toxin to form oligomeric complexes. Additional assays demonstrated that two of the inhibitory compounds inhibited ε-toxin-induced permeabilization of MDCK cells to propidium iodide. Furthermore, the two compounds exhibited inhibitory effects on cells pre-treated with toxin. Structural analogs of one of the inhibitors identified through the high-throughput screen were analyzed and provided initial structure-activity data. These compounds should serve as the basis for further structure-activity refinement that may lead to the development of effective anti-ε-toxin therapeutics. PMID:20721308
High-throughput biological techniques, like microarrays and drug screens, generate an enormous amount of data that may be critically important for cancer researchers and clinicians. Being able to manipulate the data to extract those pieces of interest, however, can require computational or bioinformatics skills beyond those of the average scientist.
The JCSG high-throughput structural biology pipeline.
Elsliger, Marc André; Deacon, Ashley M; Godzik, Adam; Lesley, Scott A; Wooley, John; Wüthrich, Kurt; Wilson, Ian A
2010-10-01
The Joint Center for Structural Genomics high-throughput structural biology pipeline has delivered more than 1000 structures to the community over the past ten years. The JCSG has made a significant contribution to the overall goal of the NIH Protein Structure Initiative (PSI) of expanding structural coverage of the protein universe, as well as making substantial inroads into structural coverage of an entire organism. Targets are processed through an extensive combination of bioinformatics and biophysical analyses to efficiently characterize and optimize each target prior to selection for structure determination. The pipeline uses parallel processing methods at almost every step in the process and can adapt to a wide range of protein targets from bacterial to human. The construction, expansion and optimization of the JCSG gene-to-structure pipeline over the years have resulted in many technological and methodological advances. The vast number of targets and the enormous amounts of associated data processed through the multiple stages of the experimental pipeline required the development of a variety of valuable resources that, wherever feasible, have been converted to free-access web-based tools and applications.
Ryan, Natalia; Chorley, Brian; Tice, Raymond R; Judson, Richard; Corton, J Christopher
2016-05-01
Microarray profiling of chemical-induced effects is being increasingly used in medium- and high-throughput formats. Computational methods are described here to identify molecular targets from whole-genome microarray data using as an example the estrogen receptor α (ERα), often modulated by potential endocrine disrupting chemicals. ERα biomarker genes were identified by their consistent expression after exposure to 7 structurally diverse ERα agonists and 3 ERα antagonists in ERα-positive MCF-7 cells. Most of the biomarker genes were shown to be directly regulated by ERα as determined by ESR1 gene knockdown using siRNA as well as through chromatin immunoprecipitation coupled with DNA sequencing analysis of ERα-DNA interactions. The biomarker was evaluated as a predictive tool using the fold-change rank-based Running Fisher algorithm by comparison to annotated gene expression datasets from experiments using MCF-7 cells, including those evaluating the transcriptional effects of hormones and chemicals. Using 141 comparisons from chemical- and hormone-treated cells, the biomarker gave a balanced accuracy for prediction of ERα activation or suppression of 94% and 93%, respectively. The biomarker was able to correctly classify 18 out of 21 (86%) ER reference chemicals including "very weak" agonists. Importantly, the biomarker predictions accurately replicated predictions based on 18 in vitro high-throughput screening assays that queried different steps in ERα signaling. For 114 chemicals, the balanced accuracies were 95% and 98% for activation or suppression, respectively. These results demonstrate that the ERα gene expression biomarker can accurately identify ERα modulators in large collections of microarray data derived from MCF-7 cells. Published by Oxford University Press on behalf of the Society of Toxicology 2016. This work is written by US Government employees and is in the public domain in the US.
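Balanced accuracy, the figure of merit used above, is simply the mean of sensitivity and specificity, which keeps the score honest when active and inactive chemicals are unevenly represented. A toy computation is shown below; the labels are illustrative, not from the study.

```python
from sklearn.metrics import balanced_accuracy_score

# y_true: reference ER-alpha activity calls; y_pred: biomarker-based calls.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]

# Equivalent to (sensitivity + specificity) / 2 for binary labels.
print(balanced_accuracy_score(y_true, y_pred))
```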
The Stanford Automated Mounter: Enabling High-Throughput Protein Crystal Screening at SSRL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, C.A.; Cohen, A.E.
2009-05-26
The macromolecular crystallography experiment lends itself perfectly to high-throughput technologies. The initial steps, including the expression, purification, and crystallization of protein crystals, along with some of the later steps involving data processing and structure determination, have all been automated to the point where some of the last remaining bottlenecks in the process have been crystal mounting, crystal screening, and data collection. At the Stanford Synchrotron Radiation Laboratory, a National User Facility that provides extremely brilliant X-ray photon beams for use in materials science, environmental science, and structural biology research, the incorporation of advanced robotics has enabled crystals to be screened in a true high-throughput fashion, thus dramatically accelerating the final steps. Up to 288 frozen crystals can be mounted by the beamline robot (the Stanford Auto-Mounting System) and screened for diffraction quality in a matter of hours without intervention. The best quality crystals can then be remounted for the collection of complete X-ray diffraction data sets. Furthermore, the entire screening and data collection experiment can be controlled from the experimenter's home laboratory by means of advanced software tools that enable network-based control of the highly automated beamlines.
Lu, Pinyi; Hontecillas, Raquel; Horne, William T; Carbo, Adria; Viladomiu, Monica; Pedragosa, Mireia; Bevan, David R; Lewis, Stephanie N; Bassaganya-Riera, Josep
2012-01-01
Lanthionine synthetase component C-like protein 2 (LANCL2) is a member of the eukaryotic lanthionine synthetase component C-like protein family involved in signal transduction and insulin sensitization. Recently, LANCL2 was identified as a target for the binding and signaling of abscisic acid (ABA), a plant hormone with anti-diabetic and anti-inflammatory effects. The goal of this study was to determine the role of LANCL2 as a potential therapeutic target for developing novel drugs and nutraceuticals against inflammatory diseases. Previously, we performed homology modeling to construct a three-dimensional structure of LANCL2 using the crystal structure of lanthionine synthetase component C-like protein 1 (LANCL1) as a template. Using this model, structure-based virtual screening was performed using compounds from NCI (National Cancer Institute) Diversity Set II, ChemBridge, ZINC natural products, and FDA-approved drugs databases. Several potential ligands were identified using molecular docking. In order to validate the anti-inflammatory efficacy of the top ranked compound (NSC61610) in the NCI Diversity Set II, a series of in vitro and pre-clinical efficacy studies were performed using a mouse model of dextran sodium sulfate (DSS)-induced colitis. Our findings showed that the lead compound, NSC61610, activated peroxisome proliferator-activated receptor gamma in a LANCL2- and adenylate cyclase/cAMP dependent manner in vitro and ameliorated experimental colitis by down-modulating colonic inflammatory gene expression and favoring regulatory T cell responses. LANCL2 is a novel therapeutic target for inflammatory diseases. High-throughput, structure-based virtual screening is an effective computational-based drug design method for discovering anti-inflammatory LANCL2-based drug candidates.
Lu, Pinyi; Hontecillas, Raquel; Horne, William T.; Carbo, Adria; Viladomiu, Monica; Pedragosa, Mireia; Bevan, David R.; Lewis, Stephanie N.; Bassaganya-Riera, Josep
2012-01-01
Background Lanthionine synthetase component C-like protein 2 (LANCL2) is a member of the eukaryotic lanthionine synthetase component C-like protein family involved in signal transduction and insulin sensitization. Recently, LANCL2 was identified as a target for the binding and signaling of abscisic acid (ABA), a plant hormone with anti-diabetic and anti-inflammatory effects. Methodology/Principal Findings The goal of this study was to determine the role of LANCL2 as a potential therapeutic target for developing novel drugs and nutraceuticals against inflammatory diseases. Previously, we performed homology modeling to construct a three-dimensional structure of LANCL2 using the crystal structure of lanthionine synthetase component C-like protein 1 (LANCL1) as a template. Using this model, structure-based virtual screening was performed using compounds from NCI (National Cancer Institute) Diversity Set II, ChemBridge, ZINC natural products, and FDA-approved drugs databases. Several potential ligands were identified using molecular docking. In order to validate the anti-inflammatory efficacy of the top ranked compound (NSC61610) in the NCI Diversity Set II, a series of in vitro and pre-clinical efficacy studies were performed using a mouse model of dextran sodium sulfate (DSS)-induced colitis. Our findings showed that the lead compound, NSC61610, activated peroxisome proliferator-activated receptor gamma in a LANCL2- and adenylate cyclase/cAMP dependent manner in vitro and ameliorated experimental colitis by down-modulating colonic inflammatory gene expression and favoring regulatory T cell responses. Conclusions/Significance LANCL2 is a novel therapeutic target for inflammatory diseases. High-throughput, structure-based virtual screening is an effective computational-based drug design method for discovering anti-inflammatory LANCL2-based drug candidates. PMID:22509338
Tebani, Abdellah; Afonso, Carlos; Marret, Stéphane; Bekri, Soumeya
2016-01-01
The rise of technologies that simultaneously measure thousands of data points represents the heart of systems biology. These technologies have had a huge impact on the discovery of next-generation diagnostics, biomarkers, and drugs in the precision medicine era. Systems biology aims to achieve systemic exploration of complex interactions in biological systems. Driven by high-throughput omics technologies and the computational surge, it enables multi-scale and insightful overviews of cells, organisms, and populations. Precision medicine capitalizes on these conceptual and technological advancements and stands on two main pillars: data generation and data modeling. High-throughput omics technologies allow the retrieval of comprehensive and holistic biological information, whereas computational capabilities enable high-dimensional data modeling and, therefore, accessible and user-friendly visualization. Furthermore, bioinformatics has enabled comprehensive multi-omics and clinical data integration for insightful interpretation. Despite their promise, the translation of these technologies into clinically actionable tools has been slow. In this review, we present state-of-the-art multi-omics data analysis strategies in a clinical context. The challenges of omics-based biomarker translation are discussed. Perspectives regarding the use of multi-omics approaches for inborn errors of metabolism (IEM) are presented by introducing a new paradigm shift in addressing IEM investigations in the post-genomic era. PMID:27649151
Tebani, Abdellah; Afonso, Carlos; Marret, Stéphane; Bekri, Soumeya
2016-09-14
The rise of technologies that simultaneously measure thousands of data points represents the heart of systems biology. These technologies have had a huge impact on the discovery of next-generation diagnostics, biomarkers, and drugs in the precision medicine era. Systems biology aims to achieve systemic exploration of complex interactions in biological systems. Driven by high-throughput omics technologies and the computational surge, it enables multi-scale and insightful overviews of cells, organisms, and populations. Precision medicine capitalizes on these conceptual and technological advancements and stands on two main pillars: data generation and data modeling. High-throughput omics technologies allow the retrieval of comprehensive and holistic biological information, whereas computational capabilities enable high-dimensional data modeling and, therefore, accessible and user-friendly visualization. Furthermore, bioinformatics has enabled comprehensive multi-omics and clinical data integration for insightful interpretation. Despite their promise, the translation of these technologies into clinically actionable tools has been slow. In this review, we present state-of-the-art multi-omics data analysis strategies in a clinical context. The challenges of omics-based biomarker translation are discussed. Perspectives regarding the use of multi-omics approaches for inborn errors of metabolism (IEM) are presented by introducing a new paradigm shift in addressing IEM investigations in the post-genomic era.
Fernandez, Michael; Boyd, Peter G; Daff, Thomas D; Aghaji, Mohammad Zein; Woo, Tom K
2014-09-04
In this work, we have developed quantitative structure-property relationship (QSPR) models using advanced machine learning algorithms that can rapidly and accurately recognize high-performing metal organic framework (MOF) materials for CO2 capture. More specifically, QSPR classifiers have been developed that can, in a fraction of a second, identify candidate MOFs with enhanced CO2 adsorption capacity (>1 mmol/g at 0.15 bar and >4 mmol/g at 1 bar). The models were tested on a large set of 292 050 MOFs that were not part of the training set. The QSPR classifier could recover 945 of the top 1000 MOFs in the test set while flagging only 10% of the whole library for compute-intensive screening. Thus, using the machine learning classifiers as part of a high-throughput screening protocol would result in an order-of-magnitude reduction in compute time and allow intractably large structure libraries and search spaces to be screened.
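The screening protocol reduces to: train a classifier on labeled uptake data, score the untested library, and send only the top-scoring fraction to expensive simulation. Here is a schematic Python version with random stand-in descriptors; a random forest substitutes for the authors' QSPR machinery, and the 4 mmol/g label threshold follows the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# X: geometric/chemical descriptors per MOF (placeholders here).
rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 12))             # stand-in descriptor matrix
uptake = rng.gamma(2.0, 1.0, size=5000)     # stand-in CO2 uptake at 1 bar (mmol/g)
y = (uptake > 4.0).astype(int)              # "high-performing" label per the abstract

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
clf = RandomForestClassifier(n_estimators=300, n_jobs=-1, random_state=0)
clf.fit(X_tr, y_tr)

# Flag only the most promising fraction of the library for expensive screening.
scores = clf.predict_proba(X_te)[:, 1]
flagged = np.argsort(scores)[::-1][: len(scores) // 10]   # top 10% by score
print(f"flagged {len(flagged)} candidates for compute-intensive follow-up")
```

The recovery statistic quoted above (945 of the top 1000 while flagging 10%) is exactly what one would measure on `flagged` against the simulated ground truth of the held-out set.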
Automated glycopeptide analysis—review of current state and future directions
Dallas, David C.; Martin, William F.; Hua, Serenus
2013-01-01
Glycosylation of proteins is involved in immune defense, cell–cell adhesion, cellular recognition and pathogen binding and is one of the most common and complex post-translational modifications. Science is still struggling to assign detailed mechanisms and functions to this form of conjugation. Even the structural analysis of glycoproteins—glycoproteomics—remains in its infancy due to the scarcity of high-throughput analytical platforms capable of determining glycopeptide composition and structure, especially platforms for complex biological mixtures. Glycopeptide composition and structure can be determined with high mass-accuracy mass spectrometry, particularly when combined with chromatographic separation, but the sheer volume of generated data necessitates computational software for interpretation. This review discusses the current state of glycopeptide assignment software—advances made to date and issues that remain to be addressed. The various software and algorithms developed so far provide important insights into glycoproteomics. However, there is currently no freely available software that can analyze spectral data in batch and unambiguously determine glycopeptide compositions for N- and O-linked glycopeptides from relevant biological sources such as human milk and serum. Few programs are capable of aiding in structural determination of the glycan component. To significantly advance the field of glycoproteomics, analytical software and algorithms are required that: (i) solve for both N- and O-linked glycopeptide compositions, structures and glycosites in biological mixtures; (ii) are high-throughput and process data in batches; (iii) can interpret mass spectral data from a variety of sources and (iv) are open source and freely available. PMID:22843980
The iPlant collaborative: cyberinfrastructure for enabling data to discovery for the life sciences
USDA-ARS's Scientific Manuscript database
The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, and high-throughput computing. iPlant provides training, learning m...
NASA Astrophysics Data System (ADS)
Hai, Pengfei; Zhou, Yong; Zhang, Ruiying; Ma, Jun; Li, Yang; Shao, Jin-Yu; Wang, Lihong V.
2017-04-01
Circulating tumor cell (CTC) clusters, arising from multicellular groupings in a primary tumor, greatly elevate the metastatic potential of cancer compared with single CTCs. High-throughput detection and quantification of CTC clusters are important for understanding the tumor metastatic process and improving cancer therapy. Here, we applied a linear-array-based photoacoustic tomography (LA-PAT) system and improved the image reconstruction for label-free high-throughput CTC cluster detection and quantification in vivo. The feasibility was first demonstrated by imaging CTC clusters ex vivo. The relationship between the contrast-to-noise ratios (CNRs) and the number of cells in melanoma tumor cell clusters was investigated and verified. Melanoma CTC clusters with a minimum of four cells could be detected, and the number of cells could be computed from the CNR. Finally, we demonstrated imaging of injected melanoma CTC clusters in rats in vivo. Similarly, the number of cells in the melanoma CTC clusters could be quantified. The data showed that larger CTC clusters had faster clearance rates in the bloodstream, which agreed with the literature. The results demonstrated the capability of LA-PAT to detect and quantify melanoma CTC clusters in vivo and showed its potential for tumor metastasis study and cancer therapy.
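The quantification chain, measuring a contrast-to-noise ratio and then inverting a calibration of CNR against known cluster sizes, might look like the sketch below. The calibration numbers and the linear fit are invented for illustration; the abstract verifies a CNR-to-cell-number relationship but does not state its functional form.

```python
import numpy as np

def cnr(signal_roi, background_roi):
    """Contrast-to-noise ratio of a target region in a reconstructed image."""
    return (signal_roi.mean() - background_roi.mean()) / background_roi.std()

# Hypothetical calibration: fit CNR against known cluster sizes measured
# ex vivo, then invert the fit to estimate cell number in vivo.
known_cells = np.array([4, 8, 16, 32, 64])
known_cnr = np.array([2.1, 3.9, 7.8, 15.5, 31.2])   # made-up calibration data
slope, intercept = np.polyfit(known_cells, known_cnr, 1)

def cells_from_cnr(c):
    """Invert the (assumed linear) calibration to estimate cluster size."""
    return (c - intercept) / slope
```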
Classification of protein quaternary structure by functional domain composition
Yu, Xiaojing; Wang, Chuan; Li, Yixue
2006-01-01
Background The number and the arrangement of subunits that form a protein are referred to as quaternary structure. Quaternary structure is an important protein attribute that is closely related to its function. Proteins with quaternary structure are called oligomeric proteins. Oligomeric proteins are involved in various biological processes, such as metabolism, signal transduction, and chromosome replication. Thus, it is highly desirable to develop some computational methods to automatically classify the quaternary structure of proteins from their sequences. Results To explore this problem, we adopted an approach based on the functional domain composition of proteins. Every protein was represented by a vector calculated from the domains in the PFAM database. The nearest neighbor algorithm (NNA) was used for classifying the quaternary structure of proteins from this information. The jackknife cross-validation test was performed on the non-redundant protein dataset in which the sequence identity was less than 25%. The overall success rate obtained is 75.17%. Additionally, to demonstrate the effectiveness of this method, we predicted the proteins in an independent dataset and achieved an overall success rate of 84.11%. Conclusion Compared with the amino acid composition method and BLAST, the results indicate that the domain composition approach may be a more effective and promising high-throughput method in dealing with this complicated problem in bioinformatics. PMID:16584572
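The classification scheme is easy to reproduce in outline: represent each protein as a Pfam domain-occurrence vector, classify with a single nearest neighbor, and estimate the success rate by jackknife (leave-one-out) cross-validation. Below is a schematic with random stand-in data, not the paper's dataset.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# X: one row per protein, one column per Pfam domain (1 = domain present).
rng = np.random.default_rng(7)
X = (rng.random((300, 500)) < 0.02).astype(int)   # stand-in domain vectors
y = rng.integers(0, 4, size=300)                  # stand-in quaternary classes

nna = KNeighborsClassifier(n_neighbors=1)         # nearest neighbor algorithm (NNA)
acc = cross_val_score(nna, X, y, cv=LeaveOneOut()).mean()   # jackknife test
print(f"jackknife success rate: {acc:.2%}")
```

The jackknife test scores each protein with a model trained on all the others, which is the most stringent of the standard cross-validation schemes for small datasets.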
On the Achievable Throughput Over TVWS Sensor Networks
Caleffi, Marcello; Cacciapuoti, Angela Sara
2016-01-01
In this letter, we study the throughput achievable by an unlicensed sensor network operating over TV white space spectrum in the presence of coexistence interference. We first analytically derive the achievable throughput as a function of the channel ordering. Then, we show that the problem of deriving the maximum expected throughput through exhaustive search is computationally unfeasible. Finally, we derive a computationally efficient algorithm with polynomial-time complexity to compute the channel set maximizing the expected throughput and, stemming from this, we derive a closed-form expression for the maximum expected throughput. Numerical simulations validate the theoretical analysis. PMID:27043565
Computer Simulation of Embryonic Systems: What can a ...
(1) Standard practice for assessing developmental toxicity is the observation of apical endpoints (intrauterine death, fetal growth retardation, structural malformations) in pregnant rats/rabbits following exposure during organogenesis. EPA’s computational toxicology research program (ToxCast) generated vast in vitro cellular and molecular effects data on >1858 chemicals in >600 high-throughput screening (HTS) assays. The diversity of assays has been increased for developmental toxicity with several HTS platforms, including the devTOX-quickPredict assay from Stemina Biomarker Discovery utilizing the human embryonic stem cell line (H9). Translating these HTS data into higher-order predictions of developmental toxicity is a significant challenge. Here, we address the application of computational systems models that recapitulate the kinematics of dynamical cell signaling networks (e.g., SHH, FGF, BMP, retinoids) in a CompuCell3D.org modeling environment. Examples include angiogenesis (angiodysplasia) and dysmorphogenesis. Being numerically responsive to perturbation, these models are amenable to data integration for systems toxicology and Adverse Outcome Pathways (AOPs). The AOP simulation outputs predict potential phenotypes based on the in vitro HTS data from ToxCast. A heuristic computational intelligence framework that recapitulates the kinematics of dynamical cell signaling networks in the embryo, together with the in vitro profiling data, produce quantitative pr
Computational Modeling and Simulation of Developmental ...
Standard practice for assessing developmental toxicity is the observation of apical endpoints (intrauterine death, fetal growth retardation, structural malformations) in pregnant rats/rabbits following exposure during organogenesis. EPA’s computational toxicology research program (ToxCast) generated vast in vitro cellular and molecular effects data on >1858 chemicals in >600 high-throughput screening (HTS) assays. The diversity of assays has been increased for developmental toxicity with several HTS platforms, including the devTOX-quickPredict assay from Stemina Biomarker Discovery utilizing the human embryonic stem cell line (H9). Translating these HTS data into higher-order predictions of developmental toxicity is a significant challenge. Here, we address the application of computational systems models that recapitulate the kinematics of dynamical cell signaling networks (e.g., SHH, FGF, BMP, retinoids) in a CompuCell3D.org modeling environment. Examples include angiogenesis (angiodysplasia) and dysmorphogenesis. Being numerically responsive to perturbation, these models are amenable to data integration for systems toxicology and Adverse Outcome Pathways (AOPs). The AOP simulation outputs predict potential phenotypes based on the in vitro HTS data from ToxCast. A heuristic computational intelligence framework that recapitulates the kinematics of dynamical cell signaling networks in the embryo, together with the in vitro profiling data, produce quantitative predic
Recent Developments in Toxico-Cheminformatics; Supporting ...
EPA's National Center for Computational Toxicology is building capabilities to support a new paradigm for toxicity screening and prediction through the harnessing of legacy toxicity data, creation of data linkages, and generation of new high-content and high-throughput screening data. In association with EPA's ToxCast, ToxRefDB, and ACToR projects, the DSSTox project provides cheminformatics support and, in addition, is improving public access to quality structure-annotated chemical toxicity information in less summarized forms than traditionally employed in SAR modeling, and in ways that facilitate data-mining and data read-across. The latest DSSTox version of the Carcinogenic Potency Database file (CPDBAS) illustrates ways in which various summary definitions of carcinogenic activity can be employed in modeling and data mining. DSSTox Structure-Browser provides structure searchability across all published DSSTox toxicity-related inventory, and is enabling linkages between previously isolated toxicity data resources associated with environmental and industrial chemicals. The public DSSTox inventory also has been integrated into PubChem, allowing a user to take full advantage of PubChem structure-activity and bioassay clustering features. Phase I of the ToxCast project is generating high-throughput screening data from several hundred biochemical and cell-based assays for a set of 320 chemicals, mostly pesticide actives with rich toxicology profiles. Incorporating
Image Harvest: an open-source platform for high-throughput plant image processing and analysis.
Knecht, Avi C; Campbell, Malachy T; Caprez, Adam; Swanson, David R; Walia, Harkamal
2016-05-01
High-throughput plant phenotyping is an effective approach to bridge the genotype-to-phenotype gap in crops. Phenomics experiments typically result in large-scale image datasets, which are not amenable to processing on desktop computers, thus creating a bottleneck in the image-analysis pipeline. Here, we present an open-source, flexible image-analysis framework, called Image Harvest (IH), for processing images originating from high-throughput plant phenotyping platforms. Image Harvest is developed to perform parallel processing on computing grids and provides an integrated feature for metadata extraction from large-scale file organization. Moreover, the integration of IH with the Open Science Grid provides academic researchers with the computational resources required for processing large image datasets at no cost. Image Harvest also offers functionalities to extract digital traits from images to interpret plant architecture-related characteristics. To demonstrate the applications of these digital traits, a rice (Oryza sativa) diversity panel was phenotyped and genome-wide association mapping was performed using digital traits that are used to describe different plant ideotypes. Three major quantitative trait loci were identified on rice chromosomes 4 and 6, which co-localize with quantitative trait loci known to regulate agronomically important traits in rice. Image Harvest is open-source software for high-throughput image processing that requires a minimal learning curve for plant biologists to analyze phenomics datasets. © The Author 2016. Published by Oxford University Press on behalf of the Society for Experimental Biology.
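A representative digital trait of the kind such pipelines extract is projected shoot area. The sketch below uses the common excess-green vegetation heuristic rather than anything specific to Image Harvest; the 0.1 cutoff is an assumed tuning value.

```python
import numpy as np

def projected_shoot_area(rgb):
    """Projected shoot area (in pixels) from an RGB plant image.

    Uses the excess-green index ExG = 2g - r - b on chromatic coordinates,
    a standard vegetation segmentation heuristic.
    """
    rgb = rgb.astype(float) / 255.0
    total = rgb.sum(axis=2) + 1e-9                       # avoid division by zero
    r, g, b = (rgb[..., i] / total for i in range(3))    # chromatic coordinates
    exg = 2 * g - r - b
    return int((exg > 0.1).sum())                        # plant-pixel count
```

With a known camera geometry, the pixel count converts to physical area, and a time series of such areas yields growth-rate traits of the kind used in the association mapping described above.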
High performance hybrid magnetic structure for biotechnology applications
Humphries, David E; Pollard, Martin J; Elkin, Christopher J
2005-10-11
The present disclosure provides a high performance hybrid magnetic structure made from a combination of permanent magnets and ferromagnetic pole materials which are assembled in a predetermined array. The hybrid magnetic structure provides means for separation and other biotechnology applications involving holding, manipulation, or separation of magnetizable molecular structures and targets. Also disclosed are: a method of assembling the hybrid magnetic plates, a high throughput protocol featuring the hybrid magnetic structure, and other embodiments of the ferromagnetic pole shape, attachment and adapter interfaces for adapting the use of the hybrid magnetic structure for use with liquid handling and other robots for use in high throughput processes.
High performance hybrid magnetic structure for biotechnology applications
Humphries, David E.; Pollard, Martin J.; Elkin, Christopher J.
2006-12-12
The present disclosure provides a high performance hybrid magnetic structure made from a combination of permanent magnets and ferromagnetic pole materials which are assembled in a predetermined array. The hybrid magnetic structure provides for separation and other biotechnology applications involving holding, manipulation, or separation of magnetic or magnetizable molecular structures and targets. Also disclosed are: a method of assembling the hybrid magnetic plates, a high throughput protocol featuring the hybrid magnetic structure, and other embodiments of the ferromagnetic pole shape, attachment and adapter interfaces for adapting the use of the hybrid magnetic structure for use with liquid handling and other robots for use in high throughput processes.
Template-based structure modeling of protein-protein interactions
Szilagyi, Andras; Zhang, Yang
2014-01-01
The structure of protein-protein complexes can be constructed by using the known structure of other protein complexes as a template. The complex structure templates are generally detected either by homology-based sequence alignments or, given the structure of monomer components, by structure-based comparisons. Critical improvements have been made in recent years by utilizing interface recognition and by recombining monomer and complex template libraries. Encouraging progress has also been witnessed in genome-wide applications of template-based modeling, with modeling accuracy comparable to high-throughput experimental data. Nevertheless, bottlenecks exist due to the incompleteness of the protein-protein complex structure library and the lack of methods for distant homologous template identification and full-length complex structure refinement. PMID:24721449
PANDORA: keyword-based analysis of protein sets by integration of annotation sources.
Kaplan, Noam; Vaaknin, Avishay; Linial, Michal
2003-10-01
Recent advances in high-throughput methods and the application of computational tools for automatic classification of proteins have made it possible to carry out large-scale proteomic analyses. Biological analysis and interpretation of sets of proteins is a time-consuming undertaking carried out manually by experts. We have developed PANDORA (Protein ANnotation Diagram ORiented Analysis), a web-based tool that provides an automatic representation of the biological knowledge associated with any set of proteins. PANDORA uses a unique approach of keyword-based graphical analysis that focuses on detecting subsets of proteins that share unique biological properties and the intersections of such sets. PANDORA currently supports SwissProt keywords, NCBI Taxonomy, InterPro entries and the hierarchical classification terms from ENZYME, SCOP and GO databases. The integrated study of several annotation sources simultaneously allows a representation of biological relations of structure, function, cellular location, taxonomy, domains and motifs. PANDORA is also integrated into the ProtoNet system, thus allowing testing thousands of automatically generated clusters. We illustrate how PANDORA enhances the biological understanding of large, non-uniform sets of proteins originating from experimental and computational sources, without the need for prior biological knowledge on individual proteins.
Tadmor, Brigitta; Tidor, Bruce
2005-09-01
Progress in the life sciences, including genome sequencing and high-throughput experimentation, offers an opportunity for understanding biology and medicine from a systems perspective. This 'new view', which complements the more traditional component-based approach, involves the integration of biological research with approaches from engineering disciplines and computer science. The result is more than a new set of technologies. Rather, it promises a fundamental reconceptualization of the life sciences based on the development of quantitative and predictive models to describe crucial processes. To achieve this change, learning communities are being formed at the interface of the life sciences, engineering and computer science. Through these communities, research and education will be integrated across disciplines and the challenges associated with multidisciplinary team-based science will be addressed.
Kato, Ryuji; Nakano, Hideo; Konishi, Hiroyuki; Kato, Katsuya; Koga, Yuchi; Yamane, Tsuneo; Kobayashi, Takeshi; Honda, Hiroyuki
2005-08-19
To engineer proteins with desirable characteristics from a naturally occurring protein, high-throughput screening (HTS) combined with a directed-evolution approach is an essential technology. However, most HTS techniques are simple positive screenings. The information obtained from the positive candidates is used only as a result, and rarely as a clue for understanding the structural rules that may explain the protein activity. Here, we have attempted to establish a novel strategy for exploring functional proteins with the aid of computational analysis. As a model case, we explored lipases with inverted enantioselectivity toward the substrate p-nitrophenyl 3-phenylbutyrate, starting from the wild-type lipase of Burkholderia cepacia KWI-56, which is originally selective for the (S)-configuration of the substrate. Data from our previous work on (R)-enantioselective lipase screening were applied to a fuzzy neural network (FNN), a bioinformatics algorithm, to extract guidelines for the screening and engineering processes to be followed. An FNN has the advantageous feature of extracting hidden rules relating the sequences of variants to their enzyme activity, gaining high prediction accuracy. Without any prior knowledge, the FNN predicted a rule indicating that "size at position L167," among four positions (L17, F119, L167, and L266) in the substrate-binding core region, is the most influential factor for obtaining a lipase with inverted (R)-enantioselectivity. Based on the guidelines obtained, newly engineered variants, which were not found in the actual screening, were experimentally proven to gain high (R)-enantioselectivity by engineering the size at position L167. We also designed and assayed two novel variants, FIGV (L17F, F119I, L167G, and L266V) and FFGI (L17F, L167G, and L266I), which were compatible with the guideline obtained from the FNN analysis, and confirmed that these designed lipases could acquire high inverted enantioselectivity. The results show that, with the aid of bioinformatic analysis, high-throughput screening can expand its potential for exploring vast combinatorial sequence spaces of proteins.
Genecentric: a package to uncover graph-theoretic structure in high-throughput epistasis data.
Gallant, Andrew; Leiserson, Mark D M; Kachalov, Maxim; Cowen, Lenore J; Hescott, Benjamin J
2013-01-18
New technology has resulted in high-throughput screens for pairwise genetic interactions in yeast and other model organisms. For each pair in a collection of non-essential genes, an epistasis score is obtained, representing how much sicker (or healthier) the double-knockout organism will be compared to what would be expected from the sickness of the component single knockouts. Recent algorithmic work has identified graph-theoretic patterns in this data that can indicate functional modules, and even sets of genes that may occur in compensatory pathways, such as a BPM-type schema first introduced by Kelley and Ideker. However, to date, any algorithms for finding such patterns in the data were implemented internally, with no software being made publicly available. Genecentric is a new package that implements a parallelized version of the Leiserson et al. algorithm (J Comput Biol 18:1399-1409, 2011) for generating generalized BPMs from high-throughput genetic interaction data. Given a matrix of weighted epistasis values for a set of double knock-outs, Genecentric returns a list of generalized BPMs that may represent compensatory pathways. Genecentric also has an extension, GenecentricGO, to query FuncAssociate (Bioinformatics 25:3043-3044, 2009) to retrieve GO enrichment statistics on generated BPMs. Python is the only dependency, and our web site provides working examples and documentation. We find that Genecentric can be used to find coherent functional and perhaps compensatory gene sets from high throughput genetic interaction data. Genecentric is made freely available for download under the GPLv2 from http://bcb.cs.tufts.edu/genecentric.
Genecentric: a package to uncover graph-theoretic structure in high-throughput epistasis data
2013-01-01
Background New technology has resulted in high-throughput screens for pairwise genetic interactions in yeast and other model organisms. For each pair in a collection of non-essential genes, an epistasis score is obtained, representing how much sicker (or healthier) the double-knockout organism will be compared to what would be expected from the sickness of the component single knockouts. Recent algorithmic work has identified graph-theoretic patterns in this data that can indicate functional modules, and even sets of genes that may occur in compensatory pathways, such as a BPM-type schema first introduced by Kelley and Ideker. However, to date, any algorithms for finding such patterns in the data were implemented internally, with no software being made publicly available. Results Genecentric is a new package that implements a parallelized version of the Leiserson et al. algorithm (J Comput Biol 18:1399-1409, 2011) for generating generalized BPMs from high-throughput genetic interaction data. Given a matrix of weighted epistasis values for a set of double knock-outs, Genecentric returns a list of generalized BPMs that may represent compensatory pathways. Genecentric also has an extension, GenecentricGO, to query FuncAssociate (Bioinformatics 25:3043-3044, 2009) to retrieve GO enrichment statistics on generated BPMs. Python is the only dependency, and our web site provides working examples and documentation. Conclusion We find that Genecentric can be used to find coherent functional and perhaps compensatory gene sets from high throughput genetic interaction data. Genecentric is made freely available for download under the GPLv2 from http://bcb.cs.tufts.edu/genecentric. PMID:23331614
A 0.13-µm implementation of 5 Gb/s and 3-mW folded parallel architecture for AES algorithm
NASA Astrophysics Data System (ADS)
Rahimunnisa, K.; Karthigaikumar, P.; Kirubavathy, J.; Jayakumar, J.; Kumar, S. Suresh
2014-02-01
A new architecture for encrypting and decrypting confidential data using the Advanced Encryption Standard (AES) algorithm is presented in this article. The structure combines a folded structure with a parallel architecture to increase throughput, and the whole architecture achieves high throughput with low power consumption. The proposed architecture is implemented in 0.13-µm complementary metal-oxide-semiconductor (CMOS) technology. The proposed structure is compared with different existing structures, and the results show that it gives higher throughput and lower power than existing works.
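For context, a block cipher core's throughput follows directly from the block size, clock rate, and per-block cycle count. The abstract reports 5 Gb/s; the clock and cycle figures below are assumptions chosen only to make the arithmetic concrete.

```python
# Back-of-the-envelope throughput check for a block cipher core:
# throughput = block_bits * f_clk / cycles_per_block.
block_bits = 128          # AES block size
f_clk = 510e6             # assumed clock (Hz); not stated in the abstract
cycles_per_block = 13     # assumed pipeline interval; not stated either
print(block_bits * f_clk / cycles_per_block / 1e9, "Gb/s")  # ~5.0 Gb/s
```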
TCP Throughput Profiles Using Measurements over Dedicated Connections
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rao, Nageswara S.; Liu, Qiang; Sen, Satyabrata
Wide-area data transfers in high-performance computing infrastructures are increasingly being carried over dynamically provisioned dedicated network connections that provide high capacities with no competing traffic. We present extensive TCP throughput measurements and time traces over a suite of physical and emulated 10 Gbps connections with 0-366 ms round-trip times (RTTs). Contrary to the general expectation, they show significant statistical and temporal variations, in addition to the overall dependencies on the congestion control mechanism, buffer size, and the number of parallel streams. We analyze several throughput profiles that have highly desirable concave regions wherein the throughput decreases slowly with RTTs, in stark contrast to the convex profiles predicted by various TCP analytical models. We present a generic throughput model that abstracts the ramp-up and sustainment phases of TCP flows, which provides insights into qualitative trends observed in measurements across TCP variants: (i) slow-start followed by well-sustained throughput leads to concave regions; (ii) large buffers and multiple parallel streams expand the concave regions in addition to improving the throughput; and (iii) stable throughput dynamics, indicated by a smoother Poincare map and smaller Lyapunov exponents, lead to wider concave regions. These measurements and analytical results together enable us to select a TCP variant and its parameters for a given connection to achieve high throughput with statistical guarantees.
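The flavor of such a ramp-up/sustainment abstraction can be captured in a few lines. The toy model below is not the paper's model: it charges slow start roughly one window doubling per RTT and then assumes the flow holds the link capacity, so averaging over a fixed transfer time yields the slow, concave-looking decrease with RTT described above.

```python
import numpy as np

def mean_throughput(rtt_s, capacity_bps=10e9, duration_s=60.0,
                    mss_bits=1500 * 8, init_window=10):
    """Toy two-phase TCP model: exponential ramp-up, then sustainment.

    Slow start roughly doubles the congestion window once per RTT, so the
    ramp lasts about rtt * log2(BDP / initial window); afterwards the flow
    is assumed to hold the link capacity (an idealized sustainment phase).
    """
    bdp_pkts = capacity_bps * rtt_s / mss_bits           # bandwidth-delay product
    ramp_s = rtt_s * np.log2(max(bdp_pkts / init_window, 1.0))
    ramp_s = min(ramp_s, duration_s)
    # Average over the transfer: negligible throughput during ramp-up,
    # full capacity during sustainment.
    return capacity_bps * (duration_s - ramp_s) / duration_s

for rtt in (0.01, 0.1, 0.366):
    print(f"RTT {rtt * 1e3:5.0f} ms -> {mean_throughput(rtt) / 1e9:5.2f} Gb/s")
```

Because the ramp time grows only logarithmically with the bandwidth-delay product, long transfers remain near capacity over a wide RTT range, which is the qualitative shape of the concave profiles the measurements exhibit.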
Benchmarking high performance computing architectures with CMS’ skeleton framework
NASA Astrophysics Data System (ADS)
Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.
2017-10-01
In 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel's Threading Building Blocks (TBB) library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many-core architectures: machines such as Cori Phase 1&2, Theta, and Mira. Because of this we have revived the 2012 benchmark to test its performance and conclusions on these new architectures. This talk will discuss the results of this exercise.
The role of dedicated data computing centers in the age of cloud computing
NASA Astrophysics Data System (ADS)
Caramarcu, Costin; Hollowell, Christopher; Strecker-Kellogg, William; Wong, Antonio; Zaytsev, Alexandr
2017-10-01
Brookhaven National Laboratory (BNL) anticipates significant growth in scientific programs with large computing and data storage needs in the near future and has recently reorganized support for scientific computing to meet these needs. A key component is the enhanced role of the RHIC-ATLAS Computing Facility (RACF) in support of high-throughput and high-performance computing (HTC and HPC) at BNL. This presentation discusses the evolving role of the RACF at BNL, in light of its growing portfolio of responsibilities and its increasing integration with cloud (academic and for-profit) computing activities. We also discuss BNL’s plan to build a new computing center to support the new responsibilities of the RACF and present a summary of the cost benefit analysis done, including the types of computing activities that benefit most from a local data center vs. cloud computing. This analysis is partly based on an updated cost comparison of Amazon EC2 computing services and the RACF, which was originally conducted in 2012.
Schieferstein, Jeremy M.; Pawate, Ashtamurthy S.; Wan, Frank; Sheraden, Paige N.; Broecker, Jana; Ernst, Oliver P.; Gennis, Robert B.
2017-01-01
Elucidating the function of membrane proteins ultimately requires atomic-resolution structures as determined most commonly by X-ray crystallography. Many high impact membrane protein structures have resulted from advanced techniques such as in meso crystallization that present technical difficulties for the set-up and scale-out of high-throughput crystallization experiments. In prior work, we designed a novel, low-throughput X-ray transparent microfluidic device that automated the mixing of protein and lipid by diffusion for in meso crystallization trials. Here, we report X-ray transparent microfluidic devices for high-throughput crystallization screening and optimization that overcome the limitations of scale and demonstrate their application to the crystallization of several membrane proteins. Two complementary chips are presented: (1) a high-throughput screening chip to test 192 crystallization conditions in parallel using as little as 8 nl of membrane protein per well and (2) a crystallization optimization chip to rapidly optimize preliminary crystallization hits through fine-gradient re-screening. We screened three membrane proteins for new in meso crystallization conditions, identifying several preliminary hits that we tested for X-ray diffraction quality. Further, we identified and optimized the crystallization condition for a photosynthetic reaction center mutant and solved its structure to a resolution of 3.5 Å. PMID:28469762
Wu, Szu-Huei; Yao, Chun-Hsu; Hsieh, Chieh-Jui; Liu, Yu-Wei; Chao, Yu-Sheng; Song, Jen-Shin; Lee, Jinq-Chyi
2015-07-10
Sodium-dependent glucose co-transporter 2 (SGLT2) inhibitors are of current interest as a treatment for type 2 diabetes. Efforts have been made to discover phlorizin-related glycosides with good SGLT2 inhibitory activity. To increase structural diversity and better understand the role of non-glycoside SGLT2 inhibitors on glycemic control, we initiated a research program to identify non-glycoside hits from high-throughput screening. Here, we report the development of a novel, fluorogenic probe-based glucose uptake system based on a Cu(I)-catalyzed [3+2] cycloaddition. The safer processes and cheaper substances made the developed assay our first priority for large-scale primary screening as compared to the well-known [(14)C]-labeled α-methyl-D-glucopyranoside ([(14)C]-AMG) radioactive assay. This effort culminated in the identification of a benzimidazole, non-glycoside SGLT2 hit with an EC50 value of 0.62 μM by high-throughput screening of 41,000 compounds. Copyright © 2015 Elsevier B.V. All rights reserved.
Predicting protein crystallization propensity from protein sequence
2011-01-01
The high-throughput structure determination pipelines developed by structural genomics programs offer a unique opportunity for data mining. One important question is how protein properties derived from a primary sequence correlate with the protein's propensity to yield X-ray quality crystals (crystallizability) and 3D X-ray structures. A set of protein properties were computed for over 1,300 proteins that expressed well but were insoluble, and for ~720 unique proteins that resulted in X-ray structures. The correlation of the protein's isoelectric point and grand average hydropathy (GRAVY) with crystallizability was analyzed for full length and domain constructs of protein targets. In a second step, several additional properties that can be calculated from the protein sequence were added and evaluated. Using statistical analyses we have identified a set of attributes that correlate with a protein's propensity to crystallize and implemented a Support Vector Machine (SVM) classifier based on these. We have created applications to analyze and provide optimal boundary information for query sequences and to visualize the data. These tools are available via the web site http://bioinformatics.anl.gov/cgi-bin/tools/pdpredictor. PMID:20177794
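As an illustration of this kind of classifier, not the published pipeline itself, one might compute two of the named sequence properties with Biopython and fit an SVM with scikit-learn. The sequences and labels below are toy stand-ins.

```python
# Minimal sketch: sequence-derived features (isoelectric point and GRAVY)
# feeding an SVM crystallizability classifier. Toy data; real labels would
# come from structure-determination outcomes.
from Bio.SeqUtils.ProtParam import ProteinAnalysis
from sklearn.svm import SVC

def features(seq):
    pa = ProteinAnalysis(seq)
    return [pa.isoelectric_point(), pa.gravy()]

sequences = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", "MLLAVLYCLAVFALSSQ"]  # toy examples
labels = [1, 0]  # 1 = yielded X-ray structure, 0 = insoluble (hypothetical)

clf = SVC(kernel="rbf")
clf.fit([features(s) for s in sequences], labels)
print(clf.predict([features("MSTNPKPQRKTKRNTNRRPQDVKFPGG")]))
```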
High-Throughput Analysis of T-DNA Location and Structure Using Sequence Capture.
Inagaki, Soichi; Henry, Isabelle M; Lieberman, Meric C; Comai, Luca
2015-01-01
Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the needs for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. Our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.
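A hedged sketch of the junction-finding idea follows; it is not the authors' released tool. It assumes capture reads were aligned to a combined genome-plus-T-DNA reference and that T-DNA contigs carry a "TDNA" name prefix.

```python
# Report read pairs whose mates map to different sources (one in the genome,
# one in T-DNA), i.e., pairs spanning a T-DNA/genome junction.
import pysam

def junction_pairs(bam_path, tdna_prefix="TDNA"):
    hits = []
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam:
            if not read.is_paired or read.is_unmapped or read.mate_is_unmapped:
                continue
            self_is_tdna = read.reference_name.startswith(tdna_prefix)
            mate_is_tdna = read.next_reference_name.startswith(tdna_prefix)
            if self_is_tdna != mate_is_tdna:  # mates straddle the junction
                hits.append((read.query_name, read.reference_name,
                             read.reference_start, read.next_reference_name))
    return hits

for name, chrom, pos, mate_ref in junction_pairs("capture_reads.bam"):  # hypothetical file
    print(name, chrom, pos, "<->", mate_ref)
```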
NASA Astrophysics Data System (ADS)
Kudoh, Eisuke; Ito, Haruki; Wang, Zhisen; Adachi, Fumiyuki
In mobile communication systems, high speed packet data services are demanded. In high speed data transmission, throughput degrades severely due to inter-path interference (IPI). Recently, we proposed a random transmit power control (TPC) to increase the uplink throughput of DS-CDMA packet mobile communications. In this paper, we apply IPI cancellation in addition to the random TPC. We derive the numerical expression of the received signal-to-interference plus noise power ratio (SINR) and introduce an IPI cancellation factor. We also derive the numerical expression of system throughput when IPI is cancelled ideally, for comparison with the Monte Carlo numerically evaluated system throughput. Then we evaluate, by Monte Carlo numerical computation, the combined effect of random TPC and IPI cancellation on the uplink throughput of DS-CDMA packet mobile communications.
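The abstract does not reproduce the derived expressions. Purely as an illustration, an IPI cancellation factor can be pictured as scaling the interfering path powers in a generic SINR form such as the following; this form is my assumption, not the paper's equation.

```latex
% Illustrative form only: received SINR on the desired path 0 with an IPI
% cancellation factor \varphi \in [0,1], where \varphi = 0 corresponds to
% ideal cancellation and \varphi = 1 to no cancellation.
\mathrm{SINR}_0 \;=\; \frac{P_0}{\varphi \sum_{l \neq 0} P_l \;+\; N_0}
```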
"First generation" automated DNA sequencing technology.
Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M
2011-10-01
Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.
Choi, Hyungsuk; Choi, Woohyuk; Quan, Tran Minh; Hildebrand, David G C; Pfister, Hanspeter; Jeong, Won-Ki
2014-12-01
As the size of image data from microscopes and telescopes increases, the need for high-throughput processing and visualization of large volumetric data has become more pressing. At the same time, many-core processors and GPU accelerators are commonplace, making high-performance distributed heterogeneous computing systems affordable. However, effectively utilizing GPU clusters is difficult for novice programmers, and even experienced programmers often fail to fully leverage the computing power of new parallel architectures due to their steep learning curve and programming complexity. In this paper, we propose Vivaldi, a new domain-specific language for volume processing and visualization on distributed heterogeneous computing systems. Vivaldi's Python-like grammar and parallel processing abstractions provide flexible programming tools for non-experts to easily write high-performance parallel computing code. Vivaldi provides commonly used functions and numerical operators for customized visualization and high-throughput image processing applications. We demonstrate the performance and usability of Vivaldi on several examples ranging from volume rendering to image segmentation.
Benchmarking Procedures for High-Throughput Context Specific Reconstruction Algorithms
Pacheco, Maria P.; Pfau, Thomas; Sauter, Thomas
2016-01-01
Recent progress in high-throughput data acquisition has shifted the focus from data generation to processing and understanding of how to integrate collected information. Context specific reconstruction based on generic genome scale models like ReconX or HMR has the potential to become a diagnostic and treatment tool tailored to the analysis of specific individuals. The respective computational algorithms require a high level of predictive power, robustness and sensitivity. Although multiple context specific reconstruction algorithms were published in the last 10 years, only a fraction of them is suitable for model building based on human high-throughput data. Besides other reasons, this might be due to problems arising from the limitation to only one metabolic target function or arbitrary thresholding. This review describes and analyses common validation methods used for testing model building algorithms. Two major methods can be distinguished: consistency testing and comparison based testing. The first is concerned with robustness against noise, e.g., missing data due to the impossibility of distinguishing between the signal and the background of non-specific binding of probes in a microarray experiment, and whether distinct sets of input expressed genes corresponding to, e.g., different tissues yield distinct models. The latter covers methods comparing sets of functionalities, comparison with existing networks or additional databases. We test those methods on several available algorithms and deduce properties of these algorithms that can be compared with future developments. The set of tests performed can therefore serve as a benchmarking procedure for future algorithms. PMID:26834640
From Lab to Fab: Developing a Nanoscale Delivery Tool for Scalable Nanomanufacturing
NASA Astrophysics Data System (ADS)
Safi, Asmahan A.
The emergence of nanomaterials with unique properties at the nanoscale over the past two decades carries a capacity to impact society and transform or create new industries ranging from nanoelectronics to nanomedicine. However, a gap in nanomanufacturing technologies has prevented the translation of nanomaterials into real-world commercialized products. Bridging this gap requires a paradigm shift in methods for fabricating structured devices with nanoscale resolution in a repeatable fashion. This thesis explores new paradigms for fabricating nanoscale structures, devices, and systems for high-throughput, high-registration applications. We present a robust and scalable nanoscale delivery platform, the Nanofountain Probe (NFP), for parallel direct-write of functional materials. The design and microfabrication of the NFP are presented. The new generation addresses the challenges of throughput, resolution and ink replenishment that characterize tip-based nanomanufacturing. To achieve these goals, optimized probe geometry is integrated into the process along with channel sealing and cantilever bending. The capabilities of the newly fabricated probes are demonstrated through two types of delivery: protein nanopatterning and single cell nanoinjection. The broad applications of the NFP for single cell delivery are investigated. An external microfluidic packaging is developed to enable delivery in a liquid environment. The system is integrated with a combined atomic force microscope and inverted fluorescence microscope. Intracellular delivery is demonstrated by injecting a fluorescent dextran into HeLa cells in vitro while monitoring the injection forces. Such developments enable in vitro cellular delivery for single cell studies and high-throughput gene expression. The nanomanufacturing capabilities of NFPs are explored. Nanofabrication of carbon nanotube-based electronics presents all the manufacturing challenges characteristic of assembling nanomaterials precisely onto devices. The presented study combines top-down and bottom-up approaches by integrating the catalyst patterning and carbon nanotube growth directly on structures. Large arrays of iron-rich catalyst are patterned on a substrate for subsequent carbon nanotube synthesis. The dependence on probe geometry and substrate wetting is assessed by modeling and experimental studies. Finally, preliminary results on the synthesis of carbon nanotubes by catalyst-assisted chemical vapor deposition suggest that increasing the catalyst yield is critical. Such work will enable high-throughput nanomanufacturing of carbon nanotube-based devices.
2017-02-01
In this technical note, a number of different measures, implemented as functions in both MATLAB and Python, are used to quantify the similarity/distance between two vector-based datasets. The measures described are widely used and may have an important role when computing the distance and similarity of large datasets and when considering high-throughput processes.
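A minimal Python rendering of the kind of measures described, illustrative rather than the note's own code:

```python
# Two common vector similarity/distance measures as plain functions.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

print(euclidean([1, 0, 2], [2, 1, 0]), cosine_similarity([1, 0, 2], [2, 1, 0]))
```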
SCREENING CHEMICALS FOR ESTROGEN RECEPTOR BIOACTIVITY USING A COMPUTATIONAL MODEL
The U.S. Environmental Protection Agency (EPA) is considering the use of high-throughput and computational methods for regulatory applications in the Endocrine Disruptor Screening Program (EDSP). To use these new tools for regulatory decision making, computational methods must be a...
Choi, Woon Ih; Wood, Brandon C.; Schwegler, Eric; ...
2015-09-22
Transition metal (TM) atoms in porphyrin-like complexes play important roles in many protein and enzymatic systems, where crystal-field effects are used to modify d-orbital levels. Inspired by the tunable electronic structure of these motifs, a high-throughput computational search for synthetic hydrogen catalysts is performed based on a similar motif of TM atoms embedded into the lattice of graphene. Based on an initial list of 300 possible embedding geometries, binders, and host atoms, descriptors for stability and catalytic activity are applied to extract ten promising candidates for hydrogen evolution, two of which are expected to exhibit high activity for hydrogen oxidation. In several instances, the active TM atoms are earth-abundant elements that show no activity in the bulk phase, highlighting the importance of the coordination environment in tuning the d-orbitals. In conclusion, it is found that the most active candidates involve a hitherto unreported surface reaction pathway that involves a Kubas-complex intermediate, which significantly lowers the kinetic barrier associated with hydrogen dissociation and association.
Analog Correlator Based on One Bit Digital Correlator
NASA Technical Reports Server (NTRS)
Prokop, Norman (Inventor); Krasowski, Michael (Inventor)
2017-01-01
A two-input time-domain correlator may perform analog correlation. In order to achieve high throughput rates with minimal computational overhead, the input data streams may be hard limited through adaptive thresholding to yield two binary bit streams. Correlation may be achieved through the use of a Hamming distance calculation, where the distance between the two bit streams approximates the time delay that separates them. The resulting Hamming distance approximates the correlation time delay with high accuracy.
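A software sketch of the scheme follows; the invention itself is hardware, and all names and the thresholding rule here are illustrative.

```python
# Hard-limit two inputs via an adaptive threshold, then estimate the delay
# as the lag minimizing the (overlap-normalized) Hamming distance.
def hard_limit(samples):
    thresh = sum(samples) / len(samples)        # simple adaptive threshold
    return [1 if s > thresh else 0 for s in samples]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def estimate_delay(x, y, max_lag):
    """Estimate how many samples y lags behind x."""
    bx, by = hard_limit(x), hard_limit(y)
    # The lag that best aligns the two bit streams approximates the delay.
    return min(range(max_lag + 1),
               key=lambda lag: hamming(bx, by[lag:]) / (len(bx) - lag))

sig = [0, 1, 3, 7, 3, 1, 0, 0, 1, 3, 7, 3, 1, 0]
delayed = [0, 0, 0] + sig[:-3]                  # same signal, delayed by 3
print(estimate_delay(sig, delayed, max_lag=5))  # -> 3
```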
An open-source computational and data resource to analyze digital maps of immunopeptidomes
Caron, Etienne; Espona, Lucia; Kowalewski, Daniel J.; ...
2015-07-08
We present a novel mass spectrometry-based high-throughput workflow and an open-source computational and data resource to reproducibly identify and quantify HLA-associated peptides. Collectively, the resources support the generation of HLA allele-specific peptide assay libraries consisting of consensus fragment ion spectra, and the analysis of quantitative digital maps of HLA peptidomes generated from a range of biological sources by SWATH mass spectrometry (MS). This study represents the first community-based effort to develop a robust platform for the reproducible and quantitative measurement of the entire repertoire of peptides presented by HLA molecules, an essential step towards the design of efficient immunotherapies.
Computational solutions to large-scale data management and analysis
Schadt, Eric E.; Linderman, Michael D.; Sorenson, Jon; Lee, Lawrence; Nolan, Garry P.
2011-01-01
Today we can generate hundreds of gigabases of DNA and RNA sequencing data in a week for less than US$5,000. The astonishing rate of data generation by these low-cost, high-throughput technologies in genomics is being matched by that of other technologies, such as real-time imaging and mass spectrometry-based flow cytometry. Success in the life sciences will depend on our ability to properly interpret the large-scale, high-dimensional data sets that are generated by these technologies, which in turn requires us to adopt advances in informatics. Here we discuss how we can master the different types of computational environments that exist — such as cloud and heterogeneous computing — to successfully tackle our big data problems. PMID:20717155
Region Templates: Data Representation and Management for High-Throughput Image Analysis
Pan, Tony; Kurc, Tahsin; Kong, Jun; Cooper, Lee; Klasky, Scott; Saltz, Joel
2015-01-01
We introduce a region template abstraction and framework for the efficient storage, management and processing of common data types in analysis of large datasets of high resolution images on clusters of hybrid computing nodes. The region template abstraction provides a generic container template for common data structures, such as points, arrays, regions, and object sets, within a spatial and temporal bounding box. It allows for different data management strategies and I/O implementations, while providing a homogeneous, unified interface to applications for data storage and retrieval. A region template application is represented as a hierarchical dataflow in which each computing stage may be represented as another dataflow of finer-grain tasks. The execution of the application is coordinated by a runtime system that implements optimizations for hybrid machines, including performance-aware scheduling for maximizing the utilization of computing devices and techniques to reduce the impact of data transfers between CPUs and GPUs. An experimental evaluation on a state-of-the-art hybrid cluster using a microscopy imaging application shows that the abstraction adds negligible overhead (about 3%) and achieves good scalability and high data transfer rates. Optimizations in a high speed disk based storage implementation of the abstraction to support asynchronous data transfers and computation result in an application performance gain of about 1.13×. Finally, a processing rate of 11,730 4K×4K tiles per minute was achieved for the microscopy imaging application on a cluster with 100 nodes (300 GPUs and 1,200 CPU cores). This computation rate enables studies with very large datasets. PMID:26139953
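A much-simplified, hypothetical rendering of the container idea; the names below are illustrative, not the framework's API.

```python
# A generic container pairing a common data structure (points, arrays,
# regions, object sets) with a spatial/temporal bounding box, behind one
# storage interface that backends could reimplement (disk, DB, in-memory).
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

@dataclass
class RegionTemplate:
    bbox: Tuple[int, int, int, int]          # (x_min, y_min, x_max, y_max)
    t_range: Tuple[int, int] = (0, 0)        # temporal extent
    data: Dict[str, Any] = field(default_factory=dict)

    def put(self, name: str, obj: Any) -> None:
        self.data[name] = obj                # swap in disk/DB I/O here

    def get(self, name: str) -> Any:
        return self.data[name]

rt = RegionTemplate(bbox=(0, 0, 4096, 4096))
rt.put("nuclei_mask", [[0] * 8] * 8)         # stand-in for a segmented tile
print(rt.get("nuclei_mask")[0][:4])
```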
Wyatt, S K; Barck, K H; Kates, L; Zavala-Solorio, J; Ross, J; Kolumam, G; Sonoda, J; Carano, R A D
2015-11-01
The ability to non-invasively measure body composition in mouse models of obesity and obesity-related disorders is essential for elucidating mechanisms of metabolic regulation and monitoring the effects of novel treatments. These studies aimed to develop a fully automated, high-throughput micro-computed tomography (micro-CT)-based image analysis technique for longitudinal quantitation of adipose, non-adipose and lean tissue as well as bone and demonstrate utility for assessing the effects of two distinct treatments. An initial validation study was performed in diet-induced obesity (DIO) and control mice on a vivaCT 75 micro-CT system. Subsequently, four groups of DIO mice were imaged pre- and post-treatment with an experimental agonistic antibody specific for anti-fibroblast growth factor receptor 1 (anti-FGFR1, R1MAb1), control immunoglobulin G antibody, a known anorectic antiobesity drug (rimonabant, SR141716), or solvent control. The body composition analysis technique was then ported to a faster micro-CT system (CT120) to markedly increase throughput as well as to evaluate the use of micro-CT image intensity for hepatic lipid content in DIO and control mice. Ex vivo chemical analysis and colorimetric analysis of the liver triglycerides were performed as the standard metrics for correlation with body composition and hepatic lipid status, respectively. Micro-CT-based body composition measures correlate with ex vivo chemical analysis metrics and enable distinction between DIO and control mice. R1MAb1 and rimonabant have differing effects on body composition as assessed by micro-CT. High-throughput body composition imaging is possible using a modified CT120 system. Micro-CT also provides a non-invasive assessment of hepatic lipid content. This work describes, validates and demonstrates utility of a fully automated image analysis technique to quantify in vivo micro-CT-derived measures of adipose, non-adipose and lean tissue, as well as bone. These body composition metrics highly correlate with standard ex vivo chemical analysis and enable longitudinal evaluation of body composition and therapeutic efficacy monitoring.
Data-driven discovery of new Dirac semimetal materials
NASA Astrophysics Data System (ADS)
Yan, Qimin; Chen, Ru; Neaton, Jeffrey
In recent years, a significant amount of materials property data from high-throughput computations based on density functional theory (DFT) and the application of database technologies have enabled the rise of data-driven materials discovery. In this work, we initiate the extension of the data-driven materials discovery framework to the realm of topological semimetal materials to accelerate the discovery of novel Dirac semimetals. We implement currently available workflows and develop new ones to data-mine the Materials Project database for novel Dirac semimetals with desirable band structures and symmetry-protected topological properties. This data-driven effort relies on the successful development of several automatic data generation and analysis tools, including a workflow for the automatic identification of topological invariants and pattern recognition techniques to find specific features in a massive number of computed band structures. Utilizing this approach, we successfully identified more than 15 novel Dirac point and Dirac nodal line systems that have not been theoretically predicted or experimentally identified. This work is supported by the Materials Project Predictive Modeling Center through the U.S. Department of Energy, Office of Basic Energy Sciences, Materials Sciences and Engineering Division, under Contract No. DE-AC02-05CH11231.
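As a hedged illustration of the data-mining step only, assuming the legacy pymatgen MPRester.query interface and a valid API key; the actual workflows, including the topological-invariant identification, go far beyond this first-pass filter.

```python
# First-pass filter over the Materials Project: zero-gap, near-hull compounds
# as candidates to pass on to band-structure pattern recognition.
from pymatgen.ext.matproj import MPRester

with MPRester("YOUR_API_KEY") as mpr:  # placeholder key
    entries = mpr.query(
        criteria={"band_gap": 0.0, "e_above_hull": {"$lt": 0.05}},
        properties=["material_id", "pretty_formula", "spacegroup.symbol"],
    )

for e in entries[:10]:
    print(e["material_id"], e["pretty_formula"], e["spacegroup.symbol"])
```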
NASA Astrophysics Data System (ADS)
Carlson, H. K.; Coates, J. D.; Deutschbauer, A. M.
2015-12-01
The selective perturbation of complex microbial ecosystems to predictably influence outcomes in engineered and industrial environments remains a grand challenge for geomicrobiology. In some industrial ecosystems, such as oil reservoirs, sulfate reducing microorganisms (SRM) produce hydrogen sulfide which is toxic, explosive and corrosive. Current strategies to selectively inhibit sulfidogenesis are based on non-specific biocide treatments, bio-competitive exclusion by alternative electron acceptors or sulfate-analogs which are competitive inhibitors or futile/alternative substrates of the sulfate reduction pathway. Despite the economic cost of sulfidogenesis, there has been minimal exploration of the chemical space of possible inhibitory compounds, and very little work has quantitatively assessed the selectivity of putative souring treatments. We have developed a high-throughput screening strategy to target SRM, quantitatively ranked the selectivity and potency of hundreds of compounds and identified previously unrecognized SRM selective inhibitors and synergistic interactions between inhibitors. Once inhibitor selectivity is defined, high-throughput characterization of microbial community structure across compound gradients and identification of fitness determinants using isolate bar-coded transposon mutant libraries can give insights into the genetic mechanisms whereby compounds structure microbial communities. The high-throughput (HT) approach we present can be readily applied to target SRM in diverse environments and more broadly, could be used to identify and quantify the potency and selectivity of inhibitors of a variety of microbial metabolisms. Our findings and approach are relevant for engineering environmental ecosystems and also to understand the role of natural gradients in shaping microbial niche space.
InteGO2: a web tool for measuring and visualizing gene semantic similarities using Gene Ontology.
Peng, Jiajie; Li, Hongxiang; Liu, Yongzhuang; Juan, Liran; Jiang, Qinghua; Wang, Yadong; Chen, Jin
2016-08-31
The Gene Ontology (GO) has been used in high-throughput omics research as a major bioinformatics resource. The hierarchical structure of GO provides users a convenient platform for biological information abstraction and hypothesis testing. Computational methods have been developed to identify functionally similar genes. However, none of the existing measurements take into account all the rich information in GO. Similarly, using these existing methods, web-based applications have been constructed to compute gene functional similarities, and to provide pure text-based outputs. Without a graphical visualization interface, it is difficult for result interpretation. We present InteGO2, a web tool that allows researchers to calculate the GO-based gene semantic similarities using seven widely used GO-based similarity measurements. Also, we provide an integrative measurement that synergistically integrates all the individual measurements to improve the overall performance. Using HTML5 and cytoscape.js, we provide a graphical interface in InteGO2 to visualize the resulting gene functional association networks. InteGO2 is an easy-to-use HTML5 based web tool. With it, researchers can measure gene or gene product functional similarity conveniently, and visualize the network of functional interactions in a graphical interface. InteGO2 can be accessed via http://mlg.hit.edu.cn:8089/ .
A GPU-Parallelized Eigen-Based Clutter Filter Framework for Ultrasound Color Flow Imaging.
Chee, Adrian J Y; Yiu, Billy Y S; Yu, Alfred C H
2017-01-01
Eigen-filters with attenuation response adapted to clutter statistics in color flow imaging (CFI) have shown improved flow detection sensitivity in the presence of tissue motion. Nevertheless, their practical adoption in clinical use is not straightforward due to the high computational cost for solving eigendecompositions. Here, we provide a pedagogical description of how a real-time computing framework for eigen-based clutter filtering can be developed through a single-instruction, multiple-data (SIMD) computing approach that can be implemented on a graphics processing unit (GPU). Emphasis is placed on the single-ensemble-based eigen-filtering approach (Hankel singular value decomposition), since it is algorithmically compatible with GPU-based SIMD computing. The key algebraic principles and the corresponding SIMD algorithm are explained, and annotations on how such an algorithm can be rationally implemented on the GPU are presented. Real-time efficacy of our framework was experimentally investigated on a single GPU device (GTX Titan X), and the computing throughput for varying scan depths and slow-time ensemble lengths was studied. Using our eigen-processing framework, real-time video-range throughput (24 frames/s) can be attained for CFI frames with full view in the azimuth direction (128 scanlines), up to a scan depth of 5 cm (λ-pixel axial spacing) for a slow-time ensemble length of 16 samples. The corresponding CFI image frames, with respect to the ones derived from non-adaptive polynomial regression clutter filtering, yielded enhanced flow detection sensitivity in vivo, as demonstrated in a carotid imaging case example. These findings indicate that GPU-enabled eigen-based clutter filtering can improve CFI flow detection performance in real time.
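A CPU-only NumPy sketch of single-ensemble Hankel-SVD clutter filtering follows; the paper's contribution is the GPU/SIMD implementation, and the parameters here are illustrative.

```python
# Remove clutter by discarding the largest singular components of the Hankel
# matrix formed from one slow-time ensemble, then map back to a signal by
# averaging anti-diagonals.
import numpy as np
from scipy.linalg import hankel

def hankel_svd_filter(ensemble, n_clutter=1):
    n = len(ensemble)
    h = hankel(ensemble[: n // 2 + 1], ensemble[n // 2 :])
    u, s, vh = np.linalg.svd(h, full_matrices=False)
    s[:n_clutter] = 0.0                       # suppress dominant clutter modes
    hf = (u * s) @ vh
    # Average anti-diagonals to recover a filtered slow-time signal.
    return np.array([np.mean(hf[::-1, :].diagonal(k))
                     for k in range(-hf.shape[0] + 1, hf.shape[1])])

ens = 5.0 + 0.3 * np.exp(2j * np.pi * 0.2 * np.arange(16))  # clutter + flow
print(np.round(hankel_svd_filter(ens.real), 3))
```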
High-throughput search of ternary chalcogenides for p-type transparent electrodes
Shi, Jingming; Cerqueira, Tiago F. T.; Cui, Wenwen; Nogueira, Fernando; Botti, Silvana; Marques, Miguel A. L.
2017-01-01
Delafossite crystals are fascinating ternary oxides that have demonstrated transparent conductivity and ambipolar doping. Here we use a high-throughput approach based on density functional theory to find delafossite and related layered phases of composition ABX2, where A and B are elements of the periodic table, and X is a chalcogen (O, S, Se, and Te). From the 15 624 compounds studied in the trigonal delafossite prototype structure, 285 are within 50 meV/atom from the convex hull of stability. These compounds are further investigated using global structural prediction methods to obtain their lowest-energy crystal structure. We find 79 systems not present in the materials project database that are thermodynamically stable and crystallize in the delafossite or in closely related structures. These novel phases are then characterized by calculating their band gaps and hole effective masses. This characterization unveils a large diversity of properties, ranging from normal metals, magnetic metals, and some candidate compounds for p-type transparent electrodes. PMID:28266587
Computationally guided discovery of thermoelectric materials
Gorai, Prashun; Stevanović, Vladan; Toberer, Eric S.
2017-08-22
The potential for advances in thermoelectric materials, and thus solid-state refrigeration and power generation, is immense. Progress so far has been limited by both the breadth and diversity of the chemical space and the serial nature of experimental work. In this Review, we discuss how recent computational advances are revolutionizing our ability to predict electron and phonon transport and scattering, as well as materials dopability, and we examine efficient approaches to calculating critical transport properties across large chemical spaces. When coupled with experimental feedback, these high-throughput approaches can stimulate the discovery of new classes of thermoelectric materials. Within smaller materials subsets, computations can guide the optimal chemical and structural tailoring to enhance materials performance and provide insight into the underlying transport physics. Beyond perfect materials, computations can be used for the rational design of structural and chemical modifications (such as defects, interfaces, dopants and alloys) to provide additional control on transport properties to optimize performance. Through computational predictions for both materials searches and design, a new paradigm in thermoelectric materials discovery is emerging.
First-principles data-driven discovery of transition metal oxides for artificial photosynthesis
NASA Astrophysics Data System (ADS)
Yan, Qimin
We develop a first-principles data-driven approach for rapid identification of transition metal oxide (TMO) light absorbers and photocatalysts for artificial photosynthesis using the Materials Project. Initially focusing on Cr, V, and Mn-based ternary TMOs in the database, we design a broadly-applicable multiple-layer screening workflow automating density functional theory (DFT) and hybrid functional calculations of bulk and surface electronic and magnetic structures. We further assess the electrochemical stability of TMOs in aqueous environments from computed Pourbaix diagrams. Several promising earth-abundant low band-gap TMO compounds with desirable band edge energies and electrochemical stability are identified by our computational efforts and then synergistically evaluated using high-throughput synthesis and photoelectrochemical screening techniques by our experimental collaborators at Caltech. Our joint theory-experiment effort has successfully identified new earth-abundant copper and manganese vanadate complex oxides that meet highly demanding requirements for photoanodes, substantially expanding the known space of such materials. By integrating theory and experiment, we validate our approach and develop important new insights into structure-property relationships for TMOs for oxygen evolution photocatalysts, paving the way for use of first-principles data-driven techniques in future applications. This work is supported by the Materials Project Predictive Modeling Center and the Joint Center for Artificial Photosynthesis through the U.S. Department of Energy, Office of Basic Energy Sciences, Materials Sciences and Engineering Division, under Contract No. DE-AC02-05CH11231. Computational resources were also provided by the Department of Energy through the National Energy Research Scientific Computing Center (NERSC).
Aarons, Jolyon; Jones, Lewys; Varambhia, Aakash; MacArthur, Katherine E; Ozkaya, Dogan; Sarwar, Misbah; Skylaris, Chris-Kriton; Nellist, Peter D
2017-07-12
Many studies of heterogeneous catalysis, both experimental and computational, make use of idealized structures such as extended surfaces or regular polyhedral nanoparticles. This simplification neglects the morphological diversity in real commercial oxygen reduction reaction (ORR) catalysts used in fuel-cell cathodes. Here we introduce an approach that combines 3D nanoparticle structures obtained from high-throughput high-precision electron microscopy with density functional theory. Discrepancies between experimental observations and cuboctahedral/truncated-octahedral particles are revealed and discussed using a range of widely used descriptors, such as electron-density, d-band centers, and generalized coordination numbers. We use this new approach to determine the optimum particle size for which both detrimental surface roughness and particle shape effects are minimized.
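As an illustration of one descriptor mentioned above, a minimal generalized-coordination-number calculation might look like the following; the cutoff and toy geometry are assumptions, not the study's settings.

```python
# Generalized coordination number (GCN) of each site: the sum of its
# neighbors' conventional coordination numbers divided by the bulk maximum
# (12 for FCC metals). Input is a plain list of atomic coordinates.
import numpy as np

def neighbor_lists(coords, cutoff=3.0):
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return [np.where((d[i] > 0) & (d[i] < cutoff))[0] for i in range(len(coords))]

def gcn(coords, cutoff=3.0, cn_max=12):
    nbrs = neighbor_lists(np.asarray(coords, dtype=float), cutoff)
    cn = np.array([len(n) for n in nbrs])
    return np.array([cn[n].sum() / cn_max for n in nbrs])

# Toy cluster: a square of atoms 2.7 A apart plus one adatom above the center.
pts = [[0, 0, 0], [2.7, 0, 0], [0, 2.7, 0], [2.7, 2.7, 0], [1.35, 1.35, 2.0]]
print(np.round(gcn(pts), 2))
```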
Design of object-oriented distributed simulation classes
NASA Technical Reports Server (NTRS)
Schoeffler, James D. (Principal Investigator)
1995-01-01
Distributed simulation of aircraft engines as part of a computer aided design package is being developed by NASA Lewis Research Center for the aircraft industry. The project is called NPSS, an acronym for 'Numerical Propulsion Simulation System'. NPSS is a flexible object-oriented simulation of aircraft engines requiring high computing speed. It is desirable to run the simulation on a distributed computer system with multiple processors executing portions of the simulation in parallel. The purpose of this research was to investigate object-oriented structures such that individual objects could be distributed. The set of classes used in the simulation must be designed to facilitate parallel computation. Since the portions of the simulation carried out in parallel are not independent of one another, there is the need for communication among the parallel executing processors which in turn implies need for their synchronization. Communication and synchronization can lead to decreased throughput as parallel processors wait for data or synchronization signals from other processors. As a result of this research, the following have been accomplished. The design and implementation of a set of simulation classes which result in a distributed simulation control program have been completed. The design is based upon MIT 'Actor' model of a concurrent object and uses 'connectors' to structure dynamic connections between simulation components. Connectors may be dynamically created according to the distribution of objects among machines at execution time without any programming changes. Measurements of the basic performance have been carried out with the result that communication overhead of the distributed design is swamped by the computation time of modules unless modules have very short execution times per iteration or time step. An analytical performance model based upon queuing network theory has been designed and implemented. Its application to realistic configurations has not been carried out.
Chan, Leo Li-Ying; Smith, Tim; Kumph, Kendra A; Kuksin, Dmitry; Kessel, Sarah; Déry, Olivier; Cribbes, Scott; Lai, Ning; Qiu, Jean
2016-10-01
To ensure cell-based assays are performed properly, both cell concentration and viability have to be determined so that the data can be normalized to generate meaningful and comparable results. Cell-based assays performed in immuno-oncology, toxicology, or bioprocessing research often require measuring of multiple samples and conditions, thus the current automated cell counter that uses single disposable counting slides is not practical for high-throughput screening assays. In the recent years, a plate-based image cytometry system has been developed for high-throughput biomolecular screening assays. In this work, we demonstrate a high-throughput AO/PI-based cell concentration and viability method using the Celigo image cytometer. First, we validate the method by comparing directly to Cellometer automated cell counter. Next, cell concentration dynamic range, viability dynamic range, and consistency are determined. The high-throughput AO/PI method described here allows for 96-well to 384-well plate samples to be analyzed in less than 7 min, which greatly reduces the time required for the single sample-based automated cell counter. In addition, this method can improve the efficiency for high-throughput screening assays, where multiple cell counts and viability measurements are needed prior to performing assays such as flow cytometry, ELISA, or simply plating cells for cell culture.
NASA Astrophysics Data System (ADS)
Yan, Zongkai; Zhang, Xiaokun; Li, Guang; Cui, Yuxing; Jiang, Zhaolian; Liu, Wen; Peng, Zhi; Xiang, Yong
2018-01-01
Conventional wet-process methods for designing and preparing thin films remain time-consuming and inefficient, which hinders the development of novel materials. Herein, we present a high-throughput combinatorial technique for continuous thin film preparation based on chemical bath deposition (CBD). The method is well suited to preparing high-throughput combinatorial material libraries of compounds with low decomposition temperatures and high water or oxygen sensitivity at relatively high temperatures. To test this system, a Cu(In, Ga)Se (CIGS) thin film library doped with 0-19.04 at.% antimony (Sb) was taken as an example to evaluate systematically the effect of varying Sb doping concentration on the grain growth, structure, morphology and electrical properties of CIGS thin films. Combined with Energy Dispersive Spectrometry (EDS), X-ray Photoelectron Spectroscopy (XPS), automated X-ray Diffraction (XRD) for rapid screening and Localized Electrochemical Impedance Spectroscopy (LEIS), it was confirmed that this combinatorial high-throughput system can be used to identify the composition with the optimal grain orientation growth, microstructure and electrical properties, by accurately monitoring the doping content and material composition. Based on the characterization results, an Sb2Se3 quasi-liquid-phase-promoted CIGS film-growth model is put forward. Beyond the CIGS thin films reported here, combinatorial CBD could also be applied to high-throughput screening of other sulfide thin film material systems.
The Adverse Outcome Pathway (AOP) framework provides a systematic way to describe linkages between molecular and cellular processes and organism- or population-level effects. The current AOP assembly methods, however, are inefficient. Our goal is to generate computationally-pr...
Cox-nnet: An artificial neural network method for prognosis prediction of high-throughput omics data
Ching, Travers; Zhu, Xun
2018-01-01
Artificial neural networks (ANN) are computing architectures with many interconnections of simple neural-inspired computing elements, and have been applied to biomedical fields such as imaging analysis and diagnosis. We have developed a new ANN framework called Cox-nnet to predict patient prognosis from high-throughput transcriptomics data. In 10 TCGA RNA-Seq data sets, Cox-nnet achieves the same or better predictive accuracy compared to other methods, including Cox-proportional hazards regression (with LASSO, ridge, and minimax concave penalty), Random Forests Survival and CoxBoost. Cox-nnet also reveals richer biological information, at both the pathway and gene levels. The outputs from the hidden layer nodes provide an alternative approach for survival-sensitive dimension reduction. In summary, we have developed a new method for accurate and efficient prognosis prediction on high-throughput data, with functional biological insights. The source code is freely available at https://github.com/lanagarmire/cox-nnet. PMID:29634719
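The heart of such a model is the Cox partial likelihood used as the training objective. A minimal NumPy sketch of that objective, not the released code (which is at the URL above), follows.

```python
# Negative Cox partial log-likelihood: the network outputs a log-hazard
# score theta per patient; training minimizes this quantity.
import numpy as np

def neg_cox_partial_loglik(theta, time, event):
    """theta: (n,) log-hazard scores; time: (n,) follow-up; event: (n,) 0/1."""
    order = np.argsort(-time)                    # sort by descending time
    theta, event = theta[order], event[order]
    # log of the cumulative sum over the risk set {j : t_j >= t_i}
    log_risk = np.logaddexp.accumulate(theta)
    return -np.sum((theta - log_risk)[event == 1])

theta = np.array([0.2, -1.0, 0.7, 0.1])
time = np.array([5.0, 8.0, 2.0, 6.0])
event = np.array([1, 0, 1, 1])
print(neg_cox_partial_loglik(theta, time, event))
```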
Green, Martin L.; Choi, C. L.; Hattrick-Simpers, J. R.; ...
2017-03-28
The Materials Genome Initiative, a national effort to introduce new materials into the market faster and at lower cost, has made significant progress in computational simulation and modeling of materials. To build on this progress, a large amount of experimental data for validating these models, and informing more sophisticated ones, will be required. High-throughput experimentation generates large volumes of experimental data using combinatorial materials synthesis and rapid measurement techniques, making it an ideal experimental complement to bring the Materials Genome Initiative vision to fruition. This paper reviews the state-of-the-art results, opportunities, and challenges in high-throughput experimentation for materials design. As a result, a major conclusion is that an effort to deploy a federated network of high-throughput experimental (synthesis and characterization) tools, which are integrated with a modern materials data infrastructure, is needed.
Schulthess, Pascal; van Wijk, Rob C; Krekels, Elke H J; Yates, James W T; Spaink, Herman P; van der Graaf, Piet H
2018-04-25
To advance the systems approach in pharmacology, experimental models and computational methods need to be integrated from early drug discovery onward. Here, we propose outside-in model development, a model identification technique to understand and predict the dynamics of a system without requiring prior biological and/or pharmacological knowledge. The advanced data required could be obtained by whole vertebrate, high-throughput, low-resource dose-exposure-effect experimentation with the zebrafish larva. Combinations of these innovative techniques could improve early drug discovery. © 2018 The Authors CPT: Pharmacometrics & Systems Pharmacology published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.
NASA Astrophysics Data System (ADS)
Regmi, Raju; Mohan, Kavya; Mondal, Partha Pratim
2014-09-01
Visualization of intracellular organelles is achieved using a newly developed high-throughput imaging cytometry system. This system interrogates the microfluidic channel using a sheet of light rather than the existing point-based scanning techniques. The advantages of the developed system are many, including single-shot scanning of specimens flowing through the microfluidic channel at flow rates ranging from microliters to nanoliters per minute. Moreover, this opens up in vivo imaging of sub-cellular structures and simultaneous cell counting in an imaging cytometry system. We recorded a maximum count of 2400 cells/min at a flow rate of 700 nl/min, and simultaneous visualization of the fluorescently-labeled mitochondrial network in HeLa cells during flow. The developed imaging cytometry system may find immediate application in biotechnology, fluorescence microscopy and nano-medicine.
Computational approaches to protein inference in shotgun proteomics
2012-01-01
Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. As a consequence of its high-throughput nature, shotgun proteomics faces challenges with respect to the analysis and interpretation of experimental data. Among such challenges, the identification of proteins present in a sample has been recognized as an important computational task. This task generally consists of (1) assigning experimental tandem mass spectra to peptides derived from a protein database, and (2) mapping assigned peptides to proteins and quantifying the confidence of identified proteins. Protein identification is fundamentally a statistical inference problem with a number of methods proposed to address its challenges. In this review we categorize current approaches into rule-based, combinatorial optimization and probabilistic inference techniques, and present them using integer programming and Bayesian inference frameworks. We also discuss the main challenges of protein identification and propose potential solutions with the goal of spurring innovative research in this area. PMID:23176300
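As a toy illustration of the combinatorial-optimization view, greedy set cover can stand in for parsimonious protein inference; real tools layer probabilistic scoring on top of this idea.

```python
# Greedy set cover: choose a minimal protein set explaining all confidently
# identified peptides.
def greedy_protein_inference(protein_to_peptides):
    remaining = set().union(*protein_to_peptides.values())
    selected = []
    while remaining:
        # Pick the protein explaining the most still-unexplained peptides.
        best = max(protein_to_peptides,
                   key=lambda p: len(protein_to_peptides[p] & remaining))
        selected.append(best)
        remaining -= protein_to_peptides[best]
    return selected

db = {
    "P1": {"pepA", "pepB", "pepC"},
    "P2": {"pepB"},               # subsumed by P1
    "P3": {"pepD", "pepE"},
}
print(greedy_protein_inference(db))  # -> ['P1', 'P3']
```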
Active-learning strategies in computer-assisted drug discovery.
Reker, Daniel; Schneider, Gisbert
2015-04-01
High-throughput compound screening is time and resource consuming, and considerable effort is invested into screening compound libraries, profiling, and selecting the most promising candidates for further testing. Active-learning methods assist the selection process by focusing on areas of chemical space that have the greatest chance of success while considering structural novelty. The core feature of these algorithms is their ability to adapt the structure-activity landscapes through feedback. Instead of full-deck screening, only focused subsets of compounds are tested, and the experimental readout is used to refine molecule selection for subsequent screening cycles. Once implemented, these techniques have the potential to reduce costs and save precious materials. Here, we provide a comprehensive overview of the various computational active-learning approaches and outline their potential for drug discovery. Copyright © 2014 Elsevier Ltd. All rights reserved.
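A minimal sketch of one such feedback loop follows: uncertainty sampling with a random forest on synthetic stand-in data, illustrating the general cycle rather than any specific published method.

```python
# Active-learning cycle: train on tested compounds, then send the most
# uncertain untested candidates to the next screening round.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(500, 16))            # fingerprint-like features
y_oracle = (X_pool[:, 0] + X_pool[:, 1] > 0)   # stand-in for assay readout

tested = list(range(20))                       # initial random screen
for cycle in range(5):
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_pool[tested], y_oracle[tested])
    untested = np.setdiff1d(np.arange(len(X_pool)), tested)
    proba = model.predict_proba(X_pool[untested])[:, 1]
    uncertainty = -np.abs(proba - 0.5)         # closest to 0.5 = most uncertain
    batch = untested[np.argsort(uncertainty)[-16:]]   # next compounds to assay
    tested.extend(batch.tolist())
print("compounds tested:", len(tested))
```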
High-Throughput Non-Contact Vitrification of Cell-Laden Droplets Based on Cell Printing
NASA Astrophysics Data System (ADS)
Shi, Meng; Ling, Kai; Yong, Kar Wey; Li, Yuhui; Feng, Shangsheng; Zhang, Xiaohui; Pingguan-Murphy, Belinda; Lu, Tian Jian; Xu, Feng
2015-12-01
Cryopreservation is the most promising way for long-term storage of biological samples e.g., single cells and cellular structures. Among various cryopreservation methods, vitrification is advantageous by employing high cooling rate to avoid the formation of harmful ice crystals in cells. Most existing vitrification methods adopt direct contact of cells with liquid nitrogen to obtain high cooling rates, which however causes the potential contamination and difficult cell collection. To address these limitations, we developed a non-contact vitrification device based on an ultra-thin freezing film to achieve high cooling/warming rate and avoid direct contact between cells and liquid nitrogen. A high-throughput cell printer was employed to rapidly generate uniform cell-laden microdroplets into the device, where the microdroplets were hung on one side of the film and then vitrified by pouring the liquid nitrogen onto the other side via boiling heat transfer. Through theoretical and experimental studies on vitrification processes, we demonstrated that our device offers a high cooling/warming rate for vitrification of the NIH 3T3 cells and human adipose-derived stem cells (hASCs) with maintained cell viability and differentiation potential. This non-contact vitrification device provides a novel and effective way to cryopreserve cells at high throughput and avoid the contamination and collection problems.
High-Throughput Non-Contact Vitrification of Cell-Laden Droplets Based on Cell Printing
Shi, Meng; Ling, Kai; Yong, Kar Wey; Li, Yuhui; Feng, Shangsheng; Zhang, Xiaohui; Pingguan-Murphy, Belinda; Lu, Tian Jian; Xu, Feng
2015-01-01
Cryopreservation is the most promising way for long-term storage of biological samples e.g., single cells and cellular structures. Among various cryopreservation methods, vitrification is advantageous by employing high cooling rate to avoid the formation of harmful ice crystals in cells. Most existing vitrification methods adopt direct contact of cells with liquid nitrogen to obtain high cooling rates, which however causes the potential contamination and difficult cell collection. To address these limitations, we developed a non-contact vitrification device based on an ultra-thin freezing film to achieve high cooling/warming rate and avoid direct contact between cells and liquid nitrogen. A high-throughput cell printer was employed to rapidly generate uniform cell-laden microdroplets into the device, where the microdroplets were hung on one side of the film and then vitrified by pouring the liquid nitrogen onto the other side via boiling heat transfer. Through theoretical and experimental studies on vitrification processes, we demonstrated that our device offers a high cooling/warming rate for vitrification of the NIH 3T3 cells and human adipose-derived stem cells (hASCs) with maintained cell viability and differentiation potential. This non-contact vitrification device provides a novel and effective way to cryopreserve cells at high throughput and avoid the contamination and collection problems. PMID:26655688
Automated image alignment for 2D gel electrophoresis in a high-throughput proteomics pipeline.
Dowsey, Andrew W; Dunn, Michael J; Yang, Guang-Zhong
2008-04-01
The quest for high-throughput proteomics has revealed a number of challenges in recent years. Whilst substantial improvements in automated protein separation with liquid chromatography and mass spectrometry (LC/MS), aka 'shotgun' proteomics, have been achieved, large-scale open initiatives such as the Human Proteome Organization (HUPO) Brain Proteome Project have shown that maximal proteome coverage is only possible when LC/MS is complemented by 2D gel electrophoresis (2-DE) studies. Moreover, both separation methods require automated alignment and differential analysis to relieve the bioinformatics bottleneck and so make high-throughput protein biomarker discovery a reality. The purpose of this article is to describe a fully automatic image alignment framework for the integration of 2-DE into a high-throughput differential expression proteomics pipeline. The proposed method is based on robust automated image normalization (RAIN) to circumvent the drawbacks of traditional approaches. These use symbolic representation at the very early stages of the analysis, which introduces persistent errors due to inaccuracies in modelling and alignment. In RAIN, a third-order volume-invariant B-spline model is incorporated into a multi-resolution schema to correct for geometric and expression inhomogeneity at multiple scales. The normalized images can then be compared directly in the image domain for quantitative differential analysis. Through evaluation against an existing state-of-the-art method on real and synthetically warped 2D gels, the proposed analysis framework demonstrates substantial improvements in matching accuracy and differential sensitivity. High-throughput analysis is established through an accelerated GPGPU (general purpose computation on graphics cards) implementation. Supplementary material, software and images used in the validation are available at http://www.proteomegrid.org/rain/.
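RAIN itself fits a multi-resolution, volume-invariant B-spline model; as a much simpler stand-in for the idea of matching gel images directly in the image domain, the sketch below recovers a rigid x-y shift between two images by FFT phase correlation (synthetic data, not the RAIN algorithm).

```python
# Simplified stand-in for gel image alignment: recover a rigid (dy, dx)
# shift between two images via FFT phase correlation. RAIN corrects far
# more general geometric and expression distortions.
import numpy as np

def phase_correlate(a: np.ndarray, b: np.ndarray) -> tuple:
    """Return the (dy, dx) shift of image b relative to image a."""
    F = np.conj(np.fft.fft2(a)) * np.fft.fft2(b)
    F /= np.abs(F) + 1e-12                    # normalized cross-power spectrum
    corr = np.fft.ifft2(F).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map wrap-around indices to signed shifts.
    dy = dy - a.shape[0] if dy > a.shape[0] // 2 else dy
    dx = dx - a.shape[1] if dx > a.shape[1] // 2 else dx
    return dy, dx

rng = np.random.default_rng(7)
gel = rng.random((128, 128))
shifted = np.roll(gel, shift=(5, -8), axis=(0, 1))
print("recovered shift:", phase_correlate(gel, shifted))  # (5, -8)
```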
Sequence-Based Prediction of RNA-Binding Residues in Proteins.
Walia, Rasna R; El-Manzalawy, Yasser; Honavar, Vasant G; Dobbs, Drena
2017-01-01
Identifying individual residues in the interfaces of protein-RNA complexes is important for understanding the molecular determinants of protein-RNA recognition and has many potential applications. Recent technical advances have led to several high-throughput experimental methods for identifying partners in protein-RNA complexes, but determining RNA-binding residues in proteins is still expensive and time-consuming. This chapter focuses on available computational methods for identifying which amino acids in an RNA-binding protein participate directly in contacting RNA. Step-by-step protocols for using three different web-based servers to predict RNA-binding residues are described. In addition, currently available web servers and software tools for predicting RNA-binding sites, as well as databases that contain valuable information about known protein-RNA complexes, RNA-binding motifs in proteins, and protein-binding recognition sites in RNA are provided. We emphasize sequence-based methods that can reliably identify interfacial residues without the requirement for structural information regarding either the RNA-binding protein or its RNA partner.
Sequence-Based Prediction of RNA-Binding Residues in Proteins
Walia, Rasna R.; EL-Manzalawy, Yasser; Honavar, Vasant G.; Dobbs, Drena
2017-01-01
Identifying individual residues in the interfaces of protein–RNA complexes is important for understanding the molecular determinants of protein–RNA recognition and has many potential applications. Recent technical advances have led to several high-throughput experimental methods for identifying partners in protein–RNA complexes, but determining RNA-binding residues in proteins is still expensive and time-consuming. This chapter focuses on available computational methods for identifying which amino acids in an RNA-binding protein participate directly in contacting RNA. Step-by-step protocols for using three different web-based servers to predict RNA-binding residues are described. In addition, currently available web servers and software tools for predicting RNA-binding sites, as well as databases that contain valuable information about known protein–RNA complexes, RNA-binding motifs in proteins, and protein-binding recognition sites in RNA are provided. We emphasize sequence-based methods that can reliably identify interfacial residues without the requirement for structural information regarding either the RNA-binding protein or its RNA partner. PMID:27787829
ACToR - Aggregated Computational Toxicology Resource
ACToR (Aggregated Computational Toxicology Resource) is a database and set of software applications that bring into one central location many types and sources of data on environmental chemicals. Currently, the ACToR chemical database contains information on chemical structure, in vitro bioassays and in vivo toxicology assays derived from more than 150 sources including the U.S. Environmental Protection Agency (EPA), Centers for Disease Control (CDC), U.S. Food & Drug Administration (FDA), National Institutes of Health (NIH), state agencies, corresponding government agencies in Canada, Europe and Japan, universities, the World Health Organization (WHO) and non-governmental organizations (NGOs). At the EPA National Center for Computational Toxicology, ACToR helps manage large data sets being used in a high throughput environmental chemical screening and prioritization program called ToxCast(TM).
ACToR - Aggregated Computational Toxicology Resource
DOE Office of Scientific and Technical Information (OSTI.GOV)
Judson, Richard; Richard, Ann; Dix, David
2008-11-15
ACToR (Aggregated Computational Toxicology Resource) is a database and set of software applications that bring into one central location many types and sources of data on environmental chemicals. Currently, the ACToR chemical database contains information on chemical structure, in vitro bioassays and in vivo toxicology assays derived from more than 150 sources including the U.S. Environmental Protection Agency (EPA), Centers for Disease Control (CDC), U.S. Food and Drug Administration (FDA), National Institutes of Health (NIH), state agencies, corresponding government agencies in Canada, Europe and Japan, universities, the World Health Organization (WHO) and non-governmental organizations (NGOs). At the EPA National Center for Computational Toxicology, ACToR helps manage large data sets being used in a high-throughput environmental chemical screening and prioritization program called ToxCast™.
Identifying Structural Alerts Based on Zebrafish Developmental Morphological Toxicity (TDS)
Zebrafish constitute a powerful alternative animal model for chemical hazard evaluation. To provide an in vivo complement to high-throughput screening data from the ToxCast program, zebrafish developmental toxicity screens were conducted on the ToxCast Phase I (Padilla et al., 20...
Han, Lianyi; Wang, Yanli; Bryant, Stephen H
2008-09-25
Recent advances in high-throughput screening (HTS) techniques and readily available compound libraries generated using combinatorial chemistry or derived from natural products enable the testing of millions of compounds in a matter of days. Due to the amount of information produced by HTS assays, it is a very challenging task to mine the HTS data for information of potential interest to drug development research. Computational approaches for the analysis of HTS results face great challenges due to the large quantity of information and significant amounts of erroneous data produced. In this study, Decision Tree (DT) based models were developed to discriminate compound bioactivities by using their chemical structure fingerprints provided in the PubChem system http://pubchem.ncbi.nlm.nih.gov. The DT models were examined for filtering biological activity data contained in four assays deposited in the PubChem Bioassay Database, including assays tested for 5HT1a agonists, antagonists, and HIV-1 RT-RNase H inhibitors. The 10-fold Cross Validation (CV) sensitivity, specificity and Matthews Correlation Coefficient (MCC) for the models are 57.2-80.5%, 97.3-99.0%, and 0.4-0.5, respectively. A further evaluation was also performed for DT models built for two independent bioassays, where inhibitors for the same HIV RNase target were screened using different compound libraries; this experiment yielded enrichment factors of 4.4 and 9.7. Our results suggest that the designed DT models can be used as a virtual screening technique as well as a complement to traditional approaches for hit selection.
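A hedged sketch of this workflow: train a decision tree on binary structure fingerprints and report 10-fold cross-validated sensitivity, specificity, and MCC. The data below are random placeholders rather than the PubChem bioassays.

```python
# Sketch of a decision-tree bioactivity filter: binary fingerprints in,
# 10-fold cross-validated sensitivity, specificity and MCC out.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix, matthews_corrcoef

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(1000, 881))   # PubChem-style 881-bit fingerprints
y = (X[:, :20].sum(axis=1) > 10).astype(int)   # synthetic "activity"

clf = DecisionTreeClassifier(max_depth=8, random_state=0)
y_hat = cross_val_predict(clf, X, y, cv=10)

tn, fp, fn, tp = confusion_matrix(y, y_hat).ravel()
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("MCC:", matthews_corrcoef(y, y_hat))
```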
[Methods of high-throughput plant phenotyping for large-scale breeding and genetic experiments].
Afonnikov, D A; Genaev, M A; Doroshkov, A V; Komyshev, E G; Pshenichnikova, T A
2016-07-01
Phenomics is a field of science at the junction of biology and informatics that addresses the rapid, accurate estimation of plant phenotypes; it developed rapidly because of the need to analyze phenotypic characteristics in large-scale genetic and breeding experiments in plants. It is based on methods of computer image analysis and the integration of biological data. Owing to automation, new approaches make it possible to considerably accelerate the process of estimating the characteristics of a phenotype, to increase its accuracy, and to remove the subjectivity inherent in human assessment. The main technologies of high-throughput plant phenotyping in both controlled and field conditions, their advantages and disadvantages, and the prospects of their use for the efficient solution of problems in plant genetics and breeding are presented in the review.
Framework for computationally-predicted AOPs
Framework for computationally-predicted AOPs Given that there are a vast number of existing and new chemicals in the commercial pipeline, emphasis is placed on developing high throughput screening (HTS) methods for hazard prediction. Adverse Outcome Pathways (AOPs) represent a...
EPA CHEMICAL PRIORITIZATION COMMUNITY OF PRACTICE.
In 2005, the National Center for Computational Toxicology (NCCT) organized the EPA Chemical Prioritization Community of Practice (CPCP) to provide a forum for discussing the utility of computational chemistry, high-throughput screening (HTS) and various toxicogenomic technologies for ch...
Accessible high-throughput virtual screening molecular docking software for students and educators.
Jacob, Reed B; Andersen, Tim; McDougal, Owen M
2012-05-01
We survey low-cost high-throughput virtual screening (HTVS) computer programs for instructors who wish to demonstrate molecular docking in their courses. Since HTVS programs are a useful adjunct to the time-consuming and expensive wet-bench experiments necessary to discover new drug therapies, the topic of molecular docking is core to the instruction of biochemistry and molecular biology. The availability of HTVS programs, coupled with decreasing costs and advances in computer hardware, has made computational approaches to drug discovery possible at institutional and non-profit budgets. This paper focuses on HTVS programs with graphical user interfaces (GUIs) that use either DOCK or AutoDock for the prediction of ligand binding: DockoMatic, PyRx, DockingServer, and MOLA. These were selected because their utility has been proven by the research community, they are free or affordable, and the programs operate on a range of computer platforms.
TreeMAC: Localized TDMA MAC protocol for real-time high-data-rate sensor networks
Song, W.-Z.; Huang, R.; Shirazi, B.; LaHusen, R.
2009-01-01
Earlier sensor network MAC protocols focus on energy conservation in low-duty-cycle applications, while some recent applications involve real-time high-data-rate signals. This motivates us to design an innovative localized TDMA MAC protocol to achieve high throughput and low congestion in data collection sensor networks, besides energy conservation. TreeMAC divides a time cycle into frames and each frame into slots. A parent node determines the children's frame assignment based on their relative bandwidth demand, and each node calculates its own slot assignment based on its hop count to the sink. This innovative 2-dimensional frame-slot assignment algorithm has the following desirable theoretical properties. First, given any node, at any time slot, there is at most one active sender in its neighborhood (including itself). Second, packet scheduling with TreeMAC is bufferless, which therefore minimizes the probability of network congestion. Third, the data throughput to the gateway is at least 1/3 of the optimum assuming reliable links. Our experiments on a 24-node testbed show that the TreeMAC protocol significantly improves network throughput, fairness, and energy efficiency compared to TinyOS's default CSMA MAC protocol and a recent TDMA MAC protocol, Funneling-MAC. Partial results of this paper were published in Song, Huang, Shirazi and Lahusen [W.-Z. Song, R. Huang, B. Shirazi, and R. Lahusen, TreeMAC: Localized TDMA MAC protocol for high-throughput and fairness in sensor networks, in: The 7th Annual IEEE International Conference on Pervasive Computing and Communications, PerCom, March 2009]. Our new contributions include analyses of TreeMAC's performance from various aspects, together with more implementation detail. © 2009 Elsevier B.V.
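A toy rendering of the 2-dimensional frame-slot assignment described above, with details simplified: a parent splits a cycle's frames among children in proportion to bandwidth demand, and a node's slot follows from its hop count to the sink. Function names and parameters are illustrative, not from the TreeMAC implementation.

```python
# Illustrative sketch of a TreeMAC-style two-dimensional schedule.
def assign_frames(n_frames: int, child_demand: dict) -> dict:
    """Give each child a share of frames proportional to its bandwidth demand."""
    total = sum(child_demand.values())
    shares, given = {}, 0
    for child, demand in sorted(child_demand.items()):
        k = max(1, round(n_frames * demand / total))
        shares[child] = list(range(given, min(given + k, n_frames)))
        given += k
    return shares

def slot_in_frame(hop_count: int, slots_per_frame: int = 3) -> int:
    """Simplified rule: slot index derives from hop distance to the sink."""
    return hop_count % slots_per_frame

frames = assign_frames(12, {"A": 2, "B": 1, "C": 1})  # A carries more traffic
print(frames)
print("node at hop 4 transmits in slot", slot_in_frame(4))
```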
Schaaf, Tory M.; Peterson, Kurt C.; Grant, Benjamin D.; Bawaskar, Prachi; Yuen, Samantha; Li, Ji; Muretta, Joseph M.; Gillispie, Gregory D.; Thomas, David D.
2017-01-01
A robust high-throughput screening (HTS) strategy has been developed to discover small-molecule effectors targeting the sarco/endoplasmic reticulum calcium ATPase (SERCA), based on a fluorescence microplate reader that records both the nanosecond decay waveform (lifetime mode) and the complete emission spectrum (spectral mode), with high precision and speed. This spectral unmixing plate reader (SUPR) was used to screen libraries of small molecules with a fluorescence resonance energy transfer (FRET) biosensor expressed in living cells. Ligand binding was detected by FRET associated with structural rearrangements of green (GFP, donor) and red (RFP, acceptor) fluorescent proteins fused to the cardiac-specific SERCA2a isoform. The results demonstrate accurate quantitation of FRET along with high precision of hit identification. Fluorescence lifetime analysis resolved SERCA’s distinct structural states, providing a method to classify small-molecule chemotypes on the basis of their structural effect on the target. The spectral analysis was also applied to flag interference by fluorescent compounds. FRET hits were further evaluated for functional effects on SERCA’s ATPase activity via both a coupled-enzyme assay and a FRET-based calcium sensor. Concentration-response curves indicated excellent correlation between FRET and function. These complementary spectral and lifetime FRET detection methods offer an attractive combination of precision, speed, and resolution for HTS. PMID:27899691
NASA Astrophysics Data System (ADS)
Soundararajan, Venky; Aravamudan, Murali
2014-12-01
The efficacy and mechanisms of therapeutic action are largely described by atomic bonds and interactions local to drug binding sites. Here we introduce global connectivity analysis as a high-throughput computational assay of therapeutic action - inspired by the Google PageRank algorithm that unearths the most "globally connected" websites from the information-dense world wide web (WWW). We execute short-timescale (30 ps) molecular dynamics simulations with high sampling frequency (0.01 ps) to identify amino acid residue hubs whose global connectivity dynamics are characteristic of the ligand or mutation associated with the target protein. We find that unexpected allosteric hubs - up to 20 Å from the ATP binding site, but within 5 Å of the phosphorylation site - encode the Gibbs free energy of inhibition (ΔG_inhibition) for select protein kinase-targeted cancer therapeutics. We further find that clinically relevant somatic cancer mutations implicated in both drug resistance and personalized drug sensitivity can be predicted in a high-throughput fashion. Our results establish global connectivity analysis as a potent assay of protein functional modulation. This sets the stage for unearthing disease-causal exome mutations and motivates forecasts of clinical drug response on a patient-by-patient basis. We suggest incorporation of structure-guided genetic inference assays into pharmaceutical and healthcare oncology workflows.
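Since the assay is PageRank-inspired, a minimal sketch is easy to state: power iteration over a residue-residue contact network. The adjacency matrix below is random; in the study it would be derived from the MD contact dynamics.

```python
# Minimal "global connectivity" scoring: PageRank-style power iteration
# on a residue-residue contact network (random placeholder contacts).
import numpy as np

def pagerank(adj: np.ndarray, damping: float = 0.85, iters: int = 100) -> np.ndarray:
    n = adj.shape[0]
    out = adj.sum(axis=0)
    out[out == 0] = 1.0
    M = adj / out                       # column-normalize the contact weights
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        rank = (1 - damping) / n + damping * M @ rank
    return rank

rng = np.random.default_rng(2)
contacts = (rng.random((50, 50)) < 0.1).astype(float)
contacts = np.maximum(contacts, contacts.T)   # symmetric contact map
np.fill_diagonal(contacts, 0.0)
scores = pagerank(contacts)
print("top hub residues:", np.argsort(scores)[-5:])
```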
Probabilistic Assessment of High-Throughput Wireless Sensor Networks
Kim, Robin E.; Mechitov, Kirill; Sim, Sung-Han; Spencer, Billie F.; Song, Junho
2016-01-01
Structural health monitoring (SHM) using wireless smart sensors (WSS) has the potential to provide rich information on the state of a structure. However, because of their distributed nature, maintaining highly robust and reliable networks can be challenging. Assessing WSS network communication quality before and after finalizing a deployment is critical to achieving a successful WSS network for SHM purposes. Early studies on WSS network reliability mostly used temporal signal indicators, composed of a smaller number of packets, to assess network reliability. However, because WSS networks for SHM purposes often require high data throughput, i.e., a larger number of packets are delivered within the communication, such an approach is not sufficient. Instead, in this study, a model that can assess, probabilistically, the long-term performance of the network is proposed. The proposed model is based on readily available measured data sets that represent communication quality during high-throughput data transfer. Then, an empirical limit-state function is determined, which is further used to estimate the probability of network communication failure. Monte Carlo simulation is adopted in this paper and applied to a small-scale and a full-bridge wireless network. By performing the proposed analysis on complex sensor networks, an optimized sensor topology can be achieved. PMID:27258270
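A compact sketch of the proposed Monte Carlo assessment, under assumed numbers: draw communication-quality samples, evaluate an empirical limit-state function g(x) with failure when g < 0, and estimate the failure probability. The throughput distribution and threshold are invented.

```python
# Hedged sketch of Monte Carlo reliability assessment with an empirical
# limit-state function; distribution parameters are illustrative only.
import numpy as np

rng = np.random.default_rng(3)

def limit_state(throughput_kbps: np.ndarray, required_kbps: float = 80.0) -> np.ndarray:
    """g(x) = delivered throughput minus what the SHM application needs."""
    return throughput_kbps - required_kbps

# Synthetic stand-in for measured high-throughput transfer quality.
samples = rng.normal(loc=100.0, scale=15.0, size=100_000)
p_fail = np.mean(limit_state(samples) < 0.0)
print(f"estimated probability of communication failure: {p_fail:.4f}")
```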
Mass spectrometry-driven drug discovery for development of herbal medicine.
Zhang, Aihua; Sun, Hui; Wang, Xijun
2018-05-01
Herbal medicine (HM) has made a major contribution to the drug discovery process with regard to identifying product compounds. Currently, more attention is being focused on drug discovery from natural compounds of HM. Despite the rapid advancement of modern analytical techniques, drug discovery is still a difficult and lengthy process. Fortunately, mass spectrometry (MS), which can provide useful structural information for drug discovery, has been recognized as a sensitive, rapid, and high-throughput technology for advancing drug discovery from HM in the post-genomic era. It is essential to develop an efficient, high-quality, high-throughput screening method integrated with an MS platform for early screening of candidate drug molecules from natural products. We have developed a new chinmedomics strategy reliant on MS that is capable of capturing the candidate molecules and facilitating the identification of novel chemical structures in the early phase; chinmedomics-guided natural product discovery based on MS may provide an effective tool that addresses challenges in the early screening of effective constituents of herbs against disease. This critical review covers the use of MS with related techniques and methodologies for natural product discovery, biomarker identification, and determination of mechanisms of action. It also highlights high-throughput chinmedomics screening methods suitable for lead compound discovery, illustrated by recent successes. © 2016 Wiley Periodicals, Inc.
Benchmarking high performance computing architectures with CMS’ skeleton framework
Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.
2017-11-23
Here, in 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high-throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel's Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high-performance computing resources that use new many-core architectures; machines such as Cori Phase 1 & 2, Theta, Mira. Because of this we have revived the 2012 benchmark to test its performance and conclusions on these new architectures. This talk will discuss the results of this exercise.
Benchmarking high performance computing architectures with CMS’ skeleton framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.
Here, in 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high-throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel's Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high-performance computing resources that use new many-core architectures; machines such as Cori Phase 1 & 2, Theta, Mira. Because of this we have revived the 2012 benchmark to test its performance and conclusions on these new architectures. This talk will discuss the results of this exercise.
Suram, Santosh K; Newhouse, Paul F; Zhou, Lan; Van Campen, Douglas G; Mehta, Apurva; Gregoire, John M
2016-11-14
Combinatorial materials science strategies have accelerated materials development in a variety of fields, and we extend these strategies to enable structure-property mapping for light absorber materials, particularly in high-order composition spaces. High-throughput optical spectroscopy and synchrotron X-ray diffraction are combined to identify the optical properties of Bi-V-Fe oxides, leading to the identification of Bi4V1.5Fe0.5O10.5 as a light absorber with a direct band gap near 2.7 eV. The strategic combination of experimental and data analysis techniques includes automated Tauc analysis to estimate band gap energies from the high-throughput spectroscopy data, providing an automated platform for identifying new optical materials.
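The automated Tauc analysis mentioned above can be sketched as follows for a direct-gap absorber: form (αhν)² versus photon energy, fit the steep linear region, and take the x-intercept. The spectrum below is synthetic, and the simple 20-80% windowing is an assumption, not the authors' fitting rule.

```python
# Sketch of automated Tauc analysis for a direct band gap: fit the linear
# rise of (alpha*h*nu)^2 and extrapolate to zero absorption.
import numpy as np

E = np.linspace(1.5, 3.5, 400)                       # photon energy (eV)
Eg_true = 2.7
alpha = np.where(E > Eg_true, np.sqrt(E - Eg_true), 0.0) + 0.01  # absorption

tauc = (alpha * E) ** 2                              # direct-gap Tauc quantity
# Fit the linear rise: take points between 20% and 80% of the maximum.
mask = (tauc > 0.2 * tauc.max()) & (tauc < 0.8 * tauc.max())
slope, intercept = np.polyfit(E[mask], tauc[mask], 1)
Eg_est = -intercept / slope                          # x-intercept of the fit
print(f"estimated direct band gap: {Eg_est:.2f} eV")
```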
A Fair Contention Access Scheme for Low-Priority Traffic in Wireless Body Area Networks
Sajeel, Muhammad; Bashir, Faisal; Asfand-e-yar, Muhammad; Tauqir, Muhammad
2017-01-01
Recently, wireless body area networks (WBANs) have attracted significant consideration in ubiquitous healthcare. A number of medium access control (MAC) protocols, primarily derived from the superframe structure of IEEE 802.15.4, have been proposed in the literature. These MAC protocols aim to provide quality of service (QoS) by prioritizing different traffic types in WBANs. A contention access period (CAP) with high contention in priority-based MAC protocols can result in a higher number of collisions and retransmissions. During the CAP, traffic classes with higher priority are dominant over low-priority traffic; this leads to starvation of low-priority traffic, adversely affecting WBAN throughput, delay, and energy consumption. Hence, this paper proposes a traffic-adaptive priority-based superframe structure that is able to reduce contention in the CAP period and provide a fair chance for low-priority traffic. Simulation results in ns-3 demonstrate that the proposed MAC protocol, called traffic-adaptive priority-based MAC (TAP-MAC), achieves low energy consumption, high throughput, and low latency compared to the IEEE 802.15.4 standard and the most recent priority-based MAC protocol (PA-MAC). PMID:28832495
Zhang, Guang Lan; Keskin, Derin B.; Lin, Hsin-Nan; Lin, Hong Huang; DeLuca, David S.; Leppanen, Scott; Milford, Edgar L.; Reinherz, Ellis L.; Brusic, Vladimir
2014-01-01
Human leukocyte antigens (HLA) are important biomarkers because multiple diseases, drug toxicity, and vaccine responses reveal strong HLA associations. Current clinical HLA typing is an elimination process requiring serial testing. We present an alternative in situ synthesized DNA-based microarray method that contains hundreds of thousands of probes representing a complete overlapping set covering 1,610 clinically relevant HLA class I alleles accompanied by computational tools for assigning HLA type to 4-digit resolution. Our proof-of-concept experiment included 21 blood samples, 18 cell lines, and multiple controls. The method is accurate, robust, and amenable to automation. Typing errors were restricted to homozygous samples or those with very closely related alleles from the same locus, but readily resolved by targeted DNA sequencing validation of flagged samples. High-throughput HLA typing technologies that are effective, yet inexpensive, can be used to analyze the world’s populations, benefiting both global public health and personalized health care. PMID:25505899
Bell, Andrew S; Bradley, Joseph; Everett, Jeremy R; Knight, Michelle; Loesel, Jens; Mathias, John; McLoughlin, David; Mills, James; Sharp, Robert E; Williams, Christine; Wood, Terence P
2013-05-01
The screening files of many large companies, including Pfizer, have grown considerably due to internal chemistry efforts, company mergers and acquisitions, external contracted synthesis, or compound purchase schemes. In order to screen the targets of interest in a cost-effective fashion, we devised an easy-to-assemble, plate-based diversity subset (PBDS) that represents almost the entire computed chemical space of the screening file whilst comprising only a fraction of the plates in the collection. In order to create this file, we developed new design principles for the quality assessment of screening plates: the Rule of 40 (Ro40) and a plate selection process that ensured excellent coverage of both library chemistry and legacy chemistry space. This paper describes the rationale, design, construction, and performance of the PBDS, which has evolved into the standard paradigm for singleton (one compound per well) high-throughput screening in Pfizer since its introduction in 2006.
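The Ro40 criterion itself is not spelled out in this abstract, so the sketch below only illustrates the generic diversity-selection ingredient: greedy MaxMin picking on Tanimoto distance over binary fingerprints (random placeholder data, not the Pfizer design).

```python
# Hedged sketch of diversity subset selection: greedy MaxMin picking that
# maximizes the minimum Tanimoto distance to the already-picked set.
import numpy as np

def tanimoto(a: np.ndarray, b: np.ndarray) -> float:
    both = np.logical_and(a, b).sum()
    either = np.logical_or(a, b).sum()
    return both / either if either else 0.0

def maxmin_pick(fps: np.ndarray, n_pick: int) -> list:
    """Greedily pick compounds maximizing distance to the selected subset."""
    picked = [0]
    min_dist = np.array([1.0 - tanimoto(fps[0], fp) for fp in fps])
    while len(picked) < n_pick:
        nxt = int(np.argmax(min_dist))
        picked.append(nxt)
        d_new = np.array([1.0 - tanimoto(fps[nxt], fp) for fp in fps])
        min_dist = np.minimum(min_dist, d_new)
    return picked

rng = np.random.default_rng(4)
library = rng.integers(0, 2, size=(500, 256)).astype(bool)
print("diverse subset indices:", maxmin_pick(library, 10))
```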
Software/hardware distributed processing network supporting the Ada environment
NASA Astrophysics Data System (ADS)
Wood, Richard J.; Pryk, Zen
1993-09-01
A high-performance, fault-tolerant, distributed network has been developed, tested, and demonstrated. The network is based on the MIPS Computer Systems, Inc. R3000 RISC processor, VHSIC ASICs for high-speed, reliable inter-node communications, and compatible commercial memory and I/O boards. The network is an evolution of the Advanced Onboard Signal Processor (AOSP) architecture. It supports Ada application software with an Ada-implemented operating system. A six-node implementation (capable of expansion up to 256 nodes) of the RISC multiprocessor architecture provides 120 MIPS of scalar throughput, 96 Mbytes of RAM and 24 Mbytes of non-volatile memory. The network provides for all ground processing applications, has merit for a space-qualified RISC-based network, and interfaces to advanced Computer Aided Software Engineering (CASE) tools for application software development.
Direct Duplex Detection: An Emerging Tool in the RNA Structure Analysis Toolbox.
Weidmann, Chase A; Mustoe, Anthony M; Weeks, Kevin M
2016-09-01
While a variety of powerful tools exists for analyzing RNA structure, identifying long-range and intermolecular base-pairing interactions has remained challenging. Recently, three groups introduced a high-throughput strategy that uses psoralen-mediated crosslinking to directly identify RNA-RNA duplexes in cells. Initial application of these methods highlights the preponderance of long-range structures within and between RNA molecules and their widespread structural dynamics. Copyright © 2016 Elsevier Ltd. All rights reserved.
Structural protein descriptors in 1-dimension and their sequence-based predictions.
Kurgan, Lukasz; Disfani, Fatemeh Miri
2011-09-01
The last few decades observed an increasing interest in the development and application of 1-dimensional (1D) descriptors of protein structure. These descriptors project 3D structural features onto 1D strings of residue-wise structural assignments. They cover a wide range of structural aspects including conformation of the backbone, burying depth/solvent exposure and flexibility of residues, and inter-chain residue-residue contacts. We perform a first-of-its-kind comprehensive comparative review of the existing 1D structural descriptors. We define, review and categorize ten structural descriptors, and we also describe, summarize and contrast over eighty computational models that are used to predict these descriptors from protein sequences. We show that the majority of the recent sequence-based predictors utilize machine learning models, with the most popular being neural networks, support vector machines, hidden Markov models, and support vector and linear regressions. These methods provide high-throughput predictions and most of them are accessible to a non-expert user via web servers and/or stand-alone software packages. We empirically evaluate several recent sequence-based predictors of secondary structure, disorder, and solvent accessibility descriptors using a benchmark set based on CASP8 targets. Our analysis shows that secondary structure can be predicted with over 80% accuracy and segment overlap (SOV), disorder with over 0.9 AUC, 0.6 Matthews Correlation Coefficient (MCC), and 75% SOV, and relative solvent accessibility with a PCC of 0.7 and MCC of 0.6 (0.86 when homology is used). We demonstrate that the secondary structure predicted from sequence without the use of homology modeling is as good as the structure extracted from the 3D folds predicted by top-performing template-based methods.
Transcriptome-based differentiation of closely-related Miscanthus lines.
Chouvarine, Philippe; Cooksey, Amanda M; McCarthy, Fiona M; Ray, David A; Baldwin, Brian S; Burgess, Shane C; Peterson, Daniel G
2012-01-01
Distinguishing between individuals is critical to those conducting animal/plant breeding, food safety/quality research, diagnostic and clinical testing, and evolutionary biology studies. Classical genetic identification studies are based on marker polymorphisms, but polymorphism-based techniques are time and labor intensive and often cannot distinguish between closely related individuals. Illumina sequencing technologies provide the detailed sequence data required for rapid and efficient differentiation of related species, lines/cultivars, and individuals in a cost-effective manner. Here we describe the use of Illumina high-throughput exome sequencing, coupled with SNP mapping, as a rapid means of distinguishing between related cultivars of the lignocellulosic bioenergy crop giant miscanthus (Miscanthus × giganteus). We provide the first exome sequence database for Miscanthus species complete with Gene Ontology (GO) functional annotations. A SNP comparative analysis of rhizome-derived cDNA sequences was successfully utilized to distinguish three Miscanthus × giganteus cultivars from each other and from other Miscanthus species. Moreover, the resulting phylogenetic tree generated from SNP frequency data parallels the known breeding history of the plants examined. Some of the giant miscanthus plants exhibit considerable sequence divergence. Here we describe an analysis of Miscanthus in which high-throughput exome sequencing was utilized to differentiate between closely related genotypes despite the current lack of a reference genome sequence. We functionally annotated the exome sequences and provide resources to support Miscanthus systems biology. In addition, we demonstrate the use of commercial high-performance cloud computing for computational GO annotation.
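A minimal sketch of differentiation from SNP data: pairwise distances between per-site SNP frequency profiles followed by hierarchical clustering. The cultivar names and frequencies below are fabricated stand-ins for the exome-derived calls.

```python
# Sketch of cultivar differentiation: cluster samples by distance between
# their per-site alternate-allele frequency profiles (fabricated data).
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

cultivars = ["Mxg-A", "Mxg-B", "Mxg-C", "M.sinensis"]
snp_freq = np.array([
    [0.10, 0.95, 0.50, 0.02],   # per-site alternate-allele frequencies
    [0.12, 0.93, 0.48, 0.05],
    [0.15, 0.90, 0.55, 0.01],
    [0.80, 0.10, 0.05, 0.90],
])

tree = linkage(pdist(snp_freq), method="average")
info = dendrogram(tree, labels=cultivars, no_plot=True)
print("leaf order reflecting relatedness:", info["ivl"])
```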
A Disk-Based System for Producing and Distributing Science Products from MODIS
NASA Technical Reports Server (NTRS)
Masuoka, Edward; Wolfe, Robert; Sinno, Scott; Ye Gang; Teague, Michael
2007-01-01
Since beginning operations in 1999, the MODIS Adaptive Processing System (MODAPS) has evolved to take advantage of trends in information technology, such as the falling cost of computing cycles and disk storage and the availability of high quality open-source software (Linux, Apache and Perl), to achieve substantial gains in processing and distribution capacity and throughput while driving down the cost of system operations.
Zhou, Bailing; Zhao, Huiying; Yu, Jiafeng; Guo, Chengang; Dou, Xianghua; Song, Feng; Hu, Guodong; Cao, Zanxia; Qu, Yuanxu; Yang, Yuedong; Zhou, Yaoqi; Wang, Jihua
2018-01-04
Long non-coding RNAs (lncRNAs) play important functional roles in various biological processes. Early databases were utilized to deposit all lncRNA candidates produced by high-throughput experimental and/or computational techniques to facilitate classification, assessment and validation. As more lncRNAs are validated by low-throughput experiments, several databases were established for experimentally validated lncRNAs. However, these databases are small in scale (with a few hundreds of lncRNAs only) and specific in their focuses (plants, diseases or interactions). Thus, it is highly desirable to have a comprehensive dataset for experimentally validated lncRNAs as a central repository for all of their structures, functions and phenotypes. Here, we established EVLncRNAs by curating lncRNAs validated by low-throughput experiments (up to 1 May 2016) and integrating specific databases (lncRNAdb, LncRNADisease, Lnc2Cancer and PLNIncRBase) with additional functional and disease-specific information not covered previously. The current version of EVLncRNAs contains 1543 lncRNAs from 77 species, which is 2.9 times larger than the current largest database for experimentally validated lncRNAs. Seventy-four percent of the lncRNA entries are partially or completely new compared to all existing experimentally validated databases. The established database allows users to browse, search and download as well as to submit experimentally validated lncRNAs. The database is available at http://biophy.dzu.edu.cn/EVLncRNAs. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Zhao, Huiying; Yu, Jiafeng; Guo, Chengang; Dou, Xianghua; Song, Feng; Hu, Guodong; Cao, Zanxia; Qu, Yuanxu
2018-01-01
Long non-coding RNAs (lncRNAs) play important functional roles in various biological processes. Early databases were utilized to deposit all lncRNA candidates produced by high-throughput experimental and/or computational techniques to facilitate classification, assessment and validation. As more lncRNAs are validated by low-throughput experiments, several databases were established for experimentally validated lncRNAs. However, these databases are small in scale (with a few hundreds of lncRNAs only) and specific in their focuses (plants, diseases or interactions). Thus, it is highly desirable to have a comprehensive dataset for experimentally validated lncRNAs as a central repository for all of their structures, functions and phenotypes. Here, we established EVLncRNAs by curating lncRNAs validated by low-throughput experiments (up to 1 May 2016) and integrating specific databases (lncRNAdb, LncRNADisease, Lnc2Cancer and PLNIncRBase) with additional functional and disease-specific information not covered previously. The current version of EVLncRNAs contains 1543 lncRNAs from 77 species, which is 2.9 times larger than the current largest database for experimentally validated lncRNAs. Seventy-four percent of the lncRNA entries are partially or completely new compared to all existing experimentally validated databases. The established database allows users to browse, search and download as well as to submit experimentally validated lncRNAs. The database is available at http://biophy.dzu.edu.cn/EVLncRNAs. PMID:28985416
DOE Office of Scientific and Technical Information (OSTI.GOV)
Curtis, Darren S.; Peterson, Elena S.; Oehmen, Chris S.
2008-05-04
This work presents the ScalaBLAST Web Application (SWA), a web-based application implemented using the PHP scripting language, MySQL DBMS, and Apache web server under a GNU/Linux platform. SWA is an application built as part of the Data Intensive Computer for Complex Biological Systems (DICCBS) project at the Pacific Northwest National Laboratory (PNNL). SWA delivers accelerated throughput of bioinformatics analysis via high-performance computing through a convenient, easy-to-use web interface. This approach greatly enhances emerging fields of study in biology such as ontology-based homology, and multiple whole-genome comparisons which, in the absence of a tool like SWA, require a heroic effort to overcome the computational bottleneck associated with genome analysis. The current version of SWA includes a user account management system, a web-based user interface, and a backend process that generates the files necessary for the Internet scientific community to submit a ScalaBLAST parallel processing job on a dedicated cluster.
Quesada-Cabrera, Raul; Weng, Xiaole; Hyett, Geoff; Clark, Robin J H; Wang, Xue Z; Darr, Jawwad A
2013-09-09
High-throughput continuous hydrothermal flow synthesis was used to manufacture 66 unique nanostructured oxide samples in the Ce-Zr-Y-O system. This synthesis approach resulted in a significant increase in throughput compared to that of conventional batch or continuous hydrothermal synthesis methods. The as-prepared library samples were placed into a wellplate for both automated high-throughput powder X-ray diffraction and Raman spectroscopy data collection, which allowed comprehensive structural characterization and phase mapping. The data suggested that a continuous cubic-like phase field connects all three Ce-Zr-O, Ce-Y-O, and Y-Zr-O binary systems together with a smooth and steady transition between the structures of neighboring compositions. The continuous hydrothermal process led to as-prepared crystallite sizes in the range of 2-7 nm (as determined by using the Scherrer equation).
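The 2-7 nm crystallite sizes quoted above come from the Scherrer equation, D = Kλ/(β cos θ); a small helper makes the arithmetic explicit (the peak parameters are illustrative, and the Cu Kα wavelength and shape factor K = 0.9 are assumptions).

```python
# Sketch of a Scherrer-equation crystallite-size estimate from XRD peak
# broadening: D = K * lambda / (beta * cos(theta)).
import math

def scherrer(fwhm_deg: float, two_theta_deg: float,
             wavelength_nm: float = 0.15406, K: float = 0.9) -> float:
    """Crystallite size in nm from peak FWHM (degrees 2-theta, Cu K-alpha)."""
    beta = math.radians(fwhm_deg)             # peak breadth in radians
    theta = math.radians(two_theta_deg / 2.0)
    return K * wavelength_nm / (beta * math.cos(theta))

# A broad fluorite-type reflection near 29 degrees 2-theta:
print(f"crystallite size: {scherrer(2.0, 29.0):.1f} nm")   # about 4 nm
```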
Identifying chemicals that provide a specific function within a product, yet have minimal impact on the human body or environment, is the goal of most formulation chemists and engineers practicing green chemistry. We present a methodology to identify potential chemical functional...
Song, Zewei; Schlatter, Dan; Kennedy, Peter; Kinkel, Linda L.; Kistler, H. Corby; Nguyen, Nhu; Bates, Scott T.
2015-01-01
Next generation fungal amplicon sequencing is being used with increasing frequency to study fungal diversity in various ecosystems; however, the influence of sample preparation on the characterization of fungal community is poorly understood. We investigated the effects of four procedural modifications to library preparation for high-throughput sequencing (HTS). The following treatments were considered: 1) the amount of soil used in DNA extraction, 2) the inclusion of additional steps (freeze/thaw cycles, sonication, or hot water bath incubation) in the extraction procedure, 3) the amount of DNA template used in PCR, and 4) the effect of sample pooling, either physically or computationally. Soils from two different ecosystems in Minnesota, USA, one prairie and one forest site, were used to assess the generality of our results. The first three treatments did not significantly influence observed fungal OTU richness or community structure at either site. Physical pooling captured more OTU richness compared to individual samples, but total OTU richness at each site was highest when individual samples were computationally combined. We conclude that standard extraction kit protocols are well optimized for fungal HTS surveys, but because sample pooling can significantly influence OTU richness estimates, it is important to carefully consider the study aims when planning sampling procedures. PMID:25974078
PANGEA: pipeline for analysis of next generation amplicons
Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz FW; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W
2010-01-01
High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including preprocessing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the χ2 step, are joined into one program called the ‘backbone’. PMID:20182525
PANGEA: pipeline for analysis of next generation amplicons.
Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz F W; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W
2010-07-01
High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including pre-processing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the chi(2) step, are joined into one program called the 'backbone'.
A Multidisciplinary Approach to High Throughput Nuclear Magnetic Resonance Spectroscopy
Pourmodheji, Hossein; Ghafar-Zadeh, Ebrahim; Magierowski, Sebastian
2016-01-01
Nuclear Magnetic Resonance (NMR) is a non-contact, powerful structure-elucidation technique for biochemical analysis. NMR spectroscopy is used extensively in a variety of life science applications including drug discovery. However, existing NMR technology is limited in that it cannot run a large number of experiments simultaneously in one unit. Recent advances in micro-fabrication technologies have attracted the attention of researchers to overcome these limitations and significantly accelerate the drug discovery process by developing the next generation of high-throughput NMR spectrometers using Complementary Metal Oxide Semiconductor (CMOS) technology. In this paper, we examine this paradigm shift and explore new design strategies for the development of the next generation of high-throughput NMR spectrometers using CMOS technology. A CMOS NMR system consists of an array of high-sensitivity micro-coils integrated with interfacing radio-frequency circuits on the same chip. Herein, we first discuss the key challenges and recent advances in the field of CMOS NMR technology, and then a new design strategy is put forward for the design and implementation of highly sensitive and high-throughput CMOS NMR spectrometers. We thereafter discuss the functionality and applicability of the proposed techniques by demonstrating the results. For microelectronic researchers starting to work in the field of CMOS NMR technology, this paper serves as a tutorial with a comprehensive review of state-of-the-art technologies and their performance levels. Based on these levels, the CMOS NMR approach offers unique advantages for high-resolution, time-sensitive and high-throughput biomolecular analysis required in a variety of life science applications including drug discovery. PMID:27294925
EPA's National Center for Computational Toxicology is developing methods that apply computational chemistry, high-throughput screening (HTS) and genomic technologies to predict potential toxicity and prioritize the use of limited testing resources.
High-throughput analysis of T-DNA location and structure using sequence capture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Inagaki, Soichi; Henry, Isabelle M.; Lieberman, Meric C.
Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious, and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the need for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. As a result, our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.
High-throughput analysis of T-DNA location and structure using sequence capture
Inagaki, Soichi; Henry, Isabelle M.; Lieberman, Meric C.; ...
2015-10-07
Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious, and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the need for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. As a result, our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.
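The junction-finding idea lends itself to a tiny sketch: flag read pairs where one mate carries T-DNA border sequence and the other matches the host genome. Naive substring matching stands in for real alignment here, and the border motif is a made-up placeholder, not a real T-DNA border sequence.

```python
# Hedged sketch of junction detection in read pairs: one mate carries the
# T-DNA border motif, the other maps to the host genome.
TDNA_BORDER = "GGTTAACCGGTTAACCGGTTAACC"  # placeholder motif, not a real border

def spans_junction(read1: str, read2: str, genome: str) -> bool:
    """True if one mate carries border sequence and the other matches the
    host genome, which is the signature of an insertion junction."""
    has_border = TDNA_BORDER in read1 or TDNA_BORDER in read2
    in_genome = read1 in genome or read2 in genome
    return has_border and in_genome

genome = "ATCG" * 50
pair = ("ATCGATCGATCGATCGATCG", TDNA_BORDER + "TTTT")
print(spans_junction(*pair, genome))   # True
```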
High Throughput Determination of Critical Human Dosing Parameters (SOT)
High throughput toxicokinetics (HTTK) is a rapid approach that uses in vitro data to estimate TK for hundreds of environmental chemicals. Reverse dosimetry (i.e., reverse toxicokinetics or RTK) based on HTTK data converts high throughput in vitro toxicity screening (HTS) data int...
High Throughput Determinations of Critical Dosing Parameters (IVIVE workshop)
High throughput toxicokinetics (HTTK) is an approach that allows for rapid estimations of TK for hundreds of environmental chemicals. HTTK-based reverse dosimetry (i.e, reverse toxicokinetics or RTK) is used in order to convert high throughput in vitro toxicity screening (HTS) da...
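A hedged sketch of the HTTK reverse-dosimetry conversion these abstracts refer to: scale an in vitro AC50 by the steady-state plasma concentration (Css) predicted for a unit daily dose. The Css value below is a made-up placeholder, not an httk prediction.

```python
# Sketch of HTTK-style reverse dosimetry: the oral equivalent dose is the
# daily intake whose steady-state plasma level equals the in vitro AC50.
def oral_equivalent_dose(ac50_uM: float, css_uM_per_mg_kg_day: float) -> float:
    """Dose (mg/kg/day) producing a steady-state plasma level equal to AC50."""
    return ac50_uM / css_uM_per_mg_kg_day

ac50 = 3.0   # uM, from an HTS concentration-response curve (illustrative)
css = 1.5    # uM at steady state per 1 mg/kg/day (assumed placeholder)
print(f"oral equivalent dose: {oral_equivalent_dose(ac50, css):.1f} mg/kg/day")
```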
Xiong, Zheng; He, Yinyan; Hattrick-Simpers, Jason R; Hu, Jianjun
2017-03-13
The creation of composition-processing-structure relationships currently represents a key bottleneck for data analysis in high-throughput experimental (HTE) material studies. Here we propose an automated phase diagram attribution algorithm for HTE data analysis that uses a graph-based segmentation algorithm and Delaunay tessellation to create a crystal phase diagram from high-throughput libraries of X-ray diffraction (XRD) patterns. We also propose sample-pair-based objective evaluation measures for the phase diagram prediction problem. Our approach was validated using 278 diffraction patterns from a Fe-Ga-Pd composition spread sample with a prediction precision of 0.934 and a Matthews Correlation Coefficient score of 0.823. The algorithm was then applied to the open Ni-Mn-Al thin-film composition spread sample to obtain the first predicted phase diagram mapping for that sample.
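One ingredient of this pipeline, the Delaunay tessellation over composition-spread coordinates, is easy to sketch; neighboring sample pairs are then the tessellation's edges, along which phase labels can be compared or propagated. The ternary compositions below are random stand-ins, not the Fe-Ga-Pd spread.

```python
# Sketch of Delaunay neighbor extraction for a ternary composition spread.
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(5)
# Random barycentric ternary compositions projected to 2D simplex coordinates.
abc = rng.dirichlet([1, 1, 1], size=278)
xy = np.column_stack([abc[:, 0] + 0.5 * abc[:, 1],
                      (np.sqrt(3) / 2) * abc[:, 1]])

tri = Delaunay(xy)
# Neighboring sample pairs are the edges of the tessellation.
edges = {tuple(sorted(e)) for s in tri.simplices
         for e in [(s[0], s[1]), (s[1], s[2]), (s[0], s[2])]}
print(f"{len(edges)} neighbor pairs among {len(xy)} samples")
```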
Rautenberg, Philipp L.; Kumaraswamy, Ajayrama; Tejero-Cantero, Alvaro; Doblander, Christoph; Norouzian, Mohammad R.; Kai, Kazuki; Jacobsen, Hans-Arno; Ai, Hiroyuki; Wachtler, Thomas; Ikeno, Hidetoshi
2014-01-01
Neuroscience today deals with a “data deluge” derived from the availability of high-throughput sensors of brain structure and brain activity, and increased computational resources for detailed simulations with complex output. We report here (1) a novel approach to data sharing between collaborating scientists that brings together file system tools and cloud technologies, (2) a service implementing this approach, called NeuronDepot, and (3) an example application of the service to a complex use case in the neurosciences. The main drivers for our approach are to facilitate collaborations with a transparent, automated data flow that shields scientists from having to learn new tools or data structuring paradigms. Using NeuronDepot is simple: one-time data assignment from the originator and cloud based syncing—thus making experimental and modeling data available across the collaboration with minimum overhead. Since data sharing is cloud based, our approach opens up the possibility of using new software developments and hardware scalability which are associated with elastic cloud computing. We provide an implementation that relies on existing synchronization services and is usable from all devices via a reactive web interface. We are motivating our solution by solving the practical problems of the GinJang project, a collaboration of three universities across eight time zones with a complex workflow encompassing data from electrophysiological recordings, imaging, morphological reconstructions, and simulations. PMID:24971059
Rautenberg, Philipp L; Kumaraswamy, Ajayrama; Tejero-Cantero, Alvaro; Doblander, Christoph; Norouzian, Mohammad R; Kai, Kazuki; Jacobsen, Hans-Arno; Ai, Hiroyuki; Wachtler, Thomas; Ikeno, Hidetoshi
2014-01-01
Neuroscience today deals with a "data deluge" derived from the availability of high-throughput sensors of brain structure and brain activity, and increased computational resources for detailed simulations with complex output. We report here (1) a novel approach to data sharing between collaborating scientists that brings together file system tools and cloud technologies, (2) a service implementing this approach, called NeuronDepot, and (3) an example application of the service to a complex use case in the neurosciences. The main drivers for our approach are to facilitate collaborations with a transparent, automated data flow that shields scientists from having to learn new tools or data structuring paradigms. Using NeuronDepot is simple: one-time data assignment from the originator and cloud based syncing-thus making experimental and modeling data available across the collaboration with minimum overhead. Since data sharing is cloud based, our approach opens up the possibility of using new software developments and hardware scalability which are associated with elastic cloud computing. We provide an implementation that relies on existing synchronization services and is usable from all devices via a reactive web interface. We are motivating our solution by solving the practical problems of the GinJang project, a collaboration of three universities across eight time zones with a complex workflow encompassing data from electrophysiological recordings, imaging, morphological reconstructions, and simulations.
In Silico Chemogenomics Drug Repositioning Strategies for Neglected Tropical Diseases.
Andrade, Carolina Horta; Neves, Bruno Junior; Melo-Filho, Cleber Camilo; Rodrigues, Juliana; Silva, Diego Cabral; Braga, Rodolpho Campos; Cravo, Pedro Vitor Lemos
2018-03-08
Only ~1% of all drug candidates against Neglected Tropical Diseases (NTDs) have reached clinical trials in the last decades, underscoring the need for new, safe and effective treatments. In this context, drug repositioning, which allows finding novel indications for approved drugs whose pharmacokinetic and safety profiles are already known, is emerging as a promising strategy for tackling NTDs. Chemogenomics is a direct descendant of the typical drug discovery process that involves the systematic screening of chemical compounds against drug targets in high-throughput screening (HTS) efforts for the identification of lead compounds. However, in contrast to the one-drug-one-target paradigm, chemogenomics attempts to identify all potential ligands for all possible targets and diseases. In this review, we summarize current methodological developments in drug repositioning that use state-of-the-art computational ligand- and structure-based chemogenomics approaches. Furthermore, we highlight recent progress in computational drug repositioning for some NTDs, based on curation and modeling of genomic, biological, and chemical data. Additionally, we present in-house and other successful examples and suggest possible solutions to existing pitfalls.
NASA Astrophysics Data System (ADS)
Moreland, Blythe; Oman, Kenji; Curfman, John; Yan, Pearlly; Bundschuh, Ralf
Methyl-binding domain (MBD) protein pulldown experiments have been a valuable tool in measuring the levels of methylated CpG dinucleotides. Due to the frequent use of this technique, high-throughput sequencing data sets are available that allow a detailed quantitative characterization of the underlying interaction between methylated DNA and MBD proteins. Analyzing such data sets, we first found that two such proteins cannot bind closer to each other than 2 bp, consistent with structural models of the DNA-protein interaction. Second, the large amount of sequencing data allowed us to find rather weak but nevertheless clearly statistically significant sequence preferences for several bases around the required CpG. These results demonstrate that pulldown sequencing is a high-precision tool in characterizing DNA-protein interactions. This material is based upon work supported by the National Science Foundation under Grant No. DMR-1410172.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perkins, Stephen J.; Wright, David W.; Zhang, Hailiang
2016-10-14
The capabilities of current computer simulations provide a unique opportunity to model small-angle scattering (SAS) data at the atomistic level, and to include other structural constraints ranging from molecular and atomistic energetics to crystallography, electron microscopy and NMR. This extends the capabilities of solution scattering and provides deeper insights into the physics and chemistry of the systems studied. Realizing this potential, however, requires integrating the experimental data with a new generation of modelling software. To achieve this, the CCP-SAS collaboration (http://www.ccpsas.org/) is developing open-source, high-throughput and user-friendly software for the atomistic and coarse-grained molecular modelling of scattering data. Robust state-of-the-art molecular simulation engines and molecular dynamics and Monte Carlo force fields provide constraints to the solution structure inferred from the small-angle scattering data, which incorporates the known physical chemistry of the system. The implementation of this software suite involves a tiered approach in which GenApp provides the deployment infrastructure for running applications on both standard and high-performance computing hardware, and SASSIE provides a workflow framework into which modules can be plugged to prepare structures, carry out simulations, calculate theoretical scattering data and compare results with experimental data. GenApp produces the accessible web-based front end termed SASSIE-web, and GenApp and SASSIE also make community SAS codes available. Applications are illustrated by case studies: (i) inter-domain flexibility in two- to six-domain proteins as exemplified by HIV-1 Gag, MASP and ubiquitin; (ii) the hinge conformation in human IgG2 and IgA1 antibodies; (iii) the complex formed between a hexameric protein Hfq and mRNA; and (iv) synthetic `bottlebrush' polymers.
Perkins, Stephen J; Wright, David W; Zhang, Hailiang; Brookes, Emre H; Chen, Jianhan; Irving, Thomas C; Krueger, Susan; Barlow, David J; Edler, Karen J; Scott, David J; Terrill, Nicholas J; King, Stephen M; Butler, Paul D; Curtis, Joseph E
2016-12-01
The capabilities of current computer simulations provide a unique opportunity to model small-angle scattering (SAS) data at the atomistic level, and to include other structural constraints ranging from molecular and atomistic energetics to crystallography, electron microscopy and NMR. This extends the capabilities of solution scattering and provides deeper insights into the physics and chemistry of the systems studied. Realizing this potential, however, requires integrating the experimental data with a new generation of modelling software. To achieve this, the CCP-SAS collaboration (http://www.ccpsas.org/) is developing open-source, high-throughput and user-friendly software for the atomistic and coarse-grained molecular modelling of scattering data. Robust state-of-the-art molecular simulation engines and molecular dynamics and Monte Carlo force fields provide constraints to the solution structure inferred from the small-angle scattering data, which incorporates the known physical chemistry of the system. The implementation of this software suite involves a tiered approach in which GenApp provides the deployment infrastructure for running applications on both standard and high-performance computing hardware, and SASSIE provides a workflow framework into which modules can be plugged to prepare structures, carry out simulations, calculate theoretical scattering data and compare results with experimental data. GenApp produces the accessible web-based front end termed SASSIE-web, and GenApp and SASSIE also make community SAS codes available. Applications are illustrated by case studies: (i) inter-domain flexibility in two- to six-domain proteins as exemplified by HIV-1 Gag, MASP and ubiquitin; (ii) the hinge conformation in human IgG2 and IgA1 antibodies; (iii) the complex formed between a hexameric protein Hfq and mRNA; and (iv) synthetic 'bottlebrush' polymers.
Automatic high-throughput screening of colloidal crystals using machine learning
NASA Astrophysics Data System (ADS)
Spellings, Matthew; Glotzer, Sharon C.
Recent improvements in hardware and software have united to pose an interesting problem for computational scientists studying self-assembly of particles into crystal structures: while studies covering large swathes of parameter space can be dispatched at once using modern supercomputers and parallel architectures, identifying the different regions of a phase diagram is often a serial task completed by hand. While analytic methods exist to distinguish some simple structures, they can be difficult to apply, and automatic identification of more complex structures is still lacking. In this talk we describe one method to create numerical "fingerprints" of local order and use them to analyze a study of complex ordered structures. We can use these methods as first steps toward automatic exploration of parameter space and, more broadly, the strategic design of new materials.
NASA Astrophysics Data System (ADS)
Ohene-Kwofie, Daniel; Otoo, Ekow
2015-10-01
The ATLAS detector, operated at the Large Hadron Collider (LHC), records proton-proton collisions at CERN every 50 ns, resulting in a sustained data flow of up to petabytes per second. The upgraded Tile Calorimeter of the ATLAS experiment will sustain about 5 PB/s of digital throughput. These massive data rates require extremely fast data capture and processing. Although there has been a steady increase in the processing speed of CPUs/GPGPUs assembled for high-performance computing, the rate of data input and output, even under parallel I/O, has not kept up with the general increase in computing speeds. The problem then is whether one can implement an I/O subsystem infrastructure capable of meeting the computational speeds of the advanced computing systems at the petascale and exascale level. We propose a system architecture that leverages the Partitioned Global Address Space (PGAS) model of computing to maintain an in-memory data-store for the Processing Unit (PU) of the upgraded electronics of the Tile Calorimeter, which is proposed to be used as a high-throughput general-purpose co-processor to the sROD of the upgraded Tile Calorimeter. The physical memory of the PUs is aggregated into a large global logical address space using RDMA-capable interconnects such as PCI-Express to enhance data processing throughput.
Optimal processor assignment for pipeline computations
NASA Technical Reports Server (NTRS)
Nicol, David M.; Simha, Rahul; Choudhury, Alok N.; Narahari, Bhagirath
1991-01-01
The availability of large-scale multitasked parallel architectures introduces the following processor assignment problem for pipelined computations. Given a set of tasks and their precedence constraints, along with their experimentally determined individual response times for different processor sizes, find an assignment of processors to tasks. Two objectives are of interest: minimal response time given a throughput requirement, and maximal throughput given a response time requirement. These assignment problems differ considerably from the classical mapping problem, in which several tasks share a processor; instead, it is assumed that a large number of processors are to be assigned to a relatively small number of tasks. Efficient assignment algorithms were developed for different classes of task structures. For a p-processor system and a series-parallel precedence graph with n constituent tasks, an O(np²) algorithm is provided that finds the optimal assignment for the response time optimization problem; the assignment optimizing throughput under a response time constraint is found in O(np² log p) time. Special cases of linear, independent, and tree graphs are also considered.
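To make the flavor of such assignment algorithms concrete, here is a small dynamic-programming sketch for the simplest case of a linear pipeline: each stage must meet a per-stage deadline (the throughput requirement) while the summed response time is minimized over a processor budget. The response-time table, budget, and deadline are invented for illustration, and the paper's algorithms for general series-parallel graphs are considerably more sophisticated.

```python
# Chain-pipeline special case: give each stage some processors so the summed
# response time is minimal while every stage meets a throughput deadline.
import math

def assign_processors(resp, total_procs, stage_deadline):
    """resp[i][k-1] = response time of stage i when given k processors."""
    n = len(resp)
    INF = math.inf
    # dp[i][p] = minimal summed response time over the first i stages using
    # exactly p processors; choice remembers the winning per-stage allocation.
    dp = [[INF] * (total_procs + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    choice = {}
    for i in range(n):
        for used in range(total_procs + 1):
            if dp[i][used] == INF:
                continue
            for k, t in enumerate(resp[i], start=1):
                if used + k <= total_procs and t <= stage_deadline:
                    if dp[i][used] + t < dp[i + 1][used + k]:
                        dp[i + 1][used + k] = dp[i][used] + t
                        choice[(i + 1, used + k)] = k
    best_p = min(range(total_procs + 1), key=lambda p: dp[n][p])
    if dp[n][best_p] == INF:
        return None  # deadline infeasible with this processor budget
    alloc, p = [], best_p
    for i in range(n, 0, -1):  # walk the recorded choices backwards
        k = choice[(i, p)]
        alloc.append(k)
        p -= k
    return list(reversed(alloc)), dp[n][best_p]

# Three stages, up to 3 processors each; all numbers are made up.
resp = [[4.0, 2.2, 1.6], [6.0, 3.1, 2.3], [3.0, 1.7, 1.2]]
print(assign_processors(resp, total_procs=6, stage_deadline=3.5))
```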
Assembly and diploid architecture of an individual human genome via single-molecule technologies
Pendleton, Matthew; Sebra, Robert; Pang, Andy Wing Chun; Ummat, Ajay; Franzen, Oscar; Rausch, Tobias; Stütz, Adrian M; Stedman, William; Anantharaman, Thomas; Hastie, Alex; Dai, Heng; Fritz, Markus Hsi-Yang; Cao, Han; Cohain, Ariella; Deikus, Gintaras; Durrett, Russell E; Blanchard, Scott C; Altman, Roger; Chin, Chen-Shan; Guo, Yan; Paxinos, Ellen E; Korbel, Jan O; Darnell, Robert B; McCombie, W Richard; Kwok, Pui-Yan; Mason, Christopher E; Schadt, Eric E; Bashir, Ali
2015-01-01
We present the first comprehensive analysis of a diploid human genome that combines single-molecule sequencing with single-molecule genome maps. Our hybrid assembly markedly improves upon the contiguity observed from traditional shotgun sequencing approaches, with scaffold N50 values approaching 30 Mb, and we identified complex structural variants (SVs) missed by other high-throughput approaches. Furthermore, by combining Illumina short-read data with long reads, we phased both single-nucleotide variants and SVs, generating haplotypes with over 99% consistency with previous trio-based studies. Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that approach reference quality. PMID:26121404
Assembly and diploid architecture of an individual human genome via single-molecule technologies.
Pendleton, Matthew; Sebra, Robert; Pang, Andy Wing Chun; Ummat, Ajay; Franzen, Oscar; Rausch, Tobias; Stütz, Adrian M; Stedman, William; Anantharaman, Thomas; Hastie, Alex; Dai, Heng; Fritz, Markus Hsi-Yang; Cao, Han; Cohain, Ariella; Deikus, Gintaras; Durrett, Russell E; Blanchard, Scott C; Altman, Roger; Chin, Chen-Shan; Guo, Yan; Paxinos, Ellen E; Korbel, Jan O; Darnell, Robert B; McCombie, W Richard; Kwok, Pui-Yan; Mason, Christopher E; Schadt, Eric E; Bashir, Ali
2015-08-01
We present the first comprehensive analysis of a diploid human genome that combines single-molecule sequencing with single-molecule genome maps. Our hybrid assembly markedly improves upon the contiguity observed from traditional shotgun sequencing approaches, with scaffold N50 values approaching 30 Mb, and we identified complex structural variants (SVs) missed by other high-throughput approaches. Furthermore, by combining Illumina short-read data with long reads, we phased both single-nucleotide variants and SVs, generating haplotypes with over 99% consistency with previous trio-based studies. Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that approach reference quality.
High-Throughput Screening and Hit Validation of Extracellular-Related Kinase 5 (ERK5) Inhibitors.
Myers, Stephanie M; Bawn, Ruth H; Bisset, Louise C; Blackburn, Timothy J; Cottyn, Betty; Molyneux, Lauren; Wong, Ai-Ching; Cano, Celine; Clegg, William; Harrington, Ross W; Leung, Hing; Rigoreau, Laurent; Vidot, Sandrine; Golding, Bernard T; Griffin, Roger J; Hammonds, Tim; Newell, David R; Hardcastle, Ian R
2016-08-08
The extracellular-related kinase 5 (ERK5) is a promising target for cancer therapy. A high-throughput screen was developed for ERK5, based on the IMAP FP progressive binding system, and used to identify hits from a library of 57,617 compounds. Four distinct chemical series were evident within the screening hits. Resynthesis and reassay of the hits demonstrated that one series did not return active compounds, whereas three series returned active hits. Structure-activity studies demonstrated that the 4-benzoylpyrrole-2-carboxamide pharmacophore had excellent potential for further development. The minimum kinase binding pharmacophore was identified, and key examples demonstrated good selectivity for ERK5 over p38α kinase.
NASA Astrophysics Data System (ADS)
Hai, Pengfei; Zhou, Yong; Zhang, Ruiying; Ma, Jun; Li, Yang; Wang, Lihong V.
2017-03-01
Circulating tumor cell (CTC) clusters arise from multicellular grouping in the primary tumor and elevate the metastatic potential by 23 to 50 fold compared to single CTCs. High throughout detection and quantification of CTC clusters is critical for understanding the tumor metastasis process and improving cancer therapy. In this work, we report a linear-array-based photoacoustic tomography (LA-PAT) system capable of label-free high-throughput CTC cluster detection and quantification in vivo. LA-PAT detects CTC clusters and quantifies the number of cells in them based on the contrast-to-noise ratios (CNRs) of photoacoustic signals. The feasibility of LA-PAT was first demonstrated by imaging CTC clusters ex vivo. LA-PAT detected CTC clusters in the blood-filled microtubes and computed the number of cells in the clusters. The size distribution of the CTC clusters measured by LA-PAT agreed well with that obtained by optical microscopy. We demonstrated the ability of LA-PAT to detect and quantify CTC clusters in vivo by imaging injected CTC clusters in rat tail veins. LA-PAT detected CTC clusters immediately after injection as well as when they were circulating in the rat bloodstreams. Similarly, the numbers of cells in the clusters were computed based on the CNRs of the photoacoustic signals. The data showed that larger CTC clusters disappear faster than the smaller ones. The results prove the potential of LA-PAT as a promising tool for both preclinical tumor metastasis studies and clinical cancer therapy evaluation.
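For readers unfamiliar with the metric, the snippet below sketches one common definition of the contrast-to-noise ratio on which such per-cluster quantification can rest; the exact CNR convention used by LA-PAT, and the synthetic pixel values here, are assumptions.

```python
# One common CNR definition: contrast over background noise.
import numpy as np

def cnr(signal_roi, background_roi):
    """CNR = (mean signal - mean background) / background standard deviation."""
    return (signal_roi.mean() - background_roi.mean()) / background_roi.std()

rng = np.random.default_rng(0)
background = rng.normal(1.0, 0.1, size=500)  # blood-background pixels
cluster = rng.normal(1.8, 0.1, size=50)      # CTC-cluster pixels
print(f"CNR = {cnr(cluster, background):.1f}")
```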
Towards High-Throughput, Simultaneous Characterization of Thermal and Thermoelectric Properties
NASA Astrophysics Data System (ADS)
Miers, Collier Stephen
The extension of thermoelectric generators to more general markets requires that the devices be affordable and practical (low $/Watt) to implement. A key challenge in this pursuit is the quick and accurate characterization of thermoelectric materials, which will allow researchers to tune and modify material properties quickly. The goal of this thesis is to design and fabricate a high-throughput characterization system for the simultaneous characterization of thermal, electrical, and thermoelectric properties of device-scale material samples. The measurement methodology presented in this thesis combines a custom-designed measurement system created specifically for high-throughput testing with a novel device structure that permits simultaneous characterization of the material properties. The measurement system is based upon the 3ω method for thermal conductivity measurements, with the addition of electrodes and voltage probes to measure the electrical conductivity and Seebeck coefficient. A device designed and optimized to permit the rapid characterization of thermoelectric materials is also presented. This structure is optimized to ensure 1D heat transfer within the sample, thus permitting rapid data analysis and fitting using a MATLAB script. Verification of the thermal portion of the system is presented using fused silica and sapphire materials for benchmarking. The fused silica samples yielded a thermal conductivity of 1.21 W/(m K), while a thermal conductivity of 31.2 W/(m K) was measured for the sapphire samples. The device and measurement system designed and developed in this thesis provide insight and serve as a foundation for the development of high-throughput, simultaneous measurement platforms.
Mass Conservation and Inference of Metabolic Networks from High-Throughput Mass Spectrometry Data
Bandaru, Pradeep; Bansal, Mukesh
2011-01-01
We present a step towards the metabolome-wide computational inference of cellular metabolic reaction networks from metabolic profiling data, such as mass spectrometry. The reconstruction is based on identification of irreducible statistical interactions among the metabolite activities using the ARACNE reverse-engineering algorithm and on constraining possible metabolic transformations to satisfy the conservation of mass. The resulting algorithms are validated on synthetic data from an abridged computational model of Escherichia coli metabolism. Precision rates upwards of 50% are routinely observed for identification of full metabolic reactions, and recalls upwards of 20% are also seen. PMID:21314454
Zhang, Bing; Schmoyer, Denise; Kirov, Stefan; Snoddy, Jay
2004-01-01
Background: Microarray and other high-throughput technologies are producing large sets of interesting genes that are difficult to analyze directly. Bioinformatics tools are needed to interpret the functional information in the gene sets. Results: We have created a web-based tool for data analysis and data visualization for sets of genes called GOTree Machine (GOTM). This tool was originally intended to analyze sets of co-regulated genes identified from microarray analysis but is adaptable for use with other gene sets from other high-throughput analyses. GOTree Machine generates a GOTree, a tree-like structure to navigate the Gene Ontology Directed Acyclic Graph for input gene sets. This system provides user-friendly data navigation and visualization. Statistical analysis helps users to identify the most important Gene Ontology categories for the input gene sets and suggests biological areas that warrant further study. GOTree Machine is available online at . Conclusion: GOTree Machine has broad application in functional genomics, proteomics and other high-throughput methods that generate large sets of interesting genes; its primary purpose is to help users sort for interesting patterns in gene sets. PMID:14975175
NCBI GEO: archive for high-throughput functional genomic data.
Barrett, Tanya; Troup, Dennis B; Wilhite, Stephen E; Ledoux, Pierre; Rudnev, Dmitry; Evangelista, Carlos; Kim, Irene F; Soboleva, Alexandra; Tomashevsky, Maxim; Marshall, Kimberly A; Phillippy, Katherine H; Sherman, Patti M; Muertter, Rolf N; Edgar, Ron
2009-01-01
The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is the largest public repository for high-throughput gene expression data. Additionally, GEO hosts other categories of high-throughput functional genomic data, including those that examine genome copy number variations, chromatin structure, methylation status and transcription factor binding. These data are generated by the research community using high-throughput technologies like microarrays and, more recently, next-generation sequencing. The database has a flexible infrastructure that can capture fully annotated raw and processed data, enabling compliance with major community-derived scientific reporting standards such as 'Minimum Information About a Microarray Experiment' (MIAME). In addition to serving as a centralized data storage hub, GEO offers many tools and features that allow users to effectively explore, analyze and download expression data from both gene-centric and experiment-centric perspectives. This article summarizes the GEO repository structure, content and operating procedures, as well as recently introduced data mining features. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.
Structural Genomics of Protein Phosphatases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Almo, S.; Bonanno, J.; Sauder, J.
The New York SGX Research Center for Structural Genomics (NYSGXRC) of the NIGMS Protein Structure Initiative (PSI) has applied its high-throughput X-ray crystallographic structure determination platform to systematic studies of all human protein phosphatases and protein phosphatases from biomedically relevant pathogens. To date, the NYSGXRC has determined structures of 21 distinct protein phosphatases: 14 from human, 2 from mouse, 2 from the pathogen Toxoplasma gondii, 1 from Trypanosoma brucei, the parasite responsible for African sleeping sickness, and 2 from the principal mosquito vector of malaria in Africa, Anopheles gambiae. These structures provide insights into both normal and pathophysiologic processes, including transcriptional regulation, regulation of major signaling pathways, neural development, and type 1 diabetes. In conjunction with the contributions of other international structural genomics consortia, these efforts promise to provide an unprecedented database and materials repository for structure-guided experimental and computational discovery of inhibitors for all classes of protein phosphatases.
Integration of a neuroimaging processing pipeline into a pan-Canadian computing grid
NASA Astrophysics Data System (ADS)
Lavoie-Courchesne, S.; Rioux, P.; Chouinard-Decorte, F.; Sherif, T.; Rousseau, M.-E.; Das, S.; Adalat, R.; Doyon, J.; Craddock, C.; Margulies, D.; Chu, C.; Lyttelton, O.; Evans, A. C.; Bellec, P.
2012-02-01
The ethos of the neuroimaging field is quickly moving towards the open sharing of resources, including both imaging databases and processing tools. As a neuroimaging database represents a large volume of datasets and as neuroimaging processing pipelines are composed of heterogeneous, computationally intensive tools, such open sharing raises specific computational challenges. This motivates the design of novel dedicated computing infrastructures. This paper describes an interface between PSOM, a code-oriented pipeline development framework, and CBRAIN, a web-oriented platform for grid computing. This interface was used to integrate a PSOM-compliant pipeline for preprocessing of structural and functional magnetic resonance imaging into CBRAIN. We further tested the capacity of our infrastructure to handle a real large-scale project. A neuroimaging database including close to 1000 subjects was preprocessed using our interface and publicly released to help the participants of the ADHD-200 international competition. This successful experiment demonstrated that our integrated grid-computing platform is a powerful solution for high-throughput pipeline analysis in the field of neuroimaging.
From cancer genomes to cancer models: bridging the gaps
Baudot, Anaïs; Real, Francisco X.; Izarzugaza, José M. G.; Valencia, Alfonso
2009-01-01
Cancer genome projects are now being expanded in an attempt to provide complete landscapes of the mutations that exist in tumours. Although the importance of cataloguing genome variations is well recognized, there are obvious difficulties in bridging the gaps between high-throughput resequencing information and the molecular mechanisms of cancer evolution. Here, we describe the current status of the high-throughput genomic technologies, and the current limitations of the associated computational analysis and experimental validation of cancer genetic variants. We emphasize how the current cancer-evolution models will be influenced by the high-throughput approaches, in particular through efforts devoted to monitoring tumour progression, and how, in turn, the integration of data and models will be translated into mechanistic knowledge and clinical applications. PMID:19305388
NASA Astrophysics Data System (ADS)
Olivares-Amaya, Roberto; Hachmann, Johannes; Amador-Bedolla, Carlos; Daly, Aidan; Jinich, Adrian; Atahan-Evrenk, Sule; Boixo, Sergio; Aspuru-Guzik, Alán
2012-02-01
Organic photovoltaic devices have emerged as competitors to silicon-based solar cells, currently reaching efficiencies of over 9% and offering desirable properties for manufacturing and installation. We study conjugated donor polymers for high-efficiency bulk-heterojunction photovoltaic devices with a molecular library motivated by experimental feasibility. We use quantum mechanics and a distributed computing approach to explore this vast molecular space. We will detail the screening approach starting from the generation of the molecular library, which can be easily extended to other kinds of molecular systems. We will describe the screening method for these materials, which ranges from descriptor models, ubiquitous in the drug discovery community, to first-principles quantum chemistry methods. We will present results of the statistical analysis, based principally on machine learning, specifically partial least squares and Gaussian processes. Alongside these, clustering methods and the use of the hypergeometric distribution reveal moieties important for the donor materials and allow us to quantify structure-property relationships. These efforts enable us to accelerate materials discovery in organic photovoltaics through our collaboration with experimental groups.
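As a toy illustration of the partial-least-squares step in such a screening pipeline, the sketch below fits a PLS model from molecular descriptors to a target property and ranks a virtual library by predicted value, using scikit-learn; the descriptors, property, and library are random placeholders rather than anything from the study.

```python
# PLS-based ranking sketch: fit descriptors -> property, then score a library.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(42)
X_train = rng.normal(size=(200, 12))  # e.g., 12 cheap molecular descriptors
y_train = X_train[:, :3].sum(axis=1) + rng.normal(scale=0.1, size=200)

pls = PLSRegression(n_components=3)
pls.fit(X_train, y_train)

X_library = rng.normal(size=(10000, 12))        # virtual candidate library
scores = pls.predict(X_library).ravel()
top_candidates = np.argsort(scores)[::-1][:20]  # indices of the best 20
print(top_candidates)
```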
Life in the fast lane for protein crystallization and X-ray crystallography
NASA Technical Reports Server (NTRS)
Pusey, Marc L.; Liu, Zhi-Jie; Tempel, Wolfram; Praissman, Jeremy; Lin, Dawei; Wang, Bi-Cheng; Gavira, Jose A.; Ng, Joseph D.
2005-01-01
The common goal for structural genomic centers and consortiums is to decipher as quickly as possible the three-dimensional structures for a multitude of recombinant proteins derived from known genomic sequences. Since X-ray crystallography is the foremost method to acquire atomic resolution for macromolecules, the limiting step is obtaining protein crystals that can be useful for structure determination. High-throughput methods have been developed in recent years to clone, express, purify, crystallize and determine the three-dimensional structure of a protein gene product rapidly using automated devices, commercialized kits and consolidated protocols. However, the average number of protein structures obtained for most structural genomic groups has been very low compared to the total number of proteins purified. As more entire genomic sequences are obtained for different organisms from the three kingdoms of life, only the proteins that can be crystallized and whose structures can be obtained easily are studied. Consequently, an astonishing number of genomic proteins remain unexamined. In the era of high-throughput processes, traditional methods in molecular biology, protein chemistry and crystallization are eclipsed by automation and pipeline practices. The necessity for high-rate production of protein crystals and structures has prevented the usage of more intellectual strategies and creative approaches in experimental executions. Fundamental principles and personal experiences in protein chemistry and crystallization are minimally exploited only to obtain "low-hanging fruit" protein structures. We review the practical aspects of today's high-throughput manipulations and discuss the challenges in fast-paced protein crystallization and tools for crystallography. Structural genomic pipelines can be improved with information gained from low-throughput tactics that may help us reach the higher-bearing fruits. Examples of recent developments in this area are reported from the efforts of the Southeast Collaboratory for Structural Genomics (SECSG).
Life in the Fast Lane for Protein Crystallization and X-Ray Crystallography
NASA Technical Reports Server (NTRS)
Pusey, Marc L.; Liu, Zhi-Jie; Tempel, Wolfram; Praissman, Jeremy; Lin, Dawei; Wang, Bi-Cheng; Gavira, Jose A.; Ng, Joseph D.
2004-01-01
The common goal for structural genomic centers and consortiums is to decipher as quickly as possible the three-dimensional structures for a multitude of recombinant proteins derived from known genomic sequences. Since X-ray crystallography is the foremost method to acquire atomic resolution for macromolecules, the limiting step is obtaining protein crystals that can be useful for structure determination. High-throughput methods have been developed in recent years to clone, express, purify, crystallize and determine the three-dimensional structure of a protein gene product rapidly using automated devices, commercialized kits and consolidated protocols. However, the average number of protein structures obtained for most structural genomic groups has been very low compared to the total number of proteins purified. As more entire genomic sequences are obtained for different organisms from the three kingdoms of life, only the proteins that can be crystallized and whose structures can be obtained easily are studied. Consequently, an astonishing number of genomic proteins remain unexamined. In the era of high-throughput processes, traditional methods in molecular biology, protein chemistry and crystallization are eclipsed by automation and pipeline practices. The necessity for high-rate production of protein crystals and structures has prevented the usage of more intellectual strategies and creative approaches in experimental executions. Fundamental principles and personal experiences in protein chemistry and crystallization are minimally exploited only to obtain "low-hanging fruit" protein structures. We review the practical aspects of today's high-throughput manipulations and discuss the challenges in fast-paced protein crystallization and tools for crystallography. Structural genomic pipelines can be improved with information gained from low-throughput tactics that may help us reach the higher-bearing fruits. Examples of recent developments in this area are reported from the efforts of the Southeast Collaboratory for Structural Genomics (SECSG).
[Current applications of high-throughput DNA sequencing technology in antibody drug research].
Yu, Xin; Liu, Qi-Gang; Wang, Ming-Rong
2012-03-01
Since the publication in 2005 of a high-throughput DNA sequencing technology based on PCR carried out in oil emulsions, high-throughput DNA sequencing platforms have evolved into a robust technology for sequencing genomes and diverse DNA libraries. Antibody libraries with vast numbers of members currently serve as a foundation for discovering novel antibody drugs, and high-throughput DNA sequencing technology makes it possible to rapidly identify functional antibody variants with desired properties. Herein we present a review of current applications of high-throughput DNA sequencing technology in the analysis of antibody library diversity, sequencing of CDR3 regions, identification of potent antibodies based on sequence frequency, discovery of functional genes, and combination with various display technologies, so as to provide an alternative approach to the discovery and development of antibody drugs.
Progress on the Fabric for Frontier Experiments Project at Fermilab
NASA Astrophysics Data System (ADS)
Box, Dennis; Boyd, Joseph; Dykstra, Dave; Garzoglio, Gabriele; Herner, Kenneth; Kirby, Michael; Kreymer, Arthur; Levshina, Tanya; Mhashilkar, Parag; Sharma, Neha
2015-12-01
The FabrIc for Frontier Experiments (FIFE) project is an ambitious, major-impact initiative within the Fermilab Scientific Computing Division designed to lead the computing model for Fermilab experiments. FIFE is a collaborative effort between experimenters and computing professionals to design and develop integrated computing models for experiments of varying needs and infrastructure. The major focus of the FIFE project is the development, deployment, and integration of Open Science Grid solutions for high-throughput computing, data management, database access, and collaboration within experiments. To accomplish this goal, FIFE has developed workflows that utilize Open Science Grid sites along with dedicated and commercial cloud resources. The FIFE project has made significant progress integrating several services into experiment computing operations, including new job submission services, software and reference data distribution through CVMFS repositories, a flexible data transfer client, and access to opportunistic resources on the Open Science Grid. The progress with current experiments and plans for expansion with additional projects will be discussed. FIFE has taken a leading role in the definition of the computing model for Fermilab experiments, aided in the design of computing for experiments beyond Fermilab, and will continue to define the future direction of high-throughput computing for future physics experiments worldwide.
NASA Astrophysics Data System (ADS)
Hinuma, Yoyo; Kumagai, Yu; Tanaka, Isao; Oba, Fumiyasu
2017-02-01
The band alignment of prototypical semiconductors and insulators is investigated using first-principles calculations. A dielectric-dependent hybrid functional, where the nonlocal Fock exchange mixing is set at the reciprocal of the static electronic dielectric constant and the exchange-correlation is otherwise treated as in the Perdew-Burke-Ernzerhof hybrid functional (PBE0), is used as well as the Heyd-Scuseria-Ernzerhof (HSE06) hybrid and PBE semilocal functionals. In addition, these hybrid functionals are applied non-self-consistently to accelerate calculations. The systems considered include C and Si in the diamond structure, BN, AlP, AlAs, AlSb, GaP, GaAs, InP, ZnS, ZnSe, ZnTe, CdS, CdSe, and CdTe in the zinc-blende structure, MgO in the rocksalt structure, and GaN and ZnO in the wurtzite structure. Surface band positions with respect to the vacuum level, i.e., ionization potentials and electron affinities, and band offsets at selected zinc-blende heterointerfaces are evaluated as well as band gaps. The non-self-consistent approach speeds up hybrid functional calculations by an order of magnitude, while it is shown using HSE06 that the resultant band gaps and surface band positions are similar to the self-consistent results. The dielectric-dependent hybrid functional improves the band gaps and surface band positions of wide-gap systems over HSE06. The interfacial band offsets are predicted with a similar degree of precision. Overall, the performance of the dielectric-dependent hybrid functional is comparable to the GW0 approximation based on many-body perturbation theory in the prediction of band gaps and alignments for most systems. The present results demonstrate that the dielectric-dependent hybrid functional, particularly when applied non-self-consistently, is promising for applications to systematic calculations or high-throughput screening that demand both computational efficiency and sufficient accuracy.
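Written out, the mixing prescription described in this abstract takes the standard dielectric-dependent hybrid form sketched below; the notation is generic rather than the paper's own.

```latex
% Dielectric-dependent hybrid (generic form, assumed notation): the Fock-exchange
% fraction \alpha is set by the static electronic dielectric constant.
\alpha = \frac{1}{\varepsilon_\infty}, \qquad
E_{xc} = \alpha\, E_x^{\mathrm{HF}} + (1 - \alpha)\, E_x^{\mathrm{PBE}} + E_c^{\mathrm{PBE}}
```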
Liu, Zhaomin; Pottel, Joshua; Shahamat, Moeed; Tomberg, Anna; Labute, Paul; Moitessier, Nicolas
2016-04-25
Computational chemists use structure-based drug design and molecular dynamics of drug/protein complexes, which require an accurate description of the conformational space of drugs. Organic chemists use qualitative chemical principles, such as the effect of electronegativity on hyperconjugation, the impact of steric clashes on the stereochemical outcome of reactions, and the consequence of resonance on the shape of molecules, to rationalize experimental observations. While computational chemists speak about electron densities and molecular orbitals, organic chemists speak about partial charges and localized molecular orbitals. Attempts to reconcile these two parallel approaches, such as programs for natural bond orbitals and intrinsic atomic orbitals that compute Lewis-structure-like orbitals and reaction mechanisms, have appeared. In the past, we have shown that encoding and quantifying chemistry knowledge and qualitative principles can lead to predictive methods. In the same vein, we sought to understand the conformational behaviors of molecules and to encode this knowledge back into a molecular mechanics tool computing conformational potential energy, and to develop an alternative to atom types and the training of force fields on large sets of molecules. Herein, we describe a conceptually new approach to model torsion energies based on fundamental chemistry principles. To demonstrate our approach, torsional energy parameters were derived on-the-fly from atomic properties. When the torsional energy terms implemented in GAFF, Parm@Frosst, and MMFF94 were substituted by our method, the accuracy of these force fields in reproducing MP2-derived torsional energy profiles and their transferability to a variety of functional groups and drug fragments were overall improved. In addition, our method does not rely on atom types and consequently does not suffer from poor automated atom type assignments.
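For context, the torsional energy terms being replaced here conventionally take the Fourier-series form below (a standard force-field expression; the symbols are generic, not the paper's notation):

```latex
% Standard torsional term: \phi is the dihedral angle, n the periodicity,
% V_n the barrier height, and \gamma_n the phase offset.
E_{\mathrm{torsion}}(\phi) = \sum_{n} \frac{V_n}{2}\left[1 + \cos(n\phi - \gamma_n)\right]
```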
Centrifuge: rapid and sensitive classification of metagenomic sequences
Song, Li; Breitwieser, Florian P.
2016-01-01
Centrifuge is a novel microbial classification engine that enables rapid, accurate, and sensitive labeling of reads and quantification of species on desktop computers. The system uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (4.2 GB for 4078 bacterial and 200 archaeal genomes) and classifies sequences at very high speed, allowing it to process the millions of reads from a typical high-throughput DNA sequencing run within a few minutes. Together, these advances enable timely and accurate analysis of large metagenomics data sets on conventional desktop computers. Because of its space-optimized indexing schemes, Centrifuge also makes it possible to index the entire NCBI nonredundant nucleotide sequence database (a total of 109 billion bases) with an index size of 69 GB, in contrast to k-mer-based indexing schemes, which require far more extensive space. PMID:27852649
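To give a feel for the indexing machinery mentioned here, below is a deliberately naive Python sketch of the Burrows-Wheeler transform and FM-index backward search; Centrifuge's actual index uses compressed rank structures and is vastly more space- and time-efficient than these toy tables.

```python
# Toy BWT + FM-index backward search (educational only; O(n^2 log n) BWT build).
def bwt(text):
    text += "$"  # unique end-of-string sentinel, lexicographically smallest
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def fm_index(bwt_str):
    # C[c] = number of characters in the text strictly smaller than c
    counts = {}
    for c in bwt_str:
        counts[c] = counts.get(c, 0) + 1
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    # occ[i][c] = occurrences of c in bwt_str[:i]
    occ = [dict.fromkeys(counts, 0)]
    for ch in bwt_str:
        row = dict(occ[-1])
        row[ch] += 1
        occ.append(row)
    return C, occ

def count_matches(pattern, bwt_str, C, occ):
    lo, hi = 0, len(bwt_str)  # current suffix-array interval
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[lo][c]
        hi = C[c] + occ[hi][c]
        if lo >= hi:
            return 0
    return hi - lo

b = bwt("GATTACAGATTACA")
C, occ = fm_index(b)
print(count_matches("ATTA", b, C, occ))  # -> 2
```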
Method and apparatus for digitally based high speed x-ray spectrometer
Warburton, W.K.; Hubbard, B.
1997-11-04
A high speed, digitally based, signal processing system which accepts input data from a detector-preamplifier and produces a spectral analysis of the x-rays illuminating the detector. The system achieves high throughputs at low cost by dividing the required digital processing steps between a "hardwired" processor implemented in combinatorial digital logic, which detects the presence of the x-ray signals in the digitized data stream and extracts filtered estimates of their amplitudes, and a programmable digital signal processing computer, which refines the filtered amplitude estimates and bins them to produce the desired spectral analysis. One set of algorithms allows this hybrid system to match the resolution of analog systems while operating at much higher data rates. A second set of algorithms implemented in the processor allows the system to be self-calibrating as well. The same processor also handles the interface to an external control computer. 19 figs.
Method and apparatus for digitally based high speed x-ray spectrometer
Warburton, William K.; Hubbard, Bradley
1997-01-01
A high speed, digitally based, signal processing system which accepts input data from a detector-preamplifier and produces a spectral analysis of the x-rays illuminating the detector. The system achieves high throughputs at low cost by dividing the required digital processing steps between a "hardwired" processor implemented in combinatorial digital logic, which detects the presence of the x-ray signals in the digitized data stream and extracts filtered estimates of their amplitudes, and a programmable digital signal processing computer, which refines the filtered amplitude estimates and bins them to produce the desired spectral analysis. One set of algorithms allows this hybrid system to match the resolution of analog systems while operating at much higher data rates. A second set of algorithms implemented in the processor allows the system to be self-calibrating as well. The same processor also handles the interface to an external control computer.
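As a rough sketch of what such a hardwired amplitude filter does, the Python snippet below applies a trapezoidal (moving-average difference) shaping filter to a synthetic step-like preamplifier trace and reads the x-ray amplitude off the flat top; the filter lengths and trace are illustrative assumptions, not the patent's design.

```python
# Trapezoidal shaping: difference of two moving averages separated by a gap,
# turning a step pulse into a flat-topped pulse whose height is the amplitude.
import numpy as np

def trapezoidal_filter(samples, rise=8, gap=4):
    kernel = np.zeros(2 * rise + gap)
    kernel[:rise] = 1.0          # leading average
    kernel[rise + gap:] = -1.0   # trailing average, subtracted
    return np.convolve(samples, kernel, mode="valid") / rise

# Synthetic step pulse (height 100) on a noisy baseline.
rng = np.random.default_rng(1)
trace = rng.normal(0, 1, 200)
trace[100:] += 100.0
shaped = trapezoidal_filter(trace)
print(f"estimated amplitude: {shaped.max():.1f}")  # ~100
```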
Concepción-Acevedo, Jeniffer; Weiss, Howard N; Chaudhry, Waqas Nasir; Levin, Bruce R
2015-01-01
The maximum exponential growth rate, the Malthusian parameter (MP), is commonly used as a measure of fitness in experimental studies of adaptive evolution and of the effects of antibiotic resistance and other genes on the fitness of planktonic microbes. Thanks to automated, multi-well optical density plate readers and computers, with little hands-on effort investigators can readily obtain hundreds of estimates of MPs in less than a day. Here we compare estimates of the relative fitness of antibiotic susceptible and resistant strains of E. coli, Pseudomonas aeruginosa and Staphylococcus aureus based on MP data obtained with automated multi-well plate readers with the results from pairwise competition experiments. This leads us to question the reliability of estimates of MP obtained with these high throughput devices and the utility of these estimates of the maximum growth rates to detect fitness differences.
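A minimal sketch of how such plate-reader MP estimates are typically produced follows, assuming the common approach of taking the steepest sliding-window slope of log(OD) over time; the logistic-like curve and window length below are illustrative only.

```python
# Maximum exponential growth rate from an optical density (OD) time series.
import numpy as np

def max_growth_rate(t, od, window=5):
    log_od = np.log(od)
    best = -np.inf
    for i in range(len(t) - window + 1):
        # slope of the linear fit to log(OD) over this window
        slope = np.polyfit(t[i:i + window], log_od[i:i + window], 1)[0]
        best = max(best, slope)
    return best  # per unit of t

t = np.linspace(0, 10, 60)                       # hours
od = 0.05 + 0.95 / (1 + np.exp(-1.2 * (t - 5)))  # synthetic logistic-like curve
print(f"MP estimate: {max_growth_rate(t, od):.2f} / h")
```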
Raspberry Pi-powered imaging for plant phenotyping.
Tovar, Jose C; Hoyer, J Steen; Lin, Andy; Tielking, Allison; Callen, Steven T; Elizabeth Castillo, S; Miller, Michael; Tessman, Monica; Fahlgren, Noah; Carrington, James C; Nusinow, Dmitri A; Gehan, Malia A
2018-03-01
Image-based phenomics is a powerful approach to capture and quantify plant diversity. However, commercial platforms that make consistent image acquisition easy are often cost-prohibitive. To make high-throughput phenotyping methods more accessible, low-cost microcomputers and cameras can be used to acquire plant image data. We used low-cost Raspberry Pi computers and cameras to manage and capture plant image data. Detailed here are three different applications of Raspberry Pi-controlled imaging platforms for seed and shoot imaging. Images obtained from each platform were suitable for extracting quantifiable plant traits (e.g., shape, area, height, color) en masse using open-source image processing software such as PlantCV. This protocol describes three low-cost platforms for image acquisition that are useful for quantifying plant diversity. When coupled with open-source image processing tools, these imaging platforms provide viable low-cost solutions for incorporating high-throughput phenomics into a wide range of research programs.
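A minimal time-lapse capture sketch in the spirit of these platforms is shown below, using the picamera Python library available on Raspberry Pi OS; the resolution, capture interval, and output directory are illustrative choices, and downstream trait extraction (e.g., with PlantCV) is omitted.

```python
# Periodic plant-image capture on a Raspberry Pi.
import time
from datetime import datetime
from pathlib import Path

from picamera import PiCamera  # available on Raspberry Pi OS

OUT_DIR = Path("/home/pi/plant_images")  # hypothetical output location
OUT_DIR.mkdir(parents=True, exist_ok=True)

camera = PiCamera()
camera.resolution = (1920, 1080)
time.sleep(2)  # let exposure and white balance settle

try:
    while True:
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        camera.capture(str(OUT_DIR / f"plant_{stamp}.jpg"))
        time.sleep(600)  # one image every 10 minutes
finally:
    camera.close()
```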
A kinase-focused compound collection: compilation and screening strategy.
Sun, Dongyu; Chuaqui, Claudio; Deng, Zhan; Bowes, Scott; Chin, Donovan; Singh, Juswinder; Cullen, Patrick; Hankins, Gretchen; Lee, Wen-Cherng; Donnelly, Jason; Friedman, Jessica; Josiah, Serene
2006-06-01
Lead identification by high-throughput screening of large compound libraries has been supplemented with virtual screening and focused compound libraries. To complement existing approaches for lead identification at Biogen Idec, a kinase-focused compound collection was designed, developed and validated. Two strategies were adopted to populate the compound collection: a ligand shape-based virtual screening and a receptor-based approach (structural interaction fingerprint). Compounds selected with the two approaches were cherry-picked from an existing high-throughput screening compound library, ordered from suppliers and supplemented with specific medicinal compounds from internal programs. Promising hits and leads have been generated from the kinase-focused compound collection against multiple kinase targets. The principle of the collection design and screening strategy was validated and the use of the kinase-focused compound collection for lead identification has been added to existing strategies.
NASA Astrophysics Data System (ADS)
Wang, Yonggang; Li, Deng; Lu, Xiaoming; Cheng, Xinyi; Wang, Liwei
2014-10-01
Continuous crystal-based positron emission tomography (PET) detectors could be an ideal alternative to current high-resolution pixelated PET detectors if the issues of high-performance γ interaction position estimation and its real-time implementation are solved. Unfortunately, existing position estimators are not very feasible for implementation on field-programmable gate arrays (FPGAs). In this paper, we propose a new self-organizing map neural network-based nearest neighbor (SOM-NN) positioning scheme aiming not only at providing high performance, but also at being realistic for FPGA implementation. Benefiting from the SOM feature mapping mechanism, the large set of input reference events at each calibration position is approximated by a small set of prototypes, and the computation of the nearest neighbor search for unknown events is largely reduced. Using our experimental data, the scheme was evaluated, optimized and compared with the smoothed k-NN method. The full-width-at-half-maximum (FWHM) spatial resolutions of the two methods, averaged over the center axis of the detector, were 1.87 ± 0.17 mm and 1.92 ± 0.09 mm, respectively. The test results show that the SOM-NN scheme has positioning performance equivalent to the smoothed k-NN method, but the amount of computation is only about one-tenth of that of the smoothed k-NN method. In addition, the algorithm structure of the SOM-NN scheme is more feasible for implementation on an FPGA. It has the potential to realize real-time position estimation on an FPGA with a high event-processing throughput.
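An assumed, much-simplified sketch of the SOM-NN idea follows: SOM-style updates compress the reference events at each calibration position into a few prototypes, and an unknown event is positioned at the calibration point of its nearest prototype. Event dimensions, counts, and the learning schedule are invented for illustration.

```python
# Prototype compression with SOM-style updates, then nearest-prototype lookup.
import numpy as np

def som_prototypes(events, n_proto=4, epochs=20, lr0=0.5):
    rng = np.random.default_rng(0)
    protos = events[rng.choice(len(events), n_proto, replace=False)].copy()
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)  # decaying learning rate
        for x in events[rng.permutation(len(events))]:
            w = np.argmin(((protos - x) ** 2).sum(axis=1))  # winning prototype
            protos[w] += lr * (x - protos[w])               # pull winner toward x
            for nb in (w - 1, w + 1):                       # 1D neighborhood
                if 0 <= nb < n_proto:
                    protos[nb] += 0.5 * lr * (x - protos[nb])
    return protos

# Fake calibration data: events are 8-dim "light distribution" vectors.
rng = np.random.default_rng(1)
positions = {(x, 0.0): rng.normal(loc=x, scale=0.3, size=(300, 8))
             for x in (0.0, 1.0, 2.0)}

proto_bank, proto_pos = [], []
for pos, events in positions.items():
    for p in som_prototypes(events):
        proto_bank.append(p)
        proto_pos.append(pos)
proto_bank = np.array(proto_bank)

def estimate_position(event):
    return proto_pos[np.argmin(((proto_bank - event) ** 2).sum(axis=1))]

print(estimate_position(rng.normal(loc=1.0, scale=0.3, size=8)))  # ~ (1.0, 0.0)
```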
Orr, Asuka A; Gonzalez-Rivera, Juan C; Wilson, Mark; Bhikha, P Reena; Wang, Daiqi; Contreras, Lydia M; Tamamis, Phanourios
2018-02-01
There are over 150 currently known, highly diverse chemically modified RNAs, which are dynamic, reversible, and can modulate RNA-protein interactions. Yet, little is known about the wealth of such interactions. This can be attributed to the lack of tools that allow the rapid study of all the potential RNA modifications that might mediate RNA-protein interactions. As a promising step toward this direction, here we present a computational protocol for the characterization of interactions between proteins and RNA containing post-transcriptional modifications. Given an RNA-protein complex structure, potential RNA modified ribonucleoside positions, and molecular mechanics parameters for capturing energetics of RNA modifications, our protocol operates in two stages. In the first stage, a decision-making tool, comprising short simulations and interaction energy calculations, performs a fast and efficient search in a high-throughput fashion through a list of different types of RNA modifications categorized into trees according to their structural and physicochemical properties, and selects a subset of RNA modifications prone to interact with the target protein. In the second stage, RNA modifications that are selected as recognized by the protein are examined in detail using all-atom simulations and free energy calculations. We implement and experimentally validate this protocol in a test case involving the study of RNA modifications in complex with Escherichia coli (E. coli) protein Polynucleotide Phosphorylase (PNPase), depicting the favorable interaction between the 8-oxo-7,8-dihydroguanosine (8-oxoG) RNA modification and PNPase. Further advancement of the protocol can broaden our understanding of protein interactions with all known RNA modifications in several systems.
A highly efficient multi-core algorithm for clustering extremely large datasets
2010-01-01
Background: In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results: We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorical SNP data. Our new shared-memory parallel algorithms prove to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java-based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network-based parallelization. Conclusions: Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that, using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
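The paper's implementation is Java with a transactional-memory-inspired design; as a rough cross-language analogy only, the sketch below parallelizes the assignment step of Lloyd's k-means across cores with Python's multiprocessing and updates centroids serially. Data shapes and parameters are placeholders.

```python
# Multi-core k-means sketch: parallel assignment step, serial centroid update.
import numpy as np
from multiprocessing import Pool

def nearest_centroid(args):
    chunk, centroids = args
    d = ((chunk[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def parallel_kmeans(X, k=3, iters=20, workers=4):
    rng = np.random.default_rng(0)
    centroids = X[rng.choice(len(X), k, replace=False)].copy()
    chunks = np.array_split(X, workers)
    with Pool(workers) as pool:
        for _ in range(iters):
            labels = np.concatenate(
                pool.map(nearest_centroid, [(c, centroids) for c in chunks]))
            for j in range(k):  # serial centroid update
                members = X[labels == j]
                if len(members):
                    centroids[j] = members.mean(axis=0)
    return centroids, labels

if __name__ == "__main__":  # guard required for multiprocessing
    X = np.vstack([np.random.default_rng(i).normal(i * 5, 1, size=(1000, 10))
                   for i in range(3)])
    centroids, labels = parallel_kmeans(X)
    print(np.bincount(labels))
```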
Discovery of a novel general anesthetic chemotype using high-throughput screening.
McKinstry-Wu, Andrew R; Bu, Weiming; Rai, Ganesha; Lea, Wendy A; Weiser, Brian P; Liang, David F; Simeonov, Anton; Jadhav, Ajit; Maloney, David J; Eckenhoff, Roderic G
2015-02-01
The development of novel anesthetics has historically been a process of combined serendipity and empiricism, with most recent new anesthetics developed via modification of existing anesthetic structures. Using a novel high-throughput screen employing the fluorescent anesthetic 1-aminoanthracene and apoferritin as a surrogate for on-pathway anesthetic protein target(s), we screened a 350,000-compound library for competition with 1-aminoanthracene-apoferritin binding. Hit compounds meeting structural criteria had their binding affinities for apoferritin quantified with isothermal titration calorimetry and were tested for γ-aminobutyric acid type A receptor binding using a flunitrazepam binding assay. Chemotypes with a strong presence in the top 700 and exhibiting activity via isothermal titration calorimetry were selected for medicinal chemistry optimization, including testing for anesthetic potency and toxicity in an in vivo Xenopus laevis tadpole assay. Compounds with low toxicity and high potency were tested for anesthetic potency in mice. From an initial chemical library of more than 350,000 compounds, we identified 2,600 compounds that potently inhibited 1-aminoanthracene binding to apoferritin. A subset of compounds chosen by structural criteria (700) was successfully reconfirmed using the initial assay. Based on a strong presence in both the initial and secondary screens, the 6-phenylpyridazin-3(2H)-one chemotype was assessed for anesthetic activity in tadpoles. Medicinal chemistry efforts identified four compounds with high potency and low toxicity in tadpoles, two of which were found to be effective novel anesthetics in mice. The authors demonstrate the first use of a high-throughput screen to successfully identify a novel anesthetic chemotype and show mammalian anesthetic activity for members of that chemotype.
The use of high-throughput screening techniques to evaluate mitochondrial toxicity.
Wills, Lauren P
2017-11-01
Toxicologists and chemical regulators depend on accurate and effective methods to evaluate and predict the toxicity of thousands of current and future compounds. Robust high-throughput screening (HTS) experiments have the potential to efficiently test large numbers of chemical compounds for effects on biological pathways. HTS assays can be utilized to examine chemical toxicity across multiple mechanisms of action, experimental models, concentrations, and lengths of exposure. Many agricultural, industrial, and pharmaceutical chemicals classified as harmful to human and environmental health exert their effects through the mechanism of mitochondrial toxicity. Mitochondrial toxicants are compounds that cause a decrease in the number of mitochondria within a cell, and/or decrease the ability of mitochondria to perform normal functions including producing adenosine triphosphate (ATP) and maintaining cellular homeostasis. Mitochondrial dysfunction can lead to apoptosis, necrosis, altered metabolism, muscle weakness, neurodegeneration, decreased organ function, and eventually disease or death of the whole organism. The development of HTS techniques to identify mitochondrial toxicants will provide extensive databases with essential connections between mechanistic mitochondrial toxicity and chemical structure. Computational and bioinformatics approaches can be used to evaluate compound databases for specific chemical structures associated with toxicity, with the goal of developing quantitative structure-activity relationship (QSAR) models and mitochondrial toxicophores. Ultimately these predictive models will facilitate the identification of mitochondrial liabilities in consumer products, industrial compounds, pharmaceuticals and environmental hazards.
SMARTIV: combined sequence and structure de-novo motif discovery for in-vivo RNA binding data.
Polishchuk, Maya; Paz, Inbal; Yakhini, Zohar; Mandel-Gutfreund, Yael
2018-05-25
Gene expression regulation is highly dependent on binding of RNA-binding proteins (RBPs) to their RNA targets. Growing evidence supports the notion that both RNA primary sequence and its local secondary structure play a role in specific Protein-RNA recognition and binding. Despite the great advance in high-throughput experimental methods for identifying sequence targets of RBPs, predicting the specific sequence and structure binding preferences of RBPs remains a major challenge. We present a novel webserver, SMARTIV, designed for discovering and visualizing combined RNA sequence and structure motifs from high-throughput RNA-binding data, generated from in-vivo experiments. The uniqueness of SMARTIV is that it predicts motifs from enriched k-mers that combine information from ranked RNA sequences and their predicted secondary structure, obtained using various folding methods. Consequently, SMARTIV generates Position Weight Matrices (PWMs) in a combined sequence and structure alphabet with assigned P-values. SMARTIV concisely represents the sequence and structure motif content as a single graphical logo, which is informative and easy for visual perception. SMARTIV was examined extensively on a variety of high-throughput binding experiments for RBPs from different families, generated from different technologies, showing consistent and accurate results. Finally, SMARTIV is a user-friendly webserver, highly efficient in run-time and freely accessible via http://smartiv.technion.ac.il/.
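A hedged sketch of the combined sequence-and-structure alphabet idea follows: each nucleotide is annotated with a paired/unpaired state read off a dot-bracket secondary structure, giving an eight-letter alphabet over which k-mers can be counted. The encoding convention is an assumption, as SMARTIV's actual alphabet and PWM scoring are defined by its authors.

```python
# Combine primary sequence with dot-bracket structure into one alphabet.
from collections import Counter

def combine(seq, dotbracket):
    """'A' paired -> 'Ap', 'A' unpaired -> 'Au', etc."""
    return [base + ("p" if s in "()" else "u")
            for base, s in zip(seq, dotbracket)]

def kmer_counts(combined, k=3):
    return Counter(tuple(combined[i:i + k]) for i in range(len(combined) - k + 1))

seq        = "GGGAAACCC"
dotbracket = "(((...)))"  # e.g., from an RNA folding tool
combined = combine(seq, dotbracket)
print(combined)
print(kmer_counts(combined).most_common(3))
```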
NASA Astrophysics Data System (ADS)
Trimarchi, Giancarlo; Zhang, Xiuwen; DeVries Vermeer, Michael J.; Cantwell, Jacqueline; Poeppelmeier, Kenneth R.; Zunger, Alex
2015-10-01
Theoretical sorting of stable and synthesizable "missing compounds" from those that are unstable is a crucial step in the discovery of previously unknown functional materials. This active research area often involves high-throughput (HT) examination of the total energy of a given compound in a list of candidate formal structure types (FSTs), searching for those with the lowest energy within that list. While it is well appreciated that local relaxation methods based on a fixed list of structure types can lead to inaccurate geometries, this approach is widely used in HT studies because it produces answers faster than global optimization methods (which vary lattice vectors and atomic positions without local restrictions). We find, however, a different failure mode of the HT protocol: specific crystallographic classes of formal structure types each correspond to a series of chemically distinct "daughter structure types" (DSTs) that have the same space group but possess totally different local bonding configurations, including coordination types. Failure to include such DSTs in the fixed list of examined candidate structures used in contemporary high-throughput approaches can lead to qualitative misidentification of the stable bonding pattern, not just quantitative inaccuracies. In this work, we (i) clarify the understanding of the general DST-FST relationship, thus improving current discovery HT approaches, (ii) illustrate this failure mode for RbCuS and RbCuSe (the latter a previously unreported compound, predicted here) by developing a synthesis method and accelerated crystal-structure determination, and (iii) apply the genetic-algorithm-based global space-group optimization (GSGO) approach, which is not vulnerable to the failure mode of HT searches over fixed lists, demonstrating a correct identification of the stable DST. The broad impact of items (i)-(iii) lies in the demonstrated predictive ability of a more comprehensive search strategy than what is currently used: HT calculations as a preliminary broad screening, followed by unbiased GSGO of the final candidates.
Development and Validation of a Computational Model for Androgen Receptor Activity
Testing thousands of chemicals to identify potential androgen receptor (AR) agonists or antagonists would cost millions of dollars and take decades to complete using current validated methods. High-throughput in vitro screening (HTS) and computational toxicology approaches can mo...
ACToR Chemical Structure processing using Open Source ...
ACToR (Aggregated Computational Toxicology Resource) is a centralized database repository developed by the National Center for Computational Toxicology (NCCT) at the U.S. Environmental Protection Agency (EPA). Free and open source tools were used to compile toxicity data from over 1,950 public sources. ACToR contains chemical structure information and toxicological data for over 558,000 unique chemicals. The database primarily includes data from NCCT research programs, in vivo toxicity data from ToxRef, human exposure data from ExpoCast, high-throughput screening data from ToxCast and high quality chemical structure information from the EPA DSSTox program. The DSSTox database is a chemical structure inventory for the NCCT programs and currently has about 16,000 unique structures. Also included are data from PubChem, ChemSpider, USDA, FDA, NIH and several other public data sources. ACToR has been a resource to various international and national research groups. Most of our recent efforts on ACToR are focused on improving the structural identifiers and physico-chemical properties of the chemicals in the database. Organizing this huge collection of data and improving the chemical structure quality of the database has posed some major challenges. Workflows have been developed to process structures, calculate chemical properties and identify relationships between CAS numbers. The Structure processing workflow integrates web services (PubChem and NIH NCI Cactus) to d
Valdés, Julio J; Bonham-Carter, Graeme
2006-03-01
A computational intelligence approach is used to explore the problem of detecting internal state changes in time-dependent processes described by heterogeneous, multivariate time series with imprecise data and missing values. Such processes are approximated by collections of time-dependent non-linear autoregressive models represented by a special kind of neuro-fuzzy neural network. Grid and high-throughput computing model-mining procedures based on neuro-fuzzy networks and genetic algorithms generate: (i) collections of models composed of sets of time-lag terms from the time series, and (ii) prediction functions represented by neuro-fuzzy networks. The composition of the models and their prediction capabilities allows the identification of changes in the internal structure of the process. These changes are associated with the alternation of steady and transient states, zones with abnormal behavior, instability, and other situations. This approach is general, and its sensitivity for detecting subtle changes of state is revealed by simulation experiments. Its potential in the study of complex processes in earth sciences and astrophysics is illustrated with applications using paleoclimate and solar data.
Entropy as a Gene-Like Performance Indicator Promoting Thermoelectric Materials.
Liu, Ruiheng; Chen, Hongyi; Zhao, Kunpeng; Qin, Yuting; Jiang, Binbin; Zhang, Tiansong; Sha, Gang; Shi, Xun; Uher, Ctirad; Zhang, Wenqing; Chen, Lidong
2017-10-01
High-throughput explorations of novel thermoelectric materials based on the Materials Genome Initiative paradigm have so far focused on mining the structure-property space using nonglobal indicators to design materials with tunable electrical and thermal transport properties. As the genomic units, following the biogene tradition, such indicators include localized crystal structural blocks in real space or band degeneracy at certain points in reciprocal space. However, this nonglobal approach does not consider how real materials differentiate from one another. Here, this study develops a strategy of using entropy as a global, gene-like performance indicator and shows how multicomponent thermoelectric materials with high entropy can be designed via a high-throughput screening method. Optimizing entropy serves as an effective guide to greatly improving thermoelectric performance, either by depressing the lattice thermal conductivity toward its theoretical minimum or by enhancing crystal-structure symmetry to yield large Seebeck coefficients. Entropy engineering using multicomponent crystal structures or other possible techniques provides a new avenue for improving thermoelectric performance beyond current methods and approaches. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
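For concreteness, the global indicator at work here is closely related to the ideal configurational entropy of mixing, a standard thermodynamic quantity; a minimal sketch (the equimolar four-component sublattice is chosen purely for illustration, not taken from the paper):

    # Ideal configurational (mixing) entropy per mole of lattice sites:
    # dS_conf = -R * sum_i x_i ln x_i, maximized by equimolar mixing.
    import math

    R = 8.314  # gas constant, J/(mol K)

    def mixing_entropy(fractions):
        """Ideal mixing entropy for site occupation fractions summing to 1."""
        assert abs(sum(fractions) - 1.0) < 1e-9
        return -R * sum(x * math.log(x) for x in fractions if x > 0)

    # Equimolar four-component mixing gives the maximum R ln 4:
    print(mixing_entropy([0.25, 0.25, 0.25, 0.25]))  # ~11.5 J/(mol K)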
High-Throughput Synthesis and Structure of Zeolite ZSM-43 with Two-Directional 8-Ring Channels.
Willhammar, Tom; Su, Jie; Yun, Yifeng; Zou, Xiaodong; Afeworki, Mobae; Weston, Simon C; Vroman, Hilda B; Lonergan, William W; Strohmaier, Karl G
2017-08-07
The aluminosilicate zeolite ZSM-43 (where ZSM = Zeolite Socony Mobil) was first synthesized more than 3 decades ago, but its chemical structure remained unsolved because of its poor crystallinity and small crystal size. Here we present optimization of the ZSM-43 synthesis using a high-throughput approach and subsequent structure determination by the combination of electron crystallographic methods and powder X-ray diffraction. The synthesis required the use of a combination of both inorganic (Cs+ and K+) and organic (choline) structure-directing agents. High-throughput synthesis enabled a screening of the synthesis conditions, which made it possible to optimize the synthesis, despite its complexity, in order to obtain a material with significantly improved crystallinity. By applying both rotation electron diffraction and high-resolution transmission electron microscopy imaging, the structure of ZSM-43 could be determined. The structure of ZSM-43 is a new zeolite framework type and possesses a unique two-dimensional channel system limited by 8-ring channels. ZSM-43 is stable upon calcination, and sorption measurements show that the material is suitable for adsorption of carbon dioxide as well as methane.
Controlling high-throughput manufacturing at the nano-scale
NASA Astrophysics Data System (ADS)
Cooper, Khershed P.
2013-09-01
Interest in nano-scale manufacturing research and development is growing. The reason is to accelerate the translation of discoveries and inventions of nanoscience and nanotechnology into products that would benefit industry, economy and society. Ongoing research in nanomanufacturing is focused primarily on developing novel nanofabrication techniques for a variety of applications—materials, energy, electronics, photonics, biomedical, etc. Our goal is to foster the development of high-throughput methods of fabricating nano-enabled products. Large-area parallel processing and high-speed continuous processing are high-throughput means for mass production. An example of large-area processing is step-and-repeat nanoimprinting, by which nanostructures are reproduced again and again over a large area, such as a 12-inch wafer. Roll-to-roll processing is an example of continuous processing, by which it is possible to print and imprint multi-level nanostructures and nanodevices on a moving flexible substrate. The big pay-off is high-volume production and low unit cost. However, the anticipated cost benefits can only be realized if the increased production rate is accompanied by high yields of high quality products. To ensure product quality, we need to design and construct manufacturing systems such that the processes can be closely monitored and controlled. One approach is to bring cyber-physical systems (CPS) concepts to nanomanufacturing. CPS involves the control of a physical system such as manufacturing through modeling, computation, communication and control. Such a closely coupled system will involve in-situ metrology and closed-loop control of the physical processes guided by physics-based models and driven by appropriate instrumentation, sensing and actuation. This paper will discuss these ideas in the context of controlling high-throughput manufacturing at the nano-scale.
Simplified Models for Accelerated Structural Prediction of Conjugated Semiconducting Polymers
Henry, Michael M.; Jones, Matthew L.; Oosterhout, Stefan D.; ...
2017-11-08
We perform molecular dynamics simulations of poly(benzodithiophene-thienopyrrolodione) (BDT-TPD) oligomers in order to evaluate the accuracy with which unoptimized molecular models can predict experimentally characterized morphologies. The predicted morphologies are characterized using simulated grazing-incidence X-ray scattering (GIXS) and compared to the experimental scattering patterns. We find that approximating the aromatic rings in BDT-TPD with rigid bodies, rather than combinations of bond, angle, and dihedral constraints, results in 14% lower computational cost and provides nearly equivalent structural predictions compared to the flexible model case. The predicted glass transition temperature of BDT-TPD (410 +/- 32 K) is found to be in agreement with experiments. Predicted morphologies demonstrate short-range structural order due to stacking of the chain backbones (π-π stacking around 3.9 Å), and long-range spatial correlations due to the self-organization of backbone stacks into 'ribbons' (lamellar ordering around 20.9 Å), representing the best-to-date computational predictions of structure of complex conjugated oligomers. We find that expensive simulated annealing schedules are not needed to predict experimental structures here, with instantaneous quenches providing nearly equivalent predictions at a fraction of the computational cost of annealing. We therefore suggest utilizing rigid bodies and fast cooling schedules for high-throughput screening studies of semiflexible polymers and oligomers to utilize their significant computational benefits where appropriate.
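The quoted real-space distances follow from scattering peak positions via the standard relation d = 2π/q; a quick check (the peak positions below are inferred from the quoted spacings, not taken from the paper):

    # Convert a scattering peak position q (1/Angstrom) to a real-space spacing.
    import math

    def d_spacing(q):
        """Real-space spacing d = 2*pi/q for a peak at momentum transfer q."""
        return 2 * math.pi / q

    print(d_spacing(1.61))  # ~3.9 Angstrom: pi-pi stacking peak
    print(d_spacing(0.30))  # ~20.9 Angstrom: lamellar 'ribbon' peak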
REDItools: high-throughput RNA editing detection made easy.
Picardi, Ernesto; Pesole, Graziano
2013-07-15
The reliable detection of RNA editing sites from massive sequencing data remains challenging and, although several methodologies have been proposed, no computational tools have been released to date. Here, we introduce REDItools, a suite of Python scripts to perform high-throughput investigation of RNA editing using next-generation sequencing data. REDItools is written in the Python programming language and freely available at http://code.google.com/p/reditools/. Contact: ernesto.picardi@uniba.it or graziano.pesole@uniba.it. Supplementary data are available at Bioinformatics online.
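The core operation behind this kind of tool is tabulating per-site base counts from aligned reads; a minimal sketch using pysam (not the REDItools code itself), where an excess of G reads at a reference A suggests an A-to-I editing candidate:

    # Minimal per-site base tabulation from a BAM file (illustrative, not REDItools).
    from collections import Counter
    import pysam  # third-party; pip install pysam

    def base_counts(bam_path, chrom, pos0):
        """Count read bases covering one 0-based genomic position."""
        counts = Counter()
        with pysam.AlignmentFile(bam_path, "rb") as bam:
            for col in bam.pileup(chrom, pos0, pos0 + 1, truncate=True):
                for pr in col.pileups:
                    if pr.is_del or pr.is_refskip:
                        continue
                    counts[pr.alignment.query_sequence[pr.query_position]] += 1
        return counts

    # print(base_counts("sample.bam", "chr1", 1234567))  # hypothetical inputs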
Prediction of Chemical Function: Model Development and ...
The United States Environmental Protection Agency’s Exposure Forecaster (ExpoCast) project is developing both statistical and mechanism-based computational models for predicting exposures to thousands of chemicals, including those in consumer products. The high-throughput (HT) screening-level exposures developed under ExpoCast can be combined with HT screening (HTS) bioactivity data for the risk-based prioritization of chemicals for further evaluation. The functional role (e.g. solvent, plasticizer, fragrance) that a chemical performs can drive both the types of products in which it is found and the concentration at which it is present, thereby impacting exposure potential. However, critical chemical use information (including functional role) is lacking for the majority of commercial chemicals for which exposure estimates are needed. A suite of machine-learning based models for classifying chemicals in terms of their likely functional roles in products based on structure was developed. This effort required collection, curation, and harmonization of publicly available data sources of chemical functional use information from government and industry bodies. Physicochemical and structure descriptor data were generated for chemicals with function data. Machine-learning classifier models for function were then built in a cross-validated manner from the descriptor/function data using the method of random forests. The models were applied to: 1) predict chemi
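A sketch of the described modeling step with scikit-learn (the descriptor matrix and labels below are placeholders, not EPA data):

    # Cross-validated random-forest classification of chemical functional role
    # from physicochemical/structural descriptors (placeholder data).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 32))     # descriptor matrix (placeholder)
    y = rng.integers(0, 3, size=500)   # roles, e.g. solvent/plasticizer/fragrance

    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation
    print(scores.mean())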
CAMAC throughput of a new RISC-based data acquisition computer at the DIII-D tokamak
NASA Astrophysics Data System (ADS)
Vanderlaan, J. F.; Cummings, J. W.
1993-10-01
The amount of experimental data acquired per plasma discharge at DIII-D has continued to grow. The largest shot size in May 1991 was 49 Mbyte; in May 1992, 66 Mbyte; and in April 1993, 80 Mbyte. The increasing load has prompted the installation of a new Motorola 88100-based MODCOMP computer to supplement the existing core of three older MODCOMP data acquisition CPUs. New Kinetic Systems CAMAC serial highway driver hardware runs on the 88100 VME bus. The new operating system is the MODCOMP REAL/IX version of AT&T System V UNIX with real-time extensions and networking capabilities; future plans call for installation of additional computers of this type for tokamak and neutral beam control functions. Experiences with the CAMAC hardware and software will be chronicled, including observations of data throughput. The Enhanced Serial Highway crate controller is advertised as twice as fast as the previous crate controller, and faster computer I/O is expected to further increase data rates.
Bechill, John; Zhong, Rong; Zhang, Chen; Solomaha, Elena
2016-01-01
p53 function is frequently inhibited in cancer either through mutations or by increased degradation via MDM2 and/or E6AP E3-ubiquitin ligases. Most agents that restore p53 expression act by binding MDM2 or E6AP to prevent p53 degradation. However, fewer compounds directly bind to and activate p53. Here, we identified compounds that shared a core structure that bound p53, caused nuclear localization of p53 and caused cell death. To identify these compounds, we developed a novel cell-based screen to redirect p53 degradation to the Skp1-Cullin-F-box (SCF) ubiquitin ligase complex in cells expressing high levels of p53. In a multiplexed assay, we coupled p53 targeted degradation with Rb1 targeted degradation in order to identify compounds that prevented p53 degradation while not inhibiting degradation through the SCF complex or other proteolytic machinery. High-throughput screening identified several leads that shared a common 2-[(E)-2-phenylvinyl]-8-quinolinol core structure that stabilized p53. Surface plasmon resonance analysis indicated that these compounds bound p53 with a KD of 200 ± 52 nM. Furthermore, these compounds increased p53 nuclear localization and transcription of the p53 target genes PUMA, BAX, p21 and FAS in cancer cells. Although p53-null cells had a 2.5±0.5-fold greater viability compared with p53 wild-type cells after treatment with core compounds, loss of p53 did not completely rescue cell viability, suggesting that compounds may target both p53-dependent and p53-independent pathways to inhibit cell proliferation. Thus, we present a novel, cell-based high-throughput screen to identify a 2-[(E)-2-phenylvinyl]-8-quinolinol core structure that bound to p53 and increased p53 activity in cancer cells. These compounds may serve as anti-neoplastic agents in part by targeting p53 as well as other potential pathways. PMID:27124407
An image analysis toolbox for high-throughput C. elegans assays
Wählby, Carolina; Kamentsky, Lee; Liu, Zihan H.; Riklin-Raviv, Tammy; Conery, Annie L.; O’Rourke, Eyleen J.; Sokolnicki, Katherine L.; Visvikis, Orane; Ljosa, Vebjorn; Irazoqui, Javier E.; Golland, Polina; Ruvkun, Gary; Ausubel, Frederick M.; Carpenter, Anne E.
2012-01-01
We present a toolbox for high-throughput screening of image-based Caenorhabditis elegans phenotypes. The image analysis algorithms measure morphological phenotypes in individual worms and are effective for a variety of assays and imaging systems. This WormToolbox is available via the open-source CellProfiler project and enables objective scoring of whole-animal high-throughput image-based assays of C. elegans for the study of diverse biological pathways relevant to human disease. PMID:22522656
High-throughput, image-based screening of pooled genetic variant libraries
Emanuel, George; Moffitt, Jeffrey R.; Zhuang, Xiaowei
2018-01-01
Image-based, high-throughput screening of genetic perturbations will advance both biology and biotechnology. We report a high-throughput screening method that allows diverse genotypes and corresponding phenotypes to be imaged in numerous individual cells. We achieve genotyping by introducing barcoded genetic variants into cells and using massively multiplexed FISH to measure the barcodes. We demonstrated this method by screening mutants of the fluorescent protein YFAST, yielding brighter and more photostable YFAST variants. PMID:29083401
Gozalbes, Rafael; Carbajo, Rodrigo J; Pineda-Lucena, Antonio
2010-01-01
In the last decade, fragment-based drug discovery (FBDD) has evolved from a novel approach in the search of new hits to a valuable alternative to the high-throughput screening (HTS) campaigns of many pharmaceutical companies. The increasing relevance of FBDD in the drug discovery universe has been concomitant with an implementation of the biophysical techniques used for the detection of weak inhibitors, e.g. NMR, X-ray crystallography or surface plasmon resonance (SPR). At the same time, computational approaches have also been progressively incorporated into the FBDD process and nowadays several computational tools are available. These range from filtering huge chemical databases to build fragment-focused libraries of compounds with adequate physicochemical properties, to more evolved models based on in silico methods such as docking, pharmacophore modelling, QSAR and virtual screening. In this paper we will review the parallel evolution and complementarities of biophysical techniques and computational methods, providing some representative examples of drug discovery success stories by using FBDD.
TERRA REF: Advancing phenomics with high resolution, open access sensor and genomics data
NASA Astrophysics Data System (ADS)
LeBauer, D.; Kooper, R.; Burnette, M.; Willis, C.
2017-12-01
Automated plant measurement has the potential to improve understanding of genetic and environmental controls on plant traits (phenotypes). The application of sensors and software in the automation of high throughput phenotyping reflects a fundamental shift from labor intensive hand measurements to drone, tractor, and robot mounted sensing platforms. These tools are expected to speed the rate of crop improvement by enabling plant breeders to more accurately select plants with improved yields, resource use efficiency, and stress tolerance. However, there are many challenges facing high throughput phenomics: sensors and platforms are expensive, currently there are few standard methods of data collection and storage, and the analysis of large data sets requires high performance computers and automated, reproducible computing pipelines. To overcome these obstacles and advance the science of high throughput phenomics, the TERRA Phenotyping Reference Platform (TERRA-REF) team is developing an open-access database of high resolution sensor data. TERRA REF is an integrated field and greenhouse phenotyping system that includes: a reference field scanner with fifteen sensors that can generate terabytes of data each day at mm resolution; UAV, tractor, and fixed field sensing platforms; and an automated controlled-environment scanner. These platforms will enable investigation of diverse sensing modalities and of traits under controlled and field environments. It is the goal of TERRA REF to lower the barrier to entry for academic and industry researchers by providing high-resolution data, open source software, and online computing resources. Our project is unique in that all data will be made fully public in November 2018, and are already available to early adopters through the beta-user program. We will describe the datasets and how to use them as well as the databases and computing pipeline and how these can be reused and remixed in other phenomics pipelines. Finally, we will describe the National Data Service workbench, a cloud computing platform that can access the petabyte scale data while supporting reproducible research.
An alpha-numeric code for representing N-linked glycan structures in secreted glycoproteins.
Yusufi, Faraaz Noor Khan; Park, Wonjun; Lee, May May; Lee, Dong-Yup
2009-01-01
Advances in high-throughput techniques have led to the creation of increasing amounts of glycome data. The storage and analysis of this data would benefit greatly from a compact notation for describing glycan structures that can be easily stored and interpreted by computers. Towards this end, we propose a fixed-length alpha-numeric code for representing N-linked glycan structures commonly found in secreted glycoproteins from mammalian cell cultures. This code, GlycoDigit, employs a pre-assigned alpha-numeric index to represent the monosaccharides attached in different branches to the core glycan structure. The present branch-centric representation allows us to visualize the structure while the numerical nature of the code makes it machine readable. In addition, a difference operator can be defined to quantitatively differentiate between glycan structures for further analysis. The usefulness and applicability of GlycoDigit were demonstrated by constructing and visualizing an N-linked glycosylation network.
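To make the idea of a fixed-length, machine-comparable code concrete, here is a toy version with an invented index assignment (illustrative only; not GlycoDigit's actual scheme), including the kind of difference operator the abstract describes:

    # Toy fixed-length branch-centric glycan code (hypothetical index table).
    # Each position holds one digit identifying the monosaccharide attached
    # at that branch position; 0 = absent, so codes compare position-wise.
    MONO = {"0": "-", "1": "GlcNAc", "2": "Gal", "3": "Man", "4": "Fuc", "5": "NeuAc"}

    def difference(code_a, code_b):
        """Count branch positions at which two equal-length codes differ."""
        assert len(code_a) == len(code_b)
        return sum(a != b for a, b in zip(code_a, code_b))

    g1 = "120120"  # hypothetical biantennary structure
    g2 = "125120"  # same structure with NeuAc capping one branch
    print(difference(g1, g2))  # 1: the codes differ at a single position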
Higher Throughput Calorimetry: Opportunities, Approaches and Challenges
Recht, Michael I.; Coyle, Joseph E.; Bruce, Richard H.
2010-01-01
Higher throughput thermodynamic measurements can provide value in structure-based drug discovery during fragment screening, hit validation, and lead optimization. Enthalpy can be used to detect and characterize ligand binding, and changes that affect the interaction of protein and ligand can sometimes be detected more readily from changes in the enthalpy of binding than from the corresponding free-energy changes or from protein-ligand structures. Newer, higher throughput calorimeters are being incorporated into the drug discovery process. Improvements in titration calorimeters come from extensions of a mature technology and face limitations in scaling. Conversely, array calorimetry, an emerging technology, shows promise for substantial improvements in throughput and material utilization, but improved sensitivity is needed. PMID:20888754
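The separation of enthalpic and entropic contributions alluded to here follows from standard binding thermodynamics; a worked example with illustrative numbers (not from the paper):

    # dG = -R*T*ln(Ka) and dG = dH - T*dS, so T*dS = dH - dG once the
    # association constant Ka and binding enthalpy dH are measured (e.g. by ITC).
    import math

    R, T = 8.314, 298.15   # J/(mol K), K
    Ka = 1.0e7             # association constant, 1/M (hypothetical)
    dH = -40.0e3           # measured binding enthalpy, J/mol (hypothetical)

    dG = -R * T * math.log(Ka)   # ~ -40 kJ/mol
    TdS = dH - dG                # entropic contribution
    print(dG / 1e3, TdS / 1e3)   # kJ/mol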
Guo, Chuangxing; Linton, Angelica; Jalaie, Mehran; Kephart, Susan; Ornelas, Martha; Pairish, Mason; Greasley, Samantha; Richardson, Paul; Maegley, Karen; Hickey, Michael; Li, John; Wu, Xin; Ji, Xiaodong; Xie, Zhi
2013-06-01
The M2 isoform of pyruvate kinase is an emerging target for antitumor therapy. In this letter, we describe the discovery of 2-((1H-benzo[d]imidazol-1-yl)methyl)-4H-pyrido[1,2-a]pyrimidin-4-ones as potent and selective PKM2 activators which were found to have a novel binding mode. The original lead identified from high throughput screening was optimized into an efficient series via computer-aided structure-based drug design. Both a representative compound from this series and an activator described in the literature were used as molecular tools to probe the biological effects of PKM2 activation on cancer cells. Our results suggested that PKM2 activation alone is not sufficient to alter cancer cell metabolism. Copyright © 2013 Elsevier Ltd. All rights reserved.
CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets
Nowicka, Malgorzata; Krieg, Carsten; Weber, Lukas M.; Hartmann, Felix J.; Guglietta, Silvia; Becher, Burkhard; Levesque, Mitchell P.; Robinson, Mark D.
2017-01-01
High dimensional mass and flow cytometry (HDCyto) experiments have become a method of choice for high throughput interrogation and characterization of cell populations. Here, we present an R-based pipeline for differential analyses of HDCyto data, largely based on Bioconductor packages. We computationally define cell populations using FlowSOM clustering, and facilitate an optional but reproducible strategy for manual merging of algorithm-generated clusters. Our workflow offers different analysis paths, including association of cell type abundance with a phenotype or changes in signaling markers within specific subpopulations, or differential analyses of aggregated signals. Importantly, the differential analyses we show are based on regression frameworks where the HDCyto data is the response; thus, we are able to model arbitrary experimental designs, such as those with batch effects, paired designs and so on. In particular, we apply generalized linear mixed models to analyses of cell population abundance or cell-population-specific analyses of signaling markers, allowing overdispersion in cell count or aggregated signals across samples to be appropriately modeled. To support the formal statistical analyses, we encourage exploratory data analysis at every step, including quality control (e.g. multi-dimensional scaling plots), reporting of clustering results (dimensionality reduction, heatmaps with dendrograms) and differential analyses (e.g. plots of aggregated signals). PMID:28663787
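The regression idea, with cytometry counts as the response, can be sketched in Python (the published workflow is R/Bioconductor and uses mixed models; a plain binomial GLM with invented counts is used here as a simplified stand-in):

    # Differential abundance of one cell cluster across conditions via a
    # binomial GLM on per-sample counts (simplified; not the authors' code).
    import numpy as np
    import statsmodels.api as sm

    cluster = np.array([120, 95, 130, 310, 280, 295])      # cells in cluster k
    total   = np.array([1000, 900, 1100, 1000, 950, 1050]) # cells per sample
    condition = np.array([0, 0, 0, 1, 1, 1])               # control vs. stimulated

    X = sm.add_constant(condition.astype(float))
    y = np.column_stack([cluster, total - cluster])        # successes, failures
    fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
    print(fit.summary())  # the condition coefficient tests the abundance shift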
Microfluidics for cell-based high throughput screening platforms - A review.
Du, Guansheng; Fang, Qun; den Toonder, Jaap M J
2016-01-15
In recent decades, the basic microfluidic techniques for the study of cells, such as cell culture, cell separation, and cell lysis, have been well developed. Based on these cell handling techniques, microfluidics has been widely applied in the fields of PCR (Polymerase Chain Reaction), immunoassays, organ-on-chip, stem cell research, and the analysis and identification of circulating tumor cells. As a major step in drug discovery, high-throughput screening allows rapid analysis of thousands of chemical, biochemical, genetic or pharmacological tests in parallel. In this review, we summarize the application of microfluidics in cell-based high-throughput screening. The screening methods mentioned in this paper include approaches using the perfusion flow mode, the droplet mode, and the microarray mode. We also discuss the future development of microfluidics-based high-throughput screening platforms for drug discovery. Copyright © 2015 Elsevier B.V. All rights reserved.
Evaluation of FPGA to PC feedback loop
NASA Astrophysics Data System (ADS)
Linczuk, Pawel; Zabolotny, Wojciech M.; Wojenski, Andrzej; Krawczyk, Rafal D.; Pozniak, Krzysztof T.; Chernyshova, Maryna; Czarski, Tomasz; Gaska, Michal; Kasprowicz, Grzegorz; Kowalska-Strzeciwilk, Ewa; Malinowski, Karol
2017-08-01
The paper presents an evaluation study of the performance of a data transmission subsystem that can be used in High Energy Physics (HEP) and other High-Performance Computing (HPC) systems. The test environment consisted of a Xilinx Artix-7 FPGA and a server-grade PC connected via a PCIe 4xGen2 bus. The DMA engine was based on the Xilinx DMA for PCI Express Subsystem, controlled by a modified Xilinx XDMA kernel driver. The research focuses on the influence of the system configuration on the achievable throughput and latency of data transfers.
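The figure of merit in such tests is simply bytes moved per unit time; a generic host-side timing sketch (the device node name is a hypothetical example of an XDMA-style card-to-host channel, not necessarily the interface on any given system):

    # Generic host-side DMA throughput measurement (illustrative only).
    import os
    import time

    DEV = "/dev/xdma0_c2h_0"   # hypothetical card-to-host channel node
    CHUNK = 4 * 1024 * 1024    # 4 MiB per read
    N = 64                     # number of reads to time

    fd = os.open(DEV, os.O_RDONLY)
    t0 = time.perf_counter()
    moved = sum(len(os.read(fd, CHUNK)) for _ in range(N))
    dt = time.perf_counter() - t0
    os.close(fd)
    print(f"{moved / dt / 1e6:.1f} MB/s over {moved} bytes")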
High-Throughput Computing on High-Performance Platforms: A Case Study
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oleynik, D; Panitkin, S; Matteo, Turilli
The computing systems used by LHC experiments have historically consisted of the federation of hundreds to thousands of distributed resources, ranging from small to mid-size. In spite of the impressive scale of the existing distributed computing solutions, the federation of small to mid-size resources will be insufficient to meet projected future demands. This paper is a case study of how the ATLAS experiment has embraced Titan, a DOE leadership facility, in conjunction with traditional distributed high-throughput computing to reach sustained production scales of approximately 52M core-hours a year. The three main contributions of this paper are: (i) a critical evaluation of design and operational considerations to support the sustained, scalable and production usage of Titan; (ii) a preliminary characterization of a next generation executor for PanDA to support new workloads and advanced execution modes; and (iii) early lessons for how current and future experimental and observational systems can be integrated with production supercomputers and other platforms in a general and extensible manner.
Targeted post-mortem computed tomography cardiac angiography: proof of concept.
Saunders, Sarah L; Morgan, Bruno; Raj, Vimal; Robinson, Claire E; Rutty, Guy N
2011-07-01
With the increasing use and availability of multi-detector computed tomography and magnetic resonance imaging in autopsy practice, there has been an international push towards the development of the so-called near virtual autopsy. However, currently, a significant obstacle to the consideration as to whether or not near virtual autopsies could one day replace the conventional invasive autopsy is the failure of post-mortem imaging to yield detailed information concerning the coronary arteries. To date, a cost-effective, practical solution to allow high throughput imaging has not been presented within the forensic literature. We present a proof of concept paper describing a simple, quick, cost-effective, manual, targeted in situ post-mortem cardiac angiography method using a minimally invasive approach, to be used with multi-detector computed tomography for high throughput cadaveric imaging which can be used in permanent or temporary mortuaries.
Shim, Jihyun; Mackerell, Alexander D
2011-05-01
A significant number of drug discovery efforts are based on natural products or high throughput screens from which compounds showing potential therapeutic effects are identified without knowledge of the target molecule or its 3D structure. In such cases computational ligand-based drug design (LBDD) can accelerate the drug discovery processes. LBDD is a general approach to elucidate the relationship of a compound's structure and physicochemical attributes to its biological activity. The resulting structure-activity relationship (SAR) may then act as the basis for the prediction of compounds with improved biological attributes. LBDD methods range from pharmacophore models that identify essential features of ligands responsible for their activity, through quantitative structure-activity relationships (QSAR) that yield quantitative estimates of activities based on physicochemical properties, to similarity searching, which explores compounds with similar properties, as well as various combinations of the above. A number of recent LBDD approaches involve the use of multiple conformations of the ligands being studied. One of the basic components to generate multiple conformations in LBDD is molecular mechanics (MM), which applies an empirical energy function to relate conformation to energies and forces. The collection of conformations for ligands is then combined with functional data using methods ranging from regression analysis to neural networks, from which the SAR is determined. Accordingly, for effective application of LBDD for SAR determinations it is important that the compounds be accurately modelled such that the appropriate range of conformations accessible to the ligands is identified. Such accurate modelling is largely based on use of the appropriate empirical force field for the molecules being investigated and the approaches used to generate the conformations. The present chapter includes a brief overview of currently used SAR methods in LBDD followed by a more detailed presentation of issues and limitations associated with empirical energy functions and conformational sampling methods.
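The empirical energy function referred to typically has the generic class I force-field form (a standard textbook expression, not tied to any particular force field):

    E = \sum_{\mathrm{bonds}} k_b (b - b_0)^2
      + \sum_{\mathrm{angles}} k_\theta (\theta - \theta_0)^2
      + \sum_{\mathrm{dihedrals}} k_\phi \left[ 1 + \cos(n\phi - \delta) \right]
      + \sum_{i<j} \left( 4\epsilon_{ij} \left[ \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12}
        - \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right]
        + \frac{q_i q_j}{4\pi\epsilon_0 r_{ij}} \right)

Here the bonded terms penalize deviations from equilibrium bond lengths, angles, and torsions, while the nonbonded sum combines Lennard-Jones and Coulomb interactions; conformational sampling amounts to exploring the low-lying minima of this surface.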
High Throughput Plasma Water Treatment
NASA Astrophysics Data System (ADS)
Mujovic, Selman; Foster, John
2016-10-01
The troublesome emergence of new classes of micro-pollutants, such as pharmaceuticals and endocrine disruptors, poses challenges for conventional water treatment systems. In an effort to address these contaminants and to support water reuse in drought-stricken regions, new technologies must be introduced. The interaction of water with plasma rapidly mineralizes organics by inducing advanced oxidation in addition to other chemical, physical and radiative processes. The primary barrier to the implementation of plasma-based water treatment is process volume scale up. In this work, we investigate a potentially scalable, high throughput plasma water reactor that utilizes a packed bed dielectric barrier-like geometry to maximize the plasma-water interface. Here, the water serves as the dielectric medium. High-speed imaging and emission spectroscopy are used to characterize the reactor discharges. Changes in methylene blue concentration and basic water parameters are mapped as a function of plasma treatment time. Experimental results are compared to electrostatic and plasma chemistry computations, which will provide insight into the reactor's operation so that efficiency can be assessed. Supported by NSF (CBET 1336375).
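Dye decolorization in advanced-oxidation studies of this kind is commonly summarized by a pseudo-first-order fit, ln(C/C0) = -kt; a sketch with invented concentrations (not the experiment's data):

    # Pseudo-first-order rate constant from a methylene blue decay series.
    import numpy as np

    t = np.array([0, 2, 4, 6, 8, 10], dtype=float)  # minutes of plasma treatment
    C = np.array([10.0, 7.4, 5.5, 4.1, 3.0, 2.2])   # mg/L methylene blue (invented)

    k = -np.polyfit(t, np.log(C / C[0]), 1)[0]      # slope of ln(C/C0) vs t
    print(f"k = {k:.3f} 1/min, half-life = {np.log(2) / k:.1f} min")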
Computational strategies to address chromatin structure problems
NASA Astrophysics Data System (ADS)
Perišić, Ognjen; Schlick, Tamar
2016-06-01
While the genetic information is contained in double helical DNA, gene expression is a complex multilevel process that involves various functional units, from nucleosomes to fully formed chromatin fibers accompanied by a host of various chromatin binding enzymes. The chromatin fiber is a polymer composed of histone protein complexes upon which DNA wraps, like yarn upon many spools. The nature of chromatin structure has been an open question since the beginning of modern molecular biology. Many experiments have shown that the chromatin fiber is a highly dynamic entity with pronounced structural diversity that includes properties of idealized zig-zag and solenoid models, as well as other motifs. This diversity can produce a high packing ratio and thus inhibit access to a majority of the wound DNA. Despite much research, chromatin’s dynamic structure has not yet been fully described. Long stretches of chromatin fibers exhibit puzzling dynamic behavior that requires interpretation in the light of gene expression patterns in various tissues and organisms. The properties of the chromatin fiber can be investigated with experimental techniques, like in vitro biochemistry, in vivo imaging, and high-throughput chromosome conformation capture technology. Those techniques provide useful insights into the fiber’s structure and dynamics, but they are limited in resolution and scope, especially regarding compact fibers and chromosomes in the cellular milieu. Complementary but specialized modeling techniques are needed to handle large floppy polymers such as the chromatin fiber. In this review, we discuss current approaches in the chromatin structure field with an emphasis on modeling, such as molecular dynamics and coarse-grained computational approaches. Combinations of these computational techniques complement experiments and address many relevant biological problems, as we will illustrate with special focus on epigenetic modulation of chromatin structure.
Cheminformatics approaches and structure-based rules are being used to evaluate and explore the ToxCast chemical landscape and associated high-throughput screening (HTS) data. We have shown that the library provides comprehensive coverage of the knowledge domains and target inven...
Read-across is a technique used to fill data gaps within chemical safety assessments. It is based on the premise that chemicals with similar structures are likely to have similar biological activities. Known information on the property of a chemical (source) is used to make a pre...
Zheng, Wei; Padia, Janak; Urban, Daniel J.; Jadhav, Ajit; Goker-Alpan, Ozlem; Simeonov, Anton; Goldin, Ehud; Auld, Douglas; LaMarca, Mary E.; Inglese, James; Austin, Christopher P.; Sidransky, Ellen
2007-01-01
Gaucher disease is an autosomal recessive lysosomal storage disorder caused by mutations in the glucocerebrosidase gene. Missense mutations result in reduced enzyme activity that may be due to misfolding, raising the possibility of small-molecule chaperone correction of the defect. Screening large compound libraries by quantitative high-throughput screening (qHTS) provides comprehensive information on the potency, efficacy, and structure–activity relationships (SAR) of active compounds directly from the primary screen, facilitating identification of leads for medicinal chemistry optimization. We used qHTS to rapidly identify three structural series of potent, selective, nonsugar glucocerebrosidase inhibitors. The three structural classes had excellent potencies and efficacies and, importantly, high selectivity against closely related hydrolases. Preliminary SAR data were used to select compounds with high activity in both enzyme and cell-based assays. Compounds from two of these structural series increased N370S mutant glucocerebrosidase activity by 40–90% in patient cell lines and enhanced lysosomal colocalization, indicating chaperone activity. These small molecules have potential as leads for chaperone therapy for Gaucher disease, and this paradigm promises to accelerate the development of leads for other rare genetic disorders. PMID:17670938
A high throughput geocomputing system for remote sensing quantitative retrieval and a case study
NASA Astrophysics Data System (ADS)
Xue, Yong; Chen, Ziqiang; Xu, Hui; Ai, Jianwen; Jiang, Shuzheng; Li, Yingjie; Wang, Ying; Guang, Jie; Mei, Linlu; Jiao, Xijuan; He, Xingwei; Hou, Tingting
2011-12-01
The quality and accuracy of remote sensing instruments have improved significantly; however, rapid processing of large-scale remote sensing data has become the bottleneck for remote sensing quantitative retrieval applications. Remote sensing quantitative retrieval is a data-intensive computing application and one of the core problems of high-throughput computation. The remote sensing quantitative retrieval Grid workflow is a high-level core component of the remote sensing Grid, which is used to support the modeling, reconstruction and implementation of large-scale complex applications of remote sensing science. In this paper, we study a middleware component of the remote sensing Grid: the dynamic Grid workflow, based on the remote sensing quantitative retrieval application on a Grid platform. We designed a novel architecture for the remote sensing Grid workflow. According to this architecture, we constructed the Remote Sensing Information Service Grid Node (RSSN) with Condor. We developed graphical user interface (GUI) tools to compose remote sensing processing Grid workflows, and took aerosol optical depth (AOD) retrieval as an example. The case study showed that significant improvement in system performance could be achieved with this implementation. The results also give a perspective on the potential of applying Grid workflow practices to remote sensing quantitative retrieval problems using commodity-class PCs.
Hedvat, Michael; Emdad, Luni; Das, Swadesh K; Kim, Keetae; Dasgupta, Santanu; Thomas, Shibu; Hu, Bin; Zhu, Shan; Dash, Rupesh; Quinn, Bridget A; Oyesanya, Regina A; Kegelman, Timothy P; Sokhi, Upneet K; Sarkar, Siddik; Erdogan, Eda; Menezes, Mitchell E; Bhoopathi, Praveen; Wang, Xiang-Yang; Pomper, Martin G; Wei, Jun; Wu, Bainan; Stebbins, John L; Diaz, Paul W; Reed, John C; Pellecchia, Maurizio; Sarkar, Devanand; Fisher, Paul B
2012-11-01
Structure-based modeling combined with rational drug design and high-throughput screening approaches offers significant potential for identifying and developing lead compounds with therapeutic potential. The present review focuses on these two approaches using explicit examples based on specific derivatives of Gossypol generated through rational design and applications of a cancer-specific promoter derived from Progression Elevated Gene-3. The Gossypol derivative Sabutoclax (BI-97C1) displays potent anti-tumor activity against a diverse spectrum of human tumors. The model of the docked structure of Gossypol bound to Bcl-XL provided a virtual structure-activity relationship where appropriate modifications were predicted on a rational basis. These structure-based studies led to the isolation of Sabutoclax, an optically pure isomer of Apogossypol displaying superior efficacy and reduced toxicity. These studies illustrate the power of combining structure-based modeling with rational design to predict appropriate derivatives of lead compounds to be empirically tested and evaluated for bioactivity. Another approach to cancer drug discovery utilizes a cancer-specific promoter as a readout of the transformed state. The promoter region of Progression Elevated Gene-3 is such a promoter with cancer-specific activity. The specificity of this promoter has been exploited as a means of constructing cancer terminator viruses that selectively kill cancer cells and as a systemic imaging modality that specifically visualizes in vivo cancer growth with no background from normal tissues. Screening of small molecule inhibitors that suppress the Progression Elevated Gene-3 promoter may provide relevant lead compounds for cancer therapy that can be combined with further structure-based approaches leading to the development of novel compounds for cancer therapy.
Baldi, Pierre
2011-12-27
A response is presented to sentiments expressed in "Data-Driven High-Throughput Prediction of the 3-D Structure of Small Molecules: Review and Progress. A Response from The Cambridge Crystallographic Data Centre", recently published in the Journal of Chemical Information and Modeling, (1) which may give readers a misleading impression regarding significant impediments to scientific research posed by the CCDC.
Ozyurt, A Sinem; Selby, Thomas L
2008-07-01
This study describes a method to computationally assess the function of homologous enzymes through small molecule binding interaction energy. Three experimentally determined X-ray structures and four enzyme models from ornithine cyclo-deaminase, alanine dehydrogenase, and μ-crystallin were used in combination with nine small molecules to derive a function score (FS) for each enzyme-model combination. While energy values varied for a single molecule-enzyme combination due to differences in the active sites, we observe that the binding energies for the entire pathway were proportional for each set of small molecules investigated. This proportionality of energies for a reaction pathway appears to be dependent on the amino acids in the active site and their direct interactions with the small molecules, which allows a function score (FS) to be calculated to assess the specificity of each enzyme. Potential of mean force (PMF) calculations were used to obtain the energies, and the resulting FS values demonstrate that a measurement of function may be obtained using differences between these PMF values. Additionally, limitations of this method are discussed based on: (a) larger substrates with significant conformational flexibility; (b) low homology enzymes; and (c) open active sites. This method should be useful in accurately predicting specificity for single enzymes that have multiple steps in their reactions and in high throughput computational methods to accurately annotate uncharacterized proteins based on active site interaction analysis. © 2008 Wiley-Liss, Inc.
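The proportionality test at the heart of this approach can be rendered as a correlation between binding-energy profiles over a shared ligand set (my formulation for illustration, not necessarily the paper's exact FS definition):

    # Toy function score: Pearson correlation between an enzyme's PMF-derived
    # binding energies and a reference profile over the same pathway ligands.
    import numpy as np

    def function_score(energies, reference):
        """Near 1 when the candidate's energy profile tracks the reference."""
        e, r = np.asarray(energies), np.asarray(reference)
        return np.corrcoef(e, r)[0, 1]

    reference = [-8.2, -6.5, -7.1, -5.9]        # kcal/mol, hypothetical
    candidate = [-7.9, -6.2, -7.0, -5.5]        # homolog with matching function
    print(function_score(candidate, reference))  # high score => consistent function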