Sample records for parallel processing techniques

  1. Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

    NASA Astrophysics Data System (ADS)

    Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.

    2017-07-01

    Calculation of the matrix-vector multiplication in the real-world problems often involves large matrix with arbitrary size. Therefore, parallelization is needed to speed up the calculation process that usually takes a long time. Graph partitioning techniques that have been discussed in the previous studies cannot be used to complete the parallelized calculation of matrix-vector multiplication with arbitrary size. This is due to the assumption of graph partitioning techniques that can only solve the square and symmetric matrix. Hypergraph partitioning techniques will overcome the shortcomings of the graph partitioning technique. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and implemented by the GPU (graphics processing unit).

  2. Applying Parallel Processing Techniques to Tether Dynamics Simulation

    NASA Technical Reports Server (NTRS)

    Wells, B. Earl

    1996-01-01

    The focus of this research has been to determine the effectiveness of applying parallel processing techniques to a sizable real-world problem, the simulation of the dynamics associated with a tether which connects two objects in low earth orbit, and to explore the degree to which the parallelization process can be automated through the creation of new software tools. The goal has been to utilize this specific application problem as a base to develop more generally applicable techniques.

  3. Parallel computing in genomic research: advances and applications

    PubMed Central

    Ocaña, Kary; de Oliveira, Daniel

    2015-01-01

    Today’s genomic experiments have to process the so-called “biological big data” that is now reaching the size of Terabytes and Petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied for reducing the total processing time and to ease the management, treatment, and analyses of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing unit requires the expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists for processing their genomic experiments using HPC capabilities and parallelism techniques. This article brings a systematic review of literature that surveys the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities. PMID:26604801

  4. Parallel computing in genomic research: advances and applications.

    PubMed

    Ocaña, Kary; de Oliveira, Daniel

    2015-01-01

    Today's genomic experiments have to process the so-called "biological big data" that is now reaching the size of Terabytes and Petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied for reducing the total processing time and to ease the management, treatment, and analyses of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing unit requires the expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists for processing their genomic experiments using HPC capabilities and parallelism techniques. This article brings a systematic review of literature that surveys the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities.

  5. Parallel pivoting combined with parallel reduction

    NASA Technical Reports Server (NTRS)

    Alaghband, Gita

    1987-01-01

    Parallel algorithms for triangularization of large, sparse, and unsymmetric matrices are presented. The method combines the parallel reduction with a new parallel pivoting technique, control over generations of fill-ins and a check for numerical stability, all done in parallel with the work being distributed over the active processes. The parallel technique uses the compatibility relation between pivots to identify parallel pivot candidates and uses the Markowitz number of pivots to minimize fill-in. This technique is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds.

  6. The Goddard Space Flight Center Program to develop parallel image processing systems

    NASA Technical Reports Server (NTRS)

    Schaefer, D. H.

    1972-01-01

    Parallel image processing which is defined as image processing where all points of an image are operated upon simultaneously is discussed. Coherent optical, noncoherent optical, and electronic methods are considered parallel image processing techniques.

  7. Parallelized CCHE2D flow model with CUDA Fortran on Graphics Process Units

    USDA-ARS?s Scientific Manuscript database

    This paper presents the CCHE2D implicit flow model parallelized using CUDA Fortran programming technique on Graphics Processing Units (GPUs). A parallelized implicit Alternating Direction Implicit (ADI) solver using Parallel Cyclic Reduction (PCR) algorithm on GPU is developed and tested. This solve...

  8. Process optimization using combinatorial design principles: parallel synthesis and design of experiment methods.

    PubMed

    Gooding, Owen W

    2004-06-01

    The use of parallel synthesis techniques with statistical design of experiment (DoE) methods is a powerful combination for the optimization of chemical processes. Advances in parallel synthesis equipment and easy to use software for statistical DoE have fueled a growing acceptance of these techniques in the pharmaceutical industry. As drug candidate structures become more complex at the same time that development timelines are compressed, these enabling technologies promise to become more important in the future.

  9. Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

    NASA Technical Reports Server (NTRS)

    Hsieh, Shang-Hsien

    1993-01-01

    The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.

  10. Parallel workflow tools to facilitate human brain MRI post-processing

    PubMed Central

    Cui, Zaixu; Zhao, Chenxi; Gong, Gaolang

    2015-01-01

    Multi-modal magnetic resonance imaging (MRI) techniques are widely applied in human brain studies. To obtain specific brain measures of interest from MRI datasets, a number of complex image post-processing steps are typically required. Parallel workflow tools have recently been developed, concatenating individual processing steps and enabling fully automated processing of raw MRI data to obtain the final results. These workflow tools are also designed to make optimal use of available computational resources and to support the parallel processing of different subjects or of independent processing steps for a single subject. Automated, parallel MRI post-processing tools can greatly facilitate relevant brain investigations and are being increasingly applied. In this review, we briefly summarize these parallel workflow tools and discuss relevant issues. PMID:26029043

  11. a Spatiotemporal Aggregation Query Method Using Multi-Thread Parallel Technique Based on Regional Division

    NASA Astrophysics Data System (ADS)

    Liao, S.; Chen, L.; Li, J.; Xiong, W.; Wu, Q.

    2015-07-01

    Existing spatiotemporal database supports spatiotemporal aggregation query over massive moving objects datasets. Due to the large amounts of data and single-thread processing method, the query speed cannot meet the application requirements. On the other hand, the query efficiency is more sensitive to spatial variation then temporal variation. In this paper, we proposed a spatiotemporal aggregation query method using multi-thread parallel technique based on regional divison and implemented it on the server. Concretely, we divided the spatiotemporal domain into several spatiotemporal cubes, computed spatiotemporal aggregation on all cubes using the technique of multi-thread parallel processing, and then integrated the query results. By testing and analyzing on the real datasets, this method has improved the query speed significantly.

  12. Parallel processing spacecraft communication system

    NASA Technical Reports Server (NTRS)

    Bolotin, Gary S. (Inventor); Donaldson, James A. (Inventor); Luong, Huy H. (Inventor); Wood, Steven H. (Inventor)

    1998-01-01

    An uplink controlling assembly speeds data processing using a special parallel codeblock technique. A correct start sequence initiates processing of a frame. Two possible start sequences can be used; and the one which is used determines whether data polarity is inverted or non-inverted. Processing continues until uncorrectable errors are found. The frame ends by intentionally sending a block with an uncorrectable error. Each of the codeblocks in the frame has a channel ID. Each channel ID can be separately processed in parallel. This obviates the problem of waiting for error correction processing. If that channel number is zero, however, it indicates that the frame of data represents a critical command only. That data is handled in a special way, independent of the software. Otherwise, the processed data further handled using special double buffering techniques to avoid problems from overrun. When overrun does occur, the system takes action to lose only the oldest data.

  13. Parallel processing in finite element structural analysis

    NASA Technical Reports Server (NTRS)

    Noor, Ahmed K.

    1987-01-01

    A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).

  14. Cooperative storage of shared files in a parallel computing system with dynamic block size

    DOEpatents

    Bent, John M.; Faibish, Sorin; Grider, Gary

    2015-11-10

    Improved techniques are provided for parallel writing of data to a shared object in a parallel computing system. A method is provided for storing data generated by a plurality of parallel processes to a shared object in a parallel computing system. The method is performed by at least one of the processes and comprises: dynamically determining a block size for storing the data; exchanging a determined amount of the data with at least one additional process to achieve a block of the data having the dynamically determined block size; and writing the block of the data having the dynamically determined block size to a file system. The determined block size comprises, e.g., a total amount of the data to be stored divided by the number of parallel processes. The file system comprises, for example, a log structured virtual parallel file system, such as a Parallel Log-Structured File System (PLFS).

  15. Parallelization and implementation of approximate root isolation for nonlinear system by Monte Carlo

    NASA Astrophysics Data System (ADS)

    Khosravi, Ebrahim

    1998-12-01

    This dissertation solves a fundamental problem of isolating the real roots of nonlinear systems of equations by Monte-Carlo that were published by Bush Jones. This algorithm requires only function values and can be applied readily to complicated systems of transcendental functions. The implementation of this sequential algorithm provides scientists with the means to utilize function analysis in mathematics or other fields of science. The algorithm, however, is so computationally intensive that the system is limited to a very small set of variables, and this will make it unfeasible for large systems of equations. Also a computational technique was needed for investigating a metrology of preventing the algorithm structure from converging to the same root along different paths of computation. The research provides techniques for improving the efficiency and correctness of the algorithm. The sequential algorithm for this technique was corrected and a parallel algorithm is presented. This parallel method has been formally analyzed and is compared with other known methods of root isolation. The effectiveness, efficiency, enhanced overall performance of the parallel processing of the program in comparison to sequential processing is discussed. The message passing model was used for this parallel processing, and it is presented and implemented on Intel/860 MIMD architecture. The parallel processing proposed in this research has been implemented in an ongoing high energy physics experiment: this algorithm has been used to track neutrinoes in a super K detector. This experiment is located in Japan, and data can be processed on-line or off-line locally or remotely.

  16. Parallel processing approach to transform-based image coding

    NASA Astrophysics Data System (ADS)

    Normile, James O.; Wright, Dan; Chu, Ken; Yeh, Chia L.

    1991-06-01

    This paper describes a flexible parallel processing architecture designed for use in real time video processing. The system consists of floating point DSP processors connected to each other via fast serial links, each processor has access to a globally shared memory. A multiple bus architecture in combination with a dual ported memory allows communication with a host control processor. The system has been applied to prototyping of video compression and decompression algorithms. The decomposition of transform based algorithms for decompression into a form suitable for parallel processing is described. A technique for automatic load balancing among the processors is developed and discussed, results ar presented with image statistics and data rates. Finally techniques for accelerating the system throughput are analyzed and results from the application of one such modification described.

  17. Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms

    NASA Astrophysics Data System (ADS)

    Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian

    2018-01-01

    We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs.

  18. Psychodrama: A Creative Approach for Addressing Parallel Process in Group Supervision

    ERIC Educational Resources Information Center

    Hinkle, Michelle Gimenez

    2008-01-01

    This article provides a model for using psychodrama to address issues of parallel process during group supervision. Information on how to utilize the specific concepts and techniques of psychodrama in relation to group supervision is discussed. A case vignette of the model is provided.

  19. Telemetry downlink interfaces and level-zero processing

    NASA Technical Reports Server (NTRS)

    Horan, S.; Pfeiffer, J.; Taylor, J.

    1991-01-01

    The technical areas being investigated are as follows: (1) processing of space to ground data frames; (2) parallel architecture performance studies; and (3) parallel programming techniques. Additionally, the University administrative details and the technical liaison between New Mexico State University and Goddard Space Flight Center are addressed.

  20. Using Motivational Interviewing Techniques to Address Parallel Process in Supervision

    ERIC Educational Resources Information Center

    Giordano, Amanda; Clarke, Philip; Borders, L. DiAnne

    2013-01-01

    Supervision offers a distinct opportunity to experience the interconnection of counselor-client and counselor-supervisor interactions. One product of this network of interactions is parallel process, a phenomenon by which counselors unconsciously identify with their clients and subsequently present to their supervisors in a similar fashion…

  1. A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor

    NASA Technical Reports Server (NTRS)

    Rao, Hariprasad Nannapaneni

    1989-01-01

    The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.

  2. Parallel k-means++

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data, by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique.more » We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.« less

  3. Generating unstructured nuclear reactor core meshes in parallel

    DOE PAGES

    Jain, Rajeev; Tautges, Timothy J.

    2014-10-24

    Recent advances in supercomputers and parallel solver techniques have enabled users to run large simulations problems using millions of processors. Techniques for multiphysics nuclear reactor core simulations are under active development in several countries. Most of these techniques require large unstructured meshes that can be hard to generate in a standalone desktop computers because of high memory requirements, limited processing power, and other complexities. We have previously reported on a hierarchical lattice-based approach for generating reactor core meshes. Here, we describe efforts to exploit coarse-grained parallelism during reactor assembly and reactor core mesh generation processes. We highlight several reactor coremore » examples including a very high temperature reactor, a full-core model of the Korean MONJU reactor, a ¼ pressurized water reactor core, the fast reactor Experimental Breeder Reactor-II core with a XX09 assembly, and an advanced breeder test reactor core. The times required to generate large mesh models, along with speedups obtained from running these problems in parallel, are reported. A graphical user interface to the tools described here has also been developed.« less

  4. Real-time polarization-sensitive optical coherence tomography data processing with parallel computing

    PubMed Central

    Liu, Gangjun; Zhang, Jun; Yu, Lingfeng; Xie, Tuqiang; Chen, Zhongping

    2010-01-01

    With the increase of the A-line speed of optical coherence tomography (OCT) systems, real-time processing of acquired data has become a bottleneck. The shared-memory parallel computing technique is used to process OCT data in real time. The real-time processing power of a quad-core personal computer (PC) is analyzed. It is shown that the quad-core PC could provide real-time OCT data processing ability of more than 80K A-lines per second. A real-time, fiber-based, swept source polarization-sensitive OCT system with 20K A-line speed is demonstrated with this technique. The real-time 2D and 3D polarization-sensitive imaging of chicken muscle and pig tendon is also demonstrated. PMID:19904337

  5. Commodity cluster and hardware-based massively parallel implementations of hyperspectral imaging algorithms

    NASA Astrophysics Data System (ADS)

    Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David

    2006-05-01

    The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has soon introduced new processing challenges. The price paid for the wealth spatial and spectral information available from hyperspectral sensors is the enormous amounts of data that they generate. Several applications exist, however, where having the desired information calculated quickly enough for practical use is highly desirable. High computing performance of algorithm analysis is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed-up computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques. Techniques include four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, and (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts in selection of parallel hyperspectral algorithms for specific applications.

  6. 100 Gbps Wireless System and Circuit Design Using Parallel Spread-Spectrum Sequencing

    NASA Astrophysics Data System (ADS)

    Scheytt, J. Christoph; Javed, Abdul Rehman; Bammidi, Eswara Rao; KrishneGowda, Karthik; Kallfass, Ingmar; Kraemer, Rolf

    2017-09-01

    In this article mixed analog/digital signal processing techniques based on parallel spread-spectrum sequencing (PSSS) and radio frequency (RF) carrier synchronization for ultra-broadband wireless communication are investigated on system and circuit level.

  7. Parallelized multi–graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy

    PubMed Central

    Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.

    2014-01-01

    Abstract. Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6  mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing. PMID:24695868

  8. Parallelized multi-graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy.

    PubMed

    Tankam, Patrice; Santhanam, Anand P; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P

    2014-07-01

    Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6  mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing.

  9. Modeling Cooperative Threads to Project GPU Performance for Adaptive Parallelism

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Meng, Jiayuan; Uram, Thomas; Morozov, Vitali A.

    Most accelerators, such as graphics processing units (GPUs) and vector processors, are particularly suitable for accelerating massively parallel workloads. On the other hand, conventional workloads are developed for multi-core parallelism, which often scale to only a few dozen OpenMP threads. When hardware threads significantly outnumber the degree of parallelism in the outer loop, programmers are challenged with efficient hardware utilization. A common solution is to further exploit the parallelism hidden deep in the code structure. Such parallelism is less structured: parallel and sequential loops may be imperfectly nested within each other, neigh boring inner loops may exhibit different concurrency patternsmore » (e.g. Reduction vs. Forall), yet have to be parallelized in the same parallel section. Many input-dependent transformations have to be explored. A programmer often employs a larger group of hardware threads to cooperatively walk through a smaller outer loop partition and adaptively exploit any encountered parallelism. This process is time-consuming and error-prone, yet the risk of gaining little or no performance remains high for such workloads. To reduce risk and guide implementation, we propose a technique to model workloads with limited parallelism that can automatically explore and evaluate transformations involving cooperative threads. Eventually, our framework projects the best achievable performance and the most promising transformations without implementing GPU code or using physical hardware. We envision our technique to be integrated into future compilers or optimization frameworks for autotuning.« less

  10. Perturbation Experiments: Approaches for Metabolic Pathway Analysis in Bioreactors.

    PubMed

    Weiner, Michael; Tröndle, Julia; Albermann, Christoph; Sprenger, Georg A; Weuster-Botz, Dirk

    2016-01-01

    In the last decades, targeted metabolic engineering of microbial cells has become one of the major tools in bioprocess design and optimization. For successful application, a detailed knowledge is necessary about the relevant metabolic pathways and their regulation inside the cells. Since in vitro experiments cannot display process conditions and behavior properly, process data about the cells' metabolic state have to be collected in vivo. For this purpose, special techniques and methods are necessary. Therefore, most techniques enabling in vivo characterization of metabolic pathways rely on perturbation experiments, which can be divided into dynamic and steady-state approaches. To avoid any process disturbance, approaches which enable perturbation of cell metabolism in parallel to the continuing production process are reasonable. Furthermore, the fast dynamics of microbial production processes amplifies the need of parallelized data generation. These points motivate the development of a parallelized approach for multiple metabolic perturbation experiments outside the operating production reactor. An appropriate approach for in vivo characterization of metabolic pathways is presented and applied exemplarily to a microbial L-phenylalanine production process on a 15 L-scale.

  11. Performance enhancement of various real-time image processing techniques via speculative execution

    NASA Astrophysics Data System (ADS)

    Younis, Mohamed F.; Sinha, Purnendu; Marlowe, Thomas J.; Stoyenko, Alexander D.

    1996-03-01

    In real-time image processing, an application must satisfy a set of timing constraints while ensuring the semantic correctness of the system. Because of the natural structure of digital data, pure data and task parallelism have been used extensively in real-time image processing to accelerate the handling time of image data. These types of parallelism are based on splitting the execution load performed by a single processor across multiple nodes. However, execution of all parallel threads is mandatory for correctness of the algorithm. On the other hand, speculative execution is an optimistic execution of part(s) of the program based on assumptions on program control flow or variable values. Rollback may be required if the assumptions turn out to be invalid. Speculative execution can enhance average, and sometimes worst-case, execution time. In this paper, we target various image processing techniques to investigate applicability of speculative execution. We identify opportunities for safe and profitable speculative execution in image compression, edge detection, morphological filters, and blob recognition.

  12. Domain decomposition methods in aerodynamics

    NASA Technical Reports Server (NTRS)

    Venkatakrishnan, V.; Saltz, Joel

    1990-01-01

    Compressible Euler equations are solved for two-dimensional problems by a preconditioned conjugate gradient-like technique. An approximate Riemann solver is used to compute the numerical fluxes to second order accuracy in space. Two ways to achieve parallelism are tested, one which makes use of parallelism inherent in triangular solves and the other which employs domain decomposition techniques. The vectorization/parallelism in triangular solves is realized by the use of a recording technique called wavefront ordering. This process involves the interpretation of the triangular matrix as a directed graph and the analysis of the data dependencies. It is noted that the factorization can also be done in parallel with the wave front ordering. The performances of two ways of partitioning the domain, strips and slabs, are compared. Results on Cray YMP are reported for an inviscid transonic test case. The performances of linear algebra kernels are also reported.

  13. Visual-area coding technique (VACT): optical parallel implementation of fuzzy logic and its visualization with the digital-halftoning process

    NASA Astrophysics Data System (ADS)

    Konishi, Tsuyoshi; Tanida, Jun; Ichioka, Yoshiki

    1995-06-01

    A novel technique, the visual-area coding technique (VACT), for the optical implementation of fuzzy logic with the capability of visualization of the results is presented. This technique is based on the microfont method and is considered to be an instance of digitized analog optical computing. Huge amounts of data can be processed in fuzzy logic with the VACT. In addition, real-time visualization of the processed result can be accomplished.

  14. Real-time implementations of image segmentation algorithms on shared memory multicore architecture: a survey (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Akil, Mohamed

    2017-05-01

    The real-time processing is getting more and more important in many image processing applications. Image segmentation is one of the most fundamental tasks image analysis. As a consequence, many different approaches for image segmentation have been proposed. The watershed transform is a well-known image segmentation tool. The watershed transform is a very data intensive task. To achieve acceleration and obtain real-time processing of watershed algorithms, parallel architectures and programming models for multicore computing have been developed. This paper focuses on the survey of the approaches for parallel implementation of sequential watershed algorithms on multicore general purpose CPUs: homogeneous multicore processor with shared memory. To achieve an efficient parallel implementation, it's necessary to explore different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we give a comparison of various parallelization of sequential watershed algorithms on shared memory multicore architecture. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on the performance of the parallel implementations. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models. Thus, we compare the OpenMP (an application programming interface for multi-Processing) with Ptheads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.

  15. GPU computing in medical physics: a review.

    PubMed

    Pratx, Guillem; Xing, Lei

    2011-05-01

    The graphics processing unit (GPU) has emerged as a competitive platform for computing massively parallel problems. Many computing applications in medical physics can be formulated as data-parallel tasks that exploit the capabilities of the GPU for reducing processing times. The authors review the basic principles of GPU computing as well as the main performance optimization techniques, and survey existing applications in three areas of medical physics, namely image reconstruction, dose calculation and treatment plan optimization, and image processing.

  16. Parallel, Asynchronous Executive (PAX): System concepts, facilities, and architecture

    NASA Technical Reports Server (NTRS)

    Jones, W. H.

    1983-01-01

    The Parallel, Asynchronous Executive (PAX) is a software operating system simulation that allows many computers to work on a single problem at the same time. PAX is currently implemented on a UNIVAC 1100/42 computer system. Independent UNIVAC runstreams are used to simulate independent computers. Data are shared among independent UNIVAC runstreams through shared mass-storage files. PAX has achieved the following: (1) applied several computing processes simultaneously to a single, logically unified problem; (2) resolved most parallel processor conflicts by careful work assignment; (3) resolved by means of worker requests to PAX all conflicts not resolved by work assignment; (4) provided fault isolation and recovery mechanisms to meet the problems of an actual parallel, asynchronous processing machine. Additionally, one real-life problem has been constructed for the PAX environment. This is CASPER, a collection of aerodynamic and structural dynamic problem simulation routines. CASPER is not discussed in this report except to provide examples of parallel-processing techniques.

  17. Concurrent Probabilistic Simulation of High Temperature Composite Structural Response

    NASA Technical Reports Server (NTRS)

    Abdi, Frank

    1996-01-01

    A computational structural/material analysis and design tool which would meet industry's future demand for expedience and reduced cost is presented. This unique software 'GENOA' is dedicated to parallel and high speed analysis to perform probabilistic evaluation of high temperature composite response of aerospace systems. The development is based on detailed integration and modification of diverse fields of specialized analysis techniques and mathematical models to combine their latest innovative capabilities into a commercially viable software package. The technique is specifically designed to exploit the availability of processors to perform computationally intense probabilistic analysis assessing uncertainties in structural reliability analysis and composite micromechanics. The primary objectives which were achieved in performing the development were: (1) Utilization of the power of parallel processing and static/dynamic load balancing optimization to make the complex simulation of structure, material and processing of high temperature composite affordable; (2) Computational integration and synchronization of probabilistic mathematics, structural/material mechanics and parallel computing; (3) Implementation of an innovative multi-level domain decomposition technique to identify the inherent parallelism, and increasing convergence rates through high- and low-level processor assignment; (4) Creating the framework for Portable Paralleled architecture for the machine independent Multi Instruction Multi Data, (MIMD), Single Instruction Multi Data (SIMD), hybrid and distributed workstation type of computers; and (5) Market evaluation. The results of Phase-2 effort provides a good basis for continuation and warrants Phase-3 government, and industry partnership.

  18. Language Classification using N-grams Accelerated by FPGA-based Bloom Filters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jacob, A; Gokhale, M

    N-Gram (n-character sequences in text documents) counting is a well-established technique used in classifying the language of text in a document. In this paper, n-gram processing is accelerated through the use of reconfigurable hardware on the XtremeData XD1000 system. Our design employs parallelism at multiple levels, with parallel Bloom Filters accessing on-chip RAM, parallel language classifiers, and parallel document processing. In contrast to another hardware implementation (HAIL algorithm) that uses off-chip SRAM for lookup, our highly scalable implementation uses only on-chip memory blocks. Our implementation of end-to-end language classification runs at 85x comparable software and 1.45x the competing hardware design.

  19. Execution models for mapping programs onto distributed memory parallel computers

    NASA Technical Reports Server (NTRS)

    Sussman, Alan

    1992-01-01

    The problem of exploiting the parallelism available in a program to efficiently employ the resources of the target machine is addressed. The problem is discussed in the context of building a mapping compiler for a distributed memory parallel machine. The paper describes using execution models to drive the process of mapping a program in the most efficient way onto a particular machine. Through analysis of the execution models for several mapping techniques for one class of programs, we show that the selection of the best technique for a particular program instance can make a significant difference in performance. On the other hand, the results of benchmarks from an implementation of a mapping compiler show that our execution models are accurate enough to select the best mapping technique for a given program.

  20. Constraint treatment techniques and parallel algorithms for multibody dynamic analysis. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Chiou, Jin-Chern

    1990-01-01

    Computational procedures for kinematic and dynamic analysis of three-dimensional multibody dynamic (MBD) systems are developed from the differential-algebraic equations (DAE's) viewpoint. Constraint violations during the time integration process are minimized and penalty constraint stabilization techniques and partitioning schemes are developed. The governing equations of motion, a two-stage staggered explicit-implicit numerical algorithm, are treated which takes advantage of a partitioned solution procedure. A robust and parallelizable integration algorithm is developed. This algorithm uses a two-stage staggered central difference algorithm to integrate the translational coordinates and the angular velocities. The angular orientations of bodies in MBD systems are then obtained by using an implicit algorithm via the kinematic relationship between Euler parameters and angular velocities. It is shown that the combination of the present solution procedures yields a computationally more accurate solution. To speed up the computational procedures, parallel implementation of the present constraint treatment techniques, the two-stage staggered explicit-implicit numerical algorithm was efficiently carried out. The DAE's and the constraint treatment techniques were transformed into arrowhead matrices to which Schur complement form was derived. By fully exploiting the sparse matrix structural analysis techniques, a parallel preconditioned conjugate gradient numerical algorithm is used to solve the systems equations written in Schur complement form. A software testbed was designed and implemented in both sequential and parallel computers. This testbed was used to demonstrate the robustness and efficiency of the constraint treatment techniques, the accuracy of the two-stage staggered explicit-implicit numerical algorithm, and the speed up of the Schur-complement-based parallel preconditioned conjugate gradient algorithm on a parallel computer.

  1. Iris unwrapping using the Bresenham circle algorithm for real-time iris recognition

    NASA Astrophysics Data System (ADS)

    Carothers, Matthew T.; Ngo, Hau T.; Rakvic, Ryan N.; Broussard, Randy P.

    2015-02-01

    An efficient parallel architecture design for the iris unwrapping process in a real-time iris recognition system using the Bresenham Circle Algorithm is presented in this paper. Based on the characteristics of the model parameters this algorithm was chosen over the widely used polar conversion technique as the iris unwrapping model. The architecture design is parallelized to increase the throughput of the system and is suitable for processing an inputted image size of 320 × 240 pixels in real-time using Field Programmable Gate Array (FPGA) technology. Quartus software is used to implement, verify, and analyze the design's performance using the VHSIC Hardware Description Language. The system's predicted processing time is faster than the modern iris unwrapping technique used today∗.

  2. A parallel graded-mesh FDTD algorithm for human-antenna interaction problems.

    PubMed

    Catarinucci, Luca; Tarricone, Luciano

    2009-01-01

    The finite difference time domain method (FDTD) is frequently used for the numerical solution of a wide variety of electromagnetic (EM) problems and, among them, those concerning human exposure to EM fields. In many practical cases related to the assessment of occupational EM exposure, large simulation domains are modeled and high space resolution adopted, so that strong memory and central processing unit power requirements have to be satisfied. To better afford the computational effort, the use of parallel computing is a winning approach; alternatively, subgridding techniques are often implemented. However, the simultaneous use of subgridding schemes and parallel algorithms is very new. In this paper, an easy-to-implement and highly-efficient parallel graded-mesh (GM) FDTD scheme is proposed and applied to human-antenna interaction problems, demonstrating its appropriateness in dealing with complex occupational tasks and showing its capability to guarantee the advantages of a traditional subgridding technique without affecting the parallel FDTD performance.

  3. Accelerating the Gillespie Exact Stochastic Simulation Algorithm using hybrid parallel execution on graphics processing units.

    PubMed

    Komarov, Ivan; D'Souza, Roshan M

    2012-01-01

    The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques to simulate reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple runs for parameter sweep exercises due to the stochastic nature of the simulation. Even very efficient variants of GSSA are prohibitively expensive to compute and perform parameter sweeps. Here we present a novel variant of the exact GSSA that is amenable to acceleration by using graphics processing units (GPUs). We parallelize the execution of a single realization across threads in a warp (fine-grained parallelism). A warp is a collection of threads that are executed synchronously on a single multi-processor. Warps executing in parallel on different multi-processors (coarse-grained parallelism) simultaneously generate multiple trajectories. Novel data-structures and algorithms reduce memory traffic, which is the bottleneck in computing the GSSA. Our benchmarks show an 8×-120× performance gain over various state-of-the-art serial algorithms when simulating different types of models.

  4. A parallel architecture of interpolated timing recovery for high- speed data transfer rate and wide capture-range

    NASA Astrophysics Data System (ADS)

    Higashino, Satoru; Kobayashi, Shoei; Yamagami, Tamotsu

    2007-06-01

    High data transfer rate has been demanded for data storage devices along increasing the storage capacity. In order to increase the transfer rate, high-speed data processing techniques in read-channel devices are required. Generally, parallel architecture is utilized for the high-speed digital processing. We have developed a new architecture of Interpolated Timing Recovery (ITR) to achieve high-speed data transfer rate and wide capture-range in read-channel devices for the information storage channels. It facilitates the parallel implementation on large-scale-integration (LSI) devices.

  5. Programming Probabilistic Structural Analysis for Parallel Processing Computer

    NASA Technical Reports Server (NTRS)

    Sues, Robert H.; Chen, Heh-Chyun; Twisdale, Lawrence A.; Chamis, Christos C.; Murthy, Pappu L. N.

    1991-01-01

    The ultimate goal of this research program is to make Probabilistic Structural Analysis (PSA) computationally efficient and hence practical for the design environment by achieving large scale parallelism. The paper identifies the multiple levels of parallelism in PSA, identifies methodologies for exploiting this parallelism, describes the development of a parallel stochastic finite element code, and presents results of two example applications. It is demonstrated that speeds within five percent of those theoretically possible can be achieved. A special-purpose numerical technique, the stochastic preconditioned conjugate gradient method, is also presented and demonstrated to be extremely efficient for certain classes of PSA problems.

  6. A New Parallel Approach for Accelerating the GPU-Based Execution of Edge Detection Algorithms

    PubMed Central

    Emrani, Zahra; Bateni, Soroosh; Rabbani, Hossein

    2017-01-01

    Real-time image processing is used in a wide variety of applications like those in medical care and industrial processes. This technique in medical care has the ability to display important patient information graphi graphically, which can supplement and help the treatment process. Medical decisions made based on real-time images are more accurate and reliable. According to the recent researches, graphic processing unit (GPU) programming is a useful method for improving the speed and quality of medical image processing and is one of the ways of real-time image processing. Edge detection is an early stage in most of the image processing methods for the extraction of features and object segments from a raw image. The Canny method, Sobel and Prewitt filters, and the Roberts’ Cross technique are some examples of edge detection algorithms that are widely used in image processing and machine vision. In this work, these algorithms are implemented using the Compute Unified Device Architecture (CUDA), Open Source Computer Vision (OpenCV), and Matrix Laboratory (MATLAB) platforms. An existing parallel method for Canny approach has been modified further to run in a fully parallel manner. This has been achieved by replacing the breadth- first search procedure with a parallel method. These algorithms have been compared by testing them on a database of optical coherence tomography images. The comparison of results shows that the proposed implementation of the Canny method on GPU using the CUDA platform improves the speed of execution by 2–100× compared to the central processing unit-based implementation using the OpenCV and MATLAB platforms. PMID:28487831

  7. A New Parallel Approach for Accelerating the GPU-Based Execution of Edge Detection Algorithms.

    PubMed

    Emrani, Zahra; Bateni, Soroosh; Rabbani, Hossein

    2017-01-01

    Real-time image processing is used in a wide variety of applications like those in medical care and industrial processes. This technique in medical care has the ability to display important patient information graphi graphically, which can supplement and help the treatment process. Medical decisions made based on real-time images are more accurate and reliable. According to the recent researches, graphic processing unit (GPU) programming is a useful method for improving the speed and quality of medical image processing and is one of the ways of real-time image processing. Edge detection is an early stage in most of the image processing methods for the extraction of features and object segments from a raw image. The Canny method, Sobel and Prewitt filters, and the Roberts' Cross technique are some examples of edge detection algorithms that are widely used in image processing and machine vision. In this work, these algorithms are implemented using the Compute Unified Device Architecture (CUDA), Open Source Computer Vision (OpenCV), and Matrix Laboratory (MATLAB) platforms. An existing parallel method for Canny approach has been modified further to run in a fully parallel manner. This has been achieved by replacing the breadth- first search procedure with a parallel method. These algorithms have been compared by testing them on a database of optical coherence tomography images. The comparison of results shows that the proposed implementation of the Canny method on GPU using the CUDA platform improves the speed of execution by 2-100× compared to the central processing unit-based implementation using the OpenCV and MATLAB platforms.

  8. Adventures in Parallel Processing: Entry, Descent and Landing Simulation for the Genesis and Stardust Missions

    NASA Technical Reports Server (NTRS)

    Lyons, Daniel T.; Desai, Prasun N.

    2005-01-01

    This paper will describe the Entry, Descent and Landing simulation tradeoffs and techniques that were used to provide the Monte Carlo data required to approve entry during a critical period just before entry of the Genesis Sample Return Capsule. The same techniques will be used again when Stardust returns on January 15, 2006. Only one hour was available for the simulation which propagated 2000 dispersed entry states to the ground. Creative simulation tradeoffs combined with parallel processing were needed to provide the landing footprint statistics that were an essential part of the Go/NoGo decision that authorized release of the Sample Return Capsule a few hours before entry.

  9. Parallel processing via a dual olfactory pathway in the honeybee.

    PubMed

    Brill, Martin F; Rosenbaum, Tobias; Reus, Isabelle; Kleineidam, Christoph J; Nawrot, Martin P; Rössler, Wolfgang

    2013-02-06

    In their natural environment, animals face complex and highly dynamic olfactory input. Thus vertebrates as well as invertebrates require fast and reliable processing of olfactory information. Parallel processing has been shown to improve processing speed and power in other sensory systems and is characterized by extraction of different stimulus parameters along parallel sensory information streams. Honeybees possess an elaborate olfactory system with unique neuronal architecture: a dual olfactory pathway comprising a medial projection-neuron (PN) antennal lobe (AL) protocerebral output tract (m-APT) and a lateral PN AL output tract (l-APT) connecting the olfactory lobes with higher-order brain centers. We asked whether this neuronal architecture serves parallel processing and employed a novel technique for simultaneous multiunit recordings from both tracts. The results revealed response profiles from a high number of PNs of both tracts to floral, pheromonal, and biologically relevant odor mixtures tested over multiple trials. PNs from both tracts responded to all tested odors, but with different characteristics indicating parallel processing of similar odors. Both PN tracts were activated by widely overlapping response profiles, which is a requirement for parallel processing. The l-APT PNs had broad response profiles suggesting generalized coding properties, whereas the responses of m-APT PNs were comparatively weaker and less frequent, indicating higher odor specificity. Comparison of response latencies within and across tracts revealed odor-dependent latencies. We suggest that parallel processing via the honeybee dual olfactory pathway provides enhanced odor processing capabilities serving sophisticated odor perception and olfactory demands associated with a complex olfactory world of this social insect.

  10. Support for Debugging Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)

    2001-01-01

    We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify the program execution without changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.

  11. Relative Debugging of Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)

    2002-01-01

    We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular, the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify, the program execution with out changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.

  12. Partitioning and packing mathematical simulation models for calculation on parallel computers

    NASA Technical Reports Server (NTRS)

    Arpasi, D. J.; Milner, E. J.

    1986-01-01

    The development of multiprocessor simulations from a serial set of ordinary differential equations describing a physical system is described. Degrees of parallelism (i.e., coupling between the equations) and their impact on parallel processing are discussed. The problem of identifying computational parallelism within sets of closely coupled equations that require the exchange of current values of variables is described. A technique is presented for identifying this parallelism and for partitioning the equations for parallel solution on a multiprocessor. An algorithm which packs the equations into a minimum number of processors is also described. The results of the packing algorithm when applied to a turbojet engine model are presented in terms of processor utilization.

  13. Basic research planning in mathematical pattern recognition and image analysis

    NASA Technical Reports Server (NTRS)

    Bryant, J.; Guseman, L. F., Jr.

    1981-01-01

    Fundamental problems encountered while attempting to develop automated techniques for applications of remote sensing are discussed under the following categories: (1) geometric and radiometric preprocessing; (2) spatial, spectral, temporal, syntactic, and ancillary digital image representation; (3) image partitioning, proportion estimation, and error models in object scene interference; (4) parallel processing and image data structures; and (5) continuing studies in polarization; computer architectures and parallel processing; and the applicability of "expert systems" to interactive analysis.

  14. Parallel Visualization Co-Processing of Overnight CFD Propulsion Applications

    NASA Technical Reports Server (NTRS)

    Edwards, David E.; Haimes, Robert

    1999-01-01

    An interactive visualization system pV3 is being developed for the investigation of advanced computational methodologies employing visualization and parallel processing for the extraction of information contained in large-scale transient engineering simulations. Visual techniques for extracting information from the data in terms of cutting planes, iso-surfaces, particle tracing and vector fields are included in this system. This paper discusses improvements to the pV3 system developed under NASA's Affordable High Performance Computing project.

  15. Nonlinear Real-Time Optical Signal Processing

    DTIC Science & Technology

    1990-09-01

    pattern recognition. Additional work concerns the relationship of parallel computation paradigms to optical computing and halftone screen techniques...paradigms to optical computing and halftone screen techniques for implementing general nonlinear functions. 3\\ 2 Research Progress This section...Vol. 23, No. 8, pp. 34-57, 1986. 2.4 Nonlinear Optical Processing with Halftones : Degradation and Compen- sation Models This paper is concerned with

  16. Graphics Processing Unit Assisted Thermographic Compositing

    NASA Technical Reports Server (NTRS)

    Ragasa, Scott; Russell, Samuel S.

    2012-01-01

    Objective Develop a software application utilizing high performance computing techniques, including general purpose graphics processing units (GPGPUs), for the analysis and visualization of large thermographic data sets. Over the past several years, an increasing effort among scientists and engineers to utilize graphics processing units (GPUs) in a more general purpose fashion is allowing for previously unobtainable levels of computation by individual workstations. As data sets grow, the methods to work them grow at an equal, and often greater, pace. Certain common computations can take advantage of the massively parallel and optimized hardware constructs of the GPU which yield significant increases in performance. These common computations have high degrees of data parallelism, that is, they are the same computation applied to a large set of data where the result does not depend on other data elements. Image processing is one area were GPUs are being used to greatly increase the performance of certain analysis and visualization techniques.

  17. Density-based parallel skin lesion border detection with webCL

    PubMed Central

    2015-01-01

    Background Dermoscopy is a highly effective and noninvasive imaging technique used in diagnosis of melanoma and other pigmented skin lesions. Many aspects of the lesion under consideration are defined in relation to the lesion border. This makes border detection one of the most important steps in dermoscopic image analysis. In current practice, dermatologists often delineate borders through a hand drawn representation based upon visual inspection. Due to the subjective nature of this technique, intra- and inter-observer variations are common. Because of this, the automated assessment of lesion borders in dermoscopic images has become an important area of study. Methods Fast density based skin lesion border detection method has been implemented in parallel with a new parallel technology called WebCL. WebCL utilizes client side computing capabilities to use available hardware resources such as multi cores and GPUs. Developed WebCL-parallel density based skin lesion border detection method runs efficiently from internet browsers. Results Previous research indicates that one of the highest accuracy rates can be achieved using density based clustering techniques for skin lesion border detection. While these algorithms do have unfavorable time complexities, this effect could be mitigated when implemented in parallel. In this study, density based clustering technique for skin lesion border detection is parallelized and redesigned to run very efficiently on the heterogeneous platforms (e.g. tablets, SmartPhones, multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units) by transforming the technique into a series of independent concurrent operations. Heterogeneous computing is adopted to support accessibility, portability and multi-device use in the clinical settings. For this, we used WebCL, an emerging technology that enables a HTML5 Web browser to execute code in parallel for heterogeneous platforms. We depicted WebCL and our parallel algorithm design. In addition, we tested parallel code on 100 dermoscopy images and showed the execution speedups with respect to the serial version. Results indicate that parallel (WebCL) version and serial version of density based lesion border detection methods generate the same accuracy rates for 100 dermoscopy images, in which mean of border error is 6.94%, mean of recall is 76.66%, and mean of precision is 99.29% respectively. Moreover, WebCL version's speedup factor for 100 dermoscopy images' lesion border detection averages around ~491.2. Conclusions When large amount of high resolution dermoscopy images considered in a usual clinical setting along with the critical importance of early detection and diagnosis of melanoma before metastasis, the importance of fast processing dermoscopy images become obvious. In this paper, we introduce WebCL and the use of it for biomedical image processing applications. WebCL is a javascript binding of OpenCL, which takes advantage of GPU computing from a web browser. Therefore, WebCL parallel version of density based skin lesion border detection introduced in this study can supplement expert dermatologist, and aid them in early diagnosis of skin lesions. While WebCL is currently an emerging technology, a full adoption of WebCL into the HTML5 standard would allow for this implementation to run on a very large set of hardware and software systems. WebCL takes full advantage of parallel computational resources including multi-cores and GPUs on a local machine, and allows for compiled code to run directly from the Web Browser. PMID:26423836

  18. Density-based parallel skin lesion border detection with webCL.

    PubMed

    Lemon, James; Kockara, Sinan; Halic, Tansel; Mete, Mutlu

    2015-01-01

    Dermoscopy is a highly effective and noninvasive imaging technique used in diagnosis of melanoma and other pigmented skin lesions. Many aspects of the lesion under consideration are defined in relation to the lesion border. This makes border detection one of the most important steps in dermoscopic image analysis. In current practice, dermatologists often delineate borders through a hand drawn representation based upon visual inspection. Due to the subjective nature of this technique, intra- and inter-observer variations are common. Because of this, the automated assessment of lesion borders in dermoscopic images has become an important area of study. Fast density based skin lesion border detection method has been implemented in parallel with a new parallel technology called WebCL. WebCL utilizes client side computing capabilities to use available hardware resources such as multi cores and GPUs. Developed WebCL-parallel density based skin lesion border detection method runs efficiently from internet browsers. Previous research indicates that one of the highest accuracy rates can be achieved using density based clustering techniques for skin lesion border detection. While these algorithms do have unfavorable time complexities, this effect could be mitigated when implemented in parallel. In this study, density based clustering technique for skin lesion border detection is parallelized and redesigned to run very efficiently on the heterogeneous platforms (e.g. tablets, SmartPhones, multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units) by transforming the technique into a series of independent concurrent operations. Heterogeneous computing is adopted to support accessibility, portability and multi-device use in the clinical settings. For this, we used WebCL, an emerging technology that enables a HTML5 Web browser to execute code in parallel for heterogeneous platforms. We depicted WebCL and our parallel algorithm design. In addition, we tested parallel code on 100 dermoscopy images and showed the execution speedups with respect to the serial version. Results indicate that parallel (WebCL) version and serial version of density based lesion border detection methods generate the same accuracy rates for 100 dermoscopy images, in which mean of border error is 6.94%, mean of recall is 76.66%, and mean of precision is 99.29% respectively. Moreover, WebCL version's speedup factor for 100 dermoscopy images' lesion border detection averages around ~491.2. When large amount of high resolution dermoscopy images considered in a usual clinical setting along with the critical importance of early detection and diagnosis of melanoma before metastasis, the importance of fast processing dermoscopy images become obvious. In this paper, we introduce WebCL and the use of it for biomedical image processing applications. WebCL is a javascript binding of OpenCL, which takes advantage of GPU computing from a web browser. Therefore, WebCL parallel version of density based skin lesion border detection introduced in this study can supplement expert dermatologist, and aid them in early diagnosis of skin lesions. While WebCL is currently an emerging technology, a full adoption of WebCL into the HTML5 standard would allow for this implementation to run on a very large set of hardware and software systems. WebCL takes full advantage of parallel computational resources including multi-cores and GPUs on a local machine, and allows for compiled code to run directly from the Web Browser.

  19. Segmentation of remotely sensed data using parallel region growing

    NASA Technical Reports Server (NTRS)

    Tilton, J. C.; Cox, S. C.

    1983-01-01

    The improved spatial resolution of the new earth resources satellites will increase the need for effective utilization of spatial information in machine processing of remotely sensed data. One promising technique is scene segmentation by region growing. Region growing can use spatial information in two ways: only spatially adjacent regions merge together, and merging criteria can be based on region-wide spatial features. A simple region growing approach is described in which the similarity criterion is based on region mean and variance (a simple spatial feature). An effective way to implement region growing for remote sensing is as an iterative parallel process on a large parallel processor. A straightforward parallel pixel-based implementation of the algorithm is explored and its efficiency is compared with sequential pixel-based, sequential region-based, and parallel region-based implementations. Experimental results from on aircraft scanner data set are presented, as is a discussioon of proposed improvements to the segmentation algorithm.

  20. Decomposition method for fast computation of gigapixel-sized Fresnel holograms on a graphics processing unit cluster.

    PubMed

    Jackin, Boaz Jessie; Watanabe, Shinpei; Ootsu, Kanemitsu; Ohkawa, Takeshi; Yokota, Takashi; Hayasaki, Yoshio; Yatagai, Toyohiko; Baba, Takanobu

    2018-04-20

    A parallel computation method for large-size Fresnel computer-generated hologram (CGH) is reported. The method was introduced by us in an earlier report as a technique for calculating Fourier CGH from 2D object data. In this paper we extend the method to compute Fresnel CGH from 3D object data. The scale of the computation problem is also expanded to 2 gigapixels, making it closer to real application requirements. The significant feature of the reported method is its ability to avoid communication overhead and thereby fully utilize the computing power of parallel devices. The method exhibits three layers of parallelism that favor small to large scale parallel computing machines. Simulation and optical experiments were conducted to demonstrate the workability and to evaluate the efficiency of the proposed technique. A two-times improvement in computation speed has been achieved compared to the conventional method, on a 16-node cluster (one GPU per node) utilizing only one layer of parallelism. A 20-times improvement in computation speed has been estimated utilizing two layers of parallelism on a very large-scale parallel machine with 16 nodes, where each node has 16 GPUs.

  1. A Fast MHD Code for Gravitationally Stratified Media using Graphical Processing Units: SMAUG

    NASA Astrophysics Data System (ADS)

    Griffiths, M. K.; Fedun, V.; Erdélyi, R.

    2015-03-01

    Parallelization techniques have been exploited most successfully by the gaming/graphics industry with the adoption of graphical processing units (GPUs), possessing hundreds of processor cores. The opportunity has been recognized by the computational sciences and engineering communities, who have recently harnessed successfully the numerical performance of GPUs. For example, parallel magnetohydrodynamic (MHD) algorithms are important for numerical modelling of highly inhomogeneous solar, astrophysical and geophysical plasmas. Here, we describe the implementation of SMAUG, the Sheffield Magnetohydrodynamics Algorithm Using GPUs. SMAUG is a 1-3D MHD code capable of modelling magnetized and gravitationally stratified plasma. The objective of this paper is to present the numerical methods and techniques used for porting the code to this novel and highly parallel compute architecture. The methods employed are justified by the performance benchmarks and validation results demonstrating that the code successfully simulates the physics for a range of test scenarios including a full 3D realistic model of wave propagation in the solar atmosphere.

  2. Real-time processing of radar return on a parallel computer

    NASA Technical Reports Server (NTRS)

    Aalfs, David D.

    1992-01-01

    NASA is working with the FAA to demonstrate the feasibility of pulse Doppler radar as a candidate airborne sensor to detect low altitude windshears. The need to provide the pilot with timely information about possible hazards has motivated a demand for real-time processing of a radar return. Investigated here is parallel processing as a means of accommodating the high data rates required. A PC based parallel computer, called the transputer, is used to investigate issues in real time concurrent processing of radar signals. A transputer network is made up of an array of single instruction stream processors that can be networked in a variety of ways. They are easily reconfigured and software development is largely independent of the particular network topology. The performance of the transputer is evaluated in light of the computational requirements. A number of algorithms have been implemented on the transputers in OCCAM, a language specially designed for parallel processing. These include signal processing algorithms such as the Fast Fourier Transform (FFT), pulse-pair, and autoregressive modelling, as well as routing software to support concurrency. The most computationally intensive task is estimating the spectrum. Two approaches have been taken on this problem, the first and most conventional of which is to use the FFT. By using table look-ups for the basis function and other optimizing techniques, an algorithm has been developed that is sufficient for real time. The other approach is to model the signal as an autoregressive process and estimate the spectrum based on the model coefficients. This technique is attractive because it does not suffer from the spectral leakage problem inherent in the FFT. Benchmark tests indicate that autoregressive modeling is feasible in real time.

  3. Successful Design Patterns in the Day-to-Day Work with Planetary Mission Data

    NASA Astrophysics Data System (ADS)

    Aye, K.-M.

    2018-04-01

    I will describe successful data storage, data access, and data processing techniques, like embarrassingly parallel processing, that I have established over the years working with large datasets in the planetary science domain; using Jupyter notebooks.

  4. Bayer image parallel decoding based on GPU

    NASA Astrophysics Data System (ADS)

    Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua

    2012-11-01

    In the photoelectrical tracking system, Bayer image is decompressed in traditional method, which is CPU-based. However, it is too slow when the images become large, for example, 2K×2K×16bit. In order to accelerate the Bayer image decoding, this paper introduces a parallel speedup method for NVIDA's Graphics Processor Unit (GPU) which supports CUDA architecture. The decoding procedure can be divided into three parts: the first is serial part, the second is task-parallelism part, and the last is data-parallelism part including inverse quantization, inverse discrete wavelet transform (IDWT) as well as image post-processing part. For reducing the execution time, the task-parallelism part is optimized by OpenMP techniques. The data-parallelism part could advance its efficiency through executing on the GPU as CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, the access memory coalesced optimization and texture memory optimization. In particular, it can significantly speed up the IDWT by rewriting the 2D (Tow-dimensional) serial IDWT into 1D parallel IDWT. Through experimenting with 1K×1K×16bit Bayer image, data-parallelism part is 10 more times faster than CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental result shows that it could achieve 3 to 5 times speed increase compared to the CPU serial method.

  5. A scalable parallel black oil simulator on distributed memory parallel computers

    NASA Astrophysics Data System (ADS)

    Wang, Kun; Liu, Hui; Chen, Zhangxin

    2015-11-01

    This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.

  6. Parallel Geospatial Data Management for Multi-Scale Environmental Data Analysis on GPUs

    NASA Astrophysics Data System (ADS)

    Wang, D.; Zhang, J.; Wei, Y.

    2013-12-01

    As the spatial and temporal resolutions of Earth observatory data and Earth system simulation outputs are getting higher, in-situ and/or post- processing such large amount of geospatial data increasingly becomes a bottleneck in scientific inquires of Earth systems and their human impacts. Existing geospatial techniques that are based on outdated computing models (e.g., serial algorithms and disk-resident systems), as have been implemented in many commercial and open source packages, are incapable of processing large-scale geospatial data and achieve desired level of performance. In this study, we have developed a set of parallel data structures and algorithms that are capable of utilizing massively data parallel computing power available on commodity Graphics Processing Units (GPUs) for a popular geospatial technique called Zonal Statistics. Given two input datasets with one representing measurements (e.g., temperature or precipitation) and the other one represent polygonal zones (e.g., ecological or administrative zones), Zonal Statistics computes major statistics (or complete distribution histograms) of the measurements in all regions. Our technique has four steps and each step can be mapped to GPU hardware by identifying its inherent data parallelisms. First, a raster is divided into blocks and per-block histograms are derived. Second, the Minimum Bounding Boxes (MBRs) of polygons are computed and are spatially matched with raster blocks; matched polygon-block pairs are tested and blocks that are either inside or intersect with polygons are identified. Third, per-block histograms are aggregated to polygons for blocks that are completely within polygons. Finally, for blocks that intersect with polygon boundaries, all the raster cells within the blocks are examined using point-in-polygon-test and cells that are within polygons are used to update corresponding histograms. As the task becomes I/O bound after applying spatial indexing and GPU hardware acceleration, we have developed a GPU-based data compression technique by reusing our previous work on Bitplane Quadtree (or BPQ-Tree) based indexing of binary bitmaps. Results have shown that our GPU-based parallel Zonal Statistic technique on 3000+ US counties over 20+ billion NASA SRTM 30 meter resolution Digital Elevation (DEM) raster cells has achieved impressive end-to-end runtimes: 101 seconds and 46 seconds a low-end workstation equipped with a Nvidia GTX Titan GPU using cold and hot cache, respectively; and, 60-70 seconds using a single OLCF TITAN computing node and 10-15 seconds using 8 nodes. Our experiment results clearly show the potentials of using high-end computing facilities for large-scale geospatial processing.

  7. A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)

    NASA Technical Reports Server (NTRS)

    Straeter, T. A.; Markos, A. T.

    1975-01-01

    A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions insuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.

  8. Improved preconditioned conjugate gradient algorithm and application in 3D inversion of gravity-gradiometry data

    NASA Astrophysics Data System (ADS)

    Wang, Tai-Han; Huang, Da-Nian; Ma, Guo-Qing; Meng, Zhao-Hai; Li, Ye

    2017-06-01

    With the continuous development of full tensor gradiometer (FTG) measurement techniques, three-dimensional (3D) inversion of FTG data is becoming increasingly used in oil and gas exploration. In the fast processing and interpretation of large-scale high-precision data, the use of the graphics processing unit process unit (GPU) and preconditioning methods are very important in the data inversion. In this paper, an improved preconditioned conjugate gradient algorithm is proposed by combining the symmetric successive over-relaxation (SSOR) technique and the incomplete Choleksy decomposition conjugate gradient algorithm (ICCG). Since preparing the preconditioner requires extra time, a parallel implement based on GPU is proposed. The improved method is then applied in the inversion of noisecontaminated synthetic data to prove its adaptability in the inversion of 3D FTG data. Results show that the parallel SSOR-ICCG algorithm based on NVIDIA Tesla C2050 GPU achieves a speedup of approximately 25 times that of a serial program using a 2.0 GHz Central Processing Unit (CPU). Real airborne gravity-gradiometry data from Vinton salt dome (southwest Louisiana, USA) are also considered. Good results are obtained, which verifies the efficiency and feasibility of the proposed parallel method in fast inversion of 3D FTG data.

  9. Small file aggregation in a parallel computing system

    DOEpatents

    Faibish, Sorin; Bent, John M.; Tzelnic, Percy; Grider, Gary; Zhang, Jingwang

    2014-09-02

    Techniques are provided for small file aggregation in a parallel computing system. An exemplary method for storing a plurality of files generated by a plurality of processes in a parallel computing system comprises aggregating the plurality of files into a single aggregated file; and generating metadata for the single aggregated file. The metadata comprises an offset and a length of each of the plurality of files in the single aggregated file. The metadata can be used to unpack one or more of the files from the single aggregated file.

  10. Method and apparatus for fabrication of high gradient insulators with parallel surface conductors spaced less than one millimeter apart

    DOEpatents

    Sanders, David M.; Decker, Derek E.

    1999-01-01

    Optical patterns and lithographic techniques are used as part of a process to embed parallel and evenly spaced conductors in the non-planar surfaces of an insulator to produce high gradient insulators. The approach extends the size that high gradient insulating structures can be fabricated as well as improves the performance of those insulators by reducing the scale of the alternating parallel lines of insulator and conductor along the surface. This fabrication approach also substantially decreases the cost required to produce high gradient insulators.

  11. The Wang Landau parallel algorithm for the simple grids. Optimizing OpenMPI parallel implementation

    NASA Astrophysics Data System (ADS)

    Kussainov, A. S.

    2017-12-01

    The Wang Landau Monte Carlo algorithm to calculate density of states for the different simple spin lattices was implemented. The energy space was split between the individual threads and balanced according to the expected runtime for the individual processes. Custom spin clustering mechanism, necessary for overcoming of the critical slowdown in the certain energy subspaces, was devised. Stable reconstruction of the density of states was of primary importance. Some data post-processing techniques were involved to produce the expected smooth density of states.

  12. Plagiarism Detection for Indonesian Language using Winnowing with Parallel Processing

    NASA Astrophysics Data System (ADS)

    Arifin, Y.; Isa, S. M.; Wulandhari, L. A.; Abdurachman, E.

    2018-03-01

    The plagiarism has many forms, not only copy paste but include changing passive become active voice, or paraphrasing without appropriate acknowledgment. It happens on all language include Indonesian Language. There are many previous research that related with plagiarism detection in Indonesian Language with different method. But there are still some part that still has opportunity to improve. This research proposed the solution that can improve the plagiarism detection technique that can detect not only copy paste form but more advance than that. The proposed solution is using Winnowing with some addition process in pre-processing stage. With stemming processing in Indonesian Language and generate fingerprint in parallel processing that can saving time processing and produce the plagiarism result on the suspected document.

  13. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Demeure, I.M.

    The research presented here is concerned with representation techniques and tools to support the design, prototyping, simulation, and evaluation of message-based parallel, distributed computations. The author describes ParaDiGM-Parallel, Distributed computation Graph Model-a visual representation technique for parallel, message-based distributed computations. ParaDiGM provides several views of a computation depending on the aspect of concern. It is made of two complementary submodels, the DCPG-Distributed Computing Precedence Graph-model, and the PAM-Process Architecture Model-model. DCPGs are precedence graphs used to express the functionality of a computation in terms of tasks, message-passing, and data. PAM graphs are used to represent the partitioning of a computationmore » into schedulable units or processes, and the pattern of communication among those units. There is a natural mapping between the two models. He illustrates the utility of ParaDiGM as a representation technique by applying it to various computations (e.g., an adaptive global optimization algorithm, the client-server model). ParaDiGM representations are concise. They can be used in documenting the design and the implementation of parallel, distributed computations, in describing such computations to colleagues, and in comparing and contrasting various implementations of the same computation. He then describes VISA-VISual Assistant, a software tool to support the design, prototyping, and simulation of message-based parallel, distributed computations. VISA is based on the ParaDiGM model. In particular, it supports the editing of ParaDiGM graphs to describe the computations of interest, and the animation of these graphs to provide visual feedback during simulations. The graphs are supplemented with various attributes, simulation parameters, and interpretations which are procedures that can be executed by VISA.« less

  14. Methodologies and systems for heterogeneous concurrent computing

    NASA Technical Reports Server (NTRS)

    Sunderam, V. S.

    1994-01-01

    Heterogeneous concurrent computing is gaining increasing acceptance as an alternative or complementary paradigm to multiprocessor-based parallel processing as well as to conventional supercomputing. While algorithmic and programming aspects of heterogeneous concurrent computing are similar to their parallel processing counterparts, system issues, partitioning and scheduling, and performance aspects are significantly different. In this paper, we discuss critical design and implementation issues in heterogeneous concurrent computing, and describe techniques for enhancing its effectiveness. In particular, we highlight the system level infrastructures that are required, aspects of parallel algorithm development that most affect performance, system capabilities and limitations, and tools and methodologies for effective computing in heterogeneous networked environments. We also present recent developments and experiences in the context of the PVM system and comment on ongoing and future work.

  15. Storing files in a parallel computing system based on user-specified parser function

    DOEpatents

    Faibish, Sorin; Bent, John M; Tzelnic, Percy; Grider, Gary; Manzanares, Adam; Torres, Aaron

    2014-10-21

    Techniques are provided for storing files in a parallel computing system based on a user-specified parser function. A plurality of files generated by a distributed application in a parallel computing system are stored by obtaining a parser from the distributed application for processing the plurality of files prior to storage; and storing one or more of the plurality of files in one or more storage nodes of the parallel computing system based on the processing by the parser. The plurality of files comprise one or more of a plurality of complete files and a plurality of sub-files. The parser can optionally store only those files that satisfy one or more semantic requirements of the parser. The parser can also extract metadata from one or more of the files and the extracted metadata can be stored with one or more of the plurality of files and used for searching for files.

  16. Improved CDMA Performance Using Parallel Interference Cancellation

    NASA Technical Reports Server (NTRS)

    Simon, Marvin; Divsalar, Dariush

    1995-01-01

    This report considers a general parallel interference cancellation scheme that significantly reduces the degradation effect of user interference but with a lesser implementation complexity than the maximum-likelihood technique. The scheme operates on the fact that parallel processing simultaneously removes from each user the interference produced by the remaining users accessing the channel in an amount proportional to their reliability. The parallel processing can be done in multiple stages. The proposed scheme uses tentative decision devices with different optimum thresholds at the multiple stages to produce the most reliably received data for generation and cancellation of user interference. The 1-stage interference cancellation is analyzed for three types of tentative decision devices, namely, hard, null zone, and soft decision, and two types of user power distribution, namely, equal and unequal powers. Simulation results are given for a multitude of different situations, in particular, those cases for which the analysis is too complex.

  17. Evaluation of a parallel implementation of the learning portion of the backward error propagation neural network: experiments in artifact identification.

    PubMed Central

    Sittig, D. F.; Orr, J. A.

    1991-01-01

    Various methods have been proposed in an attempt to solve problems in artifact and/or alarm identification including expert systems, statistical signal processing techniques, and artificial neural networks (ANN). ANNs consist of a large number of simple processing units connected by weighted links. To develop truly robust ANNs, investigators are required to train their networks on huge training data sets, requiring enormous computing power. We implemented a parallel version of the backward error propagation neural network training algorithm in the widely portable parallel programming language C-Linda. A maximum speedup of 4.06 was obtained with six processors. This speedup represents a reduction in total run-time from approximately 6.4 hours to 1.5 hours. We conclude that use of the master-worker model of parallel computation is an excellent method for obtaining speedups in the backward error propagation neural network training algorithm. PMID:1807607

  18. Gate tunable parallel double quantum dots in InAs double-nanowire devices

    NASA Astrophysics Data System (ADS)

    Baba, S.; Matsuo, S.; Kamata, H.; Deacon, R. S.; Oiwa, A.; Li, K.; Jeppesen, S.; Samuelson, L.; Xu, H. Q.; Tarucha, S.

    2017-12-01

    We report fabrication and characterization of InAs nanowire devices with two closely placed parallel nanowires. The fabrication process we develop includes selective deposition of the nanowires with micron scale alignment onto predefined finger bottom gates using a polymer transfer technique. By tuning the double nanowire with the finger bottom gates, we observed the formation of parallel double quantum dots with one quantum dot in each nanowire bound by the normal metal contact edges. We report the gate tunability of the charge states in individual dots as well as the inter-dot electrostatic coupling. In addition, we fabricate a device with separate normal metal contacts and a common superconducting contact to the two parallel wires and confirm the dot formation in each wire from comparison of the transport properties and a superconducting proximity gap feature for the respective wires. With the fabrication techniques established in this study, devices can be realized for more advanced experiments on Cooper-pair splitting, generation of Parafermions, and so on.

  19. Real-time Nyquist signaling with dynamic precision and flexible non-integer oversampling.

    PubMed

    Schmogrow, R; Meyer, M; Schindler, P C; Nebendahl, B; Dreschmann, M; Meyer, J; Josten, A; Hillerkuss, D; Ben-Ezra, S; Becker, J; Koos, C; Freude, W; Leuthold, J

    2014-01-13

    We demonstrate two efficient processing techniques for Nyquist signals, namely computation of signals using dynamic precision as well as arbitrary rational oversampling factors. With these techniques along with massively parallel processing it becomes possible to generate and receive high data rate Nyquist signals with flexible symbol rates and bandwidths, a feature which is highly desirable for novel flexgrid networks. We achieved maximum bit rates of 252 Gbit/s in real-time.

  20. Assessing clutter reduction in parallel coordinates using image processing techniques

    NASA Astrophysics Data System (ADS)

    Alhamaydh, Heba; Alzoubi, Hussein; Almasaeid, Hisham

    2018-01-01

    Information visualization has appeared as an important research field for multidimensional data and correlation analysis in recent years. Parallel coordinates (PCs) are one of the popular techniques to visual high-dimensional data. A problem with the PCs technique is that it suffers from crowding, a clutter which hides important data and obfuscates the information. Earlier research has been conducted to reduce clutter without loss in data content. We introduce the use of image processing techniques as an approach for assessing the performance of clutter reduction techniques in PC. We use histogram analysis as our first measure, where the mean feature of the color histograms of the possible alternative orderings of coordinates for the PC images is calculated and compared. The second measure is the extracted contrast feature from the texture of PC images based on gray-level co-occurrence matrices. The results show that the best PC image is the one that has the minimal mean value of the color histogram feature and the maximal contrast value of the texture feature. In addition to its simplicity, the proposed assessment method has the advantage of objectively assessing alternative ordering of PC visualization.

  1. Parallel processing of real-time dynamic systems simulation on OSCAR (Optimally SCheduled Advanced multiprocessoR)

    NASA Technical Reports Server (NTRS)

    Kasahara, Hironori; Honda, Hiroki; Narita, Seinosuke

    1989-01-01

    Parallel processing of real-time dynamic systems simulation on a multiprocessor system named OSCAR is presented. In the simulation of dynamic systems, generally, the same calculation are repeated every time step. However, we cannot apply to Do-all or the Do-across techniques for parallel processing of the simulation since there exist data dependencies from the end of an iteration to the beginning of the next iteration and furthermore data-input and data-output are required every sampling time period. Therefore, parallelism inside the calculation required for a single time step, or a large basic block which consists of arithmetic assignment statements, must be used. In the proposed method, near fine grain tasks, each of which consists of one or more floating point operations, are generated to extract the parallelism from the calculation and assigned to processors by using optimal static scheduling at compile time in order to reduce large run time overhead caused by the use of near fine grain tasks. The practicality of the scheme is demonstrated on OSCAR (Optimally SCheduled Advanced multiprocessoR) which has been developed to extract advantageous features of static scheduling algorithms to the maximum extent.

  2. Parallel optoelectronic trinary signed-digit division

    NASA Astrophysics Data System (ADS)

    Alam, Mohammad S.

    1999-03-01

    The trinary signed-digit (TSD) number system has been found to be very useful for parallel addition and subtraction of any arbitrary length operands in constant time. Using the TSD addition and multiplication modules as the basic building blocks, we develop an efficient algorithm for performing parallel TSD division in constant time. The proposed division technique uses one TSD subtraction and two TSD multiplication steps. An optoelectronic correlator based architecture is suggested for implementation of the proposed TSD division algorithm, which fully exploits the parallelism and high processing speed of optics. An efficient spatial encoding scheme is used to ensure better utilization of space bandwidth product of the spatial light modulators used in the optoelectronic implementation.

  3. Multiprocessor smalltalk: Implementation, performance, and analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pallas, J.I.

    1990-01-01

    Multiprocessor Smalltalk demonstrates the value of object-oriented programming on a multiprocessor. Its implementation and analysis shed light on three areas: concurrent programming in an object oriented language without special extensions, implementation techniques for adapting to multiprocessors, and performance factors in the resulting system. Adding parallelism to Smalltalk code is easy, because programs already use control abstractions like iterators. Smalltalk's basic control and concurrency primitives (lambda expressions, processes and semaphores) can be used to build parallel control abstractions, including parallel iterators, parallel objects, atomic objects, and futures. Language extensions for concurrency are not required. This implementation demonstrates that it is possiblemore » to build an efficient parallel object-oriented programming system and illustrates techniques for doing so. Three modification tools-serialization, replication, and reorganization-adapted the Berkeley Smalltalk interpreter to the Firefly multiprocessor. Multiprocessor Smalltalk's performance shows that the combination of multiprocessing and object-oriented programming can be effective: speedups (relative to the original serial version) exceed 2.0 for five processors on all the benchmarks; the median efficiency is 48%. Analysis shows both where performance is lost and how to improve and generalize the experimental results. Changes in the interpreter to support concurrency add at most 12% overhead; better access to per-process variables could eliminate much of that. Changes in the user code to express concurrency add as much as 70% overhead; this overhead could be reduced to 54% if blocks (lambda expressions) were reentrant. Performance is also lost when the program cannot keep all five processors busy.« less

  4. Parallel computational fluid dynamics '91; Conference Proceedings, Stuttgart, Germany, Jun. 10-12, 1991

    NASA Technical Reports Server (NTRS)

    Reinsch, K. G. (Editor); Schmidt, W. (Editor); Ecer, A. (Editor); Haeuser, Jochem (Editor); Periaux, J. (Editor)

    1992-01-01

    A conference was held on parallel computational fluid dynamics and produced related papers. Topics discussed in these papers include: parallel implicit and explicit solvers for compressible flow, parallel computational techniques for Euler and Navier-Stokes equations, grid generation techniques for parallel computers, and aerodynamic simulation om massively parallel systems.

  5. Parallel Implementation of the Wideband DOA Algorithm on the IBM Cell BE Processor

    DTIC Science & Technology

    2010-05-01

    Abstract—The Multiple Signal Classification ( MUSIC ) algorithm is a powerful technique for determining the Direction of Arrival (DOA) of signals...Broadband Engine Processor (Cell BE). The process of adapting the serial based MUSIC algorithm to the Cell BE will be analyzed in terms of parallelism and...using Multiple Signal Classification MUSIC algorithm [4] • Computation of Focus matrix • Computation of number of sources • Separation of Signal

  6. Managing Algorithmic Skeleton Nesting Requirements in Realistic Image Processing Applications: The Case of the SKiPPER-II Parallel Programming Environment's Operating Model

    NASA Astrophysics Data System (ADS)

    Coudarcher, Rémi; Duculty, Florent; Serot, Jocelyn; Jurie, Frédéric; Derutin, Jean-Pierre; Dhome, Michel

    2005-12-01

    SKiPPER is a SKeleton-based Parallel Programming EnviRonment being developed since 1996 and running at LASMEA Laboratory, the Blaise-Pascal University, France. The main goal of the project was to demonstrate the applicability of skeleton-based parallel programming techniques to the fast prototyping of reactive vision applications. This paper deals with the special features embedded in the latest version of the project: algorithmic skeleton nesting capabilities and a fully dynamic operating model. Throughout the case study of a complete and realistic image processing application, in which we have pointed out the requirement for skeleton nesting, we are presenting the operating model of this feature. The work described here is one of the few reported experiments showing the application of skeleton nesting facilities for the parallelisation of a realistic application, especially in the area of image processing. The image processing application we have chosen is a 3D face-tracking algorithm from appearance.

  7. Hybrid parallel computing architecture for multiview phase shifting

    NASA Astrophysics Data System (ADS)

    Zhong, Kai; Li, Zhongwei; Zhou, Xiaohui; Shi, Yusheng; Wang, Congjun

    2014-11-01

    The multiview phase-shifting method shows its powerful capability in achieving high resolution three-dimensional (3-D) shape measurement. Unfortunately, this ability results in very high computation costs and 3-D computations have to be processed offline. To realize real-time 3-D shape measurement, a hybrid parallel computing architecture is proposed for multiview phase shifting. In this architecture, the central processing unit can co-operate with the graphic processing unit (GPU) to achieve hybrid parallel computing. The high computation cost procedures, including lens distortion rectification, phase computation, correspondence, and 3-D reconstruction, are implemented in GPU, and a three-layer kernel function model is designed to simultaneously realize coarse-grained and fine-grained paralleling computing. Experimental results verify that the developed system can perform 50 fps (frame per second) real-time 3-D measurement with 260 K 3-D points per frame. A speedup of up to 180 times is obtained for the performance of the proposed technique using a NVIDIA GT560Ti graphics card rather than a sequential C in a 3.4 GHZ Inter Core i7 3770.

  8. A parallel time integrator for noisy nonlinear oscillatory systems

    NASA Astrophysics Data System (ADS)

    Subber, Waad; Sarkar, Abhijit

    2018-06-01

    In this paper, we adapt a parallel time integration scheme to track the trajectories of noisy non-linear dynamical systems. Specifically, we formulate a parallel algorithm to generate the sample path of nonlinear oscillator defined by stochastic differential equations (SDEs) using the so-called parareal method for ordinary differential equations (ODEs). The presence of Wiener process in SDEs causes difficulties in the direct application of any numerical integration techniques of ODEs including the parareal algorithm. The parallel implementation of the algorithm involves two SDEs solvers, namely a fine-level scheme to integrate the system in parallel and a coarse-level scheme to generate and correct the required initial conditions to start the fine-level integrators. For the numerical illustration, a randomly excited Duffing oscillator is investigated in order to study the performance of the stochastic parallel algorithm with respect to a range of system parameters. The distributed implementation of the algorithm exploits Massage Passing Interface (MPI).

  9. Rubus: A compiler for seamless and extensible parallelism.

    PubMed

    Adnan, Muhammad; Aslam, Faisal; Nawaz, Zubair; Sarwar, Syed Mansoor

    2017-01-01

    Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program.

  10. Rubus: A compiler for seamless and extensible parallelism

    PubMed Central

    Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor

    2017-01-01

    Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer’s expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program. PMID:29211758

  11. Load balancing for massively-parallel soft-real-time systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hailperin, M.

    1988-09-01

    Global load balancing, if practical, would allow the effective use of massively-parallel ensemble architectures for large soft-real-problems. The challenge is to replace quick global communications, which is impractical in a massively-parallel system, with statistical techniques. In this vein, the author proposes a novel approach to decentralized load balancing based on statistical time-series analysis. Each site estimates the system-wide average load using information about past loads of individual sites and attempts to equal that average. This estimation process is practical because the soft-real-time systems of interest naturally exhibit loads that are periodic, in a statistical sense akin to seasonality in econometrics.more » It is shown how this load-characterization technique can be the foundation for a load-balancing system in an architecture employing cut-through routing and an efficient multicast protocol.« less

  12. Alignment of an acoustic manipulation device with cepstral analysis of electronic impedance data.

    PubMed

    Hughes, D A; Qiu, Y; Démoré, C; Weijer, C J; Cochran, S

    2015-02-01

    Acoustic particle manipulation is an emerging technology that uses ultrasonic standing waves to position objects with pressure gradients and acoustic radiation forces. To produce strong standing waves, the transducer and the reflector must be aligned properly such that they are parallel to each other. This can be a difficult process due to the need to visualise the ultrasound waves and as higher frequencies are introduced, this alignment requires higher accuracy. In this paper, we present a method for aligning acoustic resonators with cepstral analysis. This is a simple signal processing technique that requires only the electrical impedance measurement data of the resonator, which is usually recorded during the fabrication process of the device. We first introduce the mathematical basis of cepstral analysis and then demonstrate and validate it using a computer simulation of an acoustic resonator. Finally, the technique is demonstrated experimentally to create many parallel linear traps for 10 μm fluorescent beads inside an acoustic resonator. Copyright © 2014 Elsevier B.V. All rights reserved.

  13. Hierarchical Parallelization of Gene Differential Association Analysis

    PubMed Central

    2011-01-01

    Background Microarray gene differential expression analysis is a widely used technique that deals with high dimensional data and is computationally intensive for permutation-based procedures. Microarray gene differential association analysis is even more computationally demanding and must take advantage of multicore computing technology, which is the driving force behind increasing compute power in recent years. In this paper, we present a two-layer hierarchical parallel implementation of gene differential association analysis. It takes advantage of both fine- and coarse-grain (with granularity defined by the frequency of communication) parallelism in order to effectively leverage the non-uniform nature of parallel processing available in the cutting-edge systems of today. Results Our results show that this hierarchical strategy matches data sharing behavior to the properties of the underlying hardware, thereby reducing the memory and bandwidth needs of the application. The resulting improved efficiency reduces computation time and allows the gene differential association analysis code to scale its execution with the number of processors. The code and biological data used in this study are downloadable from http://www.urmc.rochester.edu/biostat/people/faculty/hu.cfm. Conclusions The performance sweet spot occurs when using a number of threads per MPI process that allows the working sets of the corresponding MPI processes running on the multicore to fit within the machine cache. Hence, we suggest that practitioners follow this principle in selecting the appropriate number of MPI processes and threads within each MPI process for their cluster configurations. We believe that the principles of this hierarchical approach to parallelization can be utilized in the parallelization of other computationally demanding kernels. PMID:21936916

  14. Hierarchical parallelization of gene differential association analysis.

    PubMed

    Needham, Mark; Hu, Rui; Dwarkadas, Sandhya; Qiu, Xing

    2011-09-21

    Microarray gene differential expression analysis is a widely used technique that deals with high dimensional data and is computationally intensive for permutation-based procedures. Microarray gene differential association analysis is even more computationally demanding and must take advantage of multicore computing technology, which is the driving force behind increasing compute power in recent years. In this paper, we present a two-layer hierarchical parallel implementation of gene differential association analysis. It takes advantage of both fine- and coarse-grain (with granularity defined by the frequency of communication) parallelism in order to effectively leverage the non-uniform nature of parallel processing available in the cutting-edge systems of today. Our results show that this hierarchical strategy matches data sharing behavior to the properties of the underlying hardware, thereby reducing the memory and bandwidth needs of the application. The resulting improved efficiency reduces computation time and allows the gene differential association analysis code to scale its execution with the number of processors. The code and biological data used in this study are downloadable from http://www.urmc.rochester.edu/biostat/people/faculty/hu.cfm. The performance sweet spot occurs when using a number of threads per MPI process that allows the working sets of the corresponding MPI processes running on the multicore to fit within the machine cache. Hence, we suggest that practitioners follow this principle in selecting the appropriate number of MPI processes and threads within each MPI process for their cluster configurations. We believe that the principles of this hierarchical approach to parallelization can be utilized in the parallelization of other computationally demanding kernels.

  15. The influence of hand positions on biomechanical injury risk factors at the wrist joint during the round-off skills in female gymnastics.

    PubMed

    Farana, Roman; Jandacka, Daniel; Uchytil, Jaroslav; Zahradnik, David; Irwin, Gareth

    2017-01-01

    The aim of this study was to examine the biomechanical injury risk factors at the wrist, including joint kinetics, kinematics and stiffness in the first and second contact limb for parallel and T-shape round-off (RO) techniques. Seven international-level female gymnasts performed 10 trials of the RO to back handspring with parallel and T-shape hand positions. Synchronised kinematic (3D motion analysis system; 247 Hz) and kinetic (two force plates; 1235 Hz) data were collected for each trial. A two-way repeated measure analysis of variance (ANOVA) assessed differences in the kinematic and kinetic parameters between the techniques for each contact limb. The main findings highlighted that in both the RO techniques, the second contact limb wrist joint is exposed to higher mechanical loads than the first contact limb demonstrated by increased axial compression force and loading rate. In the parallel technique, the second contact limb wrist joint is exposed to higher axial compression load. Differences between wrist joint kinetics highlight that the T-shape technique may potentially lead to reducing these bio-physical loads and consequently protect the second contact limb wrist joint from overload and biological failure. Highlighting the biomechanical risk factors facilitates the process of technique selection making more objective and safe.

  16. Thin-Film Material Science and Processing | Materials Science | NREL

    Science.gov Websites

    , a prime example of this research is thin-film photovoltaics (PV). Thin films are important because have developed a quantitative high-throughput technique that can measure many barriers in parallel with

  17. A real time, FEM based optimal control algorithm and its implementation using parallel processing hardware (transistors) in a microprocessor environment

    NASA Technical Reports Server (NTRS)

    Patten, William Neff

    1989-01-01

    There is an evident need to discover a means of establishing reliable, implementable controls for systems that are plagued by nonlinear and, or uncertain, model dynamics. The development of a generic controller design tool for tough-to-control systems is reported. The method utilizes a moving grid, time infinite element based solution of the necessary conditions that describe an optimal controller for a system. The technique produces a discrete feedback controller. Real time laboratory experiments are now being conducted to demonstrate the viability of the method. The algorithm that results is being implemented in a microprocessor environment. Critical computational tasks are accomplished using a low cost, on-board, multiprocessor (INMOS T800 Transputers) and parallel processing. Progress to date validates the methodology presented. Applications of the technique to the control of highly flexible robotic appendages are suggested.

  18. Flexible, FEP-Teflon covered solar cell module development

    NASA Technical Reports Server (NTRS)

    Rauschenbach, H. S.; Cannady, M. D.

    1976-01-01

    Techniques and equipment were developed for the large scale, low-cost fabrication of lightweight, roll-up and fold-up, FEP-Teflon encapsulated solar cell modules. Modules were fabricated by interconnecting solderless single-crystal silicon solar cells and heat laminating them at approximately 300 C between layers of optically clear FEP and to a loadbearing Kapton substrate sheet. Modules were fabricated from both conventional and wraparound contact solar cells. A heat seal technique was developed for mechanically interconnecting modules into an array. The electrical interconnections for both roll-up and fold-up arrays were also developed. The use of parallel-gap resistance welding, ultrasonic bonding, and thermocompression bonding processes for attaching interconnects to solar cells were investigated. Parallel-gap welding was found to be best suited for interconnecting the solderless solar cells into modules. Details of the fabrication equipment, fabrication processes, module and interconnect designs, environmental test equipment, and test results are presented.

  19. High-speed technique based on a parallel projection correlation procedure for digital image correlation

    NASA Astrophysics Data System (ADS)

    Zaripov, D. I.; Renfu, Li

    2018-05-01

    The implementation of high-efficiency digital image correlation methods based on a zero-normalized cross-correlation (ZNCC) procedure for high-speed, time-resolved measurements using a high-resolution digital camera is associated with big data processing and is often time consuming. In order to speed-up ZNCC computation, a high-speed technique based on a parallel projection correlation procedure is proposed. The proposed technique involves the use of interrogation window projections instead of its two-dimensional field of luminous intensity. This simplification allows acceleration of ZNCC computation up to 28.8 times compared to ZNCC calculated directly, depending on the size of interrogation window and region of interest. The results of three synthetic test cases, such as a one-dimensional uniform flow, a linear shear flow and a turbulent boundary-layer flow, are discussed in terms of accuracy. In the latter case, the proposed technique is implemented together with an iterative window-deformation technique. On the basis of the results of the present work, the proposed technique is recommended to be used for initial velocity field calculation, with further correction using more accurate techniques.

  20. Parallel Mutual Information Based Construction of Genome-Scale Networks on the Intel® Xeon Phi™ Coprocessor.

    PubMed

    Misra, Sanchit; Pamnany, Kiran; Aluru, Srinivas

    2015-01-01

    Construction of whole-genome networks from large-scale gene expression data is an important problem in systems biology. While several techniques have been developed, most cannot handle network reconstruction at the whole-genome scale, and the few that can, require large clusters. In this paper, we present a solution on the Intel Xeon Phi coprocessor, taking advantage of its multi-level parallelism including many x86-based cores, multiple threads per core, and vector processing units. We also present a solution on the Intel® Xeon® processor. Our solution is based on TINGe, a fast parallel network reconstruction technique that uses mutual information and permutation testing for assessing statistical significance. We demonstrate the first ever inference of a plant whole genome regulatory network on a single chip by constructing a 15,575 gene network of the plant Arabidopsis thaliana from 3,137 microarray experiments in only 22 minutes. In addition, our optimization for parallelizing mutual information computation on the Intel Xeon Phi coprocessor holds out lessons that are applicable to other domains.

  1. Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications

    NASA Technical Reports Server (NTRS)

    OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)

    1998-01-01

    This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).

  2. Parallel-vector computation for linear structural analysis and non-linear unconstrained optimization problems

    NASA Technical Reports Server (NTRS)

    Nguyen, D. T.; Al-Nasra, M.; Zhang, Y.; Baddourah, M. A.; Agarwal, T. K.; Storaasli, O. O.; Carmona, E. A.

    1991-01-01

    Several parallel-vector computational improvements to the unconstrained optimization procedure are described which speed up the structural analysis-synthesis process. A fast parallel-vector Choleski-based equation solver, pvsolve, is incorporated into the well-known SAP-4 general-purpose finite-element code. The new code, denoted PV-SAP, is tested for static structural analysis. Initial results on a four processor CRAY 2 show that using pvsolve reduces the equation solution time by a factor of 14-16 over the original SAP-4 code. In addition, parallel-vector procedures for the Golden Block Search technique and the BFGS method are developed and tested for nonlinear unconstrained optimization. A parallel version of an iterative solver and the pvsolve direct solver are incorporated into the BFGS method. Preliminary results on nonlinear unconstrained optimization test problems, using pvsolve in the analysis, show excellent parallel-vector performance indicating that these parallel-vector algorithms can be used in a new generation of finite-element based structural design/analysis-synthesis codes.

  3. B-MIC: An Ultrafast Three-Level Parallel Sequence Aligner Using MIC.

    PubMed

    Cui, Yingbo; Liao, Xiangke; Zhu, Xiaoqian; Wang, Bingqiang; Peng, Shaoliang

    2016-03-01

    Sequence alignment is the central process for sequence analysis, where mapping raw sequencing data to reference genome. The large amount of data generated by NGS is far beyond the process capabilities of existing alignment tools. Consequently, sequence alignment becomes the bottleneck of sequence analysis. Intensive computing power is required to address this challenge. Intel recently announced the MIC coprocessor, which can provide massive computing power. The Tianhe-2 is the world's fastest supercomputer now equipped with three MIC coprocessors each compute node. A key feature of sequence alignment is that different reads are independent. Considering this property, we proposed a MIC-oriented three-level parallelization strategy to speed up BWA, a widely used sequence alignment tool, and developed our ultrafast parallel sequence aligner: B-MIC. B-MIC contains three levels of parallelization: firstly, parallelization of data IO and reads alignment by a three-stage parallel pipeline; secondly, parallelization enabled by MIC coprocessor technology; thirdly, inter-node parallelization implemented by MPI. In this paper, we demonstrate that B-MIC outperforms BWA by a combination of those techniques using Inspur NF5280M server and the Tianhe-2 supercomputer. To the best of our knowledge, B-MIC is the first sequence alignment tool to run on Intel MIC and it can achieve more than fivefold speedup over the original BWA while maintaining the alignment precision.

  4. Implementation of ADI: Schemes on MIMD parallel computers

    NASA Technical Reports Server (NTRS)

    Vanderwijngaart, Rob F.

    1993-01-01

    In order to simulate the effects of the impingement of hot exhaust jets of High Performance Aircraft on landing surfaces a multi-disciplinary computation coupling flow dynamics to heat conduction in the runway needs to be carried out. Such simulations, which are essentially unsteady, require very large computational power in order to be completed within a reasonable time frame of the order of an hour. Such power can be furnished by the latest generation of massively parallel computers. These remove the bottleneck of ever more congested data paths to one or a few highly specialized central processing units (CPU's) by having many off-the-shelf CPU's work independently on their own data, and exchange information only when needed. During the past year the first phase of this project was completed, in which the optimal strategy for mapping an ADI-algorithm for the three dimensional unsteady heat equation to a MIMD parallel computer was identified. This was done by implementing and comparing three different domain decomposition techniques that define the tasks for the CPU's in the parallel machine. These implementations were done for a Cartesian grid and Dirichlet boundary conditions. The most promising technique was then used to implement the heat equation solver on a general curvilinear grid with a suite of nontrivial boundary conditions. Finally, this technique was also used to implement the Scalar Penta-diagonal (SP) benchmark, which was taken from the NAS Parallel Benchmarks report. All implementations were done in the programming language C on the Intel iPSC/860 computer.

  5. Performance evaluation of GPU parallelization, space-time adaptive algorithms, and their combination for simulating cardiac electrophysiology.

    PubMed

    Sachetto Oliveira, Rafael; Martins Rocha, Bernardo; Burgarelli, Denise; Meira, Wagner; Constantinides, Christakis; Weber Dos Santos, Rodrigo

    2018-02-01

    The use of computer models as a tool for the study and understanding of the complex phenomena of cardiac electrophysiology has attained increased importance nowadays. At the same time, the increased complexity of the biophysical processes translates into complex computational and mathematical models. To speed up cardiac simulations and to allow more precise and realistic uses, 2 different techniques have been traditionally exploited: parallel computing and sophisticated numerical methods. In this work, we combine a modern parallel computing technique based on multicore and graphics processing units (GPUs) and a sophisticated numerical method based on a new space-time adaptive algorithm. We evaluate each technique alone and in different combinations: multicore and GPU, multicore and GPU and space adaptivity, multicore and GPU and space adaptivity and time adaptivity. All the techniques and combinations were evaluated under different scenarios: 3D simulations on slabs, 3D simulations on a ventricular mouse mesh, ie, complex geometry, sinus-rhythm, and arrhythmic conditions. Our results suggest that multicore and GPU accelerate the simulations by an approximate factor of 33×, whereas the speedups attained by the space-time adaptive algorithms were approximately 48. Nevertheless, by combining all the techniques, we obtained speedups that ranged between 165 and 498. The tested methods were able to reduce the execution time of a simulation by more than 498× for a complex cellular model in a slab geometry and by 165× in a realistic heart geometry simulating spiral waves. The proposed methods will allow faster and more realistic simulations in a feasible time with no significant loss of accuracy. Copyright © 2017 John Wiley & Sons, Ltd.

  6. Observing with HST V: Improvements to the Scheduling of HST Parallel Observations

    NASA Astrophysics Data System (ADS)

    Taylor, D. K.; Vanorsow, D.; Lucks, M.; Henry, R.; Ratnatunga, K.; Patterson, A.

    1994-12-01

    Recent improvements to the Hubble Space Telescope (HST) ground system have significantly increased the frequency of pure parallel observations, i.e. the simultaneous use of multiple HST instruments by different observers. Opportunities for parallel observations are limited by a variety of timing, hardware, and scientific constraints. Formerly, such opportunities were heuristically predicted prior to the construction of the primary schedule (or calendar), and lack of complete information resulted in high rates of scheduling failures and missed opportunities. In the current process the search for parallel opportunities is delayed until the primary schedule is complete, at which point new software tools are employed to identify places where parallel observations are supported. The result has been a considerable increase in parallel throughput. A new technique, known as ``parallel crafting,'' is currently under development to streamline further the parallel scheduling process. This radically new method will replace the standard exposure logsheet with a set of abstract rules from which observation parameters will be constructed ``on the fly'' to best match the constraints of the parallel opportunity. Currently, parallel observers must specify a huge (and highly redundant) set of exposure types in order to cover all possible types of parallel opportunities. Crafting rules permit the observer to express timing, filter, and splitting preferences in a far more succinct manner. The issue of coordinated parallel observations (same PI using different instruments simultaneously), long a troublesome aspect of the ground system, is also being addressed. For Cycle 5, the Phase II Proposal Instructions now have an exposure-level PAR WITH special requirement. While only the primary's alignment will be scheduled on the calendar, new commanding will provide for parallel exposures with both instruments.

  7. Accelerating Electrostatic Surface Potential Calculation with Multiscale Approximation on Graphics Processing Units

    PubMed Central

    Anandakrishnan, Ramu; Scogland, Tom R. W.; Fenley, Andrew T.; Gordon, John C.; Feng, Wu-chun; Onufriev, Alexey V.

    2010-01-01

    Tools that compute and visualize biomolecular electrostatic surface potential have been used extensively for studying biomolecular function. However, determining the surface potential for large biomolecules on a typical desktop computer can take days or longer using currently available tools and methods. Two commonly used techniques to speed up these types of electrostatic computations are approximations based on multi-scale coarse-graining and parallelization across multiple processors. This paper demonstrates that for the computation of electrostatic surface potential, these two techniques can be combined to deliver significantly greater speed-up than either one separately, something that is in general not always possible. Specifically, the electrostatic potential computation, using an analytical linearized Poisson Boltzmann (ALPB) method, is approximated using the hierarchical charge partitioning (HCP) multiscale method, and parallelized on an ATI Radeon 4870 graphical processing unit (GPU). The implementation delivers a combined 934-fold speed-up for a 476,040 atom viral capsid, compared to an equivalent non-parallel implementation on an Intel E6550 CPU without the approximation. This speed-up is significantly greater than the 42-fold speed-up for the HCP approximation alone or the 182-fold speed-up for the GPU alone. PMID:20452792

  8. Dynamic Load Balancing Based on Constrained K-D Tree Decomposition for Parallel Particle Tracing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Jiang; Guo, Hanqi; Yuan, Xiaoru

    Particle tracing is a fundamental technique in flow field data visualization. In this work, we present a novel dynamic load balancing method for parallel particle tracing. Specifically, we employ a constrained k-d tree decomposition approach to dynamically redistribute tasks among processes. Each process is initially assigned a regularly partitioned block along with duplicated ghost layer under the memory limit. During particle tracing, the k-d tree decomposition is dynamically performed by constraining the cutting planes in the overlap range of duplicated data. This ensures that each process is reassigned particles as even as possible, and on the other hand the newmore » assigned particles for a process always locate in its block. Result shows good load balance and high efficiency of our method.« less

  9. Introduction of Parallel GPGPU Acceleration Algorithms for the Solution of Radiative Transfer

    NASA Technical Reports Server (NTRS)

    Godoy, William F.; Liu, Xu

    2011-01-01

    General-purpose computing on graphics processing units (GPGPU) is a recent technique that allows the parallel graphics processing unit (GPU) to accelerate calculations performed sequentially by the central processing unit (CPU). To introduce GPGPU to radiative transfer, the Gauss-Seidel solution of the well-known expressions for 1-D and 3-D homogeneous, isotropic media is selected as a test case. Different algorithms are introduced to balance memory and GPU-CPU communication, critical aspects of GPGPU. Results show that speed-ups of one to two orders of magnitude are obtained when compared to sequential solutions. The underlying value of GPGPU is its potential extension in radiative solvers (e.g., Monte Carlo, discrete ordinates) at a minimal learning curve.

  10. Parallel algorithm for computation of second-order sequential best rotations

    NASA Astrophysics Data System (ADS)

    Redif, Soydan; Kasap, Server

    2013-12-01

    Algorithms for computing an approximate polynomial matrix eigenvalue decomposition of para-Hermitian systems have emerged as a powerful, generic signal processing tool. A technique that has shown much success in this regard is the sequential best rotation (SBR2) algorithm. Proposed is a scheme for parallelising SBR2 with a view to exploiting the modern architectural features and inherent parallelism of field-programmable gate array (FPGA) technology. Experiments show that the proposed scheme can achieve low execution times while requiring minimal FPGA resources.

  11. Graphics Processing Unit Assisted Thermographic Compositing

    NASA Technical Reports Server (NTRS)

    Ragasa, Scott; McDougal, Matthew; Russell, Sam

    2012-01-01

    Objective: To develop a software application utilizing general purpose graphics processing units (GPUs) for the analysis of large sets of thermographic data. Background: Over the past few years, an increasing effort among scientists and engineers to utilize the GPU in a more general purpose fashion is allowing for supercomputer level results at individual workstations. As data sets grow, the methods to work them grow at an equal, and often great, pace. Certain common computations can take advantage of the massively parallel and optimized hardware constructs of the GPU to allow for throughput that was previously reserved for compute clusters. These common computations have high degrees of data parallelism, that is, they are the same computation applied to a large set of data where the result does not depend on other data elements. Signal (image) processing is one area were GPUs are being used to greatly increase the performance of certain algorithms and analysis techniques. Technical Methodology/Approach: Apply massively parallel algorithms and data structures to the specific analysis requirements presented when working with thermographic data sets.

  12. Proceedings of the 3rd Annual Conference on Aerospace Computational Control, volume 2

    NASA Technical Reports Server (NTRS)

    Bernard, Douglas E. (Editor); Man, Guy K. (Editor)

    1989-01-01

    This volume of the conference proceedings contain papers and discussions in the following topical areas: Parallel processing; Emerging integrated capabilities; Low order controllers; Real time simulation; Multibody component representation; User environment; and Distributed parameter techniques.

  13. Parallel Domain Decomposition Formulation and Software for Large-Scale Sparse Symmetrical/Unsymmetrical Aeroacoustic Applications

    NASA Technical Reports Server (NTRS)

    Nguyen, D. T.; Watson, Willie R. (Technical Monitor)

    2005-01-01

    The overall objectives of this research work are to formulate and validate efficient parallel algorithms, and to efficiently design/implement computer software for solving large-scale acoustic problems, arised from the unified frameworks of the finite element procedures. The adopted parallel Finite Element (FE) Domain Decomposition (DD) procedures should fully take advantages of multiple processing capabilities offered by most modern high performance computing platforms for efficient parallel computation. To achieve this objective. the formulation needs to integrate efficient sparse (and dense) assembly techniques, hybrid (or mixed) direct and iterative equation solvers, proper pre-conditioned strategies, unrolling strategies, and effective processors' communicating schemes. Finally, the numerical performance of the developed parallel finite element procedures will be evaluated by solving series of structural, and acoustic (symmetrical and un-symmetrical) problems (in different computing platforms). Comparisons with existing "commercialized" and/or "public domain" software are also included, whenever possible.

  14. Jagged Tiling for Intra-tile Parallelism and Fine-Grain Multithreading

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shrestha, Sunil; Manzano Franco, Joseph B.; Marquez, Andres

    In this paper, we have developed a novel methodology that takes into consideration multithreaded many-core designs to better utilize memory/processing resources and improve memory residence on tileable applications. It takes advantage of polyhedral analysis and transformation in the form of PLUTO, combined with a highly optimized finegrain tile runtime to exploit parallelism at all levels. The main contributions of this paper include the introduction of multi-hierarchical tiling techniques that increases intra tile parallelism; and a data-flow inspired runtime library that allows the expression of parallel tiles with an efficient synchronization registry. Our current implementation shows performance improvements on an Intelmore » Xeon Phi board up to 32.25% against instances produced by state-of-the-art compiler frameworks for selected stencil applications.« less

  15. Accelerating EPI distortion correction by utilizing a modern GPU-based parallel computation.

    PubMed

    Yang, Yao-Hao; Huang, Teng-Yi; Wang, Fu-Nien; Chuang, Tzu-Chao; Chen, Nan-Kuei

    2013-04-01

    The combination of phase demodulation and field mapping is a practical method to correct echo planar imaging (EPI) geometric distortion. However, since phase dispersion accumulates in each phase-encoding step, the calculation complexity of phase modulation is Ny-fold higher than conventional image reconstructions. Thus, correcting EPI images via phase demodulation is generally a time-consuming task. Parallel computing by employing general-purpose calculations on graphics processing units (GPU) can accelerate scientific computing if the algorithm is parallelized. This study proposes a method that incorporates the GPU-based technique into phase demodulation calculations to reduce computation time. The proposed parallel algorithm was applied to a PROPELLER-EPI diffusion tensor data set. The GPU-based phase demodulation method reduced the EPI distortion correctly, and accelerated the computation. The total reconstruction time of the 16-slice PROPELLER-EPI diffusion tensor images with matrix size of 128 × 128 was reduced from 1,754 seconds to 101 seconds by utilizing the parallelized 4-GPU program. GPU computing is a promising method to accelerate EPI geometric correction. The resulting reduction in computation time of phase demodulation should accelerate postprocessing for studies performed with EPI, and should effectuate the PROPELLER-EPI technique for clinical practice. Copyright © 2011 by the American Society of Neuroimaging.

  16. Ballistic Deficits for Ionization Chamber Pulses in Pulse Shaping Amplifiers

    NASA Astrophysics Data System (ADS)

    Kumar, G. Anil; Sharma, S. L.; Choudhury, R. K.

    2007-04-01

    In order to understand the dependence of the ballistic deficit on the shape of rising portion of the voltage pulse at the input of a pulse shaping amplifier, we have estimated the ballistic deficits for the pulses from a two-electrode parallel plate ionization chamber as well as for the pulses from a gridded parallel plate ionization chamber. These estimations have been made using numerical integration method when the pulses are processed through the CR-RCn (n=1-6) shaping network as well as when the pulses are processed through the complex shaping network of the ORTEC Model 472 spectroscopic amplifier. Further, we have made simulations to see the effect of ballistic deficit on the pulse-height spectra under different conditions. We have also carried out measurements of the ballistic deficits for the pulses from a two-electrode parallel plate ionization chamber as well as for the pulses from a gridded parallel plate ionization chamber when these pulses are processed through the ORTEC 572 linear amplifier having a simple CR-RC shaping network. The reasonable matching of the simulated ballistic deficits with the experimental ballistic deficits for the CR-RC shaping network clearly establishes the validity of the simulation technique

  17. Speculation and replication in temperature accelerated dynamics

    DOE PAGES

    Zamora, Richard J.; Perez, Danny; Voter, Arthur F.

    2018-02-12

    Accelerated Molecular Dynamics (AMD) is a class of MD-based algorithms for the long-time scale simulation of atomistic systems that are characterized by rare-event transitions. Temperature-Accelerated Dynamics (TAD), a traditional AMD approach, hastens state-to-state transitions by performing MD at an elevated temperature. Recently, Speculatively-Parallel TAD (SpecTAD) was introduced, allowing the TAD procedure to exploit parallel computing systems by concurrently executing in a dynamically generated list of speculative future states. Although speculation can be very powerful, it is not always the most efficient use of parallel resources. In this paper, we compare the performance of speculative parallelism with a replica-based technique, similarmore » to the Parallel Replica Dynamics method. A hybrid SpecTAD approach is also presented, in which each speculation process is further accelerated by a local set of replicas. Finally and overall, this work motivates the use of hybrid parallelism whenever possible, as some combination of speculation and replication is typically most efficient.« less

  18. Speculation and replication in temperature accelerated dynamics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zamora, Richard J.; Perez, Danny; Voter, Arthur F.

    Accelerated Molecular Dynamics (AMD) is a class of MD-based algorithms for the long-time scale simulation of atomistic systems that are characterized by rare-event transitions. Temperature-Accelerated Dynamics (TAD), a traditional AMD approach, hastens state-to-state transitions by performing MD at an elevated temperature. Recently, Speculatively-Parallel TAD (SpecTAD) was introduced, allowing the TAD procedure to exploit parallel computing systems by concurrently executing in a dynamically generated list of speculative future states. Although speculation can be very powerful, it is not always the most efficient use of parallel resources. In this paper, we compare the performance of speculative parallelism with a replica-based technique, similarmore » to the Parallel Replica Dynamics method. A hybrid SpecTAD approach is also presented, in which each speculation process is further accelerated by a local set of replicas. Finally and overall, this work motivates the use of hybrid parallelism whenever possible, as some combination of speculation and replication is typically most efficient.« less

  19. Microscale bioprocess optimisation.

    PubMed

    Micheletti, Martina; Lye, Gary J

    2006-12-01

    Microscale processing techniques offer the potential to speed up the delivery of new drugs to the market, reducing development costs and increasing patient benefit. These techniques have application across both the chemical and biopharmaceutical sectors. The approach involves the study of individual bioprocess operations at the microlitre scale using either microwell or microfluidic formats. In both cases the aim is to generate quantitative bioprocess information early on, so as to inform bioprocess design and speed translation to the manufacturing scale. Automation can enhance experimental throughput and will facilitate the parallel evaluation of competing biocatalyst and process options.

  20. The parallel-sequential field subtraction technique for coherent nonlinear ultrasonic imaging

    NASA Astrophysics Data System (ADS)

    Cheng, Jingwei; Potter, Jack N.; Drinkwater, Bruce W.

    2018-06-01

    Nonlinear imaging techniques have recently emerged which have the potential to detect cracks at a much earlier stage than was previously possible and have sensitivity to partially closed defects. This study explores a coherent imaging technique based on the subtraction of two modes of focusing: parallel, in which the elements are fired together with a delay law and sequential, in which elements are fired independently. In the parallel focusing a high intensity ultrasonic beam is formed in the specimen at the focal point. However, in sequential focusing only low intensity signals from individual elements enter the sample and the full matrix of transmit-receive signals is recorded and post-processed to form an image. Under linear elastic assumptions, both parallel and sequential images are expected to be identical. Here we measure the difference between these images and use this to characterise the nonlinearity of small closed fatigue cracks. In particular we monitor the change in relative phase and amplitude at the fundamental frequencies for each focal point and use this nonlinear coherent imaging metric to form images of the spatial distribution of nonlinearity. The results suggest the subtracted image can suppress linear features (e.g. back wall or large scatters) effectively when instrumentation noise compensation in applied, thereby allowing damage to be detected at an early stage (c. 15% of fatigue life) and reliably quantified in later fatigue life.

  1. A study of the parallel algorithm for large-scale DC simulation of nonlinear systems

    NASA Astrophysics Data System (ADS)

    Cortés Udave, Diego Ernesto; Ogrodzki, Jan; Gutiérrez de Anda, Miguel Angel

    Newton-Raphson DC analysis of large-scale nonlinear circuits may be an extremely time consuming process even if sparse matrix techniques and bypassing of nonlinear models calculation are used. A slight decrease in the time required for this task may be enabled on multi-core, multithread computers if the calculation of the mathematical models for the nonlinear elements as well as the stamp management of the sparse matrix entries are managed through concurrent processes. This numerical complexity can be further reduced via the circuit decomposition and parallel solution of blocks taking as a departure point the BBD matrix structure. This block-parallel approach may give a considerable profit though it is strongly dependent on the system topology and, of course, on the processor type. This contribution presents the easy-parallelizable decomposition-based algorithm for DC simulation and provides a detailed study of its effectiveness.

  2. What Multilevel Parallel Programs do when you are not Watching: A Performance Analysis Case Study Comparing MPI/OpenMP, MLP, and Nested OpenMP

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Labarta, Jesus; Gimenez, Judit

    2004-01-01

    With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation specific behavior of the operating system (OS) or architectural issues. Rewriting-a large scientific application in order to employ a new programming paradigms is usually a time consuming and error prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse grained parallelization and OpenMP [9] for fine grained loop level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems. The second paradigm under consideration is MLP which was developed by Taft. The approach is similar to MPi/OpenMP, using a mix of coarse grain process level parallelization and loop level OpenMP parallelization. As it is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena which is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.

  3. Parallel implementation of the particle simulation method with dynamic load balancing: Toward realistic geodynamical simulation

    NASA Astrophysics Data System (ADS)

    Furuichi, M.; Nishiura, D.

    2015-12-01

    Fully Lagrangian methods such as Smoothed Particle Hydrodynamics (SPH) and Discrete Element Method (DEM) have been widely used to solve the continuum and particles motions in the computational geodynamics field. These mesh-free methods are suitable for the problems with the complex geometry and boundary. In addition, their Lagrangian nature allows non-diffusive advection useful for tracking history dependent properties (e.g. rheology) of the material. These potential advantages over the mesh-based methods offer effective numerical applications to the geophysical flow and tectonic processes, which are for example, tsunami with free surface and floating body, magma intrusion with fracture of rock, and shear zone pattern generation of granular deformation. In order to investigate such geodynamical problems with the particle based methods, over millions to billion particles are required for the realistic simulation. Parallel computing is therefore important for handling such huge computational cost. An efficient parallel implementation of SPH and DEM methods is however known to be difficult especially for the distributed-memory architecture. Lagrangian methods inherently show workload imbalance problem for parallelization with the fixed domain in space, because particles move around and workloads change during the simulation. Therefore dynamic load balance is key technique to perform the large scale SPH and DEM simulation. In this work, we present the parallel implementation technique of SPH and DEM method utilizing dynamic load balancing algorithms toward the high resolution simulation over large domain using the massively parallel super computer system. Our method utilizes the imbalances of the executed time of each MPI process as the nonlinear term of parallel domain decomposition and minimizes them with the Newton like iteration method. In order to perform flexible domain decomposition in space, the slice-grid algorithm is used. Numerical tests show that our approach is suitable for solving the particles with different calculation costs (e.g. boundary particles) as well as the heterogeneous computer architecture. We analyze the parallel efficiency and scalability on the super computer systems (K-computer, Earth simulator 3, etc.).

  4. Accelerated numerical processing of electronically recorded holograms with reduced speckle noise.

    PubMed

    Trujillo, Carlos; Garcia-Sucerquia, Jorge

    2013-09-01

    The numerical reconstruction of digitally recorded holograms suffers from speckle noise. An accelerated method that uses general-purpose computing in graphics processing units to reduce that noise is shown. The proposed methodology utilizes parallelized algorithms to record, reconstruct, and superimpose multiple uncorrelated holograms of a static scene. For the best tradeoff between reduction of the speckle noise and processing time, the method records, reconstructs, and superimposes six holograms of 1024 × 1024 pixels in 68 ms; for this case, the methodology reduces the speckle noise by 58% compared with that exhibited by a single hologram. The fully parallelized method running on a commodity graphics processing unit is one order of magnitude faster than the same technique implemented on a regular CPU using its multithreading capabilities. Experimental results are shown to validate the proposal.

  5. SPM oxidation and parallel writing on zirconium nitride thin films

    NASA Astrophysics Data System (ADS)

    Farkas, N.; Comer, J. R.; Zhang, G.; Evans, E. A.; Ramsier, R. D.; Dagata, J. A.

    2005-07-01

    Systematic investigation of the SPM oxidation process of sputter-deposited ZrN thin films is reported. During the intrinsic part of the oxidation, the density of the oxide increases until the total oxide thickness is approximately twice the feature height. Further oxide growth is sustainable as the system undergoes plastic flow followed by delamination from the ZrN-silicon interface keeping the oxide density constant. ZrN exhibits superdiffusive oxidation kinetics in these single tip SPM studies. We extend this work to the fabrication of parallel oxide patterns 70 nm in height covering areas in the square centimeter range. This simple, quick, and well-controlled parallel nanolithographic technique has great potential for biomedical template fabrication.

  6. Nonlinear relaxation algorithms for circuit simulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Saleh, R.A.

    Circuit simulation is an important Computer-Aided Design (CAD) tool in the design of Integrated Circuits (IC). However, the standard techniques used in programs such as SPICE result in very long computer-run times when applied to large problems. In order to reduce the overall run time, a number of new approaches to circuit simulation were developed and are described. These methods are based on nonlinear relaxation techniques and exploit the relative inactivity of large circuits. Simple waveform-processing techniques are described to determine the maximum possible speed improvement that can be obtained by exploiting this property of large circuits. Three simulation algorithmsmore » are described, two of which are based on the Iterated Timing Analysis (ITA) method and a third based on the Waveform-Relaxation Newton (WRN) method. New programs that incorporate these techniques were developed and used to simulate a variety of industrial circuits. The results from these simulations are provided. The techniques are shown to be much faster than the standard approach. In addition, a number of parallel aspects of these algorithms are described, and a general space-time model of parallel-task scheduling is developed.« less

  7. A General-purpose Framework for Parallel Processing of Large-scale LiDAR Data

    NASA Astrophysics Data System (ADS)

    Li, Z.; Hodgson, M.; Li, W.

    2016-12-01

    Light detection and ranging (LiDAR) technologies have proven efficiency to quickly obtain very detailed Earth surface data for a large spatial extent. Such data is important for scientific discoveries such as Earth and ecological sciences and natural disasters and environmental applications. However, handling LiDAR data poses grand geoprocessing challenges due to data intensity and computational intensity. Previous studies received notable success on parallel processing of LiDAR data to these challenges. However, these studies either relied on high performance computers and specialized hardware (GPUs) or focused mostly on finding customized solutions for some specific algorithms. We developed a general-purpose scalable framework coupled with sophisticated data decomposition and parallelization strategy to efficiently handle big LiDAR data. Specifically, 1) a tile-based spatial index is proposed to manage big LiDAR data in the scalable and fault-tolerable Hadoop distributed file system, 2) two spatial decomposition techniques are developed to enable efficient parallelization of different types of LiDAR processing tasks, and 3) by coupling existing LiDAR processing tools with Hadoop, this framework is able to conduct a variety of LiDAR data processing tasks in parallel in a highly scalable distributed computing environment. The performance and scalability of the framework is evaluated with a series of experiments conducted on a real LiDAR dataset using a proof-of-concept prototype system. The results show that the proposed framework 1) is able to handle massive LiDAR data more efficiently than standalone tools; and 2) provides almost linear scalability in terms of either increased workload (data volume) or increased computing nodes with both spatial decomposition strategies. We believe that the proposed framework provides valuable references on developing a collaborative cyberinfrastructure for processing big earth science data in a highly scalable environment.

  8. An Intrinsic Algorithm for Parallel Poisson Disk Sampling on Arbitrary Surfaces.

    PubMed

    Ying, Xiang; Xin, Shi-Qing; Sun, Qian; He, Ying

    2013-03-08

    Poisson disk sampling plays an important role in a variety of visual computing, due to its useful statistical property in distribution and the absence of aliasing artifacts. While many effective techniques have been proposed to generate Poisson disk distribution in Euclidean space, relatively few work has been reported to the surface counterpart. This paper presents an intrinsic algorithm for parallel Poisson disk sampling on arbitrary surfaces. We propose a new technique for parallelizing the dart throwing. Rather than the conventional approaches that explicitly partition the spatial domain to generate the samples in parallel, our approach assigns each sample candidate a random and unique priority that is unbiased with regard to the distribution. Hence, multiple threads can process the candidates simultaneously and resolve conflicts by checking the given priority values. It is worth noting that our algorithm is accurate as the generated Poisson disks are uniformly and randomly distributed without bias. Our method is intrinsic in that all the computations are based on the intrinsic metric and are independent of the embedding space. This intrinsic feature allows us to generate Poisson disk distributions on arbitrary surfaces. Furthermore, by manipulating the spatially varying density function, we can obtain adaptive sampling easily.

  9. High-performance computing — an overview

    NASA Astrophysics Data System (ADS)

    Marksteiner, Peter

    1996-08-01

    An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.

  10. New Parallel Algorithms for Landscape Evolution Model

    NASA Astrophysics Data System (ADS)

    Jin, Y.; Zhang, H.; Shi, Y.

    2017-12-01

    Most landscape evolution models (LEM) developed in the last two decades solve the diffusion equation to simulate the transportation of surface sediments. This numerical approach is difficult to parallelize due to the computation of drainage area for each node, which needs huge amount of communication if run in parallel. In order to overcome this difficulty, we developed two parallel algorithms for LEM with a stream net. One algorithm handles the partition of grid with traditional methods and applies an efficient global reduction algorithm to do the computation of drainage areas and transport rates for the stream net; the other algorithm is based on a new partition algorithm, which partitions the nodes in catchments between processes first, and then partitions the cells according to the partition of nodes. Both methods focus on decreasing communication between processes and take the advantage of massive computing techniques, and numerical experiments show that they are both adequate to handle large scale problems with millions of cells. We implemented the two algorithms in our program based on the widely used finite element library deal.II, so that it can be easily coupled with ASPECT.

  11. Algorithms and programming tools for image processing on the MPP, part 2

    NASA Technical Reports Server (NTRS)

    Reeves, Anthony P.

    1986-01-01

    A number of algorithms were developed for image warping and pyramid image filtering. Techniques were investigated for the parallel processing of a large number of independent irregular shaped regions on the MPP. In addition some utilities for dealing with very long vectors and for sorting were developed. Documentation pages for the algorithms which are available for distribution are given. The performance of the MPP for a number of basic data manipulations was determined. From these results it is possible to predict the efficiency of the MPP for a number of algorithms and applications. The Parallel Pascal development system, which is a portable programming environment for the MPP, was improved and better documentation including a tutorial was written. This environment allows programs for the MPP to be developed on any conventional computer system; it consists of a set of system programs and a library of general purpose Parallel Pascal functions. The algorithms were tested on the MPP and a presentation on the development system was made to the MPP users group. The UNIX version of the Parallel Pascal System was distributed to a number of new sites.

  12. Creation of the BMA ensemble for SST using a parallel processing technique

    NASA Astrophysics Data System (ADS)

    Kim, Kwangjin; Lee, Yang Won

    2013-10-01

    Despite the same purpose, each satellite product has different value because of its inescapable uncertainty. Also the satellite products have been calculated for a long time, and the kinds of the products are various and enormous. So the efforts for reducing the uncertainty and dealing with enormous data will be necessary. In this paper, we create an ensemble Sea Surface Temperature (SST) using MODIS Aqua, MODIS Terra and COMS (Communication Ocean and Meteorological Satellite). We used Bayesian Model Averaging (BMA) as ensemble method. The principle of the BMA is synthesizing the conditional probability density function (PDF) using posterior probability as weight. The posterior probability is estimated using EM algorithm. The BMA PDF is obtained by weighted average. As the result, the ensemble SST showed the lowest RMSE and MAE, which proves the applicability of BMA for satellite data ensemble. As future work, parallel processing techniques using Hadoop framework will be adopted for more efficient computation of very big satellite data.

  13. Accelerating electrostatic surface potential calculation with multi-scale approximation on graphics processing units.

    PubMed

    Anandakrishnan, Ramu; Scogland, Tom R W; Fenley, Andrew T; Gordon, John C; Feng, Wu-chun; Onufriev, Alexey V

    2010-06-01

    Tools that compute and visualize biomolecular electrostatic surface potential have been used extensively for studying biomolecular function. However, determining the surface potential for large biomolecules on a typical desktop computer can take days or longer using currently available tools and methods. Two commonly used techniques to speed-up these types of electrostatic computations are approximations based on multi-scale coarse-graining and parallelization across multiple processors. This paper demonstrates that for the computation of electrostatic surface potential, these two techniques can be combined to deliver significantly greater speed-up than either one separately, something that is in general not always possible. Specifically, the electrostatic potential computation, using an analytical linearized Poisson-Boltzmann (ALPB) method, is approximated using the hierarchical charge partitioning (HCP) multi-scale method, and parallelized on an ATI Radeon 4870 graphical processing unit (GPU). The implementation delivers a combined 934-fold speed-up for a 476,040 atom viral capsid, compared to an equivalent non-parallel implementation on an Intel E6550 CPU without the approximation. This speed-up is significantly greater than the 42-fold speed-up for the HCP approximation alone or the 182-fold speed-up for the GPU alone. Copyright (c) 2010 Elsevier Inc. All rights reserved.

  14. Multiple grid problems on concurrent-processing computers

    NASA Technical Reports Server (NTRS)

    Eberhardt, D. S.; Baganoff, D.

    1986-01-01

    Three computer codes were studied which make use of concurrent processing computer architectures in computational fluid dynamics (CFD). The three parallel codes were tested on a two processor multiple-instruction/multiple-data (MIMD) facility at NASA Ames Research Center, and are suggested for efficient parallel computations. The first code is a well-known program which makes use of the Beam and Warming, implicit, approximate factored algorithm. This study demonstrates the parallelism found in a well-known scheme and it achieved speedups exceeding 1.9 on the two processor MIMD test facility. The second code studied made use of an embedded grid scheme which is used to solve problems having complex geometries. The particular application for this study considered an airfoil/flap geometry in an incompressible flow. The scheme eliminates some of the inherent difficulties found in adapting approximate factorization techniques onto MIMD machines and allows the use of chaotic relaxation and asynchronous iteration techniques. The third code studied is an application of overset grids to a supersonic blunt body problem. The code addresses the difficulties encountered when using embedded grids on a compressible, and therefore nonlinear, problem. The complex numerical boundary system associated with overset grids is discussed and several boundary schemes are suggested. A boundary scheme based on the method of characteristics achieved the best results.

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Turner, A.; Davis, A.; University of Wisconsin-Madison, Madison, WI 53706

    CCFE perform Monte-Carlo transport simulations on large and complex tokamak models such as ITER. Such simulations are challenging since streaming and deep penetration effects are equally important. In order to make such simulations tractable, both variance reduction (VR) techniques and parallel computing are used. It has been found that the application of VR techniques in such models significantly reduces the efficiency of parallel computation due to 'long histories'. VR in MCNP can be accomplished using energy-dependent weight windows. The weight window represents an 'average behaviour' of particles, and large deviations in the arriving weight of a particle give rise tomore » extreme amounts of splitting being performed and a long history. When running on parallel clusters, a long history can have a detrimental effect on the parallel efficiency - if one process is computing the long history, the other CPUs complete their batch of histories and wait idle. Furthermore some long histories have been found to be effectively intractable. To combat this effect, CCFE has developed an adaptation of MCNP which dynamically adjusts the WW where a large weight deviation is encountered. The method effectively 'de-optimises' the WW, reducing the VR performance but this is offset by a significant increase in parallel efficiency. Testing with a simple geometry has shown the method does not bias the result. This 'long history method' has enabled CCFE to significantly improve the performance of MCNP calculations for ITER on parallel clusters, and will be beneficial for any geometry combining streaming and deep penetration effects. (authors)« less

  16. Wave scheduling - Decentralized scheduling of task forces in multicomputers

    NASA Technical Reports Server (NTRS)

    Van Tilborg, A. M.; Wittie, L. D.

    1984-01-01

    Decentralized operating systems that control large multicomputers need techniques to schedule competing parallel programs called task forces. Wave scheduling is a probabilistic technique that uses a hierarchical distributed virtual machine to schedule task forces by recursively subdividing and issuing wavefront-like commands to processing elements capable of executing individual tasks. Wave scheduling is highly resistant to processing element failures because it uses many distributed schedulers that dynamically assign scheduling responsibilities among themselves. The scheduling technique is trivially extensible as more processing elements join the host multicomputer. A simple model of scheduling cost is used by every scheduler node to distribute scheduling activity and minimize wasted processing capacity by using perceived workload to vary decentralized scheduling rules. At low to moderate levels of network activity, wave scheduling is only slightly less efficient than a central scheduler in its ability to direct processing elements to accomplish useful work.

  17. Parallelizing serial code for a distributed processing environment with an application to high frequency electromagnetic scattering

    NASA Astrophysics Data System (ADS)

    Work, Paul R.

    1991-12-01

    This thesis investigates the parallelization of existing serial programs in computational electromagnetics for use in a parallel environment. Existing algorithms for calculating the radar cross section of an object are covered, and a ray-tracing code is chosen for implementation on a parallel machine. Current parallel architectures are introduced and a suitable parallel machine is selected for the implementation of the chosen ray-tracing algorithm. The standard techniques for the parallelization of serial codes are discussed, including load balancing and decomposition considerations, and appropriate methods for the parallelization effort are selected. A load balancing algorithm is modified to increase the efficiency of the application, and a high level design of the structure of the serial program is presented. A detailed design of the modifications for the parallel implementation is also included, with both the high level and the detailed design specified in a high level design language called UNITY. The correctness of the design is proven using UNITY and standard logic operations. The theoretical and empirical results show that it is possible to achieve an efficient parallel application for a serial computational electromagnetic program where the characteristics of the algorithm and the target architecture critically influence the development of such an implementation.

  18. Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Caubet, Jordi; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    In this paper we describe how to apply powerful performance analysis techniques to understand the behavior of multilevel parallel applications. We use the Paraver/OMPItrace performance analysis system for our study. This system consists of two major components: The OMPItrace dynamic instrumentation mechanism, which allows the tracing of processes and threads and the Paraver graphical user interface for inspection and analyses of the generated traces. We describe how to use the system to conduct a detailed comparative study of a benchmark code implemented in five different programming paradigms applicable for shared memory

  19. On-line range images registration with GPGPU

    NASA Astrophysics Data System (ADS)

    Będkowski, J.; Naruniec, J.

    2013-03-01

    This paper concerns implementation of algorithms in the two important aspects of modern 3D data processing: data registration and segmentation. Solution proposed for the first topic is based on the 3D space decomposition, while the latter on image processing and local neighbourhood search. Data processing is implemented by using NVIDIA compute unified device architecture (NIVIDIA CUDA) parallel computation. The result of the segmentation is a coloured map where different colours correspond to different objects, such as walls, floor and stairs. The research is related to the problem of collecting 3D data with a RGB-D camera mounted on a rotated head, to be used in mobile robot applications. Performance of the data registration algorithm is aimed for on-line processing. The iterative closest point (ICP) approach is chosen as a registration method. Computations are based on the parallel fast nearest neighbour search. This procedure decomposes 3D space into cubic buckets and, therefore, the time of the matching is deterministic. First technique of the data segmentation uses accele-rometers integrated with a RGB-D sensor to obtain rotation compensation and image processing method for defining pre-requisites of the known categories. The second technique uses the adapted nearest neighbour search procedure for obtaining normal vectors for each range point.

  20. Planning and Resource Management in an Intelligent Automated Power Management System

    NASA Technical Reports Server (NTRS)

    Morris, Robert A.

    1991-01-01

    Power system management is a process of guiding a power system towards the objective of continuous supply of electrical power to a set of loads. Spacecraft power system management requires planning and scheduling, since electrical power is a scarce resource in space. The automation of power system management for future spacecraft has been recognized as an important R&D goal. Several automation technologies have emerged including the use of expert systems for automating human problem solving capabilities such as rule based expert system for fault diagnosis and load scheduling. It is questionable whether current generation expert system technology is applicable for power system management in space. The objective of the ADEPTS (ADvanced Electrical Power management Techniques for Space systems) is to study new techniques for power management automation. These techniques involve integrating current expert system technology with that of parallel and distributed computing, as well as a distributed, object-oriented approach to software design. The focus of the current study is the integration of new procedures for automatically planning and scheduling loads with procedures for performing fault diagnosis and control. The objective is the concurrent execution of both sets of tasks on separate transputer processors, thus adding parallelism to the overall management process.

  1. Simple and robust generation of ultrafast laser pulse trains using polarization-independent parallel-aligned thin films

    NASA Astrophysics Data System (ADS)

    Wang, Andong; Jiang, Lan; Li, Xiaowei; Wang, Zhi; Du, Kun; Lu, Yongfeng

    2018-05-01

    Ultrafast laser pulse temporal shaping has been widely applied in various important applications such as laser materials processing, coherent control of chemical reactions, and ultrafast imaging. However, temporal pulse shaping has been limited to only-in-lab technique due to the high cost, low damage threshold, and polarization dependence. Herein we propose a novel design of ultrafast laser pulse train generation device, which consists of multiple polarization-independent parallel-aligned thin films. Various pulse trains with controllable temporal profile can be generated flexibly by multi-reflections within the splitting films. Compared with other pulse train generation techniques, this method has advantages of compact structure, low cost, high damage threshold and polarization independence. These advantages endow it with high potential for broad utilization in ultrafast applications.

  2. Parallel image logical operations using cross correlation

    NASA Technical Reports Server (NTRS)

    Strong, J. P., III

    1972-01-01

    Methods are presented for counting areas in an image in a parallel manner using noncoherent optical techniques. The techniques presented include the Levialdi algorithm for counting, optical techniques for binary operations, and cross-correlation.

  3. Machining heavy plastic sections

    NASA Technical Reports Server (NTRS)

    Stalkup, O. M.

    1967-01-01

    Machining technique produces consistently satisfactory plane-parallel optical surfaces for pressure windows, made of plexiglass, required to support a photographic study of liquid rocket combustion processes. The surfaces are machined and polished to the required tolerances and show no degradation from stress relaxation over periods as long as 6 months.

  4. Trace: a high-throughput tomographic reconstruction engine for large-scale datasets.

    PubMed

    Bicer, Tekin; Gürsoy, Doğa; Andrade, Vincent De; Kettimuthu, Rajkumar; Scullin, William; Carlo, Francesco De; Foster, Ian T

    2017-01-01

    Modern synchrotron light sources and detectors produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used imaging techniques that generates data at tens of gigabytes per second is computed tomography (CT). Although CT experiments result in rapid data generation, the analysis and reconstruction of the collected data may require hours or even days of computation time with a medium-sized workstation, which hinders the scientific progress that relies on the results of analysis. We present Trace, a data-intensive computing engine that we have developed to enable high-performance implementation of iterative tomographic reconstruction algorithms for parallel computers. Trace provides fine-grained reconstruction of tomography datasets using both (thread-level) shared memory and (process-level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations that we apply to the replicated reconstruction objects and evaluate them using tomography datasets collected at the Advanced Photon Source. Our experimental evaluations show that our optimizations and parallelization techniques can provide 158× speedup using 32 compute nodes (384 cores) over a single-core configuration and decrease the end-to-end processing time of a large sinogram (with 4501 × 1 × 22,400 dimensions) from 12.5 h to <5 min per iteration. The proposed tomographic reconstruction engine can efficiently process large-scale tomographic data using many compute nodes and minimize reconstruction times.

  5. Ordering Traces Logically to Identify Lateness in Message Passing Programs

    DOE PAGES

    Isaacs, Katherine E.; Gamblin, Todd; Bhatele, Abhinav; ...

    2015-03-30

    Event traces are valuable for understanding the behavior of parallel programs. However, automatically analyzing a large parallel trace is difficult, especially without a specific objective. We aid this endeavor by extracting a trace's logical structure, an ordering of trace events derived from happened-before relationships, while taking into account developer intent. Using this structure, we can calculate an operation's delay relative to its peers on other processes. The logical structure also serves as a platform for comparing and clustering processes as well as highlighting communication patterns in a trace visualization. We present an algorithm for determining this idealized logical structure frommore » traces of message passing programs, and we develop metrics to quantify delays and differences among processes. We implement our techniques in Ravel, a parallel trace visualization tool that displays both logical and physical timelines. Rather than showing the duration of each operation, we display where delays begin and end, and how they propagate. As a result, we apply our approach to the traces of several message passing applications, demonstrating the accuracy of our extracted structure and its utility in analyzing these codes.« less

  6. A Stochastic Spiking Neural Network for Virtual Screening.

    PubMed

    Morro, A; Canals, V; Oliver, A; Alomar, M L; Galan-Prado, F; Ballester, P J; Rossello, J L

    2018-04-01

    Virtual screening (VS) has become a key computational tool in early drug design and screening performance is of high relevance due to the large volume of data that must be processed to identify molecules with the sought activity-related pattern. At the same time, the hardware implementations of spiking neural networks (SNNs) arise as an emerging computing technique that can be applied to parallelize processes that normally present a high cost in terms of computing time and power. Consequently, SNN represents an attractive alternative to perform time-consuming processing tasks, such as VS. In this brief, we present a smart stochastic spiking neural architecture that implements the ultrafast shape recognition (USR) algorithm achieving two order of magnitude of speed improvement with respect to USR software implementations. The neural system is implemented in hardware using field-programmable gate arrays allowing a highly parallelized USR implementation. The results show that, due to the high parallelization of the system, millions of compounds can be checked in reasonable times. From these results, we can state that the proposed architecture arises as a feasible methodology to efficiently enhance time-consuming data-mining processes such as 3-D molecular similarity search.

  7. SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation

    NASA Technical Reports Server (NTRS)

    Steinman, Jeff S.

    1992-01-01

    Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.

  8. Automated design of spacecraft systems power subsystems

    NASA Technical Reports Server (NTRS)

    Terrile, Richard J.; Kordon, Mark; Mandutianu, Dan; Salcedo, Jose; Wood, Eric; Hashemi, Mona

    2006-01-01

    This paper discusses the application of evolutionary computing to a dynamic space vehicle power subsystem resource and performance simulation in a parallel processing environment. Our objective is to demonstrate the feasibility, application and advantage of using evolutionary computation techniques for the early design search and optimization of space systems.

  9. Co-creating Emotionally Safe Environments at Camp: Training Staff To Facilitate Adventure Activities.

    ERIC Educational Resources Information Center

    Brownlee, Matt; Yerkes, Rita

    2003-01-01

    An emotionally safe environment helps campers participate in adventure activities. Staff development tips for creating a safe environment include using cooperative goal setting; using parallel training processes; developing working lesson plans that outline facilitation techniques for creating emotionally safe environments; and using co-created…

  10. Comparison of Acceleration Techniques for Selected Low-Level Bioinformatics Operations

    PubMed Central

    Langenkämper, Daniel; Jakobi, Tobias; Feld, Dustin; Jelonek, Lukas; Goesmann, Alexander; Nattkemper, Tim W.

    2016-01-01

    Within the recent years clock rates of modern processors stagnated while the demand for computing power continued to grow. This applied particularly for the fields of life sciences and bioinformatics, where new technologies keep on creating rapidly growing piles of raw data with increasing speed. The number of cores per processor increased in an attempt to compensate for slight increments of clock rates. This technological shift demands changes in software development, especially in the field of high performance computing where parallelization techniques are gaining in importance due to the pressing issue of large sized datasets generated by e.g., modern genomics. This paper presents an overview of state-of-the-art manual and automatic acceleration techniques and lists some applications employing these in different areas of sequence informatics. Furthermore, we provide examples for automatic acceleration of two use cases to show typical problems and gains of transforming a serial application to a parallel one. The paper should aid the reader in deciding for a certain techniques for the problem at hand. We compare four different state-of-the-art automatic acceleration approaches (OpenMP, PluTo-SICA, PPCG, and OpenACC). Their performance as well as their applicability for selected use cases is discussed. While optimizations targeting the CPU worked better in the complex k-mer use case, optimizers for Graphics Processing Units (GPUs) performed better in the matrix multiplication example. But performance is only superior at a certain problem size due to data migration overhead. We show that automatic code parallelization is feasible with current compiler software and yields significant increases in execution speed. Automatic optimizers for CPU are mature and usually no additional manual adjustment is required. In contrast, some automatic parallelizers targeting GPUs still lack maturity and are limited to simple statements and structures. PMID:26904094

  11. Comparison of Acceleration Techniques for Selected Low-Level Bioinformatics Operations.

    PubMed

    Langenkämper, Daniel; Jakobi, Tobias; Feld, Dustin; Jelonek, Lukas; Goesmann, Alexander; Nattkemper, Tim W

    2016-01-01

    Within the recent years clock rates of modern processors stagnated while the demand for computing power continued to grow. This applied particularly for the fields of life sciences and bioinformatics, where new technologies keep on creating rapidly growing piles of raw data with increasing speed. The number of cores per processor increased in an attempt to compensate for slight increments of clock rates. This technological shift demands changes in software development, especially in the field of high performance computing where parallelization techniques are gaining in importance due to the pressing issue of large sized datasets generated by e.g., modern genomics. This paper presents an overview of state-of-the-art manual and automatic acceleration techniques and lists some applications employing these in different areas of sequence informatics. Furthermore, we provide examples for automatic acceleration of two use cases to show typical problems and gains of transforming a serial application to a parallel one. The paper should aid the reader in deciding for a certain techniques for the problem at hand. We compare four different state-of-the-art automatic acceleration approaches (OpenMP, PluTo-SICA, PPCG, and OpenACC). Their performance as well as their applicability for selected use cases is discussed. While optimizations targeting the CPU worked better in the complex k-mer use case, optimizers for Graphics Processing Units (GPUs) performed better in the matrix multiplication example. But performance is only superior at a certain problem size due to data migration overhead. We show that automatic code parallelization is feasible with current compiler software and yields significant increases in execution speed. Automatic optimizers for CPU are mature and usually no additional manual adjustment is required. In contrast, some automatic parallelizers targeting GPUs still lack maturity and are limited to simple statements and structures.

  12. Novel wavelength diversity technique for high-speed atmospheric turbulence compensation

    NASA Astrophysics Data System (ADS)

    Arrasmith, William W.; Sullivan, Sean F.

    2010-04-01

    The defense, intelligence, and homeland security communities are driving a need for software dominant, real-time or near-real time atmospheric turbulence compensated imagery. The development of parallel processing capabilities are finding application in diverse areas including image processing, target tracking, pattern recognition, and image fusion to name a few. A novel approach to the computationally intensive case of software dominant optical and near infrared imaging through atmospheric turbulence is addressed in this paper. Previously, the somewhat conventional wavelength diversity method has been used to compensate for atmospheric turbulence with great success. We apply a new correlation based approach to the wavelength diversity methodology using a parallel processing architecture enabling high speed atmospheric turbulence compensation. Methods for optical imaging through distributed turbulence are discussed, simulation results are presented, and computational and performance assessments are provided.

  13. Massively parallel multicanonical simulations

    NASA Astrophysics Data System (ADS)

    Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard

    2018-03-01

    Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 104 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.

  14. Discrete sensitivity derivatives of the Navier-Stokes equations with a parallel Krylov solver

    NASA Technical Reports Server (NTRS)

    Ajmani, Kumud; Taylor, Arthur C., III

    1994-01-01

    This paper solves an 'incremental' form of the sensitivity equations derived by differentiating the discretized thin-layer Navier Stokes equations with respect to certain design variables of interest. The equations are solved with a parallel, preconditioned Generalized Minimal RESidual (GMRES) solver on a distributed-memory architecture. The 'serial' sensitivity analysis code is parallelized by using the Single Program Multiple Data (SPMD) programming model, domain decomposition techniques, and message-passing tools. Sensitivity derivatives are computed for low and high Reynolds number flows over a NACA 1406 airfoil on a 32-processor Intel Hypercube, and found to be identical to those computed on a single-processor Cray Y-MP. It is estimated that the parallel sensitivity analysis code has to be run on 40-50 processors of the Intel Hypercube in order to match the single-processor processing time of a Cray Y-MP.

  15. Recent Advances in Techniques for Hyperspectral Image Processing

    NASA Technical Reports Server (NTRS)

    Plaza, Antonio; Benediktsson, Jon Atli; Boardman, Joseph W.; Brazile, Jason; Bruzzone, Lorenzo; Camps-Valls, Gustavo; Chanussot, Jocelyn; Fauvel, Mathieu; Gamba, Paolo; Gualtieri, Anthony; hide

    2009-01-01

    Imaging spectroscopy, also known as hyperspectral imaging, has been transformed in less than 30 years from being a sparse research tool into a commodity product available to a broad user community. Currently, there is a need for standardized data processing techniques able to take into account the special properties of hyperspectral data. In this paper, we provide a seminal view on recent advances in techniques for hyperspectral image processing. Our main focus is on the design of techniques able to deal with the highdimensional nature of the data, and to integrate the spatial and spectral information. Performance of the discussed techniques is evaluated in different analysis scenarios. To satisfy time-critical constraints in specific applications, we also develop efficient parallel implementations of some of the discussed algorithms. Combined, these parts provide an excellent snapshot of the state-of-the-art in those areas, and offer a thoughtful perspective on future potentials and emerging challenges in the design of robust hyperspectral imaging algorithms

  16. High data volume and transfer rate techniques used at NASA's image processing facility

    NASA Technical Reports Server (NTRS)

    Heffner, P.; Connell, E.; Mccaleb, F.

    1978-01-01

    Data storage and transfer operations at a new image processing facility are described. The equipment includes high density digital magnetic tape drives and specially designed controllers to provide an interface between the tape drives and computerized image processing systems. The controller performs the functions necessary to convert the continuous serial data stream from the tape drive to a word-parallel blocked data stream which then goes to the computer-based system. With regard to the tape packing density, 1.8 times 10 to the tenth data bits are stored on a reel of one-inch tape. System components and their operation are surveyed, and studies on advanced storage techniques are summarized.

  17. Parallelized modelling and solution scheme for hierarchically scaled simulations

    NASA Technical Reports Server (NTRS)

    Padovan, Joe

    1995-01-01

    This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The Tree morphology is chosen to minimize memory, communications and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to finding large reductions in memory, communications, and computational effort associated with a parallel computing environment, substantial reductions are generated in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers are capable of solving, are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that by combining several of these techniques that a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.

  18. Pyramidal neurovision architecture for vision machines

    NASA Astrophysics Data System (ADS)

    Gupta, Madan M.; Knopf, George K.

    1993-08-01

    The vision system employed by an intelligent robot must be active; active in the sense that it must be capable of selectively acquiring the minimal amount of relevant information for a given task. An efficient active vision system architecture that is based loosely upon the parallel-hierarchical (pyramidal) structure of the biological visual pathway is presented in this paper. Although the computational architecture of the proposed pyramidal neuro-vision system is far less sophisticated than the architecture of the biological visual pathway, it does retain some essential features such as the converging multilayered structure of its biological counterpart. In terms of visual information processing, the neuro-vision system is constructed from a hierarchy of several interactive computational levels, whereupon each level contains one or more nonlinear parallel processors. Computationally efficient vision machines can be developed by utilizing both the parallel and serial information processing techniques within the pyramidal computing architecture. A computer simulation of a pyramidal vision system for active scene surveillance is presented.

  19. Distributed Parallel Processing and Dynamic Load Balancing Techniques for Multidisciplinary High Speed Aircraft Design

    NASA Technical Reports Server (NTRS)

    Krasteva, Denitza T.

    1998-01-01

    Multidisciplinary design optimization (MDO) for large-scale engineering problems poses many challenges (e.g., the design of an efficient concurrent paradigm for global optimization based on disciplinary analyses, expensive computations over vast data sets, etc.) This work focuses on the application of distributed schemes for massively parallel architectures to MDO problems, as a tool for reducing computation time and solving larger problems. The specific problem considered here is configuration optimization of a high speed civil transport (HSCT), and the efficient parallelization of the embedded paradigm for reasonable design space identification. Two distributed dynamic load balancing techniques (random polling and global round robin with message combining) and two necessary termination detection schemes (global task count and token passing) were implemented and evaluated in terms of effectiveness and scalability to large problem sizes and a thousand processors. The effect of certain parameters on execution time was also inspected. Empirical results demonstrated stable performance and effectiveness for all schemes, and the parametric study showed that the selected algorithmic parameters have a negligible effect on performance.

  20. Vectorial finite elements for solving the radiative transfer equation

    NASA Astrophysics Data System (ADS)

    Badri, M. A.; Jolivet, P.; Rousseau, B.; Le Corre, S.; Digonnet, H.; Favennec, Y.

    2018-06-01

    The discrete ordinate method coupled with the finite element method is often used for the spatio-angular discretization of the radiative transfer equation. In this paper we attempt to improve upon such a discretization technique. Instead of using standard finite elements, we reformulate the radiative transfer equation using vectorial finite elements. In comparison to standard finite elements, this reformulation yields faster timings for the linear system assemblies, as well as for the solution phase when using scattering media. The proposed vectorial finite element discretization for solving the radiative transfer equation is cross-validated against a benchmark problem available in literature. In addition, we have used the method of manufactured solutions to verify the order of accuracy for our discretization technique within different absorbing, scattering, and emitting media. For solving large problems of radiation on parallel computers, the vectorial finite element method is parallelized using domain decomposition. The proposed domain decomposition method scales on large number of processes, and its performance is unaffected by the changes in optical thickness of the medium. Our parallel solver is used to solve a large scale radiative transfer problem of the Kelvin-cell radiation.

  1. Contingency Analysis Post-Processing With Advanced Computing and Visualization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Yousu; Glaesemann, Kurt; Fitzhenry, Erin

    Contingency analysis is a critical function widely used in energy management systems to assess the impact of power system component failures. Its outputs are important for power system operation for improved situational awareness, power system planning studies, and power market operations. With the increased complexity of power system modeling and simulation caused by increased energy production and demand, the penetration of renewable energy and fast deployment of smart grid devices, and the trend of operating grids closer to their capacity for better efficiency, more and more contingencies must be executed and analyzed quickly in order to ensure grid reliability andmore » accuracy for the power market. Currently, many researchers have proposed different techniques to accelerate the computational speed of contingency analysis, but not much work has been published on how to post-process the large amount of contingency outputs quickly. This paper proposes a parallel post-processing function that can analyze contingency analysis outputs faster and display them in a web-based visualization tool to help power engineers improve their work efficiency by fast information digestion. Case studies using an ESCA-60 bus system and a WECC planning system are presented to demonstrate the functionality of the parallel post-processing technique and the web-based visualization tool.« less

  2. Smart cloud system with image processing server in diagnosing brain diseases dedicated for hospitals with limited resources.

    PubMed

    Fahmi, Fahmi; Nasution, Tigor H; Anggreiny, Anggreiny

    2017-01-01

    The use of medical imaging in diagnosing brain disease is growing. The challenges are related to the big size of data and complexity of the image processing. High standard of hardware and software are demanded, which can only be provided in big hospitals. Our purpose was to provide a smart cloud system to help diagnosing brain diseases for hospital with limited infrastructure. The expertise of neurologists was first implanted in cloud server to conduct an automatic diagnosis in real time using image processing technique developed based on ITK library and web service. Users upload images through website and the result, in this case the size of tumor was sent back immediately. A specific image compression technique was developed for this purpose. The smart cloud system was able to measure the area and location of tumors, with average size of 19.91 ± 2.38 cm2 and an average response time 7.0 ± 0.3 s. The capability of the server decreased when multiple clients accessed the system simultaneously: 14 ± 0 s (5 parallel clients) and 27 ± 0.2 s (10 parallel clients). The cloud system was successfully developed to process and analyze medical images for diagnosing brain diseases in this case for tumor.

  3. Parallel Processing Systems for Passive Ranging During Helicopter Flight

    NASA Technical Reports Server (NTRS)

    Sridhar, Bavavar; Suorsa, Raymond E.; Showman, Robert D. (Technical Monitor)

    1994-01-01

    The complexity of rotorcraft missions involving operations close to the ground result in high pilot workload. In order to allow a pilot time to perform mission-oriented tasks, sensor-aiding and automation of some of the guidance and control functions are highly desirable. Images from an electro-optical sensor provide a covert way of detecting objects in the flight path of a low-flying helicopter. Passive ranging consists of processing a sequence of images using techniques based on optical low computation and recursive estimation. The passive ranging algorithm has to extract obstacle information from imagery at rates varying from five to thirty or more frames per second depending on the helicopter speed. We have implemented and tested the passive ranging algorithm off-line using helicopter-collected images. However, the real-time data and computation requirements of the algorithm are beyond the capability of any off-the-shelf microprocessor or digital signal processor. This paper describes the computational requirements of the algorithm and uses parallel processing technology to meet these requirements. Various issues in the selection of a parallel processing architecture are discussed and four different computer architectures are evaluated regarding their suitability to process the algorithm in real-time. Based on this evaluation, we conclude that real-time passive ranging is a realistic goal and can be achieved with a short time.

  4. Improving the scalability of hyperspectral imaging applications on heterogeneous platforms using adaptive run-time data compression

    NASA Astrophysics Data System (ADS)

    Plaza, Antonio; Plaza, Javier; Paz, Abel

    2010-10-01

    Latest generation remote sensing instruments (called hyperspectral imagers) are now able to generate hundreds of images, corresponding to different wavelength channels, for the same area on the surface of the Earth. In previous work, we have reported that the scalability of parallel processing algorithms dealing with these high-dimensional data volumes is affected by the amount of data to be exchanged through the communication network of the system. However, large messages are common in hyperspectral imaging applications since processing algorithms are pixel-based, and each pixel vector to be exchanged through the communication network is made up of hundreds of spectral values. Thus, decreasing the amount of data to be exchanged could improve the scalability and parallel performance. In this paper, we propose a new framework based on intelligent utilization of wavelet-based data compression techniques for improving the scalability of a standard hyperspectral image processing chain on heterogeneous networks of workstations. This type of parallel platform is quickly becoming a standard in hyperspectral image processing due to the distributed nature of collected hyperspectral data as well as its flexibility and low cost. Our experimental results indicate that adaptive lossy compression can lead to improvements in the scalability of the hyperspectral processing chain without sacrificing analysis accuracy, even at sub-pixel precision levels.

  5. An Analysis of Performance Enhancement Techniques for Overset Grid Applications

    NASA Technical Reports Server (NTRS)

    Djomehri, J. J.; Biswas, R.; Potsdam, M.; Strawn, R. C.; Biegel, Bryan (Technical Monitor)

    2002-01-01

    The overset grid methodology has significantly reduced time-to-solution of high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process resolves the geometrical complexity of the problem domain by using separately generated but overlapping structured discretization grids that periodically exchange information through interpolation. However, high performance computations of such large-scale realistic applications must be handled efficiently on state-of-the-art parallel supercomputers. This paper analyzes the effects of various performance enhancement techniques on the parallel efficiency of an overset grid Navier-Stokes CFD application running on an SGI Origin2000 machine. Specifically, the role of asynchronous communication, grid splitting, and grid grouping strategies are presented and discussed. Results indicate that performance depends critically on the level of latency hiding and the quality of load balancing across the processors.

  6. Parallel design of JPEG-LS encoder on graphics processing units

    NASA Astrophysics Data System (ADS)

    Duan, Hao; Fang, Yong; Huang, Bormin

    2012-01-01

    With recent technical advances in graphic processing units (GPUs), GPUs have outperformed CPUs in terms of compute capability and memory bandwidth. Many successful GPU applications to high performance computing have been reported. JPEG-LS is an ISO/IEC standard for lossless image compression which utilizes adaptive context modeling and run-length coding to improve compression ratio. However, adaptive context modeling causes data dependency among adjacent pixels and the run-length coding has to be performed in a sequential way. Hence, using JPEG-LS to compress large-volume hyperspectral image data is quite time-consuming. We implement an efficient parallel JPEG-LS encoder for lossless hyperspectral compression on a NVIDIA GPU using the computer unified device architecture (CUDA) programming technology. We use the block parallel strategy, as well as such CUDA techniques as coalesced global memory access, parallel prefix sum, and asynchronous data transfer. We also show the relation between GPU speedup and AVIRIS block size, as well as the relation between compression ratio and AVIRIS block size. When AVIRIS images are divided into blocks, each with 64×64 pixels, we gain the best GPU performance with 26.3x speedup over its original CPU code.

  7. Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation

    PubMed Central

    Su, Huayou; Wen, Mei; Wu, Nan; Ren, Ju; Zhang, Chunyuan

    2014-01-01

    Through reorganizing the execution order and optimizing the data structure, we proposed an efficient parallel framework for H.264/AVC encoder based on massively parallel architecture. We implemented the proposed framework by CUDA on NVIDIA's GPU. Not only the compute intensive components of the H.264 encoder are parallelized but also the control intensive components are realized effectively, such as CAVLC and deblocking filter. In addition, we proposed serial optimization methods, including the multiresolution multiwindow for motion estimation, multilevel parallel strategy to enhance the parallelism of intracoding as much as possible, component-based parallel CAVLC, and direction-priority deblocking filter. More than 96% of workload of H.264 encoder is offloaded to GPU. Experimental results show that the parallel implementation outperforms the serial program by 20 times of speedup ratio and satisfies the requirement of the real-time HD encoding of 30 fps. The loss of PSNR is from 0.14 dB to 0.77 dB, when keeping the same bitrate. Through the analysis to the kernels, we found that speedup ratios of the compute intensive algorithms are proportional with the computation power of the GPU. However, the performance of the control intensive parts (CAVLC) is much related to the memory bandwidth, which gives an insight for new architecture design. PMID:24757432

  8. Parallelization and checkpointing of GPU applications through program transformation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Solano-Quinde, Lizandro Damian

    2012-01-01

    GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that makes writing general-purpose applications for running on GPUs tractable have consolidated GPUs as an alternative for accelerating general purpose applications. Among the areas that have benefited from GPU acceleration are: signal and image processing, computational fluid dynamics, quantum chemistry, and, in general, the High Performance Computing (HPC) Industry. In order to continue to exploit higher levels of parallelism with GPUs, multi-GPU systems are gaining popularity. In this context, single-GPU applications are parallelized for running in multi-GPU systems. Furthermore, multi-GPU systems help to solvemore » the GPU memory limitation for applications with large application memory footprint. Parallelizing single-GPU applications has been approached by libraries that distribute the workload at runtime, however, they impose execution overhead and are not portable. On the other hand, on traditional CPU systems, parallelization has been approached through application transformation at pre-compile time, which enhances the application to distribute the workload at application level and does not have the issues of library-based approaches. Hence, a parallelization scheme for GPU systems based on application transformation is needed. Like any computing engine of today, reliability is also a concern in GPUs. GPUs are vulnerable to transient and permanent failures. Current checkpoint/restart techniques are not suitable for systems with GPUs. Checkpointing for GPU systems present new and interesting challenges, primarily due to the natural differences imposed by the hardware design, the memory subsystem architecture, the massive number of threads, and the limited amount of synchronization among threads. Therefore, a checkpoint/restart technique suitable for GPU systems is needed. The goal of this work is to exploit higher levels of parallelism and to develop support for application-level fault tolerance in applications using multiple GPUs. Our techniques reduce the burden of enhancing single-GPU applications to support these features. To achieve our goal, this work designs and implements a framework for enhancing a single-GPU OpenCL application through application transformation.« less

  9. Multiscale Simulations of Magnetic Island Coalescence

    NASA Technical Reports Server (NTRS)

    Dorelli, John C.

    2010-01-01

    We describe a new interactive parallel Adaptive Mesh Refinement (AMR) framework written in the Python programming language. This new framework, PyAMR, hides the details of parallel AMR data structures and algorithms (e.g., domain decomposition, grid partition, and inter-process communication), allowing the user to focus on the development of algorithms for advancing the solution of a systems of partial differential equations on a single uniform mesh. We demonstrate the use of PyAMR by simulating the pairwise coalescence of magnetic islands using the resistive Hall MHD equations. Techniques for coupling different physics models on different levels of the AMR grid hierarchy are discussed.

  10. Low level image processing techniques using the pipeline image processing engine in the flight telerobotic servicer

    NASA Technical Reports Server (NTRS)

    Nashman, Marilyn; Chaconas, Karen J.

    1988-01-01

    The sensory processing system for the NASA/NBS Standard Reference Model (NASREM) for telerobotic control is described. This control system architecture was adopted by NASA of the Flight Telerobotic Servicer. The control system is hierarchically designed and consists of three parallel systems: task decomposition, world modeling, and sensory processing. The Sensory Processing System is examined, and in particular the image processing hardware and software used to extract features at low levels of sensory processing for tasks representative of those envisioned for the Space Station such as assembly and maintenance are described.

  11. An application of analyzing the trajectories of two disorders: A parallel piecewise growth model of substance use and attention-deficit/hyperactivity disorder.

    PubMed

    Mamey, Mary Rose; Barbosa-Leiker, Celestina; McPherson, Sterling; Burns, G Leonard; Parks, Craig; Roll, John

    2015-12-01

    Researchers often want to examine 2 comorbid conditions simultaneously. One strategy to do so is through the use of parallel latent growth curve modeling (LGCM). This statistical technique allows for the simultaneous evaluation of 2 disorders to determine the explanations and predictors of change over time. Additionally, a piecewise model can help identify whether there are more than 2 growth processes within each disorder (e.g., during a clinical trial). A parallel piecewise LGCM was applied to self-reported attention-deficit/hyperactivity disorder (ADHD) and self-reported substance use symptoms in 303 adolescents enrolled in cognitive-behavioral therapy treatment for a substance use disorder and receiving either oral-methylphenidate or placebo for ADHD across 16 weeks. Assessing these 2 disorders concurrently allowed us to determine whether elevated levels of 1 disorder predicted elevated levels or increased risk of the other disorder. First, a piecewise growth model measured ADHD and substance use separately. Next, a parallel piecewise LGCM was used to estimate the regressions across disorders to determine whether higher scores at baseline of the disorders (i.e., ADHD or substance use disorder) predicted rates of change in the related disorder. Finally, treatment was added to the model to predict change. While the analyses revealed no significant relationships across disorders, this study explains and applies a parallel piecewise growth model to examine the developmental processes of comorbid conditions over the course of a clinical trial. Strengths of piecewise and parallel LGCMs for other addictions researchers interested in examining dual processes over time are discussed. (PsycINFO Database Record (c) 2015 APA, all rights reserved).

  12. Digital signal processing and control and estimation theory -- Points of tangency, area of intersection, and parallel directions

    NASA Technical Reports Server (NTRS)

    Willsky, A. S.

    1976-01-01

    A number of current research directions in the fields of digital signal processing and modern control and estimation theory were studied. Topics such as stability theory, linear prediction and parameter identification, system analysis and implementation, two-dimensional filtering, decentralized control and estimation, image processing, and nonlinear system theory were examined in order to uncover some of the basic similarities and differences in the goals, techniques, and philosophy of the two disciplines. An extensive bibliography is included.

  13. GPU accelerated fuzzy connected image segmentation by using CUDA.

    PubMed

    Zhuge, Ying; Cao, Yong; Miller, Robert W

    2009-01-01

    Image segmentation techniques using fuzzy connectedness principles have shown their effectiveness in segmenting a variety of objects in several large applications in recent years. However, one problem of these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays commodity graphics hardware provides high parallel computing power. In this paper, we present a parallel fuzzy connected image segmentation algorithm on Nvidia's Compute Unified Device Architecture (CUDA) platform for segmenting large medical image data sets. Our experiments based on three data sets with small, medium, and large data size demonstrate the efficiency of the parallel algorithm, which achieves a speed-up factor of 7.2x, 7.3x, and 14.4x, correspondingly, for the three data sets over the sequential implementation of fuzzy connected image segmentation algorithm on CPU.

  14. Study on parallel and distributed management of RS data based on spatial database

    NASA Astrophysics Data System (ADS)

    Chen, Yingbiao; Qian, Qinglan; Wu, Hongqiao; Liu, Shijin

    2009-10-01

    With the rapid development of current earth-observing technology, RS image data storage, management and information publication become a bottle-neck for its appliance and popularization. There are two prominent problems in RS image data storage and management system. First, background server hardly handle the heavy process of great capacity of RS data which stored at different nodes in a distributing environment. A tough burden has put on the background server. Second, there is no unique, standard and rational organization of Multi-sensor RS data for its storage and management. And lots of information is lost or not included at storage. Faced at the above two problems, the paper has put forward a framework for RS image data parallel and distributed management and storage system. This system aims at RS data information system based on parallel background server and a distributed data management system. Aiming at the above two goals, this paper has studied the following key techniques and elicited some revelatory conclusions. The paper has put forward a solid index of "Pyramid, Block, Layer, Epoch" according to the properties of RS image data. With the solid index mechanism, a rational organization for different resolution, different area, different band and different period of Multi-sensor RS image data is completed. In data storage, RS data is not divided into binary large objects to be stored at current relational database system, while it is reconstructed through the above solid index mechanism. A logical image database for the RS image data file is constructed. In system architecture, this paper has set up a framework based on a parallel server of several common computers. Under the framework, the background process is divided into two parts, the common WEB process and parallel process.

  15. Study on parallel and distributed management of RS data based on spatial data base

    NASA Astrophysics Data System (ADS)

    Chen, Yingbiao; Qian, Qinglan; Liu, Shijin

    2006-12-01

    With the rapid development of current earth-observing technology, RS image data storage, management and information publication become a bottle-neck for its appliance and popularization. There are two prominent problems in RS image data storage and management system. First, background server hardly handle the heavy process of great capacity of RS data which stored at different nodes in a distributing environment. A tough burden has put on the background server. Second, there is no unique, standard and rational organization of Multi-sensor RS data for its storage and management. And lots of information is lost or not included at storage. Faced at the above two problems, the paper has put forward a framework for RS image data parallel and distributed management and storage system. This system aims at RS data information system based on parallel background server and a distributed data management system. Aiming at the above two goals, this paper has studied the following key techniques and elicited some revelatory conclusions. The paper has put forward a solid index of "Pyramid, Block, Layer, Epoch" according to the properties of RS image data. With the solid index mechanism, a rational organization for different resolution, different area, different band and different period of Multi-sensor RS image data is completed. In data storage, RS data is not divided into binary large objects to be stored at current relational database system, while it is reconstructed through the above solid index mechanism. A logical image database for the RS image data file is constructed. In system architecture, this paper has set up a framework based on a parallel server of several common computers. Under the framework, the background process is divided into two parts, the common WEB process and parallel process.

  16. Function modeling improves the efficiency of spatial modeling using big data from remote sensing

    Treesearch

    John Hogland; Nathaniel Anderson

    2017-01-01

    Spatial modeling is an integral component of most geographic information systems (GISs). However, conventional GIS modeling techniques can require substantial processing time and storage space and have limited statistical and machine learning functionality. To address these limitations, many have parallelized spatial models using multiple coding libraries and have...

  17. Evolutionary computing for the design search and optimization of space vehicle power subsystems

    NASA Technical Reports Server (NTRS)

    Kordon, M.; Klimeck, G.; Hanks, D.

    2004-01-01

    Evolutionary computing has proven to be a straightforward and robust approach for optimizing a wide range of difficult analysis and design problems. This paper discusses the application of these techniques to an existing space vehicle power subsystem resource and performance analysis simulation in a parallel processing environment.

  18. User's guide to the Reliability Estimation System Testbed (REST)

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Palumbo, Daniel L.; Rifkin, Adam

    1992-01-01

    The Reliability Estimation System Testbed is an X-window based reliability modeling tool that was created to explore the use of the Reliability Modeling Language (RML). RML was defined to support several reliability analysis techniques including modularization, graphical representation, Failure Mode Effects Simulation (FMES), and parallel processing. These techniques are most useful in modeling large systems. Using modularization, an analyst can create reliability models for individual system components. The modules can be tested separately and then combined to compute the total system reliability. Because a one-to-one relationship can be established between system components and the reliability modules, a graphical user interface may be used to describe the system model. RML was designed to permit message passing between modules. This feature enables reliability modeling based on a run time simulation of the system wide effects of a component's failure modes. The use of failure modes effects simulation enhances the analyst's ability to correctly express system behavior when using the modularization approach to reliability modeling. To alleviate the computation bottleneck often found in large reliability models, REST was designed to take advantage of parallel processing on hypercube processors.

  19. Parallel and fault-tolerant algorithms for hypercube multiprocessors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aykanat, C.

    1988-01-01

    Several techniques for increasing the performance of parallel algorithms on distributed-memory message-passing multi-processor systems are investigated. These techniques are effectively implemented for the parallelization of the Scaled Conjugate Gradient (SCG) algorithm on a hypercube connected message-passing multi-processor. Significant performance improvement is achieved by using these techniques. The SCG algorithm is used for the solution phase of an FE modeling system. Almost linear speed-up is achieved, and it is shown that hypercube topology is scalable for an FE class of problem. The SCG algorithm is also shown to be suitable for vectorization, and near supercomputer performance is achieved on a vectormore » hypercube multiprocessor by exploiting both parallelization and vectorization. Fault-tolerance issues for the parallel SCG algorithm and for the hypercube topology are also addressed.« less

  20. Graphics Processing Unit Assisted Thermographic Compositing

    NASA Technical Reports Server (NTRS)

    Ragasa, Scott; McDougal, Matthew; Russell, Sam

    2013-01-01

    Objective: To develop a software application utilizing general purpose graphics processing units (GPUs) for the analysis of large sets of thermographic data. Background: Over the past few years, an increasing effort among scientists and engineers to utilize the GPU in a more general purpose fashion is allowing for supercomputer level results at individual workstations. As data sets grow, the methods to work them grow at an equal, and often greater, pace. Certain common computations can take advantage of the massively parallel and optimized hardware constructs of the GPU to allow for throughput that was previously reserved for compute clusters. These common computations have high degrees of data parallelism, that is, they are the same computation applied to a large set of data where the result does not depend on other data elements. Signal (image) processing is one area were GPUs are being used to greatly increase the performance of certain algorithms and analysis techniques.

  1. Real time software tools and methodologies

    NASA Technical Reports Server (NTRS)

    Christofferson, M. J.

    1981-01-01

    Real time systems are characterized by high speed processing and throughput as well as asynchronous event processing requirements. These requirements give rise to particular implementations of parallel or pipeline multitasking structures, of intertask or interprocess communications mechanisms, and finally of message (buffer) routing or switching mechanisms. These mechanisms or structures, along with the data structue, describe the essential character of the system. These common structural elements and mechanisms are identified, their implementation in the form of routines, tasks or macros - in other words, tools are formalized. The tools developed support or make available the following: reentrant task creation, generalized message routing techniques, generalized task structures/task families, standardized intertask communications mechanisms, and pipeline and parallel processing architectures in a multitasking environment. Tools development raise some interesting prospects in the areas of software instrumentation and software portability. These issues are discussed following the description of the tools themselves.

  2. A Bioluminometric Method of DNA Sequencing

    NASA Technical Reports Server (NTRS)

    Ronaghi, Mostafa; Pourmand, Nader; Stolc, Viktor; Arnold, Jim (Technical Monitor)

    2001-01-01

    Pyrosequencing is a bioluminometric single-tube DNA sequencing method that takes advantage of co-operativity between four enzymes to monitor DNA synthesis. In this sequencing-by-synthesis method, a cascade of enzymatic reactions yields detectable light, which is proportional to incorporated nucleotides. Pyrosequencing has the advantages of accuracy, flexibility and parallel processing. It can be easily automated. Furthermore, the technique dispenses with the need for labeled primers, labeled nucleotides and gel-electrophoresis. In this chapter, the use of this technique for different applications is discussed.

  3. Implementation of a partitioned algorithm for simulation of large CSI problems

    NASA Technical Reports Server (NTRS)

    Alvin, Kenneth F.; Park, K. C.

    1991-01-01

    The implementation of a partitioned numerical algorithm for determining the dynamic response of coupled structure/controller/estimator finite-dimensional systems is reviewed. The partitioned approach leads to a set of coupled first and second-order linear differential equations which are numerically integrated with extrapolation and implicit step methods. The present software implementation, ACSIS, utilizes parallel processing techniques at various levels to optimize performance on a shared-memory concurrent/vector processing system. A general procedure for the design of controller and filter gains is also implemented, which utilizes the vibration characteristics of the structure to be solved. Also presented are: example problems; a user's guide to the software; the procedures and algorithm scripts; a stability analysis for the algorithm; and the source code for the parallel implementation.

  4. Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.

  5. Scalable computing for evolutionary genomics.

    PubMed

    Prins, Pjotr; Belhachemi, Dominique; Möller, Steffen; Smant, Geert

    2012-01-01

    Genomic data analysis in evolutionary biology is becoming so computationally intensive that analysis of multiple hypotheses and scenarios takes too long on a single desktop computer. In this chapter, we discuss techniques for scaling computations through parallelization of calculations, after giving a quick overview of advanced programming techniques. Unfortunately, parallel programming is difficult and requires special software design. The alternative, especially attractive for legacy software, is to introduce poor man's parallelization by running whole programs in parallel as separate processes, using job schedulers. Such pipelines are often deployed on bioinformatics computer clusters. Recent advances in PC virtualization have made it possible to run a full computer operating system, with all of its installed software, on top of another operating system, inside a "box," or virtual machine (VM). Such a VM can flexibly be deployed on multiple computers, in a local network, e.g., on existing desktop PCs, and even in the Cloud, to create a "virtual" computer cluster. Many bioinformatics applications in evolutionary biology can be run in parallel, running processes in one or more VMs. Here, we show how a ready-made bioinformatics VM image, named BioNode, effectively creates a computing cluster, and pipeline, in a few steps. This allows researchers to scale-up computations from their desktop, using available hardware, anytime it is required. BioNode is based on Debian Linux and can run on networked PCs and in the Cloud. Over 200 bioinformatics and statistical software packages, of interest to evolutionary biology, are included, such as PAML, Muscle, MAFFT, MrBayes, and BLAST. Most of these software packages are maintained through the Debian Med project. In addition, BioNode contains convenient configuration scripts for parallelizing bioinformatics software. Where Debian Med encourages packaging free and open source bioinformatics software through one central project, BioNode encourages creating free and open source VM images, for multiple targets, through one central project. BioNode can be deployed on Windows, OSX, Linux, and in the Cloud. Next to the downloadable BioNode images, we provide tutorials online, which empower bioinformaticians to install and run BioNode in different environments, as well as information for future initiatives, on creating and building such images.

  6. Production of yarns composed of oriented nanofibers for ophthalmological implants

    NASA Astrophysics Data System (ADS)

    Shynkarenko, A.; Klapstova, A.; Krotov, A.; Moucka, M.; Lukas, D.

    2017-10-01

    Parallelized nanofibrous structures are commonly used in medical sector, especially for the ophthalmological implants. In this research self-fabricated device is tested for improved collection and twisting of the parallel nanofibers. Previously manual techniques are used to collect the nanofibers and then twist is given, where as in our device different parameters can be optimized to obtained parallel nanofibers and further twisting can be given. The device is used to bring automation to the technique of achieving parallel fibrous structures for medical applications.

  7. Simplified Parallel Domain Traversal

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Erickson III, David J

    2011-01-01

    Many data-intensive scientific analysis techniques require global domain traversal, which over the years has been a bottleneck for efficient parallelization across distributed-memory architectures. Inspired by MapReduce and other simplified parallel programming approaches, we have designed DStep, a flexible system that greatly simplifies efficient parallelization of domain traversal techniques at scale. In order to deliver both simplicity to users as well as scalability on HPC platforms, we introduce a novel two-tiered communication architecture for managing and exploiting asynchronous communication loads. We also integrate our design with advanced parallel I/O techniques that operate directly on native simulation output. We demonstrate DStep bymore » performing teleconnection analysis across ensemble runs of terascale atmospheric CO{sub 2} and climate data, and we show scalability results on up to 65,536 IBM BlueGene/P cores.« less

  8. Runtime support for parallelizing data mining algorithms

    NASA Astrophysics Data System (ADS)

    Jin, Ruoming; Agrawal, Gagan

    2002-03-01

    With recent technological advances, shared memory parallel machines have become more scalable, and offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this paper, we focus on shared memory parallelization of data mining algorithms. We have developed a series of techniques for parallelization of data mining algorithms, including full replication, full locking, fixed locking, optimized full locking, and cache-sensitive locking. Unlike previous work on shared memory parallelization of specific data mining algorithms, all of our techniques apply to a large number of common data mining algorithms. In addition, we propose a reduction-object based interface for specifying a data mining algorithm. We show how our runtime system can apply any of the technique we have developed starting from a common specification of the algorithm.

  9. Data flow modeling techniques

    NASA Technical Reports Server (NTRS)

    Kavi, K. M.

    1984-01-01

    There have been a number of simulation packages developed for the purpose of designing, testing and validating computer systems, digital systems and software systems. Complex analytical tools based on Markov and semi-Markov processes have been designed to estimate the reliability and performance of simulated systems. Petri nets have received wide acceptance for modeling complex and highly parallel computers. In this research data flow models for computer systems are investigated. Data flow models can be used to simulate both software and hardware in a uniform manner. Data flow simulation techniques provide the computer systems designer with a CAD environment which enables highly parallel complex systems to be defined, evaluated at all levels and finally implemented in either hardware or software. Inherent in data flow concept is the hierarchical handling of complex systems. In this paper we will describe how data flow can be used to model computer system.

  10. Every factor helps: Rapid Ptychographic Reconstruction

    NASA Astrophysics Data System (ADS)

    Nashed, Youssef

    2015-03-01

    Recent advances in microscopy, specifically higher spatial resolution and data acquisition rates, require faster and more robust phase retrieval reconstruction methods. Ptychography is a phase retrieval technique for reconstructing the complex transmission function of a specimen from a sequence of diffraction patterns in visible light, X-ray, and electron microscopes. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes. Waiting to postprocess datasets offline results in missed opportunities. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs). A final specimen reconstruction is then achieved by different techniques to merge sub-dataset results into a single complex phase and amplitude image. Results are shown on a simulated specimen and real datasets from X-ray experiments conducted at a synchrotron light source.

  11. Parallel computation in a three-dimensional elastic-plastic finite-element analysis

    NASA Technical Reports Server (NTRS)

    Shivakumar, K. N.; Bigelow, C. A.; Newman, J. C., Jr.

    1992-01-01

    A CRAY parallel processing technique called autotasking was implemented in a three-dimensional elasto-plastic finite-element code. The technique was evaluated on two CRAY supercomputers, a CRAY 2 and a CRAY Y-MP. Autotasking was implemented in all major portions of the code, except the matrix equations solver. Compiler directives alone were not able to properly multitask the code; user-inserted directives were required to achieve better performance. It was noted that the connect time, rather than wall-clock time, was more appropriate to determine speedup in multiuser environments. For a typical example problem, a speedup of 2.1 (1.8 when the solution time was included) was achieved in a dedicated environment and 1.7 (1.6 with solution time) in a multiuser environment on a four-processor CRAY 2 supercomputer. The speedup on a three-processor CRAY Y-MP was about 2.4 (2.0 with solution time) in a multiuser environment.

  12. Architecture and design of a 500-MHz gallium-arsenide processing element for a parallel supercomputer

    NASA Technical Reports Server (NTRS)

    Fouts, Douglas J.; Butner, Steven E.

    1991-01-01

    The design of the processing element of GASP, a GaAs supercomputer with a 500-MHz instruction issue rate and 1-GHz subsystem clocks, is presented. The novel, functionally modular, block data flow architecture of GASP is described. The architecture and design of a GASP processing element is then presented. The processing element (PE) is implemented in a hybrid semiconductor module with 152 custom GaAs ICs of eight different types. The effects of the implementation technology on both the system-level architecture and the PE design are discussed. SPICE simulations indicate that parts of the PE are capable of being clocked at 1 GHz, while the rest of the PE uses a 500-MHz clock. The architecture utilizes data flow techniques at a program block level, which allows efficient execution of parallel programs while maintaining reasonably good performance on sequential programs. A simulation study of the architecture indicates that an instruction execution rate of over 30,000 MIPS can be attained with 65 PEs.

  13. Optimizing transformations of stencil operations for parallel cache-based architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bassetti, F.; Davis, K.

    This paper describes a new technique for optimizing serial and parallel stencil- and stencil-like operations for cache-based architectures. This technique takes advantage of the semantic knowledge implicity in stencil-like computations. The technique is implemented as a source-to-source program transformation; because of its specificity it could not be expected of a conventional compiler. Empirical results demonstrate a uniform factor of two speedup. The experiments clearly show the benefits of this technique to be a consequence, as intended, of the reduction in cache misses. The test codes are based on a 5-point stencil obtained by the discretization of the Poisson equation andmore » applied to a two-dimensional uniform grid using the Jacobi method as an iterative solver. Results are presented for a 1-D tiling for a single processor, and in parallel using 1-D data partition. For the parallel case both blocking and non-blocking communication are tested. The same scheme of experiments has bee n performed for the 2-D tiling case. However, for the parallel case the 2-D partitioning is not discussed here, so the parallel case handled for 2-D is 2-D tiling with 1-D data partitioning.« less

  14. Property-driven functional verification technique for high-speed vision system-on-chip processor

    NASA Astrophysics Data System (ADS)

    Nshunguyimfura, Victor; Yang, Jie; Liu, Liyuan; Wu, Nanjian

    2017-04-01

    The implementation of functional verification in a fast, reliable, and effective manner is a challenging task in a vision chip verification process. The main reason for this challenge is the stepwise nature of existing functional verification techniques. This vision chip verification complexity is also related to the fact that in most vision chip design cycles, extensive efforts are focused on how to optimize chip metrics such as performance, power, and area. Design functional verification is not explicitly considered at an earlier stage at which the most sound decisions are made. In this paper, we propose a semi-automatic property-driven verification technique. The implementation of all verification components is based on design properties. We introduce a low-dimension property space between the specification space and the implementation space. The aim of this technique is to speed up the verification process for high-performance parallel processing vision chips. Our experimentation results show that the proposed technique can effectively improve the verification effort up to 20% for the complex vision chip design while reducing the simulation and debugging overheads.

  15. Determination of rhenium in molybdenite by neutron-activation analysis.

    PubMed

    Terada, K; Yoshimura, Y; Osaki, S; Kiba, T

    1967-01-01

    A neutron-activation method is described for the determination of rhenium in molybdenite. Radiochemical separation by a carrier technique was carried out very rapidly by means of successive liquid-liquid extraction processes. The recovery of rhenium, which was determined by a spectrophotometric method, was about 93%. About 10 samples could be analysed within 6 hr in parallel runs.

  16. A Planning Approach for Developing Inventory and Monitoring Programs In National Parks

    Treesearch

    David L. Peterson; David G. Silsbee; Daniel L. Schmoldt

    1995-01-01

    This document offers some conceptual ideas on how individual parks could plan and implement an Inventory and Monitoring (I&M) program. In several respects, the authors ideas parallel and complement the I&M project planning and development process outlined in the Natural Resources Inventory and Monitoring Guideline (NPS-75). However, no universal techniques...

  17. Parallel Fortran-MPI software for numerical inversion of the Laplace transform and its application to oscillatory water levels in groundwater environments

    USGS Publications Warehouse

    Zhan, X.

    2005-01-01

    A parallel Fortran-MPI (Message Passing Interface) software for numerical inversion of the Laplace transform based on a Fourier series method is developed to meet the need of solving intensive computational problems involving oscillatory water level's response to hydraulic tests in a groundwater environment. The software is a parallel version of ACM (The Association for Computing Machinery) Transactions on Mathematical Software (TOMS) Algorithm 796. Running 38 test examples indicated that implementation of MPI techniques with distributed memory architecture speedups the processing and improves the efficiency. Applications to oscillatory water levels in a well during aquifer tests are presented to illustrate how this package can be applied to solve complicated environmental problems involved in differential and integral equations. The package is free and is easy to use for people with little or no previous experience in using MPI but who wish to get off to a quick start in parallel computing. ?? 2004 Elsevier Ltd. All rights reserved.

  18. Overcoming rule-based rigidity and connectionist limitations through massively-parallel case-based reasoning

    NASA Technical Reports Server (NTRS)

    Barnden, John; Srinivas, Kankanahalli

    1990-01-01

    Symbol manipulation as used in traditional Artificial Intelligence has been criticized by neural net researchers for being excessively inflexible and sequential. On the other hand, the application of neural net techniques to the types of high-level cognitive processing studied in traditional artificial intelligence presents major problems as well. A promising way out of this impasse is to build neural net models that accomplish massively parallel case-based reasoning. Case-based reasoning, which has received much attention recently, is essentially the same as analogy-based reasoning, and avoids many of the problems leveled at traditional artificial intelligence. Further problems are avoided by doing many strands of case-based reasoning in parallel, and by implementing the whole system as a neural net. In addition, such a system provides an approach to some aspects of the problems of noise, uncertainty and novelty in reasoning systems. The current neural net system (Conposit), which performs standard rule-based reasoning, is being modified into a massively parallel case-based reasoning version.

  19. An approach to enhance pnetCDF performance in ...

    EPA Pesticide Factsheets

    Data intensive simulations are often limited by their I/O (input/output) performance, and "novel" techniques need to be developed in order to overcome this limitation. The software package pnetCDF (parallel network Common Data Form), which works with parallel file systems, was developed to address this issue by providing parallel I/O capability. This study examines the performance of an application-level data aggregation approach which performs data aggregation along either row or column dimension of MPI (Message Passing Interface) processes on a spatially decomposed domain, and then applies the pnetCDF parallel I/O paradigm. The test was done with three different domain sizes which represent small, moderately large, and large data domains, using a small-scale Community Multiscale Air Quality model (CMAQ) mock-up code. The examination includes comparing I/O performance with traditional serial I/O technique, straight application of pnetCDF, and the data aggregation along row and column dimension before applying pnetCDF. After the comparison, "optimal" I/O configurations of this application-level data aggregation approach were quantified. Data aggregation along the row dimension (pnetCDFcr) works better than along the column dimension (pnetCDFcc) although it may perform slightly worse than the straight pnetCDF method with a small number of processors. When the number of processors becomes larger, pnetCDFcr outperforms pnetCDF significantly. If the number of proces

  20. Computer Science Techniques Applied to Parallel Atomistic Simulation

    NASA Astrophysics Data System (ADS)

    Nakano, Aiichiro

    1998-03-01

    Recent developments in parallel processing technology and multiresolution numerical algorithms have established large-scale molecular dynamics (MD) simulations as a new research mode for studying materials phenomena such as fracture. However, this requires large system sizes and long simulated times. We have developed: i) Space-time multiresolution schemes; ii) fuzzy-clustering approach to hierarchical dynamics; iii) wavelet-based adaptive curvilinear-coordinate load balancing; iv) multilevel preconditioned conjugate gradient method; and v) spacefilling-curve-based data compression for parallel I/O. Using these techniques, million-atom parallel MD simulations are performed for the oxidation dynamics of nanocrystalline Al. The simulations take into account the effect of dynamic charge transfer between Al and O using the electronegativity equalization scheme. The resulting long-range Coulomb interaction is calculated efficiently with the fast multipole method. Results for temperature and charge distributions, residual stresses, bond lengths and bond angles, and diffusivities of Al and O will be presented. The oxidation of nanocrystalline Al is elucidated through immersive visualization in virtual environments. A unique dual-degree education program at Louisiana State University will also be discussed in which students can obtain a Ph.D. in Physics & Astronomy and a M.S. from the Department of Computer Science in five years. This program fosters interdisciplinary research activities for interfacing High Performance Computing and Communications with large-scale atomistic simulations of advanced materials. This work was supported by NSF (CAREER Program), ARO, PRF, and Louisiana LEQSF.

  1. Traffic Simulations on Parallel Computers Using Domain Decomposition Techniques

    DOT National Transportation Integrated Search

    1995-01-01

    Large scale simulations of Intelligent Transportation Systems (ITS) can only be acheived by using the computing resources offered by parallel computing architectures. Domain decomposition techniques are proposed which allow the performance of traffic...

  2. Efficient implementation of real-time programs under the VAX/VMS operating system

    NASA Technical Reports Server (NTRS)

    Johnson, S. C.

    1985-01-01

    Techniques for writing efficient real-time programs under the VAX/VMS oprating system are presented. Basic operations are presented for executing at real-time priority and for avoiding needlless processing delays. A highly efficient technique for accessing physical devices by mapping to the input/output space and accessing the device registrs directly is described. To illustrate the application of the technique, examples are included of different uses of the technique on three devices in the Langley Avionics Integration Research Lab (AIRLAB): the KW11-K dual programmable real-time clock, the Parallel Communications Link (PCL11-B) communication system, and the Datacom Synchronization Network. Timing data are included to demonstrate the performance improvements realized with these applications of the technique.

  3. Iterative nonlinear joint transform correlation for the detection of objects in cluttered scenes

    NASA Astrophysics Data System (ADS)

    Haist, Tobias; Tiziani, Hans J.

    1999-03-01

    An iterative correlation technique with digital image processing in the feedback loop for the detection of small objects in cluttered scenes is proposed. A scanning aperture is combined with the method in order to improve the immunity against noise and clutter. Multiple reference objects or different views of one object are processed in parallel. We demonstrate the method by detecting a noisy and distorted face in a crowd with a nonlinear joint transform correlator.

  4. Current status and future prospects for enabling chemistry technology in the drug discovery process.

    PubMed

    Djuric, Stevan W; Hutchins, Charles W; Talaty, Nari N

    2016-01-01

    This review covers recent advances in the implementation of enabling chemistry technologies into the drug discovery process. Areas covered include parallel synthesis chemistry, high-throughput experimentation, automated synthesis and purification methods, flow chemistry methodology including photochemistry, electrochemistry, and the handling of "dangerous" reagents. Also featured are advances in the "computer-assisted drug design" area and the expanding application of novel mass spectrometry-based techniques to a wide range of drug discovery activities.

  5. An Adaptive Kalman Filter Using a Simple Residual Tuning Method

    NASA Technical Reports Server (NTRS)

    Harman, Richard R.

    1999-01-01

    One difficulty in using Kalman filters in real world situations is the selection of the correct process noise, measurement noise, and initial state estimate and covariance. These parameters are commonly referred to as tuning parameters. Multiple methods have been developed to estimate these parameters. Most of those methods such as maximum likelihood, subspace, and observer Kalman Identification require extensive offline processing and are not suitable for real time processing. One technique, which is suitable for real time processing, is the residual tuning method. Any mismodeling of the filter tuning parameters will result in a non-white sequence for the filter measurement residuals. The residual tuning technique uses this information to estimate corrections to those tuning parameters. The actual implementation results in a set of sequential equations that run in parallel with the Kalman filter. A. H. Jazwinski developed a specialized version of this technique for estimation of process noise. Equations for the estimation of the measurement noise have also been developed. These algorithms are used to estimate the process noise and measurement noise for the Wide Field Infrared Explorer star tracker and gyro.

  6. High Performance Compression of Science Data

    NASA Technical Reports Server (NTRS)

    Storer, James A.; Carpentieri, Bruno; Cohn, Martin

    1994-01-01

    Two papers make up the body of this report. One presents a single-pass adaptive vector quantization algorithm that learns a codebook of variable size and shape entries; the authors present experiments on a set of test images showing that with no training or prior knowledge of the data, for a given fidelity, the compression achieved typically equals or exceeds that of the JPEG standard. The second paper addresses motion compensation, one of the most effective techniques used in interframe data compression. A parallel block-matching algorithm for estimating interframe displacement of blocks with minimum error is presented. The algorithm is designed for a simple parallel architecture to process video in real time.

  7. Resource Management for Distributed Parallel Systems

    NASA Technical Reports Server (NTRS)

    Neuman, B. Clifford; Rao, Santosh

    1993-01-01

    Multiprocessor systems should exist in the the larger context of distributed systems, allowing multiprocessor resources to be shared by those that need them. Unfortunately, typical multiprocessor resource management techniques do not scale to large networks. The Prospero Resource Manager (PRM) is a scalable resource allocation system that supports the allocation of processing resources in large networks and multiprocessor systems. To manage resources in such distributed parallel systems, PRM employs three types of managers: system managers, job managers, and node managers. There exist multiple independent instances of each type of manager, reducing bottlenecks. The complexity of each manager is further reduced because each is designed to utilize information at an appropriate level of abstraction.

  8. A survey of GPU-based acceleration techniques in MRI reconstructions

    PubMed Central

    Wang, Haifeng; Peng, Hanchuan; Chang, Yuchou

    2018-01-01

    Image reconstruction in magnetic resonance imaging (MRI) clinical applications has become increasingly more complicated. However, diagnostic and treatment require very fast computational procedure. Modern competitive platforms of graphics processing unit (GPU) have been used to make high-performance parallel computations available, and attractive to common consumers for computing massively parallel reconstruction problems at commodity price. GPUs have also become more and more important for reconstruction computations, especially when deep learning starts to be applied into MRI reconstruction. The motivation of this survey is to review the image reconstruction schemes of GPU computing for MRI applications and provide a summary reference for researchers in MRI community. PMID:29675361

  9. A survey of GPU-based acceleration techniques in MRI reconstructions.

    PubMed

    Wang, Haifeng; Peng, Hanchuan; Chang, Yuchou; Liang, Dong

    2018-03-01

    Image reconstruction in magnetic resonance imaging (MRI) clinical applications has become increasingly more complicated. However, diagnostic and treatment require very fast computational procedure. Modern competitive platforms of graphics processing unit (GPU) have been used to make high-performance parallel computations available, and attractive to common consumers for computing massively parallel reconstruction problems at commodity price. GPUs have also become more and more important for reconstruction computations, especially when deep learning starts to be applied into MRI reconstruction. The motivation of this survey is to review the image reconstruction schemes of GPU computing for MRI applications and provide a summary reference for researchers in MRI community.

  10. Exploiting Symmetry on Parallel Architectures.

    NASA Astrophysics Data System (ADS)

    Stiller, Lewis Benjamin

    1995-01-01

    This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry -exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and discovered it a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.

  11. The correlation study of parallel feature extractor and noise reduction approaches

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dewi, Deshinta Arrova; Sundararajan, Elankovan; Prabuwono, Anton Satria

    2015-05-15

    This paper presents literature reviews that show variety of techniques to develop parallel feature extractor and finding its correlation with noise reduction approaches for low light intensity images. Low light intensity images are normally displayed as darker images and low contrast. Without proper handling techniques, those images regularly become evidences of misperception of objects and textures, the incapability to section them. The visual illusions regularly clues to disorientation, user fatigue, poor detection and classification performance of humans and computer algorithms. Noise reduction approaches (NR) therefore is an essential step for other image processing steps such as edge detection, image segmentation,more » image compression, etc. Parallel Feature Extractor (PFE) meant to capture visual contents of images involves partitioning images into segments, detecting image overlaps if any, and controlling distributed and redistributed segments to extract the features. Working on low light intensity images make the PFE face challenges and closely depend on the quality of its pre-processing steps. Some papers have suggested many well established NR as well as PFE strategies however only few resources have suggested or mentioned the correlation between them. This paper reviews best approaches of the NR and the PFE with detailed explanation on the suggested correlation. This finding may suggest relevant strategies of the PFE development. With the help of knowledge based reasoning, computational approaches and algorithms, we present the correlation study between the NR and the PFE that can be useful for the development and enhancement of other existing PFE.« less

  12. Big data driven cycle time parallel prediction for production planning in wafer manufacturing

    NASA Astrophysics Data System (ADS)

    Wang, Junliang; Yang, Jungang; Zhang, Jie; Wang, Xiaoxi; Zhang, Wenjun Chris

    2018-07-01

    Cycle time forecasting (CTF) is one of the most crucial issues for production planning to keep high delivery reliability in semiconductor wafer fabrication systems (SWFS). This paper proposes a novel data-intensive cycle time (CT) prediction system with parallel computing to rapidly forecast the CT of wafer lots with large datasets. First, a density peak based radial basis function network (DP-RBFN) is designed to forecast the CT with the diverse and agglomerative CT data. Second, the network learning method based on a clustering technique is proposed to determine the density peak. Third, a parallel computing approach for network training is proposed in order to speed up the training process with large scaled CT data. Finally, an experiment with respect to SWFS is presented, which demonstrates that the proposed CTF system can not only speed up the training process of the model but also outperform the radial basis function network, the back-propagation-network and multivariate regression methodology based CTF methods in terms of the mean absolute deviation and standard deviation.

  13. Fast Face-Recognition Optical Parallel Correlator Using High Accuracy Correlation Filter

    NASA Astrophysics Data System (ADS)

    Watanabe, Eriko; Kodate, Kashiko

    2005-11-01

    We designed and fabricated a fully automatic fast face recognition optical parallel correlator [E. Watanabe and K. Kodate: Appl. Opt. 44 (2005) 5666] based on the VanderLugt principle. The implementation of an as-yet unattained ultra high-speed system was aided by reconfiguring the system to make it suitable for easier parallel processing, as well as by composing a higher accuracy correlation filter and high-speed ferroelectric liquid crystal-spatial light modulator (FLC-SLM). In running trial experiments using this system (dubbed FARCO), we succeeded in acquiring remarkably low error rates of 1.3% for false match rate (FMR) and 2.6% for false non-match rate (FNMR). Given the results of our experiments, the aim of this paper is to examine methods of designing correlation filters and arranging database image arrays for even faster parallel correlation, underlining the issues of calculation technique, quantization bit rate, pixel size and shift from optical axis. The correlation filter has proved its excellent performance and higher precision than classical correlation and joint transform correlator (JTC). Moreover, arrangement of multi-object reference images leads to 10-channel correlation signals, as sharply marked as those of a single channel. This experiment result demonstrates great potential for achieving the process speed of 10000 face/s.

  14. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Song

    CFD (Computational Fluid Dynamics) is a widely used technique in engineering design field. It uses mathematical methods to simulate and predict flow characteristics in a certain physical space. Since the numerical result of CFD computation is very hard to understand, VR (virtual reality) and data visualization techniques are introduced into CFD post-processing to improve the understandability and functionality of CFD computation. In many cases CFD datasets are very large (multi-gigabytes), and more and more interactions between user and the datasets are required. For the traditional VR application, the limitation of computing power is a major factor to prevent visualizing largemore » dataset effectively. This thesis presents a new system designing to speed up the traditional VR application by using parallel computing and distributed computing, and the idea of using hand held device to enhance the interaction between a user and VR CFD application as well. Techniques in different research areas including scientific visualization, parallel computing, distributed computing and graphical user interface designing are used in the development of the final system. As the result, the new system can flexibly be built on heterogeneous computing environment, dramatically shorten the computation time.« less

  15. Parallel halftoning technique using dot diffusion optimization

    NASA Astrophysics Data System (ADS)

    Molina-Garcia, Javier; Ponomaryov, Volodymyr I.; Reyes-Reyes, Rogelio; Cruz-Ramos, Clara

    2017-05-01

    In this paper, a novel approach for halftone images is proposed and implemented for images that are obtained by the Dot Diffusion (DD) method. Designed technique is based on an optimization of the so-called class matrix used in DD algorithm and it consists of generation new versions of class matrix, which has no baron and near-baron in order to minimize inconsistencies during the distribution of the error. Proposed class matrix has different properties and each is designed for two different applications: applications where the inverse-halftoning is necessary, and applications where this method is not required. The proposed method has been implemented in GPU (NVIDIA GeForce GTX 750 Ti), multicore processors (AMD FX(tm)-6300 Six-Core Processor and in Intel core i5-4200U), using CUDA and OpenCV over a PC with linux. Experimental results have shown that novel framework generates a good quality of the halftone images and the inverse halftone images obtained. The simulation results using parallel architectures have demonstrated the efficiency of the novel technique when it is implemented in real-time processing.

  16. New technique for real-time distortion-invariant multiobject recognition and classification

    NASA Astrophysics Data System (ADS)

    Hong, Rutong; Li, Xiaoshun; Hong, En; Wang, Zuyi; Wei, Hongan

    2001-04-01

    A real-time hybrid distortion-invariant OPR system was established to make 3D multiobject distortion-invariant automatic pattern recognition. Wavelet transform technique was used to make digital preprocessing of the input scene, to depress the noisy background and enhance the recognized object. A three-layer backpropagation artificial neural network was used in correlation signal post-processing to perform multiobject distortion-invariant recognition and classification. The C-80 and NOA real-time processing ability and the multithread programming technology were used to perform high speed parallel multitask processing and speed up the post processing rate to ROIs. The reference filter library was constructed for the distortion version of 3D object model images based on the distortion parameter tolerance measuring as rotation, azimuth and scale. The real-time optical correlation recognition testing of this OPR system demonstrates that using the preprocessing, post- processing, the nonlinear algorithm os optimum filtering, RFL construction technique and the multithread programming technology, a high possibility of recognition and recognition rate ere obtained for the real-time multiobject distortion-invariant OPR system. The recognition reliability and rate was improved greatly. These techniques are very useful to automatic target recognition.

  17. Minimum envelope roughness pulse design for reduced amplifier distortion in parallel excitation.

    PubMed

    Grissom, William A; Kerr, Adam B; Stang, Pascal; Scott, Greig C; Pauly, John M

    2010-11-01

    Parallel excitation uses multiple transmit channels and coils, each driven by independent waveforms, to afford the pulse designer an additional spatial encoding mechanism that complements gradient encoding. In contrast to parallel reception, parallel excitation requires individual power amplifiers for each transmit channel, which can be cost prohibitive. Several groups have explored the use of low-cost power amplifiers for parallel excitation; however, such amplifiers commonly exhibit nonlinear memory effects that distort radio frequency pulses. This is especially true for pulses with rapidly varying envelopes, which are common in parallel excitation. To overcome this problem, we introduce a technique for parallel excitation pulse design that yields pulses with smoother envelopes. We demonstrate experimentally that pulses designed with the new technique suffer less amplifier distortion than unregularized pulses and pulses designed with conventional regularization.

  18. Code Optimization and Parallelization on the Origins: Looking from Users' Perspective

    NASA Technical Reports Server (NTRS)

    Chang, Yan-Tyng Sherry; Thigpen, William W. (Technical Monitor)

    2002-01-01

    Parallel machines are becoming the main compute engines for high performance computing. Despite their increasing popularity, it is still a challenge for most users to learn the basic techniques to optimize/parallelize their codes on such platforms. In this paper, we present some experiences on learning these techniques for the Origin systems at the NASA Advanced Supercomputing Division. Emphasis of this paper will be on a few essential issues (with examples) that general users should master when they work with the Origins as well as other parallel systems.

  19. Use of a parallel path nebulizer for capillary-based microseparation techniques coupled with an inductively coupled plasma mass spectrometer for speciation measurements

    NASA Astrophysics Data System (ADS)

    Yanes, Enrique G.; Miller-Ihli, Nancy J.

    2004-06-01

    A low flow, parallel path Mira Mist CE nebulizer designed for capillary electrophoresis (CE) was evaluated as a function of make-up solution flow rate, composition, and concentration, as well as the nebulizer gas flow rate. This research was conducted in support of a project related to the separation and quantification of cobalamin (vitamin B-12) species using microseparation techniques combined with inductively coupled plasma mass spectrometry (ICP-MS) detection. As such, Co signals were monitored during the nebulizer characterization process. Transient effects in the ICP were studied to evaluate the suitability of using gradients for microseparations and the benefit of using methanol for the make-up solution was demonstrated. Co signal response changed significantly as a function of changing methanol concentrations of the make-up solution and maximum signal enhancement was seen at 20% methanol with a 15 μl/min flow rate. Evaluation of the effect of changing the nebulizer gas flow rates showed that argon flows from 0.8 to 1.2 l/min were equally effective. The Mira Mist CE parallel path nebulizer was then evaluated for interfacing capillary microseparation techniques including capillary electrophoresis (CE) and micro high performance liquid chromatography (μHPLC) to inductively coupled plasma mass spectrometry (ICP-MS). A mixture of four cobalamin species standards (cyanocobalamin, hydroxocobalamin, methylcobalamin, and 5' deoxyadenosylcobalamin) and the corrinoid analogue cobinamide dicyanide were successfully separated using both CE-ICP-MS and μHPLC-ICP-MS using the parallel path nebulizer with a make-up solution containing 20% methanol with a flow rate of 15 μl/min.

  20. A phase-based stereo vision system-on-a-chip.

    PubMed

    Díaz, Javier; Ros, Eduardo; Sabatini, Silvio P; Solari, Fabio; Mota, Sonia

    2007-02-01

    A simple and fast technique for depth estimation based on phase measurement has been adopted for the implementation of a real-time stereo system with sub-pixel resolution on an FPGA device. The technique avoids the attendant problem of phase warping. The designed system takes full advantage of the inherent processing parallelism and segmentation capabilities of FPGA devices to achieve a computation speed of 65megapixels/s, which can be arranged with a customized frame-grabber module to process 211frames/s at a size of 640x480 pixels. The processing speed achieved is higher than conventional camera frame rates, thus allowing the system to extract multiple estimations and be used as a platform to evaluate integration schemes of a population of neurons without increasing hardware resource demands.

  1. Model-based tomographic reconstruction

    DOEpatents

    Chambers, David H; Lehman, Sean K; Goodman, Dennis M

    2012-06-26

    A model-based approach to estimating wall positions for a building is developed and tested using simulated data. It borrows two techniques from geophysical inversion problems, layer stripping and stacking, and combines them with a model-based estimation algorithm that minimizes the mean-square error between the predicted signal and the data. The technique is designed to process multiple looks from an ultra wideband radar array. The processed signal is time-gated and each section processed to detect the presence of a wall and estimate its position, thickness, and material parameters. The floor plan of a building is determined by moving the array around the outside of the building. In this paper we describe how the stacking and layer stripping algorithms are combined and show the results from a simple numerical example of three parallel walls.

  2. Benchmarking GPU and CPU codes for Heisenberg spin glass over-relaxation

    NASA Astrophysics Data System (ADS)

    Bernaschi, M.; Parisi, G.; Parisi, L.

    2011-06-01

    We present a set of possible implementations for Graphics Processing Units (GPU) of the Over-relaxation technique applied to the 3D Heisenberg spin glass model. The results show that a carefully tuned code can achieve more than 100 GFlops/s of sustained performance and update a single spin in about 0.6 nanoseconds. A multi-hit technique that exploits the GPU shared memory further reduces this time. Such results are compared with those obtained by means of a highly-tuned vector-parallel code on latest generation multi-core CPUs.

  3. [Three-dimensional parallel collagen scaffold promotes tendon extracellular matrix formation].

    PubMed

    Zheng, Zefeng; Shen, Weiliang; Le, Huihui; Dai, Xuesong; Ouyang, Hongwei; Chen, Weishan

    2016-03-01

    To investigate the effects of three-dimensional parallel collagen scaffold on the cell shape, arrangement and extracellular matrix formation of tendon stem cells. Parallel collagen scaffold was fabricated by unidirectional freezing technique, while random collagen scaffold was fabricated by freeze-drying technique. The effects of two scaffolds on cell shape and extracellular matrix formation were investigated in vitro by seeding tendon stem/progenitor cells and in vivo by ectopic implantation. Parallel and random collagen scaffolds were produced successfully. Parallel collagen scaffold was more akin to tendon than random collagen scaffold. Tendon stem/progenitor cells were spindle-shaped and unified orientated in parallel collagen scaffold, while cells on random collagen scaffold had disorder orientation. Two weeks after ectopic implantation, cells had nearly the same orientation with the collagen substance. In parallel collagen scaffold, cells had parallel arrangement, and more spindly cells were observed. By contrast, cells in random collagen scaffold were disorder. Parallel collagen scaffold can induce cells to be in spindly and parallel arrangement, and promote parallel extracellular matrix formation; while random collagen scaffold can induce cells in random arrangement. The results indicate that parallel collagen scaffold is an ideal structure to promote tendon repairing.

  4. Parallel algorithm of real-time infrared image restoration based on total variation theory

    NASA Astrophysics Data System (ADS)

    Zhu, Ran; Li, Miao; Long, Yunli; Zeng, Yaoyuan; An, Wei

    2015-10-01

    Image restoration is a necessary preprocessing step for infrared remote sensing applications. Traditional methods allow us to remove the noise but penalize too much the gradients corresponding to edges. Image restoration techniques based on variational approaches can solve this over-smoothing problem for the merits of their well-defined mathematical modeling of the restore procedure. The total variation (TV) of infrared image is introduced as a L1 regularization term added to the objective energy functional. It converts the restoration process to an optimization problem of functional involving a fidelity term to the image data plus a regularization term. Infrared image restoration technology with TV-L1 model exploits the remote sensing data obtained sufficiently and preserves information at edges caused by clouds. Numerical implementation algorithm is presented in detail. Analysis indicates that the structure of this algorithm can be easily implemented in parallelization. Therefore a parallel implementation of the TV-L1 filter based on multicore architecture with shared memory is proposed for infrared real-time remote sensing systems. Massive computation of image data is performed in parallel by cooperating threads running simultaneously on multiple cores. Several groups of synthetic infrared image data are used to validate the feasibility and effectiveness of the proposed parallel algorithm. Quantitative analysis of measuring the restored image quality compared to input image is presented. Experiment results show that the TV-L1 filter can restore the varying background image reasonably, and that its performance can achieve the requirement of real-time image processing.

  5. Highly Concurrent Scalar Processing.

    DTIC Science & Technology

    1986-01-01

    rearrangement arise from data dependencies between instructions, hence it is critical that artificial - dependencies are eliminated whenever possible...An important class of artificial depen- *. dencies arise due to register reuse. In the following example, no parallelism can be • . exploited in the...specific procedure call site. The use of inteligent procedure expansion techniques is expected to be crucial to the achievement of high performance

  6. Current status and future prospects for enabling chemistry technology in the drug discovery process

    PubMed Central

    Djuric, Stevan W.; Hutchins, Charles W.; Talaty, Nari N.

    2016-01-01

    This review covers recent advances in the implementation of enabling chemistry technologies into the drug discovery process. Areas covered include parallel synthesis chemistry, high-throughput experimentation, automated synthesis and purification methods, flow chemistry methodology including photochemistry, electrochemistry, and the handling of “dangerous” reagents. Also featured are advances in the “computer-assisted drug design” area and the expanding application of novel mass spectrometry-based techniques to a wide range of drug discovery activities. PMID:27781094

  7. Partitioning in parallel processing of production systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oflazer, K.

    1987-01-01

    This thesis presents research on certain issues related to parallel processing of production systems. It first presents a parallel production system interpreter that has been implemented on a four-processor multiprocessor. This parallel interpreter is based on Forgy's OPS5 interpreter and exploits production-level parallelism in production systems. Runs on the multiprocessor system indicate that it is possible to obtain speed-up of around 1.7 in the match computation for certain production systems when productions are split into three sets that are processed in parallel. The next issue addressed is that of partitioning a set of rules to processors in a parallel interpretermore » with production-level parallelism, and the extent of additional improvement in performance. The partitioning problem is formulated and an algorithm for approximate solutions is presented. The thesis next presents a parallel processing scheme for OPS5 production systems that allows some redundancy in the match computation. This redundancy enables the processing of a production to be divided into units of medium granularity each of which can be processed in parallel. Subsequently, a parallel processor architecture for implementing the parallel processing algorithm is presented.« less

  8. Development of iterative techniques for the solution of unsteady compressible viscous flows

    NASA Technical Reports Server (NTRS)

    Hixon, Duane; Sankar, L. N.

    1993-01-01

    During the past two decades, there has been significant progress in the field of numerical simulation of unsteady compressible viscous flows. At present, a variety of solution techniques exist such as the transonic small disturbance analyses (TSD), transonic full potential equation-based methods, unsteady Euler solvers, and unsteady Navier-Stokes solvers. These advances have been made possible by developments in three areas: (1) improved numerical algorithms; (2) automation of body-fitted grid generation schemes; and (3) advanced computer architectures with vector processing and massively parallel processing features. In this work, the GMRES scheme has been considered as a candidate for acceleration of a Newton iteration time marching scheme for unsteady 2-D and 3-D compressible viscous flow calculation; from preliminary calculations, this will provide up to a 65 percent reduction in the computer time requirements over the existing class of explicit and implicit time marching schemes. The proposed method has ben tested on structured grids, but is flexible enough for extension to unstructured grids. The described scheme has been tested only on the current generation of vector processor architecture of the Cray Y/MP class, but should be suitable for adaptation to massively parallel machines.

  9. Parallel processing considerations for image recognition tasks

    NASA Astrophysics Data System (ADS)

    Simske, Steven J.

    2011-01-01

    Many image recognition tasks are well-suited to parallel processing. The most obvious example is that many imaging tasks require the analysis of multiple images. From this standpoint, then, parallel processing need be no more complicated than assigning individual images to individual processors. However, there are three less trivial categories of parallel processing that will be considered in this paper: parallel processing (1) by task; (2) by image region; and (3) by meta-algorithm. Parallel processing by task allows the assignment of multiple workflows-as diverse as optical character recognition [OCR], document classification and barcode reading-to parallel pipelines. This can substantially decrease time to completion for the document tasks. For this approach, each parallel pipeline is generally performing a different task. Parallel processing by image region allows a larger imaging task to be sub-divided into a set of parallel pipelines, each performing the same task but on a different data set. This type of image analysis is readily addressed by a map-reduce approach. Examples include document skew detection and multiple face detection and tracking. Finally, parallel processing by meta-algorithm allows different algorithms to be deployed on the same image simultaneously. This approach may result in improved accuracy.

  10. Parallel MR imaging: a user's guide.

    PubMed

    Glockner, James F; Hu, Houchun H; Stanley, David W; Angelos, Lisa; King, Kevin

    2005-01-01

    Parallel imaging is a recently developed family of techniques that take advantage of the spatial information inherent in phased-array radiofrequency coils to reduce acquisition times in magnetic resonance imaging. In parallel imaging, the number of sampled k-space lines is reduced, often by a factor of two or greater, thereby significantly shortening the acquisition time. Parallel imaging techniques have only recently become commercially available, and the wide range of clinical applications is just beginning to be explored. The potential clinical applications primarily involve reduction in acquisition time, improved spatial resolution, or a combination of the two. Improvements in image quality can be achieved by reducing the echo train lengths of fast spin-echo and single-shot fast spin-echo sequences. Parallel imaging is particularly attractive for cardiac and vascular applications and will likely prove valuable as 3-T body and cardiovascular imaging becomes part of standard clinical practice. Limitations of parallel imaging include reduced signal-to-noise ratio and reconstruction artifacts. It is important to consider these limitations when deciding when to use these techniques. (c) RSNA, 2005.

  11. Multifaceted free-space image distributor for optical interconnects in massively parrallel processing

    NASA Astrophysics Data System (ADS)

    Zhao, Feng; Frietman, Edward E. E.; Han, Zhong; Chen, Ray T.

    1999-04-01

    A characteristic feature of a conventional von Neumann computer is that computing power is delivered by a single processing unit. Although increasing the clock frequency improves the performance of the computer, the switching speed of the semiconductor devices and the finite speed at which electrical signals propagate along the bus set the boundaries. Architectures containing large numbers of nodes can solve this performance dilemma, with the comment that main obstacles in designing such systems are caused by difficulties to come up with solutions that guarantee efficient communications among the nodes. Exchanging data becomes really a bottleneck should al nodes be connected by a shared resource. Only optics, due to its inherent parallelism, could solve that bottleneck. Here, we explore a multi-faceted free space image distributor to be used in optical interconnects in massively parallel processing. In this paper, physical and optical models of the image distributor are focused on from diffraction theory of light wave to optical simulations. the general features and the performance of the image distributor are also described. The new structure of an image distributor and the simulations for it are discussed. From the digital simulation and experiment, it is found that the multi-faceted free space image distributing technique is quite suitable for free space optical interconnection in massively parallel processing and new structure of the multifaceted free space image distributor would perform better.

  12. Storing files in a parallel computing system based on user or application specification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Faibish, Sorin; Bent, John M.; Nick, Jeffrey M.

    2016-03-29

    Techniques are provided for storing files in a parallel computing system based on a user-specification. A plurality of files generated by a distributed application in a parallel computing system are stored by obtaining a specification from the distributed application indicating how the plurality of files should be stored; and storing one or more of the plurality of files in one or more storage nodes of a multi-tier storage system based on the specification. The plurality of files comprise a plurality of complete files and/or a plurality of sub-files. The specification can optionally be processed by a daemon executing on onemore » or more nodes in a multi-tier storage system. The specification indicates how the plurality of files should be stored, for example, identifying one or more storage nodes where the plurality of files should be stored.« less

  13. Storing files in a parallel computing system using list-based index to identify replica files

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Faibish, Sorin; Bent, John M.; Tzelnic, Percy

    Improved techniques are provided for storing files in a parallel computing system using a list-based index to identify file replicas. A file and at least one replica of the file are stored in one or more storage nodes of the parallel computing system. An index for the file comprises at least one list comprising a pointer to a storage location of the file and a storage location of the at least one replica of the file. The file comprises one or more of a complete file and one or more sub-files. The index may also comprise a checksum value formore » one or more of the file and the replica(s) of the file. The checksum value can be evaluated to validate the file and/or the file replica(s). A query can be processed using the list.« less

  14. Domain Decomposition By the Advancing-Partition Method for Parallel Unstructured Grid Generation

    NASA Technical Reports Server (NTRS)

    Pirzadeh, Shahyar Z.; Zagaris, George

    2009-01-01

    A new method of domain decomposition has been developed for generating unstructured grids in subdomains either sequentially or using multiple computers in parallel. Domain decomposition is a crucial and challenging step for parallel grid generation. Prior methods are generally based on auxiliary, complex, and computationally intensive operations for defining partition interfaces and usually produce grids of lower quality than those generated in single domains. The new technique, referred to as "Advancing Partition," is based on the Advancing-Front method, which partitions a domain as part of the volume mesh generation in a consistent and "natural" way. The benefits of this approach are: 1) the process of domain decomposition is highly automated, 2) partitioning of domain does not compromise the quality of the generated grids, and 3) the computational overhead for domain decomposition is minimal. The new method has been implemented in NASA's unstructured grid generation code VGRID.

  15. The Development of Models for Carbon Dioxide Reduction Technologies for Spacecraft Air Revitalization

    NASA Technical Reports Server (NTRS)

    Swickrath, Michael J.; Anderson, Molly

    2012-01-01

    Through the respiration process, humans consume oxygen (O2) while producing carbon dioxide (CO2) and water (H2O) as byproducts. For long term space exploration, CO2 concentration in the atmosphere must be managed to prevent hypercapnia. Moreover, CO2 can be used as a source of oxygen through chemical reduction serving to minimize the amount of oxygen required at launch. Reduction can be achieved through a number of techniques. NASA is currently exploring the Sabatier reaction, the Bosch reaction, and co- electrolysis of CO2 and H2O for this process. Proof-of-concept experiments and prototype units for all three processes have proven capable of returning useful commodities for space exploration. All three techniques have demonstrated the capacity to reduce CO2 in the laboratory, yet there is interest in understanding how all three techniques would perform at a system level within a spacecraft. Consequently, there is an impetus to develop predictive models for these processes that can be readily rescaled and integrated into larger system models. Such analysis tools provide the ability to evaluate each technique on a comparable basis with respect to processing rates. This manuscript describes the current models for the carbon dioxide reduction processes under parallel developmental efforts. Comparison to experimental data is provided were available for verification purposes.

  16. Parallel distributed, reciprocal Monte Carlo radiation in coupled, large eddy combustion simulations

    NASA Astrophysics Data System (ADS)

    Hunsaker, Isaac L.

    Radiation is the dominant mode of heat transfer in high temperature combustion environments. Radiative heat transfer affects the gas and particle phases, including all the associated combustion chemistry. The radiative properties are in turn affected by the turbulent flow field. This bi-directional coupling of radiation turbulence interactions poses a major challenge in creating parallel-capable, high-fidelity combustion simulations. In this work, a new model was developed in which reciprocal monte carlo radiation was coupled with a turbulent, large-eddy simulation combustion model. A technique wherein domain patches are stitched together was implemented to allow for scalable parallelism. The combustion model runs in parallel on a decomposed domain. The radiation model runs in parallel on a recomposed domain. The recomposed domain is stored on each processor after information sharing of the decomposed domain is handled via the message passing interface. Verification and validation testing of the new radiation model were favorable. Strong scaling analyses were performed on the Ember cluster and the Titan cluster for the CPU-radiation model and GPU-radiation model, respectively. The model demonstrated strong scaling to over 1,700 and 16,000 processing cores on Ember and Titan, respectively.

  17. Computers for symbolic processing

    NASA Technical Reports Server (NTRS)

    Wah, Benjamin W.; Lowrie, Matthew B.; Li, Guo-Jie

    1989-01-01

    A detailed survey on the motivations, design, applications, current status, and limitations of computers designed for symbolic processing is provided. Symbolic processing computations are performed at the word, relation, or meaning levels, and the knowledge used in symbolic applications may be fuzzy, uncertain, indeterminate, and ill represented. Various techniques for knowledge representation and processing are discussed from both the designers' and users' points of view. The design and choice of a suitable language for symbolic processing and the mapping of applications into a software architecture are then considered. The process of refining the application requirements into hardware and software architectures is treated, and state-of-the-art sequential and parallel computers designed for symbolic processing are discussed.

  18. A wireless data acquisition system for acoustic emission testing

    NASA Astrophysics Data System (ADS)

    Zimmerman, A. T.; Lynch, J. P.

    2013-01-01

    As structural health monitoring (SHM) systems have seen increased demand due to lower costs and greater capabilities, wireless technologies have emerged that enable the dense distribution of transducers and the distributed processing of sensor data. In parallel, ultrasonic techniques such as acoustic emission (AE) testing have become increasingly popular in the non-destructive evaluation of materials and structures. These techniques, which involve the analysis of frequency content between 1 kHz and 1 MHz, have proven effective in detecting the onset of cracking and other early-stage failure in active structures such as airplanes in flight. However, these techniques typically involve the use of expensive and bulky monitoring equipment capable of accurately sensing AE signals at sampling rates greater than 1 million samples per second. In this paper, a wireless data acquisition system is presented that is capable of collecting, storing, and processing AE data at rates of up to 20 MHz. Processed results can then be wirelessly transmitted in real-time, creating a system that enables the use of ultrasonic techniques in large-scale SHM systems.

  19. High performance compression of science data

    NASA Technical Reports Server (NTRS)

    Storer, James A.; Cohn, Martin

    1994-01-01

    Two papers make up the body of this report. One presents a single-pass adaptive vector quantization algorithm that learns a codebook of variable size and shape entries; the authors present experiments on a set of test images showing that with no training or prior knowledge of the data, for a given fidelity, the compression achieved typically equals or exceeds that of the JPEG standard. The second paper addresses motion compensation, one of the most effective techniques used in the interframe data compression. A parallel block-matching algorithm for estimating interframe displacement of blocks with minimum error is presented. The algorithm is designed for a simple parallel architecture to process video in real time.

  20. Scalable Evaluation of Polarization Energy and Associated Forces in Polarizable Molecular Dynamics: II.Towards Massively Parallel Computations using Smooth Particle Mesh Ewald.

    PubMed

    Lagardère, Louis; Lipparini, Filippo; Polack, Étienne; Stamm, Benjamin; Cancès, Éric; Schnieders, Michael; Ren, Pengyu; Maday, Yvon; Piquemal, Jean-Philip

    2014-02-28

    In this paper, we present a scalable and efficient implementation of point dipole-based polarizable force fields for molecular dynamics (MD) simulations with periodic boundary conditions (PBC). The Smooth Particle-Mesh Ewald technique is combined with two optimal iterative strategies, namely, a preconditioned conjugate gradient solver and a Jacobi solver in conjunction with the Direct Inversion in the Iterative Subspace for convergence acceleration, to solve the polarization equations. We show that both solvers exhibit very good parallel performances and overall very competitive timings in an energy-force computation needed to perform a MD step. Various tests on large systems are provided in the context of the polarizable AMOEBA force field as implemented in the newly developed Tinker-HP package which is the first implementation for a polarizable model making large scale experiments for massively parallel PBC point dipole models possible. We show that using a large number of cores offers a significant acceleration of the overall process involving the iterative methods within the context of spme and a noticeable improvement of the memory management giving access to very large systems (hundreds of thousands of atoms) as the algorithm naturally distributes the data on different cores. Coupled with advanced MD techniques, gains ranging from 2 to 3 orders of magnitude in time are now possible compared to non-optimized, sequential implementations giving new directions for polarizable molecular dynamics in periodic boundary conditions using massively parallel implementations.

  1. Scalable Evaluation of Polarization Energy and Associated Forces in Polarizable Molecular Dynamics: II.Towards Massively Parallel Computations using Smooth Particle Mesh Ewald

    PubMed Central

    Lagardère, Louis; Lipparini, Filippo; Polack, Étienne; Stamm, Benjamin; Cancès, Éric; Schnieders, Michael; Ren, Pengyu; Maday, Yvon; Piquemal, Jean-Philip

    2015-01-01

    In this paper, we present a scalable and efficient implementation of point dipole-based polarizable force fields for molecular dynamics (MD) simulations with periodic boundary conditions (PBC). The Smooth Particle-Mesh Ewald technique is combined with two optimal iterative strategies, namely, a preconditioned conjugate gradient solver and a Jacobi solver in conjunction with the Direct Inversion in the Iterative Subspace for convergence acceleration, to solve the polarization equations. We show that both solvers exhibit very good parallel performances and overall very competitive timings in an energy-force computation needed to perform a MD step. Various tests on large systems are provided in the context of the polarizable AMOEBA force field as implemented in the newly developed Tinker-HP package which is the first implementation for a polarizable model making large scale experiments for massively parallel PBC point dipole models possible. We show that using a large number of cores offers a significant acceleration of the overall process involving the iterative methods within the context of spme and a noticeable improvement of the memory management giving access to very large systems (hundreds of thousands of atoms) as the algorithm naturally distributes the data on different cores. Coupled with advanced MD techniques, gains ranging from 2 to 3 orders of magnitude in time are now possible compared to non-optimized, sequential implementations giving new directions for polarizable molecular dynamics in periodic boundary conditions using massively parallel implementations. PMID:26512230

  2. FPGA-based protein sequence alignment : A review

    NASA Astrophysics Data System (ADS)

    Isa, Mohd. Nazrin Md.; Muhsen, Ku Noor Dhaniah Ku; Saiful Nurdin, Dayana; Ahmad, Muhammad Imran; Anuar Zainol Murad, Sohiful; Nizam Mohyar, Shaiful; Harun, Azizi; Hussin, Razaidi

    2017-11-01

    Sequence alignment have been optimized using several techniques in order to accelerate the computation time to obtain the optimal score by implementing DP-based algorithm into hardware such as FPGA-based platform. During hardware implementation, there will be performance challenges such as the frequent memory access and highly data dependent in computation process. Therefore, investigation in processing element (PE) configuration where involves more on memory access in load or access the data (substitution matrix, query sequence character) and the PE configuration time will be the main focus in this paper. There are various approaches to enhance the PE configuration performance that have been done in previous works such as by using serial configuration chain and parallel configuration chain i.e. the configuration data will be loaded into each PEs sequentially and simultaneously respectively. Some researchers have proven that the performance using parallel configuration chain has optimized both the configuration time and area.

  3. DInSAR time series generation within a cloud computing environment: from ERS to Sentinel-1 scenario

    NASA Astrophysics Data System (ADS)

    Casu, Francesco; Elefante, Stefano; Imperatore, Pasquale; Lanari, Riccardo; Manunta, Michele; Zinno, Ivana; Mathot, Emmanuel; Brito, Fabrice; Farres, Jordi; Lengert, Wolfgang

    2013-04-01

    One of the techniques that will strongly benefit from the advent of the Sentinel-1 system is Differential SAR Interferometry (DInSAR), which has successfully demonstrated to be an effective tool to detect and monitor ground displacements with centimetre accuracy. The geoscience communities (volcanology, seismicity, …), as well as those related to hazard monitoring and risk mitigation, make extensively use of the DInSAR technique and they will take advantage from the huge amount of SAR data acquired by Sentinel-1. Indeed, such an information will successfully permit the generation of Earth's surface displacement maps and time series both over large areas and long time span. However, the issue of managing, processing and analysing the large Sentinel data stream is envisaged by the scientific community to be a major bottleneck, particularly during crisis phases. The emerging need of creating a common ecosystem in which data, results and processing tools are shared, is envisaged to be a successful way to address such a problem and to contribute to the information and knowledge spreading. The Supersites initiative as well as the ESA SuperSites Exploitation Platform (SSEP) and the ESA Cloud Computing Operational Pilot (CIOP) projects provide effective answers to this need and they are pushing towards the development of such an ecosystem. It is clear that all the current and existent tools for querying, processing and analysing SAR data are required to be not only updated for managing the large data stream of Sentinel-1 satellite, but also reorganized for quickly replying to the simultaneous and highly demanding user requests, mainly during emergency situations. This translates into the automatic and unsupervised processing of large amount of data as well as the availability of scalable, widely accessible and high performance computing capabilities. The cloud computing environment permits to achieve all of these objectives, particularly in case of spike and peak requests of processing resources linked to disaster events. This work aims at presenting a parallel computational model for the widely used DInSAR algorithm named as Small BAseline Subset (SBAS), which has been implemented within the cloud computing environment provided by the ESA-CIOP platform. This activity has resulted in developing a scalable, unsupervised, portable, and widely accessible (through a web portal) parallel DInSAR computational tool. The activity has rewritten and developed the SBAS application algorithm within a parallel system environment, i.e., in a form that allows us to benefit from multiple processing units. This requires the devising a parallel version of the SBAS algorithm and its subsequent implementation, implying additional complexity in algorithm designing and an efficient multi processor programming, with the final aim of a parallel performance optimization. Although the presented algorithm has been designed to work with Sentinel-1 data, it can also process other satellite SAR data (ERS, ENVISAT, CSK, TSX, ALOS). Indeed, the performance analysis of the implemented SBAS parallel version has been tested on the full ASAR archive (64 acquisitions) acquired over the Napoli Bay, a volcanic and densely urbanized area in Southern Italy. The full processing - from the raw data download to the generation of DInSAR time series - has been carried out by engaging 4 nodes, each one with 2 cores and 16 GB of RAM, and has taken about 36 hours, with respect to about 135 hours of the sequential version. Extensive analysis on other test areas significant from DInSAR and geophysical viewpoint will be presented. Finally, preliminary performance evaluation of the presented approach within the Sentinel-1 scenario will be provided.

  4. Novel casting processes for single-crystal turbine blades of superalloys

    NASA Astrophysics Data System (ADS)

    Ma, Dexin

    2018-03-01

    This paper presents a brief review of the current casting techniques for single-crystal (SC) blades, as well as an analysis of the solidification process in complex turbine blades. A series of novel casting methods based on the Bridgman process were presented to illustrate the development in the production of SC blades from superalloys. The grain continuator and the heat conductor techniques were developed to remove geometry-related grain defects. In these techniques, the heat barrier that hinders lateral SC growth from the blade airfoil into the extremities of the platform is minimized. The parallel heating and cooling system was developed to achieve symmetric thermal conditions for SC solidification in blade clusters, thus considerably decreasing the negative shadow effect and its related defects in the current Bridgman process. The dipping and heaving technique, in which thinshell molds are utilized, was developed to enable the establishment of a high temperature gradient for SC growth and the freckle-free solidification of superalloy castings. Moreover, by applying the targeted cooling and heating technique, a novel concept for the three-dimensional and precise control of SC growth, a proper thermal arrangement may be dynamically established for the microscopic control of SC growth in the critical areas of large industrial gas turbine blades.

  5. The computer-aided parallel external fixator for complex lower limb deformity correction.

    PubMed

    Wei, Mengting; Chen, Jianwen; Guo, Yue; Sun, Hao

    2017-12-01

    Since parameters of the parallel external fixator are difficult to measure and calculate in real applications, this study developed computer software that can help the doctor measure parameters using digital technology and generate an electronic prescription for deformity correction. According to Paley's deformity measurement method, we provided digital measurement techniques. In addition, we proposed an deformity correction algorithm to calculate the elongations of the six struts and developed a electronic prescription software. At the same time, a three-dimensional simulation of the parallel external fixator and deformed fragment was made using virtual reality modeling language technology. From 2013 to 2015, fifteen patients with complex lower limb deformity were treated with parallel external fixators and the self-developed computer software. All of the cases had unilateral limb deformity. The deformities were caused by old osteomyelitis in nine cases and traumatic sequelae in six cases. A doctor measured the related angulation, displacement and rotation on postoperative radiographs using the digital measurement techniques. Measurement data were input into the electronic prescription software to calculate the daily adjustment elongations of the struts. Daily strut adjustments were conducted according to the data calculated. The frame was removed when expected results were achieved. Patients lived independently during the adjustment. The mean follow-up was 15 months (range 10-22 months). The duration of frame fixation from the time of application to the time of removal averaged 8.4 months (range 2.5-13.1 months). All patients were satisfied with the corrected limb alignment. No cases of wound infections or complications occurred. Using the computer-aided parallel external fixator for the correction of lower limb deformities can achieve satisfactory outcomes. The correction process can be simplified and is precise and digitized, which will greatly improve the treatment in a clinical application.

  6. Knowledge Support and Automation for Performance Analysis with PerfExplorer 2.0

    DOE PAGES

    Huck, Kevin A.; Malony, Allen D.; Shende, Sameer; ...

    2008-01-01

    The integration of scalable performance analysis in parallel development tools is difficult. The potential size of data sets and the need to compare results from multiple experiments presents a challenge to manage and process the information. Simply to characterize the performance of parallel applications running on potentially hundreds of thousands of processor cores requires new scalable analysis techniques. Furthermore, many exploratory analysis processes are repeatable and could be automated, but are now implemented as manual procedures. In this paper, we will discuss the current version of PerfExplorer, a performance analysis framework which provides dimension reduction, clustering and correlation analysis ofmore » individual trails of large dimensions, and can perform relative performance analysis between multiple application executions. PerfExplorer analysis processes can be captured in the form of Python scripts, automating what would otherwise be time-consuming tasks. We will give examples of large-scale analysis results, and discuss the future development of the framework, including the encoding and processing of expert performance rules, and the increasing use of performance metadata.« less

  7. 3-D modeling of ductile tearing using finite elements: Computational aspects and techniques

    NASA Astrophysics Data System (ADS)

    Gullerud, Arne Stewart

    This research focuses on the development and application of computational tools to perform large-scale, 3-D modeling of ductile tearing in engineering components under quasi-static to mild loading rates. Two standard models for ductile tearing---the computational cell methodology and crack growth controlled by the crack tip opening angle (CTOA)---are described and their 3-D implementations are explored. For the computational cell methodology, quantification of the effects of several numerical issues---computational load step size, procedures for force release after cell deletion, and the porosity for cell deletion---enables construction of computational algorithms to remove the dependence of predicted crack growth on these issues. This work also describes two extensions of the CTOA approach into 3-D: a general 3-D method and a constant front technique. Analyses compare the characteristics of the extensions, and a validation study explores the ability of the constant front extension to predict crack growth in thin aluminum test specimens over a range of specimen geometries, absolutes sizes, and levels of out-of-plane constraint. To provide a computational framework suitable for the solution of these problems, this work also describes the parallel implementation of a nonlinear, implicit finite element code. The implementation employs an explicit message-passing approach using the MPI standard to maintain portability, a domain decomposition of element data to provide parallel execution, and a master-worker organization of the computational processes to enhance future extensibility. A linear preconditioned conjugate gradient (LPCG) solver serves as the core of the solution process. The parallel LPCG solver utilizes an element-by-element (EBE) structure of the computations to permit a dual-level decomposition of the element data: domain decomposition of the mesh provides efficient coarse-grain parallel execution, while decomposition of the domains into blocks of similar elements (same type, constitutive model, etc.) provides fine-grain parallel computation on each processor. A major focus of the LPCG solver is a new implementation of the Hughes-Winget element-by-element (HW) preconditioner. The implementation employs a weighted dependency graph combined with a new coloring algorithm to provide load-balanced scheduling for the preconditioner and overlapped communication/computation. This approach enables efficient parallel application of the HW preconditioner for arbitrary unstructured meshes.

  8. Rotation and scale change invariant point pattern relaxation matching by the Hopfield neural network

    NASA Astrophysics Data System (ADS)

    Sang, Nong; Zhang, Tianxu

    1997-12-01

    Relaxation matching is one of the most relevant methods for image matching. The original relaxation matching technique using point patterns is sensitive to rotations and scale changes. We improve the original point pattern relaxation matching technique to be invariant to rotations and scale changes. A method that makes the Hopfield neural network perform this matching process is discussed. An advantage of this is that the relaxation matching process can be performed in real time with the neural network's massively parallel capability to process information. Experimental results with large simulated images demonstrate the effectiveness and feasibility of the method to perform point patten relaxation matching invariant to rotations and scale changes and the method to perform this matching by the Hopfield neural network. In addition, we show that the method presented can be tolerant to small random error.

  9. Metal-assisted exfoliation (MAE): green process for transferring graphene to flexible substrates and templating of sub-nanometer plasmonic gaps (Presentation Recording)

    NASA Astrophysics Data System (ADS)

    Zaretski, Aliaksandr V.; Marin, Brandon C.; Moetazedi, Herad; Dill, Tyler J.; Jibril, Liban; Kong, Casey; Tao, Andrea R.; Lipomi, Darren J.

    2015-09-01

    This paper describes a new technique, termed "metal-assisted exfoliation," for the scalable transfer of graphene from catalytic copper foils to flexible polymeric supports. The process is amenable to roll-to-roll manufacturing, and the copper substrate can be recycled. We then demonstrate the use of single-layer graphene as a template for the formation of sub-nanometer plasmonic gaps using a scalable fabrication process called "nanoskiving." These gaps are formed between parallel gold nanowires in a process that first produces three-layer thin films with the architecture gold/single-layer graphene/gold, and then sections the composite films with an ultramicrotome. The structures produced can be treated as two gold nanowires separated along their entire lengths by an atomically thin graphene nanoribbon. Oxygen plasma etches the sandwiched graphene to a finite depth; this action produces a sub-nanometer gap near the top surface of the junction between the wires that is capable of supporting highly confined optical fields. The confinement of light is confirmed by surface-enhanced Raman spectroscopy measurements, which indicate that the enhancement of the electric field arises from the junction between the gold nanowires. These experiments demonstrate nanoskiving as a unique and easy-to-implement fabrication technique that is capable of forming sub-nanometer plasmonic gaps between parallel metallic nanostructures over long, macroscopic distances. These structures could be valuable for fundamental investigations as well as applications in plasmonics and molecular electronics.

  10. Investigating the Role of Global Histogram Equalization Technique for 99mTechnetium-Methylene diphosphonate Bone Scan Image Enhancement.

    PubMed

    Pandey, Anil Kumar; Sharma, Param Dev; Dheer, Pankaj; Parida, Girish Kumar; Goyal, Harish; Patel, Chetan; Bal, Chandrashekhar; Kumar, Rakesh

    2017-01-01

    99m Technetium-methylene diphosphonate ( 99m Tc-MDP) bone scan images have limited number of counts per pixel, and hence, they have inferior image quality compared to X-rays. Theoretically, global histogram equalization (GHE) technique can improve the contrast of a given image though practical benefits of doing so have only limited acceptance. In this study, we have investigated the effect of GHE technique for 99m Tc-MDP-bone scan images. A set of 89 low contrast 99m Tc-MDP whole-body bone scan images were included in this study. These images were acquired with parallel hole collimation on Symbia E gamma camera. The images were then processed with histogram equalization technique. The image quality of input and processed images were reviewed by two nuclear medicine physicians on a 5-point scale where score of 1 is for very poor and 5 is for the best image quality. A statistical test was applied to find the significance of difference between the mean scores assigned to input and processed images. This technique improves the contrast of the images; however, oversaturation was noticed in the processed images. Student's t -test was applied, and a statistically significant difference in the input and processed image quality was found at P < 0.001 (with α = 0.05). However, further improvement in image quality is needed as per requirements of nuclear medicine physicians. GHE techniques can be used on low contrast bone scan images. In some of the cases, a histogram equalization technique in combination with some other postprocessing technique is useful.

  11. Quakefinder: A scalable data mining system for detecting earthquakes from space

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stolorz, P.; Dean, C.

    1996-12-31

    We present an application of novel massively parallel datamining techniques to highly precise inference of important physical processes from remote sensing imagery. Specifically, we have developed and applied a system, Quakefinder, that automatically detects and measures tectonic activity in the earth`s crust by examination of satellite data. We have used Quakefinder to automatically map the direction and magnitude of ground displacements due to the 1992 Landers earthquake in Southern California, over a spatial region of several hundred square kilometers, at a resolution of 10 meters, to a (sub-pixel) precision of 1 meter. This is the first calculation that has evermore » been able to extract area-mapped information about 2D tectonic processes at this level of detail. We outline the architecture of the Quakefinder system, based upon a combination of techniques drawn from the fields of statistical inference, massively parallel computing and global optimization. We confirm the overall correctness of the procedure by comparison of our results with known locations of targeted faults obtained by careful and time-consuming field measurements. The system also performs knowledge discovery by indicating novel unexplained tectonic activity away from the primary faults that has never before been observed. We conclude by discussing the future potential of this data mining system in the broad context of studying subtle spatio-temporal processes within massive image streams.« less

  12. Using parallel evolutionary development for a biologically-inspired computer vision system for mobile robots.

    PubMed

    Wright, Cameron H G; Barrett, Steven F; Pack, Daniel J

    2005-01-01

    We describe a new approach to attacking the problem of robust computer vision for mobile robots. The overall strategy is to mimic the biological evolution of animal vision systems. Our basic imaging sensor is based upon the eye of the common house fly, Musca domestica. The computational algorithms are a mix of traditional image processing, subspace techniques, and multilayer neural networks.

  13. Fast Computation and Assessment Methods in Power System Analysis

    NASA Astrophysics Data System (ADS)

    Nagata, Masaki

    Power system analysis is essential for efficient and reliable power system operation and control. Recently, online security assessment system has become of importance, as more efficient use of power networks is eagerly required. In this article, fast power system analysis techniques such as contingency screening, parallel processing and intelligent systems application are briefly surveyed from the view point of their application to online dynamic security assessment.

  14. Reliable Early Classification on Multivariate Time Series with Numerical and Categorical Attributes

    DTIC Science & Technology

    2015-05-22

    design a procedure of feature extraction in REACT named MEG (Mining Equivalence classes with shapelet Generators) based on the concept of...Equivalence Classes Mining [12, 15]. MEG can efficiently and effectively generate the discriminative features. In addition, several strategies are proposed...technique of parallel computing [4] to propose a process of pa- rallel MEG for substantially reducing the computational overhead of discovering shapelet

  15. The Distributed Diagonal Force Decomposition Method for Parallelizing Molecular Dynamics Simulations

    PubMed Central

    Boršnik, Urban; Miller, Benjamin T.; Brooks, Bernard R.; Janežič, Dušanka

    2011-01-01

    Parallelization is an effective way to reduce the computational time needed for molecular dynamics simulations. We describe a new parallelization method, the distributed-diagonal force decomposition method, with which we extend and improve the existing force decomposition methods. Our new method requires less data communication during molecular dynamics simulations than replicated data and current force decomposition methods, increasing the parallel efficiency. It also dynamically load-balances the processors' computational load throughout the simulation. The method is readily implemented in existing molecular dynamics codes and it has been incorporated into the CHARMM program, allowing its immediate use in conjunction with the many molecular dynamics simulation techniques that are already present in the program. We also present the design of the Force Decomposition Machine, a cluster of personal computers and networks that is tailored to running molecular dynamics simulations using the distributed diagonal force decomposition method. The design is expandable and provides various degrees of fault resilience. This approach is easily adaptable to computers with Graphics Processing Units because it is independent of the processor type being used. PMID:21793007

  16. Decentralized diagnostics based on a distributed micro-genetic algorithm for transducer networks monitoring large experimental systems.

    PubMed

    Arpaia, P; Cimmino, P; Girone, M; La Commara, G; Maisto, D; Manna, C; Pezzetti, M

    2014-09-01

    Evolutionary approach to centralized multiple-faults diagnostics is extended to distributed transducer networks monitoring large experimental systems. Given a set of anomalies detected by the transducers, each instance of the multiple-fault problem is formulated as several parallel communicating sub-tasks running on different transducers, and thus solved one-by-one on spatially separated parallel processes. A micro-genetic algorithm merges evaluation time efficiency, arising from a small-size population distributed on parallel-synchronized processors, with the effectiveness of centralized evolutionary techniques due to optimal mix of exploitation and exploration. In this way, holistic view and effectiveness advantages of evolutionary global diagnostics are combined with reliability and efficiency benefits of distributed parallel architectures. The proposed approach was validated both (i) by simulation at CERN, on a case study of a cold box for enhancing the cryogeny diagnostics of the Large Hadron Collider, and (ii) by experiments, under the framework of the industrial research project MONDIEVOB (Building Remote Monitoring and Evolutionary Diagnostics), co-funded by EU and the company Del Bo srl, Napoli, Italy.

  17. Integrating Cache Performance Modeling and Tuning Support in Parallelization Tools

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)

    1998-01-01

    With the resurgence of distributed shared memory (DSM) systems based on cache-coherent Non Uniform Memory Access (ccNUMA) architectures and increasing disparity between memory and processors speeds, data locality overheads are becoming the greatest bottlenecks in the way of realizing potential high performance of these systems. While parallelization tools and compilers facilitate the users in porting their sequential applications to a DSM system, a lot of time and effort is needed to tune the memory performance of these applications to achieve reasonable speedup. In this paper, we show that integrating cache performance modeling and tuning support within a parallelization environment can alleviate this problem. The Cache Performance Modeling and Prediction Tool (CPMP), employs trace-driven simulation techniques without the overhead of generating and managing detailed address traces. CPMP predicts the cache performance impact of source code level "what-if" modifications in a program to assist a user in the tuning process. CPMP is built on top of a customized version of the Computer Aided Parallelization Tools (CAPTools) environment. Finally, we demonstrate how CPMP can be applied to tune a real Computational Fluid Dynamics (CFD) application.

  18. Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System.

    PubMed

    Liu, Yu; Hong, Yang; Lin, Chun-Yuan; Hung, Che-Lun

    2015-01-01

    The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted the graphic card with Graphic Processing Units (GPUs) and their associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on the protein database search by using the intertask parallelization technique, and only using the GPU capability to do the SW computations one by one. Hence, in this paper, we will propose an efficient SW alignment method, called CUDA-SWfr, for the protein database search by using the intratask parallelization technique based on a CPU-GPU collaborative system. Before doing the SW computations on GPU, a procedure is applied on CPU by using the frequency distance filtration scheme (FDFS) to eliminate the unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively.

  19. Calibrationless parallel magnetic resonance imaging: a joint sparsity model.

    PubMed

    Majumdar, Angshul; Chaudhury, Kunal Narayan; Ward, Rabab

    2013-12-05

    State-of-the-art parallel MRI techniques either explicitly or implicitly require certain parameters to be estimated, e.g., the sensitivity map for SENSE, SMASH and interpolation weights for GRAPPA, SPIRiT. Thus all these techniques are sensitive to the calibration (parameter estimation) stage. In this work, we have proposed a parallel MRI technique that does not require any calibration but yields reconstruction results that are at par with (or even better than) state-of-the-art methods in parallel MRI. Our proposed method required solving non-convex analysis and synthesis prior joint-sparsity problems. This work also derives the algorithms for solving them. Experimental validation was carried out on two datasets-eight channel brain and eight channel Shepp-Logan phantom. Two sampling methods were used-Variable Density Random sampling and non-Cartesian Radial sampling. For the brain data, acceleration factor of 4 was used and for the other an acceleration factor of 6 was used. The reconstruction results were quantitatively evaluated based on the Normalised Mean Squared Error between the reconstructed image and the originals. The qualitative evaluation was based on the actual reconstructed images. We compared our work with four state-of-the-art parallel imaging techniques; two calibrated methods-CS SENSE and l1SPIRiT and two calibration free techniques-Distributed CS and SAKE. Our method yields better reconstruction results than all of them.

  20. CLAss-Specific Subspace Kernel Representations and Adaptive Margin Slack Minimization for Large Scale Classification.

    PubMed

    Yu, Yinan; Diamantaras, Konstantinos I; McKelvey, Tomas; Kung, Sun-Yuan

    2018-02-01

    In kernel-based classification models, given limited computational power and storage capacity, operations over the full kernel matrix becomes prohibitive. In this paper, we propose a new supervised learning framework using kernel models for sequential data processing. The framework is based on two components that both aim at enhancing the classification capability with a subset selection scheme. The first part is a subspace projection technique in the reproducing kernel Hilbert space using a CLAss-specific Subspace Kernel representation for kernel approximation. In the second part, we propose a novel structural risk minimization algorithm called the adaptive margin slack minimization to iteratively improve the classification accuracy by an adaptive data selection. We motivate each part separately, and then integrate them into learning frameworks for large scale data. We propose two such frameworks: the memory efficient sequential processing for sequential data processing and the parallelized sequential processing for distributed computing with sequential data acquisition. We test our methods on several benchmark data sets and compared with the state-of-the-art techniques to verify the validity of the proposed techniques.

  1. Ultrasonic and radiographic evaluation of advanced aerospace materials: Ceramic composites

    NASA Technical Reports Server (NTRS)

    Generazio, Edward R.

    1990-01-01

    Two conventional nondestructive evaluation techniques were used to evaluate advanced ceramic composite materials. It was shown that neither ultrasonic C-scan nor radiographic imaging can individually provide sufficient data for an accurate nondestructive evaluation. Both ultrasonic C-scan and conventional radiographic imaging are required for preliminary evaluation of these complex systems. The material variations that were identified by these two techniques are porosity, delaminations, bond quality between laminae, fiber alignment, fiber registration, fiber parallelism, and processing density flaws. The degree of bonding between fiber and matrix cannot be determined by either of these methods. An alternative ultrasonic technique, angular power spectrum scanning (APSS) is recommended for quantification of this interfacial bond.

  2. Peak-picking fundamental period estimation for hearing prostheses.

    PubMed

    Howard, D M

    1989-09-01

    A real-time peak-picking fundamental period estimation device is described which is used in advanced hearing prostheses for the totally and profoundly deafened. The operation of the peak picker is compared with three well-established fundamental frequency estimation techniques: the electrolaryngograph, which is used as a "standard" hardware implementations of the cepstral technique, and the Gold/Rabiner parallel processing algorithm. These comparisons illustrate and highlight some of the important advantages and disadvantages that characterize the operation of these techniques. The special requirements of the hearing prostheses are discussed with respect to the operation of each device, and the choice of the peak picker is found to be felicitous in this application.

  3. Application of advanced multidisciplinary analysis and optimization methods to vehicle design synthesis

    NASA Technical Reports Server (NTRS)

    Consoli, Robert David; Sobieszczanski-Sobieski, Jaroslaw

    1990-01-01

    Advanced multidisciplinary analysis and optimization methods, namely system sensitivity analysis and non-hierarchical system decomposition, are applied to reduce the cost and improve the visibility of an automated vehicle design synthesis process. This process is inherently complex due to the large number of functional disciplines and associated interdisciplinary couplings. Recent developments in system sensitivity analysis as applied to complex non-hierarchic multidisciplinary design optimization problems enable the decomposition of these complex interactions into sub-processes that can be evaluated in parallel. The application of these techniques results in significant cost, accuracy, and visibility benefits for the entire design synthesis process.

  4. Some thoughts about parallel process and psychotherapy supervision: when is a parallel just a parallel?

    PubMed

    Watkins, C Edward

    2012-09-01

    In a way not done before, Tracey, Bludworth, and Glidden-Tracey ("Are there parallel processes in psychotherapy supervision: An empirical examination," Psychotherapy, 2011, advance online publication, doi.10.1037/a0026246) have shown us that parallel process in psychotherapy supervision can indeed be rigorously and meaningfully researched, and their groundbreaking investigation provides a nice prototype for future supervision studies to emulate. In what follows, I offer a brief complementary comment to Tracey et al., addressing one matter that seems to be a potentially important conceptual and empirical parallel process consideration: When is a parallel just a parallel? PsycINFO Database Record (c) 2012 APA, all rights reserved.

  5. An evaluation of machine processing techniques of ERTS-1 data for user applications. [urban land use and soil association mapping in Indiana

    NASA Technical Reports Server (NTRS)

    Landgrebe, D.

    1974-01-01

    A broad study is described to evaluate a set of machine analysis and processing techniques applied to ERTS-1 data. Based on the analysis results in urban land use analysis and soil association mapping together with previously reported results in general earth surface feature identification and crop species classification, a profile of general applicability of this procedure is beginning to emerge. Put in the hands of a user who knows well the information needed from the data and also is familiar with the region to be analyzed it appears that significantly useful information can be generated by these methods. When supported by preprocessing techniques such as the geometric correction and temporal registration capabilities, final products readily useable by user agencies appear possible. In parallel with application, through further research, there is much potential for further development of these techniques both with regard to providing higher performance and in new situations not yet studied.

  6. Seeing the forest for the trees: Networked workstations as a parallel processing computer

    NASA Technical Reports Server (NTRS)

    Breen, J. O.; Meleedy, D. M.

    1992-01-01

    Unlike traditional 'serial' processing computers in which one central processing unit performs one instruction at a time, parallel processing computers contain several processing units, thereby, performing several instructions at once. Many of today's fastest supercomputers achieve their speed by employing thousands of processing elements working in parallel. Few institutions can afford these state-of-the-art parallel processors, but many already have the makings of a modest parallel processing system. Workstations on existing high-speed networks can be harnessed as nodes in a parallel processing environment, bringing the benefits of parallel processing to many. While such a system can not rival the industry's latest machines, many common tasks can be accelerated greatly by spreading the processing burden and exploiting idle network resources. We study several aspects of this approach, from algorithms to select nodes to speed gains in specific tasks. With ever-increasing volumes of astronomical data, it becomes all the more necessary to utilize our computing resources fully.

  7. Parallel Processing at the High School Level.

    ERIC Educational Resources Information Center

    Sheary, Kathryn Anne

    This study investigated the ability of high school students to cognitively understand and implement parallel processing. Data indicates that most parallel processing is being taught at the university level. Instructional modules on C, Linux, and the parallel processing language, P4, were designed to show that high school students are highly…

  8. Trace: a high-throughput tomographic reconstruction engine for large-scale datasets

    DOE PAGES

    Bicer, Tekin; Gursoy, Doga; Andrade, Vincent De; ...

    2017-01-28

    Here, synchrotron light source and detector technologies enable scientists to perform advanced experiments. These scientific instruments and experiments produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used data acquisition technique at light sources is Computed Tomography, which can generate tens of GB/s depending on x-ray range. A large-scale tomographic dataset, such as mouse brain, may require hours of computation time with a medium size workstation. In this paper, we present Trace, a data-intensive computing middleware we developed for implementation and parallelization of iterative tomographic reconstruction algorithms. Tracemore » provides fine-grained reconstruction of tomography datasets using both (thread level) shared memory and (process level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations we have done on the replicated reconstruction objects and evaluate them using a shale and a mouse brain sinogram. Our experimental evaluations show that the applied optimizations and parallelization techniques can provide 158x speedup (using 32 compute nodes) over single core configuration, which decreases the reconstruction time of a sinogram (with 4501 projections and 22400 detector resolution) from 12.5 hours to less than 5 minutes per iteration.« less

  9. Trace: a high-throughput tomographic reconstruction engine for large-scale datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bicer, Tekin; Gursoy, Doga; Andrade, Vincent De

    Here, synchrotron light source and detector technologies enable scientists to perform advanced experiments. These scientific instruments and experiments produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used data acquisition technique at light sources is Computed Tomography, which can generate tens of GB/s depending on x-ray range. A large-scale tomographic dataset, such as mouse brain, may require hours of computation time with a medium size workstation. In this paper, we present Trace, a data-intensive computing middleware we developed for implementation and parallelization of iterative tomographic reconstruction algorithms. Tracemore » provides fine-grained reconstruction of tomography datasets using both (thread level) shared memory and (process level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations we have done on the replicated reconstruction objects and evaluate them using a shale and a mouse brain sinogram. Our experimental evaluations show that the applied optimizations and parallelization techniques can provide 158x speedup (using 32 compute nodes) over single core configuration, which decreases the reconstruction time of a sinogram (with 4501 projections and 22400 detector resolution) from 12.5 hours to less than 5 minutes per iteration.« less

  10. Parallel computing for automated model calibration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Burke, John S.; Danielson, Gary R.; Schulz, Douglas A.

    2002-07-29

    Natural resources model calibration is a significant burden on computing and staff resources in modeling efforts. Most assessments must consider multiple calibration objectives (for example magnitude and timing of stream flow peak). An automated calibration process that allows real time updating of data/models, allowing scientists to focus effort on improving models is needed. We are in the process of building a fully featured multi objective calibration tool capable of processing multiple models cheaply and efficiently using null cycle computing. Our parallel processing and calibration software routines have been generically, but our focus has been on natural resources model calibration. Somore » far, the natural resources models have been friendly to parallel calibration efforts in that they require no inter-process communication, only need a small amount of input data and only output a small amount of statistical information for each calibration run. A typical auto calibration run might involve running a model 10,000 times with a variety of input parameters and summary statistical output. In the past model calibration has been done against individual models for each data set. The individual model runs are relatively fast, ranging from seconds to minutes. The process was run on a single computer using a simple iterative process. We have completed two Auto Calibration prototypes and are currently designing a more feature rich tool. Our prototypes have focused on running the calibration in a distributed computing cross platform environment. They allow incorporation of?smart? calibration parameter generation (using artificial intelligence processing techniques). Null cycle computing similar to SETI@Home has also been a focus of our efforts. This paper details the design of the latest prototype and discusses our plans for the next revision of the software.« less

  11. GPU-completeness: theory and implications

    NASA Astrophysics Data System (ADS)

    Lin, I.-Jong

    2011-01-01

    This paper formalizes a major insight into a class of algorithms that relate parallelism and performance. The purpose of this paper is to define a class of algorithms that trades off parallelism for quality of result (e.g. visual quality, compression rate), and we propose a similar method for algorithmic classification based on NP-Completeness techniques, applied toward parallel acceleration. We will define this class of algorithm as "GPU-Complete" and will postulate the necessary properties of the algorithms for admission into this class. We will also formally relate his algorithmic space and imaging algorithms space. This concept is based upon our experience in the print production area where GPUs (Graphic Processing Units) have shown a substantial cost/performance advantage within the context of HPdelivered enterprise services and commercial printing infrastructure. While CPUs and GPUs are converging in their underlying hardware and functional blocks, their system behaviors are clearly distinct in many ways: memory system design, programming paradigms, and massively parallel SIMD architecture. There are applications that are clearly suited to each architecture: for CPU: language compilation, word processing, operating systems, and other applications that are highly sequential in nature; for GPU: video rendering, particle simulation, pixel color conversion, and other problems clearly amenable to massive parallelization. While GPUs establishing themselves as a second, distinct computing architecture from CPUs, their end-to-end system cost/performance advantage in certain parts of computation inform the structure of algorithms and their efficient parallel implementations. While GPUs are merely one type of architecture for parallelization, we show that their introduction into the design space of printing systems demonstrate the trade-offs against competing multi-core, FPGA, and ASIC architectures. While each architecture has its own optimal application, we believe that the selection of architecture can be defined in terms of properties of GPU-Completeness. For a welldefined subset of algorithms, GPU-Completeness is intended to connect the parallelism, algorithms and efficient architectures into a unified framework to show that multiple layers of parallel implementation are guided by the same underlying trade-off.

  12. Massive parallel 3D PIC simulation of negative ion extraction

    NASA Astrophysics Data System (ADS)

    Revel, Adrien; Mochalskyy, Serhiy; Montellano, Ivar Mauricio; Wünderlich, Dirk; Fantz, Ursel; Minea, Tiberiu

    2017-09-01

    The 3D PIC-MCC code ONIX is dedicated to modeling Negative hydrogen/deuterium Ion (NI) extraction and co-extraction of electrons from radio-frequency driven, low pressure plasma sources. It provides valuable insight on the complex phenomena involved in the extraction process. In previous calculations, a mesh size larger than the Debye length was used, implying numerical electron heating. Important steps have been achieved in terms of computation performance and parallelization efficiency allowing successful massive parallel calculations (4096 cores), imperative to resolve the Debye length. In addition, the numerical algorithms have been improved in terms of grid treatment, i.e., the electric field near the complex geometry boundaries (plasma grid) is calculated more accurately. The revised model preserves the full 3D treatment, but can take advantage of a highly refined mesh. ONIX was used to investigate the role of the mesh size, the re-injection scheme for lost particles (extracted or wall absorbed), and the electron thermalization process on the calculated extracted current and plasma characteristics. It is demonstrated that all numerical schemes give the same NI current distribution for extracted ions. Concerning the electrons, the pair-injection technique is found well-adapted to simulate the sheath in front of the plasma grid.

  13. Using the Statecharts paradigm for simulation of patient flow in surgical care.

    PubMed

    Sobolev, Boris; Harel, David; Vasilakis, Christos; Levy, Adrian

    2008-03-01

    Computer simulation of patient flow has been used extensively to assess the impacts of changes in the management of surgical care. However, little research is available on the utility of existing modeling techniques. The purpose of this paper is to examine the capacity of Statecharts, a system of graphical specification, for constructing a discrete-event simulation model of the perioperative process. The Statecharts specification paradigm was originally developed for representing reactive systems by extending the formalism of finite-state machines through notions of hierarchy, parallelism, and event broadcasting. Hierarchy permits subordination between states so that one state may contain other states. Parallelism permits more than one state to be active at any given time. Broadcasting of events allows one state to detect changes in another state. In the context of the peri-operative process, hierarchy provides the means to describe steps within activities and to cluster related activities, parallelism provides the means to specify concurrent activities, and event broadcasting provides the means to trigger a series of actions in one activity according to transitions that occur in another activity. Combined with hierarchy and parallelism, event broadcasting offers a convenient way to describe the interaction of concurrent activities. We applied the Statecharts formalism to describe the progress of individual patients through surgical care as a series of asynchronous updates in patient records generated in reaction to events produced by parallel finite-state machines representing concurrent clinical and managerial activities. We conclude that Statecharts capture successfully the behavioral aspects of surgical care delivery by specifying permissible chronology of events, conditions, and actions.

  14. Modified signed-digit trinary addition using synthetic wavelet filter

    NASA Astrophysics Data System (ADS)

    Iftekharuddin, K. M.; Razzaque, M. A.

    2000-09-01

    The modified signed-digit (MSD) number system has been a topic of interest as it allows for parallel carry-free addition of two numbers for digital optical computing. In this paper, harmonic wavelet joint transform (HWJT)-based correlation technique is introduced for optical implementation of MSD trinary adder implementation. The realization of the carry-propagation-free addition of MSD trinary numerals is demonstrated using synthetic HWJT correlator model. It is also shown that the proposed synthetic wavelet filter-based correlator shows high performance in logic processing. Simulation results are presented to validate the performance of the proposed technique.

  15. Three-dimensional real-time imaging of bi-phasic flow through porous media

    NASA Astrophysics Data System (ADS)

    Sharma, Prerna; Aswathi, P.; Sane, Anit; Ghosh, Shankar; Bhattacharya, S.

    2011-11-01

    We present a scanning laser-sheet video imaging technique to image bi-phasic flow in three-dimensional porous media in real time with pore-scale spatial resolution, i.e., 35 μm and 500 μm for directions parallel and perpendicular to the flow, respectively. The technique is illustrated for the case of viscous fingering. Using suitable image processing protocols, both the morphology and the movement of the two-fluid interface, were quantitatively estimated. Furthermore, a macroscopic parameter such as the displacement efficiency obtained from a microscopic (pore-scale) analysis demonstrates the versatility and usefulness of the method.

  16. Advanced techniques in reliability model representation and solution

    NASA Technical Reports Server (NTRS)

    Palumbo, Daniel L.; Nicol, David M.

    1992-01-01

    The current tendency of flight control system designs is towards increased integration of applications and increased distribution of computational elements. The reliability analysis of such systems is difficult because subsystem interactions are increasingly interdependent. Researchers at NASA Langley Research Center have been working for several years to extend the capability of Markov modeling techniques to address these problems. This effort has been focused in the areas of increased model abstraction and increased computational capability. The reliability model generator (RMG) is a software tool that uses as input a graphical object-oriented block diagram of the system. RMG uses a failure-effects algorithm to produce the reliability model from the graphical description. The ASSURE software tool is a parallel processing program that uses the semi-Markov unreliability range evaluator (SURE) solution technique and the abstract semi-Markov specification interface to the SURE tool (ASSIST) modeling language. A failure modes-effects simulation is used by ASSURE. These tools were used to analyze a significant portion of a complex flight control system. The successful combination of the power of graphical representation, automated model generation, and parallel computation leads to the conclusion that distributed fault-tolerant system architectures can now be analyzed.

  17. Distributed process manager for an engineering network computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gait, J.

    1987-08-01

    MP is a manager for systems of cooperating processes in a local area network of engineering workstations. MP supports transparent continuation by maintaining multiple copies of each process on different workstations. Computational bandwidth is optimized by executing processes in parallel on different workstations. Responsiveness is high because workstations compete among themselves to respond to requests. The technique is to select a master from among a set of replicates of a process by a competitive election between the copies. Migration of the master when a fault occurs or when response slows down is effected by inducing the election of a newmore » master. Competitive response stabilizes system behavior under load, so MP exhibits realtime behaviors.« less

  18. A continuous process to align electrospun nanofibers into parallel and crossed arrays

    NASA Astrophysics Data System (ADS)

    Laudenslager, Michael J.; Sigmund, Wolfgang M.

    2013-04-01

    Electrical, optical, and mechanical properties of nanofibers are strongly affected by their orientation. Electrospinning is a nanofiber processing technique that typically produces nonwoven meshes of randomly oriented fibers. While several alignment techniques exist, they are only able to produce either a very thin layer of aligned fibers or larger quantities of fibers with less control over their alignment and orientation. The technique presented herein fills the gap between these two methods allowing one to produce thick meshes of highly oriented nanofibers. In addition, this technique is not limited to collection of fibers along a single axis. Modifications to the basic setup allow collection of crossed fibers without stopping and repositioning the apparatus. The technique works for a range of fiber sizes. In this study, fiber diameters ranged from 100 nm to 1 micron. This allows a few fibers at a time to rapidly deposit in alternating directions creating an almost woven structure. These aligned nanofibers have the potential to improve the performance of energy storage and thermoelectric devices and hold great promise for directed cell growth applications.

  19. An Adaptive Kalman Filter using a Simple Residual Tuning Method

    NASA Technical Reports Server (NTRS)

    Harman, Richard R.

    1999-01-01

    One difficulty in using Kalman filters in real world situations is the selection of the correct process noise, measurement noise, and initial state estimate and covariance. These parameters are commonly referred to as tuning parameters. Multiple methods have been developed to estimate these parameters. Most of those methods such as maximum likelihood, subspace, and observer Kalman Identification require extensive offline processing and are not suitable for real time processing. One technique, which is suitable for real time processing, is the residual tuning method. Any mismodeling of the filter tuning parameters will result in a non-white sequence for the filter measurement residuals. The residual tuning technique uses this information to estimate corrections to those tuning parameters. The actual implementation results in a set of sequential equations that run in parallel with the Kalman filter. Equations for the estimation of the measurement noise have also been developed. These algorithms are used to estimate the process noise and measurement noise for the Wide Field Infrared Explorer star tracker and gyro.

  20. Application specific serial arithmetic arrays

    NASA Technical Reports Server (NTRS)

    Winters, K.; Mathews, D.; Thompson, T.

    1990-01-01

    High performance systolic arrays of serial-parallel multiplier elements may be rapidly constructed for specific applications by applying hardware description language techniques to a library of full-custom CMOS building blocks. Single clock pre-charged circuits have been implemented for these arrays at clock rates in excess of 100 Mhz using economical 2-micron (minimum feature size) CMOS processes, which may be quickly configured for a variety of applications. A number of application-specific arrays are presented, including a 2-D convolver for image processing, an integer polynomial solver, and a finite-field polynomial solver.

  1. Evaluation of usefulness of 3D views for clinical photography.

    PubMed

    Jinnin, Masatoshi; Fukushima, Satoshi; Masuguchi, Shinichi; Tanaka, Hiroki; Kawashita, Yoshio; Ishihara, Tsuyoshi; Ihn, Hironobu

    2011-01-01

    This is the first report investigating the usefulness of a 3D viewing technique (parallel viewing and cross-eyed viewing) for presenting clinical photography. Using the technique, we can grasp 3D structure of various lesions (e.g. tumors, wounds) or surgical procedures (e.g. lymph node dissection, flap) much more easily even without any cost and optical aids compared to 2D photos. Most recently 3D cameras started to be commercially available, but they may not be useful for presentation in scientific papers or poster sessions. To create a stereogram, two different pictures were taken from the right and left eye views using a digital camera. Then, the two pictures were placed next to one another. Using 9 stereograms, we performed a questionnaire-based survey. Our survey revealed 57.7% of the doctors/students had acquired the 3D viewing technique and an additional 15.4% could learn parallel viewing with 10 minutes training. Among the subjects capable of 3D views, 73.7% used the parallel view technique whereas only 26.3% chose the cross-eyed view. There was no significant difference in the results of the questionnaire about the efficiency and usefulness of 3D views between parallel view users and cross-eyed users. Almost all subjects (94.7%) answered that the technique is useful. Lesions with multiple undulations are a good application. 3D views, especially parallel viewing, are likely to be common and easy enough to consider for practical use in doctors/students. The wide use of the technique may revolutionize presentation of clinical pictures in meetings, educational lectures, or manuscripts.

  2. Physics Structure Analysis of Parallel Waves Concept of Physics Teacher Candidate

    NASA Astrophysics Data System (ADS)

    Sarwi, S.; Supardi, K. I.; Linuwih, S.

    2017-04-01

    The aim of this research was to find a parallel structure concept of wave physics and the factors that influence on the formation of parallel conceptions of physics teacher candidates. The method used qualitative research which types of cross-sectional design. These subjects were five of the third semester of basic physics and six of the fifth semester of wave course students. Data collection techniques used think aloud and written tests. Quantitative data were analysed with descriptive technique-percentage. The data analysis technique for belief and be aware of answers uses an explanatory analysis. Results of the research include: 1) the structure of the concept can be displayed through the illustration of a map containing the theoretical core, supplements the theory and phenomena that occur daily; 2) the trend of parallel conception of wave physics have been identified on the stationary waves, resonance of the sound and the propagation of transverse electromagnetic waves; 3) the influence on the parallel conception that reading textbooks less comprehensive and knowledge is partial understanding as forming the structure of the theory.

  3. The source of dual-task limitations: Serial or parallel processing of multiple response selections?

    PubMed Central

    Marois, René

    2014-01-01

    Although it is generally recognized that the concurrent performance of two tasks incurs costs, the sources of these dual-task costs remain controversial. The serial bottleneck model suggests that serial postponement of task performance in dual-task conditions results from a central stage of response selection that can only process one task at a time. Cognitive-control models, by contrast, propose that multiple response selections can proceed in parallel, but that serial processing of task performance is predominantly adopted because its processing efficiency is higher than that of parallel processing. In the present study, we empirically tested this proposition by examining whether parallel processing would occur when it was more efficient and financially rewarded. The results indicated that even when parallel processing was more efficient and was incentivized by financial reward, participants still failed to process tasks in parallel. We conclude that central information processing is limited by a serial bottleneck. PMID:23864266

  4. The Development of Models for Carbon Dioxide Reduction Technologies for Spacecraft Air Revitalization

    NASA Technical Reports Server (NTRS)

    Swickrath, Michael J.; Anderson, Molly

    2011-01-01

    Through the respiration process, humans consume oxygen (O2) while producing carbon dioxide (CO2) and water (H2O) as byproducts. For long term space exploration, CO2 concentration in the atmosphere must be managed to prevent hypercapnia. Moreover, CO2 can be used as a source of oxygen through chemical reduction serving to minimize the amount of oxygen required at launch. Reduction can be achieved through a number of techniques. The National Aeronautics and Space Administration (NASA) is currently exploring the Sabatier reaction, the Bosch reaction, and co-electrolysis of CO2 and H2O for this process. Proof-of-concept experiments and prototype units for all three processes have proven capable of returning useful commodities for space exploration. While all three techniques have demonstrated the capacity to reduce CO2 in the laboratory, there is interest in understanding how all three techniques would perform at a system-level within a spacecraft. Consequently, there is an impetus to develop predictive models for these processes that can be readily re-scaled and integrated into larger system models. Such analysis tools provide the ability to evaluate each technique on a comparable basis with respect to processing rates. This manuscript describes the current models for the carbon dioxide reduction processes under parallel developmental e orts. Comparison to experimental data is provided were available for veri cation purposes.

  5. Genetic Algorithms and Their Application to the Protein Folding Problem

    DTIC Science & Technology

    1993-12-01

    and symbolic methods, random methods such as Monte Carlo simulation and simulated annealing, distance geometry, and molecular dynamics. Many of these...calculated energies with those obtained using the molecular simulation software package called CHARMm. 10 9) Test both the simple and parallel simpie genetic...homology-based, and simplification techniques. 3.21 Molecular Dynamics. Perhaps the most natural approach is to actually simulate the folding process. This

  6. Polymer Based Highly Parallel Nanoscopic Sensors for Rapid Detection of Chemical and Biological Threats

    DTIC Science & Technology

    2007-09-18

    Xuliang Han, PI of Brewer Science, Inc. Subcontract Center for Applied Science & Engineering Missouri State University 901 South National Avenue...Science an effective post-growth purification procedure was developed to reduce the amount of impurities, and several characterization techniques were...CNTs) contain a wide range of impurities from the growth process. At Brewer Science an effective post-growth purification procedure was developed to

  7. Synthesis of a drug-like focused library of trisubstituted pyrrolidines using integrated flow chemistry and batch methods.

    PubMed

    Baumann, Marcus; Baxendale, Ian R; Kuratli, Christoph; Ley, Steven V; Martin, Rainer E; Schneider, Josef

    2011-07-11

    A combination of flow and batch chemistries has been successfully applied to the assembly of a series of trisubstituted drug-like pyrrolidines. This study demonstrates the efficient preparation of a focused library of these pharmaceutically important structures using microreactor technologies, as well as classical parallel synthesis techniques, and thus exemplifies the impact of integrating innovative enabling tools within the drug discovery process.

  8. Parallel PAB3D: Experiences with a Prototype in MPI

    NASA Technical Reports Server (NTRS)

    Guerinoni, Fabio; Abdol-Hamid, Khaled S.; Pao, S. Paul

    1998-01-01

    PAB3D is a three-dimensional Navier Stokes solver that has gained acceptance in the research and industrial communities. It takes as computational domain, a set disjoint blocks covering the physical domain. This is the first report on the implementation of PAB3D using the Message Passing Interface (MPI), a standard for parallel processing. We discuss briefly the characteristics of tile code and define a prototype for testing. The principal data structure used for communication is derived from preprocessing "patching". We describe a simple interface (COMMSYS) for MPI communication, and some general techniques likely to be encountered when working on problems of this nature. Last, we identify levels of improvement from the current version and outline future work.

  9. Load Balancing Strategies for Multi-Block Overset Grid Applications

    NASA Technical Reports Server (NTRS)

    Djomehri, M. Jahed; Biswas, Rupak; Lopez-Benitez, Noe; Biegel, Bryan (Technical Monitor)

    2002-01-01

    The multi-block overset grid method is a powerful technique for high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process uses a grid system that discretizes the problem domain by using separately generated but overlapping structured grids that periodically update and exchange boundary information through interpolation. For efficient high performance computations of large-scale realistic applications using this methodology, the individual grids must be properly partitioned among the parallel processors. Overall performance, therefore, largely depends on the quality of load balancing. In this paper, we present three different load balancing strategies far overset grids and analyze their effects on the parallel efficiency of a Navier-Stokes CFD application running on an SGI Origin2000 machine.

  10. Negative tunnel magnetoresistance and differential conductance in transport through double quantum dots

    NASA Astrophysics Data System (ADS)

    Trocha, Piotr; Weymann, Ireneusz; Barnaś, Józef

    2009-10-01

    Spin-dependent transport through two coupled single-level quantum dots weakly connected to ferromagnetic leads with collinear magnetizations is considered theoretically. Transport characteristics, including the current, linear and nonlinear conductances, and tunnel magnetoresistance are calculated using the real-time diagrammatic technique in the parallel, serial, and intermediate geometries. The effects due to virtual tunneling processes between the two dots via the leads, associated with off-diagonal coupling matrix elements, are also considered. Negative differential conductance and negative tunnel magnetoresistance have been found in the case of serial and intermediate geometries, while no such behavior has been observed for double quantum dots coupled in parallel. It is also shown that transport characteristics strongly depend on the magnitude of the off-diagonal coupling matrix elements.

  11. Exploring the Energy Landscapes of Protein Folding Simulations with Bayesian Computation

    PubMed Central

    Burkoff, Nikolas S.; Várnai, Csilla; Wells, Stephen A.; Wild, David L.

    2012-01-01

    Nested sampling is a Bayesian sampling technique developed to explore probability distributions localized in an exponentially small area of the parameter space. The algorithm provides both posterior samples and an estimate of the evidence (marginal likelihood) of the model. The nested sampling algorithm also provides an efficient way to calculate free energies and the expectation value of thermodynamic observables at any temperature, through a simple post processing of the output. Previous applications of the algorithm have yielded large efficiency gains over other sampling techniques, including parallel tempering. In this article, we describe a parallel implementation of the nested sampling algorithm and its application to the problem of protein folding in a Gō-like force field of empirical potentials that were designed to stabilize secondary structure elements in room-temperature simulations. We demonstrate the method by conducting folding simulations on a number of small proteins that are commonly used for testing protein-folding procedures. A topological analysis of the posterior samples is performed to produce energy landscape charts, which give a high-level description of the potential energy surface for the protein folding simulations. These charts provide qualitative insights into both the folding process and the nature of the model and force field used. PMID:22385859

  12. Exploring the energy landscapes of protein folding simulations with Bayesian computation.

    PubMed

    Burkoff, Nikolas S; Várnai, Csilla; Wells, Stephen A; Wild, David L

    2012-02-22

    Nested sampling is a Bayesian sampling technique developed to explore probability distributions localized in an exponentially small area of the parameter space. The algorithm provides both posterior samples and an estimate of the evidence (marginal likelihood) of the model. The nested sampling algorithm also provides an efficient way to calculate free energies and the expectation value of thermodynamic observables at any temperature, through a simple post processing of the output. Previous applications of the algorithm have yielded large efficiency gains over other sampling techniques, including parallel tempering. In this article, we describe a parallel implementation of the nested sampling algorithm and its application to the problem of protein folding in a Gō-like force field of empirical potentials that were designed to stabilize secondary structure elements in room-temperature simulations. We demonstrate the method by conducting folding simulations on a number of small proteins that are commonly used for testing protein-folding procedures. A topological analysis of the posterior samples is performed to produce energy landscape charts, which give a high-level description of the potential energy surface for the protein folding simulations. These charts provide qualitative insights into both the folding process and the nature of the model and force field used. Copyright © 2012 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  13. Parallel Activation in Bilingual Phonological Processing

    ERIC Educational Resources Information Center

    Lee, Su-Yeon

    2011-01-01

    In bilingual language processing, the parallel activation hypothesis suggests that bilinguals activate their two languages simultaneously during language processing. Support for the parallel activation mainly comes from studies of lexical (word-form) processing, with relatively less attention to phonological (sound) processing. According to…

  14. Picosecond time-resolved photoluminescence using picosecond excitation correlation spectroscopy

    NASA Astrophysics Data System (ADS)

    Johnson, M. B.; McGill, T. C.; Hunter, A. T.

    1988-03-01

    We present a study of the temporal decay of photoluminescence (PL) as detected by picosecond excitation correlation spectroscopy (PECS). We analyze the correlation signal that is obtained from two simple models; one where radiative recombination dominates, the other where trapping processes dominate. It is found that radiative recombination alone does not lead to a correlation signal. Parallel trapping type processes are found to be required to see a signal. To illustrate this technique, we examine the temporal decay of the PL signal for In-alloyed, semi-insulating GaAs substrates. We find that the PL signal indicates a carrier lifetime of roughly 100 ps, for excitation densities of 1×1016-5×1017 cm-3. PECS is shown to be an easy technique to measure the ultrafast temporal behavior of PL processes because it requires no ultrafast photon detection. It is particularly well suited to measuring carrier lifetimes.

  15. Packet telemetry and packet telecommand - The new generation of spacecraft data handling techniques

    NASA Technical Reports Server (NTRS)

    Hooke, A. J.

    1983-01-01

    Because of rising costs and reduced reliability of spacecraft and ground network hardware and software customization, standardization Packet Telemetry and Packet Telecommand concepts are emerging as viable alternatives. Autonomous packets of data, within each concept, which are created within ground and space application processes through the use of formatting techniques, are switched end-to-end through the space data network to their destination application processes through the use of standard transfer protocols. This process may result in facilitating a high degree of automation and interoperability because of completely mission-independent-designed intermediate data networks. The adoption of an international guideline for future space telemetry formatting of the Packet Telemetry concept, and the advancement of the NASA-ESA Working Group's Packet Telecommand concept to a level of maturity parallel to the of Packet Telemetry are the goals of the Consultative Committee for Space Data Systems. Both the Packet Telemetry and Packet Telecommand concepts are reviewed.

  16. Parallel Processing of Large Scale Microphone Arrays for Sound Capture

    NASA Astrophysics Data System (ADS)

    Jan, Ea-Ee.

    1995-01-01

    Performance of microphone sound pick up is degraded by deleterious properties of the acoustic environment, such as multipath distortion (reverberation) and ambient noise. The degradation becomes more prominent in a teleconferencing environment in which the microphone is positioned far away from the speaker. Besides, the ideal teleconference should feel as easy and natural as face-to-face communication with another person. This suggests hands-free sound capture with no tether or encumbrance by hand-held or body-worn sound equipment. Microphone arrays for this application represent an appropriate approach. This research develops new microphone array and signal processing techniques for high quality hands-free sound capture in noisy, reverberant enclosures. The new techniques combine matched-filtering of individual sensors and parallel processing to provide acute spatial volume selectivity which is capable of mitigating the deleterious effects of noise interference and multipath distortion. The new method outperforms traditional delay-and-sum beamformers which provide only directional spatial selectivity. The research additionally explores truncated matched-filtering and random distribution of transducers to reduce complexity and improve sound capture quality. All designs are first established by computer simulation of array performance in reverberant enclosures. The simulation is achieved by a room model which can efficiently calculate the acoustic multipath in a rectangular enclosure up to a prescribed order of images. It also calculates the incident angle of the arriving signal. Experimental arrays were constructed and their performance was measured in real rooms. Real room data were collected in a hard-walled laboratory and a controllable variable acoustics enclosure of similar size, approximately 6 x 6 x 3 m. An extensive speech database was also collected in these two enclosures for future research on microphone arrays. The simulation results are shown to be consistent with the real room data. Localization of sound sources has been explored using cross-power spectrum time delay estimation and has been evaluated using real room data under slightly, moderately and highly reverberant conditions. To improve the accuracy and reliability of the source localization, an outlier detector that removes incorrect time delay estimation has been invented. To provide speaker selectivity for microphone array systems, a hands-free speaker identification system has been studied. A recently invented feature using selected spectrum information outperforms traditional recognition methods. Measured results demonstrate the capabilities of speaker selectivity from a matched-filtered array. In addition, simulation utilities, including matched -filtering processing of the array and hands-free speaker identification, have been implemented on the massively -parallel nCube super-computer. This parallel computation highlights the requirements for real-time processing of array signals.

  17. Instrumentation, performance visualization, and debugging tools for multiprocessors

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Fineman, Charles E.; Hontalas, Philip J.

    1991-01-01

    The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessor architectures. However, without effective means to monitor (and visualize) program execution, debugging, and tuning parallel programs becomes intractably difficult as program complexity increases with the number of processors. Research on performance evaluation tools for multiprocessors is being carried out at ARC. Besides investigating new techniques for instrumenting, monitoring, and presenting the state of parallel program execution in a coherent and user-friendly manner, prototypes of software tools are being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Our current tool set, the Ames Instrumentation Systems (AIMS), incorporates features from various software systems developed in academia and industry. The execution of FORTRAN programs on the Intel iPSC/860 can be automatically instrumented and monitored. Performance data collected in this manner can be displayed graphically on workstations supporting X-Windows. We have successfully compared various parallel algorithms for computational fluid dynamics (CFD) applications in collaboration with scientists from the Numerical Aerodynamic Simulation Systems Division. By performing these comparisons, we show that performance monitors and debuggers such as AIMS are practical and can illuminate the complex dynamics that occur within parallel programs.

  18. Quasi-disjoint pentadiagonal matrix systems for the parallelization of compact finite-difference schemes and filters

    NASA Astrophysics Data System (ADS)

    Kim, Jae Wook

    2013-05-01

    This paper proposes a novel systematic approach for the parallelization of pentadiagonal compact finite-difference schemes and filters based on domain decomposition. The proposed approach allows a pentadiagonal banded matrix system to be split into quasi-disjoint subsystems by using a linear-algebraic transformation technique. As a result the inversion of pentadiagonal matrices can be implemented within each subdomain in an independent manner subject to a conventional halo-exchange process. The proposed matrix transformation leads to new subdomain boundary (SB) compact schemes and filters that require three halo terms to exchange with neighboring subdomains. The internode communication overhead in the present approach is equivalent to that of standard explicit schemes and filters based on seven-point discretization stencils. The new SB compact schemes and filters demand additional arithmetic operations compared to the original serial ones. However, it is shown that the additional cost becomes sufficiently low by choosing optimal sizes of their discretization stencils. Compared to earlier published results, the proposed SB compact schemes and filters successfully reduce parallelization artifacts arising from subdomain boundaries to a level sufficiently negligible for sophisticated aeroacoustic simulations without degrading parallel efficiency. The overall performance and parallel efficiency of the proposed approach are demonstrated by stringent benchmark tests.

  19. RGCA: A Reliable GPU Cluster Architecture for Large-Scale Internet of Things Computing Based on Effective Performance-Energy Optimization

    PubMed Central

    Chen, Qingkui; Zhao, Deyu; Wang, Jingjuan

    2017-01-01

    This paper aims to develop a low-cost, high-performance and high-reliability computing system to process large-scale data using common data mining algorithms in the Internet of Things (IoT) computing environment. Considering the characteristics of IoT data processing, similar to mainstream high performance computing, we use a GPU (Graphics Processing Unit) cluster to achieve better IoT services. Firstly, we present an energy consumption calculation method (ECCM) based on WSNs. Then, using the CUDA (Compute Unified Device Architecture) Programming model, we propose a Two-level Parallel Optimization Model (TLPOM) which exploits reasonable resource planning and common compiler optimization techniques to obtain the best blocks and threads configuration considering the resource constraints of each node. The key to this part is dynamic coupling Thread-Level Parallelism (TLP) and Instruction-Level Parallelism (ILP) to improve the performance of the algorithms without additional energy consumption. Finally, combining the ECCM and the TLPOM, we use the Reliable GPU Cluster Architecture (RGCA) to obtain a high-reliability computing system considering the nodes’ diversity, algorithm characteristics, etc. The results show that the performance of the algorithms significantly increased by 34.1%, 33.96% and 24.07% for Fermi, Kepler and Maxwell on average with TLPOM and the RGCA ensures that our IoT computing system provides low-cost and high-reliability services. PMID:28777325

  20. RGCA: A Reliable GPU Cluster Architecture for Large-Scale Internet of Things Computing Based on Effective Performance-Energy Optimization.

    PubMed

    Fang, Yuling; Chen, Qingkui; Xiong, Neal N; Zhao, Deyu; Wang, Jingjuan

    2017-08-04

    This paper aims to develop a low-cost, high-performance and high-reliability computing system to process large-scale data using common data mining algorithms in the Internet of Things (IoT) computing environment. Considering the characteristics of IoT data processing, similar to mainstream high performance computing, we use a GPU (Graphics Processing Unit) cluster to achieve better IoT services. Firstly, we present an energy consumption calculation method (ECCM) based on WSNs. Then, using the CUDA (Compute Unified Device Architecture) Programming model, we propose a Two-level Parallel Optimization Model (TLPOM) which exploits reasonable resource planning and common compiler optimization techniques to obtain the best blocks and threads configuration considering the resource constraints of each node. The key to this part is dynamic coupling Thread-Level Parallelism (TLP) and Instruction-Level Parallelism (ILP) to improve the performance of the algorithms without additional energy consumption. Finally, combining the ECCM and the TLPOM, we use the Reliable GPU Cluster Architecture (RGCA) to obtain a high-reliability computing system considering the nodes' diversity, algorithm characteristics, etc. The results show that the performance of the algorithms significantly increased by 34.1%, 33.96% and 24.07% for Fermi, Kepler and Maxwell on average with TLPOM and the RGCA ensures that our IoT computing system provides low-cost and high-reliability services.

  1. Sentinel-1 data massive processing for large scale DInSAR analyses within Cloud Computing environments through the P-SBAS approach

    NASA Astrophysics Data System (ADS)

    Lanari, Riccardo; Bonano, Manuela; Buonanno, Sabatino; Casu, Francesco; De Luca, Claudio; Fusco, Adele; Manunta, Michele; Manzo, Mariarosaria; Pepe, Antonio; Zinno, Ivana

    2017-04-01

    The SENTINEL-1 (S1) mission is designed to provide operational capability for continuous mapping of the Earth thanks to its two polar-orbiting satellites (SENTINEL-1A and B) performing C-band synthetic aperture radar (SAR) imaging. It is, indeed, characterized by enhanced revisit frequency, coverage and reliability for operational services and applications requiring long SAR data time series. Moreover, SENTINEL-1 is specifically oriented to interferometry applications with stringent requirements based on attitude and orbit accuracy and it is intrinsically characterized by small spatial and temporal baselines. Consequently, SENTINEL-1 data are particularly suitable to be exploited through advanced interferometric techniques such as the well-known DInSAR algorithm referred to as Small BAseline Subset (SBAS), which allows the generation of deformation time series and displacement velocity maps. In this work we present an advanced interferometric processing chain, based on the Parallel SBAS (P-SBAS) approach, for the massive processing of S1 Interferometric Wide Swath (IWS) data aimed at generating deformation time series in efficient, automatic and systematic way. Such a DInSAR chain is designed to exploit distributed computing infrastructures, and more specifically Cloud Computing environments, to properly deal with the storage and the processing of huge S1 datasets. In particular, since S1 IWS data are acquired with the innovative Terrain Observation with Progressive Scans (TOPS) mode, we could benefit from the structure of S1 data, which are composed by bursts that can be considered as separate acquisitions. Indeed, the processing is intrinsically parallelizable with respect to such independent input data and therefore we basically exploited this coarse granularity parallelization strategy in the majority of the steps of the SBAS processing chain. Moreover, we also implemented more sophisticated parallelization approaches, exploiting both multi-node and multi-core programming techniques. Currently, Cloud Computing environments make available large collections of computing resources and storage that can be effectively exploited through the presented S1 P-SBAS processing chain to carry out interferometric analyses at a very large scale, in reduced time. This allows us to deal also with the problems connected to the use of S1 P-SBAS chain in operational contexts, related to hazard monitoring and risk prevention and mitigation, where handling large amounts of data represents a challenging task. As a significant experimental result we performed a large spatial scale SBAS analysis relevant to the Central and Southern Italy by exploiting the Amazon Web Services Cloud Computing platform. In particular, we processed in parallel 300 S1 acquisitions covering the Italian peninsula from Lazio to Sicily through the presented S1 P-SBAS processing chain, generating 710 interferograms, thus finally obtaining the displacement time series of the whole processed area. This work has been partially supported by the CNR-DPC agreement, the H2020 EPOS-IP project (GA 676564) and the ESA GEP project.

  2. Parallel processing architecture for H.264 deblocking filter on multi-core platforms

    NASA Astrophysics Data System (ADS)

    Prasad, Durga P.; Sonachalam, Sekar; Kunchamwar, Mangesh K.; Gunupudi, Nageswara Rao

    2012-03-01

    Massively parallel computing (multi-core) chips offer outstanding new solutions that satisfy the increasing demand for high resolution and high quality video compression technologies such as H.264. Such solutions not only provide exceptional quality but also efficiency, low power, and low latency, previously unattainable in software based designs. While custom hardware and Application Specific Integrated Circuit (ASIC) technologies may achieve lowlatency, low power, and real-time performance in some consumer devices, many applications require a flexible and scalable software-defined solution. The deblocking filter in H.264 encoder/decoder poses difficult implementation challenges because of heavy data dependencies and the conditional nature of the computations. Deblocking filter implementations tend to be fixed and difficult to reconfigure for different needs. The ability to scale up for higher quality requirements such as 10-bit pixel depth or a 4:2:2 chroma format often reduces the throughput of a parallel architecture designed for lower feature set. A scalable architecture for deblocking filtering, created with a massively parallel processor based solution, means that the same encoder or decoder will be deployed in a variety of applications, at different video resolutions, for different power requirements, and at higher bit-depths and better color sub sampling patterns like YUV, 4:2:2, or 4:4:4 formats. Low power, software-defined encoders/decoders may be implemented using a massively parallel processor array, like that found in HyperX technology, with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. This software programing model for massively parallel processors offers a flexible implementation and a power efficiency close to that of ASIC solutions. This work describes a scalable parallel architecture for an H.264 compliant deblocking filter for multi core platforms such as HyperX technology. Parallel techniques such as parallel processing of independent macroblocks, sub blocks, and pixel row level are examined in this work. The deblocking architecture consists of a basic cell called deblocking filter unit (DFU) and dependent data buffer manager (DFM). The DFU can be used in several instances, catering to different performance needs the DFM serves the data required for the different number of DFUs, and also manages all the neighboring data required for future data processing of DFUs. This approach achieves the scalability, flexibility, and performance excellence required in deblocking filters.

  3. Physics Mining of Multi-Source Data Sets

    NASA Technical Reports Server (NTRS)

    Helly, John; Karimabadi, Homa; Sipes, Tamara

    2012-01-01

    Powerful new parallel data mining algorithms can produce diagnostic and prognostic numerical models and analyses from observational data. These techniques yield higher-resolution measures than ever before of environmental parameters by fusing synoptic imagery and time-series measurements. These techniques are general and relevant to observational data, including raster, vector, and scalar, and can be applied in all Earth- and environmental science domains. Because they can be highly automated and are parallel, they scale to large spatial domains and are well suited to change and gap detection. This makes it possible to analyze spatial and temporal gaps in information, and facilitates within-mission replanning to optimize the allocation of observational resources. The basis of the innovation is the extension of a recently developed set of algorithms packaged into MineTool to multi-variate time-series data. MineTool is unique in that it automates the various steps of the data mining process, thus making it amenable to autonomous analysis of large data sets. Unlike techniques such as Artificial Neural Nets, which yield a blackbox solution, MineTool's outcome is always an analytical model in parametric form that expresses the output in terms of the input variables. This has the advantage that the derived equation can then be used to gain insight into the physical relevance and relative importance of the parameters and coefficients in the model. This is referred to as physics-mining of data. The capabilities of MineTool are extended to include both supervised and unsupervised algorithms, handle multi-type data sets, and parallelize it.

  4. Investigating the Role of Global Histogram Equalization Technique for 99mTechnetium-Methylene diphosphonate Bone Scan Image Enhancement

    PubMed Central

    Pandey, Anil Kumar; Sharma, Param Dev; Dheer, Pankaj; Parida, Girish Kumar; Goyal, Harish; Patel, Chetan; Bal, Chandrashekhar; Kumar, Rakesh

    2017-01-01

    Purpose of the Study: 99mTechnetium-methylene diphosphonate (99mTc-MDP) bone scan images have limited number of counts per pixel, and hence, they have inferior image quality compared to X-rays. Theoretically, global histogram equalization (GHE) technique can improve the contrast of a given image though practical benefits of doing so have only limited acceptance. In this study, we have investigated the effect of GHE technique for 99mTc-MDP-bone scan images. Materials and Methods: A set of 89 low contrast 99mTc-MDP whole-body bone scan images were included in this study. These images were acquired with parallel hole collimation on Symbia E gamma camera. The images were then processed with histogram equalization technique. The image quality of input and processed images were reviewed by two nuclear medicine physicians on a 5-point scale where score of 1 is for very poor and 5 is for the best image quality. A statistical test was applied to find the significance of difference between the mean scores assigned to input and processed images. Results: This technique improves the contrast of the images; however, oversaturation was noticed in the processed images. Student's t-test was applied, and a statistically significant difference in the input and processed image quality was found at P < 0.001 (with α = 0.05). However, further improvement in image quality is needed as per requirements of nuclear medicine physicians. Conclusion: GHE techniques can be used on low contrast bone scan images. In some of the cases, a histogram equalization technique in combination with some other postprocessing technique is useful. PMID:29142344

  5. Brain plasticity and functionality explored by nonlinear optical microscopy

    NASA Astrophysics Data System (ADS)

    Sacconi, L.; Allegra, L.; Buffelli, M.; Cesare, P.; D'Angelo, E.; Gandolfi, D.; Grasselli, G.; Lotti, J.; Mapelli, J.; Strata, P.; Pavone, F. S.

    2010-02-01

    In combination with fluorescent protein (XFP) expression techniques, two-photon microscopy has become an indispensable tool to image cortical plasticity in living mice. In parallel to its application in imaging, multi-photon absorption has also been used as a tool for the dissection of single neurites with submicrometric precision without causing any visible collateral damage to the surrounding neuronal structures. In this work, multi-photon nanosurgery is applied to dissect single climbing fibers expressing GFP in the cerebellar cortex. The morphological consequences are then characterized with time lapse 3-dimensional two-photon imaging over a period of minutes to days after the procedure. Preliminary investigations show that the laser induced fiber dissection recalls a regenerative process in the fiber itself over a period of days. These results show the possibility of this innovative technique to investigate regenerative processes in adult brain. In parallel with imaging and manipulation technique, non-linear microscopy offers the opportunity to optically record electrical activity in intact neuronal networks. In this work, we combined the advantages of second-harmonic generation (SHG) with a random access (RA) excitation scheme to realize a new microscope (RASH) capable of optically recording fast membrane potential events occurring in a wide-field of view. The RASH microscope, in combination with bulk loading of tissue with FM4-64 dye, was used to simultaneously record electrical activity from clusters of Purkinje cells in acute cerebellar slices. Complex spikes, both synchronous and asynchronous, were optically recorded simultaneously across a given population of neurons. Spontaneous electrical activity was also monitored simultaneously in pairs of neurons, where action potentials were recorded without averaging across trials. These results show the strength of this technique in describing the temporal dynamics of neuronal assemblies, opening promising perspectives in understanding the computations of neuronal networks.

  6. Synthesizing parallel imaging applications using the CAP (computer-aided parallelization) tool

    NASA Astrophysics Data System (ADS)

    Gennart, Benoit A.; Mazzariol, Marc; Messerli, Vincent; Hersch, Roger D.

    1997-12-01

    Imaging applications such as filtering, image transforms and compression/decompression require vast amounts of computing power when applied to large data sets. These applications would potentially benefit from the use of parallel processing. However, dedicated parallel computers are expensive and their processing power per node lags behind that of the most recent commodity components. Furthermore, developing parallel applications remains a difficult task: writing and debugging the application is difficult (deadlocks), programs may not be portable from one parallel architecture to the other, and performance often comes short of expectations. In order to facilitate the development of parallel applications, we propose the CAP computer-aided parallelization tool which enables application programmers to specify at a high-level of abstraction the flow of data between pipelined-parallel operations. In addition, the CAP tool supports the programmer in developing parallel imaging and storage operations. CAP enables combining efficiently parallel storage access routines and image processing sequential operations. This paper shows how processing and I/O intensive imaging applications must be implemented to take advantage of parallelism and pipelining between data access and processing. This paper's contribution is (1) to show how such implementations can be compactly specified in CAP, and (2) to demonstrate that CAP specified applications achieve the performance of custom parallel code. The paper analyzes theoretically the performance of CAP specified applications and demonstrates the accuracy of the theoretical analysis through experimental measurements.

  7. Detection, location, and quantification of structural damage by neural-net-processed moiré profilometry

    NASA Astrophysics Data System (ADS)

    Grossman, Barry G.; Gonzalez, Frank S.; Blatt, Joel H.; Hooker, Jeffery A.

    1992-03-01

    The development of efficient high speed techniques to recognize, locate, and quantify damage is vitally important for successful automated inspection systems such as ones used for the inspection of undersea pipelines. Two critical problems must be solved to achieve these goals: the reduction of nonuseful information present in the video image and automatic recognition and quantification of extent and location of damage. Artificial neural network processed moire profilometry appears to be a promising technique to accomplish this. Real time video moire techniques have been developed which clearly distinguish damaged and undamaged areas on structures, thus reducing the amount of extraneous information input into an inspection system. Artificial neural networks have demonstrated advantages for image processing, since they can learn the desired response to a given input and are inherently fast when implemented in hardware due to their parallel computing architecture. Video moire images of pipes with dents of different depths were used to train a neural network, with the desired output being the location and severity of the damage. The system was then successfully tested with a second series of moire images. The techniques employed and the results obtained are discussed.

  8. Development of a stereo analysis algorithm for generating topographic maps using interactive techniques of the MPP

    NASA Technical Reports Server (NTRS)

    Strong, James P.

    1987-01-01

    A local area matching algorithm was developed on the Massively Parallel Processor (MPP). It is an iterative technique that first matches coarse or low resolution areas and at each iteration performs matches of higher resolution. Results so far show that when good matches are possible in the two images, the MPP algorithm matches corresponding areas as well as a human observer. To aid in developing this algorithm, a control or shell program was developed for the MPP that allows interactive experimentation with various parameters and procedures to be used in the matching process. (This would not be possible without the high speed of the MPP). With the system, optimal techniques can be developed for different types of matching problems.

  9. Lazy checkpoint coordination for bounding rollback propagation

    NASA Technical Reports Server (NTRS)

    Wang, Yi-Min; Fuchs, W. Kent

    1992-01-01

    Independent checkpointing allows maximum process autonomy but suffers from potential domino effects. Coordinated checkpointing eliminates the domino effect by sacrificing a certain degree of process autonomy. In this paper, we propose the technique of lazy checkpoint coordination which preserves process autonomy while employing communication-induced checkpoint coordination for bounding rollback propagation. The introduction of the notion of laziness allows a flexible trade-off between the cost for checkpoint coordination and the average rollback distance. Worst-case overhead analysis provides a means for estimating the extra checkpoint overhead. Communication trace-driven simulation for several parallel programs is used to evaluate the benefits of the proposed scheme for real applications.

  10. Under-sampling in a Multiple-Channel Laser Vibrometry System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Corey, Jordan

    2007-03-01

    Laser vibrometry is a technique used to detect vibrations on objects using the interference of coherent light with itself. Most vibrometry systems process only one target location at a time, but processing multiple locations simultaneously provides improved detection capabilities. Traditional laser vibrometry systems employ oversampling to sample the incoming modulated-light signal, however as the number of channels increases in these systems, certain issues arise such a higher computational cost, excessive heat, increased power requirements, and increased component cost. This thesis describes a novel approach to laser vibrometry that utilizes undersampling to control the undesirable issues associated with over-sampled systems. Undersamplingmore » allows for significantly less samples to represent the modulated-light signals, which offers several advantages in the overall system design. These advantages include an improvement in thermal efficiency, lower processing requirements, and a higher immunity to the relative intensity noise inherent in laser vibrometry applications. A unique feature of this implementation is the use of a parallel architecture to increase the overall system throughput. This parallelism is realized using a hierarchical multi-channel architecture based on off-the-shelf programmable logic devices (PLDs).« less

  11. Hospital integrated parallel cluster for fast and cost-efficient image analysis: clinical experience and research evaluation

    NASA Astrophysics Data System (ADS)

    Erberich, Stephan G.; Hoppe, Martin; Jansen, Christian; Schmidt, Thomas; Thron, Armin; Oberschelp, Walter

    2001-08-01

    In the last few years more and more University Hospitals as well as private hospitals changed to digital information systems for patient record, diagnostic files and digital images. Not only that patient management becomes easier, it is also very remarkable how clinical research can profit from Picture Archiving and Communication Systems (PACS) and diagnostic databases, especially from image databases. Since images are available on the finger tip, difficulties arise when image data needs to be processed, e.g. segmented, classified or co-registered, which usually demands a lot computational power. Today's clinical environment does support PACS very well, but real image processing is still under-developed. The purpose of this paper is to introduce a parallel cluster of standard distributed systems and its software components and how such a system can be integrated into a hospital environment. To demonstrate the cluster technique we present our clinical experience with the crucial but cost-intensive motion correction of clinical routine and research functional MRI (fMRI) data, as it is processed in our Lab on a daily basis.

  12. Optimization Algorithm for Kalman Filter Exploiting the Numerical Characteristics of SINS/GPS Integrated Navigation Systems.

    PubMed

    Hu, Shaoxing; Xu, Shike; Wang, Duhu; Zhang, Aiwu

    2015-11-11

    Aiming at addressing the problem of high computational cost of the traditional Kalman filter in SINS/GPS, a practical optimization algorithm with offline-derivation and parallel processing methods based on the numerical characteristics of the system is presented in this paper. The algorithm exploits the sparseness and/or symmetry of matrices to simplify the computational procedure. Thus plenty of invalid operations can be avoided by offline derivation using a block matrix technique. For enhanced efficiency, a new parallel computational mechanism is established by subdividing and restructuring calculation processes after analyzing the extracted "useful" data. As a result, the algorithm saves about 90% of the CPU processing time and 66% of the memory usage needed in a classical Kalman filter. Meanwhile, the method as a numerical approach needs no precise-loss transformation/approximation of system modules and the accuracy suffers little in comparison with the filter before computational optimization. Furthermore, since no complicated matrix theories are needed, the algorithm can be easily transplanted into other modified filters as a secondary optimization method to achieve further efficiency.

  13. Parallel ptychographic reconstruction

    DOE PAGES

    Nashed, Youssef S. G.; Vine, David J.; Peterka, Tom; ...

    2014-12-19

    Ptychography is an imaging method whereby a coherent beam is scanned across an object, and an image is obtained by iterative phasing of the set of diffraction patterns. It is able to be used to image extended objects at a resolution limited by scattering strength of the object and detector geometry, rather than at an optics-imposed limit. As technical advances allow larger fields to be imaged, computational challenges arise for reconstructing the correspondingly larger data volumes, yet at the same time there is also a need to deliver reconstructed images immediately so that one can evaluate the next steps tomore » take in an experiment. Here we present a parallel method for real-time ptychographic phase retrieval. It uses a hybrid parallel strategy to divide the computation between multiple graphics processing units (GPUs) and then employs novel techniques to merge sub-datasets into a single complex phase and amplitude image. Results are shown on a simulated specimen and a real dataset from an X-ray experiment conducted at a synchrotron light source.« less

  14. Concentric Parallel Combining Balun for Millimeter-Wave Power Amplifier in Low-Power CMOS with High-Power Density

    NASA Astrophysics Data System (ADS)

    Han, Jiang-An; Kong, Zhi-Hui; Ma, Kaixue; Yeo, Kiat Seng; Lim, Wei Meng

    2016-11-01

    This paper presents a novel balun for a millimeter-wave power amplifier (PA) design to achieve high-power density in a 65-nm low-power (LP) CMOS process. By using a concentric winding technique, the proposed parallel combining balun with compact size accomplishes power combining and unbalance-balance conversion concurrently. For calculating its power combination efficiency in the condition of various amplitude and phase wave components, a method basing on S-parameters is derived. Based on the proposed parallel combining balun, a fabricated 60-GHz industrial, scientific, and medical (ISM) band PA with single-ended I/O achieves an 18.9-dB gain and an 8.8-dBm output power at 1-dB compression and 14.3-dBm saturated output power ( P sat) at 62 GHz. This PA occupying only a 0.10-mm2 core area has demonstrated a high-power density of 269.15 mW/mm2 in 65 nm LP CMOS.

  15. FDI and Accommodation Using NN Based Techniques

    NASA Astrophysics Data System (ADS)

    Garcia, Ramon Ferreiro; de Miguel Catoira, Alberto; Sanz, Beatriz Ferreiro

    Massive application of dynamic backpropagation neural networks is used on closed loop control FDI (fault detection and isolation) tasks. The process dynamics is mapped by means of a trained backpropagation NN to be applied on residual generation. Process supervision is then applied to discriminate faults on process sensors, and process plant parameters. A rule based expert system is used to implement the decision making task and the corresponding solution in terms of faults accommodation and/or reconfiguration. Results show an efficient and robust FDI system which could be used as the core of an SCADA or alternatively as a complement supervision tool operating in parallel with the SCADA when applied on a heat exchanger.

  16. Post-analysis report on Chesapeake Bay data processing. [spectral analysis and recognition computer signature extension

    NASA Technical Reports Server (NTRS)

    Thomson, F.

    1972-01-01

    The additional processing performed on data collected over the Rhode River Test Site and Forestry Site in November 1970 is reported. The techniques and procedures used to obtain the processed results are described. Thermal data collected over three approximately parallel lines of the site were contoured, and the results color coded, for the purpose of delineating important scene constituents and to identify trees attacked by pine bark beetles. Contouring work and histogram preparation are reviewed and the important conclusions from the spectral analysis and recognition computer (SPARC) signature extension work are summarized. The SPARC setup and processing records are presented and recommendations are made for future data collection over the site.

  17. ARPA surveillance technology for detection of targets hidden in foliage

    NASA Astrophysics Data System (ADS)

    Hoff, Lawrence E.; Stotts, Larry B.

    1994-02-01

    The processing of large quantities of synthetic aperture radar data in real time is a complex problem. Even the image formation process taxes today's most advanced computers. The use of complex algorithms with multiple channels adds another dimension to the computational problem. Advanced Research Projects Agency (ARPA) is currently planning on using the Paragon parallel processor for this task. The Paragon is small enough to allow its use in a sensor aircraft. Candidate algorithms will be implemented on the Paragon for evaluation for real time processing. In this paper ARPA technology developments for detecting targets hidden in foliage are reviewed and examples of signal processing techniques on field collected data are presented.

  18. Advances in Parallelization for Large Scale Oct-Tree Mesh Generation

    NASA Technical Reports Server (NTRS)

    O'Connell, Matthew; Karman, Steve L.

    2015-01-01

    Despite great advancements in the parallelization of numerical simulation codes over the last 20 years, it is still common to perform grid generation in serial. Generating large scale grids in serial often requires using special "grid generation" compute machines that can have more than ten times the memory of average machines. While some parallel mesh generation techniques have been proposed, generating very large meshes for LES or aeroacoustic simulations is still a challenging problem. An automated method for the parallel generation of very large scale off-body hierarchical meshes is presented here. This work enables large scale parallel generation of off-body meshes by using a novel combination of parallel grid generation techniques and a hybrid "top down" and "bottom up" oct-tree method. Meshes are generated using hardware commonly found in parallel compute clusters. The capability to generate very large meshes is demonstrated by the generation of off-body meshes surrounding complex aerospace geometries. Results are shown including a one billion cell mesh generated around a Predator Unmanned Aerial Vehicle geometry, which was generated on 64 processors in under 45 minutes.

  19. VLSI neuroprocessors

    NASA Technical Reports Server (NTRS)

    Kemeny, Sabrina E.

    1994-01-01

    Electronic and optoelectronic hardware implementations of highly parallel computing architectures address several ill-defined and/or computation-intensive problems not easily solved by conventional computing techniques. The concurrent processing architectures developed are derived from a variety of advanced computing paradigms including neural network models, fuzzy logic, and cellular automata. Hardware implementation technologies range from state-of-the-art digital/analog custom-VLSI to advanced optoelectronic devices such as computer-generated holograms and e-beam fabricated Dammann gratings. JPL's concurrent processing devices group has developed a broad technology base in hardware implementable parallel algorithms, low-power and high-speed VLSI designs and building block VLSI chips, leading to application-specific high-performance embeddable processors. Application areas include high throughput map-data classification using feedforward neural networks, terrain based tactical movement planner using cellular automata, resource optimization (weapon-target assignment) using a multidimensional feedback network with lateral inhibition, and classification of rocks using an inner-product scheme on thematic mapper data. In addition to addressing specific functional needs of DOD and NASA, the JPL-developed concurrent processing device technology is also being customized for a variety of commercial applications (in collaboration with industrial partners), and is being transferred to U.S. industries. This viewgraph p resentation focuses on two application-specific processors which solve the computation intensive tasks of resource allocation (weapon-target assignment) and terrain based tactical movement planning using two extremely different topologies. Resource allocation is implemented as an asynchronous analog competitive assignment architecture inspired by the Hopfield network. Hardware realization leads to a two to four order of magnitude speed-up over conventional techniques and enables multiple assignments, (many to many), not achievable with standard statistical approaches. Tactical movement planning (finding the best path from A to B) is accomplished with a digital two-dimensional concurrent processor array. By exploiting the natural parallel decomposition of the problem in silicon, a four order of magnitude speed-up over optimized software approaches has been demonstrated.

  20. Parallel heterogeneous architectures for efficient OMP compressive sensing reconstruction

    NASA Astrophysics Data System (ADS)

    Kulkarni, Amey; Stanislaus, Jerome L.; Mohsenin, Tinoosh

    2014-05-01

    Compressive Sensing (CS) is a novel scheme, in which a signal that is sparse in a known transform domain can be reconstructed using fewer samples. The signal reconstruction techniques are computationally intensive and have sluggish performance, which make them impractical for real-time processing applications . The paper presents novel architectures for Orthogonal Matching Pursuit algorithm, one of the popular CS reconstruction algorithms. We show the implementation results of proposed architectures on FPGA, ASIC and on a custom many-core platform. For FPGA and ASIC implementation, a novel thresholding method is used to reduce the processing time for the optimization problem by at least 25%. Whereas, for the custom many-core platform, efficient parallelization techniques are applied, to reconstruct signals with variant signal lengths of N and sparsity of m. The algorithm is divided into three kernels. Each kernel is parallelized to reduce execution time, whereas efficient reuse of the matrix operators allows us to reduce area. Matrix operations are efficiently paralellized by taking advantage of blocked algorithms. For demonstration purpose, all architectures reconstruct a 256-length signal with maximum sparsity of 8 using 64 measurements. Implementation on Xilinx Virtex-5 FPGA, requires 27.14 μs to reconstruct the signal using basic OMP. Whereas, with thresholding method it requires 18 μs. ASIC implementation reconstructs the signal in 13 μs. However, our custom many-core, operating at 1.18 GHz, takes 18.28 μs to complete. Our results show that compared to the previous published work of the same algorithm and matrix size, proposed architectures for FPGA and ASIC implementations perform 1.3x and 1.8x respectively faster. Also, the proposed many-core implementation performs 3000x faster than the CPU and 2000x faster than the GPU.

  1. Parallel algorithms for placement and routing in VLSI design. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Brouwer, Randall Jay

    1991-01-01

    The computational requirements for high quality synthesis, analysis, and verification of very large scale integration (VLSI) designs have rapidly increased with the fast growing complexity of these designs. Research in the past has focused on the development of heuristic algorithms, special purpose hardware accelerators, or parallel algorithms for the numerous design tasks to decrease the time required for solution. Two new parallel algorithms are proposed for two VLSI synthesis tasks, standard cell placement and global routing. The first algorithm, a parallel algorithm for global routing, uses hierarchical techniques to decompose the routing problem into independent routing subproblems that are solved in parallel. Results are then presented which compare the routing quality to the results of other published global routers and which evaluate the speedups attained. The second algorithm, a parallel algorithm for cell placement and global routing, hierarchically integrates a quadrisection placement algorithm, a bisection placement algorithm, and the previous global routing algorithm. Unique partitioning techniques are used to decompose the various stages of the algorithm into independent tasks which can be evaluated in parallel. Finally, results are presented which evaluate the various algorithm alternatives and compare the algorithm performance to other placement programs. Measurements are presented on the parallel speedups available.

  2. Thin-film-transistor array: an exploratory attempt for high throughput cell manipulation using electrowetting principle

    NASA Astrophysics Data System (ADS)

    Shaik, F. Azam; Cathcart, G.; Ihida, S.; Lereau-Bernier, M.; Leclerc, E.; Sakai, Y.; Toshiyoshi, H.; Tixier-Mita, A.

    2017-05-01

    In lab-on-a-chip (LoC) devices, microfluidic displacement of liquids is a key component. electrowetting on dielectric (EWOD) is a technique to move fluids, with the advantage of not requiring channels, pumps or valves. Fluids are discretized into droplets on microelectrodes and moved by applying an electric field via the electrodes to manipulate the contact angle. Micro-objects, such as biological cells, can be transported inside of these droplets. However, the design of conventional microelectrodes, made by standard micro-fabrication techniques, fixes the path of the droplets, and limits the reconfigurability of paths and thus limits the parallel processing of droplets. In that respect, thin film transistor (TFT) technology presents a great opportunity as it allows infinitely reconfigurable paths, with high parallelizability. We propose here to investigate the possibility of using TFT array devices for high throughput cell manipulation using EWOD. A COMSOL based 2D simulation coupled with a MATLAB algorithm was used to simulate the contact angle modulation, displacement and mixing of droplets. These simulations were confirmed by experimental results. The EWOD technique was applied to a droplet of culture medium containing HepG2 carcinoma cells and demonstrated no negative effects on the viability of the cells. This confirms the possibility of applying EWOD techniques to cellular applications, such as parallel cell analysis.

  3. Thread concept for automatic task parallelization in image analysis

    NASA Astrophysics Data System (ADS)

    Lueckenhaus, Maximilian; Eckstein, Wolfgang

    1998-09-01

    Parallel processing of image analysis tasks is an essential method to speed up image processing and helps to exploit the full capacity of distributed systems. However, writing parallel code is a difficult and time-consuming process and often leads to an architecture-dependent program that has to be re-implemented when changing the hardware. Therefore it is highly desirable to do the parallelization automatically. For this we have developed a special kind of thread concept for image analysis tasks. Threads derivated from one subtask may share objects and run in the same context but may process different threads of execution and work on different data in parallel. In this paper we describe the basics of our thread concept and show how it can be used as basis of an automatic task parallelization to speed up image processing. We further illustrate the design and implementation of an agent-based system that uses image analysis threads for generating and processing parallel programs by taking into account the available hardware. The tests made with our system prototype show that the thread concept combined with the agent paradigm is suitable to speed up image processing by an automatic parallelization of image analysis tasks.

  4. Potentials of acousto-optical spectrum analysis on a basis of a novel algorithm of the collinear wave heterodyning in a large-aperture KRS-5 crystalline cell

    NASA Astrophysics Data System (ADS)

    Shcherbakov, Alexandre S.; Maximov, Jewgenij; Bliznetsov, Alexej M.; Sanchez Perez, Karla J.

    2011-03-01

    The technique under proposal for a precise spectrum analysis within an algorithm of the collinear wave heterodyning implies a two-stage integrated processing, namely, the wave heterodyning of a signal in a square-law nonlinear medium and then the optical processing in the same solid state cell. The technical advantage of this approach lies in providing a direct multichannel parallel processing of ultra-high-frequency radio-wave signals with essentially improved frequency resolution. This technique imposes specific requirements on the cell's material. We focus our attention on the solid solutions of thallium chalcogenides and take the TlBr-TlI (thallium bromine-thallium iodine) solution, which forms KRS-5 cubic-symmetry crystals with the mass-ratio 58% of TlBr to 42% of TlI. Analysis shows that the acousto-optical cell made of a KRS-5 crystal oriented along the [111]-axis and the corresponding longitudinal elastic mode for producing the dynamic diffractive grating can be exploited. With the acoustic velocity of about 1.92 × 105 cm/s and attenuation of ~10 dB/(cm GHz2), a similar cell is capable of providing an optical aperture of ~5.0 cm and one of the highest figures of acousto-optical merit in solid states in the visible range. Such a cell is rather desirable for the application to direct 5000-channel parallel spectrum analysis with an improved up to 10-5 relative frequency resolution.

  5. Electron molecular ion recombination: product excitation and fragmentation.

    PubMed

    Adams, Nigel G; Poterya, Viktoriya; Babcock, Lucia M

    2006-01-01

    Electron-ion dissociative recombination is an important ionization loss process in any ionized gas containing molecular ions. This includes the interstellar medium, circumstellar shells, cometary comae, planetary ionospheres, fusion plasma boundaries, combustion flames, laser plasmas and chemical deposition and etching plasmas. In addition to controlling the ionization density, the process generates many radical species, which can contribute to a parallel neutral chemistry. Techniques used to obtain rate data and product information (flowing afterglows and storage rings) are discussed and recent data are reviewed including diatomic to polyatomic ions and cluster ions. The data are divided into rate coefficients and cross sections, including their temperature/energy dependencies, and quantitative identification of neutral reaction products. The latter involve both ground and electronically excited states and including vibrational excitation. The data from the different techniques are compared and trends in the data are examined. The reactions are considered in terms of the basic mechanisms (direct and indirect processes including tunneling) and recent theoretical developments are discussed. Finally, new techniques are mentioned (for product identification; electrostatic storage rings, including single and double rings; Coulomb explosion) and new ways forward are suggested.

  6. Studies in optical parallel processing. [All optical and electro-optic approaches

    NASA Technical Reports Server (NTRS)

    Lee, S. H.

    1978-01-01

    Threshold and A/D devices for converting a gray scale image into a binary one were investigated for all-optical and opto-electronic approaches to parallel processing. Integrated optical logic circuits (IOC) and optical parallel logic devices (OPA) were studied as an approach to processing optical binary signals. In the IOC logic scheme, a single row of an optical image is coupled into the IOC substrate at a time through an array of optical fibers. Parallel processing is carried out out, on each image element of these rows, in the IOC substrate and the resulting output exits via a second array of optical fibers. The OPAL system for parallel processing which uses a Fabry-Perot interferometer for image thresholding and analog-to-digital conversion, achieves a higher degree of parallel processing than is possible with IOC.

  7. Kalman Filter Tracking on Parallel Architectures

    NASA Astrophysics Data System (ADS)

    Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

    2016-11-01

    Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors such as GPGPU, ARM and Intel MIC. In order to achieve the theoretical performance gains of these processors, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High-Luminosity Large Hadron Collider (HL-LHC), for example, this will be by far the dominant problem. The need for greater parallelism has driven investigations of very different track finding techniques such as Cellular Automata or Hough Transforms. The most common track finding techniques in use today, however, are those based on a Kalman filter approach. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. They are known to provide high physics performance, are robust, and are in use today at the LHC. Given the utility of the Kalman filter in track finding, we have begun to port these algorithms to parallel architectures, namely Intel Xeon and Xeon Phi. We report here on our progress towards an end-to-end track reconstruction algorithm fully exploiting vectorization and parallelization techniques in a simplified experimental environment.

  8. A unified framework for building high performance DVEs

    NASA Astrophysics Data System (ADS)

    Lei, Kaibin; Ma, Zhixia; Xiong, Hua

    2011-10-01

    A unified framework for integrating PC cluster based parallel rendering with distributed virtual environments (DVEs) is presented in this paper. While various scene graphs have been proposed in DVEs, it is difficult to enable collaboration of different scene graphs. This paper proposes a technique for non-distributed scene graphs with the capability of object and event distribution. With the increase of graphics data, DVEs require more powerful rendering ability. But general scene graphs are inefficient in parallel rendering. The paper also proposes a technique to connect a DVE and a PC cluster based parallel rendering environment. A distributed multi-player video game is developed to show the interaction of different scene graphs and the parallel rendering performance on a large tiled display wall.

  9. Accelerating Wright–Fisher Forward Simulations on the Graphics Processing Unit

    PubMed Central

    Lawrie, David S.

    2017-01-01

    Forward Wright–Fisher simulations are powerful in their ability to model complex demography and selection scenarios, but suffer from slow execution on the Central Processor Unit (CPU), thus limiting their usefulness. However, the single-locus Wright–Fisher forward algorithm is exceedingly parallelizable, with many steps that are so-called “embarrassingly parallel,” consisting of a vast number of individual computations that are all independent of each other and thus capable of being performed concurrently. The rise of modern Graphics Processing Units (GPUs) and programming languages designed to leverage the inherent parallel nature of these processors have allowed researchers to dramatically speed up many programs that have such high arithmetic intensity and intrinsic concurrency. The presented GPU Optimized Wright–Fisher simulation, or “GO Fish” for short, can be used to simulate arbitrary selection and demographic scenarios while running over 250-fold faster than its serial counterpart on the CPU. Even modest GPU hardware can achieve an impressive speedup of over two orders of magnitude. With simulations so accelerated, one can not only do quick parametric bootstrapping of previously estimated parameters, but also use simulated results to calculate the likelihoods and summary statistics of demographic and selection models against real polymorphism data, all without restricting the demographic and selection scenarios that can be modeled or requiring approximations to the single-locus forward algorithm for efficiency. Further, as many of the parallel programming techniques used in this simulation can be applied to other computationally intensive algorithms important in population genetics, GO Fish serves as an exciting template for future research into accelerating computation in evolution. GO Fish is part of the Parallel PopGen Package available at: http://dl42.github.io/ParallelPopGen/. PMID:28768689

  10. Increasing the perceptual salience of relationships in parallel coordinate plots.

    PubMed

    Harter, Jonathan M; Wu, Xunlei; Alabi, Oluwafemi S; Phadke, Madhura; Pinto, Lifford; Dougherty, Daniel; Petersen, Hannah; Bass, Steffen; Taylor, Russell M

    2012-01-01

    We present three extensions to parallel coordinates that increase the perceptual salience of relationships between axes in multivariate data sets: (1) luminance modulation maintains the ability to preattentively detect patterns in the presence of overplotting, (2) adding a one-vs.-all variable display highlights relationships between one variable and all others, and (3) adding a scatter plot within the parallel-coordinates display preattentively highlights clusters and spatial layouts without strongly interfering with the parallel-coordinates display. These techniques can be combined with one another and with existing extensions to parallel coordinates, and two of them generalize beyond cases with known-important axes. We applied these techniques to two real-world data sets (relativistic heavy-ion collision hydrodynamics and weather observations with statistical principal component analysis) as well as the popular car data set. We present relationships discovered in the data sets using these methods.

  11. Removal of floating dust in glow discharge using plasma jet

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ticos, C. M.; Jepu, I.; Lungu, C. P.

    2010-07-05

    Dust can be an inconvenient source of impurities in plasma processing reactors and in many cases it can cause damage to the plasma-treated surfaces. A technique for dust expulsion out of the trapping region in plasma is presented here, based on the wind force exerted on dust particles by a pulsed plasma jet. Its applicability is demonstrated by removing floating dust in the sheath of parallel-plate capacitive radio-frequency plasma.

  12. Proceedings of the Expert Systems Workshop Held in Pacific Grove, California on 16-18 April 1986

    DTIC Science & Technology

    1986-04-18

    13- NUMBER OF PAGES 197 N IS. SECURITY CLASS, (ol Mm raport) UNCLASSIFIED I5a. DECLASSIFI CATION/DOWNGRADING SCHEDULE 16. DISTRIBUTION...are distributed and parallel. * - Features unimplemented at present; scheduled for phase 2. Table 1-1: Key design characteristics of ABE 2. a...data structuring techniques and a semi- deterministic scheduler . A program for the DF framework consists of a number of independent processing modules

  13. Geopotential Error Analysis from Satellite Gradiometer and Global Positioning System Observables on Parallel Architecture

    NASA Technical Reports Server (NTRS)

    Schutz, Bob E.; Baker, Gregory A.

    1997-01-01

    The recovery of a high resolution geopotential from satellite gradiometer observations motivates the examination of high performance computational techniques. The primary subject matter addresses specifically the use of satellite gradiometer and GPS observations to form and invert the normal matrix associated with a large degree and order geopotential solution. Memory resident and out-of-core parallel linear algebra techniques along with data parallel batch algorithms form the foundation of the least squares application structure. A secondary topic includes the adoption of object oriented programming techniques to enhance modularity and reusability of code. Applications implementing the parallel and object oriented methods successfully calculate the degree variance for a degree and order 110 geopotential solution on 32 processors of the Cray T3E. The memory resident gradiometer application exhibits an overall application performance of 5.4 Gflops, and the out-of-core linear solver exhibits an overall performance of 2.4 Gflops. The combination solution derived from a sun synchronous gradiometer orbit produce average geoid height variances of 17 millimeters.

  14. Parallel computing in experimental mechanics and optical measurement: A review (II)

    NASA Astrophysics Data System (ADS)

    Wang, Tianyi; Kemao, Qian

    2018-05-01

    With advantages such as non-destructiveness, high sensitivity and high accuracy, optical techniques have successfully integrated into various important physical quantities in experimental mechanics (EM) and optical measurement (OM). However, in pursuit of higher image resolutions for higher accuracy, the computation burden of optical techniques has become much heavier. Therefore, in recent years, heterogeneous platforms composing of hardware such as CPUs and GPUs, have been widely employed to accelerate these techniques due to their cost-effectiveness, short development cycle, easy portability, and high scalability. In this paper, we analyze various works by first illustrating their different architectures, followed by introducing their various parallel patterns for high speed computation. Next, we review the effects of CPU and GPU parallel computing specifically in EM & OM applications in a broad scope, which include digital image/volume correlation, fringe pattern analysis, tomography, hyperspectral imaging, computer-generated holograms, and integral imaging. In our survey, we have found that high parallelism can always be exploited in such applications for the development of high-performance systems.

  15. Geopotential error analysis from satellite gradiometer and global positioning system observables on parallel architectures

    NASA Astrophysics Data System (ADS)

    Baker, Gregory Allen

    The recovery of a high resolution geopotential from satellite gradiometer observations motivates the examination of high performance computational techniques. The primary subject matter addresses specifically the use of satellite gradiometer and GPS observations to form and invert the normal matrix associated with a large degree and order geopotential solution. Memory resident and out-of-core parallel linear algebra techniques along with data parallel batch algorithms form the foundation of the least squares application structure. A secondary topic includes the adoption of object oriented programming techniques to enhance modularity and reusability of code. Applications implementing the parallel and object oriented methods successfully calculate the degree variance for a degree and order 110 geopotential solution on 32 processors of the Cray T3E. The memory resident gradiometer application exhibits an overall application performance of 5.4 Gflops, and the out-of-core linear solver exhibits an overall performance of 2.4 Gflops. The combination solution derived from a sun synchronous gradiometer orbit produce average geoid height variances of 17 millimeters.

  16. A Cyber-ITS Framework for Massive Traffic Data Analysis Using Cyber Infrastructure

    PubMed Central

    Fontaine, Michael D.

    2013-01-01

    Traffic data is commonly collected from widely deployed sensors in urban areas. This brings up a new research topic, data-driven intelligent transportation systems (ITSs), which means to integrate heterogeneous traffic data from different kinds of sensors and apply it for ITS applications. This research, taking into consideration the significant increase in the amount of traffic data and the complexity of data analysis, focuses mainly on the challenge of solving data-intensive and computation-intensive problems. As a solution to the problems, this paper proposes a Cyber-ITS framework to perform data analysis on Cyber Infrastructure (CI), by nature parallel-computing hardware and software systems, in the context of ITS. The techniques of the framework include data representation, domain decomposition, resource allocation, and parallel processing. All these techniques are based on data-driven and application-oriented models and are organized as a component-and-workflow-based model in order to achieve technical interoperability and data reusability. A case study of the Cyber-ITS framework is presented later based on a traffic state estimation application that uses the fusion of massive Sydney Coordinated Adaptive Traffic System (SCATS) data and GPS data. The results prove that the Cyber-ITS-based implementation can achieve a high accuracy rate of traffic state estimation and provide a significant computational speedup for the data fusion by parallel computing. PMID:23766690

  17. A Cyber-ITS framework for massive traffic data analysis using cyber infrastructure.

    PubMed

    Xia, Yingjie; Hu, Jia; Fontaine, Michael D

    2013-01-01

    Traffic data is commonly collected from widely deployed sensors in urban areas. This brings up a new research topic, data-driven intelligent transportation systems (ITSs), which means to integrate heterogeneous traffic data from different kinds of sensors and apply it for ITS applications. This research, taking into consideration the significant increase in the amount of traffic data and the complexity of data analysis, focuses mainly on the challenge of solving data-intensive and computation-intensive problems. As a solution to the problems, this paper proposes a Cyber-ITS framework to perform data analysis on Cyber Infrastructure (CI), by nature parallel-computing hardware and software systems, in the context of ITS. The techniques of the framework include data representation, domain decomposition, resource allocation, and parallel processing. All these techniques are based on data-driven and application-oriented models and are organized as a component-and-workflow-based model in order to achieve technical interoperability and data reusability. A case study of the Cyber-ITS framework is presented later based on a traffic state estimation application that uses the fusion of massive Sydney Coordinated Adaptive Traffic System (SCATS) data and GPS data. The results prove that the Cyber-ITS-based implementation can achieve a high accuracy rate of traffic state estimation and provide a significant computational speedup for the data fusion by parallel computing.

  18. Modulation and coding for satellite and space communications

    NASA Technical Reports Server (NTRS)

    Yuen, Joseph H.; Simon, Marvin K.; Pollara, Fabrizio; Divsalar, Dariush; Miller, Warner H.; Morakis, James C.; Ryan, Carl R.

    1990-01-01

    Several modulation and coding advances supported by NASA are summarized. To support long-constraint-length convolutional code, a VLSI maximum-likelihood decoder, utilizing parallel processing techniques, which is being developed to decode convolutional codes of constraint length 15 and a code rate as low as 1/6 is discussed. A VLSI high-speed 8-b Reed-Solomon decoder which is being developed for advanced tracking and data relay satellite (ATDRS) applications is discussed. A 300-Mb/s modem with continuous phase modulation (CPM) and codings which is being developed for ATDRS is discussed. Trellis-coded modulation (TCM) techniques are discussed for satellite-based mobile communication applications.

  19. Comparison of infinite and wedge fringe settings in Mach Zehnder interferometer for temperature field measurement

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Haridas, Divya; P, Vibin Antony; Sajith, V.

    2014-10-15

    Interferometric method, which utilizes the interference of coherent light beams, is used to determine the temperature distribution in the vicinity of a vertical heater plate. The optical components are arranged so as to obtain wedge fringe and infinite fringe patterns and isotherms obtained in each case were compared. In wedge fringe setting, image processing techniques has been used for obtaining isotherms by digital subtraction of initial parallel fringe pattern from deformed fringe pattern. The experimental results obtained are compared with theoretical correlations. The merits and demerits of the fringe analysis techniques are discussed on the basis of the experimental results.

  20. Static analysis techniques for semiautomatic synthesis of message passing software skeletons

    DOE PAGES

    Sottile, Matthew; Dagit, Jason; Zhang, Deli; ...

    2015-06-29

    The design of high-performance computing architectures demands performance analysis of large-scale parallel applications to derive various parameters concerning hardware design and software development. The process of performance analysis and benchmarking an application can be done in several ways with varying degrees of fidelity. One of the most cost-effective ways is to do a coarse-grained study of large-scale parallel applications through the use of program skeletons. The concept of a “program skeleton” that we discuss in this article is an abstracted program that is derived from a larger program where source code that is determined to be irrelevant is removed formore » the purposes of the skeleton. In this work, we develop a semiautomatic approach for extracting program skeletons based on compiler program analysis. Finally, we demonstrate correctness of our skeleton extraction process by comparing details from communication traces, as well as show the performance speedup of using skeletons by running simulations in the SST/macro simulator.« less

  1. Development of a Navier-Stokes algorithm for parallel-processing supercomputers. Ph.D. Thesis - Colorado State Univ., Dec. 1988

    NASA Technical Reports Server (NTRS)

    Swisshelm, Julie M.

    1989-01-01

    An explicit flow solver, applicable to the hierarchy of model equations ranging from Euler to full Navier-Stokes, is combined with several techniques designed to reduce computational expense. The computational domain consists of local grid refinements embedded in a global coarse mesh, where the locations of these refinements are defined by the physics of the flow. Flow characteristics are also used to determine which set of model equations is appropriate for solution in each region, thereby reducing not only the number of grid points at which the solution must be obtained, but also the computational effort required to get that solution. Acceleration to steady-state is achieved by applying multigrid on each of the subgrids, regardless of the particular model equations being solved. Since each of these components is explicit, advantage can readily be taken of the vector- and parallel-processing capabilities of machines such as the Cray X-MP and Cray-2.

  2. Real time emotion aware applications: a case study employing emotion evocative pictures and neuro-physiological sensing enhanced by Graphic Processor Units.

    PubMed

    Konstantinidis, Evdokimos I; Frantzidis, Christos A; Pappas, Costas; Bamidis, Panagiotis D

    2012-07-01

    In this paper the feasibility of adopting Graphic Processor Units towards real-time emotion aware computing is investigated for boosting the time consuming computations employed in such applications. The proposed methodology was employed in analysis of encephalographic and electrodermal data gathered when participants passively viewed emotional evocative stimuli. The GPU effectiveness when processing electroencephalographic and electrodermal recordings is demonstrated by comparing the execution time of chaos/complexity analysis through nonlinear dynamics (multi-channel correlation dimension/D2) and signal processing algorithms (computation of skin conductance level/SCL) into various popular programming environments. Apart from the beneficial role of parallel programming, the adoption of special design techniques regarding memory management may further enhance the time minimization which approximates a factor of 30 in comparison with ANSI C language (single-core sequential execution). Therefore, the use of GPU parallel capabilities offers a reliable and robust solution for real-time sensing the user's affective state. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  3. Efficient multitasking: parallel versus serial processing of multiple tasks

    PubMed Central

    Fischer, Rico; Plessow, Franziska

    2015-01-01

    In the context of performance optimizations in multitasking, a central debate has unfolded in multitasking research around whether cognitive processes related to different tasks proceed only sequentially (one at a time), or can operate in parallel (simultaneously). This review features a discussion of theoretical considerations and empirical evidence regarding parallel versus serial task processing in multitasking. In addition, we highlight how methodological differences and theoretical conceptions determine the extent to which parallel processing in multitasking can be detected, to guide their employment in future research. Parallel and serial processing of multiple tasks are not mutually exclusive. Therefore, questions focusing exclusively on either task-processing mode are too simplified. We review empirical evidence and demonstrate that shifting between more parallel and more serial task processing critically depends on the conditions under which multiple tasks are performed. We conclude that efficient multitasking is reflected by the ability of individuals to adjust multitasking performance to environmental demands by flexibly shifting between different processing strategies of multiple task-component scheduling. PMID:26441742

  4. Efficient multitasking: parallel versus serial processing of multiple tasks.

    PubMed

    Fischer, Rico; Plessow, Franziska

    2015-01-01

    In the context of performance optimizations in multitasking, a central debate has unfolded in multitasking research around whether cognitive processes related to different tasks proceed only sequentially (one at a time), or can operate in parallel (simultaneously). This review features a discussion of theoretical considerations and empirical evidence regarding parallel versus serial task processing in multitasking. In addition, we highlight how methodological differences and theoretical conceptions determine the extent to which parallel processing in multitasking can be detected, to guide their employment in future research. Parallel and serial processing of multiple tasks are not mutually exclusive. Therefore, questions focusing exclusively on either task-processing mode are too simplified. We review empirical evidence and demonstrate that shifting between more parallel and more serial task processing critically depends on the conditions under which multiple tasks are performed. We conclude that efficient multitasking is reflected by the ability of individuals to adjust multitasking performance to environmental demands by flexibly shifting between different processing strategies of multiple task-component scheduling.

  5. Parallel Logic Programming and Parallel Systems Software and Hardware

    DTIC Science & Technology

    1989-07-29

    Conference, Dallas TX. January 1985. (55) [Rous75] Roussel, P., "PROLOG: Manuel de Reference et d’Uilisation", Group d’ Intelligence Artificielle , Universite d...completed. Tools were provided for software development using artificial intelligence techniques. Al software for massively parallel architectures was...using artificial intelligence tech- niques. Al software for massively parallel architectures was started. 1. Introduction We describe research conducted

  6. 76 FR 21076 - Self-Regulatory Organizations; EDGA Exchange, Inc.; Notice of Filing and Immediate Effectiveness...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-04-14

    ... on the Exchange's Internet Web site at http://www.directedge.com . \\3\\ A Member is any registered... Parallel D or Parallel 2D with the DRT (Dark routing technique) option on BZX. BZX charges $0.0020 per... the DRT (Dark routing technique) option on BZX or SCAN/STGY on Nasdaq OMX Exchange (``Nasdaq.'') BATS...

  7. Parallel plan execution with self-processing networks

    NASA Technical Reports Server (NTRS)

    Dautrechy, C. Lynne; Reggia, James A.

    1989-01-01

    A critical issue for space operations is how to develop and apply advanced automation techniques to reduce the cost and complexity of working in space. In this context, it is important to examine how recent advances in self-processing networks can be applied for planning and scheduling tasks. For this reason, the feasibility of applying self-processing network models to a variety of planning and control problems relevant to spacecraft activities is being explored. Goals are to demonstrate that self-processing methods are applicable to these problems, and that MIRRORS/II, a general purpose software environment for implementing self-processing models, is sufficiently robust to support development of a wide range of application prototypes. Using MIRRORS/II and marker passing modelling techniques, a model of the execution of a Spaceworld plan was implemented. This is a simplified model of the Voyager spacecraft which photographed Jupiter, Saturn, and their satellites. It is shown that plan execution, a task usually solved using traditional artificial intelligence (AI) techniques, can be accomplished using a self-processing network. The fact that self-processing networks were applied to other space-related tasks, in addition to the one discussed here, demonstrates the general applicability of this approach to planning and control problems relevant to spacecraft activities. It is also demonstrated that MIRRORS/II is a powerful environment for the development and evaluation of self-processing systems.

  8. Computer architecture for efficient algorithmic executions in real-time systems: New technology for avionics systems and advanced space vehicles

    NASA Technical Reports Server (NTRS)

    Carroll, Chester C.; Youngblood, John N.; Saha, Aindam

    1987-01-01

    Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.

  9. Complexity of parallel implementation of domain decomposition techniques for elliptic partial differential equations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gropp, W.D.; Keyes, D.E.

    1988-03-01

    The authors discuss the parallel implementation of preconditioned conjugate gradient (PCG)-based domain decomposition techniques for self-adjoint elliptic partial differential equations in two dimensions on several architectures. The complexity of these methods is described on a variety of message-passing parallel computers as a function of the size of the problem, number of processors and relative communication speeds of the processors. They show that communication startups are very important, and that even the small amount of global communication in these methods can significantly reduce the performance of many message-passing architectures.

  10. Portable parallel portfolio optimization in the Aurora Financial Management System

    NASA Astrophysics Data System (ADS)

    Laure, Erwin; Moritsch, Hans

    2001-07-01

    Financial planning problems are formulated as large scale, stochastic, multiperiod, tree structured optimization problems. An efficient technique for solving this kind of problems is the nested Benders decomposition method. In this paper we present a parallel, portable, asynchronous implementation of this technique. To achieve our portability goals we elected the programming language Java for our implementation and used a high level Java based framework, called OpusJava, for expressing the parallelism potential as well as synchronization constraints. Our implementation is embedded within a modular decision support tool for portfolio and asset liability management, the Aurora Financial Management System.

  11. Optimization technique for problems with an inequality constraint

    NASA Technical Reports Server (NTRS)

    Russell, K. J.

    1972-01-01

    General technique uses a modified version of an existing technique termed the pattern search technique. New procedure called the parallel move strategy permits pattern search technique to be used with problems involving a constraint.

  12. Evolutionary computing for the design search and optimization of space vehicle power subsystems

    NASA Technical Reports Server (NTRS)

    Kordon, Mark; Klimeck, Gerhard; Hanks, David; Hua, Hook

    2004-01-01

    Evolutionary computing has proven to be a straightforward and robust approach for optimizing a wide range of difficult analysis and design problems. This paper discusses the application of these techniques to an existing space vehicle power subsystem resource and performance analysis simulation in a parallel processing environment. Out preliminary results demonstrate that this approach has the potential to improve the space system trade study process by allowing engineers to statistically weight subsystem goals of mass, cost and performance then automatically size power elements based on anticipated performance of the subsystem rather than on worst-case estimates.

  13. Seq-ing answers: uncovering the unexpected in global gene regulation.

    PubMed

    Otto, George Maxwell; Brar, Gloria Ann

    2018-04-19

    The development of techniques for measuring gene expression globally has greatly expanded our understanding of gene regulatory mechanisms in depth and scale. We can now quantify every intermediate and transition in the canonical pathway of gene expression-from DNA to mRNA to protein-genome-wide. Employing such measurements in parallel can produce rich datasets, but extracting the most information requires careful experimental design and analysis. Here, we argue for the value of genome-wide studies that measure multiple outputs of gene expression over many timepoints during the course of a natural developmental process. We discuss our findings from a highly parallel gene expression dataset of meiotic differentiation, and those of others, to illustrate how leveraging these features can provide new and surprising insight into fundamental mechanisms of gene regulation.

  14. Chip architecture - A revolution brewing

    NASA Astrophysics Data System (ADS)

    Guterl, F.

    1983-07-01

    Techniques being explored by microchip designers and manufacturers to both speed up memory access and instruction execution while protecting memory are discussed. Attention is given to hardwiring control logic, pipelining for parallel processing, devising orthogonal instruction sets for interchangeable instruction fields, and the development of hardware for implementation of virtual memory and multiuser systems to provide memory management and protection. The inclusion of microcode in mainframes eliminated logic circuits that control timing and gating of the CPU. However, improvements in memory architecture have reduced access time to below that needed for instruction execution. Hardwiring the functions as a virtual memory enhances memory protection. Parallelism involves a redundant architecture, which allows identical operations to be performed simultaneously, and can be directed with microcode to avoid abortion of intermediate instructions once on set of instructions has been completed.

  15. Error Control Coding Techniques for Space and Satellite Communications

    NASA Technical Reports Server (NTRS)

    Costello, Daniel J., Jr.; Takeshita, Oscar Y.; Cabral, Hermano A.

    1998-01-01

    It is well known that the BER performance of a parallel concatenated turbo-code improves roughly as 1/N, where N is the information block length. However, it has been observed by Benedetto and Montorsi that for most parallel concatenated turbo-codes, the FER performance does not improve monotonically with N. In this report, we study the FER of turbo-codes, and the effects of their concatenation with an outer code. Two methods of concatenation are investigated: across several frames and within each frame. Some asymmetric codes are shown to have excellent FER performance with an information block length of 16384. We also show that the proposed outer coding schemes can improve the BER performance as well by eliminating pathological frames generated by the iterative MAP decoding process.

  16. Program Correctness, Verification and Testing for Exascale (Corvette)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sen, Koushik; Iancu, Costin; Demmel, James W

    The goal of this project is to provide tools to assess the correctness of parallel programs written using hybrid parallelism. There is a dire lack of both theoretical and engineering know-how in the area of finding bugs in hybrid or large scale parallel programs, which our research aims to change. In the project we have demonstrated novel approaches in several areas: 1. Low overhead automated and precise detection of concurrency bugs at scale. 2. Using low overhead bug detection tools to guide speculative program transformations for performance. 3. Techniques to reduce the concurrency required to reproduce a bug using partialmore » program restart/replay. 4. Techniques to provide reproducible execution of floating point programs. 5. Techniques for tuning the floating point precision used in codes.« less

  17. A smart technique for attendance system to recognize faces through parallelism

    NASA Astrophysics Data System (ADS)

    Prabhavathi, B.; Tanuja, V.; Madhu Viswanatham, V.; Rajashekhara Babu, M.

    2017-11-01

    Major part of recognising a person is face with the help of image processing techniques we can exploit the physical features of a person. In the old approach method that is used in schools and colleges it is there that the professor calls the student name and then the attendance for the students marked. Here in paper want to deviate from the old approach and go with the new approach by using techniques that are there in image processing. In this paper we presenting spontaneous presence for students in classroom. At first classroom image has been in use and after that image is kept in data record. For the images that are stored in the database we apply system algorithm which includes steps such as, histogram classification, noise removal, face detection and face recognition methods. So by using these steps we detect the faces and then compare it with the database. The attendance gets marked automatically if the system recognizes the faces.

  18. A TEMPLATE-BASED FABRICATION TECHNIQUE FOR SPATIALLY-DESIGNED POLYMER MICRO/NANOFIBER COMPOSITES

    PubMed Central

    Naik, Nisarga; Caves, Jeff; Kumar, Vivek; Chaikof, Elliot; Allen, Mark G.

    2013-01-01

    This paper reports a template-based technique for the fabrication of polymer micro/nanofiber composites, exercising control over the fiber dimensions and alignment. Unlike conventional spinning-based methods of fiber production, the presented approach is based on micro-transfer molding. It is a parallel processing technique capable of producing fibers with control over both in-plane and out-of-plane geometries, in addition to packing density and layout of the fibers. Collagen has been used as a test polymer to demonstrate the concept. Hollow and solid collagen fibers with various spatial layouts have been fabricated. Produced fibers have widths ranging from 2 µm to 50 µm, and fiber thicknesses ranging from 300 nm to 3 µm. Also, three-dimensionality of the process has been demonstrated by producing in-plane serpentine fibers with designed arc lengths, out-of-plane wavy fibers, fibers with focalized particle encapsulation, and porous fibers with desired periodicity and pore sizes. PMID:24533428

  19. Novel techniques for data decomposition and load balancing for parallel processing of vision systems: Implementation and evaluation using a motion estimation system

    NASA Technical Reports Server (NTRS)

    Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.

    1989-01-01

    Computer vision systems employ a sequence of vision algorithms in which the output of an algorithm is the input of the next algorithm in the sequence. Algorithms that constitute such systems exhibit vastly different computational characteristics, and therefore, require different data decomposition techniques and efficient load balancing techniques for parallel implementation. However, since the input data for a task is produced as the output data of the previous task, this information can be exploited to perform knowledge based data decomposition and load balancing. Presented here are algorithms for a motion estimation system. The motion estimation is based on the point correspondence between the involved images which are a sequence of stereo image pairs. Researchers propose algorithms to obtain point correspondences by matching feature points among stereo image pairs at any two consecutive time instants. Furthermore, the proposed algorithms employ non-iterative procedures, which results in saving considerable amounts of computation time. The system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from consecutive time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters.

  20. Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment.

    PubMed

    Meng, Bowen; Pratx, Guillem; Xing, Lei

    2011-12-01

    Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT∕CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. In this work, we accelerated the Feldcamp-Davis-Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scans were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and Reduce function to aggregate those partial backprojection into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT∕CT reconstruction algorithm. Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10(-7). Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. An ultrafast, reliable and scalable 4D CBCT∕CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment.

  1. Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment

    PubMed Central

    Meng, Bowen; Pratx, Guillem; Xing, Lei

    2011-01-01

    Purpose: Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT/CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. Methods: In this work, we accelerated the Feldcamp–Davis–Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scans were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and Reduce function to aggregate those partial backprojection into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT/CT reconstruction algorithm. Results: Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10−7. Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. Conclusions: An ultrafast, reliable and scalable 4D CBCT/CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment. PMID:22149842

  2. Parallel machine architecture and compiler design facilities

    NASA Technical Reports Server (NTRS)

    Kuck, David J.; Yew, Pen-Chung; Padua, David; Sameh, Ahmed; Veidenbaum, Alex

    1990-01-01

    The objective is to provide an integrated simulation environment for studying and evaluating various issues in designing parallel systems, including machine architectures, parallelizing compiler techniques, and parallel algorithms. The status of Delta project (which objective is to provide a facility to allow rapid prototyping of parallelized compilers that can target toward different machine architectures) is summarized. Included are the surveys of the program manipulation tools developed, the environmental software supporting Delta, and the compiler research projects in which Delta has played a role.

  3. Parallel Implementation of a High Order Implicit Collocation Method for the Heat Equation

    NASA Technical Reports Server (NTRS)

    Kouatchou, Jules; Halem, Milton (Technical Monitor)

    2000-01-01

    We combine a high order compact finite difference approximation and collocation techniques to numerically solve the two dimensional heat equation. The resulting method is implicit arid can be parallelized with a strategy that allows parallelization across both time and space. We compare the parallel implementation of the new method with a classical implicit method, namely the Crank-Nicolson method, where the parallelization is done across space only. Numerical experiments are carried out on the SGI Origin 2000.

  4. Parallelization of a spatial random field characterization process using the Method of Anchored Distributions and the HTCondor high throughput computing system

    NASA Astrophysics Data System (ADS)

    Osorio-Murillo, C. A.; Over, M. W.; Frystacky, H.; Ames, D. P.; Rubin, Y.

    2013-12-01

    A new software application called MAD# has been coupled with the HTCondor high throughput computing system to aid scientists and educators with the characterization of spatial random fields and enable understanding the spatial distribution of parameters used in hydrogeologic and related modeling. MAD# is an open source desktop software application used to characterize spatial random fields using direct and indirect information through Bayesian inverse modeling technique called the Method of Anchored Distributions (MAD). MAD relates indirect information with a target spatial random field via a forward simulation model. MAD# executes inverse process running the forward model multiple times to transfer information from indirect information to the target variable. MAD# uses two parallelization profiles according to computational resources available: one computer with multiple cores and multiple computers - multiple cores through HTCondor. HTCondor is a system that manages a cluster of desktop computers for submits serial or parallel jobs using scheduling policies, resources monitoring, job queuing mechanism. This poster will show how MAD# reduces the time execution of the characterization of random fields using these two parallel approaches in different case studies. A test of the approach was conducted using 1D problem with 400 cells to characterize saturated conductivity, residual water content, and shape parameters of the Mualem-van Genuchten model in four materials via the HYDRUS model. The number of simulations evaluated in the inversion was 10 million. Using the one computer approach (eight cores) were evaluated 100,000 simulations in 12 hours (10 million - 1200 hours approximately). In the evaluation on HTCondor, 32 desktop computers (132 cores) were used, with a processing time of 60 hours non-continuous in five days. HTCondor reduced the processing time for uncertainty characterization by a factor of 20 (1200 hours reduced to 60 hours.)

  5. Application of Cross-Correlation Greens Function Along With FDTD for Fast Computation of Envelope Correlation Coefficient Over Wideband for MIMO Antennas

    NASA Astrophysics Data System (ADS)

    Sarkar, Debdeep; Srivastava, Kumar Vaibhav

    2017-02-01

    In this paper, the concept of cross-correlation Green's functions (CGF) is used in conjunction with the finite difference time domain (FDTD) technique for calculation of envelope correlation coefficient (ECC) of any arbitrary MIMO antenna system over wide frequency band. Both frequency-domain (FD) and time-domain (TD) post-processing techniques are proposed for possible application with this FDTD-CGF scheme. The FDTD-CGF time-domain (FDTD-CGF-TD) scheme utilizes time-domain signal processing methods and exhibits significant reduction in ECC computation time as compared to the FDTD-CGF frequency domain (FDTD-CGF-FD) scheme, for high frequency-resolution requirements. The proposed FDTD-CGF based schemes can be applied for accurate and fast prediction of wideband ECC response, instead of the conventional scattering parameter based techniques which have several limitations. Numerical examples of the proposed FDTD-CGF techniques are provided for two-element MIMO systems involving thin-wire half-wavelength dipoles in parallel side-by-side as well as orthogonal arrangements. The results obtained from the FDTD-CGF techniques are compared with results from commercial electromagnetic solver Ansys HFSS, to verify the validity of proposed approach.

  6. Teaching ethics to engineers: ethical decision making parallels the engineering design process.

    PubMed

    Bero, Bridget; Kuhlman, Alana

    2011-09-01

    In order to fulfill ABET requirements, Northern Arizona University's Civil and Environmental engineering programs incorporate professional ethics in several of its engineering courses. This paper discusses an ethics module in a 3rd year engineering design course that focuses on the design process and technical writing. Engineering students early in their student careers generally possess good black/white critical thinking skills on technical issues. Engineering design is the first time students are exposed to "grey" or multiple possible solution technical problems. To identify and solve these problems, the engineering design process is used. Ethical problems are also "grey" problems and present similar challenges to students. Students need a practical tool for solving these ethical problems. The step-wise engineering design process was used as a model to demonstrate a similar process for ethical situations. The ethical decision making process of Martin and Schinzinger was adapted for parallelism to the design process and presented to students as a step-wise technique for identification of the pertinent ethical issues, relevant moral theories, possible outcomes and a final decision. Students had greatest difficulty identifying the broader, global issues presented in an ethical situation, but by the end of the module, were better able to not only identify the broader issues, but also to more comprehensively assess specific issues, generate solutions and a desired response to the issue.

  7. Tuning HDF5 subfiling performance on parallel file systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Byna, Suren; Chaarawi, Mohamad; Koziol, Quincey

    Subfiling is a technique used on parallel file systems to reduce locking and contention issues when multiple compute nodes interact with the same storage target node. Subfiling provides a compromise between the single shared file approach that instigates the lock contention problems on parallel file systems and having one file per process, which results in generating a massive and unmanageable number of files. In this paper, we evaluate and tune the performance of recently implemented subfiling feature in HDF5. In specific, we explain the implementation strategy of subfiling feature in HDF5, provide examples of using the feature, and evaluate andmore » tune parallel I/O performance of this feature with parallel file systems of the Cray XC40 system at NERSC (Cori) that include a burst buffer storage and a Lustre disk-based storage. We also evaluate I/O performance on the Cray XC30 system, Edison, at NERSC. Our results show performance benefits of 1.2X to 6X performance advantage with subfiling compared to writing a single shared HDF5 file. We present our exploration of configurations, such as the number of subfiles and the number of Lustre storage targets to storing files, as optimization parameters to obtain superior I/O performance. Based on this exploration, we discuss recommendations for achieving good I/O performance as well as limitations with using the subfiling feature.« less

  8. A high-speed linear algebra library with automatic parallelism

    NASA Technical Reports Server (NTRS)

    Boucher, Michael L.

    1994-01-01

    Parallel or distributed processing is key to getting highest performance workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely small even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.

  9. NeuroSeek dual-color image processing infrared focal plane array

    NASA Astrophysics Data System (ADS)

    McCarley, Paul L.; Massie, Mark A.; Baxter, Christopher R.; Huynh, Buu L.

    1998-09-01

    Several technologies have been developed in recent years to advance the state of the art of IR sensor systems including dual color affordable focal planes, on-focal plane array biologically inspired image and signal processing techniques and spectral sensing techniques. Pacific Advanced Technology (PAT) and the Air Force Research Lab Munitions Directorate have developed a system which incorporates the best of these capabilities into a single device. The 'NeuroSeek' device integrates these technologies into an IR focal plane array (FPA) which combines multicolor Midwave IR/Longwave IR radiometric response with on-focal plane 'smart' neuromorphic analog image processing. The readout and processing integrated circuit very large scale integration chip which was developed under this effort will be hybridized to a dual color detector array to produce the NeuroSeek FPA, which will have the capability to fuse multiple pixel-based sensor inputs directly on the focal plane. Great advantages are afforded by application of massively parallel processing algorithms to image data in the analog domain; the high speed and low power consumption of this device mimic operations performed in the human retina.

  10. Periodic Application of Concurrent Error Detection in Processor Array Architectures. PhD. Thesis -

    NASA Technical Reports Server (NTRS)

    Chen, Paul Peichuan

    1993-01-01

    Processor arrays can provide an attractive architecture for some applications. Featuring modularity, regular interconnection and high parallelism, such arrays are well-suited for VLSI/WSI implementations, and applications with high computational requirements, such as real-time signal processing. Preserving the integrity of results can be of paramount importance for certain applications. In these cases, fault tolerance should be used to ensure reliable delivery of a system's service. One aspect of fault tolerance is the detection of errors caused by faults. Concurrent error detection (CED) techniques offer the advantage that transient and intermittent faults may be detected with greater probability than with off-line diagnostic tests. Applying time-redundant CED techniques can reduce hardware redundancy costs. However, most time-redundant CED techniques degrade a system's performance.

  11. Neural Parallel Engine: A toolbox for massively parallel neural signal processing.

    PubMed

    Tam, Wing-Kin; Yang, Zhi

    2018-05-01

    Large-scale neural recordings provide detailed information on neuronal activities and can help elicit the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts depending on the algorithms. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing only focus on a few rudimentary algorithms, are not well-optimized and often do not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings up to thousands of channels. Copyright © 2018 Elsevier B.V. All rights reserved.

  12. DOE SBIR Phase-1 Report on Hybrid CPU-GPU Parallel Development of the Eulerian-Lagrangian Barracuda Multiphase Program

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dr. Dale M. Snider

    2011-02-28

    This report gives the result from the Phase-1 work on demonstrating greater than 10x speedup of the Barracuda computer program using parallel methods and GPU processors (General-Purpose Graphics Processing Unit or Graphics Processing Unit). Phase-1 demonstrated a 12x speedup on a typical Barracuda function using the GPU processor. The problem test case used about 5 million particles and 250,000 Eulerian grid cells. The relative speedup, compared to a single CPU, increases with increased number of particles giving greater than 12x speedup. Phase-1 work provided a path for reformatting data structure modifications to give good parallel performance while keeping a friendlymore » environment for new physics development and code maintenance. The implementation of data structure changes will be in Phase-2. Phase-1 laid the ground work for the complete parallelization of Barracuda in Phase-2, with the caveat that implemented computer practices for parallel programming done in Phase-1 gives immediate speedup in the current Barracuda serial running code. The Phase-1 tasks were completed successfully laying the frame work for Phase-2. The detailed results of Phase-1 are within this document. In general, the speedup of one function would be expected to be higher than the speedup of the entire code because of I/O functions and communication between the algorithms. However, because one of the most difficult Barracuda algorithms was parallelized in Phase-1 and because advanced parallelization methods and proposed parallelization optimization techniques identified in Phase-1 will be used in Phase-2, an overall Barracuda code speedup (relative to a single CPU) is expected to be greater than 10x. This means that a job which takes 30 days to complete will be done in 3 days. Tasks completed in Phase-1 are: Task 1: Profile the entire Barracuda code and select which subroutines are to be parallelized (See Section Choosing a Function to Accelerate) Task 2: Select a GPU consultant company and jointly parallelize subroutines (CPFD chose the small business EMPhotonics for the Phase-1 the technical partner. See Section Technical Objective and Approach) Task 3: Integrate parallel subroutines into Barracuda (See Section Results from Phase-1 and its subsections) Task 4: Testing, refinement, and optimization of parallel methodology (See Section Results from Phase-1 and Section Result Comparison Program) Task 5: Integrate Phase-1 parallel subroutines into Barracuda and release (See Section Results from Phase-1 and its subsections) Task 6: Roadmap of Phase-2 (See Section Plan for Phase-2) With the completion of Phase 1 we have the base understanding to completely parallelize Barracuda. An overview of the work to move Barracuda to a parallelized code is given in Plan for Phase-2.« less

  13. Anatomically constrained neural network models for the categorization of facial expression

    NASA Astrophysics Data System (ADS)

    McMenamin, Brenton W.; Assadi, Amir H.

    2004-12-01

    The ability to recognize facial expression in humans is performed with the amygdala which uses parallel processing streams to identify the expressions quickly and accurately. Additionally, it is possible that a feedback mechanism may play a role in this process as well. Implementing a model with similar parallel structure and feedback mechanisms could be used to improve current facial recognition algorithms for which varied expressions are a source for error. An anatomically constrained artificial neural-network model was created that uses this parallel processing architecture and feedback to categorize facial expressions. The presence of a feedback mechanism was not found to significantly improve performance for models with parallel architecture. However the use of parallel processing streams significantly improved accuracy over a similar network that did not have parallel architecture. Further investigation is necessary to determine the benefits of using parallel streams and feedback mechanisms in more advanced object recognition tasks.

  14. Anatomically constrained neural network models for the categorization of facial expression

    NASA Astrophysics Data System (ADS)

    McMenamin, Brenton W.; Assadi, Amir H.

    2005-01-01

    The ability to recognize facial expression in humans is performed with the amygdala which uses parallel processing streams to identify the expressions quickly and accurately. Additionally, it is possible that a feedback mechanism may play a role in this process as well. Implementing a model with similar parallel structure and feedback mechanisms could be used to improve current facial recognition algorithms for which varied expressions are a source for error. An anatomically constrained artificial neural-network model was created that uses this parallel processing architecture and feedback to categorize facial expressions. The presence of a feedback mechanism was not found to significantly improve performance for models with parallel architecture. However the use of parallel processing streams significantly improved accuracy over a similar network that did not have parallel architecture. Further investigation is necessary to determine the benefits of using parallel streams and feedback mechanisms in more advanced object recognition tasks.

  15. Parallel processing data network of master and slave transputers controlled by a serial control network

    DOEpatents

    Crosetto, D.B.

    1996-12-31

    The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor to a plurality of slave processors to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor`s status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer, a digital signal processor, a parallel transfer controller, and two three-port memory devices. A communication switch within each node connects it to a fast parallel hardware channel through which all high density data arrives or leaves the node. 6 figs.

  16. Parallel processing data network of master and slave transputers controlled by a serial control network

    DOEpatents

    Crosetto, Dario B.

    1996-01-01

    The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node (100) connects it to a fast parallel hardware channel (70) through which all high density data arrives or leaves the node.

  17. A Scalable Cloud Library Empowering Big Data Management, Diagnosis, and Visualization of Cloud-Resolving Models

    NASA Astrophysics Data System (ADS)

    Zhou, S.; Tao, W. K.; Li, X.; Matsui, T.; Sun, X. H.; Yang, X.

    2015-12-01

    A cloud-resolving model (CRM) is an atmospheric numerical model that can numerically resolve clouds and cloud systems at 0.25~5km horizontal grid spacings. The main advantage of the CRM is that it can allow explicit interactive processes between microphysics, radiation, turbulence, surface, and aerosols without subgrid cloud fraction, overlapping and convective parameterization. Because of their fine resolution and complex physical processes, it is challenging for the CRM community to i) visualize/inter-compare CRM simulations, ii) diagnose key processes for cloud-precipitation formation and intensity, and iii) evaluate against NASA's field campaign data and L1/L2 satellite data products due to large data volume (~10TB) and complexity of CRM's physical processes. We have been building the Super Cloud Library (SCL) upon a Hadoop framework, capable of CRM database management, distribution, visualization, subsetting, and evaluation in a scalable way. The current SCL capability includes (1) A SCL data model enables various CRM simulation outputs in NetCDF, including the NASA-Unified Weather Research and Forecasting (NU-WRF) and Goddard Cumulus Ensemble (GCE) model, to be accessed and processed by Hadoop, (2) A parallel NetCDF-to-CSV converter supports NU-WRF and GCE model outputs, (3) A technique visualizes Hadoop-resident data with IDL, (4) A technique subsets Hadoop-resident data, compliant to the SCL data model, with HIVE or Impala via HUE's Web interface, (5) A prototype enables a Hadoop MapReduce application to dynamically access and process data residing in a parallel file system, PVFS2 or CephFS, where high performance computing (HPC) simulation outputs such as NU-WRF's and GCE's are located. We are testing Apache Spark to speed up SCL data processing and analysis.With the SCL capabilities, SCL users can conduct large-domain on-demand tasks without downloading voluminous CRM datasets and various observations from NASA Field Campaigns and Satellite data to a local computer, and inter-compare CRM output and data with GCE and NU-WRF.

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lincoln, Don

    The manner in which particle physicists investigate collisions in particle accelerators is a puzzling process. Using vaguely-defined “detectors,” scientists are able to somehow reconstruct the collisions and convert that information into physics measurements. In this video, Fermilab’s Dr. Don Lincoln sheds light on this mysterious technique. In a surprising analogy, he draws a parallel between experimental particle physics and bomb squad investigators and uses an explosive example to illustrate his points. Be sure to watch this video… it’s totally the bomb.

  19. Parallel optimization of signal detection in active magnetospheric signal injection experiments

    NASA Astrophysics Data System (ADS)

    Gowanlock, Michael; Li, Justin D.; Rude, Cody M.; Pankratius, Victor

    2018-05-01

    Signal detection and extraction requires substantial manual parameter tuning at different stages in the processing pipeline. Time-series data depends on domain-specific signal properties, necessitating unique parameter selection for a given problem. The large potential search space makes this parameter selection process time-consuming and subject to variability. We introduce a technique to search and prune such parameter search spaces in parallel and select parameters for time series filters using breadth- and depth-first search strategies to increase the likelihood of detecting signals of interest in the field of magnetospheric physics. We focus on studying geomagnetic activity in the extremely and very low frequency ranges (ELF/VLF) using ELF/VLF transmissions from Siple Station, Antarctica, received at Québec, Canada. Our technique successfully detects amplified transmissions and achieves substantial speedup performance gains as compared to an exhaustive parameter search. We present examples where our algorithmic approach reduces the search from hundreds of seconds down to less than 1 s, with a ranked signal detection in the top 99th percentile, thus making it valuable for real-time monitoring. We also present empirical performance models quantifying the trade-off between the quality of signal recovered and the algorithm response time required for signal extraction. In the future, improved signal extraction in scenarios like the Siple experiment will enable better real-time diagnostics of conditions of the Earth's magnetosphere for monitoring space weather activity.

  20. Super and parallel computers and their impact on civil engineering

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kamat, M.P.

    1986-01-01

    This book presents the papers given at a conference on the use of supercomputers in civil engineering. Topics considered at the conference included solving nonlinear equations on a hypercube, a custom architectured parallel processing system, distributed data processing, algorithms, computer architecture, parallel processing, vector processing, computerized simulation, and cost benefit analysis.

  1. Parallel processing architecture for computing inverse differential kinematic equations of the PUMA arm

    NASA Technical Reports Server (NTRS)

    Hsia, T. C.; Lu, G. Z.; Han, W. H.

    1987-01-01

    In advanced robot control problems, on-line computation of inverse Jacobian solution is frequently required. Parallel processing architecture is an effective way to reduce computation time. A parallel processing architecture is developed for the inverse Jacobian (inverse differential kinematic equation) of the PUMA arm. The proposed pipeline/parallel algorithm can be inplemented on an IC chip using systolic linear arrays. This implementation requires 27 processing cells and 25 time units. Computation time is thus significantly reduced.

  2. Performance evaluation of canny edge detection on a tiled multicore architecture

    NASA Astrophysics Data System (ADS)

    Brethorst, Andrew Z.; Desai, Nehal; Enright, Douglas P.; Scrofano, Ronald

    2011-01-01

    In the last few years, a variety of multicore architectures have been used to parallelize image processing applications. In this paper, we focus on assessing the parallel speed-ups of different Canny edge detection parallelization strategies on the Tile64, a tiled multicore architecture developed by the Tilera Corporation. Included in these strategies are different ways Canny edge detection can be parallelized, as well as differences in data management. The two parallelization strategies examined were loop-level parallelism and domain decomposition. Loop-level parallelism is achieved through the use of OpenMP,1 and it is capable of parallelization across the range of values over which a loop iterates. Domain decomposition is the process of breaking down an image into subimages, where each subimage is processed independently, in parallel. The results of the two strategies show that for the same number of threads, programmer implemented, domain decomposition exhibits higher speed-ups than the compiler managed, loop-level parallelism implemented with OpenMP.

  3. Use of parallel computing in mass processing of laser data

    NASA Astrophysics Data System (ADS)

    Będkowski, J.; Bratuś, R.; Prochaska, M.; Rzonca, A.

    2015-12-01

    The first part of the paper includes a description of the rules used to generate the algorithm needed for the purpose of parallel computing and also discusses the origins of the idea of research on the use of graphics processors in large scale processing of laser scanning data. The next part of the paper includes the results of an efficiency assessment performed for an array of different processing options, all of which were substantially accelerated with parallel computing. The processing options were divided into the generation of orthophotos using point clouds, coloring of point clouds, transformations, and the generation of a regular grid, as well as advanced processes such as the detection of planes and edges, point cloud classification, and the analysis of data for the purpose of quality control. Most algorithms had to be formulated from scratch in the context of the requirements of parallel computing. A few of the algorithms were based on existing technology developed by the Dephos Software Company and then adapted to parallel computing in the course of this research study. Processing time was determined for each process employed for a typical quantity of data processed, which helped confirm the high efficiency of the solutions proposed and the applicability of parallel computing to the processing of laser scanning data. The high efficiency of parallel computing yields new opportunities in the creation and organization of processing methods for laser scanning data.

  4. Low cost solar array project production process and equipment task: A Module Experimental Process System Development Unit (MEPSDU)

    NASA Technical Reports Server (NTRS)

    1981-01-01

    Several major modifications were made to the design presented at the PDR. The frame was deleted in favor of a "frameless" design which will provide a substantially improved cell packing factor. Potential shaded cell damage resulting from operation into a short circuit can be eliminated by a change in the cell series/parallel electrical interconnect configuration. The baseline process sequence defined for the MEPSON was refined and equipment design and specification work was completed. SAMICS cost analysis work accelerated, format A's were prepared and computer simulations completed. Design work on the automated cell interconnect station was focused on bond technique selection experiments.

  5. 3D motion picture of transparent gas flow by parallel phase-shifting digital holography

    NASA Astrophysics Data System (ADS)

    Awatsuji, Yasuhiro; Fukuda, Takahito; Wang, Yexin; Xia, Peng; Kakue, Takashi; Nishio, Kenzo; Matoba, Osamu

    2018-03-01

    Parallel phase-shifting digital holography is a technique capable of recording three-dimensional (3D) motion picture of dynamic object, quantitatively. This technique can record single hologram of an object with an image sensor having a phase-shift array device and reconstructs the instantaneous 3D image of the object with a computer. In this technique, a single hologram in which the multiple holograms required for phase-shifting digital holography are multiplexed by using space-division multiplexing technique pixel by pixel. We demonstrate 3D motion picture of dynamic and transparent gas flow recorded and reconstructed by the technique. A compressed air duster was used to generate the gas flow. A motion picture of the hologram of the gas flow was recorded at 180,000 frames/s by parallel phase-shifting digital holography. The phase motion picture of the gas flow was reconstructed from the motion picture of the hologram. The Abel inversion was applied to the phase motion picture and then the 3D motion picture of the gas flow was obtained.

  6. Parallel tiled Nussinov RNA folding loop nest generated using both dependence graph transitive closure and loop skewing.

    PubMed

    Palkowski, Marek; Bielecki, Wlodzimierz

    2017-06-02

    RNA secondary structure prediction is a compute intensive task that lies at the core of several search algorithms in bioinformatics. Fortunately, the RNA folding approaches, such as the Nussinov base pair maximization, involve mathematical operations over affine control loops whose iteration space can be represented by the polyhedral model. Polyhedral compilation techniques have proven to be a powerful tool for optimization of dense array codes. However, classical affine loop nest transformations used with these techniques do not optimize effectively codes of dynamic programming of RNA structure predictions. The purpose of this paper is to present a novel approach allowing for generation of a parallel tiled Nussinov RNA loop nest exposing significantly higher performance than that of known related code. This effect is achieved due to improving code locality and calculation parallelization. In order to improve code locality, we apply our previously published technique of automatic loop nest tiling to all the three loops of the Nussinov loop nest. This approach first forms original rectangular 3D tiles and then corrects them to establish their validity by means of applying the transitive closure of a dependence graph. To produce parallel code, we apply the loop skewing technique to a tiled Nussinov loop nest. The technique is implemented as a part of the publicly available polyhedral source-to-source TRACO compiler. Generated code was run on modern Intel multi-core processors and coprocessors. We present the speed-up factor of generated Nussinov RNA parallel code and demonstrate that it is considerably faster than related codes in which only the two outer loops of the Nussinov loop nest are tiled.

  7. Development of novel series and parallel sensing system based on nanostructured surface enhanced Raman scattering substrate for biomedical application

    NASA Astrophysics Data System (ADS)

    Chang, Te-Wei

    With the advance of nanofabrication, the capability of nanoscale metallic structure fabrication opens a whole new study in nanoplasmonics, which is defined as the investigation of photon-electron interaction in the vicinity of nanoscale metallic structures. The strong oscillation of free electrons at the interface between metal and surrounding dielectric material caused by propagating surface plasmon resonance (SPR) or localized surface plasmon resonance (LSPR) enables a variety of new applications in different areas, especially biological sensing techniques. One of the promising biological sensing applications by surface resonance polariton is surface enhanced Raman spectroscopy (SERS), which significantly reinforces the feeble signal of traditional Raman scattering by at least 104 times. It enables highly sensitive and precise molecule identification with the assistance of a SERS substrate. Until now, the design of new SERS substrate fabrication process is still thriving since no dominant design has emerged yet. The ideal process should be able to achieve both a high sensitivity and low cost device in a simple and reliable way. In this thesis two promising approaches for fabricating nanostructured SERS substrate are proposed: thermal dewetting technique and nanoimprint replica technique. These two techniques are demonstrated to show the capability of fabricating high performance SERS substrate in a reliable and cost efficient fashion. In addition, these two techniques have their own unique characteristics and can be integrated with other sensing techniques to build a serial or parallel sensing system. The breakthrough of a combination system with different sensing techniques overcomes the inherent limitations of SERS detection and leverages it to a whole new level of systematic sensing. The development of a sensing platform based on thermal dewetting technique is covered as the first half of this thesis. The process optimization, selection of substrate material, and improved deposition technique are discussed in detail. Interesting phenomena have been found including the influence of Raman enhancement on substrate material selection and hot-spot rich bimetallic nanostructures by physical vapor deposition on metallic seed array, which are barely discussed in past literature but significantly affect the performance of SERS substrate. The optimized bimetallic backplane assisted resonating nanoantenna (BARNA) SERS substrate is demonstrated with the enhancement factor (EF) of 5.8 x 108 with 4.7 % relative standard deviation. By serial combination with optical focusing from nanojet effect, the nanojet and surface enhanced Raman scattering (NASERS) are proved to provide more than three orders of enhancement and enable us to perform stable, nearly single molecule detection. The second part of this thesis includes the development of a parallel dual functional nano Lycurgus cup array (nanoLCA) plasmonic device fabricated by nanoimprint replica technique. The unique configuration of the periodic nanoscale cup-shaped substrate enables a novel hybrid resonance coupling between SPR from extraordinary (EOT) and LSPR from dense sidewall metal nanoparticles with only single deposition process. The sub-50nm dense sidewall metal nanoparticles lead to high SERS performance in solution based detection, by which most biological and chemical analyses are typically performed. The SERS EF was calculated as 2.8 x 107 in a solution based environment with 10.2 % RSD, which is so far the highest reported SERS enhancement achieved with similar periodic EOT devices. In addition, plasmonic colorimetric sensing can be achieved in the very same device and the sensitivity was calculated as 796 nm/RIU with the FOM of 12.7. It creates a unique complementary sensing platform with both rapid on-site colorimetric screening and follow-up precise Raman analysis for point of care and resource limited environment applications. The implementations of bifunctional sensing on opto-microfluidic and smartphone platforms are proposed and examined here as well.

  8. Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.

    2000-01-01

    Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining, gather/scatter, and redistribution. At the end of the conversion process most intermediate Charon function calls will have been removed, the non-distributed arrays will have been deleted, and virtually the only remaining Charon functions calls are the high-level, highly optimized communications. Distribution of the data is under complete control of the programmer, although a wide range of useful distributions is easily available through predefined functions. A crucial aspect of the library is that it does not allocate space for distributed arrays, but accepts programmer-specified memory. This has two major consequences. First, codes parallelized using Charon do not suffer from encapsulation; user data is always directly accessible. This provides high efficiency, and also retains the possibility of using message passing directly for highly irregular communications. Second, non-distributed arrays can be interpreted as (trivial) distributions in the Charon sense, which allows them to be mapped to truly distributed arrays, and vice versa. This is the mechanism that enables incremental parallelization. In this paper we provide a brief introduction of the library and then focus on the actual steps in the parallelization process, using some representative examples from, among others, the NAS Parallel Benchmarks. We show how a complicated two-dimensional pipeline-the prototypical non-data-parallel algorithm- can be constructed with ease. To demonstrate the flexibility of the library, we give examples of the stepwise, efficient parallel implementation of nonlocal boundary conditions common in aircraft simulations, as well as the construction of the sequence of grids required for multigrid.

  9. Electro-optic Mach-Zehnder Interferometer based Optical Digital Magnitude Comparator and 1's Complement Calculator

    NASA Astrophysics Data System (ADS)

    Kumar, Ajay; Raghuwanshi, Sanjeev Kumar

    2016-06-01

    The optical switching activity is one of the most essential phenomena in the optical domain. The electro-optic effect-based switching phenomena are applicable to generate some effective combinational and sequential logic circuits. The processing of digital computational technique in the optical domain includes some considerable advantages of optical communication technology, e.g. immunity to electro-magnetic interferences, compact size, signal security, parallel computing and larger bandwidth. The paper describes some efficient technique to implement single bit magnitude comparator and 1's complement calculator using the concepts of electro-optic effect. The proposed techniques are simulated on the MATLAB software. However, the suitability of the techniques is verified using the highly reliable Opti-BPM software. It is interesting to analyze the circuits in order to specify some optimized device parameter in order to optimize some performance affecting parameters, e.g. crosstalk, extinction ratio, signal losses through the curved and straight waveguide sections.

  10. Automatic recognition of vector and parallel operations in a higher level language

    NASA Technical Reports Server (NTRS)

    Schneck, P. B.

    1971-01-01

    A compiler for recognizing statements of a FORTRAN program which are suited for fast execution on a parallel or pipeline machine such as Illiac-4, Star or ASC is described. The technique employs interval analysis to provide flow information to the vector/parallel recognizer. Where profitable the compiler changes scalar variables to subscripted variables. The output of the compiler is an extension to FORTRAN which shows parallel and vector operations explicitly.

  11. Fluid Structure Interaction Techniques For Extrusion And Mixing Processes

    NASA Astrophysics Data System (ADS)

    Valette, Rudy; Vergnes, Bruno; Coupez, Thierry

    2007-05-01

    This work focuses on the development of numerical techniques devoted to the simulation of mixing processes of complex fluids such as twin-screw extrusion or batch mixing. In mixing process simulation, the absence of symmetry of the moving boundaries (the screws or the rotors) implies that their rigid body motion has to be taken into account by using a special treatment We therefore use a mesh immersion technique (MIT), which consists in using a P1+/P1-based (MINI-element) mixed finite element method for solving the velocity-pressure problem and then solving the problem in the whole barrel cavity by imposing a rigid motion (rotation) to nodes found located inside the so called immersed domain, each sub-domain (screw, rotor) being represented by a surface CAD mesh (or its mathematical equation in simple cases). The independent meshes are immersed into a unique background computational mesh by computing the distance function to their boundaries. Intersections of meshes are accounted for, allowing to compute a fill factor usable as for the VOF methodology. This technique, combined with the use of parallel computing, allows to compute the time-dependent flow of generalized Newtonian fluids including yield stress fluids in a complex system such as a twin screw extruder, including moving free surfaces, which are treated by a "level set" and Hamilton-Jacobi method.

  12. Singular value decomposition utilizing parallel algorithms on graphical processors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kotas, Charlotte W; Barhen, Jacob

    2011-01-01

    One of the current challenges in underwater acoustic array signal processing is the detection of quiet targets in the presence of noise. In order to enable robust detection, one of the key processing steps requires data and replica whitening. This, in turn, involves the eigen-decomposition of the sample spectral matrix, Cx = 1/K xKX(k)XH(k) where X(k) denotes a single frequency snapshot with an element for each element of the array. By employing the singular value decomposition (SVD) method, the eigenvectors and eigenvalues can be determined directly from the data without computing the sample covariance matrix, reducing the computational requirements formore » a given level of accuracy (van Trees, Optimum Array Processing). (Recall that the SVD of a complex matrix A involves determining V, , and U such that A = U VH where U and V are orthonormal and is a positive, real, diagonal matrix containing the singular values of A. U and V are the eigenvectors of AAH and AHA, respectively, while the singular values are the square roots of the eigenvalues of AAH.) Because it is desirable to be able to compute these quantities in real time, an efficient technique for computing the SVD is vital. In addition, emerging multicore processors like graphical processing units (GPUs) are bringing parallel processing capabilities to an ever increasing number of users. Since the computational tasks involved in array signal processing are well suited for parallelization, it is expected that these computations will be implemented using GPUs as soon as users have the necessary computational tools available to them. Thus, it is important to have an SVD algorithm that is suitable for these processors. This work explores the effectiveness of two different parallel SVD implementations on an NVIDIA Tesla C2050 GPU (14 multiprocessors, 32 cores per multiprocessor, 1.15 GHz clock - peed). The first algorithm is based on a two-step algorithm which bidiagonalizes the matrix using Householder transformations, and then diagonalizes the intermediate bidiagonal matrix through implicit QR shifts. This is similar to that implemented for real matrices by Lahabar and Narayanan ("Singular Value Decomposition on GPU using CUDA", IEEE International Parallel Distributed Processing Symposium 2009). The implementation is done in a hybrid manner, with the bidiagonalization stage done using the GPU while the diagonalization stage is done using the CPU, with the GPU used to update the U and V matrices. The second algorithm is based on a one-sided Jacobi scheme utilizing a sequence of pair-wise column orthogonalizations such that A is replaced by AV until the resulting matrix is sufficiently orthogonal (that is, equal to U ). V is obtained from the sequence of orthogonalizations, while can be found from the square root of the diagonal elements of AH A and, once is known, U can be found from column scaling the resulting matrix. These implementations utilize CUDA Fortran and NVIDIA's CUB LAS library. The primary goal of this study is to quantify the comparative performance of these two techniques against themselves and other standard implementations (for example, MATLAB). Considering that there is significant overhead associated with transferring data to the GPU and with synchronization between the GPU and the host CPU, it is also important to understand when it is worthwhile to use the GPU in terms of the matrix size and number of concurrent SVDs to be calculated.« less

  13. Multi-arm spectrometer for parallel frequency analysis of radio-wave signals oriented to astronomical observations

    NASA Astrophysics Data System (ADS)

    Shcherbakov, Alexandre S.; Chavez Dagostino, Miguel; Arellanes, Adan Omar; Tepichin Rodriguez, Eduardo

    2017-08-01

    We describe a potential prototype of modern spectrometer based on acousto-optical technique with three parallel optical arms for analysis of radio-wave signals specific to astronomical observations. Each optical arm exhibits original performances to provide parallel multi-band observations with different scales simultaneously. Similar multi-band instrument is able to realize measurements within various scenarios from planetary atmospheres to attractive objects in the distant Universe. The arrangement under development has two novelties. First, each optical arm represents an individual spectrum analyzer with its individual performances. Such an approach is conditioned by exploiting various materials for acousto-optical cells operating within various regimes, frequency ranges, and light wavelengths from independent light sources. Individually produced beam shapers give both the needed incident light polarization and the required apodization for light beam to increase the dynamic range of the system as a whole. After parallel acousto-optical processing, a few data flows from these optical arms are united by the joint CCD matrix on the stage of the combined extremely high-bit rate electronic data processing that provides the system performances as well. The other novelty consists in the usage of various materials for designing wide-aperture acousto-optical cells exhibiting the best performances within each of optical arms. Here, one can mention specifically selected cuts of tellurium dioxide, bastron, and lithium niobate, which overlap selected areas within the frequency range from 40 MHz to 2.0 GHz. Thus one yields the united versatile instrument for comprehensive studies of astronomical objects simultaneously with precise synchronization in various frequency ranges.

  14. Parallel SOR methods with a parabolic-diffusion acceleration technique for solving an unstructured-grid Poisson equation on 3D arbitrary geometries

    NASA Astrophysics Data System (ADS)

    Zapata, M. A. Uh; Van Bang, D. Pham; Nguyen, K. D.

    2016-05-01

    This paper presents a parallel algorithm for the finite-volume discretisation of the Poisson equation on three-dimensional arbitrary geometries. The proposed method is formulated by using a 2D horizontal block domain decomposition and interprocessor data communication techniques with message passing interface. The horizontal unstructured-grid cells are reordered according to the neighbouring relations and decomposed into blocks using a load-balanced distribution to give all processors an equal amount of elements. In this algorithm, two parallel successive over-relaxation methods are presented: a multi-colour ordering technique for unstructured grids based on distributed memory and a block method using reordering index following similar ideas of the partitioning for structured grids. In all cases, the parallel algorithms are implemented with a combination of an acceleration iterative solver. This solver is based on a parabolic-diffusion equation introduced to obtain faster solutions of the linear systems arising from the discretisation. Numerical results are given to evaluate the performances of the methods showing speedups better than linear.

  15. Seamless, axially aligned, fiber tubes, meshes, microbundles and gradient biomaterial constructs

    PubMed Central

    Elia, Roberto; Firpo, Matthew A.; Kaplan, David L.; Peattie, Robert A.

    2012-01-01

    A new electrospinning apparatus was developed to generate nanofibrous materials with improved organizational control. The system functions by oscillating the deposition signal (ODS) of multiple collectors, allowing significantly improved nanofiber control by manipulating the electric field which drives the electrospinning process. Other electrospinning techniques designed to impart deposited fiber organizational control, such as rotating mandrels or parallel collector systems, do not generate seamless constructs with high quality alignment in sizes large enough for medical devices. In contrast, the ODS collection system produces deposited fiber networks with highly pure alignment in a variety of forms and sizes, including flat (8 × 8 cm2), tubular (1.3 cm diameter), or rope-like microbundle (45 μm diameter) samples. Additionally, the mechanism of our technique allows for scale-up beyond these dimensions. The ODS collection system produced 81.6 % of fibers aligned within 5° of the axial direction, nearly a four-fold improvement over the rotating mandrel technique. The meshes produced from the 9 % (w/v) fibroin/PEO blend demonstrated significant mechanical anisotropy due to the fiber alignment. In 37 °C PBS, aligned samples produced an ultimate tensile strength of 16.47 ± 1.18 MPa, a Young's modulus of 37.33 MPa, and a yield strength of 7.79 ± 1.13 MPa. The material was 300 % stiffer when extended in the direction of fiber alignment and required 20 times the amount of force to be deformed, compared to aligned meshes extended perpendicular to the fiber direction. The ODS technique could be applied to any electrospinnable polymer to overcome the more limited uniformity and induced mechanical strain of rotating mandrel techniques, and greatly surpasses the limited length of standard parallel collector techniques. PMID:22890517

  16. Massively parallel information processing systems for space applications

    NASA Technical Reports Server (NTRS)

    Schaefer, D. H.

    1979-01-01

    NASA is developing massively parallel systems for ultra high speed processing of digital image data collected by satellite borne instrumentation. Such systems contain thousands of processing elements. Work is underway on the design and fabrication of the 'Massively Parallel Processor', a ground computer containing 16,384 processing elements arranged in a 128 x 128 array. This computer uses existing technology. Advanced work includes the development of semiconductor chips containing thousands of feedthrough paths. Massively parallel image analog to digital conversion technology is also being developed. The goal is to provide compact computers suitable for real-time onboard processing of images.

  17. Parallel log structured file system collective buffering to achieve a compact representation of scientific and/or dimensional data

    DOEpatents

    Grider, Gary A.; Poole, Stephen W.

    2015-09-01

    Collective buffering and data pattern solutions are provided for storage, retrieval, and/or analysis of data in a collective parallel processing environment. For example, a method can be provided for data storage in a collective parallel processing environment. The method comprises receiving data to be written for a plurality of collective processes within a collective parallel processing environment, extracting a data pattern for the data to be written for the plurality of collective processes, generating a representation describing the data pattern, and saving the data and the representation.

  18. High-performance computing in image registration

    NASA Astrophysics Data System (ADS)

    Zanin, Michele; Remondino, Fabio; Dalla Mura, Mauro

    2012-10-01

    Thanks to the recent technological advances, a large variety of image data is at our disposal with variable geometric, radiometric and temporal resolution. In many applications the processing of such images needs high performance computing techniques in order to deliver timely responses e.g. for rapid decisions or real-time actions. Thus, parallel or distributed computing methods, Digital Signal Processor (DSP) architectures, Graphical Processing Unit (GPU) programming and Field-Programmable Gate Array (FPGA) devices have become essential tools for the challenging issue of processing large amount of geo-data. The article focuses on the processing and registration of large datasets of terrestrial and aerial images for 3D reconstruction, diagnostic purposes and monitoring of the environment. For the image alignment procedure, sets of corresponding feature points need to be automatically extracted in order to successively compute the geometric transformation that aligns the data. The feature extraction and matching are ones of the most computationally demanding operations in the processing chain thus, a great degree of automation and speed is mandatory. The details of the implemented operations (named LARES) exploiting parallel architectures and GPU are thus presented. The innovative aspects of the implementation are (i) the effectiveness on a large variety of unorganized and complex datasets, (ii) capability to work with high-resolution images and (iii) the speed of the computations. Examples and comparisons with standard CPU processing are also reported and commented.

  19. schwimmbad: A uniform interface to parallel processing pools in Python

    NASA Astrophysics Data System (ADS)

    Price-Whelan, Adrian M.; Foreman-Mackey, Daniel

    2017-09-01

    Many scientific and computing problems require doing some calculation on all elements of some data set. If the calculations can be executed in parallel (i.e. without any communication between calculations), these problems are said to be perfectly parallel. On computers with multiple processing cores, these tasks can be distributed and executed in parallel to greatly improve performance. A common paradigm for handling these distributed computing problems is to use a processing "pool": the "tasks" (the data) are passed in bulk to the pool, and the pool handles distributing the tasks to a number of worker processes when available. schwimmbad provides a uniform interface to parallel processing pools and enables switching easily between local development (e.g., serial processing or with multiprocessing) and deployment on a cluster or supercomputer (via, e.g., MPI or JobLib).

  20. Dual-Mode Electro-Optical Techniques for Biosensing Applications: A Review

    PubMed Central

    Johnson, Steven

    2017-01-01

    The monitoring of biomolecular interactions is a key requirement for the study of complex biological processes and the diagnosis of disease. Technologies that are capable of providing label-free, real-time insight into these interactions are of great value for the scientific and clinical communities. Greater understanding of biomolecular interactions alongside increased detection accuracy can be achieved using technology that can provide parallel information about multiple parameters of a single biomolecular process. For example, electro-optical techniques combine optical and electrochemical information to provide more accurate and detailed measurements that provide unique insights into molecular structure and function. Here, we present a comparison of the main methods for electro-optical biosensing, namely, electrochemical surface plasmon resonance (EC-SPR), electrochemical optical waveguide lightmode spectroscopy (EC-OWLS), and the recently reported silicon-based electrophotonic approach. The comparison considers different application spaces, such as the detection of low concentrations of biomolecules, integration, the tailoring of light-matter interaction for the understanding of biomolecular processes, and 2D imaging of biointeractions on a surface. PMID:28880211

  1. Trajectory Optimization for Missions to Small Bodies with a Focus on Scientific Merit.

    PubMed

    Englander, Jacob A; Vavrina, Matthew A; Lim, Lucy F; McFadden, Lucy A; Rhoden, Alyssa R; Noll, Keith S

    2017-01-01

    Trajectory design for missions to small bodies is tightly coupled both with the selection of targets for a mission and with the choice of spacecraft power, propulsion, and other hardware. Traditional methods of trajectory optimization have focused on finding the optimal trajectory for an a priori selection of destinations and spacecraft parameters. Recent research has expanded the field of trajectory optimization to multidisciplinary systems optimization that includes spacecraft parameters. The logical next step is to extend the optimization process to include target selection based not only on engineering figures of merit but also scientific value. This paper presents a new technique to solve the multidisciplinary mission optimization problem for small-bodies missions, including classical trajectory design, the choice of spacecraft power and propulsion systems, and also the scientific value of the targets. This technique, when combined with modern parallel computers, enables a holistic view of the small body mission design process that previously required iteration among several different design processes.

  2. Dual-Mode Electro-Optical Techniques for Biosensing Applications: A Review.

    PubMed

    Juan-Colás, José; Johnson, Steven; Krauss, Thomas F

    2017-09-07

    The monitoring of biomolecular interactions is a key requirement for the study of complex biological processes and the diagnosis of disease. Technologies that are capable of providing label-free, real-time insight into these interactions are of great value for the scientific and clinical communities. Greater understanding of biomolecular interactions alongside increased detection accuracy can be achieved using technology that can provide parallel information about multiple parameters of a single biomolecular process. For example, electro-optical techniques combine optical and electrochemical information to provide more accurate and detailed measurements that provide unique insights into molecular structure and function. Here, we present a comparison of the main methods for electro-optical biosensing, namely, electrochemical surface plasmon resonance (EC-SPR), electrochemical optical waveguide lightmode spectroscopy (EC-OWLS), and the recently reported silicon-based electrophotonic approach. The comparison considers different application spaces, such as the detection of low concentrations of biomolecules, integration, the tailoring of light-matter interaction for the understanding of biomolecular processes, and 2D imaging of biointeractions on a surface.

  3. Parallel Signal Processing and System Simulation using aCe

    NASA Technical Reports Server (NTRS)

    Dorband, John E.; Aburdene, Maurice F.

    2003-01-01

    Recently, networked and cluster computation have become very popular for both signal processing and system simulation. A new language is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, the new C based parallel language (ace C) for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures by providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we will focus on some fundamental features of ace C and present a signal processing application (FFT).

  4. Very large scale characterization of graphene mechanical devices using a colorimetry technique.

    PubMed

    Cartamil-Bueno, Santiago Jose; Centeno, Alba; Zurutuza, Amaia; Steeneken, Peter Gerard; van der Zant, Herre Sjoerd Jan; Houri, Samer

    2017-06-08

    We use a scalable optical technique to characterize more than 21 000 circular nanomechanical devices made of suspended single- and double-layer graphene on cavities with different diameters (D) and depths (g). To maximize the contrast between suspended and broken membranes we used a model for selecting the optimal color filter. The method enables parallel and automatized image processing for yield statistics. We find the survival probability to be correlated with a structural mechanics scaling parameter given by D 4 /g 3 . Moreover, we extract a median adhesion energy of Γ = 0.9 J m -2 between the membrane and the native SiO 2 at the bottom of the cavities.

  5. Connectionism, parallel constraint satisfaction processes, and gestalt principles: (re) introducing cognitive dynamics to social psychology.

    PubMed

    Read, S J; Vanman, E J; Miller, L C

    1997-01-01

    We argue that recent work in connectionist modeling, in particular the parallel constraint satisfaction processes that are central to many of these models, has great importance for understanding issues of both historical and current concern for social psychologists. We first provide a brief description of connectionist modeling, with particular emphasis on parallel constraint satisfaction processes. Second, we examine the tremendous similarities between parallel constraint satisfaction processes and the Gestalt principles that were the foundation for much of modem social psychology. We propose that parallel constraint satisfaction processes provide a computational implementation of the principles of Gestalt psychology that were central to the work of such seminal social psychologists as Asch, Festinger, Heider, and Lewin. Third, we then describe how parallel constraint satisfaction processes have been applied to three areas that were key to the beginnings of modern social psychology and remain central today: impression formation and causal reasoning, cognitive consistency (balance and cognitive dissonance), and goal-directed behavior. We conclude by discussing implications of parallel constraint satisfaction principles for a number of broader issues in social psychology, such as the dynamics of social thought and the integration of social information within the narrow time frame of social interaction.

  6. Using Parallel Processing for Problem Solving.

    DTIC Science & Technology

    1979-12-01

    are the basic parallel proces- sing primitive . Different goals of the system can be pursued in parallel by placing them in separate activities...Language primitives are provided for manipulating running activities. Viewpoints are a generalization of context FOM -(over "*’ DD I FON 1473 ’EDITION OF I...arc the basic parallel processing primitive . Different goals of the system can be pursued in parallel by placing them in separate activities. Language

  7. Lamellar boundary alignment of DS-processed TiAl-W alloys by a solidification procedure

    NASA Astrophysics Data System (ADS)

    Jung, In-Soo; Oh, Myung-Hoon; Park, No-Jin; Kumar, K. Sharvan; Wee, Dang-Moon

    2007-12-01

    In this study, a β solidification procedure was used to align the lamellae in a Ti-47Al-2W (at.%) alloy parallel to the growth direction. The Bridgman technique and the floating zone process were used for directional solidification. The mechanical properties of the directionally solidified alloy were evaluated in tension at room temperature and at 800°C. At a growth rate of 30 mm/h (with the floating zone approach), the lamellae were well aligned parallel to the growth direction. The aligned lamellae yielded excellent room temperature tensile ductility. The tensile yield strength at 800°C was similar to that at room temperature. The orientation of the γ lamellar laths in the directionally solidified ingots, which were manufactured by means of a floating zone process, was identified with the aid of electron backscattered diffraction analysis. On the basis of this analysis, the preferred growth direction of the bcc-β dendrites that formed at high temperatures close to the melting point was inferred to be [001]β at a growth rate of 30 mm/h and [111]β at a growth rate of 90 mm/h.

  8. Magnetohydrodynamics with GAMER

    NASA Astrophysics Data System (ADS)

    Zhang, Ui-Han; Schive, Hsi-Yu; Chiueh, Tzihong

    2018-06-01

    GAMER, a parallel Graphic-processing-unit-accelerated Adaptive-MEsh-Refinement (AMR) hydrodynamic code, has been extended to support magnetohydrodynamics (MHD) with both the corner-transport-upwind and MUSCL-Hancock schemes and the constraint transport technique. The divergent preserving operator for AMR has been applied to reinforce the divergence-free constraint on the magnetic field. GAMER-MHD has fully exploited the concurrent executions between the graphic process unit (GPU) MHD solver and other central processing unit computation pertinent to AMR. We perform various standard tests to demonstrate that GAMER-MHD is both second-order accurate and robust, producing results as accurate as those given by high-resolution uniform-grid runs. We also explore a new 3D MHD test, where the magnetic field assumes the Arnold–Beltrami–Childress configuration, temporarily becomes turbulent with current sheets, and finally settles to a lowest-energy equilibrium state. This 3D problem is adopted for the performance test of GAMER-MHD. The single-GPU performance reaches 1.2 × 108 and 5.5 × 107 cell updates per second for the single- and double-precision calculations, respectively, on Tesla P100. We also demonstrate a parallel efficiency of ∼70% for both weak and strong scaling using 1024 XK nodes on the Blue Waters supercomputers.

  9. Probing amyloid fibril formation of the NFGAIL peptide by computer simulations

    NASA Astrophysics Data System (ADS)

    Melquiond, Adrien; Gelly, Jean-Christophe; Mousseau, Normand; Derreumaux, Philippe

    2007-02-01

    Amyloid fibril formation, as observed in Alzheimer's disease and type II diabetes, is currently described by a nucleation-condensation mechanism, but the details of the process preceding the formation of the nucleus are still lacking. In this study, using an activation-relaxation technique coupled to a generic energy model, we explore the aggregation pathways of 12 chains of the hexapeptide NFGAIL. The simulations show, starting from a preformed parallel dimer and ten disordered chains, that the peptides form essentially amorphous oligomers or more rarely ordered β-sheet structures where the peptides adopt a parallel orientation within the sheets. Comparison between the simulations indicates that a dimer is not a sufficient seed for avoiding amorphous aggregates and that there is a critical threshold in the number of connections between the chains above which exploration of amorphous aggregates is preferred.

  10. Initial-stage examination of a testbed for the big data transfer over parallel links. The SDN approach

    NASA Astrophysics Data System (ADS)

    Khoruzhnikov, S. E.; Grudinin, V. A.; Sadov, O. L.; Shevel, A. E.; Titov, V. B.; Kairkanov, A. B.

    2015-04-01

    The transfer of Big Data over a computer network is an important and unavoidable operation in the past, present, and in any feasible future. A large variety of astronomical projects produces the Big Data. There are a number of methods to transfer the data over a global computer network (Internet) with a range of tools. In this paper we consider the transfer of one piece of Big Data from one point in the Internet to another, in general over a long-range distance: many thousand kilometers. Several free of charge systems to transfer the Big Data are analyzed here. The most important architecture features are emphasized, and the idea is discussed to add the SDN OpenFlow protocol technique for fine-grain tuning of the data transfer process over several parallel data links.

  11. The Automated Instrumentation and Monitoring System (AIMS): Design and Architecture. 3.2

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Schmidt, Melisa; Schulbach, Cathy; Bailey, David (Technical Monitor)

    1997-01-01

    Whether a researcher is designing the 'next parallel programming paradigm', another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of such information can help computer and software architects to capture, and therefore, exploit behavioral variations among/within various parallel programs to take advantage of specific hardware characteristics. A software tool-set that facilitates performance evaluation of parallel applications on multiprocessors has been put together at NASA Ames Research Center under the sponsorship of NASA's High Performance Computing and Communications Program over the past five years. The Automated Instrumentation and Monitoring Systematic has three major software components: a source code instrumentor which automatically inserts active event recorders into program source code before compilation; a run-time performance monitoring library which collects performance data; and a visualization tool-set which reconstructs program execution based on the data collected. Besides being used as a prototype for developing new techniques for instrumenting, monitoring and presenting parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Currently, the execution of FORTRAN and C programs on the Intel Paragon and PALM workstations can be automatically instrumented and monitored. Performance data thus collected can be displayed graphically on various workstations. The process of performance tuning with AIMS will be illustrated using various NAB Parallel Benchmarks. This report includes a description of the internal architecture of AIMS and a listing of the source code.

  12. CHOLLA: A New Massively Parallel Hydrodynamics Code for Astrophysical Simulation

    NASA Astrophysics Data System (ADS)

    Schneider, Evan E.; Robertson, Brant E.

    2015-04-01

    We present Computational Hydrodynamics On ParaLLel Architectures (Cholla ), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (≳2563) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.

  13. Aerodynamic Design of Complex Configurations Using Cartesian Methods and CAD Geometry

    NASA Technical Reports Server (NTRS)

    Nemec, Marian; Aftosmis, Michael J.; Pulliam, Thomas H.

    2003-01-01

    The objective for this paper is to present the development of an optimization capability for the Cartesian inviscid-flow analysis package of Aftosmis et al. We evaluate and characterize the following modules within the new optimization framework: (1) A component-based geometry parameterization approach using a CAD solid representation and the CAPRI interface. (2) The use of Cartesian methods in the development Optimization techniques using a genetic algorithm. The discussion and investigations focus on several real world problems of the optimization process. We examine the architectural issues associated with the deployment of a CAD-based design approach in a heterogeneous parallel computing environment that contains both CAD workstations and dedicated compute nodes. In addition, we study the influence of noise on the performance of optimization techniques, and the overall efficiency of the optimization process for aerodynamic design of complex three-dimensional configurations. of automated optimization tools. rithm and a gradient-based algorithm.

  14. Simulation of hypersonic rarefied flows with the immersed-boundary method

    NASA Astrophysics Data System (ADS)

    Bruno, D.; De Palma, P.; de Tullio, M. D.

    2011-05-01

    This paper provides a validation of an immersed boundary method for computing hypersonic rarefied gas flows. The method is based on the solution of the Navier-Stokes equation and is validated versus numerical results obtained by the DSMC approach. The Navier-Stokes solver employs a flexible local grid refinement technique and is implemented on parallel machines using a domain-decomposition approach. Thanks to the efficient grid generation process, based on the ray-tracing technique, and the use of the METIS software, it is possible to obtain the partitioned grids to be assigned to each processor with a minimal effort by the user. This allows one to by-pass the expensive (in terms of time and human resources) classical generation process of a body fitted grid. First-order slip-velocity boundary conditions are employed and tested for taking into account rarefied gas effects.

  15. Single Molecule Enzymology via Nanoelectronic Circuits

    NASA Astrophysics Data System (ADS)

    Collins, Philip

    Traditional single-molecule techniques rely on fluorescence or force transduction to monitor conformational changes and biochemical activity. Recent demonstrations of single-molecule monitoring with electronic transistors are poised to add to the single-molecule research toolkit. The transistor-based technique is sensitive to the motion of single charged side chain residues and can transduce those motions with microsecond resolution, opening the doors to single-molecule enzymology with unprecedented resolution. Furthermore, the solid-state platform provides opportunities for parallelization in arrays and long-duration monitoring of one molecule's activity or processivity, all without the limitations caused by photo-oxidation or mutagenic fluorophore incorporation. This presentation will review some of these advantages and their particular application to DNA polymerase I processing single-stranded DNA templates. This research was supported financially by the NIH NCI (R01 CA133592-01), the NIH NIGMS (1R01GM106957-01) and the NSF (DMR-1104629 and ECCS-1231910).

  16. Scan line graphics generation on the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Dorband, John E.

    1988-01-01

    Described here is how researchers implemented a scan line graphics generation algorithm on the Massively Parallel Processor (MPP). Pixels are computed in parallel and their results are applied to the Z buffer in large groups. To perform pixel value calculations, facilitate load balancing across the processors and apply the results to the Z buffer efficiently in parallel requires special virtual routing (sort computation) techniques developed by the author especially for use on single-instruction multiple-data (SIMD) architectures.

  17. A parallel computing engine for a class of time critical processes.

    PubMed

    Nabhan, T M; Zomaya, A Y

    1997-01-01

    This paper focuses on the efficient parallel implementation of systems of numerically intensive nature over loosely coupled multiprocessor architectures. These analytical models are of significant importance to many real-time systems that have to meet severe time constants. A parallel computing engine (PCE) has been developed in this work for the efficient simplification and the near optimal scheduling of numerical models over the different cooperating processors of the parallel computer. First, the analytical system is efficiently coded in its general form. The model is then simplified by using any available information (e.g., constant parameters). A task graph representing the interconnections among the different components (or equations) is generated. The graph can then be compressed to control the computation/communication requirements. The task scheduler employs a graph-based iterative scheme, based on the simulated annealing algorithm, to map the vertices of the task graph onto a Multiple-Instruction-stream Multiple-Data-stream (MIMD) type of architecture. The algorithm uses a nonanalytical cost function that properly considers the computation capability of the processors, the network topology, the communication time, and congestion possibilities. Moreover, the proposed technique is simple, flexible, and computationally viable. The efficiency of the algorithm is demonstrated by two case studies with good results.

  18. Multiphase three-dimensional direct numerical simulation of a rotating impeller with code Blue

    NASA Astrophysics Data System (ADS)

    Kahouadji, Lyes; Shin, Seungwon; Chergui, Jalel; Juric, Damir; Craster, Richard V.; Matar, Omar K.

    2017-11-01

    The flow driven by a rotating impeller inside an open fixed cylindrical cavity is simulated using code Blue, a solver for massively-parallel simulations of fully three-dimensional multiphase flows. The impeller is composed of four blades at a 45° inclination all attached to a central hub and tube stem. In Blue, solid forms are constructed through the definition of immersed objects via a distance function that accounts for the object's interaction with the flow for both single and two-phase flows. We use a moving frame technique for imposing translation and/or rotation. The variation of the Reynolds number, the clearance, and the tank aspect ratio are considered, and we highlight the importance of the confinement ratio (blade radius versus the tank radius) in the mixing process. Blue uses a domain decomposition strategy for parallelization with MPI. The fluid interface solver is based on a parallel implementation of a hybrid front-tracking/level-set method designed complex interfacial topological changes. Parallel GMRES and multigrid iterative solvers are applied to the linear systems arising from the implicit solution for the fluid velocities and pressure in the presence of strong density and viscosity discontinuities across fluid phases. EPSRC, UK, MEMPHIS program Grant (EP/K003976/1), RAEng Research Chair (OKM).

  19. Image Processing Using a Parallel Architecture.

    DTIC Science & Technology

    1987-12-01

    ENG/87D-25 Abstract This study developed a set o± low level image processing tools on a parallel computer that allows concurrent processing of images...environment, the set of tools offers a significant reduction in the time required to perform some commonly used image processing operations. vI IMAGE...step toward developing these systems, a structured set of image processing tools was implemented using a parallel computer. More important than

  20. An Abstract Systolic Model and Its Application to the Design of Finite Element Systems.

    DTIC Science & Technology

    1983-01-01

    networks as a collection of communicating. parallel :.,’-.processes, some of the techniques for the verification of distributed systems ,.woi (see for...item must be collected . even If there is no Interest In its value. In this case. the collection of the data is simply achieved by changing the state of...the appropriate data as well as for collecting the output data and performing some additional tasks that we will discuss later. A basic functional

  1. Unstructured mesh algorithms for aerodynamic calculations

    NASA Technical Reports Server (NTRS)

    Mavriplis, D. J.

    1992-01-01

    The use of unstructured mesh techniques for solving complex aerodynamic flows is discussed. The principle advantages of unstructured mesh strategies, as they relate to complex geometries, adaptive meshing capabilities, and parallel processing are emphasized. The various aspects required for the efficient and accurate solution of aerodynamic flows are addressed. These include mesh generation, mesh adaptivity, solution algorithms, convergence acceleration, and turbulence modeling. Computations of viscous turbulent two-dimensional flows and inviscid three-dimensional flows about complex configurations are demonstrated. Remaining obstacles and directions for future research are also outlined.

  2. Communication-Efficient Arbitration Models for Low-Resolution Data Flow Computing

    DTIC Science & Technology

    1988-12-01

    phase can be formally described as follows: Graph Partitioning Problem NP-complete: (Garey & Johnson) Given graph G = (V, E), weights w (v) for each v e V...Technical Report, MIT/LCS/TR-218, Cambridge, Mass. Agerwala, Tilak, February 1982, "Data Flow Systems", Computer, pp. 10-13. Babb, Robert G ., July 1984...34Parallel Processing with Large-Grain Data Flow Techniques," IEEE Computer 17, 7, pp. 55-61. Babb, Robert G ., II, Lise Storc, and William C. Ragsdale

  3. ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.

    PubMed

    Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping

    2018-04-27

    A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.

  4. Design of a dataway processor for a parallel image signal processing system

    NASA Astrophysics Data System (ADS)

    Nomura, Mitsuru; Fujii, Tetsuro; Ono, Sadayasu

    1995-04-01

    Recently, demands for high-speed signal processing have been increasing especially in the field of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called 'dataway processor' designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link operates at 8-bit parallel in a full duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel. Therefore, sufficient throughput is available for high-speed digital video signals. The processor is designed in a top- down fashion using a CAD system called 'PARTHENON.' The hardware is fabricated using 0.5-micrometers CMOS technology, and its hardware is about 200 K gates.

  5. Accelerating image recognition on mobile devices using GPGPU

    NASA Astrophysics Data System (ADS)

    Bordallo López, Miguel; Nykänen, Henri; Hannuksela, Jari; Silvén, Olli; Vehviläinen, Markku

    2011-01-01

    The future multi-modal user interfaces of battery-powered mobile devices are expected to require computationally costly image analysis techniques. The use of Graphic Processing Units for computing is very well suited for parallel processing and the addition of programmable stages and high precision arithmetic provide for opportunities to implement energy-efficient complete algorithms. At the moment the first mobile graphics accelerators with programmable pipelines are available, enabling the GPGPU implementation of several image processing algorithms. In this context, we consider a face tracking approach that uses efficient gray-scale invariant texture features and boosting. The solution is based on the Local Binary Pattern (LBP) features and makes use of the GPU on the pre-processing and feature extraction phase. We have implemented a series of image processing techniques in the shader language of OpenGL ES 2.0, compiled them for a mobile graphics processing unit and performed tests on a mobile application processor platform (OMAP3530). In our contribution, we describe the challenges of designing on a mobile platform, present the performance achieved and provide measurement results for the actual power consumption in comparison to using the CPU (ARM) on the same platform.

  6. Management of Transjugular Intrahepatic Portosystemic Shunt (TIPS)-associated Refractory Hepatic Encephalopathy by Shunt Reduction Using the Parallel Technique: Outcomes of a Retrospective Case Series

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cookson, Daniel T., E-mail: danielthomascookson@yahoo.co.uk; Zaman, Zubayr; Gordon-Smith, James

    2011-02-15

    Purpose: To investigate the reproducibility and technical and clinical success of the parallel technique of transjugular intrahepatic portosystemic shunt (TIPS) reduction in the management of refractory hepatic encephalopathy (HE). Materials and Methods: A 10-mm-diameter self-expanding stent graft and a 5-6-mm-diameter balloon-expandable stent were placed in parallel inside the existing TIPS in 8 patients via a dual unilateral transjugular approach. Changes in portosystemic pressure gradient and HE grade were used as primary end points. Results: TIPS reduction was technically successful in all patients. Mean {+-} standard deviation portosystemic pressure gradient before and after shunt reduction was 4.9 {+-} 3.6 mmHg (range,more » 0-12 mmHg) and 10.5 {+-} 3.9 mmHg (range, 6-18 mmHg). Duration of follow-up was 137 {+-} 117.8 days (range, 18-326 days). Clinical improvement of HE occurred in 5 patients (62.5%) with resolution of HE in 4 patients (50%). Single episodes of recurrent gastrointestinal hemorrhage occurred in 3 patients (37.5%). These were self-limiting in 2 cases and successfully managed in 1 case by correction of coagulopathy and blood transfusion. Two of these patients (25%) died, one each of renal failure and hepatorenal failure. Conclusion: The parallel technique of TIPS reduction is reproducible and has a high technical success rate. A dual unilateral transjugular approach is advantageous when performing this procedure. The parallel technique allows repeat bidirectional TIPS adjustment and may be of significant clinical benefit in the management of refractory HE.« less

  7. Genten: Software for Generalized Tensor Decompositions v. 1.0.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Phipps, Eric T.; Kolda, Tamara G.; Dunlavy, Daniel

    Tensors, or multidimensional arrays, are a powerful mathematical means of describing multiway data. This software provides computational means for decomposing or approximating a given tensor in terms of smaller tensors of lower dimension, focusing on decomposition of large, sparse tensors. These techniques have applications in many scientific areas, including signal processing, linear algebra, computer vision, numerical analysis, data mining, graph analysis, neuroscience and more. The software is designed to take advantage of parallelism present emerging computer architectures such has multi-core CPUs, many-core accelerators such as the Intel Xeon Phi, and computation-oriented GPUs to enable efficient processing of large tensors.

  8. Computational study of noise in a large signal transduction network.

    PubMed

    Intosalmi, Jukka; Manninen, Tiina; Ruohonen, Keijo; Linne, Marja-Leena

    2011-06-21

    Biochemical systems are inherently noisy due to the discrete reaction events that occur in a random manner. Although noise is often perceived as a disturbing factor, the system might actually benefit from it. In order to understand the role of noise better, its quality must be studied in a quantitative manner. Computational analysis and modeling play an essential role in this demanding endeavor. We implemented a large nonlinear signal transduction network combining protein kinase C, mitogen-activated protein kinase, phospholipase A2, and β isoform of phospholipase C networks. We simulated the network in 300 different cellular volumes using the exact Gillespie stochastic simulation algorithm and analyzed the results in both the time and frequency domain. In order to perform simulations in a reasonable time, we used modern parallel computing techniques. The analysis revealed that time and frequency domain characteristics depend on the system volume. The simulation results also indicated that there are several kinds of noise processes in the network, all of them representing different kinds of low-frequency fluctuations. In the simulations, the power of noise decreased on all frequencies when the system volume was increased. We concluded that basic frequency domain techniques can be applied to the analysis of simulation results produced by the Gillespie stochastic simulation algorithm. This approach is suited not only to the study of fluctuations but also to the study of pure noise processes. Noise seems to have an important role in biochemical systems and its properties can be numerically studied by simulating the reacting system in different cellular volumes. Parallel computing techniques make it possible to run massive simulations in hundreds of volumes and, as a result, accurate statistics can be obtained from computational studies. © 2011 Intosalmi et al; licensee BioMed Central Ltd.

  9. Search asymmetries: parallel processing of uncertain sensory information.

    PubMed

    Vincent, Benjamin T

    2011-08-01

    What is the mechanism underlying search phenomena such as search asymmetry? Two-stage models such as Feature Integration Theory and Guided Search propose parallel pre-attentive processing followed by serial post-attentive processing. They claim search asymmetry effects are indicative of finding pairs of features, one processed in parallel, the other in serial. An alternative proposal is that a 1-stage parallel process is responsible, and search asymmetries occur when one stimulus has greater internal uncertainty associated with it than another. While the latter account is simpler, only a few studies have set out to empirically test its quantitative predictions, and many researchers still subscribe to the 2-stage account. This paper examines three separate parallel models (Bayesian optimal observer, max rule, and a heuristic decision rule). All three parallel models can account for search asymmetry effects and I conclude that either people can optimally utilise the uncertain sensory data available to them, or are able to select heuristic decision rules which approximate optimal performance. Copyright © 2011 Elsevier Ltd. All rights reserved.

  10. Aerostructural analysis and design optimization of composite aircraft

    NASA Astrophysics Data System (ADS)

    Kennedy, Graeme James

    High-performance composite materials exhibit both anisotropic strength and stiffness properties. These anisotropic properties can be used to produce highly-tailored aircraft structures that meet stringent performance requirements, but these properties also present unique challenges for analysis and design. New tools and techniques are developed to address some of these important challenges. A homogenization-based theory for beams is developed to accurately predict the through-thickness stress and strain distribution in thick composite beams. Numerical comparisons demonstrate that the proposed beam theory can be used to obtain highly accurate results in up to three orders of magnitude less computational time than three-dimensional calculations. Due to the large finite-element model requirements for thin composite structures used in aerospace applications, parallel solution methods are explored. A parallel direct Schur factorization method is developed. The parallel scalability of the direct Schur approach is demonstrated for a large finite-element problem with over 5 million unknowns. In order to address manufacturing design requirements, a novel laminate parametrization technique is presented that takes into account the discrete nature of the ply-angle variables, and ply-contiguity constraints. This parametrization technique is demonstrated on a series of structural optimization problems including compliance minimization of a plate, buckling design of a stiffened panel and layup design of a full aircraft wing. The design and analysis of composite structures for aircraft is not a stand-alone problem and cannot be performed without multidisciplinary considerations. A gradient-based aerostructural design optimization framework is presented that partitions the disciplines into distinct process groups. An approximate Newton-Krylov method is shown to be an efficient aerostructural solution algorithm and excellent parallel scalability of the algorithm is demonstrated. An induced drag optimization study is performed to compare the trade-off between wing weight and induced drag for wing tip extensions, raked wing tips and winglets. The results demonstrate that it is possible to achieve a 43% induced drag reduction with no weight penalty, a 28% induced drag reduction with a 10% wing weight reduction, or a 20% wing weight reduction with a 5% induced drag penalty from a baseline wing obtained from a structural mass-minimization problem with fixed aerodynamic loads.

  11. 77 FR 47573 - Approval and Promulgation of Implementation Plans; Mississippi; 110(a)(2)(E)(ii) Infrastructure...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-08-09

    ... Mississippi Department of Environmental Quality (MDEQ), on July 13, 2012, for parallel processing. This... of Contents I. What is parallel processing? II. Background III. What elements are required under... Executive Order Reviews I. What is parallel processing? Consistent with EPA regulations found at 40 CFR Part...

  12. Double Take: Parallel Processing by the Cerebral Hemispheres Reduces Attentional Blink

    ERIC Educational Resources Information Center

    Scalf, Paige E.; Banich, Marie T.; Kramer, Arthur F.; Narechania, Kunjan; Simon, Clarissa D.

    2007-01-01

    Recent data have shown that parallel processing by the cerebral hemispheres can expand the capacity of visual working memory for spatial locations (J. F. Delvenne, 2005) and attentional tracking (G. A. Alvarez & P. Cavanagh, 2005). Evidence that parallel processing by the cerebral hemispheres can improve item identification has remained elusive.…

  13. Implementation of digital equality comparator circuit on memristive memory crossbar array using material implication logic

    NASA Astrophysics Data System (ADS)

    Haron, Adib; Mahdzair, Fazren; Luqman, Anas; Osman, Nazmie; Junid, Syed Abdul Mutalib Al

    2018-03-01

    One of the most significant constraints of Von Neumann architecture is the limited bandwidth between memory and processor. The cost to move data back and forth between memory and processor is considerably higher than the computation in the processor itself. This architecture significantly impacts the Big Data and data-intensive application such as DNA analysis comparison which spend most of the processing time to move data. Recently, the in-memory processing concept was proposed, which is based on the capability to perform the logic operation on the physical memory structure using a crossbar topology and non-volatile resistive-switching memristor technology. This paper proposes a scheme to map digital equality comparator circuit on memristive memory crossbar array. The 2-bit, 4-bit, 8-bit, 16-bit, 32-bit, and 64-bit of equality comparator circuit are mapped on memristive memory crossbar array by using material implication logic in a sequential and parallel method. The simulation results show that, for the 64-bit word size, the parallel mapping exhibits 2.8× better performance in total execution time than sequential mapping but has a trade-off in terms of energy consumption and area utilization. Meanwhile, the total crossbar area can be reduced by 1.2× for sequential mapping and 1.5× for parallel mapping both by using the overlapping technique.

  14. On the costs of parallel processing in dual-task performance: The case of lexical processing in word production.

    PubMed

    Paucke, Madlen; Oppermann, Frank; Koch, Iring; Jescheniak, Jörg D

    2015-12-01

    Previous dual-task picture-naming studies suggest that lexical processes require capacity-limited processes and prevent other tasks to be carried out in parallel. However, studies involving the processing of multiple pictures suggest that parallel lexical processing is possible. The present study investigated the specific costs that may arise when such parallel processing occurs. We used a novel dual-task paradigm by presenting 2 visual objects associated with different tasks and manipulating between-task similarity. With high similarity, a picture-naming task (T1) was combined with a phoneme-decision task (T2), so that lexical processes were shared across tasks. With low similarity, picture-naming was combined with a size-decision T2 (nonshared lexical processes). In Experiment 1, we found that a manipulation of lexical processes (lexical frequency of T1 object name) showed an additive propagation with low between-task similarity and an overadditive propagation with high between-task similarity. Experiment 2 replicated this differential forward propagation of the lexical effect and showed that it disappeared with longer stimulus onset asynchronies. Moreover, both experiments showed backward crosstalk, indexed as worse T1 performance with high between-task similarity compared with low similarity. Together, these findings suggest that conditions of high between-task similarity can lead to parallel lexical processing in both tasks, which, however, does not result in benefits but rather in extra performance costs. These costs can be attributed to crosstalk based on the dual-task binding problem arising from parallel processing. Hence, the present study reveals that capacity-limited lexical processing can run in parallel across dual tasks but only at the expense of extraordinary high costs. (c) 2015 APA, all rights reserved).

  15. A Parallel Genetic Algorithm for Automated Electronic Circuit Design

    NASA Technical Reports Server (NTRS)

    Long, Jason D.; Colombano, Silvano P.; Haith, Gary L.; Stassinopoulos, Dimitris

    2000-01-01

    Parallelized versions of genetic algorithms (GAs) are popular primarily for three reasons: the GA is an inherently parallel algorithm, typical GA applications are very compute intensive, and powerful computing platforms, especially Beowulf-style computing clusters, are becoming more affordable and easier to implement. In addition, the low communication bandwidth required allows the use of inexpensive networking hardware such as standard office ethernet. In this paper we describe a parallel GA and its use in automated high-level circuit design. Genetic algorithms are a type of trial-and-error search technique that are guided by principles of Darwinian evolution. Just as the genetic material of two living organisms can intermix to produce offspring that are better adapted to their environment, GAs expose genetic material, frequently strings of 1s and Os, to the forces of artificial evolution: selection, mutation, recombination, etc. GAs start with a pool of randomly-generated candidate solutions which are then tested and scored with respect to their utility. Solutions are then bred by probabilistically selecting high quality parents and recombining their genetic representations to produce offspring solutions. Offspring are typically subjected to a small amount of random mutation. After a pool of offspring is produced, this process iterates until a satisfactory solution is found or an iteration limit is reached. Genetic algorithms have been applied to a wide variety of problems in many fields, including chemistry, biology, and many engineering disciplines. There are many styles of parallelism used in implementing parallel GAs. One such method is called the master-slave or processor farm approach. In this technique, slave nodes are used solely to compute fitness evaluations (the most time consuming part). The master processor collects fitness scores from the nodes and performs the genetic operators (selection, reproduction, variation, etc.). Because of dependency issues in the GA, it is possible to have idle processors. However, as long as the load at each processing node is similar, the processors are kept busy nearly all of the time. In applying GAs to circuit design, a suitable genetic representation 'is that of a circuit-construction program. We discuss one such circuit-construction programming language and show how evolution can generate useful analog circuit designs. This language has the desirable property that virtually all sets of combinations of primitives result in valid circuit graphs. Our system allows circuit size (number of devices), circuit topology, and device values to be evolved. Using a parallel genetic algorithm and circuit simulation software, we present experimental results as applied to three analog filter and two amplifier design tasks. For example, a figure shows an 85 dB amplifier design evolved by our system, and another figure shows the performance of that circuit (gain and frequency response). In all tasks, our system is able to generate circuits that achieve the target specifications.

  16. Observations of large parallel electric fields in the auroral ionosphere

    NASA Technical Reports Server (NTRS)

    Mozer, F. S.

    1976-01-01

    Rocket borne measurements employing a double probe technique were used to gather evidence for the existence of electric fields in the auroral ionosphere having components parallel to the magnetic field direction. An analysis of possible experimental errors leads to the conclusion that no known uncertainties can account for the roughly 10 mV/m parallel electric fields that are observed.

  17. Graphical Representation of Parallel Algorithmic Processes

    DTIC Science & Technology

    1990-12-01

    interface with the AAARF main process . The source code for the AAARF class-common library is in the common subdi- rectory and consists of the following files... for public release; distribution unlimited AFIT/GCE/ENG/90D-07 Graphical Representation of Parallel Algorithmic Processes THESIS Presented to the...goal of this study is to develop an algorithm animation facility for parallel processes executing on different architectures, from multiprocessor

  18. Parallel gene analysis with allele-specific padlock probes and tag microarrays

    PubMed Central

    Banér, Johan; Isaksson, Anders; Waldenström, Erik; Jarvius, Jonas; Landegren, Ulf; Nilsson, Mats

    2003-01-01

    Parallel, highly specific analysis methods are required to take advantage of the extensive information about DNA sequence variation and of expressed sequences. We present a scalable laboratory technique suitable to analyze numerous target sequences in multiplexed assays. Sets of padlock probes were applied to analyze single nucleotide variation directly in total genomic DNA or cDNA for parallel genotyping or gene expression analysis. All reacted probes were then co-amplified and identified by hybridization to a standard tag oligonucleotide array. The technique was illustrated by analyzing normal and pathogenic variation within the Wilson disease-related ATP7B gene, both at the level of DNA and RNA, using allele-specific padlock probes. PMID:12930977

  19. Parallel compression of data chunks of a shared data object using a log-structured file system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bent, John M.; Faibish, Sorin; Grider, Gary

    2016-10-25

    Techniques are provided for parallel compression of data chunks being written to a shared object. A client executing on a compute node or a burst buffer node in a parallel computing system stores a data chunk generated by the parallel computing system to a shared data object on a storage node by compressing the data chunk; and providing the data compressed data chunk to the storage node that stores the shared object. The client and storage node may employ Log-Structured File techniques. The compressed data chunk can be de-compressed by the client when the data chunk is read. A storagemore » node stores a data chunk as part of a shared object by receiving a compressed version of the data chunk from a compute node; and storing the compressed version of the data chunk to the shared data object on the storage node.« less

  20. Using Unified Modelling Language (UML) as a process-modelling technique for clinical-research process improvement.

    PubMed

    Kumarapeli, P; De Lusignan, S; Ellis, T; Jones, B

    2007-03-01

    The Primary Care Data Quality programme (PCDQ) is a quality-improvement programme which processes routinely collected general practice computer data. Patient data collected from a wide range of different brands of clinical computer systems are aggregated, processed, and fed back to practices in an educational context to improve the quality of care. Process modelling is a well-established approach used to gain understanding and systematic appraisal, and identify areas of improvement of a business process. Unified modelling language (UML) is a general purpose modelling technique used for this purpose. We used UML to appraise the PCDQ process to see if the efficiency and predictability of the process could be improved. Activity analysis and thinking-aloud sessions were used to collect data to generate UML diagrams. The UML model highlighted the sequential nature of the current process as a barrier for efficiency gains. It also identified the uneven distribution of process controls, lack of symmetric communication channels, critical dependencies among processing stages, and failure to implement all the lessons learned in the piloting phase. It also suggested that improved structured reporting at each stage - especially from the pilot phase, parallel processing of data and correctly positioned process controls - should improve the efficiency and predictability of research projects. Process modelling provided a rational basis for the critical appraisal of a clinical data processing system; its potential maybe underutilized within health care.

  1. Log-less metadata management on metadata server for parallel file systems.

    PubMed

    Liao, Jianwei; Xiao, Guoqiang; Peng, Xiaoning

    2014-01-01

    This paper presents a novel metadata management mechanism on the metadata server (MDS) for parallel and distributed file systems. In this technique, the client file system backs up the sent metadata requests, which have been handled by the metadata server, so that the MDS does not need to log metadata changes to nonvolatile storage for achieving highly available metadata service, as well as better performance improvement in metadata processing. As the client file system backs up certain sent metadata requests in its memory, the overhead for handling these backup requests is much smaller than that brought by the metadata server, while it adopts logging or journaling to yield highly available metadata service. The experimental results show that this newly proposed mechanism can significantly improve the speed of metadata processing and render a better I/O data throughput, in contrast to conventional metadata management schemes, that is, logging or journaling on MDS. Besides, a complete metadata recovery can be achieved by replaying the backup logs cached by all involved clients, when the metadata server has crashed or gone into nonoperational state exceptionally.

  2. Log-Less Metadata Management on Metadata Server for Parallel File Systems

    PubMed Central

    Xiao, Guoqiang; Peng, Xiaoning

    2014-01-01

    This paper presents a novel metadata management mechanism on the metadata server (MDS) for parallel and distributed file systems. In this technique, the client file system backs up the sent metadata requests, which have been handled by the metadata server, so that the MDS does not need to log metadata changes to nonvolatile storage for achieving highly available metadata service, as well as better performance improvement in metadata processing. As the client file system backs up certain sent metadata requests in its memory, the overhead for handling these backup requests is much smaller than that brought by the metadata server, while it adopts logging or journaling to yield highly available metadata service. The experimental results show that this newly proposed mechanism can significantly improve the speed of metadata processing and render a better I/O data throughput, in contrast to conventional metadata management schemes, that is, logging or journaling on MDS. Besides, a complete metadata recovery can be achieved by replaying the backup logs cached by all involved clients, when the metadata server has crashed or gone into nonoperational state exceptionally. PMID:24892093

  3. Use Hierarchical Storage and Analysis to Exploit Intrinsic Parallelism

    NASA Astrophysics Data System (ADS)

    Zender, C. S.; Wang, W.; Vicente, P.

    2013-12-01

    Big Data is an ugly name for the scientific opportunities and challenges created by the growing wealth of geoscience data. How to weave large, disparate datasets together to best reveal their underlying properties, to exploit their strengths and minimize their weaknesses, to continually aggregate more information than the world knew yesterday and less than we will learn tomorrow? Data analytics techniques (statistics, data mining, machine learning, etc.) can accelerate pattern recognition and discovery. However, often researchers must, prior to analysis, organize multiple related datasets into a coherent framework. Hierarchical organization permits entire dataset to be stored in nested groups that reflect their intrinsic relationships and similarities. Hierarchical data can be simpler and faster to analyze by coding operators to automatically parallelize processes over isomorphic storage units, i.e., groups. The newest generation of netCDF Operators (NCO) embody this hierarchical approach, while still supporting traditional analysis approaches. We will use NCO to demonstrate the trade-offs involved in processing a prototypical Big Data application (analysis of CMIP5 datasets) using hierarchical and traditional analysis approaches.

  4. Embedded Implementation of VHR Satellite Image Segmentation

    PubMed Central

    Li, Chao; Balla-Arabé, Souleymane; Ginhac, Dominique; Yang, Fan

    2016-01-01

    Processing and analysis of Very High Resolution (VHR) satellite images provide a mass of crucial information, which can be used for urban planning, security issues or environmental monitoring. However, they are computationally expensive and, thus, time consuming, while some of the applications, such as natural disaster monitoring and prevention, require high efficiency performance. Fortunately, parallel computing techniques and embedded systems have made great progress in recent years, and a series of massively parallel image processing devices, such as digital signal processors or Field Programmable Gate Arrays (FPGAs), have been made available to engineers at a very convenient price and demonstrate significant advantages in terms of running-cost, embeddability, power consumption flexibility, etc. In this work, we designed a texture region segmentation method for very high resolution satellite images by using the level set algorithm and the multi-kernel theory in a high-abstraction C environment and realize its register-transfer level implementation with the help of a new proposed high-level synthesis-based design flow. The evaluation experiments demonstrate that the proposed design can produce high quality image segmentation with a significant running-cost advantage. PMID:27240370

  5. Acceleration and sensitivity analysis of lattice kinetic Monte Carlo simulations using parallel processing and rate constant rescaling

    NASA Astrophysics Data System (ADS)

    Núñez, M.; Robie, T.; Vlachos, D. G.

    2017-10-01

    Kinetic Monte Carlo (KMC) simulation provides insights into catalytic reactions unobtainable with either experiments or mean-field microkinetic models. Sensitivity analysis of KMC models assesses the robustness of the predictions to parametric perturbations and identifies rate determining steps in a chemical reaction network. Stiffness in the chemical reaction network, a ubiquitous feature, demands lengthy run times for KMC models and renders efficient sensitivity analysis based on the likelihood ratio method unusable. We address the challenge of efficiently conducting KMC simulations and performing accurate sensitivity analysis in systems with unknown time scales by employing two acceleration techniques: rate constant rescaling and parallel processing. We develop statistical criteria that ensure sufficient sampling of non-equilibrium steady state conditions. Our approach provides the twofold benefit of accelerating the simulation itself and enabling likelihood ratio sensitivity analysis, which provides further speedup relative to finite difference sensitivity analysis. As a result, the likelihood ratio method can be applied to real chemistry. We apply our methodology to the water-gas shift reaction on Pt(111).

  6. Dual Super-Systolic Core for Real-Time Reconstructive Algorithms of High-Resolution Radar/SAR Imaging Systems

    PubMed Central

    Atoche, Alejandro Castillo; Castillo, Javier Vázquez

    2012-01-01

    A high-speed dual super-systolic core for reconstructive signal processing (SP) operations consists of a double parallel systolic array (SA) machine in which each processing element of the array is also conceptualized as another SA in a bit-level fashion. In this study, we addressed the design of a high-speed dual super-systolic array (SSA) core for the enhancement/reconstruction of remote sensing (RS) imaging of radar/synthetic aperture radar (SAR) sensor systems. The selected reconstructive SP algorithms are efficiently transformed in their parallel representation and then, they are mapped into an efficient high performance embedded computing (HPEC) architecture in reconfigurable Xilinx field programmable gate array (FPGA) platforms. As an implementation test case, the proposed approach was aggregated in a HW/SW co-design scheme in order to solve the nonlinear ill-posed inverse problem of nonparametric estimation of the power spatial spectrum pattern (SSP) from a remotely sensed scene. We show how such dual SSA core, drastically reduces the computational load of complex RS regularization techniques achieving the required real-time operational mode. PMID:22736964

  7. Parallel fuzzy connected image segmentation on GPU

    PubMed Central

    Zhuge, Ying; Cao, Yong; Udupa, Jayaram K.; Miller, Robert W.

    2011-01-01

    Purpose: Image segmentation techniques using fuzzy connectedness (FC) principles have shown their effectiveness in segmenting a variety of objects in several large applications. However, one challenge in these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides a highly parallel computing environment. In this paper, the authors present a parallel fuzzy connected image segmentation algorithm implementation on NVIDIA’s compute unified device Architecture (cuda) platform for segmenting medical image data sets. Methods: In the FC algorithm, there are two major computational tasks: (i) computing the fuzzy affinity relations and (ii) computing the fuzzy connectedness relations. These two tasks are implemented as cuda kernels and executed on GPU. A dramatic improvement in speed for both tasks is achieved as a result. Results: Our experiments based on three data sets of small, medium, and large data size demonstrate the efficiency of the parallel algorithm, which achieves a speed-up factor of 24.4x, 18.1x, and 10.3x, correspondingly, for the three data sets on the NVIDIA Tesla C1060 over the implementation of the algorithm on CPU, and takes 0.25, 0.72, and 15.04 s, correspondingly, for the three data sets. Conclusions: The authors developed a parallel algorithm of the widely used fuzzy connected image segmentation method on the NVIDIA GPUs, which are far more cost- and speed-effective than both cluster of workstations and multiprocessing systems. A near-interactive speed of segmentation has been achieved, even for the large data set. PMID:21859037

  8. Parallel fuzzy connected image segmentation on GPU.

    PubMed

    Zhuge, Ying; Cao, Yong; Udupa, Jayaram K; Miller, Robert W

    2011-07-01

    Image segmentation techniques using fuzzy connectedness (FC) principles have shown their effectiveness in segmenting a variety of objects in several large applications. However, one challenge in these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides a highly parallel computing environment. In this paper, the authors present a parallel fuzzy connected image segmentation algorithm implementation on NVIDIA's compute unified device Architecture (CUDA) platform for segmenting medical image data sets. In the FC algorithm, there are two major computational tasks: (i) computing the fuzzy affinity relations and (ii) computing the fuzzy connectedness relations. These two tasks are implemented as CUDA kernels and executed on GPU. A dramatic improvement in speed for both tasks is achieved as a result. Our experiments based on three data sets of small, medium, and large data size demonstrate the efficiency of the parallel algorithm, which achieves a speed-up factor of 24.4x, 18.1x, and 10.3x, correspondingly, for the three data sets on the NVIDIA Tesla C1060 over the implementation of the algorithm on CPU, and takes 0.25, 0.72, and 15.04 s, correspondingly, for the three data sets. The authors developed a parallel algorithm of the widely used fuzzy connected image segmentation method on the NVIDIA GPUs, which are far more cost- and speed-effective than both cluster of workstations and multiprocessing systems. A near-interactive speed of segmentation has been achieved, even for the large data set.

  9. High-Performance Compute Infrastructure in Astronomy: 2020 Is Only Months Away

    NASA Astrophysics Data System (ADS)

    Berriman, B.; Deelman, E.; Juve, G.; Rynge, M.; Vöckler, J. S.

    2012-09-01

    By 2020, astronomy will be awash with as much as 60 PB of public data. Full scientific exploitation of such massive volumes of data will require high-performance computing on server farms co-located with the data. Development of this computing model will be a community-wide enterprise that has profound cultural and technical implications. Astronomers must be prepared to develop environment-agnostic applications that support parallel processing. The community must investigate the applicability and cost-benefit of emerging technologies such as cloud computing to astronomy, and must engage the Computer Science community to develop science-driven cyberinfrastructure such as workflow schedulers and optimizers. We report here the results of collaborations between a science center, IPAC, and a Computer Science research institute, ISI. These collaborations may be considered pathfinders in developing a high-performance compute infrastructure in astronomy. These collaborations investigated two exemplar large-scale science-driver workflow applications: 1) Calculation of an infrared atlas of the Galactic Plane at 18 different wavelengths by placing data from multiple surveys on a common plate scale and co-registering all the pixels; 2) Calculation of an atlas of periodicities present in the public Kepler data sets, which currently contain 380,000 light curves. These products have been generated with two workflow applications, written in C for performance and designed to support parallel processing on multiple environments and platforms, but with different compute resource needs: the Montage image mosaic engine is I/O-bound, and the NASA Star and Exoplanet Database periodogram code is CPU-bound. Our presentation will report cost and performance metrics and lessons-learned for continuing development. Applicability of Cloud Computing: Commercial Cloud providers generally charge for all operations, including processing, transfer of input and output data, and for storage of data, and so the costs of running applications vary widely according to how they use resources. The cloud is well suited to processing CPU-bound (and memory bound) workflows such as the periodogram code, given the relatively low cost of processing in comparison with I/O operations. I/O-bound applications such as Montage perform best on high-performance clusters with fast networks and parallel file-systems. Science-driven Cyberinfrastructure: Montage has been widely used as a driver application to develop workflow management services, such as task scheduling in distributed environments, designing fault tolerance techniques for job schedulers, and developing workflow orchestration techniques. Running Parallel Applications Across Distributed Cloud Environments: Data processing will eventually take place in parallel distributed across cyber infrastructure environments having different architectures. We have used the Pegasus Work Management System (WMS) to successfully run applications across three very different environments: TeraGrid, OSG (Open Science Grid), and FutureGrid. Provisioning resources across different grids and clouds (also referred to as Sky Computing), involves establishing a distributed environment, where issues of, e.g, remote job submission, data management, and security need to be addressed. This environment also requires building virtual machine images that can run in different environments. Usually, each cloud provides basic images that can be customized with additional software and services. In most of our work, we provisioned compute resources using a custom application, called Wrangler. Pegasus WMS abstracts the architectures of the compute environments away from the end-user, and can be considered a first-generation tool suitable for scientists to run their applications on disparate environments.

  10. Foveal analysis and peripheral selection during active visual sampling

    PubMed Central

    Ludwig, Casimir J. H.; Davies, J. Rhys; Eckstein, Miguel P.

    2014-01-01

    Human vision is an active process in which information is sampled during brief periods of stable fixation in between gaze shifts. Foveal analysis serves to identify the currently fixated object and has to be coordinated with a peripheral selection process of the next fixation location. Models of visual search and scene perception typically focus on the latter, without considering foveal processing requirements. We developed a dual-task noise classification technique that enables identification of the information uptake for foveal analysis and peripheral selection within a single fixation. Human observers had to use foveal vision to extract visual feature information (orientation) from different locations for a psychophysical comparison. The selection of to-be-fixated locations was guided by a different feature (luminance contrast). We inserted noise in both visual features and identified the uptake of information by looking at correlations between the noise at different points in time and behavior. Our data show that foveal analysis and peripheral selection proceeded completely in parallel. Peripheral processing stopped some time before the onset of an eye movement, but foveal analysis continued during this period. Variations in the difficulty of foveal processing did not influence the uptake of peripheral information and the efficacy of peripheral selection, suggesting that foveal analysis and peripheral selection operated independently. These results provide important theoretical constraints on how to model target selection in conjunction with foveal object identification: in parallel and independently. PMID:24385588

  11. Wideband aperture array using RF channelizers and massively parallel digital 2D IIR filterbank

    NASA Astrophysics Data System (ADS)

    Sengupta, Arindam; Madanayake, Arjuna; Gómez-García, Roberto; Engeberg, Erik D.

    2014-05-01

    Wideband receive-mode beamforming applications in wireless location, electronically-scanned antennas for radar, RF sensing, microwave imaging and wireless communications require digital aperture arrays that offer a relatively constant far-field beam over several octaves of bandwidth. Several beamforming schemes including the well-known true time-delay and the phased array beamformers have been realized using either finite impulse response (FIR) or fast Fourier transform (FFT) digital filter-sum based techniques. These beamforming algorithms offer the desired selectivity at the cost of a high computational complexity and frequency-dependant far-field array patterns. A novel approach to receiver beamforming is the use of massively parallel 2-D infinite impulse response (IIR) fan filterbanks for the synthesis of relatively frequency independent RF beams at an order of magnitude lower multiplier complexity compared to FFT or FIR filter based conventional algorithms. The 2-D IIR filterbanks demand fast digital processing that can support several octaves of RF bandwidth, fast analog-to-digital converters (ADCs) for RF-to-bits type direct conversion of wideband antenna element signals. Fast digital implementation platforms that can realize high-precision recursive filter structures necessary for real-time beamforming, at RF radio bandwidths, are also desired. We propose a novel technique that combines a passive RF channelizer, multichannel ADC technology, and single-phase massively parallel 2-D IIR digital fan filterbanks, realized at low complexity using FPGA and/or ASIC technology. There exists native support for a larger bandwidth than the maximum clock frequency of the digital implementation technology. We also strive to achieve More-than-Moore throughput by processing a wideband RF signal having content with N-fold (B = N Fclk/2) bandwidth compared to the maximum clock frequency Fclk Hz of the digital VLSI platform under consideration. Such increase in bandwidth is achieved without use of polyphase signal processing or time-interleaved ADC methods. That is, all digital processors operate at the same Fclk clock frequency without phasing, while wideband operation is achieved by sub-sampling of narrower sub-bands at the the RF channelizer outputs.

  12. Rapid data processing for ultrafast X-ray computed tomography using scalable and modular CUDA based pipelines

    NASA Astrophysics Data System (ADS)

    Frust, Tobias; Wagner, Michael; Stephan, Jan; Juckeland, Guido; Bieberle, André

    2017-10-01

    Ultrafast X-ray tomography is an advanced imaging technique for the study of dynamic processes basing on the principles of electron beam scanning. A typical application case for this technique is e.g. the study of multiphase flows, that is, flows of mixtures of substances such as gas-liquidflows in pipelines or chemical reactors. At Helmholtz-Zentrum Dresden-Rossendorf (HZDR) a number of such tomography scanners are operated. Currently, there are two main points limiting their application in some fields. First, after each CT scan sequence the data of the radiation detector must be downloaded from the scanner to a data processing machine. Second, the current data processing is comparably time-consuming compared to the CT scan sequence interval. To enable online observations or use this technique to control actuators in real-time, a modular and scalable data processing tool has been developed, consisting of user-definable stages working independently together in a so called data processing pipeline, that keeps up with the CT scanner's maximal frame rate of up to 8 kHz. The newly developed data processing stages are freely programmable and combinable. In order to achieve the highest processing performance all relevant data processing steps, which are required for a standard slice image reconstruction, were individually implemented in separate stages using Graphics Processing Units (GPUs) and NVIDIA's CUDA programming language. Data processing performance tests on different high-end GPUs (Tesla K20c, GeForce GTX 1080, Tesla P100) showed excellent performance. Program Files doi:http://dx.doi.org/10.17632/65sx747rvm.1 Licensing provisions: LGPLv3 Programming language: C++/CUDA Supplementary material: Test data set, used for the performance analysis. Nature of problem: Ultrafast computed tomography is performed with a scan rate of up to 8 kHz. To obtain cross-sectional images from projection data computer-based image reconstruction algorithms must be applied. The objective of the presented program is to reconstruct a data stream of around 1.3 GB s-1 in a minimum time period. Thus, the program allows to go into new fields of application and to use in the future even more compute-intensive algorithms, especially for data post-processing, to improve the quality of data analysis. Solution method: The program solves the given problem using a two-step process: first, by a generic, expandable and widely applicable template library implementing the streaming paradigm (GLADOS); second, by optimized processing stages for ultrafast computed tomography implementing the required algorithms in a performance-oriented way using CUDA (RISA). Thereby, task-parallelism between the processing stages as well as data parallelism within one processing stage is realized.

  13. Fast parallel algorithm for slicing STL based on pipeline

    NASA Astrophysics Data System (ADS)

    Ma, Xulong; Lin, Feng; Yao, Bo

    2016-05-01

    In Additive Manufacturing field, the current researches of data processing mainly focus on a slicing process of large STL files or complicated CAD models. To improve the efficiency and reduce the slicing time, a parallel algorithm has great advantages. However, traditional algorithms can't make full use of multi-core CPU hardware resources. In the paper, a fast parallel algorithm is presented to speed up data processing. A pipeline mode is adopted to design the parallel algorithm. And the complexity of the pipeline algorithm is analyzed theoretically. To evaluate the performance of the new algorithm, effects of threads number and layers number are investigated by a serial of experiments. The experimental results show that the threads number and layers number are two remarkable factors to the speedup ratio. The tendency of speedup versus threads number reveals a positive relationship which greatly agrees with the Amdahl's law, and the tendency of speedup versus layers number also keeps a positive relationship agreeing with Gustafson's law. The new algorithm uses topological information to compute contours with a parallel method of speedup. Another parallel algorithm based on data parallel is used in experiments to show that pipeline parallel mode is more efficient. A case study at last shows a suspending performance of the new parallel algorithm. Compared with the serial slicing algorithm, the new pipeline parallel algorithm can make full use of the multi-core CPU hardware, accelerate the slicing process, and compared with the data parallel slicing algorithm, the new slicing algorithm in this paper adopts a pipeline parallel model, and a much higher speedup ratio and efficiency is achieved.

  14. An architecture of entropy decoder, inverse quantiser and predictor for multi-standard video decoding

    NASA Astrophysics Data System (ADS)

    Liu, Leibo; Chen, Yingjie; Yin, Shouyi; Lei, Hao; He, Guanghui; Wei, Shaojun

    2014-07-01

    A VLSI architecture for entropy decoder, inverse quantiser and predictor is proposed in this article. This architecture is used for decoding video streams of three standards on a single chip, i.e. H.264/AVC, AVS (China National Audio Video coding Standard) and MPEG2. The proposed scheme is called MPMP (Macro-block-Parallel based Multilevel Pipeline), which is intended to improve the decoding performance to satisfy the real-time requirements while maintaining a reasonable area and power consumption. Several techniques, such as slice level pipeline, MB (Macro-Block) level pipeline, MB level parallel, etc., are adopted. Input and output buffers for the inverse quantiser and predictor are shared by the decoding engines for H.264, AVS and MPEG2, therefore effectively reducing the implementation overhead. Simulation shows that decoding process consumes 512, 435 and 438 clock cycles per MB in H.264, AVS and MPEG2, respectively. Owing to the proposed techniques, the video decoder can support H.264 HP (High Profile) 1920 × 1088@30fps (frame per second) streams, AVS JP (Jizhun Profile) 1920 × 1088@41fps streams and MPEG2 MP (Main Profile) 1920 × 1088@39fps streams when exploiting a 200 MHz working frequency.

  15. Coil Compression for Accelerated Imaging with Cartesian Sampling

    PubMed Central

    Zhang, Tao; Pauly, John M.; Vasanawala, Shreyas S.; Lustig, Michael

    2012-01-01

    MRI using receiver arrays with many coil elements can provide high signal-to-noise ratio and increase parallel imaging acceleration. At the same time, the growing number of elements results in larger datasets and more computation in the reconstruction. This is of particular concern in 3D acquisitions and in iterative reconstructions. Coil compression algorithms are effective in mitigating this problem by compressing data from many channels into fewer virtual coils. In Cartesian sampling there often are fully sampled k-space dimensions. In this work, a new coil compression technique for Cartesian sampling is presented that exploits the spatially varying coil sensitivities in these non-subsampled dimensions for better compression and computation reduction. Instead of directly compressing in k-space, coil compression is performed separately for each spatial location along the fully-sampled directions, followed by an additional alignment process that guarantees the smoothness of the virtual coil sensitivities. This important step provides compatibility with autocalibrating parallel imaging techniques. Its performance is not susceptible to artifacts caused by a tight imaging fieldof-view. High quality compression of in-vivo 3D data from a 32 channel pediatric coil into 6 virtual coils is demonstrated. PMID:22488589

  16. Managing fear in public health campaigns: a theory-based formative evaluation process.

    PubMed

    Cho, Hyunyi; Witte, Kim

    2005-10-01

    The HIV/AIDS infection rate of Ethiopia is one of the world's highest. Prevention campaigns should systematically incorporate and respond to at-risk population's existing beliefs, emotions, and perceived barriers in the message design process to effectively promote behavior change. However, guidelines for conducting formative evaluation that are grounded in proven risk communication theory and empirical data analysis techniques are hard to find. This article provides a five-step formative evaluation process that translates theory and research for developing effective messages for behavior change. Guided by the extended parallel process model, the five-step process helps message designers manage public's fear surrounding issues such as HIV/AIDS. An entertainment education project that used the process to design HIV/AIDS prevention messages for Ethiopian urban youth is reported. Data were collected in five urban regions of Ethiopia and analyzed according to the process to develop key messages for a 26-week radio soap opera.

  17. Dynamic metrology and data processing for precision freeform optics fabrication and testing

    NASA Astrophysics Data System (ADS)

    Aftab, Maham; Trumper, Isaac; Huang, Lei; Choi, Heejoo; Zhao, Wenchuan; Graves, Logan; Oh, Chang Jin; Kim, Dae Wook

    2017-06-01

    Dynamic metrology holds the key to overcoming several challenging limitations of conventional optical metrology, especially with regards to precision freeform optical elements. We present two dynamic metrology systems: 1) adaptive interferometric null testing; and 2) instantaneous phase shifting deflectometry, along with an overview of a gradient data processing and surface reconstruction technique. The adaptive null testing method, utilizing a deformable mirror, adopts a stochastic parallel gradient descent search algorithm in order to dynamically create a null testing condition for unknown freeform optics. The single-shot deflectometry system implemented on an iPhone uses a multiplexed display pattern to enable dynamic measurements of time-varying optical components or optics in vibration. Experimental data, measurement accuracy / precision, and data processing algorithms are discussed.

  18. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E

    2014-02-11

    Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective opeartion through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  19. Distributed computing feasibility in a non-dedicated homogeneous distributed system

    NASA Technical Reports Server (NTRS)

    Leutenegger, Scott T.; Sun, Xian-He

    1993-01-01

    The low cost and availability of clusters of workstations have lead researchers to re-explore distributed computing using independent workstations. This approach may provide better cost/performance than tightly coupled multiprocessors. In practice, this approach often utilizes wasted cycles to run parallel jobs. The feasibility of such a non-dedicated parallel processing environment assuming workstation processes have preemptive priority over parallel tasks is addressed. An analytical model is developed to predict parallel job response times. Our model provides insight into how significantly workstation owner interference degrades parallel program performance. A new term task ratio, which relates the parallel task demand to the mean service demand of nonparallel workstation processes, is introduced. It was proposed that task ratio is a useful metric for determining how large the demand of a parallel applications must be in order to make efficient use of a non-dedicated distributed system.

  20. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-08-12

    Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  1. New Techniques for Deep Learning with Geospatial Data using TensorFlow, Earth Engine, and Google Cloud Platform

    NASA Astrophysics Data System (ADS)

    Hancher, M.

    2017-12-01

    Recent years have seen promising results from many research teams applying deep learning techniques to geospatial data processing. In that same timeframe, TensorFlow has emerged as the most popular framework for deep learning in general, and Google has assembled petabytes of Earth observation data from a wide variety of sources and made them available in analysis-ready form in the cloud through Google Earth Engine. Nevertheless, developing and applying deep learning to geospatial data at scale has been somewhat cumbersome to date. We present a new set of tools and techniques that simplify this process. Our approach combines the strengths of several underlying tools: TensorFlow for its expressive deep learning framework; Earth Engine for data management, preprocessing, postprocessing, and visualization; and other tools in Google Cloud Platform to train TensorFlow models at scale, perform additional custom parallel data processing, and drive the entire process from a single familiar Python development environment. These tools can be used to easily apply standard deep neural networks, convolutional neural networks, and other custom model architectures to a variety of geospatial data structures. We discuss our experiences applying these and related tools to a range of machine learning problems, including classic problems like cloud detection, building detection, land cover classification, as well as more novel problems like illegal fishing detection. Our improved tools will make it easier for geospatial data scientists to apply modern deep learning techniques to their own problems, and will also make it easier for machine learning researchers to advance the state of the art of those techniques.

  2. Toward a Model Framework of Generalized Parallel Componential Processing of Multi-Symbol Numbers

    ERIC Educational Resources Information Center

    Huber, Stefan; Cornelsen, Sonja; Moeller, Korbinian; Nuerk, Hans-Christoph

    2015-01-01

    In this article, we propose and evaluate a new model framework of parallel componential multi-symbol number processing, generalizing the idea of parallel componential processing of multi-digit numbers to the case of negative numbers by considering the polarity signs similar to single digits. In a first step, we evaluated this account by defining…

  3. Computing effective properties of random heterogeneous materials on heterogeneous parallel processors

    NASA Astrophysics Data System (ADS)

    Leidi, Tiziano; Scocchi, Giulio; Grossi, Loris; Pusterla, Simone; D'Angelo, Claudio; Thiran, Jean-Philippe; Ortona, Alberto

    2012-11-01

    In recent decades, finite element (FE) techniques have been extensively used for predicting effective properties of random heterogeneous materials. In the case of very complex microstructures, the choice of numerical methods for the solution of this problem can offer some advantages over classical analytical approaches, and it allows the use of digital images obtained from real material samples (e.g., using computed tomography). On the other hand, having a large number of elements is often necessary for properly describing complex microstructures, ultimately leading to extremely time-consuming computations and high memory requirements. With the final objective of reducing these limitations, we improved an existing freely available FE code for the computation of effective conductivity (electrical and thermal) of microstructure digital models. To allow execution on hardware combining multi-core CPUs and a GPU, we first translated the original algorithm from Fortran to C, and we subdivided it into software components. Then, we enhanced the C version of the algorithm for parallel processing with heterogeneous processors. With the goal of maximizing the obtained performances and limiting resource consumption, we utilized a software architecture based on stream processing, event-driven scheduling, and dynamic load balancing. The parallel processing version of the algorithm has been validated using a simple microstructure consisting of a single sphere located at the centre of a cubic box, yielding consistent results. Finally, the code was used for the calculation of the effective thermal conductivity of a digital model of a real sample (a ceramic foam obtained using X-ray computed tomography). On a computer equipped with dual hexa-core Intel Xeon X5670 processors and an NVIDIA Tesla C2050, the parallel application version features near to linear speed-up progression when using only the CPU cores. It executes more than 20 times faster when additionally using the GPU.

  4. 10-channel fiber array fabrication technique for parallel optical coherence tomography system

    NASA Astrophysics Data System (ADS)

    Arauz, Lina J.; Luo, Yuan; Castillo, Jose E.; Kostuk, Raymond K.; Barton, Jennifer

    2007-02-01

    Optical Coherence Tomography (OCT) shows great promise for low intrusive biomedical imaging applications. A parallel OCT system is a novel technique that replaces mechanical transverse scanning with electronic scanning. This will reduce the time required to acquire image data. In this system an array of small diameter fibers is required to obtain an image in the transverse direction. Each fiber in the array is configured in an interferometer and is used to image one pixel in the transverse direction. In this paper we describe a technique to package 15μm diameter fibers on a siliconsilica substrate to be used in a 2mm endoscopic probe tip. Single mode fibers are etched to reduce the cladding diameter from 125μm to 15μm. Etched fibers are placed into a 4mm by 150μm trench in a silicon-silica substrate and secured with UV glue. Active alignment was used to simplify the lay out of the fibers and minimize unwanted horizontal displacement of the fibers. A 10-channel fiber array was built, tested and later incorporated into a parallel optical coherence system. This paper describes the packaging, testing, and operation of the array in a parallel OCT system.

  5. In vitro and in vivo tissue harmonic images obtained with parallel transmit beamforming by means of orthogonal frequency division multiplexing.

    PubMed

    Demi, Libertario; Ramalli, Alessandro; Giannini, Gabriele; Mischi, Massimo

    2015-01-01

    In classic pulse-echo ultrasound imaging, the data acquisition rate is limited by the speed of sound. To overcome this, parallel beamforming techniques in transmit (PBT) and in receive (PBR) mode have been proposed. In particular, PBT techniques, based on the transmission of focused beams, are more suitable for harmonic imaging because they are capable of generating stronger harmonics. Recently, orthogonal frequency division multiplexing (OFDM) has been investigated as a means to obtain parallel beamformed tissue harmonic images. To date, only numerical studies and experiments in water have been performed, hence neglecting the effect of frequencydependent absorption. Here we present the first in vitro and in vivo tissue harmonic images obtained with PBT by means of OFDM, and we compare the results with classic B-mode tissue harmonic imaging. The resulting contrast-to-noise ratio, here used as a performance metric, is comparable. A reduction by 2 dB is observed for the case in which three parallel lines are reconstructed. In conclusion, the applicability of this technique to ultrasonography as a means to improve the data acquisition rate is confirmed.

  6. Development of seismic tomography software for hybrid supercomputers

    NASA Astrophysics Data System (ADS)

    Nikitin, Alexandr; Serdyukov, Alexandr; Duchkov, Anton

    2015-04-01

    Seismic tomography is a technique used for computing velocity model of geologic structure from first arrival travel times of seismic waves. The technique is used in processing of regional and global seismic data, in seismic exploration for prospecting and exploration of mineral and hydrocarbon deposits, and in seismic engineering for monitoring the condition of engineering structures and the surrounding host medium. As a consequence of development of seismic monitoring systems and increasing volume of seismic data, there is a growing need for new, more effective computational algorithms for use in seismic tomography applications with improved performance, accuracy and resolution. To achieve this goal, it is necessary to use modern high performance computing systems, such as supercomputers with hybrid architecture that use not only CPUs, but also accelerators and co-processors for computation. The goal of this research is the development of parallel seismic tomography algorithms and software package for such systems, to be used in processing of large volumes of seismic data (hundreds of gigabytes and more). These algorithms and software package will be optimized for the most common computing devices used in modern hybrid supercomputers, such as Intel Xeon CPUs, NVIDIA Tesla accelerators and Intel Xeon Phi co-processors. In this work, the following general scheme of seismic tomography is utilized. Using the eikonal equation solver, arrival times of seismic waves are computed based on assumed velocity model of geologic structure being analyzed. In order to solve the linearized inverse problem, tomographic matrix is computed that connects model adjustments with travel time residuals, and the resulting system of linear equations is regularized and solved to adjust the model. The effectiveness of parallel implementations of existing algorithms on target architectures is considered. During the first stage of this work, algorithms were developed for execution on supercomputers using multicore CPUs only, with preliminary performance tests showing good parallel efficiency on large numerical grids. Porting of the algorithms to hybrid supercomputers is currently ongoing.

  7. On line biomonitors used as a tool for toxicity reduction evaluation of in situ groundwater remediation techniques.

    PubMed

    Küster, Eberhard; Dorusch, Falk; Vogt, Carsten; Weiss, Holger; Altenburger, Rolf

    2004-07-15

    Success of groundwater remediation is typically controlled via snapshot analysis of selected chemical substances or physical parameters. Biological parameters, i.e. ecotoxicological assays, are rarely employed. Hence the aim of the study was to develop a bioassay tool, which allows an on line monitoring of contaminated groundwater, as well as a toxicity reduction evaluation (TRE) of different remediation techniques in parallel and may furthermore be used as an additional tool for process control to supervise remediation techniques in a real time mode. Parallel testing of groundwater remediation techniques was accomplished for short and long time periods, by using the energy dependent luminescence of the bacterium Vibrio fischeri as biological monitoring parameter. One data point every hour for each remediation technique was generated by an automated biomonitor. The bacteria proved to be highly sensitive to the contaminated groundwater and the biomonitor showed a long standing time despite the highly corrosive groundwater present in Bitterfeld, Germany. The bacterial biomonitor is demonstrated to be a valuable tool for remediation success evaluation. Dose response relationships were generated for the six quantitatively dominant groundwater contaminants (2-chlortoluene, 1,2- and 1,4-dichlorobenzene, monochlorobenzene, ethylenbenzene and benzene). The concentrations of individual volatile organic chemicals (VOCs) could not explain the observed effects in the bacteria. An expected mixture toxicity was calculated for the six components using the concept of concentration addition. The calculated EC(50) for the mixture was still one order of magnitude lower than the observed EC(50) of the actual groundwater. The results pointed out that chemical analysis of the six most quantitative substances alone was not able to explain the effects observed with the bacteria. Thus chemical analysis alone may not be an adequate tool for remediation success evaluation in terms of toxicity reduction.

  8. Sensing underground coal gasification by ground penetrating radar

    NASA Astrophysics Data System (ADS)

    Kotyrba, Andrzej; Stańczyk, Krzysztof

    2017-12-01

    The paper describes the results of research on the applicability of the ground penetrating radar (GPR) method for remote sensing and monitoring of the underground coal gasification (UCG) processes. The gasification of coal in a bed entails various technological problems and poses risks to the environment. Therefore, in parallel with research on coal gasification technologies, it is necessary to develop techniques for remote sensing of the process environment. One such technique may be the radar method, which allows imaging of regions of mass loss (voids, fissures) in coal during and after carrying out a gasification process in the bed. The paper describes two research experiments. The first one was carried out on a large-scale model constructed on the surface. It simulated a coal seam in natural geological conditions. A second experiment was performed in a shallow coal deposit maintained in a disused mine and kept accessible for research purposes. Tests performed in the laboratory and in situ conditions showed that the method provides valuable data for assessing and monitoring gasification surfaces in the UCG processes. The advantage of the GPR method is its high resolution and the possibility of determining the spatial shape of various zones and forms created in the coal by the gasification process.

  9. Visual analysis of inter-process communication for large-scale parallel computing.

    PubMed

    Muelder, Chris; Gygi, Francois; Ma, Kwan-Liu

    2009-01-01

    In serial computation, program profiling is often helpful for optimization of key sections of code. When moving to parallel computation, not only does the code execution need to be considered but also communication between the different processes which can induce delays that are detrimental to performance. As the number of processes increases, so does the impact of the communication delays on performance. For large-scale parallel applications, it is critical to understand how the communication impacts performance in order to make the code more efficient. There are several tools available for visualizing program execution and communications on parallel systems. These tools generally provide either views which statistically summarize the entire program execution or process-centric views. However, process-centric visualizations do not scale well as the number of processes gets very large. In particular, the most common representation of parallel processes is a Gantt char t with a row for each process. As the number of processes increases, these charts can become difficult to work with and can even exceed screen resolution. We propose a new visualization approach that affords more scalability and then demonstrate it on systems running with up to 16,384 processes.

  10. Efficient Machine Learning Approach for Optimizing Scientific Computing Applications on Emerging HPC Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arumugam, Kamesh

    Efficient parallel implementations of scientific applications on multi-core CPUs with accelerators such as GPUs and Xeon Phis is challenging. This requires - exploiting the data parallel architecture of the accelerator along with the vector pipelines of modern x86 CPU architectures, load balancing, and efficient memory transfer between different devices. It is relatively easy to meet these requirements for highly structured scientific applications. In contrast, a number of scientific and engineering applications are unstructured. Getting performance on accelerators for these applications is extremely challenging because many of these applications employ irregular algorithms which exhibit data-dependent control-ow and irregular memory accesses. Furthermore,more » these applications are often iterative with dependency between steps, and thus making it hard to parallelize across steps. As a result, parallelism in these applications is often limited to a single step. Numerical simulation of charged particles beam dynamics is one such application where the distribution of work and memory access pattern at each time step is irregular. Applications with these properties tend to present significant branch and memory divergence, load imbalance between different processor cores, and poor compute and memory utilization. Prior research on parallelizing such irregular applications have been focused around optimizing the irregular, data-dependent memory accesses and control-ow during a single step of the application independent of the other steps, with the assumption that these patterns are completely unpredictable. We observed that the structure of computation leading to control-ow divergence and irregular memory accesses in one step is similar to that in the next step. It is possible to predict this structure in the current step by observing the computation structure of previous steps. In this dissertation, we present novel machine learning based optimization techniques to address the parallel implementation challenges of such irregular applications on different HPC architectures. In particular, we use supervised learning to predict the computation structure and use it to address the control-ow and memory access irregularities in the parallel implementation of such applications on GPUs, Xeon Phis, and heterogeneous architectures composed of multi-core CPUs with GPUs or Xeon Phis. We use numerical simulation of charged particles beam dynamics simulation as a motivating example throughout the dissertation to present our new approach, though they should be equally applicable to a wide range of irregular applications. The machine learning approach presented here use predictive analytics and forecasting techniques to adaptively model and track the irregular memory access pattern at each time step of the simulation to anticipate the future memory access pattern. Access pattern forecasts can then be used to formulate optimization decisions during application execution which improves the performance of the application at a future time step based on the observations from earlier time steps. In heterogeneous architectures, forecasts can also be used to improve the memory performance and resource utilization of all the processing units to deliver a good aggregate performance. We used these optimization techniques and anticipation strategy to design a cache-aware, memory efficient parallel algorithm to address the irregularities in the parallel implementation of charged particles beam dynamics simulation on different HPC architectures. Experimental result using a diverse mix of HPC architectures shows that our approach in using anticipation strategy is effective in maximizing data reuse, ensuring workload balance, minimizing branch and memory divergence, and in improving resource utilization.« less

  11. Externally Calibrated Parallel Imaging for 3D Multispectral Imaging Near Metallic Implants Using Broadband Ultrashort Echo Time Imaging

    PubMed Central

    Wiens, Curtis N.; Artz, Nathan S.; Jang, Hyungseok; McMillan, Alan B.; Reeder, Scott B.

    2017-01-01

    Purpose To develop an externally calibrated parallel imaging technique for three-dimensional multispectral imaging (3D-MSI) in the presence of metallic implants. Theory and Methods A fast, ultrashort echo time (UTE) calibration acquisition is proposed to enable externally calibrated parallel imaging techniques near metallic implants. The proposed calibration acquisition uses a broadband radiofrequency (RF) pulse to excite the off-resonance induced by the metallic implant, fully phase-encoded imaging to prevent in-plane distortions, and UTE to capture rapidly decaying signal. The performance of the externally calibrated parallel imaging reconstructions was assessed using phantoms and in vivo examples. Results Phantom and in vivo comparisons to self-calibrated parallel imaging acquisitions show that significant reductions in acquisition times can be achieved using externally calibrated parallel imaging with comparable image quality. Acquisition time reductions are particularly large for fully phase-encoded methods such as spectrally resolved fully phase-encoded three-dimensional (3D) fast spin-echo (SR-FPE), in which scan time reductions of up to 8 min were obtained. Conclusion A fully phase-encoded acquisition with broadband excitation and UTE enabled externally calibrated parallel imaging for 3D-MSI, eliminating the need for repeated calibration regions at each frequency offset. Significant reductions in acquisition time can be achieved, particularly for fully phase-encoded methods like SR-FPE. PMID:27403613

  12. GASPACHO: a generic automatic solver using proximal algorithms for convex huge optimization problems

    NASA Astrophysics Data System (ADS)

    Goossens, Bart; Luong, Hiêp; Philips, Wilfried

    2017-08-01

    Many inverse problems (e.g., demosaicking, deblurring, denoising, image fusion, HDR synthesis) share various similarities: degradation operators are often modeled by a specific data fitting function while image prior knowledge (e.g., sparsity) is incorporated by additional regularization terms. In this paper, we investigate automatic algorithmic techniques for evaluating proximal operators. These algorithmic techniques also enable efficient calculation of adjoints from linear operators in a general matrix-free setting. In particular, we study the simultaneous-direction method of multipliers (SDMM) and the parallel proximal algorithm (PPXA) solvers and show that the automatically derived implementations are well suited for both single-GPU and multi-GPU processing. We demonstrate this approach for an Electron Microscopy (EM) deconvolution problem.

  13. Accelerating atomistic calculations of quantum energy eigenstates on graphic cards

    NASA Astrophysics Data System (ADS)

    Rodrigues, Walter; Pecchia, A.; Lopez, M.; Auf der Maur, M.; Di Carlo, A.

    2014-10-01

    Electronic properties of nanoscale materials require the calculation of eigenvalues and eigenvectors of large matrices. This bottleneck can be overcome by parallel computing techniques or the introduction of faster algorithms. In this paper we report a custom implementation of the Lanczos algorithm with simple restart, optimized for graphical processing units (GPUs). The whole algorithm has been developed using CUDA and runs entirely on the GPU, with a specialized implementation that spares memory and reduces at most machine-to-device data transfers. Furthermore parallel distribution over several GPUs has been attained using the standard message passing interface (MPI). Benchmark calculations performed on a GaN/AlGaN wurtzite quantum dot with up to 600,000 atoms are presented. The empirical tight-binding (ETB) model with an sp3d5s∗+spin-orbit parametrization has been used to build the system Hamiltonian (H).

  14. On-board landmark navigation and attitude reference parallel processor system

    NASA Technical Reports Server (NTRS)

    Gilbert, L. E.; Mahajan, D. T.

    1978-01-01

    An approach to autonomous navigation and attitude reference for earth observing spacecraft is described along with the landmark identification technique based on a sequential similarity detection algorithm (SSDA). Laboratory experiments undertaken to determine if better than one pixel accuracy in registration can be achieved consistent with onboard processor timing and capacity constraints are included. The SSDA is implemented using a multi-microprocessor system including synchronization logic and chip library. The data is processed in parallel stages, effectively reducing the time to match the small known image within a larger image as seen by the onboard image system. Shared memory is incorporated in the system to help communicate intermediate results among microprocessors. The functions include finding mean values and summation of absolute differences over the image search area. The hardware is a low power, compact unit suitable to onboard application with the flexibility to provide for different parameters depending upon the environment.

  15. State-of-the-art in Heterogeneous Computing

    DOE PAGES

    Brodtkorb, Andre R.; Dyken, Christopher; Hagen, Trond R.; ...

    2010-01-01

    Node level heterogeneous architectures have become attractive during the last decade for several reasons: compared to traditional symmetric CPUs, they offer high peak performance and are energy and/or cost efficient. With the increase of fine-grained parallelism in high-performance computing, as well as the introduction of parallelism in workstations, there is an acute need for a good overview and understanding of these architectures. We give an overview of the state-of-the-art in heterogeneous computing, focusing on three commonly found architectures: the Cell Broadband Engine Architecture, graphics processing units (GPUs), and field programmable gate arrays (FPGAs). We present a review of hardware, availablemore » software tools, and an overview of state-of-the-art techniques and algorithms. Furthermore, we present a qualitative and quantitative comparison of the architectures, and give our view on the future of heterogeneous computing.« less

  16. Implementation of collisions on GPU architecture in the Vorpal code

    NASA Astrophysics Data System (ADS)

    Leddy, Jarrod; Averkin, Sergey; Cowan, Ben; Sides, Scott; Werner, Greg; Cary, John

    2017-10-01

    The Vorpal code contains a variety of collision operators allowing for the simulation of plasmas containing multiple charge species interacting with neutrals, background gas, and EM fields. These existing algorithms have been improved and reimplemented to take advantage of the massive parallelization allowed by GPU architecture. The use of GPUs is most effective when algorithms are single-instruction multiple-data, so particle collisions are an ideal candidate for this parallelization technique due to their nature as a series of independent processes with the same underlying operation. This refactoring required data memory reorganization and careful consideration of device/host data allocation to minimize memory access and data communication per operation. Successful implementation has resulted in an order of magnitude increase in simulation speed for a test-case involving multiple binary collisions using the null collision method. Work supported by DARPA under contract W31P4Q-16-C-0009.

  17. Parallel performance optimizations on unstructured mesh-based simulations

    DOE PAGES

    Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; ...

    2015-06-01

    This paper addresses two key parallelization challenges the unstructured mesh-based ocean modeling code, MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns, that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intranode data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches.more » We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.« less

  18. Performance Enhancement Strategies for Multi-Block Overset Grid CFD Applications

    NASA Technical Reports Server (NTRS)

    Djomehri, M. Jahed; Biswas, Rupak

    2003-01-01

    The overset grid methodology has significantly reduced time-to-solution of highfidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process resolves the geometrical complexity of the problem domain by using separately generated but overlapping structured discretization grids that periodically exchange information through interpolation. However, high performance computations of such large-scale realistic applications must be handled efficiently on state-of-the-art parallel supercomputers. This paper analyzes the effects of various performance enhancement strategies on the parallel efficiency of an overset grid Navier-Stokes CFD application running on an SGI Origin2000 machinc. Specifically, the role of asynchronous communication, grid splitting, and grid grouping strategies are presented and discussed. Details of a sophisticated graph partitioning technique for grid grouping are also provided. Results indicate that performance depends critically on the level of latency hiding and the quality of load balancing across the processors.

  19. Domain Decomposition By the Advancing-Partition Method

    NASA Technical Reports Server (NTRS)

    Pirzadeh, Shahyar Z.

    2008-01-01

    A new method of domain decomposition has been developed for generating unstructured grids in subdomains either sequentially or using multiple computers in parallel. Domain decomposition is a crucial and challenging step for parallel grid generation. Prior methods are generally based on auxiliary, complex, and computationally intensive operations for defining partition interfaces and usually produce grids of lower quality than those generated in single domains. The new technique, referred to as "Advancing Partition," is based on the Advancing-Front method, which partitions a domain as part of the volume mesh generation in a consistent and "natural" way. The benefits of this approach are: 1) the process of domain decomposition is highly automated, 2) partitioning of domain does not compromise the quality of the generated grids, and 3) the computational overhead for domain decomposition is minimal. The new method has been implemented in NASA's unstructured grid generation code VGRID.

  20. Simulation/Emulation Techniques: Compressing Schedules With Parallel (HW/SW) Development

    NASA Technical Reports Server (NTRS)

    Mangieri, Mark L.; Hoang, June

    2014-01-01

    NASA has always been in the business of balancing new technologies and techniques to achieve human space travel objectives. NASA's Kedalion engineering analysis lab has been validating and using many contemporary avionics HW/SW development and integration techniques, which represent new paradigms to NASA's heritage culture. Kedalion has validated many of the Orion HW/SW engineering techniques borrowed from the adjacent commercial aircraft avionics solution space, inserting new techniques and skills into the Multi - Purpose Crew Vehicle (MPCV) Orion program. Using contemporary agile techniques, Commercial-off-the-shelf (COTS) products, early rapid prototyping, in-house expertise and tools, and extensive use of simulators and emulators, NASA has achieved cost effective paradigms that are currently serving the Orion program effectively. Elements of long lead custom hardware on the Orion program have necessitated early use of simulators and emulators in advance of deliverable hardware to achieve parallel design and development on a compressed schedule.

  1. Domain Immersion Technique And Free Surface Computations Applied To Extrusion And Mixing Processes

    NASA Astrophysics Data System (ADS)

    Valette, Rudy; Vergnes, Bruno; Basset, Olivier; Coupez, Thierry

    2007-04-01

    This work focuses on the development of numerical techniques devoted to the simulation of mixing processes of complex fluids such as twin-screw extrusion or batch mixing. In mixing process simulation, the absence of symmetry of the moving boundaries (the screws or the rotors) implies that their rigid body motion has to be taken into account by using a special treatment. We therefore use a mesh immersion technique (MIT), which consists in using a P1+/P1-based (MINI-element) mixed finite element method for solving the velocity-pressure problem and then solving the problem in the whole barrel cavity by imposing a rigid motion (rotation) to nodes found located inside the so called immersed domain, each subdomain (screw, rotor) being represented by a surface CAD mesh (or its mathematical equation in simple cases). The independent meshes are immersed into a unique backgound computational mesh by computing the distance function to their boundaries. Intersections of meshes are accounted for, allowing to compute a fill factor usable as for the VOF methodology. This technique, combined with the use of parallel computing, allows to compute the time-dependent flow of generalized Newtonian fluids including yield stress fluids in a complex system such as a twin screw extruder, including moving free surfaces, which are treated by a "level set" and Hamilton-Jacobi method.

  2. From experiment to design -- Fault characterization and detection in parallel computer systems using computational accelerators

    NASA Astrophysics Data System (ADS)

    Yim, Keun Soo

    This dissertation summarizes experimental validation and co-design studies conducted to optimize the fault detection capabilities and overheads in hybrid computer systems (e.g., using CPUs and Graphics Processing Units, or GPUs), and consequently to improve the scalability of parallel computer systems using computational accelerators. The experimental validation studies were conducted to help us understand the failure characteristics of CPU-GPU hybrid computer systems under various types of hardware faults. The main characterization targets were faults that are difficult to detect and/or recover from, e.g., faults that cause long latency failures (Ch. 3), faults in dynamically allocated resources (Ch. 4), faults in GPUs (Ch. 5), faults in MPI programs (Ch. 6), and microarchitecture-level faults with specific timing features (Ch. 7). The co-design studies were based on the characterization results. One of the co-designed systems has a set of source-to-source translators that customize and strategically place error detectors in the source code of target GPU programs (Ch. 5). Another co-designed system uses an extension card to learn the normal behavioral and semantic execution patterns of message-passing processes executing on CPUs, and to detect abnormal behaviors of those parallel processes (Ch. 6). The third co-designed system is a co-processor that has a set of new instructions in order to support software-implemented fault detection techniques (Ch. 7). The work described in this dissertation gains more importance because heterogeneous processors have become an essential component of state-of-the-art supercomputers. GPUs were used in three of the five fastest supercomputers that were operating in 2011. Our work included comprehensive fault characterization studies in CPU-GPU hybrid computers. In CPUs, we monitored the target systems for a long period of time after injecting faults (a temporally comprehensive experiment), and injected faults into various types of program states that included dynamically allocated memory (to be spatially comprehensive). In GPUs, we used fault injection studies to demonstrate the importance of detecting silent data corruption (SDC) errors that are mainly due to the lack of fine-grained protections and the massive use of fault-insensitive data. This dissertation also presents transparent fault tolerance frameworks and techniques that are directly applicable to hybrid computers built using only commercial off-the-shelf hardware components. This dissertation shows that by developing understanding of the failure characteristics and error propagation paths of target programs, we were able to create fault tolerance frameworks and techniques that can quickly detect and recover from hardware faults with low performance and hardware overheads.

  3. [Parallel virtual reality visualization of extreme large medical datasets].

    PubMed

    Tang, Min

    2010-04-01

    On the basis of a brief description of grid computing, the essence and critical techniques of parallel visualization of extreme large medical datasets are discussed in connection with Intranet and common-configuration computers of hospitals. In this paper are introduced several kernel techniques, including the hardware structure, software framework, load balance and virtual reality visualization. The Maximum Intensity Projection algorithm is realized in parallel using common PC cluster. In virtual reality world, three-dimensional models can be rotated, zoomed, translated and cut interactively and conveniently through the control panel built on virtual reality modeling language (VRML). Experimental results demonstrate that this method provides promising and real-time results for playing the role in of a good assistant in making clinical diagnosis.

  4. Enhancing instruction scheduling with a block-structured ISA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Melvin, S.; Patt, Y.

    It is now generally recognized that not enough parallelism exists within the small basic blocks of most general purpose programs to satisfy high performance processors. Thus, a wide variety of techniques have been developed to exploit instruction level parallelism across basic block boundaries. In this paper we discuss some previous techniques along with their hardware and software requirements. Then we propose a new paradigm for an instruction set architecture (ISA): block-structuring. This new paradigm is presented, its hardware and software requirements are discussed and the results from a simulation study are presented. We show that a block-structured ISA utilizes bothmore » dynamic and compile-time mechanisms for exploiting instruction level parallelism and has significant performance advantages over a conventional ISA.« less

  5. GPU accelerated dynamic functional connectivity analysis for functional MRI data.

    PubMed

    Akgün, Devrim; Sakoğlu, Ünal; Esquivel, Johnny; Adinoff, Bryon; Mete, Mutlu

    2015-07-01

    Recent advances in multi-core processors and graphics card based computational technologies have paved the way for an improved and dynamic utilization of parallel computing techniques. Numerous applications have been implemented for the acceleration of computationally-intensive problems in various computational science fields including bioinformatics, in which big data problems are prevalent. In neuroimaging, dynamic functional connectivity (DFC) analysis is a computationally demanding method used to investigate dynamic functional interactions among different brain regions or networks identified with functional magnetic resonance imaging (fMRI) data. In this study, we implemented and analyzed a parallel DFC algorithm based on thread-based and block-based approaches. The thread-based approach was designed to parallelize DFC computations and was implemented in both Open Multi-Processing (OpenMP) and Compute Unified Device Architecture (CUDA) programming platforms. Another approach developed in this study to better utilize CUDA architecture is the block-based approach, where parallelization involves smaller parts of fMRI time-courses obtained by sliding-windows. Experimental results showed that the proposed parallel design solutions enabled by the GPUs significantly reduce the computation time for DFC analysis. Multicore implementation using OpenMP on 8-core processor provides up to 7.7× speed-up. GPU implementation using CUDA yielded substantial accelerations ranging from 18.5× to 157× speed-up once thread-based and block-based approaches were combined in the analysis. Proposed parallel programming solutions showed that multi-core processor and CUDA-supported GPU implementations accelerated the DFC analyses significantly. Developed algorithms make the DFC analyses more practical for multi-subject studies with more dynamic analyses. Copyright © 2015 Elsevier Ltd. All rights reserved.

  6. Optimization under uncertainty of parallel nonlinear energy sinks

    NASA Astrophysics Data System (ADS)

    Boroson, Ethan; Missoum, Samy; Mattei, Pierre-Olivier; Vergez, Christophe

    2017-04-01

    Nonlinear Energy Sinks (NESs) are a promising technique for passively reducing the amplitude of vibrations. Through nonlinear stiffness properties, a NES is able to passively and irreversibly absorb energy. Unlike the traditional Tuned Mass Damper (TMD), NESs do not require a specific tuning and absorb energy over a wider range of frequencies. Nevertheless, they are still only efficient over a limited range of excitations. In order to mitigate this limitation and maximize the efficiency range, this work investigates the optimization of multiple NESs configured in parallel. It is well known that the efficiency of a NES is extremely sensitive to small perturbations in loading conditions or design parameters. In fact, the efficiency of a NES has been shown to be nearly discontinuous in the neighborhood of its activation threshold. For this reason, uncertainties must be taken into account in the design optimization of NESs. In addition, the discontinuities require a specific treatment during the optimization process. In this work, the objective of the optimization is to maximize the expected value of the efficiency of NESs in parallel. The optimization algorithm is able to tackle design variables with uncertainty (e.g., nonlinear stiffness coefficients) as well as aleatory variables such as the initial velocity of the main system. The optimal design of several parallel NES configurations for maximum mean efficiency is investigated. Specifically, NES nonlinear stiffness properties, considered random design variables, are optimized for cases with 1, 2, 3, 4, 5, and 10 NESs in parallel. The distributions of efficiency for the optimal parallel configurations are compared to distributions of efficiencies of non-optimized NESs. It is observed that the optimization enables a sharp increase in the mean value of efficiency while reducing the corresponding variance, thus leading to more robust NES designs.

  7. Poster – 39: Using Optical Scanner and 3D Printer Technology to Create Lead Shielding for Radiotherapy of Facial Skin Cancer with Low Energy Photons

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rickey, Daniel; Leylek, Ahmet; Dubey, Arbind

    Purpose: Treatment of skin cancers of the face using orthovoltage radiotherapy often requires lead shielding. However, creating a lead shield can be difficult because the face has complex and intricate contours. The traditional process involved creating a plaster mould of the patient’s face can be difficult for patients. Our goal was to develop an improved process by using an optical scanner and 3D printer technology. Methods: The oncologist defined the treatment field by drawing on each patient’s skin. Three-dimensional images were acquired using a consumer-grade optical scanner. A 3D model of each patient’s face was processed with mesh editing softwaremore » before being printed on a 3D printer. Using a hammer, a 3 mm thick layer of lead was formed to closely fit the contours of the model. A hole was then cut out to define the field. Results: The lead shields created were remarkably accurate and fit the contours of the patients. The hole defining the field exposed only a minimally sized site to be exposed to radiation, while the rest of the face was protected. It was easy to obtain perfect symmetry for the definition of parallel opposed beams. Conclusion: We are routinely using this technique to build lead shielding that wraps around the patient as an alternative to cut-outs. We also use it for treatment of the tip of the nose using a parallel opposed pair beams with a wax nose block. We found this technique allows more accurate delineation of the cut-out and a more reproducible set-up.« less

  8. New Materials, Techniques and Device Concepts for Organic NLO Chromophore-based Electrooptic Devices. Part 1

    DTIC Science & Technology

    2006-08-23

    polarization the electric field vector is parallel to the substrate, for TM polarization the magnetic field vector is parallel to the substrate. Figure...section can be obtained for the case of the two electromagnetic field polarization vectors λ and µ describing the two photons being absorbed (of the same or... polarization effects on two-photon absorption as investigated by the technique of thermal lensing detected absorption of a mode- locked laser beam. This

  9. The Processing of Somatosensory Information Shifts from an Early Parallel into a Serial Processing Mode: A Combined fMRI/MEG Study.

    PubMed

    Klingner, Carsten M; Brodoehl, Stefan; Huonker, Ralph; Witte, Otto W

    2016-01-01

    The question regarding whether somatosensory inputs are processed in parallel or in series has not been clearly answered. Several studies that have applied dynamic causal modeling (DCM) to fMRI data have arrived at seemingly divergent conclusions. However, these divergent results could be explained by the hypothesis that the processing route of somatosensory information changes with time. Specifically, we suggest that somatosensory stimuli are processed in parallel only during the early stage, whereas the processing is later dominated by serial processing. This hypothesis was revisited in the present study based on fMRI analyses of tactile stimuli and the application of DCM to magnetoencephalographic (MEG) data collected during sustained (260 ms) tactile stimulation. Bayesian model comparisons were used to infer the processing stream. We demonstrated that the favored processing stream changes over time. We found that the neural activity elicited in the first 100 ms following somatosensory stimuli is best explained by models that support a parallel processing route, whereas a serial processing route is subsequently favored. These results suggest that the secondary somatosensory area (SII) receives information regarding a new stimulus in parallel with the primary somatosensory area (SI), whereas later processing in the SII is dominated by the preprocessed input from the SI.

  10. The Processing of Somatosensory Information Shifts from an Early Parallel into a Serial Processing Mode: A Combined fMRI/MEG Study

    PubMed Central

    Klingner, Carsten M.; Brodoehl, Stefan; Huonker, Ralph; Witte, Otto W.

    2016-01-01

    The question regarding whether somatosensory inputs are processed in parallel or in series has not been clearly answered. Several studies that have applied dynamic causal modeling (DCM) to fMRI data have arrived at seemingly divergent conclusions. However, these divergent results could be explained by the hypothesis that the processing route of somatosensory information changes with time. Specifically, we suggest that somatosensory stimuli are processed in parallel only during the early stage, whereas the processing is later dominated by serial processing. This hypothesis was revisited in the present study based on fMRI analyses of tactile stimuli and the application of DCM to magnetoencephalographic (MEG) data collected during sustained (260 ms) tactile stimulation. Bayesian model comparisons were used to infer the processing stream. We demonstrated that the favored processing stream changes over time. We found that the neural activity elicited in the first 100 ms following somatosensory stimuli is best explained by models that support a parallel processing route, whereas a serial processing route is subsequently favored. These results suggest that the secondary somatosensory area (SII) receives information regarding a new stimulus in parallel with the primary somatosensory area (SI), whereas later processing in the SII is dominated by the preprocessed input from the SI. PMID:28066197

  11. The remote sensing image segmentation mean shift algorithm parallel processing based on MapReduce

    NASA Astrophysics Data System (ADS)

    Chen, Xi; Zhou, Liqing

    2015-12-01

    With the development of satellite remote sensing technology and the remote sensing image data, traditional remote sensing image segmentation technology cannot meet the massive remote sensing image processing and storage requirements. This article put cloud computing and parallel computing technology in remote sensing image segmentation process, and build a cheap and efficient computer cluster system that uses parallel processing to achieve MeanShift algorithm of remote sensing image segmentation based on the MapReduce model, not only to ensure the quality of remote sensing image segmentation, improved split speed, and better meet the real-time requirements. The remote sensing image segmentation MeanShift algorithm parallel processing algorithm based on MapReduce shows certain significance and a realization of value.

  12. High-energy physics software parallelization using database techniques

    NASA Astrophysics Data System (ADS)

    Argante, E.; van der Stok, P. D. V.; Willers, I.

    1997-02-01

    A programming model for software parallelization, called CoCa, is introduced that copes with problems caused by typical features of high-energy physics software. By basing CoCa on the database transaction paradimg, the complexity induced by the parallelization is for a large part transparent to the programmer, resulting in a higher level of abstraction than the native message passing software. CoCa is implemented on a Meiko CS-2 and on a SUN SPARCcenter 2000 parallel computer. On the CS-2, the performance is comparable with the performance of native PVM and MPI.

  13. The Design and Evaluation of "CAPTools"--A Computer Aided Parallelization Toolkit

    NASA Technical Reports Server (NTRS)

    Yan, Jerry; Frumkin, Michael; Hribar, Michelle; Jin, Haoqiang; Waheed, Abdul; Johnson, Steve; Cross, Jark; Evans, Emyr; Ierotheou, Constantinos; Leggett, Pete; hide

    1998-01-01

    Writing applications for high performance computers is a challenging task. Although writing code by hand still offers the best performance, it is extremely costly and often not very portable. The Computer Aided Parallelization Tools (CAPTools) are a toolkit designed to help automate the mapping of sequential FORTRAN scientific applications onto multiprocessors. CAPTools consists of the following major components: an inter-procedural dependence analysis module that incorporates user knowledge; a 'self-propagating' data partitioning module driven via user guidance; an execution control mask generation and optimization module for the user to fine tune parallel processing of individual partitions; a program transformation/restructuring facility for source code clean up and optimization; a set of browsers through which the user interacts with CAPTools at each stage of the parallelization process; and a code generator supporting multiple programming paradigms on various multiprocessors. Besides describing the rationale behind the architecture of CAPTools, the parallelization process is illustrated via case studies involving structured and unstructured meshes. The programming process and the performance of the generated parallel programs are compared against other programming alternatives based on the NAS Parallel Benchmarks, ARC3D and other scientific applications. Based on these results, a discussion on the feasibility of constructing architectural independent parallel applications is presented.

  14. Computer architecture evaluation for structural dynamics computations: Project summary

    NASA Technical Reports Server (NTRS)

    Standley, Hilda M.

    1989-01-01

    The intent of the proposed effort is the examination of the impact of the elements of parallel architectures on the performance realized in a parallel computation. To this end, three major projects are developed: a language for the expression of high level parallelism, a statistical technique for the synthesis of multicomputer interconnection networks based upon performance prediction, and a queueing model for the analysis of shared memory hierarchies.

  15. Mobile robot self-localization system using single webcam distance measurement technology in indoor environments.

    PubMed

    Li, I-Hsum; Chen, Ming-Chang; Wang, Wei-Yen; Su, Shun-Feng; Lai, To-Wen

    2014-01-27

    A single-webcam distance measurement technique for indoor robot localization is proposed in this paper. The proposed localization technique uses webcams that are available in an existing surveillance environment. The developed image-based distance measurement system (IBDMS) and parallel lines distance measurement system (PLDMS) have two merits. Firstly, only one webcam is required for estimating the distance. Secondly, the set-up of IBDMS and PLDMS is easy, which only one known-dimension rectangle pattern is needed, i.e., a ground tile. Some common and simple image processing techniques, i.e., background subtraction are used to capture the robot in real time. Thus, for the purposes of indoor robot localization, the proposed method does not need to use expensive high-resolution webcams and complicated pattern recognition methods but just few simple estimating formulas. From the experimental results, the proposed robot localization method is reliable and effective in an indoor environment.

  16. Approximate Computing Techniques for Iterative Graph Algorithms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Panyala, Ajay R.; Subasi, Omer; Halappanavar, Mahantesh

    Approximate computing enables processing of large-scale graphs by trading off quality for performance. Approximate computing techniques have become critical not only due to the emergence of parallel architectures but also the availability of large scale datasets enabling data-driven discovery. Using two prototypical graph algorithms, PageRank and community detection, we present several approximate computing heuristics to scale the performance with minimal loss of accuracy. We present several heuristics including loop perforation, data caching, incomplete graph coloring and synchronization, and evaluate their efficiency. We demonstrate performance improvements of up to 83% for PageRank and up to 450x for community detection, with lowmore » impact of accuracy for both the algorithms. We expect the proposed approximate techniques will enable scalable graph analytics on data of importance to several applications in science and their subsequent adoption to scale similar graph algorithms.« less

  17. Reducing software mass through behavior control. [of planetary roving robots

    NASA Technical Reports Server (NTRS)

    Miller, David P.

    1992-01-01

    Attention is given to the tradeoff between communication and computation as regards a planetary rover (both these subsystems are very power-intensive, and both can be the major driver of the rover's power subsystem, and therefore the minimum mass and size of the rover). Software techniques that can be used to reduce the requirements on both communciation and computation, allowing the overall robot mass to be greatly reduced, are discussed. Novel approaches to autonomous control, called behavior control, employ an entirely different approach, and for many tasks will yield a similar or superior level of autonomy to traditional control techniques, while greatly reducing the computational demand. Traditional systems have several expensive processes that operate serially, while behavior techniques employ robot capabilities that run in parallel. Traditional systems make extensive world models, while behavior control systems use minimal world models or none at all.

  18. Mobile Robot Self-Localization System Using Single Webcam Distance Measurement Technology in Indoor Environments

    PubMed Central

    Li, I-Hsum; Chen, Ming-Chang; Wang, Wei-Yen; Su, Shun-Feng; Lai, To-Wen

    2014-01-01

    A single-webcam distance measurement technique for indoor robot localization is proposed in this paper. The proposed localization technique uses webcams that are available in an existing surveillance environment. The developed image-based distance measurement system (IBDMS) and parallel lines distance measurement system (PLDMS) have two merits. Firstly, only one webcam is required for estimating the distance. Secondly, the set-up of IBDMS and PLDMS is easy, which only one known-dimension rectangle pattern is needed, i.e., a ground tile. Some common and simple image processing techniques, i.e., background subtraction are used to capture the robot in real time. Thus, for the purposes of indoor robot localization, the proposed method does not need to use expensive high-resolution webcams and complicated pattern recognition methods but just few simple estimating formulas. From the experimental results, the proposed robot localization method is reliable and effective in an indoor environment. PMID:24473282

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chrisochoides, N.; Sukup, F.

    In this paper we present a parallel implementation of the Bowyer-Watson (BW) algorithm using the task-parallel programming model. The BW algorithm constitutes an ideal mesh refinement strategy for implementing a large class of unstructured mesh generation techniques on both sequential and parallel computers, by preventing the need for global mesh refinement. Its implementation on distributed memory multicomputes using the traditional data-parallel model has been proven very inefficient due to excessive synchronization needed among processors. In this paper we demonstrate that with the task-parallel model we can tolerate synchronization costs inherent to data-parallel methods by exploring concurrency in the processor level.more » Our preliminary performance data indicate that the task- parallel approach: (i) is almost four times faster than the existing data-parallel methods, (ii) scales linearly, and (iii) introduces minimum overheads compared to the {open_quotes}best{close_quotes} sequential implementation of the BW algorithm.« less

  20. Recycling isoelectric focusing with computer controlled data acquisition system. [for high resolution electrophoretic separation and purification of biomolecules

    NASA Technical Reports Server (NTRS)

    Egen, N. B.; Twitty, G. E.; Bier, M.

    1979-01-01

    Isoelectric focusing is a high-resolution technique for separating and purifying large peptides, proteins, and other biomolecules. The apparatus described in the present paper constitutes a new approach to fluid stabilization and increased throughput. Stabilization is achieved by flowing the process fluid uniformly through an array of closely spaced filter elements oriented parallel both to the electrodes and the direction of the flow. This seems to overcome the major difficulties of parabolic flow and electroosmosis at the walls, while limiting the convection to chamber compartments defined by adjacent spacers. Increased throughput is achieved by recirculating the process fluid through external heat exchange reservoirs, where the Joule heat is dissipated.

Top