Sample records for parallel model combination

  1. Parallelization of fine-scale computation in Agile Multiscale Modelling Methodology

    NASA Astrophysics Data System (ADS)

    Macioł, Piotr; Michalik, Kazimierz

    2016-10-01

    Nowadays, multiscale modelling of material behavior is an extensively developed area. An important obstacle against its wide application is high computational demands. Among others, the parallelization of multiscale computations is a promising solution. Heterogeneous multiscale models are good candidates for parallelization, since communication between sub-models is limited. In this paper, the possibility of parallelization of multiscale models based on Agile Multiscale Methodology framework is discussed. A sequential, FEM based macroscopic model has been combined with concurrently computed fine-scale models, employing a MatCalc thermodynamic simulator. The main issues, being investigated in this work are: (i) the speed-up of multiscale models with special focus on fine-scale computations and (ii) on decreasing the quality of computations enforced by parallel execution. Speed-up has been evaluated on the basis of Amdahl's law equations. The problem of `delay error', rising from the parallel execution of fine scale sub-models, controlled by the sequential macroscopic sub-model is discussed. Some technical aspects of combining third-party commercial modelling software with an in-house multiscale framework and a MPI library are also discussed.

  2. Parallel and Preemptable Dynamically Dimensioned Search Algorithms for Single and Multi-objective Optimization in Water Resources

    NASA Astrophysics Data System (ADS)

    Tolson, B.; Matott, L. S.; Gaffoor, T. A.; Asadzadeh, M.; Shafii, M.; Pomorski, P.; Xu, X.; Jahanpour, M.; Razavi, S.; Haghnegahdar, A.; Craig, J. R.

    2015-12-01

    We introduce asynchronous parallel implementations of the Dynamically Dimensioned Search (DDS) family of algorithms including DDS, discrete DDS, PA-DDS and DDS-AU. These parallel algorithms are unique from most existing parallel optimization algorithms in the water resources field in that parallel DDS is asynchronous and does not require an entire population (set of candidate solutions) to be evaluated before generating and then sending a new candidate solution for evaluation. One key advance in this study is developing the first parallel PA-DDS multi-objective optimization algorithm. The other key advance is enhancing the computational efficiency of solving optimization problems (such as model calibration) by combining a parallel optimization algorithm with the deterministic model pre-emption concept. These two efficiency techniques can only be combined because of the asynchronous nature of parallel DDS. Model pre-emption functions to terminate simulation model runs early, prior to completely simulating the model calibration period for example, when intermediate results indicate the candidate solution is so poor that it will definitely have no influence on the generation of further candidate solutions. The computational savings of deterministic model preemption available in serial implementations of population-based algorithms (e.g., PSO) disappear in synchronous parallel implementations as these algorithms. In addition to the key advances above, we implement the algorithms across a range of computation platforms (Windows and Unix-based operating systems from multi-core desktops to a supercomputer system) and package these for future modellers within a model-independent calibration software package called Ostrich as well as MATLAB versions. Results across multiple platforms and multiple case studies (from 4 to 64 processors) demonstrate the vast improvement over serial DDS-based algorithms and highlight the important role model pre-emption plays in the performance of parallel, pre-emptable DDS algorithms. Case studies include single- and multiple-objective optimization problems in water resources model calibration and in many cases linear or near linear speedups are observed.

  3. LORAKS Makes Better SENSE: Phase-Constrained Partial Fourier SENSE Reconstruction without Phase Calibration

    PubMed Central

    Kim, Tae Hyung; Setsompop, Kawin; Haldar, Justin P.

    2016-01-01

    Purpose Parallel imaging and partial Fourier acquisition are two classical approaches for accelerated MRI. Methods that combine these approaches often rely on prior knowledge of the image phase, but the need to obtain this prior information can place practical restrictions on the data acquisition strategy. In this work, we propose and evaluate SENSE-LORAKS, which enables combined parallel imaging and partial Fourier reconstruction without requiring prior phase information. Theory and Methods The proposed formulation is based on combining the classical SENSE model for parallel imaging data with the more recent LORAKS framework for MR image reconstruction using low-rank matrix modeling. Previous LORAKS-based methods have successfully enabled calibrationless partial Fourier parallel MRI reconstruction, but have been most successful with nonuniform sampling strategies that may be hard to implement for certain applications. By combining LORAKS with SENSE, we enable highly-accelerated partial Fourier MRI reconstruction for a broader range of sampling trajectories, including widely-used calibrationless uniformly-undersampled trajectories. Results Our empirical results with retrospectively undersampled datasets indicate that when SENSE-LORAKS reconstruction is combined with an appropriate k-space sampling trajectory, it can provide substantially better image quality at high-acceleration rates relative to existing state-of-the-art reconstruction approaches. Conclusion The SENSE-LORAKS framework provides promising new opportunities for highly-accelerated MRI. PMID:27037836

  4. Combining Different Conceptual Change Methods within Four-Step Constructivist Teaching Model: A Sample Teaching of Series and Parallel Circuits

    ERIC Educational Resources Information Center

    Ipek, Hava; Calik, Muammer

    2008-01-01

    Based on students' alternative conceptions of the topics "electric circuits", "electric charge flows within an electric circuit", "how the brightness of bulbs and the resistance changes in series and parallel circuits", the current study aims to present a combination of different conceptual change methods within a four-step constructivist teaching…

  5. A model for optimizing file access patterns using spatio-temporal parallelism

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boonthanome, Nouanesengsy; Patchett, John; Geveci, Berk

    2013-01-01

    For many years now, I/O read time has been recognized as the primary bottleneck for parallel visualization and analysis of large-scale data. In this paper, we introduce a model that can estimate the read time for a file stored in a parallel filesystem when given the file access pattern. Read times ultimately depend on how the file is stored and the access pattern used to read the file. The file access pattern will be dictated by the type of parallel decomposition used. We employ spatio-temporal parallelism, which combines both spatial and temporal parallelism, to provide greater flexibility to possible filemore » access patterns. Using our model, we were able to configure the spatio-temporal parallelism to design optimized read access patterns that resulted in a speedup factor of approximately 400 over traditional file access patterns.« less

  6. PROTO-PLASM: parallel language for adaptive and scalable modelling of biosystems.

    PubMed

    Bajaj, Chandrajit; DiCarlo, Antonio; Paoluzzi, Alberto

    2008-09-13

    This paper discusses the design goals and the first developments of PROTO-PLASM, a novel computational environment to produce libraries of executable, combinable and customizable computer models of natural and synthetic biosystems, aiming to provide a supporting framework for predictive understanding of structure and behaviour through multiscale geometric modelling and multiphysics simulations. Admittedly, the PROTO-PLASM platform is still in its infancy. Its computational framework--language, model library, integrated development environment and parallel engine--intends to provide patient-specific computational modelling and simulation of organs and biosystem, exploiting novel functionalities resulting from the symbolic combination of parametrized models of parts at various scales. PROTO-PLASM may define the model equations, but it is currently focused on the symbolic description of model geometry and on the parallel support of simulations. Conversely, CellML and SBML could be viewed as defining the behavioural functions (the model equations) to be used within a PROTO-PLASM program. Here we exemplify the basic functionalities of PROTO-PLASM, by constructing a schematic heart model. We also discuss multiscale issues with reference to the geometric and physical modelling of neuromuscular junctions.

  7. Proto-Plasm: parallel language for adaptive and scalable modelling of biosystems

    PubMed Central

    Bajaj, Chandrajit; DiCarlo, Antonio; Paoluzzi, Alberto

    2008-01-01

    This paper discusses the design goals and the first developments of Proto-Plasm, a novel computational environment to produce libraries of executable, combinable and customizable computer models of natural and synthetic biosystems, aiming to provide a supporting framework for predictive understanding of structure and behaviour through multiscale geometric modelling and multiphysics simulations. Admittedly, the Proto-Plasm platform is still in its infancy. Its computational framework—language, model library, integrated development environment and parallel engine—intends to provide patient-specific computational modelling and simulation of organs and biosystem, exploiting novel functionalities resulting from the symbolic combination of parametrized models of parts at various scales. Proto-Plasm may define the model equations, but it is currently focused on the symbolic description of model geometry and on the parallel support of simulations. Conversely, CellML and SBML could be viewed as defining the behavioural functions (the model equations) to be used within a Proto-Plasm program. Here we exemplify the basic functionalities of Proto-Plasm, by constructing a schematic heart model. We also discuss multiscale issues with reference to the geometric and physical modelling of neuromuscular junctions. PMID:18559320

  8. Parallelization and automatic data distribution for nuclear reactor simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liebrock, L.M.

    1997-07-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directlymore » affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.« less

  9. Reliability models applicable to space telescope solar array assembly system

    NASA Technical Reports Server (NTRS)

    Patil, S. A.

    1986-01-01

    A complex system may consist of a number of subsystems with several components in series, parallel, or combination of both series and parallel. In order to predict how well the system will perform, it is necessary to know the reliabilities of the subsystems and the reliability of the whole system. The objective of the present study is to develop mathematical models of the reliability which are applicable to complex systems. The models are determined by assuming k failures out of n components in a subsystem. By taking k = 1 and k = n, these models reduce to parallel and series models; hence, the models can be specialized to parallel, series combination systems. The models are developed by assuming the failure rates of the components as functions of time and as such, can be applied to processes with or without aging effects. The reliability models are further specialized to Space Telescope Solar Arrray (STSA) System. The STSA consists of 20 identical solar panel assemblies (SPA's). The reliabilities of the SPA's are determined by the reliabilities of solar cell strings, interconnects, and diodes. The estimates of the reliability of the system for one to five years are calculated by using the reliability estimates of solar cells and interconnects given n ESA documents. Aging effects in relation to breaks in interconnects are discussed.

  10. Synchronization Of Parallel Discrete Event Simulations

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S.

    1992-01-01

    Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Algorithm processes events optimistically in time cycles adapting while simulation in progress. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.

  11. LORAKS makes better SENSE: Phase-constrained partial fourier SENSE reconstruction without phase calibration.

    PubMed

    Kim, Tae Hyung; Setsompop, Kawin; Haldar, Justin P

    2017-03-01

    Parallel imaging and partial Fourier acquisition are two classical approaches for accelerated MRI. Methods that combine these approaches often rely on prior knowledge of the image phase, but the need to obtain this prior information can place practical restrictions on the data acquisition strategy. In this work, we propose and evaluate SENSE-LORAKS, which enables combined parallel imaging and partial Fourier reconstruction without requiring prior phase information. The proposed formulation is based on combining the classical SENSE model for parallel imaging data with the more recent LORAKS framework for MR image reconstruction using low-rank matrix modeling. Previous LORAKS-based methods have successfully enabled calibrationless partial Fourier parallel MRI reconstruction, but have been most successful with nonuniform sampling strategies that may be hard to implement for certain applications. By combining LORAKS with SENSE, we enable highly accelerated partial Fourier MRI reconstruction for a broader range of sampling trajectories, including widely used calibrationless uniformly undersampled trajectories. Our empirical results with retrospectively undersampled datasets indicate that when SENSE-LORAKS reconstruction is combined with an appropriate k-space sampling trajectory, it can provide substantially better image quality at high-acceleration rates relative to existing state-of-the-art reconstruction approaches. The SENSE-LORAKS framework provides promising new opportunities for highly accelerated MRI. Magn Reson Med 77:1021-1035, 2017. © 2016 International Society for Magnetic Resonance in Medicine. © 2016 International Society for Magnetic Resonance in Medicine.

  12. Variable-Complexity Multidisciplinary Optimization on Parallel Computers

    NASA Technical Reports Server (NTRS)

    Grossman, Bernard; Mason, William H.; Watson, Layne T.; Haftka, Raphael T.

    1998-01-01

    This report covers work conducted under grant NAG1-1562 for the NASA High Performance Computing and Communications Program (HPCCP) from December 7, 1993, to December 31, 1997. The objective of the research was to develop new multidisciplinary design optimization (MDO) techniques which exploit parallel computing to reduce the computational burden of aircraft MDO. The design of the High-Speed Civil Transport (HSCT) air-craft was selected as a test case to demonstrate the utility of our MDO methods. The three major tasks of this research grant included: development of parallel multipoint approximation methods for the aerodynamic design of the HSCT, use of parallel multipoint approximation methods for structural optimization of the HSCT, mathematical and algorithmic development including support in the integration of parallel computation for items (1) and (2). These tasks have been accomplished with the development of a response surface methodology that incorporates multi-fidelity models. For the aerodynamic design we were able to optimize with up to 20 design variables using hundreds of expensive Euler analyses together with thousands of inexpensive linear theory simulations. We have thereby demonstrated the application of CFD to a large aerodynamic design problem. For the predicting structural weight we were able to combine hundreds of structural optimizations of refined finite element models with thousands of optimizations based on coarse models. Computations have been carried out on the Intel Paragon with up to 128 nodes. The parallel computation allowed us to perform combined aerodynamic-structural optimization using state of the art models of a complex aircraft configurations.

  13. F-Nets and Software Cabling: Deriving a Formal Model and Language for Portable Parallel Programming

    NASA Technical Reports Server (NTRS)

    DiNucci, David C.; Saini, Subhash (Technical Monitor)

    1998-01-01

    Parallel programming is still being based upon antiquated sequence-based definitions of the terms "algorithm" and "computation", resulting in programs which are architecture dependent and difficult to design and analyze. By focusing on obstacles inherent in existing practice, a more portable model is derived here, which is then formalized into a model called Soviets which utilizes a combination of imperative and functional styles. This formalization suggests more general notions of algorithm and computation, as well as insights into the meaning of structured programming in a parallel setting. To illustrate how these principles can be applied, a very-high-level graphical architecture-independent parallel language, called Software Cabling, is described, with many of the features normally expected from today's computer languages (e.g. data abstraction, data parallelism, and object-based programming constructs).

  14. Modeling the Control Systems of Gas-Turbines to Ensure Their Reliable Parallel Operation in the UPS of Russia

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vinogradov, A. Yu., E-mail: vinogradov-a@ntcees.ru; Gerasimov, A. S.; Kozlov, A. V.

    Consideration is given to different approaches to modeling the control systems of gas turbines as a component of CCPP and GTPP to ensure their reliable parallel operation in the UPS of Russia. The disadvantages of the approaches to the modeling of combined-cycle units in studying long-term electromechanical transients accompanied by power imbalance are pointed out. Examples are presented to support the use of more detailed models of gas turbines in electromechanical transient calculations. It is shown that the modern speed control systems of gas turbines in combination with relatively low equivalent inertia have a considerable effect on electromechanical transients, includingmore » those caused by disturbances not related to power imbalance.« less

  15. Massively parallel implementation of 3D-RISM calculation with volumetric 3D-FFT.

    PubMed

    Maruyama, Yutaka; Yoshida, Norio; Tadano, Hiroto; Takahashi, Daisuke; Sato, Mitsuhisa; Hirata, Fumio

    2014-07-05

    A new three-dimensional reference interaction site model (3D-RISM) program for massively parallel machines combined with the volumetric 3D fast Fourier transform (3D-FFT) was developed, and tested on the RIKEN K supercomputer. The ordinary parallel 3D-RISM program has a limitation on the number of parallelizations because of the limitations of the slab-type 3D-FFT. The volumetric 3D-FFT relieves this limitation drastically. We tested the 3D-RISM calculation on the large and fine calculation cell (2048(3) grid points) on 16,384 nodes, each having eight CPU cores. The new 3D-RISM program achieved excellent scalability to the parallelization, running on the RIKEN K supercomputer. As a benchmark application, we employed the program, combined with molecular dynamics simulation, to analyze the oligomerization process of chymotrypsin Inhibitor 2 mutant. The results demonstrate that the massive parallel 3D-RISM program is effective to analyze the hydration properties of the large biomolecular systems. Copyright © 2014 Wiley Periodicals, Inc.

  16. Preconditioned implicit solvers for the Navier-Stokes equations on distributed-memory machines

    NASA Technical Reports Server (NTRS)

    Ajmani, Kumud; Liou, Meng-Sing; Dyson, Rodger W.

    1994-01-01

    The GMRES method is parallelized, and combined with local preconditioning to construct an implicit parallel solver to obtain steady-state solutions for the Navier-Stokes equations of fluid flow on distributed-memory machines. The new implicit parallel solver is designed to preserve the convergence rate of the equivalent 'serial' solver. A static domain-decomposition is used to partition the computational domain amongst the available processing nodes of the parallel machine. The SPMD (Single-Program Multiple-Data) programming model is combined with message-passing tools to develop the parallel code on a 32-node Intel Hypercube and a 512-node Intel Delta machine. The implicit parallel solver is validated for internal and external flow problems, and is found to compare identically with flow solutions obtained on a Cray Y-MP/8. A peak computational speed of 2300 MFlops/sec has been achieved on 512 nodes of the Intel Delta machine,k for a problem size of 1024 K equations (256 K grid points).

  17. Reversible Parallel Discrete-Event Execution of Large-scale Epidemic Outbreak Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Perumalla, Kalyan S; Seal, Sudip K

    2010-01-01

    The spatial scale, runtime speed and behavioral detail of epidemic outbreak simulations together require the use of large-scale parallel processing. In this paper, an optimistic parallel discrete event execution of a reaction-diffusion simulation model of epidemic outbreaks is presented, with an implementation over themore » $$\\mu$$sik simulator. Rollback support is achieved with the development of a novel reversible model that combines reverse computation with a small amount of incremental state saving. Parallel speedup and other runtime performance metrics of the simulation are tested on a small (8,192-core) Blue Gene / P system, while scalability is demonstrated on 65,536 cores of a large Cray XT5 system. Scenarios representing large population sizes (up to several hundred million individuals in the largest case) are exercised.« less

  18. The design of multi-core DSP parallel model based on message passing and multi-level pipeline

    NASA Astrophysics Data System (ADS)

    Niu, Jingyu; Hu, Jian; He, Wenjing; Meng, Fanrong; Li, Chuanrong

    2017-10-01

    Currently, the design of embedded signal processing system is often based on a specific application, but this idea is not conducive to the rapid development of signal processing technology. In this paper, a parallel processing model architecture based on multi-core DSP platform is designed, and it is mainly suitable for the complex algorithms which are composed of different modules. This model combines the ideas of multi-level pipeline parallelism and message passing, and summarizes the advantages of the mainstream model of multi-core DSP (the Master-Slave model and the Data Flow model), so that it has better performance. This paper uses three-dimensional image generation algorithm to validate the efficiency of the proposed model by comparing with the effectiveness of the Master-Slave and the Data Flow model.

  19. Parallel reduced-instruction-set-computer architecture for real-time symbolic pattern matching

    NASA Astrophysics Data System (ADS)

    Parson, Dale E.

    1991-03-01

    This report discusses ongoing work on a parallel reduced-instruction- set-computer (RISC) architecture for automatic production matching. The PRIOPS compiler takes advantage of the memoryless character of automatic processing by translating a program's collection of automatic production tests into an equivalent combinational circuit-a digital circuit without memory, whose outputs are immediate functions of its inputs. The circuit provides a highly parallel, fine-grain model of automatic matching. The compiler then maps the combinational circuit onto RISC hardware. The heart of the processor is an array of comparators capable of testing production conditions in parallel, Each comparator attaches to private memory that contains virtual circuit nodes-records of the current state of nodes and busses in the combinational circuit. All comparator memories hold identical information, allowing simultaneous update for a single changing circuit node and simultaneous retrieval of different circuit nodes by different comparators. Along with the comparator-based logic unit is a sequencer that determines the current combination of production-derived comparisons to try, based on the combined success and failure of previous combinations of comparisons. The memoryless nature of automatic matching allows the compiler to designate invariant memory addresses for virtual circuit nodes, and to generate the most effective sequences of comparison test combinations. The result is maximal utilization of parallel hardware, indicating speed increases and scalability beyond that found for course-grain, multiprocessor approaches to concurrent Rete matching. Future work will consider application of this RISC architecture to the standard (controlled) Rete algorithm, where search through memory dominates portions of matching.

  20. Real-time implementations of image segmentation algorithms on shared memory multicore architecture: a survey (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Akil, Mohamed

    2017-05-01

    The real-time processing is getting more and more important in many image processing applications. Image segmentation is one of the most fundamental tasks image analysis. As a consequence, many different approaches for image segmentation have been proposed. The watershed transform is a well-known image segmentation tool. The watershed transform is a very data intensive task. To achieve acceleration and obtain real-time processing of watershed algorithms, parallel architectures and programming models for multicore computing have been developed. This paper focuses on the survey of the approaches for parallel implementation of sequential watershed algorithms on multicore general purpose CPUs: homogeneous multicore processor with shared memory. To achieve an efficient parallel implementation, it's necessary to explore different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we give a comparison of various parallelization of sequential watershed algorithms on shared memory multicore architecture. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on the performance of the parallel implementations. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models. Thus, we compare the OpenMP (an application programming interface for multi-Processing) with Ptheads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.

  1. PCTDSE: A parallel Cartesian-grid-based TDSE solver for modeling laser-atom interactions

    NASA Astrophysics Data System (ADS)

    Fu, Yongsheng; Zeng, Jiaolong; Yuan, Jianmin

    2017-01-01

    We present a parallel Cartesian-grid-based time-dependent Schrödinger equation (TDSE) solver for modeling laser-atom interactions. It can simulate the single-electron dynamics of atoms in arbitrary time-dependent vector potentials. We use a split-operator method combined with fast Fourier transforms (FFT), on a three-dimensional (3D) Cartesian grid. Parallelization is realized using a 2D decomposition strategy based on the Message Passing Interface (MPI) library, which results in a good parallel scaling on modern supercomputers. We give simple applications for the hydrogen atom using the benchmark problems coming from the references and obtain repeatable results. The extensions to other laser-atom systems are straightforward with minimal modifications of the source code.

  2. Improved packing of protein side chains with parallel ant colonies.

    PubMed

    Quan, Lijun; Lü, Qiang; Li, Haiou; Xia, Xiaoyan; Wu, Hongjie

    2014-01-01

    The accurate packing of protein side chains is important for many computational biology problems, such as ab initio protein structure prediction, homology modelling, and protein design and ligand docking applications. Many of existing solutions are modelled as a computational optimisation problem. As well as the design of search algorithms, most solutions suffer from an inaccurate energy function for judging whether a prediction is good or bad. Even if the search has found the lowest energy, there is no certainty of obtaining the protein structures with correct side chains. We present a side-chain modelling method, pacoPacker, which uses a parallel ant colony optimisation strategy based on sharing a single pheromone matrix. This parallel approach combines different sources of energy functions and generates protein side-chain conformations with the lowest energies jointly determined by the various energy functions. We further optimised the selected rotamers to construct subrotamer by rotamer minimisation, which reasonably improved the discreteness of the rotamer library. We focused on improving the accuracy of side-chain conformation prediction. For a testing set of 442 proteins, 87.19% of X1 and 77.11% of X12 angles were predicted correctly within 40° of the X-ray positions. We compared the accuracy of pacoPacker with state-of-the-art methods, such as CIS-RR and SCWRL4. We analysed the results from different perspectives, in terms of protein chain and individual residues. In this comprehensive benchmark testing, 51.5% of proteins within a length of 400 amino acids predicted by pacoPacker were superior to the results of CIS-RR and SCWRL4 simultaneously. Finally, we also showed the advantage of using the subrotamers strategy. All results confirmed that our parallel approach is competitive to state-of-the-art solutions for packing side chains. This parallel approach combines various sources of searching intelligence and energy functions to pack protein side chains. It provides a frame-work for combining different inaccuracy/usefulness objective functions by designing parallel heuristic search algorithms.

  3. Distributed parallel computing in stochastic modeling of groundwater systems.

    PubMed

    Dong, Yanhui; Li, Guomin; Xu, Haizhen

    2013-03-01

    Stochastic modeling is a rapidly evolving, popular approach to the study of the uncertainty and heterogeneity of groundwater systems. However, the use of Monte Carlo-type simulations to solve practical groundwater problems often encounters computational bottlenecks that hinder the acquisition of meaningful results. To improve the computational efficiency, a system that combines stochastic model generation with MODFLOW-related programs and distributed parallel processing is investigated. The distributed computing framework, called the Java Parallel Processing Framework, is integrated into the system to allow the batch processing of stochastic models in distributed and parallel systems. As an example, the system is applied to the stochastic delineation of well capture zones in the Pinggu Basin in Beijing. Through the use of 50 processing threads on a cluster with 10 multicore nodes, the execution times of 500 realizations are reduced to 3% compared with those of a serial execution. Through this application, the system demonstrates its potential in solving difficult computational problems in practical stochastic modeling. © 2012, The Author(s). Groundwater © 2012, National Ground Water Association.

  4. Analog Microcontroller Model for an Energy Harvesting Round Counter

    DTIC Science & Technology

    2012-07-01

    densities representing the duration of ≥ for all scaled piezo ................................7 1 INTRODUCTION An accurate count...limited surface area available for mounting piezos on the gun system. Figure 1. Equivalent circuit model for a piezoelectric transducer...circuit model for the linear I-V relationships is parallel combination of six stages, each of which is comprised of a series combination of a resistor , DC

  5. FoSSI: the family of simplified solver interfaces for the rapid development of parallel numerical atmosphere and ocean models

    NASA Astrophysics Data System (ADS)

    Frickenhaus, Stephan; Hiller, Wolfgang; Best, Meike

    The portable software FoSSI is introduced that—in combination with additional free solver software packages—allows for an efficient and scalable parallel solution of large sparse linear equations systems arising in finite element model codes. FoSSI is intended to support rapid model code development, completely hiding the complexity of the underlying solver packages. In particular, the model developer need not be an expert in parallelization and is yet free to switch between different solver packages by simple modifications of the interface call. FoSSI offers an efficient and easy, yet flexible interface to several parallel solvers, most of them available on the web, such as PETSC, AZTEC, MUMPS, PILUT and HYPRE. FoSSI makes use of the concept of handles for vectors, matrices, preconditioners and solvers, that is frequently used in solver libraries. Hence, FoSSI allows for a flexible treatment of several linear equations systems and associated preconditioners at the same time, even in parallel on separate MPI-communicators. The second special feature in FoSSI is the task specifier, being a combination of keywords, each configuring a certain phase in the solver setup. This enables the user to control a solver over one unique subroutine. Furthermore, FoSSI has rather similar features for all solvers, making a fast solver intercomparison or exchange an easy task. FoSSI is a community software, proven in an adaptive 2D-atmosphere model and a 3D-primitive equation ocean model, both formulated in finite elements. The present paper discusses perspectives of an OpenMP-implementation of parallel iterative solvers based on domain decomposition methods. This approach to OpenMP solvers is rather attractive, as the code for domain-local operations of factorization, preconditioning and matrix-vector product can be readily taken from a sequential implementation that is also suitable to be used in an MPI-variant. Code development in this direction is in an advanced state under the name ScOPES: the Scalable Open Parallel sparse linear Equations Solver.

  6. GENESIS: a hybrid-parallel and multi-scale molecular dynamics simulator with enhanced sampling algorithms for biomolecular and cellular simulations.

    PubMed

    Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji

    2015-07-01

    GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310-323. doi: 10.1002/wcms.1220.

  7. Real-world hydrologic assessment of a fully-distributed hydrological model in a parallel computing environment

    NASA Astrophysics Data System (ADS)

    Vivoni, Enrique R.; Mascaro, Giuseppe; Mniszewski, Susan; Fasel, Patricia; Springer, Everett P.; Ivanov, Valeriy Y.; Bras, Rafael L.

    2011-10-01

    SummaryA major challenge in the use of fully-distributed hydrologic models has been the lack of computational capabilities for high-resolution, long-term simulations in large river basins. In this study, we present the parallel model implementation and real-world hydrologic assessment of the Triangulated Irregular Network (TIN)-based Real-time Integrated Basin Simulator (tRIBS). Our parallelization approach is based on the decomposition of a complex watershed using the channel network as a directed graph. The resulting sub-basin partitioning divides effort among processors and handles hydrologic exchanges across boundaries. Through numerical experiments in a set of nested basins, we quantify parallel performance relative to serial runs for a range of processors, simulation complexities and lengths, and sub-basin partitioning methods, while accounting for inter-run variability on a parallel computing system. In contrast to serial simulations, the parallel model speed-up depends on the variability of hydrologic processes. Load balancing significantly improves parallel speed-up with proportionally faster runs as simulation complexity (domain resolution and channel network extent) increases. The best strategy for large river basins is to combine a balanced partitioning with an extended channel network, with potential savings through a lower TIN resolution. Based on these advances, a wider range of applications for fully-distributed hydrologic models are now possible. This is illustrated through a set of ensemble forecasts that account for precipitation uncertainty derived from a statistical downscaling model.

  8. A combined PHREEQC-2/parallel fracture model for the simulation of laminar/non-laminar flow and contaminant transport with reactions

    NASA Astrophysics Data System (ADS)

    Masciopinto, Costantino; Volpe, Angela; Palmiotta, Domenico; Cherubini, Claudia

    2010-09-01

    A combination of a parallel fracture model with the PHREEQC-2 geochemical model was developed to simulate sequential flow and chemical transport with reactions in fractured media where both laminar and turbulent flows occur. The integration of non-laminar flow resistances in one model produced relevant effects on water flow velocities, thus improving model prediction capabilities on contaminant transport. The proposed conceptual model consists of 3D rock-blocks, separated by horizontal bedding plane fractures with variable apertures. Particle tracking solved the transport equations for conservative compounds and provided input for PHREEQC-2. For each cluster of contaminant pathways, PHREEQC-2 determined the concentration for mass-transfer, sorption/desorption, ion exchange, mineral dissolution/precipitation and biodegradation, under kinetically controlled reactive processes of equilibrated chemical species. Field tests have been performed for the code verification. As an example, the combined model has been applied to a contaminated fractured aquifer of southern Italy in order to simulate the phenol transport. The code correctly fitted the field available data and also predicted a possible rapid depletion of phenols as a result of an increased biodegradation rate induced by a simulated artificial injection of nitrates, upgradient to the sources.

  9. Flexible language constructs for large parallel programs

    NASA Technical Reports Server (NTRS)

    Rosing, Matthew; Schnabel, Robert

    1993-01-01

    The goal of the research described is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (MIMD) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include SIMD (Single Instruction Multiple Data), SPMD (Single Program Multiple Data), and sequential programs annotated with data distribution statements. The two primary models for communication include implicit communication based on shared memory and explicit communication based on messages. None of these models by themselves seem sufficient to permit the natural and efficient expression of the variety of algorithms that occur in large scientific computations. An overview of a new language that combines many of these programming models in a clean manner is given. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. An overview of the language and discussion of some of the critical implementation details is given.

  10. Parallel processing in finite element structural analysis

    NASA Technical Reports Server (NTRS)

    Noor, Ahmed K.

    1987-01-01

    A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).

  11. Power combining in an array of microwave power rectifiers

    NASA Technical Reports Server (NTRS)

    Gutmann, R. J.; Borrego, J. M.

    1979-01-01

    This work analyzes the resultant efficiency degradation when identical rectifiers operate at different RF power levels as caused by the power beam taper. Both a closed-form analytical circuit model and a detailed computer-simulation model are used to obtain the output dc load line of the rectifier. The efficiency degradation is nearly identical with series and parallel combining, and the closed-form analytical model provides results which are similar to the detailed computer-simulation model.

  12. Biocellion: accelerating computer simulation of multicellular biological system models

    PubMed Central

    Kang, Seunghwa; Kahan, Simon; McDermott, Jason; Flann, Nicholas; Shmulevich, Ilya

    2014-01-01

    Motivation: Biological system behaviors are often the outcome of complex interactions among a large number of cells and their biotic and abiotic environment. Computational biologists attempt to understand, predict and manipulate biological system behavior through mathematical modeling and computer simulation. Discrete agent-based modeling (in combination with high-resolution grids to model the extracellular environment) is a popular approach for building biological system models. However, the computational complexity of this approach forces computational biologists to resort to coarser resolution approaches to simulate large biological systems. High-performance parallel computers have the potential to address the computing challenge, but writing efficient software for parallel computers is difficult and time-consuming. Results: We have developed Biocellion, a high-performance software framework, to solve this computing challenge using parallel computers. To support a wide range of multicellular biological system models, Biocellion asks users to provide their model specifics by filling the function body of pre-defined model routines. Using Biocellion, modelers without parallel computing expertise can efficiently exploit parallel computers with less effort than writing sequential programs from scratch. We simulate cell sorting, microbial patterning and a bacterial system in soil aggregate as case studies. Availability and implementation: Biocellion runs on x86 compatible systems with the 64 bit Linux operating system and is freely available for academic use. Visit http://biocellion.com for additional information. Contact: seunghwa.kang@pnnl.gov PMID:25064572

  13. Efficient partitioning and assignment on programs for multiprocessor execution

    NASA Technical Reports Server (NTRS)

    Standley, Hilda M.

    1993-01-01

    The general problem studied is that of segmenting or partitioning programs for distribution across a multiprocessor system. Efficient partitioning and the assignment of program elements are of great importance since the time consumed in this overhead activity may easily dominate the computation, effectively eliminating any gains made by the use of the parallelism. In this study, the partitioning of sequentially structured programs (written in FORTRAN) is evaluated. Heuristics, developed for similar applications are examined. Finally, a model for queueing networks with finite queues is developed which may be used to analyze multiprocessor system architectures with a shared memory approach to the problem of partitioning. The properties of sequentially written programs form obstacles to large scale (at the procedure or subroutine level) parallelization. Data dependencies of even the minutest nature, reflecting the sequential development of the program, severely limit parallelism. The design of heuristic algorithms is tied to the experience gained in the parallel splitting. Parallelism obtained through the physical separation of data has seen some success, especially at the data element level. Data parallelism on a grander scale requires models that accurately reflect the effects of blocking caused by finite queues. A model for the approximation of the performance of finite queueing networks is developed. This model makes use of the decomposition approach combined with the efficiency of product form solutions.

  14. Decoupling Principle Analysis and Development of a Parallel Three-Dimensional Force Sensor

    PubMed Central

    Zhao, Yanzhi; Jiao, Leihao; Weng, Dacheng; Zhang, Dan; Zheng, Rencheng

    2016-01-01

    In the development of the multi-dimensional force sensor, dimension coupling is the ubiquitous factor restricting the improvement of the measurement accuracy. To effectively reduce the influence of dimension coupling on the parallel multi-dimensional force sensor, a novel parallel three-dimensional force sensor is proposed using a mechanical decoupling principle, and the influence of the friction on dimension coupling is effectively reduced by making the friction rolling instead of sliding friction. In this paper, the mathematical model is established by combining with the structure model of the parallel three-dimensional force sensor, and the modeling and analysis of mechanical decoupling are carried out. The coupling degree (ε) of the designed sensor is defined and calculated, and the calculation results show that the mechanical decoupling parallel structure of the sensor possesses good decoupling performance. A prototype of the parallel three-dimensional force sensor was developed, and FEM analysis was carried out. The load calibration and data acquisition experiment system are built, and then calibration experiments were done. According to the calibration experiments, the measurement accuracy is less than 2.86% and the coupling accuracy is less than 3.02%. The experimental results show that the sensor system possesses high measuring accuracy, which provides a basis for the applied research of the parallel multi-dimensional force sensor. PMID:27649194

  15. When the lowest energy does not induce native structures: parallel minimization of multi-energy values by hybridizing searching intelligences.

    PubMed

    Lü, Qiang; Xia, Xiao-Yan; Chen, Rong; Miao, Da-Jun; Chen, Sha-Sha; Quan, Li-Jun; Li, Hai-Ou

    2012-01-01

    Protein structure prediction (PSP), which is usually modeled as a computational optimization problem, remains one of the biggest challenges in computational biology. PSP encounters two difficult obstacles: the inaccurate energy function problem and the searching problem. Even if the lowest energy has been luckily found by the searching procedure, the correct protein structures are not guaranteed to obtain. A general parallel metaheuristic approach is presented to tackle the above two problems. Multi-energy functions are employed to simultaneously guide the parallel searching threads. Searching trajectories are in fact controlled by the parameters of heuristic algorithms. The parallel approach allows the parameters to be perturbed during the searching threads are running in parallel, while each thread is searching the lowest energy value determined by an individual energy function. By hybridizing the intelligences of parallel ant colonies and Monte Carlo Metropolis search, this paper demonstrates an implementation of our parallel approach for PSP. 16 classical instances were tested to show that the parallel approach is competitive for solving PSP problem. This parallel approach combines various sources of both searching intelligences and energy functions, and thus predicts protein conformations with good quality jointly determined by all the parallel searching threads and energy functions. It provides a framework to combine different searching intelligence embedded in heuristic algorithms. It also constructs a container to hybridize different not-so-accurate objective functions which are usually derived from the domain expertise.

  16. When the Lowest Energy Does Not Induce Native Structures: Parallel Minimization of Multi-Energy Values by Hybridizing Searching Intelligences

    PubMed Central

    Lü, Qiang; Xia, Xiao-Yan; Chen, Rong; Miao, Da-Jun; Chen, Sha-Sha; Quan, Li-Jun; Li, Hai-Ou

    2012-01-01

    Background Protein structure prediction (PSP), which is usually modeled as a computational optimization problem, remains one of the biggest challenges in computational biology. PSP encounters two difficult obstacles: the inaccurate energy function problem and the searching problem. Even if the lowest energy has been luckily found by the searching procedure, the correct protein structures are not guaranteed to obtain. Results A general parallel metaheuristic approach is presented to tackle the above two problems. Multi-energy functions are employed to simultaneously guide the parallel searching threads. Searching trajectories are in fact controlled by the parameters of heuristic algorithms. The parallel approach allows the parameters to be perturbed during the searching threads are running in parallel, while each thread is searching the lowest energy value determined by an individual energy function. By hybridizing the intelligences of parallel ant colonies and Monte Carlo Metropolis search, this paper demonstrates an implementation of our parallel approach for PSP. 16 classical instances were tested to show that the parallel approach is competitive for solving PSP problem. Conclusions This parallel approach combines various sources of both searching intelligences and energy functions, and thus predicts protein conformations with good quality jointly determined by all the parallel searching threads and energy functions. It provides a framework to combine different searching intelligence embedded in heuristic algorithms. It also constructs a container to hybridize different not-so-accurate objective functions which are usually derived from the domain expertise. PMID:23028708

  17. Increased performance in the short-term water demand forecasting through the use of a parallel adaptive weighting strategy

    NASA Astrophysics Data System (ADS)

    Sardinha-Lourenço, A.; Andrade-Campos, A.; Antunes, A.; Oliveira, M. S.

    2018-03-01

    Recent research on water demand short-term forecasting has shown that models using univariate time series based on historical data are useful and can be combined with other prediction methods to reduce errors. The behavior of water demands in drinking water distribution networks focuses on their repetitive nature and, under meteorological conditions and similar consumers, allows the development of a heuristic forecast model that, in turn, combined with other autoregressive models, can provide reliable forecasts. In this study, a parallel adaptive weighting strategy of water consumption forecast for the next 24-48 h, using univariate time series of potable water consumption, is proposed. Two Portuguese potable water distribution networks are used as case studies where the only input data are the consumption of water and the national calendar. For the development of the strategy, the Autoregressive Integrated Moving Average (ARIMA) method and a short-term forecast heuristic algorithm are used. Simulations with the model showed that, when using a parallel adaptive weighting strategy, the prediction error can be reduced by 15.96% and the average error by 9.20%. This reduction is important in the control and management of water supply systems. The proposed methodology can be extended to other forecast methods, especially when it comes to the availability of multiple forecast models.

  18. A portable MPI-based parallel vector template library

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.

    1995-01-01

    This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C++ by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of C or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.

  19. A Portable MPI-Based Parallel Vector Template Library

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.

    1995-01-01

    This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C + + by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of c or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.

  20. GENESIS: a hybrid-parallel and multi-scale molecular dynamics simulator with enhanced sampling algorithms for biomolecular and cellular simulations

    PubMed Central

    Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji

    2015-01-01

    GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310–323. doi: 10.1002/wcms.1220 PMID:26753008

  1. SCaLeM: A Framework for Characterizing and Analyzing Execution Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chavarría-Miranda, Daniel; Manzano Franco, Joseph B.; Krishnamoorthy, Sriram

    2014-10-13

    As scalable parallel systems evolve towards more complex nodes with many-core architectures and larger trans-petascale & upcoming exascale deployments, there is a need to understand, characterize and quantify the underlying execution models being used on such systems. Execution models are a conceptual layer between applications & algorithms and the underlying parallel hardware and systems software on which those applications run. This paper presents the SCaLeM (Synchronization, Concurrency, Locality, Memory) framework for characterizing and execution models. SCaLeM consists of three basic elements: attributes, compositions and mapping of these compositions to abstract parallel systems. The fundamental Synchronization, Concurrency, Locality and Memory attributesmore » are used to characterize each execution model, while the combinations of those attributes in the form of compositions are used to describe the primitive operations of the execution model. The mapping of the execution model’s primitive operations described by compositions, to an underlying abstract parallel system can be evaluated quantitatively to determine its effectiveness. Finally, SCaLeM also enables the representation and analysis of applications in terms of execution models, for the purpose of evaluating the effectiveness of such mapping.« less

  2. Improved packing of protein side chains with parallel ant colonies

    PubMed Central

    2014-01-01

    Introduction The accurate packing of protein side chains is important for many computational biology problems, such as ab initio protein structure prediction, homology modelling, and protein design and ligand docking applications. Many of existing solutions are modelled as a computational optimisation problem. As well as the design of search algorithms, most solutions suffer from an inaccurate energy function for judging whether a prediction is good or bad. Even if the search has found the lowest energy, there is no certainty of obtaining the protein structures with correct side chains. Methods We present a side-chain modelling method, pacoPacker, which uses a parallel ant colony optimisation strategy based on sharing a single pheromone matrix. This parallel approach combines different sources of energy functions and generates protein side-chain conformations with the lowest energies jointly determined by the various energy functions. We further optimised the selected rotamers to construct subrotamer by rotamer minimisation, which reasonably improved the discreteness of the rotamer library. Results We focused on improving the accuracy of side-chain conformation prediction. For a testing set of 442 proteins, 87.19% of X1 and 77.11% of X12 angles were predicted correctly within 40° of the X-ray positions. We compared the accuracy of pacoPacker with state-of-the-art methods, such as CIS-RR and SCWRL4. We analysed the results from different perspectives, in terms of protein chain and individual residues. In this comprehensive benchmark testing, 51.5% of proteins within a length of 400 amino acids predicted by pacoPacker were superior to the results of CIS-RR and SCWRL4 simultaneously. Finally, we also showed the advantage of using the subrotamers strategy. All results confirmed that our parallel approach is competitive to state-of-the-art solutions for packing side chains. Conclusions This parallel approach combines various sources of searching intelligence and energy functions to pack protein side chains. It provides a frame-work for combining different inaccuracy/usefulness objective functions by designing parallel heuristic search algorithms. PMID:25474164

  3. Control of a simulated arm using a novel combination of Cerebellar learning mechanisms

    NASA Technical Reports Server (NTRS)

    Assad, C.; Hartmann, M.; Paulin, M. G.

    2001-01-01

    We present a model of cerebellar cortex that combines two types of learning: feedforward predicitve association based on local Hebbian-type learning between granule cell ascending branch and parallel fiber inputs, and reinforcement learning with feedback error correction based on climbing fiber activity.

  4. Additive interactions between 1-methyl-1,2,3,4-tetrahydroisoquinoline and clobazam in the mouse maximal electroshock-induced tonic seizure model--an isobolographic analysis for parallel dose-response relationship curves.

    PubMed

    Andres-Mach, Marta; Haratym-Maj, Agnieszka; Zagaja, Mirosław; Luszczki, Jarogniew J

    2014-01-01

    The aim of this study was to characterize the anticonvulsant effect of 1-methyl-1,2,3,4-tetrahydroisoquinoline (1-MeTHIQ) in combination with clobazam (CLB) in the mouse maximal electroshock-induced seizure (MES) model. The anticonvulsant interaction profile between 1-MeTHIQ and CLB in the mouse MES model was determined using an isobolographic analysis for parallel dose-response relationship curves. Electroconvulsions were produced in albino Swiss mice by a current (sine wave, 25 mA, 500 V, 50 Hz, 0.2-second stimulus duration) delivered via auricular electrodes by a Hugo Sachs generator. There was an additive effect of the combination of 1-MeTHIQ with CLB (at the fixed ratios of 1:3, 1:1 and 3:1) in the mouse MES-induced tonic seizure model. The additive interaction of the combination of 1-MeTHIQ with CLB (at fixed-ratios of 1:3, 1:1 and 3:1) in the mouse MES model seems to be pharmacodynamic in nature and worth of considering in further clinical practice. © 2014 S. Karger AG, Basel.

  5. Performance evaluation of GPU parallelization, space-time adaptive algorithms, and their combination for simulating cardiac electrophysiology.

    PubMed

    Sachetto Oliveira, Rafael; Martins Rocha, Bernardo; Burgarelli, Denise; Meira, Wagner; Constantinides, Christakis; Weber Dos Santos, Rodrigo

    2018-02-01

    The use of computer models as a tool for the study and understanding of the complex phenomena of cardiac electrophysiology has attained increased importance nowadays. At the same time, the increased complexity of the biophysical processes translates into complex computational and mathematical models. To speed up cardiac simulations and to allow more precise and realistic uses, 2 different techniques have been traditionally exploited: parallel computing and sophisticated numerical methods. In this work, we combine a modern parallel computing technique based on multicore and graphics processing units (GPUs) and a sophisticated numerical method based on a new space-time adaptive algorithm. We evaluate each technique alone and in different combinations: multicore and GPU, multicore and GPU and space adaptivity, multicore and GPU and space adaptivity and time adaptivity. All the techniques and combinations were evaluated under different scenarios: 3D simulations on slabs, 3D simulations on a ventricular mouse mesh, ie, complex geometry, sinus-rhythm, and arrhythmic conditions. Our results suggest that multicore and GPU accelerate the simulations by an approximate factor of 33×, whereas the speedups attained by the space-time adaptive algorithms were approximately 48. Nevertheless, by combining all the techniques, we obtained speedups that ranged between 165 and 498. The tested methods were able to reduce the execution time of a simulation by more than 498× for a complex cellular model in a slab geometry and by 165× in a realistic heart geometry simulating spiral waves. The proposed methods will allow faster and more realistic simulations in a feasible time with no significant loss of accuracy. Copyright © 2017 John Wiley & Sons, Ltd.

  6. Blade row dynamic digital compressor program. Volume 1: J85 clean inlet flow and parallel compressor models

    NASA Technical Reports Server (NTRS)

    Tesch, W. A.; Steenken, W. G.

    1976-01-01

    The results are presented of a one-dimensional dynamic digital blade row compressor model study of a J85-13 engine operating with uniform and with circumferentially distorted inlet flow. Details of the geometry and the derived blade row characteristics used to simulate the clean inlet performance are given. A stability criterion based upon the self developing unsteady internal flows near surge provided an accurate determination of the clean inlet surge line. The basic model was modified to include an arbitrary extent multi-sector parallel compressor configuration for investigating 180 deg 1/rev total pressure, total temperature, and combined total pressure and total temperature distortions. The combined distortions included opposed, coincident, and 90 deg overlapped patterns. The predicted losses in surge pressure ratio matched the measured data trends at all speeds and gave accurate predictions at high corrected speeds where the slope of the speed lines approached the vertical.

  7. Biocellion: accelerating computer simulation of multicellular biological system models.

    PubMed

    Kang, Seunghwa; Kahan, Simon; McDermott, Jason; Flann, Nicholas; Shmulevich, Ilya

    2014-11-01

    Biological system behaviors are often the outcome of complex interactions among a large number of cells and their biotic and abiotic environment. Computational biologists attempt to understand, predict and manipulate biological system behavior through mathematical modeling and computer simulation. Discrete agent-based modeling (in combination with high-resolution grids to model the extracellular environment) is a popular approach for building biological system models. However, the computational complexity of this approach forces computational biologists to resort to coarser resolution approaches to simulate large biological systems. High-performance parallel computers have the potential to address the computing challenge, but writing efficient software for parallel computers is difficult and time-consuming. We have developed Biocellion, a high-performance software framework, to solve this computing challenge using parallel computers. To support a wide range of multicellular biological system models, Biocellion asks users to provide their model specifics by filling the function body of pre-defined model routines. Using Biocellion, modelers without parallel computing expertise can efficiently exploit parallel computers with less effort than writing sequential programs from scratch. We simulate cell sorting, microbial patterning and a bacterial system in soil aggregate as case studies. Biocellion runs on x86 compatible systems with the 64 bit Linux operating system and is freely available for academic use. Visit http://biocellion.com for additional information. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  8. The fast and the slow of skilled bimanual rhythm production: parallel versus integrated timing.

    PubMed

    Krampe, R T; Kliegl, R; Mayr, U; Engbert, R; Vorberg, D

    2000-02-01

    Professional pianists performed 2 bimanual rhythms at a wide range of different tempos. The polyrhythmic task required the combination of 2 isochronous sequences (3 against 4) between the hands; in the syncopated rhythm task successive keystrokes formed intervals of identical (isochronous) durations. At slower tempos, pianists relied on integrated timing control merging successive intervals between the hands into a common reference frame. A timer-motor model is proposed based on the concepts of rate fluctuation and the distinction between target specification and timekeeper execution processes as a quantitative account of performance at slow tempos. At rapid rates expert pianists used hand-independent, parallel timing control. In alternative to a model based on a single central clock, findings support a model of flexible control structures with multiple timekeepers that can work in parallel to accommodate specific task constraints.

  9. Flexible Language Constructs for Large Parallel Programs

    DOE PAGES

    Rosing, Matt; Schnabel, Robert

    1994-01-01

    The goal of the research described in this article is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (multiple instruction multiple data [MIMD]) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include single instruction multiple data (SIMD), single program multiple data (SPMD), and sequential programs annotated with data distribution statements. The two primary models for communication include implicit communication based on shared memory and explicit communication based on messages. None of these models by themselves seem sufficient to permit the natural and efficient expression ofmore » the variety of algorithms that occur in large scientific computations. In this article, we give an overview of a new language that combines many of these programming models in a clean manner. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. In this article, we give an overview of the language and discuss some of the critical implementation details.« less

  10. A time-parallel approach to strong-constraint four-dimensional variational data assimilation

    NASA Astrophysics Data System (ADS)

    Rao, Vishwas; Sandu, Adrian

    2016-05-01

    A parallel-in-time algorithm based on an augmented Lagrangian approach is proposed to solve four-dimensional variational (4D-Var) data assimilation problems. The assimilation window is divided into multiple sub-intervals that allows parallelization of cost function and gradient computations. The solutions to the continuity equations across interval boundaries are added as constraints. The augmented Lagrangian approach leads to a different formulation of the variational data assimilation problem than the weakly constrained 4D-Var. A combination of serial and parallel 4D-Vars to increase performance is also explored. The methodology is illustrated on data assimilation problems involving the Lorenz-96 and the shallow water models.

  11. Multi-Objective and Multidisciplinary Design Optimisation (MDO) of UAV Systems using Hierarchical Asynchronous Parallel Evolutionary Algorithms

    DTIC Science & Technology

    2007-09-17

    been proposed; these include a combination of variable fidelity models, parallelisation strategies and hybridisation techniques (Coello, Veldhuizen et...Coello et al (Coello, Veldhuizen et al. 2002). 4.4.2 HIERARCHICAL POPULATION TOPOLOGY A hierarchical population topology, when integrated into...to hybrid parallel Multi-Objective Evolutionary Algorithms (pMOEA) (Cantu-Paz 2000; Veldhuizen , Zydallis et al. 2003); it uses a master slave

  12. Integrated Task and Data Parallel Programming

    NASA Technical Reports Server (NTRS)

    Grimshaw, A. S.

    1998-01-01

    This research investigates the combination of task and data parallel language constructs within a single programming language. There are an number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities During the fall I collaborated with Andrew Grimshaw and Adam Ferrari to write a book chapter which will be included in Parallel Processing in C++ edited by Gregory Wilson. I also finished two courses, Compilers and Advanced Compilers, in 1995. These courses complete my class requirements at the University of Virginia. I have only my dissertation research and defense to complete.

  13. Integrated Task And Data Parallel Programming: Language Design

    NASA Technical Reports Server (NTRS)

    Grimshaw, Andrew S.; West, Emily A.

    1998-01-01

    his research investigates the combination of task and data parallel language constructs within a single programming language. There are an number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers '95 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program m. Additional 1995 Activities During the fall I collaborated with Andrew Grimshaw and Adam Ferrari to write a book chapter which will be included in Parallel Processing in C++ edited by Gregory Wilson. I also finished two courses, Compilers and Advanced Compilers, in 1995. These courses complete my class requirements at the University of Virginia. I have only my dissertation research and defense to complete.

  14. Parallel evolution of image processing tools for multispectral imagery

    NASA Astrophysics Data System (ADS)

    Harvey, Neal R.; Brumby, Steven P.; Perkins, Simon J.; Porter, Reid B.; Theiler, James P.; Young, Aaron C.; Szymanski, John J.; Bloch, Jeffrey J.

    2000-11-01

    We describe the implementation and performance of a parallel, hybrid evolutionary-algorithm-based system, which optimizes image processing tools for feature-finding tasks in multi-spectral imagery (MSI) data sets. Our system uses an integrated spatio-spectral approach and is capable of combining suitably-registered data from different sensors. We investigate the speed-up obtained by parallelization of the evolutionary process via multiple processors (a workstation cluster) and develop a model for prediction of run-times for different numbers of processors. We demonstrate our system on Landsat Thematic Mapper MSI , covering the recent Cerro Grande fire at Los Alamos, NM, USA.

  15. Parallel medicinal chemistry approaches to selective HDAC1/HDAC2 inhibitor (SHI-1:2) optimization.

    PubMed

    Kattar, Solomon D; Surdi, Laura M; Zabierek, Anna; Methot, Joey L; Middleton, Richard E; Hughes, Bethany; Szewczak, Alexander A; Dahlberg, William K; Kral, Astrid M; Ozerova, Nicole; Fleming, Judith C; Wang, Hongmei; Secrist, Paul; Harsch, Andreas; Hamill, Julie E; Cruz, Jonathan C; Kenific, Candia M; Chenard, Melissa; Miller, Thomas A; Berk, Scott C; Tempest, Paul

    2009-02-15

    The successful application of both solid and solution phase library synthesis, combined with tight integration into the medicinal chemistry effort, resulted in the efficient optimization of a novel structural series of selective HDAC1/HDAC2 inhibitors by the MRL-Boston Parallel Medicinal Chemistry group. An initial lead from a small parallel library was found to be potent and selective in biochemical assays. Advanced compounds were the culmination of iterative library design and possess excellent biochemical and cellular potency, as well as acceptable PK and efficacy in animal models.

  16. Demonstration of an optoelectronic interconnect architecture for a parallel modified signed-digit adder and subtracter

    NASA Astrophysics Data System (ADS)

    Sun, Degui; Wang, Na-Xin; He, Li-Ming; Weng, Zhao-Heng; Wang, Daheng; Chen, Ray T.

    1996-06-01

    A space-position-logic-encoding scheme is proposed and demonstrated. This encoding scheme not only makes the best use of the convenience of binary logic operation, but is also suitable for the trinary property of modified signed- digit (MSD) numbers. Based on the space-position-logic-encoding scheme, a fully parallel modified signed-digit adder and subtractor is built using optoelectronic switch technologies in conjunction with fiber-multistage 3D optoelectronic interconnects. Thus an effective combination of a parallel algorithm and a parallel architecture is implemented. In addition, the performance of the optoelectronic switches used in this system is experimentally studied and verified. Both the 3-bit experimental model and the experimental results of a parallel addition and a parallel subtraction are provided and discussed. Finally, the speed ratio between the MSD adder and binary adders is discussed and the advantage of the MSD in operating speed is demonstrated.

  17. A distributed, dynamic, parallel computational model: the role of noise in velocity storage

    PubMed Central

    Merfeld, Daniel M.

    2012-01-01

    Networks of neurons perform complex calculations using distributed, parallel computation, including dynamic “real-time” calculations required for motion control. The brain must combine sensory signals to estimate the motion of body parts using imperfect information from noisy neurons. Models and experiments suggest that the brain sometimes optimally minimizes the influence of noise, although it remains unclear when and precisely how neurons perform such optimal computations. To investigate, we created a model of velocity storage based on a relatively new technique–“particle filtering”–that is both distributed and parallel. It extends existing observer and Kalman filter models of vestibular processing by simulating the observer model many times in parallel with noise added. During simulation, the variance of the particles defining the estimator state is used to compute the particle filter gain. We applied our model to estimate one-dimensional angular velocity during yaw rotation, which yielded estimates for the velocity storage time constant, afferent noise, and perceptual noise that matched experimental data. We also found that the velocity storage time constant was Bayesian optimal by comparing the estimate of our particle filter with the estimate of the Kalman filter, which is optimal. The particle filter demonstrated a reduced velocity storage time constant when afferent noise increased, which mimics what is known about aminoglycoside ablation of semicircular canal hair cells. This model helps bridge the gap between parallel distributed neural computation and systems-level behavioral responses like the vestibuloocular response and perception. PMID:22514288

  18. Idealized model of polar cap currents, fields, and auroras

    NASA Technical Reports Server (NTRS)

    Cornwall, J. M.

    1985-01-01

    During periods of northward Bz, the electric field applied to the magnetosphere is generally opposite to that occurring during southward Bz and complicated patterns of convection result, showing some features reversed in comparison with the southward Bz case. A study is conducted of a simple generalization of early work on idealized convection models, which allows for coexistence of sunward convection over the central polar cap and antisunward convection elsewhere in the cap. The present model, valid for By approximately 0, has a four-cell convection pattern and is based on the combination of ionospheric current conservation with a relation between parallel auroral currents and parallel potential drops. Global magnetospheric issues involving, e.g., reconnection are not considered. The central result of this paper is an expression giving the parallel potential drop for polar cap auroras (with By approximately 0) in terms of the polar cap convection field profile.

  19. Parallel pivoting combined with parallel reduction

    NASA Technical Reports Server (NTRS)

    Alaghband, Gita

    1987-01-01

    Parallel algorithms for triangularization of large, sparse, and unsymmetric matrices are presented. The method combines the parallel reduction with a new parallel pivoting technique, control over generations of fill-ins and a check for numerical stability, all done in parallel with the work being distributed over the active processes. The parallel technique uses the compatibility relation between pivots to identify parallel pivot candidates and uses the Markowitz number of pivots to minimize fill-in. This technique is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds.

  20. Oscillations and chaos in neural networks: an exactly solvable model.

    PubMed Central

    Wang, L P; Pichler, E E; Ross, J

    1990-01-01

    We consider a randomly diluted higher-order network with noise, consisting of McCulloch-Pitts neurons that interact by Hebbian-type connections. For this model, exact dynamical equations are derived and solved for both parallel and random sequential updating algorithms. For parallel dynamics, we find a rich spectrum of different behaviors including static retrieving and oscillatory and chaotic phenomena in different parts of the parameter space. The bifurcation parameters include first- and second-order neuronal interaction coefficients and a rescaled noise level, which represents the combined effects of the random synaptic dilution, interference between stored patterns, and additional background noise. We show that a marked difference in terms of the occurrence of oscillations or chaos exists between neural networks with parallel and random sequential dynamics. Images PMID:2251287

  1. A high-speed linear algebra library with automatic parallelism

    NASA Technical Reports Server (NTRS)

    Boucher, Michael L.

    1994-01-01

    Parallel or distributed processing is key to getting highest performance workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely small even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.

  2. Applications of Parallel Computation in Micro-Mechanics and Finite Element Method

    NASA Technical Reports Server (NTRS)

    Tan, Hui-Qian

    1996-01-01

    This project discusses the application of parallel computations related with respect to material analyses. Briefly speaking, we analyze some kind of material by elements computations. We call an element a cell here. A cell is divided into a number of subelements called subcells and all subcells in a cell have the identical structure. The detailed structure will be given later in this paper. It is obvious that the problem is "well-structured". SIMD machine would be a better choice. In this paper we try to look into the potentials of SIMD machine in dealing with finite element computation by developing appropriate algorithms on MasPar, a SIMD parallel machine. In section 2, the architecture of MasPar will be discussed. A brief review of the parallel programming language MPL also is given in that section. In section 3, some general parallel algorithms which might be useful to the project will be proposed. And, combining with the algorithms, some features of MPL will be discussed in more detail. In section 4, the computational structure of cell/subcell model will be given. The idea of designing the parallel algorithm for the model will be demonstrated. Finally in section 5, a summary will be given.

  3. Time-dependent Perpendicular Transport of Energetic Particles for Different Turbulence Configurations and Parallel Transport Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lasuik, J.; Shalchi, A., E-mail: andreasm4@yahoo.com

    Recently, a new theory for the transport of energetic particles across a mean magnetic field was presented. Compared to other nonlinear theories the new approach has the advantage that it provides a full time-dependent description of the transport. Furthermore, a diffusion approximation is no longer part of that theory. The purpose of this paper is to combine this new approach with a time-dependent model for parallel transport and different turbulence configurations in order to explore the parameter regimes for which we get ballistic transport, compound subdiffusion, and normal Markovian diffusion.

  4. Development of a parallel demodulation system used for extrinsic Fabry-Perot interferometer and fiber Bragg grating sensors.

    PubMed

    Jiang, Junfeng; Liu, Tiegen; Zhang, Yimo; Liu, Lina; Zha, Ying; Zhang, Fan; Wang, Yunxin; Long, Pin

    2006-01-20

    A parallel demodulation system for extrinsic Fabry-Perot interferometer (EFPI) and fiber Bragg grating (FBG) sensors is presented, which is based on a Michelson interferometer and combines the methods of low-coherence interference and a Fourier-transform spectrum. The parallel demodulation theory is modeled with Fourier-transform spectrum technology, and a signal separation method with an EFPI and FBG is proposed. The design of an optical path difference scanning and sampling method without a reference light is described. Experiments show that the parallel demodulation system has good spectrum demodulation and low-coherence interference demodulation performance. It can realize simultaneous strain and temperature measurements while keeping the whole system configuration less complex.

  5. Roofline model toolkit: A practical tool for architectural and program analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lo, Yu Jung; Williams, Samuel; Van Straalen, Brian

    We present preliminary results of the Roofline Toolkit for multicore, many core, and accelerated architectures. This paper focuses on the processor architecture characterization engine, a collection of portable instrumented micro benchmarks implemented with Message Passing Interface (MPI), and OpenMP used to express thread-level parallelism. These benchmarks are specialized to quantify the behavior of different architectural features. Compared to previous work on performance characterization, these microbenchmarks focus on capturing the performance of each level of the memory hierarchy, along with thread-level parallelism, instruction-level parallelism and explicit SIMD parallelism, measured in the context of the compilers and run-time environments. We also measuremore » sustained PCIe throughput with four GPU memory managed mechanisms. By combining results from the architecture characterization with the Roofline model based solely on architectural specifications, this work offers insights for performance prediction of current and future architectures and their software systems. To that end, we instrument three applications and plot their resultant performance on the corresponding Roofline model when run on a Blue Gene/Q architecture.« less

  6. Coding for Parallel Links to Maximize the Expected Value of Decodable Messages

    NASA Technical Reports Server (NTRS)

    Klimesh, Matthew A.; Chang, Christopher S.

    2011-01-01

    When multiple parallel communication links are available, it is useful to consider link-utilization strategies that provide tradeoffs between reliability and throughput. Interesting cases arise when there are three or more available links. Under the model considered, the links have known probabilities of being in working order, and each link has a known capacity. The sender has a number of messages to send to the receiver. Each message has a size and a value (i.e., a worth or priority). Messages may be divided into pieces arbitrarily, and the value of each piece is proportional to its size. The goal is to choose combinations of messages to send on the links so that the expected value of the messages decodable by the receiver is maximized. There are three parts to the innovation: (1) Applying coding to parallel links under the model; (2) Linear programming formulation for finding the optimal combinations of messages to send on the links; and (3) Algorithms for assisting in finding feasible combinations of messages, as support for the linear programming formulation. There are similarities between this innovation and methods developed in the field of network coding. However, network coding has generally been concerned with either maximizing throughput in a fixed network, or robust communication of a fixed volume of data. In contrast, under this model, the throughput is expected to vary depending on the state of the network. Examples of error-correcting codes that are useful under this model but which are not needed under previous models have been found. This model can represent either a one-shot communication attempt, or a stream of communications. Under the one-shot model, message sizes and link capacities are quantities of information (e.g., measured in bits), while under the communications stream model, message sizes and link capacities are information rates (e.g., measured in bits/second). This work has the potential to increase the value of data returned from spacecraft under certain conditions.

  7. Mechanical Behavior of Collagen-Fibrin Co-Gels Reflects Transition From Series to Parallel Interactions With Increasing Collagen Content

    PubMed Central

    Lai, Victor K.; Lake, Spencer P.; Frey, Christina R.; Tranquillo, Robert T.; Barocas, Victor H.

    2012-01-01

    Fibrin and collagen, biopolymers occurring naturally in the body, are biomaterials commonly-used as scaffolds for tissue engineering. How collagen and fibrin interact to confer macroscopic mechanical properties in collagen-fibrin composite systems remains poorly understood. In this study, we formulated collagen-fibrin co-gels at different collagen-tofibrin ratios to observe changes in the overall mechanical behavior and microstructure. A modeling framework of a two-network system was developed by modifying our micro-scale model, considering two forms of interaction between the networks: (a) two interpenetrating but noninteracting networks (“parallel”), and (b) a single network consisting of randomly alternating collagen and fibrin fibrils (“series”). Mechanical testing of our gels show that collagen-fibrin co-gels exhibit intermediate properties (UTS, strain at failure, tangent modulus) compared to those of pure collagen and fibrin. The comparison with model predictions show that the parallel and series model cases provide upper and lower bounds, respectively, for the experimental data, suggesting that a combination of such interactions exists between the collagen and fibrin in co-gels. A transition from the series model to the parallel model occurs with increasing collagen content, with the series model best describing predominantly fibrin co-gels, and the parallel model best describing predominantly collagen co-gels. PMID:22482659

  8. Parallel numerical modeling of hybrid-dimensional compositional non-isothermal Darcy flows in fractured porous media

    NASA Astrophysics Data System (ADS)

    Xing, F.; Masson, R.; Lopez, S.

    2017-09-01

    This paper introduces a new discrete fracture model accounting for non-isothermal compositional multiphase Darcy flows and complex networks of fractures with intersecting, immersed and non-immersed fractures. The so called hybrid-dimensional model using a 2D model in the fractures coupled with a 3D model in the matrix is first derived rigorously starting from the equi-dimensional matrix fracture model. Then, it is discretized using a fully implicit time integration combined with the Vertex Approximate Gradient (VAG) finite volume scheme which is adapted to polyhedral meshes and anisotropic heterogeneous media. The fully coupled systems are assembled and solved in parallel using the Single Program Multiple Data (SPMD) paradigm with one layer of ghost cells. This strategy allows for a local assembly of the discrete systems. An efficient preconditioner is implemented to solve the linear systems at each time step and each Newton type iteration of the simulation. The numerical efficiency of our approach is assessed on different meshes, fracture networks, and physical settings in terms of parallel scalability, nonlinear convergence and linear convergence.

  9. Block-Parallel Data Analysis with DIY2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morozov, Dmitriy; Peterka, Tom

    DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial,more » parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.« less

  10. Evaluation of parallel reduction strategies for fusion of sensory information from a robot team

    NASA Astrophysics Data System (ADS)

    Lyons, Damian M.; Leroy, Joseph

    2015-05-01

    The advantage of using a team of robots to search or to map an area is that by navigating the robots to different parts of the area, searching or mapping can be completed more quickly. A crucial aspect of the problem is the combination, or fusion, of data from team members to generate an integrated model of the search/mapping area. In prior work we looked at the issue of removing mutual robots views from an integrated point cloud model built from laser and stereo sensors, leading to a cleaner and more accurate model. This paper addresses a further challenge: Even with mutual views removed, the stereo data from a team of robots can quickly swamp a WiFi connection. This paper proposes and evaluates a communication and fusion approach based on the parallel reduction operation, where data is combined in a series of steps of increasing subsets of the team. Eight different strategies for selecting the subsets are evaluated for bandwidth requirements using three robot missions, each carried out with teams of four Pioneer 3-AT robots. Our results indicate that selecting groups to combine based on similar pose but distant location yields the best results.

  11. The Processing of Somatosensory Information Shifts from an Early Parallel into a Serial Processing Mode: A Combined fMRI/MEG Study.

    PubMed

    Klingner, Carsten M; Brodoehl, Stefan; Huonker, Ralph; Witte, Otto W

    2016-01-01

    The question regarding whether somatosensory inputs are processed in parallel or in series has not been clearly answered. Several studies that have applied dynamic causal modeling (DCM) to fMRI data have arrived at seemingly divergent conclusions. However, these divergent results could be explained by the hypothesis that the processing route of somatosensory information changes with time. Specifically, we suggest that somatosensory stimuli are processed in parallel only during the early stage, whereas the processing is later dominated by serial processing. This hypothesis was revisited in the present study based on fMRI analyses of tactile stimuli and the application of DCM to magnetoencephalographic (MEG) data collected during sustained (260 ms) tactile stimulation. Bayesian model comparisons were used to infer the processing stream. We demonstrated that the favored processing stream changes over time. We found that the neural activity elicited in the first 100 ms following somatosensory stimuli is best explained by models that support a parallel processing route, whereas a serial processing route is subsequently favored. These results suggest that the secondary somatosensory area (SII) receives information regarding a new stimulus in parallel with the primary somatosensory area (SI), whereas later processing in the SII is dominated by the preprocessed input from the SI.

  12. The Processing of Somatosensory Information Shifts from an Early Parallel into a Serial Processing Mode: A Combined fMRI/MEG Study

    PubMed Central

    Klingner, Carsten M.; Brodoehl, Stefan; Huonker, Ralph; Witte, Otto W.

    2016-01-01

    The question regarding whether somatosensory inputs are processed in parallel or in series has not been clearly answered. Several studies that have applied dynamic causal modeling (DCM) to fMRI data have arrived at seemingly divergent conclusions. However, these divergent results could be explained by the hypothesis that the processing route of somatosensory information changes with time. Specifically, we suggest that somatosensory stimuli are processed in parallel only during the early stage, whereas the processing is later dominated by serial processing. This hypothesis was revisited in the present study based on fMRI analyses of tactile stimuli and the application of DCM to magnetoencephalographic (MEG) data collected during sustained (260 ms) tactile stimulation. Bayesian model comparisons were used to infer the processing stream. We demonstrated that the favored processing stream changes over time. We found that the neural activity elicited in the first 100 ms following somatosensory stimuli is best explained by models that support a parallel processing route, whereas a serial processing route is subsequently favored. These results suggest that the secondary somatosensory area (SII) receives information regarding a new stimulus in parallel with the primary somatosensory area (SI), whereas later processing in the SII is dominated by the preprocessed input from the SI. PMID:28066197

  13. A Data Parallel Multizone Navier-Stokes Code

    NASA Technical Reports Server (NTRS)

    Jespersen, Dennis C.; Levit, Creon; Kwak, Dochan (Technical Monitor)

    1995-01-01

    We have developed a data parallel multizone compressible Navier-Stokes code on the Connection Machine CM-5. The code is set up for implicit time-stepping on single or multiple structured grids. For multiple grids and geometrically complex problems, we follow the "chimera" approach, where flow data on one zone is interpolated onto another in the region of overlap. We will describe our design philosophy and give some timing results for the current code. The design choices can be summarized as: 1. finite differences on structured grids; 2. implicit time-stepping with either distributed solves or data motion and local solves; 3. sequential stepping through multiple zones with interzone data transfer via a distributed data structure. We have implemented these ideas on the CM-5 using CMF (Connection Machine Fortran), a data parallel language which combines elements of Fortran 90 and certain extensions, and which bears a strong similarity to High Performance Fortran (HPF). One interesting feature is the issue of turbulence modeling, where the architecture of a parallel machine makes the use of an algebraic turbulence model awkward, whereas models based on transport equations are more natural. We will present some performance figures for the code on the CM-5, and consider the issues involved in transitioning the code to HPF for portability to other parallel platforms.

  14. Anti-parallel versus Component Reconnection at the Earth Magnetopause

    NASA Astrophysics Data System (ADS)

    Trattner, K. J.; Burch, J. L.; Ergun, R.; Eriksson, S.; Fuselier, S. A.; Gomez, R. G.; Giles, B. L.; Steven, P. M.; Strangeway, R. J.; Wilder, F. D.

    2017-12-01

    Magnetic reconnection at the Earth's magnetopause is discussed and has been observed as anti-parallel and component reconnection. While anti-parallel reconnection occurs between magnetic field lines of (ideally) exactly opposite polarity, component reconnection (also known as the tilted X-line model) predicts the location of the reconnection line to be anchored at the sub-solar point and extend continuously along the dayside magnetopause, while the ratio of the IMF By/Bz component determines the tilt of the X-line relative to the equatorial plane.A reconnection location prediction model known as the Maximum Magnetic Shear Model combines these two scenarios. The model predicts that during dominant IMF By conditions, magnetic reconnection occurs along an extended line across the dayside magnetopause but generally not through the sub-solar point (as predicted in the original tilted X-line model). Rather, the line follows the ridge of maximum magnetic shear across the dayside magnetopause. In contrast, for dominant IMF Bz (155° < tan-1(By/Bz) < 205°) or dominant Bx (|Bx|/B > 0.7) conditions, the reconnection location bifurcates and traces to high-latitudes, in close agreement with the anti-parallel reconnection scenario, and does not cross the dayside magnetopause as a single tilted reconnection line. Using observations from the Magnetospheric MultiScale missions during a magnetopause crossing when the IMF rotated from an dominate IMF BZ to a dominant IMF BY field we will investigate when the transition between the anti-parallel and tilted X-line scenarios occurs.

  15. PaFlexPepDock: parallel ab-initio docking of peptides onto their receptors with full flexibility based on Rosetta.

    PubMed

    Li, Haiou; Lu, Liyao; Chen, Rong; Quan, Lijun; Xia, Xiaoyan; Lü, Qiang

    2014-01-01

    Structural information related to protein-peptide complexes can be very useful for novel drug discovery and design. The computational docking of protein and peptide can supplement the structural information available on protein-peptide interactions explored by experimental ways. Protein-peptide docking of this paper can be described as three processes that occur in parallel: ab-initio peptide folding, peptide docking with its receptor, and refinement of some flexible areas of the receptor as the peptide is approaching. Several existing methods have been used to sample the degrees of freedom in the three processes, which are usually triggered in an organized sequential scheme. In this paper, we proposed a parallel approach that combines all the three processes during the docking of a folding peptide with a flexible receptor. This approach mimics the actual protein-peptide docking process in parallel way, and is expected to deliver better performance than sequential approaches. We used 22 unbound protein-peptide docking examples to evaluate our method. Our analysis of the results showed that the explicit refinement of the flexible areas of the receptor facilitated more accurate modeling of the interfaces of the complexes, while combining all of the moves in parallel helped the constructing of energy funnels for predictions.

  16. Xyce Parallel Electronic Simulator : users' guide, version 2.0.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont

    2004-06-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator capable of simulating electrical circuits at a variety of abstraction levels. Primarily, Xyce has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability the current state-of-the-art in the following areas: {sm_bullet} Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. {sm_bullet} Improved performance for allmore » numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. {sm_bullet} Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. {sm_bullet} A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). {sm_bullet} Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing of computing platforms. These include serial, shared-memory and distributed-memory parallel implementation - which allows it to run efficiently on the widest possible number parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. One feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce These input formats include standard analytical models, behavioral models look-up Parallel Electronic Simulator is designed to support a variety of device model inputs. tables, and mesh-level PDE device models. Combined with this flexible interface is an architectural design that greatly simplifies the addition of circuit models. One of the most important feature of Xyce is in providing a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia now has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods) research and development can be performed. Ultimately, these capabilities are migrated to end users.« less

  17. LSPRAY-IV: A Lagrangian Spray Module

    NASA Technical Reports Server (NTRS)

    Raju, M. S.

    2012-01-01

    LSPRAY-IV is a Lagrangian spray solver developed for application with parallel computing and unstructured grids. It is designed to be massively parallel and could easily be coupled with any existing gas-phase flow and/or Monte Carlo Probability Density Function (PDF) solvers. The solver accommodates the use of an unstructured mesh with mixed elements of either triangular, quadrilateral, and/or tetrahedral type for the gas flow grid representation. It is mainly designed to predict the flow, thermal and transport properties of a rapidly vaporizing spray. Some important research areas covered as a part of the code development are: (1) the extension of combined CFD/scalar-Monte- Carlo-PDF method to spray modeling, (2) the multi-component liquid spray modeling, and (3) the assessment of various atomization models used in spray calculations. The current version contains the extension to the modeling of superheated sprays. The manual provides the user with an understanding of various models involved in the spray formulation, its code structure and solution algorithm, and various other issues related to parallelization and its coupling with other solvers.

  18. Combining Phase Identification and Statistic Modeling for Automated Parallel Benchmark Generation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jin, Ye; Ma, Xiaosong; Liu, Qing Gary

    2015-01-01

    Parallel application benchmarks are indispensable for evaluating/optimizing HPC software and hardware. However, it is very challenging and costly to obtain high-fidelity benchmarks reflecting the scale and complexity of state-of-the-art parallel applications. Hand-extracted synthetic benchmarks are time-and labor-intensive to create. Real applications themselves, while offering most accurate performance evaluation, are expensive to compile, port, reconfigure, and often plainly inaccessible due to security or ownership concerns. This work contributes APPRIME, a novel tool for trace-based automatic parallel benchmark generation. Taking as input standard communication-I/O traces of an application's execution, it couples accurate automatic phase identification with statistical regeneration of event parameters tomore » create compact, portable, and to some degree reconfigurable parallel application benchmarks. Experiments with four NAS Parallel Benchmarks (NPB) and three real scientific simulation codes confirm the fidelity of APPRIME benchmarks. They retain the original applications' performance characteristics, in particular the relative performance across platforms.« less

  19. Heart Fibrillation and Parallel Supercomputers

    NASA Technical Reports Server (NTRS)

    Kogan, B. Y.; Karplus, W. J.; Chudin, E. E.

    1997-01-01

    The Luo and Rudy 3 cardiac cell mathematical model is implemented on the parallel supercomputer CRAY - T3D. The splitting algorithm combined with variable time step and an explicit method of integration provide reasonable solution times and almost perfect scaling for rectilinear wave propagation. The computer simulation makes it possible to observe new phenomena: the break-up of spiral waves caused by intracellular calcium and dynamics and the non-uniformity of the calcium distribution in space during the onset of the spiral wave.

  20. Parareal in time 3D numerical solver for the LWR Benchmark neutron diffusion transient model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baudron, Anne-Marie, E-mail: anne-marie.baudron@cea.fr; CEA-DRN/DMT/SERMA, CEN-Saclay, 91191 Gif sur Yvette Cedex; Lautard, Jean-Jacques, E-mail: jean-jacques.lautard@cea.fr

    2014-12-15

    In this paper we present a time-parallel algorithm for the 3D neutrons calculation of a transient model in a nuclear reactor core. The neutrons calculation consists in numerically solving the time dependent diffusion approximation equation, which is a simplified transport equation. The numerical resolution is done with finite elements method based on a tetrahedral meshing of the computational domain, representing the reactor core, and time discretization is achieved using a θ-scheme. The transient model presents moving control rods during the time of the reaction. Therefore, cross-sections (piecewise constants) are taken into account by interpolations with respect to the velocity ofmore » the control rods. The parallelism across the time is achieved by an adequate use of the parareal in time algorithm to the handled problem. This parallel method is a predictor corrector scheme that iteratively combines the use of two kinds of numerical propagators, one coarse and one fine. Our method is made efficient by means of a coarse solver defined with large time step and fixed position control rods model, while the fine propagator is assumed to be a high order numerical approximation of the full model. The parallel implementation of our method provides a good scalability of the algorithm. Numerical results show the efficiency of the parareal method on large light water reactor transient model corresponding to the Langenbuch–Maurer–Werner benchmark.« less

  1. Rigid-flexible coupling dynamic modeling and investigation of a redundantly actuated parallel manipulator with multiple actuation modes

    NASA Astrophysics Data System (ADS)

    Liang, Dong; Song, Yimin; Sun, Tao; Jin, Xueying

    2017-09-01

    A systematic dynamic modeling methodology is presented to develop the rigid-flexible coupling dynamic model (RFDM) of an emerging flexible parallel manipulator with multiple actuation modes. By virtue of assumed mode method, the general dynamic model of an arbitrary flexible body with any number of lumped parameters is derived in an explicit closed form, which possesses the modular characteristic. Then the completely dynamic model of system is formulated based on the flexible multi-body dynamics (FMD) theory and the augmented Lagrangian multipliers method. An approach of combining the Udwadia-Kalaba formulation with the hybrid TR-BDF2 numerical algorithm is proposed to address the nonlinear RFDM. Two simulation cases are performed to investigate the dynamic performance of the manipulator with different actuation modes. The results indicate that the redundant actuation modes can effectively attenuate vibration and guarantee higher dynamic performance compared to the traditional non-redundant actuation modes. Finally, a virtual prototype model is developed to demonstrate the validity of the presented RFDM. The systematic methodology proposed in this study can be conveniently extended for the dynamic modeling and controller design of other planar flexible parallel manipulators, especially the emerging ones with multiple actuation modes.

  2. Simulating spin models on GPU

    NASA Astrophysics Data System (ADS)

    Weigel, Martin

    2011-09-01

    Over the last couple of years it has been realized that the vast computational power of graphics processing units (GPUs) could be harvested for purposes other than the video game industry. This power, which at least nominally exceeds that of current CPUs by large factors, results from the relative simplicity of the GPU architectures as compared to CPUs, combined with a large number of parallel processing units on a single chip. To benefit from this setup for general computing purposes, the problems at hand need to be prepared in a way to profit from the inherent parallelism and hierarchical structure of memory accesses. In this contribution I discuss the performance potential for simulating spin models, such as the Ising model, on GPU as compared to conventional simulations on CPU.

  3. Discrete Event Modeling and Massively Parallel Execution of Epidemic Outbreak Phenomena

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Perumalla, Kalyan S; Seal, Sudip K

    2011-01-01

    In complex phenomena such as epidemiological outbreaks, the intensity of inherent feedback effects and the significant role of transients in the dynamics make simulation the only effective method for proactive, reactive or post-facto analysis. The spatial scale, runtime speed, and behavioral detail needed in detailed simulations of epidemic outbreaks make it necessary to use large-scale parallel processing. Here, an optimistic parallel execution of a new discrete event formulation of a reaction-diffusion simulation model of epidemic propagation is presented to facilitate in dramatically increasing the fidelity and speed by which epidemiological simulations can be performed. Rollback support needed during optimistic parallelmore » execution is achieved by combining reverse computation with a small amount of incremental state saving. Parallel speedup of over 5,500 and other runtime performance metrics of the system are observed with weak-scaling execution on a small (8,192-core) Blue Gene / P system, while scalability with a weak-scaling speedup of over 10,000 is demonstrated on 65,536 cores of a large Cray XT5 system. Scenarios representing large population sizes exceeding several hundreds of millions of individuals in the largest cases are successfully exercised to verify model scalability.« less

  4. Multiscale Multilevel Approach to Solution of Nanotechnology Problems

    NASA Astrophysics Data System (ADS)

    Polyakov, Sergey; Podryga, Viktoriia

    2018-02-01

    The paper is devoted to a multiscale multilevel approach for the solution of nanotechnology problems on supercomputer systems. The approach uses the combination of continuum mechanics models and the Newton dynamics for individual particles. This combination includes three scale levels: macroscopic, mesoscopic and microscopic. For gas-metal technical systems the following models are used. The quasihydrodynamic system of equations is used as a mathematical model at the macrolevel for gas and solid states. The system of Newton equations is used as a mathematical model at the mesoand microlevels; it is written for nanoparticles of the medium and larger particles moving in the medium. The numerical implementation of the approach is based on the method of splitting into physical processes. The quasihydrodynamic equations are solved by the finite volume method on grids of different types. The Newton equations of motion are solved by Verlet integration in each cell of the grid independently or in groups of connected cells. In the framework of the general methodology, four classes of algorithms and methods of their parallelization are provided. The parallelization uses the principles of geometric parallelism and the efficient partitioning of the computational domain. A special dynamic algorithm is used for load balancing the solvers. The testing of the developed approach was made by the example of the nitrogen outflow from a balloon with high pressure to a vacuum chamber through a micronozzle and a microchannel. The obtained results confirm the high efficiency of the developed methodology.

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jain, Atul K.

    The overall objectives of this DOE funded project is to combine scientific and computational challenges in climate modeling by expanding our understanding of the biogeophysical-biogeochemical processes and their interactions in the northern high latitudes (NHLs) using an earth system modeling (ESM) approach, and by adopting an adaptive parallel runtime system in an ESM to achieve efficient and scalable climate simulations through improved load balancing algorithms.

  6. Numerical aspects and implementation of a two-layer zonal wall model for LES of compressible turbulent flows on unstructured meshes

    NASA Astrophysics Data System (ADS)

    Park, George Ilhwan; Moin, Parviz

    2016-01-01

    This paper focuses on numerical and practical aspects associated with a parallel implementation of a two-layer zonal wall model for large-eddy simulation (LES) of compressible wall-bounded turbulent flows on unstructured meshes. A zonal wall model based on the solution of unsteady three-dimensional Reynolds-averaged Navier-Stokes (RANS) equations on a separate near-wall grid is implemented in an unstructured, cell-centered finite-volume LES solver. The main challenge in its implementation is to couple two parallel, unstructured flow solvers for efficient boundary data communication and simultaneous time integrations. A coupling strategy with good load balancing and low processors underutilization is identified. Face mapping and interpolation procedures at the coupling interface are explained in detail. The method of manufactured solution is used for verifying the correct implementation of solver coupling, and parallel performance of the combined wall-modeled LES (WMLES) solver is investigated. The method has successfully been applied to several attached and separated flows, including a transitional flow over a flat plate and a separated flow over an airfoil at an angle of attack.

  7. SKIRT: Hybrid parallelization of radiative transfer simulations

    NASA Astrophysics Data System (ADS)

    Verstocken, S.; Van De Putte, D.; Camps, P.; Baes, M.

    2017-07-01

    We describe the design, implementation and performance of the new hybrid parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which has been used extensively for modelling the continuum radiation of dusty astrophysical systems including late-type galaxies and dusty tori. The hybrid scheme combines distributed memory parallelization, using the standard Message Passing Interface (MPI) to communicate between processes, and shared memory parallelization, providing multiple execution threads within each process to avoid duplication of data structures. The synchronization between multiple threads is accomplished through atomic operations without high-level locking (also called lock-free programming). This improves the scaling behaviour of the code and substantially simplifies the implementation of the hybrid scheme. The result is an extremely flexible solution that adjusts to the number of available nodes, processors and memory, and consequently performs well on a wide variety of computing architectures.

  8. Discovery of Antibiotics-derived Polymers for Gene Delivery using Combinatorial Synthesis and Cheminformatics Modeling

    PubMed Central

    Potta, Thrimoorthy; Zhen, Zhuo; Grandhi, Taraka Sai Pavan; Christensen, Matthew D.; Ramos, James; Breneman, Curt M.; Rege, Kaushal

    2014-01-01

    We describe the combinatorial synthesis and cheminformatics modeling of aminoglycoside antibiotics-derived polymers for transgene delivery and expression. Fifty-six polymers were synthesized by polymerizing aminoglycosides with diglycidyl ether cross-linkers. Parallel screening resulted in identification of several lead polymers that resulted in high transgene expression levels in cells. The role of polymer physicochemical properties in determining efficacy of transgene expression was investigated using Quantitative Structure-Activity Relationship (QSAR) cheminformatics models based on Support Vector Regression (SVR) and ‘building block’ polymer structures. The QSAR model exhibited high predictive ability, and investigation of descriptors in the model, using molecular visualization and correlation plots, indicated that physicochemical attributes related to both, aminoglycosides and diglycidyl ethers facilitated transgene expression. This work synergistically combines combinatorial synthesis and parallel screening with cheminformatics-based QSAR models for discovery and physicochemical elucidation of effective antibiotics-derived polymers for transgene delivery in medicine and biotechnology. PMID:24331709

  9. Development of an Algorithm for Automatic Analysis of the Impedance Spectrum Based on a Measurement Model

    NASA Astrophysics Data System (ADS)

    Kobayashi, Kiyoshi; Suzuki, Tohru S.

    2018-03-01

    A new algorithm for the automatic estimation of an equivalent circuit and the subsequent parameter optimization is developed by combining the data-mining concept and complex least-squares method. In this algorithm, the program generates an initial equivalent-circuit model based on the sampling data and then attempts to optimize the parameters. The basic hypothesis is that the measured impedance spectrum can be reproduced by the sum of the partial-impedance spectra presented by the resistor, inductor, resistor connected in parallel to a capacitor, and resistor connected in parallel to an inductor. The adequacy of the model is determined by using a simple artificial-intelligence function, which is applied to the output function of the Levenberg-Marquardt module. From the iteration of model modifications, the program finds an adequate equivalent-circuit model without any user input to the equivalent-circuit model.

  10. Finite Element Analysis and Biomechanical Comparison of Short Posterior Spinal Instrumentation with Divergent Bridge Construct versus Parallel Tension Band Construct for Thoracolumbar Spine Fractures

    PubMed Central

    Ouellet, Jean A.; Richards, Corey; Sardar, Zeeshan M.; Giannitsios, Demetri; Noiseux, Nicholas; Strydom, Willem S.; Reindl, Rudy; Jarzem, Peter; Arlet, Vincent; Steffen, Thomas

    2013-01-01

    The ideal treatment for unstable thoracolumbar fractures remains controversial with posterior reduction and stabilization, anterior reduction and stabilization, combined posterior and anterior reduction and stabilization, and even nonoperative management advocated. Short segment posterior osteosynthesis of these fractures has less comorbidities compared with the other operative approaches but settles into kyphosis over time. Biomechanical comparison of the divergent bridge construct versus the parallel tension band construct was performed for anteriorly destabilized T11–L1 spine segments using three different models: (1) finite element analysis (FEA), (2) a synthetic model, and (3) a human cadaveric model. Outcomes measured were construct stiffness and ultimate failure load. Our objective was to determine if the divergent pedicle screw bridge construct would provide more resistance to kyphotic deforming forces. All three modalities showed greater stiffness with the divergent bridge construct. The FEA calculated a stiffness of 21.6 N/m for the tension band construct versus 34.1 N/m for the divergent bridge construct. The synthetic model resulted in a mean stiffness of 17.3 N/m for parallel tension band versus 20.6 N/m for the divergent bridge (p = 0.03), whereas the cadaveric model had an average stiffness of 15.2 N/m in the parallel tension band compared with 18.4 N/m for the divergent bridge (p = 0.02). Ultimate failure load with the cadaveric model was found to be 622 N for the divergent bridge construct versus 419 N (p = 0.15) for the parallel tension band construct. This study confirms our clinical experience that the short posterior divergent bridge construct provides greater stiffness for the management of unstable thoracolumbar fractures. PMID:24436856

  11. Longitudinal train dynamics: an overview

    NASA Astrophysics Data System (ADS)

    Wu, Qing; Spiryagin, Maksym; Cole, Colin

    2016-12-01

    This paper discusses the evolution of longitudinal train dynamics (LTD) simulations, which covers numerical solvers, vehicle connection systems, air brake systems, wagon dumper systems and locomotives, resistance forces and gravitational components, vehicle in-train instabilities, and computing schemes. A number of potential research topics are suggested, such as modelling of friction, polymer, and transition characteristics for vehicle connection simulations, studies of wagon dumping operations, proper modelling of vehicle in-train instabilities, and computing schemes for LTD simulations. Evidence shows that LTD simulations have evolved with computing capabilities. Currently, advanced component models that directly describe the working principles of the operation of air brake systems, vehicle connection systems, and traction systems are available. Parallel computing is a good solution to combine and simulate all these advanced models. Parallel computing can also be used to conduct three-dimensional long train dynamics simulations.

  12. Evaluation of stability of osteosynthesis with K-wires on an artificial model of tibial malleolus fracture.

    PubMed

    Bumči, Igor; Vlahović, Tomislav; Jurić, Filip; Žganjer, Mirko; Miličić, Gordana; Wolf, Hinko; Antabak, Anko

    2015-11-01

    Paediatric ankle fractures comprise approximately 4% of all paediatric fractures and 30% of all epiphyseal fractures. Integrity of the ankle "mortise", which consists of tibial and fibular malleoli, is significant for stability and function of the ankle joint. Tibial malleolar fractures are classified as SH III or SH IV intra-articular fractures and, in cases where the fragments are displaced, anatomic reposition and fixation is mandatory. Type SH III-IV fractures of the tibial malleolus are usually treated with open reduction and fixation with cannulated screws that are parallel to the physis. Two K-wires are used for temporary stabilisation of fragments during reduction. A third "guide wire" for the screw is then placed parallel with the physis. Considering the rules of mechanics, it is assumed that the two temporary pins with the additional third pin placed parallel to the physis create a strong triangle and thus provide strong fracture fixation. To prove this hypothesis, an experiment was conducted on the artificial models of the lower end of the tibia from the company "Sawbones". Each model had been sawn in a way that imitates the fracture of medial malleoli and then reattached with 1.8mm pins in various combinations. Prepared models were then tested for tensile and pressure forces. The least stable model was that in which the fractured pieces were attached with only two parallel pins. The most stable model comprised three pins, where two crossed pins were inserted in the opposite compact bone and the third pin was inserted through the epiphysis parallel with and below the growth plate. A potential method of choice for fixation of tibial malleolar fractures comprises three K-wires, where two crossed pins are placed in the opposite compact bone and one is parallel with the growth plate. The benefits associated with this method include shorter operating times and avoidance of a second operation for screw removal. Copyright © 2015 Elsevier Ltd. All rights reserved.

  13. Enhancing membrane protein subcellular localization prediction by parallel fusion of multi-view features.

    PubMed

    Yu, Dongjun; Wu, Xiaowei; Shen, Hongbin; Yang, Jian; Tang, Zhenmin; Qi, Yong; Yang, Jingyu

    2012-12-01

    Membrane proteins are encoded by ~ 30% in the genome and function importantly in the living organisms. Previous studies have revealed that membrane proteins' structures and functions show obvious cell organelle-specific properties. Hence, it is highly desired to predict membrane protein's subcellular location from the primary sequence considering the extreme difficulties of membrane protein wet-lab studies. Although many models have been developed for predicting protein subcellular locations, only a few are specific to membrane proteins. Existing prediction approaches were constructed based on statistical machine learning algorithms with serial combination of multi-view features, i.e., different feature vectors are simply serially combined to form a super feature vector. However, such simple combination of features will simultaneously increase the information redundancy that could, in turn, deteriorate the final prediction accuracy. That's why it was often found that prediction success rates in the serial super space were even lower than those in a single-view space. The purpose of this paper is investigation of a proper method for fusing multiple multi-view protein sequential features for subcellular location predictions. Instead of serial strategy, we propose a novel parallel framework for fusing multiple membrane protein multi-view attributes that will represent protein samples in complex spaces. We also proposed generalized principle component analysis (GPCA) for feature reduction purpose in the complex geometry. All the experimental results through different machine learning algorithms on benchmark membrane protein subcellular localization datasets demonstrate that the newly proposed parallel strategy outperforms the traditional serial approach. We also demonstrate the efficacy of the parallel strategy on a soluble protein subcellular localization dataset indicating the parallel technique is flexible to suite for other computational biology problems. The software and datasets are available at: http://www.csbio.sjtu.edu.cn/bioinf/mpsp.

  14. Impact of Media Richness and Flow on E-Learning Technology Acceptance

    ERIC Educational Resources Information Center

    Liu, Su-Houn; Liao, Hsiu-Li; Pratt, Jean A.

    2009-01-01

    Advances in e-learning technologies parallels a general increase in sophistication by computer users. The use of just one theory or model, such as the technology acceptance model, is no longer sufficient to study the intended use of e-learning systems. Rather, a combination of theories must be integrated in order to fully capture the complexity of…

  15. Parallel Electrochemical Treatment System and Application for Identifying Acid-Stable Oxygen Evolution Electrocatalysts

    DOE PAGES

    Jones, Ryan J. R.; Shinde, Aniketa; Guevarra, Dan; ...

    2015-01-05

    There are many energy technologies require electrochemical stability or preactivation of functional materials. Due to the long experiment duration required for either electrochemical preactivation or evaluation of operational stability, parallel screening is required to enable high throughput experimentation. We found that imposing operational electrochemical conditions to a library of materials in parallel creates several opportunities for experimental artifacts. We discuss the electrochemical engineering principles and operational parameters that mitigate artifacts int he parallel electrochemical treatment system. We also demonstrate the effects of resistive losses within the planar working electrode through a combination of finite element modeling and illustrative experiments. Operationmore » of the parallel-plate, membrane-separated electrochemical treatment system is demonstrated by exposing a composition library of mixed metal oxides to oxygen evolution conditions in 1M sulfuric acid for 2h. This application is particularly important because the electrolysis and photoelectrolysis of water are promising future energy technologies inhibited by the lack of highly active, acid-stable catalysts containing only earth abundant elements.« less

  16. Crustal origin of trench-parallel shear-wave fast polarizations in the Central Andes

    NASA Astrophysics Data System (ADS)

    Wölbern, I.; Löbl, U.; Rümpker, G.

    2014-04-01

    In this study, SKS and local S phases are analyzed to investigate variations of shear-wave splitting parameters along two dense seismic profiles across the central Andean Altiplano and Puna plateaus. In contrast to previous observations, the vast majority of the measurements reveal fast polarizations sub-parallel to the subduction direction of the Nazca plate with delay times between 0.3 and 1.2 s. Local phases show larger variations of fast polarizations and exhibit delay times ranging between 0.1 and 1.1 s. Two 70 km and 100 km wide sections along the Altiplano profile exhibit larger delay times and are characterized by fast polarizations oriented sub-parallel to major fault zones. Based on finite-difference wavefield calculations for anisotropic subduction zone models we demonstrate that the observations are best explained by fossil slab anisotropy with fast symmetry axes oriented sub-parallel to the slab movement in combination with a significant component of crustal anisotropy of nearly trench-parallel fast-axis orientation. From the modeling we exclude a sub-lithospheric origin of the observed strong anomalies due to the short-scale variations of the fast polarizations. Instead, our results indicate that anisotropy in the Central Andes generally reflects the direction of plate motion while the observed trench-parallel fast polarizations likely originate in the continental crust above the subducting slab.

  17. Automating FEA programming

    NASA Technical Reports Server (NTRS)

    Sharma, Naveen

    1992-01-01

    In this paper we briefly describe a combined symbolic and numeric approach for solving mathematical models on parallel computers. An experimental software system, PIER, is being developed in Common Lisp to synthesize computationally intensive and domain formulation dependent phases of finite element analysis (FEA) solution methods. Quantities for domain formulation like shape functions, element stiffness matrices, etc., are automatically derived using symbolic mathematical computations. The problem specific information and derived formulae are then used to generate (parallel) numerical code for FEA solution steps. A constructive approach to specify a numerical program design is taken. The code generator compiles application oriented input specifications into (parallel) FORTRAN77 routines with the help of built-in knowledge of the particular problem, numerical solution methods and the target computer.

  18. Combinatorial structures to modeling simple games and applications

    NASA Astrophysics Data System (ADS)

    Molinero, Xavier

    2017-09-01

    We connect three different topics: combinatorial structures, game theory and chemistry. In particular, we establish the bases to represent some simple games, defined as influence games, and molecules, defined from atoms, by using combinatorial structures. First, we characterize simple games as influence games using influence graphs. It let us to modeling simple games as combinatorial structures (from the viewpoint of structures or graphs). Second, we formally define molecules as combinations of atoms. It let us to modeling molecules as combinatorial structures (from the viewpoint of combinations). It is open to generate such combinatorial structures using some specific techniques as genetic algorithms, (meta-)heuristics algorithms and parallel programming, among others.

  19. Massively parallel multicanonical simulations

    NASA Astrophysics Data System (ADS)

    Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard

    2018-03-01

    Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 104 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.

  20. Algebraic multigrid preconditioning within parallel finite-element solvers for 3-D electromagnetic modelling problems in geophysics

    NASA Astrophysics Data System (ADS)

    Koldan, Jelena; Puzyrev, Vladimir; de la Puente, Josep; Houzeaux, Guillaume; Cela, José María

    2014-06-01

    We present an elaborate preconditioning scheme for Krylov subspace methods which has been developed to improve the performance and reduce the execution time of parallel node-based finite-element (FE) solvers for 3-D electromagnetic (EM) numerical modelling in exploration geophysics. This new preconditioner is based on algebraic multigrid (AMG) that uses different basic relaxation methods, such as Jacobi, symmetric successive over-relaxation (SSOR) and Gauss-Seidel, as smoothers and the wave front algorithm to create groups, which are used for a coarse-level generation. We have implemented and tested this new preconditioner within our parallel nodal FE solver for 3-D forward problems in EM induction geophysics. We have performed series of experiments for several models with different conductivity structures and characteristics to test the performance of our AMG preconditioning technique when combined with biconjugate gradient stabilized method. The results have shown that, the more challenging the problem is in terms of conductivity contrasts, ratio between the sizes of grid elements and/or frequency, the more benefit is obtained by using this preconditioner. Compared to other preconditioning schemes, such as diagonal, SSOR and truncated approximate inverse, the AMG preconditioner greatly improves the convergence of the iterative solver for all tested models. Also, when it comes to cases in which other preconditioners succeed to converge to a desired precision, AMG is able to considerably reduce the total execution time of the forward-problem code-up to an order of magnitude. Furthermore, the tests have confirmed that our AMG scheme ensures grid-independent rate of convergence, as well as improvement in convergence regardless of how big local mesh refinements are. In addition, AMG is designed to be a black-box preconditioner, which makes it easy to use and combine with different iterative methods. Finally, it has proved to be very practical and efficient in the parallel context.

  1. A new spherical model for computing the radiation field available for photolysis and heating at twilight

    NASA Technical Reports Server (NTRS)

    Dahlback, Arne; Stamnes, Knut

    1991-01-01

    Accurate computation of atmospheric photodissociation and heating rates is needed in photochemical models. These quantities are proportional to the mean intensity of the solar radiation penetrating to various levels in the atmosphere. For large solar zenith angles a solution of the radiative transfer equation valid for a spherical atmosphere is required in order to obtain accurate values of the mean intensity. Such a solution based on a perturbation technique combined with the discrete ordinate method is presented. Mean intensity calculations are carried out for various solar zenith angles. These results are compared with calculations from a plane parallel radiative transfer model in order to assess the importance of using correct geometry around sunrise and sunset. This comparison shows, in agreement with previous investigations, that for solar zenith angles less than 90 deg adequate solutions are obtained for plane parallel geometry as long as spherical geometry is used to compute the direct beam attenuation; but for solar zenith angles greater than 90 deg this pseudospherical plane parallel approximation overstimates the mean intensity.

  2. Comparison of two tension-band fixation materials and techniques in transverse patella fractures: a biomechanical study.

    PubMed

    Rabalais, R David; Burger, Evalina; Lu, Yun; Mansour, Alfred; Baratta, Richard V

    2008-02-01

    This study compared the biomechanical properties of 2 tension-band techniques with stainless steel wire and ultra high molecular weight polyethylene (UHMWPE) cable in a patella fracture model. Transverse patella fractures were simulated in 8 cadaver knees and fixated with figure-of-8 and parallel wire configurations in combination with Kirschner wires. Identical configurations were tested with UHMWPE cable. Specimens were mounted to a testing apparatus and the quadriceps was used to extend the knees from 90 degrees to 0 degrees; 4 knees were tested under monotonic loading, and 4 knees were tested under cyclic loading. Under monotonic loading, average fracture gap was 0.50 and 0.57 mm for steel wire and UHMWPE cable, respectively, in the figure-of-8 construct compared with 0.16 and 0.04 mm, respectively, in the parallel wire construct. Under cyclic loading, average fracture gap was 1.45 and 1.66 mm for steel wire and UHMWPE cable, respectively, in the figure-of-8 construct compared with 0.45 and 0.60 mm, respectively, in the parallel wire construct. A statistically significant effect of technique was found, with the parallel wire construct performing better than the figure-of-8 construct in both loading models. There was no effect of material or interaction. In this biomechanical model, parallel wires performed better than the figure-of-8 configuration in both loading regimens, and UHMWPE cable performed similarly to 18-gauge steel wire.

  3. Type synthesis for 4-DOF parallel press mechanism using GF set theory

    NASA Astrophysics Data System (ADS)

    He, Jun; Gao, Feng; Meng, Xiangdun; Guo, Weizhong

    2015-07-01

    Parallel mechanisms is used in the large capacity servo press to avoid the over-constraint of the traditional redundant actuation. Currently, the researches mainly focus on the performance analysis for some specific parallel press mechanisms. However, the type synthesis and evaluation of parallel press mechanisms is seldom studied, especially for the four degrees of freedom(DOF) press mechanisms. The type synthesis of 4-DOF parallel press mechanisms is carried out based on the generalized function(GF) set theory. Five design criteria of 4-DOF parallel press mechanisms are firstly proposed. The general procedure of type synthesis of parallel press mechanisms is obtained, which includes number synthesis, symmetrical synthesis of constraint GF sets, decomposition of motion GF sets and design of limbs. Nine combinations of constraint GF sets of 4-DOF parallel press mechanisms, ten combinations of GF sets of active limbs, and eleven combinations of GF sets of passive limbs are synthesized. Thirty-eight kinds of press mechanisms are presented and then different structures of kinematic limbs are designed. Finally, the geometrical constraint complexity( GCC), kinematic pair complexity( KPC), and type complexity( TC) are proposed to evaluate the press types and the optimal press type is achieved. The general methodologies of type synthesis and evaluation for parallel press mechanism are suggested.

  4. Development of Tokamak Transport Solvers for Stiff Confinement Systems

    NASA Astrophysics Data System (ADS)

    St. John, H. E.; Lao, L. L.; Murakami, M.; Park, J. M.

    2006-10-01

    Leading transport models such as GLF23 [1] and MM95 [2] describe turbulent plasma energy, momentum and particle flows. In order to accommodate existing transport codes and associated solution methods effective diffusivities have to be derived from these turbulent flow models. This can cause significant problems in predicting unique solutions. We have developed a parallel transport code solver, GCNMP, that can accommodate both flow based and diffusivity based confinement models by solving the discretized nonlinear equations using modern Newton, trust region, steepest descent and homotopy methods. We present our latest development efforts, including multiple dynamic grids, application of two-level parallel schemes, and operator splitting techniques that allow us to combine flow based and diffusivity based models in tokamk simulations. 6pt [1] R.E. Waltz, et al., Phys. Plasmas 4, 7 (1997). [2] G. Bateman, et al., Phys. Plasmas 5, 1793 (1998).

  5. Injector Design Tool Improvements: User's manual for FDNS V.4.5

    NASA Technical Reports Server (NTRS)

    Chen, Yen-Sen; Shang, Huan-Min; Wei, Hong; Liu, Jiwen

    1998-01-01

    The major emphasis of the current effort is in the development and validation of an efficient parallel machine computational model, based on the FDNS code, to analyze the fluid dynamics of a wide variety of liquid jet configurations for general liquid rocket engine injection system applications. This model includes physical models for droplet atomization, breakup/coalescence, evaporation, turbulence mixing and gas-phase combustion. Benchmark validation cases for liquid rocket engine chamber combustion conditions will be performed for model validation purpose. Test cases may include shear coaxial, swirl coaxial and impinging injection systems with combinations LOXIH2 or LOXISP-1 propellant injector elements used in rocket engine designs. As a final goal of this project, a well tested parallel CFD performance methodology together with a user's operation description in a final technical report will be reported at the end of the proposed research effort.

  6. High Performance Parallel Processing (HPPP) Finite Element Simulation of Fluid Structure Interactions Final Report CRADA No. TC-0824-94-A

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Couch, R.; Ziegler, D. P.

    This project was a muki-partner CRADA. This was a partnership between Alcoa and LLNL. AIcoa developed a system of numerical simulation modules that provided accurate and efficient threedimensional modeling of combined fluid dynamics and structural response.

  7. Optimizing ion channel models using a parallel genetic algorithm on graphical processors.

    PubMed

    Ben-Shalom, Roy; Aviv, Amit; Razon, Benjamin; Korngreen, Alon

    2012-01-01

    We have recently shown that we can semi-automatically constrain models of voltage-gated ion channels by combining a stochastic search algorithm with ionic currents measured using multiple voltage-clamp protocols. Although numerically successful, this approach is highly demanding computationally, with optimization on a high performance Linux cluster typically lasting several days. To solve this computational bottleneck we converted our optimization algorithm for work on a graphical processing unit (GPU) using NVIDIA's CUDA. Parallelizing the process on a Fermi graphic computing engine from NVIDIA increased the speed ∼180 times over an application running on an 80 node Linux cluster, considerably reducing simulation times. This application allows users to optimize models for ion channel kinetics on a single, inexpensive, desktop "super computer," greatly reducing the time and cost of building models relevant to neuronal physiology. We also demonstrate that the point of algorithm parallelization is crucial to its performance. We substantially reduced computing time by solving the ODEs (Ordinary Differential Equations) so as to massively reduce memory transfers to and from the GPU. This approach may be applied to speed up other data intensive applications requiring iterative solutions of ODEs. Copyright © 2012 Elsevier B.V. All rights reserved.

  8. Conceptual design of a hybrid parallel mechanism for mask exchanging of TMT

    NASA Astrophysics Data System (ADS)

    Wang, Jianping; Zhou, Hongfei; Li, Kexuan; Zhou, Zengxiang; Zhai, Chao

    2015-10-01

    Mask exchange system is an important part of the Multi-Object Broadband Imaging Echellette (MOBIE) on the Thirty Meter Telescope (TMT). To solve the problem of stiffness changing with the gravity vector of the mask exchange system in the MOBIE, the hybrid parallel mechanism design method was introduced into the whole research. By using the characteristics of high stiffness and precision of parallel structure, combined with large moving range of serial structure, a conceptual design of a hybrid parallel mask exchange system based on 3-RPS parallel mechanism was presented. According to the position requirements of the MOBIE, the SolidWorks structure model of the hybrid parallel mask exchange robot was established and the appropriate installation position without interfering with the related components and light path in the MOBIE of TMT was analyzed. Simulation results in SolidWorks suggested that 3-RPS parallel platform had good stiffness property in different gravity vector directions. Furthermore, through the research of the mechanism theory, the inverse kinematics solution of the 3-RPS parallel platform was calculated and the mathematical relationship between the attitude angle of moving platform and the angle of ball-hinges on the moving platform was established, in order to analyze the attitude adjustment ability of the hybrid parallel mask exchange robot. The proposed conceptual design has some guiding significance for the design of mask exchange system of the MOBIE on TMT.

  9. The impact of communications on the self-regulation of health beliefs, decisions, and behavior.

    PubMed

    Leventhal, H; Safer, M A; Panagis, D M

    1983-01-01

    The models used in the study of communication and health behavior have changed from those describing how to impose health actions on relatively passive respondents to models describing how respondents regulate their own health practices. We have traced the change from the fear-drive model, which described how fear induced change, to the parallel response model, which described how subjects processed information and generated coping responses to solve the problem posed by both the objective health threat and by their subjective fear. The data supporting this change showed that increasing fear led to more favorable attitudes but that fear alone was insufficient to create action: Specific action instructions had to be added to both high and low fear and both combinations produced the same level of health action. Neither the data nor the parallel model specified what subjects learned about the threat that made exposure to a high or low fear message necessary for behavior change. The parallel response model has been elaborated into a more complete systems model and new studies show how health threats are represented. They have found attributes such as IDENTITY (label and symptoms), CAUSES, TIME LINES or duration, and CONSEQUENCES, that set goals and criteria to generate and evaluate problem solving (coping) behavior. Suggestions are made for applying this more complete model to public health practice.

  10. Empirical valence bond models for reactive potential energy surfaces: a parallel multilevel genetic program approach.

    PubMed

    Bellucci, Michael A; Coker, David F

    2011-07-28

    We describe a new method for constructing empirical valence bond potential energy surfaces using a parallel multilevel genetic program (PMLGP). Genetic programs can be used to perform an efficient search through function space and parameter space to find the best functions and sets of parameters that fit energies obtained by ab initio electronic structure calculations. Building on the traditional genetic program approach, the PMLGP utilizes a hierarchy of genetic programming on two different levels. The lower level genetic programs are used to optimize coevolving populations in parallel while the higher level genetic program (HLGP) is used to optimize the genetic operator probabilities of the lower level genetic programs. The HLGP allows the algorithm to dynamically learn the mutation or combination of mutations that most effectively increase the fitness of the populations, causing a significant increase in the algorithm's accuracy and efficiency. The algorithm's accuracy and efficiency is tested against a standard parallel genetic program with a variety of one-dimensional test cases. Subsequently, the PMLGP is utilized to obtain an accurate empirical valence bond model for proton transfer in 3-hydroxy-gamma-pyrone in gas phase and protic solvent. © 2011 American Institute of Physics

  11. The island dynamics model on parallel quadtree grids

    NASA Astrophysics Data System (ADS)

    Mistani, Pouria; Guittet, Arthur; Bochkov, Daniil; Schneider, Joshua; Margetis, Dionisios; Ratsch, Christian; Gibou, Frederic

    2018-05-01

    We introduce an approach for simulating epitaxial growth by use of an island dynamics model on a forest of quadtree grids, and in a parallel environment. To this end, we use a parallel framework introduced in the context of the level-set method. This framework utilizes: discretizations that achieve a second-order accurate level-set method on non-graded adaptive Cartesian grids for solving the associated free boundary value problem for surface diffusion; and an established library for the partitioning of the grid. We consider the cases with: irreversible aggregation, which amounts to applying Dirichlet boundary conditions at the island boundary; and an asymmetric (Ehrlich-Schwoebel) energy barrier for attachment/detachment of atoms at the island boundary, which entails the use of a Robin boundary condition. We provide the scaling analyses performed on the Stampede supercomputer and numerical examples that illustrate the capability of our methodology to efficiently simulate different aspects of epitaxial growth. The combination of adaptivity and parallelism in our approach enables simulations that are several orders of magnitude faster than those reported in the recent literature and, thus, provides a viable framework for the systematic study of mound formation on crystal surfaces.

  12. Parallel demodulation system and signal-processing method for extrinsic Fabry-Perot interferometer and fiber Bragg grating sensors.

    PubMed

    Jiang, Junfeng; Liu, Tiegen; Zhang, Yimo; Liu, Lina; Zha, Ying; Zhang, Fan; Wang, Yunxin; Long, Pin

    2005-03-15

    A parallel demodulation system for extrinsic Fabry-Perot interferometer (EFPI) and fiber Bragg grating (FBG) sensors is presented that is based on a Michelson interferometer and combines the methods of low-coherence interference and Fourier transform spectrum. Signals from EFPI and FBG sensors are obtained simultaneously by scanning one arm of a Michelson interferometer, and an algorithm model is established to process the signals and retrieve both the wavelength of the FBG and the cavity length of the EFPI at the same time, which are then used to determine the strain and temperature.

  13. Simulation of Hypervelocity Impact on Aluminum-Nextel-Kevlar Orbital Debris Shields

    NASA Technical Reports Server (NTRS)

    Fahrenthold, Eric P.

    2000-01-01

    An improved hybrid particle-finite element method has been developed for hypervelocity impact simulation. The method combines the general contact-impact capabilities of particle codes with the true Lagrangian kinematics of large strain finite element formulations. Unlike some alternative schemes which couple Lagrangian finite element models with smooth particle hydrodynamics, the present formulation makes no use of slidelines or penalty forces. The method has been implemented in a parallel, three dimensional computer code. Simulations of three dimensional orbital debris impact problems using this parallel hybrid particle-finite element code, show good agreement with experiment and good speedup in parallel computation. The simulations included single and multi-plate shields as well as aluminum and composite shielding materials. at an impact velocity of eleven kilometers per second.

  14. 2D-RBUC for efficient parallel compression of residuals

    NASA Astrophysics Data System (ADS)

    Đurđević, Đorđe M.; Tartalja, Igor I.

    2018-02-01

    In this paper, we present a method for lossless compression of residuals with an efficient SIMD parallel decompression. The residuals originate from lossy or near lossless compression of height fields, which are commonly used to represent models of terrains. The algorithm is founded on the existing RBUC method for compression of non-uniform data sources. We have adapted the method to capture 2D spatial locality of height fields, and developed the data decompression algorithm for modern GPU architectures already present even in home computers. In combination with the point-level SIMD-parallel lossless/lossy high field compression method HFPaC, characterized by fast progressive decompression and seamlessly reconstructed surface, the newly proposed method trades off small efficiency degradation for a non negligible compression ratio (measured up to 91%) benefit.

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Turner, A.; Davis, A.; University of Wisconsin-Madison, Madison, WI 53706

    CCFE perform Monte-Carlo transport simulations on large and complex tokamak models such as ITER. Such simulations are challenging since streaming and deep penetration effects are equally important. In order to make such simulations tractable, both variance reduction (VR) techniques and parallel computing are used. It has been found that the application of VR techniques in such models significantly reduces the efficiency of parallel computation due to 'long histories'. VR in MCNP can be accomplished using energy-dependent weight windows. The weight window represents an 'average behaviour' of particles, and large deviations in the arriving weight of a particle give rise tomore » extreme amounts of splitting being performed and a long history. When running on parallel clusters, a long history can have a detrimental effect on the parallel efficiency - if one process is computing the long history, the other CPUs complete their batch of histories and wait idle. Furthermore some long histories have been found to be effectively intractable. To combat this effect, CCFE has developed an adaptation of MCNP which dynamically adjusts the WW where a large weight deviation is encountered. The method effectively 'de-optimises' the WW, reducing the VR performance but this is offset by a significant increase in parallel efficiency. Testing with a simple geometry has shown the method does not bias the result. This 'long history method' has enabled CCFE to significantly improve the performance of MCNP calculations for ITER on parallel clusters, and will be beneficial for any geometry combining streaming and deep penetration effects. (authors)« less

  16. An Integrated Approach to Locality-Conscious Processor Allocation and Scheduling of Mixed-Parallel Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vydyanathan, Naga; Krishnamoorthy, Sriram; Sabin, Gerald M.

    2009-08-01

    Complex parallel applications can often be modeled as directed acyclic graphs of coarse-grained application-tasks with dependences. These applications exhibit both task- and data-parallelism, and combining these two (also called mixedparallelism), has been shown to be an effective model for their execution. In this paper, we present an algorithm to compute the appropriate mix of task- and data-parallelism required to minimize the parallel completion time (makespan) of these applications. In other words, our algorithm determines the set of tasks that should be run concurrently and the number of processors to be allocated to each task. The processor allocation and scheduling decisionsmore » are made in an integrated manner and are based on several factors such as the structure of the taskgraph, the runtime estimates and scalability characteristics of the tasks and the inter-task data communication volumes. A locality conscious scheduling strategy is used to improve inter-task data reuse. Evaluation through simulations and actual executions of task graphs derived from real applications as well as synthetic graphs shows that our algorithm consistently generates schedules with lower makespan as compared to CPR and CPA, two previously proposed scheduling algorithms. Our algorithm also produces schedules that have lower makespan than pure taskand data-parallel schedules. For task graphs with known optimal schedules or lower bounds on the makespan, our algorithm generates schedules that are closer to the optima than other scheduling approaches.« less

  17. Modeling complex tone perception: grouping harmonics with combination-sensitive neurons.

    PubMed

    Medvedev, Andrei V; Chiao, Faye; Kanwal, Jagmeet S

    2002-06-01

    Perception of complex communication sounds is a major function of the auditory system. To create a coherent precept of these sounds the auditory system may instantaneously group or bind multiple harmonics within complex sounds. This perception strategy simplifies further processing of complex sounds and facilitates their meaningful integration with other sensory inputs. Based on experimental data and a realistic model, we propose that associative learning of combinations of harmonic frequencies and nonlinear facilitation of responses to those combinations, also referred to as "combination-sensitivity," are important for spectral grouping. For our model, we simulated combination sensitivity using Hebbian and associative types of synaptic plasticity in auditory neurons. We also provided a parallel tonotopic input that converges and diverges within the network. Neurons in higher-order layers of the network exhibited an emergent property of multifrequency tuning that is consistent with experimental findings. Furthermore, this network had the capacity to "recognize" the pitch or fundamental frequency of a harmonic tone complex even when the fundamental frequency itself was missing.

  18. Mapping behavioral landscapes for animal movement: a finite mixture modeling approach

    USGS Publications Warehouse

    Tracey, Jeff A.; Zhu, Jun; Boydston, Erin E.; Lyren, Lisa M.; Fisher, Robert N.; Crooks, Kevin R.

    2013-01-01

    Because of its role in many ecological processes, movement of animals in response to landscape features is an important subject in ecology and conservation biology. In this paper, we develop models of animal movement in relation to objects or fields in a landscape. We take a finite mixture modeling approach in which the component densities are conceptually related to different choices for movement in response to a landscape feature, and the mixing proportions are related to the probability of selecting each response as a function of one or more covariates. We combine particle swarm optimization and an Expectation-Maximization (EM) algorithm to obtain maximum likelihood estimates of the model parameters. We use this approach to analyze data for movement of three bobcats in relation to urban areas in southern California, USA. A behavioral interpretation of the models revealed similarities and differences in bobcat movement response to urbanization. All three bobcats avoided urbanization by moving either parallel to urban boundaries or toward less urban areas as the proportion of urban land cover in the surrounding area increased. However, one bobcat, a male with a dispersal-like large-scale movement pattern, avoided urbanization at lower densities and responded strictly by moving parallel to the urban edge. The other two bobcats, which were both residents and occupied similar geographic areas, avoided urban areas using a combination of movements parallel to the urban edge and movement toward areas of less urbanization. However, the resident female appeared to exhibit greater repulsion at lower levels of urbanization than the resident male, consistent with empirical observations of bobcats in southern California. Using the parameterized finite mixture models, we mapped behavioral states to geographic space, creating a representation of a behavioral landscape. This approach can provide guidance for conservation planning based on analysis of animal movement data using statistical models, thereby linking connectivity evaluations to empirical data.

  19. Finite Element Modeling of Coupled Flexible Multibody Dynamics and Liquid Sloshing

    DTIC Science & Technology

    2006-09-01

    tanks is presented. The semi-discrete combined solid and fluid equations of motions are integrated using a time- accurate parallel explicit solver...Incompressible fluid flow in a moving/deforming container including accurate modeling of the free-surface, turbulence, and viscous effects ...paper, a single computational code which uses a time- accurate explicit solution procedure is used to solve both the solid and fluid equations of

  20. Compartment models of the diffusion MR signal in brain white matter: a taxonomy and comparison.

    PubMed

    Panagiotaki, Eleftheria; Schneider, Torben; Siow, Bernard; Hall, Matt G; Lythgoe, Mark F; Alexander, Daniel C

    2012-02-01

    This paper aims to identify the minimum requirements for an accurate model of the diffusion MR signal in white matter of the brain. We construct a taxonomy of multi-compartment models of white matter from combinations of simple models for the intra- and the extra-axonal spaces. We devise a new diffusion MRI protocol that provides measurements with a wide range of imaging parameters for diffusion sensitization both parallel and perpendicular to white matter fibres. We use the protocol to acquire data from two fixed rat brains, which allows us to fit, study and compare the different models. The study examines a total of 47 analytic models, including several well-used models from the literature, which we place within the taxonomy. The results show that models that incorporate intra-axonal restriction, such as ball and stick or CHARMED, generally explain the data better than those that do not, such as the DT or the biexponential models. However, three-compartment models which account for restriction parallel to the axons and incorporate pore size explain the measurements most accurately. The best fit comes from combining a full diffusion tensor (DT) model of the extra-axonal space with a cylindrical intra-axonal component of single radius and a third spherical compartment of non-zero radius. We also measure the stability of the non-zero radius intra-axonal models and find that single radius intra-axonal models are more stable than gamma distributed radii models with similar fitting performance. Copyright © 2011 Elsevier Inc. All rights reserved.

  1. Series and parallel arc-fault circuit interrupter tests.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Johnson, Jay Dean; Fresquez, Armando J.; Gudgel, Bob

    2013-07-01

    While the 2011 National Electrical Codeª (NEC) only requires series arc-fault protection, some arc-fault circuit interrupter (AFCI) manufacturers are designing products to detect and mitigate both series and parallel arc-faults. Sandia National Laboratories (SNL) has extensively investigated the electrical differences of series and parallel arc-faults and has offered possible classification and mitigation solutions. As part of this effort, Sandia National Laboratories has collaborated with MidNite Solar to create and test a 24-string combiner box with an AFCI which detects, differentiates, and de-energizes series and parallel arc-faults. In the case of the MidNite AFCI prototype, series arc-faults are mitigated by openingmore » the PV strings, whereas parallel arc-faults are mitigated by shorting the array. A range of different experimental series and parallel arc-fault tests with the MidNite combiner box were performed at the Distributed Energy Technologies Laboratory (DETL) at SNL in Albuquerque, NM. In all the tests, the prototype de-energized the arc-faults in the time period required by the arc-fault circuit interrupt testing standard, UL 1699B. The experimental tests confirm series and parallel arc-faults can be successfully mitigated with a combiner box-integrated solution.« less

  2. Development and parallelization of a direct numerical simulation to study the formation and transport of nanoparticle clusters in a viscous fluid

    NASA Astrophysics Data System (ADS)

    Sloan, Gregory James

    The direct numerical simulation (DNS) offers the most accurate approach to modeling the behavior of a physical system, but carries an enormous computation cost. There exists a need for an accurate DNS to model the coupled solid-fluid system seen in targeted drug delivery (TDD), nanofluid thermal energy storage (TES), as well as other fields where experiments are necessary, but experiment design may be costly. A parallel DNS can greatly reduce the large computation times required, while providing the same results and functionality of the serial counterpart. A D2Q9 lattice Boltzmann method approach was implemented to solve the fluid phase. The use of domain decomposition with message passing interface (MPI) parallelism resulted in an algorithm that exhibits super-linear scaling in testing, which may be attributed to the caching effect. Decreased performance on a per-node basis for a fixed number of processes confirms this observation. A multiscale approach was implemented to model the behavior of nanoparticles submerged in a viscous fluid, and used to examine the mechanisms that promote or inhibit clustering. Parallelization of this model using a masterworker algorithm with MPI gives less-than-linear speedup for a fixed number of particles and varying number of processes. This is due to the inherent inefficiency of the master-worker approach. Lastly, these separate simulations are combined, and two-way coupling is implemented between the solid and fluid.

  3. Modeling and Validation of Power-split and P2 Parallel Hybrid Electric Vehicles SAE 2013-01-1470)

    EPA Science Inventory

    The Advanced Light-Duty Powertrain and Hybrid Analysis tool was created by EPA to evaluate the Greenhouse Gas (GHG) emissions of Light-Duty (LD) vehicles. It is a physics-based, forward-looking, full vehicle computer simulator capable of analyzing various vehicle types combined ...

  4. School Principals' Opinions on In-Class Inspections

    ERIC Educational Resources Information Center

    Kayikci, Kemal; Sahin, Ahmet; Canturk, Gokhan

    2016-01-01

    The aim of this research is to determine school principals' opinions on the in-class inspections carried out by inspectors of the Ministry of National Education of Turkey (MEB). The study was modeled as a convergent parallel design, one of the mixed methods which combined qualitative and quantitative methods. For data collection, the researchers…

  5. LTE modeling of inhomogeneous chromospheric structure using high-resolution limb observations

    NASA Technical Reports Server (NTRS)

    Lindsey, C.

    1987-01-01

    The paper discusses considerations relevant to LTE modeling of rough atmospheres. Particular attention is given to the application of recent high-resolution observations of the solar limb in the far-infrared and radio continuum to the modeling of chromospheric spicules. It is explained how the continuum limb observations can be combined with morphological knowledge of spicule structure to model the physical conditions in chromospheric spicules. This discussion forms the basis for a chromospheric model presented in a parallel publication based on observations ranging from 100 microns to 2.6 mm.

  6. Nerve Fiber Activation During Peripheral Nerve Field Stimulation: Importance of Electrode Orientation and Estimation of Area of Paresthesia.

    PubMed

    Frahm, Ken Steffen; Hennings, Kristian; Vera-Portocarrero, Louis; Wacnik, Paul W; Mørch, Carsten Dahl

    2016-04-01

    Low back pain is one of the indications for using peripheral nerve field stimulation (PNFS). However, the effect of PNFS varies between patients; several stimulation parameters have not been investigated in depth, such as orientation of the nerve fiber in relation to the electrode. While placing the electrode parallel to the nerve fiber may give lower activation thresholds, anodal blocking may occur when the propagating action potential passes an anode. A finite element model was used to simulate the extracellular potential during PNFS. This was combined with an active cable model of Aβ and Aδ nerve fibers. It was investigated how the angle between the nerve fiber and electrode affected the nerve activation and whether anodal blocking could occur. Finally, the area of paresthesia was estimated and compared with any concomitant Aδ fiber activation. The lowest threshold was found when nerve and electrode were in parallel, and that anodal blocking did not appear to occur during PNFS. The activation of Aβ fibers was within therapeutic range (<10V) of PNFS; however, within this range, Aδ fiber activation also may occur. The combined area of activated Aβ fibers (paresthesia) was at least two times larger than Aδ fibers for similar stimulation intensities. No evidence of anodal blocking was observed in this PNFS model. The thresholds were lowest when the nerves and electrodes were parallel; thus, it may be relevant to investigate the overall position of the target nerve fibers prior to electrode placement. © 2015 International Neuromodulation Society.

  7. On the Takayanagi principle for the shape memory effect and thermomechanical behaviors in polymers with multi-phases

    NASA Astrophysics Data System (ADS)

    Lu, Haibao; Yu, Kai; Huang, Wei Min; Leng, Jinsong

    2016-12-01

    We present an explicit model to study the mechanics and physics of the shape memory effect (SME) in polymers based on the Takayanagi principle. The molecular structural characteristics and elastic behavior of shape memory polymers (SMPs) with multi-phases are investigated in terms of the thermomechanical properties of the individual components, of which the contributions are combined by using Takayanagi’s series-parallel model and parallel-series model, respectively. After that, Boltzmann superposition principle is employed to couple the multi-SME, elastic modulus parameter (E) and temperature parameter (T) in SMPs. Furthermore, the extended Takayanagi model is proposed to separate the plasticizing effect and physical swelling effect on the thermo-/chemo-responsive SME in polymers and then compared with the available experimental data reported in the literature. This study is expected to provide a powerful simulation tool for modeling and experimental substantiation of the mechanics and working mechanism of SME in polymers.

  8. A solution to neural field equations by a recurrent neural network method

    NASA Astrophysics Data System (ADS)

    Alharbi, Abir

    2012-09-01

    Neural field equations (NFE) are used to model the activity of neurons in the brain, it is introduced from a single neuron 'integrate-and-fire model' starting point. The neural continuum is spatially discretized for numerical studies, and the governing equations are modeled as a system of ordinary differential equations. In this article the recurrent neural network approach is used to solve this system of ODEs. This consists of a technique developed by combining the standard numerical method of finite-differences with the Hopfield neural network. The architecture of the net, energy function, updating equations, and algorithms are developed for the NFE model. A Hopfield Neural Network is then designed to minimize the energy function modeling the NFE. Results obtained from the Hopfield-finite-differences net show excellent performance in terms of accuracy and speed. The parallelism nature of the Hopfield approaches may make them easier to implement on fast parallel computers and give them the speed advantage over the traditional methods.

  9. Acoustic Parametric Array for Identifying Standoff Targets

    NASA Astrophysics Data System (ADS)

    Hinders, M. K.; Rudd, K. E.

    2010-02-01

    An integrated simulation method for investigating nonlinear sound beams and 3D acoustic scattering from any combination of complicated objects is presented. A standard finite-difference simulation method is used to model pulsed nonlinear sound propagation from a source to a scattering target via the KZK equation. Then, a parallel 3D acoustic simulation method based on the finite integration technique is used to model the acoustic wave interaction with the target. Any combination of objects and material layers can be placed into the 3D simulation space to study the resulting interaction. Several example simulations are presented to demonstrate the simulation method and 3D visualization techniques. The combined simulation method is validated by comparing experimental and simulation data and a demonstration of how this combined simulation method assisted in the development of a nonlinear acoustic concealed weapons detector is also presented.

  10. Constructing Neuronal Network Models in Massively Parallel Environments.

    PubMed

    Ippen, Tammo; Eppler, Jochen M; Plesser, Hans E; Diesmann, Markus

    2017-01-01

    Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers.

  11. Constructing Neuronal Network Models in Massively Parallel Environments

    PubMed Central

    Ippen, Tammo; Eppler, Jochen M.; Plesser, Hans E.; Diesmann, Markus

    2017-01-01

    Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers. PMID:28559808

  12. Tensor contraction engine: Abstraction and automated parallel implementation of configuration-interaction, coupled-cluster, and many-body perturbation theories

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hirata, So

    2003-11-20

    We develop a symbolic manipulation program and program generator (Tensor Contraction Engine or TCE) that automatically derives the working equations of a well-defined model of second-quantized many-electron theories and synthesizes efficient parallel computer programs on the basis of these equations. Provided an ansatz of a many-electron theory model, TCE performs valid contractions of creation and annihilation operators according to Wick's theorem, consolidates identical terms, and reduces the expressions into the form of multiple tensor contractions acted by permutation operators. Subsequently, it determines the binary contraction order for each multiple tensor contraction with the minimal operation and memory cost, factorizes commonmore » binary contractions (defines intermediate tensors), and identifies reusable intermediates. The resulting ordered list of binary tensor contractions, additions, and index permutations is translated into an optimized program that is combined with the NWChem and UTChem computational chemistry software packages. The programs synthesized by TCE take advantage of spin symmetry, Abelian point-group symmetry, and index permutation symmetry at every stage of calculations to minimize the number of arithmetic operations and storage requirement, adjust the peak local memory usage by index range tiling, and support parallel I/O interfaces and dynamic load balancing for parallel executions. We demonstrate the utility of TCE through automatic derivation and implementation of parallel programs for various models of configuration-interaction theory (CISD, CISDT, CISDTQ), many-body perturbation theory [MBPT(2), MBPT(3), MBPT(4)], and coupled-cluster theory (LCCD, CCD, LCCSD, CCSD, QCISD, CCSDT, and CCSDTQ).« less

  13. Enhanced Sampling in the Well-Tempered Ensemble

    NASA Astrophysics Data System (ADS)

    Bonomi, M.; Parrinello, M.

    2010-05-01

    We introduce the well-tempered ensemble (WTE) which is the biased ensemble sampled by well-tempered metadynamics when the energy is used as collective variable. WTE can be designed so as to have approximately the same average energy as the canonical ensemble but much larger fluctuations. These two properties lead to an extremely fast exploration of phase space. An even greater efficiency is obtained when WTE is combined with parallel tempering. Unbiased Boltzmann averages are computed on the fly by a recently developed reweighting method [M. Bonomi , J. Comput. Chem. 30, 1615 (2009)JCCHDD0192-865110.1002/jcc.21305]. We apply WTE and its parallel tempering variant to the 2d Ising model and to a Gō model of HIV protease, demonstrating in these two representative cases that convergence is accelerated by orders of magnitude.

  14. Enhanced sampling in the well-tempered ensemble.

    PubMed

    Bonomi, M; Parrinello, M

    2010-05-14

    We introduce the well-tempered ensemble (WTE) which is the biased ensemble sampled by well-tempered metadynamics when the energy is used as collective variable. WTE can be designed so as to have approximately the same average energy as the canonical ensemble but much larger fluctuations. These two properties lead to an extremely fast exploration of phase space. An even greater efficiency is obtained when WTE is combined with parallel tempering. Unbiased Boltzmann averages are computed on the fly by a recently developed reweighting method [M. Bonomi, J. Comput. Chem. 30, 1615 (2009)]. We apply WTE and its parallel tempering variant to the 2d Ising model and to a Gō model of HIV protease, demonstrating in these two representative cases that convergence is accelerated by orders of magnitude.

  15. Applying the Extended Parallel Process Model to workplace safety messages.

    PubMed

    Basil, Michael; Basil, Debra; Deshpande, Sameer; Lavack, Anne M

    2013-01-01

    The extended parallel process model (EPPM) proposes fear appeals are most effective when they combine threat and efficacy. Three studies conducted in the workplace safety context examine the use of various EPPM factors and their effects, especially multiplicative effects. Study 1 was a content analysis examining the use of EPPM factors in actual workplace safety messages. Study 2 experimentally tested these messages with 212 construction trainees. Study 3 replicated this experiment with 1,802 men across four English-speaking countries-Australia, Canada, the United Kingdom, and the United States. The results of these three studies (1) demonstrate the inconsistent use of EPPM components in real-world work safety communications, (2) support the necessity of self-efficacy for the effective use of threat, (3) show a multiplicative effect where communication effectiveness is maximized when all model components are present (severity, susceptibility, and efficacy), and (4) validate these findings with gory appeals across four English-speaking countries.

  16. Many-to-one form-to-function mapping weakens parallel morphological evolution.

    PubMed

    Thompson, Cole J; Ahmed, Newaz I; Veen, Thor; Peichel, Catherine L; Hendry, Andrew P; Bolnick, Daniel I; Stuart, Yoel E

    2017-11-01

    Evolutionary ecologists aim to explain and predict evolutionary change under different selective regimes. Theory suggests that such evolutionary prediction should be more difficult for biomechanical systems in which different trait combinations generate the same functional output: "many-to-one mapping." Many-to-one mapping of phenotype to function enables multiple morphological solutions to meet the same adaptive challenges. Therefore, many-to-one mapping should undermine parallel morphological evolution, and hence evolutionary predictability, even when selection pressures are shared among populations. Studying 16 replicate pairs of lake- and stream-adapted threespine stickleback (Gasterosteus aculeatus), we quantified three parts of the teleost feeding apparatus and used biomechanical models to calculate their expected functional outputs. The three feeding structures differed in their form-to-function relationship from one-to-one (lower jaw lever ratio) to increasingly many-to-one (buccal suction index, opercular 4-bar linkage). We tested for (1) weaker linear correlations between phenotype and calculated function, and (2) less parallel evolution across lake-stream pairs, in the many-to-one systems relative to the one-to-one system. We confirm both predictions, thus supporting the theoretical expectation that increasing many-to-one mapping undermines parallel evolution. Therefore, sole consideration of morphological variation within and among populations might not serve as a proxy for functional variation when multiple adaptive trait combinations exist. © 2017 The Author(s). Evolution © 2017 The Society for the Study of Evolution.

  17. ANNarchy: a code generation approach to neural simulations on parallel hardware

    PubMed Central

    Vitay, Julien; Dinkelbach, Helge Ü.; Hamker, Fred H.

    2015-01-01

    Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into an efficient C++code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957

  18. User's guide to the Reliability Estimation System Testbed (REST)

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Palumbo, Daniel L.; Rifkin, Adam

    1992-01-01

    The Reliability Estimation System Testbed is an X-window based reliability modeling tool that was created to explore the use of the Reliability Modeling Language (RML). RML was defined to support several reliability analysis techniques including modularization, graphical representation, Failure Mode Effects Simulation (FMES), and parallel processing. These techniques are most useful in modeling large systems. Using modularization, an analyst can create reliability models for individual system components. The modules can be tested separately and then combined to compute the total system reliability. Because a one-to-one relationship can be established between system components and the reliability modules, a graphical user interface may be used to describe the system model. RML was designed to permit message passing between modules. This feature enables reliability modeling based on a run time simulation of the system wide effects of a component's failure modes. The use of failure modes effects simulation enhances the analyst's ability to correctly express system behavior when using the modularization approach to reliability modeling. To alleviate the computation bottleneck often found in large reliability models, REST was designed to take advantage of parallel processing on hypercube processors.

  19. Parallel image compression

    NASA Technical Reports Server (NTRS)

    Reif, John H.

    1987-01-01

    A parallel compression algorithm for the 16,384 processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless test compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.

  20. Adaptive multi-GPU Exchange Monte Carlo for the 3D Random Field Ising Model

    NASA Astrophysics Data System (ADS)

    Navarro, Cristóbal A.; Huang, Wei; Deng, Youjin

    2016-08-01

    This work presents an adaptive multi-GPU Exchange Monte Carlo approach for the simulation of the 3D Random Field Ising Model (RFIM). The design is based on a two-level parallelization. The first level, spin-level parallelism, maps the parallel computation as optimal 3D thread-blocks that simulate blocks of spins in shared memory with minimal halo surface, assuming a constant block volume. The second level, replica-level parallelism, uses multi-GPU computation to handle the simulation of an ensemble of replicas. CUDA's concurrent kernel execution feature is used in order to fill the occupancy of each GPU with many replicas, providing a performance boost that is more notorious at the smallest values of L. In addition to the two-level parallel design, the work proposes an adaptive multi-GPU approach that dynamically builds a proper temperature set free of exchange bottlenecks. The strategy is based on mid-point insertions at the temperature gaps where the exchange rate is most compromised. The extra work generated by the insertions is balanced across the GPUs independently of where the mid-point insertions were performed. Performance results show that spin-level performance is approximately two orders of magnitude faster than a single-core CPU version and one order of magnitude faster than a parallel multi-core CPU version running on 16-cores. Multi-GPU performance is highly convenient under a weak scaling setting, reaching up to 99 % efficiency as long as the number of GPUs and L increase together. The combination of the adaptive approach with the parallel multi-GPU design has extended our possibilities of simulation to sizes of L = 32 , 64 for a workstation with two GPUs. Sizes beyond L = 64 can eventually be studied using larger multi-GPU systems.

  1. Gas-liquid-liquid three-phase flow pattern and pressure drop in a microfluidic chip: similarities with gas-liquid/liquid-liquid flows.

    PubMed

    Yue, Jun; Rebrov, Evgeny V; Schouten, Jaap C

    2014-05-07

    We report a three-phase slug flow and a parallel-slug flow as two major flow patterns found under the nitrogen-decane-water flow through a glass microfluidic chip which features a long microchannel with a hydraulic diameter of 98 μm connected to a cross-flow mixer. The three-phase slug flow pattern is characterized by a flow of decane droplets containing single elongated nitrogen bubbles, which are separated by water slugs. This flow pattern was observed at a superficial velocity of decane (in the range of about 0.6 to 10 mm s(-1)) typically lower than that of water for a given superficial gas velocity in the range of 30 to 91 mm s(-1). The parallel-slug flow pattern is characterized by a continuous water flow in one part of the channel cross section and a parallel flow of decane with dispersed nitrogen bubbles in the adjacent part of the channel cross section, which was observed at a superficial velocity of decane (in the range of about 2.5 to 40 mm s(-1)) typically higher than that of water for each given superficial gas velocity. The three-phase slug flow can be seen as a superimposition of both decane-water and nitrogen-decane slug flows observed in the chip when the flow of the third phase (viz. nitrogen or water, respectively) was set at zero. The parallel-slug flow can be seen as a superimposition of the decane-water parallel flow and the nitrogen-decane slug flow observed in the chip under the corresponding two-phase flow conditions. In case of small capillary numbers (Ca ≪ 0.1) and Weber numbers (We ≪ 1), the developed two-phase pressure drop model under a slug flow has been extended to obtain a three-phase slug flow model in which the 'nitrogen-in-decane' droplet is assumed as a pseudo-homogeneous droplet with an effective viscosity. The parallel flow and slug flow pressure drop models have been combined to obtain a parallel-slug flow model. The obtained models describe the experimental pressure drop with standard deviations of 8% and 12% for the three-phase slug flow and parallel-slug flow, respectively. An example is given to illustrate the model uses in designing bifurcated microchannels that split the three-phase slug flow for high-throughput processing.

  2. A scheme for solving the plane-plane challenge in force measurements at the nanoscale.

    PubMed

    Siria, Alessandro; Huant, Serge; Auvert, Geoffroy; Comin, Fabio; Chevrier, Joel

    2010-05-19

    Non-contact interaction between two parallel flat surfaces is a central paradigm in sciences. This situation is the starting point for a wealth of different models: the capacitor description in electrostatics, hydrodynamic flow, thermal exchange, the Casimir force, direct contact study, third body confinement such as liquids or films of soft condensed matter. The control of parallelism is so demanding that no versatile single force machine in this geometry has been proposed so far. Using a combination of nanopositioning based on inertial motors, of microcrystal shaping with a focused-ion beam (FIB) and of accurate in situ and real-time control of surface parallelism with X-ray diffraction, we propose here a "gedanken" surface-force machine that should enable one to measure interactions between movable surfaces separated by gaps in the micrometer and nanometer ranges.

  3. Combining points and lines in rectifying satellite images

    NASA Astrophysics Data System (ADS)

    Elaksher, Ahmed F.

    2017-09-01

    The quick advance in remote sensing technologies established the potential to gather accurate and reliable information about the Earth surface using high resolution satellite images. Remote sensing satellite images of less than one-meter pixel size are currently used in large-scale mapping. Rigorous photogrammetric equations are usually used to describe the relationship between the image coordinates and ground coordinates. These equations require the knowledge of the exterior and interior orientation parameters of the image that might not be available. On the other hand, the parallel projection transformation could be used to represent the mathematical relationship between the image-space and objectspace coordinate systems and provides the required accuracy for large-scale mapping using fewer ground control features. This article investigates the differences between point-based and line-based parallel projection transformation models in rectifying satellite images with different resolutions. The point-based parallel projection transformation model and its extended form are presented and the corresponding line-based forms are developed. Results showed that the RMS computed using the point- or line-based transformation models are equivalent and satisfy the requirement for large-scale mapping. The differences between the transformation parameters computed using the point- and line-based transformation models are insignificant. The results showed high correlation between the differences in the ground elevation and the RMS.

  4. A Linguistic Model in Component Oriented Programming

    NASA Astrophysics Data System (ADS)

    Crăciunean, Daniel Cristian; Crăciunean, Vasile

    2016-12-01

    It is a fact that the component-oriented programming, well organized, can bring a large increase in efficiency in the development of large software systems. This paper proposes a model for building software systems by assembling components that can operate independently of each other. The model is based on a computing environment that runs parallel and distributed applications. This paper introduces concepts as: abstract aggregation scheme and aggregation application. Basically, an aggregation application is an application that is obtained by combining corresponding components. In our model an aggregation application is a word in a language.

  5. Serial multiplier arrays for parallel computation

    NASA Technical Reports Server (NTRS)

    Winters, Kel

    1990-01-01

    Arrays of systolic serial-parallel multiplier elements are proposed as an alternative to conventional SIMD mesh serial adder arrays for applications that are multiplication intensive and require few stored operands. The design and operation of a number of multiplier and array configurations featuring locality of connection, modularity, and regularity of structure are discussed. A design methodology combining top-down and bottom-up techniques is described to facilitate development of custom high-performance CMOS multiplier element arrays as well as rapid synthesis of simulation models and semicustom prototype CMOS components. Finally, a differential version of NORA dynamic circuits requiring a single-phase uncomplemented clock signal introduced for this application.

  6. A framework for grand scale parallelization of the combined finite discrete element method in 2d

    NASA Astrophysics Data System (ADS)

    Lei, Z.; Rougier, E.; Knight, E. E.; Munjiza, A.

    2014-09-01

    Within the context of rock mechanics, the Combined Finite-Discrete Element Method (FDEM) has been applied to many complex industrial problems such as block caving, deep mining techniques (tunneling, pillar strength, etc.), rock blasting, seismic wave propagation, packing problems, dam stability, rock slope stability, rock mass strength characterization problems, etc. The reality is that most of these were accomplished in a 2D and/or single processor realm. In this work a hardware independent FDEM parallelization framework has been developed using the Virtual Parallel Machine for FDEM, (V-FDEM). With V-FDEM, a parallel FDEM software can be adapted to different parallel architecture systems ranging from just a few to thousands of cores.

  7. NAS Parallel Benchmark. Results 11-96: Performance Comparison of HPF and MPI Based NAS Parallel Benchmarks. 1.0

    NASA Technical Reports Server (NTRS)

    Saini, Subash; Bailey, David; Chancellor, Marisa K. (Technical Monitor)

    1997-01-01

    High Performance Fortran (HPF), the high-level language for parallel Fortran programming, is based on Fortran 90. HALF was defined by an informal standards committee known as the High Performance Fortran Forum (HPFF) in 1993, and modeled on TMC's CM Fortran language. Several HPF features have since been incorporated into the draft ANSI/ISO Fortran 95, the next formal revision of the Fortran standard. HPF allows users to write a single parallel program that can execute on a serial machine, a shared-memory parallel machine, or a distributed-memory parallel machine. HPF eliminates the complex, error-prone task of explicitly specifying how, where, and when to pass messages between processors on distributed-memory machines, or when to synchronize processors on shared-memory machines. HPF is designed in a way that allows the programmer to code an application at a high level, and then selectively optimize portions of the code by dropping into message-passing or calling tuned library routines as 'extrinsics'. Compilers supporting High Performance Fortran features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR) Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP/2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/ programming model (HPF and MPI (message passing interface)) combinations will be compared, based on latest NAS (NASA Advanced Supercomputing) Parallel Benchmark (NPB) results, thus providing a cross-machine and cross-model comparison. Specifically, HPF based NPB results will be compared with MPI based NPB results to provide perspective on performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors. In addition we would also present NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz) NEC SX-4/32, SGI/CRAY T3E, SGI Origin2000.

  8. Simulating Effects of High Angle of Attack on Turbofan Engine Performance

    NASA Technical Reports Server (NTRS)

    Liu, Yuan; Claus, Russell W.; Litt, Jonathan S.; Guo, Ten-Huei

    2013-01-01

    A method of investigating the effects of high angle of attack (AOA) flight on turbofan engine performance is presented. The methodology involves combining a suite of diverse simulation tools. Three-dimensional, steady-state computational fluid dynamics (CFD) software is used to model the change in performance of a commercial aircraft-type inlet and fan geometry due to various levels of AOA. Parallel compressor theory is then applied to assimilate the CFD data with a zero-dimensional, nonlinear, dynamic turbofan engine model. The combined model shows that high AOA operation degrades fan performance and, thus, negatively impacts compressor stability margins and engine thrust. In addition, the engine response to high AOA conditions is shown to be highly dependent upon the type of control system employed.

  9. Multitasking TORT under UNICOS: Parallel performance models and measurements

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barnett, A.; Azmy, Y.Y.

    1999-09-27

    The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The results of the comparison of parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.

  10. Multitasking TORT Under UNICOS: Parallel Performance Models and Measurements

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Azmy, Y.Y.; Barnett, D.A.

    1999-09-27

    The existing parallel algorithms in the TORT discrete ordinates were updated to function in a UNI-COS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The results of the comparison of parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.

  11. Scalable and balanced dynamic hybrid data assimilation

    NASA Astrophysics Data System (ADS)

    Kauranne, Tuomo; Amour, Idrissa; Gunia, Martin; Kallio, Kari; Lepistö, Ahti; Koponen, Sampsa

    2017-04-01

    Scalability of complex weather forecasting suites is dependent on the technical tools available for implementing highly parallel computational kernels, but to an equally large extent also on the dependence patterns between various components of the suite, such as observation processing, data assimilation and the forecast model. Scalability is a particular challenge for 4D variational assimilation methods that necessarily couple the forecast model into the assimilation process and subject this combination to an inherently serial quasi-Newton minimization process. Ensemble based assimilation methods are naturally more parallel, but large models force ensemble sizes to be small and that results in poor assimilation accuracy, somewhat akin to shooting with a shotgun in a million-dimensional space. The Variational Ensemble Kalman Filter (VEnKF) is an ensemble method that can attain the accuracy of 4D variational data assimilation with a small ensemble size. It achieves this by processing a Gaussian approximation of the current error covariance distribution, instead of a set of ensemble members, analogously to the Extended Kalman Filter EKF. Ensemble members are re-sampled every time a new set of observations is processed from a new approximation of that Gaussian distribution which makes VEnKF a dynamic assimilation method. After this a smoothing step is applied that turns VEnKF into a dynamic Variational Ensemble Kalman Smoother VEnKS. In this smoothing step, the same process is iterated with frequent re-sampling of the ensemble but now using past iterations as surrogate observations until the end result is a smooth and balanced model trajectory. In principle, VEnKF could suffer from similar scalability issues as 4D-Var. However, this can be avoided by isolating the forecast model completely from the minimization process by implementing the latter as a wrapper code whose only link to the model is calling for many parallel and totally independent model runs, all of them implemented as parallel model runs themselves. The only bottleneck in the process is the gathering and scattering of initial and final model state snapshots before and after the parallel runs which requires a very efficient and low-latency communication network. However, the volume of data communicated is small and the intervening minimization steps are only 3D-Var, which means their computational load is negligible compared with the fully parallel model runs. We present example results of scalable VEnKF with the 4D lake and shallow sea model COHERENS, assimilating simultaneously continuous in situ measurements in a single point and infrequent satellite images that cover a whole lake, with the fully scalable VEnKF.

  12. Impedance matching for repetitive high voltage all-solid-state Marx generator and excimer DBD UV sources

    NASA Astrophysics Data System (ADS)

    Wang, Yonggang; Tong, Liqing; Liu, Kefu

    2017-06-01

    The purpose of impedance matching for a Marx generator and DBD lamp is to limit the output current of the Marx generator, provide a large discharge current at ignition, and obtain fast voltage rising/falling edges and large overshoot. In this paper, different impedance matching circuits (series inductor, parallel capacitor, and series inductor combined with parallel capacitor) are analyzed. It demonstrates that a series inductor could limit the Marx current. However, the discharge current is also limited. A parallel capacitor could provide a large discharge current, but the Marx current is also enlarged. A series inductor combined with a parallel capacitor takes full advantage of the inductor and capacitor, and avoids their shortcomings. Therefore, it is a good solution. Experimental results match the theoretical analysis well and show that both the series inductor and parallel capacitor improve the performance of the system. However, the series inductor combined with the parallel capacitor has the best performance. Compared with driving the DBD lamp with a Marx generator directly, an increase of 97.3% in radiant power and an increase of 59.3% in system efficiency are achieved using this matching circuit.

  13. Parallel updating and weighting of multiple spatial maps for visual stability during whole body motion

    PubMed Central

    Medendorp, W. P.

    2015-01-01

    It is known that the brain uses multiple reference frames to code spatial information, including eye-centered and body-centered frames. When we move our body in space, these internal representations are no longer in register with external space, unless they are actively updated. Whether the brain updates multiple spatial representations in parallel, or whether it restricts its updating mechanisms to a single reference frame from which other representations are constructed, remains an open question. We developed an optimal integration model to simulate the updating of visual space across body motion in multiple or single reference frames. To test this model, we designed an experiment in which participants had to remember the location of a briefly presented target while being translated sideways. The behavioral responses were in agreement with a model that uses a combination of eye- and body-centered representations, weighted according to the reliability in which the target location is stored and updated in each reference frame. Our findings suggest that the brain simultaneously updates multiple spatial representations across body motion. Because both representations are kept in sync, they can be optimally combined to provide a more precise estimate of visual locations in space than based on single-frame updating mechanisms. PMID:26490289

  14. Adaptive parallel logic networks

    NASA Technical Reports Server (NTRS)

    Martinez, Tony R.; Vidal, Jacques J.

    1988-01-01

    Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.

  15. Evidence of seasonal variation in longitudinal growth of height in a sample of boys from Stuttgart Carlsschule, 1771-1793, using combined principal component analysis and maximum likelihood principle.

    PubMed

    Lehmann, A; Scheffler, Ch; Hermanussen, M

    2010-02-01

    Recent progress in modelling individual growth has been achieved by combining the principal component analysis and the maximum likelihood principle. This combination models growth even in incomplete sets of data and in data obtained at irregular intervals. We re-analysed late 18th century longitudinal growth of German boys from the boarding school Carlsschule in Stuttgart. The boys, aged 6-23 years, were measured at irregular 3-12 monthly intervals during the period 1771-1793. At the age of 18 years, mean height was 1652 mm, but height variation was large. The shortest boy reached 1474 mm, the tallest 1826 mm. Measured height closely paralleled modelled height, with mean difference of 4 mm, SD 7 mm. Seasonal height variation was found. Low growth rates occurred in spring and high growth rates in summer and autumn. The present study demonstrates that combining the principal component analysis and the maximum likelihood principle enables growth modelling in historic height data also. Copyright (c) 2009 Elsevier GmbH. All rights reserved.

  16. Spatial proximity effects on the excitation of sheath RF voltages by evanescent slow waves in the ion cyclotron range of frequencies

    NASA Astrophysics Data System (ADS)

    Colas, Laurent; Lu, Ling-Feng; Křivská, Alena; Jacquot, Jonathan; Hillairet, Julien; Helou, Walid; Goniche, Marc; Heuraux, Stéphane; Faudot, Eric

    2017-02-01

    We investigate theoretically how sheath radio-frequency (RF) oscillations relate to the spatial structure of the near RF parallel electric field E ∥ emitted by ion cyclotron (IC) wave launchers. We use a simple model of slow wave (SW) evanescence coupled with direct current (DC) plasma biasing via sheath boundary conditions in a 3D parallelepiped filled with homogeneous cold magnetized plasma. Within a ‘wide-sheath’ asymptotic regime, valid for large-amplitude near RF fields, the RF part of this simple RF  +  DC model becomes linear: the sheath oscillating voltage V RF at open field line boundaries can be re-expressed as a linear combination of individual contributions by every emitting point in the input field map. SW evanescence makes individual contributions all the larger as the wave emission point is located closer to the sheath walls. The decay of |V RF| with the emission point/sheath poloidal distance involves the transverse SW evanescence length and the radial protrusion depth of lateral boundaries. The decay of |V RF| with the emitter/sheath parallel distance is quantified as a function of the parallel SW evanescence length and the parallel connection length of open magnetic field lines. For realistic geometries and target SOL plasmas, poloidal decay occurs over a few centimeters. Typical parallel decay lengths for |V RF| are found to be smaller than IC antenna parallel extension. Oscillating sheath voltages at IC antenna side limiters are therefore mainly sensitive to E ∥ emission by active or passive conducting elements near these limiters, as suggested by recent experimental observations. Parallel proximity effects could also explain why sheath oscillations persist with antisymmetric strap toroidal phasing, despite the parallel antisymmetry of the radiated field map. They could finally justify current attempts at reducing the RF fields induced near antenna boxes to attenuate sheath oscillations in their vicinity.

  17. Event-based hydrological modeling for detecting dominant hydrological process and suitable model strategy for semi-arid catchments

    NASA Astrophysics Data System (ADS)

    Huang, Pengnian; Li, Zhijia; Chen, Ji; Li, Qiaoling; Yao, Cheng

    2016-11-01

    To simulate the hydrological processes in semi-arid areas properly is still challenging. This study assesses the impact of different modeling strategies on simulating flood processes in semi-arid catchments. Four classic hydrological models, TOPMODEL, XINANJIANG (XAJ), SAC-SMA and TANK, were selected and applied to three semi-arid catchments in North China. Based on analysis and comparison of the simulation results of these classic models, four new flexible models were constructed and used to further investigate the suitability of various modeling strategies for semi-arid environments. Numerical experiments were also designed to examine the performances of the models. The results show that in semi-arid catchments a suitable model needs to include at least one nonlinear component to simulate the main process of surface runoff generation. If there are more than two nonlinear components in the hydrological model, they should be arranged in parallel, rather than in series. In addition, the results show that the parallel nonlinear components should be combined by multiplication rather than addition. Moreover, this study reveals that the key hydrological process over semi-arid catchments is the infiltration excess surface runoff, a non-linear component.

  18. TOMO3D: 3-D joint refraction and reflection traveltime tomography parallel code for active-source seismic data—synthetic test

    NASA Astrophysics Data System (ADS)

    Meléndez, A.; Korenaga, J.; Sallarès, V.; Miniussi, A.; Ranero, C. R.

    2015-10-01

    We present a new 3-D traveltime tomography code (TOMO3D) for the modelling of active-source seismic data that uses the arrival times of both refracted and reflected seismic phases to derive the velocity distribution and the geometry of reflecting boundaries in the subsurface. This code is based on its popular 2-D version TOMO2D from which it inherited the methods to solve the forward and inverse problems. The traveltime calculations are done using a hybrid ray-tracing technique combining the graph and bending methods. The LSQR algorithm is used to perform the iterative regularized inversion to improve the initial velocity and depth models. In order to cope with an increased computational demand due to the incorporation of the third dimension, the forward problem solver, which takes most of the run time (˜90 per cent in the test presented here), has been parallelized with a combination of multi-processing and message passing interface standards. This parallelization distributes the ray-tracing and traveltime calculations among available computational resources. The code's performance is illustrated with a realistic synthetic example, including a checkerboard anomaly and two reflectors, which simulates the geometry of a subduction zone. The code is designed to invert for a single reflector at a time. A data-driven layer-stripping strategy is proposed for cases involving multiple reflectors, and it is tested for the successive inversion of the two reflectors. Layers are bound by consecutive reflectors, and an initial velocity model for each inversion step incorporates the results from previous steps. This strategy poses simpler inversion problems at each step, allowing the recovery of strong velocity discontinuities that would otherwise be smoothened.

  19. Implementation of parallel transmit beamforming using orthogonal frequency division multiplexing--achievable resolution and interbeam interference.

    PubMed

    Demi, Libertario; Viti, Jacopo; Kusters, Lieneke; Guidi, Francesco; Tortoli, Piero; Mischi, Massimo

    2013-11-01

    The speed of sound in the human body limits the achievable data acquisition rate of pulsed ultrasound scanners. To overcome this limitation, parallel beamforming techniques are used in ultrasound 2-D and 3-D imaging systems. Different parallel beamforming approaches have been proposed. They may be grouped into two major categories: parallel beamforming in reception and parallel beamforming in transmission. The first category is not optimal for harmonic imaging; the second category may be more easily applied to harmonic imaging. However, inter-beam interference represents an issue. To overcome these shortcomings and exploit the benefit of combining harmonic imaging and high data acquisition rate, a new approach has been recently presented which relies on orthogonal frequency division multiplexing (OFDM) to perform parallel beamforming in transmission. In this paper, parallel transmit beamforming using OFDM is implemented for the first time on an ultrasound scanner. An advanced open platform for ultrasound research is used to investigate the axial resolution and interbeam interference achievable with parallel transmit beamforming using OFDM. Both fundamental and second-harmonic imaging modalities have been considered. Results show that, for fundamental imaging, axial resolution in the order of 2 mm can be achieved in combination with interbeam interference in the order of -30 dB. For second-harmonic imaging, axial resolution in the order of 1 mm can be achieved in combination with interbeam interference in the order of -35 dB.

  20. Numerical study of the stress-strain state of reinforced plate on an elastic foundation by the Bubnov-Galerkin method

    NASA Astrophysics Data System (ADS)

    Beskopylny, Alexey; Kadomtseva, Elena; Strelnikov, Grigory

    2017-10-01

    The stress-strain state of a rectangular slab resting on an elastic foundation is considered. The slab material is isotropic. The slab has stiffening ribs that directed parallel to both sides of the plate. Solving equations are obtained for determining the deflection for various mechanical and geometric characteristics of the stiffening ribs which are parallel to different sides of the plate, having different rigidity for bending and torsion. The calculation scheme assumes an orthotropic slab having different cylindrical stiffness in two mutually perpendicular directions parallel to the reinforcing ribs. An elastic foundation is adopted by Winkler model. To determine the deflection the Bubnov-Galerkin method is used. The deflection is taken in the form of an expansion in a series with unknown coefficients by special polynomials, which are a combination of Legendre polynomials.

  1. Function-based payment model for inpatient medical rehabilitation: an evaluation.

    PubMed

    Sutton, J P; DeJong, G; Wilkerson, D

    1996-07-01

    To describe the components of a function-based prospective payment model for inpatient medical rehabilitation that parallels diagnosis-related groups (DRGs), to evaluate this model in relation to stakeholder objectives, and to detail the components of a quality of care incentive program that, when combined with this payment model, creates an incentive for provides to maximize functional outcomes. This article describes a conceptual model, involving no data collection or data synthesis. The basic payment model described parallels DRGs. Information on the potential impact of this model on medical rehabilitation is gleaned from the literature evaluating the impact of DRGs. The conceptual model described is evaluated against the results of a Delphi Survey of rehabilitation providers, consumers, policymakers, and researchers previously conducted by members of the research team. The major shortcoming of a function-based prospective payment model for inpatient medical rehabilitation is that it contains no inherent incentive to maximize functional outcomes. Linkage of reimbursement to outcomes, however, by withholding a fixed proportion of the standard FRG payment amount, placing that amount in a "quality of care" pool, and distributing that pool annually among providers whose predesignated, facility-level, case-mix-adjusted outcomes are attained, may be one strategy for maximizing outcome goals.

  2. A cost-benefit model comparing the California Milk Cell Test and Milk Electrical Resistance Test.

    PubMed

    Petzer, Inge-Marie; Karzis, Joanne; Meyer, Isabel A; van der Schans, Theodorus J

    2013-04-24

    The indirect effects of mastitis treatment are often overlooked in cost-benefit analyses, but it may be beneficial for the dairy industry to consider them. The cost of mastitis treatment may increase when the duration of intra-mammary infections are prolonged due to misdiagnosis of host-adapted mastitis. Laboratory diagnosis of mastitis can be costly and time consuming, therefore cow-side tests such as the California Milk Cell Test (CMCT) and Milk Electrical Resistance (MER) need to be utilised to their full potential. The aim of this study was to determine the relative benefit of using these two tests separately and in parallel. This was done using a partial-budget analysis and a cost-benefit model to estimate the benefits and costs of each respective test and the parallel combination thereof. Quarter milk samples (n= 1860) were taken from eight different dairy herds in South Africa. Milk samples were evaluated by means of the CMCT, hand-held MER meter and cyto-microbiological laboratory analysis. After determining the most appropriate cut-off points for the two cow-side tests, the sensitivity and specificity of the CMCT (Se= 1.00, Sp= 0.66), MER (Se= 0.92, Sp= 0.62) and the tests done in parallel (Se= 1.00, Sp= 0.87) were calculated. The input data that were used for partial-budget analysis and in the cost-benefit model were based on South African figures at the time of the study, and on literature. The total estimated financial benefit of correct diagnosis of host-adapted mastitis per cow for the CMCT, MER and the tests done in parallel was R898.73, R518.70 and R1064.67 respectively. This involved taking the expected benefit of a correct test result per cow, the expected cost of an error per cow and the cost of the test into account. The CMCT was shown to be 11%more beneficial than the MER test, whilst using the tests in parallel was shown to be the most beneficial method for evaluating the mastitis-control programme. Therefore, it is recommended that the combined tests should be used strategically in practice to monitor udder health and promote a pro-active udder health approach when dealing with host-adapted pathogens.

  3. Design of k-Space Channel Combination Kernels and Integration with Parallel Imaging

    PubMed Central

    Beatty, Philip J.; Chang, Shaorong; Holmes, James H.; Wang, Kang; Brau, Anja C. S.; Reeder, Scott B.; Brittain, Jean H.

    2014-01-01

    Purpose In this work, a new method is described for producing local k-space channel combination kernels using a small amount of low-resolution multichannel calibration data. Additionally, this work describes how these channel combination kernels can be combined with local k-space unaliasing kernels produced by the calibration phase of parallel imaging methods such as GRAPPA, PARS and ARC. Methods Experiments were conducted to evaluate both the image quality and computational efficiency of the proposed method compared to a channel-by-channel parallel imaging approach with image-space sum-of-squares channel combination. Results Results indicate comparable image quality overall, with some very minor differences seen in reduced field-of-view imaging. It was demonstrated that this method enables a speed up in computation time on the order of 3–16X for 32-channel data sets. Conclusion The proposed method enables high quality channel combination to occur earlier in the reconstruction pipeline, reducing computational and memory requirements for image reconstruction. PMID:23943602

  4. Interactive Visualization of DGA Data Based on Multiple Views

    NASA Astrophysics Data System (ADS)

    Geng, Yujie; Lin, Ying; Ma, Yan; Guo, Zhihong; Gu, Chao; Wang, Mingtao

    2017-01-01

    The commission and operation of dissolved gas analysis (DGA) online monitoring makes up for the weakness of traditional DGA method. However, volume and high-dimensional DGA data brings a huge challenge for monitoring and analysis. In this paper, we present a novel interactive visualization model of DGA data based on multiple views. This model imitates multi-angle analysis by combining parallel coordinates, scatter plot matrix and data table. By offering brush, collaborative filter and focus + context technology, this model provides a convenient and flexible interactive way to analyze and understand the DGA data.

  5. A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data.

    PubMed

    Liang, Faming; Kim, Jinsu; Song, Qifan

    2016-01-01

    Markov chain Monte Carlo (MCMC) methods have proven to be a very powerful tool for analyzing data of complex structures. However, their computer-intensive nature, which typically require a large number of iterations and a complete scan of the full dataset for each iteration, precludes their use for big data analysis. In this paper, we propose the so-called bootstrap Metropolis-Hastings (BMH) algorithm, which provides a general framework for how to tame powerful MCMC methods to be used for big data analysis; that is to replace the full data log-likelihood by a Monte Carlo average of the log-likelihoods that are calculated in parallel from multiple bootstrap samples. The BMH algorithm possesses an embarrassingly parallel structure and avoids repeated scans of the full dataset in iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can be generally more efficient as it can asymptotically integrate the whole data information into a single simulation run. The BMH algorithm is very flexible. Like the Metropolis-Hastings algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems. This is illustrated in the paper by the tempering BMH algorithm, which can be viewed as a combination of parallel tempering and the BMH algorithm. BMH can also be used for model selection and optimization by combining with reversible jump MCMC and simulated annealing, respectively.

  6. A Bootstrap Metropolis–Hastings Algorithm for Bayesian Analysis of Big Data

    PubMed Central

    Kim, Jinsu; Song, Qifan

    2016-01-01

    Markov chain Monte Carlo (MCMC) methods have proven to be a very powerful tool for analyzing data of complex structures. However, their computer-intensive nature, which typically require a large number of iterations and a complete scan of the full dataset for each iteration, precludes their use for big data analysis. In this paper, we propose the so-called bootstrap Metropolis-Hastings (BMH) algorithm, which provides a general framework for how to tame powerful MCMC methods to be used for big data analysis; that is to replace the full data log-likelihood by a Monte Carlo average of the log-likelihoods that are calculated in parallel from multiple bootstrap samples. The BMH algorithm possesses an embarrassingly parallel structure and avoids repeated scans of the full dataset in iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can be generally more efficient as it can asymptotically integrate the whole data information into a single simulation run. The BMH algorithm is very flexible. Like the Metropolis-Hastings algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems. This is illustrated in the paper by the tempering BMH algorithm, which can be viewed as a combination of parallel tempering and the BMH algorithm. BMH can also be used for model selection and optimization by combining with reversible jump MCMC and simulated annealing, respectively. PMID:29033469

  7. Adolescents' Use of Sexually Explicit Internet Material and Their Sexual Attitudes and Behavior: Parallel Development and Directional Effects

    ERIC Educational Resources Information Center

    Doornwaard, Suzan M.; Bickham, David S.; Rich, Michael; ter Bogt, Tom F. M.; van den Eijnden, Regina J. J. M.

    2015-01-01

    Although research has repeatedly demonstrated that adolescents' use of sexually explicit Internet material (SEIM) is related to their endorsement of permissive sexual attitudes and their experience with sexual behavior, it is not clear how linkages between these constructs unfold over time. This study combined 2 types of longitudinal modeling,…

  8. Parallel Energy Transport in Detached DIII-D Divertor Plasmas

    NASA Astrophysics Data System (ADS)

    Leonard, A. W.; Lore, J. D.; Canik, J. M.; McLean, A. G.; Makowski, M. A.

    2017-10-01

    A comparison of experiment and modeling of detached divertor plasmas is examined in the context of parallel energy transport. Experimental estimates of power carried by electron thermal conduction versus plasma convection are experimentally inferred from power balance measurements of radiated power and target plate heat flux combined with Thomson scattering measurements of the Te profile along the divertor leg. Experimental profiles of Te exhibit relatively low gradients with Te < 15 eV from the X-point to the target implying transport dominated by convection. In contrast, fluid modeling with SOLPS produces sharp Te gradients for Te > 3 eV, characteristic of transport dominated by electron conduction through the bulk of the divertor. This discrepancy with experimental transport dominated by convection and modeling by conduction has significant implications for the radiative capacity of divertor plasmas and may explain at least part of the difficulty for fluid modeling to obtain the experimentally observed radiative losses. Comparisons are also made for helium plasmas where the match between experiment and modeling is much better. Work supported by the US DOE under DE-FC02-04ER54698.

  9. Multiscale Modeling of Structurally-Graded Materials Using Discrete Dislocation Plasticity Models and Continuum Crystal Plasticity Models

    NASA Technical Reports Server (NTRS)

    Saether, Erik; Hochhalter, Jacob D.; Glaessgen, Edward H.

    2012-01-01

    A multiscale modeling methodology that combines the predictive capability of discrete dislocation plasticity and the computational efficiency of continuum crystal plasticity is developed. Single crystal configurations of different grain sizes modeled with periodic boundary conditions are analyzed using discrete dislocation plasticity (DD) to obtain grain size-dependent stress-strain predictions. These relationships are mapped into crystal plasticity parameters to develop a multiscale DD/CP model for continuum level simulations. A polycrystal model of a structurally-graded microstructure is developed, analyzed and used as a benchmark for comparison between the multiscale DD/CP model and the DD predictions. The multiscale DD/CP model follows the DD predictions closely up to an initial peak stress and then follows a strain hardening path that is parallel but somewhat offset from the DD predictions. The difference is believed to be from a combination of the strain rate in the DD simulation and the inability of the DD/CP model to represent non-monotonic material response.

  10. Electromagnetic Design of a Magnetically-Coupled Spatial Power Combiner

    NASA Technical Reports Server (NTRS)

    Bulcha, B.; Cataldo, G.; Stevenson, T. R.; U-Yen, K.; Moseley, S. H.; Wollack, E. J.

    2017-01-01

    The design of a two-dimensional beam-combining network employing a parallel-plate superconducting waveguide with a mono-crystalline silicon dielectric is presented. This novel beam-combining network structure employs an array of magnetically coupled antenna elements to achieve high coupling efficiency and full sampling of the intensity distribution while avoiding diffractive losses in the multi-mode region defined by the parallel-plate waveguide. These attributes enable the structures use in realizing compact far-infrared spectrometers for astrophysical and instrumentation applications. When configured with a suitable corporate-feed power-combiner, this fully sampled array can be used to realize a low-sidelobe apodized response without incurring a reduction in coupling efficiency. To control undesired reflections over a wide range of angles in the finite-sized parallel-plate waveguide region, a wideband meta-material electromagnetic absorber structure is implemented. This adiabatic structure absorbs greater than 99 of the power over the 1.7:1 operational band at angles ranging from normal (0 degree) to near parallel (180 degree) incidence. Design, simulations, and application of the device will be presented.

  11. Model-based tomographic reconstruction

    DOEpatents

    Chambers, David H; Lehman, Sean K; Goodman, Dennis M

    2012-06-26

    A model-based approach to estimating wall positions for a building is developed and tested using simulated data. It borrows two techniques from geophysical inversion problems, layer stripping and stacking, and combines them with a model-based estimation algorithm that minimizes the mean-square error between the predicted signal and the data. The technique is designed to process multiple looks from an ultra wideband radar array. The processed signal is time-gated and each section processed to detect the presence of a wall and estimate its position, thickness, and material parameters. The floor plan of a building is determined by moving the array around the outside of the building. In this paper we describe how the stacking and layer stripping algorithms are combined and show the results from a simple numerical example of three parallel walls.

  12. Exploring Machine Learning Techniques For Dynamic Modeling on Future Exascale Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Song, Shuaiwen; Tallent, Nathan R.; Vishnu, Abhinav

    2013-09-23

    Future exascale systems must be optimized for both power and performance at scale in order to achieve DOE’s goal of a sustained petaflop within 20 Megawatts by 2022 [1]. Massive parallelism of the future systems combined with complex memory hierarchies will form a barrier to efficient application and architecture design. These challenges are exacerbated with emerging complex architectures such as GPGPUs and Intel Xeon Phi as parallelism increases orders of magnitude and system power consumption can easily triple or quadruple. Therefore, we need techniques that can reduce the search space for optimization, isolate power-performance bottlenecks, identify root causes for software/hardwaremore » inefficiency, and effectively direct runtime scheduling.« less

  13. GraphReduce: Processing Large-Scale Graphs on Accelerator-Based Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sengupta, Dipanjan; Song, Shuaiwen; Agarwal, Kapil

    2015-11-15

    Recent work on real-world graph analytics has sought to leverage the massive amount of parallelism offered by GPU devices, but challenges remain due to the inherent irregularity of graph algorithms and limitations in GPU-resident memory for storing large graphs. We present GraphReduce, a highly efficient and scalable GPU-based framework that operates on graphs that exceed the device’s internal memory capacity. GraphReduce adopts a combination of edge- and vertex-centric implementations of the Gather-Apply-Scatter programming model and operates on multiple asynchronous GPU streams to fully exploit the high degrees of parallelism in GPUs with efficient graph data movement between the host andmore » device.« less

  14. Parallel Semi-Implicit Spectral Element Atmospheric Model

    NASA Astrophysics Data System (ADS)

    Fournier, A.; Thomas, S.; Loft, R.

    2001-05-01

    The shallow-water equations (SWE) have long been used to test atmospheric-modeling numerical methods. The SWE contain essential wave-propagation and nonlinear effects of more complete models. We present a semi-implicit (SI) improvement of the Spectral Element Atmospheric Model to solve the SWE (SEAM, Taylor et al. 1997, Fournier et al. 2000, Thomas & Loft 2000). SE methods are h-p finite element methods combining the geometric flexibility of size-h finite elements with the accuracy of degree-p spectral methods. Our work suggests that exceptional parallel-computation performance is achievable by a General-Circulation-Model (GCM) dynamical core, even at modest climate-simulation resolutions (>1o). The code derivation involves weak variational formulation of the SWE, Gauss(-Lobatto) quadrature over the collocation points, and Legendre cardinal interpolators. Appropriate weak variation yields a symmetric positive-definite Helmholtz operator. To meet the Ladyzhenskaya-Babuska-Brezzi inf-sup condition and avoid spurious modes, we use a staggered grid. The SI scheme combines leapfrog and Crank-Nicholson schemes for the nonlinear and linear terms respectively. The localization of operations to elements ideally fits the method to cache-based microprocessor computer architectures --derivatives are computed as collections of small (8x8), naturally cache-blocked matrix-vector products. SEAM also has desirable boundary-exchange communication, like finite-difference models. Timings on on the IBM SP and Compaq ES40 supercomputers indicate that the SI code (20-min timestep) requires 1/3 the CPU time of the explicit code (2-min timestep) for T42 resolutions. Both codes scale nearly linearly out to 400 processors. We achieved single-processor performance up to 30% of peak for both codes on the 375-MHz IBM Power-3 processors. Fast computation and linear scaling lead to a useful climate-simulation dycore only if enough model time is computed per unit wall-clock time. An efficient SI solver is essential to substantially increase this rate. Parallel preconditioning for an iterative conjugate-gradient elliptic solver is described. We are building a GCM dycore capable of 200 GF% lOPS sustained performance on clustered RISC/cache architectures using hybrid MPI/OpenMP programming.

  15. Multi-GPU parallel algorithm design and analysis for improved inversion of probability tomography with gravity gradiometry data

    NASA Astrophysics Data System (ADS)

    Hou, Zhenlong; Huang, Danian

    2017-09-01

    In this paper, we make a study on the inversion of probability tomography (IPT) with gravity gradiometry data at first. The space resolution of the results is improved by multi-tensor joint inversion, depth weighting matrix and the other methods. Aiming at solving the problems brought by the big data in the exploration, we present the parallel algorithm and the performance analysis combining Compute Unified Device Architecture (CUDA) with Open Multi-Processing (OpenMP) based on Graphics Processing Unit (GPU) accelerating. In the test of the synthetic model and real data from Vinton Dome, we get the improved results. It is also proved that the improved inversion algorithm is effective and feasible. The performance of parallel algorithm we designed is better than the other ones with CUDA. The maximum speedup could be more than 200. In the performance analysis, multi-GPU speedup and multi-GPU efficiency are applied to analyze the scalability of the multi-GPU programs. The designed parallel algorithm is demonstrated to be able to process larger scale of data and the new analysis method is practical.

  16. Mentat: An object-oriented macro data flow system

    NASA Technical Reports Server (NTRS)

    Grimshaw, Andrew S.; Liu, Jane W. S.

    1988-01-01

    Mentat, an object-oriented macro data flow system designed to facilitate parallelism in distributed systems, is presented. The macro data flow model is a model of computation similar to the data flow model with two principal differences: the computational complexity of the actors is much greater than in traditional data flow systems, and there are persistent actors that maintain state information between executions. Mentat is a system that combines the object-oriented programming paradigm and the macro data flow model of computation. Mentat programs use a dynamic structure called a future list to represent the future of computations.

  17. A symmetrical subtraction combined with interpolated values for eliminating scattering from fluorescence EEM data

    NASA Astrophysics Data System (ADS)

    Xu, Jing; Liu, Xiaofei; Wang, Yutian

    2016-08-01

    Parallel factor analysis is a widely used method to extract qualitative and quantitative information of the analyte of interest from fluorescence emission-excitation matrix containing unknown components. Big amplitude of scattering will influence the results of parallel factor analysis. Many methods of eliminating scattering have been proposed. Each of these methods has its advantages and disadvantages. The combination of symmetrical subtraction and interpolated values has been discussed. The combination refers to both the combination of results and the combination of methods. Nine methods were used for comparison. The results show the combination of results can make a better concentration prediction for all the components.

  18. An evidential reasoning extension to quantitative model-based failure diagnosis

    NASA Technical Reports Server (NTRS)

    Gertler, Janos J.; Anderson, Kenneth C.

    1992-01-01

    The detection and diagnosis of failures in physical systems characterized by continuous-time operation are studied. A quantitative diagnostic methodology has been developed that utilizes the mathematical model of the physical system. On the basis of the latter, diagnostic models are derived each of which comprises a set of orthogonal parity equations. To improve the robustness of the algorithm, several models may be used in parallel, providing potentially incomplete and/or conflicting inferences. Dempster's rule of combination is used to integrate evidence from the different models. The basic probability measures are assigned utilizing quantitative information extracted from the mathematical model and from online computation performed therewith.

  19. The Sensitivity of WRF Daily Summertime Simulations over West Africa to Alternative Parameterizations. Part 1: African Wave Circulation

    NASA Technical Reports Server (NTRS)

    Noble, Erik; Druyan, Leonard M.; Fulakeza, Matthew

    2014-01-01

    The performance of the NCAR Weather Research and Forecasting Model (WRF) as a West African regional-atmospheric model is evaluated. The study tests the sensitivity of WRF-simulated vorticity maxima associated with African easterly waves to 64 combinations of alternative parameterizations in a series of simulations in September. In all, 104 simulations of 12-day duration during 11 consecutive years are examined. The 64 combinations combine WRF parameterizations of cumulus convection, radiation transfer, surface hydrology, and PBL physics. Simulated daily and mean circulation results are validated against NASA's Modern-Era Retrospective Analysis for Research and Applications (MERRA) and NCEP/Department of Energy Global Reanalysis 2. Precipitation is considered in a second part of this two-part paper. A wide range of 700-hPa vorticity validation scores demonstrates the influence of alternative parameterizations. The best WRF performers achieve correlations against reanalysis of 0.40-0.60 and realistic amplitudes of spatiotemporal variability for the 2006 focus year while a parallel-benchmark simulation by the NASA Regional Model-3 (RM3) achieves higher correlations, but less realistic spatiotemporal variability. The largest favorable impact on WRF-vorticity validation is achieved by selecting the Grell-Devenyi cumulus convection scheme, resulting in higher correlations against reanalysis than simulations using the Kain-Fritch convection. Other parameterizations have less-obvious impact, although WRF configurations incorporating one surface model and PBL scheme consistently performed poorly. A comparison of reanalysis circulation against two NASA radiosonde stations confirms that both reanalyses represent observations well enough to validate the WRF results. Validation statistics for optimized WRF configurations simulating the parallel period during 10 additional years are less favorable than for 2006.

  20. Unrewarded Object Combinations in Captive Parrots

    PubMed Central

    Auersperg, Alice Marie Isabel; Oswald, Natalie; Domanegg, Markus; Gajdon, Gyula Koppany; Bugnyar, Thomas

    2015-01-01

    In primates, complex object combinations during play are often regarded as precursors of functional behavior. Here we investigate combinatory behaviors during unrewarded object manipulation in seven parrot species, including kea, African grey parrots and Goffin cockatoos, three species previously used as model species for technical problem solving. We further examine a habitually tool using species, the black palm cockatoo. Moreover, we incorporate three neotropical species, the yellow- and the black-billed Amazon and the burrowing parakeet. Paralleling previous studies on primates and corvids, free object-object combinations and complex object-substrate combinations such as inserting objects into tubes/holes or stacking rings onto poles prevailed in the species previously linked to advanced physical cognition and tool use. In addition, free object-object combinations were intrinsically structured in Goffin cockatoos and in kea. PMID:25984564

  1. Numerical simulation of h-adaptive immersed boundary method for freely falling disks

    NASA Astrophysics Data System (ADS)

    Zhang, Pan; Xia, Zhenhua; Cai, Qingdong

    2018-05-01

    In this work, a freely falling disk with aspect ratio 1/10 is directly simulated by using an adaptive numerical model implemented on a parallel computation framework JASMIN. The adaptive numerical model is a combination of the h-adaptive mesh refinement technique and the implicit immersed boundary method (IBM). Our numerical results agree well with the experimental results in all of the six degrees of freedom of the disk. Furthermore, very similar vortex structures observed in the experiment were also obtained.

  2. Parallel language constructs for tensor product computations on loosely coupled architectures

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush; Van Rosendale, John

    1989-01-01

    A set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level is described. The authors focus on tensor product array computations, a simple but important class of numerical algorithms. They consider first the problem of programming one-dimensional kernel routines, such as parallel tridiagonal solvers, and then look at how such parallel kernels can be combined to form parallel tensor product algorithms.

  3. Development of a Navier-Stokes algorithm for parallel-processing supercomputers. Ph.D. Thesis - Colorado State Univ., Dec. 1988

    NASA Technical Reports Server (NTRS)

    Swisshelm, Julie M.

    1989-01-01

    An explicit flow solver, applicable to the hierarchy of model equations ranging from Euler to full Navier-Stokes, is combined with several techniques designed to reduce computational expense. The computational domain consists of local grid refinements embedded in a global coarse mesh, where the locations of these refinements are defined by the physics of the flow. Flow characteristics are also used to determine which set of model equations is appropriate for solution in each region, thereby reducing not only the number of grid points at which the solution must be obtained, but also the computational effort required to get that solution. Acceleration to steady-state is achieved by applying multigrid on each of the subgrids, regardless of the particular model equations being solved. Since each of these components is explicit, advantage can readily be taken of the vector- and parallel-processing capabilities of machines such as the Cray X-MP and Cray-2.

  4. Advanced techniques in reliability model representation and solution

    NASA Technical Reports Server (NTRS)

    Palumbo, Daniel L.; Nicol, David M.

    1992-01-01

    The current tendency of flight control system designs is towards increased integration of applications and increased distribution of computational elements. The reliability analysis of such systems is difficult because subsystem interactions are increasingly interdependent. Researchers at NASA Langley Research Center have been working for several years to extend the capability of Markov modeling techniques to address these problems. This effort has been focused in the areas of increased model abstraction and increased computational capability. The reliability model generator (RMG) is a software tool that uses as input a graphical object-oriented block diagram of the system. RMG uses a failure-effects algorithm to produce the reliability model from the graphical description. The ASSURE software tool is a parallel processing program that uses the semi-Markov unreliability range evaluator (SURE) solution technique and the abstract semi-Markov specification interface to the SURE tool (ASSIST) modeling language. A failure modes-effects simulation is used by ASSURE. These tools were used to analyze a significant portion of a complex flight control system. The successful combination of the power of graphical representation, automated model generation, and parallel computation leads to the conclusion that distributed fault-tolerant system architectures can now be analyzed.

  5. Parallel design patterns for a low-power, software-defined compressed video encoder

    NASA Astrophysics Data System (ADS)

    Bruns, Michael W.; Hunt, Martin A.; Prasad, Durga; Gunupudi, Nageswara R.; Sonachalam, Sekar

    2011-06-01

    Video compression algorithms such as H.264 offer much potential for parallel processing that is not always exploited by the technology of a particular implementation. Consumer mobile encoding devices often achieve real-time performance and low power consumption through parallel processing in Application Specific Integrated Circuit (ASIC) technology, but many other applications require a software-defined encoder. High quality compression features needed for some applications such as 10-bit sample depth or 4:2:2 chroma format often go beyond the capability of a typical consumer electronics device. An application may also need to efficiently combine compression with other functions such as noise reduction, image stabilization, real time clocks, GPS data, mission/ESD/user data or software-defined radio in a low power, field upgradable implementation. Low power, software-defined encoders may be implemented using a massively parallel memory-network processor array with 100 or more cores and distributed memory. The large number of processor elements allow the silicon device to operate more efficiently than conventional DSP or CPU technology. A dataflow programming methodology may be used to express all of the encoding processes including motion compensation, transform and quantization, and entropy coding. This is a declarative programming model in which the parallelism of the compression algorithm is expressed as a hierarchical graph of tasks with message communication. Data parallel and task parallel design patterns are supported without the need for explicit global synchronization control. An example is described of an H.264 encoder developed for a commercially available, massively parallel memorynetwork processor device.

  6. Parallelisation study of a three-dimensional environmental flow model

    NASA Astrophysics Data System (ADS)

    O'Donncha, Fearghal; Ragnoli, Emanuele; Suits, Frank

    2014-03-01

    There are many simulation codes in the geosciences that are serial and cannot take advantage of the parallel computational resources commonly available today. One model important for our work in coastal ocean current modelling is EFDC, a Fortran 77 code configured for optimal deployment on vector computers. In order to take advantage of our cache-based, blade computing system we restructured EFDC from serial to parallel, thereby allowing us to run existing models more quickly, and to simulate larger and more detailed models that were previously impractical. Since the source code for EFDC is extensive and involves detailed computation, it is important to do such a port in a manner that limits changes to the files, while achieving the desired speedup. We describe a parallelisation strategy involving surgical changes to the source files to minimise error-prone alteration of the underlying computations, while allowing load-balanced domain decomposition for efficient execution on a commodity cluster. The use of conjugate gradient posed particular challenges due to implicit non-local communication posing a hindrance to standard domain partitioning schemes; a number of techniques are discussed to address this in a feasible, computationally efficient manner. The parallel implementation demonstrates good scalability in combination with a novel domain partitioning scheme that specifically handles mixed water/land regions commonly found in coastal simulations. The approach presented here represents a practical methodology to rejuvenate legacy code on a commodity blade cluster with reasonable effort; our solution has direct application to other similar codes in the geosciences.

  7. Comparison of intervention effects in split-mouth and parallel-arm randomized controlled trials: a meta-epidemiological study

    PubMed Central

    2014-01-01

    Background Split-mouth randomized controlled trials (RCTs) are popular in oral health research. Meta-analyses frequently include trials of both split-mouth and parallel-arm designs to derive combined intervention effects. However, carry-over effects may induce bias in split- mouth RCTs. We aimed to assess whether intervention effect estimates differ between split- mouth and parallel-arm RCTs investigating the same questions. Methods We performed a meta-epidemiological study. We systematically reviewed meta- analyses including both split-mouth and parallel-arm RCTs with binary or continuous outcomes published up to February 2013. Two independent authors selected studies and extracted data. We used a two-step approach to quantify the differences between split-mouth and parallel-arm RCTs: for each meta-analysis. First, we derived ratios of odds ratios (ROR) for dichotomous data and differences in standardized mean differences (∆SMD) for continuous data; second, we pooled RORs or ∆SMDs across meta-analyses by random-effects meta-analysis models. Results We selected 18 systematic reviews, for 15 meta-analyses with binary outcomes (28 split-mouth and 28 parallel-arm RCTs) and 19 meta-analyses with continuous outcomes (28 split-mouth and 28 parallel-arm RCTs). Effect estimates did not differ between split-mouth and parallel-arm RCTs (mean ROR, 0.96, 95% confidence interval 0.52–1.80; mean ∆SMD, 0.08, -0.14–0.30). Conclusions Our study did not provide sufficient evidence for a difference in intervention effect estimates derived from split-mouth and parallel-arm RCTs. Authors should consider including split-mouth RCTs in their meta-analyses with suitable and appropriate analysis. PMID:24886043

  8. Using the Statecharts paradigm for simulation of patient flow in surgical care.

    PubMed

    Sobolev, Boris; Harel, David; Vasilakis, Christos; Levy, Adrian

    2008-03-01

    Computer simulation of patient flow has been used extensively to assess the impacts of changes in the management of surgical care. However, little research is available on the utility of existing modeling techniques. The purpose of this paper is to examine the capacity of Statecharts, a system of graphical specification, for constructing a discrete-event simulation model of the perioperative process. The Statecharts specification paradigm was originally developed for representing reactive systems by extending the formalism of finite-state machines through notions of hierarchy, parallelism, and event broadcasting. Hierarchy permits subordination between states so that one state may contain other states. Parallelism permits more than one state to be active at any given time. Broadcasting of events allows one state to detect changes in another state. In the context of the peri-operative process, hierarchy provides the means to describe steps within activities and to cluster related activities, parallelism provides the means to specify concurrent activities, and event broadcasting provides the means to trigger a series of actions in one activity according to transitions that occur in another activity. Combined with hierarchy and parallelism, event broadcasting offers a convenient way to describe the interaction of concurrent activities. We applied the Statecharts formalism to describe the progress of individual patients through surgical care as a series of asynchronous updates in patient records generated in reaction to events produced by parallel finite-state machines representing concurrent clinical and managerial activities. We conclude that Statecharts capture successfully the behavioral aspects of surgical care delivery by specifying permissible chronology of events, conditions, and actions.

  9. Concentric Parallel Combining Balun for Millimeter-Wave Power Amplifier in Low-Power CMOS with High-Power Density

    NASA Astrophysics Data System (ADS)

    Han, Jiang-An; Kong, Zhi-Hui; Ma, Kaixue; Yeo, Kiat Seng; Lim, Wei Meng

    2016-11-01

    This paper presents a novel balun for a millimeter-wave power amplifier (PA) design to achieve high-power density in a 65-nm low-power (LP) CMOS process. By using a concentric winding technique, the proposed parallel combining balun with compact size accomplishes power combining and unbalance-balance conversion concurrently. For calculating its power combination efficiency in the condition of various amplitude and phase wave components, a method basing on S-parameters is derived. Based on the proposed parallel combining balun, a fabricated 60-GHz industrial, scientific, and medical (ISM) band PA with single-ended I/O achieves an 18.9-dB gain and an 8.8-dBm output power at 1-dB compression and 14.3-dBm saturated output power ( P sat) at 62 GHz. This PA occupying only a 0.10-mm2 core area has demonstrated a high-power density of 269.15 mW/mm2 in 65 nm LP CMOS.

  10. Genetic Modifiers of Ovarian Cancer

    DTIC Science & Technology

    2014-08-01

    samples from many countries. To account for population stratification, the genotyping data in combination with HapMap data (CEU, Yoruban, Han Chinese...Cambridge, we evaluated associations with both breast and ovarian cancer using a retrospective likelihood model. This accounts for the age extremes of...carriers we used a competing risk analysis that accounted for the effects on breast and ovarian cancer in parallel. In this competing risk analysis

  11. The Diffusion Region in Collisionless Magnetic Reconnection

    NASA Technical Reports Server (NTRS)

    Hesse, Michael; Neukirch, Thomas; Schindler, Karl; Kuznetsova, Masha; Zenitani, Seiji

    2011-01-01

    A review of present understanding of the dissipation region in magnetic reconnection is presented. The review focuses on results of the thermal inertia-based dissipation mechanism but alternative mechanisms are mentioned as well. For the former process, a combination of analytical theory and numerical modeling is presented. Furthermore, a new relation between the electric field expressions for anti-parallel and guide field reconnection is developed.

  12. High-Performance Psychometrics: The Parallel-E Parallel-M Algorithm for Generalized Latent Variable Models. Research Report. ETS RR-16-34

    ERIC Educational Resources Information Center

    von Davier, Matthias

    2016-01-01

    This report presents results on a parallel implementation of the expectation-maximization (EM) algorithm for multidimensional latent variable models. The developments presented here are based on code that parallelizes both the E step and the M step of the parallel-E parallel-M algorithm. Examples presented in this report include item response…

  13. Performance Comparison of HPF and MPI Based NAS Parallel Benchmarks

    NASA Technical Reports Server (NTRS)

    Saini, Subhash

    1997-01-01

    Compilers supporting High Performance Form (HPF) features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR), Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/ programming model (HPF and MPI) combinations will be compared, based on latest NAS Parallel Benchmark results, thus providing a cross-machine and cross-model comparison. Specifically, HPF based NPB results will be compared with MPI based NPB results to provide perspective on performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors. In addition, we would also present NPB, (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu CAPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, and SGI Origin2000. We would also present sustained performance per dollar for Class B LU, SP and BT benchmarks.

  14. Moose: An Open-Source Framework to Enable Rapid Development of Collaborative, Multi-Scale, Multi-Physics Simulation Tools

    NASA Astrophysics Data System (ADS)

    Slaughter, A. E.; Permann, C.; Peterson, J. W.; Gaston, D.; Andrs, D.; Miller, J.

    2014-12-01

    The Idaho National Laboratory (INL)-developed Multiphysics Object Oriented Simulation Environment (MOOSE; www.mooseframework.org), is an open-source, parallel computational framework for enabling the solution of complex, fully implicit multiphysics systems. MOOSE provides a set of computational tools that scientists and engineers can use to create sophisticated multiphysics simulations. Applications built using MOOSE have computed solutions for chemical reaction and transport equations, computational fluid dynamics, solid mechanics, heat conduction, mesoscale materials modeling, geomechanics, and others. To facilitate the coupling of diverse and highly-coupled physical systems, MOOSE employs the Jacobian-free Newton-Krylov (JFNK) method when solving the coupled nonlinear systems of equations arising in multiphysics applications. The MOOSE framework is written in C++, and leverages other high-quality, open-source scientific software packages such as LibMesh, Hypre, and PETSc. MOOSE uses a "hybrid parallel" model which combines both shared memory (thread-based) and distributed memory (MPI-based) parallelism to ensure efficient resource utilization on a wide range of computational hardware. MOOSE-based applications are inherently modular, which allows for simulation expansion (via coupling of additional physics modules) and the creation of multi-scale simulations. Any application developed with MOOSE supports running (in parallel) any other MOOSE-based application. Each application can be developed independently, yet easily communicate with other applications (e.g., conductivity in a slope-scale model could be a constant input, or a complete phase-field micro-structure simulation) without additional code being written. This method of development has proven effective at INL and expedites the development of sophisticated, sustainable, and collaborative simulation tools.

  15. A symmetrical subtraction combined with interpolated values for eliminating scattering from fluorescence EEM data.

    PubMed

    Xu, Jing; Liu, Xiaofei; Wang, Yutian

    2016-08-05

    Parallel factor analysis is a widely used method to extract qualitative and quantitative information of the analyte of interest from fluorescence emission-excitation matrix containing unknown components. Big amplitude of scattering will influence the results of parallel factor analysis. Many methods of eliminating scattering have been proposed. Each of these methods has its advantages and disadvantages. The combination of symmetrical subtraction and interpolated values has been discussed. The combination refers to both the combination of results and the combination of methods. Nine methods were used for comparison. The results show the combination of results can make a better concentration prediction for all the components. Copyright © 2016 Elsevier B.V. All rights reserved.

  16. Scalable geocomputation: evolving an environmental model building platform from single-core to supercomputers

    NASA Astrophysics Data System (ADS)

    Schmitz, Oliver; de Jong, Kor; Karssenberg, Derek

    2017-04-01

    There is an increasing demand to run environmental models on a big scale: simulations over large areas at high resolution. The heterogeneity of available computing hardware such as multi-core CPUs, GPUs or supercomputer potentially provides significant computing power to fulfil this demand. However, this requires detailed knowledge of the underlying hardware, parallel algorithm design and the implementation thereof in an efficient system programming language. Domain scientists such as hydrologists or ecologists often lack this specific software engineering knowledge, their emphasis is (and should be) on exploratory building and analysis of simulation models. As a result, models constructed by domain specialists mostly do not take full advantage of the available hardware. A promising solution is to separate the model building activity from software engineering by offering domain specialists a model building framework with pre-programmed building blocks that they combine to construct a model. The model building framework, consequently, needs to have built-in capabilities to make full usage of the available hardware. Developing such a framework providing understandable code for domain scientists and being runtime efficient at the same time poses several challenges on developers of such a framework. For example, optimisations can be performed on individual operations or the whole model, or tasks need to be generated for a well-balanced execution without explicitly knowing the complexity of the domain problem provided by the modeller. Ideally, a modelling framework supports the optimal use of available hardware whichsoever combination of model building blocks scientists use. We demonstrate our ongoing work on developing parallel algorithms for spatio-temporal modelling and demonstrate 1) PCRaster, an environmental software framework (http://www.pcraster.eu) providing spatio-temporal model building blocks and 2) parallelisation of about 50 of these building blocks using the new Fern library (https://github.com/geoneric/fern/), an independent generic raster processing library. Fern is a highly generic software library and its algorithms can be configured according to the configuration of a modelling framework. With manageable programming effort (e.g. matching data types between programming and domain language) we created a binding between Fern and PCRaster. The resulting PCRaster Python multicore module can be used to execute existing PCRaster models without having to make any changes to the model code. We show initial results on synthetic and geoscientific models indicating significant runtime improvements provided by parallel local and focal operations. We further outline challenges in improving remaining algorithms such as flow operations over digital elevation maps and further potential improvements like enhancing disk I/O.

  17. The whole mesh deformation model: a fast image segmentation method suitable for effective parallelization

    NASA Astrophysics Data System (ADS)

    Lenkiewicz, Przemyslaw; Pereira, Manuela; Freire, Mário M.; Fernandes, José

    2013-12-01

    In this article, we propose a novel image segmentation method called the whole mesh deformation (WMD) model, which aims at addressing the problems of modern medical imaging. Such problems have raised from the combination of several factors: (1) significant growth of medical image volumes sizes due to increasing capabilities of medical acquisition devices; (2) the will to increase the complexity of image processing algorithms in order to explore new functionality; (3) change in processor development and turn towards multi processing units instead of growing bus speeds and the number of operations per second of a single processing unit. Our solution is based on the concept of deformable models and is characterized by a very effective and precise segmentation capability. The proposed WMD model uses a volumetric mesh instead of a contour or a surface to represent the segmented shapes of interest, which allows exploiting more information in the image and obtaining results in shorter times, independently of image contents. The model also offers a good ability for topology changes and allows effective parallelization of workflow, which makes it a very good choice for large datasets. We present a precise model description, followed by experiments on artificial images and real medical data.

  18. Mesh quality oriented 3D geometric vascular modeling based on parallel transport frame.

    PubMed

    Guo, Jixiang; Li, Shun; Chui, Yim Pan; Qin, Jing; Heng, Pheng Ann

    2013-08-01

    While a number of methods have been proposed to reconstruct geometrically and topologically accurate 3D vascular models from medical images, little attention has been paid to constantly maintain high mesh quality of these models during the reconstruction procedure, which is essential for many subsequent applications such as simulation-based surgical training and planning. We propose a set of methods to bridge this gap based on parallel transport frame. An improved bifurcation modeling method and two novel trifurcation modeling methods are developed based on 3D Bézier curve segments in order to ensure the continuous surface transition at furcations. In addition, a frame blending scheme is implemented to solve the twisting problem caused by frame mismatch of two successive furcations. A curvature based adaptive sampling scheme combined with a mesh quality guided frame tilting algorithm is developed to construct an evenly distributed, non-concave and self-intersection free surface mesh for vessels with distinct radius and high curvature. Extensive experiments demonstrate that our methodology can generate vascular models with better mesh quality than previous methods in terms of surface mesh quality criteria. Copyright © 2013 Elsevier Ltd. All rights reserved.

  19. Parallel Implementation of a High Order Implicit Collocation Method for the Heat Equation

    NASA Technical Reports Server (NTRS)

    Kouatchou, Jules; Halem, Milton (Technical Monitor)

    2000-01-01

    We combine a high order compact finite difference approximation and collocation techniques to numerically solve the two dimensional heat equation. The resulting method is implicit arid can be parallelized with a strategy that allows parallelization across both time and space. We compare the parallel implementation of the new method with a classical implicit method, namely the Crank-Nicolson method, where the parallelization is done across space only. Numerical experiments are carried out on the SGI Origin 2000.

  20. An analytical study of hybrid ejector/internal combustion engine-driven heat pumps

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Murphy, R.W.

    1988-01-01

    Because ejectors can combine high reliability with low maintenance cost in a package requiring little capital investment, they may provide attractive heat pumping capability in situations where the importance of their inefficiencies is minimized. One such concept, a hybrid system in which an ejector driven by engine reject heat is used to increase the performance of an internal combustion engine-driven heat pump, was analyzed by modifying an existing ejector heat pump model and combining it with generic compressor and internal combustion engine models. Under the model assumptions for nominal cooling mode conditions, the results showed that hybrid systems could providemore » substantial performance augmentation/emdash/up to 17/percent/ increase in system coefficient of performance for a parallel arrangement of an enhanced ejector with the engine-driven compressor. 4 refs., 4 figs., 4 tabs.« less

  1. Parallel robot for micro assembly with integrated innovative optical 3D-sensor

    NASA Astrophysics Data System (ADS)

    Hesselbach, Juergen; Ispas, Diana; Pokar, Gero; Soetebier, Sven; Tutsch, Rainer

    2002-10-01

    Recent advances in the fields of MEMS and MOEMS often require precise assembly of very small parts with an accuracy of a few microns. In order to meet this demand, a new approach using a robot based on parallel mechanisms in combination with a novel 3D-vision system has been chosen. The planar parallel robot structure with 2 DOF provides a high resolution in the XY-plane. It carries two additional serial axes for linear and rotational movement in/about z direction. In order to achieve high precision as well as good dynamic capabilities, the drive concept for the parallel (main) axes incorporates air bearings in combination with a linear electric servo motors. High accuracy position feedback is provided by optical encoders with a resolution of 0.1 μm. To allow for visualization and visual control of assembly processes, a camera module fits into the hollow tool head. It consists of a miniature CCD camera and a light source. In addition a modular gripper support is integrated into the tool head. To increase the accuracy a control loop based on an optoelectronic sensor will be implemented. As a result of an in-depth analysis of different approaches a photogrammetric system using one single camera and special beam-splitting optics was chosen. A pattern of elliptical marks is applied to the surfaces of workpiece and gripper. Using a model-based recognition algorithm the image processing software identifies the gripper and the workpiece and determines their relative position. A deviation vector is calculated and fed into the robot control to guide the gripper.

  2. Error Modeling and Experimental Study of a Flexible Joint 6-UPUR Parallel Six-Axis Force Sensor.

    PubMed

    Zhao, Yanzhi; Cao, Yachao; Zhang, Caifeng; Zhang, Dan; Zhang, Jie

    2017-09-29

    By combining a parallel mechanism with integrated flexible joints, a large measurement range and high accuracy sensor is realized. However, the main errors of the sensor involve not only assembly errors, but also deformation errors of its flexible leg. Based on a flexible joint 6-UPUR (a kind of mechanism configuration where U-universal joint, P-prismatic joint, R-revolute joint) parallel six-axis force sensor developed during the prephase, assembly and deformation error modeling and analysis of the resulting sensors with a large measurement range and high accuracy are made in this paper. First, an assembly error model is established based on the imaginary kinematic joint method and the Denavit-Hartenberg (D-H) method. Next, a stiffness model is built to solve the stiffness matrix. The deformation error model of the sensor is obtained. Then, the first order kinematic influence coefficient matrix when the synthetic error is taken into account is solved. Finally, measurement and calibration experiments of the sensor composed of the hardware and software system are performed. Forced deformation of the force-measuring platform is detected by using laser interferometry and analyzed to verify the correctness of the synthetic error model. In addition, the first order kinematic influence coefficient matrix in actual circumstances is calculated. By comparing the condition numbers and square norms of the coefficient matrices, the conclusion is drawn theoretically that it is very important to take into account the synthetic error for design stage of the sensor and helpful to improve performance of the sensor in order to meet needs of actual working environments.

  3. Error Modeling and Experimental Study of a Flexible Joint 6-UPUR Parallel Six-Axis Force Sensor

    PubMed Central

    Zhao, Yanzhi; Cao, Yachao; Zhang, Caifeng; Zhang, Dan; Zhang, Jie

    2017-01-01

    By combining a parallel mechanism with integrated flexible joints, a large measurement range and high accuracy sensor is realized. However, the main errors of the sensor involve not only assembly errors, but also deformation errors of its flexible leg. Based on a flexible joint 6-UPUR (a kind of mechanism configuration where U-universal joint, P-prismatic joint, R-revolute joint) parallel six-axis force sensor developed during the prephase, assembly and deformation error modeling and analysis of the resulting sensors with a large measurement range and high accuracy are made in this paper. First, an assembly error model is established based on the imaginary kinematic joint method and the Denavit-Hartenberg (D-H) method. Next, a stiffness model is built to solve the stiffness matrix. The deformation error model of the sensor is obtained. Then, the first order kinematic influence coefficient matrix when the synthetic error is taken into account is solved. Finally, measurement and calibration experiments of the sensor composed of the hardware and software system are performed. Forced deformation of the force-measuring platform is detected by using laser interferometry and analyzed to verify the correctness of the synthetic error model. In addition, the first order kinematic influence coefficient matrix in actual circumstances is calculated. By comparing the condition numbers and square norms of the coefficient matrices, the conclusion is drawn theoretically that it is very important to take into account the synthetic error for design stage of the sensor and helpful to improve performance of the sensor in order to meet needs of actual working environments. PMID:28961209

  4. "Feeling" Series and Parallel Resistances.

    ERIC Educational Resources Information Center

    Morse, Robert A.

    1993-01-01

    Equipped with drinking straws and stirring straws, a teacher can help students understand how resistances in electric circuits combine in series and in parallel. Follow-up suggestions are provided. (ZWH)

  5. GraphReduce: Large-Scale Graph Analytics on Accelerator-Based HPC Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sengupta, Dipanjan; Agarwal, Kapil; Song, Shuaiwen

    2015-09-30

    Recent work on real-world graph analytics has sought to leverage the massive amount of parallelism offered by GPU devices, but challenges remain due to the inherent irregularity of graph algorithms and limitations in GPU-resident memory for storing large graphs. We present GraphReduce, a highly efficient and scalable GPU-based framework that operates on graphs that exceed the device’s internal memory capacity. GraphReduce adopts a combination of both edge- and vertex-centric implementations of the Gather-Apply-Scatter programming model and operates on multiple asynchronous GPU streams to fully exploit the high degrees of parallelism in GPUs with efficient graph data movement between the hostmore » and the device.« less

  6. Scalable multi-objective control for large scale water resources systems under uncertainty

    NASA Astrophysics Data System (ADS)

    Giuliani, Matteo; Quinn, Julianne; Herman, Jonathan; Castelletti, Andrea; Reed, Patrick

    2016-04-01

    The use of mathematical models to support the optimal management of environmental systems is rapidly expanding over the last years due to advances in scientific knowledge of the natural processes, efficiency of the optimization techniques, and availability of computational resources. However, undergoing changes in climate and society introduce additional challenges for controlling these systems, ultimately motivating the emergence of complex models to explore key causal relationships and dependencies on uncontrolled sources of variability. In this work, we contribute a novel implementation of the evolutionary multi-objective direct policy search (EMODPS) method for controlling environmental systems under uncertainty. The proposed approach combines direct policy search (DPS) with hierarchical parallelization of multi-objective evolutionary algorithms (MOEAs) and offers a threefold advantage: the DPS simulation-based optimization can be combined with any simulation model and does not add any constraint on modeled information, allowing the use of exogenous information in conditioning the decisions. Moreover, the combination of DPS and MOEAs prompts the generation or Pareto approximate set of solutions for up to 10 objectives, thus overcoming the decision biases produced by cognitive myopia, where narrow or restrictive definitions of optimality strongly limit the discovery of decision relevant alternatives. Finally, the use of large-scale MOEAs parallelization improves the ability of the designed solutions in handling the uncertainty due to severe natural variability. The proposed approach is demonstrated on a challenging water resources management problem represented by the optimal control of a network of four multipurpose water reservoirs in the Red River basin (Vietnam). As part of the medium-long term energy and food security national strategy, four large reservoirs have been constructed on the Red River tributaries, which are mainly operated for hydropower production, flood control, and water supply. Numerical results under historical as well as synthetically generated hydrologic conditions show that our approach is able to discover key system tradeoffs in the operations of the system. The ability of the algorithm to find near-optimal solutions increases with the number of islands in the adopted hierarchical parallelization scheme. In addition, although significant performance degradation is observed when the solutions designed over history are re-evaluated over synthetically generated inflows, we successfully reduced these vulnerabilities by identifying alternative solutions that are more robust to hydrologic uncertainties, while also addressing the tradeoffs across the Red River multi-sector services.

  7. An Interactive Multi-Model for Consensus on Climate Change

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kocarev, Ljupco

    This project purports to develop a new scheme for forming consensus among alternative climate models, that give widely divergent projections as to the details of climate change, that is more intelligent than simply averaging the model outputs, or averaging with ex post facto weighting factors. The method under development effectively allows models to assimilate data from one another in run time with weights that are chosen in an adaptive training phase using 20th century data, so that the models synchronize with one another as well as with reality. An alternate approach that is being explored in parallel is the automatedmore » combination of equations from different models in an expert-system-like framework.« less

  8. Parallel replica dynamics method for bistable stochastic reaction networks: Simulation and sensitivity analysis

    NASA Astrophysics Data System (ADS)

    Wang, Ting; Plecháč, Petr

    2017-12-01

    Stochastic reaction networks that exhibit bistable behavior are common in systems biology, materials science, and catalysis. Sampling of stationary distributions is crucial for understanding and characterizing the long-time dynamics of bistable stochastic dynamical systems. However, simulations are often hindered by the insufficient sampling of rare transitions between the two metastable regions. In this paper, we apply the parallel replica method for a continuous time Markov chain in order to improve sampling of the stationary distribution in bistable stochastic reaction networks. The proposed method uses parallel computing to accelerate the sampling of rare transitions. Furthermore, it can be combined with the path-space information bounds for parametric sensitivity analysis. With the proposed methodology, we study three bistable biological networks: the Schlögl model, the genetic switch network, and the enzymatic futile cycle network. We demonstrate the algorithmic speedup achieved in these numerical benchmarks. More significant acceleration is expected when multi-core or graphics processing unit computer architectures and programming tools such as CUDA are employed.

  9. Scaling for the SOL/separatrix χ ⊥ following from the heuristic drift model for the power scrape-off layer width

    NASA Astrophysics Data System (ADS)

    Huber, A.; Chankin, A. V.

    2017-06-01

    A simple two-point representation of the tokamak scrape-off layer (SOL) in the conduction limited regime, based on the parallel and perpendicular energy balance equations in combination with the heat flux width predicted by a heuristic drift-based model, was used to derive a scaling for the cross-field thermal diffusivity {χ }\\perp . For fixed plasma shape and neglecting weak power dependence indexes 1/8, the scaling {χ }\\perp \\propto {P}{{S}{{O}}{{L}}}/(n{B}θ {R}2) is derived.

  10. A parallel optimization method for product configuration and supplier selection based on interval

    NASA Astrophysics Data System (ADS)

    Zheng, Jian; Zhang, Meng; Li, Guoxi

    2017-06-01

    In the process of design and manufacturing, product configuration is an important way of product development, and supplier selection is an essential component of supply chain management. To reduce the risk of procurement and maximize the profits of enterprises, this study proposes to combine the product configuration and supplier selection, and express the multiple uncertainties as interval numbers. An integrated optimization model of interval product configuration and supplier selection was established, and NSGA-II was put forward to locate the Pareto-optimal solutions to the interval multiobjective optimization model.

  11. Hierarchically Structured Non-Intrusive Sign Language Recognition. Chapter 2

    NASA Technical Reports Server (NTRS)

    Zieren, Jorg; Zieren, Jorg; Kraiss, Karl-Friedrich

    2007-01-01

    This work presents a hierarchically structured approach at the nonintrusive recognition of sign language from a monocular frontal view. Robustness is achieved through sophisticated localization and tracking methods, including a combined EM/CAMSHIFT overlap resolution procedure and the parallel pursuit of multiple hypotheses about hands position and movement. This allows handling of ambiguities and automatically corrects tracking errors. A biomechanical skeleton model and dynamic motion prediction using Kalman filters represents high level knowledge. Classification is performed by Hidden Markov Models. 152 signs from German sign language were recognized with an accuracy of 97.6%.

  12. Kalman filter techniques for accelerated Cartesian dynamic cardiac imaging.

    PubMed

    Feng, Xue; Salerno, Michael; Kramer, Christopher M; Meyer, Craig H

    2013-05-01

    In dynamic MRI, spatial and temporal parallel imaging can be exploited to reduce scan time. Real-time reconstruction enables immediate visualization during the scan. Commonly used view-sharing techniques suffer from limited temporal resolution, and many of the more advanced reconstruction methods are either retrospective, time-consuming, or both. A Kalman filter model capable of real-time reconstruction can be used to increase the spatial and temporal resolution in dynamic MRI reconstruction. The original study describing the use of the Kalman filter in dynamic MRI was limited to non-Cartesian trajectories because of a limitation intrinsic to the dynamic model used in that study. Here the limitation is overcome, and the model is applied to the more commonly used Cartesian trajectory with fast reconstruction. Furthermore, a combination of the Kalman filter model with Cartesian parallel imaging is presented to further increase the spatial and temporal resolution and signal-to-noise ratio. Simulations and experiments were conducted to demonstrate that the Kalman filter model can increase the temporal resolution of the image series compared with view-sharing techniques and decrease the spatial aliasing compared with TGRAPPA. The method requires relatively little computation, and thus is suitable for real-time reconstruction. Copyright © 2012 Wiley Periodicals, Inc.

  13. Kalman Filter Techniques for Accelerated Cartesian Dynamic Cardiac Imaging

    PubMed Central

    Feng, Xue; Salerno, Michael; Kramer, Christopher M.; Meyer, Craig H.

    2012-01-01

    In dynamic MRI, spatial and temporal parallel imaging can be exploited to reduce scan time. Real-time reconstruction enables immediate visualization during the scan. Commonly used view-sharing techniques suffer from limited temporal resolution, and many of the more advanced reconstruction methods are either retrospective, time-consuming, or both. A Kalman filter model capable of real-time reconstruction can be used to increase the spatial and temporal resolution in dynamic MRI reconstruction. The original study describing the use of the Kalman filter in dynamic MRI was limited to non-Cartesian trajectories, because of a limitation intrinsic to the dynamic model used in that study. Here the limitation is overcome and the model is applied to the more commonly used Cartesian trajectory with fast reconstruction. Furthermore, a combination of the Kalman filter model with Cartesian parallel imaging is presented to further increase the spatial and temporal resolution and SNR. Simulations and experiments were conducted to demonstrate that the Kalman filter model can increase the temporal resolution of the image series compared with view sharing techniques and decrease the spatial aliasing compared with TGRAPPA. The method requires relatively little computation, and thus is suitable for real-time reconstruction. PMID:22926804

  14. ACCELERATING MR PARAMETER MAPPING USING SPARSITY-PROMOTING REGULARIZATION IN PARAMETRIC DIMENSION

    PubMed Central

    Velikina, Julia V.; Alexander, Andrew L.; Samsonov, Alexey

    2013-01-01

    MR parameter mapping requires sampling along additional (parametric) dimension, which often limits its clinical appeal due to a several-fold increase in scan times compared to conventional anatomic imaging. Data undersampling combined with parallel imaging is an attractive way to reduce scan time in such applications. However, inherent SNR penalties of parallel MRI due to noise amplification often limit its utility even at moderate acceleration factors, requiring regularization by prior knowledge. In this work, we propose a novel regularization strategy, which utilizes smoothness of signal evolution in the parametric dimension within compressed sensing framework (p-CS) to provide accurate and precise estimation of parametric maps from undersampled data. The performance of the method was demonstrated with variable flip angle T1 mapping and compared favorably to two representative reconstruction approaches, image space-based total variation regularization and an analytical model-based reconstruction. The proposed p-CS regularization was found to provide efficient suppression of noise amplification and preservation of parameter mapping accuracy without explicit utilization of analytical signal models. The developed method may facilitate acceleration of quantitative MRI techniques that are not suitable to model-based reconstruction because of complex signal models or when signal deviations from the expected analytical model exist. PMID:23213053

  15. Analysis of a multi-machine database on divertor heat fluxesa)

    NASA Astrophysics Data System (ADS)

    Makowski, M. A.; Elder, D.; Gray, T. K.; LaBombard, B.; Lasnier, C. J.; Leonard, A. W.; Maingi, R.; Osborne, T. H.; Stangeby, P. C.; Terry, J. L.; Watkins, J.

    2012-05-01

    A coordinated effort to measure divertor heat flux characteristics in fully attached, similarly shaped H-mode plasmas on C-Mod, DIII-D, and NSTX was carried out in 2010 in order to construct a predictive scaling relation applicable to next step devices including ITER, FNSF, and DEMO. Few published scaling laws are available and those that have been published were obtained under widely varying conditions and divertor geometries, leading to conflicting predictions for this critically important quantity. This study was designed to overcome these deficiencies. Analysis of the combined data set reveals that the primary dependence of the parallel heat flux width is robustly inverse with Ip, which all three tokamaks independently demonstrate. An improved Thomson scattering system on DIII-D has yielded very accurate scrape off layer (SOL) profile measurements from which tests of parallel transport models have been made. It is found that a flux-limited model agrees best with the data at all collisionalities, while a Spitzer resistivity model agrees at higher collisionality where it is more valid. The SOL profile measurements and divertor heat flux scaling are consistent with a heuristic drift based model as well as a critical gradient model.

  16. Models@Home: distributed computing in bioinformatics using a screensaver based approach.

    PubMed

    Krieger, Elmar; Vriend, Gert

    2002-02-01

    Due to the steadily growing computational demands in bioinformatics and related scientific disciplines, one is forced to make optimal use of the available resources. A straightforward solution is to build a network of idle computers and let each of them work on a small piece of a scientific challenge, as done by Seti@Home (http://setiathome.berkeley.edu), the world's largest distributed computing project. We developed a generally applicable distributed computing solution that uses a screensaver system similar to Seti@Home. The software exploits the coarse-grained nature of typical bioinformatics projects. Three major considerations for the design were: (1) often, many different programs are needed, while the time is lacking to parallelize them. Models@Home can run any program in parallel without modifications to the source code; (2) in contrast to the Seti project, bioinformatics applications are normally more sensitive to lost jobs. Models@Home therefore includes stringent control over job scheduling; (3) to allow use in heterogeneous environments, Linux and Windows based workstations can be combined with dedicated PCs to build a homogeneous cluster. We present three practical applications of Models@Home, running the modeling programs WHAT IF and YASARA on 30 PCs: force field parameterization, molecular dynamics docking, and database maintenance.

  17. Impact of equalizing currents on losses and torque ripples in electrical machines with fractional slot concentrated windings

    NASA Astrophysics Data System (ADS)

    Toporkov, D. M.; Vialcev, G. B.

    2017-10-01

    The implementation of parallel branches is a commonly used manufacturing method of the realizing of fractional slot concentrated windings in electrical machines. If the rotor eccentricity is enabled in a machine with parallel branches, the equalizing currents can arise. The simulation approach of the equalizing currents in parallel branches of an electrical machine winding based on magnetic field calculation by using Finite Elements Method is discussed in the paper. The high accuracy of the model is provided by the dynamic improvement of the inductances in the differential equation system describing a machine. The pre-computed table flux linkage functions are used for that. The functions are the dependences of the flux linkage of parallel branches on the branches currents and rotor position angle. The functions permit to calculate self-inductances and mutual inductances by partial derivative. The calculated results obtained for the electric machine specimen are presented. The results received show that the adverse combination of design solutions and the rotor eccentricity leads to a high value of the equalizing currents and windings heating. Additional torque ripples also arise. The additional ripples harmonic content is not similar to the cogging torque or ripples caused by the rotor eccentricity.

  18. An Evaluation of Different Statistical Targets for Assembling Parallel Forms in Item Response Theory

    PubMed Central

    Ali, Usama S.; van Rijn, Peter W.

    2015-01-01

    Assembly of parallel forms is an important step in the test development process. Therefore, choosing a suitable theoretical framework to generate well-defined test specifications is critical. The performance of different statistical targets of test specifications using the test characteristic curve (TCC) and the test information function (TIF) was investigated. Test length, the number of test forms, and content specifications are considered as well. The TCC target results in forms that are parallel in difficulty, but not necessarily in terms of precision. Vice versa, test forms created using a TIF target are parallel in terms of precision, but not necessarily in terms of difficulty. As sometimes the focus is either on TIF or TCC, differences in either difficulty or precision can arise. Differences in difficulty can be mitigated by equating, but differences in precision cannot. In a series of simulations using a real item bank, the two-parameter logistic model, and mixed integer linear programming for automated test assembly, these differences were found to be quite substantial. When both TIF and TCC are combined into one target with manipulation to relative importance, these differences can be made to disappear.

  19. Development of a 3D parallel mechanism robot arm with three vertical-axial pneumatic actuators combined with a stereo vision system.

    PubMed

    Chiang, Mao-Hsiung; Lin, Hao-Ting

    2011-01-01

    This study aimed to develop a novel 3D parallel mechanism robot driven by three vertical-axial pneumatic actuators with a stereo vision system for path tracking control. The mechanical system and the control system are the primary novel parts for developing a 3D parallel mechanism robot. In the mechanical system, a 3D parallel mechanism robot contains three serial chains, a fixed base, a movable platform and a pneumatic servo system. The parallel mechanism are designed and analyzed first for realizing a 3D motion in the X-Y-Z coordinate system of the robot's end-effector. The inverse kinematics and the forward kinematics of the parallel mechanism robot are investigated by using the Denavit-Hartenberg notation (D-H notation) coordinate system. The pneumatic actuators in the three vertical motion axes are modeled. In the control system, the Fourier series-based adaptive sliding-mode controller with H(∞) tracking performance is used to design the path tracking controllers of the three vertical servo pneumatic actuators for realizing 3D path tracking control of the end-effector. Three optical linear scales are used to measure the position of the three pneumatic actuators. The 3D position of the end-effector is then calculated from the measuring position of the three pneumatic actuators by means of the kinematics. However, the calculated 3D position of the end-effector cannot consider the manufacturing and assembly tolerance of the joints and the parallel mechanism so that errors between the actual position and the calculated 3D position of the end-effector exist. In order to improve this situation, sensor collaboration is developed in this paper. A stereo vision system is used to collaborate with the three position sensors of the pneumatic actuators. The stereo vision system combining two CCD serves to measure the actual 3D position of the end-effector and calibrate the error between the actual and the calculated 3D position of the end-effector. Furthermore, to verify the feasibility of the proposed parallel mechanism robot driven by three vertical pneumatic servo actuators, a full-scale test rig of the proposed parallel mechanism pneumatic robot is set up. Thus, simulations and experiments for different complex 3D motion profiles of the robot end-effector can be successfully achieved. The desired, the actual and the calculated 3D position of the end-effector can be compared in the complex 3D motion control.

  20. Development of a 3D Parallel Mechanism Robot Arm with Three Vertical-Axial Pneumatic Actuators Combined with a Stereo Vision System

    PubMed Central

    Chiang, Mao-Hsiung; Lin, Hao-Ting

    2011-01-01

    This study aimed to develop a novel 3D parallel mechanism robot driven by three vertical-axial pneumatic actuators with a stereo vision system for path tracking control. The mechanical system and the control system are the primary novel parts for developing a 3D parallel mechanism robot. In the mechanical system, a 3D parallel mechanism robot contains three serial chains, a fixed base, a movable platform and a pneumatic servo system. The parallel mechanism are designed and analyzed first for realizing a 3D motion in the X-Y-Z coordinate system of the robot’s end-effector. The inverse kinematics and the forward kinematics of the parallel mechanism robot are investigated by using the Denavit-Hartenberg notation (D-H notation) coordinate system. The pneumatic actuators in the three vertical motion axes are modeled. In the control system, the Fourier series-based adaptive sliding-mode controller with H∞ tracking performance is used to design the path tracking controllers of the three vertical servo pneumatic actuators for realizing 3D path tracking control of the end-effector. Three optical linear scales are used to measure the position of the three pneumatic actuators. The 3D position of the end-effector is then calculated from the measuring position of the three pneumatic actuators by means of the kinematics. However, the calculated 3D position of the end-effector cannot consider the manufacturing and assembly tolerance of the joints and the parallel mechanism so that errors between the actual position and the calculated 3D position of the end-effector exist. In order to improve this situation, sensor collaboration is developed in this paper. A stereo vision system is used to collaborate with the three position sensors of the pneumatic actuators. The stereo vision system combining two CCD serves to measure the actual 3D position of the end-effector and calibrate the error between the actual and the calculated 3D position of the end-effector. Furthermore, to verify the feasibility of the proposed parallel mechanism robot driven by three vertical pneumatic servo actuators, a full-scale test rig of the proposed parallel mechanism pneumatic robot is set up. Thus, simulations and experiments for different complex 3D motion profiles of the robot end-effector can be successfully achieved. The desired, the actual and the calculated 3D position of the end-effector can be compared in the complex 3D motion control. PMID:22247676

  1. Parallel particle filters for online identification of mechanistic mathematical models of physiology from monitoring data: performance and real-time scalability in simulation scenarios.

    PubMed

    Zenker, Sven

    2010-08-01

    Combining mechanistic mathematical models of physiology with quantitative observations using probabilistic inference may offer advantages over established approaches to computerized decision support in acute care medicine. Particle filters (PF) can perform such inference successively as data becomes available. The potential of PF for real-time state estimation (SE) for a model of cardiovascular physiology is explored using parallel computers and the ability to achieve joint state and parameter estimation (JSPE) given minimal prior knowledge tested. A parallelized sequential importance sampling/resampling algorithm was implemented and its scalability for the pure SE problem for a non-linear five-dimensional ODE model of the cardiovascular system evaluated on a Cray XT3 using up to 1,024 cores. JSPE was implemented using a state augmentation approach with artificial stochastic evolution of the parameters. Its performance when simultaneously estimating the 5 states and 18 unknown parameters when given observations only of arterial pressure, central venous pressure, heart rate, and, optionally, cardiac output, was evaluated in a simulated bleeding/resuscitation scenario. SE was successful and scaled up to 1,024 cores with appropriate algorithm parametrization, with real-time equivalent performance for up to 10 million particles. JSPE in the described underdetermined scenario achieved excellent reproduction of observables and qualitative tracking of enddiastolic ventricular volumes and sympathetic nervous activity. However, only a subset of the posterior distributions of parameters concentrated around the true values for parts of the estimated trajectories. Parallelized PF's performance makes their application to complex mathematical models of physiology for the purpose of clinical data interpretation, prediction, and therapy optimization appear promising. JSPE in the described extremely underdetermined scenario nevertheless extracted information of potential clinical relevance from the data in this simulation setting. However, fully satisfactory resolution of this problem when minimal prior knowledge about parameter values is available will require further methodological improvements, which are discussed.

  2. A Joint Method of Envelope Inversion Combined with Hybrid-domain Full Waveform Inversion

    NASA Astrophysics Data System (ADS)

    CUI, C.; Hou, W.

    2017-12-01

    Full waveform inversion (FWI) aims to construct high-precision subsurface models by fully using the information in seismic records, including amplitude, travel time, phase and so on. However, high non-linearity and the absence of low frequency information in seismic data lead to the well-known cycle skipping problem and make inversion easily fall into local minima. In addition, those 3D inversion methods that are based on acoustic approximation ignore the elastic effects in real seismic field, and make inversion harder. As a result, the accuracy of final inversion results highly relies on the quality of initial model. In order to improve stability and quality of inversion results, multi-scale inversion that reconstructs subsurface model from low to high frequency are applied. But, the absence of very low frequencies (< 3Hz) in field data is still bottleneck in the FWI. By extracting ultra low-frequency data from field data, envelope inversion is able to recover low wavenumber model with a demodulation operator (envelope operator), though the low frequency data does not really exist in field data. To improve the efficiency and viability of the inversion, in this study, we proposed a joint method of envelope inversion combined with hybrid-domain FWI. First, we developed 3D elastic envelope inversion, and the misfit function and the corresponding gradient operator were derived. Then we performed hybrid-domain FWI with envelope inversion result as initial model which provides low wavenumber component of model. Here, forward modeling is implemented in the time domain and inversion in the frequency domain. To accelerate the inversion, we adopt CPU/GPU heterogeneous computing techniques. There were two levels of parallelism. In the first level, the inversion tasks are decomposed and assigned to each computation node by shot number. In the second level, GPU multithreaded programming is used for the computation tasks in each node, including forward modeling, envelope extraction, DFT (discrete Fourier transform) calculation and gradients calculation. Numerical tests demonstrated that the combined envelope inversion + hybrid-domain FWI could obtain much faithful and accurate result than conventional hybrid-domain FWI. The CPU/GPU heterogeneous parallel computation could improve the performance speed.

  3. GPU-based parallel algorithm for blind image restoration using midfrequency-based methods

    NASA Astrophysics Data System (ADS)

    Xie, Lang; Luo, Yi-han; Bao, Qi-liang

    2013-08-01

    GPU-based general-purpose computing is a new branch of modern parallel computing, so the study of parallel algorithms specially designed for GPU hardware architecture is of great significance. In order to solve the problem of high computational complexity and poor real-time performance in blind image restoration, the midfrequency-based algorithm for blind image restoration was analyzed and improved in this paper. Furthermore, a midfrequency-based filtering method is also used to restore the image hardly with any recursion or iteration. Combining the algorithm with data intensiveness, data parallel computing and GPU execution model of single instruction and multiple threads, a new parallel midfrequency-based algorithm for blind image restoration is proposed in this paper, which is suitable for stream computing of GPU. In this algorithm, the GPU is utilized to accelerate the estimation of class-G point spread functions and midfrequency-based filtering. Aiming at better management of the GPU threads, the threads in a grid are scheduled according to the decomposition of the filtering data in frequency domain after the optimization of data access and the communication between the host and the device. The kernel parallelism structure is determined by the decomposition of the filtering data to ensure the transmission rate to get around the memory bandwidth limitation. The results show that, with the new algorithm, the operational speed is significantly increased and the real-time performance of image restoration is effectively improved, especially for high-resolution images.

  4. Scalable Evaluation of Polarization Energy and Associated Forces in Polarizable Molecular Dynamics: II.Towards Massively Parallel Computations using Smooth Particle Mesh Ewald.

    PubMed

    Lagardère, Louis; Lipparini, Filippo; Polack, Étienne; Stamm, Benjamin; Cancès, Éric; Schnieders, Michael; Ren, Pengyu; Maday, Yvon; Piquemal, Jean-Philip

    2014-02-28

    In this paper, we present a scalable and efficient implementation of point dipole-based polarizable force fields for molecular dynamics (MD) simulations with periodic boundary conditions (PBC). The Smooth Particle-Mesh Ewald technique is combined with two optimal iterative strategies, namely, a preconditioned conjugate gradient solver and a Jacobi solver in conjunction with the Direct Inversion in the Iterative Subspace for convergence acceleration, to solve the polarization equations. We show that both solvers exhibit very good parallel performances and overall very competitive timings in an energy-force computation needed to perform a MD step. Various tests on large systems are provided in the context of the polarizable AMOEBA force field as implemented in the newly developed Tinker-HP package which is the first implementation for a polarizable model making large scale experiments for massively parallel PBC point dipole models possible. We show that using a large number of cores offers a significant acceleration of the overall process involving the iterative methods within the context of spme and a noticeable improvement of the memory management giving access to very large systems (hundreds of thousands of atoms) as the algorithm naturally distributes the data on different cores. Coupled with advanced MD techniques, gains ranging from 2 to 3 orders of magnitude in time are now possible compared to non-optimized, sequential implementations giving new directions for polarizable molecular dynamics in periodic boundary conditions using massively parallel implementations.

  5. Scalable Evaluation of Polarization Energy and Associated Forces in Polarizable Molecular Dynamics: II.Towards Massively Parallel Computations using Smooth Particle Mesh Ewald

    PubMed Central

    Lagardère, Louis; Lipparini, Filippo; Polack, Étienne; Stamm, Benjamin; Cancès, Éric; Schnieders, Michael; Ren, Pengyu; Maday, Yvon; Piquemal, Jean-Philip

    2015-01-01

    In this paper, we present a scalable and efficient implementation of point dipole-based polarizable force fields for molecular dynamics (MD) simulations with periodic boundary conditions (PBC). The Smooth Particle-Mesh Ewald technique is combined with two optimal iterative strategies, namely, a preconditioned conjugate gradient solver and a Jacobi solver in conjunction with the Direct Inversion in the Iterative Subspace for convergence acceleration, to solve the polarization equations. We show that both solvers exhibit very good parallel performances and overall very competitive timings in an energy-force computation needed to perform a MD step. Various tests on large systems are provided in the context of the polarizable AMOEBA force field as implemented in the newly developed Tinker-HP package which is the first implementation for a polarizable model making large scale experiments for massively parallel PBC point dipole models possible. We show that using a large number of cores offers a significant acceleration of the overall process involving the iterative methods within the context of spme and a noticeable improvement of the memory management giving access to very large systems (hundreds of thousands of atoms) as the algorithm naturally distributes the data on different cores. Coupled with advanced MD techniques, gains ranging from 2 to 3 orders of magnitude in time are now possible compared to non-optimized, sequential implementations giving new directions for polarizable molecular dynamics in periodic boundary conditions using massively parallel implementations. PMID:26512230

  6. Determination of polycyclic aromatic hydrocarbons by four-way parallel factor analysis in presence of humic acid.

    PubMed

    Yang, Ruifang; Zhao, Nanjing; Xiao, Xue; Yu, Shaohui; Liu, Jianguo; Liu, Wenqing

    2016-01-05

    There is not effective method to solve the quenching effect of quencher in fluorescence spectra measurement and recognition of polycyclic aromatic hydrocarbons in aquatic environment. In this work, a four-way dataset combined with four-way parallel factor analysis is used to identify and quantify polycyclic aromatic hydrocarbons in the presence of humic acid, a fluorescent quencher and an ubiquitous substance in aquatic system, through modeling the quenching effect of humic acid by decomposing the four-way dataset into four loading matrices corresponding to relative concentration, excitation spectra, emission spectra and fluorescence quantum yield, respectively. It is found that Phenanthrene, pyrene, anthracene and fluorene can be recognized simultaneously with the similarities all above 0.980 between resolved spectra and reference spectra. Moreover, the concentrations of them ranging from 0 to 8μgL(-1) in the test samples prepared with river water could also be predicted successfully with recovery rate of each polycyclic aromatic hydrocarbon between 100% and 120%, which were higher than those of three-way PARAFAC. These results demonstrate that the combination of four-way dataset with four-way parallel factor analysis could be a promising method to recognize the fluorescence spectra of polycyclic aromatic hydrocarbons in the presence of fluorescent quencher from both qualitative and quantitative perspective. Copyright © 2015 Elsevier B.V. All rights reserved.

  7. Parallel CARLOS-3D code development

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Putnam, J.M.; Kotulski, J.D.

    1996-02-01

    CARLOS-3D is a three-dimensional scattering code which was developed under the sponsorship of the Electromagnetic Code Consortium, and is currently used by over 80 aerospace companies and government agencies. The code has been extensively validated and runs on both serial workstations and parallel super computers such as the Intel Paragon. CARLOS-3D is a three-dimensional surface integral equation scattering code based on a Galerkin method of moments formulation employing Rao- Wilton-Glisson roof-top basis for triangular faceted surfaces. Fully arbitrary 3D geometries composed of multiple conducting and homogeneous bulk dielectric materials can be modeled. This presentation describes some of the extensions tomore » the CARLOS-3D code, and how the operator structure of the code facilitated these improvements. Body of revolution (BOR) and two-dimensional geometries were incorporated by simply including new input routines, and the appropriate Galerkin matrix operator routines. Some additional modifications were required in the combined field integral equation matrix generation routine due to the symmetric nature of the BOR and 2D operators. Quadrilateral patched surfaces with linear roof-top basis functions were also implemented in the same manner. Quadrilateral facets and triangular facets can be used in combination to more efficiently model geometries with both large smooth surfaces and surfaces with fine detail such as gaps and cracks. Since the parallel implementation in CARLOS-3D is at high level, these changes were independent of the computer platform being used. This approach minimizes code maintenance, while providing capabilities with little additional effort. Results are presented showing the performance and accuracy of the code for some large scattering problems. Comparisons between triangular faceted and quadrilateral faceted geometry representations will be shown for some complex scatterers.« less

  8. Flow distribution in parallel microfluidic networks and its effect on concentration gradient

    PubMed Central

    Guermonprez, Cyprien; Michelin, Sébastien; Baroud, Charles N.

    2015-01-01

    The architecture of microfluidic networks can significantly impact the flow distribution within its different branches and thereby influence tracer transport within the network. In this paper, we study the flow rate distribution within a network of parallel microfluidic channels with a single input and single output, using a combination of theoretical modeling and microfluidic experiments. Within the ladder network, the flow rate distribution follows a U-shaped profile, with the highest flow rate occurring in the initial and final branches. The contrast with the central branches is controlled by a single dimensionless parameter, namely, the ratio of hydrodynamic resistance between the distribution channel and the side branches. This contrast in flow rates decreases when the resistance of the side branches increases relative to the resistance of the distribution channel. When the inlet flow is composed of two parallel streams, one of which transporting a diffusing species, a concentration variation is produced within the side branches of the network. The shape of this concentration gradient is fully determined by two dimensionless parameters: the ratio of resistances, which determines the flow rate distribution, and the Péclet number, which characterizes the relative speed of diffusion and advection. Depending on the values of these two control parameters, different distribution profiles can be obtained ranging from a flat profile to a step distribution of solute, with well-distributed gradients between these two limits. Our experimental results are in agreement with our numerical model predictions, based on a simplified 2D advection-diffusion problem. Finally, two possible applications of this work are presented: the first one combines the present design with self-digitization principle to encapsulate the controlled concentration in nanoliter chambers, while the second one extends the present design to create a continuous concentration gradient within an open flow chamber. PMID:26487905

  9. 2D Seismic Imaging of Elastic Parameters by Frequency Domain Full Waveform Inversion

    NASA Astrophysics Data System (ADS)

    Brossier, R.; Virieux, J.; Operto, S.

    2008-12-01

    Thanks to recent advances in parallel computing, full waveform inversion is today a tractable seismic imaging method to reconstruct physical parameters of the earth interior at different scales ranging from the near- surface to the deep crust. We present a massively parallel 2D frequency-domain full-waveform algorithm for imaging visco-elastic media from multi-component seismic data. The forward problem (i.e. the resolution of the frequency-domain 2D PSV elastodynamics equations) is based on low-order Discontinuous Galerkin (DG) method (P0 and/or P1 interpolations). Thanks to triangular unstructured meshes, the DG method allows accurate modeling of both body waves and surface waves in case of complex topography for a discretization of 10 to 15 cells per shear wavelength. The frequency-domain DG system is solved efficiently for multiple sources with the parallel direct solver MUMPS. The local inversion procedure (i.e. minimization of residuals between observed and computed data) is based on the adjoint-state method which allows to efficiently compute the gradient of the objective function. Applying the inversion hierarchically from the low frequencies to the higher ones defines a multiresolution imaging strategy which helps convergence towards the global minimum. In place of expensive Newton algorithm, the combined use of the diagonal terms of the approximate Hessian matrix and optimization algorithms based on quasi-Newton methods (Conjugate Gradient, LBFGS, ...) allows to improve the convergence of the iterative inversion. The distribution of forward problem solutions over processors driven by a mesh partitioning performed by METIS allows to apply most of the inversion in parallel. We shall present the main features of the parallel modeling/inversion algorithm, assess its scalability and illustrate its performances with realistic synthetic case studies.

  10. Hybrid simulations of radial transport driven by the Rayleigh-Taylor instability

    NASA Astrophysics Data System (ADS)

    Delamere, P. A.; Stauffer, B. H.; Ma, X.

    2017-12-01

    Plasma transport in the rapidly rotating giant magnetospheres is thought to involve a centrifugally-driven flux tube interchange instability, similar to the Rayleigh-Taylor (RT) instability. In three dimensions, the convective flow patterns associated with the RT instability can produce strong guide field reconnection, allowing plasma mass to move radially outward while conserving magnetic flux (Ma et al., 2016). We present a set of hybrid (kinetic ion / fluid electron) plasma simulations of the RT instability using high plasma beta conditions appropriate for Jupiter's inner and middle magnetosphere. A density gradient, combined with a centrifugal force, provide appropriate RT onset conditions. Pressure balance is achieved by initializing two ion populations: one with fixed temperature, but varying density, and the other with fixed density, but a temperature gradient that offsets the density gradient from the first population and the centrifugal force (effective gravity). We first analyze two-dimensional results for the plane perpendicular to the magnetic field by comparing growth rates as a function of wave vector following Huba et al. (1998). Prescribed perpendicular wave modes are seeded with an initial velocity perturbation. We then extend the model to three dimensions, introducing a stabilizing parallel wave vector. Boundary conditions in the parallel direction prohibit motion of the magnetic field line footprints to model the eigenmodes of the magnetodisc's resonant cavity. We again compare growth rates based on perpendicular wave number, but also on the parallel extent of the resonant cavity, which fixes the size of the largest parallel wavelength. Finally, we search for evidence of strong guide field magnetic reconnection within the domain by identifying areas with large parallel electric fields or changes in magnetic field topology.

  11. Accurate modeling of switched reluctance machine based on hybrid trained WNN

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Song, Shoujun, E-mail: sunnyway@nwpu.edu.cn; Ge, Lefei; Ma, Shaojie

    2014-04-15

    According to the strong nonlinear electromagnetic characteristics of switched reluctance machine (SRM), a novel accurate modeling method is proposed based on hybrid trained wavelet neural network (WNN) which combines improved genetic algorithm (GA) with gradient descent (GD) method to train the network. In the novel method, WNN is trained by GD method based on the initial weights obtained per improved GA optimization, and the global parallel searching capability of stochastic algorithm and local convergence speed of deterministic algorithm are combined to enhance the training accuracy, stability and speed. Based on the measured electromagnetic characteristics of a 3-phase 12/8-pole SRM, themore » nonlinear simulation model is built by hybrid trained WNN in Matlab. The phase current and mechanical characteristics from simulation under different working conditions meet well with those from experiments, which indicates the accuracy of the model for dynamic and static performance evaluation of SRM and verifies the effectiveness of the proposed modeling method.« less

  12. Manual control of yaw motion with combined visual and vestibular cues

    NASA Technical Reports Server (NTRS)

    Zacharias, G. L.; Young, L. R.

    1977-01-01

    Measurements are made of manual control performance in the closed-loop task of nulling perceived self-rotation velocity about an earth-vertical axis. Self-velocity estimation was modelled as a function of the simultaneous presentation of vestibular and peripheral visual field motion cues. Based on measured low-frequency operator behavior in three visual field environments, a parallel channel linear model is proposed which has separate visual and vestibular pathways summing in a complementary manner. A correction to the frequency responses is provided by a separate measurement of manual control performance in an analogous visual pursuit nulling task. The resulting dual-input describing function for motion perception dependence on combined cue presentation supports the complementary model, in which vestibular cues dominate sensation at frequencies above 0.05 Hz. The describing function model is extended by the proposal of a non-linear cue conflict model, in which cue weighting depends on the level of agreement between visual and vestibular cues.

  13. 3-D modeling of ductile tearing using finite elements: Computational aspects and techniques

    NASA Astrophysics Data System (ADS)

    Gullerud, Arne Stewart

    This research focuses on the development and application of computational tools to perform large-scale, 3-D modeling of ductile tearing in engineering components under quasi-static to mild loading rates. Two standard models for ductile tearing---the computational cell methodology and crack growth controlled by the crack tip opening angle (CTOA)---are described and their 3-D implementations are explored. For the computational cell methodology, quantification of the effects of several numerical issues---computational load step size, procedures for force release after cell deletion, and the porosity for cell deletion---enables construction of computational algorithms to remove the dependence of predicted crack growth on these issues. This work also describes two extensions of the CTOA approach into 3-D: a general 3-D method and a constant front technique. Analyses compare the characteristics of the extensions, and a validation study explores the ability of the constant front extension to predict crack growth in thin aluminum test specimens over a range of specimen geometries, absolutes sizes, and levels of out-of-plane constraint. To provide a computational framework suitable for the solution of these problems, this work also describes the parallel implementation of a nonlinear, implicit finite element code. The implementation employs an explicit message-passing approach using the MPI standard to maintain portability, a domain decomposition of element data to provide parallel execution, and a master-worker organization of the computational processes to enhance future extensibility. A linear preconditioned conjugate gradient (LPCG) solver serves as the core of the solution process. The parallel LPCG solver utilizes an element-by-element (EBE) structure of the computations to permit a dual-level decomposition of the element data: domain decomposition of the mesh provides efficient coarse-grain parallel execution, while decomposition of the domains into blocks of similar elements (same type, constitutive model, etc.) provides fine-grain parallel computation on each processor. A major focus of the LPCG solver is a new implementation of the Hughes-Winget element-by-element (HW) preconditioner. The implementation employs a weighted dependency graph combined with a new coloring algorithm to provide load-balanced scheduling for the preconditioner and overlapped communication/computation. This approach enables efficient parallel application of the HW preconditioner for arbitrary unstructured meshes.

  14. Toxicodynamic analysis of the combined cholinesterase inhibition by paraoxon and methamidophos in human whole blood

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bosgra, Sieto; Eijkeren, Jan C.H. van; Schans, Marcel J. van der

    2009-04-01

    Theoretical work has shown that the isobole method is not generally valid as a method for testing the absence or presence of interaction (in the biochemical sense) between chemicals. The present study illustrates how interaction can be tested by fitting a toxicodynamic model to the results of a mixture experiment. The inhibition of cholinesterases (ChE) in human whole blood by various dose combinations of paraoxon and methamidophos was measured in vitro. A toxicodynamic model describing the processes related to both OPs in inhibiting AChE activity was developed, and fit to the observed activities. This model, not containing any interaction betweenmore » the two OPs, described the results from the mixture experiment well, and it was concluded that the OPs did not interact in the whole blood samples. While this approach of toxicodynamic modeling is the most appropriate method for predicting combined effects, it is not rapidly applicable. Therefore, we illustrate how toxicodynamic modeling can be used to explore under which conditions dose addition would give an acceptable approximation of the combined effects from various chemicals. In the specific case of paraoxon and methamidophos in whole blood samples, it was found that dose addition gave a reasonably accurate prediction of the combined effects, despite considerable difference in some of their rate constants, and mildly non-parallel dose-response curves. Other possibilities of validating dose-addition using toxicodynamic modeling are briefly discussed.« less

  15. Model and particle-in-cell simulation of ion energy distribution in collisionless sheath

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou, Zhuwen, E-mail: zzwwdxy@gznc.edu.cn; Key Laboratory of Photoelectron Materials Design and Simulation in Guizhou Province, Guiyang 550018; Scientific Research Innovation Team in Plasma and Functional Thin Film Materials in Guizhou Province, Guiyang 550018

    2015-06-15

    In this paper, we propose a self-consistent theoretical model, which is described by the ion energy distributions (IEDs) in collisionless sheaths, and the analytical results for different combined dc/radio frequency (rf) capacitive coupled plasma discharge cases, including sheath voltage errors analysis, are compared with the results of numerical simulations using a one-dimensional plane-parallel particle-in-cell (PIC) simulation. The IEDs in collisionless sheaths are performed on combination of dc/rf voltage sources electrodes discharge using argon as the process gas. The incident ions on the grounded electrode are separated, according to their different radio frequencies, and dc voltages on a separated electrode, themore » IEDs, and widths of energy in sheath and the plasma sheath thickness are discussed. The IEDs, the IED widths, and sheath voltages by the theoretical model are investigated and show good agreement with PIC simulations.« less

  16. Effect of Topology Structure on the Output Performance of an Automobile Exhaust Thermoelectric Generator

    NASA Astrophysics Data System (ADS)

    Fang, W.; Quan, S. H.; Xie, C. J.; Ran, B.; Li, X. L.; Wang, L.; Jiao, Y. T.; Xu, T. W.

    2017-05-01

    The majority of the thermal energy released in an automotive internal combustion cycle is exhausted as waste heat through the tail pipe. This paper describes an automobile exhaust thermoelectric generator (AETEG), designed to recycle automobile waste heat. A model of the output characteristics of each thermoelectric device was established by testing their open circuit voltage and internal resistance, and combining the output characteristics. To better describe the relationship, the physical model was transformed into a topological model. The connection matrix was used to describe the relationship between any two thermoelectric devices in the topological structure. Different topological structures produced different power outputs; their output power was maximised by using an iterative algorithm to optimize the series-parallel electrical topology structure. The experimental results have shown that the output power of the optimal topology structure increases by 18.18% and 29.35% versus that of a pure in-series or parallel topology, respectively, and by 10.08% versus a manually defined structure (based on user experience). The thermoelectric conversion device increased energy efficiency by 40% when compared with a traditional car.

  17. Unravelling how βCaMKII controls the direction of plasticity at parallel fibre-Purkinje cell synapses

    NASA Astrophysics Data System (ADS)

    Pinto, Thiago M.; Schilstra, Maria J.; Steuber, Volker; Roque, Antonio C.

    2015-12-01

    Long-term plasticity at parallel fibre (PF)-Purkinje cell (PC) synapses is thought to mediate cerebellar motor learning. It is known that calcium-calmodulin dependent protein kinase II (CaMKII) is essential for plasticity in the cerebellum. Recently, Van Woerden et al. demonstrated that the β isoform of CaMKII regulates the bidirectional inversion of PF-PC plasticity. Because the cellular events that underlie these experimental findings are still poorly understood, our work aims at unravelling how β CaMKII controls the direction of plasticity at PF-PC synapses. We developed a bidirectional plasticity model that replicates the experimental observations by Van Woerden et al. Simulation results obtained from this model indicate the mechanisms that underlie the bidirectional inversion of cerebellar plasticity. As suggested by Van Woerden et al., the filamentous actin binding enables β CaMKII to regulate the bidirectional plasticity at PF-PC synapses. Our model suggests that the reversal of long-term plasticity in PCs is based on a combination of mechanisms that occur at different calcium concentrations.

  18. Accelerating population balance-Monte Carlo simulation for coagulation dynamics from the Markov jump model, stochastic algorithm and GPU parallel computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, Zuwei; Zhao, Haibo, E-mail: klinsmannzhb@163.com; Zheng, Chuguang

    2015-01-15

    This paper proposes a comprehensive framework for accelerating population balance-Monte Carlo (PBMC) simulation of particle coagulation dynamics. By combining Markov jump model, weighted majorant kernel and GPU (graphics processing unit) parallel computing, a significant gain in computational efficiency is achieved. The Markov jump model constructs a coagulation-rule matrix of differentially-weighted simulation particles, so as to capture the time evolution of particle size distribution with low statistical noise over the full size range and as far as possible to reduce the number of time loopings. Here three coagulation rules are highlighted and it is found that constructing appropriate coagulation rule providesmore » a route to attain the compromise between accuracy and cost of PBMC methods. Further, in order to avoid double looping over all simulation particles when considering the two-particle events (typically, particle coagulation), the weighted majorant kernel is introduced to estimate the maximum coagulation rates being used for acceptance–rejection processes by single-looping over all particles, and meanwhile the mean time-step of coagulation event is estimated by summing the coagulation kernels of rejected and accepted particle pairs. The computational load of these fast differentially-weighted PBMC simulations (based on the Markov jump model) is reduced greatly to be proportional to the number of simulation particles in a zero-dimensional system (single cell). Finally, for a spatially inhomogeneous multi-dimensional (multi-cell) simulation, the proposed fast PBMC is performed in each cell, and multiple cells are parallel processed by multi-cores on a GPU that can implement the massively threaded data-parallel tasks to obtain remarkable speedup ratio (comparing with CPU computation, the speedup ratio of GPU parallel computing is as high as 200 in a case of 100 cells with 10 000 simulation particles per cell). These accelerating approaches of PBMC are demonstrated in a physically realistic Brownian coagulation case. The computational accuracy is validated with benchmark solution of discrete-sectional method. The simulation results show that the comprehensive approach can attain very favorable improvement in cost without sacrificing computational accuracy.« less

  19. Concurrent Probabilistic Simulation of High Temperature Composite Structural Response

    NASA Technical Reports Server (NTRS)

    Abdi, Frank

    1996-01-01

    A computational structural/material analysis and design tool which would meet industry's future demand for expedience and reduced cost is presented. This unique software 'GENOA' is dedicated to parallel and high speed analysis to perform probabilistic evaluation of high temperature composite response of aerospace systems. The development is based on detailed integration and modification of diverse fields of specialized analysis techniques and mathematical models to combine their latest innovative capabilities into a commercially viable software package. The technique is specifically designed to exploit the availability of processors to perform computationally intense probabilistic analysis assessing uncertainties in structural reliability analysis and composite micromechanics. The primary objectives which were achieved in performing the development were: (1) Utilization of the power of parallel processing and static/dynamic load balancing optimization to make the complex simulation of structure, material and processing of high temperature composite affordable; (2) Computational integration and synchronization of probabilistic mathematics, structural/material mechanics and parallel computing; (3) Implementation of an innovative multi-level domain decomposition technique to identify the inherent parallelism, and increasing convergence rates through high- and low-level processor assignment; (4) Creating the framework for Portable Paralleled architecture for the machine independent Multi Instruction Multi Data, (MIMD), Single Instruction Multi Data (SIMD), hybrid and distributed workstation type of computers; and (5) Market evaluation. The results of Phase-2 effort provides a good basis for continuation and warrants Phase-3 government, and industry partnership.

  20. Novel Prostate Cancer Pathway Modeling using Boolean Implication

    DTIC Science & Technology

    2012-09-01

    cause of cancer deaths in men. Diagnosis and pathogenesis of this disease is poorly understood. Prostate specific antigen (PSA) test is still the... specific database, I combined 14 different datasets (global prostate cancer database, total n=891) that are in Affymetrix U133A (n=456), U133A 2.0...immunohistochemistry and flow cytometry are limited by the availability of antigen - specific monoclonal antibodies and by the small number of parallel

  1. QCM-D on mica for parallel QCM-D-AFM studies.

    PubMed

    Richter, Ralf P; Brisson, Alain

    2004-05-25

    Quartz crystal microbalance with dissipation monitoring (QCM-D) has developed into a recognized method to study adsorption processes in liquid, such as the formation of supported lipid bilayers and protein adsorption. However, the large intrinsic roughness of currently used gold-coated or silica-coated QCM-D sensors limits parallel structural characterization by atomic force microscopy (AFM). We present a method for coating QCM-D sensors with thin mica sheets operating in liquid with high stability and sensitivity. We define criteria to objectively assess the reliability of the QCM-D measurements and demonstrate that the mica-coated sensors can be used to follow the formation of supported lipid membranes and subsequent protein adsorption. This method allows combining QCM-D and AFM investigations on identical supports, providing detailed physicochemical and structural characterization of model membranes.

  2. Reconfigurable Model Execution in the OpenMDAO Framework

    NASA Technical Reports Server (NTRS)

    Hwang, John T.

    2017-01-01

    NASA's OpenMDAO framework facilitates constructing complex models and computing their derivatives for multidisciplinary design optimization. Decomposing a model into components that follow a prescribed interface enables OpenMDAO to assemble multidisciplinary derivatives from the component derivatives using what amounts to the adjoint method, direct method, chain rule, global sensitivity equations, or any combination thereof, using the MAUD architecture. OpenMDAO also handles the distribution of processors among the disciplines by hierarchically grouping the components, and it automates the data transfer between components that are on different processors. These features have made OpenMDAO useful for applications in aircraft design, satellite design, wind turbine design, and aircraft engine design, among others. This paper presents new algorithms for OpenMDAO that enable reconfigurable model execution. This concept refers to dynamically changing, during execution, one or more of: the variable sizes, solution algorithm, parallel load balancing, or set of variables-i.e., adding and removing components, perhaps to switch to a higher-fidelity sub-model. Any component can reconfigure at any point, even when running in parallel with other components, and the reconfiguration algorithm presented here performs the synchronized updates to all other components that are affected. A reconfigurable software framework for multidisciplinary design optimization enables new adaptive solvers, adaptive parallelization, and new applications such as gradient-based optimization with overset flow solvers and adaptive mesh refinement. Benchmarking results demonstrate the time savings for reconfiguration compared to setting up the model again from scratch, which can be significant in large-scale problems. Additionally, the new reconfigurability feature is applied to a mission profile optimization problem for commercial aircraft where both the parametrization of the mission profile and the time discretization are adaptively refined, resulting in computational savings of roughly 10% and the elimination of oscillations in the optimized altitude profile.

  3. An executable specification for the message processor in a simple combining network

    NASA Technical Reports Server (NTRS)

    Middleton, David

    1995-01-01

    While the primary function of the network in a parallel computer is to communicate data between processors, it is often useful if the network can also perform rudimentary calculations. That is, some simple processing ability in the network itself, particularly for performing parallel prefix computations, can reduce both the volume of data being communicated and the computational load on the processors proper. Unfortunately, typical implementations of such networks require a large fraction of the hardware budget, and so combining networks are viewed as being impractical. The FFP Machine has such a combining network, and various characteristics of the machine allow a good deal of simplification in the network design. Despite being simple in construction however, the network relies on many subtle details to work correctly. This paper describes an executable model of the network which will serve several purposes. It provides a complete and detailed description of the network which can substantiate its ability to support necessary functions. It provides an environment in which algorithms to be run on the network can be designed and debugged more easily than they would on physical hardware. Finally, it provides the foundation for exploring the design of the message receiving facility which connects the network to the individual processors.

  4. Modelling audiovisual integration of affect from videos and music.

    PubMed

    Gao, Chuanji; Wedell, Douglas H; Kim, Jongwan; Weber, Christine E; Shinkareva, Svetlana V

    2018-05-01

    Two experiments examined how affective values from visual and auditory modalities are integrated. Experiment 1 paired music and videos drawn from three levels of valence while holding arousal constant. Experiment 2 included a parallel combination of three levels of arousal while holding valence constant. In each experiment, participants rated their affective states after unimodal and multimodal presentations. Experiment 1 revealed a congruency effect in which stimulus combinations of the same extreme valence resulted in more extreme state ratings than component stimuli presented in isolation. An interaction between music and video valence reflected the greater influence of negative affect. Video valence was found to have a significantly greater effect on combined ratings than music valence. The pattern of data was explained by a five parameter differential weight averaging model that attributed greater weight to the visual modality and increased weight with decreasing values of valence. Experiment 2 revealed a congruency effect only for high arousal combinations and no interaction effects. This pattern was explained by a three parameter constant weight averaging model with greater weight for the auditory modality and a very low arousal value for the initial state. These results demonstrate key differences in audiovisual integration between valence and arousal.

  5. Synchronized motion control and precision positioning compensation of a 3-DOFs macro-micro parallel manipulator fully actuated by piezoelectric actuators

    NASA Astrophysics Data System (ADS)

    Zhang, Quan; Li, Chaodong; Zhang, Jiantao; Zhang, Xu

    2017-11-01

    The macro-micro combined approach, as an effective way to realize trans-scale nano-precision positioning with multi-dimensions and high velocity, plays a significant role in integrated circuit manufacturing field. A 3-degree-of-freedoms (3-DOFs) macro-micro manipulator is designed and analyzed to compromise the conflictions among the large stroke, high precision and multi-DOFs. The macro manipulator is a 3-Prismatic-Revolute-Revolute (3-PRR) structure parallel manipulator which is driven by three linear ultrasonic motors. The dynamic model and the cross-coupling error based synchronized motion controller of the 3-PRR parallel manipulator are theoretical analyzed and experimental tested. To further improve the positioning accuracy, a 3-DOFs monolithic compliant manipulator actuated by three piezoelectric stack actuators is designed. Then a multilayer BP neural network based inverse kinematic model identifier is developed to perform the positioning control. Finally, by forming the macro-micro structure, the dual stage manipulator successfully achieved the positioning task from the point (2 mm, 2 mm, 0 rad) back to the original point (0 mm, 0 mm, 0 rad) with the translation errors in X and Y directions less than ±50 nm and the rotation error around Z axis less than ±1 μrad, respectively.

  6. PCSIM: A Parallel Simulation Environment for Neural Circuits Fully Integrated with Python

    PubMed Central

    Pecevski, Dejan; Natschläger, Thomas; Schuch, Klaus

    2008-01-01

    The Parallel Circuit SIMulator (PCSIM) is a software package for simulation of neural circuits. It is primarily designed for distributed simulation of large scale networks of spiking point neurons. Although its computational core is written in C++, PCSIM's primary interface is implemented in the Python programming language, which is a powerful programming environment and allows the user to easily integrate the neural circuit simulator with data analysis and visualization tools to manage the full neural modeling life cycle. The main focus of this paper is to describe PCSIM's full integration into Python and the benefits thereof. In particular we will investigate how the automatically generated bidirectional interface and PCSIM's object-oriented modular framework enable the user to adopt a hybrid modeling approach: using and extending PCSIM's functionality either employing pure Python or C++ and thus combining the advantages of both worlds. Furthermore, we describe several supplementary PCSIM packages written in pure Python and tailored towards setting up and analyzing neural simulations. PMID:19543450

  7. Nice Guys Finish Fast and Bad Guys Finish Last: Facilitatory vs. Inhibitory Interaction in Parallel Systems

    PubMed Central

    Eidels, Ami; Houpt, Joseph W.; Altieri, Nicholas; Pei, Lei; Townsend, James T.

    2011-01-01

    Systems Factorial Technology is a powerful framework for investigating the fundamental properties of human information processing such as architecture (i.e., serial or parallel processing) and capacity (how processing efficiency is affected by increased workload). The Survivor Interaction Contrast (SIC) and the Capacity Coefficient are effective measures in determining these underlying properties, based on response-time data. Each of the different architectures, under the assumption of independent processing, predicts a specific form of the SIC along with some range of capacity. In this study, we explored SIC predictions of discrete-state (Markov process) and continuous-state (Linear Dynamic) models that allow for certain types of cross-channel interaction. The interaction can be facilitatory or inhibitory: one channel can either facilitate, or slow down processing in its counterpart. Despite the relative generality of these models, the combination of the architecture-oriented plus the capacity oriented analyses provide for precise identification of the underlying system. PMID:21516183

  8. Nice Guys Finish Fast and Bad Guys Finish Last: Facilitatory vs. Inhibitory Interaction in Parallel Systems.

    PubMed

    Eidels, Ami; Houpt, Joseph W; Altieri, Nicholas; Pei, Lei; Townsend, James T

    2011-04-01

    Systems Factorial Technology is a powerful framework for investigating the fundamental properties of human information processing such as architecture (i.e., serial or parallel processing) and capacity (how processing efficiency is affected by increased workload). The Survivor Interaction Contrast (SIC) and the Capacity Coefficient are effective measures in determining these underlying properties, based on response-time data. Each of the different architectures, under the assumption of independent processing, predicts a specific form of the SIC along with some range of capacity. In this study, we explored SIC predictions of discrete-state (Markov process) and continuous-state (Linear Dynamic) models that allow for certain types of cross-channel interaction. The interaction can be facilitatory or inhibitory: one channel can either facilitate, or slow down processing in its counterpart. Despite the relative generality of these models, the combination of the architecture-oriented plus the capacity oriented analyses provide for precise identification of the underlying system.

  9. Parallel and pipeline computation of fast unitary transforms

    NASA Technical Reports Server (NTRS)

    Fino, B. J.; Algazi, V. R.

    1975-01-01

    The letter discusses the parallel and pipeline organization of fast-unitary-transform algorithms such as the fast Fourier transform, and points out the efficiency of a combined parallel-pipeline processor of a transform such as the Haar transform, in which (2 to the n-th power) -1 hardware 'butterflies' generate a transform of order 2 to the n-th power every computation cycle.

  10. Electrophysiological Modeling of Cardiac Ventricular Function: From Cell to Organ

    PubMed Central

    Winslow, R. L.; Scollan, D. F.; Holmes, A.; Yung, C. K.; Zhang, J.; Jafri, M. S.

    2005-01-01

    Three topics of importance to modeling the integrative function of the heart are reviewed. The first is modeling of the ventricular myocyte. Emphasis is placed on excitation-contraction coupling and intracellular Ca2+ handling, and the interpretation of experimental data regarding interval-force relationships. Second, data on use of diffusion tensor magnetic resonance (DTMR) imaging for measuring the anatomical structure of the cardiac ventricles are presented. A method for the semi-automated reconstruction of the ventricles using a combination of gradient recalled acquisition in the steady state (GRASS) and DTMR images is described. Third, we describe how these anatomically and biophysically based models of the cardiac ventricles can be implemented on parallel computers. PMID:11701509

  11. Removal of antibiotics in a parallel-plate thin-film-photocatalytic reactor: Process modeling and evolution of transformation by-products and toxicity.

    PubMed

    Özkal, Can Burak; Frontistis, Zacharias; Antonopoulou, Maria; Konstantinou, Ioannis; Mantzavinos, Dionissios; Meriç, Süreyya

    2017-10-01

    Photocatalytic degradation of sulfamethoxazole (SMX) antibiotic has been studied under recycling batch and homogeneous flow conditions in a thin-film coated immobilized system namely parallel-plate (PPL) reactor. Experimentally designed, statistically evaluated with a factorial design (FD) approach with intent to provide a mathematical model takes into account the parameters influencing process performance. Initial antibiotic concentration, UV energy level, irradiated surface area, water matrix (ultrapure and secondary treated wastewater) and time, were defined as model parameters. A full of 2 5 experimental design was consisted of 32 random experiments. PPL reactor test experiments were carried out in order to set boundary levels for hydraulic, volumetric and defined defined process parameters. TTIP based thin-film with polyethylene glycol+TiO 2 additives were fabricated according to pre-described methodology. Antibiotic degradation was monitored by High Performance Liquid Chromatography analysis while the degradation products were specified by LC-TOF-MS analysis. Acute toxicity of untreated and treated SMX solutions was tested by standard Daphnia magna method. Based on the obtained mathematical model, the response of the immobilized PC system is described with a polynomial equation. The statistically significant positive effects are initial SMX concentration, process time and the combined effect of both, while combined effect of water matrix and irradiated surface area displays an adverse effect on the rate of antibiotic degradation by photocatalytic oxidation. Process efficiency and the validity of the acquired mathematical model was also verified for levofloxacin and cefaclor antibiotics. Immobilized PC degradation in PPL reactor configuration was found capable of providing reduced effluent toxicity by simultaneous degradation of SMX parent compound and TBPs. Copyright © 2017. Published by Elsevier B.V.

  12. Graphics Processing Unit–Enhanced Genetic Algorithms for Solving the Temporal Dynamics of Gene Regulatory Networks

    PubMed Central

    García-Calvo, Raúl; Guisado, JL; Diaz-del-Rio, Fernando; Córdoba, Antonio; Jiménez-Morales, Francisco

    2018-01-01

    Understanding the regulation of gene expression is one of the key problems in current biology. A promising method for that purpose is the determination of the temporal dynamics between known initial and ending network states, by using simple acting rules. The huge amount of rule combinations and the nonlinear inherent nature of the problem make genetic algorithms an excellent candidate for finding optimal solutions. As this is a computationally intensive problem that needs long runtimes in conventional architectures for realistic network sizes, it is fundamental to accelerate this task. In this article, we study how to develop efficient parallel implementations of this method for the fine-grained parallel architecture of graphics processing units (GPUs) using the compute unified device architecture (CUDA) platform. An exhaustive and methodical study of various parallel genetic algorithm schemes—master-slave, island, cellular, and hybrid models, and various individual selection methods (roulette, elitist)—is carried out for this problem. Several procedures that optimize the use of the GPU’s resources are presented. We conclude that the implementation that produces better results (both from the performance and the genetic algorithm fitness perspectives) is simulating a few thousands of individuals grouped in a few islands using elitist selection. This model comprises 2 mighty factors for discovering the best solutions: finding good individuals in a short number of generations, and introducing genetic diversity via a relatively frequent and numerous migration. As a result, we have even found the optimal solution for the analyzed gene regulatory network (GRN). In addition, a comparative study of the performance obtained by the different parallel implementations on GPU versus a sequential application on CPU is carried out. In our tests, a multifold speedup was obtained for our optimized parallel implementation of the method on medium class GPU over an equivalent sequential single-core implementation running on a recent Intel i7 CPU. This work can provide useful guidance to researchers in biology, medicine, or bioinformatics in how to take advantage of the parallelization on massively parallel devices and GPUs to apply novel metaheuristic algorithms powered by nature for real-world applications (like the method to solve the temporal dynamics of GRNs). PMID:29662297

  13. Graphics Processing Unit-Enhanced Genetic Algorithms for Solving the Temporal Dynamics of Gene Regulatory Networks.

    PubMed

    García-Calvo, Raúl; Guisado, J L; Diaz-Del-Rio, Fernando; Córdoba, Antonio; Jiménez-Morales, Francisco

    2018-01-01

    Understanding the regulation of gene expression is one of the key problems in current biology. A promising method for that purpose is the determination of the temporal dynamics between known initial and ending network states, by using simple acting rules. The huge amount of rule combinations and the nonlinear inherent nature of the problem make genetic algorithms an excellent candidate for finding optimal solutions. As this is a computationally intensive problem that needs long runtimes in conventional architectures for realistic network sizes, it is fundamental to accelerate this task. In this article, we study how to develop efficient parallel implementations of this method for the fine-grained parallel architecture of graphics processing units (GPUs) using the compute unified device architecture (CUDA) platform. An exhaustive and methodical study of various parallel genetic algorithm schemes-master-slave, island, cellular, and hybrid models, and various individual selection methods (roulette, elitist)-is carried out for this problem. Several procedures that optimize the use of the GPU's resources are presented. We conclude that the implementation that produces better results (both from the performance and the genetic algorithm fitness perspectives) is simulating a few thousands of individuals grouped in a few islands using elitist selection. This model comprises 2 mighty factors for discovering the best solutions: finding good individuals in a short number of generations, and introducing genetic diversity via a relatively frequent and numerous migration. As a result, we have even found the optimal solution for the analyzed gene regulatory network (GRN). In addition, a comparative study of the performance obtained by the different parallel implementations on GPU versus a sequential application on CPU is carried out. In our tests, a multifold speedup was obtained for our optimized parallel implementation of the method on medium class GPU over an equivalent sequential single-core implementation running on a recent Intel i7 CPU. This work can provide useful guidance to researchers in biology, medicine, or bioinformatics in how to take advantage of the parallelization on massively parallel devices and GPUs to apply novel metaheuristic algorithms powered by nature for real-world applications (like the method to solve the temporal dynamics of GRNs).

  14. A Laboratory Exercise in Physics: Determining Single Capacitances and Series and Parallel Combinations of Capacitance.

    ERIC Educational Resources Information Center

    Schlenker, Richard M.

    This document presents a series of physics experiments which allow students to determine the value of unknown electrical capacitors. The exercises include both parallel and series connected capacitors. (SL)

  15. Smart Optical Material Characterization System and Method

    NASA Technical Reports Server (NTRS)

    Choi, Sang Hyouk (Inventor); Park, Yeonjoon (Inventor)

    2015-01-01

    Disclosed is a system and method for characterizing optical materials, using steps and equipment for generating a coherent laser light, filtering the light to remove high order spatial components, collecting the filtered light and forming a parallel light beam, splitting the parallel beam into a first direction and a second direction wherein the parallel beam travelling in the second direction travels toward the material sample so that the parallel beam passes through the sample, applying various physical quantities to the sample, reflecting the beam travelling in the first direction to produce a first reflected beam, reflecting the beam that passes through the sample to produce a second reflected beam that travels back through the sample, combining the second reflected beam after it travels back though the sample with the first reflected beam, sensing the light beam produced by combining the first and second reflected beams, and processing the sensed beam to determine sample characteristics and properties.

  16. Symplectic molecular dynamics simulations on specially designed parallel computers.

    PubMed

    Borstnik, Urban; Janezic, Dusanka

    2005-01-01

    We have developed a computer program for molecular dynamics (MD) simulation that implements the Split Integration Symplectic Method (SISM) and is designed to run on specialized parallel computers. The MD integration is performed by the SISM, which analytically treats high-frequency vibrational motion and thus enables the use of longer simulation time steps. The low-frequency motion is treated numerically on specially designed parallel computers, which decreases the computational time of each simulation time step. The combination of these approaches means that less time is required and fewer steps are needed and so enables fast MD simulations. We study the computational performance of MD simulation of molecular systems on specialized computers and provide a comparison to standard personal computers. The combination of the SISM with two specialized parallel computers is an effective way to increase the speed of MD simulations up to 16-fold over a single PC processor.

  17. Performance Modeling and Measurement of Parallelized Code for Distributed Shared Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry

    1998-01-01

    This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non Uniform Memory Access (ccNUMA) architecture. We report measurement based performance of these parallelized benchmarks from four perspectives: efficacy of parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized version of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.

  18. Exo-reversible staging of coolers in series and in parallel

    NASA Astrophysics Data System (ADS)

    Maytal, Ben-Zion

    2017-10-01

    Serial and parallel staging of exo-reversible coolers are formulated, analyzed and compared. The parallel staging includes an extensive parameter which is the proportion of combined stages. This extensive free parameter affects the intensive factors of specific power and figure of merit. Serial staging reduces the 1st Law efficiency and parallel staging improves the 2nd Law efficiency. Comparison of a parallel with a serial staging under common cooling capacity and cooling range, shows that it is always possible to find a parallel arrangement of lower specific power and more compact. Some results are demonstrated on staging of Joule-Thomson cryocoolers (below and above the Joule-Thomson inversion temperature).

  19. Vectorization for Molecular Dynamics on Intel Xeon Phi Corpocessors

    NASA Astrophysics Data System (ADS)

    Yi, Hongsuk

    2014-03-01

    Many modern processors are capable of exploiting data-level parallelism through the use of single instruction multiple data (SIMD) execution. The new Intel Xeon Phi coprocessor supports 512 bit vector registers for the high performance computing. In this paper, we have developed a hierarchical parallelization scheme for accelerated molecular dynamics simulations with the Terfoff potentials for covalent bond solid crystals on Intel Xeon Phi coprocessor systems. The scheme exploits multi-level parallelism computing. We combine thread-level parallelism using a tightly coupled thread-level and task-level parallelism with 512-bit vector register. The simulation results show that the parallel performance of SIMD implementations on Xeon Phi is apparently superior to their x86 CPU architecture.

  20. A high-performance spatial database based approach for pathology imaging algorithm evaluation

    PubMed Central

    Wang, Fusheng; Kong, Jun; Gao, Jingjing; Cooper, Lee A.D.; Kurc, Tahsin; Zhou, Zhengwen; Adler, David; Vergara-Niedermayr, Cristobal; Katigbak, Bryan; Brat, Daniel J.; Saltz, Joel H.

    2013-01-01

    Background: Algorithm evaluation provides a means to characterize variability across image analysis algorithms, validate algorithms by comparison with human annotations, combine results from multiple algorithms for performance improvement, and facilitate algorithm sensitivity studies. The sizes of images and image analysis results in pathology image analysis pose significant challenges in algorithm evaluation. We present an efficient parallel spatial database approach to model, normalize, manage, and query large volumes of analytical image result data. This provides an efficient platform for algorithm evaluation. Our experiments with a set of brain tumor images demonstrate the application, scalability, and effectiveness of the platform. Context: The paper describes an approach and platform for evaluation of pathology image analysis algorithms. The platform facilitates algorithm evaluation through a high-performance database built on the Pathology Analytic Imaging Standards (PAIS) data model. Aims: (1) Develop a framework to support algorithm evaluation by modeling and managing analytical results and human annotations from pathology images; (2) Create a robust data normalization tool for converting, validating, and fixing spatial data from algorithm or human annotations; (3) Develop a set of queries to support data sampling and result comparisons; (4) Achieve high performance computation capacity via a parallel data management infrastructure, parallel data loading and spatial indexing optimizations in this infrastructure. Materials and Methods: We have considered two scenarios for algorithm evaluation: (1) algorithm comparison where multiple result sets from different methods are compared and consolidated; and (2) algorithm validation where algorithm results are compared with human annotations. We have developed a spatial normalization toolkit to validate and normalize spatial boundaries produced by image analysis algorithms or human annotations. The validated data were formatted based on the PAIS data model and loaded into a spatial database. To support efficient data loading, we have implemented a parallel data loading tool that takes advantage of multi-core CPUs to accelerate data injection. The spatial database manages both geometric shapes and image features or classifications, and enables spatial sampling, result comparison, and result aggregation through expressive structured query language (SQL) queries with spatial extensions. To provide scalable and efficient query support, we have employed a shared nothing parallel database architecture, which distributes data homogenously across multiple database partitions to take advantage of parallel computation power and implements spatial indexing to achieve high I/O throughput. Results: Our work proposes a high performance, parallel spatial database platform for algorithm validation and comparison. This platform was evaluated by storing, managing, and comparing analysis results from a set of brain tumor whole slide images. The tools we develop are open source and available to download. Conclusions: Pathology image algorithm validation and comparison are essential to iterative algorithm development and refinement. One critical component is the support for queries involving spatial predicates and comparisons. In our work, we develop an efficient data model and parallel database approach to model, normalize, manage and query large volumes of analytical image result data. Our experiments demonstrate that the data partitioning strategy and the grid-based indexing result in good data distribution across database nodes and reduce I/O overhead in spatial join queries through parallel retrieval of relevant data and quick subsetting of datasets. The set of tools in the framework provide a full pipeline to normalize, load, manage and query analytical results for algorithm evaluation. PMID:23599905

  1. A novel cost-effective parallel narrowband ANC system with local secondary-path estimation

    NASA Astrophysics Data System (ADS)

    Delegà, Riccardo; Bernasconi, Giancarlo; Piroddi, Luigi

    2017-08-01

    Many noise reduction applications are targeted at multi-tonal disturbances. Active noise control (ANC) solutions for such problems are generally based on the combination of multiple adaptive notch filters. Both the performance and the computational cost are negatively affected by an increase in the number of controlled frequencies. In this work we study a different modeling approach for the secondary path, based on the estimation of various small local models in adjacent frequency subbands, that greatly reduces the impact of reference-filtering operations in the ANC algorithm. Furthermore, in combination with a frequency-specific step size tuning method it provides a balanced attenuation performance over the whole controlled frequency range (and particularly in the high end of the range). Finally, the use of small local models is greatly beneficial for the reactivity of the online secondary path modeling algorithm when the characteristics of the acoustic channels are time-varying. Several simulations are provided to illustrate the positive features of the proposed method compared to other well-known techniques.

  2. Unified tensor model for space-frequency spreading-multiplexing (SFSM) MIMO communication systems

    NASA Astrophysics Data System (ADS)

    de Almeida, André LF; Favier, Gérard

    2013-12-01

    This paper presents a unified tensor model for space-frequency spreading-multiplexing (SFSM) multiple-input multiple-output (MIMO) wireless communication systems that combine space- and frequency-domain spreadings, followed by a space-frequency multiplexing. Spreading across space (transmit antennas) and frequency (subcarriers) adds resilience against deep channel fades and provides space and frequency diversities, while orthogonal space-frequency multiplexing enables multi-stream transmission. We adopt a tensor-based formulation for the proposed SFSM MIMO system that incorporates space, frequency, time, and code dimensions by means of the parallel factor model. The developed SFSM tensor model unifies the tensorial formulation of some existing multiple-access/multicarrier MIMO signaling schemes as special cases, while revealing interesting tradeoffs due to combined space, frequency, and time diversities which are of practical relevance for joint symbol-channel-code estimation. The performance of the proposed SFSM MIMO system using either a zero forcing receiver or a semi-blind tensor-based receiver is illustrated by means of computer simulation results under realistic channel and system parameters.

  3. Verification and Planning Based on Coinductive Logic Programming

    NASA Technical Reports Server (NTRS)

    Bansal, Ajay; Min, Richard; Simon, Luke; Mallya, Ajay; Gupta, Gopal

    2008-01-01

    Coinduction is a powerful technique for reasoning about unfounded sets, unbounded structures, infinite automata, and interactive computations [6]. Where induction corresponds to least fixed point's semantics, coinduction corresponds to greatest fixed point semantics. Recently coinduction has been incorporated into logic programming and an elegant operational semantics developed for it [11, 12]. This operational semantics is the greatest fix point counterpart of SLD resolution (SLD resolution imparts operational semantics to least fix point based computations) and is termed co- SLD resolution. In co-SLD resolution, a predicate goal p( t) succeeds if it unifies with one of its ancestor calls. In addition, rational infinite terms are allowed as arguments of predicates. Infinite terms are represented as solutions to unification equations and the occurs check is omitted during the unification process. Coinductive Logic Programming (Co-LP) and Co-SLD resolution can be used to elegantly perform model checking and planning. A combined SLD and Co-SLD resolution based LP system forms the common basis for planning, scheduling, verification, model checking, and constraint solving [9, 4]. This is achieved by amalgamating SLD resolution, co-SLD resolution, and constraint logic programming [13] in a single logic programming system. Given that parallelism in logic programs can be implicitly exploited [8], complex, compute-intensive applications (planning, scheduling, model checking, etc.) can be executed in parallel on multi-core machines. Parallel execution can result in speed-ups as well as in larger instances of the problems being solved. In the remainder we elaborate on (i) how planning can be elegantly and efficiently performed under real-time constraints, (ii) how real-time systems can be elegantly and efficiently model- checked, as well as (iii) how hybrid systems can be verified in a combined system with both co-SLD and SLD resolution. Implementations of co-SLD resolution as well as preliminary implementations of the planning and verification applications have been developed [4]. Co-LP and Model Checking: The vast majority of properties that are to be verified can be classified into safety properties and liveness properties. It is well known within model checking that safety properties can be verified by reachability analysis, i.e, if a counter-example to the property exists, it can be finitely determined by enumerating all the reachable states of the Kripke structure.

  4. Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

    PubMed

    Nadkarni, P M; Miller, P L

    1991-01-01

    A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.

  5. A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ma, Kwan-Liu

    Most of today’s visualization libraries and applications are based off of what is known today as the visualization pipeline. In the visualization pipeline model, algorithms are encapsulated as “filtering” components with inputs and outputs. These components can be combined by connecting the outputs of one filter to the inputs of another filter. The visualization pipeline model is popular because it provides a convenient abstraction that allows users to combine algorithms in powerful ways. Unfortunately, the visualization pipeline cannot run effectively on exascale computers. Experts agree that the exascale machine will comprise processors that contain many cores. Furthermore, physical limitations willmore » prevent data movement in and out of the chip (that is, between main memory and the processing cores) from keeping pace with improvements in overall compute performance. To use these processors to their fullest capability, it is essential to carefully consider memory access. This is where the visualization pipeline fails. Each filtering component in the visualization library is expected to take a data set in its entirety, perform some computation across all of the elements, and output the complete results. The process of iterating over all elements must be repeated in each filter, which is one of the worst possible ways to traverse memory when trying to maximize the number of executions per memory access. This project investigates a new type of visualization framework that exhibits a pervasive parallelism necessary to run on exascale machines. Our framework achieves this by defining algorithms in terms of functors, which are localized, stateless operations. Functors can be composited in much the same way as filters in the visualization pipeline. But, functors’ design allows them to be concurrently running on massive amounts of lightweight threads. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale computer. This project concludes with a functional prototype containing pervasively parallel algorithms that perform demonstratively well on many-core processors. These algorithms are fundamental for performing data analysis and visualization at extreme scale.« less

  6. Anatomically constrained neural network models for the categorization of facial expression

    NASA Astrophysics Data System (ADS)

    McMenamin, Brenton W.; Assadi, Amir H.

    2004-12-01

    The ability to recognize facial expression in humans is performed with the amygdala which uses parallel processing streams to identify the expressions quickly and accurately. Additionally, it is possible that a feedback mechanism may play a role in this process as well. Implementing a model with similar parallel structure and feedback mechanisms could be used to improve current facial recognition algorithms for which varied expressions are a source for error. An anatomically constrained artificial neural-network model was created that uses this parallel processing architecture and feedback to categorize facial expressions. The presence of a feedback mechanism was not found to significantly improve performance for models with parallel architecture. However the use of parallel processing streams significantly improved accuracy over a similar network that did not have parallel architecture. Further investigation is necessary to determine the benefits of using parallel streams and feedback mechanisms in more advanced object recognition tasks.

  7. Anatomically constrained neural network models for the categorization of facial expression

    NASA Astrophysics Data System (ADS)

    McMenamin, Brenton W.; Assadi, Amir H.

    2005-01-01

    The ability to recognize facial expression in humans is performed with the amygdala which uses parallel processing streams to identify the expressions quickly and accurately. Additionally, it is possible that a feedback mechanism may play a role in this process as well. Implementing a model with similar parallel structure and feedback mechanisms could be used to improve current facial recognition algorithms for which varied expressions are a source for error. An anatomically constrained artificial neural-network model was created that uses this parallel processing architecture and feedback to categorize facial expressions. The presence of a feedback mechanism was not found to significantly improve performance for models with parallel architecture. However the use of parallel processing streams significantly improved accuracy over a similar network that did not have parallel architecture. Further investigation is necessary to determine the benefits of using parallel streams and feedback mechanisms in more advanced object recognition tasks.

  8. Advanced computer techniques for inverse modeling of electric current in cardiac tissue

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hutchinson, S.A.; Romero, L.A.; Diegert, C.F.

    1996-08-01

    For many years, ECG`s and vector cardiograms have been the tools of choice for non-invasive diagnosis of cardiac conduction problems, such as found in reentrant tachycardia or Wolff-Parkinson-White (WPW) syndrome. Through skillful analysis of these skin-surface measurements of cardiac generated electric currents, a physician can deduce the general location of heart conduction irregularities. Using a combination of high-fidelity geometry modeling, advanced mathematical algorithms and massively parallel computing, Sandia`s approach would provide much more accurate information and thus allow the physician to pinpoint the source of an arrhythmia or abnormal conduction pathway.

  9. A note on parallel and pipeline computation of fast unitary transforms

    NASA Technical Reports Server (NTRS)

    Fino, B. J.; Algazi, V. R.

    1974-01-01

    The parallel and pipeline organization of fast unitary transform algorithms such as the Fast Fourier Transform are discussed. The efficiency is pointed out of a combined parallel-pipeline processor of a transform such as the Haar transform in which 2 to the n minus 1 power hardware butterflies generate a transform of order 2 to the n power every computation cycle.

  10. Robot acting on moving bodies (RAMBO): Preliminary results

    NASA Technical Reports Server (NTRS)

    Davis, Larry S.; Dementhon, Daniel; Bestul, Thor; Ziavras, Sotirios; Srinivasan, H. V.; Siddalingaiah, Madju; Harwood, David

    1989-01-01

    A robot system called RAMBO is being developed. It is equipped with a camera, which, given a sequence of simple tasks, can perform these tasks on a moving object. RAMBO is given a complete geometric model of the object. A low level vision module extracts and groups characteristic features in images of the object. The positions of the object are determined in a sequence of images, and a motion estimate of the object is obtained. This motion estimate is used to plan trajectories of the robot tool to relative locations nearby the object sufficient for achieving the tasks. More specifically, low level vision uses parallel algorithms for image enchancement by symmetric nearest neighbor filtering, edge detection by local gradient operators, and corner extraction by sector filtering. The object pose estimation is a Hough transform method accumulating position hypotheses obtained by matching triples of image features (corners) to triples of model features. To maximize computing speed, the estimate of the position in space of a triple of features is obtained by decomposing its perspective view into a product of rotations and a scaled orthographic projection. This allows the use of 2-D lookup tables at each stage of the decomposition. The position hypotheses for each possible match of model feature triples and image feature triples are calculated in parallel. Trajectory planning combines heuristic and dynamic programming techniques. Then trajectories are created using parametric cubic splines between initial and goal trajectories. All the parallel algorithms run on a Connection Machine CM-2 with 16K processors.

  11. Semantic Processing Persists despite Anomalous Syntactic Category: ERP Evidence from Chinese Passive Sentences.

    PubMed

    Yang, Yang; Wu, Fuyun; Zhou, Xiaolin

    2015-01-01

    The syntax-first model and the parallel/interactive models make different predictions regarding whether syntactic category processing has a temporal and functional primacy over semantic processing. To further resolve this issue, an event-related potential experiment was conducted on 24 Chinese speakers reading Chinese passive sentences with the passive marker BEI (NP1 + BEI + NP2 + Verb). This construction was selected because it is the most-commonly used Chinese passive and very much resembles German passives, upon which the syntax-first hypothesis was primarily based. We manipulated semantic consistency (consistent vs. inconsistent) and syntactic category (noun vs. verb) of the critical verb, yielding four conditions: CORRECT (correct sentences), SEMANTIC (semantic anomaly), SYNTACTIC (syntactic category anomaly), and COMBINED (combined anomalies). Results showed both N400 and P600 effects for sentences with semantic anomaly, with syntactic category anomaly, or with combined anomalies. Converging with recent findings of Chinese ERP studies on various constructions, our study provides further evidence that syntactic category processing does not precede semantic processing in reading Chinese.

  12. Serial vs. parallel models of attention in visual search: accounting for benchmark RT-distributions.

    PubMed

    Moran, Rani; Zehetleitner, Michael; Liesefeld, Heinrich René; Müller, Hermann J; Usher, Marius

    2016-10-01

    Visual search is central to the investigation of selective visual attention. Classical theories propose that items are identified by serially deploying focal attention to their locations. While this accounts for set-size effects over a continuum of task difficulties, it has been suggested that parallel models can account for such effects equally well. We compared the serial Competitive Guided Search model with a parallel model in their ability to account for RT distributions and error rates from a large visual search data-set featuring three classical search tasks: 1) a spatial configuration search (2 vs. 5); 2) a feature-conjunction search; and 3) a unique feature search (Wolfe, Palmer & Horowitz Vision Research, 50(14), 1304-1311, 2010). In the parallel model, each item is represented by a diffusion to two boundaries (target-present/absent); the search corresponds to a parallel race between these diffusors. The parallel model was highly flexible in that it allowed both for a parametric range of capacity-limitation and for set-size adjustments of identification boundaries. Furthermore, a quit unit allowed for a continuum of search-quitting policies when the target is not found, with "single-item inspection" and exhaustive searches comprising its extremes. The serial model was found to be superior to the parallel model, even before penalizing the parallel model for its increased complexity. We discuss the implications of the results and the need for future studies to resolve the debate.

  13. CVXPY: A Python-Embedded Modeling Language for Convex Optimization.

    PubMed

    Diamond, Steven; Boyd, Stephen

    2016-04-01

    CVXPY is a domain-specific language for convex optimization embedded in Python. It allows the user to express convex optimization problems in a natural syntax that follows the math, rather than in the restrictive standard form required by solvers. CVXPY makes it easy to combine convex optimization with high-level features of Python such as parallelism and object-oriented design. CVXPY is available at http://www.cvxpy.org/ under the GPL license, along with documentation and examples.

  14. Special Sessions: International Conference on Rehabilitation Engineering (2nd, Ottawa, Canada, June 17-22, 1984). Combined with RESNA 7th Annual Conference = Seances speciales: conference internationale sur la technologie de reeducation fonctionnelle (2nd, Ottawa, Canada, Juin 17-22, 1984). Tenue parallelement a la RESNA 7e conference annuelle.

    ERIC Educational Resources Information Center

    Rehabilitation Engineering Society of North America, Washington, DC.

    These proceedings of the conference's Special Sessions contain 85 papers organized into the following sections: "Implant Materials and Devices,""Communication Aids,""Neural Prosthetics for the Disabled,""Current Concepts in Spinal Cord Rehabilitation,""New Models in…

  15. Evolutionary developmental pathology and anthropology: A new field linking development, comparative anatomy, human evolution, morphological variations and defects, and medicine.

    PubMed

    Diogo, Rui; Smith, Christopher M; Ziermann, Janine M

    2015-11-01

    We introduce a new subfield of the recently created field of Evolutionary-Developmental-Anthropology (Evo-Devo-Anth): Evolutionary-Developmental-Pathology-and-Anthropology (Evo-Devo-P'Anth). This subfield combines experimental and developmental studies of nonhuman model organisms, biological anthropology, chordate comparative anatomy and evolution, and the study of normal and pathological human development. Instead of focusing on other organisms to try to better understand human development, evolution, anatomy, and pathology, it places humans as the central case study, i.e., as truly model organism themselves. We summarize the results of our recent Evo-Devo-P'Anth studies and discuss long-standing questions in each of the broader biological fields combined in this subfield, paying special attention to the links between: (1) Human anomalies and variations, nonpentadactyly, homeotic transformations, and "nearest neighbor" vs. "find and seek" muscle-skeleton associations in limb+facial muscles vs. other head muscles; (2) Developmental constraints, the notion of "phylotypic stage," internalism vs. externalism, and the "logic of monsters" vs. "lack of homeostasis" views about human birth defects; (3) Human evolution, reversions, atavisms, paedomorphosis, and peromorphosis; (4) Scala naturae, Haeckelian recapitulation, von Baer's laws, and parallelism between phylogeny and development, here formally defined as "Phylo-Devo parallelism"; and (5) Patau, Edwards, and Down syndrome (trisomies 13, 18, 21), atavisms, apoptosis, heart malformations, and medical implications. © 2015 Wiley Periodicals, Inc.

  16. Research approaches to mass casualty incidents response: development from routine perspectives to complexity science.

    PubMed

    Shen, Weifeng; Jiang, Libing; Zhang, Mao; Ma, Yuefeng; Jiang, Guanyu; He, Xiaojun

    2014-01-01

    To review the research methods of mass casualty incident (MCI) systematically and introduce the concept and characteristics of complexity science and artificial system, computational experiments and parallel execution (ACP) method. We searched PubMed, Web of Knowledge, China Wanfang and China Biology Medicine (CBM) databases for relevant studies. Searches were performed without year or language restrictions and used the combinations of the following key words: "mass casualty incident", "MCI", "research method", "complexity science", "ACP", "approach", "science", "model", "system" and "response". Articles were searched using the above keywords and only those involving the research methods of mass casualty incident (MCI) were enrolled. Research methods of MCI have increased markedly over the past few decades. For now, dominating research methods of MCI are theory-based approach, empirical approach, evidence-based science, mathematical modeling and computer simulation, simulation experiment, experimental methods, scenario approach and complexity science. This article provides an overview of the development of research methodology for MCI. The progresses of routine research approaches and complexity science are briefly presented in this paper. Furthermore, the authors conclude that the reductionism underlying the exact science is not suitable for MCI complex systems. And the only feasible alternative is complexity science. Finally, this summary is followed by a review that ACP method combining artificial systems, computational experiments and parallel execution provides a new idea to address researches for complex MCI.

  17. Parallelized CCHE2D flow model with CUDA Fortran on Graphics Process Units

    USDA-ARS?s Scientific Manuscript database

    This paper presents the CCHE2D implicit flow model parallelized using CUDA Fortran programming technique on Graphics Processing Units (GPUs). A parallelized implicit Alternating Direction Implicit (ADI) solver using Parallel Cyclic Reduction (PCR) algorithm on GPU is developed and tested. This solve...

  18. Highly Efficient Parallel Multigrid Solver For Large-Scale Simulation of Grain Growth Using the Structural Phase Field Crystal Model

    NASA Astrophysics Data System (ADS)

    Guan, Zhen; Pekurovsky, Dmitry; Luce, Jason; Thornton, Katsuyo; Lowengrub, John

    The structural phase field crystal (XPFC) model can be used to model grain growth in polycrystalline materials at diffusive time-scales while maintaining atomic scale resolution. However, the governing equation of the XPFC model is an integral-partial-differential-equation (IPDE), which poses challenges in implementation onto high performance computing (HPC) platforms. In collaboration with the XSEDE Extended Collaborative Support Service, we developed a distributed memory HPC solver for the XPFC model, which combines parallel multigrid and P3DFFT. The performance benchmarking on the Stampede supercomputer indicates near linear strong and weak scaling for both multigrid and transfer time between multigrid and FFT modules up to 1024 cores. Scalability of the FFT module begins to decline at 128 cores, but it is sufficient for the type of problem we will be examining. We have demonstrated simulations using 1024 cores, and we expect to achieve 4096 cores and beyond. Ongoing work involves optimization of MPI/OpenMP-based codes for the Intel KNL Many-Core Architecture. This optimizes the code for coming pre-exascale systems, in particular many-core systems such as Stampede 2.0 and Cori 2 at NERSC, without sacrificing efficiency on other general HPC systems.

  19. Examining Parallelism of Sets of Psychometric Measures Using Latent Variable Modeling

    ERIC Educational Resources Information Center

    Raykov, Tenko; Patelis, Thanos; Marcoulides, George A.

    2011-01-01

    A latent variable modeling approach that can be used to examine whether several psychometric tests are parallel is discussed. The method consists of sequentially testing the properties of parallel measures via a corresponding relaxation of parameter constraints in a saturated model or an appropriately constructed latent variable model. The…

  20. Evidence of Multiple Reconnection Lines at the Magnetopause from Cusp Observations

    NASA Technical Reports Server (NTRS)

    Trattner, K. J.; Petrinec, S. M.; Fuselier, S. A.; Omidi, N.; Sibeck, David Gary

    2012-01-01

    Recent global hybrid simulations investigated the formation of flux transfer events (FTEs) and their convection and interaction with the cusp. Based on these simulations, we have analyzed several Polar cusp crossings in the Northern Hemisphere to search for the signature of such FTEs in the energy distribution of downward precipitating ions: precipitating ion beams at different energies parallel to the ambient magnetic field and overlapping in time. Overlapping ion distributions in the cusp are usually attributed to a combination of variable ion acceleration during the magnetopause crossing together with the time-of-flight effect from the entry point to the observing satellite. Most "step up" ion cusp structures (steps in the ion energy dispersions) only overlap for the populations with large pitch angles and not for the parallel streaming populations. Such cusp structures are the signatures predicted by the pulsed reconnection model, where the reconnection rate at the magnetopause decreased to zero, physically separating convecting flux tubes and their parallel streaming ions. However, several Polar cusp events discussed in this study also show an energy overlap for parallel-streaming precipitating ions. This condition might be caused by reopening an already reconnected field line, forming a magnetic island (flux rope) at the magnetopause similar to that reported in global MHD and Hybrid simulations

  1. Additive Antinociceptive Effects of a Combination of Vitamin C and Vitamin E after Peripheral Nerve Injury

    PubMed Central

    Lu, Ruirui; Kallenborn-Gerhardt, Wiebke; Geisslinger, Gerd; Schmidtko, Achim

    2011-01-01

    Accumulating evidence indicates that increased generation of reactive oxygen species (ROS) contributes to the development of exaggerated pain hypersensitivity during persistent pain. In the present study, we investigated the antinociceptive efficacy of the antioxidants vitamin C and vitamin E in mouse models of inflammatory and neuropathic pain. We show that systemic administration of a combination of vitamins C and E inhibited the early behavioral responses to formalin injection and the neuropathic pain behavior after peripheral nerve injury, but not the inflammatory pain behavior induced by Complete Freund's Adjuvant. In contrast, vitamin C or vitamin E given alone failed to affect the nociceptive behavior in all tested models. The attenuated neuropathic pain behavior induced by the vitamin C and E combination was paralleled by a reduced p38 phosphorylation in the spinal cord and in dorsal root ganglia, and was also observed after intrathecal injection of the vitamins. Moreover, the vitamin C and E combination ameliorated the allodynia induced by an intrathecally delivered ROS donor. Our results suggest that administration of vitamins C and E in combination may exert synergistic antinociceptive effects, and further indicate that ROS essentially contribute to nociceptive processing in special pain states. PMID:22195029

  2. Utility of Emulation and Simulation Computer Modeling of Space Station Environmental Control and Life Support Systems

    NASA Technical Reports Server (NTRS)

    Yanosy, James L.

    1988-01-01

    Over the years, computer modeling has been used extensively in many disciplines to solve engineering problems. A set of computer program tools is proposed to assist the engineer in the various phases of the Space Station program from technology selection through flight operations. The development and application of emulation and simulation transient performance modeling tools for life support systems are examined. The results of the development and the demonstration of the utility of three computer models are presented. The first model is a detailed computer model (emulation) of a solid amine water desorbed (SAWD) CO2 removal subsystem combined with much less detailed models (simulations) of a cabin, crew, and heat exchangers. This model was used in parallel with the hardware design and test of this CO2 removal subsystem. The second model is a simulation of an air revitalization system combined with a wastewater processing system to demonstrate the capabilities to study subsystem integration. The third model is that of a Space Station total air revitalization system. The station configuration consists of a habitat module, a lab module, two crews, and four connecting nodes.

  3. dfnWorks: A discrete fracture network framework for modeling subsurface flow and transport

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hyman, Jeffrey D.; Karra, Satish; Makedonska, Nataliia

    DFNWORKS is a parallelized computational suite to generate three-dimensional discrete fracture networks (DFN) and simulate flow and transport. Developed at Los Alamos National Laboratory over the past five years, it has been used to study flow and transport in fractured media at scales ranging from millimeters to kilometers. The networks are created and meshed using DFNGEN, which combines FRAM (the feature rejection algorithm for meshing) methodology to stochastically generate three-dimensional DFNs with the LaGriT meshing toolbox to create a high-quality computational mesh representation. The representation produces a conforming Delaunay triangulation suitable for high performance computing finite volume solvers in anmore » intrinsically parallel fashion. Flow through the network is simulated in dfnFlow, which utilizes the massively parallel subsurface flow and reactive transport finite volume code PFLOTRAN. A Lagrangian approach to simulating transport through the DFN is adopted within DFNTRANS to determine pathlines and solute transport through the DFN. Example applications of this suite in the areas of nuclear waste repository science, hydraulic fracturing and CO 2 sequestration are also included.« less

  4. dfnWorks: A discrete fracture network framework for modeling subsurface flow and transport

    DOE PAGES

    Hyman, Jeffrey D.; Karra, Satish; Makedonska, Nataliia; ...

    2015-11-01

    DFNWORKS is a parallelized computational suite to generate three-dimensional discrete fracture networks (DFN) and simulate flow and transport. Developed at Los Alamos National Laboratory over the past five years, it has been used to study flow and transport in fractured media at scales ranging from millimeters to kilometers. The networks are created and meshed using DFNGEN, which combines FRAM (the feature rejection algorithm for meshing) methodology to stochastically generate three-dimensional DFNs with the LaGriT meshing toolbox to create a high-quality computational mesh representation. The representation produces a conforming Delaunay triangulation suitable for high performance computing finite volume solvers in anmore » intrinsically parallel fashion. Flow through the network is simulated in dfnFlow, which utilizes the massively parallel subsurface flow and reactive transport finite volume code PFLOTRAN. A Lagrangian approach to simulating transport through the DFN is adopted within DFNTRANS to determine pathlines and solute transport through the DFN. Example applications of this suite in the areas of nuclear waste repository science, hydraulic fracturing and CO 2 sequestration are also included.« less

  5. Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

    PubMed Central

    Nadkarni, P. M.; Miller, P. L.

    1991-01-01

    A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations. PMID:1807632

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chrisochoides, N.; Sukup, F.

    In this paper we present a parallel implementation of the Bowyer-Watson (BW) algorithm using the task-parallel programming model. The BW algorithm constitutes an ideal mesh refinement strategy for implementing a large class of unstructured mesh generation techniques on both sequential and parallel computers, by preventing the need for global mesh refinement. Its implementation on distributed memory multicomputes using the traditional data-parallel model has been proven very inefficient due to excessive synchronization needed among processors. In this paper we demonstrate that with the task-parallel model we can tolerate synchronization costs inherent to data-parallel methods by exploring concurrency in the processor level.more » Our preliminary performance data indicate that the task- parallel approach: (i) is almost four times faster than the existing data-parallel methods, (ii) scales linearly, and (iii) introduces minimum overheads compared to the {open_quotes}best{close_quotes} sequential implementation of the BW algorithm.« less

  7. Modelling parallel programs and multiprocessor architectures with AXE

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Fineman, Charles E.

    1991-01-01

    AXE, An Experimental Environment for Parallel Systems, was designed to model and simulate for parallel systems at the process level. It provides an integrated environment for specifying computation models, multiprocessor architectures, data collection, and performance visualization. AXE is being used at NASA-Ames for developing resource management strategies, parallel problem formulation, multiprocessor architectures, and operating system issues related to the High Performance Computing and Communications Program. AXE's simple, structured user-interface enables the user to model parallel programs and machines precisely and efficiently. Its quick turn-around time keeps the user interested and productive. AXE models multicomputers. The user may easily modify various architectural parameters including the number of sites, connection topologies, and overhead for operating system activities. Parallel computations in AXE are represented as collections of autonomous computing objects known as players. Their use and behavior is described. Performance data of the multiprocessor model can be observed on a color screen. These include CPU and message routing bottlenecks, and the dynamic status of the software.

  8. Programs as Polypeptides.

    PubMed

    Williams, Lance R

    2016-01-01

    Object-oriented combinator chemistry (OOCC) is an artificial chemistry with composition devices borrowed from object-oriented and functional programming languages. Actors in OOCC are embedded in space and subject to diffusion; since they are neither created nor destroyed, their mass is conserved. Actors use programs constructed from combinators to asynchronously update their own states and the states of other actors in their neighborhoods. The fact that programs and combinators are themselves reified as actors makes it possible to build programs that build programs from combinators of a few primitive types using asynchronous spatial processes that resemble chemistry as much as computation. To demonstrate this, OOCC is used to define a parallel, asynchronous, spatially distributed self-replicating system modeled in part on the living cell. Since interactions among its parts result in the construction of more of these same parts, the system is strongly constructive. The system's high normalized complexity is contrasted with that of a simple composome.

  9. Backtracking and Re-execution in the Automatic Debugging of Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Matthews, Gregory; Hood, Robert; Johnson, Stephen; Leggett, Peter; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this work we describe a new approach using relative debugging to find differences in computation between a serial program and a parallel version of th it program. We use a combination of re-execution and backtracking in order to find the first difference in computation that may ultimately lead to an incorrect value that the user has indicated. In our prototype implementation we use static analysis information from a parallelization tool in order to perform the backtracking as well as the mapping required between serial and parallel computations.

  10. Transmissive Nanohole Arrays for Massively-Parallel Optical Biosensing

    PubMed Central

    2015-01-01

    A high-throughput optical biosensing technique is proposed and demonstrated. This hybrid technique combines optical transmission of nanoholes with colorimetric silver staining. The size and spacing of the nanoholes are chosen so that individual nanoholes can be independently resolved in massive parallel using an ordinary transmission optical microscope, and, in place of determining a spectral shift, the brightness of each nanohole is recorded to greatly simplify the readout. Each nanohole then acts as an independent sensor, and the blocking of nanohole optical transmission by enzymatic silver staining defines the specific detection of a biological agent. Nearly 10000 nanoholes can be simultaneously monitored under the field of view of a typical microscope. As an initial proof of concept, biotinylated lysozyme (biotin-HEL) was used as a model analyte, giving a detection limit as low as 0.1 ng/mL. PMID:25530982

  11. Final Report, DE-FG01-06ER25718 Domain Decomposition and Parallel Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Widlund, Olof B.

    2015-06-09

    The goal of this project is to develop and improve domain decomposition algorithms for a variety of partial differential equations such as those of linear elasticity and electro-magnetics.These iterative methods are designed for massively parallel computing systems and allow the fast solution of the very large systems of algebraic equations that arise in large scale and complicated simulations. A special emphasis is placed on problems arising from Maxwell's equation. The approximate solvers, the preconditioners, are combined with the conjugate gradient method and must always include a solver of a coarse model in order to have a performance which is independentmore » of the number of processors used in the computer simulation. A recent development allows for an adaptive construction of this coarse component of the preconditioner.« less

  12. Exploring the protein folding free energy landscape: coupling replica exchange method with P3ME/RESPA algorithm.

    PubMed

    Zhou, Ruhong

    2004-05-01

    A highly parallel replica exchange method (REM) that couples with a newly developed molecular dynamics algorithm particle-particle particle-mesh Ewald (P3ME)/RESPA has been proposed for efficient sampling of protein folding free energy landscape. The algorithm is then applied to two separate protein systems, beta-hairpin and a designed protein Trp-cage. The all-atom OPLSAA force field with an explicit solvent model is used for both protein folding simulations. Up to 64 replicas of solvated protein systems are simulated in parallel over a wide range of temperatures. The combined trajectories in temperature and configurational space allow a replica to overcome free energy barriers present at low temperatures. These large scale simulations reveal detailed results on folding mechanisms, intermediate state structures, thermodynamic properties and the temperature dependences for both protein systems.

  13. Acoustic simulation in architecture with parallel algorithm

    NASA Astrophysics Data System (ADS)

    Li, Xiaohong; Zhang, Xinrong; Li, Dan

    2004-03-01

    In allusion to complexity of architecture environment and Real-time simulation of architecture acoustics, a parallel radiosity algorithm was developed. The distribution of sound energy in scene is solved with this method. And then the impulse response between sources and receivers at frequency segment, which are calculated with multi-process, are combined into whole frequency response. The numerical experiment shows that parallel arithmetic can improve the acoustic simulating efficiency of complex scene.

  14. Neural model for processing the influence of visual orientation on visually perceived eye level (VPEL).

    PubMed

    Matin, L; Li, W

    2001-10-01

    An individual line or a combination of lines viewed in darkness has a large influence on the elevation to which an observer sets a target so that it is perceived to lie at eye level (VPEL). These influences are systematically related to the orientation of pitched-from-vertical lines on pitched plane(s) and to the lengths of the lines, as well as to the orientations of lines of 'equivalent pitch' that lie on frontoparallel planes. A three-stage model processes the visual influence: The first stage parallel processes the orientations of the lines utilizing 2 classes of orientation-sensitive neural units in each hemisphere, with the two classes sensitive to opposing ranges of orientations; the signal delivered by each class is of opposite sign in the two hemispheres. The second stage generates the total visual influence from the parallel combination of inputs delivered by the 4 groups of the first stage, and a third stage combines the total visual influence from the second stage with signals from the body-referenced mechanism that contains information about the position and orientation of the eyes, head, and body. The circuit equation describing the combined influence of n separate inputs from stage 1 on the output of the stage 2 integrating neuron is derived for n stimulus lines which possess any combination of orientations and lengths; Each of the n lines is assumed to stimulate one of the groups of orientation-sensitive units in visual cortex (stage 1) whose signals converge on to a dendrite of the integrating neuron (stage 2), and to produce changes in postsynaptic membrane conductance (g(i)) and potential (V(i)) there. The net current from the n dendrites results in a voltage change (V(A)) at the initial segment of the axon of the integrating neuron. Nerve impulse frequency proportional to this voltage change signals the total visual influence on perceived elevation of the visual field. The circuit equation corresponding to the total visual influence for n equal length inducing lines is V(A)= sum V(i)/[n+(g(A)/g(S))], where the potential change due to line i, V(i), is proportional to line orientation, g(A) is the conductance at the axon's summing point, and g(S)=g(i) for each i for the equal length case; the net conductance change due to a line is proportional to the line's length. The circuit equation is interpreted as a basis for quantitative predictions from the model that can be compared to psychophysical measurements of the elevation of VPEL. The interpretation provides the predicted relation for the visual influence on VPEL, V, by n inducing lines each with length l: thus, V=a+[k(i) sum theta(i)/n+(k(2)/l)], where theta(i) is the orientation of line i, a is the effect of the body-referenced mechanism, and k(1) and k(2) are constants. The model's output is fitted to the results of five sets of experiments in which the elevation of VPEL measured with a small target in the median plane is systematically influenced by distantly located 1-line or 2-line inducing stimuli varying in orientation and length and viewed in otherwise total darkness with gaze restricted to the median plane; each line is located at either 25 degrees eccentricity to the left or right of the median plane. The model predicts the negatively accelerated growth of VPEL with line length for each orientation and the change of slope constant of the linear combination rule among lines from 1.00 (linear summation; short lines) to 0.61 (near-averaging; long lines). Fits to the data are obtained over a range of orientations from -30 degrees to +30 degrees of pitch for 1-line visual fields from lengths of 3 degrees to 64 degrees, for parallel 2-line visual fields over the same range of lengths and orientations, for short and long 2-line combinations in which each of the two members may have any orientation (parallel or nonparallel pairs), and for the well-illuminated and fully structured pitchroom. In addition, similar experiments with 2-line stimuli of equivalent pitch in the frontoparallel plane were also fitted to the model. The model accounts for more than 98% of the variance of the results in each case.

  15. Integration of Modelling and Graphics to Create an Infrared Signal Processing Test Bed

    NASA Astrophysics Data System (ADS)

    Sethi, H. R.; Ralph, John E.

    1989-03-01

    The work reported in this paper was carried out as part of a contract with MoD (PE) UK. It considers the problems associated with realistic modelling of a passive infrared system in an operational environment. Ideally all aspects of the system and environment should be integrated into a complete end-to-end simulation but in the past limited computing power has prevented this. Recent developments in workstation technology and the increasing availability of parallel processing techniques makes the end-to-end simulation possible. However the complexity and speed of such simulations means difficulties for the operator in controlling the software and understanding the results. These difficulties can be greatly reduced by providing an extremely user friendly interface and a very flexible, high power, high resolution colour graphics capability. Most system modelling is based on separate software simulation of the individual components of the system itself and its environment. These component models may have their own characteristic inbuilt assumptions and approximations, may be written in the language favoured by the originator and may have a wide variety of input and output conventions and requirements. The models and their limitations need to be matched to the range of conditions appropriate to the operational scenerio. A comprehensive set of data bases needs to be generated by the component models and these data bases must be made readily available to the investigator. Performance measures need to be defined and displayed in some convenient graphics form. Some options are presented for combining available hardware and software to create an environment within which the models can be integrated, and which provide the required man-machine interface, graphics and computing power. The impact of massively parallel processing and artificial intelligence will be discussed. Parallel processing will make real time end-to-end simulation possible and will greatly improve the graphical visualisation of the model output data. Artificial intelligence should help to enhance the man-machine interface.

  16. Superconducting FCL using a combined inducted magnetic field trigger and shunt coil

    DOEpatents

    Tekletsadik, Kasegn D.

    2007-10-16

    A single trigger/shunt coil is utilized for combined induced magnetic field triggering and shunt impedance. The single coil connected in parallel with the high temperature superconducting element, is designed to generate a circulating current in the parallel circuit during normal operation to aid triggering the high temperature superconducting element to quench in the event of a fault. The circulating current is generated by an induced voltage in the coil, when the system current flows through the high temperature superconducting element.

  17. Parallel computing of a climate model on the dawn 1000 by domain decomposition method

    NASA Astrophysics Data System (ADS)

    Bi, Xunqiang

    1997-12-01

    In this paper the parallel computing of a grid-point nine-level atmospheric general circulation model on the Dawn 1000 is introduced. The model was developed by the Institute of Atmospheric Physics (IAP), Chinese Academy of Sciences (CAS). The Dawn 1000 is a MIMD massive parallel computer made by National Research Center for Intelligent Computer (NCIC), CAS. A two-dimensional domain decomposition method is adopted to perform the parallel computing. The potential ways to increase the speed-up ratio and exploit more resources of future massively parallel supercomputation are also discussed.

  18. Multi-echo acquisition

    PubMed Central

    Posse, Stefan

    2011-01-01

    The rapid development of fMRI was paralleled early on by the adaptation of MR spectroscopic imaging (MRSI) methods to quantify water relaxation changes during brain activation. This review describes the evolution of multi-echo acquisition from high-speed MRSI to multi-echo EPI and beyond. It highlights milestones in the development of multi-echo acquisition methods, such as the discovery of considerable gains in fMRI sensitivity when combining echo images, advances in quantification of the BOLD effect using analytical biophysical modeling and interleaved multi-region shimming. The review conveys the insight gained from combining fMRI and MRSI methods and concludes with recent trends in ultra-fast fMRI, which will significantly increase temporal resolution of multi-echo acquisition. PMID:22056458

  19. Developing Subdomain Allocation Algorithms Based on Spatial and Communicational Constraints to Accelerate Dust Storm Simulation

    PubMed Central

    Gui, Zhipeng; Yu, Manzhu; Yang, Chaowei; Jiang, Yunfeng; Chen, Songqing; Xia, Jizhe; Huang, Qunying; Liu, Kai; Li, Zhenlong; Hassan, Mohammed Anowarul; Jin, Baoxuan

    2016-01-01

    Dust storm has serious disastrous impacts on environment, human health, and assets. The developments and applications of dust storm models have contributed significantly to better understand and predict the distribution, intensity and structure of dust storms. However, dust storm simulation is a data and computing intensive process. To improve the computing performance, high performance computing has been widely adopted by dividing the entire study area into multiple subdomains and allocating each subdomain on different computing nodes in a parallel fashion. Inappropriate allocation may introduce imbalanced task loads and unnecessary communications among computing nodes. Therefore, allocation is a key factor that may impact the efficiency of parallel process. An allocation algorithm is expected to consider the computing cost and communication cost for each computing node to minimize total execution time and reduce overall communication cost for the entire simulation. This research introduces three algorithms to optimize the allocation by considering the spatial and communicational constraints: 1) an Integer Linear Programming (ILP) based algorithm from combinational optimization perspective; 2) a K-Means and Kernighan-Lin combined heuristic algorithm (K&K) integrating geometric and coordinate-free methods by merging local and global partitioning; 3) an automatic seeded region growing based geometric and local partitioning algorithm (ASRG). The performance and effectiveness of the three algorithms are compared based on different factors. Further, we adopt the K&K algorithm as the demonstrated algorithm for the experiment of dust model simulation with the non-hydrostatic mesoscale model (NMM-dust) and compared the performance with the MPI default sequential allocation. The results demonstrate that K&K method significantly improves the simulation performance with better subdomain allocation. This method can also be adopted for other relevant atmospheric and numerical modeling. PMID:27044039

  20. Simulation of Semi-Solid Material Mechanical Behavior Using a Combined Discrete/Finite Element Method

    NASA Astrophysics Data System (ADS)

    Sistaninia, M.; Phillion, A. B.; Drezet, J.-M.; Rappaz, M.

    2011-01-01

    As a necessary step toward the quantitative prediction of hot tearing defects, a three-dimensional stress-strain simulation based on a combined finite element (FE)/discrete element method (DEM) has been developed that is capable of predicting the mechanical behavior of semisolid metallic alloys during solidification. The solidification model used for generating the initial solid-liquid structure is based on a Voronoi tessellation of randomly distributed nucleation centers and a solute diffusion model for each element of this tessellation. At a given fraction of solid, the deformation is then simulated with the solid grains being modeled using an elastoviscoplastic constitutive law, whereas the remaining liquid layers at grain boundaries are approximated by flexible connectors, each consisting of a spring element and a damper element acting in parallel. The model predictions have been validated against Al-Cu alloy experimental data from the literature. The results show that a combined FE/DEM approach is able to express the overall mechanical behavior of semisolid alloys at the macroscale based on the morphology of the grain structure. For the first time, the localization of strain in the intergranular regions is taken into account. Thus, this approach constitutes an indispensible step towards the development of a comprehensive model of hot tearing.

  1. Climate Ocean Modeling on a Beowulf Class System

    NASA Technical Reports Server (NTRS)

    Cheng, B. N.; Chao, Y.; Wang, P.; Bondarenko, M.

    2000-01-01

    With the growing power and shrinking cost of personal computers. the availability of fast ethernet interconnections, and public domain software packages, it is now possible to combine them to build desktop parallel computers (named Beowulf or PC clusters) at a fraction of what it would cost to buy systems of comparable power front supercomputer companies. This led as to build and assemble our own sys tem. specifically for climate ocean modeling. In this article, we present our experience with such a system, discuss its network performance, and provide some performance comparison data with both HP SPP2000 and Cray T3E for an ocean Model used in present-day oceanographic research.

  2. Development of an Underactuated 2-DOF Wrist Joint using McKibben PAMs

    NASA Astrophysics Data System (ADS)

    Rajagopal, S. P.; Jain, S.; Ramasubramanian, S. N.; Johnson, B. V.; Dwivedy, S. K.

    2014-10-01

    In this work, model of an underactuated 2-DOF wrist joint with pneumatically actuated muscles is proposed. For the joint, McKibben-type artificial muscles are used in parallel configuration for the actuation. For each Degree of Freedom (DOF) one agonist-antagonist pair arrangement is usually used with a pulley mechanism. A mathematical model of wrist joint is derived using conventional forward kinematic analysis. The static model relating pressure in the muscle with the orientation of the wrist joint is obtained by combining the experimental data and mathematical model. Regulation of pressure can be achieved by pulse width modulation control of on/off solenoid valves. A set of free vibration experiments are done for the dynamic identification of the muscle characteristics.

  3. Transport in the plateau regime in a tokamak pedestal

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seol, J.; Shaing, K. C.

    In a tokamak H-mode, a strong E Multiplication-Sign B flow shear is generated during the L-H transition. Turbulence in a pedestal is suppressed significantly by this E Multiplication-Sign B flow shear. In this case, neoclassical transport may become important. The neoclassical fluxes are calculated in the plateau regime with the parallel plasma flow using their kinetic definitions. In an axisymmetric tokamak, the neoclassical particles fluxes can be decomposed into the banana-plateau flux and the Pfirsch-Schlueter flux. The banana-plateau particle flux is driven by the parallel viscous force and the Pfirsch-Schlueter flux by the poloidal variation of the friction force. Themore » combined quantity of the radial electric field and the parallel flow is determined by the flux surface averaged parallel momentum balance equation rather than requiring the ambipolarity of the total particle fluxes. In this process, the Pfirsch-Schlueter flux does not appear in the flux surface averaged parallel momentum equation. Only the banana-plateau flux is used to determine the parallel flow in the form of the flux surface averaged parallel viscosity. The heat flux, obtained using the solution of the parallel momentum balance equation, decreases exponentially in the presence of sonic M{sub p} without any enhancement over that in the standard neoclassical theory. Here, M{sub p} is a combination of the poloidal E Multiplication-Sign B flow and the parallel mass flow. The neoclassical bootstrap current in the plateau regime is presented. It indicates that the neoclassical bootstrap current also is related only to the banana-plateau fluxes. Finally, transport fluxes are calculated when M{sub p} is large enough to make the parallel electron viscosity comparable with the parallel ion viscosity. It is found that the bootstrap current has a finite value regardless of the magnitude of M{sub p}.« less

  4. A nonrecursive order N preconditioned conjugate gradient: Range space formulation of MDOF dynamics

    NASA Technical Reports Server (NTRS)

    Kurdila, Andrew J.

    1990-01-01

    While excellent progress has been made in deriving algorithms that are efficient for certain combinations of system topologies and concurrent multiprocessing hardware, several issues must be resolved to incorporate transient simulation in the control design process for large space structures. Specifically, strategies must be developed that are applicable to systems with numerous degrees of freedom. In addition, the algorithms must have a growth potential in that they must also be amenable to implementation on forthcoming parallel system architectures. For mechanical system simulation, this fact implies that algorithms are required that induce parallelism on a fine scale, suitable for the emerging class of highly parallel processors; and transient simulation methods must be automatically load balancing for a wider collection of system topologies and hardware configurations. These problems are addressed by employing a combination range space/preconditioned conjugate gradient formulation of multi-degree-of-freedom dynamics. The method described has several advantages. In a sequential computing environment, the method has the features that: by employing regular ordering of the system connectivity graph, an extremely efficient preconditioner can be derived from the 'range space metric', as opposed to the system coefficient matrix; because of the effectiveness of the preconditioner, preliminary studies indicate that the method can achieve performance rates that depend linearly upon the number of substructures, hence the title 'Order N'; and the method is non-assembling. Furthermore, the approach is promising as a potential parallel processing algorithm in that the method exhibits a fine parallel granularity suitable for a wide collection of combinations of physical system topologies/computer architectures; and the method is easily load balanced among processors, and does not rely upon system topology to induce parallelism.

  5. Extreme-Scale Bayesian Inference for Uncertainty Quantification of Complex Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Biros, George

    Uncertainty quantification (UQ)—that is, quantifying uncertainties in complex mathematical models and their large-scale computational implementations—is widely viewed as one of the outstanding challenges facing the field of CS&E over the coming decade. The EUREKA project set to address the most difficult class of UQ problems: those for which both the underlying PDE model as well as the uncertain parameters are of extreme scale. In the project we worked on these extreme-scale challenges in the following four areas: 1. Scalable parallel algorithms for sampling and characterizing the posterior distribution that exploit the structure of the underlying PDEs and parameter-to-observable map. Thesemore » include structure-exploiting versions of the randomized maximum likelihood method, which aims to overcome the intractability of employing conventional MCMC methods for solving extreme-scale Bayesian inversion problems by appealing to and adapting ideas from large-scale PDE-constrained optimization, which have been very successful at exploring high-dimensional spaces. 2. Scalable parallel algorithms for construction of prior and likelihood functions based on learning methods and non-parametric density estimation. Constructing problem-specific priors remains a critical challenge in Bayesian inference, and more so in high dimensions. Another challenge is construction of likelihood functions that capture unmodeled couplings between observations and parameters. We will create parallel algorithms for non-parametric density estimation using high dimensional N-body methods and combine them with supervised learning techniques for the construction of priors and likelihood functions. 3. Bayesian inadequacy models, which augment physics models with stochastic models that represent their imperfections. The success of the Bayesian inference framework depends on the ability to represent the uncertainty due to imperfections of the mathematical model of the phenomena of interest. This is a central challenge in UQ, especially for large-scale models. We propose to develop the mathematical tools to address these challenges in the context of extreme-scale problems. 4. Parallel scalable algorithms for Bayesian optimal experimental design (OED). Bayesian inversion yields quantified uncertainties in the model parameters, which can be propagated forward through the model to yield uncertainty in outputs of interest. This opens the way for designing new experiments to reduce the uncertainties in the model parameters and model predictions. Such experimental design problems have been intractable for large-scale problems using conventional methods; we will create OED algorithms that exploit the structure of the PDE model and the parameter-to-output map to overcome these challenges. Parallel algorithms for these four problems were created, analyzed, prototyped, implemented, tuned, and scaled up for leading-edge supercomputers, including UT-Austin’s own 10 petaflops Stampede system, ANL’s Mira system, and ORNL’s Titan system. While our focus is on fundamental mathematical/computational methods and algorithms, we will assess our methods on model problems derived from several DOE mission applications, including multiscale mechanics and ice sheet dynamics.« less

  6. Simulated Wake Characteristics Data for Closely Spaced Parallel Runway Operations Analysis

    NASA Technical Reports Server (NTRS)

    Guerreiro, Nelson M.; Neitzke, Kurt W.

    2012-01-01

    A simulation experiment was performed to generate and compile wake characteristics data relevant to the evaluation and feasibility analysis of closely spaced parallel runway (CSPR) operational concepts. While the experiment in this work is not tailored to any particular operational concept, the generated data applies to the broader class of CSPR concepts, where a trailing aircraft on a CSPR approach is required to stay ahead of the wake vortices generated by a lead aircraft on an adjacent CSPR. Data for wake age, circulation strength, and wake altitude change, at various lateral offset distances from the wake-generating lead aircraft approach path were compiled for a set of nine aircraft spanning the full range of FAA and ICAO wake classifications. A total of 54 scenarios were simulated to generate data related to key parameters that determine wake behavior. Of particular interest are wake age characteristics that can be used to evaluate both time- and distance- based in-trail separation concepts for all aircraft wake-class combinations. A simple first-order difference model was developed to enable the computation of wake parameter estimates for aircraft models having weight, wingspan and speed characteristics similar to those of the nine aircraft modeled in this work.

  7. MIP models and hybrid algorithms for simultaneous job splitting and scheduling on unrelated parallel machines.

    PubMed

    Eroglu, Duygu Yilmaz; Ozmutlu, H Cenk

    2014-01-01

    We developed mixed integer programming (MIP) models and hybrid genetic-local search algorithms for the scheduling problem of unrelated parallel machines with job sequence and machine-dependent setup times and with job splitting property. The first contribution of this paper is to introduce novel algorithms which make splitting and scheduling simultaneously with variable number of subjobs. We proposed simple chromosome structure which is constituted by random key numbers in hybrid genetic-local search algorithm (GAspLA). Random key numbers are used frequently in genetic algorithms, but it creates additional difficulty when hybrid factors in local search are implemented. We developed algorithms that satisfy the adaptation of results of local search into the genetic algorithms with minimum relocation operation of genes' random key numbers. This is the second contribution of the paper. The third contribution of this paper is three developed new MIP models which are making splitting and scheduling simultaneously. The fourth contribution of this paper is implementation of the GAspLAMIP. This implementation let us verify the optimality of GAspLA for the studied combinations. The proposed methods are tested on a set of problems taken from the literature and the results validate the effectiveness of the proposed algorithms.

  8. CVXPY: A Python-Embedded Modeling Language for Convex Optimization

    PubMed Central

    Diamond, Steven; Boyd, Stephen

    2016-01-01

    CVXPY is a domain-specific language for convex optimization embedded in Python. It allows the user to express convex optimization problems in a natural syntax that follows the math, rather than in the restrictive standard form required by solvers. CVXPY makes it easy to combine convex optimization with high-level features of Python such as parallelism and object-oriented design. CVXPY is available at http://www.cvxpy.org/ under the GPL license, along with documentation and examples. PMID:27375369

  9. Optimal estimates of free energies from multistate nonequilibrium work data.

    PubMed

    Maragakis, Paul; Spichty, Martin; Karplus, Martin

    2006-03-17

    We derive the optimal estimates of the free energies of an arbitrary number of thermodynamic states from nonequilibrium work measurements; the work data are collected from forward and reverse switching processes and obey a fluctuation theorem. The maximum likelihood formulation properly reweights all pathways contributing to a free energy difference and is directly applicable to simulations and experiments. We demonstrate dramatic gains in efficiency by combining the analysis with parallel tempering simulations for alchemical mutations of model amino acids.

  10. RGCA: A Reliable GPU Cluster Architecture for Large-Scale Internet of Things Computing Based on Effective Performance-Energy Optimization

    PubMed Central

    Chen, Qingkui; Zhao, Deyu; Wang, Jingjuan

    2017-01-01

    This paper aims to develop a low-cost, high-performance and high-reliability computing system to process large-scale data using common data mining algorithms in the Internet of Things (IoT) computing environment. Considering the characteristics of IoT data processing, similar to mainstream high performance computing, we use a GPU (Graphics Processing Unit) cluster to achieve better IoT services. Firstly, we present an energy consumption calculation method (ECCM) based on WSNs. Then, using the CUDA (Compute Unified Device Architecture) Programming model, we propose a Two-level Parallel Optimization Model (TLPOM) which exploits reasonable resource planning and common compiler optimization techniques to obtain the best blocks and threads configuration considering the resource constraints of each node. The key to this part is dynamic coupling Thread-Level Parallelism (TLP) and Instruction-Level Parallelism (ILP) to improve the performance of the algorithms without additional energy consumption. Finally, combining the ECCM and the TLPOM, we use the Reliable GPU Cluster Architecture (RGCA) to obtain a high-reliability computing system considering the nodes’ diversity, algorithm characteristics, etc. The results show that the performance of the algorithms significantly increased by 34.1%, 33.96% and 24.07% for Fermi, Kepler and Maxwell on average with TLPOM and the RGCA ensures that our IoT computing system provides low-cost and high-reliability services. PMID:28777325

  11. RGCA: A Reliable GPU Cluster Architecture for Large-Scale Internet of Things Computing Based on Effective Performance-Energy Optimization.

    PubMed

    Fang, Yuling; Chen, Qingkui; Xiong, Neal N; Zhao, Deyu; Wang, Jingjuan

    2017-08-04

    This paper aims to develop a low-cost, high-performance and high-reliability computing system to process large-scale data using common data mining algorithms in the Internet of Things (IoT) computing environment. Considering the characteristics of IoT data processing, similar to mainstream high performance computing, we use a GPU (Graphics Processing Unit) cluster to achieve better IoT services. Firstly, we present an energy consumption calculation method (ECCM) based on WSNs. Then, using the CUDA (Compute Unified Device Architecture) Programming model, we propose a Two-level Parallel Optimization Model (TLPOM) which exploits reasonable resource planning and common compiler optimization techniques to obtain the best blocks and threads configuration considering the resource constraints of each node. The key to this part is dynamic coupling Thread-Level Parallelism (TLP) and Instruction-Level Parallelism (ILP) to improve the performance of the algorithms without additional energy consumption. Finally, combining the ECCM and the TLPOM, we use the Reliable GPU Cluster Architecture (RGCA) to obtain a high-reliability computing system considering the nodes' diversity, algorithm characteristics, etc. The results show that the performance of the algorithms significantly increased by 34.1%, 33.96% and 24.07% for Fermi, Kepler and Maxwell on average with TLPOM and the RGCA ensures that our IoT computing system provides low-cost and high-reliability services.

  12. Using Hadoop MapReduce for Parallel Genetic Algorithms: A Comparison of the Global, Grid and Island Models.

    PubMed

    Ferrucci, Filomena; Salza, Pasquale; Sarro, Federica

    2017-06-29

    The need to improve the scalability of Genetic Algorithms (GAs) has motivated the research on Parallel Genetic Algorithms (PGAs), and different technologies and approaches have been used. Hadoop MapReduce represents one of the most mature technologies to develop parallel algorithms. Based on the fact that parallel algorithms introduce communication overhead, the aim of the present work is to understand if, and possibly when, the parallel GAs solutions using Hadoop MapReduce show better performance than sequential versions in terms of execution time. Moreover, we are interested in understanding which PGA model can be most effective among the global, grid, and island models. We empirically assessed the performance of these three parallel models with respect to a sequential GA on a software engineering problem, evaluating the execution time and the achieved speedup. We also analysed the behaviour of the parallel models in relation to the overhead produced by the use of Hadoop MapReduce and the GAs' computational effort, which gives a more machine-independent measure of these algorithms. We exploited three problem instances to differentiate the computation load and three cluster configurations based on 2, 4, and 8 parallel nodes. Moreover, we estimated the costs of the execution of the experimentation on a potential cloud infrastructure, based on the pricing of the major commercial cloud providers. The empirical study revealed that the use of PGA based on the island model outperforms the other parallel models and the sequential GA for all the considered instances and clusters. Using 2, 4, and 8 nodes, the island model achieves an average speedup over the three datasets of 1.8, 3.4, and 7.0 times, respectively. Hadoop MapReduce has a set of different constraints that need to be considered during the design and the implementation of parallel algorithms. The overhead of data store (i.e., HDFS) accesses, communication, and latency requires solutions that reduce data store operations. For this reason, the island model is more suitable for PGAs than the global and grid model, also in terms of costs when executed on a commercial cloud provider.

  13. Monte-Carlo methods make Dempster-Shafer formalism feasible

    NASA Technical Reports Server (NTRS)

    Kreinovich, Vladik YA.; Bernat, Andrew; Borrett, Walter; Mariscal, Yvonne; Villa, Elsa

    1991-01-01

    One of the main obstacles to the applications of Dempster-Shafer formalism is its computational complexity. If we combine m different pieces of knowledge, then in general case we have to perform up to 2(sup m) computational steps, which for large m is infeasible. For several important cases algorithms with smaller running time were proposed. We prove, however, that if we want to compute the belief bel(Q) in any given query Q, then exponential time is inevitable. It is still inevitable, if we want to compute bel(Q) with given precision epsilon. This restriction corresponds to the natural idea that since initial masses are known only approximately, there is no sense in trying to compute bel(Q) precisely. A further idea is that there is always some doubt in the whole knowledge, so there is always a probability p(sub o) that the expert's knowledge is wrong. In view of that it is sufficient to have an algorithm that gives a correct answer a probability greater than 1-p(sub o). If we use the original Dempster's combination rule, this possibility diminishes the running time, but still leaves the problem infeasible in the general case. We show that for the alternative combination rules proposed by Smets and Yager feasible methods exist. We also show how these methods can be parallelized, and what parallelization model fits this problem best.

  14. A one-pot parallel reductive amination of aldehydes with heteroaromatic amines.

    PubMed

    Bogolubsky, Andrey V; Moroz, Yurii S; Mykhailiuk, Pavel K; Panov, Dmitriy M; Pipko, Sergey E; Konovets, Anzhelika I; Tolmachev, Andrey

    2014-08-11

    A parallel reductive amination of heteroaromatic amines has been performed using a combination of ZnCl2-TMSOAc (activating agents) and NaBH(OAc)3 (reducing agent). A library of diverse secondary amines was easily prepared on a 50-300 mg scale.

  15. System and method for representing and manipulating three-dimensional objects on massively parallel architectures

    DOEpatents

    Karasick, Michael S.; Strip, David R.

    1996-01-01

    A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modelling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modelling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modelling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication.

  16. Fast hydrological model calibration based on the heterogeneous parallel computing accelerated shuffled complex evolution method

    NASA Astrophysics Data System (ADS)

    Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Hong, Yang; Zuo, Depeng; Ren, Minglei; Lei, Tianjie; Liang, Ke

    2018-01-01

    Hydrological model calibration has been a hot issue for decades. The shuffled complex evolution method developed at the University of Arizona (SCE-UA) has been proved to be an effective and robust optimization approach. However, its computational efficiency deteriorates significantly when the amount of hydrometeorological data increases. In recent years, the rise of heterogeneous parallel computing has brought hope for the acceleration of hydrological model calibration. This study proposed a parallel SCE-UA method and applied it to the calibration of a watershed rainfall-runoff model, the Xinanjiang model. The parallel method was implemented on heterogeneous computing systems using OpenMP and CUDA. Performance testing and sensitivity analysis were carried out to verify its correctness and efficiency. Comparison results indicated that heterogeneous parallel computing-accelerated SCE-UA converged much more quickly than the original serial version and possessed satisfactory accuracy and stability for the task of fast hydrological model calibration.

  17. Iterative algorithms for large sparse linear systems on parallel computers

    NASA Technical Reports Server (NTRS)

    Adams, L. M.

    1982-01-01

    Algorithms for assembling in parallel the sparse system of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering are developed. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed and results of this model for the algorithms are given.

  18. Virtual earthquake engineering laboratory with physics-based degrading materials on parallel computers

    NASA Astrophysics Data System (ADS)

    Cho, In Ho

    For the last few decades, we have obtained tremendous insight into underlying microscopic mechanisms of degrading quasi-brittle materials from persistent and near-saintly efforts in laboratories, and at the same time we have seen unprecedented evolution in computational technology such as massively parallel computers. Thus, time is ripe to embark on a novel approach to settle unanswered questions, especially for the earthquake engineering community, by harmoniously combining the microphysics mechanisms with advanced parallel computing technology. To begin with, it should be stressed that we placed a great deal of emphasis on preserving clear meaning and physical counterparts of all the microscopic material models proposed herein, since it is directly tied to the belief that by doing so, the more physical mechanisms we incorporate, the better prediction we can obtain. We departed from reviewing representative microscopic analysis methodologies, selecting out "fixed-type" multidirectional smeared crack model as the base framework for nonlinear quasi-brittle materials, since it is widely believed to best retain the physical nature of actual cracks. Microscopic stress functions are proposed by integrating well-received existing models to update normal stresses on the crack surfaces (three orthogonal surfaces are allowed to initiate herein) under cyclic loading. Unlike the normal stress update, special attention had to be paid to the shear stress update on the crack surfaces, due primarily to the well-known pathological nature of the fixed-type smeared crack model---spurious large stress transfer over the open crack under nonproportional loading. In hopes of exploiting physical mechanism to resolve this deleterious nature of the fixed crack model, a tribology-inspired three-dimensional (3d) interlocking mechanism has been proposed. Following the main trend of tribology (i.e., the science and engineering of interacting surfaces), we introduced the base fabric of solid particle-soft matrix to explain realistic interlocking over rough crack surfaces, and the adopted Gaussian distribution feeds random particle sizes to the entire domain. Validation against a well-documented rough crack experiment reveals promising accuracy of the proposed 3d interlocking model. A consumed energy-based damage model has been proposed for the weak correlation between the normal and shear stresses on the crack surfaces, and also for describing the nature of irrecoverable damage. Since the evaluation of the consumed energy is directly linked to the microscopic deformation, which can be efficiently tracked on the crack surfaces, the proposed damage model is believed to provide a more physical interpretation than existing damage mechanics, which fundamentally stem from mathematical derivation with few physical counterparts. Another novel point of the present work lies in the topological transition-based "smart" steel bar model, notably with evolving compressive buckling length. We presented a systematic framework of information flow between the key ingredients of composite materials (i.e., steel bar and its surrounding concrete elements). The smart steel model suggested can incorporate smooth transition during reversal loading, tensile rupture, early buckling after reversal from excessive tensile loading, and even compressive buckling. Especially, the buckling length is made to evolve according to the damage states of the surrounding elements of each bar, while all other dominant models leave the length unchanged. What lies behind all the aforementioned novel attempts is, of course, the problem-optimized parallel platform. In fact, the parallel computing in our field has been restricted to monotonic shock or blast loading with explicit algorithm which is characteristically feasible to be parallelized. In the present study, efficient parallelization strategies for the highly demanding implicit nonlinear finite element analysis (FEA) program for real-scale reinforced concrete (RC) structures under cyclic loading are proposed. Quantitative comparison of state-of-the-art parallel strategies, in terms of factorization, had been carried out, leading to the problem-optimized solver, which is successfully embracing the penalty method and banded nature. Particularly, the penalty method employed imparts considerable smoothness to the global response, which yields a practical superiority of the parallel triangular system solver over other advanced solvers such as parallel preconditioned conjugate gradient method. Other salient issues on parallelization are also addressed. The parallel platform established offers unprecedented access to simulations of real-scale structures, giving new understanding about the physics-based mechanisms adopted and probabilistic randomness at the entire system level. Particularly, the platform enables bold simulations of real-scale RC structures exposed to cyclic loading---H-shaped wall system and 4-story T-shaped wall system. The simulations show the desired capability of accurate prediction of global force-displacement responses, postpeak softening behavior, and compressive buckling of longitudinal steel bars. It is fascinating to see that intrinsic randomness of the 3d interlocking model appears to cause "localized" damage of the real-scale structures, which is consistent with reported observations in different fields such as granular media. Equipped with accuracy, stability and scalability as demonstrated so far, the parallel platform is believed to serve as a fertile ground for the introducing of further physical mechanisms into various research fields as well as the earthquake engineering community. In the near future, it can be further expanded to run in concert with reliable FEA programs such as FRAME3d or OPENSEES. Following the central notion of "multiscale" analysis technique, actual infrastructures exposed to extreme natural hazard can be successfully tackled by this next generation analysis tool---the harmonious union of the parallel platform and a general FEA program. At the same time, any type of experiments can be easily conducted by this "virtual laboratory."

  19. A Comparative Study of the Application of Fluorescence Excitation-Emission Matrices Combined with Parallel Factor Analysis and Nonnegative Matrix Factorization in the Analysis of Zn Complexation by Humic Acids

    PubMed Central

    Boguta, Patrycja; Pieczywek, Piotr M.; Sokołowska, Zofia

    2016-01-01

    The main aim of this study was the application of excitation-emission fluorescence matrices (EEMs) combined with two decomposition methods: parallel factor analysis (PARAFAC) and nonnegative matrix factorization (NMF) to study the interaction mechanisms between humic acids (HAs) and Zn(II) over a wide concentration range (0–50 mg·dm−3). The influence of HA properties on Zn(II) complexation was also investigated. Stability constants, quenching degree and complexation capacity were estimated for binding sites found in raw EEM, EEM-PARAFAC and EEM-NMF data using mathematical models. A combination of EEM fluorescence analysis with one of the proposed decomposition methods enabled separation of overlapping binding sites and yielded more accurate calculations of the binding parameters. PARAFAC and NMF processing allowed finding binding sites invisible in a few raw EEM datasets as well as finding totally new maxima attributed to structures of the lowest humification. Decomposed data showed an increase in Zn complexation with an increase in humification, aromaticity and molecular weight of HAs. EEM-PARAFAC analysis also revealed that the most stable compounds were formed by structures containing the highest amounts of nitrogen. The content of oxygen-functional groups did not influence the binding parameters, mainly due to fact of higher competition of metal cation with protons. EEM spectra coupled with NMF and especially PARAFAC processing gave more adequate assessments of interactions as compared to raw EEM data and should be especially recommended for modeling of complexation processes where the fluorescence intensities (FI) changes are weak or where the processes are interfered with by the presence of other fluorophores. PMID:27782078

  20. A Parallel Approach To Optimum Actuator Selection With a Genetic Algorithm

    NASA Technical Reports Server (NTRS)

    Rogers, James L.

    2000-01-01

    Recent discoveries in smart technologies have created a variety of aerodynamic actuators which have great potential to enable entirely new approaches to aerospace vehicle flight control. For a revolutionary concept such as a seamless aircraft with no moving control surfaces, there is a large set of candidate locations for placing actuators, resulting in a substantially larger number of combinations to examine in order to find an optimum placement satisfying the mission requirements. The placement of actuators on a wing determines the control effectiveness of the airplane. One approach to placement Maximizes the moments about the pitch, roll, and yaw axes, while minimizing the coupling. Genetic algorithms have been instrumental in achieving good solutions to discrete optimization problems, such as the actuator placement problem. As a proof of concept, a genetic has been developed to find the minimum number of actuators required to provide uncoupled pitch, roll, and yaw control for a simplified, untapered, unswept wing model. To find the optimum placement by searching all possible combinations would require 1,100 hours. Formulating the problem and as a multi-objective problem and modifying it to take advantage of the parallel processing capabilities of a multi-processor computer, reduces the optimization time to 22 hours.

  1. Influence of combined visual and vestibular cues on human perception and control of horizontal rotation

    NASA Technical Reports Server (NTRS)

    Zacharias, G. L.; Young, L. R.

    1981-01-01

    Measurements are made of manual control performance in the closed-loop task of nulling perceived self-rotation velocity about an earth-vertical axis. Self-velocity estimation is modeled as a function of the simultaneous presentation of vestibular and peripheral visual field motion cues. Based on measured low-frequency operator behavior in three visual field environments, a parallel channel linear model is proposed which has separate visual and vestibular pathways summing in a complementary manner. A dual-input describing function analysis supports the complementary model; vestibular cues dominate sensation at higher frequencies. The describing function model is extended by the proposal of a nonlinear cue conflict model, in which cue weighting depends on the level of agreement between visual and vestibular cues.

  2. Hybrid massively parallel fast sweeping method for static Hamilton-Jacobi equations

    NASA Astrophysics Data System (ADS)

    Detrixhe, Miles; Gibou, Frédéric

    2016-10-01

    The fast sweeping method is a popular algorithm for solving a variety of static Hamilton-Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine and coarse grained components of the algorithm take advantage of heterogeneous computer architecture common in high performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence, parallel scaling, and show state-of-the-art speedup values for the fast sweeping method.

  3. Parallel language constructs for tensor product computations on loosely coupled architectures

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush; Vanrosendale, John

    1989-01-01

    Distributed memory architectures offer high levels of performance and flexibility, but have proven awkard to program. Current languages for nonshared memory architectures provide a relatively low level programming environment, and are poorly suited to modular programming, and to the construction of libraries. A set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level is described. Tensor product array computations are focused on along with a simple but important class of numerical algorithms. The problem of programming 1-D kernal routines is focused on first, such as parallel tridiagonal solvers, and then how such parallel kernels can be combined to form parallel tensor product algorithms is examined.

  4. In vitro protease cleavage and computer simulations reveal the HIV-1 capsid maturation pathway

    NASA Astrophysics Data System (ADS)

    Ning, Jiying; Erdemci-Tandogan, Gonca; Yufenyuy, Ernest L.; Wagner, Jef; Himes, Benjamin A.; Zhao, Gongpu; Aiken, Christopher; Zandi, Roya; Zhang, Peijun

    2016-12-01

    HIV-1 virions assemble as immature particles containing Gag polyproteins that are processed by the viral protease into individual components, resulting in the formation of mature infectious particles. There are two competing models for the process of forming the mature HIV-1 core: the disassembly and de novo reassembly model and the non-diffusional displacive model. To study the maturation pathway, we simulate HIV-1 maturation in vitro by digesting immature particles and assembled virus-like particles with recombinant HIV-1 protease and monitor the process with biochemical assays and cryoEM structural analysis in parallel. Processing of Gag in vitro is accurate and efficient and results in both soluble capsid protein and conical or tubular capsid assemblies, seemingly converted from immature Gag particles. Computer simulations further reveal probable assembly pathways of HIV-1 capsid formation. Combining the experimental data and computer simulations, our results suggest a sequential combination of both displacive and disassembly/reassembly processes for HIV-1 maturation.

  5. Business model for sensor-based fall recognition systems.

    PubMed

    Fachinger, Uwe; Schöpke, Birte

    2014-01-01

    AAL systems require, in addition to sophisticated and reliable technology, adequate business models for their launch and sustainable establishment. This paper presents the basic features of alternative business models for a sensor-based fall recognition system which was developed within the context of the "Lower Saxony Research Network Design of Environments for Ageing" (GAL). The models were developed parallel to the R&D process with successive adaptation and concretization. An overview of the basic features (i.e. nine partial models) of the business model is given and the mutual exclusive alternatives for each partial model are presented. The partial models are interconnected and the combinations of compatible alternatives lead to consistent alternative business models. However, in the current state, only initial concepts of alternative business models can be deduced. The next step will be to gather additional information to work out more detailed models.

  6. Parallel Markov chain Monte Carlo - bridging the gap to high-performance Bayesian computation in animal breeding and genetics.

    PubMed

    Wu, Xiao-Lin; Sun, Chuanyu; Beissinger, Timothy M; Rosa, Guilherme Jm; Weigel, Kent A; Gatti, Natalia de Leon; Gianola, Daniel

    2012-09-25

    Most Bayesian models for the analysis of complex traits are not analytically tractable and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which uses whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that can arise from series computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Parallel Monte Carlo Markov chain algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, yet some variants are discussed as well. Features and strategies of the parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models, which does not only lead to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs.

  7. Parallel Markov chain Monte Carlo - bridging the gap to high-performance Bayesian computation in animal breeding and genetics

    PubMed Central

    2012-01-01

    Background Most Bayesian models for the analysis of complex traits are not analytically tractable and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which uses whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that can arise from series computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Results Parallel Monte Carlo Markov chain algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, yet some variants are discussed as well. Features and strategies of the parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Conclusions Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models, which does not only lead to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs. PMID:23009363

  8. Measurement of the curvature of a surface using parallel light beams

    DOEpatents

    Chason, Eric H.; Floro, Jerrold A.; Seager, Carleton H.; Sinclair, Michael B.

    1999-01-01

    Apparatus for measuring curvature of a surface wherein a beam of collimated light is passed through means for producing a plurality of parallel light beams each separated by a common distance which then reflect off the surface to fall upon a detector that measures the separation of the reflected beams of light. This means can be an etalon and the combination of a diffractive element and a converging lens. The curvature of the surface along the line onto which the multiple beams fall can be calculated from this information. A two-dimensional map of the curvature can be obtained by adding a second etalon (or a second combination of a diffractive element and a converging lens) which is rotated 90.degree. about the optical axis relative to the first etalon and inclined at the same angle. The second etalon creates an individual set of parallel light beams from each of the individual beams created by the first etalon with the sets of parallel light beams from the second etalon rotated 90.degree. relative to the line onto which the single set of parallel beams from the first etalon would have fallen.

  9. Measurement of the curvature of a surface using parallel light beams

    DOEpatents

    Chason, E.H.; Floro, J.A.; Seager, C.H.; Sinclair, M.B.

    1999-06-15

    Apparatus is disclosed for measuring curvature of a surface wherein a beam of collimated light is passed through a means for producing a plurality of parallel light beams each separated by a common distance which then reflect off the surface to fall upon a detector that measures the separation of the reflected beams of light. This means can be an etalon and the combination of a diffractive element and a converging lens. The curvature of the surface along the line onto which the multiple beams fall can be calculated from this information. A two-dimensional map of the curvature can be obtained by adding a second etalon (or a second combination of a diffractive element and a converging lens) which is rotated 90[degree] about the optical axis relative to the first etalon and inclined at the same angle. The second etalon creates an individual set of parallel light beams from each of the individual beams created by the first etalon with the sets of parallel light beams from the second etalon rotated 90[degree] relative to the line onto which the single set of parallel beams from the first etalon would have fallen. 5 figs.

  10. Optimisation of a parallel ocean general circulation model

    NASA Astrophysics Data System (ADS)

    Beare, M. I.; Stevens, D. P.

    1997-10-01

    This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.

  11. Power-balancing instantaneous optimization energy management for a novel series-parallel hybrid electric bus

    NASA Astrophysics Data System (ADS)

    Sun, Dongye; Lin, Xinyou; Qin, Datong; Deng, Tao

    2012-11-01

    Energy management(EM) is a core technique of hybrid electric bus(HEB) in order to advance fuel economy performance optimization and is unique for the corresponding configuration. There are existing algorithms of control strategy seldom take battery power management into account with international combustion engine power management. In this paper, a type of power-balancing instantaneous optimization(PBIO) energy management control strategy is proposed for a novel series-parallel hybrid electric bus. According to the characteristic of the novel series-parallel architecture, the switching boundary condition between series and parallel mode as well as the control rules of the power-balancing strategy are developed. The equivalent fuel model of battery is implemented and combined with the fuel of engine to constitute the objective function which is to minimize the fuel consumption at each sampled time and to coordinate the power distribution in real-time between the engine and battery. To validate the proposed strategy effective and reasonable, a forward model is built based on Matlab/Simulink for the simulation and the dSPACE autobox is applied to act as a controller for hardware in-the-loop integrated with bench test. Both the results of simulation and hardware-in-the-loop demonstrate that the proposed strategy not only enable to sustain the battery SOC within its operational range and keep the engine operation point locating the peak efficiency region, but also the fuel economy of series-parallel hybrid electric bus(SPHEB) dramatically advanced up to 30.73% via comparing with the prototype bus and a similar improvement for PBIO strategy relative to rule-based strategy, the reduction of fuel consumption is up to 12.38%. The proposed research ensures the algorithm of PBIO is real-time applicability, improves the efficiency of SPHEB system, as well as suite to complicated configuration perfectly.

  12. Fusion of local and global detection systems to detect tuberculosis in chest radiographs.

    PubMed

    Hogeweg, Laurens; Mol, Christian; de Jong, Pim A; Dawson, Rodney; Ayles, Helen; van Ginneken, Bramin

    2010-01-01

    Automatic detection of tuberculosis (TB) on chest radiographs is a difficult problem because of the diverse presentation of the disease. A combination of detection systems for abnormalities and normal anatomy is used to improve detection performance. A textural abnormality detection system operating at the pixel level is combined with a clavicle detection system to suppress false positive responses. The output of a shape abnormality detection system operating at the image level is combined in a next step to further improve performance by reducing false negatives. Strategies for combining systems based on serial and parallel configurations were evaluated using the minimum, maximum, product, and mean probability combination rules. The performance of TB detection increased, as measured using the area under the ROC curve, from 0.67 for the textural abnormality detection system alone to 0.86 when the three systems were combined. The best result was achieved using the sum and product rule in a parallel combination of outputs.

  13. Bi-directional series-parallel elastic actuator and overlap of the actuation layers.

    PubMed

    Furnémont, Raphaël; Mathijssen, Glenn; Verstraten, Tom; Lefeber, Dirk; Vanderborght, Bram

    2016-01-27

    Several robotics applications require high torque-to-weight ratio and energy efficient actuators. Progress in that direction was made by introducing compliant elements into the actuation. A large variety of actuators were developed such as series elastic actuators (SEAs), variable stiffness actuators and parallel elastic actuators (PEAs). SEAs can reduce the peak power while PEAs can reduce the torque requirement on the motor. Nonetheless, these actuators still cannot meet performances close to humans. To combine both advantages, the series parallel elastic actuator (SPEA) was developed. The principle is inspired from biological muscles. Muscles are composed of motor units, placed in parallel, which are variably recruited as the required effort increases. This biological principle is exploited in the SPEA, where springs (layers), placed in parallel, can be recruited one by one. This recruitment is performed by an intermittent mechanism. This paper presents the development of a SPEA using the MACCEPA principle with a self-closing mechanism. This actuator can deliver a bi-directional output torque, variable stiffness and reduced friction. The load on the motor can also be reduced, leading to a lower power consumption. The variable recruitment of the parallel springs can also be tuned in order to further decrease the consumption of the actuator for a given task. First, an explanation of the concept and a brief description of the prior work done will be given. Next, the design and the model of one of the layers will be presented. The working principle of the full actuator will then be given. At the end of this paper, experiments showing the electric consumption of the actuator will display the advantage of the SPEA over an equivalent stiff actuator.

  14. GNAQPMS v1.1: accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS) on Intel Xeon Phi processors

    NASA Astrophysics Data System (ADS)

    Wang, Hui; Chen, Huansheng; Wu, Qizhong; Lin, Junmin; Chen, Xueshun; Xie, Xinwei; Wang, Rongrong; Tang, Xiao; Wang, Zifa

    2017-08-01

    The Global Nested Air Quality Prediction Modeling System (GNAQPMS) is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS), which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present the porting and optimisation of GNAQPMS on a second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL). Compared with the first-generation Xeon Phi coprocessor (codenamed Knights Corner, KNC), KNL has many new hardware features such as a bootable processor, high-performance in-package memory and ISA compatibility with Intel Xeon processors. In particular, we describe the five optimisations we applied to the key modules of GNAQPMS, including the CBM-Z gas-phase chemistry, advection, convection and wet deposition modules. These optimisations work well on both the KNL 7250 processor and the Intel Xeon E5-2697 V4 processor. They include (1) updating the pure Message Passing Interface (MPI) parallel mode to the hybrid parallel mode with MPI and OpenMP in the emission, advection, convection and gas-phase chemistry modules; (2) fully employing the 512 bit wide vector processing units (VPUs) on the KNL platform; (3) reducing unnecessary memory access to improve cache efficiency; (4) reducing the thread local storage (TLS) in the CBM-Z gas-phase chemistry module to improve its OpenMP performance; and (5) changing the global communication from writing/reading interface files to MPI functions to improve the performance and the parallel scalability. These optimisations greatly improved the GNAQPMS performance. The same optimisations also work well for the Intel Xeon Broadwell processor, specifically E5-2697 v4. Compared with the baseline version of GNAQPMS, the optimised version was 3.51 × faster on KNL and 2.77 × faster on the CPU. Moreover, the optimised version ran at 26 % lower average power on KNL than on the CPU. With the combined performance and energy improvement, the KNL platform was 37.5 % more efficient on power consumption compared with the CPU platform. The optimisations also enabled much further parallel scalability on both the CPU cluster and the KNL cluster scaled to 40 CPU nodes and 30 KNL nodes, with a parallel efficiency of 70.4 and 42.2 %, respectively.

  15. National Centers for Environmental Prediction

    Science.gov Websites

    Products Operational Forecast Graphics Experimental Forecast Graphics Verification and Diagnostics Model PARALLEL/EXPERIMENTAL MODEL FORECAST GRAPHICS OPERATIONAL VERIFICATION / DIAGNOSTICS PARALLEL VERIFICATION Developmental Air Quality Forecasts and Verification Back to Table of Contents 2. PARALLEL/EXPERIMENTAL GRAPHICS

  16. Performance Analysis and Optimization on the UCLA Parallel Atmospheric General Circulation Model Code

    NASA Technical Reports Server (NTRS)

    Lou, John; Ferraro, Robert; Farrara, John; Mechoso, Carlos

    1996-01-01

    An analysis is presented of several factors influencing the performance of a parallel implementation of the UCLA atmospheric general circulation model (AGCM) on massively parallel computer systems. Several modificaitons to the original parallel AGCM code aimed at improving its numerical efficiency, interprocessor communication cost, load-balance and issues affecting single-node code performance are discussed.

  17. High-resolution brain SPECT imaging by combination of parallel and tilted detector heads.

    PubMed

    Suzuki, Atsuro; Takeuchi, Wataru; Ishitsu, Takafumi; Morimoto, Yuichi; Kobashi, Keiji; Ueno, Yuichiro

    2015-10-01

    To improve the spatial resolution of brain single-photon emission computed tomography (SPECT), we propose a new brain SPECT system in which the detector heads are tilted towards the rotation axis so that they are closer to the brain. In addition, parallel detector heads are used to obtain the complete projection data set. We evaluated this parallel and tilted detector head system (PT-SPECT) in simulations. In the simulation study, the tilt angle of the detector heads relative to the axis was 45°. The distance from the collimator surface of the parallel detector heads to the axis was 130 mm. The distance from the collimator surface of the tilted detector heads to the origin on the axis was 110 mm. A CdTe semiconductor panel with a 1.4 mm detector pitch and a parallel-hole collimator were employed in both types of detector head. A line source phantom, cold-rod brain-shaped phantom, and cerebral blood flow phantom were evaluated. The projection data were generated by forward-projection of the phantom images using physics models, and Poisson noise at clinical levels was applied to the projection data. The ordered-subsets expectation maximization algorithm with physics models was used. We also evaluated conventional SPECT using four parallel detector heads for the sake of comparison. The evaluation of the line source phantom showed that the transaxial FWHM in the central slice for conventional SPECT ranged from 6.1 to 8.5 mm, while that for PT-SPECT ranged from 5.3 to 6.9 mm. The cold-rod brain-shaped phantom image showed that conventional SPECT could visualize up to 8-mm-diameter rods. By contrast, PT-SPECT could visualize up to 6-mm-diameter rods in upper slices of a cerebrum. The cerebral blood flow phantom image showed that the PT-SPECT system provided higher resolution at the thalamus and caudate nucleus as well as at the longitudinal fissure of the cerebrum compared with conventional SPECT. PT-SPECT provides improved image resolution at not only upper but also at central slices of the cerebrum.

  18. Electronics Book II.

    ERIC Educational Resources Information Center

    Johnson, Dennis; And Others

    This manual, the second of three curriculum guides for an electronics course, is intended for use in a program combining vocational English as a second language (VESL) with bilingual vocational education. Ten units cover the electrical team, Ohm's law, Watt's law, series resistive circuits, parallel resistive circuits, series parallel circuits,…

  19. Fast in-memory elastic full-waveform inversion using consumer-grade GPUs

    NASA Astrophysics Data System (ADS)

    Sivertsen Bergslid, Tore; Birger Raknes, Espen; Arntsen, Børge

    2017-04-01

    Full-waveform inversion (FWI) is a technique to estimate subsurface properties by using the recorded waveform produced by a seismic source and applying inverse theory. This is done through an iterative optimization procedure, where each iteration requires solving the wave equation many times, then trying to minimize the difference between the modeled and the measured seismic data. Having to model many of these seismic sources per iteration means that this is a highly computationally demanding procedure, which usually involves writing a lot of data to disk. We have written code that does forward modeling and inversion entirely in memory. A typical HPC cluster has many more CPUs than GPUs. Since FWI involves modeling many seismic sources per iteration, the obvious approach is to parallelize the code on a source-by-source basis, where each core of the CPU performs one modeling, and do all modelings simultaneously. With this approach, the GPU is already at a major disadvantage in pure numbers. Fortunately, GPUs can more than make up for this hardware disadvantage by performing each modeling much faster than a CPU. Another benefit of parallelizing each individual modeling is that it lets each modeling use a lot more RAM. If one node has 128 GB of RAM and 20 CPU cores, each modeling can use only 6.4 GB RAM if one is running the node at full capacity with source-by-source parallelization on the CPU. A parallelized per-source code using GPUs can use 64 GB RAM per modeling. Whenever a modeling uses more RAM than is available and has to start using regular disk space the runtime increases dramatically, due to slow file I/O. The extremely high computational speed of the GPUs combined with the large amount of RAM available for each modeling lets us do high frequency FWI for fairly large models very quickly. For a single modeling, our GPU code outperforms the single-threaded CPU-code by a factor of about 75. Successful inversions have been run on data with frequencies up to 40 Hz for a model of 2001 by 600 grid points with 5 m grid spacing and 5000 time steps, in less than 2.5 minutes per source. In practice, using 15 nodes (30 GPUs) to model 101 sources, each iteration took approximately 9 minutes. For reference, the same inversion run with our CPU code uses two hours per iteration. This was done using only a very simple wavefield interpolation technique, saving every second timestep. Using a more sophisticated checkpointing or wavefield reconstruction method would allow us to increase this model size significantly. Our results show that ordinary gaming GPUs are a viable alternative to the expensive professional GPUs often used today, when performing large scale modeling and inversion in geophysics.

  20. Parallel closure theory for toroidally confined plasmas

    NASA Astrophysics Data System (ADS)

    Ji, Jeong-Young; Held, Eric D.

    2017-10-01

    We solve a system of general moment equations to obtain parallel closures for electrons and ions in an axisymmetric toroidal magnetic field. Magnetic field gradient terms are kept and treated using the Fourier series method. Assuming lowest order density (pressure) and temperature to be flux labels, the parallel heat flow, friction, and viscosity are expressed in terms of radial gradients of the lowest-order temperature and pressure, parallel gradients of temperature and parallel flow, and the relative electron-ion parallel flow velocity. Convergence of closure quantities is demonstrated as the number of moments and Fourier modes are increased. Properties of the moment equations in the collisionless limit are also discussed. Combining closures with fluid equations parallel mass flow and electric current are also obtained. Work in collaboration with the PSI Center and supported by the U.S. DOE under Grant Nos. DE-SC0014033, DE-SC0016256, and DE-FG02-04ER54746.

  1. Self-balanced modulation and magnetic rebalancing method for parallel multilevel inverters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Hui; Shi, Yanjun

    A self-balanced modulation method and a closed-loop magnetic flux rebalancing control method for parallel multilevel inverters. The combination of the two methods provides for balancing of the magnetic flux of the inter-cell transformers (ICTs) of the parallel multilevel inverters without deteriorating the quality of the output voltage. In various embodiments a parallel multi-level inverter modulator is provide including a multi-channel comparator to generate a multiplexed digitized ideal waveform for a parallel multi-level inverter and a finite state machine (FSM) module coupled to the parallel multi-channel comparator, the FSM module to receive the multiplexed digitized ideal waveform and to generate amore » pulse width modulated gate-drive signal for each switching device of the parallel multi-level inverter. The system and method provides for optimization of the output voltage spectrum without influence the magnetic balancing.« less

  2. Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes

    NASA Astrophysics Data System (ADS)

    Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt; Stuehn, Torsten

    2017-11-01

    Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make a productive use of computational resources for each simulation and from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of an heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach, the theoretical modeling and scaling laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also show the new approach capabilities, by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. These two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.

  3. Entropy generation in a parallel-plate active magnetic regenerator with insulator layers

    NASA Astrophysics Data System (ADS)

    Mugica Guerrero, Ibai; Poncet, Sébastien; Bouchard, Jonathan

    2017-02-01

    This paper proposes a feasible solution to diminish conduction losses in active magnetic regenerators. Higher performances of these machines are linked to a lower thermal conductivity of the Magneto-Caloric Material (MCM) in the streamwise direction. The concept presented here involves the insertion of insulator layers along the length of a parallel-plate magnetic regenerator in order to reduce the heat conduction within the MCM. This idea is investigated by means of a 1D numerical model. This model solves not only the energy equations for the fluid and solid domains but also the magnetic circuit that conforms the experimental setup of reference. In conclusion, the addition of insulator layers within the MCM increases the temperature span, cooling load, and coefficient of performance by a combination of lower heat conduction losses and an increment of the global Magneto-Caloric Effect. The generated entropy by solid conduction, fluid convection, and conduction and viscous losses are calculated to help understand the implications of introducing insulator layers in magnetic regenerators. Finally, the optimal number of insulator layers is studied.

  4. High-resolution multi-code implementation of unsteady Navier-Stokes flow solver based on paralleled overset adaptive mesh refinement and high-order low-dissipation hybrid schemes

    NASA Astrophysics Data System (ADS)

    Li, Gaohua; Fu, Xiang; Wang, Fuxin

    2017-10-01

    The low-dissipation high-order accurate hybrid up-winding/central scheme based on fifth-order weighted essentially non-oscillatory (WENO) and sixth-order central schemes, along with the Spalart-Allmaras (SA)-based delayed detached eddy simulation (DDES) turbulence model, and the flow feature-based adaptive mesh refinement (AMR), are implemented into a dual-mesh overset grid infrastructure with parallel computing capabilities, for the purpose of simulating vortex-dominated unsteady detached wake flows with high spatial resolutions. The overset grid assembly (OGA) process based on collection detection theory and implicit hole-cutting algorithm achieves an automatic coupling for the near-body and off-body solvers, and the error-and-try method is used for obtaining a globally balanced load distribution among the composed multiple codes. The results of flows over high Reynolds cylinder and two-bladed helicopter rotor show that the combination of high-order hybrid scheme, advanced turbulence model, and overset adaptive mesh refinement can effectively enhance the spatial resolution for the simulation of turbulent wake eddies.

  5. Trace element evaluation of a suite of rocks from Reunion Island, Indian Ocean

    USGS Publications Warehouse

    Zielinski, R.A.

    1975-01-01

    Reunion Island consists of an olivine-basalt shield capped by a series of flows and intrusives ranging from hawaiite through trachyte. Eleven rocks representing the total compositional sequence have been analyzed for U, Th and REE. Eight of the rocks (group 1) have positive-slope, parallel, chondrite-normalized REE fractionation patterns. Using a computer model, the major element compositions of group 1 whole rocks and observed phenocrysts were used to predict the crystallization histories of increasingly residual liquids, and allowed semi-quantitative verification of origin by fractional crystallization of the olivine-basalt parent magma. Results were combined with mineral-liquid distribution coefficient data to predict trace element abundances, and existing data on Cr, Ni, Sr and Ba were also successfully incorporated in the model. The remaining three rocks (group 2) have nonuniform positive-slope REE fractionation patterns not parallel to group 1 patterns. Rare earth fractionation in a syenite is explained by partial melting of a source rich in clinopyroxene and/or hornblende. The other two rocks of group 2 are explained as hybrids resulting from mixing of syenite and magmas of group 1. ?? 1975.

  6. BAYESIAN ANALYSIS TO EVALUATE TESTS FOR THE DETECTION OF MYCOBACTERIUM BOVIS INFECTION IN FREE-RANGING WILD BISON (BISON BISON ATHABASCAE) IN THE ABSENCE OF A GOLD STANDARD.

    PubMed

    Chapinal, Núria; Schumaker, Brant A; Joly, Damien O; Elkin, Brett T; Stephen, Craig

    2015-07-01

    We estimated the sensitivity and specificity of the caudal-fold skin test (CFT), the fluorescent polarization assay (FPA), and the rapid lateral-flow test (RT) for the detection of Mycobacterium bovis in free-ranging wild wood bison (Bison bison athabascae), in the absence of a gold standard, by using Bayesian analysis, and then used those estimates to forecast the performance of a pairwise combination of tests in parallel. In 1998-99, 212 wood bison from Wood Buffalo National Park (Canada) were tested for M. bovis infection using CFT and two serologic tests (FPA and RT). The sensitivity and specificity of each test were estimated using a three-test, one-population, Bayesian model allowing for conditional dependence between FPA and RT. The sensitivity and specificity of the combination of CFT and each serologic test in parallel were calculated assuming conditional independence. The test performance estimates were influenced by the prior values chosen. However, the rank of tests and combinations of tests based on those estimates remained constant. The CFT was the most sensitive test and the FPA was the least sensitive, whereas RT was the most specific test and CFT was the least specific. In conclusion, given the fact that gold standards for the detection of M. bovis are imperfect and difficult to obtain in the field, Bayesian analysis holds promise as a tool to rank tests and combinations of tests based on their performance. Combining a skin test with an animal-side serologic test, such as RT, increases sensitivity in the detection of M. bovis and is a good approach to enhance disease eradication or control in wild bison.

  7. Studying an Eulerian Computer Model on Different High-performance Computer Platforms and Some Applications

    NASA Astrophysics Data System (ADS)

    Georgiev, K.; Zlatev, Z.

    2010-11-01

    The Danish Eulerian Model (DEM) is an Eulerian model for studying the transport of air pollutants on large scale. Originally, the model was developed at the National Environmental Research Institute of Denmark. The model computational domain covers Europe and some neighbour parts belong to the Atlantic Ocean, Asia and Africa. If DEM model is to be applied by using fine grids, then its discretization leads to a huge computational problem. This implies that such a model as DEM must be run only on high-performance computer architectures. The implementation and tuning of such a complex large-scale model on each different computer is a non-trivial task. Here, some comparison results of running of this model on different kind of vector (CRAY C92A, Fujitsu, etc.), parallel computers with distributed memory (IBM SP, CRAY T3E, Beowulf clusters, Macintosh G4 clusters, etc.), parallel computers with shared memory (SGI Origin, SUN, etc.) and parallel computers with two levels of parallelism (IBM SMP, IBM BlueGene/P, clusters of multiprocessor nodes, etc.) will be presented. The main idea in the parallel version of DEM is domain partitioning approach. Discussions according to the effective use of the cache and hierarchical memories of the modern computers as well as the performance, speed-ups and efficiency achieved will be done. The parallel code of DEM, created by using MPI standard library, appears to be highly portable and shows good efficiency and scalability on different kind of vector and parallel computers. Some important applications of the computer model output are presented in short.

  8. Pressure-constrained, reduced-DOF, interconnected parallel manipulators with applications to space suit design

    NASA Astrophysics Data System (ADS)

    Jacobs, Shane Earl

    This dissertation presents the concept of a Morphing Upper Torso, an innovative pressure suit design that incorporates robotic elements to enable a resizable, highly mobile and easy to don/doff spacesuit. The torso is modeled as a system of interconnected, pressure-constrained, reduced-DOF, wire-actuated parallel manipulators, that enable the dimensions of the suit to be reconfigured to match the wearer. The kinematics, dynamics and control of wire-actuated manipulators are derived and simulated, along with the Jacobian transforms, which relate the total twist vector of the system to the vector of actuator velocities. Tools are developed that allow calculation of the workspace for both single and interconnected reduced-DOF robots of this type, using knowledge of the link lengths. The forward kinematics and statics equations are combined and solved to produce the pose of the platforms along with the link tensions. These tools allow analysis of the full Morphing Upper Torso design, in which the back hatch of a rear-entry torso is interconnected with the waist ring, helmet ring and two scye bearings. Half-scale and full-scale experimental models are used along with analytical models to examine the feasibility of this novel space suit concept. The analytical and experimental results demonstrate that the torso could be expanded to facilitate donning and doffng, and then contracted to match different wearer's body dimensions. Using the system of interconnected parallel manipulators, suit components can be accurately repositioned to different desired configurations. The demonstrated feasibility of the Morphing Upper Torso concept makes it an exciting candidate for inclusion in a future planetary suit architecture.

  9. An OpenACC-Based Unified Programming Model for Multi-accelerator Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Jungwon; Lee, Seyong; Vetter, Jeffrey S

    2015-01-01

    This paper proposes a novel SPMD programming model of OpenACC. Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model whether they are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance with a GPU-based supercomputer using three benchmark applications.

  10. Boundedness and exponential convergence in a chemotaxis model for tumor invasion

    NASA Astrophysics Data System (ADS)

    Jin, Hai-Yang; Xiang, Tian

    2016-12-01

    We revisit the following chemotaxis system modeling tumor invasion {ut=Δu-∇ṡ(u∇v),x∈Ω,t>0,vt=Δv+wz,x∈Ω,t>0,wt=-wz,x∈Ω,t>0,zt=Δz-z+u,x∈Ω,t>0, in a smooth bounded domain Ω \\subset {{{R}}n}(n≥slant 1) with homogeneous Neumann boundary and initial conditions. This model was recently proposed by Fujie et al (2014 Adv. Math. Sci. Appl. 24 67-84) as a model for tumor invasion with the role of extracellular matrix incorporated, and was analyzed later by Fujie et al (2016 Discrete Contin. Dyn. Syst. 36 151-69), showing the uniform boundedness and convergence for n≤slant 3 . In this work, we first show that the {{L}∞} -boundedness of the system can be reduced to the boundedness of \\parallel u(\\centerdot,t){{\\parallel}{{L\\frac{n{4}+ɛ}}(Ω )}} for some ɛ >0 alone, and then, for n≥slant 4 , if the initial data \\parallel {{u}0}{{\\parallel}{{L\\frac{n{4}}}}} , \\parallel {{z}0}{{\\parallel}{{L\\frac{n{2}}}}} and \\parallel \

  11. Trinary signed-digit arithmetic using an efficient encoding scheme

    NASA Astrophysics Data System (ADS)

    Salim, W. Y.; Alam, M. S.; Fyath, R. S.; Ali, S. A.

    2000-09-01

    The trinary signed-digit (TSD) number system is of interest for ultrafast optoelectronic computing systems since it permits parallel carry-free addition and borrow-free subtraction of two arbitrary length numbers in constant time. In this paper, a simple coding scheme is proposed to encode the decimal number directly into the TSD form. The coding scheme enables one to perform parallel one-step TSD arithmetic operation. The proposed coding scheme uses only a 5-combination coding table instead of the 625-combination table reported recently for recoded TSD arithmetic technique.

  12. One-step trinary signed-digit arithmetic using an efficient encoding scheme

    NASA Astrophysics Data System (ADS)

    Salim, W. Y.; Fyath, R. S.; Ali, S. A.; Alam, Mohammad S.

    2000-11-01

    The trinary signed-digit (TSD) number system is of interest for ultra fast optoelectronic computing systems since it permits parallel carry-free addition and borrow-free subtraction of two arbitrary length numbers in constant time. In this paper, a simple coding scheme is proposed to encode the decimal number directly into the TSD form. The coding scheme enables one to perform parallel one-step TSD arithmetic operation. The proposed coding scheme uses only a 5-combination coding table instead of the 625-combination table reported recently for recoded TSD arithmetic technique.

  13. Relativistic effects in the energy loss of a fast charged particle moving parallel to a two-dimensional electron gas

    NASA Astrophysics Data System (ADS)

    Mišković, Zoran L.; Akbari, Kamran; Segui, Silvina; Gervasoni, Juana L.; Arista, Néstor R.

    2018-05-01

    We present a fully relativistic formulation for the energy loss rate of a charged particle moving parallel to a sheet containing two-dimensional electron gas, allowing that its in-plane polarization may be described by different longitudinal and transverse conductivities. We apply our formulation to the case of a doped graphene layer in the terahertz range of frequencies, where excitation of the Dirac plasmon polariton (DPP) in graphene plays a major role. By using the Drude model with zero damping we evaluate the energy loss rate due to excitation of the DPP, and show that the retardation effects are important when the incident particle speed and its distance from graphene both increase. Interestingly, the retarded energy loss rate obtained in this manner may be both larger and smaller than its non-retarded counterpart for different combinations of the particle speed and distance.

  14. Chromatographic background drift correction coupled with parallel factor analysis to resolve coelution problems in three-dimensional chromatographic data: quantification of eleven antibiotics in tap water samples by high-performance liquid chromatography coupled with a diode array detector.

    PubMed

    Yu, Yong-Jie; Wu, Hai-Long; Fu, Hai-Yan; Zhao, Juan; Li, Yuan-Na; Li, Shu-Fang; Kang, Chao; Yu, Ru-Qin

    2013-08-09

    Chromatographic background drift correction has been an important field of research in chromatographic analysis. In the present work, orthogonal spectral space projection for background drift correction of three-dimensional chromatographic data was described in detail and combined with parallel factor analysis (PARAFAC) to resolve overlapped chromatographic peaks and obtain the second-order advantage. This strategy was verified by simulated chromatographic data and afforded significant improvement in quantitative results. Finally, this strategy was successfully utilized to quantify eleven antibiotics in tap water samples. Compared with the traditional methodology of introducing excessive factors for the PARAFAC model to eliminate the effect of background drift, clear improvement in the quantitative performance of PARAFAC was observed after background drift correction by orthogonal spectral space projection. Copyright © 2013 Elsevier B.V. All rights reserved.

  15. Exsolution of Ca-clinopyroxene from orthopyroxene aided by deformation

    USGS Publications Warehouse

    Kirby, S.H.; Etheridge, M.A.

    1981-01-01

    Monoclinic calcium-poor shear-transformation lamellae and calcium-rich exsolution lamellae occur parallel to (100) in orthopyroxene. The formation of both structures from an orthopyroxene host involves a shear on (100) parallel to [001], with additional cation exchange in the exsolution case. The shear transformation involves a macroscopic simple shear angle of 13.3?? (shear strain of 0.236) and produces a specific a-axis orientation with respect to the sense of shear; we have found that this orientation dominates in exsolution lamellae in kinked orthopyroxene, where the sense of shear is known. In undeformed orthopyroxene, there is generally no preferred sense of orientation of the monoclinic a axes. We advance a specific model for exsolution involving nucleation and growth by shear transformation combined with cation exchange, thus circumventing the classical nucleation barrier and permitting exsolution at lower solute supersaturations. ?? 1981 Springer-Verlag.

  16. Continuum Reconfigurable Parallel Robots for Surgery: Shape Sensing and State Estimation with Uncertainty.

    PubMed

    Anderson, Patrick L; Mahoney, Arthur W; Webster, Robert J

    2017-07-01

    This paper examines shape sensing for a new class of surgical robot that consists of parallel flexible structures that can be reconfigured inside the human body. Known as CRISP robots, these devices provide access to the human body through needle-sized entry points, yet can be configured into truss-like structures capable of dexterous movement and large force application. They can also be reconfigured as needed during a surgical procedure. Since CRISP robots are elastic, they will deform when subjected to external forces or other perturbations. In this paper, we explore how to combine sensor information with mechanics-based models for CRISP robots to estimate their shapes under applied loads. The end result is a shape sensing framework for CRISP robots that will enable future research on control under applied loads, autonomous motion, force sensing, and other robot behaviors.

  17. Parallel Treatments Design: A Nested Single Subject Design for Comparing Instructional Procedures.

    ERIC Educational Resources Information Center

    Gast, David L.; Wolery, Mark

    1988-01-01

    This paper describes the parallel treatments design, a nested single subject experimental design that combines two concurrently implemented multiple probe designs, allows control for effects of extraneous variables through counterbalancing, and replicates its effects across behaviors. Procedural guidelines for the design's use and issues related…

  18. An embedded multi-core parallel model for real-time stereo imaging

    NASA Astrophysics Data System (ADS)

    He, Wenjing; Hu, Jian; Niu, Jingyu; Li, Chuanrong; Liu, Guangyu

    2018-04-01

    The real-time processing based on embedded system will enhance the application capability of stereo imaging for LiDAR and hyperspectral sensor. The task partitioning and scheduling strategies for embedded multiprocessor system starts relatively late, compared with that for PC computer. In this paper, aimed at embedded multi-core processing platform, a parallel model for stereo imaging is studied and verified. After analyzing the computing amount, throughout capacity and buffering requirements, a two-stage pipeline parallel model based on message transmission is established. This model can be applied to fast stereo imaging for airborne sensors with various characteristics. To demonstrate the feasibility and effectiveness of the parallel model, a parallel software was designed using test flight data, based on the 8-core DSP processor TMS320C6678. The results indicate that the design performed well in workload distribution and had a speed-up ratio up to 6.4.

  19. Adaption of a parallel-path poly(tetrafluoroethylene) nebulizer to an evaporative light scattering detector: Optimization and application to studies of poly(dimethylsiloxane) oligomers as a model polymer.

    PubMed

    Durner, Bernhard; Ehmann, Thomas; Matysik, Frank-Michael

    2018-06-05

    The adaption of an parallel-path poly(tetrafluoroethylene)(PTFE) ICP-nebulizer to an evaporative light scattering detector (ELSD) was realized. This was done by substituting the originally installed concentric glass nebulizer of the ELSD. The performance of both nebulizers was compared regarding nebulizer temperature, evaporator temperature, flow rate of nebulizing gas and flow rate of mobile phase of different solvents using caffeine and poly(dimethylsiloxane) (PDMS) as analytes. Both nebulizers showed similar performances but for the parallel-path PTFE nebulizer the performance was considerably better at low LC flow rates and the nebulizer lifetime was substantially increased. In general, for both nebulizers the highest sensitivity was obtained by applying the lowest possible evaporator temperature in combination with the highest possible nebulizer temperature at preferably low gas flow rates. Besides the optimization of detector parameters, response factors for various PDMS oligomers were determined and the dependency of the detector signal on molar mass of the analytes was studied. The significant improvement regarding long-term stability made the modified ELSD much more robust and saved time and money by reducing the maintenance efforts. Thus, especially in polymer HPLC, associated with a complex matrix situation, the PTFE-based parallel-path nebulizer exhibits attractive characteristics for analytical studies of polymers. Copyright © 2018. Published by Elsevier B.V.

  20. Techniques for Single System Integration of Elastic Simulation Features

    NASA Astrophysics Data System (ADS)

    Mitchell, Nathan M.

    Techniques for simulating the behavior of elastic objects have matured considerably over the last several decades, tackling diverse problems from non-linear models for incompressibility to accurate self-collisions. Alongside these contributions, advances in parallel hardware design and algorithms have made simulation more efficient and affordable than ever before. However, prior research often has had to commit to design choices that compromise certain simulation features to better optimize others, resulting in a fragmented landscape of solutions. For complex, real-world tasks, such as virtual surgery, a holistic approach is desirable, where complex behavior, performance, and ease of modeling are supported equally. This dissertation caters to this goal in the form of several interconnected threads of investigation, each of which contributes a piece of an unified solution. First, it will be demonstrated how various non-linear materials can be combined with lattice deformers to yield simulations with behavioral richness and a high potential for parallelism. This potential will be exploited to show how a hybrid solver approach based on large macroblocks can accelerate the convergence of these deformers. Further extensions of the lattice concept with non-manifold topology will allow for efficient processing of self-collisions and topology change. Finally, these concepts will be explored in the context of a case study on virtual plastic surgery, demonstrating a real-world problem space where these ideas can be combined to build an expressive authoring tool, allowing surgeons to record procedures digitally for future reference or education.

  1. Weak extremely-low-frequency magnetic field-induced regeneration anomalies in the planarian, Dugesia tigrina

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jenrow, K.A.; Smith, C.H.; Liboff, A.R.

    1996-12-31

    The authors recently reported that cephalic regeneration in the planarian Dugesia tigrina was significantly delayed in populations exposed continuously to combined parallel DC and AC magnetic fields. This effect was consistent with hypotheses suggesting an underlying resonance phenomenon. The authors report here, in a parallel series of investigations on the same model system, that the incidence of regeneration anomalies presenting as tumor-like protuberances also increases significantly (P < .001) in association with exposure to weak 60 Hz magnetic fields, with peak intensities ranging between 1.0 and 80.0 {micro}T. These anomalies often culminate in the complete disaggregation of the organism. Similarmore » to regeneration rate effects, the incidence of regeneration anomalies is specifically dependent upon the planaria possessing a fixed orientation with respect to the applied magnetic field vectors. However, unlike the regeneration rate effects, the AC magnetic field alone, in the absence of any measurable DC field, is capable of producing these anomalies. Moreover, the incidence of regeneration anomalies follows a clear dose-response relationship as a function of AC magnetic field intensity, with the threshold for induced electric field intensity estimated at 5 {micro} V/m. The addition of either 51.1 or 78.4 {micro}T DC magnetic fields, applied in parallel combination with the AC field, enhances the appearance of anomalies relative to the 60 Hz AC field alone, but only at certain AC field intensities. Thus, whereas the previous study of regeneration rate effects appeared to involve exclusively resonance interactions, the regeneration anomalies reported here appear to result primarily from Faraday induction coupling.« less

  2. Mathematical model of a parallel plate ammonia electrolyzer for combined wastewater remediation and hydrogen production.

    PubMed

    Estejab, Ali; Daramola, Damilola A; Botte, Gerardine G

    2015-06-15

    A mathematical model was developed for the simulation of a parallel plate ammonia electrolyzer to convert ammonia in wastewater to nitrogen and hydrogen under basic conditions. The model consists of fundamental transport equations, the ammonia oxidation kinetics at the anode, and the hydrogen evolution kinetics at the cathode of the electrochemical reactor. The model shows both qualitative and quantitative agreement with experimental measurements at ammonia concentrations found within wastewater (200-1200 mg L(-1)). The optimum electrolyzer performance is dependent on both the applied voltage and the inlet concentrations. Maximum conversion of ammonia to nitrogen at the rates of 0.569 and 0.766 mg L(-1) min(-1) are achieved at low (0.01 M NH4Cl and 0.1 M KOH) and high (0.07 M NH4Cl and 0.15 M KOH) inlet concentrations, respectively. At high and low concentrations, an initial increase in the cell voltage will cause an increase in the system response - current density generated and ammonia converted. These system responses will approach a peak value before they start to decrease due to surface blockage and/or depletion of solvated species at the electrode surface. Furthermore, the model predicts that by increasing the reactant and electrolyte concentrations at a certain voltage, the peak current density will plateau, showing an asymptotic response. Copyright © 2015 Elsevier Ltd. All rights reserved.

  3. Mathematical models in simulation process in rehabilitation of persons with disabilities

    NASA Astrophysics Data System (ADS)

    Gorie, Nina; Dolga, Valer; Mondoc, Alina

    2012-11-01

    The problems of people with disability are varied. A disability may be physical, cognitive, mental, sensory, emotional, developmental or some combination of these. The major disabilities which can appear in people's lives are: the blindness, the deafness, the limb-girdle muscular dystrophy, the orthopedic impairment, the visual impairment. A disability is an umbrella term, covering impairments, activity limitations and participation restrictions. A disability may occur during a person's lifetime or may be present from birth. The authors conclude that some of these disabilities like physical, cognitive, mental, sensory, emotional, developmental can be rehabilitated. Starting from this state of affairs the authors present briefly the possibility of using certain mechatronic systems for rehabilitation of persons with different disabilities. The authors focus their presentation on alternative calling the Stewart platform in order to achieve the proposed goal. The authors present a mathematical model of systems theory approach under the parallel system and described its contents can. The authors analyze in a meaningful mathematical model describing the procedure of rehabilitation process. From the affected function biomechanics and taking into account medical recommendations the authors illustrate the mathematical models of rehabilitation work. The authors assemble a whole mathematical model of parallel structure and the rehabilitation process and making simulation and highlighting the results estimated. The authors present in the end work the results envisaged in the end analysis work, conclusions and steps for future work program..

  4. MUTILS - a set of efficient modeling tools for multi-core CPUs implemented in MEX

    NASA Astrophysics Data System (ADS)

    Krotkiewski, Marcin; Dabrowski, Marcin

    2013-04-01

    The need for computational performance is common in scientific applications, and in particular in numerical simulations, where high resolution models require efficient processing of large amounts of data. Especially in the context of geological problems the need to increase the model resolution to resolve physical and geometrical complexities seems to have no limits. Alas, the performance of new generations of CPUs does not improve any longer by simply increasing clock speeds. Current industrial trends are to increase the number of computational cores. As a result, parallel implementations are required in order to fully utilize the potential of new processors, and to study more complex models. We target simulations on small to medium scale shared memory computers: laptops and desktop PCs with ~8 CPU cores and up to tens of GB of memory to high-end servers with ~50 CPU cores and hundereds of GB of memory. In this setting MATLAB is often the environment of choice for scientists that want to implement their own models with little effort. It is a useful general purpose mathematical software package, but due to its versatility some of its functionality is not as efficient as it could be. In particular, the challanges of modern multi-core architectures are not fully addressed. We have developed MILAMIN 2 - an efficient FEM modeling environment written in native MATLAB. Amongst others, MILAMIN provides functions to define model geometry, generate and convert structured and unstructured meshes (also through interfaces to external mesh generators), compute element and system matrices, apply boundary conditions, solve the system of linear equations, address non-linear and transient problems, and perform post-processing. MILAMIN strives to combine the ease of code development and the computational efficiency. Where possible, the code is optimized and/or parallelized within the MATLAB framework. Native MATLAB is augmented with the MUTILS library - a set of MEX functions that implement the computationally intensive, performance critical parts of the code, which we have identified to be bottlenecks. Here, we discuss the functionality and performance of the MUTILS library. Currently, it includes: 1. time and memory efficient assembly of sparse matrices for FEM simulations 2. parallel sparse matrix - vector product with optimizations speficic to symmetric matrices and multiple degrees of freedom per node 3. parallel point in triangle location and point in tetrahedron location for unstructured, adaptive 2D and 3D meshes (useful for 'marker in cell' type of methods) 4. parallel FEM interpolation for 2D and 3D meshes of elements of different types and orders, and for different number of degrees of freedom per node 5. a stand-alone, MEX implementation of the Conjugate Gradients iterative solver 6. interface to METIS graph partitioning and a fast implementation of RCM reordering

  5. Hybrid massively parallel fast sweeping method for static Hamilton–Jacobi equations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Detrixhe, Miles, E-mail: mdetrixhe@engineering.ucsb.edu; University of California Santa Barbara, Santa Barbara, CA, 93106; Gibou, Frédéric, E-mail: fgibou@engineering.ucsb.edu

    The fast sweeping method is a popular algorithm for solving a variety of static Hamilton–Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine and coarse grained components of the algorithm take advantage of heterogeneous computer architecture common in high performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence, parallel scaling,more » and show state-of-the-art speedup values for the fast sweeping method.« less

  6. A parallelized three-dimensional cellular automaton model for grain growth during additive manufacturing

    NASA Astrophysics Data System (ADS)

    Lian, Yanping; Lin, Stephen; Yan, Wentao; Liu, Wing Kam; Wagner, Gregory J.

    2018-05-01

    In this paper, a parallelized 3D cellular automaton computational model is developed to predict grain morphology for solidification of metal during the additive manufacturing process. Solidification phenomena are characterized by highly localized events, such as the nucleation and growth of multiple grains. As a result, parallelization requires careful treatment of load balancing between processors as well as interprocess communication in order to maintain a high parallel efficiency. We give a detailed summary of the formulation of the model, as well as a description of the communication strategies implemented to ensure parallel efficiency. Scaling tests on a representative problem with about half a billion cells demonstrate parallel efficiency of more than 80% on 8 processors and around 50% on 64; loss of efficiency is attributable to load imbalance due to near-surface grain nucleation in this test problem. The model is further demonstrated through an additive manufacturing simulation with resulting grain structures showing reasonable agreement with those observed in experiments.

  7. A parallelized three-dimensional cellular automaton model for grain growth during additive manufacturing

    NASA Astrophysics Data System (ADS)

    Lian, Yanping; Lin, Stephen; Yan, Wentao; Liu, Wing Kam; Wagner, Gregory J.

    2018-01-01

    In this paper, a parallelized 3D cellular automaton computational model is developed to predict grain morphology for solidification of metal during the additive manufacturing process. Solidification phenomena are characterized by highly localized events, such as the nucleation and growth of multiple grains. As a result, parallelization requires careful treatment of load balancing between processors as well as interprocess communication in order to maintain a high parallel efficiency. We give a detailed summary of the formulation of the model, as well as a description of the communication strategies implemented to ensure parallel efficiency. Scaling tests on a representative problem with about half a billion cells demonstrate parallel efficiency of more than 80% on 8 processors and around 50% on 64; loss of efficiency is attributable to load imbalance due to near-surface grain nucleation in this test problem. The model is further demonstrated through an additive manufacturing simulation with resulting grain structures showing reasonable agreement with those observed in experiments.

  8. System and method for representing and manipulating three-dimensional objects on massively parallel architectures

    DOEpatents

    Karasick, M.S.; Strip, D.R.

    1996-01-30

    A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modeling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modeling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modeling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication. 8 figs.

  9. Parallel Monte Carlo transport modeling in the context of a time-dependent, three-dimensional multi-physics code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Procassini, R.J.

    1997-12-31

    The fine-scale, multi-space resolution that is envisioned for accurate simulations of complex weapons systems in three spatial dimensions implies flop-rate and memory-storage requirements that will only be obtained in the near future through the use of parallel computational techniques. Since the Monte Carlo transport models in these simulations usually stress both of these computational resources, they are prime candidates for parallelization. The MONACO Monte Carlo transport package, which is currently under development at LLNL, will utilize two types of parallelism within the context of a multi-physics design code: decomposition of the spatial domain across processors (spatial parallelism) and distribution ofmore » particles in a given spatial subdomain across additional processors (particle parallelism). This implementation of the package will utilize explicit data communication between domains (message passing). Such a parallel implementation of a Monte Carlo transport model will result in non-deterministic communication patterns. The communication of particles between subdomains during a Monte Carlo time step may require a significant level of effort to achieve a high parallel efficiency.« less

  10. Solution of the within-group multidimensional discrete ordinates transport equations on massively parallel architectures

    NASA Astrophysics Data System (ADS)

    Zerr, Robert Joseph

    2011-12-01

    The integral transport matrix method (ITMM) has been used as the kernel of new parallel solution methods for the discrete ordinates approximation of the within-group neutron transport equation. The ITMM abandons the repetitive mesh sweeps of the traditional source iterations (SI) scheme in favor of constructing stored operators that account for the direct coupling factors among all the cells and between the cells and boundary surfaces. The main goals of this work were to develop the algorithms that construct these operators and employ them in the solution process, determine the most suitable way to parallelize the entire procedure, and evaluate the behavior and performance of the developed methods for increasing number of processes. This project compares the effectiveness of the ITMM with the SI scheme parallelized with the Koch-Baker-Alcouffe (KBA) method. The primary parallel solution method involves a decomposition of the domain into smaller spatial sub-domains, each with their own transport matrices, and coupled together via interface boundary angular fluxes. Each sub-domain has its own set of ITMM operators and represents an independent transport problem. Multiple iterative parallel solution methods have investigated, including parallel block Jacobi (PBJ), parallel red/black Gauss-Seidel (PGS), and parallel GMRES (PGMRES). The fastest observed parallel solution method, PGS, was used in a weak scaling comparison with the PARTISN code. Compared to the state-of-the-art SI-KBA with diffusion synthetic acceleration (DSA), this new method without acceleration/preconditioning is not competitive for any problem parameters considered. The best comparisons occur for problems that are difficult for SI DSA, namely highly scattering and optically thick. SI DSA execution time curves are generally steeper than the PGS ones. However, until further testing is performed it cannot be concluded that SI DSA does not outperform the ITMM with PGS even on several thousand or tens of thousands of processors. The PGS method does outperform SI DSA for the periodic heterogeneous layers (PHL) configuration problems. Although this demonstrates a relative strength/weakness between the two methods, the practicality of these problems is much less, further limiting instances where it would be beneficial to select ITMM over SI DSA. The results strongly indicate a need for a robust, stable, and efficient acceleration method (or preconditioner for PGMRES). The spatial multigrid (SMG) method is currently incomplete in that it does not work for all cases considered and does not effectively improve the convergence rate for all values of scattering ratio c or cell dimension h. Nevertheless, it does display the desired trend for highly scattering, optically thin problems. That is, it tends to lower the rate of growth of number of iterations with increasing number of processes, P, while not increasing the number of additional operations per iteration to the extent that the total execution time of the rapidly converging accelerated iterations exceeds that of the slower unaccelerated iterations. A predictive parallel performance model has been developed for the PBJ method. Timing tests were performed such that trend lines could be fitted to the data for the different components and used to estimate the execution times. Applied to the weak scaling results, the model notably underestimates construction time, but combined with a slight overestimation in iterative solution time, the model predicts total execution time very well for large P. It also does a decent job with the strong scaling results, closely predicting the construction time and time per iteration, especially as P increases. Although not shown to be competitive up to 1,024 processing elements with the current state of the art, the parallelized ITMM exhibits promising scaling trends. Ultimately, compared to the KBA method, the parallelized ITMM may be found to be a very attractive option for transport calculations spatially decomposed over several tens of thousands of processes. Acceleration/preconditioning of the parallelized ITMM once developed will improve the convergence rate and improve its competitiveness. (Abstract shortened by UMI.)

  11. Monte Carlo Transport for Electron Thermal Transport

    NASA Astrophysics Data System (ADS)

    Chenhall, Jeffrey; Cao, Duc; Moses, Gregory

    2015-11-01

    The iSNB (implicit Schurtz Nicolai Busquet multigroup electron thermal transport method of Cao et al. is adapted into a Monte Carlo transport method in order to better model the effects of non-local behavior. The end goal is a hybrid transport-diffusion method that combines Monte Carlo Transport with a discrete diffusion Monte Carlo (DDMC). The hybrid method will combine the efficiency of a diffusion method in short mean free path regions with the accuracy of a transport method in long mean free path regions. The Monte Carlo nature of the approach allows the algorithm to be massively parallelized. Work to date on the method will be presented. This work was supported by Sandia National Laboratory - Albuquerque and the University of Rochester Laboratory for Laser Energetics.

  12. Optical/MRI Multimodality Molecular Imaging

    NASA Astrophysics Data System (ADS)

    Ma, Lixin; Smith, Charles; Yu, Ping

    2007-03-01

    Multimodality molecular imaging that combines anatomical and functional information has shown promise in development of tumor-targeted pharmaceuticals for cancer detection or therapy. We present a new multimodality imaging technique that combines fluorescence molecular tomography (FMT) and magnetic resonance imaging (MRI) for in vivo molecular imaging of preclinical tumor models. Unlike other optical/MRI systems, the new molecular imaging system uses parallel phase acquisition based on heterodyne principle. The system has a higher accuracy of phase measurements, reduced noise bandwidth, and an efficient modulation of the fluorescence diffuse density waves. Fluorescent Bombesin probes were developed for targeting breast cancer cells and prostate cancer cells. Tissue phantom and small animal experiments were performed for calibration of the imaging system and validation of the targeting probes.

  13. Rarefied gas flow simulations using high-order gas-kinetic unified algorithms for Boltzmann model equations

    NASA Astrophysics Data System (ADS)

    Li, Zhi-Hui; Peng, Ao-Ping; Zhang, Han-Xin; Yang, Jaw-Yen

    2015-04-01

    This article reviews rarefied gas flow computations based on nonlinear model Boltzmann equations using deterministic high-order gas-kinetic unified algorithms (GKUA) in phase space. The nonlinear Boltzmann model equations considered include the BGK model, the Shakhov model, the Ellipsoidal Statistical model and the Morse model. Several high-order gas-kinetic unified algorithms, which combine the discrete velocity ordinate method in velocity space and the compact high-order finite-difference schemes in physical space, are developed. The parallel strategies implemented with the accompanying algorithms are of equal importance. Accurate computations of rarefied gas flow problems using various kinetic models over wide ranges of Mach numbers 1.2-20 and Knudsen numbers 0.0001-5 are reported. The effects of different high resolution schemes on the flow resolution under the same discrete velocity ordinate method are studied. A conservative discrete velocity ordinate method to ensure the kinetic compatibility condition is also implemented. The present algorithms are tested for the one-dimensional unsteady shock-tube problems with various Knudsen numbers, the steady normal shock wave structures for different Mach numbers, the two-dimensional flows past a circular cylinder and a NACA 0012 airfoil to verify the present methodology and to simulate gas transport phenomena covering various flow regimes. Illustrations of large scale parallel computations of three-dimensional hypersonic rarefied flows over the reusable sphere-cone satellite and the re-entry spacecraft using almost the largest computer systems available in China are also reported. The present computed results are compared with the theoretical prediction from gas dynamics, related DSMC results, slip N-S solutions and experimental data, and good agreement can be found. The numerical experience indicates that although the direct model Boltzmann equation solver in phase space can be computationally expensive, nevertheless, the present GKUAs for kinetic model Boltzmann equations in conjunction with current available high-performance parallel computer power can provide a vital engineering tool for analyzing rarefied gas flows covering the whole range of flow regimes in aerospace engineering applications.

  14. Room-Temperature Charpy Impact Property of 3D-Printed 15-5 Stainless Steel

    NASA Astrophysics Data System (ADS)

    Sagar, Sugrim; Zhang, Yi; Wu, Linmin; Park, Hye-Young; Lee, Je-Hyun; Jung, Yeon-Gil; Zhang, Jing

    2018-01-01

    In this study, the room-temperature Charpy impact property of 3D-printed 15-5 stainless steel was investigated by a combined experimental and finite element modeling approach. The experimentally measured impact energy is 10.85 ± 1.20 J/cm2, which is comparable to the conventionally wrought and non-heat treated 15-5 stainless steel. In parallel to the impact test experiment, a finite element model using the Johnson-Cook material model with damage parameters was developed to simulate the impact test. The simulated impact energy is 10.46 J/cm2, which is in good agreement with the experimental data. The fracture surface from the experimentally tested specimen suggests that the 3D-printed specimens undergo predominately brittle fracture.

  15. Parallel interactive retrieval of item and associative information from event memory.

    PubMed

    Cox, Gregory E; Criss, Amy H

    2017-09-01

    Memory contains information about individual events (items) and combinations of events (associations). Despite the fundamental importance of this distinction, it remains unclear exactly how these two kinds of information are stored and whether different processes are used to retrieve them. We use both model-independent qualitative properties of response dynamics and quantitative modeling of individuals to address these issues. Item and associative information are not independent and they are retrieved concurrently via interacting processes. During retrieval, matching item and associative information mutually facilitate one another to yield an amplified holistic signal. Modeling of individuals suggests that this kind of facilitation between item and associative retrieval is a ubiquitous feature of human memory. Copyright © 2017 Elsevier Inc. All rights reserved.

  16. Using Qualitative Hazard Analysis to Guide Quantitative Safety Analysis

    NASA Technical Reports Server (NTRS)

    Shortle, J. F.; Allocco, M.

    2005-01-01

    Quantitative methods can be beneficial in many types of safety investigations. However, there are many difficulties in using quantitative m ethods. Far example, there may be little relevant data available. This paper proposes a framework for using quantitative hazard analysis to prioritize hazard scenarios most suitable for quantitative mziysis. The framework first categorizes hazard scenarios by severity and likelihood. We then propose another metric "modeling difficulty" that desc ribes the complexity in modeling a given hazard scenario quantitatively. The combined metrics of severity, likelihood, and modeling difficu lty help to prioritize hazard scenarios for which quantitative analys is should be applied. We have applied this methodology to proposed concepts of operations for reduced wake separation for airplane operatio ns at closely spaced parallel runways.

  17. An analysis of a nonlinear instability in the implementation of a VTOL control system

    NASA Technical Reports Server (NTRS)

    Weber, J. M.

    1982-01-01

    The contributions to nonlinear behavior and unstable response of the model following yaw control system of a VTOL aircraft during hover were determined. The system was designed as a state rate feedback implicit model follower that provided yaw rate command/heading hold capability and used combined full authority parallel and limited authority series servo actuators to generate an input to the yaw reaction control system of the aircraft. Both linear and nonlinear system models, as well as describing function linearization techniques were used to determine the influence on the control system instability of input magnitude and bandwidth, series servo authority, and system bandwidth. Results of the analysis describe stability boundaries as a function of these system design characteristics.

  18. Hierarchical Parallelism in Finite Difference Analysis of Heat Conduction

    NASA Technical Reports Server (NTRS)

    Padovan, Joseph; Krishna, Lala; Gute, Douglas

    1997-01-01

    Based on the concept of hierarchical parallelism, this research effort resulted in highly efficient parallel solution strategies for very large scale heat conduction problems. Overall, the method of hierarchical parallelism involves the partitioning of thermal models into several substructured levels wherein an optimal balance into various associated bandwidths is achieved. The details are described in this report. Overall, the report is organized into two parts. Part 1 describes the parallel modelling methodology and associated multilevel direct, iterative and mixed solution schemes. Part 2 establishes both the formal and computational properties of the scheme.

  19. The revised solar array synthesis computer program

    NASA Technical Reports Server (NTRS)

    1970-01-01

    The Revised Solar Array Synthesis Computer Program is described. It is a general-purpose program which computes solar array output characteristics while accounting for the effects of temperature, incidence angle, charged-particle irradiation, and other degradation effects on various solar array configurations in either circular or elliptical orbits. Array configurations may consist of up to 75 solar cell panels arranged in any series-parallel combination not exceeding three series-connected panels in a parallel string and no more than 25 parallel strings in an array. Up to 100 separate solar array current-voltage characteristics, corresponding to 100 equal-time increments during the sunlight illuminated portion of an orbit or any 100 user-specified combinations of incidence angle and temperature, can be computed and printed out during one complete computer execution. Individual panel incidence angles may be computed and printed out at the user's option.

  20. An Overview of Kinematic and Calibration Models Using Internal/External Sensors or Constraints to Improve the Behavior of Spatial Parallel Mechanisms

    PubMed Central

    Majarena, Ana C.; Santolaria, Jorge; Samper, David; Aguilar, Juan J.

    2010-01-01

    This paper presents an overview of the literature on kinematic and calibration models of parallel mechanisms, the influence of sensors in the mechanism accuracy and parallel mechanisms used as sensors. The most relevant classifications to obtain and solve kinematic models and to identify geometric and non-geometric parameters in the calibration of parallel robots are discussed, examining the advantages and disadvantages of each method, presenting new trends and identifying unsolved problems. This overview tries to answer and show the solutions developed by the most up-to-date research to some of the most frequent questions that appear in the modelling of a parallel mechanism, such as how to measure, the number of sensors and necessary configurations, the type and influence of errors or the number of necessary parameters. PMID:22163469

  1. Retargeting of existing FORTRAN program and development of parallel compilers

    NASA Technical Reports Server (NTRS)

    Agrawal, Dharma P.

    1988-01-01

    The software models used in implementing the parallelizing compiler for the B-HIVE multiprocessor system are described. The various models and strategies used in the compiler development are: flexible granularity model, which allows a compromise between two extreme granularity models; communication model, which is capable of precisely describing the interprocessor communication timings and patterns; loop type detection strategy, which identifies different types of loops; critical path with coloring scheme, which is a versatile scheduling strategy for any multicomputer with some associated communication costs; and loop allocation strategy, which realizes optimum overlapped operations between computation and communication of the system. Using these models, several sample routines of the AIR3D package are examined and tested. It may be noted that automatically generated codes are highly parallelized to provide the maximized degree of parallelism, obtaining the speedup up to a 28 to 32-processor system. A comparison of parallel codes for both the existing and proposed communication model, is performed and the corresponding expected speedup factors are obtained. The experimentation shows that the B-HIVE compiler produces more efficient codes than existing techniques. Work is progressing well in completing the final phase of the compiler. Numerous enhancements are needed to improve the capabilities of the parallelizing compiler.

  2. Multidisciplinary Design Optimization (MDO) Methods: Their Synergy with Computer Technology in Design Process

    NASA Technical Reports Server (NTRS)

    Sobieszczanski-Sobieski, Jaroslaw

    1998-01-01

    The paper identifies speed, agility, human interface, generation of sensitivity information, task decomposition, and data transmission (including storage) as important attributes for a computer environment to have in order to support engineering design effectively. It is argued that when examined in terms of these attributes the presently available environment can be shown to be inadequate a radical improvement is needed, and it may be achieved by combining new methods that have recently emerged from multidisciplinary design optimization (MDO) with massively parallel processing computer technology. The caveat is that, for successful use of that technology in engineering computing, new paradigms for computing will have to be developed - specifically, innovative algorithms that are intrinsically parallel so that their performance scales up linearly with the number of processors. It may be speculated that the idea of simulating a complex behavior by interaction of a large number of very simple models may be an inspiration for the above algorithms, the cellular automata are an example. Because of the long lead time needed to develop and mature new paradigms, development should be now, even though the widespread availability of massively parallel processing is still a few years away.

  3. Hyper-Parallel Tempering Monte Carlo Method and It's Applications

    NASA Astrophysics Data System (ADS)

    Yan, Qiliang; de Pablo, Juan

    2000-03-01

    A new generalized hyper-parallel tempering Monte Carlo molecular simulation method is presented for study of complex fluids. The method is particularly useful for simulation of many-molecule complex systems, where rough energy landscapes and inherently long characteristic relaxation times can pose formidable obstacles to effective sampling of relevant regions of configuration space. The method combines several key elements from expanded ensemble formalisms, parallel-tempering, open ensemble simulations, configurational bias techniques, and histogram reweighting analysis of results. It is found to accelerate significantly the diffusion of a complex system through phase-space. In this presentation, we demonstrate the effectiveness of the new method by implementing it in grand canonical ensembles for a Lennard-Jones fluid, for the restricted primitive model of electrolyte solutions (RPM), and for polymer solutions and blends. Our results indicate that the new algorithm is capable of overcoming the large free energy barriers associated with phase transitions, thereby greatly facilitating the simulation of coexistence properties. It is also shown that the method can be orders of magnitude more efficient than previously available techniques. More importantly, the method is relatively simple and can be incorporated into existing simulation codes with minor efforts.

  4. Parallelization of GeoClaw code for modeling geophysical flows with adaptive mesh refinement on many-core systems

    USGS Publications Warehouse

    Zhang, S.; Yuen, D.A.; Zhu, A.; Song, S.; George, D.L.

    2011-01-01

    We parallelized the GeoClaw code on one-level grid using OpenMP in March, 2011 to meet the urgent need of simulating tsunami waves at near-shore from Tohoku 2011 and achieved over 75% of the potential speed-up on an eight core Dell Precision T7500 workstation [1]. After submitting that work to SC11 - the International Conference for High Performance Computing, we obtained an unreleased OpenMP version of GeoClaw from David George, who developed the GeoClaw code as part of his PH.D thesis. In this paper, we will show the complementary characteristics of the two approaches used in parallelizing GeoClaw and the speed-up obtained by combining the advantage of each of the two individual approaches with adaptive mesh refinement (AMR), demonstrating the capabilities of running GeoClaw efficiently on many-core systems. We will also show a novel simulation of the Tohoku 2011 Tsunami waves inundating the Sendai airport and Fukushima Nuclear Power Plants, over which the finest grid distance of 20 meters is achieved through a 4-level AMR. This simulation yields quite good predictions about the wave-heights and travel time of the tsunami waves. ?? 2011 IEEE.

  5. Overview and extensions of a system for routing directed graphs on SIMD architectures

    NASA Technical Reports Server (NTRS)

    Tomboulian, Sherryl

    1988-01-01

    Many problems can be described in terms of directed graphs that contain a large number of vertices where simple computations occur using data from adjacent vertices. A method is given for parallelizing such problems on an SIMD machine model that uses only nearest neighbor connections for communication, and has no facility for local indirect addressing. Each vertex of the graph will be assigned to a processor in the machine. Rules for a labeling are introduced that support the use of a simple algorithm for movement of data along the edges of the graph. Additional algorithms are defined for addition and deletion of edges. Modifying or adding a new edge takes the same time as parallel traversal. This combination of architecture and algorithms defines a system that is relatively simple to build and can do fast graph processing. All edges can be traversed in parallel in time O(T), where T is empirically proportional to the average path length in the embedding times the average degree of the graph. Additionally, researchers present an extension to the above method which allows for enhanced performance by allowing some broadcasting capabilities.

  6. Multidisciplinary Design Optimisation (MDO) Methods: Their Synergy with Computer Technology in the Design Process

    NASA Technical Reports Server (NTRS)

    Sobieszczanski-Sobieski, Jaroslaw

    1999-01-01

    The paper identifies speed, agility, human interface, generation of sensitivity information, task decomposition, and data transmission (including storage) as important attributes for a computer environment to have in order to support engineering design effectively. It is argued that when examined in terms of these attributes the presently available environment can be shown to be inadequate. A radical improvement is needed, and it may be achieved by combining new methods that have recently emerged from multidisciplinary design optimisation (MDO) with massively parallel processing computer technology. The caveat is that, for successful use of that technology in engineering computing, new paradigms for computing will have to be developed - specifically, innovative algorithms that are intrinsically parallel so that their performance scales up linearly with the number of processors. It may be speculated that the idea of simulating a complex behaviour by interaction of a large number of very simple models may be an inspiration for the above algorithms; the cellular automata are an example. Because of the long lead time needed to develop and mature new paradigms, development should begin now, even though the widespread availability of massively parallel processing is still a few years away.

  7. The electrical MHD and Hall current impact on micropolar nanofluid flow between rotating parallel plates

    NASA Astrophysics Data System (ADS)

    Shah, Zahir; Islam, Saeed; Gul, Taza; Bonyah, Ebenezer; Altaf Khan, Muhammad

    2018-06-01

    The current research aims to examine the combined effect of magnetic and electric field on micropolar nanofluid between two parallel plates in a rotating system. The nanofluid flow between two parallel plates is taken under the influence of Hall current. The flow of micropolar nanofluid has been assumed in steady state. The rudimentary governing equations have been changed to a set of differential nonlinear and coupled equations using suitable similarity variables. An optimal approach has been used to acquire the solution of the modelled problems. The convergence of the method has been shown numerically. The impact of the Skin friction on velocity profile, Nusslet number on temperature profile and Sherwood number on concentration profile have been studied. The influences of the Hall currents, rotation, Brownian motion and thermophoresis analysis of micropolar nanofluid have been mainly focused in this work. Moreover, for comprehension the physical presentation of the embedded parameters that is, coupling parameter N1 , viscosity parameter Re , spin gradient viscosity parameter N2 , rotating parameter Kr , Micropolar fluid constant N3 , magnetic parameter M , Prandtl number Pr , Thermophoretic parameter Nt , Brownian motion parameter Nb , and Schmidt number Sc have been plotted and deliberated graphically.

  8. Low-Speed Investigation of Upper-Surface Leading-Edge Blowing on a High-Speed Civil Transport Configuration

    NASA Technical Reports Server (NTRS)

    Banks, Daniel W.; Laflin, Brenda E. Gile; Kemmerly, Guy T.; Campbell, Bryan A.

    1999-01-01

    The paper identifies speed, agility, human interface, generation of sensitivity information, task decomposition, and data transmission (including storage) as important attributes for a computer environment to have in order to support engineering design effectively. It is argued that when examined in terms of these attributes the presently available environment can be shown to be inadequate. A radical improvement is needed, and it may be achieved by combining new methods that have recently emerged from multidisciplinary design optimisation (MDO) with massively parallel processing computer technology. The caveat is that, for successful use of that technology in engineering computing, new paradigms for computing will have to be developed - specifically, innovative algorithms that are intrinsically parallel so that their performance scales up linearly with the number of processors. It may be speculated that the idea of simulating a complex behaviour by interaction of a large number of very simple models may be an inspiration for the above algorithms; the cellular automata are an example. Because of the long lead time needed to develop and mature new paradigms, development should begin now, even though the widespread availability of massively parallel processing is still a few years away.

  9. Hybrid Forecasting of Daily River Discharges Considering Autoregressive Heteroscedasticity

    NASA Astrophysics Data System (ADS)

    Szolgayová, Elena Peksová; Danačová, Michaela; Komorniková, Magda; Szolgay, Ján

    2017-06-01

    It is widely acknowledged that in the hydrological and meteorological communities, there is a continuing need to improve the quality of quantitative rainfall and river flow forecasts. A hybrid (combined deterministic-stochastic) modelling approach is proposed here that combines the advantages offered by modelling the system dynamics with a deterministic model and a deterministic forecasting error series with a data-driven model in parallel. Since the processes to be modelled are generally nonlinear and the model error series may exhibit nonstationarity and heteroscedasticity, GARCH-type nonlinear time series models are considered here. The fitting, forecasting and simulation performance of such models have to be explored on a case-by-case basis. The goal of this paper is to test and develop an appropriate methodology for model fitting and forecasting applicable for daily river discharge forecast error data from the GARCH family of time series models. We concentrated on verifying whether the use of a GARCH-type model is suitable for modelling and forecasting a hydrological model error time series on the Hron and Morava Rivers in Slovakia. For this purpose we verified the presence of heteroscedasticity in the simulation error series of the KLN multilinear flow routing model; then we fitted the GARCH-type models to the data and compared their fit with that of an ARMA - type model. We produced one-stepahead forecasts from the fitted models and again provided comparisons of the model's performance.

  10. Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems

    NASA Technical Reports Server (NTRS)

    Guruswamy, Guru P.; Kwak, Dochan (Technical Monitor)

    2002-01-01

    A modular process that can efficiently solve large scale multidisciplinary problems using massively parallel supercomputers is presented. The process integrates disciplines with diverse physical characteristics by retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large scale aerospace problems on several supercomputers. The super scalability and portability of the approach is demonstrated on several parallel computers.

  11. Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems

    NASA Technical Reports Server (NTRS)

    Guruswamy, Guru P.; Byun, Chansup; Kwak, Dochan (Technical Monitor)

    2001-01-01

    A modular process that can efficiently solve large scale multidisciplinary problems using massively parallel super computers is presented. The process integrates disciplines with diverse physical characteristics by retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large scale aerospace problems on several supercomputers. The super scalability and portability of the approach is demonstrated on several parallel computers.

  12. Process optimization using combinatorial design principles: parallel synthesis and design of experiment methods.

    PubMed

    Gooding, Owen W

    2004-06-01

    The use of parallel synthesis techniques with statistical design of experiment (DoE) methods is a powerful combination for the optimization of chemical processes. Advances in parallel synthesis equipment and easy to use software for statistical DoE have fueled a growing acceptance of these techniques in the pharmaceutical industry. As drug candidate structures become more complex at the same time that development timelines are compressed, these enabling technologies promise to become more important in the future.

  13. Hypercluster Parallel Processor

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.; Cole, Gary L.; Milner, Edward J.; Quealy, Angela

    1992-01-01

    Hypercluster computer system includes multiple digital processors, operation of which coordinated through specialized software. Configurable according to various parallel-computing architectures of shared-memory or distributed-memory class, including scalar computer, vector computer, reduced-instruction-set computer, and complex-instruction-set computer. Designed as flexible, relatively inexpensive system that provides single programming and operating environment within which one can investigate effects of various parallel-computing architectures and combinations on performance in solution of complicated problems like those of three-dimensional flows in turbomachines. Hypercluster software and architectural concepts are in public domain.

  14. Research on Multi - Person Parallel Modeling Method Based on Integrated Model Persistent Storage

    NASA Astrophysics Data System (ADS)

    Qu, MingCheng; Wu, XiangHu; Tao, YongChao; Liu, Ying

    2018-03-01

    This paper mainly studies the multi-person parallel modeling method based on the integrated model persistence storage. The integrated model refers to a set of MDDT modeling graphics system, which can carry out multi-angle, multi-level and multi-stage description of aerospace general embedded software. Persistent storage refers to converting the data model in memory into a storage model and converting the storage model into a data model in memory, where the data model refers to the object model and the storage model is a binary stream. And multi-person parallel modeling refers to the need for multi-person collaboration, the role of separation, and even real-time remote synchronization modeling.

  15. Multichannel quench-flow microreactor chip for parallel reaction monitoring.

    PubMed

    Bula, Wojciech P; Verboom, Willem; Reinhoudt, David N; Gardeniers, Han J G E

    2007-12-01

    This paper describes a multichannel silicon-glass microreactor which has been utilized to investigate the kinetics of a Knoevenagel condensation reaction under different reaction conditions. The reaction is performed on the chip in four parallel channels under identical conditions but with different residence times. A special topology of the reaction coils overcomes the common problem arising from the difference in pressure drop of parallel channels having different length. The parallelization of reaction coils combined with chemical quenching at specific locations results in a considerable reduction in experimental effort and cost. The system was tested and showed good reproducibility in flow properties and reaction kinetic data generation.

  16. An L1-norm phase constraint for half-Fourier compressed sensing in 3D MR imaging.

    PubMed

    Li, Guobin; Hennig, Jürgen; Raithel, Esther; Büchert, Martin; Paul, Dominik; Korvink, Jan G; Zaitsev, Maxim

    2015-10-01

    In most half-Fourier imaging methods, explicit phase replacement is used. In combination with parallel imaging, or compressed sensing, half-Fourier reconstruction is usually performed in a separate step. The purpose of this paper is to report that integration of half-Fourier reconstruction into iterative reconstruction minimizes reconstruction errors. The L1-norm phase constraint for half-Fourier imaging proposed in this work is compared with the L2-norm variant of the same algorithm, with several typical half-Fourier reconstruction methods. Half-Fourier imaging with the proposed phase constraint can be seamlessly combined with parallel imaging and compressed sensing to achieve high acceleration factors. In simulations and in in-vivo experiments half-Fourier imaging with the proposed L1-norm phase constraint enables superior performance both reconstruction of image details and with regard to robustness against phase estimation errors. The performance and feasibility of half-Fourier imaging with the proposed L1-norm phase constraint is reported. Its seamless combination with parallel imaging and compressed sensing enables use of greater acceleration in 3D MR imaging.

  17. Fully Parallel MHD Stability Analysis Tool

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2014-10-01

    Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by fusion community. Parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Initial results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.

  18. 5 mW parallel-connected resonant-tunnelling diode oscillator

    NASA Technical Reports Server (NTRS)

    Stephan, K. D.; Wong, S.-C.; Brown, E. R.; Molvar, K. M.; Calawa, A. R.; Manfra, M. J.

    1992-01-01

    A new type of resonant-tunneling diode (RTD) oscillator that generates 5 mW at 1.18 GHz is reported. This result was obtained by connecting in parallel 25 individual diodes designed for such a connection. This experiment demonstrates that RTDs can successfully be used in a chip-level power-combining circuit.

  19. Study of imbalanced internal resistance on drop voltage of LiFePO4 battery system connected in parallel

    NASA Astrophysics Data System (ADS)

    Adie Perdana, Fengky; Supriyanto, Agus; Purwanto, Agus; Jamaluddin, Anif

    2017-01-01

    The purpose of this research focuses on the effect of imbalanced internal resistance for the drop voltage of LiFePO4 18650 battery system connected in parallel. The battery pack has been assembled consist of two cell battery LiFePO4 18650 that has difference combination of internal resistance. Battery pack was tested with 1/C constant current charging, 3,65V per group sel, 3,65V constant voltage charging, 5 minutes of rest time between charge and discharge process, 1/2C Constant current discharge until 2,2V, 26 cycle of measurement test, and 4320 minutes rest time after the last charge cycle. We can conclude that the difference combination of internal resistance on the battery pack seriously influence the drop voltage of a battery. Theoretical and experimental result show that the imbalance of internal resistance during cycling are mainly responsible for the drop voltage of LiFePO4 parallel batteries. It is thus a good way to avoid drop voltage fade of parallel battery system by suppressing variations of internal resistance.

  20. OpenACC directive-based GPU acceleration of an implicit reconstructed discontinuous Galerkin method for compressible flows on 3D unstructured grids

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lou, Jialin; Xia, Yidong; Luo, Lixiang

    2016-09-01

    In this study, we use a combination of modeling techniques to describe the relationship between fracture radius that might be accomplished in a hypothetical enhanced geothermal system (EGS) and drilling distance required to create and access those fractures. We use a combination of commonly applied analytical solutions for heat transport in parallel fractures and 3D finite-element method models of more realistic heat extraction geometries. For a conceptual model involving multiple parallel fractures developed perpendicular to an inclined or horizontal borehole, calculations demonstrate that EGS will likely require very large fractures, of greater than 300 m radius, to keep interfracture drillingmore » distances to ~10 km or less. As drilling distances are generally inversely proportional to the square of fracture radius, drilling costs quickly escalate as the fracture radius decreases. It is important to know, however, whether fracture spacing will be dictated by thermal or mechanical considerations, as the relationship between drilling distance and number of fractures is quite different in each case. Information about the likelihood of hydraulically creating very large fractures comes primarily from petroleum recovery industry data describing hydraulic fractures in shale. Those data suggest that fractures with radii on the order of several hundred meters may, indeed, be possible. The results of this study demonstrate that relatively simple calculations can be used to estimate primary design constraints on a system, particularly regarding the relationship between generated fracture radius and the total length of drilling needed in the fracture creation zone. Comparison of the numerical simulations of more realistic geometries than addressed in the analytical solutions suggest that simple proportionalities can readily be derived to relate a particular flow field.« less

  1. Simultaneous learning and filtering without delusions: a Bayes-optimal combination of Predictive Inference and Adaptive Filtering.

    PubMed

    Kneissler, Jan; Drugowitsch, Jan; Friston, Karl; Butz, Martin V

    2015-01-01

    Predictive coding appears to be one of the fundamental working principles of brain processing. Amongst other aspects, brains often predict the sensory consequences of their own actions. Predictive coding resembles Kalman filtering, where incoming sensory information is filtered to produce prediction errors for subsequent adaptation and learning. However, to generate prediction errors given motor commands, a suitable temporal forward model is required to generate predictions. While in engineering applications, it is usually assumed that this forward model is known, the brain has to learn it. When filtering sensory input and learning from the residual signal in parallel, a fundamental problem arises: the system can enter a delusional loop when filtering the sensory information using an overly trusted forward model. In this case, learning stalls before accurate convergence because uncertainty about the forward model is not properly accommodated. We present a Bayes-optimal solution to this generic and pernicious problem for the case of linear forward models, which we call Predictive Inference and Adaptive Filtering (PIAF). PIAF filters incoming sensory information and learns the forward model simultaneously. We show that PIAF is formally related to Kalman filtering and to the Recursive Least Squares linear approximation method, but combines these procedures in a Bayes optimal fashion. Numerical evaluations confirm that the delusional loop is precluded and that the learning of the forward model is more than 10-times faster when compared to a naive combination of Kalman filtering and Recursive Least Squares.

  2. Multi-Modulator for Bandwidth-Efficient Communication

    NASA Technical Reports Server (NTRS)

    Gray, Andrew; Lee, Dennis; Lay, Norman; Cheetham, Craig; Fong, Wai; Yeh, Pen-Shu; King, Robin; Ghuman, Parminder; Hoy, Scott; Fisher, Dave

    2009-01-01

    A modulator circuit board has recently been developed to be used in conjunction with a vector modulator to generate any of a large number of modulations for bandwidth-efficient radio transmission of digital data signals at rates than can exceed 100 Mb/s. The modulations include quadrature phaseshift keying (QPSK), offset quadrature phase-shift keying (OQPSK), Gaussian minimum-shift keying (GMSK), and octonary phase-shift keying (8PSK) with square-root raised-cosine pulse shaping. The figure is a greatly simplified block diagram showing the relationship between the modulator board and the rest of the transmitter. The role of the modulator board is to encode the incoming data stream and to shape the resulting pulses, which are fed as inputs to the vector modulator. The combination of encoding and pulse shaping in a given application is chosen to maximize the bandwidth efficiency. The modulator board includes gallium arsenide serial-to-parallel converters at its input end. A complementary metal oxide/semiconductor (CMOS) field-programmable gate array (FPGA) performs the coding and modulation computations and utilizes parallel processing in doing so. The results of the parallel computation are combined and converted to pulse waveforms by use of gallium arsenide parallel-to-serial converters integrated with digital-to-analog converters. Without changing the hardware, one can configure the modulator to produce any of the designed combinations of coding and modulation by loading the appropriate bit configuration file into the FPGA.

  3. Partitioning and packing mathematical simulation models for calculation on parallel computers

    NASA Technical Reports Server (NTRS)

    Arpasi, D. J.; Milner, E. J.

    1986-01-01

    The development of multiprocessor simulations from a serial set of ordinary differential equations describing a physical system is described. Degrees of parallelism (i.e., coupling between the equations) and their impact on parallel processing are discussed. The problem of identifying computational parallelism within sets of closely coupled equations that require the exchange of current values of variables is described. A technique is presented for identifying this parallelism and for partitioning the equations for parallel solution on a multiprocessor. An algorithm which packs the equations into a minimum number of processors is also described. The results of the packing algorithm when applied to a turbojet engine model are presented in terms of processor utilization.

  4. Incorporating Parallel Computing into the Goddard Earth Observing System Data Assimilation System (GEOS DAS)

    NASA Technical Reports Server (NTRS)

    Larson, Jay W.

    1998-01-01

    Atmospheric data assimilation is a method of combining actual observations with model forecasts to produce a more accurate description of the earth system than the observations or forecast alone can provide. The output of data assimilation, sometimes called the analysis, are regular, gridded datasets of observed and unobserved variables. Analysis plays a key role in numerical weather prediction and is becoming increasingly important for climate research. These applications, and the need for timely validation of scientific enhancements to the data assimilation system pose computational demands that are best met by distributed parallel software. The mission of the NASA Data Assimilation Office (DAO) is to provide datasets for climate research and to support NASA satellite and aircraft missions. The system used to create these datasets is the Goddard Earth Observing System Data Assimilation System (GEOS DAS). The core components of the the GEOS DAS are: the GEOS General Circulation Model (GCM), the Physical-space Statistical Analysis System (PSAS), the Observer, the on-line Quality Control (QC) system, the Coupler (which feeds analysis increments back to the GCM), and an I/O package for processing the large amounts of data the system produces (which will be described in another presentation in this session). The discussion will center on the following issues: the computational complexity for the whole GEOS DAS, assessment of the performance of the individual elements of GEOS DAS, and parallelization strategy for some of the components of the system.

  5. Validation of Shear Wave Elastography in Skeletal Muscle

    PubMed Central

    Eby, Sarah F.; Song, Pengfei; Chen, Shigao; Chen, Qingshan; Greenleaf, James F.; An, Kai-Nan

    2013-01-01

    Skeletal muscle is a very dynamic tissue, thus accurate quantification of skeletal muscle stiffness throughout its functional range is crucial to improve the physical functioning and independence following pathology. Shear wave elastography (SWE) is an ultrasound-based technique that characterizes tissue mechanical properties based on the propagation of remotely induced shear waves. The objective of this study is to validate SWE throughout the functional range of motion of skeletal muscle for three ultrasound transducer orientations. We hypothesized that combining traditional materials testing (MTS) techniques with SWE measurements will show increased stiffness measures with increasing tensile load, and will correlate well with each other for trials in which the transducer is parallel to underlying muscle fibers. To evaluate this hypothesis, we monitored the deformation throughout tensile loading of four porcine brachialis whole-muscle tissue specimens, while simultaneously making SWE measurements of the same specimen. We used regression to examine the correlation between Young's modulus from MTS and shear modulus from SWE for each of the transducer orientations. We applied a generalized linear model to account for repeated testing. Model parameters were estimated via generalized estimating equations. The regression coefficient was 0.1944, with a 95% confidence interval of (0.1463 – 0.2425) for parallel transducer trials. Shear waves did not propagate well for both the 45° and perpendicular transducer orientations. Both parallel SWE and MTS showed increased stiffness with increasing tensile load. This study provides the necessary first step for additional studies that can evaluate the distribution of stiffness throughout muscle. PMID:23953670

  6. PEM-PCA: a parallel expectation-maximization PCA face recognition architecture.

    PubMed

    Rujirakul, Kanokmon; So-In, Chakchai; Arnonkijpanich, Banchar

    2014-01-01

    Principal component analysis or PCA has been traditionally used as one of the feature extraction techniques in face recognition systems yielding high accuracy when requiring a small number of features. However, the covariance matrix and eigenvalue decomposition stages cause high computational complexity, especially for a large database. Thus, this research presents an alternative approach utilizing an Expectation-Maximization algorithm to reduce the determinant matrix manipulation resulting in the reduction of the stages' complexity. To improve the computational time, a novel parallel architecture was employed to utilize the benefits of parallelization of matrix computation during feature extraction and classification stages including parallel preprocessing, and their combinations, so-called a Parallel Expectation-Maximization PCA architecture. Comparing to a traditional PCA and its derivatives, the results indicate lower complexity with an insignificant difference in recognition precision leading to high speed face recognition systems, that is, the speed-up over nine and three times over PCA and Parallel PCA.

  7. OpenMP parallelization of a gridded SWAT (SWATG)

    NASA Astrophysics Data System (ADS)

    Zhang, Ying; Hou, Jinliang; Cao, Yongpan; Gu, Juan; Huang, Chunlin

    2017-12-01

    Large-scale, long-term and high spatial resolution simulation is a common issue in environmental modeling. A Gridded Hydrologic Response Unit (HRU)-based Soil and Water Assessment Tool (SWATG) that integrates grid modeling scheme with different spatial representations also presents such problems. The time-consuming problem affects applications of very high resolution large-scale watershed modeling. The OpenMP (Open Multi-Processing) parallel application interface is integrated with SWATG (called SWATGP) to accelerate grid modeling based on the HRU level. Such parallel implementation takes better advantage of the computational power of a shared memory computer system. We conducted two experiments at multiple temporal and spatial scales of hydrological modeling using SWATG and SWATGP on a high-end server. At 500-m resolution, SWATGP was found to be up to nine times faster than SWATG in modeling over a roughly 2000 km2 watershed with 1 CPU and a 15 thread configuration. The study results demonstrate that parallel models save considerable time relative to traditional sequential simulation runs. Parallel computations of environmental models are beneficial for model applications, especially at large spatial and temporal scales and at high resolutions. The proposed SWATGP model is thus a promising tool for large-scale and high-resolution water resources research and management in addition to offering data fusion and model coupling ability.

  8. Synthesizing parallel imaging applications using the CAP (computer-aided parallelization) tool

    NASA Astrophysics Data System (ADS)

    Gennart, Benoit A.; Mazzariol, Marc; Messerli, Vincent; Hersch, Roger D.

    1997-12-01

    Imaging applications such as filtering, image transforms and compression/decompression require vast amounts of computing power when applied to large data sets. These applications would potentially benefit from the use of parallel processing. However, dedicated parallel computers are expensive and their processing power per node lags behind that of the most recent commodity components. Furthermore, developing parallel applications remains a difficult task: writing and debugging the application is difficult (deadlocks), programs may not be portable from one parallel architecture to the other, and performance often comes short of expectations. In order to facilitate the development of parallel applications, we propose the CAP computer-aided parallelization tool which enables application programmers to specify at a high-level of abstraction the flow of data between pipelined-parallel operations. In addition, the CAP tool supports the programmer in developing parallel imaging and storage operations. CAP enables combining efficiently parallel storage access routines and image processing sequential operations. This paper shows how processing and I/O intensive imaging applications must be implemented to take advantage of parallelism and pipelining between data access and processing. This paper's contribution is (1) to show how such implementations can be compactly specified in CAP, and (2) to demonstrate that CAP specified applications achieve the performance of custom parallel code. The paper analyzes theoretically the performance of CAP specified applications and demonstrates the accuracy of the theoretical analysis through experimental measurements.

  9. Calculation of Vertical and Horizontal Mobilities in InAs/GaSb Superlattices (Postprint)

    DTIC Science & Technology

    2011-10-13

    width 2a and GaSb having width 2b, with the period = 2a + 2b. For energies near the band gap edges, the carrier wave function can be approximated by a...online) Electron energy bands along the growth direction for three combinations of InAs/ GaSb layer widths. For typical carrier densities, at low...Fermi energies , parallel masses, and band gaps from the 8×8 EFA model. Sheet carrier Calculated Measured Calculated InAs GaSb concentration per period

  10. The formation of quasi-parallel shocks. [in space, solar and astrophysical plasmas

    NASA Technical Reports Server (NTRS)

    Cargill, Peter J.

    1991-01-01

    In a collisionless plasma, the coupling between a piston and the plasma must take place through either laminar or turbulent electromagnetic fields. Of the three types of coupling (laminar, Larmor and turbulent), shock formation in the parallel regime is dominated by the latter and in the quasi-parallel regime by a combination of all three, depending on the piston. In the quasi-perpendicular regime, there is usually a good separation between piston and shock. This is not true in the quasi-parallel and parallel regime. Hybrid numerical simulations for hot plasma pistons indicate that when the electrons are hot, a shock forms, but does not cleanly decouple from the piston. For hot ion pistons, no shock forms in the parallel limit: in the quasi-parallel case, a shock forms, but there is severe contamination from hot piston ions. These results suggest that the properties of solar and astrophysical shocks, such as particle acceleration, cannot be readily separated from their driving mechanism.

  11. Interactive Parallel Data Analysis within Data-Centric Cluster Facilities using the IPython Notebook

    NASA Astrophysics Data System (ADS)

    Pascoe, S.; Lansdowne, J.; Iwi, A.; Stephens, A.; Kershaw, P.

    2012-12-01

    The data deluge is making traditional analysis workflows for many researchers obsolete. Support for parallelism within popular tools such as matlab, IDL and NCO is not well developed and rarely used. However parallelism is necessary for processing modern data volumes on a timescale conducive to curiosity-driven analysis. Furthermore, for peta-scale datasets such as the CMIP5 archive, it is no longer practical to bring an entire dataset to a researcher's workstation for analysis, or even to their institutional cluster. Therefore, there is an increasing need to develop new analysis platforms which both enable processing at the point of data storage and which provides parallelism. Such an environment should, where possible, maintain the convenience and familiarity of our current analysis environments to encourage curiosity-driven research. We describe how we are combining the interactive python shell (IPython) with our JASMIN data-cluster infrastructure. IPython has been specifically designed to bridge the gap between the HPC-style parallel workflows and the opportunistic curiosity-driven analysis usually carried out using domain specific languages and scriptable tools. IPython offers a web-based interactive environment, the IPython notebook, and a cluster engine for parallelism all underpinned by the well-respected Python/Scipy scientific programming stack. JASMIN is designed to support the data analysis requirements of the UK and European climate and earth system modeling community. JASMIN, with its sister facility CEMS focusing the earth observation community, has 4.5 PB of fast parallel disk storage alongside over 370 computing cores provide local computation. Through the IPython interface to JASMIN, users can make efficient use of JASMIN's multi-core virtual machines to perform interactive analysis on all cores simultaneously or can configure IPython clusters across multiple VMs. Larger-scale clusters can be provisioned through JASMIN's batch scheduling system. Outputs can be summarised and visualised using the full power of Python's many scientific tools, including Scipy, Matplotlib, Pandas and CDAT. This rich user experience is delivered through the user's web browser; maintaining the interactive feel of a workstation-based environment with the parallel power of a remote data-centric processing facility.

  12. apGA: An adaptive parallel genetic algorithm

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liepins, G.E.; Baluja, S.

    1991-01-01

    We develop apGA, a parallel variant of the standard generational GA, that combines aggressive search with perpetual novelty, yet is able to preserve enough genetic structure to optimally solve variably scaled, non-uniform block deceptive and hierarchical deceptive problems. apGA combines elitism, adaptive mutation, adaptive exponential scaling, and temporal memory. We present empirical results for six classes of problems, including the DeJong test suite. Although we have not investigated hybrids, we note that apGA could be incorporated into other recent GA variants such as GENITOR, CHC, and the recombination stage of mGA. 12 refs., 2 figs., 2 tabs.

  13. The Research of the Parallel Computing Development from the Angle of Cloud Computing

    NASA Astrophysics Data System (ADS)

    Peng, Zhensheng; Gong, Qingge; Duan, Yanyu; Wang, Yun

    2017-10-01

    Cloud computing is the development of parallel computing, distributed computing and grid computing. The development of cloud computing makes parallel computing come into people’s lives. Firstly, this paper expounds the concept of cloud computing and introduces two several traditional parallel programming model. Secondly, it analyzes and studies the principles, advantages and disadvantages of OpenMP, MPI and Map Reduce respectively. Finally, it takes MPI, OpenMP models compared to Map Reduce from the angle of cloud computing. The results of this paper are intended to provide a reference for the development of parallel computing.

  14. Combining biomedical preventions for HIV: Vaccines with pre-exposure prophylaxis, microbicides or other HIV preventions

    PubMed Central

    McNicholl, Janet M.

    2016-01-01

    ABSTRACT Biomedical preventions for HIV, such as vaccines, microbicides or pre-exposure prophylaxis (PrEP) with antiretroviral drugs, can each only partially prevent HIV-1 infection in most human trials. Oral PrEP is now FDA approved for HIV-prevention in high risk groups, but partial adherence reduces efficacy. If combined as biomedical preventions (CBP) an HIV vaccine could provide protection when PrEP adherence is low and PrEP could prevent vaccine breakthroughs. Other types of PrEP or microbicides may also be partially protective. When licensed, first generation HIV vaccines are likely to be partially effective. Individuals at risk for HIV may receive an HIV vaccine combined with other biomedical preventions, in series or in parallel, in clinical trials or as part of standard of care, with the goal of maximally increasing HIV prevention. In human studies, it is challenging to determine which preventions are best combined, how they interact and how effective they are. Animal models can determine CBP efficacy, whether additive or synergistic, the efficacy of different products and combinations, dose, timing and mechanisms. CBP studies in macaques have shown that partially or minimally effective candidate HIV vaccines combined with partially effective oral PrEP, vaginal PrEP or microbicide generally provided greater protection than either prevention alone against SIV or SHIV challenges. Since human CBP trials will be complex, animal models can guide their design, sample size, endpoints, correlates and surrogates of protection. This review focuses on animal studies and human models of CBP and discusses implications for HIV prevention. PMID:27679928

  15. Combining biomedical preventions for HIV: Vaccines with pre-exposure prophylaxis, microbicides or other HIV preventions.

    PubMed

    McNicholl, Janet M

    2016-12-01

    Biomedical preventions for HIV, such as vaccines, microbicides or pre-exposure prophylaxis (PrEP) with antiretroviral drugs, can each only partially prevent HIV-1 infection in most human trials. Oral PrEP is now FDA approved for HIV-prevention in high risk groups, but partial adherence reduces efficacy. If combined as biomedical preventions (CBP) an HIV vaccine could provide protection when PrEP adherence is low and PrEP could prevent vaccine breakthroughs. Other types of PrEP or microbicides may also be partially protective. When licensed, first generation HIV vaccines are likely to be partially effective. Individuals at risk for HIV may receive an HIV vaccine combined with other biomedical preventions, in series or in parallel, in clinical trials or as part of standard of care, with the goal of maximally increasing HIV prevention. In human studies, it is challenging to determine which preventions are best combined, how they interact and how effective they are. Animal models can determine CBP efficacy, whether additive or synergistic, the efficacy of different products and combinations, dose, timing and mechanisms. CBP studies in macaques have shown that partially or minimally effective candidate HIV vaccines combined with partially effective oral PrEP, vaginal PrEP or microbicide generally provided greater protection than either prevention alone against SIV or SHIV challenges. Since human CBP trials will be complex, animal models can guide their design, sample size, endpoints, correlates and surrogates of protection. This review focuses on animal studies and human models of CBP and discusses implications for HIV prevention.

  16. Describing, using 'recognition cones'. [parallel-series model with English-like computer program

    NASA Technical Reports Server (NTRS)

    Uhr, L.

    1973-01-01

    A parallel-serial 'recognition cone' model is examined, taking into account the model's ability to describe scenes of objects. An actual program is presented in an English-like language. The concept of a 'description' is discussed together with possible types of descriptive information. Questions regarding the level and the variety of detail are considered along with approaches for improving the serial representations of parallel systems.

  17. Parallel distributed, reciprocal Monte Carlo radiation in coupled, large eddy combustion simulations

    NASA Astrophysics Data System (ADS)

    Hunsaker, Isaac L.

    Radiation is the dominant mode of heat transfer in high temperature combustion environments. Radiative heat transfer affects the gas and particle phases, including all the associated combustion chemistry. The radiative properties are in turn affected by the turbulent flow field. This bi-directional coupling of radiation turbulence interactions poses a major challenge in creating parallel-capable, high-fidelity combustion simulations. In this work, a new model was developed in which reciprocal monte carlo radiation was coupled with a turbulent, large-eddy simulation combustion model. A technique wherein domain patches are stitched together was implemented to allow for scalable parallelism. The combustion model runs in parallel on a decomposed domain. The radiation model runs in parallel on a recomposed domain. The recomposed domain is stored on each processor after information sharing of the decomposed domain is handled via the message passing interface. Verification and validation testing of the new radiation model were favorable. Strong scaling analyses were performed on the Ember cluster and the Titan cluster for the CPU-radiation model and GPU-radiation model, respectively. The model demonstrated strong scaling to over 1,700 and 16,000 processing cores on Ember and Titan, respectively.

  18. Numerical modelling of the formation of fibrous bedding-parallel veins

    NASA Astrophysics Data System (ADS)

    Torremans, Koen; Muchez, Philippe; Sintubin, Manuel

    2014-05-01

    Bedding-parallel veins with a fibrous infill oriented orthogonal to the vein wall, are often observed in fine-grained metasedimentary sequences. Several mechanisms have been proposed for their formation, mostly with respect to effects of fluid overpressures and anisotropy of the host-rock fabric in order to explain the inferred extensional failure with sub-vertical opening. Abundant pre-folding, bedding-parallel fibrous dolomite veins are found associated with the Nkana-Mindola stratiform Cu-Co deposit in Zambia. The goal of this study is to better understand the formation mechanisms of these veins and to explain their particular spatial and thickness distribution, with respect to failure of transversely isotropic rocks. The spatial distribution and thickness variation of these veins was quantified during a field campaign in thirteen line transects perpendicular to undeformed veins in underground crosscuts. The fibrous dolomite veins studied are not related to lithological contrasts, but to a strong bedding-parallel shaly fabric, typical for the black shale facies of the Copperbelt Orebody Member. The host rock can hence be considered as transversely isotropic. Growth morphologies vary from antitaxial with a pronounced median surface to asymmetric syntaxial, always with small but quantifiable growth competition. A microstructural fabric study reveals that the undeformed dolomite veins show low-tortuosity vein walls and quantifiable growth competition. Here, we use a Discrete Element Method numerical modelling approach with ESyS-Particle (http://launchpad.net/esys-particle) to simulate the observed properties of the veins. Calibrated numerical specimens with a transversely isotropic matrix are repeatedly brought to failure under constant strain rates by changing the effective strain rates at model boundaries. After each fracture event, fractures in the numerical model are filled with cohesive vein material and the experiment is repeated. By systematically varying stress states, fluid pressures and mechanical properties of materials (host rock, vein infill and interface), we attempt to reproduce the characteristics of spatial distribution and thickness variation of the veins. Four parameter sets of mechanical micro-properties are defined in the models, essentially yielding (1) a competent and (2) incompetent matrix, (3) a vein material and (4) a vein-matrix interface. Each combination of parameters and particle packings is calibrated to fit a predetermined Mohr-Coulomb type failure envelope, via an automated calibration procedure. Preliminary tests already show that by varying these parameters, we are able to simulate realistically distributed cracking through crack-seal processes. Different types of veins and vein generations can be modelled, ranging from single veins, over crack-seal veins to anastomosing veins, by varying the mechanical strength of competent and incompetent matrix, vein and interface material. Further results of this approach will be presented. We will discuss our results with respect to mechanisms proposed in the literature for bedding-parallel, fibrous veins in metasedimentary rock sequences.

  19. Modelling and simulation of parallel triangular triple quantum dots (TTQD) by using SIMON 2.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fathany, Maulana Yusuf, E-mail: myfathany@gmail.com; Fuada, Syifaul, E-mail: fsyifaul@gmail.com; Lawu, Braham Lawas, E-mail: bram-labs@rocketmail.com

    2016-04-19

    This research presents analysis of modeling on Parallel Triple Quantum Dots (TQD) by using SIMON (SIMulation Of Nano-structures). Single Electron Transistor (SET) is used as the basic concept of modeling. We design the structure of Parallel TQD by metal material with triangular geometry model, it is called by Triangular Triple Quantum Dots (TTQD). We simulate it with several scenarios using different parameters; such as different value of capacitance, various gate voltage, and different thermal condition.

  20. MIP Models and Hybrid Algorithms for Simultaneous Job Splitting and Scheduling on Unrelated Parallel Machines

    PubMed Central

    Ozmutlu, H. Cenk

    2014-01-01

    We developed mixed integer programming (MIP) models and hybrid genetic-local search algorithms for the scheduling problem of unrelated parallel machines with job sequence and machine-dependent setup times and with job splitting property. The first contribution of this paper is to introduce novel algorithms which make splitting and scheduling simultaneously with variable number of subjobs. We proposed simple chromosome structure which is constituted by random key numbers in hybrid genetic-local search algorithm (GAspLA). Random key numbers are used frequently in genetic algorithms, but it creates additional difficulty when hybrid factors in local search are implemented. We developed algorithms that satisfy the adaptation of results of local search into the genetic algorithms with minimum relocation operation of genes' random key numbers. This is the second contribution of the paper. The third contribution of this paper is three developed new MIP models which are making splitting and scheduling simultaneously. The fourth contribution of this paper is implementation of the GAspLAMIP. This implementation let us verify the optimality of GAspLA for the studied combinations. The proposed methods are tested on a set of problems taken from the literature and the results validate the effectiveness of the proposed algorithms. PMID:24977204

  1. Modelling machine ensembles with discrete event dynamical system theory

    NASA Technical Reports Server (NTRS)

    Hunter, Dan

    1990-01-01

    Discrete Event Dynamical System (DEDS) theory can be utilized as a control strategy for future complex machine ensembles that will be required for in-space construction. The control strategy involves orchestrating a set of interactive submachines to perform a set of tasks for a given set of constraints such as minimum time, minimum energy, or maximum machine utilization. Machine ensembles can be hierarchically modeled as a global model that combines the operations of the individual submachines. These submachines are represented in the global model as local models. Local models, from the perspective of DEDS theory , are described by the following: a set of system and transition states, an event alphabet that portrays actions that takes a submachine from one state to another, an initial system state, a partial function that maps the current state and event alphabet to the next state, and the time required for the event to occur. Each submachine in the machine ensemble is presented by a unique local model. The global model combines the local models such that the local models can operate in parallel under the additional logistic and physical constraints due to submachine interactions. The global model is constructed from the states, events, event functions, and timing requirements of the local models. Supervisory control can be implemented in the global model by various methods such as task scheduling (open-loop control) or implementing a feedback DEDS controller (closed-loop control).

  2. Development of a Distributed Parallel Computing Framework to Facilitate Regional/Global Gridded Crop Modeling with Various Scenarios

    NASA Astrophysics Data System (ADS)

    Jang, W.; Engda, T. A.; Neff, J. C.; Herrick, J.

    2017-12-01

    Many crop models are increasingly used to evaluate crop yields at regional and global scales. However, implementation of these models across large areas using fine-scale grids is limited by computational time requirements. In order to facilitate global gridded crop modeling with various scenarios (i.e., different crop, management schedule, fertilizer, and irrigation) using the Environmental Policy Integrated Climate (EPIC) model, we developed a distributed parallel computing framework in Python. Our local desktop with 14 cores (28 threads) was used to test the distributed parallel computing framework in Iringa, Tanzania which has 406,839 grid cells. High-resolution soil data, SoilGrids (250 x 250 m), and climate data, AgMERRA (0.25 x 0.25 deg) were also used as input data for the gridded EPIC model. The framework includes a master file for parallel computing, input database, input data formatters, EPIC model execution, and output analyzers. Through the master file for parallel computing, the user-defined number of threads of CPU divides the EPIC simulation into jobs. Then, Using EPIC input data formatters, the raw database is formatted for EPIC input data and the formatted data moves into EPIC simulation jobs. Then, 28 EPIC jobs run simultaneously and only interesting results files are parsed and moved into output analyzers. We applied various scenarios with seven different slopes and twenty-four fertilizer ranges. Parallelized input generators create different scenarios as a list for distributed parallel computing. After all simulations are completed, parallelized output analyzers are used to analyze all outputs according to the different scenarios. This saves significant computing time and resources, making it possible to conduct gridded modeling at regional to global scales with high-resolution data. For example, serial processing for the Iringa test case would require 113 hours, while using the framework developed in this study requires only approximately 6 hours, a nearly 95% reduction in computing time.

  3. Modeling and Visualizing Flow of Chemical Agents Across Complex Terrain

    NASA Technical Reports Server (NTRS)

    Kao, David; Kramer, Marc; Chaderjian, Neal

    2005-01-01

    Release of chemical agents across complex terrain presents a real threat to homeland security. Modeling and visualization tools are being developed that capture flow fluid terrain interaction as well as point dispersal downstream flow paths. These analytic tools when coupled with UAV atmospheric observations provide predictive capabilities to allow for rapid emergency response as well as developing a comprehensive preemptive counter-threat evacuation plan. The visualization tools involve high-end computing and massive parallel processing combined with texture mapping. We demonstrate our approach across a mountainous portion of North California under two contrasting meteorological conditions. Animations depicting flow over this geographical location provide immediate assistance in decision support and crisis management.

  4. The Role of Entrepreneurial Activities in Academic Pharmaceutical Science Research

    PubMed Central

    Stinchcomb, Audra L.

    2010-01-01

    Academic pharmaceutical science research is expanding further and further from the University setting to encompass the for-profit private company setting. This parallels the National Institutes of Health momentum to include multiple funding opportunities for University and private company collaboration. It has been recognized that the non-profit and for-profit combination research model can accelerate the commercialization of pharmaceutical products, and therefore more efficiently improve human health. Entrepreneurial activities require unique considerations in the University environment, but can be modeled after the commercialization expansion of the academic healthcare enterprise. Challenges and barriers exist to starting a company as an entrepreneurial faculty member, but the rewards to one's personal and professional lives are incomparable. PMID:20017206

  5. Parallelization of elliptic solver for solving 1D Boussinesq model

    NASA Astrophysics Data System (ADS)

    Tarwidi, D.; Adytia, D.

    2018-03-01

    In this paper, a parallel implementation of an elliptic solver in solving 1D Boussinesq model is presented. Numerical solution of Boussinesq model is obtained by implementing a staggered grid scheme to continuity, momentum, and elliptic equation of Boussinesq model. Tridiagonal system emerging from numerical scheme of elliptic equation is solved by cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared memory architectures using OpenMP. To measure the performance of parallel program, large number of grids is varied from 28 to 214. Two test cases of numerical experiment, i.e. propagation of solitary and standing wave, are proposed to evaluate the parallel program. The numerical results are verified with analytical solution of solitary and standing wave. The best speedup of solitary and standing wave test cases is about 2.07 with 214 of grids and 1.86 with 213 of grids, respectively, which are executed by using 8 threads. Moreover, the best efficiency of parallel program is 76.2% and 73.5% for solitary and standing wave test cases, respectively.

  6. A hybrid parallel framework for the cellular Potts model simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jiang, Yi; He, Kejing; Dong, Shoubin

    2009-01-01

    The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, which can't be used for large scale complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming POE solving, cell division, and cell reaction operation are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP system using OpenMP. Because the Monte Carlo lattice update is much faster than the POE solving and SMP systems are more and more common, this hybrid approachmore » achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied the avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large scale simulation ({approx}10{sup 8} sites) of complex collective behavior of numerous cells ({approx}10{sup 6}).« less

  7. Comparing Families of Dynamic Causal Models

    PubMed Central

    Penny, Will D.; Stephan, Klaas E.; Daunizeau, Jean; Rosa, Maria J.; Friston, Karl J.; Schofield, Thomas M.; Leff, Alex P.

    2010-01-01

    Mathematical models of scientific data can be formally compared using Bayesian model evidence. Previous applications in the biological sciences have mainly focussed on model selection in which one first selects the model with the highest evidence and then makes inferences based on the parameters of that model. This “best model” approach is very useful but can become brittle if there are a large number of models to compare, and if different subjects use different models. To overcome this shortcoming we propose the combination of two further approaches: (i) family level inference and (ii) Bayesian model averaging within families. Family level inference removes uncertainty about aspects of model structure other than the characteristic of interest. For example: What are the inputs to the system? Is processing serial or parallel? Is it linear or nonlinear? Is it mediated by a single, crucial connection? We apply Bayesian model averaging within families to provide inferences about parameters that are independent of further assumptions about model structure. We illustrate the methods using Dynamic Causal Models of brain imaging data. PMID:20300649

  8. A Parallel Saturation Algorithm on Shared Memory Architectures

    NASA Technical Reports Server (NTRS)

    Ezekiel, Jonathan; Siminiceanu

    2007-01-01

    Symbolic state-space generators are notoriously hard to parallelize. However, the Saturation algorithm implemented in the SMART verification tool differs from other sequential symbolic state-space generators in that it exploits the locality of ring events in asynchronous system models. This paper explores whether event locality can be utilized to efficiently parallelize Saturation on shared-memory architectures. Conceptually, we propose to parallelize the ring of events within a decision diagram node, which is technically realized via a thread pool. We discuss the challenges involved in our parallel design and conduct experimental studies on its prototypical implementation. On a dual-processor dual core PC, our studies show speed-ups for several example models, e.g., of up to 50% for a Kanban model, when compared to running our algorithm only on a single core.

  9. Electromagnetic Design of a Magnetically Coupled Spatial Power Combiner

    NASA Astrophysics Data System (ADS)

    Bulcha, B. T.; Cataldo, G.; Stevenson, T. R.; U-Yen, K.; Moseley, S. H.; Wollack, E. J.

    2018-04-01

    The design of a two-dimensional spatial beam-combining network employing a parallel-plate superconducting waveguide filled with a monocrystalline silicon dielectric substrate is presented. This component uses arrays of magnetically coupled antenna elements to achieve high coupling efficiency and full sampling of the intensity distribution while avoiding diffractive losses in the multimode waveguide region. These attributes enable the structure's use in realizing compact far-infrared spectrometers for astrophysical and instrumentation applications. If unterminated, reflections within a finite-sized spatial beam combiner can potentially lead to spurious couplings between elements. A planar meta-material electromagnetic absorber is implemented to control this response within the device. This broadband termination absorbs greater than 0.99 of the power over the 1.7:1 operational band at angles ranging from normal to near-parallel incidence. The design approach, simulations and applications of the spatial power combiner and meta-material termination structure are presented.

  10. Large-scale 3D geoelectromagnetic modeling using parallel adaptive high-order finite element method

    DOE PAGES

    Grayver, Alexander V.; Kolev, Tzanio V.

    2015-11-01

    Here, we have investigated the use of the adaptive high-order finite-element method (FEM) for geoelectromagnetic modeling. Because high-order FEM is challenging from the numerical and computational points of view, most published finite-element studies in geoelectromagnetics use the lowest order formulation. Solution of the resulting large system of linear equations poses the main practical challenge. We have developed a fully parallel and distributed robust and scalable linear solver based on the optimal block-diagonal and auxiliary space preconditioners. The solver was found to be efficient for high finite element orders, unstructured and nonconforming locally refined meshes, a wide range of frequencies, largemore » conductivity contrasts, and number of degrees of freedom (DoFs). Furthermore, the presented linear solver is in essence algebraic; i.e., it acts on the matrix-vector level and thus requires no information about the discretization, boundary conditions, or physical source used, making it readily efficient for a wide range of electromagnetic modeling problems. To get accurate solutions at reduced computational cost, we have also implemented goal-oriented adaptive mesh refinement. The numerical tests indicated that if highly accurate modeling results were required, the high-order FEM in combination with the goal-oriented local mesh refinement required less computational time and DoFs than the lowest order adaptive FEM.« less

  11. Large-scale 3D geoelectromagnetic modeling using parallel adaptive high-order finite element method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grayver, Alexander V.; Kolev, Tzanio V.

    Here, we have investigated the use of the adaptive high-order finite-element method (FEM) for geoelectromagnetic modeling. Because high-order FEM is challenging from the numerical and computational points of view, most published finite-element studies in geoelectromagnetics use the lowest order formulation. Solution of the resulting large system of linear equations poses the main practical challenge. We have developed a fully parallel and distributed robust and scalable linear solver based on the optimal block-diagonal and auxiliary space preconditioners. The solver was found to be efficient for high finite element orders, unstructured and nonconforming locally refined meshes, a wide range of frequencies, largemore » conductivity contrasts, and number of degrees of freedom (DoFs). Furthermore, the presented linear solver is in essence algebraic; i.e., it acts on the matrix-vector level and thus requires no information about the discretization, boundary conditions, or physical source used, making it readily efficient for a wide range of electromagnetic modeling problems. To get accurate solutions at reduced computational cost, we have also implemented goal-oriented adaptive mesh refinement. The numerical tests indicated that if highly accurate modeling results were required, the high-order FEM in combination with the goal-oriented local mesh refinement required less computational time and DoFs than the lowest order adaptive FEM.« less

  12. Extricating Manual and Non-Manual Features for Subunit Level Medical Sign Modelling in Automatic Sign Language Classification and Recognition.

    PubMed

    R, Elakkiya; K, Selvamani

    2017-09-22

    Subunit segmenting and modelling in medical sign language is one of the important studies in linguistic-oriented and vision-based Sign Language Recognition (SLR). Many efforts were made in the precedent to focus the functional subunits from the view of linguistic syllables but the problem is implementing such subunit extraction using syllables is not feasible in real-world computer vision techniques. And also, the present recognition systems are designed in such a way that it can detect the signer dependent actions under restricted and laboratory conditions. This research paper aims at solving these two important issues (1) Subunit extraction and (2) Signer independent action on visual sign language recognition. Subunit extraction involved in the sequential and parallel breakdown of sign gestures without any prior knowledge on syllables and number of subunits. A novel Bayesian Parallel Hidden Markov Model (BPaHMM) is introduced for subunit extraction to combine the features of manual and non-manual parameters to yield better results in classification and recognition of signs. Signer independent action aims in using a single web camera for different signer behaviour patterns and for cross-signer validation. Experimental results have proved that the proposed signer independent subunit level modelling for sign language classification and recognition has shown improvement and variations when compared with other existing works.

  13. Impact of current speed on mass flux to a model flexible seagrass blade

    NASA Astrophysics Data System (ADS)

    Lei, Jiarui; Nepf, Heidi

    2016-07-01

    Seagrass and other freshwater macrophytes can acquire nutrients from surrounding water through their blades. This flux may depend on the current speed (U), which can influence both the posture of flexible blades (reconfiguration) and the thickness of the flux-limiting diffusive layer. The impact of current speed (U) on mass flux to flexible blades of model seagrass was studied through a combination of laboratory flume experiments, numerical modeling and theory. Model seagrass blades were constructed from low-density polyethylene (LDPE), and 1, 2-dichlorobenzene was used as a tracer chemical. The tracer mass accumulation in the blades was measured at different unidirectional current speeds. A numerical model was used to estimate the transfer velocity (K) by fitting the measured mass uptake to a one-dimensional diffusion model. The measured transfer velocity was compared to predictions based on laminar and turbulent boundary layers developing over a flat plate parallel to flow, for which K∝U0.5 and ∝U, respectively. The degree of blade reconfiguration depended on the dimensionless Cauchy number, Ca, which is a function of both the blade stiffness and flow velocity. For large Ca, the majority of the blade was parallel to the flow, and the measured transfer velocity agreed with laminar boundary layer theory, K∝U0.5. For small Ca, the model blades remained upright, and the flux to the blade was diminished relative to the flat-plate model. A meadow-scale analysis suggests that the mass exchange at the blade scale may control the uptake at the meadow scale.

  14. Robot Acting on Moving Bodies (RAMBO): Interaction with tumbling objects

    NASA Technical Reports Server (NTRS)

    Davis, Larry S.; Dementhon, Daniel; Bestul, Thor; Ziavras, Sotirios; Srinivasan, H. V.; Siddalingaiah, Madhu; Harwood, David

    1989-01-01

    Interaction with tumbling objects will become more common as human activities in space expand. Attempting to interact with a large complex object translating and rotating in space, a human operator using only his visual and mental capacities may not be able to estimate the object motion, plan actions or control those actions. A robot system (RAMBO) equipped with a camera, which, given a sequence of simple tasks, can perform these tasks on a tumbling object, is being developed. RAMBO is given a complete geometric model of the object. A low level vision module extracts and groups characteristic features in images of the object. The positions of the object are determined in a sequence of images, and a motion estimate of the object is obtained. This motion estimate is used to plan trajectories of the robot tool to relative locations rearby the object sufficient for achieving the tasks. More specifically, low level vision uses parallel algorithms for image enhancement by symmetric nearest neighbor filtering, edge detection by local gradient operators, and corner extraction by sector filtering. The object pose estimation is a Hough transform method accumulating position hypotheses obtained by matching triples of image features (corners) to triples of model features. To maximize computing speed, the estimate of the position in space of a triple of features is obtained by decomposing its perspective view into a product of rotations and a scaled orthographic projection. This allows use of 2-D lookup tables at each stage of the decomposition. The position hypotheses for each possible match of model feature triples and image feature triples are calculated in parallel. Trajectory planning combines heuristic and dynamic programming techniques. Then trajectories are created using dynamic interpolations between initial and goal trajectories. All the parallel algorithms run on a Connection Machine CM-2 with 16K processors.

  15. Revised Extended Grid Library

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Martz, Roger L.

    The Revised Eolus Grid Library (REGL) is a mesh-tracking library that was developed for use with the MCNP6TM computer code so that (radiation) particles can track on an unstructured mesh. The unstructured mesh is a finite element representation of any geometric solid model created with a state-of-the-art CAE/CAD tool. The mesh-tracking library is written using modern Fortran and programming standards; the library is Fortran 2003 compliant. The library was created with a defined application programmer interface (API) so that it could easily integrate with other particle tracking/transport codes. The library does not handle parallel processing via the message passing interfacemore » (mpi), but has been used successfully where the host code handles the mpi calls. The library is thread-safe and supports the OpenMP paradigm. As a library, all features are available through the API and overall a tight coupling between it and the host code is required. Features of the library are summarized with the following list: Can accommodate first and second order 4, 5, and 6-sided polyhedra; any combination of element types may appear in a single geometry model; parts may not contain tetrahedra mixed with other element types; pentahedra and hexahedra can be together in the same part; robust handling of overlaps and gaps; tracks element-to-element to produce path length results at the element level; finds element numbers for a given mesh location; finds intersection points on element faces for the particle tracks; produce a data file for post processing results analysis; reads Abaqus .inp input (ASCII) files to obtain information for the global mesh-model; supports parallel input processing via mpi; and support parallel particle transport by both mpi and OpenMP.« less

  16. Enabling Efficient Climate Science Workflows in High Performance Computing Environments

    NASA Astrophysics Data System (ADS)

    Krishnan, H.; Byna, S.; Wehner, M. F.; Gu, J.; O'Brien, T. A.; Loring, B.; Stone, D. A.; Collins, W.; Prabhat, M.; Liu, Y.; Johnson, J. N.; Paciorek, C. J.

    2015-12-01

    A typical climate science workflow often involves a combination of acquisition of data, modeling, simulation, analysis, visualization, publishing, and storage of results. Each of these tasks provide a myriad of challenges when running on a high performance computing environment such as Hopper or Edison at NERSC. Hurdles such as data transfer and management, job scheduling, parallel analysis routines, and publication require a lot of forethought and planning to ensure that proper quality control mechanisms are in place. These steps require effectively utilizing a combination of well tested and newly developed functionality to move data, perform analysis, apply statistical routines, and finally, serve results and tools to the greater scientific community. As part of the CAlibrated and Systematic Characterization, Attribution and Detection of Extremes (CASCADE) project we highlight a stack of tools our team utilizes and has developed to ensure that large scale simulation and analysis work are commonplace and provide operations that assist in everything from generation/procurement of data (HTAR/Globus) to automating publication of results to portals like the Earth Systems Grid Federation (ESGF), all while executing everything in between in a scalable environment in a task parallel way (MPI). We highlight the use and benefit of these tools by showing several climate science analysis use cases they have been applied to.

  17. Efficacy and safety of pioglitazone added to alogliptin in Japanese patients with type 2 diabetes mellitus: a multicentre, randomized, double-blind, parallel-group, comparative study.

    PubMed

    Kaku, K; Katou, M; Igeta, M; Ohira, T; Sano, H

    2015-12-01

    A phase IV, multicentre, randomized, double-blind, parallel-group, comparative study was conducted in Japanese subjects with type 2 diabetes mellitus (T2DM) who had inadequate glycaemic control, despite treatment with alogliptin in addition to diet and/or exercise therapy. Subjects with glycated haemoglobin (HbA1c) concentrations of 6.9-10.5% were randomized to receive 16 weeks' double-blind treatment with pioglitazone 15 mg, 30 mg once daily or placebo added to alogliptin 25 mg once daily. The primary endpoint was the change in HbA1c from baseline at the end of treatment period (week 16). Both pioglitazone 15 and 30 mg combination therapy resulted in a significantly greater reduction in HbA1c than alogliptin monotherapy [-0.80 and -0.90% vs 0.00% (the least squares mean using analysis of covariance model); p < 0.0001, respectively]. The overall incidence rates of treatment-emergent adverse events were similar among the treatment groups. Pioglitazone/alogliptin combination therapy was effective and generally well tolerated in Japanese subjects with T2DM and is considered to be useful in clinical settings. © 2015 John Wiley & Sons Ltd.

  18. A novel approach combining self-organizing map and parallel factor analysis for monitoring water quality of watersheds under non-point source pollution

    PubMed Central

    Zhang, Yixiang; Liang, Xinqiang; Wang, Zhibo; Xu, Lixian

    2015-01-01

    High content of organic matter in the downstream of watersheds underscored the severity of non-point source (NPS) pollution. The major objectives of this study were to characterize and quantify dissolved organic matter (DOM) in watersheds affected by NPS pollution, and to apply self-organizing map (SOM) and parallel factor analysis (PARAFAC) to assess fluorescence properties as proxy indicators for NPS pollution and labor-intensive routine water quality indicators. Water from upstreams and downstreams was sampled to measure dissolved organic carbon (DOC) concentrations and excitation-emission matrix (EEM). Five fluorescence components were modeled with PARAFAC. The regression analysis between PARAFAC intensities (Fmax) and raw EEM measurements indicated that several raw fluorescence measurements at target excitation-emission wavelength region could provide similar DOM information to massive EEM measurements combined with PARAFAC. Regression analysis between DOC concentration and raw EEM measurements suggested that some regions in raw EEM could be used as surrogates for labor-intensive routine indicators. SOM can be used to visualize the occurrence of pollution. Relationship between DOC concentration and PARAFAC components analyzed with SOM suggested that PARAFAC component 2 might be the major part of bulk DOC and could be recognized as a proxy indicator to predict the DOC concentration. PMID:26526140

  19. Providing nearest neighbor point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer

    DOEpatents

    Archer, Charles J.; Faraj, Ahmad A.; Inglett, Todd A.; Ratterman, Joseph D.

    2012-10-23

    Methods, apparatus, and products are disclosed for providing nearest neighbor point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer, each compute node connected to each adjacent compute node in the global combining network through a link, that include: identifying each link in the global combining network for each compute node of the operational group; designating one of a plurality of point-to-point class routing identifiers for each link such that no compute node in the operational group is connected to two adjacent compute nodes in the operational group with links designated for the same class routing identifiers; and configuring each compute node of the operational group for point-to-point communications with each adjacent compute node in the global combining network through the link between that compute node and that adjacent compute node using that link's designated class routing identifier.

  20. Innovative Language-Based & Object-Oriented Structured AMR Using Fortran 90 and OpenMP

    NASA Technical Reports Server (NTRS)

    Norton, C.; Balsara, D.

    1999-01-01

    Parallel adaptive mesh refinement (AMR) is an important numerical technique that leads to the efficient solution of many physical and engineering problems. In this paper, we describe how AMR programing can be performed in an object-oreinted way using the modern aspects of Fortran 90 combined with the parallelization features of OpenMP.

  1. Research on Parallel Three Phase PWM Converters base on RTDS

    NASA Astrophysics Data System (ADS)

    Xia, Yan; Zou, Jianxiao; Li, Kai; Liu, Jingbo; Tian, Jun

    2018-01-01

    Converters parallel operation can increase capacity of the system, but it may lead to potential zero-sequence circulating current, so the control of circulating current was an important goal in the design of parallel inverters. In this paper, the Real Time Digital Simulator (RTDS) is used to model the converters parallel system in real time and study the circulating current restraining. The equivalent model of two parallel converters and zero-sequence circulating current(ZSCC) were established and analyzed, then a strategy using variable zero vector control was proposed to suppress the circulating current. For two parallel modular converters, hardware-in-the-loop(HIL) study based on RTDS and practical experiment were implemented, results prove that the proposed control strategy is feasible and effective.

  2. On the costs of parallel processing in dual-task performance: The case of lexical processing in word production.

    PubMed

    Paucke, Madlen; Oppermann, Frank; Koch, Iring; Jescheniak, Jörg D

    2015-12-01

    Previous dual-task picture-naming studies suggest that lexical processes require capacity-limited processes and prevent other tasks to be carried out in parallel. However, studies involving the processing of multiple pictures suggest that parallel lexical processing is possible. The present study investigated the specific costs that may arise when such parallel processing occurs. We used a novel dual-task paradigm by presenting 2 visual objects associated with different tasks and manipulating between-task similarity. With high similarity, a picture-naming task (T1) was combined with a phoneme-decision task (T2), so that lexical processes were shared across tasks. With low similarity, picture-naming was combined with a size-decision T2 (nonshared lexical processes). In Experiment 1, we found that a manipulation of lexical processes (lexical frequency of T1 object name) showed an additive propagation with low between-task similarity and an overadditive propagation with high between-task similarity. Experiment 2 replicated this differential forward propagation of the lexical effect and showed that it disappeared with longer stimulus onset asynchronies. Moreover, both experiments showed backward crosstalk, indexed as worse T1 performance with high between-task similarity compared with low similarity. Together, these findings suggest that conditions of high between-task similarity can lead to parallel lexical processing in both tasks, which, however, does not result in benefits but rather in extra performance costs. These costs can be attributed to crosstalk based on the dual-task binding problem arising from parallel processing. Hence, the present study reveals that capacity-limited lexical processing can run in parallel across dual tasks but only at the expense of extraordinary high costs. (c) 2015 APA, all rights reserved).

  3. Capabilities of Fully Parallelized MHD Stability Code MARS

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2016-10-01

    Results of full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. Parallel version of MARS, named PMARS, has been recently developed at FAR-TECH. Parallelized MARS is an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, implemented in MARS. Parallelization of the code included parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse vector iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the MARS algorithm using parallel libraries and procedures. Parallelized MARS is capable of calculating eigenmodes with significantly increased spatial resolution: up to 5,000 adapted radial grid points with up to 500 poloidal harmonics. Such resolution is sufficient for simulation of kink, tearing and peeling-ballooning instabilities with physically relevant parameters. Work is supported by the U.S. DOE SBIR program.

  4. Fully Parallel MHD Stability Analysis Tool

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2015-11-01

    Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by fusion community. Parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Results of MARS parallelization and of the development of a new fix boundary equilibrium code adapted for MARS input will be reported. Work is supported by the U.S. DOE SBIR program.

  5. Flood predictions using the parallel version of distributed numerical physical rainfall-runoff model TOPKAPI

    NASA Astrophysics Data System (ADS)

    Boyko, Oleksiy; Zheleznyak, Mark

    2015-04-01

    The original numerical code TOPKAPI-IMMS of the distributed rainfall-runoff model TOPKAPI ( Todini et al, 1996-2014) is developed and implemented in Ukraine. The parallel version of the code has been developed recently to be used on multiprocessors systems - multicore/processors PC and clusters. Algorithm is based on binary-tree decomposition of the watershed for the balancing of the amount of computation for all processors/cores. Message passing interface (MPI) protocol is used as a parallel computing framework. The numerical efficiency of the parallelization algorithms is demonstrated for the case studies for the flood predictions of the mountain watersheds of the Ukrainian Carpathian regions. The modeling results is compared with the predictions based on the lumped parameters models.

  6. Dynamic modeling of parallel robots for computed-torque control implementation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Codourey, A.

    1998-12-01

    In recent years, increased interest in parallel robots has been observed. Their control with modern theory, such as the computed-torque method, has, however, been restrained, essentially due to the difficulty in establishing a simple dynamic model that can be calculated in real time. In this paper, a simple method based on the virtual work principle is proposed for modeling parallel robots. The mass matrix of the robot, needed for decoupling control strategies, does not explicitly appear in the formulation; however, it can be computed separately, based on kinetic energy considerations. The method is applied to the DELTA parallel robot, leadingmore » to a very efficient model that has been implemented in a real-time computed-torque control algorithm.« less

  7. A hybrid algorithm for parallel molecular dynamics simulations

    NASA Astrophysics Data System (ADS)

    Mangiardi, Chris M.; Meyer, R.

    2017-10-01

    This article describes algorithms for the hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-range forces. The parallelization method combines domain decomposition with a thread-based parallelization approach. The goal of the work is to enable efficient simulations of very large (tens of millions of atoms) and inhomogeneous systems on many-core processors with hundreds or thousands of cores and SIMD units with large vector sizes. In order to test the efficiency of the method, simulations of a variety of configurations with up to 74 million atoms have been performed. Results are shown that were obtained on multi-core systems with Sandy Bridge and Haswell processors as well as systems with Xeon Phi many-core processors.

  8. Optimal Design of Passive Power Filters Based on Pseudo-parallel Genetic Algorithm

    NASA Astrophysics Data System (ADS)

    Li, Pei; Li, Hongbo; Gao, Nannan; Niu, Lin; Guo, Liangfeng; Pei, Ying; Zhang, Yanyan; Xu, Minmin; Chen, Kerui

    2017-05-01

    The economic costs together with filter efficiency are taken as targets to optimize the parameter of passive filter. Furthermore, the method of combining pseudo-parallel genetic algorithm with adaptive genetic algorithm is adopted in this paper. In the early stages pseudo-parallel genetic algorithm is introduced to increase the population diversity, and adaptive genetic algorithm is used in the late stages to reduce the workload. At the same time, the migration rate of pseudo-parallel genetic algorithm is improved to change with population diversity adaptively. Simulation results show that the filter designed by the proposed method has better filtering effect with lower economic cost, and can be used in engineering.

  9. Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Hao-Qiang; anMey, Dieter; Hatay, Ferhat F.

    2003-01-01

    Clusters of SMP (Symmetric Multi-Processors) nodes provide support for a wide range of parallel programming paradigms. The shared address space within each node is suitable for OpenMP parallelization. Message passing can be employed within and across the nodes of a cluster. Multiple levels of parallelism can be achieved by combining message passing and OpenMP parallelization. Which programming paradigm is the best will depend on the nature of the given problem, the hardware components of the cluster, the network, and the available software. In this study we compare the performance of different implementations of the same CFD benchmark application, using the same numerical algorithm but employing different programming paradigms.

  10. A communication library for the parallelization of air quality models on structured grids

    NASA Astrophysics Data System (ADS)

    Miehe, Philipp; Sandu, Adrian; Carmichael, Gregory R.; Tang, Youhua; Dăescu, Dacian

    PAQMSG is an MPI-based, Fortran 90 communication library for the parallelization of air quality models (AQMs) on structured grids. It consists of distribution, gathering and repartitioning routines for different domain decompositions implementing a master-worker strategy. The library is architecture and application independent and includes optimization strategies for different architectures. This paper presents the library from a user perspective. Results are shown from the parallelization of STEM-III on Beowulf clusters. The PAQMSG library is available on the web. The communication routines are easy to use, and should allow for an immediate parallelization of existing AQMs. PAQMSG can also be used for constructing new models.

  11. Decomposition of timed automata for solving scheduling problems

    NASA Astrophysics Data System (ADS)

    Nishi, Tatsushi; Wakatake, Masato

    2014-03-01

    A decomposition algorithm for scheduling problems based on timed automata (TA) model is proposed. The problem is represented as an optimal state transition problem for TA. The model comprises of the parallel composition of submodels such as jobs and resources. The procedure of the proposed methodology can be divided into two steps. The first step is to decompose the TA model into several submodels by using decomposable condition. The second step is to combine individual solution of subproblems for the decomposed submodels by the penalty function method. A feasible solution for the entire model is derived through the iterated computation of solving the subproblem for each submodel. The proposed methodology is applied to solve flowshop and jobshop scheduling problems. Computational experiments demonstrate the effectiveness of the proposed algorithm compared with a conventional TA scheduling algorithm without decomposition.

  12. NASA Space Radiation Risk Project: Overview and Recent Results

    NASA Technical Reports Server (NTRS)

    Blattnig, Steve R.; Chappell, Lori J.; George, Kerry A.; Hada, Megumi; Hu, Shaowen; Kidane, Yared H.; Kim, Myung-Hee Y.; Kovyrshina, Tatiana; Norman, Ryan B.; Nounu, Hatem N.; hide

    2015-01-01

    The NASA Space Radiation Risk project is responsible for integrating new experimental and computational results into models to predict risk of cancer and acute radiation syndrome (ARS) for use in mission planning and systems design, as well as current space operations. The project has several parallel efforts focused on proving NASA's radiation risk projection capability in both the near and long term. This presentation will give an overview, with select results from these efforts including the following topics: verification, validation, and streamlining the transition of models to use in decision making; relative biological effectiveness and dose rate effect estimation using a combination of stochastic track structure simulations, DNA damage model calculations and experimental data; ARS model improvements; pathway analysis from gene expression data sets; solar particle event probabilistic exposure calculation including correlated uncertainties for use in design optimization.

  13. Providing full point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer

    DOEpatents

    Archer, Charles J; Faraj, Ahmad A; Inglett, Todd A; Ratterman, Joseph D

    2013-04-16

    Methods, apparatus, and products are disclosed for providing full point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer, each compute node connected to each adjacent compute node in the global combining network through a link, that include: receiving a network packet in a compute node, the network packet specifying a destination compute node; selecting, in dependence upon the destination compute node, at least one of the links for the compute node along which to forward the network packet toward the destination compute node; and forwarding the network packet along the selected link to the adjacent compute node connected to the compute node through the selected link.

  14. Toward a Model Framework of Generalized Parallel Componential Processing of Multi-Symbol Numbers

    ERIC Educational Resources Information Center

    Huber, Stefan; Cornelsen, Sonja; Moeller, Korbinian; Nuerk, Hans-Christoph

    2015-01-01

    In this article, we propose and evaluate a new model framework of parallel componential multi-symbol number processing, generalizing the idea of parallel componential processing of multi-digit numbers to the case of negative numbers by considering the polarity signs similar to single digits. In a first step, we evaluated this account by defining…

  15. Combined collapse by bridging and self-adhesion in a prototypical polymer model inspired by the bacterial nucleoid

    NASA Astrophysics Data System (ADS)

    Scolari, Vittore F.; Cosentino Lagomarsino, Marco

    Recent experimental results suggest that the E. coli chromosome feels a self-attracting interaction of osmotic origin, and is condensed in foci by bridging interactions. Motivated by these findings, we explore a generic modeling framework combining solely these two ingredients, in order to characterize their joint effects. Specifically, we study a simple polymer physics computational model with weak ubiquitous short-ranged self attraction and stronger sparse bridging interactions. Combining theoretical arguments and simulations, we study the general phenomenology of polymer collapse induced by these dual contributions, in the case of regularly-spaced bridging. Our results distinguish a regime of classical Flory-like coil-globule collapse dictated by the interplay of excluded volume and attractive energy and a switch-like collapse where bridging interaction compete with entropy loss terms from the looped arms of a star-like rosette. Additionally, we show that bridging can induce stable compartmentalized domains. In these configurations, different "cores" of bridging proteins are kept separated by star-like polymer loops in an entropically favorable multi-domain configuration, with a mechanism that parallels micellar polysoaps. Such compartmentalized domains are stable, and do not need any intra-specific interactions driving their segregation. Domains can be stable also in presence of uniform attraction, as long as the uniform collapse is above its theta point.

  16. Vehicle track segmentation using higher order random fields

    DOE PAGES

    Quach, Tu -Thach

    2017-01-09

    Here, we present an approach to segment vehicle tracks in coherent change detection images, a product of combining two synthetic aperture radar images taken at different times. The approach uses multiscale higher order random field models to capture track statistics, such as curvatures and their parallel nature, that are not currently utilized in existing methods. These statistics are encoded as 3-by-3 patterns at different scales. The model can complete disconnected tracks often caused by sensor noise and various environmental effects. Coupling the model with a simple classifier, our approach is effective at segmenting salient tracks. We improve the F-measure onmore » a standard vehicle track data set to 0.963, up from 0.897 obtained by the current state-of-the-art method.« less

  17. Vehicle track segmentation using higher order random fields

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Quach, Tu -Thach

    Here, we present an approach to segment vehicle tracks in coherent change detection images, a product of combining two synthetic aperture radar images taken at different times. The approach uses multiscale higher order random field models to capture track statistics, such as curvatures and their parallel nature, that are not currently utilized in existing methods. These statistics are encoded as 3-by-3 patterns at different scales. The model can complete disconnected tracks often caused by sensor noise and various environmental effects. Coupling the model with a simple classifier, our approach is effective at segmenting salient tracks. We improve the F-measure onmore » a standard vehicle track data set to 0.963, up from 0.897 obtained by the current state-of-the-art method.« less

  18. A concise evidence-based physical examination for diagnosis of acromioclavicular joint pathology: a systematic review.

    PubMed

    Krill, Michael K; Rosas, Samuel; Kwon, KiHyun; Dakkak, Andrew; Nwachukwu, Benedict U; McCormick, Frank

    2018-02-01

    The clinical examination of the shoulder joint is an undervalued diagnostic tool for evaluating acromioclavicular (AC) joint pathology. Applying evidence-based clinical tests enables providers to make an accurate diagnosis and minimize costly imaging procedures and potential delays in care. The purpose of this study was to create a decision tree analysis enabling simple and accurate diagnosis of AC joint pathology. A systematic review of the Medline, Ovid and Cochrane Review databases was performed to identify level one and two diagnostic studies evaluating clinical tests for AC joint pathology. Individual test characteristics were combined in series and in parallel to improve sensitivities and specificities. A secondary analysis utilized subjective pre-test probabilities to create a clinical decision tree algorithm with post-test probabilities. The optimal special test combination to screen and confirm AC joint pathology combined Paxinos sign and O'Brien's Test, with a specificity of 95.8% when performed in series; whereas, Paxinos sign and Hawkins-Kennedy Test demonstrated a sensitivity of 93.7% when performed in parallel. Paxinos sign and O'Brien's Test demonstrated the greatest positive likelihood ratio (2.71); whereas, Paxinos sign and Hawkins-Kennedy Test reported the lowest negative likelihood ratio (0.35). No combination of special tests performed in series or in parallel creates more than a small impact on post-test probabilities to screen or confirm AC joint pathology. Paxinos sign and O'Brien's Test is the only special test combination that has a small and sometimes important impact when used both in series and in parallel. Physical examination testing is not beneficial for diagnosis of AC joint pathology when pretest probability is unequivocal. In these instances, it is of benefit to proceed with procedural tests to evaluate AC joint pathology. Ultrasound-guided corticosteroid injections are diagnostic and therapeutic. An ultrasound-guided AC joint corticosteroid injection may be an appropriate new standard for treatment and surgical decision-making. II - Systematic Review.

  19. Are changes in objective working hour characteristics associated with changes in work-life conflict among hospital employees working shifts? A 7-year follow-up.

    PubMed

    Karhula, Kati; Koskinen, Aki; Ojajärvi, Anneli; Ropponen, Annina; Puttonen, Sampsa; Kivimäki, Mika; Härmä, Mikko

    2018-06-01

    To investigate whether changes in objective working hour characteristics are associated with parallel changes in work-life conflict (WLC) among hospital employees. Survey responses from three waves of the Finnish Public Sector study (2008, 2012 and 2015) were combined with payroll data from 91 days preceding the surveys (n=2 482, 93% women). Time-dependent fixed effects regression models adjusted for marital status, number of children and stressfulness of the life situation were used to investigate whether changes in working hour characteristics were associated with parallel change in WLC. The working hour characteristics were dichotomised with cut-points in less than or greater than 10% or less than or greater than25% occurrence) and WLC to frequent versus seldom/none. Change in proportion of evening and night shifts and weekend work was significantly associated with parallel change in WLC (adjusted OR 2.19, 95% CI 1.62 to 2.96; OR 1.71, 95% CI 1.21 to 2.44; OR 1.63, 95% CI 1.194 to 2.22, respectively). Similarly, increase or decrease in proportion of quick returns (adjusted OR 1.45, 95% CI 1.10 to 1.89) and long work weeks (adjusted OR 1.26, 95% CI 1.04 to 1.52) was associated with parallel increase or decrease in WLC. Single days off and very long work weeks showed no association with WLC. Changes in unsocial working hour characteristics, especially in connection with evening shifts, are consistently associated with parallel changes in WLC. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  20. Are changes in objective working hour characteristics associated with changes in work-life conflict among hospital employees working shifts? A 7-year follow-up

    PubMed Central

    Karhula, Kati; Koskinen, Aki; Ojajärvi, Anneli; Puttonen, Sampsa; Kivimäki, Mika; Härmä, Mikko

    2018-01-01

    Objective To investigate whether changes in objective working hour characteristics are associated with parallel changes in work-life conflict (WLC) among hospital employees. Methods Survey responses from three waves of the Finnish Public Sector study (2008, 2012 and 2015) were combined with payroll data from 91 days preceding the surveys (n=2 482, 93% women). Time-dependent fixed effects regression models adjusted for marital status, number of children and stressfulness of the life situation were used to investigate whether changes in working hour characteristics were associated with parallel change in WLC. The working hour characteristics were dichotomised with cut-points in less than or greater than 10% or less than or greater than25% occurrence) and WLC to frequent versus seldom/none. Results Change in proportion of evening and night shifts and weekend work was significantly associated with parallel change in WLC (adjusted OR 2.19, 95% CI 1.62 to 2.96; OR 1.71, 95% CI 1.21 to 2.44; OR 1.63, 95% CI 1.194 to 2.22, respectively). Similarly, increase or decrease in proportion of quick returns (adjusted OR 1.45, 95% CI 1.10 to 1.89) and long work weeks (adjusted OR 1.26, 95% CI 1.04 to 1.52) was associated with parallel increase or decrease in WLC. Single days off and very long work weeks showed no association with WLC. Conclusions Changes in unsocial working hour characteristics, especially in connection with evening shifts, are consistently associated with parallel changes in WLC. PMID:29367350

  1. Search asymmetries: parallel processing of uncertain sensory information.

    PubMed

    Vincent, Benjamin T

    2011-08-01

    What is the mechanism underlying search phenomena such as search asymmetry? Two-stage models such as Feature Integration Theory and Guided Search propose parallel pre-attentive processing followed by serial post-attentive processing. They claim search asymmetry effects are indicative of finding pairs of features, one processed in parallel, the other in serial. An alternative proposal is that a 1-stage parallel process is responsible, and search asymmetries occur when one stimulus has greater internal uncertainty associated with it than another. While the latter account is simpler, only a few studies have set out to empirically test its quantitative predictions, and many researchers still subscribe to the 2-stage account. This paper examines three separate parallel models (Bayesian optimal observer, max rule, and a heuristic decision rule). All three parallel models can account for search asymmetry effects and I conclude that either people can optimally utilise the uncertain sensory data available to them, or are able to select heuristic decision rules which approximate optimal performance. Copyright © 2011 Elsevier Ltd. All rights reserved.

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xue, Yuxin; Suto, Yasushi; Taruya, Atsushi

    The angle between the stellar spin and the planetary orbit axes (the spin-orbit angle) is supposed to carry valuable information concerning the initial condition of planetary formation and subsequent migration history. Indeed, current observations of the Rossiter-McLaughlin effect have revealed a wide range of spin-orbit misalignments for transiting exoplanets. We examine in detail the tidal evolution of a simple system comprising a Sun-like star and a hot Jupiter adopting the equilibrium tide and the inertial wave dissipation effects simultaneously. We find that the combined tidal model works as a very efficient realignment mechanism; it predicts three distinct states of themore » spin-orbit angle (i.e., parallel, polar, and antiparallel orbits) for a while, but the latter two states eventually approach the parallel spin-orbit configuration. The intermediate spin-orbit angles as measured in recent observations are difficult to obtain. Therefore the current model cannot reproduce the observed broad distribution of the spin-orbit angles, at least in its simple form. This indicates that the observed diversity of the spin-orbit angles may emerge from more complicated interactions with outer planets and/or may be the consequence of the primordial misalignment between the protoplanetary disk and the stellar spin, which requires future detailed studies.« less

  3. Steering intermediate courses: desert ants combine information from various navigational routines.

    PubMed

    Wehner, Rüdiger; Hoinville, Thierry; Cruse, Holk; Cheng, Ken

    2016-07-01

    A number of systems of navigation have been studied in some detail in insects. These include path integration, a system that keeps track of the straight-line distance and direction travelled on the current trip, the use of panoramic landmarks and scenery for orientation, and systematic searching. A traditional view is that only one navigational system is in operation at any one time, with different systems running in sequence depending on the context and conditions. We review selected data suggesting that often, different navigational cues (e.g., compass cues) and different systems of navigation are in operation simultaneously in desert ant navigation. The evidence suggests that all systems operate in parallel forming a heterarchical network. External and internal conditions determine the weights to be accorded to each cue and system. We also show that a model of independent modules feeding into a central summating device, the Navinet model, can in principle account for such data. No central executive processor is necessary aside from a weighted summation of the different cues and systems. Such a heterarchy of parallel systems all in operation represents a new view of insect navigation that has already been expressed informally by some authors.

  4. Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt

    Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make a productive use of computational resources for each simulation and from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of an heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach and paper, the theoretical modeling and scalingmore » laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also show the new approach capabilities, by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. Finally, these two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.« less

  5. Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes

    DOE PAGES

    Guzman, Horacio V.; Junghans, Christoph; Kremer, Kurt; ...

    2017-11-27

    Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make a productive use of computational resources for each simulation and from its genesis. Here, we introduce the heterogeneous domain decomposition approach, which is a combination of an heterogeneity-sensitive spatial domain decomposition with an a priori rearrangement of subdomain walls. Within this approach and paper, the theoretical modeling and scalingmore » laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also show the new approach capabilities, by comparing it to both static domain decomposition algorithms and dynamic load-balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. Finally, these two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase-separated binary Lennard-Jones fluid.« less

  6. Path lumping: An efficient algorithm to identify metastable path channels for conformational dynamics of multi-body systems

    NASA Astrophysics Data System (ADS)

    Meng, Luming; Sheong, Fu Kit; Zeng, Xiangze; Zhu, Lizhe; Huang, Xuhui

    2017-07-01

    Constructing Markov state models from large-scale molecular dynamics simulation trajectories is a promising approach to dissect the kinetic mechanisms of complex chemical and biological processes. Combined with transition path theory, Markov state models can be applied to identify all pathways connecting any conformational states of interest. However, the identified pathways can be too complex to comprehend, especially for multi-body processes where numerous parallel pathways with comparable flux probability often coexist. Here, we have developed a path lumping method to group these parallel pathways into metastable path channels for analysis. We define the similarity between two pathways as the intercrossing flux between them and then apply the spectral clustering algorithm to lump these pathways into groups. We demonstrate the power of our method by applying it to two systems: a 2D-potential consisting of four metastable energy channels and the hydrophobic collapse process of two hydrophobic molecules. In both cases, our algorithm successfully reveals the metastable path channels. We expect this path lumping algorithm to be a promising tool for revealing unprecedented insights into the kinetic mechanisms of complex multi-body processes.

  7. Parallelization of the Physical-Space Statistical Analysis System (PSAS)

    NASA Technical Reports Server (NTRS)

    Larson, J. W.; Guo, J.; Lyster, P. M.

    1999-01-01

    Atmospheric data assimilation is a method of combining observations with model forecasts to produce a more accurate description of the atmosphere than the observations or forecast alone can provide. Data assimilation plays an increasingly important role in the study of climate and atmospheric chemistry. The NASA Data Assimilation Office (DAO) has developed the Goddard Earth Observing System Data Assimilation System (GEOS DAS) to create assimilated datasets. The core computational components of the GEOS DAS include the GEOS General Circulation Model (GCM) and the Physical-space Statistical Analysis System (PSAS). The need for timely validation of scientific enhancements to the data assimilation system poses computational demands that are best met by distributed parallel software. PSAS is implemented in Fortran 90 using object-based design principles. The analysis portions of the code solve two equations. The first of these is the "innovation" equation, which is solved on the unstructured observation grid using a preconditioned conjugate gradient (CG) method. The "analysis" equation is a transformation from the observation grid back to a structured grid, and is solved by a direct matrix-vector multiplication. Use of a factored-operator formulation reduces the computational complexity of both the CG solver and the matrix-vector multiplication, rendering the matrix-vector multiplications as a successive product of operators on a vector. Sparsity is introduced to these operators by partitioning the observations using an icosahedral decomposition scheme. PSAS builds a large (approx. 128MB) run-time database of parameters used in the calculation of these operators. Implementing a message passing parallel computing paradigm into an existing yet developing computational system as complex as PSAS is nontrivial. One of the technical challenges is balancing the requirements for computational reproducibility with the need for high performance. The problem of computational reproducibility is well known in the parallel computing community. It is a requirement that the parallel code perform calculations in a fashion that will yield identical results on different configurations of processing elements on the same platform. In some cases this problem can be solved by sacrificing performance. Meeting this requirement and still achieving high performance is very difficult. Topics to be discussed include: current PSAS design and parallelization strategy; reproducibility issues; load balance vs. database memory demands, possible solutions to these problems.

  8. Magnetospheric Multiscale Observations of the Electron Diffusion Region of Large Guide Field Magnetic Reconnection

    NASA Technical Reports Server (NTRS)

    Eriksson, S.; Wilder, F. D.; Ergun, R. E.; Schwartz, S. J.; Cassak, P. A.; Burch, J. L.; Chen, Li-Jen; Torbert, R. B.; Phan, T. D.; Lavraud, B.; hide

    2016-01-01

    We report observations from the Magnetospheric Multiscale (MMS) satellites of a large guide field magnetic reconnection event. The observations suggest that two of the four MMS spacecraft sampled the electron diffusion region, whereas the other two spacecraft detected the exhaust jet from the event. The guide magnetic field amplitude is approximately 4 times that of the reconnecting field. The event is accompanied by a significant parallel electric field (E(sub parallel lines) that is larger than predicted by simulations. The high-speed (approximately 300 km/s) crossing of the electron diffusion region limited the data set to one complete electron distribution inside of the electron diffusion region, which shows significant parallel heating. The data suggest that E(sub parallel lines) is balanced by a combination of electron inertia and a parallel gradient of the gyrotropic electron pressure.

  9. Current Status on the use of Parallel Computing in Turbulent Reacting Flow Computations Involving Sprays, Monte Carlo PDF and Unstructured Grids. Chapter 4

    NASA Technical Reports Server (NTRS)

    Raju, M. S.

    1998-01-01

    The state of the art in multidimensional combustor modeling as evidenced by the level of sophistication employed in terms of modeling and numerical accuracy considerations, is also dictated by the available computer memory and turnaround times afforded by present-day computers. With the aim of advancing the current multi-dimensional computational tools used in the design of advanced technology combustors, a solution procedure is developed that combines the novelty of the coupled CFD/spray/scalar Monte Carlo PDF (Probability Density Function) computations on unstructured grids with the ability to run on parallel architectures. In this approach, the mean gas-phase velocity and turbulence fields are determined from a standard turbulence model, the joint composition of species and enthalpy from the solution of a modeled PDF transport equation, and a Lagrangian-based dilute spray model is used for the liquid-phase representation. The gas-turbine combustor flows are often characterized by a complex interaction between various physical processes associated with the interaction between the liquid and gas phases, droplet vaporization, turbulent mixing, heat release associated with chemical kinetics, radiative heat transfer associated with highly absorbing and radiating species, among others. The rate controlling processes often interact with each other at various disparate time 1 and length scales. In particular, turbulence plays an important role in determining the rates of mass and heat transfer, chemical reactions, and liquid phase evaporation in many practical combustion devices.

  10. Nonlinear viscoplasticity in ASPECT: benchmarking and applications to subduction

    NASA Astrophysics Data System (ADS)

    Glerum, Anne; Thieulot, Cedric; Fraters, Menno; Blom, Constantijn; Spakman, Wim

    2018-03-01

    ASPECT (Advanced Solver for Problems in Earth's ConvecTion) is a massively parallel finite element code originally designed for modeling thermal convection in the mantle with a Newtonian rheology. The code is characterized by modern numerical methods, high-performance parallelism and extensibility. This last characteristic is illustrated in this work: we have extended the use of ASPECT from global thermal convection modeling to upper-mantle-scale applications of subduction.

    Subduction modeling generally requires the tracking of multiple materials with different properties and with nonlinear viscous and viscoplastic rheologies. To this end, we implemented a frictional plasticity criterion that is combined with a viscous diffusion and dislocation creep rheology. Because ASPECT uses compositional fields to represent different materials, all material parameters are made dependent on a user-specified number of fields.

    The goal of this paper is primarily to describe and verify our implementations of complex, multi-material rheology by reproducing the results of four well-known two-dimensional benchmarks: the indentor benchmark, the brick experiment, the sandbox experiment and the slab detachment benchmark. Furthermore, we aim to provide hands-on examples for prospective users by demonstrating the use of multi-material viscoplasticity with three-dimensional, thermomechanical models of oceanic subduction, putting ASPECT on the map as a community code for high-resolution, nonlinear rheology subduction modeling.

  11. War and peace: morphemes and full forms in a noninteractive activation parallel dual-route model.

    PubMed

    Baayen, H; Schreuder, R

    This article introduces a computational tool for modeling the process of morphological segmentation in visual and auditory word recognition in the framework of a parallel dual-route model. Copyright 1999 Academic Press.

  12. Efficient Parallel Levenberg-Marquardt Model Fitting towards Real-Time Automated Parametric Imaging Microscopy

    PubMed Central

    Zhu, Xiang; Zhang, Dianwen

    2013-01-01

    We present a fast, accurate and robust parallel Levenberg-Marquardt minimization optimizer, GPU-LMFit, which is implemented on graphics processing unit for high performance scalable parallel model fitting processing. GPU-LMFit can provide a dramatic speed-up in massive model fitting analyses to enable real-time automated pixel-wise parametric imaging microscopy. We demonstrate the performance of GPU-LMFit for the applications in superresolution localization microscopy and fluorescence lifetime imaging microscopy. PMID:24130785

  13. Distributed computing feasibility in a non-dedicated homogeneous distributed system

    NASA Technical Reports Server (NTRS)

    Leutenegger, Scott T.; Sun, Xian-He

    1993-01-01

    The low cost and availability of clusters of workstations have lead researchers to re-explore distributed computing using independent workstations. This approach may provide better cost/performance than tightly coupled multiprocessors. In practice, this approach often utilizes wasted cycles to run parallel jobs. The feasibility of such a non-dedicated parallel processing environment assuming workstation processes have preemptive priority over parallel tasks is addressed. An analytical model is developed to predict parallel job response times. Our model provides insight into how significantly workstation owner interference degrades parallel program performance. A new term task ratio, which relates the parallel task demand to the mean service demand of nonparallel workstation processes, is introduced. It was proposed that task ratio is a useful metric for determining how large the demand of a parallel applications must be in order to make efficient use of a non-dedicated distributed system.

  14. Linearly exact parallel closures for slab geometry

    NASA Astrophysics Data System (ADS)

    Ji, Jeong-Young; Held, Eric D.; Jhang, Hogun

    2013-08-01

    Parallel closures are obtained by solving a linearized kinetic equation with a model collision operator using the Fourier transform method. The closures expressed in wave number space are exact for time-dependent linear problems to within the limits of the model collision operator. In the adiabatic, collisionless limit, an inverse Fourier transform is performed to obtain integral (nonlocal) parallel closures in real space; parallel heat flow and viscosity closures for density, temperature, and flow velocity equations replace Braginskii's parallel closure relations, and parallel flow velocity and heat flow closures for density and temperature equations replace Spitzer's parallel transport relations. It is verified that the closures reproduce the exact linear response function of Hammett and Perkins [Phys. Rev. Lett. 64, 3019 (1990)] for Landau damping given a temperature gradient. In contrast to their approximate closures where the vanishing viscosity coefficient numerically gives an exact response, our closures relate the heat flow and nonvanishing viscosity to temperature and flow velocity (gradients).

  15. Nonlinear Dynamic Model-Based Multiobjective Sensor Network Design Algorithm for a Plant with an Estimator-Based Control System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Paul, Prokash; Bhattacharyya, Debangsu; Turton, Richard

    Here, a novel sensor network design (SND) algorithm is developed for maximizing process efficiency while minimizing sensor network cost for a nonlinear dynamic process with an estimator-based control system. The multiobjective optimization problem is solved following a lexicographic approach where the process efficiency is maximized first followed by minimization of the sensor network cost. The partial net present value, which combines the capital cost due to the sensor network and the operating cost due to deviation from the optimal efficiency, is proposed as an alternative objective. The unscented Kalman filter is considered as the nonlinear estimator. The large-scale combinatorial optimizationmore » problem is solved using a genetic algorithm. The developed SND algorithm is applied to an acid gas removal (AGR) unit as part of an integrated gasification combined cycle (IGCC) power plant with CO 2 capture. Due to the computational expense, a reduced order nonlinear model of the AGR process is identified and parallel computation is performed during implementation.« less

  16. Nonlinear Dynamic Model-Based Multiobjective Sensor Network Design Algorithm for a Plant with an Estimator-Based Control System

    DOE PAGES

    Paul, Prokash; Bhattacharyya, Debangsu; Turton, Richard; ...

    2017-06-06

    Here, a novel sensor network design (SND) algorithm is developed for maximizing process efficiency while minimizing sensor network cost for a nonlinear dynamic process with an estimator-based control system. The multiobjective optimization problem is solved following a lexicographic approach where the process efficiency is maximized first followed by minimization of the sensor network cost. The partial net present value, which combines the capital cost due to the sensor network and the operating cost due to deviation from the optimal efficiency, is proposed as an alternative objective. The unscented Kalman filter is considered as the nonlinear estimator. The large-scale combinatorial optimizationmore » problem is solved using a genetic algorithm. The developed SND algorithm is applied to an acid gas removal (AGR) unit as part of an integrated gasification combined cycle (IGCC) power plant with CO 2 capture. Due to the computational expense, a reduced order nonlinear model of the AGR process is identified and parallel computation is performed during implementation.« less

  17. A 2D MTF approach to evaluate and guide dynamic imaging developments.

    PubMed

    Chao, Tzu-Cheng; Chung, Hsiao-Wen; Hoge, W Scott; Madore, Bruno

    2010-02-01

    As the number and complexity of partially sampled dynamic imaging methods continue to increase, reliable strategies to evaluate performance may prove most useful. In the present work, an analytical framework to evaluate given reconstruction methods is presented. A perturbation algorithm allows the proposed evaluation scheme to perform robustly without requiring knowledge about the inner workings of the method being evaluated. A main output of the evaluation process consists of a two-dimensional modulation transfer function, an easy-to-interpret visual rendering of a method's ability to capture all combinations of spatial and temporal frequencies. Approaches to evaluate noise properties and artifact content at all spatial and temporal frequencies are also proposed. One fully sampled phantom and three fully sampled cardiac cine datasets were subsampled (R = 4 and 8) and reconstructed with the different methods tested here. A hybrid method, which combines the main advantageous features observed in our assessments, was proposed and tested in a cardiac cine application, with acceleration factors of 3.5 and 6.3 (skip factors of 4 and 8, respectively). This approach combines features from methods such as k-t sensitivity encoding, unaliasing by Fourier encoding the overlaps in the temporal dimension-sensitivity encoding, generalized autocalibrating partially parallel acquisition, sensitivity profiles from an array of coils for encoding and reconstruction in parallel, self, hybrid referencing with unaliasing by Fourier encoding the overlaps in the temporal dimension and generalized autocalibrating partially parallel acquisition, and generalized autocalibrating partially parallel acquisition-enhanced sensitivity maps for sensitivity encoding reconstructions.

  18. Verification of Electromagnetic Physics Models for Parallel Computing Architectures in the GeantV Project

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Amadio, G.; et al.

    An intensive R&D and programming effort is required to accomplish new challenges posed by future experimental high-energy particle physics (HEP) programs. The GeantV project aims to narrow the gap between the performance of the existing HEP detector simulation software and the ideal performance achievable, exploiting latest advances in computing technology. The project has developed a particle detector simulation prototype capable of transporting in parallel particles in complex geometries exploiting instruction level microparallelism (SIMD and SIMT), task-level parallelism (multithreading) and high-level parallelism (MPI), leveraging both the multi-core and the many-core opportunities. We present preliminary verification results concerning the electromagnetic (EM) physicsmore » models developed for parallel computing architectures within the GeantV project. In order to exploit the potential of vectorization and accelerators and to make the physics model effectively parallelizable, advanced sampling techniques have been implemented and tested. In this paper we introduce a set of automated statistical tests in order to verify the vectorized models by checking their consistency with the corresponding Geant4 models and to validate them against experimental data.« less

  19. Xyce parallel electronic simulator users guide, version 6.1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas; Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers; A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to developmore » new types of analysis without requiring the implementation of analysis-specific device models; Device models that are specifically tailored to meet Sandia's needs, including some radiationaware devices (for Sandia users only); and Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase-a message passing parallel implementation-which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.« less

  20. Xyce parallel electronic simulator users' guide, Version 6.0.1.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to developmore » new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandias needs, including some radiationaware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase a message passing parallel implementation which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.« less

  1. Xyce parallel electronic simulator users guide, version 6.0.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to developmore » new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandias needs, including some radiationaware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase a message passing parallel implementation which allows it to run efficiently a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.« less

  2. Multirate parallel distributed compensation of a cluster in wireless sensor and actor networks

    NASA Astrophysics Data System (ADS)

    Yang, Chun-xi; Huang, Ling-yun; Zhang, Hao; Hua, Wang

    2016-01-01

    The stabilisation problem for one of the clusters with bounded multiple random time delays and packet dropouts in wireless sensor and actor networks is investigated in this paper. A new multirate switching model is constructed to describe the feature of this single input multiple output linear system. According to the difficulty of controller design under multi-constraints in multirate switching model, this model can be converted to a Takagi-Sugeno fuzzy model. By designing a multirate parallel distributed compensation, a sufficient condition is established to ensure this closed-loop fuzzy control system to be globally exponentially stable. The solution of the multirate parallel distributed compensation gains can be obtained by solving an auxiliary convex optimisation problem. Finally, two numerical examples are given to show, compared with solving switching controller, multirate parallel distributed compensation can be obtained easily. Furthermore, it has stronger robust stability than arbitrary switching controller and single-rate parallel distributed compensation under the same conditions.

  3. Development Of A Parallel Performance Model For The THOR Neutral Particle Transport Code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yessayan, Raffi; Azmy, Yousry; Schunert, Sebastian

    The THOR neutral particle transport code enables simulation of complex geometries for various problems from reactor simulations to nuclear non-proliferation. It is undergoing a thorough V&V requiring computational efficiency. This has motivated various improvements including angular parallelization, outer iteration acceleration, and development of peripheral tools. For guiding future improvements to the code’s efficiency, better characterization of its parallel performance is useful. A parallel performance model (PPM) can be used to evaluate the benefits of modifications and to identify performance bottlenecks. Using INL’s Falcon HPC, the PPM development incorporates an evaluation of network communication behavior over heterogeneous links and a functionalmore » characterization of the per-cell/angle/group runtime of each major code component. After evaluating several possible sources of variability, this resulted in a communication model and a parallel portion model. The former’s accuracy is bounded by the variability of communication on Falcon while the latter has an error on the order of 1%.« less

  4. Parallel imports and the pricing of pharmaceutical products: evidence from the European Union.

    PubMed

    Ganslandt, Mattias; Maskus, Keith E

    2004-09-01

    We consider policy issues regarding parallel imports (PIs) of brand-name pharmaceuticals in the European Union, where such trade is permitted. We develop a simple model in which an original manufacturer competes in its home market with PI firms. The model suggests that for small trade costs the original manufacturer will accommodate the import decisions of parallel traders and that the price in the home market falls as the volume of parallel imports rises. Using data from Sweden we find that the prices of drugs subject to competition from parallel imports fell relative to other drugs over the period 1994-1999. Econometric analysis finds that parallel imports significantly reduced manufacturing prices, by 12-19%. There is evidence that this effect increases with multiple PI entrants.

  5. High Performance Computing Based Parallel HIearchical Modal Association Clustering (HPAR HMAC)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Patlolla, Dilip R; Surendran Nair, Sujithkumar; Graves, Daniel A.

    For many applications, clustering is a crucial step in order to gain insight into the makeup of a dataset. The best approach to a given problem often depends on a variety of factors, such as the size of the dataset, time restrictions, and soft clustering requirements. The HMAC algorithm seeks to combine the strengths of 2 particular clustering approaches: model-based and linkage-based clustering. One particular weakness of HMAC is its computational complexity. HMAC is not practical for mega-scale data clustering. For high-definition imagery, a user would have to wait months or years for a result; for a 16-megapixel image, themore » estimated runtime skyrockets to over a decade! To improve the execution time of HMAC, it is reasonable to consider an multi-core implementation that utilizes available system resources. An existing imple-mentation (Ray and Cheng 2014) divides the dataset into N partitions - one for each thread prior to executing the HMAC algorithm. This implementation benefits from 2 types of optimization: parallelization and divide-and-conquer. By running each partition in parallel, the program is able to accelerate computation by utilizing more system resources. Although the parallel implementation provides considerable improvement over the serial HMAC, it still suffers from poor computational complexity, O(N2). Once the maximum number of cores on a system is exhausted, the program exhibits slower behavior. We now consider a modification to HMAC that involves a recursive partitioning scheme. Our modification aims to exploit divide-and-conquer benefits seen by the parallel HMAC implementation. At each level in the recursion tree, partitions are divided into 2 sub-partitions until a threshold size is reached. When the partition can no longer be divided without falling below threshold size, the base HMAC algorithm is applied. This results in a significant speedup over the parallel HMAC.« less

  6. Large-scale trench-normal mantle flow beneath central South America

    NASA Astrophysics Data System (ADS)

    Reiss, M. C.; Rümpker, G.; Wölbern, I.

    2018-01-01

    We investigate the anisotropic properties of the fore-arc region of the central Andean margin between 17-25°S by analyzing shear-wave splitting from teleseismic and local earthquakes from the Nazca slab. With partly over ten years of recording time, the data set is uniquely suited to address the long-standing debate about the mantle flow field at the South American margin and in particular whether the flow field beneath the slab is parallel or perpendicular to the trench. Our measurements suggest two anisotropic layers located within the crust and mantle beneath the stations, respectively. The teleseismic measurements show a moderate change of fast polarizations from North to South along the trench ranging from parallel to subparallel to the absolute plate motion and, are oriented mostly perpendicular to the trench. Shear-wave splitting measurements from local earthquakes show fast polarizations roughly aligned trench-parallel but exhibit short-scale variations which are indicative of a relatively shallow origin. Comparisons between fast polarization directions from local earthquakes and the strike of the local fault systems yield a good agreement. To infer the parameters of the lower anisotropic layer we employ an inversion of the teleseismic waveforms based on two-layer models, where the anisotropy of the upper (crustal) layer is constrained by the results from the local splitting. The waveform inversion yields a mantle layer that is best characterized by a fast axis parallel to the absolute plate motion which is more-or-less perpendicular to the trench. This orientation is likely caused by a combination of the fossil crystallographic preferred orientation of olivine within the slab and entrained mantle flow beneath the slab. The anisotropy within the crust of the overriding continental plate is explained by the shape-preferred orientation of micro-cracks in relation to local fault zones which are oriented parallel to the overall strike of the Andean range. Our results do not provide any evidence for a significant contribution of trench-parallel mantle flow beneath the subducting slab.

  7. Parallel optical image addition and subtraction in a dynamic photorefractive memory by phase-code multiplexing

    NASA Astrophysics Data System (ADS)

    Denz, Cornelia; Dellwig, Thilo; Lembcke, Jan; Tschudi, Theo

    1996-02-01

    We propose and demonstrate experimentally a method for utilizing a dynamic phase-encoded photorefractive memory to realize parallel optical addition, subtraction, and inversion operations of stored images. The phase-encoded holographic memory is realized in photorefractive BaTiO3, storing eight images using WalshHadamard binary phase codes and an incremental recording procedure. By subsampling the set of reference beams during the recall operation, the selectivity of the phase address is decreased, allowing one to combine images in such a way that different linear combination of the images can be realized at the output of the memory.

  8. Parallel In Vivo DNA Assembly by Recombination: Experimental Demonstration and Theoretical Approaches

    PubMed Central

    Shi, Zhenyu; Wedd, Anthony G.; Gras, Sally L.

    2013-01-01

    The development of synthetic biology requires rapid batch construction of large gene networks from combinations of smaller units. Despite the availability of computational predictions for well-characterized enzymes, the optimization of most synthetic biology projects requires combinational constructions and tests. A new building-brick-style parallel DNA assembly framework for simple and flexible batch construction is presented here. It is based on robust recombination steps and allows a variety of DNA assembly techniques to be organized for complex constructions (with or without scars). The assembly of five DNA fragments into a host genome was performed as an experimental demonstration. PMID:23468883

  9. Efficient parallel implementation of active appearance model fitting algorithm on GPU.

    PubMed

    Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou

    2014-01-01

    The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.

  10. Efficient Parallel Implementation of Active Appearance Model Fitting Algorithm on GPU

    PubMed Central

    Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou

    2014-01-01

    The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures. PMID:24723812

  11. Considering Horn's Parallel Analysis from a Random Matrix Theory Point of View.

    PubMed

    Saccenti, Edoardo; Timmerman, Marieke E

    2017-03-01

    Horn's parallel analysis is a widely used method for assessing the number of principal components and common factors. We discuss the theoretical foundations of parallel analysis for principal components based on a covariance matrix by making use of arguments from random matrix theory. In particular, we show that (i) for the first component, parallel analysis is an inferential method equivalent to the Tracy-Widom test, (ii) its use to test high-order eigenvalues is equivalent to the use of the joint distribution of the eigenvalues, and thus should be discouraged, and (iii) a formal test for higher-order components can be obtained based on a Tracy-Widom approximation. We illustrate the performance of the two testing procedures using simulated data generated under both a principal component model and a common factors model. For the principal component model, the Tracy-Widom test performs consistently in all conditions, while parallel analysis shows unpredictable behavior for higher-order components. For the common factor model, including major and minor factors, both procedures are heuristic approaches, with variable performance. We conclude that the Tracy-Widom procedure is preferred over parallel analysis for statistically testing the number of principal components based on a covariance matrix.

  12. High Performance Programming Using Explicit Shared Memory Model on Cray T3D1

    NASA Technical Reports Server (NTRS)

    Simon, Horst D.; Saini, Subhash; Grassi, Charles

    1994-01-01

    The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under Cray Research adaptive Fortran (CRAFT) model four programming methods (data parallel, work sharing, message-passing using PVM, and explicit shared memory model) are available to the users. However, at this time data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that the performance of neither standard PVM nor CRI s PVM exploits the hardware capabilities of the T3D. The reasons for the bad performance of PVM as a native message-passing library are presented. This is illustrated by the performance of NAS Parallel Benchmarks (NPB) programmed in explicit shared memory model on Cray T3D. In general, the performance of standard PVM is about 4 to 5 times less than obtained by using explicit shared memory model. This degradation in performance is also seen on CM-5 where the performance of applications using native message-passing library CMMD on CM-5 is also about 4 to 5 times less than using data parallel methods. The issues involved (such as barriers, synchronization, invalidating data cache, aligning data cache etc.) while programming in explicit shared memory model are discussed. Comparative performance of NPB using explicit shared memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, IBM-SP1, etc. is presented.

  13. Parallel Computing Using Web Servers and "Servlets".

    ERIC Educational Resources Information Center

    Lo, Alfred; Bloor, Chris; Choi, Y. K.

    2000-01-01

    Describes parallel computing and presents inexpensive ways to implement a virtual parallel computer with multiple Web servers. Highlights include performance measurement of parallel systems; models for using Java and intranet technology including single server, multiple clients and multiple servers, single client; and a comparison of CGI (common…

  14. Parallel Computation of the Regional Ocean Modeling System (ROMS)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, P; Song, Y T; Chao, Y

    2005-04-05

    The Regional Ocean Modeling System (ROMS) is a regional ocean general circulation modeling system solving the free surface, hydrostatic, primitive equations over varying topography. It is free software distributed world-wide for studying both complex coastal ocean problems and the basin-to-global scale ocean circulation. The original ROMS code could only be run on shared-memory systems. With the increasing need to simulate larger model domains with finer resolutions and on a variety of computer platforms, there is a need in the ocean-modeling community to have a ROMS code that can be run on any parallel computer ranging from 10 to hundreds ofmore » processors. Recently, we have explored parallelization for ROMS using the MPI programming model. In this paper, an efficient parallelization strategy for such a large-scale scientific software package, based on an existing shared-memory computing model, is presented. In addition, scientific applications and data-performance issues on a couple of SGI systems, including Columbia, the world's third-fastest supercomputer, are discussed.« less

  15. Performance of GeantV EM Physics Models

    NASA Astrophysics Data System (ADS)

    Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Cosmo, G.; Duhem, L.; Elvira, D.; Folger, G.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

    2017-10-01

    The recent progress in parallel hardware architectures with deeper vector pipelines or many-cores technologies brings opportunities for HEP experiments to take advantage of SIMD and SIMT computing models. Launched in 2013, the GeantV project studies performance gains in propagating multiple particles in parallel, improving instruction throughput and data locality in HEP event simulation on modern parallel hardware architecture. Due to the complexity of geometry description and physics algorithms of a typical HEP application, performance analysis is indispensable in identifying factors limiting parallel execution. In this report, we will present design considerations and preliminary computing performance of GeantV physics models on coprocessors (Intel Xeon Phi and NVidia GPUs) as well as on mainstream CPUs.

  16. A Bold Step Forward: Juxtaposition of the Constructivist and Freeschooling Learning Model

    ERIC Educational Resources Information Center

    Chiatul, Victoria Oliaku

    2015-01-01

    This article discusses the juxtaposition of learning within the parallel structure of the constructivist and freeschooling models of education. To begin, characteristics describing the constructivist-learning model are provided, followed by a summary of the major components of the freeschooling-learning model. Finally, the parallel structure…

  17. Validation of neoclassical bootstrap current models in the edge of an H-mode plasma.

    PubMed

    Wade, M R; Murakami, M; Politzer, P A

    2004-06-11

    Analysis of the parallel electric field E(parallel) evolution following an L-H transition in the DIII-D tokamak indicates the generation of a large negative pulse near the edge which propagates inward, indicative of the generation of a noninductive edge current. Modeling indicates that the observed E(parallel) evolution is consistent with a narrow current density peak generated in the plasma edge. Very good quantitative agreement is found between the measured E(parallel) evolution and that expected from neoclassical theory predictions of the bootstrap current.

  18. Advances in Parallelization for Large Scale Oct-Tree Mesh Generation

    NASA Technical Reports Server (NTRS)

    O'Connell, Matthew; Karman, Steve L.

    2015-01-01

    Despite great advancements in the parallelization of numerical simulation codes over the last 20 years, it is still common to perform grid generation in serial. Generating large scale grids in serial often requires using special "grid generation" compute machines that can have more than ten times the memory of average machines. While some parallel mesh generation techniques have been proposed, generating very large meshes for LES or aeroacoustic simulations is still a challenging problem. An automated method for the parallel generation of very large scale off-body hierarchical meshes is presented here. This work enables large scale parallel generation of off-body meshes by using a novel combination of parallel grid generation techniques and a hybrid "top down" and "bottom up" oct-tree method. Meshes are generated using hardware commonly found in parallel compute clusters. The capability to generate very large meshes is demonstrated by the generation of off-body meshes surrounding complex aerospace geometries. Results are shown including a one billion cell mesh generated around a Predator Unmanned Aerial Vehicle geometry, which was generated on 64 processors in under 45 minutes.

  19. Improving operating room productivity via parallel anesthesia processing.

    PubMed

    Brown, Michael J; Subramanian, Arun; Curry, Timothy B; Kor, Daryl J; Moran, Steven L; Rohleder, Thomas R

    2014-01-01

    Parallel processing of regional anesthesia may improve operating room (OR) efficiency in patients undergoes upper extremity surgical procedures. The purpose of this paper is to evaluate whether performing regional anesthesia outside the OR in parallel increases total cases per day, improve efficiency and productivity. Data from all adult patients who underwent regional anesthesia as their primary anesthetic for upper extremity surgery over a one-year period were used to develop a simulation model. The model evaluated pure operating modes of regional anesthesia performed within and outside the OR in a parallel manner. The scenarios were used to evaluate how many surgeries could be completed in a standard work day (555 minutes) and assuming a standard three cases per day, what was the predicted end-of-day time overtime. Modeling results show that parallel processing of regional anesthesia increases the average cases per day for all surgeons included in the study. The average increase was 0.42 surgeries per day. Where it was assumed that three cases per day would be performed by all surgeons, the days going to overtime was reduced by 43 percent with parallel block. The overtime with parallel anesthesia was also projected to be 40 minutes less per day per surgeon. Key limitations include the assumption that all cases used regional anesthesia in the comparisons. Many days may have both regional and general anesthesia. Also, as a case study, single-center research may limit generalizability. Perioperative care providers should consider parallel administration of regional anesthesia where there is a desire to increase daily upper extremity surgical case capacity. Where there are sufficient resources to do parallel anesthesia processing, efficiency and productivity can be significantly improved. Simulation modeling can be an effective tool to show practice change effects at a system-wide level.

  20. Connectionism, parallel constraint satisfaction processes, and gestalt principles: (re) introducing cognitive dynamics to social psychology.

    PubMed

    Read, S J; Vanman, E J; Miller, L C

    1997-01-01

    We argue that recent work in connectionist modeling, in particular the parallel constraint satisfaction processes that are central to many of these models, has great importance for understanding issues of both historical and current concern for social psychologists. We first provide a brief description of connectionist modeling, with particular emphasis on parallel constraint satisfaction processes. Second, we examine the tremendous similarities between parallel constraint satisfaction processes and the Gestalt principles that were the foundation for much of modem social psychology. We propose that parallel constraint satisfaction processes provide a computational implementation of the principles of Gestalt psychology that were central to the work of such seminal social psychologists as Asch, Festinger, Heider, and Lewin. Third, we then describe how parallel constraint satisfaction processes have been applied to three areas that were key to the beginnings of modern social psychology and remain central today: impression formation and causal reasoning, cognitive consistency (balance and cognitive dissonance), and goal-directed behavior. We conclude by discussing implications of parallel constraint satisfaction principles for a number of broader issues in social psychology, such as the dynamics of social thought and the integration of social information within the narrow time frame of social interaction.

Top