Sample records for leading parallel systems

  1. Evaluation of Job Queuing/Scheduling Software: Phase I Report

    NASA Technical Reports Server (NTRS)

    Jones, James Patton

    1996-01-01

    The recent proliferation of high-performance workstations and the increased reliability of parallel systems have illustrated the need for robust job management systems to support parallel applications. To address this issue, the Numerical Aerodynamic Simulation (NAS) supercomputer facility compiled a requirements checklist for job queuing/scheduling software. Next, NAS began an evaluation of the leading job management system (JMS) software packages against the checklist. This report describes the three-phase evaluation process and presents the results of Phase 1: Capabilities versus Requirements. We show that JMS support for running parallel applications on clusters of workstations and parallel systems is still insufficient, even in the leading JMSs. However, by ranking each evaluated JMS against the requirements, we provide data that will be useful to other sites in selecting a JMS.

  2. Attachment of lead wires to thin film thermocouples mounted on high temperature materials using the parallel gap welding process

    NASA Technical Reports Server (NTRS)

    Holanda, Raymond; Kim, Walter S.; Pencil, Eric; Groth, Mary; Danzey, Gerald A.

    1990-01-01

    Parallel gap resistance welding was used to attach lead wires to sputtered thin film sensors. Ranges of optimum welding parameters to produce an acceptable weld were determined. The thin film sensors were Pt13Rh/Pt thermocouples; they were mounted on substrates of MCrAlY-coated superalloys, aluminum oxide, silicon carbide and silicon nitride. The entire sensor system is designed to be used on aircraft engine parts. These sensor systems, including the thin-film-to-lead-wire connectors, were tested to 1000 C.

  3. Characterizing parallel file-access patterns on a large-scale multiprocessor

    NASA Technical Reports Server (NTRS)

    Purakayastha, A.; Ellis, Carla; Kotz, David; Nieuwejaar, Nils; Best, Michael L.

    1995-01-01

    High-performance parallel file systems are needed to satisfy the tremendous I/O requirements of parallel scientific applications. The design of such high-performance parallel file systems depends on a comprehensive understanding of the expected workload, but so far there have been very few usage studies of multiprocessor file systems. This paper is part of the CHARISMA project, which intends to fill this void by measuring real file-system workloads on various production parallel machines. In particular, we present results from the CM-5 at the National Center for Supercomputing Applications. Our results are unique because we collect information about nearly every individual I/O request from the mix of jobs running on the machine. Analysis of the traces leads to various recommendations for parallel file-system design.

  4. Proton core-beam system in the expanding solar wind: Hybrid simulations

    NASA Astrophysics Data System (ADS)

    Hellinger, Petr; Trávníček, Pavel M.

    2011-11-01

    Results of a two-dimensional hybrid expanding box simulation of a proton beam-core system in the solar wind are presented. The expansion with a strictly radial magnetic field leads to a decrease of the ratio between the proton perpendicular and parallel temperatures, as well as to an increase of the ratio between the beam-core differential velocity and the local Alfvén velocity, creating free energy for many different instabilities. The system is most of the time marginally stable with respect to the parallel magnetosonic, oblique Alfvén, proton cyclotron, and parallel fire hose instabilities, which determine the system evolution, counteracting some effects of the expansion and interacting with each other. Nonlinear evolution of these instabilities leads to large modifications of the proton velocity distribution function. The beam and core protons are slowed with respect to each other and heated, and at later stages of the evolution the two populations are no longer clearly distinguishable. On the macroscopic level, the instabilities cause large departures from the double adiabatic prediction, leading to an efficient isotropization of the effective proton temperatures, in agreement with Helios observations.

  5. Some fast elliptic solvers on parallel architectures and their complexities

    NASA Technical Reports Server (NTRS)

    Gallopoulos, E.; Saad, Y.

    1989-01-01

    The discretization of separable elliptic partial differential equations leads to linear systems with special block tridiagonal matrices. Several methods are known to solve these systems, the most general of which is the Block Cyclic Reduction (BCR) algorithm, which handles equations with nonconstant coefficients. A method was recently proposed to parallelize and vectorize BCR. In this paper, the mapping of BCR on distributed memory architectures is discussed, and its complexity is compared with that of other approaches, including the Alternating-Direction method. A fast parallel solver is also described, based on an explicit formula for the solution, which has parallel computational complexity lower than that of parallel BCR.

  6. Some fast elliptic solvers on parallel architectures and their complexities

    NASA Technical Reports Server (NTRS)

    Gallopoulos, E.; Saad, Youcef

    1989-01-01

    The discretization of separable elliptic partial differential equations leads to linear systems with special block tridiagonal matrices. Several methods are known to solve these systems, the most general of which is the Block Cyclic Reduction (BCR) algorithm, which handles equations with nonconstant coefficients. A method was recently proposed to parallelize and vectorize BCR. Here, the mapping of BCR on distributed memory architectures is discussed, and its complexity is compared with that of other approaches, including the Alternating-Direction method. A fast parallel solver is also described, based on an explicit formula for the solution, which has parallel computational complexity lower than that of parallel BCR.
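
The two records above describe block cyclic reduction but do not include the algorithm itself. As background, the cyclic-reduction idea can be sketched for the scalar tridiagonal case; this is a minimal illustration, not the paper's block BCR. Each reduction level eliminates every other remaining unknown, and the inner loops at each level are mutually independent, which is exactly the work a parallel machine would distribute:

```python
import numpy as np

def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by cyclic reduction (scalar case).

    a: sub-diagonal, b: diagonal, c: super-diagonal, d: right-hand side,
    all of length n with a[0] = c[-1] = 0 (no neighbors outside the grid).
    Requires n = 2**k - 1 unknowns.
    """
    a, b, c, d = (np.array(v, dtype=float) for v in (a, b, c, d))
    n = len(b)
    levels = int(np.log2(n + 1))
    # Forward reduction: each level eliminates every other remaining unknown.
    for lev in range(levels - 1):
        h, step = 2 ** lev, 2 ** (lev + 1)
        for i in range(step - 1, n, step):   # these updates are independent
            lo, hi = i - h, i + h
            alpha, beta = -a[i] / b[lo], -c[i] / b[hi]
            b[i] += alpha * c[lo] + beta * a[hi]
            d[i] += alpha * d[lo] + beta * d[hi]
            a[i], c[i] = alpha * a[lo], beta * c[hi]
    # Back substitution: fill in the eliminated unknowns level by level.
    x = np.zeros(n)
    mid = n // 2
    x[mid] = d[mid] / b[mid]
    for lev in range(levels - 2, -1, -1):
        h, step = 2 ** lev, 2 ** (lev + 1)
        for i in range(h - 1, n, step):      # these solves are independent
            s = d[i]
            if i - h >= 0:
                s -= a[i] * x[i - h]
            if i + h < n:
                s -= c[i] * x[i + h]
            x[i] = s / b[i]
    return x
```

With log2(n+1) levels and all work within a level independent, the parallel depth is O(log n), which is the property the block variants in these papers exploit.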

  7. Research on Parallel Three-Phase PWM Converters Based on RTDS

    NASA Astrophysics Data System (ADS)

    Xia, Yan; Zou, Jianxiao; Li, Kai; Liu, Jingbo; Tian, Jun

    2018-01-01

    Parallel operation of converters can increase the capacity of a system, but it may lead to a zero-sequence circulating current, so controlling the circulating current is an important goal in the design of parallel inverters. In this paper, the Real Time Digital Simulator (RTDS) is used to model the parallel converter system in real time and to study circulating-current suppression. The equivalent model of two parallel converters and the zero-sequence circulating current (ZSCC) is established and analyzed, and a strategy using variable zero-vector control is proposed to suppress the circulating current. For two parallel modular converters, a hardware-in-the-loop (HIL) study based on RTDS and a practical experiment were carried out; the results show that the proposed control strategy is feasible and effective.

  8. The effect of earthquake on architecture geometry with non-parallel system irregularity configuration

    NASA Astrophysics Data System (ADS)

    Teddy, Livian; Hardiman, Gagoek; Nuroji; Tudjono, Sri

    2017-12-01

    Indonesia is an area prone to earthquakes, which may cause casualties and damage to buildings. Fatalities and injuries are largely caused not by the earthquake itself but by building collapse. The collapse of a building results from its behaviour under the earthquake, which depends on many factors, such as architectural design, the geometric configuration of structural elements in horizontal and vertical planes, earthquake zone, geographical location (distance to the earthquake center), soil type, material quality, and construction quality. One of the geometric configurations that may lead to the collapse of a building is the irregular configuration of a non-parallel system. In accordance with FEMA-451B, a non-parallel system irregularity exists if the vertical lateral-force-resisting elements are neither parallel nor symmetric with respect to the main orthogonal axes of the earthquake-resisting system. Such a configuration may lead to torsion, diagonal translation, and local damage to buildings. This does not mean that the non-parallel irregular configuration must be avoided in architectural design; however, the designer must know the consequences of earthquake behaviour for buildings with an irregular configuration of a non-parallel system. The present research aims to identify earthquake behaviour in architectural geometry with an irregular configuration of a non-parallel system. The research was quantitative, with a simulation-experimental method. It consisted of five models, for which architectural and structural data were input and analyzed using SAP2000 to determine performance, and ETABS 2015 to determine the eccentricity that occurred. The software output was tabulated, graphed, compared, and analyzed against relevant theories. For strong earthquake zones, designers should avoid buildings that wholly form an irregular configuration of a non-parallel system. If it is inevitable to design a building with parts containing an irregular configuration of a non-parallel system, it should be made more rigid by forming a triangle module and using the formula. Good collaboration is needed between architects and structural engineers in creating earthquake architecture.

  9. Protons and alpha particles in the expanding solar wind: Hybrid simulations

    NASA Astrophysics Data System (ADS)

    Hellinger, Petr; Trávníček, Pavel M.

    2013-09-01

    We present results of a two-dimensional hybrid expanding box simulation of a plasma system with three ion populations, beam and core protons and alpha particles (plus fluid electrons), drifting with respect to each other. The expansion with a strictly radial magnetic field leads to a decrease of the ion perpendicular-to-parallel temperature ratios, as well as to an increase of the ratio between the ion relative velocities and the local Alfvén velocity, creating free energy for many different instabilities. The system is most of the time marginally stable with respect to kinetic instabilities driven mainly by the ion relative velocities; these instabilities determine the system evolution, counteracting some effects of the expansion. Nonlinear evolution of these instabilities leads to large modifications of the ion velocity distribution functions. The beam protons and alpha particles are decelerated with respect to the core protons, and all the populations are cooled in the parallel direction and heated in the perpendicular one. On the macroscopic level, the kinetic instabilities cause large departures of the system evolution from the double adiabatic prediction and lead to perpendicular heating and parallel cooling rates comparable to the heating rates estimated from the Helios observations.

  10. Parallel O(log n) algorithms for open- and closed-chain rigid multibody systems based on a new mass matrix factorization technique

    NASA Technical Reports Server (NTRS)

    Fijany, Amir

    1993-01-01

    In this paper, parallel O(log n) algorithms for the computation of rigid multibody dynamics are developed. These parallel algorithms are derived by parallelization of new O(n) algorithms for the problem. The underlying feature of these O(n) algorithms is a drastically different strategy for decomposition of the interbody force, which leads to a new factorization of the mass matrix M. Specifically, it is shown that a factorization of the inverse of the mass matrix in the form of a Schur complement can be derived as M^-1 = C - B^* A^-1 B, wherein C, A, and B are block tridiagonal matrices. The new O(n) algorithm is then derived as a recursive implementation of this factorization of M^-1. For closed-chain systems, similar factorizations and O(n) algorithms for computation of the operational space mass matrix Lambda and its inverse Lambda^-1 are also derived. It is shown that these O(n) algorithms are strictly parallel, that is, they are less efficient than other algorithms for serial computation of the problem. To our knowledge, however, they are the only known algorithms that can be parallelized and that lead to both time- and processor-optimal parallel algorithms for the problem, i.e., parallel O(log n) algorithms with O(n) processors. The developed parallel algorithms, in addition to their theoretical significance, are also practical from an implementation point of view due to their simple architectural requirements.
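
The Schur-complement structure behind this factorization is easiest to see in its generic block form. The sketch below is not the paper's multibody-specific C, A, and B; it only verifies numerically the standard identity that, for a symmetric positive definite block matrix S = [[A, B], [B^T, D]], the bottom-right block of S^-1 equals the inverse of the Schur complement D - B^T A^-1 B:

```python
import numpy as np

# Generic numerical check of the Schur-complement block-inverse identity
# (illustrative only; the paper's block-tridiagonal C, A, B are not used).
rng = np.random.default_rng(42)
k = 3
R = rng.standard_normal((2 * k, 2 * k))
S = R @ R.T + 2 * k * np.eye(2 * k)      # symmetric positive definite

A, B = S[:k, :k], S[:k, k:]
D = S[k:, k:]                            # S = [[A, B], [B.T, D]]

# Bottom-right block of S^{-1} equals the inverse of the Schur complement.
schur = D - B.T @ np.linalg.inv(A) @ B
bottom_right = np.linalg.inv(S)[k:, k:]
print(np.allclose(bottom_right, np.linalg.inv(schur)))  # True
```

In the paper's setting the blocks are themselves block tridiagonal, so the recursion over this identity yields the O(n) serial and O(log n) parallel algorithms.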

  11. Distributed computing feasibility in a non-dedicated homogeneous distributed system

    NASA Technical Reports Server (NTRS)

    Leutenegger, Scott T.; Sun, Xian-He

    1993-01-01

    The low cost and availability of clusters of workstations have led researchers to re-explore distributed computing using independent workstations. This approach may provide better cost/performance than tightly coupled multiprocessors. In practice, this approach often utilizes wasted cycles to run parallel jobs. The feasibility of such a non-dedicated parallel processing environment, assuming workstation processes have preemptive priority over parallel tasks, is addressed. An analytical model is developed to predict parallel job response times. Our model provides insight into how significantly workstation owner interference degrades parallel program performance. A new term, task ratio, which relates the parallel task demand to the mean service demand of nonparallel workstation processes, is introduced. We propose that the task ratio is a useful metric for determining how large the demand of a parallel application must be in order to make efficient use of a non-dedicated distributed system.

  12. Revisiting Parallel Cyclic Reduction and Parallel Prefix-Based Algorithms for Block Tridiagonal System of Equations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seal, Sudip K; Perumalla, Kalyan S; Hirshman, Steven Paul

    2013-01-01

    Simulations that require solutions of block tridiagonal systems of equations rely on fast parallel solvers for runtime efficiency. Leading parallel solvers that are highly effective for general systems of equations, dense or sparse, are limited in scalability when applied to block tridiagonal systems. This paper presents scalability results as well as detailed analyses of two parallel solvers that exploit the special structure of block tridiagonal matrices to deliver superior performance, often by orders of magnitude. A rigorous analysis of their relative parallel runtimes is shown to reveal the existence of a critical block size that separates the parameter space spanned by the number of block rows, the block size and the processor count into distinct regions that favor one or the other of the two solvers. Dependence of this critical block size on the above parameters, as well as on machine-specific constants, is established. These formal insights are supported by empirical results on up to 2,048 cores of a Cray XT4 system. To the best of our knowledge, this is the highest reported scalability for parallel block tridiagonal solvers to date.
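
The "parallel prefix" approach named in the title rests on the fact that a linear recurrence can be written as a scan with an associative operator. As a stand-in for the paper's block version, the scalar first-order case is sketched below; the scan is applied sequentially here only to exhibit the operator, whereas a real implementation would evaluate it in O(log n) parallel steps precisely because the operator is associative:

```python
import numpy as np

def combine(p, q):
    """Associative operator for the recurrence x[i] = a[i]*x[i-1] + d[i].

    Each element is a pair (a, d) representing the affine map x -> a*x + d;
    combine composes two such maps (q applied after p).
    """
    return (q[0] * p[0], q[0] * p[1] + q[1])

def recurrence_by_scan(a, d):
    """Solve x[i] = a[i]*x[i-1] + d[i] with x[-1] = 0 via an inclusive scan.

    Evaluated sequentially here; associativity of `combine` is what lets a
    parallel-prefix implementation regroup the products into a log-depth tree.
    """
    acc = (a[0], d[0])
    out = [acc[1]]
    for pair in zip(a[1:], d[1:]):
        acc = combine(acc, pair)
        out.append(acc[1])
    return np.array(out)
```

In the block tridiagonal setting the pairs become small matrices, so the same regrouping applies with matrix products in place of scalar ones.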

  13. Multiple asynchronous stimulus- and task-dependent hierarchies (STDH) within the visual brain's parallel processing systems.

    PubMed

    Zeki, Semir

    2016-10-01

    Results from a variety of sources, some many years old, lead ineluctably to a re-appraisal of the twin strategies of hierarchical and parallel processing used by the brain to construct an image of the visual world. Contrary to common supposition, there are at least three 'feed-forward' anatomical hierarchies that reach the primary visual cortex (V1) and the specialized visual areas outside it, in parallel. These anatomical hierarchies do not conform to the temporal order with which visual signals reach the specialized visual areas through V1. Furthermore, neither the anatomical hierarchies nor the temporal order of activation through V1 predict the perceptual hierarchies. The latter show that we see (and become aware of) different visual attributes at different times, with colour leading form (orientation) and directional visual motion, even though signals from fast-moving, high-contrast stimuli are among the earliest to reach the visual cortex (area V5). Parallel processing, on the other hand, is much more ubiquitous than commonly supposed, but is subject to a barely noticed yet fundamental aspect of brain operations, namely that different parallel systems operate asynchronously with respect to each other and reach perceptual endpoints at different times. This re-assessment leads to the conclusion that the visual brain is constituted of multiple, parallel and asynchronously operating task- and stimulus-dependent hierarchies (STDH); which of these parallel anatomical hierarchies has temporal and perceptual precedence at any given moment is stimulus- and task-related, and dependent on the visual brain's ability to undertake multiple operations asynchronously. © 2016 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.

  14. PEM-PCA: a parallel expectation-maximization PCA face recognition architecture.

    PubMed

    Rujirakul, Kanokmon; So-In, Chakchai; Arnonkijpanich, Banchar

    2014-01-01

    Principal component analysis (PCA) has traditionally been used as one of the feature extraction techniques in face recognition systems, yielding high accuracy while requiring a small number of features. However, the covariance matrix and eigenvalue decomposition stages cause high computational complexity, especially for a large database. Thus, this research presents an alternative approach utilizing an expectation-maximization algorithm to reduce the determinant matrix manipulation, resulting in a reduction of those stages' complexity. To improve the computational time, a novel parallel architecture was employed to utilize the benefits of parallelization of matrix computation during the feature extraction and classification stages, including parallel preprocessing and their combinations, in a so-called Parallel Expectation-Maximization PCA (PEM-PCA) architecture. Compared with traditional PCA and its derivatives, the results indicate lower complexity with an insignificant difference in recognition precision, leading to high-speed face recognition systems with speed-ups of over nine and three times relative to PCA and parallel PCA, respectively.

  15. Second Evaluation of Job Queuing/Scheduling Software. Phase 1

    NASA Technical Reports Server (NTRS)

    Jones, James Patton; Brickell, Cristy; Chancellor, Marisa (Technical Monitor)

    1997-01-01

    The recent proliferation of high-performance workstations and the increased reliability of parallel systems have illustrated the need for robust job management systems to support parallel applications. To address this issue, NAS compiled a requirements checklist for job queuing/scheduling software. Next, NAS evaluated the leading job management system (JMS) software packages against the checklist. A year has now elapsed since the first comparison was published, and NAS has repeated the evaluation. This report describes this second evaluation and presents the results of Phase 1: Capabilities versus Requirements. We show that JMS support for running parallel applications on clusters of workstations and parallel systems is still lacking; however, definite progress has been made by the vendors to correct the deficiencies. This report is supplemented by a WWW interface to the data collected, to aid other sites in extracting the evaluation information on specific requirements of interest.

  16. Thread concept for automatic task parallelization in image analysis

    NASA Astrophysics Data System (ADS)

    Lueckenhaus, Maximilian; Eckstein, Wolfgang

    1998-09-01

    Parallel processing of image analysis tasks is an essential method to speed up image processing and helps to exploit the full capacity of distributed systems. However, writing parallel code is a difficult and time-consuming process and often leads to an architecture-dependent program that has to be re-implemented when the hardware changes. It is therefore highly desirable to perform the parallelization automatically. For this we have developed a special kind of thread concept for image analysis tasks. Threads derived from one subtask may share objects and run in the same context, but may follow different threads of execution and work on different data in parallel. In this paper we describe the basics of our thread concept and show how it can be used as the basis of an automatic task parallelization to speed up image processing. We further illustrate the design and implementation of an agent-based system that uses image analysis threads to generate and process parallel programs, taking the available hardware into account. Tests with our system prototype show that the thread concept, combined with the agent paradigm, is suitable for speeding up image processing by automatic parallelization of image analysis tasks.

  17. Seismic analysis of parallel structures coupled by lead extrusion dampers

    NASA Astrophysics Data System (ADS)

    Patel, C. C.

    2017-06-01

    In this paper, the response behavior of two parallel structures coupled by lead extrusion dampers (LEDs) under various earthquake ground motion excitations is investigated. The equation of motion for the two parallel, multi-degree-of-freedom (MDOF) structures connected by LEDs is formulated. To explore the viability of LEDs in controlling the responses, namely displacement, acceleration and shear force, of the coupled parallel structures, the numerical study is done in two parts: (1) two parallel MDOF structures connected with LEDs having the same damper damping in all the dampers, and (2) two parallel MDOF structures connected with LEDs having different damper damping. A parametric study is conducted to investigate the optimum damping of the dampers. Moreover, to limit the cost of the dampers, the study is repeated with only 50% of the total dampers, placed at optimal locations instead of at every floor level. Results show that when LEDs connect parallel structures of different fundamental frequencies, the earthquake-induced responses of either structure can be effectively reduced. Further, it is not necessary to connect the two structures at all floors; fewer dampers at appropriate locations can significantly reduce the earthquake response of the coupled system, thus reducing the cost of the dampers significantly.

  18. Parallel/Vector Integration Methods for Dynamical Astronomy

    NASA Astrophysics Data System (ADS)

    Fukushima, T.

    Progress in parallel/vector computers has driven us to develop suitable numerical integrators that utilize their computational power to the full extent while remaining independent of the size of the system to be integrated. Unfortunately, the parallel versions of Runge-Kutta type integrators are known to be not so efficient. Recently we developed a parallel version of the extrapolation method (Ito and Fukushima 1997), which allows variable timesteps and still gives an acceleration factor of 3-4 for general problems, while the vector-mode usage of the Picard-Chebyshev method (Fukushima 1997a, 1997b) leads to an acceleration factor of order 1000 for smooth problems such as planetary/satellite orbit integration. The success of the multiple-correction PECE mode of the time-symmetric implicit Hermitian integrator (Kokubo 1998) seems to enlighten Milankar's so-called "pipelined predictor corrector method", which is expected to lead to an acceleration factor of 3-4. We review these directions and discuss future prospects.

  19. Parallel separations using capillary electrophoresis on a multilane microchip with multiplexed laser-induced fluorescence detection.

    PubMed

    Nikcevic, Irena; Piruska, Aigars; Wehmeyer, Kenneth R; Seliskar, Carl J; Limbach, Patrick A; Heineman, William R

    2010-08-01

    Parallel separations using CE on a multilane microchip with multiplexed LIF detection is demonstrated. The detection system was developed to simultaneously record data on all channels using an expanded laser beam for excitation, a camera lens to capture emission, and a CCD camera for detection. The detection system enables monitoring of each channel continuously and distinguishing individual lanes without significant crosstalk between adjacent lanes. Multiple analytes can be determined in parallel lanes within a single microchip in a single run, leading to increased sample throughput. The pK(a) determination of small molecule analytes is demonstrated with the multilane microchip.

  20. Parallel separations using capillary electrophoresis on a multilane microchip with multiplexed laser induced fluorescence detection

    PubMed Central

    Nikcevic, Irena; Piruska, Aigars; Wehmeyer, Kenneth R.; Seliskar, Carl J.; Limbach, Patrick A.; Heineman, William R.

    2010-01-01

    Parallel separations using capillary electrophoresis on a multilane microchip with multiplexed laser induced fluorescence detection is demonstrated. The detection system was developed to simultaneously record data on all channels using an expanded laser beam for excitation, a camera lens to capture emission, and a CCD camera for detection. The detection system enables monitoring of each channel continuously and distinguishing individual lanes without significant crosstalk between adjacent lanes. Multiple analytes can be analyzed on parallel lanes within a single microchip in a single run, leading to increased sample throughput. The pKa determination of small molecule analytes is demonstrated with the multilane microchip. PMID:20737446

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yoon, Y; Park, M; Kim, H

    Purpose: This study aims to identify the feasibility of a novel cesium-iodide (CsI)-based flat-panel detector (FPD) for removing scatter radiation in diagnostic radiology. Methods: The indirect FPD comprises three layers: a substrate, scintillation, and thin-film-transistor (TFT) layer. The TFT layer has a matrix structure with pixels. There are ineffective dimensions on the TFT layer, such as the voltage and data lines; therefore, we devised a new FPD system having net-like lead in the substrate layer, matching the ineffective area, to block the scatter radiation so that only primary X-rays could reach the effective dimension. To evaluate the performance of this new FPD system, we conducted a Monte Carlo simulation using MCNPX 2.6.0 software. Scatter fractions (SFs) were acquired using no grid, a parallel grid (8:1 grid ratio), and the new system, and the performances were compared. Two systems having different thicknesses of lead in the substrate layer (10 and 20 μm) were simulated. Additionally, we examined the effects of different pixel sizes (153×153 and 163×163 μm) on the image quality, while keeping the effective area of the pixels constant (143×143 μm). Results: In the case of 10 μm lead, the SFs of the new system (∼11%) were lower than those of the other systems (∼27% with no grid, ∼16% with the parallel grid) at 40 kV. However, as the tube voltage increased, the SF of the new system (∼19%) became higher than that of the parallel grid (∼18%) at 120 kV. In the case of 20 μm lead, the SFs of the new system were lower than those of the other systems over the whole range of tube voltage (40-120 kV). Conclusion: The novel CsI-based FPD system for removing scatter radiation is feasible for improving the image contrast but must be optimized with respect to the lead thickness, considering the system's purposes and the ranges of the tube voltage in diagnostic radiology. This study was supported by a grant (K1422651) from the Institute of Health Science, Korea University.

  2. YAPPA: a Compiler-Based Parallelization Framework for Irregular Applications on MPSoCs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lovergine, Silvia; Tumeo, Antonino; Villa, Oreste

    Modern embedded systems include hundreds of cores. Because of the difficulty of providing a fast, coherent memory architecture, these systems usually rely on non-coherent, non-uniform memory architectures with private memories for each core. However, programming these systems poses significant challenges. The developer must extract large amounts of parallelism, while orchestrating communication among cores to optimize application performance. These issues become even more significant with irregular applications, which present data sets that are difficult to partition, unpredictable memory accesses, unbalanced control flow, and fine-grained communication. Hand-optimizing every single aspect is hard and time-consuming, and it often does not lead to the expected performance. There is a growing gap between such complex and highly parallel architectures and the high-level languages used to describe the specification, which were designed for simpler systems and do not consider these new issues. In this paper we introduce YAPPA (Yet Another Parallel Programming Approach), a compilation framework for the automatic parallelization of irregular applications on modern MPSoCs, based on LLVM. We start by considering an efficient parallel programming approach for irregular applications on distributed memory systems. We then propose a set of transformations that can reduce the development and optimization effort. The results of our initial prototype confirm the correctness of the proposed approach.

  3. Cavity-photon contribution to the effective interaction of electrons in parallel quantum dots

    NASA Astrophysics Data System (ADS)

    Gudmundsson, Vidar; Sitek, Anna; Abdullah, Nzar Rauf; Tang, Chi-Shung; Manolescu, Andrei

    2016-05-01

    A single cavity photon mode is expected to modify the Coulomb interaction of an electron system in the cavity. Here we investigate this phenomenon in a parallel double quantum dot system. We explore properties of the closed system and of the system after it has been opened up for electron transport. We show how results for both cases support the idea that the effective electron-electron interaction becomes more repulsive in the presence of a cavity photon field. This can be understood in terms of the cavity photons dressing the polarization terms in the effective mutual electron interaction, leading to nontrivial delocalization or polarization of the charge in the double parallel dot potential. In addition, we find that the effective repulsion of the electrons can be reduced by quadrupolar collective oscillations excited by an external classical dipole electric field.

  4. Low bias negative differential conductance and reversal of current in coupled quantum dots in different topological configurations

    NASA Astrophysics Data System (ADS)

    Devi, Sushila; Brogi, B. B.; Ahluwalia, P. K.; Chand, S.

    2018-06-01

    Electronic transport through an asymmetric parallel coupled quantum dot system hybridized between normal leads has been investigated theoretically in the Coulomb blockade regime using the Non-Equilibrium Green Function formalism. A new decoupling scheme proposed by Rabani and his co-workers has been adopted to close the chain of higher-order Green's functions appearing in the equations of motion. For the resonant tunneling case, calculations of current and differential conductance are presented during the transition of the coupled quantum dot system from a series to a symmetric parallel configuration. It is found that during this transition the current and differential conductance of the system increase. Furthermore, clear signatures of negative differential conductance and negative current appear in the series case, both of which disappear when the topology of the system is tuned to the asymmetric parallel configuration.

  5. Architectures for reasoning in parallel

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.

    1989-01-01

    The research conducted has dealt with rule-based expert systems. Algorithms that may lead to their effective parallelization were investigated, covering both the forward- and backward-chained control paradigms. The best computer architecture for the developed and investigated algorithms has also been researched. Two experimental vehicles were developed to facilitate this research: Backpac, a parallel backward-chained rule-based reasoning system, and Datapac, a parallel forward-chained rule-based reasoning system. Both systems have been written in Multilisp, a version of Lisp which contains the parallel construct future. Applying future to an expression causes its evaluation to become a task that runs in parallel with the spawning task. Additionally, Backpac and Datapac have been run on several disparate parallel processors: an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32-processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines, while the Multimax hangs all its processors off a common bus. All are shared-memory machines, but they differ in how the memory is shared and where the shared memory resides. The main results of the investigations come from experiments on the 10-processor Encore and on the Concert with partitions of 32 or fewer processors. Additionally, experiments have been run with a stripped-down version of EMYCIN.
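    The future construct described above has close analogues in modern languages. As a rough illustration (a Python sketch of the idea, not Multilisp itself), concurrent.futures lets a call be spawned as a task that runs in parallel with its parent, with the value forced only when it is touched:

```python
from concurrent.futures import ThreadPoolExecutor

# Rough Python analogue of Multilisp's `future`: submitting a call spawns
# it as a task that runs in parallel with the caller; .result() forces
# the value, blocking only if the task has not yet finished.
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

with ThreadPoolExecutor() as executor:
    f = executor.submit(fib, 20)   # like (future (fib 20)) in Multilisp
    other_work = sum(range(100))   # the spawning task proceeds concurrently
    result = f.result()            # touching the future forces the value

print(result, other_work)  # 6765 4950
```

    The analogy is loose (Multilisp futures are implicit on touch, Python's require an explicit .result() call), but the spawn/force structure is the same.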

  6. Understanding decimal proportions: discrete representations, parallel access, and privileged processing of zero.

    PubMed

    Varma, Sashank; Karl, Stacy R

    2013-05-01

    Much of the research on mathematical cognition has focused on the numbers 1, 2, 3, 4, 5, 6, 7, 8, and 9, with considerably less attention paid to more abstract number classes. The current research investigated how people understand decimal proportions--rational numbers between 0 and 1 expressed in the place-value symbol system. The results demonstrate that proportions are represented as discrete structures and processed in parallel. There was a semantic interference effect: When understanding a proportion expression (e.g., "0.29"), both the correct proportion referent (e.g., 0.29) and the incorrect natural number referent (e.g., 29) corresponding to the visually similar natural number expression (e.g., "29") are accessed in parallel, and when these referents lead to conflicting judgments, performance slows. There was also a syntactic interference effect, generalizing the unit-decade compatibility effect for natural numbers: When comparing two proportions, their tenths and hundredths components are processed in parallel, and when the different components lead to conflicting judgments, performance slows. The results also reveal that zero decimals--proportions ending in zero--serve multiple cognitive functions, including eliminating semantic interference and speeding processing. The current research also extends the distance, semantic congruence, and SNARC effects from natural numbers to decimal proportions. These findings inform how people understand the place-value symbol system, and the mental implementation of mathematical symbol systems more generally. Copyright © 2013 Elsevier Inc. All rights reserved.

  7. Quantum statistics and squeezing for a microwave-driven interacting magnon system.

    PubMed

    Haghshenasfard, Zahra; Cottam, Michael G

    2017-02-01

    Theoretical studies are reported for the statistical properties of a microwave-driven interacting magnon system. Both the magnetic dipole-dipole and the exchange interactions are included and the theory is developed for the case of parallel pumping allowing for the inclusion of the nonlinear processes due to the four-magnon interactions. The method of second quantization is used to transform the total Hamiltonian from spin operators to boson creation and annihilation operators. By using the coherent magnon state representation we have studied the magnon occupation number and the statistical behavior of the system. In particular, it is shown that the nonlinearities introduced by the parallel pumping field and the four-magnon interactions lead to non-classical quantum statistical properties of the system, such as magnon squeezing. Also control of the collapse-and-revival phenomena for the time evolution of the average magnon number is demonstrated by varying the parallel pumping amplitude and the four-magnon coupling.

  8. Deformation along the leading edge of the Maiella thrust sheet in central Italy

    NASA Astrophysics Data System (ADS)

    Aydin, Atilla; Antonellini, Marco; Tondi, Emanuele; Agosta, Fabrizio

    2010-09-01

    The eastern forelimb of the Maiella anticline above the leading edge of the underlying thrust displays a complex system of fractures, faults and a series of kink bands in the Cretaceous platform carbonates. The kink bands have steep limbs, display top-to-the-east shear, parallel to the overall transport direction, and are brecciated and faulted. A system of pervasive normal faults, trending sub-parallel to the strike of the mechanical layers, accommodates local extension generated by flexural slip. Two sets of strike-slip faults exist: one is left-lateral at a high angle to the main Maiella thrust; the other is right-lateral, intersecting the first set at an acute angle. The normal and strike-slip faults were formed by shearing across bed-parallel, strike-, and dip-parallel pressure solution seams and associated splays; the thrust faults follow the tilted mechanical layers along the steeper limb of the kink bands. The three pervasive, mutually-orthogonal pressure solution seams are pre-tilting. One set of low-angle normal faults, the oldest set in the area, is also pre-tilting. All other fault/fold structures appear to show signs of overlapping periods of activity accounting for the complex tri-shear-like deformation that developed as the front evolved during the Oligocene-Pliocene Apennine orogeny.

  9. Robust Synchronization Models for Presentation System Using SMIL-Driven Approach

    ERIC Educational Resources Information Center

    Asnawi, Rustam; Ahmad, Wan Fatimah Wan; Rambli, Dayang Rohaya Awang

    2013-01-01

    Current common Presentation System (PS) models are slide based oriented and lack synchronization analysis either with temporal or spatial constraints. Such models, in fact, tend to lead to synchronization problems, particularly on parallel synchronization with spatial constraints between multimedia element presentations. However, parallel…

  10. The force on the flex: Global parallelism and portability

    NASA Technical Reports Server (NTRS)

    Jordan, H. F.

    1986-01-01

    A parallel programming methodology, called the force, supports the construction of programs to be executed in parallel by an unspecified, but potentially large, number of processes. The methodology was originally developed on a pipelined, shared-memory multiprocessor, the Denelcor HEP, and embodies the primitive operations of the force in a set of macros which expand into multiprocessor Fortran code. A small set of primitives is sufficient to write large parallel programs, and the system has been used to produce 10,000-line programs in computational fluid dynamics. The level of complexity of the force primitives is intermediate: high enough to mask detailed architectural differences between multiprocessors, but low enough to give the user control over performance. The system is being ported to a medium-scale multiprocessor, the Flex/32, which is a 20-processor system with a mixture of shared and local memory. Memory organization and the type of processor synchronization supported by the hardware on the two machines lead to some differences in efficient implementations of the force primitives, but the user interface remains the same. An initial implementation was done by retargeting the macros to Flexible Computer Corporation's ConCurrent C language. Subsequently, the macros were modified to produce directly the system calls which form the basis for ConCurrent C. The implementation of the Fortran-based system is in step with Flexible Computer Corporation's implementation of a Fortran system in the parallel environment.

  11. Parallel multigrid smoothing: polynomial versus Gauss-Seidel

    NASA Astrophysics Data System (ADS)

    Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray

    2003-07-01

    Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines.
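    The computational appeal of polynomial smoothing can be seen in a small sketch. The following is our minimal illustration (not the authors' code) of a Chebyshev smoother for a symmetric positive definite system: each sweep consists only of matrix-vector products, which parallelize naturally, in contrast to the sequential dependencies of Gauss-Seidel. The eigenvalue bounds lam_min and lam_max are assumed inputs (in a multigrid smoother they would typically bracket only the upper part of the spectrum):

```python
import numpy as np

# Standard Chebyshev iteration for A x = b, SPD A with eigenvalues
# assumed to lie in [lam_min, lam_max]. Only matvecs and vector
# updates appear, so every step is embarrassingly parallel.
def chebyshev_smooth(A, b, x, lam_min, lam_max, steps=4):
    theta = 0.5 * (lam_max + lam_min)   # center of the target interval
    delta = 0.5 * (lam_max - lam_min)   # half-width of the interval
    sigma = theta / delta
    rho = 1.0 / sigma
    r = b - A @ x
    d = r / theta
    for _ in range(steps):
        x = x + d
        r = r - A @ d
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * r
        rho = rho_new
    return x

n = 20
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # 1-D Poisson matrix
b = np.ones(n)
x = chebyshev_smooth(A, b, np.zeros(n), lam_min=0.02, lam_max=4.0, steps=10)
# the residual norm ||b - A x|| shrinks after smoothing
```

    For this small example the bounds cover the whole spectrum, so the iteration converges outright; as a multigrid smoother one would target only the high-frequency end.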

  12. Requirements for implementing real-time control functional modules on a hierarchical parallel pipelined system

    NASA Technical Reports Server (NTRS)

    Wheatley, Thomas E.; Michaloski, John L.; Lumia, Ronald

    1989-01-01

    Analysis of a robot control system leads to a broad range of processing requirements. One fundamental requirement of a robot control system is the necessity of a microcomputer system in order to provide sufficient processing capability.The use of multiple processors in a parallel architecture is beneficial for a number of reasons, including better cost performance, modular growth, increased reliability through replication, and flexibility for testing alternate control strategies via different partitioning. A survey of the progression from low level control synchronizing primitives to higher level communication tools is presented. The system communication and control mechanisms of existing robot control systems are compared to the hierarchical control model. The impact of this design methodology on the current robot control systems is explored.

  13. Parallelization Issues and Particle-In-Cell Codes.

    NASA Astrophysics Data System (ADS)

    Elster, Anne Cathrine

    1994-01-01

    "Everything should be made as simple as possible, but not simpler." Albert Einstein. The field of parallel scientific computing has concentrated on parallelization of individual modules such as matrix solvers and factorizers. However, many applications involve several interacting modules. Our analyses of a particle-in-cell code modeling charged particles in an electric field show that these accompanying dependencies affect data partitioning and lead to new parallelization strategies concerning processor, memory and cache utilization. Our test-bed, a KSR1, is a distributed-memory machine with a globally shared addressing space. However, most of the new methods presented hold generally for hierarchical and/or distributed-memory systems. We introduce a novel approach that uses dual pointers on the local particle arrays to keep the particle locations automatically partially sorted. Complexity and performance analyses, with accompanying KSR benchmarks, have been included for both this scheme and for the traditional replicated-grids approach. The latter approach maintains load balance with respect to particles. However, our results demonstrate that it fails to scale properly for problems with large grids (say, greater than 128-by-128) running on as few as 15 KSR nodes, since the extra storage and computation time associated with adding the grid copies become significant. Our grid partitioning scheme, although harder to implement, does not need to replicate the whole grid. Consequently, it scales well for large problems on highly parallel systems. It may, however, require load-balancing schemes for non-uniform particle distributions. Our dual-pointer approach may facilitate this through dynamically partitioned grids. We also introduce hierarchical data structures that store neighboring grid points within the same cache line by reordering the grid indexing. This alignment produces a 25% savings in cache hits for a 4-by-4 cache.
A consideration of the input data's effect on the simulation may lead to further improvements. For example, in the case of mean particle drift, it is often advantageous to partition the grid primarily along the direction of the drift. The particle-in-cell codes for this study were tested using physical parameters, which lead to predictable phenomena including plasma oscillations and two-stream instabilities. An overview of the most central references related to parallel particle codes is also given.
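    As a rough illustration of the data-layout concern discussed above (our sketch, not the thesis code), the following groups particles by grid cell so that a grid-partitioned parallel PIC code can hand each processor a contiguous slice of the particle array:

```python
import numpy as np

def sort_particles_by_cell(positions, n_cells, domain_length):
    # Map each particle to its grid cell, then reorder the particle array
    # so that particles in the same cell are contiguous in memory.
    cell = np.floor(positions / domain_length * n_cells).astype(int)
    cell = np.clip(cell, 0, n_cells - 1)
    order = np.argsort(cell, kind="stable")
    counts = np.bincount(cell, minlength=n_cells)
    offsets = np.concatenate(([0], np.cumsum(counts)))  # per-cell slice bounds
    return positions[order], offsets

pos = np.array([0.9, 0.1, 0.5, 0.05, 0.55])
sorted_pos, offsets = sort_particles_by_cell(pos, n_cells=2, domain_length=1.0)
# cell 0 owns sorted_pos[offsets[0]:offsets[1]], cell 1 owns the rest
```

    A full re-sort every step is the crude version of this idea; the dual-pointer scheme in the thesis avoids that cost by keeping the arrays only partially sorted as particles drift between cells.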

  14. Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak

    1999-01-01

    The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2000, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2000, and a multi-threaded version on the newly-released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.

  15. Cosmic space and Pauli exclusion principle in a system of M0-branes

    NASA Astrophysics Data System (ADS)

    Capozziello, Salvatore; Saridakis, Emmanuel N.; Bamba, Kazuharu; Sepehri, Alireza; Rahaman, Farook; Ali, Ahmed Farag; Pincak, Richard; Pradhan, Anirudh

    An emergence of cosmic space has been suggested by Padmanabhan [Emergence and expansion of cosmic space as due to the quest for holographic equipartition, arXiv:hep-th/1206.4916], who proposed that the expansion of the universe originates from a difference between the number of degrees of freedom on a holographic surface and the number in the emerged bulk. A natural question that arises is how this proposal would explain the production of fermions and the emergence of the Pauli exclusion principle during the evolution of the universe. We try to address this issue in a system of M0-branes. In this model, there is a high symmetry and the system is composed of M0-branes to which only scalar fields are attached, representing scalar modes of the graviton. When M0-branes join each other and form M1-branes, this symmetry is broken and gauge fields are formed. These M1-branes interact with the anti-M1-branes, and the force between them breaks a symmetry such that the lower and upper parts of these branes are no longer the same. Under these conditions, the gauge fields localized on the M1-branes and the scalars attached to them symmetrically decay into fermions with upper and lower spins, which attach to the upper and lower parts of the M1-branes anti-symmetrically. The curvature produced by the coupling of identical spins has the opposite sign to the curvature produced by non-identical spins, which leads to an attractive force between anti-parallel spins and a repelling force between parallel spins, and hence to an emergence of the Pauli exclusion principle. As M1-branes approach each other, the difference between the curvatures of parallel spins and the curvatures of anti-parallel spins increases, which leads to an inequality between the number of degrees of freedom on the surface and the number in the emerged bulk, and hence to an occurrence of the cosmic expansion.
As M1-branes approach each other, the square of the energy of the system becomes negative and hence tachyonic states arise. To remove these states, the M1-branes compactify, the sign of gravity changes, and anti-gravity emerges, which causes the branes to move away from each other. By joining M1-branes, M3-branes are produced, which are similar to the initial system and oscillate between compacting and opening branches. Our universe is placed on one of these M3-branes, and by changing the difference between the couplings of identical and non-identical spins, it contracts or expands.

  16. Parallel confocal detection of single biomolecules using diffractive optics and integrated detector units.

    PubMed

    Blom, H; Gösch, M

    2004-04-01

    Over the past few years we have witnessed a tremendous surge of interest in so-called array-based miniaturised analytical systems due to their value as extremely powerful tools for high-throughput sequence analysis, drug discovery and development, and diagnostic tests in medicine (see articles in Issue 1). Terminologies that have been used to describe these array-based bioscience systems include (but are not limited to): DNA-chip, microarrays, microchip, biochip, DNA-microarrays and genome chip. Potential technological benefits of introducing these miniaturised analytical systems include improved accuracy, multiplexing, lower sample and reagent consumption, disposability, and decreased analysis times, to mention just a few examples. Among the many alternative principles of detection-analysis (e.g. chemiluminescence, electroluminescence and conductivity), fluorescence-based techniques are widely used, examples being fluorescence resonance energy transfer, fluorescence quenching, fluorescence polarisation, time-resolved fluorescence, and fluorescence fluctuation spectroscopy (see articles in Issue 11). Time-dependent fluctuations of fluorescent biomolecules with different molecular properties, such as molecular weight, translational and rotational diffusion time, colour and lifetime, potentially provide all the kinetic and thermodynamic information required in analysing complex interactions. In this mini-review article, we present recent extensions aimed at implementing parallel laser excitation and parallel fluorescence detection, which can lead to an even further increase in throughput in miniaturised array-based analytical systems. We also report on the development and characterisation of a multiplexing extension that allows multifocal laser excitation together with matched parallel fluorescence detection for parallel confocal dynamical fluorescence fluctuation studies at the single-biomolecule level.

  17. Optical computing using optical flip-flops in Fourier processors: use in matrix multiplication and discrete linear transforms.

    PubMed

    Ando, S; Sekine, S; Mita, M; Katsuo, S

    1989-12-15

    An architecture and algorithms for matrix multiplication using optical flip-flops (OFFs) in optical processors are proposed, based on residue arithmetic. The proposed system is capable of processing all elements of the matrices in parallel, utilizing the information-retrieving ability of optical Fourier processors. The employment of OFFs enables bidirectional data flow, leading to a simpler architecture, and the contribution of residue-to-decimal (or residue-to-binary) conversion to the operation time can be largely reduced by processing all elements in parallel. The calculated operation-time characteristics suggest a promising use of the system in real-time 2-D linear transforms.
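    The residue-arithmetic idea behind the proposed processor can be sketched numerically. The following is our hedged illustration (the moduli and matrix sizes are arbitrary choices, and the optical hardware is of course replaced by ordinary arithmetic): each matrix is reduced modulo several pairwise-coprime moduli, all residue channels are multiplied independently, and the decimal result is recovered only once at the end via the Chinese Remainder Theorem, which is why the conversion burden is amortized over the whole parallel computation:

```python
import numpy as np

# Pairwise-coprime moduli; their product (45045) bounds the representable result.
MODULI = (5, 7, 9, 11, 13)

def rns_matmul(A, B, moduli=MODULI):
    # Each residue channel does an independent (parallelizable) mod-m matmul.
    channels = [(A % m) @ (B % m) % m for m in moduli]
    M = int(np.prod(moduli))
    out = np.zeros_like(channels[0], dtype=np.int64)
    for m, C in zip(moduli, channels):
        Mi = M // m
        # CRT reconstruction: one residue-to-decimal conversion at the end.
        out = (out + C * Mi * pow(Mi, -1, m)) % M
    return out

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# rns_matmul(A, B) == A @ B, as long as every result element stays below 45045
```

    The modular inverse uses Python's three-argument pow (3.8+); in the optical system the channel multiplies would run concurrently rather than in a list comprehension.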

  18. Recovery Act - CAREER: Sustainable Silicon -- Energy-Efficient VLSI Interconnect for Extreme-Scale Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chiang, Patrick

    2014-01-31

    The research goal of this CAREER proposal is to develop energy-efficient VLSI interconnect circuits and systems that will facilitate future massively-parallel, high-performance computing. Extreme-scale computing will exhibit massive parallelism on multiple vertical levels, from thousands of computational units on a single processor to thousands of processors in a single data center. Unfortunately, the energy required to communicate between these units at every level (on-chip, off-chip, off-rack) will be the critical limitation to energy efficiency. Therefore, the PI's career goal is to become a leading researcher in the design of energy-efficient VLSI interconnect for future computing systems.

  19. Characterizing parallel file-access patterns on a large-scale multiprocessor

    NASA Technical Reports Server (NTRS)

    Purakayastha, Apratim; Ellis, Carla Schlatter; Kotz, David; Nieuwejaar, Nils; Best, Michael

    1994-01-01

    Rapid increases in the computational speeds of multiprocessors have not been matched by corresponding performance enhancements in the I/O subsystem. To satisfy the large and growing I/O requirements of some parallel scientific applications, we need parallel file systems that can provide high-bandwidth and high-volume data transfer between the I/O subsystem and thousands of processors. Design of such high-performance parallel file systems depends on a thorough grasp of the expected workload. So far there have been no comprehensive usage studies of multiprocessor file systems. Our CHARISMA project intends to fill this void. The first results from our study involve an iPSC/860 at NASA Ames. This paper presents results from a different platform, the CM-5 at the National Center for Supercomputing Applications. The CHARISMA studies are unique because we collect information about every individual read and write request and about the entire mix of applications running on the machines. The results of our trace analysis lead to recommendations for parallel file system design. First, the file system should support efficient concurrent access to many files and I/O requests from many jobs under varying load conditions. Second, it must efficiently manage large files kept open for long periods. Third, it should expect to see small requests, predominantly sequential access patterns, application-wide synchronous access, no concurrent file-sharing between jobs, appreciable byte and block sharing between processes within jobs, and strong interprocess locality. Finally, the trace data suggest that node-level write caches and collective I/O request interfaces may be useful in certain environments.

  20. Differential Draining of Parallel-Fed Propellant Tanks in Morpheus and Apollo Flight

    NASA Technical Reports Server (NTRS)

    Hurlbert, Eric; Guardado, Hector; Hernandez, Humberto; Desai, Pooja

    2015-01-01

    Parallel-fed propellant tanks are an advantageous configuration for many spacecraft. They allow the center of gravity (cg) to be maintained over the engine(s), as opposed to serial-fed propellant tanks, which shift the cg as propellant is drained from one tank first and then the other. Parallel-fed tanks also allow for tank isolation if that is needed. Parallel tanks and feed systems have been used in several past vehicles, including the Apollo Lunar Module. The design of the feed system connecting the parallel tanks is critical to maintaining balance between the propellant tanks: it must account for and minimize the effect of manufacturing variations that could cause delta-p or mass flow rate differences, which would lead to propellant imbalance. Other sources of differential draining are also discussed. Fortunately, physics provides some self-correcting behaviors that tend to equalize any initial imbalance. Whether active control of the propellant in each tank is required or can be avoided is also an important question to answer. In order to provide flight data on parallel-fed tanks and differential draining for cryogenic propellants (as well as other fluids), a vertical test bed (flying lander) for terrestrial use was employed. The Morpheus vertical test bed is a parallel-fed propellant tank system that uses passive design to keep the propellant tanks balanced; the system is operated in blowdown. The Morpheus vehicle was instrumented with a capacitance level sensor in each propellant tank in order to measure the draining of propellants over 34 tethered and 12 free flights. Morpheus did experience an approximately 20 lbm imbalance in one pair of tanks; the cause of this imbalance is discussed. This paper discusses the analysis, design, flight simulation, vehicle dynamic modeling, and flight testing of the Morpheus parallel-fed propellant system.
The Apollo LEM data is also examined in this summary report of the flight data.
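    The self-correcting behavior mentioned in the abstract can be illustrated with a toy model (ours, not the Morpheus analysis; the coefficient k and the time step are arbitrary): two parallel tanks drain into a common manifold, the fuller tank sees the larger hydrostatic head, so it drains faster and the imbalance shrinks:

```python
import math

# Toy two-tank blowdown model: each tank's outflow scales with the square
# root of its liquid height (hydrostatic head), so the fuller tank drains
# faster and any initial imbalance decays passively.
def drain(h1, h2, k=0.05, dt=0.1, steps=200):
    for _ in range(steps):
        h1 = max(h1 - k * math.sqrt(h1) * dt, 0.0)
        h2 = max(h2 - k * math.sqrt(h2) * dt, 0.0)
    return h1, h2

h1, h2 = drain(1.0, 0.8)
# the imbalance h1 - h2 is smaller than the initial 0.2 and stays positive
```

    Real feed-system imbalance involves ullage pressures, line losses, and manufacturing variation, but the square-root-of-head coupling is the core of the passive equalization the paper relies on.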

  1. Efficacy of lead foil for reducing doses in the head and neck: a simulation study using digital intraoral systems

    PubMed Central

    Silva, A I V; Brasil, D M; Vasconcelos, K F; Haiter Neto, F; Boscolo, F N

    2015-01-01

    Objectives: To assess the efficacy of lead foils in reducing the radiation dose received by different anatomical sites of the head and neck during periapical intraoral examinations performed with digital systems. Methods: Images were acquired in four different manners: phosphor plate (PSP; VistaScan® system; Dürr Dental GmbH, Bissingen, Germany) alone, PSP plus lead foil, complementary metal oxide semiconductor (CMOS; DIGORA® Toto, Soredex®, Tuusula, Finland) alone and CMOS plus lead foil. Radiation dose was measured after a full-mouth periapical series (14 radiographs) using the long-cone paralleling technique. Lithium fluoride (LiF 100) thermoluminescent dosemeters were placed in an anthropomorphic phantom at points corresponding to the tongue, thyroid, crystalline lenses, parotid glands and maxillary sinuses. Results: Dosemeter readings demonstrated the efficacy of adding lead foil to the intraoral digital X-ray systems, reducing organ doses in the selected structures by approximately 32% in the PSP system and 59% in the CMOS system. Conclusions: The use of lead foils associated with digital X-ray sensors is an effective alternative for the protection of different anatomical sites of the head and neck during full-mouth periapical series acquisition. PMID:26084474

  2. High Performance Parallel Computational Nanotechnology

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Craw, James M. (Technical Monitor)

    1995-01-01

    At a recent press conference, NASA Administrator Dan Goldin encouraged NASA Ames Research Center to take a lead role in promoting research and development of advanced, high-performance computer technology, including nanotechnology. Manufacturers of leading-edge microprocessors currently perform large-scale simulations in the design and verification of semiconductor devices and microprocessors. Recently, the need for this intensive simulation and modeling analysis has greatly increased, due in part to the ever-increasing complexity of these devices, as well as the lessons of experiences such as the Pentium fiasco. Simulation, modeling, testing, and validation will be even more important for designing molecular computers because of the complex specification of millions of atoms and thousands of assembly steps, as well as the simulation and modeling needed to ensure reliable, robust and efficient fabrication of the molecular devices. The software for this capability does not exist today, but it can be extrapolated from the software currently used in molecular modeling for other applications: semi-empirical methods, ab initio methods, self-consistent field methods, Hartree-Fock methods, molecular mechanics, and simulation methods for diamondoid structures. Inasmuch as it seems clear that the application of such methods in nanotechnology will require powerful, highly parallel systems, this talk will discuss techniques and issues for performing these types of computations on parallel systems.
We will describe system design issues (memory, I/O, mass storage, operating system requirements, special user interface issues, interconnects, bandwidths, and programming languages) involved in parallel methods for scalable classical, semiclassical, quantum, molecular mechanics, and continuum models; molecular nanotechnology computer-aided designs (NanoCAD) techniques; visualization using virtual reality techniques of structural models and assembly sequences; software required to control mini robotic manipulators for positional control; scalable numerical algorithms for reliability, verifications and testability. There appears no fundamental obstacle to simulating molecular compilers and molecular computers on high performance parallel computers, just as the Boeing 777 was simulated on a computer before manufacturing it.

  3. Quasi-disjoint pentadiagonal matrix systems for the parallelization of compact finite-difference schemes and filters

    NASA Astrophysics Data System (ADS)

    Kim, Jae Wook

    2013-05-01

    This paper proposes a novel systematic approach for the parallelization of pentadiagonal compact finite-difference schemes and filters based on domain decomposition. The proposed approach allows a pentadiagonal banded matrix system to be split into quasi-disjoint subsystems by using a linear-algebraic transformation technique. As a result the inversion of pentadiagonal matrices can be implemented within each subdomain in an independent manner subject to a conventional halo-exchange process. The proposed matrix transformation leads to new subdomain boundary (SB) compact schemes and filters that require three halo terms to exchange with neighboring subdomains. The internode communication overhead in the present approach is equivalent to that of standard explicit schemes and filters based on seven-point discretization stencils. The new SB compact schemes and filters demand additional arithmetic operations compared to the original serial ones. However, it is shown that the additional cost becomes sufficiently low by choosing optimal sizes of their discretization stencils. Compared to earlier published results, the proposed SB compact schemes and filters successfully reduce parallelization artifacts arising from subdomain boundaries to a level sufficiently negligible for sophisticated aeroacoustic simulations without degrading parallel efficiency. The overall performance and parallel efficiency of the proposed approach are demonstrated by stringent benchmark tests.
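    The kind of banded system a compact scheme produces can be shown with a small sketch (ours, not the paper's parallel algorithm): the classical fourth-order Padé first derivative on a periodic grid couples neighboring derivative values through a tridiagonal matrix, the tridiagonal cousin of the pentadiagonal systems the paper decomposes. Here the full matrix is simply inverted serially:

```python
import numpy as np

# Fourth-order compact (Pade) first derivative on a periodic grid:
#   (1/4) f'_{i-1} + f'_i + (1/4) f'_{i+1} = (3/(4h)) (f_{i+1} - f_{i-1})
# The left-hand coupling is what makes compact schemes implicit: the
# derivative values must be obtained by solving a banded system.
def compact_first_derivative(f, h):
    n = len(f)
    A = np.eye(n)
    rhs = np.zeros(n)
    for i in range(n):
        A[i, (i - 1) % n] = 0.25
        A[i, (i + 1) % n] = 0.25
        rhs[i] = 3.0 / (4.0 * h) * (f[(i + 1) % n] - f[(i - 1) % n])
    return np.linalg.solve(A, rhs)  # dense solve; a banded solver in practice

x = np.linspace(0.0, 2.0 * np.pi, 33)[:-1]   # periodic grid, n = 32
df = compact_first_derivative(np.sin(x), x[1] - x[0])
# df approximates cos(x) to fourth order
```

    It is exactly this banded solve, in its pentadiagonal form, whose inversion the proposed transformation splits into quasi-disjoint per-subdomain pieces.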

  4. Fast Face-Recognition Optical Parallel Correlator Using High Accuracy Correlation Filter

    NASA Astrophysics Data System (ADS)

    Watanabe, Eriko; Kodate, Kashiko

    2005-11-01

    We designed and fabricated a fully automatic, fast face-recognition optical parallel correlator [E. Watanabe and K. Kodate: Appl. Opt. 44 (2005) 5666] based on the VanderLugt principle. The implementation of an as-yet unattained ultra-high-speed system was aided by reconfiguring the system to make it suitable for easier parallel processing, as well as by composing a higher-accuracy correlation filter and a high-speed ferroelectric liquid crystal spatial light modulator (FLC-SLM). In running trial experiments using this system (dubbed FARCO), we succeeded in acquiring remarkably low error rates of 1.3% for the false match rate (FMR) and 2.6% for the false non-match rate (FNMR). Given the results of our experiments, the aim of this paper is to examine methods of designing correlation filters and arranging database image arrays for even faster parallel correlation, underlining the issues of calculation technique, quantization bit rate, pixel size and shift from the optical axis. The correlation filter has proved its excellent performance and higher precision than classical correlation and the joint transform correlator (JTC). Moreover, the arrangement of multi-object reference images leads to 10-channel correlation signals as sharply marked as those of a single channel. This experimental result demonstrates great potential for achieving a processing speed of 10,000 faces/s.

  5. Electroluminescence Caused by the Transport of Interacting Electrons through Parallel Quantum Dots in a Photon Cavity

    NASA Astrophysics Data System (ADS)

    Gudmundsson, Vidar; Abdulla, Nzar Rauf; Sitek, Anna; Goan, Hsi-Sheng; Tang, Chi-Shung; Manolescu, Andrei

    2018-02-01

    We show that a Rabi splitting of the states of strongly interacting electrons in parallel quantum dots, embedded in a short quantum wire placed in a photon cavity, can be produced by either the para- or the diamagnetic electron-photon interaction when the geometry of the system is properly accounted for and the photon field is tuned close to a resonance with the electron system. We use these two resonances to explore the electroluminescence caused by the transport of electrons through the one- and two-electron ground states of the system, and their corresponding conventional and vacuum electroluminescence, as the central system is opened up by coupling it to external leads acting as electron reservoirs. Our analysis indicates that high-order electron-photon processes are necessary to adequately construct the cavity-photon-dressed electron states needed to describe both types of electroluminescence.

  6. Automated target recognition and tracking using an optical pattern recognition neural network

    NASA Technical Reports Server (NTRS)

    Chao, Tien-Hsin

    1991-01-01

    The on-going development of an automatic target recognition and tracking system at the Jet Propulsion Laboratory is presented. This system is an optical pattern recognition neural network (OPRNN) that integrates an innovative optical parallel processor with a feature-extraction-based neural net training algorithm. The parallel optical processor provides high speed and vast parallelism as well as full shift invariance. The neural network algorithm enables simultaneous discrimination of multiple noisy targets in spite of their scales, rotations, perspectives, and various deformations. This fully developed OPRNN system can be effectively utilized for the automated spacecraft recognition and tracking that will lead to success in the Automated Rendezvous and Capture (AR&C) of the unmanned Cargo Transfer Vehicle (CTV). One of the most powerful optical parallel processors for automatic target recognition is the multichannel correlator. With the inherent advantages of parallel processing capability and shift invariance, multiple objects can be simultaneously recognized and tracked using this multichannel correlator. This target tracking capability can be greatly enhanced by utilizing a powerful feature-extraction-based neural network training algorithm such as the neocognitron. The OPRNN, currently under investigation at JPL, is constructed with an optical multichannel correlator where holographic filters have been prepared using the neocognitron training algorithm. The computation speed of the neocognitron-type OPRNN is up to 10(exp 14) analog connections/sec, enabling the OPRNN to outperform its state-of-the-art electronic counterparts by at least two orders of magnitude.

  7. Change in the coil distribution of electrodynamic suspension system

    NASA Technical Reports Server (NTRS)

    Tanaka, Hisashi

    1992-01-01

    At the Miyazaki Maglev Test Center, the initial test runs were completed using a system design that required the superconducting coils to be parallel with the ground levitation coils. Recently, the coil distribution was changed to a system such that the two types of coils were perpendicular to each other. Further system changes will lead to the construction of a side wall levitation system. It is hoped that the development will culminate in a system whereby a superconducting coil will maintain all the functions: levitation, propulsion, and guidance.

  8. Computer science, artificial intelligence, and cybernetics: Applied artificial intelligence in Japan

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rubinger, B.

    1988-01-01

    This sourcebook provides information on the developments in artificial intelligence originating in Japan. Spanning such innovations as software productivity, natural language processing, CAD, and parallel inference machines, this volume lists leading organizations conducting research or implementing AI systems, describes AI applications being pursued, illustrates current results achieved, and highlights sources reporting progress.

  9. A parallel finite element simulator for ion transport through three-dimensional ion channel systems.

    PubMed

    Tu, Bin; Chen, Minxin; Xie, Yan; Zhang, Linbo; Eisenberg, Bob; Lu, Benzhuo

    2013-09-15

    A parallel finite element simulator, ichannel, is developed for ion transport through three-dimensional ion channel systems that consist of protein and membrane. The coordinates of heavy atoms of the protein are taken from the Protein Data Bank and the membrane is represented as a slab. The simulator contains two components: a parallel adaptive finite element solver for a set of Poisson-Nernst-Planck (PNP) equations that describe the electrodiffusion process of ion transport, and a mesh generation tool chain for ion channel systems, which is an essential component for the finite element computations. The finite element method has advantages in modeling irregular geometries and complex boundary conditions. We have built a tool chain, consisting of a set of mesh generation tools, to produce the surface and volume meshes for ion channel systems. The adaptive finite element solver in our simulator is implemented using the parallel adaptive finite element package Parallel Hierarchical Grid (PHG), developed by one of the authors, which provides the capability of doing large-scale parallel computations with high parallel efficiency and the flexibility of choosing high-order elements to achieve high-order accuracy. The simulator is applied to a real transmembrane protein, the gramicidin A (gA) channel protein, to calculate the electrostatic potential, ion concentrations and I-V curve, with which both primitive and transformed PNP equations are studied and their numerical performances are compared. To further validate the method, we also apply the simulator to two other ion channel systems, the voltage-dependent anion channel (VDAC) and α-hemolysin (α-HL). The simulation results agree well with Brownian dynamics (BD) simulation results and experimental results. Moreover, because ionic finite-size effects can now be included in the PNP model, we also perform simulations using a size-modified PNP (SMPNP) model on VDAC and α-HL. It is shown that the size effects in SMPNP can effectively lead to reduced current in the channel, and the results are closer to the BD simulation results. Copyright © 2013 Wiley Periodicals, Inc.

  10. ABLE project: Development of an advanced lead-acid storage system for autonomous PV installations

    NASA Astrophysics Data System (ADS)

    Lemaire-Potteau, Elisabeth; Vallvé, Xavier; Pavlov, Detchko; Papazov, G.; Borg, Nico Van der; Sarrau, Jean-François

    In the advanced battery for low-cost renewable energy (ABLE) project, the partners have developed an advanced storage system for small and medium-size PV systems. It is composed of an innovative valve-regulated lead-acid (VRLA) battery, optimised for reliability and manufacturing cost, and an integrated regulator for optimal battery management and protection against fraudulent use. The ABLE battery's performance is comparable to that of flooded tubular batteries, which are the reference in medium-size PV systems. The ABLE regulator has several innovative features regarding energy management and modular series/parallel association. The storage system has been validated by indoor, outdoor and field tests, and it is expected that this concept could be a major improvement for large-scale implementation of PV within the framework of national rural electrification schemes.

  11. Vectorcardiographic diagnostic & prognostic information derived from the 12-lead electrocardiogram: Historical review and clinical perspective.

    PubMed

    Man, Sumche; Maan, Arie C; Schalij, Martin J; Swenne, Cees A

    2015-01-01

    In the course of time, electrocardiography has assumed several modalities with varying electrode numbers, electrode positions and lead systems. 12-lead electrocardiography and 3-lead vectorcardiography have become particularly popular. These modalities developed in parallel through the mid-twentieth century. In the same time interval, the physical concepts underlying electrocardiography were defined and worked out. In particular, the vector concept (heart vector, lead vector, volume conductor) appeared to be essential to understanding the manifestations of electrical heart activity, both in the 12-lead electrocardiogram (ECG) and in the 3-lead vectorcardiogram (VCG). Not universally appreciated in the clinic, the vectorcardiogram, and with it the vector concept, went out of use. A revival of vectorcardiography started in the 1990s, when VCGs were mathematically synthesized from standard 12-lead ECGs. This facilitated combined electrocardiography and vectorcardiography without the need for a special recording system. This paper gives an overview of these historical developments, elaborates on the vector concept and seeks to define where VCG analysis/interpretation can add diagnostic/prognostic value to conventional 12-lead ECG analysis. Copyright © 2015 Elsevier Inc. All rights reserved.

  12. Multigrid methods with space–time concurrency

    DOE PAGES

    Falgout, R. D.; Friedhoff, S.; Kolev, Tz. V.; ...

    2017-10-06

    Here, we consider the comparison of multigrid methods for parabolic partial differential equations that allow space–time concurrency. With current trends in computer architectures leading towards systems with more, but not faster, processors, space–time concurrency is crucial for speeding up time-integration simulations. In contrast, traditional time-integration techniques impose serious limitations on parallel performance due to the sequential nature of the time-stepping approach, allowing spatial concurrency only. This paper considers the three basic options of multigrid algorithms on space–time grids that allow parallelism in space and time: coarsening in space and time, semicoarsening in the spatial dimensions, and semicoarsening in the temporal dimension. We develop parallel software and performance models to study the three methods at scales of up to 16K cores and introduce an extension of one of them for handling multistep time integration. We then discuss advantages and disadvantages of the different approaches and their benefit compared to traditional space-parallel algorithms with sequential time stepping on modern architectures.

  13. Multigrid methods with space–time concurrency

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Falgout, R. D.; Friedhoff, S.; Kolev, Tz. V.

    Here, we consider the comparison of multigrid methods for parabolic partial differential equations that allow space–time concurrency. With current trends in computer architectures leading towards systems with more, but not faster, processors, space–time concurrency is crucial for speeding up time-integration simulations. In contrast, traditional time-integration techniques impose serious limitations on parallel performance due to the sequential nature of the time-stepping approach, allowing spatial concurrency only. This paper considers the three basic options of multigrid algorithms on space–time grids that allow parallelism in space and time: coarsening in space and time, semicoarsening in the spatial dimensions, and semicoarsening in the temporal dimension. We develop parallel software and performance models to study the three methods at scales of up to 16K cores and introduce an extension of one of them for handling multistep time integration. We then discuss advantages and disadvantages of the different approaches and their benefit compared to traditional space-parallel algorithms with sequential time stepping on modern architectures.

  14. PLASMA TURBULENCE AND KINETIC INSTABILITIES AT ION SCALES IN THE EXPANDING SOLAR WIND

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hellinger, Petr; Trávnícek, Pavel M.; Matteini, Lorenzo

    The relationship between a decaying strong turbulence and kinetic instabilities in a slowly expanding plasma is investigated using two-dimensional (2D) hybrid expanding box simulations. We impose an initial ambient magnetic field perpendicular to the simulation box, and we start with a spectrum of large-scale, linearly polarized, random-phase Alfvénic fluctuations that have energy equipartition between kinetic and magnetic fluctuations and vanishing correlation between the two fields. A turbulent cascade rapidly develops; magnetic field fluctuations exhibit a power-law spectrum at large scales and a steeper spectrum at ion scales. The turbulent cascade leads to an overall anisotropic proton heating, protons are heated in the perpendicular direction, and, initially, also in the parallel direction. The imposed expansion leads to generation of a large parallel proton temperature anisotropy which is at later stages partly reduced by turbulence. The turbulent heating is not sufficient to overcome the expansion-driven perpendicular cooling and the system eventually drives the oblique firehose instability in a form of localized nonlinear wave packets which efficiently reduce the parallel temperature anisotropy. This work demonstrates that kinetic instabilities may coexist with strong plasma turbulence even in a constrained 2D regime.

  15. Implementation of a partitioned algorithm for simulation of large CSI problems

    NASA Technical Reports Server (NTRS)

    Alvin, Kenneth F.; Park, K. C.

    1991-01-01

    The implementation of a partitioned numerical algorithm for determining the dynamic response of coupled structure/controller/estimator finite-dimensional systems is reviewed. The partitioned approach leads to a set of coupled first and second-order linear differential equations which are numerically integrated with extrapolation and implicit step methods. The present software implementation, ACSIS, utilizes parallel processing techniques at various levels to optimize performance on a shared-memory concurrent/vector processing system. A general procedure for the design of controller and filter gains is also implemented, which utilizes the vibration characteristics of the structure to be solved. Also presented are: example problems; a user's guide to the software; the procedures and algorithm scripts; a stability analysis for the algorithm; and the source code for the parallel implementation.

  16. Increased Energy Delivery for Parallel Battery Packs with No Regulated Bus

    NASA Astrophysics Data System (ADS)

    Hsu, Chung-Ti

    In this dissertation, a new approach to paralleling different battery types is presented. A method for controlling the charging/discharging of different battery packs by using low-cost bi-directional switches instead of DC-DC converters is proposed. The proposed system architecture, algorithms, and control techniques allow batteries with different chemistry, voltage, and SOC to be properly charged and discharged in parallel without causing safety problems. The physical design and cost of the energy management system are substantially reduced. Additionally, specific types of failures in maximum power point tracking (MPPT) in a photovoltaic (PV) system when tracking only the load current of a DC-DC converter are analyzed. A periodic nonlinear load current makes MPPT realized by the conventional perturb-and-observe (P&O) algorithm problematic. A modified MPPT algorithm is proposed that still requires only typically measured signals, yet is suitable for both linear and periodic nonlinear loads. Moreover, for a modular DC-DC converter using several converters in parallel, the input power from PV panels is processed and distributed at the module level. Methods for properly implementing distributed MPPT are studied. A new approach to efficient MPPT under partial shading conditions is presented. The power stage architecture achieves a fast input-current change rate by combining a current-adjustable converter with a few converters operating at a constant current.
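    The failure mode described above is easier to see against the baseline algorithm. Below is a minimal sketch of conventional perturb-and-observe (with a hypothetical power curve and step size, not the dissertation's modified algorithm): perturb the operating voltage, keep moving in the same direction if power rose, and reverse otherwise.

```python
def po_step(v, p, v_prev, p_prev, dv=0.1):
    """One perturb-and-observe update: keep perturbing in the direction
    that last increased power, otherwise reverse direction."""
    going_up = v >= v_prev
    if p >= p_prev:
        return v + dv if going_up else v - dv
    return v - dv if going_up else v + dv


def track(power, v0, steps=200, dv=0.1):
    """Run P&O against a static power(v) curve; returns the final
    operating voltage."""
    v_prev, v = v0, v0 + dv
    p_prev = power(v_prev)
    for _ in range(steps):
        p = power(v)
        v_next = po_step(v, p, v_prev, p_prev, dv)
        v_prev, p_prev, v = v, p, v_next
    return v
```

    On a static, unimodal curve P&O settles into a small oscillation around the maximum power point. The dissertation's point is that a periodic nonlinear load makes the observed power fluctuate for reasons unrelated to the perturbation, so the direction test above draws wrong conclusions.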

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Guo Zehua; Tang Xianzhu

    Parallel transport of a long mean-free-path plasma along an open magnetic field line is characterized by strong temperature anisotropy, which is driven by two effects. The first is magnetic moment conservation in a non-uniform magnetic field, which can transfer energy between the parallel and perpendicular degrees of freedom. The second is decompressional cooling of the parallel temperature due to parallel flow acceleration by the conventional presheath electric field, which is associated with the sheath condition near the wall surface where the open magnetic field line intercepts the discharge chamber. To leading order in a gyroradius-to-system-gradient-length-scale expansion, the parallel transport can be understood via the Chew-Goldberger-Low (CGL) model, which retains two components of the parallel heat flux, i.e., q_n associated with the parallel thermal energy and q_s related to the perpendicular thermal energy. It is shown that in addition to the effect of magnetic field strength (B) modulation, the two components (q_n and q_s) of the parallel heat flux play decisive roles in the parallel variation of the plasma profile, which includes the plasma density (n), parallel flow (u), parallel and perpendicular temperatures (T_∥ and T_⊥), and the ambipolar potential (φ). Both their profiles (q_n/B and q_s/B²) and the upstream values of the ratios of conductive to convective thermal flux (q_n/nuT_∥ and q_s/nuT_⊥) provide the controlling physics, in addition to B modulation. The physics described by the CGL model are contrasted with those of the double-adiabatic laws and further elucidated by comparison with a first-principles kinetic simulation for a specific but representative flux-expander case.

  18. Wire-Guide Manipulator For Automated Welding

    NASA Technical Reports Server (NTRS)

    Morris, Tim; White, Kevin; Gordon, Steve; Emerich, Dave; Richardson, Dave; Faulkner, Mike; Stafford, Dave; Mccutcheon, Kim; Neal, Ken; Milly, Pete

    1994-01-01

    Compact motor drive positions guide for welding filler wire. Drive part of automated wire feeder in partly or fully automated welding system. Drive unit contains three parallel subunits. Rotations of lead screws in three subunits coordinated to obtain desired motions in three degrees of freedom. Suitable for both variable-polarity plasma arc welding and gas/tungsten arc welding.

  19. A novel radiation detector for removing scattered radiation in chest radiography: Monte Carlo simulation-based performance evaluation

    NASA Astrophysics Data System (ADS)

    Roh, Y. H.; Yoon, Y.; Kim, K.; Kim, J.; Kim, J.; Morishita, J.

    2016-10-01

    Scattered radiation is the main reason for the degradation of image quality and the increased patient exposure dose in diagnostic radiology. In an effort to reduce scattered radiation, a novel structure of an indirect flat panel detector has been proposed. In this study, a performance evaluation of the novel system in terms of image contrast, as well as an estimation of the number of photons incident on the detector and the grid exposure factor, was conducted using Monte Carlo simulations. The image contrast of the proposed system was superior to that of the no-grid system but slightly inferior to that of the parallel-grid system. The number of photons incident on the detector and the grid exposure factor of the novel system were higher than those of the parallel-grid system but lower than those of the no-grid system. The proposed system exhibited the potential for a reduced exposure dose without image quality degradation; additionally, it can be further improved by structural optimization considering the manufacturer's specifications for its lead content.

  20. Fear Control and Danger Control: A Test of the Extended Parallel Process Model (EPPM).

    ERIC Educational Resources Information Center

    Witte, Kim

    1994-01-01

    Explores cognitive and emotional mechanisms underlying success and failure of fear appeals in context of AIDS prevention. Offers general support for Extended Parallel Process Model. Suggests that cognitions lead to fear appeal success (attitude, intention, or behavior changes) via danger control processes, whereas the emotion fear leads to fear…

  1. A massively asynchronous, parallel brain.

    PubMed

    Zeki, Semir

    2015-05-19

    Whether the visual brain uses a parallel or a serial, hierarchical, strategy to process visual signals, the end result appears to be that different attributes of the visual scene are perceived asynchronously--with colour leading form (orientation) by 40 ms and direction of motion by about 80 ms. Whatever the neural root of this asynchrony, it creates a problem that has not been properly addressed, namely how visual attributes that are perceived asynchronously over brief time windows after stimulus onset are bound together in the longer term to give us a unified experience of the visual world, in which all attributes are apparently seen in perfect registration. In this review, I suggest that there is no central neural clock in the (visual) brain that synchronizes the activity of different processing systems. More likely, activity in each of the parallel processing-perceptual systems of the visual brain is reset independently, making of the brain a massively asynchronous organ, just like the new generation of more efficient computers promises to be. Given the asynchronous operations of the brain, it is likely that the results of activities in the different processing-perceptual systems are not bound by physiological interactions between cells in the specialized visual areas, but post-perceptually, outside the visual brain.

  2. Multiphase complete exchange on Paragon, SP2 and CS-2

    NASA Technical Reports Server (NTRS)

    Bokhari, Shahid H.

    1995-01-01

    The overhead of interprocessor communication is a major factor in limiting the performance of parallel computer systems. The complete exchange is the severest communication pattern in that it requires each processor to send a distinct message to every other processor. This pattern is at the heart of many important parallel applications. On hypercubes, multiphase complete exchange has been developed and shown to provide optimal performance over varying message sizes. Most commercial multicomputer systems do not have a hypercube interconnect. However, they use special purpose hardware and dedicated communication processors to achieve very high performance communication and can be made to emulate the hypercube quite well. Multiphase complete exchange has been implemented on three contemporary parallel architectures: the Intel Paragon, IBM SP2 and Meiko CS-2. The essential features of these machines are described and their basic interprocessor communication overheads are discussed. The performance of multiphase complete exchange is evaluated on each machine. It is shown that the theoretical ideas developed for hypercubes are also applicable in practice to these machines and that multiphase complete exchange can lead to major savings in execution time over traditional solutions.
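    The single-phase hypercube algorithm that multiphase complete exchange generalizes can be sketched as a store-and-forward simulation (illustrative only; the dimension ordering and message buffers here are assumptions, not Bokhari's implementation): in phase k, every node exchanges with its neighbor across dimension k all held messages whose destination address still disagrees in bit k.

```python
def hypercube_complete_exchange(d):
    """Simulate complete exchange on a 2^d-node hypercube, one phase per
    dimension; returns the per-node message buffers after the last phase."""
    n = 1 << d
    # buffers[p] holds (src, dst) pairs currently stored at node p
    buffers = [[(p, q) for q in range(n)] for p in range(n)]
    for k in range(d):
        bit = 1 << k
        nxt = [[] for _ in range(n)]
        for p in range(n):
            for src, dst in buffers[p]:
                # forward across dimension k iff bit k still disagrees
                here = p ^ bit if (dst ^ p) & bit else p
                nxt[here].append((src, dst))
        buffers = nxt
    return buffers
```

    After d phases, each address bit has been corrected exactly once, so every node holds one message from every source. On real machines the interesting question, which the paper addresses, is how to batch such phases given per-message start-up costs and the non-hypercube interconnects of the Paragon, SP2 and CS-2.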

  3. Summer Proceedings 2016: The Center for Computing Research at Sandia National Laboratories

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carleton, James Brian; Parks, Michael L.

    Solving sparse linear systems from the discretization of elliptic partial differential equations (PDEs) is an important building block in many engineering applications. Sparse direct solvers can solve general linear systems, but are usually slower and use much more memory than effective iterative solvers. To overcome these two disadvantages, a hierarchical solver (LoRaSp) based on H2-matrices was introduced in [22]. Here, we have developed a parallel version of the algorithm in LoRaSp to solve large sparse matrices on distributed memory machines. On a single processor, the factorization time of our parallel solver scales almost linearly with the problem size for three-dimensional problems, as opposed to the quadratic scalability of many existing sparse direct solvers. Moreover, our solver leads to almost constant numbers of iterations, when used as a preconditioner for Poisson problems. On more than one processor, our algorithm has significant speedups compared to sequential runs. With this parallel algorithm, we are able to solve large problems much faster than many existing packages as demonstrated by the numerical experiments.
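    The iterative side of the direct-versus-iterative comparison can be illustrated with unpreconditioned conjugate gradients on a small matrix-free 1-D Poisson system (a generic textbook sketch, not the LoRaSp/PHG implementation; a hierarchical solver like LoRaSp would serve as the preconditioner in such an iteration):

```python
def cg(matvec, b, tol=1e-10, max_iter=200):
    """Conjugate gradient iteration for symmetric positive definite systems."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # residual b - A x for x = 0
    p = list(r)
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        ap = matvec(p)
        alpha = rs / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol * tol:       # converged: small residual norm
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def poisson_1d(v):
    """Matrix-free 1-D Poisson operator: (A v)_i = 2 v_i - v_{i-1} - v_{i+1}."""
    n = len(v)
    return [2 * v[i]
            - (v[i - 1] if i else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]
```

    The solver only needs a matrix-vector product, which is what makes preconditioned iterations attractive for large distributed sparse systems.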

  4. A Programming Framework for Scientific Applications on CPU-GPU Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Owens, John

    2013-03-24

    At a high level, my research interests center around designing, programming, and evaluating computer systems that use new approaches to solve interesting problems. The rapid change of technology allows a variety of different architectural approaches to computationally difficult problems, and a constantly shifting set of constraints and trends makes the solutions to these problems both challenging and interesting. One of the most important recent trends in computing has been a move to commodity parallel architectures. This sea change is motivated by the industry’s inability to continue to profitably increase performance on a single processor and instead to move to multiple parallel processors. In the period of review, my most significant work has been leading a research group looking at the use of the graphics processing unit (GPU) as a general-purpose processor. GPUs can potentially deliver superior performance on a broad range of problems than their CPU counterparts, but effectively mapping complex applications to a parallel programming model with an emerging programming environment is a significant and important research problem.

  5. Modified Denavit-Hartenberg parameters for better location of joint axis systems in robot arms

    NASA Technical Reports Server (NTRS)

    Barker, L. K.

    1986-01-01

    The Denavit-Hartenberg parameters define the relative location of successive joint axis systems in a robot arm. A recent justifiable criticism is that one of these parameters becomes extremely large when two successive joints have near-parallel rotational axes. Geometrically, this parameter then locates a joint axis system at an excessive distance from the robot arm and, computationally, leads to an ill-conditioned transformation matrix. In this paper, a simple modification (which results from constraining a transverse vector between successive joint rotational axes to be normal to one of the rotational axes, instead of both) overcomes this criticism and favorably locates the joint axis system. An example is given for near-parallel rotational axes of the elbow and shoulder joints in a robot arm. The regular and modified parameters are extracted by an algebraic method with simulated measurement data. Unlike the modified parameters, extracted values of the regular parameters are very sensitive to measurement accuracy.
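    For reference, the standard link transform that both the regular and modified parameter sets feed into can be sketched as follows (classic Denavit-Hartenberg convention; the paper's modification changes how the parameters are measured, not this matrix):

```python
import math

def dh_transform(theta, d, a, alpha):
    """Homogeneous transform between successive joint frames from the four
    classic Denavit-Hartenberg parameters: joint angle theta, link offset d,
    link length a, and link twist alpha."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ]

def matmul(A, B):
    """Compose two 4x4 homogeneous transforms."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]
```

    Chaining one transform per joint gives the forward kinematics; e.g., a planar two-link arm with unit links and the first joint at 90 degrees places the end effector at (0, 2). The ill-conditioning the paper addresses arises when near-parallel successive axes make one of the four parameters blow up, not from the matrix itself.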

  6. Fano effect dominance over Coulomb blockade in transport properties of parallel coupled quantum dot system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brogi, Bharat Bhushan, E-mail: brogi-221179@yahoo.in; Ahluwalia, P. K.; Chand, Shyam

    2015-06-24

    A theoretical study of the Coulomb blockade effect on the transport properties (transmission probability and I-V characteristics) of variously configured coupled quantum dot systems is presented, using the non-equilibrium Green function (NEGF) formalism and the equation of motion (EOM) method in the presence of magnetic flux. A self-consistent approach and the intra-dot Coulomb interaction are taken into account. As the key parameters of the coupled quantum dot system, such as dot-lead coupling, inter-dot tunneling, and the magnetic flux threading through the system, can be tuned, the effect of the asymmetry parameter and magnetic flux on this tuning is explored in the Coulomb blockade regime. The Coulomb blockade due to on-dot Coulomb interaction decreases the width of the transmission peak at energy level ε + U, and by adjusting the magnetic flux the swapping effect in the Fano peaks in the asymmetric and symmetric parallel configurations persists despite the strong Coulomb blockade effect.

  7. Exploring Asynchronous Many-Task Runtime Systems toward Extreme Scales

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Knight, Samuel; Baker, Gavin Matthew; Gamell, Marc

    2015-10-01

    Major exascale computing reports indicate a number of software challenges to meet the dramatic change of system architectures in the near future. While a several-orders-of-magnitude increase in parallelism is the most commonly cited of these, hurdles also include performance heterogeneity of compute nodes across the system, increased imbalance between computational capacity and I/O capabilities, frequent system interrupts, and complex hardware architectures. Asynchronous task-parallel programming models show great promise in addressing these issues, but are not yet fully understood nor sufficiently developed for computational science and engineering application codes. We address these knowledge gaps through quantitative and qualitative exploration of leading candidate solutions in the context of engineering applications at Sandia. In this poster, we evaluate the MiniAero code ported to three leading candidate programming models (Charm++, Legion, and UINTAH) to examine the feasibility of these models for inserting new programming model elements into an existing code base.

  8. Ion acceleration and heating by kinetic Alfvén waves associated with magnetic reconnection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liang, Ji; Lin, Yu; Johnson, Jay R.

    A previous study on the generation and signatures of kinetic Alfvén waves (KAWs) associated with magnetic reconnection in a current sheet revealed that KAWs are a common feature during reconnection [Liang et al., J. Geophys. Res.: Space Phys. 121, 6526 (2016)]. In this paper, ion acceleration and heating by the KAWs generated during magnetic reconnection are investigated with a three-dimensional (3-D) hybrid model. It is found that in the outflow region, a fraction of inflow ions are accelerated by the KAWs generated in the leading bulge region of reconnection, and their parallel velocities gradually increase to slightly super-Alfvénic. As a result of wave-particle interactions, an accelerated ion beam forms in the direction of the anti-parallel magnetic field, in addition to the core ion population, leading to the development of non-Maxwellian velocity distributions, which include a trapped population with parallel velocities consistent with the wave speed. The ions are then heated in both the parallel and perpendicular directions. In the parallel direction, the heating results from nonlinear Landau resonance of trapped ions. In the perpendicular direction, evidence of stochastic heating by the KAWs is found during the acceleration stage, with an increase of the magnetic moment μ. The coherence between the perpendicular ion temperature T⊥ and the perpendicular electric and magnetic fields of the KAWs also provides evidence for perpendicular heating by KAWs. The parallel and perpendicular heating of the accelerated beam occur simultaneously, leading to the development of temperature anisotropy with T⊥ > T∥. The heating rate agrees with the damping rate of the KAWs, and the heating is dominated by the accelerated ion beam. In the later stage, with the increase of the fraction of the accelerated ions, interaction between the accelerated beam and the core population also contributes to the ion heating, ultimately leading to overlap of the beams and an overall anisotropy with T⊥ > T∥.

  9. Ion acceleration and heating by kinetic Alfvén waves associated with magnetic reconnection

    DOE PAGES

    Liang, Ji; Lin, Yu; Johnson, Jay R.; ...

    2017-09-19

    A previous study of the generation and signatures of kinetic Alfvén waves (KAWs) associated with magnetic reconnection in a current sheet revealed that KAWs are a common feature during reconnection [Liang et al., J. Geophys. Res.: Space Phys. 121, 6526 (2016)]. In this paper, ion acceleration and heating by the KAWs generated during magnetic reconnection are investigated with a three-dimensional (3-D) hybrid model. It is found that in the outflow region, a fraction of the inflow ions are accelerated by the KAWs generated in the leading bulge region of reconnection, and their parallel velocities gradually increase to slightly super-Alfvénic. As a result of wave-particle interactions, an accelerated ion beam forms in the direction of the anti-parallel magnetic field, in addition to the core ion population, leading to the development of non-Maxwellian velocity distributions, which include a trapped population with parallel velocities consistent with the wave speed. Ions are then heated in both the parallel and perpendicular directions. In the parallel direction, the heating results from nonlinear Landau resonance of the trapped ions. In the perpendicular direction, evidence of stochastic heating by the KAWs is found during the acceleration stage, with an increase of the magnetic moment μ. The coherence between the perpendicular ion temperature T⊥ and the perpendicular electric and magnetic fields of the KAWs also provides evidence for perpendicular heating by KAWs. The parallel and perpendicular heating of the accelerated beam occur simultaneously, leading to the development of a temperature anisotropy with T⊥ > T∥. The heating rate agrees with the damping rate of the KAWs, and the heating is dominated by the accelerated ion beam. In the later stage, as the fraction of accelerated ions increases, interaction between the accelerated beam and the core population also contributes to the ion heating, ultimately leading to overlap of the beams and an overall anisotropy with T⊥ > T∥.

  10. Parallel conjugate gradient algorithms for manipulator dynamic simulation

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Scheld, Robert E.

    1989-01-01

    Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithm is guaranteed to converge in n iterations, each with a computational cost of O(n); this leads to a total cost of O(n²) on a serial processor. Conjugate gradient algorithms are presented that provide greater efficiency by using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed which use, respectively, a diagonal and a tridiagonal matrix, composed of the diagonal and tridiagonal elements of the mass matrix, as preconditioners. Parallel algorithms are developed to compute the preconditioners and their inversions in O(log₂ n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves a computational time of O(log₂ n) per iteration. Simulation results for a seven-degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).
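The serial baseline improved on here can be sketched with the first of the two preconditioners, a diagonal (Jacobi) matrix. This is a generic illustrative implementation, not the paper's parallel RMP algorithm, and the 7×7 "mass matrix" is a random positive-definite stand-in for a seven-degree-of-freedom system:

```python
import numpy as np

def pcg(A, b, tol=1e-10, max_iter=None):
    """Conjugate gradient with a diagonal (Jacobi) preconditioner."""
    n = len(b)
    max_iter = max_iter or n
    M_inv = 1.0 / np.diag(A)      # inverse of the diagonal preconditioner
    x = np.zeros(n)
    r = b - A @ x                 # initial residual
    z = M_inv * r                 # preconditioned residual
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Random positive-definite stand-in for the 7-DOF mass matrix
rng = np.random.default_rng(0)
B = rng.standard_normal((7, 7))
A = B @ B.T + 7 * np.eye(7)
b = rng.standard_normal(7)
x = pcg(A, b)
```

In exact arithmetic the loop terminates in at most n iterations, which is the O(n²) serial bound the paper's parallel variants reduce.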

  11. Parallel Implementation of Triangular Cellular Automata for Computing Two-Dimensional Elastodynamic Response on Arbitrary Domains

    NASA Astrophysics Data System (ADS)

    Leamy, Michael J.; Springer, Adam C.

    In this research we report a parallel implementation of a Cellular Automata-based simulation tool for computing elastodynamic response on complex, two-dimensional domains. Elastodynamic simulation using Cellular Automata (CA) has recently been presented as an alternative, inherently object-oriented technique for accurately and efficiently computing linear and nonlinear wave propagation in arbitrarily-shaped geometries. The local, autonomous nature of the method should lead to straightforward and efficient parallelization. We address this notion on symmetric multiprocessor (SMP) hardware using a Java-based object-oriented CA code implementing triangular state machines (i.e., automata) and the MPI bindings written in Java (MPJ Express). We use MPJ Express to reconfigure our existing CA code to distribute a domain's automata across the cores of a dual quad-core shared-memory system (eight processors in total). We note that this message-passing parallelization strategy is directly applicable to cluster computing, which will be the focus of follow-on research. Results on the shared-memory platform indicate nearly ideal, linear speed-up. We conclude that the CA-based elastodynamic simulator is easily configured to run in parallel, and yields excellent speed-up on SMP hardware.

  12. Antiresonance and decoupling in electronic transport through parallel-coupled quantum-dot structures with laterally-coupled Majorana zero modes

    NASA Astrophysics Data System (ADS)

    Zhang, Ya-Jing; Zhang, Lian-Lian; Jiang, Cui; Gong, Wei-Jiang

    2018-02-01

    We theoretically investigate the electronic transport through a parallel-coupled multi-quantum-dot system, in which the terminal dots of a one-dimensional quantum-dot chain are embodied in the two arms of an Aharonov-Bohm interferometer. It is found that in structures with an odd (even) number of dots, all the even (odd) molecular states have opportunities to decouple from the leads, and in this process antiresonances occur that coincide with the odd (even)-numbered eigenenergies of the sub-molecule without the terminal dots. When Majorana zero modes are introduced to couple laterally to the terminal dots, the antiresonance and decoupling phenomena still coexist in the quantum transport process. This result can be helpful in understanding the special influence of Majorana zero modes on electronic transport through quantum-dot systems.

  13. Parallel solution of closely coupled systems

    NASA Technical Reports Server (NTRS)

    Utku, S.; Salama, M.

    1986-01-01

    The odd-even permutation and associated unitary transformations for reordering the matrix coefficient A are employed as a means of breaking the strong seriality that is characteristic of closely coupled systems. The nested dissection technique is also reviewed, and the equivalence between reordering A and dissecting its network is established. The effect of transforming A with the odd-even permutation on its topology and the topology of its Cholesky factors is discussed. This leads to the construction of directed graphs showing the computational steps required for factoring A, their precedence relationships, and their sequential and concurrent assignment to the available processors. Expressions for the speed-up and efficiency of using N processors in parallel, relative to the sequential use of a single processor, are derived from the directed graph. Similar expressions are also derived for the case in which the number of available processors is fewer than required.
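The reordering idea can be illustrated on a small tridiagonal system, a canonical closely coupled case: after an odd-even (red-black) permutation, both diagonal blocks of the reordered matrix are themselves diagonal, so half the unknowns decouple from each other and can be eliminated concurrently. The matrix below is an illustrative example, not one from the paper:

```python
import numpy as np

n = 8
# A closely coupled system: a tridiagonal matrix ties each unknown to its
# immediate neighbours, forcing serial elimination in the natural ordering.
A = (np.diag(4.0 * np.ones(n))
     + np.diag(-1.0 * np.ones(n - 1), 1)
     + np.diag(-1.0 * np.ones(n - 1), -1))

# Odd-even permutation: list the odd-numbered unknowns first, then the even.
perm = np.r_[np.arange(0, n, 2), np.arange(1, n, 2)]
P = np.eye(n)[perm]          # the corresponding permutation matrix
A_perm = P @ A @ P.T         # reordered coefficient matrix

# Both diagonal blocks of the reordered matrix are diagonal, so the
# unknowns within each block can be eliminated in parallel.
top = A_perm[: n // 2, : n // 2]
bottom = A_perm[n // 2 :, n // 2 :]
```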

  14. Enhanced pyroelectric and piezoelectric properties of PZT with aligned porosity for energy harvesting applications

    PubMed Central

    Zhang, Yan; Xie, Mengying; Roscow, James; Bao, Yinxiang; Zhou, Kechao

    2017-01-01

    This paper demonstrates the significant benefits of exploiting highly aligned porosity in piezoelectric and pyroelectric materials for improved energy harvesting performance. Porous lead zirconate titanate (PZT) ceramics with aligned pore channels and varying fractions of porosity were manufactured in a water-based suspension using freeze-casting. The aligned porous PZT ceramics were characterized in detail for both piezoelectric and pyroelectric properties, and their energy harvesting performance figures of merit were assessed parallel and perpendicular to the freezing direction. As a result of the introduction of porosity into the ceramic microstructure, high piezoelectric and pyroelectric harvesting figures of merit were achieved for porous freeze-cast PZT compared to dense PZT, due to the reduced permittivity and volume-specific heat capacity. Experimental results were compared to parallel and series analytical models with good agreement, and the PZT with porosity aligned parallel to the freezing direction exhibited the highest piezoelectric and pyroelectric harvesting response; this was a result of the enhanced interconnectivity of the ferroelectric material along the poling direction and the reduced fraction of unpoled material, which leads to a higher polarization. A complete thermal energy harvesting system, composed of a parallel-aligned PZT harvester element and an AC/DC converter, was successfully demonstrated by charging a storage capacitor. The maximum energy density generated by the 60 vol% porous parallel-connected PZT when subjected to thermal oscillations was 1653 μJ cm⁻³, which was 374% higher than that of the dense PZT with an energy density of 446 μJ cm⁻³. The results are beneficial for the design and manufacture of high-performance porous pyroelectric and piezoelectric materials in devices for energy harvesting and sensor applications. PMID:28580142

  15. Enhanced pyroelectric and piezoelectric properties of PZT with aligned porosity for energy harvesting applications.

    PubMed

    Zhang, Yan; Xie, Mengying; Roscow, James; Bao, Yinxiang; Zhou, Kechao; Zhang, Dou; Bowen, Chris R

    2017-04-14

    This paper demonstrates the significant benefits of exploiting highly aligned porosity in piezoelectric and pyroelectric materials for improved energy harvesting performance. Porous lead zirconate titanate (PZT) ceramics with aligned pore channels and varying fractions of porosity were manufactured in a water-based suspension using freeze-casting. The aligned porous PZT ceramics were characterized in detail for both piezoelectric and pyroelectric properties, and their energy harvesting performance figures of merit were assessed parallel and perpendicular to the freezing direction. As a result of the introduction of porosity into the ceramic microstructure, high piezoelectric and pyroelectric harvesting figures of merit were achieved for porous freeze-cast PZT compared to dense PZT, due to the reduced permittivity and volume-specific heat capacity. Experimental results were compared to parallel and series analytical models with good agreement, and the PZT with porosity aligned parallel to the freezing direction exhibited the highest piezoelectric and pyroelectric harvesting response; this was a result of the enhanced interconnectivity of the ferroelectric material along the poling direction and the reduced fraction of unpoled material, which leads to a higher polarization. A complete thermal energy harvesting system, composed of a parallel-aligned PZT harvester element and an AC/DC converter, was successfully demonstrated by charging a storage capacitor. The maximum energy density generated by the 60 vol% porous parallel-connected PZT when subjected to thermal oscillations was 1653 μJ cm⁻³, which was 374% higher than that of the dense PZT with an energy density of 446 μJ cm⁻³. The results are beneficial for the design and manufacture of high-performance porous pyroelectric and piezoelectric materials in devices for energy harvesting and sensor applications.

  16. Time-dependent current into and through multilevel parallel quantum dots in a photon cavity

    NASA Astrophysics Data System (ADS)

    Gudmundsson, Vidar; Abdullah, Nzar Rauf; Sitek, Anna; Goan, Hsi-Sheng; Tang, Chi-Shung; Manolescu, Andrei

    2017-05-01

    We analyze theoretically the charging current into, and the transport current through, a nanoscale two-dimensional electron system with two parallel quantum dots embedded in a short wire placed in a photon cavity. A plunger gate is used to place specific many-body states of the interacting system in the bias window defined by the external leads. We show how the transport phenomena active in the many-level complex central system depend strongly on the gate voltage. We identify resonant transport through the central system when the two spin components of the one-electron ground state are in the bias window. This resonant transport through the lowest-energy electron states seems largely independent of the detuned photon field when judged from the transport current. This could be expected in the small-bias regime, but an observation of the occupancy of the states of the system reveals that this picture is not entirely true. The current does not reflect the slower photon-active internal transitions bringing the system into the steady state. The number of initially present photons determines when the system reaches the true steady state. With two-electron states in the bias window we observe a more complex situation, with intermediate radiative and nonradiative relaxation channels leading to a steady state with a weak nonresonant current caused by inelastic tunneling through the two-electron ground state of the system. The presence of the radiative channels makes these phenomena dependent on the number of photons initially in the cavity.

  17. Electric Field Comparison between Microelectrode Recording and Deep Brain Stimulation Systems—A Simulation Study

    PubMed Central

    Johansson, Johannes; Wårdell, Karin; Hemm, Simone

    2018-01-01

    The success of deep brain stimulation (DBS) relies primarily on the localization of the implanted electrode. Its final position can be chosen based on the results of intraoperative microelectrode recording (MER) and stimulation tests. The optimal position often differs from the one finally selected for chronic stimulation with the DBS electrode. The aim of the study was to investigate, using finite element method (FEM) modeling and simulations, whether lead design, electrical setup, and operating modes induce differences in the electric field (EF) distribution and, in consequence, in the clinical outcome. Finite element models of a MER system and a chronic DBS lead were developed. Simulations of the EF were performed for homogeneous and patient-specific brain models to evaluate the influence of grounding (guide tube vs. stimulator case), parallel MER leads, and non-active DBS contacts. Results showed that the EF is deformed depending on the distance between the guide tube and the stimulating contact. Several parallel MER leads and the presence of the non-active DBS contacts influence the EF distribution. The DBS EF volume can cover the intraoperatively produced EF, but can also extend into other anatomical areas. In conclusion, EF deformations between stimulation tests and DBS should be taken into consideration, as they can alter the clinical outcome. PMID:29415442

  18. Superelement model based parallel algorithm for vehicle dynamics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Agrawal, O.P.; Danhof, K.J.; Kumar, R.

    1994-05-01

    This paper presents a superelement-model-based parallel algorithm for planar vehicle dynamics. The vehicle model is made up of a chassis and two suspension systems, each of which consists of an axle-wheel assembly and two trailing arms. In this model, the chassis is treated as a Cartesian element and each suspension system is treated as a superelement. The parameters associated with the superelements are computed using an inverse dynamics technique. Suspension shock absorbers and the tires are modeled by nonlinear springs and dampers. The Euler-Lagrange approach is used to develop the system equations of motion. This leads to a system of differential and algebraic equations in which the constraints internal to superelements appear only explicitly. The above formulation is implemented on a multiprocessor machine. The numerical flow chart is divided into modules, and several modules are computed in parallel to gain computational efficiency. In this implementation, the master (parent processor) creates a pool of slaves (child processors) at the beginning of the program. The slaves remain in the pool until they are needed to perform certain tasks; upon completion of a particular task, a slave returns to the pool. This improves the overall response time of the algorithm. The formulation presented is general, which makes it attractive for general-purpose code development. Speedups obtained in the different modules of the dynamic analysis computation are also presented. Results show that the superelement-model-based parallel algorithm can significantly reduce the vehicle dynamics simulation time. 52 refs.
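The master-worker pool scheme described above can be sketched with Python's standard thread pool standing in for the parent and child processors; the module names and workloads here are invented for illustration and do not come from the paper:

```python
from concurrent.futures import ThreadPoolExecutor

def module_task(args):
    # Stand-in for one module of the dynamic-analysis computation,
    # e.g. evaluating one superelement's equations of motion.
    name, load = args
    return name, sum(i * i for i in range(load))

# The parent creates a pool of workers up front; a worker returns to the
# pool after finishing a task, mirroring the scheme described above.
modules = [("chassis", 10_000),
           ("suspension_left", 20_000),
           ("suspension_right", 20_000)]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = dict(pool.map(module_task, modules))
```

Keeping the pool alive for the whole run avoids the cost of spawning a worker per task, which is the response-time benefit the abstract attributes to the persistent slave pool.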

  19. An 8-channel skin impedance measurement system for acupuncture research.

    PubMed

    Thong, Tran; Colbert, Agatha P; Larsen, Adrian P

    2009-01-01

    An 8-channel skin impedance measurement system for acupuncture research has been developed. The underlying skin model is a parallel R-C network, and pulses are used to measure the R and C values. The measurement circuit is time-multiplexed across the 8 channels at a rate of 2 measurements per second, yielding a complete set of measurements every 4 seconds. In static tests, the system has remained operational for over 2 days of continuous measurements. In preliminary human tests, measurements over 2 hours were collected per subject.
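The pulse-based measurement of a parallel R-C network can be sketched as follows. The component values and pulse current are illustrative assumptions, not values from the paper: a current step into a parallel R-C produces V(t) = I·R·(1 − e^(−t/RC)), so R follows from the settled voltage and C from the time constant.

```python
import numpy as np

R, C = 100e3, 47e-9     # assumed values: 100 kΩ in parallel with 47 nF
I0 = 10e-6              # assumed 10 µA measurement pulse

# Voltage across a parallel R-C network driven by a current step:
# V(t) = I0 * R * (1 - exp(-t / (R * C)))
t = np.linspace(0.0, 5.0 * R * C, 500)
v = I0 * R * (1.0 - np.exp(-t / (R * C)))

# R is recovered from the settled voltage; C would follow from the
# time constant R*C extracted from the rise.
R_est = v[-1] / I0
```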

  20. cuTauLeaping: A GPU-Powered Tau-Leaping Stochastic Simulator for Massive Parallel Analyses of Biological Systems

    PubMed Central

    Besozzi, Daniela; Pescini, Dario; Mauri, Giancarlo

    2014-01-01

    Tau-leaping is a stochastic simulation algorithm that efficiently reconstructs the temporal evolution of biological systems, modeled according to the stochastic formulation of chemical kinetics. The analysis of dynamical properties of these systems in physiological and perturbed conditions usually requires the execution of a large number of simulations, leading to high computational costs. Since each simulation can be executed independently of the others, a massive parallelization of tau-leaping can bring significant reductions in overall running time. The emerging field of General Purpose Graphics Processing Units (GPGPU) provides power-efficient high-performance computing at a relatively low cost. In this work we introduce cuTauLeaping, a stochastic simulator of biological systems that uses GPGPU computing to execute multiple parallel tau-leaping simulations, fully exploiting Nvidia's Fermi GPU architecture. We show how a considerable computational speedup is achieved on the GPU by partitioning the execution of tau-leaping into multiple separate phases, and we describe how to avoid some implementation pitfalls related to the scarcity of memory resources on the GPU streaming multiprocessors. Our results show that cuTauLeaping largely outperforms the CPU-based tau-leaping implementation as the number of parallel simulations increases, with a break-even point that depends directly on the size of the biological system and the complexity of its emergent dynamics. In particular, cuTauLeaping is exploited to investigate the probability distribution of bistable states in the Schlögl model, and to carry out a bidimensional parameter sweep analysis to study the oscillatory regimes of the Ras/cAMP/PKA pathway in S. cerevisiae. PMID:24663957
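The tau-leaping step itself is simple to state: over a leap interval τ, each reaction fires a Poisson-distributed number of times with mean (propensity × τ). A serial toy version for a birth-death system is sketched below; on the GPU, each of the independent runs would map to one thread. The rates, counts, and leap size are illustrative, not from the paper:

```python
import numpy as np

def tau_leap(k_prod, k_deg, x0, t_end, tau, rng):
    """Minimal tau-leaping for a birth-death system:
    X -> X+1 at rate k_prod, X -> X-1 at rate k_deg * X."""
    x, t = x0, 0.0
    while t < t_end:
        a1 = k_prod            # propensity of production
        a2 = k_deg * x         # propensity of degradation
        # Fire a Poisson number of each reaction over the leap interval tau
        x += rng.poisson(a1 * tau) - rng.poisson(a2 * tau)
        x = max(x, 0)          # molecule counts cannot go negative
        t += tau
    return x

rng = np.random.default_rng(1)
# Many independent runs (serial here; in cuTauLeaping each would run in
# parallel on the GPU). Steady-state mean is k_prod / k_deg = 100.
samples = [tau_leap(10.0, 0.1, 0, 200.0, 0.1, rng) for _ in range(200)]
mean_x = sum(samples) / len(samples)
```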

  1. Numerical techniques in radiative heat transfer for general, scattering, plane-parallel media

    NASA Technical Reports Server (NTRS)

    Sharma, A.; Cogley, A. C.

    1982-01-01

    The study of radiative heat transfer with scattering usually leads to the solution of singular Fredholm integral equations. This paper presents an accurate and efficient numerical method for solving certain integral equations that govern radiative equilibrium problems in plane-parallel geometry for both grey and nongrey, anisotropically scattering media. In particular, the nongrey problem is represented by a spectral integral of a system of nonlinear integral equations in space, which has not been solved previously. The numerical technique is constructed to handle this unique nongrey governing equation as well as the difficulties caused by singular kernels. Example problems are solved, and the method's accuracy and computational speed are analyzed.
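The discretization step behind such methods can be illustrated with a Nyström (quadrature) solve of a linear Fredholm equation of the second kind. The smooth kernel and forcing below are illustrative assumptions; the paper's kernels are singular and require the special treatment it describes:

```python
import numpy as np

# Solve f(x) = g(x) + 0.5 * \int_0^1 K(x, y) f(y) dy by collocating at
# Gauss-Legendre nodes and replacing the integral with the quadrature rule.
n = 40
x, w = np.polynomial.legendre.leggauss(n)
x = 0.5 * (x + 1.0)     # map nodes from [-1, 1] to [0, 1]
w = 0.5 * w             # rescale the weights accordingly

K = np.exp(-np.abs(x[:, None] - x[None, :]))   # smooth illustrative kernel
g = np.ones(n)

# Linear system (I - 0.5 * K * W) f = g, with W the diagonal weight matrix
Asys = np.eye(n) - 0.5 * K * w[None, :]
f = np.linalg.solve(Asys, g)
```

For singular kernels, the quadrature must be adapted (e.g. by subtracting the singularity), which is where the method described above does its work.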

  2. Domain decomposition in time for PDE-constrained optimization

    DOE PAGES

    Barker, Andrew T.; Stoll, Martin

    2015-08-28

    PDE-constrained optimization problems have a wide range of applications, but they lead to very large and ill-conditioned linear systems, especially if the problems are time dependent. In this paper we outline an approach for dealing with such problems by decomposing them in time and applying an additive Schwarz preconditioner in time, so that we can take advantage of parallel computers to deal with the very large linear systems. We then illustrate the performance of our method on a variety of problems.

  3. A massively asynchronous, parallel brain

    PubMed Central

    Zeki, Semir

    2015-01-01

    Whether the visual brain uses a parallel or a serial, hierarchical, strategy to process visual signals, the end result appears to be that different attributes of the visual scene are perceived asynchronously—with colour leading form (orientation) by 40 ms and direction of motion by about 80 ms. Whatever the neural root of this asynchrony, it creates a problem that has not been properly addressed, namely how visual attributes that are perceived asynchronously over brief time windows after stimulus onset are bound together in the longer term to give us a unified experience of the visual world, in which all attributes are apparently seen in perfect registration. In this review, I suggest that there is no central neural clock in the (visual) brain that synchronizes the activity of different processing systems. More likely, activity in each of the parallel processing-perceptual systems of the visual brain is reset independently, making of the brain a massively asynchronous organ, just like the new generation of more efficient computers promise to be. Given the asynchronous operations of the brain, it is likely that the results of activities in the different processing-perceptual systems are not bound by physiological interactions between cells in the specialized visual areas, but post-perceptually, outside the visual brain. PMID:25823871

  4. Parallel Algorithms for Switching Edges in Heterogeneous Graphs.

    PubMed

    Bhuiyan, Hasanuzzaman; Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav

    2017-06-01

    An edge switch is an operation on a graph (or network) in which two edges are selected uniformly at random and one end vertex of each is swapped with the other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors, leading to difficulties in achieving a good speedup through parallelization. In this paper, we present distributed-memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors.
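A serial version of the edge-switch operation with the simplicity check is easy to state; the difficulty discussed above comes from running many of these proposals concurrently without synchronization conflicts. A minimal sketch on a small undirected graph (the graph is invented for illustration):

```python
import random

def switch_edges(edges, attempts, rng):
    """Perform random edge switches on an undirected simple graph,
    rejecting any switch that would create a self-loop or parallel edge.
    Edges are stored as sorted (u, v) tuples."""
    edge_set = set(edges)
    edge_list = list(edge_set)
    for _ in range(attempts):
        (a, b), (c, d) = rng.sample(edge_list, 2)
        # Proposed switch: (a, b), (c, d) -> (a, d), (c, b)
        e1 = tuple(sorted((a, d)))
        e2 = tuple(sorted((c, b)))
        if a == d or c == b:                   # would create a self-loop
            continue
        if e1 in edge_set or e2 in edge_set:   # would create a parallel edge
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {e1, e2}
        edge_list = list(edge_set)
    return edge_set

rng = random.Random(42)
edges = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)}
switched = switch_edges(edges, 100, rng)
```

Each accepted switch preserves every vertex's degree, which is why repeated switches generate random graphs with a fixed degree sequence.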

  5. Parallel Algorithms for Switching Edges in Heterogeneous Graphs☆

    PubMed Central

    Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav

    2017-01-01

    An edge switch is an operation on a graph (or network) in which two edges are selected uniformly at random and one end vertex of each is swapped with the other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors, leading to difficulties in achieving a good speedup through parallelization. In this paper, we present distributed-memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors. PMID:28757680

  6. Far Infrared Imaging Spectrometer for Large Aperture Infrared Telescope System

    DTIC Science & Technology

    1985-12-01

    ...resolution Fabry-Perot spectrometer (10³ < resolution < 10⁴) for wavelengths from about 50 to 200 micrometers, employing extended-field diffraction-limited ... photometry. The Naval Research Laboratory will provide a high-resolution Far Infrared Imaging Spectrometer (FIRIS) using Fabry-Perot techniques in ... detectors to provide spatial information. The Fabry-Perot uses electromagnetic coil displacement drivers with a lead-screw drive to obtain parallel ...

  7. Aging with HIV infection: a journey to the center of inflammAIDS, immunosenescence and neuroHIV.

    PubMed

    Nasi, Milena; Pinti, Marcello; De Biasi, Sara; Gibellini, Lara; Ferraro, Diana; Mussini, Cristina; Cossarizza, Andrea

    2014-11-01

    In recent years, a significant improvement in the life expectancy of HIV+ patients has been observed in Western countries. The parallel increase in the mean age of these patients causes an increase in the frequency of non-AIDS-related complications (i.e., neurocognitive, cardiovascular, liver and kidney diseases, metabolic syndrome, osteoporosis, and non-HIV-associated cancers, among others), even when antiviral treatment is successful. Immune activation and persistent inflammation characterize both HIV infection and physiological aging, and both conditions share common detrimental pathways that lead to early immunosenescence. Furthermore, HIV-associated neurocognitive disorders represent important consequences of the infection. The persistent systemic immune activation, the continuous migration of activated monocytes to the central nervous system, and the progressive aging of patients contribute to the development of neuronal injuries, which are in turn linked to HIV-associated neurocognitive disorders that can persist despite successful antiretroviral treatment. Copyright © 2014 Elsevier B.V. All rights reserved.

  8. Modeling evolution of crosstalk in noisy signal transduction networks

    NASA Astrophysics Data System (ADS)

    Tareen, Ammar; Wingreen, Ned S.; Mukhopadhyay, Ranjan

    2018-02-01

    Signal transduction networks can form highly interconnected systems within cells due to crosstalk between constituent pathways. To better understand the evolutionary design principles underlying such networks, we study the evolution of crosstalk for two parallel signaling pathways that arise via gene duplication. We use a sequence-based evolutionary algorithm and evolve the network based on two physically motivated fitness functions related to information transmission. We find that one fitness function leads to a high degree of crosstalk while the other leads to pathway specificity. Our results offer insights on the relationship between network architecture and information transmission for noisy biomolecular networks.

  9. Parallel processing and expert systems

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Lau, Sonie

    1991-01-01

    Whether it be monitoring the thermal subsystem of Space Station Freedom or controlling the navigation of an autonomous rover on Mars, NASA missions of the 1990s cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not guarantee that real-time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism of expert systems. Results to date indicate that the parallelism achieved for these systems is small. To obtain greater speed-ups, data parallelism and application parallelism must be exploited.

  10. Efficient Scalable Median Filtering Using Histogram-Based Operations.

    PubMed

    Green, Oded

    2018-05-01

    Median filtering is a smoothing technique for noise removal in images. While there are various implementations of median filtering for a single-core CPU, there are few implementations for accelerators and multi-core systems. Many parallel implementations of median filtering use a sorting algorithm to rearrange the values within a filtering window and take the median of the sorted values. While using sorting algorithms allows for simple parallel implementations, the cost of the sorting becomes prohibitive as the filtering windows grow, making such algorithms, sequential and parallel alike, inefficient. In this work, we introduce the first software parallel median filter that is not sorting-based. The new algorithm uses efficient histogram-based operations, which reduce its computational requirements while also accessing the image fewer times. We show an implementation of our algorithm for both the CPU and NVIDIA's CUDA-supported graphics processing unit (GPU). The new algorithm is compared with several other leading CPU and GPU implementations. The CPU implementation has near-perfect linear scaling on a quad-core system. The GPU implementation is several orders of magnitude faster than the other GPU implementations for mid-size median filters. For small kernels, comparison-based approaches are preferable, as fewer operations are required. Lastly, the new algorithm is open-source and can be found in the OpenCV library.
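The histogram idea can be shown in one dimension for 8-bit values: keep a 256-bin histogram of the current window, read the median with a prefix-count walk, and update the histogram incrementally as the window slides, avoiding any per-window sort. This is a minimal serial 1-D sketch, not the paper's 2-D parallel algorithm:

```python
def running_median(values, k):
    """Histogram-based running median over a window of odd width k,
    for 8-bit values (0..255)."""
    assert k % 2 == 1
    half = k // 2
    hist = [0] * 256
    out = []
    # Initialize the histogram with the first window
    for v in values[:k]:
        hist[v] += 1
    for i in range(half, len(values) - half):
        # Walk the histogram until more than half the window is counted;
        # the bin that crosses the threshold holds the median.
        count, m = 0, 0
        while count <= half:
            count += hist[m]
            m += 1
        out.append(m - 1)
        # Slide the window: drop the leftmost value, add the next one
        if i + half + 1 < len(values):
            hist[values[i - half]] -= 1
            hist[values[i + half + 1]] += 1
    return out

vals = [10, 200, 30, 30, 255, 40, 50]
med = running_median(vals, 3)   # medians of each 3-wide window
```

The walk costs at most 256 steps regardless of k, whereas a sort costs O(k log k) per window, which is why the histogram approach wins as filtering windows grow.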

  11. Review of An Introduction to Parallel and Vector Scientific Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bailey, David H.; Lefton, Lew

    2006-06-30

    On one hand, the field of high-performance scientific computing is thriving beyond measure. Performance of leading-edge systems on scientific calculations, as measured say by the Top500 list, has increased by an astounding factor of 8000 during the 15-year period from 1993 to 2008, which is slightly faster even than Moore's Law. Even more importantly, remarkable advances in numerical algorithms, numerical libraries and parallel programming environments have led to improvements in the scope of what can be computed that are entirely on a par with the advances in computing hardware. And these successes have spread far beyond the confines of large government-operated laboratories: many universities, modest-sized research institutes and private firms now operate clusters that differ only in scale from the behemoth systems at the large-scale facilities. In the wake of these recent successes, researchers from fields that heretofore have not been part of the scientific computing world have been drawn into the arena. For example, at the recent SC07 conference, the exhibit hall, which long has hosted displays from leading computer systems vendors and government laboratories, featured some 70 exhibitors who had not previously participated. In spite of all these exciting developments, and in spite of the clear need to present these concepts to a much broader technical audience, there is a perplexing dearth of training material and textbooks in the field, particularly at the introductory level. Only a handful of universities offer coursework in the specific area of highly parallel scientific computing, and instructors of such courses typically rely on custom-assembled material. For example, the present reviewer and Robert F. Lucas relied on materials assembled in a somewhat ad-hoc fashion from colleagues and personal resources when presenting a course on parallel scientific computing at the University of California, Berkeley, a few years ago.
Thus it is indeed refreshing to see the publication of the book An Introduction to Parallel and Vector Scientific Computing, written by Ronald W. Shonkwiler and Lew Lefton, both of the Georgia Institute of Technology. They have taken the bull by the horns and produced a book that appears to be entirely satisfactory as an introductory textbook for use in such a course. It is also of interest to the much broader community of researchers who are already in the field, laboring day by day to improve the power and performance of their numerical simulations. The book is organized into 11 chapters, plus an appendix. The first three chapters describe the basics of system architecture including vector, parallel and distributed memory systems, the details of task dependence and synchronization, and the various programming models currently in use - threads, MPI and OpenMP. Chapters four through nine provide a competent introduction to floating-point arithmetic, numerical error and numerical linear algebra. Some of the topics presented include Gaussian elimination, LU decomposition, tridiagonal systems, Givens rotations, QR decompositions, Gauss-Seidel iterations and Householder transformations. Chapters 10 and 11 introduce Monte Carlo methods and schemes for discrete optimization such as genetic algorithms.

  12. Vasoregression: A Shared Vascular Pathology Underlying Macrovascular And Microvascular Pathologies?

    PubMed Central

    Gupta, Akanksha

    2015-01-01

    Abstract Vasoregression is a common phenomenon underlying physiological vessel development as well as pathological microvascular diseases leading to peripheral neuropathy, nephropathy, and vascular oculopathies. In this review, we describe the hallmarks and pathways of vasoregression. We argue here that there is a parallel between characteristic features of vasoregression in the ocular microvessels and atherosclerosis in the larger vessels. Shared molecular pathways and molecular effectors in the two conditions are outlined, thus highlighting the possible systemic causes of local vascular diseases. Our review gives us a system-wide insight into factors leading to multiple synchronous vascular diseases. Because shared molecular pathways might usefully address the diagnostic and therapeutic needs of multiple common complex diseases, the literature analysis presented here is of broad interest to readership in integrative biology, rational drug development and systems medicine. PMID:26669709

  13. Communication: Ion mobility of the radical cation dimers: (Naphthalene)2+• and naphthalene+•-benzene: Evidence for stacked sandwich and T-shape structures

    NASA Astrophysics Data System (ADS)

    Platt, Sean P.; Attah, Isaac K.; Aziz, Saadullah; El-Shall, M. Samy

    2015-05-01

    Dimer radical cations of aromatic and polycyclic aromatic molecules are good model systems for a fundamental understanding of photoconductivity and ferromagnetism in organic materials which depend on the degree of charge delocalization. The structures of the dimer radical cations are difficult to determine theoretically since the potential energy surface is often very flat with multiple shallow minima representing two major classes of isomers adopting the stacked parallel or the T-shape structure. We present experimental results, based on mass-selected ion mobility measurements, on the gas phase structures of the naphthalene+•·naphthalene homodimer and the naphthalene+•·benzene heterodimer radical cations at different temperatures. Ion mobility studies reveal a persistence of the stacked parallel structure of the naphthalene+•·naphthalene homodimer in the temperature range 230-300 K. On the other hand, the results reveal that the naphthalene+•·benzene heterodimer is able to exhibit both the stacked parallel and T-shape structural isomers depending on the experimental conditions. Exploitation of the unique structural motifs among charged homo- and heteroaromatic-aromatic interactions may lead to new opportunities for molecular design and recognition involving charged aromatic systems.

  14. Parallel and Efficient Sensitivity Analysis of Microscopy Image Segmentation Workflows in Hybrid Systems

    PubMed Central

    Barreiros, Willian; Teodoro, George; Kurc, Tahsin; Kong, Jun; Melo, Alba C. M. A.; Saltz, Joel

    2017-01-01

    We investigate efficient sensitivity analysis (SA) of algorithms that segment and classify image features in a large dataset of high-resolution images. Algorithm SA is the process of evaluating variations of methods and parameter values to quantify differences in the output. An SA can be very compute demanding because it requires re-processing the input dataset several times with different parameters to assess variations in output. In this work, we introduce strategies to efficiently speed up SA via runtime optimizations targeting distributed hybrid systems and reuse of computations from runs with different parameters. We evaluate our approach using a cancer image analysis workflow on a hybrid cluster with 256 nodes, each with an Intel Phi and a dual socket CPU. The SA attained a parallel efficiency of over 90% on 256 nodes. The cooperative execution using the CPUs and the Phi available in each node with smart task assignment strategies resulted in an additional speedup of about 2×. Finally, multi-level computation reuse led to an additional speedup of up to 2.46× on the parallel version. The level of performance attained with the proposed optimizations will allow the use of SA in large-scale studies. PMID:29081725

  15. Partitioning in parallel processing of production systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oflazer, K.

    1987-01-01

    This thesis presents research on certain issues related to parallel processing of production systems. It first presents a parallel production system interpreter that has been implemented on a four-processor multiprocessor. This parallel interpreter is based on Forgy's OPS5 interpreter and exploits production-level parallelism in production systems. Runs on the multiprocessor system indicate that it is possible to obtain speed-up of around 1.7 in the match computation for certain production systems when productions are split into three sets that are processed in parallel. The next issue addressed is that of partitioning a set of rules to processors in a parallel interpreter with production-level parallelism, and the extent of additional improvement in performance. The partitioning problem is formulated and an algorithm for approximate solutions is presented. The thesis next presents a parallel processing scheme for OPS5 production systems that allows some redundancy in the match computation. This redundancy enables the processing of a production to be divided into units of medium granularity each of which can be processed in parallel. Subsequently, a parallel processor architecture for implementing the parallel processing algorithm is presented.
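The rule-to-processor partitioning problem described above is a load-balancing problem. The thesis's approximation algorithm is not reproduced here; as a stand-in, the sketch below uses a standard greedy longest-processing-time heuristic, assigning each rule (by descending estimated match cost) to the currently least-loaded processor. The cost figures are invented for illustration.

```python
import heapq

def partition_rules(costs, nprocs):
    """Greedy LPT heuristic for partitioning rules across processors.

    costs[i] is an estimated match cost for rule i. Each rule, taken in
    descending cost order, goes to the currently least-loaded processor.
    Returns a list of (load, processor id, assigned rule ids).
    """
    heap = [(0.0, p, []) for p in range(nprocs)]   # (load, proc id, rules)
    heapq.heapify(heap)
    for rule, cost in sorted(enumerate(costs), key=lambda rc: -rc[1]):
        load, p, rules = heapq.heappop(heap)       # least-loaded processor
        rules.append(rule)
        heapq.heappush(heap, (load + cost, p, rules))
    return sorted(heap)

# Hypothetical per-rule match costs for 8 productions on 3 processors
parts = partition_rules([8, 7, 6, 5, 4, 3, 2, 1], nprocs=3)
for load, p, rules in parts:
    print(f"processor {p}: rules {rules}, load {load}")
```

LPT is only an approximation, which mirrors the thesis's framing: exact rule partitioning is a bin-packing-style problem, so approximate solutions are the practical route.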

  16. The Galley Parallel File System

    NASA Technical Reports Server (NTRS)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. The interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. We discuss Galley's file structure and application interface, as well as an application that has been implemented using that interface.

  17. Prenatal Alcohol Exposure in Rodents As a Promising Model for the Study of ADHD Molecular Basis

    PubMed Central

    Rojas-Mayorquín, Argelia E.; Padilla-Velarde, Edgar; Ortuño-Sahagún, Daniel

    2016-01-01

    A physiological parallelism, or even a causal relationship, can be deduced from analysis of the main characteristics of “Alcohol Related Neurodevelopmental Disorders” (ARND), derived from prenatal alcohol exposure (PAE), and of behavioral performance in attention-deficit/hyperactivity disorder (ADHD). These two clinically distinct disease entities exhibit many common features. They affect shared neurological pathways and related neurotransmitter systems. We briefly review these parallelisms, with their common and uncommon characteristics, with an emphasis on the molecular mechanisms underlying the behavioral manifestations, leading us to propose that PAE in rats can be considered a suitable model for the study of ADHD. PMID:28018163

  18. Speech recognition systems on the Cell Broadband Engine

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Y; Jones, H; Vaidya, S

    In this paper we describe our design, implementation, and first results of a prototype connected-phoneme-based speech recognition system on the Cell Broadband Engine™ (Cell/B.E.). Automatic speech recognition decodes speech samples into plain text (other representations are possible) and must process samples at real-time rates. Fortunately, the computational tasks involved in this pipeline are highly data-parallel and can receive significant hardware acceleration from vector-streaming architectures such as the Cell/B.E. Identifying and exploiting these parallelism opportunities is challenging, but also critical to improving system performance. We observed, from our initial performance timings, that a single Cell/B.E. processor can recognize speech from thousands of simultaneous voice channels in real time--a channel density that is orders-of-magnitude greater than the capacity of existing software speech recognizers based on CPUs (central processing units). This result emphasizes the potential for Cell/B.E.-based speech recognition and will likely lead to the future development of production speech systems using Cell/B.E. clusters.

  19. The testing of batteries linked to supercapacitors with electrochemical impedance spectroscopy: A comparison between Li-ion and valve regulated lead acid batteries

    NASA Astrophysics Data System (ADS)

    Ferg, Ernst; Rossouw, Claire; Loyson, Peter

    2013-03-01

    For electric vehicles, a supercapacitor can be coupled to the electrical system in order to increase and optimize the energy and power densities of the drive system during acceleration and regenerative braking. This study looked at the charge acceptance and maximum discharge ability of a valve regulated lead acid (VRLA) and a Li-ion battery connected in parallel to supercapacitors. The test procedure evaluated the advantage of using a supercapacitor at a 2 F:1 Ah ratio with the battery types at various states of charge (SoC). The results showed that about 7% of extra charge was achieved over a 5-s test time for a Li-ion hybrid system at 20% SoC, whereas at 80% SoC the additional capacity was approximately 16%. For the VRLA battery hybrid system, an additional charge of up to 20% was achieved when the battery was at 80% SoC, with little or no benefit at 20% SoC. The advantage of the supercapacitor in parallel with a VRLA battery was noticeable in its discharge ability, where significant extra capacity was achieved for short periods of time for a battery at 60% and 40% SoC when compared to the Li-ion hybrid system. The study also made use of Electrochemical Impedance Spectroscopy (EIS) with a suitable equivalent circuit model to explain, in particular, the internal resistance and capacitance differences observed between the different battery chemistries with and without a supercapacitor.

  20. An evaluation to design high performance pinhole array detector module for four head SPECT: a simulation study

    NASA Astrophysics Data System (ADS)

    Rahman, Tasneem; Tahtali, Murat; Pickering, Mark R.

    2014-09-01

    The purpose of this study is to derive optimized parameters for a detector module employing an off-the-shelf X-ray camera and a pinhole array collimator applicable to a range of different SPECT systems. Monte Carlo simulations using the Geant4 application for tomographic emission (GATE) were performed to estimate the performance of the pinhole array collimators, which was compared to that of a low energy high resolution (LEHR) parallel-hole collimator in a four-head SPECT system. A detector module was simulated with a 48 mm by 48 mm active area and 1 mm, 1.6 mm and 2 mm pinhole aperture sizes at 0.48 mm pitch on a tungsten plate. Perpendicular lead septa were employed to verify overlapping and non-overlapping projections against a proper acceptance angle without lead septa. A uniform cylindrical water phantom was used to evaluate the performance of the proposed four-head SPECT system with the pinhole array detector module. For each head, 100 pinhole configurations were evaluated based on sensitivity and detection efficiency for 140 keV γ-rays, and compared to the LEHR parallel-hole collimator. SPECT images were reconstructed using the filtered back projection (FBP) algorithm; neither scatter nor attenuation corrections were performed. Development of a better reconstruction algorithm for this specific system is in progress. Nevertheless, the activity distribution was well visualized using the backprojection algorithm. In this study, we present several quantitative and comparative analyses of a pinhole array imaging system providing high detection efficiency and better system sensitivity over a large FOV compared to the conventional four-head SPECT system. The proposed detector module is expected to provide improved performance in various SPECT imaging applications.

  1. Study of solid rocket motors for a space shuttle booster. Volume 1: Executive summary

    NASA Technical Reports Server (NTRS)

    Vonderesch, A. H.

    1972-01-01

    The factors affecting the choice of the 156 inch diameter, parallel burn, solid propellant rocket engine for use with the space shuttle booster are presented. Primary considerations leading to the selection are: (1) low booster vehicle cost, (2) the largest proven transportable system, (3) a demonstrated design, (4) recovery/reuse is feasible, (5) abort can be easily accomplished, and (6) ecological effects are minor.

  2. Mass action at the single-molecule level.

    PubMed

    Shon, Min Ju; Cohen, Adam E

    2012-09-05

    We developed a system to reversibly encapsulate small numbers of molecules in an array of nanofabricated "dimples". This system enables highly parallel, long-term, and attachment-free studies of molecular dynamics via single-molecule fluorescence. In studies of bimolecular reactions of small numbers of confined molecules, we see phenomena that, while expected from basic statistical mechanics, are not observed in bulk chemistry. Statistical fluctuations in the occupancy of sealed reaction chambers lead to steady-state fluctuations in reaction equilibria and rates. These phenomena are likely to be important whenever reactions happen in confined geometries.
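The statistical effect the abstract describes can be illustrated with a short simulation. This sketch is illustrative only: the mean occupancy and chamber count are assumptions, not the paper's values. Poisson loading of sealed chambers gives a relative occupancy fluctuation of 1/√(mean), and because ⟨n²⟩ ≠ ⟨n⟩², chamber-averaged bimolecular rates deviate from the bulk mean-field prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
mean_occ = 3.0                                 # assumed mean molecules per sealed dimple
chambers = rng.poisson(mean_occ, size=100_000) # occupancy of each chamber

# Poisson loading: relative occupancy fluctuation is 1/sqrt(mean)
rel_fluct = chambers.std() / chambers.mean()
print(f"relative fluctuation ~ {rel_fluct:.2f} (theory {1/np.sqrt(mean_occ):.2f})")

# A rate ~ n^2 in a mean-field (bulk) picture averages, under Poisson
# loading, to <n^2> = mean^2 + mean rather than mean^2: confinement-induced
# fluctuations shift chamber-averaged reaction rates away from bulk values.
mean_sq = (chambers.astype(float) ** 2).mean()
print(f"<n^2> = {mean_sq:.2f} vs bulk mean^2 = {mean_occ**2:.2f}")
```

With a mean occupancy of 3, the per-chamber fluctuation is already ~58% of the mean, which is why these effects are invisible in bulk but dominant in dimple-scale volumes.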

  3. Cooperative storage of shared files in a parallel computing system with dynamic block size

    DOEpatents

    Bent, John M.; Faibish, Sorin; Grider, Gary

    2015-11-10

    Improved techniques are provided for parallel writing of data to a shared object in a parallel computing system. A method is provided for storing data generated by a plurality of parallel processes to a shared object in a parallel computing system. The method is performed by at least one of the processes and comprises: dynamically determining a block size for storing the data; exchanging a determined amount of the data with at least one additional process to achieve a block of the data having the dynamically determined block size; and writing the block of the data having the dynamically determined block size to a file system. The determined block size comprises, e.g., a total amount of the data to be stored divided by the number of parallel processes. The file system comprises, for example, a log structured virtual parallel file system, such as a Parallel Log-Structured File System (PLFS).
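The block-size rule quoted above (total data divided by the number of processes) can be sketched as follows. This is a toy, single-process planner standing in for the patented exchange-and-write protocol; in the real system the ranks exchange data so each one holds exactly one block before writing.

```python
def plan_blocks(sizes):
    """Plan block-aligned writes for a shared object.

    sizes[rank] is the number of bytes each parallel process produced.
    The dynamic block size is total bytes // number of processes, and each
    process is assigned one block-sized, block-aligned region of the file.
    (Illustrative sketch; real PLFS-style systems exchange the surplus or
    deficit bytes between ranks to fill each block exactly.)
    """
    nprocs = len(sizes)
    total = sum(sizes)
    block = total // nprocs                       # dynamically determined block size
    plan = [(rank, rank * block, block) for rank in range(nprocs)]
    return block, plan

# Four processes with uneven output sizes (bytes): total 800 -> block 200
block, plan = plan_blocks([100, 300, 250, 150])
print("block size:", block)
for rank, offset, length in plan:
    print(f"rank {rank}: write [{offset}, {offset + length})")
```

The point of the exchange step is visible in the input: rank 0 produced only 100 bytes but must write 200, so it receives the surplus from rank 1 before issuing its block-aligned write.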

  4. Parallel/Vector Integration Methods for Dynamical Astronomy

    NASA Astrophysics Data System (ADS)

    Fukushima, Toshio

    1999-01-01

    This paper reviews three recent works on numerical methods to integrate ordinary differential equations (ODEs), specially designed for parallel, vector, and/or multi-processor-unit (PU) computers. The first is the Picard-Chebyshev method (Fukushima, 1997a). It obtains a global solution of the ODE in the form of a Chebyshev polynomial of large (> 1000) degree by applying the Picard iteration repeatedly. The iteration converges for smooth problems and/or perturbed dynamics. The method runs around 100-1000 times faster in the vector mode than in the scalar mode of a certain computer with vector processors (Fukushima, 1997b). The second is a parallelization of a symplectic integrator (Saha et al., 1997). It regards the implicit midpoint rules covering thousands of timesteps as large-scale nonlinear equations and solves them by fixed-point iteration. The method is applicable to Hamiltonian systems and is expected to lead to an acceleration factor of around 50 on parallel computers with more than 1000 PUs. The last is a parallelization of the extrapolation method (Ito and Fukushima, 1997). It performs trial integrations in parallel; the trial integrations are further accelerated by balancing computational load among PUs by the technique of folding. The method is all-purpose and achieves an acceleration factor of around 3.5 using several PUs. Finally, we give a perspective on the parallelization of some implicit integrators which require multiple corrections in solving implicit formulas, such as the implicit Hermite integrators (Makino and Aarseth, 1992), (Hut et al., 1995) or the implicit symmetric multistep methods (Fukushima, 1998), (Fukushima, 1999).
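The Picard iteration at the heart of the first method can be sketched as below. For brevity this toy version uses trapezoidal quadrature on a uniform grid rather than a Chebyshev polynomial basis, but it shows the structure the method exploits: every sweep evaluates the right-hand side at all grid points independently, which vectorizes or parallelizes trivially.

```python
import numpy as np

def picard_solve(f, y0, t, iters=30):
    """Global Picard iteration y_{k+1}(t) = y0 + integral_0^t f(s, y_k(s)) ds.

    Each sweep evaluates f at *all* grid points at once (the vectorizable
    step), then updates the whole trajectory via cumulative trapezoidal
    quadrature. Converges for smooth/perturbed problems, as the review notes.
    """
    y = np.full_like(t, y0)
    for _ in range(iters):
        g = f(t, y)                                    # embarrassingly parallel
        steps = (g[1:] + g[:-1]) / 2 * np.diff(t)      # trapezoid panel areas
        y = y0 + np.concatenate(([0.0], np.cumsum(steps)))
    return y

t = np.linspace(0.0, 1.0, 201)
y = picard_solve(lambda t, y: y, 1.0, t)               # y' = y, y(0) = 1
print(abs(y[-1] - np.e))                               # small quadrature error
```

The Chebyshev representation in the actual method replaces the quadrature with exact polynomial integration, which is what lets a single global polynomial of degree > 1000 represent the whole arc.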

  5. Impact of equalizing currents on losses and torque ripples in electrical machines with fractional slot concentrated windings

    NASA Astrophysics Data System (ADS)

    Toporkov, D. M.; Vialcev, G. B.

    2017-10-01

    The use of parallel branches is a common manufacturing method for realizing fractional slot concentrated windings in electrical machines. If rotor eccentricity is present in a machine with parallel branches, equalizing currents can arise. This paper discusses a simulation approach for the equalizing currents in the parallel branches of an electrical machine winding, based on magnetic field calculation using the Finite Element Method. High model accuracy is provided by dynamically updating the inductances in the differential equation system describing the machine, using pre-computed tabulated flux-linkage functions. These functions give the dependence of the flux linkage of the parallel branches on the branch currents and the rotor position angle, and permit calculation of self- and mutual inductances by partial differentiation. Calculated results obtained for an electric machine specimen are presented. The results show that an adverse combination of design choices and rotor eccentricity leads to large equalizing currents and winding heating. Additional torque ripples also arise, and their harmonic content differs from that of the cogging torque or of the ripples caused by rotor eccentricity alone.

  6. A learnable parallel processing architecture towards unity of memory and computing

    NASA Astrophysics Data System (ADS)

    Li, H.; Gao, B.; Chen, Z.; Zhao, Y.; Huang, P.; Ye, H.; Liu, L.; Liu, X.; Kang, J.

    2015-08-01

    Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named “iMemComp”, where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped “iMemComp” with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on “iMemComp” can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.

  7. A learnable parallel processing architecture towards unity of memory and computing.

    PubMed

    Li, H; Gao, B; Chen, Z; Zhao, Y; Huang, P; Ye, H; Liu, L; Liu, X; Kang, J

    2015-08-14

    Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named "iMemComp", where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped "iMemComp" with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on "iMemComp" can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.

  8. Observations of large parallel electric fields in the auroral ionosphere

    NASA Technical Reports Server (NTRS)

    Mozer, F. S.

    1976-01-01

    Rocket borne measurements employing a double probe technique were used to gather evidence for the existence of electric fields in the auroral ionosphere having components parallel to the magnetic field direction. An analysis of possible experimental errors leads to the conclusion that no known uncertainties can account for the roughly 10 mV/m parallel electric fields that are observed.

  9. Automatic Management of Parallel and Distributed System Resources

    NASA Technical Reports Server (NTRS)

    Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.

    1990-01-01

    Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.

  10. Reconstruction of the 1997/1998 El Nino from TOPEX/POSEIDON and TOGA/TAO Data Using a Massively Parallel Pacific-Ocean Model and Ensemble Kalman Filter

    NASA Technical Reports Server (NTRS)

    Keppenne, C. L.; Rienecker, M.; Borovikov, A. Y.

    1999-01-01

    Two massively parallel data assimilation systems, in which the model forecast-error covariances are estimated from the distribution of an ensemble of model integrations, are applied to the assimilation of 1997-98 TOPEX/POSEIDON altimetry and TOGA/TAO temperature data into a Pacific basin version of the NASA Seasonal to Interannual Prediction Project (NSIPP) quasi-isopycnal ocean general circulation model. In the first system, an ensemble of model runs forced by an ensemble of atmospheric model simulations is used to calculate asymptotic error statistics. The data assimilation then occurs in the reduced phase space spanned by the corresponding leading empirical orthogonal functions. The second system is an ensemble Kalman filter in which new error statistics are computed during each assimilation cycle from the time-dependent ensemble distribution. The data assimilation experiments are conducted on NSIPP's 512-processor CRAY T3E. The two data assimilation systems are validated by withholding part of the data and quantifying the extent to which the withheld information can be inferred from the assimilation of the remaining data. The pros and cons of each system are discussed.
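The ensemble-based covariance estimate used by the second system can be illustrated with a toy stochastic ensemble Kalman filter analysis step. All dimensions and numbers below are invented for illustration; the real system works on a full ocean state and assimilates altimetry and temperature profiles.

```python
import numpy as np

rng = np.random.default_rng(1)

def enkf_update(ensemble, H, obs, obs_err):
    """Stochastic EnKF analysis step with ensemble-estimated covariances.

    ensemble: (n_members, n_state) forecast ensemble.
    H: (n_obs, n_state) linear observation operator.
    The forecast-error covariance P is estimated from the ensemble spread,
    as in the systems described above.
    """
    n = ensemble.shape[0]
    xm = ensemble.mean(axis=0)
    X = ensemble - xm                              # ensemble anomalies
    P = X.T @ X / (n - 1)                          # sample forecast covariance
    R = np.eye(len(obs)) * obs_err**2              # observation-error covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
    # perturb observations so the analysis spread stays statistically consistent
    perturbed = obs + rng.normal(0.0, obs_err, size=(n, len(obs)))
    return ensemble + (perturbed - ensemble @ H.T) @ K.T

# Toy problem: 3-variable state, observe only the first variable
ens = rng.normal([1.0, 2.0, 3.0], 0.5, size=(100, 3))
H = np.array([[1.0, 0.0, 0.0]])
analysis = enkf_update(ens, H, np.array([1.4]), obs_err=0.1)
print(analysis.mean(axis=0))   # first component is pulled toward the observation
```

The ensemble cross-covariances in P are what spread the correction from the observed variable into the unobserved ones, which is how such filters infer withheld information in the validation experiments described above.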

  11. Advanced techniques in reliability model representation and solution

    NASA Technical Reports Server (NTRS)

    Palumbo, Daniel L.; Nicol, David M.

    1992-01-01

    The current tendency of flight control system designs is towards increased integration of applications and increased distribution of computational elements. The reliability analysis of such systems is difficult because subsystem interactions are increasingly interdependent. Researchers at NASA Langley Research Center have been working for several years to extend the capability of Markov modeling techniques to address these problems. This effort has been focused in the areas of increased model abstraction and increased computational capability. The reliability model generator (RMG) is a software tool that uses as input a graphical object-oriented block diagram of the system. RMG uses a failure-effects algorithm to produce the reliability model from the graphical description. The ASSURE software tool is a parallel processing program that uses the semi-Markov unreliability range evaluator (SURE) solution technique and the abstract semi-Markov specification interface to the SURE tool (ASSIST) modeling language. A failure modes-effects simulation is used by ASSURE. These tools were used to analyze a significant portion of a complex flight control system. The successful combination of the power of graphical representation, automated model generation, and parallel computation leads to the conclusion that distributed fault-tolerant system architectures can now be analyzed.

  12. Development of an Integrated Data Acquisition System for a Small Flight Probe

    NASA Technical Reports Server (NTRS)

    Swanson, Gregory T.; Empey, Daniel M.; Skokova, Kristina A.; Venkatapathy, Ethiraj

    2012-01-01

    In support of the SPRITE concept, an integrated data acquisition system has been developed and fabricated for preliminary testing. The data acquisition system has been designed to condition traditional thermal protection system sensors, store their data to an on-board memory card, and in parallel, telemeter to an external system. In the fall of 2010, this system was integrated into a 14 in. diameter, 45 degree sphere cone probe instrumented with thermal protection system sensors. This system was then tested at the NASA Ames Research Center Aerodynamic Heating Facility's arc jet at approximately 170 W/sq. cm. The first test in December 2010 highlighted hardware design issues that were redesigned and implemented leading to a successful test in February 2011.

  13. New insights into innate immune control of systemic candidiasis

    PubMed Central

    Lionakis, Michail S.

    2014-01-01

    Systemic infection caused by Candida species is the fourth leading cause of nosocomial bloodstream infection in modern hospitals and carries high morbidity and mortality despite antifungal therapy. A recent surge of immunological studies in the mouse models of systemic candidiasis and the parallel discovery and phenotypic characterization of inherited genetic disorders in antifungal immune factors that are associated with enhanced susceptibility or resistance to the infection have provided new insights into the cellular and molecular basis of protective innate immune responses against Candida. In this review, the new developments in our understanding of how the mammalian immune system responds to systemic Candida challenge are synthesized and important future research directions are highlighted. PMID:25023483

  14. Ion acceleration and heating by kinetic Alfvén waves associated with magnetic reconnection

    NASA Astrophysics Data System (ADS)

    Liang, Ji; Lin, Yu; Johnson, Jay R.; Wang, Zheng-Xiong; Wang, Xueyi

    2017-10-01

    Our previous study on the generation and signatures of kinetic Alfvén waves (KAWs) associated with magnetic reconnection in a current sheet revealed that KAWs are a common feature during reconnection [Liang et al. J. Geophys. Res.: Space Phys. 121, 6526 (2016)]. In this paper, ion acceleration and heating by the KAWs generated during magnetic reconnection are investigated with a three-dimensional (3-D) hybrid model. It is found that in the outflow region, a fraction of inflow ions are accelerated by the KAWs generated in the leading bulge region of reconnection, and their parallel velocities gradually increase up to slightly super-Alfvénic. As a result of wave-particle interactions, an accelerated ion beam forms in the direction of the anti-parallel magnetic field, in addition to the core ion population, leading to the development of non-Maxwellian velocity distributions, which include a trapped population with parallel velocities consistent with the wave speed. The ions are heated in both parallel and perpendicular directions. In the parallel direction, the heating results from nonlinear Landau resonance of trapped ions. In the perpendicular direction, however, evidence of stochastic heating by the KAWs is found during the acceleration stage, with an increase of magnetic moment μ. The coherence in the perpendicular ion temperature T⊥ and the perpendicular electric and magnetic fields of KAWs also provides evidence for perpendicular heating by KAWs. The parallel and perpendicular heating of the accelerated beam occur simultaneously, leading to the development of temperature anisotropy with T⊥>T∥ . The heating rate agrees with the damping rate of the KAWs, and the heating is dominated by the accelerated ion beam. 
In the later stage, with the increase of the fraction of the accelerated ions, interaction between the accelerated beam and the core population also contributes to the ion heating, ultimately leading to overlap of the beams and an overall anisotropy with T∥>T⊥ .

  15. Design of object-oriented distributed simulation classes

    NASA Technical Reports Server (NTRS)

    Schoeffler, James D. (Principal Investigator)

    1995-01-01

    Distributed simulation of aircraft engines as part of a computer-aided design package is being developed by NASA Lewis Research Center for the aircraft industry. The project is called NPSS, an acronym for 'Numerical Propulsion Simulation System'. NPSS is a flexible object-oriented simulation of aircraft engines requiring high computing speed. It is desirable to run the simulation on a distributed computer system with multiple processors executing portions of the simulation in parallel. The purpose of this research was to investigate object-oriented structures such that individual objects could be distributed. The set of classes used in the simulation must be designed to facilitate parallel computation. Since the portions of the simulation carried out in parallel are not independent of one another, there is a need for communication among the parallel executing processors, which in turn implies a need for their synchronization. Communication and synchronization can lead to decreased throughput as parallel processors wait for data or synchronization signals from other processors. As a result of this research, the following have been accomplished. The design and implementation of a set of simulation classes which result in a distributed simulation control program have been completed. The design is based upon the MIT 'Actor' model of a concurrent object and uses 'connectors' to structure dynamic connections between simulation components. Connectors may be dynamically created according to the distribution of objects among machines at execution time without any programming changes. Measurements of the basic performance have been carried out, with the result that communication overhead of the distributed design is swamped by the computation time of modules unless modules have very short execution times per iteration or time step. An analytical performance model based upon queuing network theory has been designed and implemented; its application to realistic configurations has not yet been carried out.

  17. Generation of Alfvenic Double Layers, Formation of Auroral Arcs, and Their Impact on Energy and Momentum Transfer in M-I Coupling System

    NASA Astrophysics Data System (ADS)

    Song, Y.; Lysak, R. L.

    2017-12-01

    Parallel electrostatic electric fields provide a powerful mechanism to accelerate auroral particles to high energy in the auroral acceleration region (AAR), creating both quasi-static and Alfvenic discrete aurorae. The total field-aligned current can be written as J||total = J|| + J||D, where the displacement current is denoted as J||D = (1/4π)(∂E||/∂t), which describes the E||-generation (Song and Lysak, 2006). The generation of the total field-aligned current is related to spatial gradients of the parallel vorticity caused by the axial torque acting on field-aligned flux tubes in the M-I coupling system. It should be noted that parallel electric fields are not produced by the field-aligned current. In fact, the E||-generation is caused by Alfvenic interaction in the M-I coupling system and is favored by a low plasma density and enhanced localized azimuthal magnetic flux. We suggest that the nonlinear interaction of incident and reflected Alfven wave packets in the AAR can create reactive stress concentration and can therefore generate parallel electrostatic electric fields together with a seed low-density cavity. The generated electric fields will quickly deepen the seed low-density cavity, which can effectively create even stronger electrostatic electric fields. The electrostatic electric fields nested in a low-density cavity and surrounded by enhanced azimuthal magnetic flux constitute Alfvenic electromagnetic plasma structures, such as Alfvenic Double Layers (DLs). The Poynting flux carried by Alfven waves can continuously supply energy from the generator region to the auroral acceleration region, supporting and sustaining Alfvenic DLs with long-lasting electrostatic electric fields which accelerate auroral particles to high energy. The generation of parallel electric fields and the formation of auroral arcs can redistribute perpendicular mechanical and magnetic stresses in auroral flux tubes, locally decoupling the magnetosphere from ionospheric drag. If free magnetic energy has accumulated in the tail, this may enhance the earthward shear flows in the magnetotail and rapidly build up stronger parallel electric fields in the auroral acceleration region, leading to a sudden and violent release of tail energy.

  18. File concepts for parallel I/O

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1989-01-01

    The subject of input/output (I/O) was often neglected in the design of parallel computer systems, although for many problems I/O rates will limit the attainable speedup. The I/O problem is addressed by considering the role of files in parallel systems. The notion of parallel files is introduced. Parallel files provide for concurrent access by multiple processes and utilize parallelism in the I/O system to improve performance. Parallel files can also be used conventionally by sequential programs. A set of standard parallel file organizations is proposed, and implementations using multiple storage devices are suggested. Problem areas are also identified and discussed.
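    The two ideas in this abstract, concurrent access by multiple processes and parallelism across multiple storage devices, can be sketched concretely. The following is a minimal Python illustration of a hypothetical round-robin striping layout (not Crockett's proposed organizations): a logical file's blocks are spread across several backing files ("devices") and workers read disjoint blocks concurrently.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK = 8  # bytes per stripe unit (tiny, for illustration only)

def write_striped(data: bytes, paths):
    """Round-robin the blocks of `data` across several backing files."""
    parts = [bytearray() for _ in paths]
    for i in range(0, len(data), BLOCK):
        parts[(i // BLOCK) % len(paths)] += data[i:i + BLOCK]
    for p, buf in zip(paths, parts):
        with open(p, "wb") as f:
            f.write(buf)

def read_block(paths, n):
    """Read logical block n; blocks map round-robin onto the devices."""
    dev = n % len(paths)
    offset = (n // len(paths)) * BLOCK
    with open(paths[dev], "rb") as f:
        f.seek(offset)
        return f.read(BLOCK)

tmp = tempfile.mkdtemp()
devices = [os.path.join(tmp, f"dev{i}") for i in range(3)]
write_striped(b"abcdefgh" * 6, devices)   # 6 logical blocks over 3 "devices"
with ThreadPoolExecutor() as pool:        # concurrent access by many readers
    blocks = list(pool.map(lambda n: read_block(devices, n), range(6)))
print(b"".join(blocks))
```

Because each reader touches a different backing file and offset, the reads can proceed in parallel without coordination, which is the performance argument the abstract makes.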

  19. Digital hydraulic drive for microfluidics and miniaturized cell culture devices based on shape memory alloy actuators

    NASA Astrophysics Data System (ADS)

    Tsai, Cheng-Han; Wu, Xuanye; Kuan, Da-Han; Zimmermann, Stefan; Zengerle, Roland; Koltay, Peter

    2018-08-01

    In order to culture and analyze individual living cells, microfluidic cultivation and manipulation of cells have become an increasingly important topic. Such microfluidic systems allow for exploring the phenotypic differences between thousands of genetically identical cells or for pharmacological tests in parallel, which is impossible to achieve by traditional macroscopic cell culture methods. Therefore, plenty of microfluidic systems and devices have been developed for cell biological studies like cell culture, cell sorting, and cell lysis in the past. However, these microfluidic systems are still limited by external pressure sources, which most of the time are large in size and have to be connected by fluidic tubing, leading to complex and delicate systems. In order to provide a miniaturized, more robust actuation system, a novel, compact, low-power-consumption digital hydraulic drive (DHD) has been developed that is intended for use in portable and automated microfluidic systems for various applications. The DHD considered in this work consists of a shape memory alloy (SMA) actuator and a pneumatic cylinder. The switching time of the digital modes (pressure ON versus OFF) can be adjusted from 1 s to minutes. Thus, DHDs might have many applications for driving microfluidic devices. In this work, different implementations of DHDs are presented and their performance is characterized by experiments. In particular, it is shown that DHDs can be used for microfluidic large-scale integration (mLSI) valve control (256 valves in parallel) as well as, potentially, for droplet-based microfluidic systems. As a further application example, high-throughput mixing of cell cultures (96 wells in parallel) is demonstrated, employing the DHD to drive a so-called ‘functional lid’ (FL) to enable a miniaturized micro bioreactor in a regular 96-well micro well plate.

  20. Molecular pathways to parallel evolution: I. Gene nexuses and their morphological correlates.

    PubMed

    Zuckerkandl, E

    1994-12-01

    Aspects of the regulatory interactions among genes are probably as old as most genes are themselves. Correspondingly, similar predispositions to changes in such interactions must have existed for long evolutionary periods. Features of the structure and the evolution of the system of gene regulation furnish the background necessary for a molecular understanding of parallel evolution. Patently "unrelated" organs, such as the fat body of a fly and the liver of a mammal, can exhibit fractional homology, a fraction expected to become subject to quantitation. This also seems to hold for different organs in the same organism, such as wings and legs of a fly. In informational macromolecules, on the other hand, homology is indeed all or none. In the quite different case of organs, analogy is expected usually to represent attenuated homology. Many instances of putative convergence are likely to turn out to be predominantly parallel evolution, presumably including the case of the vertebrate and cephalopod eyes. Homology in morphological features reflects a similarity in networks of active genes. Similar nexuses of active genes can be established in cells of different embryological origins. Thus, parallel development can be considered a counterpart to parallel evolution. Specific macromolecular interactions leading to the regulation of the c-fos gene are given as an example of a "controller node" defined as a regulatory unit. Quantitative changes in gene control are distinguished from relational changes, and frequent parallelism in quantitative changes is noted in Drosophila enzymes. Evolutionary reversions in quantitative gene expression are also expected. The evolution of relational patterns is attributed to several distinct mechanisms, notably the shuffling of protein domains. 
The growth of such patterns may in part be brought about by a particular process of compensation for "controller gene diseases," a process that would spontaneously tend to lead to increased regulatory and organismal complexity. Despite the inferred increase in gene interaction complexity, whose course over evolutionary time is unknown, the number of homology groups for the functional and structural protein units designated as domains has probably remained rather constant, even as, in some of its branches, evolution moved toward "higher" organisms. In connection with this process, the question is raised of parallel evolution within the purview of activating and repressing master switches and in regard to the number of levels into which the hierarchies of genic master switches will eventually be resolved.

  1. Japanese project aims at supercomputer that executes 10 GFLOPS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Burskey, D.

    1984-05-03

    Dubbed supercom by its multicompany design team, the decade-long project's goal is an engineering supercomputer that can execute 10 billion floating-point operations/s, about 20 times faster than today's supercomputers. The project, guided by Japan's Ministry of International Trade and Industry (MITI) and the Agency of Industrial Science and Technology, encompasses three parallel research programs, all aimed at some aspect of the supercomputer. One program should lead to superfast logic and memory circuits, another to a system architecture that will afford the best performance, and the last to the software that will ultimately control the computer. The work on logic and memory chips is based on GaAs circuits, Josephson junction devices, and high-electron-mobility transistor structures. The architecture will involve parallel processing.

  2. Electron Cooling and Isotropization during Magnetotail Current Sheet Thinning: Implications for Parallel Electric Fields

    NASA Astrophysics Data System (ADS)

    Lu, San; Artemyev, A. V.; Angelopoulos, V.

    2017-11-01

    Magnetotail current sheet thinning is a distinctive feature of the substorm growth phase, during which magnetic energy is stored in the magnetospheric lobes. Investigation of charged particle dynamics in such thinning current sheets is believed to be important for understanding the substorm energy storage and the current sheet destabilization responsible for substorm expansion phase onset. We use Time History of Events and Macroscale Interactions during Substorms (THEMIS) B and C observations in 2008 and 2009 at 18-25 RE to show that during magnetotail current sheet thinning, the electron temperature decreases (cooling), and the parallel temperature decreases faster than the perpendicular temperature, leading to a decrease of the initially strong electron temperature anisotropy (isotropization). This isotropization cannot be explained by pure adiabatic cooling or by pitch angle scattering. We use test particle simulations to explore the mechanism responsible for the cooling and isotropization. We find that during the thinning, a fast decrease of a parallel electric field (directed toward the Earth) can speed up the electron parallel cooling, causing it to exceed the rate of perpendicular cooling, and thus lead to isotropization, consistent with observation. If the parallel electric field is too small or does not change fast enough, the electron parallel cooling is slower than the perpendicular cooling, so the parallel electron anisotropy grows, contrary to observation. The same isotropization can also be accomplished by an increasing parallel electric field directed toward the equatorial plane. Our study reveals the existence of a large-scale parallel electric field, which plays an important role in magnetotail particle dynamics during the current sheet thinning process.

  3. Parallel Architectures and Parallel Algorithms for Integrated Vision Systems. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Choudhary, Alok Nidhi

    1989-01-01

    Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform a high-level application (e.g., object recognition). An IVS normally involves algorithms from low-level, intermediate-level, and high-level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues in parallel architectures and parallel algorithms for integrated vision systems are addressed.

  4. A Lightweight, High-performance I/O Management Package for Data-intensive Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jun Wang

    2007-07-17

    File storage systems are playing an increasingly important role in high-performance computing as the performance gap between CPU and disk increases. It could take a long time to develop an entire system from scratch, so solutions will have to be built as extensions to existing systems. If new portable, customized software components are plugged into these systems, better sustained high I/O performance and higher scalability will be achieved, and the development cycle of the next generation of parallel file systems will be shortened. The overall research objective of this ECPI development plan is to develop a lightweight, customized, high-performance I/O management package named LightI/O to extend and leverage current parallel file systems used by DOE. During this period, we have developed a novel component in LightI/O, prototyped it in PVFS2, and evaluated the resultant prototype, an extended PVFS2 system, on data-intensive applications. The preliminary results indicate that the extended PVFS2 delivers better performance and reliability to users. A strong collaborative effort between the PI at the University of Nebraska-Lincoln and the DOE collaborators, Drs. Rob Ross and Rajeev Thakur at Argonne National Laboratory, who lead the PVFS2 group, makes the project more promising.

  5. Learning Contrast-Invariant Cancellation of Redundant Signals in Neural Systems

    PubMed Central

    Bol, Kieran; Maler, Leonard; Longtin, André

    2013-01-01

    Cancellation of redundant information is a highly desirable feature of sensory systems, since it would potentially lead to a more efficient detection of novel information. However, biologically plausible mechanisms responsible for such selective cancellation, and especially those robust to realistic variations in the intensity of the redundant signals, are mostly unknown. In this work, we study, via in vivo experimental recordings and computational models, the behavior of a cerebellar-like circuit in the weakly electric fish which is known to perform cancellation of redundant stimuli. We experimentally observe contrast invariance in the cancellation of spatially and temporally redundant stimuli in such a system. Our model, which incorporates heterogeneously-delayed feedback, bursting dynamics and burst-induced STDP, is in agreement with our in vivo observations. In addition, the model gives insight on the activity of granule cells and parallel fibers involved in the feedback pathway, and provides a strong prediction on the parallel fiber potentiation time scale. Finally, our model predicts the existence of an optimal learning contrast around 15% contrast levels, which are commonly experienced by interacting fish. PMID:24068898

  7. An Efficient Multicore Implementation of a Novel HSS-Structured Multifrontal Solver Using Randomized Sampling

    DOE PAGES

    Ghysels, Pieter; Li, Xiaoye S.; Rouet, Francois -Henry; ...

    2016-10-27

    Here, we present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups of up to 7-fold for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared-memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK (STRUctured Matrices PACKage), which also has a distributed-memory component for dense rank-structured matrices.
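    The core compression step, randomized sampling followed by orthonormalization, can be illustrated on a single numerically low-rank block. This is a generic NumPy sketch of the standard randomized range finder, not the STRUMPACK HSS implementation; the matrix, rank, and oversampling amount are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# A numerically low-rank "off-diagonal block": product of two thin factors.
n, k = 200, 5
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))

# Randomized range finder: sample A against a thin Gaussian matrix,
# orthonormalize the samples, and project. Oversample beyond the target rank.
Omega = rng.standard_normal((n, k + 10))
Q, _ = np.linalg.qr(A @ Omega)        # orthonormal basis for the range of A
B = Q.T @ A                           # compressed representation: A ≈ Q @ B

err = np.linalg.norm(A - Q @ B) / np.linalg.norm(A)
print(f"storage {A.size} -> {Q.size + B.size}, rel. error {err:.1e}")
```

Because the sampling touches A only through matrix-vector products, the same idea lets HSS construction avoid forming dense frontal matrices explicitly, which is where the complexity reduction comes from.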

  8. Quantum statistics for a two-mode magnon system with microwave pumping: application to coupled ferromagnetic nanowires.

    PubMed

    Haghshenasfard, Zahra; Cottam, M G

    2017-05-17

    A microscopic (Hamiltonian-based) method for the quantum statistics of bosonic excitations in a two-mode magnon system is developed. Both the exchange and the dipole-dipole interactions, as well as the Zeeman term for an external applied field, are included in the spin Hamiltonian, and the model also contains the nonlinear effects due to parallel pumping and four-magnon interactions. The quantization of spin operators is achieved through the Holstein-Primakoff formalism, and then a coherent magnon state representation is used to study the occupation magnon number and the quantum statistical behaviour of the system. Particular attention is given to the cross correlation between the two coupled magnon modes in a ferromagnetic nanowire geometry formed by two lines of spins. Manipulation of the collapse-and-revival phenomena for the temporal evolution of the magnon number as well as the control of the cross correlation between the two magnon modes is demonstrated by tuning the parallel pumping field amplitude. The role of the four-magnon interactions is particularly interesting and leads to anti-correlation in some cases with coherent states.

  9. Application of lean manufacturing concepts to drug discovery: rapid analogue library synthesis.

    PubMed

    Weller, Harold N; Nirschl, David S; Petrillo, Edward W; Poss, Michael A; Andres, Charles J; Cavallaro, Cullen L; Echols, Martin M; Grant-Young, Katherine A; Houston, John G; Miller, Arthur V; Swann, R Thomas

    2006-01-01

    The application of parallel synthesis to lead optimization programs in drug discovery has been an ongoing challenge since the first reports of library synthesis. A number of approaches to the application of parallel array synthesis to lead optimization have been attempted over the years, ranging from widespread deployment by (and support of) individual medicinal chemists to centralization as a service by an expert core team. This manuscript describes our experience with the latter approach, which was undertaken as part of a larger initiative to optimize drug discovery. In particular, we highlight how concepts taken from the manufacturing sector can be applied to drug discovery and parallel synthesis to improve the timeliness and thus the impact of arrays on drug discovery.

  10. An Old Story in the Parallel Synthesis World: An Approach to Hydantoin Libraries.

    PubMed

    Bogolubsky, Andrey V; Moroz, Yurii S; Savych, Olena; Pipko, Sergey; Konovets, Angelika; Platonov, Maxim O; Vasylchenko, Oleksandr V; Hurmach, Vasyl V; Grygorenko, Oleksandr O

    2018-01-08

    An approach to the parallel synthesis of hydantoin libraries by reaction of in situ generated 2,2,2-trifluoroethylcarbamates and α-amino esters was developed. To demonstrate utility of the method, a library of 1158 hydantoins designed according to the lead-likeness criteria (MW 200-350, cLogP 1-3) was prepared. The success rate of the method was analyzed as a function of physicochemical parameters of the products, and it was found that the method can be considered as a tool for lead-oriented synthesis. A hydantoin-bearing submicromolar primary hit acting as an Aurora kinase A inhibitor was discovered with a combination of rational design, parallel synthesis using the procedures developed, in silico and in vitro screenings.
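    The lead-likeness window quoted above (MW 200-350, cLogP 1-3) amounts to a simple property filter over the library. A minimal sketch, with hypothetical member identifiers and made-up property values (not compounds from the paper):

```python
# Lead-likeness window from the abstract: MW 200-350, cLogP 1-3.
def is_lead_like(mw: float, clogp: float) -> bool:
    return 200 <= mw <= 350 and 1 <= clogp <= 3

# Hypothetical library members: (identifier, MW, cLogP).
library = [
    ("hyd-001", 262.3, 1.8),   # inside the window
    ("hyd-002", 412.5, 2.4),   # too heavy
    ("hyd-003", 300.1, 4.2),   # too lipophilic
    ("hyd-004", 218.2, 1.1),   # inside the window
]

leads = [name for name, mw, clogp in library if is_lead_like(mw, clogp)]
print(leads)  # → ['hyd-001', 'hyd-004']
```

In practice such a filter is applied at the design stage, so that only enumerated products inside the window are submitted to parallel synthesis.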

  11. Micromagnetic simulations of anisotropies in coupled and uncoupled ferromagnetic nanowire systems.

    PubMed

    Blachowicz, T; Ehrmann, A

    2013-01-01

    The influence of a variation of spatial relative orientations on the coupling dynamics and the resulting magnetic anisotropies was modeled in ferromagnetic nanowires. The wires were analyzed in the most elementary configurations, i.e., arranged in pairs perpendicular to each other, leading to one-dimensional (linear) and zero-dimensional (point-like) coupling. Different distances within each elementary pair of wires and between the pairs give rise to varying interactions between parallel and perpendicular wires, respectively. Simulated coercivities show an exchange of easy and hard axes for systems with different couplings. Additionally, two of the systems exhibit a unique switching behavior which can be utilized for developing new functionalities.

  12. Lesson from Tungsten Leading Edge Heat Load Analysis in KSTAR Divertor

    NASA Astrophysics Data System (ADS)

    Hong, Suk-Ho; Pitts, Richard Anthony; Lee, Hyeong-Ho; Bang, Eunnam; Kang, Chan-Soo; Kim, Kyung-Min; Kim, Hong-Tack; ITER Organization Collaboration; KSTAR Team

    2016-10-01

    An important design issue for the ITER tungsten (W) divertor, and in fact for all such components using metallic plasma-facing elements which are exposed to high parallel power fluxes, is the question of surface shaping to avoid melting of leading edges. We have fabricated a series of tungsten blocks with a variety of leading edge heights (0.3, 0.6, 1.0, and 2.0 mm), from the ITER worst case to heights even beyond the extreme value tested on JET. They are mounted into an adjacent, inertially cooled graphite tile installed in the central divertor region of KSTAR, within the field of view of an infra-red (IR) thermography system with a spatial resolution of 0.4 mm/pixel. Adjustment of the outer divertor strike point position is used to deposit power on the different blocks in different discharges. The measured power flux density on flat regions of the surrounding graphite tiles is used to obtain the parallel power flux, q||, impinging on the various W blocks. Experiments have been performed in Type I ELMing H-mode with Ip = 600 kA, BT = 2 T, PNBI = 3.5 MW, leading to a hot attached divertor with typical pulse lengths of 10 s. Three-dimensional ANSYS simulations using q|| and assuming geometric projection of the heat flux are found to be consistent with the observed edge loading. This research was partially supported by the Ministry of Science, ICT, and Future Planning under the KSTAR project.
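    The geometric-projection assumption behind such heat load analyses can be made concrete: a flat surface at grazing field angle α receives roughly q|| · sin α, while an exposed leading edge of height h intercepts the full parallel flux over that height. A back-of-the-envelope sketch with illustrative numbers (standard optical projection formulas, not the KSTAR-specific ANSYS analysis; q|| and α are assumed values):

```python
import math

def top_surface_flux(q_par, alpha_deg):
    """Heat flux on a flat surface at grazing field angle alpha (W/m^2)."""
    return q_par * math.sin(math.radians(alpha_deg))

def edge_power_per_length(q_par, h_mm):
    """Power per unit length intercepted by an exposed leading edge of
    height h, assuming the edge faces the parallel flux directly (W/m)."""
    return q_par * h_mm * 1e-3

q_par = 50e6           # illustrative parallel power flux, 50 MW/m^2
alpha = 3.0            # illustrative grazing angle, degrees
for h in (0.3, 0.6, 1.0, 2.0):   # the edge heights tested in the abstract
    print(f"h = {h} mm: top {top_surface_flux(q_par, alpha)/1e6:.1f} MW/m^2, "
          f"edge {edge_power_per_length(q_par, h)/1e3:.0f} kW/m")
```

The contrast between the sin α projection on the top surface and the unattenuated flux on the edge is what makes even sub-millimetre leading edges a melting concern.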

  13. Encoding of social signals in all three electrosensory pathways of Eigenmannia virescens.

    PubMed

    Stöckl, Anna; Sinz, Fabian; Benda, Jan; Grewe, Jan

    2014-11-01

    Extracting complementary features in parallel pathways is a widely used strategy for a robust representation of sensory signals. Weakly electric fish offer the rare opportunity to study complementary encoding of social signals in all of its electrosensory pathways. Electrosensory information is conveyed in three parallel pathways: two receptor types of the tuberous (active) system and one receptor type of the ampullary (passive) system. Modulations of the fish's own electric field are sensed by these receptors and used in navigation, prey detection, and communication. We studied the neuronal representation of electric communication signals (called chirps) in the ampullary and the two tuberous pathways of Eigenmannia virescens. We first characterized different kinds of chirps observed in behavioral experiments. Since Eigenmannia chirps simultaneously drive all three types of receptors, we studied their responses in in vivo electrophysiological recordings. Our results demonstrate that different electroreceptor types encode different aspects of the stimuli and each appears best suited to convey information about a certain chirp type. A decoding analysis of single neurons and small populations shows that this specialization leads to a complementary representation of information in the tuberous and ampullary receptors. This suggests that a potential readout mechanism should combine information provided by the parallel processing streams to improve chirp detectability. Copyright © 2014 the American Physiological Society.

  14. Thermoelectric efficiency enhanced in a quantum dot with polarization leads, spin-flip and external magnetic field

    NASA Astrophysics Data System (ADS)

    Yao, Hui; Niu, Peng-Bin; Zhang, Chao; Xu, Wei-Ping; Li, Zhi-Jian; Nie, Yi-Hang

    2018-03-01

    We theoretically study the thermoelectric transport properties of a quantum dot system with two ferromagnetic leads, spin-flip scattering, and an external magnetic field. The results show that the spin polarization of the leads strongly influences the thermoelectric coefficients of the device. For the parallel configuration, the peak of the figure of merit increases with increasing polarization strength, and a non-collinear configuration tends to destroy the improvement of the figure of merit induced by lead polarization. The modulation of the figure of merit by spin-flip scattering is effective only in the absence of an external magnetic field or for a small magnetic field. In terms of improving the thermoelectric efficiency, the external magnetic field plays a more important role than spin-flip scattering: the thermoelectric efficiency can be significantly enhanced by the magnetic field for a given spin-flip scattering strength.
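    For context, the figure of merit referred to here is conventionally defined as ZT = S²GT/κ, with Seebeck coefficient S, electrical conductance G, temperature T, and thermal conductance κ. A numerical sketch of this textbook definition with illustrative values (not the model parameters of this paper):

```python
def figure_of_merit(S, G, T, kappa):
    """Thermoelectric figure of merit ZT = S^2 * G * T / kappa."""
    return S**2 * G * T / kappa

# Illustrative values in SI units (assumed, not from the paper).
S = 200e-6       # Seebeck coefficient, V/K
G = 1e-3         # electrical conductance, S
T = 300.0        # temperature, K
kappa = 1e-8     # total thermal conductance, W/K

zt = figure_of_merit(S, G, T, kappa)
print(f"ZT = {zt:.2f}")  # → ZT = 1.20
```

Because ZT grows with S²G and shrinks with κ, any mechanism (here, lead polarization or the magnetic field) that sharpens the energy dependence of transmission through the dot tends to raise the figure of merit.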

  15. Molecular-dynamics simulations of self-assembled monolayers (SAM) on parallel computers

    NASA Astrophysics Data System (ADS)

    Vemparala, Satyavani

    The purpose of this dissertation is to investigate the properties of self-assembled monolayers (SAMs), particularly alkanethiols and poly(ethylene glycol)-terminated alkanethiols. These simulations are based on realistic interatomic potentials and require scalable and portable multiresolution algorithms implemented on parallel computers. Large-scale molecular dynamics simulations of self-assembled alkanethiol monolayer systems have been carried out using an all-atom model involving a million atoms to investigate their structural properties as a function of temperature, lattice spacing, and molecular chain length. Results show that the alkanethiol chains tilt from the surface normal by a collective angle of 25° along the next-nearest-neighbor direction at 300 K. At 350 K the system transforms to a disordered phase characterized by a small tilt angle, flexible tilt direction, and random distribution of backbone planes. With increasing lattice spacing, a, the tilt angle increases rapidly from a nearly zero value at a = 4.7 Å to as high as 34° at a = 5.3 Å at 300 K. We also studied the effect of end groups on the tilt structure of SAM films. We characterized the system with respect to temperature, alkane chain length, lattice spacing, and the length of the end group. We found that gauche defects were predominant only in the tails, and that the gauche defects increased with temperature and with the number of EG units. The effect of an electric field on the structure of a poly(ethylene glycol) (PEG)-terminated alkanethiol self-assembled monolayer (SAM) on gold has been studied using a parallel molecular dynamics method. An applied electric field triggers a conformational transition from all-trans to a mostly gauche conformation. The polarity of the electric field has a significant effect on the surface structure of PEG, leading to a profound effect on the hydrophilicity of the surface.
The electric field applied anti-parallel to the surface normal causes a reversible transition to an ordered state in which the oxygen atoms are exposed. On the other hand, an electric field applied in a direction parallel to the surface normal introduces considerable disorder in the system and the oxygen atoms are buried inside.

  16. Performance of the Galley Parallel File System

    NASA Technical Reports Server (NTRS)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    As the input/output (I/O) needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. This interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. Initial experiments, reported in this paper, indicate that Galley is capable of providing high-performance I/O to applications that access data in patterns that have been observed to be common.
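The transparent multi-disk access described above is commonly implemented by striping the file across I/O servers. A minimal sketch of such a mapping, assuming a hypothetical 64 KB stripe unit and four servers (illustrative values, not Galley's actual layout):

```python
# Sketch of file striping: map a logical byte offset to a
# (server, local offset) pair.  The 64 KB stripe unit and the
# 4-server count are illustrative assumptions.
STRIPE = 64 * 1024
NSERVERS = 4

def locate(offset):
    stripe_index, within = divmod(offset, STRIPE)
    server = stripe_index % NSERVERS              # round-robin placement
    local = (stripe_index // NSERVERS) * STRIPE + within
    return server, local

# Consecutive stripes land on different servers, so one large
# sequential read is serviced by all servers in parallel.
print([locate(k * STRIPE) for k in range(5)])
```

Interfaces like Galley's expose this decomposition to the application rather than hiding it behind a Unix-like file, which is what lets libraries exploit the parallelism.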

  17. DIAC object recognition system

    NASA Astrophysics Data System (ADS)

    Buurman, Johannes

    1992-03-01

    This paper describes the object recognition system used in an intelligent robot cell. It is used to recognize parts and estimate their position and orientation as they enter the cell. The parts are mostly metal and consist of polyhedral and cylindrical shapes. The system uses feature-based stereo vision to acquire a wireframe of the observed part. Features are defined as straight lines and ellipses, which lead to a wireframe of straight lines and circular arcs (the latter using a new algorithm). This wireframe is compared to a number of wireframe models obtained from the CAD database. Experimental results show that image processing hardware and parallelization may add considerably to the speed of the system.

  18. Backtracking and Re-execution in the Automatic Debugging of Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Matthews, Gregory; Hood, Robert; Johnson, Stephen; Leggett, Peter; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this work we describe a new approach using relative debugging to find differences in computation between a serial program and a parallel version of that program. We use a combination of re-execution and backtracking in order to find the first difference in computation that may ultimately lead to an incorrect value that the user has indicated. In our prototype implementation we use static analysis information from a parallelization tool in order to perform the backtracking as well as the mapping required between serial and parallel computations.

  19. Design and realization of photoelectric instrument binocular optical axis parallelism calibration system

    NASA Astrophysics Data System (ADS)

    Ying, Jia-ju; Chen, Yu-dan; Liu, Jie; Wu, Dong-sheng; Lu, Jun

    2016-10-01

    Maladjustment of the binocular optical axis parallelism of a photoelectric instrument directly degrades observation. A digital calibration system for binocular optical axis parallelism is designed. On the basis of the calibration principle for the optical axes of binocular photoelectric instruments, the system scheme is designed and realized; it comprises four modules: a multiband parallel light tube, optical axis translation, an image acquisition system, and a software system. According to the different characteristics of the thermal infrared imager and the low-light-level night viewer, different algorithms are used to localize the center of the cross reticle, and binocular optical axis parallelism calibration is realized for both low-light-level night viewers and thermal infrared imagers.

  20. Software Design for Real-Time Systems on Parallel Computers: Formal Specifications.

    DTIC Science & Technology

    1996-04-01

    This research investigated the important issues related to the analysis and design of real-time systems targeted to parallel architectures. In...particular, the software specification models for real-time systems on parallel architectures were evaluated. A survey of current formal methods for...uniprocessor real-time systems specifications was conducted to determine their extensibility in specifying real-time systems on parallel architectures. In

  1. Flow Field and Nutrient Dynamics Control Over Formation of Parallel Vegetation Patterns in the Florida Everglades

    NASA Astrophysics Data System (ADS)

    Engel, V.; Cheng, Y.; Stieglitz, M.

    2009-12-01

    Pattern formation in vegetated communities reflects the underlying mechanisms governing resource utilization and distribution across the landscape. An example of a patterned ecosystem is the Florida Everglades, which is characterized by parallel and slightly elevated peat "ridges" separated by deeper water "slough" communities (R&S). Ridges are dominated by sawgrass (Cladium jamaicense). These patterns are thought to be aligned with and develop in response to the historic surface water flow direction, though the precise mechanisms which lead to their formation are poorly understood. Over the years this R&S habitat has degraded in areas where the natural flow regime, hydroperiod, and water depths have been impacted by human development. Managing and restoring this habitat has been an objective of the U.S. Federal and Florida State governments since the Comprehensive Everglades Restoration Plan (CERP) was authorized in 2000. It is imperative, however, to develop a mechanistic understanding of ridge-slough formation before the potential benefits of hydrologic forecasts associated with CERP can be evaluated. Recently, Cheng et al (see Cheng et al, session NG14) employed a simple 2D advection-diffusion model developed by Rietkerk et al (2004) to describe, for the first time, the formation of parallel stripes from hydrologic interactions. To simulate parallel stripes, Cheng et al retained the basic equations of the Rietkerk model but allowed for constant advection of water and nutrient in one direction to simulate slope conditions, with evapotranspiration-driven advection of water and nutrient perpendicular to the downhill flow direction. We employ this modeling framework and parameterize the model with Everglades field data to simulate ridge-slough formation. In this model, the relatively higher rates of evapotranspiration on the ridges compared to the sloughs create hydraulic gradients which carry dissolved nutrients from the sloughs to the faster growing ridges. 
With time, the patches aggregate and spread laterally in the direction of the downhill flow. The characteristic wavelengths and spatial patterning of the ridge-slough habitat found in the historic Everglades is reproduced by the model. Nutrient distributions across the landscape and across the ridge-slough interfaces also match observations. Perturbations to the system are modeled in the form of altered hydraulic gradients and nutrient input functions, similar to actual stressors on the system. Under the altered conditions, a loss of patterning in the habitat is observed, in some cases leading to ridge expansion into the sloughs, and in others leading to a complete loss of vegetation pattern. Simulations indicate that the hydrologic changes required to regenerate coherence in the ridge slough patterns in degraded areas are different from those in which the system originally formed. Plant-nutrient interactions and the overall nutrient status are shown to be a major determinant in how the system will respond to hydrologic changes associated with CERP.

  2. Parallel/distributed direct method for solving linear systems

    NASA Technical Reports Server (NTRS)

    Lin, Avi

    1990-01-01

    A new family of parallel schemes for directly solving linear systems is presented and analyzed. It is shown that these schemes exhibit near optimal performance and enjoy several important features: (1) For large enough linear systems, the design of the appropriate parallel algorithm is insensitive to the number of processors, as its performance grows monotonically with them; (2) It is especially good for large matrices, with dimensions large relative to the number of processors in the system; (3) It can be used in both distributed parallel computing environments and tightly coupled parallel computing systems; and (4) This set of algorithms can be mapped onto any parallel architecture without any major programming difficulties or algorithmic changes.

  3. Integration experiences and performance studies of A COTS parallel archive systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Hsing-bung; Scott, Cody; Grider, Gary

    2010-01-01

    Current and future Archive Storage Systems have been asked to (a) scale to very high bandwidths, (b) scale in metadata performance, (c) support policy-based hierarchical storage management capability, (d) scale in supporting changing needs of very large data sets, (e) support standard interface, and (f) utilize commercial-off-the-shelf (COTS) hardware. Parallel file systems have been asked to do the same thing but at one or more orders of magnitude faster in performance. Archive systems continue to move closer to file systems in their design due to the need for speed and bandwidth, especially metadata searching speeds, such as more caching and less robust semantics. Currently the number of extremely scalable parallel archive solutions is very small, especially those that will move a single large striped parallel disk file onto many tapes in parallel. We believe that a hybrid storage approach of using COTS components and innovative software technology can bring new capabilities into a production environment for the HPC community much faster than the approach of creating and maintaining a complete end-to-end unique parallel archive software solution. In this paper, we relay our experience of integrating a global parallel file system and a standard backup/archive product with a very small amount of additional code to provide a scalable, parallel archive. Our solution has a high degree of overlap with current parallel archive products including (a) doing parallel movement to/from tape for a single large parallel file, (b) hierarchical storage management, (c) ILM features, (d) high volume (non-single parallel file) archives for backup/archive/content management, and (e) leveraging all free file movement tools in Linux such as copy, move, ls, tar, etc. 
    We have successfully applied our working COTS Parallel Archive System to the current world's first petaflop/s computing system, LANL's Roadrunner, and demonstrated its capability to address requirements of future archival storage systems.

  4. Integration experiments and performance studies of a COTS parallel archive system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Hsing-bung; Scott, Cody; Grider, Gary

    2010-06-16

    Current and future Archive Storage Systems have been asked to (a) scale to very high bandwidths, (b) scale in metadata performance, (c) support policy-based hierarchical storage management capability, (d) scale in supporting changing needs of very large data sets, (e) support standard interface, and (f) utilize commercial-off-the-shelf (COTS) hardware. Parallel file systems have been asked to do the same thing but at one or more orders of magnitude faster in performance. Archive systems continue to move closer to file systems in their design due to the need for speed and bandwidth, especially metadata searching speeds, such as more caching and less robust semantics. Currently the number of extremely scalable parallel archive solutions is very small, especially those that will move a single large striped parallel disk file onto many tapes in parallel. We believe that a hybrid storage approach of using COTS components and innovative software technology can bring new capabilities into a production environment for the HPC community much faster than the approach of creating and maintaining a complete end-to-end unique parallel archive software solution. In this paper, we relay our experience of integrating a global parallel file system and a standard backup/archive product with a very small amount of additional code to provide a scalable, parallel archive. Our solution has a high degree of overlap with current parallel archive products including (a) doing parallel movement to/from tape for a single large parallel file, (b) hierarchical storage management, (c) ILM features, (d) high volume (non-single parallel file) archives for backup/archive/content management, and (e) leveraging all free file movement tools in Linux such as copy, move, ls, tar, etc. 
    We have successfully applied our working COTS Parallel Archive System to the current world's first petaflop/s computing system, LANL's Roadrunner machine, and demonstrated its capability to address requirements of future archival storage systems.

  5. Evolutionary Processes in Multiple Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Eggleton, P P; Kisseleva-Eggleton, L

    There are several ways in which triple stars can evolve in somewhat unusual ways. We discuss two situations where Case A Roche-lobe overflow, followed by a merger, can produce anomalous wide binaries such as γ Per; and Kozai cycles in triples with non-parallel orbits, which can produce merged rapidly-rotating stars like AB Dor, and which can also lead to the delayed ejection of one component of a multiple, as may have been observed in T Tau in 1998.

  6. Inspection criteria ensure quality control of parallel gap soldering

    NASA Technical Reports Server (NTRS)

    Burka, J. A.

    1968-01-01

    Investigation of parallel gap soldering of electrical leads resulted in recommendations on material preparation, equipment, process control, and visual inspection criteria to ensure reliable solder joints. The recommendations will minimize problems in heat-dwell time, amount of solder, bridging of conductors, and damage to circuitry.

  7. Implementation and performance of parallel Prolog interpreter

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wei, S.; Kale, L.V.; Balkrishna, R.

    1988-01-01

    In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE-OR process model, which exploits both AND and OR parallelism in logic programs. It is machine independent, as it runs on top of the chare kernel, a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark programs on parallel machines, including shared memory systems (an Alliant FX/8, a Sequent, and a MultiMax) and a non-shared memory system (the Intel iPSC/32 hypercube), in addition to its performance on a multiprocessor simulation system.

  8. SU(4) Kondo effect in double quantum dots with ferromagnetic leads

    NASA Astrophysics Data System (ADS)

    Weymann, Ireneusz; Chirla, Razvan; Trocha, Piotr; Moca, Cǎtǎlin Paşcu

    2018-02-01

    We investigate the spin-resolved transport properties, such as the linear conductance and the tunnel magnetoresistance, of a double quantum dot device attached to ferromagnetic leads and look for signatures of the SU(4) symmetry in the Kondo regime. We show that the transport behavior greatly depends on the magnetic configuration of the device, and the spin-SU(2) as well as the orbital and spin-SU(4) Kondo effects become generally suppressed when the magnetic configuration of the leads varies from the antiparallel to the parallel one. Furthermore, a finite spin polarization of the leads lifts the spin degeneracy and drives the system from the SU(4) to an orbital-SU(2) Kondo state. We analyze in detail the crossover and show that the Kondo temperature between the two fixed points has a nonmonotonic dependence on the degree of spin polarization of the leads. In terms of methods used, we characterize transport by using a combination of analytical and numerical renormalization group approaches.

  9. Parallelism in integrated fluidic circuits

    NASA Astrophysics Data System (ADS)

    Bousse, Luc J.; Kopf-Sill, Anne R.; Parce, J. W.

    1998-04-01

    Many research groups around the world are working on integrated microfluidics. The goal of these projects is to automate and integrate the handling of liquid samples and reagents for measurement and assay procedures in chemistry and biology. Ultimately, it is hoped that this will lead to a revolution in chemical and biological procedures similar to that caused in electronics by the invention of the integrated circuit. The optimal size scale of channels for liquid flow is determined by basic constraints to be somewhere between 10 and 100 micrometers. In larger channels, mixing by diffusion takes too long; in smaller channels, the number of molecules present is so low it makes detection difficult. At Caliper, we are making fluidic systems in glass chips with channels in this size range, based on electroosmotic flow, and fluorescence detection. One application of this technology is rapid assays for drug screening, such as enzyme assays and binding assays. A further challenge in this area is to perform multiple functions on a chip in parallel, without a large increase in the number of inputs and outputs. A first step in this direction is a fluidic serial-to-parallel converter. Fluidic circuits will be shown with the ability to distribute an incoming serial sample stream to multiple parallel channels.

  10. Impact of the implementation of MPI point-to-point communications on the performance of two general sparse solvers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Amestoy, Patrick R.; Duff, Iain S.; L'Excellent, Jean-Yves

    2001-10-10

    We examine the mechanics of the send and receive mechanism of MPI and in particular how we can implement message passing in a robust way so that our performance is not significantly affected by changes to the MPI system. This leads us to using the Isend/Irecv protocol, which sometimes entails significant algorithmic changes. We discuss this within the context of two different algorithms for sparse Gaussian elimination that we have parallelized. One is a multifrontal solver called MUMPS, the other is a supernodal solver called SuperLU. Both algorithms are difficult to parallelize on distributed memory machines. Our initial strategies were based on simple MPI point-to-point communication primitives. With such approaches, the parallel performance of both codes is very sensitive to the MPI implementation, in particular the way MPI internal buffers are used. We then modified our codes to use more sophisticated nonblocking versions of MPI communication. This significantly improved the performance robustness (independent of the MPI buffering mechanism) and scalability, but at the cost of increased code complexity.
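The buffering sensitivity described above can be illustrated without MPI. In a symmetric exchange, two ranks that both issue blocking, unbuffered sends deadlock, each waiting for the other's receive; a nonblocking send that merely enqueues the message (a loose Python stand-in for MPI_Isend semantics, not the MUMPS/SuperLU code) lets both ranks proceed:

```python
import threading
import queue

# Each "rank" owns an inbox; isend only enqueues, so it returns
# immediately -- a loose stand-in for MPI_Isend semantics.
inbox = {0: queue.Queue(), 1: queue.Queue()}

def isend(dest, payload):
    inbox[dest].put(payload)      # nonblocking: buffered hand-off

def recv(rank):
    return inbox[rank].get()      # blocks until a message arrives

def rank_body(rank, other, results):
    # Symmetric exchange: with unbuffered rendezvous sends, two ranks
    # that both send first would deadlock; the nonblocking send lets
    # each rank reach its receive.
    isend(other, f"data-from-{rank}")
    results[rank] = recv(rank)

results = {}
threads = [threading.Thread(target=rank_body, args=(r, 1 - r, results))
           for r in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

An MPI implementation that buffers small eager sends hides this hazard until message sizes grow, which is exactly the implementation dependence the authors removed by switching to Isend/Irecv.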

  11. Master-slave interferometry for parallel spectral domain interferometry sensing and versatile 3D optical coherence tomography.

    PubMed

    Podoleanu, Adrian Gh; Bradu, Adrian

    2013-08-12

    Conventional spectral domain interferometry (SDI) methods suffer from the need for data linearization. When applied to optical coherence tomography (OCT), conventional SDI methods are limited in their 3D capability, as they cannot deliver direct en-face cuts. Here we introduce a novel SDI method which eliminates these disadvantages. We denote this method Master-Slave Interferometry (MSI), because a signal is acquired by a slave interferometer for an optical path difference (OPD) value determined by a master interferometer. The MSI method radically changes the main building block of an SDI sensor and of a spectral domain OCT set-up. The serially provided signal in conventional technology is replaced by multiple signals, one for each OPD point in the object investigated. This opens novel avenues in parallel sensing and in parallelization of signal processing in 3D-OCT, with applications in high-resolution medical imaging and microscopy investigation of biosamples. Eliminating the need for linearization leads to lower cost OCT systems and opens potential avenues for increasing the speed of production of en-face OCT images in comparison with conventional SDI.

  12. Tunneling magnetoresistance tuned by a vertical electric field in an AA-stacked graphene bilayer with double magnetic barriers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Dali, E-mail: wangdali@mail.ahnu.edu.cn; National Laboratory of Solid State Microstructures and Department of Physics, Nanjing University, Nanjing 210093; Jin, Guojun, E-mail: gjin@nju.edu.cn

    2013-12-21

    We investigate the effect of a vertical electric field on the electron tunneling and magnetoresistance in an AA-stacked graphene bilayer modulated by double magnetic barriers with parallel or antiparallel configuration. The results show that the electronic transmission properties in the system are sensitive to the magnetic-barrier configuration and the bias voltage between the graphene layers. In particular, it is found that for the antiparallel configuration, within the low energy region, the blocking effect is more obvious compared with the case of the parallel configuration, and there may even exist a transmission spectrum gap which can be arbitrarily tuned by the field-induced interlayer bias voltage. We also demonstrate that the significant discrepancy between the conductances for the parallel and antiparallel configurations would result in a giant tunneling magnetoresistance ratio, and further that the maximal magnetoresistance ratio can be strongly modified by the interlayer bias voltage. This leads to the possible realization of high-quality magnetic sensors controlled by a vertical electric field in the AA-stacked graphene bilayer.

  13. A New Approach to Parallel Dynamic Partitioning for Adaptive Unstructured Meshes

    NASA Technical Reports Server (NTRS)

    Heber, Gerd; Biswas, Rupak; Gao, Guang R.

    1999-01-01

    Classical mesh partitioning algorithms were designed for rather static situations, and their straightforward application in a dynamical framework may lead to unsatisfactory results, e.g., excessive data migration among processors. Furthermore, special attention should be paid to their amenability to parallelization. In this paper, a novel parallel method for the dynamic partitioning of adaptive unstructured meshes is described. It is based on a linear representation of the mesh using self-avoiding walks.
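The linear-representation idea can be sketched on a structured grid: order the cells along a walk, then cut the ordering into equal contiguous chunks, one per processor. A serpentine walk stands in here for the paper's self-avoiding walks over an unstructured mesh, and the grid size and processor count are illustrative:

```python
# Partitioning via a 1-D ordering of mesh cells.  A serpentine
# (boustrophedon) walk is used as a simple stand-in for the paper's
# self-avoiding walks over an unstructured mesh.
def serpentine_order(nx, ny):
    order = []
    for j in range(ny):
        row = [(i, j) for i in range(nx)]
        order.extend(row if j % 2 == 0 else reversed(row))
    return order

def partition(nx, ny, nprocs):
    # Cut the walk into equal contiguous chunks, one per processor;
    # contiguity along the walk keeps each partition spatially compact.
    order = serpentine_order(nx, ny)
    chunk = len(order) // nprocs
    return [order[p * chunk:(p + 1) * chunk] for p in range(nprocs)]

parts = partition(8, 8, 4)
print([len(p) for p in parts])    # 16 cells per processor
```

When the mesh adapts, only the 1-D ordering needs to be re-cut, which is what keeps data migration low compared with repartitioning from scratch.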

  14. Parallel processing and expert systems

    NASA Technical Reports Server (NTRS)

    Lau, Sonie; Yan, Jerry C.

    1991-01-01

    Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.

  15. Fluorescence tracers as a reference for pesticide transport in wetland systems

    NASA Astrophysics Data System (ADS)

    Lange, Jens; Passeport, Elodie; Tournebize, Julien

    2010-05-01

    Two different fluorescent tracers, Uranine (UR) and Sulforhodamine (SRB), were injected as a pulse into surface flow wetlands. Tracer breakthrough curves were used to document hydraulic efficiencies, peak attenuation and retention capacities of completely different wetland systems. The tracers were used as a reference to mimic photolytic decay (UR) and sorption (SRB) of contaminants, since a real herbicide (Isoproturon, IPU) was injected in parallel to UR and SRB. Analysis costs limited IPU sampling frequency and single samples deviated from the tracer breakthrough curves. Still, a parallel behavior of IPU and SRB could be observed in totally different wetland systems, including underground passage through drainage lines. Similar recovery rates for IPU and SRB confirmed this observation. Hence, SRB was found to be an appropriate reference tracer to mimic the behavior of mobile pesticides (low KOC, without degradation) in wetland systems, and the obtained wetland characteristics for SRB may serve as an indication for contaminant retention. Owing to the properties of IPU, the obtained results should be treated as worst case scenarios for highly mobile pesticides. A comparison of six different wetland types suggested that non-steady wetland systems with large variation in water level may temporarily store relatively large amounts of tracers (contaminants), partly in areas that are not continuously saturated. This may lead to an efficient attenuation of peak concentrations. However, when large parts of these systems are flushed by natural storm events, tracers (contaminants) may be re-mobilized. In steady systems, vegetation density and water depth were found to be the most important factors for tracer/contaminant retention. As illustrated by SRB, sorption on sediments and vegetation was a quick, almost instantaneous process which led to considerable tracer losses even at high flow velocities and short contact times. 
Shallow systems with dense vegetation appeared to be the most efficient SRB/contaminant traps. For photolytic decay no reference contaminant was studied, but the results found for UR may serve as a valuable proxy for this process.

  16. A Domain Decomposition Parallelization of the Fast Marching Method

    NASA Technical Reports Server (NTRS)

    Herrmann, M.

    2003-01-01

    In this paper, the first domain decomposition parallelization of the Fast Marching Method for level sets has been presented. Parallel speedup has been demonstrated in both the optimal and non-optimal domain decomposition case. The parallel performance of the proposed method is strongly dependent on load balancing separately the number of nodes on each side of the interface. A load imbalance of nodes on either side of the domain leads to an increase in communication and rollback operations. Furthermore, the amount of inter-domain communication can be reduced by aligning the inter-domain boundaries with the interface normal vectors. In the case of optimal load balancing and aligned inter-domain boundaries, the proposed parallel FMM algorithm is highly efficient, reaching efficiency factors of up to 0.98. Future work will focus on the extension of the proposed parallel algorithm to higher order accuracy. Also, to further enhance parallel performance, the coupling of the domain decomposition parallelization to the G(sub 0)-based parallelization will be investigated.

  17. An Object Oriented Extensible Architecture for Affordable Aerospace Propulsion Systems

    NASA Technical Reports Server (NTRS)

    Follen, Gregory J.

    2003-01-01

    Driven by a need to explore and develop propulsion systems that exceeded current computing capabilities, NASA Glenn embarked on a novel strategy leading to the development of an architecture that enables propulsion simulations never thought possible before. Full-engine, three-dimensional computational fluid dynamics propulsion system simulations were deemed impossible due to the impracticality of the hardware and software computing systems required. However, with a software paradigm shift and an embracing of parallel and distributed processing, an architecture was designed to meet the needs of future propulsion system modeling. The author suggests that the architecture designed at the NASA Glenn Research Center for propulsion system modeling has potential for impacting the direction of development of affordable weapons systems currently under consideration by the Applied Vehicle Technology Panel (AVT).

  18. Research on the adaptive optical control technology based on DSP

    NASA Astrophysics Data System (ADS)

    Zhang, Xiaolu; Xue, Qiao; Zeng, Fa; Zhao, Junpu; Zheng, Kuixing; Su, Jingqin; Dai, Wanjun

    2018-02-01

    Adaptive optics is a real-time compensation technique that uses a high-speed support system to correct wavefront errors caused by atmospheric turbulence. However, the randomness and rapidity of atmospheric change introduce great difficulties into the design of adaptive optical systems: the large number of complex real-time operations leads to large delays. To solve this problem, a hardware-based operation and parallel processing strategy is proposed, and a high-speed adaptive optical control system based on a DSP is developed. A hardware counter is used to check the system. The results show that the system can complete one closed-loop control cycle in 7.1 ms, improving the control bandwidth of the adaptive optical system. Using this system, wavefront measurement and closed-loop experiments were carried out with good results.

  19. Parallel Algorithms for the Exascale Era

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Robey, Robert W.

    New parallel algorithms are needed to reach the Exascale level of parallelism with millions of cores. We look at some of the research developed by students in projects at LANL. The research blends ideas from the early days of computing while weaving in the fresh approach brought by students new to the field of high performance computing. We look at reproducibility of global sums and why it is important to parallel computing. Next we look at how the concept of hashing has led to the development of more scalable algorithms suitable for next-generation parallel computers. Nearly all of this work has been done by undergraduates and published in leading scientific journals.
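The reproducibility issue with global sums comes from the non-associativity of floating-point addition: different reduction orders (e.g., different processor counts) can round differently. A small illustration of the problem, using Python's exactly rounded math.fsum as one order-independent alternative (an illustration only, not the students' algorithm):

```python
import math
import random

random.seed(7)
values = [random.uniform(-1e12, 1e12) for _ in range(100000)]

fwd = sum(values)                 # one reduction order
rev = sum(reversed(values))       # another order; may round differently
exact = math.fsum(values)         # exactly rounded, so order-independent

print(fwd == rev)                             # frequently False for such data
print(math.fsum(reversed(values)) == exact)   # always True
```

Because fsum computes the correctly rounded exact sum, it returns the same answer for any ordering of the same values, which is the property a reproducible parallel reduction needs.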

  20. Parallel Algorithm for GPU Processing; for use in High Speed Machine Vision Sensing of Cotton Lint Trash.

    PubMed

    Pelletier, Mathew G

    2008-02-08

One of the main hurdles standing in the way of optimal cleaning of cotton lint is the lack of sensing systems that can react fast enough to provide the control system with real-time information as to the level of trash contamination of the cotton lint. This research examines the use of programmable graphic processing units (GPU) as an alternative to the PC's traditional use of the central processing unit (CPU). The use of the GPU, as an alternative computation platform, allowed the machine vision system to gain a significant improvement in processing time. By improving the processing time, this research seeks to address the lack of availability of rapid trash sensing systems and thus alleviate a situation in which the current systems view the cotton lint either well before, or after, the cotton is cleaned. This extended lag/lead time that is currently imposed on the cotton trash cleaning control systems is what is responsible for system operators utilizing a very large dead-band safety buffer in order to ensure that the cotton lint is not under-cleaned. Unfortunately, the utilization of a large dead-band buffer results in the majority of the cotton lint being over-cleaned, which in turn causes lint fiber damage as well as significant losses of the valuable lint due to the excessive use of cleaning machinery. This research estimates that upwards of a 30% reduction in lint loss could be gained through the use of a trash sensor tightly coupled to the cleaning machinery control systems. This research seeks to improve processing times through the development of a new algorithm for cotton trash sensing that allows for implementation on a highly parallel architecture. Additionally, by moving the new parallel algorithm onto an alternative computing platform, the graphic processing unit (GPU), for processing of the cotton trash images, a speedup of over 6.5 times over optimized code running on the PC's central processing unit (CPU) was gained.
The new parallel algorithm operating on the GPU was able to process a 1024x1024 image in less than 17 ms. At this improved speed, the image processing system's performance should now be sufficient to provide a system capable of real-time feedback control in tight cooperation with the cleaning equipment.
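The per-pixel nature of the trash-sensing computation is what makes it map well onto a GPU. As an illustrative stand-in (not the paper's algorithm; the threshold value and synthetic image are invented for this example), a vectorized thresholding pass in NumPy shows the data-parallel structure:

```python
import numpy as np

def trash_fraction(image, threshold=60):
    """Estimate the fraction of dark 'trash' pixels in a grayscale
    lint image. Every pixel is classified independently of all others,
    so the same operation maps directly onto per-pixel GPU threads."""
    mask = image < threshold          # elementwise, data-parallel test
    return mask.mean()                # fraction of trash pixels

# Synthetic 1024x1024 "lint" image: bright background, dark specks.
rng = np.random.default_rng(0)
img = np.full((1024, 1024), 200, dtype=np.uint8)
speck_rows = rng.integers(0, 1024, size=5000)
speck_cols = rng.integers(0, 1024, size=5000)
img[speck_rows, speck_cols] = 20

frac = trash_fraction(img)
```

On a GPU, each pixel comparison would run in its own thread; the vectorized NumPy expression is the CPU analogue of that mapping.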

  1. Iterative algorithms for large sparse linear systems on parallel computers

    NASA Technical Reports Server (NTRS)

    Adams, L. M.

    1982-01-01

Algorithms are developed for assembling in parallel the sparse systems of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed, and results of this model for the algorithms are given.
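Of the stationary iterative methods referenced, Jacobi is the canonical parallel one: every unknown is updated from the previous iterate only, so all updates are independent. A minimal sketch on a 1D Poisson model problem (illustrative only, not taken from the report):

```python
import numpy as np

def jacobi(A, b, iters=200):
    """Jacobi iteration: each component update uses only the previous
    iterate, so all n updates can run in parallel (here vectorized)."""
    D = np.diag(A)                    # diagonal entries
    R = A - np.diagflat(D)            # off-diagonal part
    x = np.zeros_like(b)
    for _ in range(iters):
        x = (b - R @ x) / D           # simultaneous (parallel) update
    return x

# 1D Poisson model problem: tridiagonal system from finite differences.
n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = jacobi(A, b, iters=5000)
residual = np.linalg.norm(A @ x - b)
```

The same simultaneous update is what makes the method attractive on array architectures: on a mesh of processors, each processor owns one (or a block of) unknowns and exchanges only neighbor values per sweep.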

  2. Deferred discrimination algorithm (nibbling) for target filter management

    NASA Astrophysics Data System (ADS)

    Caulfield, H. John; Johnson, John L.

    1999-07-01

A new method of classifying objects is presented. Rather than trying to form the classifier in one step or in one training algorithm, it is done in a series of small steps, or nibbles. This leads to an efficient and versatile system that is trained serially with single one-shot examples but applied in parallel; it is implemented with single-layer perceptrons, yet maintains a fully sequential hierarchical structure. Based on the nibbling algorithm, a basic new method of target reference filter management is described.

  3. Architectures for single-chip image computing

    NASA Astrophysics Data System (ADS)

    Gove, Robert J.

    1992-04-01

This paper will focus on the architectures of VLSI programmable processing components for image computing applications. TI, the maker of industry-leading RISC, DSP, and graphics components, has developed an architecture for a new generation of image processors capable of implementing a plurality of image, graphics, video, and audio computing functions. We will show that the use of a single-chip heterogeneous MIMD parallel architecture best suits this class of processors--those which will dominate the desktop multimedia, document imaging, computer graphics, and visualization systems of this decade.

  4. Introduction to Numerical Methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schoonover, Joseph A.

    2016-06-14

These are slides for a lecture for the Parallel Computing Summer Research Internship at the National Security Education Center. This gives an introduction to numerical methods. Repetitive algorithms are used to obtain approximate solutions to mathematical problems, covering sorting, searching, root finding, optimization, interpolation, extrapolation, least squares regression, eigenvalue problems, ordinary differential equations, and partial differential equations. Many equations are shown. Discretizations allow us to approximate solutions to mathematical models of physical systems using a repetitive algorithm, and they introduce errors that can lead to numerical instabilities if we are not careful.
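As a concrete example of the repetitive-algorithm idea such a lecture covers, bisection root finding repeats one cheap step until a tolerance is met (a standard textbook example, not taken from the slides themselves):

```python
def bisection(f, a, b, tol=1e-10):
    """Repeatedly halve the bracketing interval [a, b] until the root
    of f is located to within tol. Each pass is the same cheap step,
    which is the hallmark of iterative numerical methods. Assumes
    f(a) and f(b) have opposite signs."""
    fa = f(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        if fa * f(m) <= 0:     # sign change: root stays in [a, m]
            b = m
        else:                  # root is in [m, b]
            a, fa = m, f(m)
    return 0.5 * (a + b)

# Approximate sqrt(2) as the root of x**2 - 2 on [1, 2].
root = bisection(lambda x: x * x - 2, 1.0, 2.0)
```

Each iteration halves the error bound, illustrating the trade-off the slides mention: more repetitions buy accuracy, while finite precision bounds how far the refinement can usefully go.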

  5. Introducing parallelism to histogramming functions for GEM systems

    NASA Astrophysics Data System (ADS)

    Krawczyk, Rafał D.; Czarski, Tomasz; Kolasinski, Piotr; Pozniak, Krzysztof T.; Linczuk, Maciej; Byszuk, Adrian; Chernyshova, Maryna; Juszczyk, Bartlomiej; Kasprowicz, Grzegorz; Wojenski, Andrzej; Zabolotny, Wojciech

    2015-09-01

This article is an assessment of the potential parallelization of histogramming algorithms in a GEM detector system. Histogramming and preprocessing algorithms in MATLAB were analyzed with regard to adding parallelism. A preliminary implementation of parallel strip histogramming resulted in a speedup. An analysis of the algorithms' parallelizability is presented, and potential hardware and software support for implementing the parallel algorithm is discussed.
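The abstract does not include the MATLAB code, but the reason strip histogramming parallelizes well is that partial histograms merge by simple addition. A chunk-and-merge sketch (Python with synthetic data standing in for the GEM strip hits) might look like:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_histogram(samples, bins, workers=4):
    """Histogram a large sample array by splitting it into chunks,
    histogramming each chunk independently (the parallel part), and
    summing the partial histograms. This merge-by-addition step is
    what makes histogramming easy to parallelize."""
    chunks = np.array_split(samples, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda c: np.histogram(c, bins=bins)[0], chunks)
    return sum(partials)

rng = np.random.default_rng(1)
data = rng.integers(0, 100, size=1_000_000)   # stand-in for strip hit data
bins = np.arange(0, 101)                      # one bin per strip value
hist = parallel_histogram(data, bins)
```

The same decomposition carries over to hardware implementations: each processing element accumulates a private histogram and a final reduction adds them, avoiding contention on shared bins.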

  6. Computer-assisted enzyme immunoassays and simplified immunofluorescence assays: applications for the diagnostic laboratory and the veterinarian's office.

    PubMed

    Jacobson, R H; Downing, D R; Lynch, T J

    1982-11-15

    A computer-assisted enzyme-linked immunosorbent assay (ELISA) system, based on kinetics of the reaction between substrate and enzyme molecules, was developed for testing large numbers of sera in laboratory applications. Systematic and random errors associated with conventional ELISA technique were identified leading to results formulated on a statistically validated, objective, and standardized basis. In a parallel development, an inexpensive system for field and veterinary office applications contained many of the qualities of the computer-assisted ELISA. This system uses a fluorogenic indicator (rather than the enzyme-substrate interaction) in a rapid test (15 to 20 minutes' duration) which promises broad application in serodiagnosis.

  7. Control and protection system for paralleled modular static inverter-converter systems

    NASA Technical Reports Server (NTRS)

    Birchenough, A. G.; Gourash, F.

    1973-01-01

    A control and protection system was developed for use with a paralleled 2.5-kWe-per-module static inverter-converter system. The control and protection system senses internal and external fault parameters such as voltage, frequency, current, and paralleling current unbalance. A logic system controls contactors to isolate defective power conditioners or loads. The system sequences contactor operation to automatically control parallel operation, startup, and fault isolation. Transient overload protection and fault checking sequences are included. The operation and performance of a control and protection system, with detailed circuit descriptions, are presented.

  8. A Monte Carlo study on the performance evaluation of a parallel hole collimator for a HiReSPECT: A dedicated small-animal SPECT.

    PubMed

    Abbaspour, Samira; Tanha, Kaveh; Mahmoudian, Babak; Assadi, Majid; Pirayesh Islamian, Jalil

    2018-04-22

Collimator geometry has an important contribution to the image quality in SPECT imaging. The purpose of this study was to investigate the effect of parallel-hole collimator hole size on the functional parameters (spatial resolution and sensitivity) and the image quality of a HiReSPECT imaging system using the SIMIND Monte Carlo program. To find a proper trade-off between sensitivity and spatial resolution, collimators with hole diameters ranging from 0.3 to 1.5 mm (in steps of 0.3 mm) were used with fixed septal thickness and hole length (0.2 mm and 34 mm, respectively). Lead, Gold, and Tungsten as the LEHR collimator material were also investigated. Results from scanning a 99mTc point source with the experimental and simulated systems were matched to validate the simulated imaging system. The simulation results showed that decreasing the collimator hole size, especially for the Gold collimator, improved the spatial resolution by 18% and 3.2% compared to the Lead and Tungsten collimators, respectively. Meanwhile, the Lead collimator provided a sensitivity about 7% and 8% better than that of Tungsten and Gold, respectively. Overall, the spatial resolution and sensitivity showed small differences among the three collimator materials assayed within the defined energy. With increasing hole size, the Gold collimator produced lower scatter and penetration fractions than the Tungsten and Lead collimators. The minimum detectable size of hot rods in the micro-Jaszczak phantom on the iterative maximum-likelihood expectation maximization (MLEM) reconstructed images was determined in the sectors of 1.6, 1.8, 2.0, 2.4 and 2.6 mm for scanning with the collimators with hole sizes of 0.3, 0.6, 0.9, 1.2 and 1.5 mm at a 5 cm distance from the phantom. The Gold collimator with a hole size of 0.3 mm provided the best image quality with the HiReSPECT imaging system. Copyright © 2018 Elsevier Ltd. All rights reserved.
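The resolution-sensitivity trade-off swept in this study follows the standard geometric estimates for parallel-hole collimators (textbook approximations, not the SIMIND model itself; K ≈ 0.26 is the usual constant for hexagonal holes, and septal penetration is ignored):

```python
def collimator_tradeoff(d, l, t, b, K=0.26):
    """Textbook parallel-hole collimator estimates:
    geometric resolution  R_g = d * (l + b) / l
    geometric efficiency  g   = (K * d**2 / (l * (d + t)))**2
    d: hole diameter, l: hole length, t: septal thickness,
    b: source-to-collimator distance (all in mm)."""
    r_g = d * (l + b) / l
    g = (K * d * d / (l * (d + t))) ** 2
    return r_g, g

# Halving the hole diameter sharpens resolution but costs sensitivity,
# mirroring the trade-off observed across the 0.3-1.5 mm sweep.
r_big, g_big = collimator_tradeoff(d=1.2, l=34.0, t=0.2, b=50.0)
r_small, g_small = collimator_tradeoff(d=0.6, l=34.0, t=0.2, b=50.0)
```

Because efficiency falls roughly with d⁴ while resolution improves only linearly in d, Monte Carlo simulation is needed to judge where the trade-off leaves usable image quality, which is the role SIMIND plays in the study.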

  9. An Object Oriented Extensible Architecture for Affordable Aerospace Propulsion Systems

    NASA Technical Reports Server (NTRS)

    Follen, Gregory J.; Lytle, John K. (Technical Monitor)

    2002-01-01

    Driven by a need to explore and develop propulsion systems that exceeded current computing capabilities, NASA Glenn embarked on a novel strategy leading to the development of an architecture that enables propulsion simulations never thought possible before. Full engine 3 Dimensional Computational Fluid Dynamic propulsion system simulations were deemed impossible due to the impracticality of the hardware and software computing systems required. However, with a software paradigm shift and an embracing of parallel and distributed processing, an architecture was designed to meet the needs of future propulsion system modeling. The author suggests that the architecture designed at the NASA Glenn Research Center for propulsion system modeling has potential for impacting the direction of development of affordable weapons systems currently under consideration by the Applied Vehicle Technology Panel (AVT). This paper discusses the salient features of the NPSS Architecture including its interface layer, object layer, implementation for accessing legacy codes, numerical zooming infrastructure and its computing layer. The computing layer focuses on the use and deployment of these propulsion simulations on parallel and distributed computing platforms which has been the focus of NASA Ames. Additional features of the object oriented architecture that support MultiDisciplinary (MD) Coupling, computer aided design (CAD) access and MD coupling objects will be discussed. Included will be a discussion of the successes, challenges and benefits of implementing this architecture.

  10. Early stages of transition in viscosity-stratified channel flow

    NASA Astrophysics Data System (ADS)

    Govindarajan, Rama; Jose, Sharath; Brandt, Luca

    2013-11-01

    In parallel shear flows, it is well known that transition to turbulence usually occurs through a subcritical process. In this work we consider a flow through a channel across which there is a linear temperature variation. The temperature gradient leads to a viscosity variation across the channel. A large body of work has been done in the linear regime for this problem, and it has been seen that viscosity stratification can lead to considerable changes in stability and transient growth characteristics. Moreover contradictory effects of introducing a non uniform viscosity in the system have been reported. We conduct a linear stability analysis and direct numerical simulations (DNS) for this system. We show that the optimal initial structures in the viscosity-stratified case, unlike in unstratified flow, do not span the width of the channel, but are focussed near one wall. The nonlinear consequences of the localisation of the structures will be discussed.

  11. A robust H∞-tracking design for uncertain Takagi-Sugeno fuzzy systems with unknown premise variables using descriptor redundancy approach

    NASA Astrophysics Data System (ADS)

    Hassan Asemani, Mohammad; Johari Majd, Vahid

    2015-12-01

    This paper addresses a robust H∞ fuzzy observer-based tracking design problem for uncertain Takagi-Sugeno fuzzy systems with external disturbances. To have a practical observer-based controller, the premise variables of the system are assumed to be not measurable in general, which leads to a more complex design process. The tracker is synthesised based on a fuzzy Lyapunov function approach and non-parallel distributed compensation (non-PDC) scheme. Using the descriptor redundancy approach, the robust stability conditions are derived in the form of strict linear matrix inequalities (LMIs) even in the presence of uncertainties in the system, input, and output matrices simultaneously. Numerical simulations are provided to show the effectiveness of the proposed method.

  12. SIAM Conference on Parallel Processing for Scientific Computing, 4th, Chicago, IL, Dec. 11-13, 1989, Proceedings

    NASA Technical Reports Server (NTRS)

    Dongarra, Jack (Editor); Messina, Paul (Editor); Sorensen, Danny C. (Editor); Voigt, Robert G. (Editor)

    1990-01-01

Attention is given to such topics as an evaluation of block algorithm variants in LAPACK, a large-grain parallel sparse system solver, a multiprocessor method for the solution of the generalized eigenvalue problem on an interval, and a parallel QR algorithm for iterative subspace methods on the CM2. A discussion of numerical methods includes the topics of asynchronous numerical solutions of PDEs on parallel computers, parallel homotopy curve tracking on a hypercube, and solving Navier-Stokes equations on the Cedar Multi-Cluster system. A section on differential equations includes a discussion of a six-color procedure for the parallel solution of elliptic systems using the finite quadtree structure, data parallel algorithms for the finite element method, and domain decomposition methods in aerodynamics. Topics dealing with massively parallel computing include hypercube vs. 2-dimensional meshes and massively parallel computation of conservation laws. Performance and tools are also discussed.

  13. Design and realization of test system for testing parallelism and jumpiness of optical axis of photoelectric equipment

    NASA Astrophysics Data System (ADS)

    Shi, Sheng-bing; Chen, Zhen-xing; Qin, Shao-gang; Song, Chun-yan; Jiang, Yun-hong

    2014-09-01

With the development of science and technology, photoelectric equipment now comprises visible, infrared, laser, and other systems, and its levels of integration, information content, and complexity are higher than in the past. Parallelism and jumpiness of the optical axis are important performance characteristics of photoelectric equipment that directly affect aiming, ranging, orientation, and so on. Jumpiness of the optical axis directly affects the hit precision of precision point-damage weapons, but facilities for testing this performance have been lacking. In this paper, a test system for testing the parallelism and jumpiness of the optical axis is devised. Accurate aiming is not necessary and data processing is digital in the course of parallelism testing; the system can directly test the parallelism of multiple axes (aim axis and laser emission axis, laser emission axis and laser receiving axis) and, for the first time, measures the jumpiness of the optical axis of an optical sighting device. It is a universal test system.

  14. The Galley Parallel File System

    NASA Technical Reports Server (NTRS)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.

  15. JSD: Parallel Job Accounting on the IBM SP2

    NASA Technical Reports Server (NTRS)

    Saphir, William; Jones, James Patton; Walter, Howard (Technical Monitor)

    1995-01-01

    The IBM SP2 is one of the most promising parallel computers for scientific supercomputing - it is fast and usually reliable. One of its biggest problems is a lack of robust and comprehensive system software. Among other things, this software allows a collection of Unix processes to be treated as a single parallel application. It does not, however, provide accounting for parallel jobs other than what is provided by AIX for the individual process components. Without parallel job accounting, it is not possible to monitor system use, measure the effectiveness of system administration strategies, or identify system bottlenecks. To address this problem, we have written jsd, a daemon that collects accounting data for parallel jobs. jsd records information in a format that is easily machine- and human-readable, allowing us to extract the most important accounting information with very little effort. jsd also notifies system administrators in certain cases of system failure.

  16. Photonic content-addressable memory system that uses a parallel-readout optical disk

    NASA Astrophysics Data System (ADS)

    Krishnamoorthy, Ashok V.; Marchand, Philippe J.; Yayla, Gökçe; Esener, Sadik C.

    1995-11-01

We describe a high-performance associative-memory system that can be implemented by means of an optical disk modified for parallel readout and a custom-designed silicon integrated circuit with parallel optical input. The system can achieve associative recall on 128 × 128 bit images and also on variable-size subimages. The system's behavior and performance are evaluated on the basis of experimental results on a motionless-head parallel-readout optical-disk system, logic simulations of the very-large-scale integrated chip, and a software emulation of the overall system.

  17. RAMA: A file system for massively parallel computers

    NASA Technical Reports Server (NTRS)

    Miller, Ethan L.; Katz, Randy H.

    1993-01-01

This paper describes a file system design for massively parallel computers which makes very efficient use of a few disks per processor. This overcomes the traditional I/O bottleneck of massively parallel machines by storing the data on disks within the high-speed interconnection network. In addition, the file system, called RAMA, requires little inter-node synchronization, removing another common bottleneck in parallel processor file systems. Support for a large tertiary storage system can easily be integrated into the file system; in fact, RAMA runs most efficiently when tertiary storage is used.

  18. Collectively loading an application in a parallel computer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.

    Collectively loading an application in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: identifying, by a parallel computer control system, a subset of compute nodes in the parallel computer to execute a job; selecting, by the parallel computer control system, one of the subset of compute nodes in the parallel computer as a job leader compute node; retrieving, by the job leader compute node from computer memory, an application for executing the job; and broadcasting, by the job leader to the subset of compute nodes in the parallel computer, the application for executing the job.
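A minimal in-process sketch of the job-leader broadcast pattern described above (threads stand in for compute nodes, queues for the network broadcast, and all names are invented for the example):

```python
import threading
import queue

def collective_load(application_bytes, n_workers=4):
    """Sketch of the job-leader pattern: one node retrieves the
    application image once, then broadcasts it to every compute node
    in the subset, instead of each node hitting storage individually."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    received = [None] * n_workers

    def leader():
        # The leader alone touches "storage"; everyone else gets a copy.
        for box in inboxes:
            box.put(application_bytes)

    def worker(rank):
        received[rank] = inboxes[rank].get()   # block until broadcast

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(n_workers)]
    for t in threads:
        t.start()
    leader()
    for t in threads:
        t.join()
    return received

copies = collective_load(b"application-image", n_workers=4)
```

The point of the pattern is load isolation: with thousands of compute nodes, a single read from the file system followed by a tree or network broadcast avoids the storage stampede that per-node loading would cause.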

  19. Design and implementation of an automated compound management system in support of lead optimization.

    PubMed

    Quintero, Catherine; Kariv, Ilona

    2009-06-01

    To meet the needs of the increasingly rapid and parallelized lead optimization process, a fully integrated local compound storage and liquid handling system was designed and implemented to automate the generation of assay-ready plates directly from newly submitted and cherry-picked compounds. A key feature of the system is the ability to create project- or assay-specific compound-handling methods, which provide flexibility for any combination of plate types, layouts, and plate bar-codes. Project-specific workflows can be created by linking methods for processing new and cherry-picked compounds and control additions to produce a complete compound set for both biological testing and local storage in one uninterrupted workflow. A flexible cherry-pick approach allows for multiple, user-defined strategies to select the most appropriate replicate of a compound for retesting. Examples of custom selection parameters include available volume, compound batch, and number of freeze/thaw cycles. This adaptable and integrated combination of software and hardware provides a basis for reducing cycle time, fully automating compound processing, and ultimately increasing the rate at which accurate, biologically relevant results can be produced for compounds of interest in the lead optimization process.

  20. Spin effects in transport through triangular quantum dot molecule in different geometrical configurations

    NASA Astrophysics Data System (ADS)

    Wrześniewski, Kacper; Weymann, Ireneusz

    2015-07-01

    We analyze the spin-resolved transport properties of a triangular quantum dot molecule weakly coupled to external ferromagnetic leads. The calculations are performed by using the real-time diagrammatic technique up to the second-order of perturbation theory, which enables a description of both the sequential and cotunneling processes. We study the behavior of the current and differential conductance in the parallel and antiparallel magnetic configurations, as well as the tunnel magnetoresistance (TMR) and the Fano factor in both the linear and nonlinear response regimes. It is shown that the transport characteristics depend greatly on how the system is connected to external leads. Two specific geometrical configurations of the device are considered—the mirror one, which possesses the reflection symmetry with respect to the current flow direction and the fork one, in which this symmetry is broken. In the case of first configuration we show that, depending on the bias and gate voltages, the system exhibits both enhanced TMR and super-Poissonian shot noise. On the other hand, when the system is in the second configuration, we predict a negative TMR and a negative differential conductance in certain transport regimes. The mechanisms leading to those effects are thoroughly discussed.

  1. Effect of an Additional, Parallel Capacitor on Pulsed Inductive Plasma Accelerator Performance

    NASA Technical Reports Server (NTRS)

    Polzin, Kurt A.; Sivak, Amy D.; Balla, Joseph V.

    2011-01-01

    A model of pulsed inductive plasma thrusters consisting of a set of coupled circuit equations and a one-dimensional momentum equation has been used to study the effects of adding a second, parallel capacitor into the system. The equations were nondimensionalized, permitting the recovery of several already-known scaling parameters and leading to the identification of a parameter that is unique to the particular topology studied. The current rise rate through the inductive acceleration coil was used as a proxy measurement of the effectiveness of inductive propellant ionization since higher rise rates produce stronger, potentially better ionizing electric fields at the coil face. Contour plots representing thruster performance (exhaust velocity and efficiency) and current rise rate in the coil were generated numerically as a function of the scaling parameters. The analysis reveals that when the value of the second capacitor is much less than the first capacitor, the performance of the two-capacitor system approaches that of the single-capacitor system. In addition, as the second capacitor is decreased in value the current rise rate can grow to be twice as great as the rise rate attained in the single capacitor case.

  2. Characterization of the seismically imaged Tuscarora fold system and implications for layer parallel shortening in the Pennsylvania salient

    NASA Astrophysics Data System (ADS)

    Mount, Van S.; Wilkins, Scott; Comiskey, Cody S.

    2017-12-01

    The Tuscarora fold system (TFS) is located in the Pennsylvania salient in the foreland of the Valley and Ridge province. The TFS is imaged in high quality 3D seismic data and comprises a system of small-scale folds within relatively flat-lying Lower Silurian Tuscarora Formation strata. We characterize the TFS structures and infer layer parallel shortening (LPS) directions and magnitudes associated with deformation during the Alleghany Orogeny. Previously reported LPS data in our study area are from shallow Devonian and Carboniferous strata (based on outcrop and core analyses) above the shallowest of three major detachments recognized in the region. Seismic data allows us to characterize LPS at depth in strata beneath the shallow detachment. Our LPS data (orientations and inferred magnitudes) are consistent with the shallow data leading us to surmise that LPS during Alleghanian deformation fanned around the salient and was distributed throughout the stratigraphic section - and not isolated to strata above the shallow detachment. We propose that a NW-SE oriented Alleghanian maximum principal stress was perturbed by deep structure associated with the non-linear margin of Laurentia resulting in fanning of shortening directions within the salient.

  3. Acoustic 3D modeling by the method of integral equations

    NASA Astrophysics Data System (ADS)

    Malovichko, M.; Khokhlov, N.; Yavich, N.; Zhdanov, M.

    2018-02-01

This paper presents a parallel algorithm for frequency-domain acoustic modeling by the method of integral equations (IE). The algorithm is applied to seismic simulation. The IE method reduces the size of the problem but leads to a dense system matrix. A tolerable memory consumption and numerical complexity were achieved by applying an iterative solver, accompanied by an effective matrix-vector multiplication operation, based on the fast Fourier transform (FFT). We demonstrate that the IE system matrix is better conditioned than that of the finite-difference (FD) method, and discuss its relation to a specially preconditioned FD matrix. We considered several methods of matrix-vector multiplication for the free-space and layered host models. The developed algorithm and computer code were benchmarked against the FD time-domain solution. It was demonstrated that the method could accurately calculate the seismic field for models with sharp material boundaries and a point source and receiver located close to the free surface. We used OpenMP to speed up the matrix-vector multiplication, while MPI was used to speed up the solution of the system equations, and also for parallelizing across multiple sources. The practical examples and efficiency tests are presented as well.
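The FFT-based matrix-vector product that keeps the IE method's memory and cost tolerable relies on the convolutional (Toeplitz/circulant) structure that translation-invariant kernels acquire on uniform grids. A one-dimensional illustration (not the authors' 3D code) is:

```python
import numpy as np

def circulant_matvec(first_col, x):
    """Multiply a circulant matrix (defined by its first column) by x
    in O(n log n) via the FFT, without ever forming the dense matrix.
    The dense product would cost O(n**2) time and memory."""
    return np.real(np.fft.ifft(np.fft.fft(first_col) * np.fft.fft(x)))

# Check against the dense product for a small random circulant matrix.
rng = np.random.default_rng(2)
n = 64
c = rng.standard_normal(n)
x = rng.standard_normal(n)
A = np.array([np.roll(c, k) for k in range(n)]).T   # dense circulant
dense = A @ x
fast = circulant_matvec(c, x)
err = np.max(np.abs(dense - fast))
```

In 3D the same idea applies blockwise after embedding the Toeplitz kernel into a circulant one; the matrix is then represented by a single kernel array rather than a dense n-by-n block, which is what makes an iterative solver practical.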

  4. Architecture of the parallel hierarchical network for fast image recognition

    NASA Astrophysics Data System (ADS)

    Timchenko, Leonid; Wójcik, Waldemar; Kokriatskaia, Natalia; Kutaev, Yuriy; Ivasyuk, Igor; Kotyra, Andrzej; Smailova, Saule

    2016-09-01

Multistage integration of visual information in the brain allows humans to respond quickly to most significant stimuli while maintaining their ability to recognize small details in the image. Implementation of this principle in technical systems can lead to more efficient processing procedures. The multistage approach to image processing includes the main types of cortical multistage convergence. The input images are mapped into a flexible hierarchy that reflects the complexity of the image data. Procedures of temporal image decomposition and hierarchy formation are described in mathematical expressions. The multistage system highlights spatial regularities, which are passed through a number of transformational levels to generate a coded representation of the image that encapsulates structure on different hierarchical levels in the image. At each processing stage a single output result is computed to allow a quick response of the system. The result is presented as an activity pattern, which can be compared with previously computed patterns on the basis of the closest match. The idea of the forecasting method is as follows: in the results synchronization block, network-processed data arrive at the database, where a sample of the most correlated data is drawn using service parameters of the parallel-hierarchical network.

  5. Effects of imbalanced currents on large-format LiFePO4/graphite batteries systems connected in parallel

    NASA Astrophysics Data System (ADS)

    Shi, Wei; Hu, Xiaosong; Jin, Chao; Jiang, Jiuchun; Zhang, Yanru; Yip, Tony

    2016-05-01

    With the development and popularization of electric vehicles, it is urgent and necessary to develop effective management and diagnosis technology for battery systems. In this work, we design a parallel battery model, according to equivalent circuits of parallel voltage and branch current, to study effects of imbalanced currents on parallel large-format LiFePO4/graphite battery systems. Taking a 60 Ah LiFePO4/graphite battery system manufactured by ATL (Amperex Technology Limited, China) as an example, causes of imbalanced currents in the parallel connection are analyzed using our model, and the associated effect mechanisms on long-term stability of each single battery are examined. Theoretical and experimental results show that continuously increasing imbalanced currents during cycling are mainly responsible for the capacity fade of LiFePO4/graphite parallel batteries. It is thus a good way to avoid fast performance fade of parallel battery systems by suppressing variations of branch currents.
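The parallel-branch behavior described reduces, in a purely resistive sketch, to Kirchhoff's laws: all branches share one terminal voltage, so any mismatch in EMF or internal resistance splits the current unevenly. A two-branch illustration with hypothetical cell parameters (not the ATL pack's values):

```python
def branch_currents(emfs, resistances, load_current):
    """Split a pack-level current among parallel branches using
    Kirchhoff's laws: all branches share one terminal voltage V, and
    each branch carries I_k = (E_k - V) / R_k. Summing the branch
    currents to the load current fixes V."""
    g = [1.0 / r for r in resistances]            # branch conductances
    v = (sum(e * gk for e, gk in zip(emfs, g)) - load_current) / sum(g)
    return [(e - v) / r for e, r in zip(emfs, resistances)]

# Two nominally identical cells, one with 20% higher internal resistance:
# the healthier branch carries more than half the pack current.
i1, i2 = branch_currents([3.3, 3.3], [0.010, 0.012], load_current=60.0)
```

As the weaker cell degrades further its resistance grows, pushing yet more current through the stronger branch; this positive feedback is the mechanism behind the continuously increasing current imbalance the study links to capacity fade.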

  6. Parallel-In-Time For Moving Meshes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Falgout, R. D.; Manteuffel, T. A.; Southworth, B.

    2016-02-04

With steadily growing computational resources available, scientists must develop effective ways to utilize the increased resources. High performance, highly parallel software has become a standard. However, until recent years parallelism has focused primarily on the spatial domain. When solving a space-time partial differential equation (PDE), this leads to a sequential bottleneck in the temporal dimension, particularly when taking a large number of time steps. The XBraid parallel-in-time library was developed as a practical way to add temporal parallelism to existing sequential codes with only minor modifications. In this work, a rezoning-type moving mesh is applied to a diffusion problem and formulated in a parallel-in-time framework. Tests and scaling studies are run using XBraid and demonstrate excellent results for the simple model problem considered herein.

  7. Optimization under uncertainty of parallel nonlinear energy sinks

    NASA Astrophysics Data System (ADS)

    Boroson, Ethan; Missoum, Samy; Mattei, Pierre-Olivier; Vergez, Christophe

    2017-04-01

    Nonlinear Energy Sinks (NESs) are a promising technique for passively reducing the amplitude of vibrations. Through nonlinear stiffness properties, a NES is able to passively and irreversibly absorb energy. Unlike the traditional Tuned Mass Damper (TMD), NESs do not require a specific tuning and absorb energy over a wider range of frequencies. Nevertheless, they are still only efficient over a limited range of excitations. In order to mitigate this limitation and maximize the efficiency range, this work investigates the optimization of multiple NESs configured in parallel. It is well known that the efficiency of a NES is extremely sensitive to small perturbations in loading conditions or design parameters. In fact, the efficiency of a NES has been shown to be nearly discontinuous in the neighborhood of its activation threshold. For this reason, uncertainties must be taken into account in the design optimization of NESs. In addition, the discontinuities require a specific treatment during the optimization process. In this work, the objective of the optimization is to maximize the expected value of the efficiency of NESs in parallel. The optimization algorithm is able to tackle design variables with uncertainty (e.g., nonlinear stiffness coefficients) as well as aleatory variables such as the initial velocity of the main system. The optimal design of several parallel NES configurations for maximum mean efficiency is investigated. Specifically, NES nonlinear stiffness properties, considered random design variables, are optimized for cases with 1, 2, 3, 4, 5, and 10 NESs in parallel. The distributions of efficiency for the optimal parallel configurations are compared to distributions of efficiencies of non-optimized NESs. It is observed that the optimization enables a sharp increase in the mean value of efficiency while reducing the corresponding variance, thus leading to more robust NES designs.

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Alred, Erik J.; Scheele, Emily G.; Berhanu, Workalemahu M.

    Recent experiments indicate a connection between the structure of amyloid aggregates and their cytotoxicity as related to neurodegenerative diseases. Of particular interest is the Iowa Mutant, which causes early-onset of Alzheimer's disease. While wild-type Amyloid β-peptides form only parallel beta-sheet aggregates, the mutant also forms meta-stable antiparallel beta sheets. Since these structural variations may cause the difference in the pathological effects of the two Aβ-peptides, we have studied in silico the relative stability of the wild type and Iowa mutant in both parallel and antiparallel forms. We compare regular molecular dynamics simulations with ones in which the viscosity of the samples is reduced, which, we show, leads to higher sampling efficiency. By analyzing and comparing these four sets of all-atom molecular dynamics simulations, we probe the role of the various factors that could lead to the structural differences. Our analysis indicates that the parallel forms of both wild type and Iowa mutant aggregates are stable, while the antiparallel aggregates are meta-stable for the Iowa mutant and not stable for the wild type. The differences result from the direct alignment of hydrophobic interactions in the in-register parallel oligomers, making them more stable than the antiparallel aggregates. The slightly higher thermodynamic stability of the Iowa mutant fibril-like oligomers in its parallel organization over that in antiparallel form is supported by previous experimental measurements showing slow inter-conversion of antiparallel aggregates into parallel ones. Knowledge of the mechanism that selects between parallel and antiparallel conformations and determines their relative stability may open new avenues for the development of therapies targeting familial forms of early-onset Alzheimer's disease.

  9. Default Parallels Plesk Panel Page

    Science.gov Websites

    Our software includes key building blocks of cloud services that small businesses want and need: virtualized servers, hosting, SaaS, and cloud computing. Service Provider Products include Parallels® Automation, the leading hosting automation software. You see this page because there is no Web site at this address.

  10. Robotic Assistance for Ultrasound-Guided Prostate Brachytherapy

    PubMed Central

    Fichtinger, Gabor; Fiene, Jonathan P.; Kennedy, Christopher W.; Kronreif, Gernot; Iordachita, Iulian; Song, Danny Y.; Burdette, Everette C.; Kazanzides, Peter

    2016-01-01

    We present a robotically assisted prostate brachytherapy system and test results in training phantoms and Phase-I clinical trials. The system consists of a transrectal ultrasound (TRUS) and a spatially co-registered robot, fully integrated with an FDA-approved commercial treatment planning system. The salient feature of the system is a small parallel robot affixed to the mounting posts of the template. The robot replaces the template interchangeably, using the same coordinate system. Established clinical hardware, workflow and calibration remain intact. In all phantom experiments, we recorded the first insertion attempt without adjustment. All clinically relevant locations in the prostate were reached. Non-parallel needle trajectories were achieved. The pre-insertion transverse and rotational errors (measured with a Polaris optical tracker relative to the template’s coordinate frame) were 0.25mm (STD=0.17mm) and 0.75° (STD=0.37°). In phantoms, needle tip placement errors measured in TRUS were 1.04mm (STD=0.50mm). A Phase-I clinical feasibility and safety trial has been successfully completed with the system. We encountered needle tip positioning errors of a magnitude greater than 4mm in only 2 out of 179 robotically guided needles, in contrast to manual template guidance where errors of this magnitude are much more common. Further clinical trials are necessary to determine whether the apparent benefits of the robotic assistant will lead to improvements in clinical efficacy and outcomes. PMID:18650122

  11. pcircle - A Suite of Scalable Parallel File System Tools

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    WANG, FEIYI

    2015-10-01

    Most file system software is written for conventional local file systems; it is serialized and cannot take advantage of a large-scale parallel file system. The "pcircle" software builds on top of ubiquitous MPI in cluster computing environments and the "work-stealing" pattern to provide a scalable, high-performance suite of file system tools. In particular, it implements parallel data copy and parallel data checksumming, with advanced features such as asynchronous progress reporting, checkpoint and restart, and integrity checking.
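
    The chunked, parallel checksumming that pcircle performs across MPI ranks can be illustrated in miniature with a thread pool on a single machine. The chunk size, the SHA-1 choice, and the digest-combination scheme below are assumptions for this sketch, not pcircle's actual format or protocol.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # 1 MiB work units, analogous to chunked work items

def _chunk_digest(task):
    # Checksum one (path, offset) chunk -- the unit of work that a
    # pcircle-style tool would distribute across workers/ranks.
    path, offset = task
    with open(path, "rb") as f:
        f.seek(offset)
        return offset, hashlib.sha1(f.read(CHUNK)).hexdigest()

def parallel_checksum(path, workers=4):
    """Checksum a file chunk-by-chunk in parallel, then combine the
    per-chunk digests (sorted by offset) into one file signature."""
    size = os.path.getsize(path)
    tasks = [(path, off) for off in range(0, size, CHUNK)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = sorted(pool.map(_chunk_digest, tasks))
    combined = "".join(d for _, d in digests).encode()
    return hashlib.sha1(combined).hexdigest()
```

    Sorting by offset makes the signature independent of worker scheduling, which is the property a distributed checksummer needs.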

  12. Numerical characteristics of quantum computer simulation

    NASA Astrophysics Data System (ADS)

    Chernyavskiy, A.; Khamitov, K.; Teplov, A.; Voevodin, V.; Voevodin, Vl.

    2016-12-01

    The simulation of quantum circuits is significantly important for the implementation of quantum information technologies. The main difficulty of such modeling is the exponential growth of dimensionality; thus the usage of modern high-performance parallel computations is relevant. As is well known, arbitrary quantum computation in the circuit model can be done by only single- and two-qubit gates, and we analyze the computational structure and properties of the simulation of such gates. We investigate how the unique properties of quantum nature lead to the computational properties of the considered algorithms: quantum parallelism makes the simulation of quantum gates highly parallel, while, on the other hand, quantum entanglement leads to the problem of computational locality during simulation. We use the methodology of the AlgoWiki project (algowiki-project.org) to analyze the algorithm. This methodology consists of theoretical (sequential and parallel complexity, macro structure, and visual informational graph) and experimental (locality and memory access, scalability and more specific dynamic characteristics) parts. The experimental part was carried out using the petascale Lomonosov supercomputer (Moscow State University, Russia). We show that the simulation of quantum gates is a good basis for the research and testing of development methods for data-intensive parallel software, and the considered methodology of analysis can be successfully used for the improvement of algorithms in quantum information science.
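
    A minimal sketch of the core kernel such a simulator runs: applying a single-qubit gate to an n-qubit state vector with NumPy. The reshape/tensordot formulation is a standard textbook approach, not necessarily the implementation analyzed in the paper; it shows why the step is highly parallel (every amplitude is touched independently) while the full 2^n vector must stay in memory.

```python
import numpy as np

def apply_single_qubit_gate(state, gate, target, n_qubits):
    # Reshape the 2**n vector into an n-dimensional (2, 2, ..., 2)
    # tensor, contract the 2x2 gate against the target axis, then
    # restore the axis order and flatten back to a vector.
    psi = state.reshape([2] * n_qubits)
    psi = np.tensordot(gate, psi, axes=([1], [target]))
    psi = np.moveaxis(psi, 0, target)
    return psi.reshape(-1)

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard gate

state = np.zeros(2**3, dtype=complex)
state[0] = 1.0                                # start in |000>
state = apply_single_qubit_gate(state, H, 0, 3)
# Qubit 0 (most significant in this reshape convention) is now in
# an equal superposition: amplitudes 1/sqrt(2) on |000> and |100>.
```

    Two-qubit gates follow the same pattern with a 4x4 matrix contracted against two axes; entanglement then couples amplitudes that are far apart in memory, which is the locality problem the abstract mentions.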

  13. Parallel Computation and Visualization of Three-dimensional, Time-dependent, Thermal Convective Flows

    NASA Technical Reports Server (NTRS)

    Wang, P.; Li, P.

    1998-01-01

    A high-resolution numerical study on parallel systems is reported on three-dimensional, time-dependent, thermal convective flows. A parallel implementation of the finite volume method with a multigrid scheme is discussed, and a parallel visualization system is developed on distributed systems for visualizing the flow.

  14. Development of Tokamak Transport Solvers for Stiff Confinement Systems

    NASA Astrophysics Data System (ADS)

    St. John, H. E.; Lao, L. L.; Murakami, M.; Park, J. M.

    2006-10-01

    Leading transport models such as GLF23 [1] and MM95 [2] describe turbulent plasma energy, momentum and particle flows. In order to accommodate existing transport codes and associated solution methods, effective diffusivities have to be derived from these turbulent flow models. This can cause significant problems in predicting unique solutions. We have developed a parallel transport code solver, GCNMP, that can accommodate both flow-based and diffusivity-based confinement models by solving the discretized nonlinear equations using modern Newton, trust region, steepest descent and homotopy methods. We present our latest development efforts, including multiple dynamic grids, application of two-level parallel schemes, and operator splitting techniques that allow us to combine flow-based and diffusivity-based models in tokamak simulations. [1] R.E. Waltz, et al., Phys. Plasmas 4, 7 (1997). [2] G. Bateman, et al., Phys. Plasmas 5, 1793 (1998).

  15. Improving the Charge Carrier Transport and Suppressing Recombination of Soluble Squaraine-Based Solar Cells via Parallel-Like Structure

    PubMed Central

    Zhu, Youqin; Liu, Jingli; Zhao, Jiao; Li, Yang; Qiao, Bo; Song, Dandan; Huang, Yan; Xu, Zheng; Zhao, Suling; Xu, Xurong

    2018-01-01

    Small molecule organic solar cells (SMOSCs) have attracted extensive attention in recent years. Squaraine (SQ) is a kind of small molecule material with potential use in high-efficiency devices, because of its high extinction coefficient and low-cost synthesis. However, the charge carrier mobility of SQ-based films is much lower than that of other effective materials, which leads to a rather low fill factor (FF). In this study, we improve the performance of SQ derivative-based solar cells by incorporating PCDTBT into the LQ-51/PC71BM host binary blend film. The incorporation of PCDTBT can not only increase the photon harvesting, but also provide an additional hole transport pathway. Through charge carrier mobility and transient photovoltage measurements, we find that the hole mobility and charge carrier lifetime increase in the ternary system. We also carefully demonstrate that the charge carrier transport follows a parallel-like behavior. PMID:29747394

  16. AdiosStMan: Parallelizing Casacore Table Data System using Adaptive IO System

    NASA Astrophysics Data System (ADS)

    Wang, R.; Harris, C.; Wicenec, A.

    2016-07-01

    In this paper, we investigate the Casacore Table Data System (CTDS) used in the casacore and CASA libraries, and methods to parallelize it. CTDS provides a storage manager plugin mechanism for third-party developers to design and implement their own CTDS storage managers. With this in mind, we looked into various storage backend techniques that could enable parallel I/O for CTDS by implementing new storage managers. After carrying out benchmarks showing the excellent parallel I/O throughput of the Adaptive IO System (ADIOS), we implemented an ADIOS-based parallel CTDS storage manager. We then applied the CASA MSTransform frequency split task to verify the ADIOS Storage Manager. We also ran a series of performance tests to examine the I/O throughput in a massively parallel scenario.

  17. Refraction of high frequency noise in an arbitrary jet flow

    NASA Technical Reports Server (NTRS)

    Khavaran, Abbas; Krejsa, Eugene A.

    1994-01-01

    Refraction of high frequency noise by mean flow gradients in a jet is studied using the ray-tracing methods of geometrical acoustics. Both the two-dimensional (2D) and three-dimensional (3D) formulations are considered. In the former case, the mean flow is assumed parallel and the governing propagation equations are described by a system of four first-order ordinary differential equations. The 3D formulation, on the other hand, accounts for the jet spreading as well as the axial flow development. In this case, a system of six first-order differential equations is solved to trace a ray from its source location to an observer in the far field. For subsonic jets with a small spreading angle both methods lead to similar results outside the zone of silence. However, with increasing jet speed the two prediction models diverge to the point where the parallel flow assumption is no longer justified. The Doppler factor of supersonic jets as influenced by the refraction effects is discussed and compared with the conventional modified Doppler factor.

  18. Uncertainty Quantification of Nonlinear Electrokinetic Response in a Microchannel-Membrane Junction

    NASA Astrophysics Data System (ADS)

    Alizadeh, Shima; Iaccarino, Gianluca; Mani, Ali

    2015-11-01

    We have conducted uncertainty quantification (UQ) for electrokinetic transport of ionic species through a hybrid microfluidic system using different probabilistic techniques. The system of interest is an H-configuration consisting of two parallel microchannels that are connected via a nafion junction. This system is commonly used for ion preconcentration and stacking by utilizing a nonlinear response at the channel-nafion junction that leads to deionization shocks. In this work, the nafion medium is modeled as many parallel nano-pores where, the nano-pore diameter, nafion porosity, and surface charge density are independent random variables. We evaluated the resulting uncertainty on the ion concentration fields as well as the deionization shock location. The UQ methods predicted consistent statistics for the outputs and the results revealed that the shock location is weakly sensitive to the nano-pore surface charge and primarily driven by nano-pore diameters. The present study can inform the design of electrokinetic networks with increased robustness to natural manufacturing variability. Applications include water desalination and lab-on-a-chip systems. Shima is a graduate student in the department of Mechanical Engineering at Stanford University. She received her Master's degree from Stanford in 2011. Her research interests include Electrokinetics in porous structures and high performance computing.

  19. The evolution of eyes and visually guided behaviour

    PubMed Central

    Nilsson, Dan-Eric

    2009-01-01

    The morphology and molecular mechanisms of animal photoreceptor cells and eyes reveal a complex pattern of duplications and co-option of genetic modules, leading to a number of different light-sensitive systems that share many components, in which clear-cut homologies are rare. On the basis of molecular and morphological findings, I discuss the functional requirements for vision and how these have constrained the evolution of eyes. The fact that natural selection on eyes acts through the consequences of visually guided behaviour leads to a concept of task-punctuated evolution, where sensory systems evolve by a sequential acquisition of sensory tasks. I identify four key innovations that, one after the other, paved the way for the evolution of efficient eyes. These innovations are (i) efficient photopigments, (ii) directionality through screening pigment, (iii) photoreceptor membrane folding, and (iv) focusing optics. A corresponding evolutionary sequence is suggested, starting at non-directional monitoring of ambient luminance and leading to comparisons of luminances within a scene, first by a scanning mode and later by parallel spatial channels in imaging eyes. PMID:19720648

  20. NAS Requirements Checklist for Job Queuing/Scheduling Software

    NASA Technical Reports Server (NTRS)

    Jones, James Patton

    1996-01-01

    The increasing reliability of parallel systems and clusters of computers has resulted in these systems becoming more attractive for true production workloads. Today, the primary obstacle to production use of clusters of computers is the lack of a functional and robust Job Management System for parallel applications. This document provides a checklist of NAS requirements for job queuing and scheduling in order to make most efficient use of parallel systems and clusters for parallel applications. Future requirements are also identified to assist software vendors with design planning.

  1. Template based parallel checkpointing in a massively parallel computer system

    DOEpatents

    Archer, Charles Jens [Rochester, MN; Inglett, Todd Alan [Rochester, MN

    2009-01-13

    A method and apparatus for a template-based parallel checkpoint save for a massively parallel supercomputer system using a parallel variation of the rsync protocol and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored, for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high-speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
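
    The template-comparison idea can be sketched as follows: checksum each block of a node's state against the stored template, keep only the blocks that differ, and rebuild the checkpoint from template plus delta. The block size, MD5, and the dict-based delta format are illustrative assumptions, not the patented protocol's format; the sketch also assumes state and template have the same length.

```python
import hashlib

BLOCK = 4096  # checkpoint data is compared block-by-block

def block_sums(data):
    # Per-block checksums of the previously stored template checkpoint.
    return [hashlib.md5(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]

def delta_checkpoint(node_state, template_sums):
    """Keep only the blocks whose checksum differs from the template,
    in the spirit of the rsync-style comparison described above."""
    delta = {}
    for idx, i in enumerate(range(0, len(node_state), BLOCK)):
        block = node_state[i:i + BLOCK]
        if idx >= len(template_sums) or \
           hashlib.md5(block).digest() != template_sums[idx]:
            delta[idx] = block  # only changed blocks are stored
    return delta

def restore(template, delta):
    # Rebuild a node's checkpoint from the template plus its delta.
    blocks = [template[i:i + BLOCK] for i in range(0, len(template), BLOCK)]
    for idx, block in delta.items():
        blocks[idx] = block
    return b"".join(blocks)
```

    When most nodes' states differ little from the template, the delta is a small fraction of the full checkpoint, which is where the claimed transmission and storage savings come from.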

  2. Development of strain gages for use to 1311 K (1900 F)

    NASA Technical Reports Server (NTRS)

    Lemcoe, M. M.

    1974-01-01

    A high temperature electric resistance strain gage system was developed and evaluated to 1366 K (2000 F) for periods of at least one hour. Wire fabricated from a special high temperature strain gage alloy (BCL-3), was used to fabricate the gages. Various joining techniques (NASA butt welding, pulse arc, plasma needle arc, and dc parallel gap welding) were investigated for joining gage filaments to each other, gage filaments to lead-tab ribbons, and lead-tab ribbons to lead wires. The effectiveness of a clad-wire concept as a means of minimizing apparent strain of BCL-3 strain gages was investigated by sputtering platinum coatings of varying thicknesses on wire samples and establishing the optimum coating thickness--in terms of minimum resistivity changes with temperature. Finally, the moisture-proofing effectiveness of barrier coatings subjected to elevated temperatures was studied, and one commercial barrier coating (BLH Barrier H Waterproofing) was evaluated.

  3. Using Parallel Processing for Problem Solving.

    DTIC Science & Technology

    1979-12-01

    Activities are the basic parallel processing primitive. Different goals of the system can be pursued in parallel by placing them in separate activities. Language primitives are provided for manipulating running activities. Viewpoints are a generalization of contexts.

  4. The effect of electrodes on 11 acene molecular spin valve: Semi-empirical study

    NASA Astrophysics Data System (ADS)

    Aadhityan, A.; Preferencial Kala, C.; John Thiruvadigal, D.

    2017-10-01

    A new revolution in electronics is molecular spintronics, arising from the contemporary evolution of the two novel disciplines of spintronics and molecular electronics. The key element is the molecular spin valve, which consists of a diamagnetic molecule between two magnetic leads. In this paper, the non-equilibrium Green's function (NEGF) formalism combined with Extended Huckel Theory (EHT), a semi-empirical approach, is used to analyse the electron transport characteristics of an 11-acene molecular spin valve. We examine spin-dependent transport in the 11-acene molecular junction with various semi-infinite electrodes: iron, cobalt and nickel. To analyse the spin-dependent transport properties, the left and right electrodes are joined to the central region in parallel and anti-parallel configurations. We computed the spin-polarised device density of states, the projected device density of states of carbon and the electrode elements, and the transmission of these devices. The results demonstrate that the electrodes modify the spin-dependent behaviour of these systems in a controlled way. In both parallel and anti-parallel configurations, the separation of spin-up and spin-down states is larger for the iron electrode than for the nickel and cobalt electrodes. This shows that iron is the best electrode for an 11-acene spin valve device. Our theoretical results are reasonably impressive and motivate further study of the transport properties of these molecular-sized contacts.

  5. Prioritizing multiple therapeutic targets in parallel using automated DNA-encoded library screening

    NASA Astrophysics Data System (ADS)

    Machutta, Carl A.; Kollmann, Christopher S.; Lind, Kenneth E.; Bai, Xiaopeng; Chan, Pan F.; Huang, Jianzhong; Ballell, Lluis; Belyanskaya, Svetlana; Besra, Gurdyal S.; Barros-Aguirre, David; Bates, Robert H.; Centrella, Paolo A.; Chang, Sandy S.; Chai, Jing; Choudhry, Anthony E.; Coffin, Aaron; Davie, Christopher P.; Deng, Hongfeng; Deng, Jianghe; Ding, Yun; Dodson, Jason W.; Fosbenner, David T.; Gao, Enoch N.; Graham, Taylor L.; Graybill, Todd L.; Ingraham, Karen; Johnson, Walter P.; King, Bryan W.; Kwiatkowski, Christopher R.; Lelièvre, Joël; Li, Yue; Liu, Xiaorong; Lu, Quinn; Lehr, Ruth; Mendoza-Losana, Alfonso; Martin, John; McCloskey, Lynn; McCormick, Patti; O'Keefe, Heather P.; O'Keeffe, Thomas; Pao, Christina; Phelps, Christopher B.; Qi, Hongwei; Rafferty, Keith; Scavello, Genaro S.; Steiginga, Matt S.; Sundersingh, Flora S.; Sweitzer, Sharon M.; Szewczuk, Lawrence M.; Taylor, Amy; Toh, May Fern; Wang, Juan; Wang, Minghui; Wilkins, Devan J.; Xia, Bing; Yao, Gang; Zhang, Jean; Zhou, Jingye; Donahue, Christine P.; Messer, Jeffrey A.; Holmes, David; Arico-Muendel, Christopher C.; Pope, Andrew J.; Gross, Jeffrey W.; Evindar, Ghotas

    2017-07-01

    The identification and prioritization of chemically tractable therapeutic targets is a significant challenge in the discovery of new medicines. We have developed a novel method that rapidly screens multiple proteins in parallel using DNA-encoded library technology (ELT). Initial efforts were focused on the efficient discovery of antibacterial leads against 119 targets from Acinetobacter baumannii and Staphylococcus aureus. The success of this effort led to the hypothesis that the relative number of ELT binders alone could be used to assess the ligandability of large sets of proteins. This concept was further explored by screening 42 targets from Mycobacterium tuberculosis. Active chemical series for six targets from our initial effort as well as three chemotypes for DHFR from M. tuberculosis are reported. The findings demonstrate that parallel ELT selections can be used to assess ligandability and highlight opportunities for successful lead and tool discovery.

  6. A cascadable circular concentrator with parallel compressed structure for increasing the energy density

    NASA Astrophysics Data System (ADS)

    Ku, Nai-Lun; Chen, Yi-Yung; Hsieh, Wei-Che; Whang, Allen Jong-Woei

    2012-02-01

    Due to the energy crisis, the principle of green energy has gained popularity, leading to increasing interest in renewable sources such as solar energy. Thus, collecting sunlight for indoor illumination becomes our ultimate target. With environmental awareness increasing, we use natural light as the light source and have devoted ourselves to the development of a solar collecting system. The Natural Light Guiding System includes three parts: collecting, transmitting and lighting. The idea of our solar collecting system design is to combine buildings with a combination of collecting modules. Therefore, it can be used anywhere sunlight directly impinges on buildings with collecting elements. While collecting the sunlight with high efficiency, we can transmit it indoors through a short-distance zone by light pipe to where the light is needed. We propose a novel design including a disk-type collective lens module. With this design, the incident light and exit light are parallel and compressed. By the parallel and compressed design, every output beam becomes compressed in the proposed optical structure. In this way, we can increase the light compression ratio, obtain better efficiency and make the energy distribution more uniform for indoor illumination. Defining "KPI" as a performance index of light density, lm/(mm)², the simulation results show that the proposed concentrator achieves 40,000,000 KPI, much better than the 800,000 KPI measured for traditional ones.

  7. Tutorial: Performance and reliability in redundant disk arrays

    NASA Technical Reports Server (NTRS)

    Gibson, Garth A.

    1993-01-01

    A disk array is a collection of physically small magnetic disks that is packaged as a single unit but operates in parallel. Disk arrays capitalize on the availability of small-diameter disks from a price-competitive market to provide the cost, volume, and capacity of current disk systems but many times their performance. Unfortunately, relative to current disk systems, the larger number of components in disk arrays leads to higher rates of failure. To tolerate failures, redundant disk arrays devote a fraction of their capacity to an encoding of their information. This redundant information enables the contents of a failed disk to be recovered from the contents of non-failed disks. The simplest and least expensive encoding for this redundancy, known as N+1 parity, is highlighted. In addition to compensating for the higher failure rates of disk arrays, redundancy allows highly reliable secondary storage systems to be built much more cost-effectively than is now achieved in conventional duplicated disks. Disk arrays that combine redundancy with the parallelism of many small-diameter disks are often called Redundant Arrays of Inexpensive Disks (RAID). This combination promises improvements to both the performance and the reliability of secondary storage. For example, IBM's premier disk product, the IBM 3390, is compared to a redundant disk array constructed of 84 IBM 0661 3 1/2-inch disks. The redundant disk array has comparable or superior values for each of the metrics given and appears likely to cost less. In the first section of this tutorial, I explain how disk arrays exploit the emergence of high performance, small magnetic disks to provide cost-effective disk parallelism that combats the access and transfer gap problems. The flexibility of disk-array configurations benefits manufacturer and consumer alike.
In contrast, I describe in this tutorial's second half how parallelism, achieved through increasing numbers of components, causes overall failure rates to rise. Redundant disk arrays overcome this threat to data reliability by ensuring that data remains available during and after component failures.
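
    The N+1 parity encoding highlighted above can be shown in a few lines: the parity block is the bytewise XOR of the N data blocks, and any single failed block is recovered by XOR-ing the parity with the surviving blocks.

```python
def parity_block(blocks):
    # XOR of N equal-sized data blocks -> one parity block (N+1 parity).
    p = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            p[i] ^= byte
    return bytes(p)

def reconstruct(surviving_blocks, parity):
    # A failed disk's block is the XOR of the parity block with all
    # surviving data blocks, since x ^ x = 0 cancels the survivors.
    return parity_block(surviving_blocks + [parity])
```

    Only one extra disk per N data disks is needed, which is why N+1 parity is so much cheaper than full duplication.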

  8. Parallel Signal Processing and System Simulation using aCe

    NASA Technical Reports Server (NTRS)

    Dorband, John E.; Aburdene, Maurice F.

    2003-01-01

    Recently, networked and cluster computation have become very popular for both signal processing and system simulation. The aCe language is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, this new C-based parallel language for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we focus on some fundamental features of aCe and present a signal processing application (FFT).

  9. Multiprocessor speed-up, Amdahl's Law, and the Activity Set Model of parallel program behavior

    NASA Technical Reports Server (NTRS)

    Gelenbe, Erol

    1988-01-01

    An important issue in the effective use of parallel processing is the estimation of the speed-up one may expect as a function of the number of processors used. Amdahl's Law has traditionally provided a guideline to this issue, although it appears excessively pessimistic in the light of recent experimental results. In this note, Amdahl's Law is amended by giving a greater importance to the capacity of a program to make effective use of parallel processing, but also recognizing the fact that imbalance of the workload of each processor is bound to occur. An activity set model of parallel program behavior is then introduced along with the corresponding parallelism index of a program, leading to upper and lower bounds to the speed-up.
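
    For reference, the classic bound that the note amends can be computed directly; the activity-set correction itself (parallelism index, workload imbalance) is not reproduced here, only the Amdahl baseline.

```python
def amdahl_speedup(p, n):
    """Amdahl's Law: speedup of a program whose parallelizable
    fraction is p when run on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# The serial fraction caps the speedup: with p = 0.95 the program can
# never run more than 20x faster, no matter how many processors.
limit = 1.0 / (1.0 - 0.95)
```

    The pessimism the note addresses is visible here: the bound depends only on the serial fraction, ignoring how effectively a program overlaps its parallel work and how evenly that work is balanced across processors.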

  10. Determination of Fermi contour and spin polarization of ν = 3 2 composite fermions via ballistic commensurability measurements

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kamburov, D.; Mueed, M. A.; Jo, I.

    2014-12-01

    We report ballistic transport commensurability minima in the magnetoresistance of ν = 3/2 composite fermions (CFs). The CFs are formed in high-quality two-dimensional electron systems confined to wide GaAs quantum wells and subjected to an in-plane, unidirectional periodic potential modulation. We observe a slight asymmetry of the CF commensurability positions with respect to ν = 3/2, which we explain quantitatively by comparing three CF density models and concluding that the ν = 3/2 CFs are likely formed by the minority carriers in the upper-energy spin state of the lowest Landau level. Our data also allow us to probe the shape and size of the CF Fermi contour. At a fixed electron density of ≃1.8×10^11 cm^-2, as the quantum well width increases from 30 to 60 nm, the CFs show increasing spin polarization. We attribute this to the enhancement of the Zeeman energy relative to the Coulomb energy in wider wells, where the latter is softened because of the larger electron layer thickness. The application of an additional parallel magnetic field (B∥) leads to a significant distortion of the CF Fermi contour, as B∥ couples to the CFs' out-of-plane orbital motion. The distortion is much more severe compared to the ν = 1/2 CF case at comparable B∥. Moreover, the applied B∥ further spin-polarizes the ν = 3/2 CFs, as deduced from the positions of the commensurability minima.

  11. An Expert System for the Development of Efficient Parallel Code

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Chun, Robert; Jin, Hao-Qiang; Labarta, Jesus; Gimenez, Judit

    2004-01-01

    We have built the prototype of an expert system to assist the user in the development of efficient parallel code. The system was integrated into the parallel programming environment that is currently being developed at NASA Ames. The expert system interfaces to tools for automatic parallelization and performance analysis. It uses static program structure information and performance data in order to automatically determine causes of poor performance and to make suggestions for improvements. In this paper we give an overview of our programming environment, describe the prototype implementation of our expert system, and demonstrate its usefulness with several case studies.

  12. Casimir force in O(n) systems with a diffuse interface.

    PubMed

    Dantchev, Daniel; Grüneberg, Daniel

    2009-04-01

    We study the behavior of the Casimir force in O(n) systems with a diffuse interface and slab geometry ∞^{d-1} × L, for 2 < d < 4, in the n → ∞ limit of O(n) models with antiperiodic boundary conditions applied along the finite dimension L of the film. We observe that the Casimir amplitude Δ_Casimir(d | J⊥, J∥) of the anisotropic d-dimensional system is related to that of the isotropic system Δ_Casimir(d) via Δ_Casimir(d | J⊥, J∥) = (J⊥/J∥)^{(d-1)/2} Δ_Casimir(d). For d = 3 we derive the exact Casimir amplitude Δ_Casimir(3 | J⊥, J∥) = [Cl₂(π/3)/3 − ζ(3)/(6π)] (J⊥/J∥), as well as the exact scaling functions of the Casimir force and of the helicity modulus Υ(T, L). We obtain β_c Υ(T_c, L) = (2/π²) [Cl₂(π/3)/3 + 7ζ(3)/(30π)] (J⊥/J∥) L^{-1}, where T_c is the critical temperature of the bulk system. We find that the contributions to the excess free energy due to the existence of a diffuse interface result in a repulsive Casimir force in the whole temperature region.
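
    The combination of constants appearing in the quoted d = 3 amplitude, Cl₂(π/3)/3 − ζ(3)/(6π), can be evaluated numerically by truncated series; the summation depths below are arbitrary accuracy choices for this sketch.

```python
import math

def clausen2(theta, terms=200_000):
    # Clausen function Cl2(theta) = sum_{k>=1} sin(k*theta) / k**2,
    # evaluated by direct (truncated) series summation.
    return sum(math.sin(k * theta) / k ** 2 for k in range(1, terms + 1))

zeta3 = sum(1.0 / k ** 3 for k in range(1, 200_000))  # Apery's constant

# Isotropic factor of the d = 3 Casimir amplitude quoted above.
delta3 = clausen2(math.pi / 3) / 3 - zeta3 / (6 * math.pi)
```

    The factor comes out positive (≈ 0.275), consistent with the repulsive Casimir force reported in the abstract for antiperiodic boundary conditions.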

  13. Comparison between four dissimilar solar panel configurations

    NASA Astrophysics Data System (ADS)

    Suleiman, K.; Ali, U. A.; Yusuf, Ibrahim; Koko, A. D.; Bala, S. I.

    2017-12-01

    Several studies on photovoltaic systems have focused on how they operate and the energy required to operate them. Little attention has been paid to their configurations, to modeling mean time to system failure, availability, and cost benefit, or to comparisons of parallel and series-parallel designs. In this research work, four system configurations were studied. Configuration I consists of two sub-components arranged in parallel with 24 V each, configuration II consists of four sub-components arranged logically in parallel with 12 V each, configuration III consists of four sub-components arranged in series-parallel with 8 V each, and configuration IV has six sub-components with 6 V each arranged in series-parallel. Comparative analysis was made using the Chapman-Kolmogorov method. Explicit expressions for the mean time to system failure, steady-state availability, and cost benefit were derived and used as the basis of the comparison. A ranking method was used to determine the optimal configuration of the systems. The analytical and numerical solutions for system availability and mean time to system failure were determined, and it was found that configuration I is the optimal configuration.
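The availability comparison described above can be illustrated with the standard closed-form reliability formulas for series and parallel blocks. The sketch below is illustrative only: the per-component availability value is a hypothetical assumption, and the paper itself derives availabilities from Chapman-Kolmogorov state equations rather than from these block formulas.

```python
# Illustrative steady-state availability of series/parallel arrangements,
# in the spirit of the configurations compared above. The component
# availability a = 0.9 is a hypothetical value, not taken from the paper.
from math import prod

def availability_parallel(avails):
    """System works if at least one parallel branch works."""
    return 1.0 - prod(1.0 - a for a in avails)

def availability_series(avails):
    """System works only if every component in the chain works."""
    return prod(avails)

def availability_series_parallel(branches):
    """Parallel arrangement of series branches (series-parallel)."""
    return availability_parallel([availability_series(b) for b in branches])

a = 0.9  # hypothetical per-component availability
config_i = availability_parallel([a, a])                      # two units in parallel
config_iii = availability_series_parallel([[a, a], [a, a]])   # 2x2 series-parallel

print(config_i)    # pure parallel of identical parts
print(config_iii)  # series-parallel of the same number of parts
```

Consistent with the paper's ranking, the pure parallel arrangement of identical components is more available than a series-parallel arrangement of the same parts.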

  14. Parallels, How Many? Geometry Module for Use in a Mathematics Laboratory Setting.

    ERIC Educational Resources Information Center

    Brotherton, Sheila; And Others

    This is one of a series of geometry modules developed for use by secondary students in a laboratory setting. This module was conceived as an alternative approach to the usual practice of giving Euclid's parallel postulate and then mentioning that alternate postulates would lead to an alternate geometry or geometries. Instead, the student is led…

  15. The Challenge and Challenging of Childhood Studies? Learning from Disability Studies and Research with Disabled Children

    ERIC Educational Resources Information Center

    Tisdall, E. Kay M.

    2012-01-01

    Childhood studies have argued for the social construction of childhood, respecting children and childhood in the present, and recognising children's agency and rights. Such perspectives have parallels to, and challenges for, disability studies. This article considers such parallels and challenges, leading to a (re)consideration of research claims…

  16. Innovative Language-Based & Object-Oriented Structured AMR Using Fortran 90 and OpenMP

    NASA Technical Reports Server (NTRS)

    Norton, C.; Balsara, D.

    1999-01-01

    Parallel adaptive mesh refinement (AMR) is an important numerical technique that leads to the efficient solution of many physical and engineering problems. In this paper, we describe how AMR programming can be performed in an object-oriented way using the modern aspects of Fortran 90 combined with the parallelization features of OpenMP.

  17. Support for Debugging Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Hood, Robert; Jost, Gabriele; Biegel, Bryan (Technical Monitor)

    2001-01-01

    This viewgraph presentation provides information on the technical aspects of debugging computer code that has been automatically converted for use in a parallel computing system. Shared memory parallelization and distributed memory parallelization entail separate and distinct challenges for a debugging program. A prototype system has been developed which integrates various tools for the debugging of automatically parallelized programs including the CAPTools Database which provides variable definition information across subroutines as well as array distribution information.

  18. Effects of electrocautery on transvenous lead insulation materials.

    PubMed

    Lim, Kiam-Khiang; Reddy, Shantanu; Desai, Shrojal; Smelley, Matthew; Kim, Susan S; Beshai, John F; Lin, Albert C; Burke, Martin C; Knight, Bradley P

    2009-04-01

    Insulation defects are a leading cause of transvenous lead failure. The purpose of this study was to determine the effects of electrocautery on transvenous lead insulation materials. A preparation was done to simulate dissection of a transvenous lead from tissues. Radiofrequency energy was delivered using a standard cautery blade at outputs of 10, 20, and 30 W, for 3 and 6 seconds, using parallel and perpendicular blade orientations on leads with outermost insulations of silicone rubber, polyurethane, and silicone-polyurethane copolymer. Damage to each lead segment was classified after visual and microscopic analysis. Significant insulation damage occurred to almost all polyurethane leads. Full insulation breaches were observed with 30 W regardless of application duration with a parallel direction and with all power outputs with a perpendicular direction. Thermal insulation damage to copolymer insulation was similar to that of the polyurethane leads. In contrast, there was no thermal damage to silicone leads, regardless of the power output and duration of power delivery. However, mechanical insulation damage was observed to all silicone leads when at least 20 W was applied in a direction perpendicular to the lead. Polyurethane (PU55D) and copolymer materials have low thermal stability and are highly susceptible to thermal damage during cautery. Implanting physicians should be aware of the lead insulation materials being used during implant procedures and their properties. The use of direct contact cautery on transvenous leads should be minimized to avoid damage to the lead, especially on leads with polyurethane or copolymer outer insulations.

  19. A Performance Evaluation of the Cray X1 for Scientific Applications

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak; Borrill, Julian; Canning, Andrew; Carter, Jonathan; Djomehri, M. Jahed; Shan, Hongzhang; Skinner, David

    2004-01-01

    The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors as the building blocks of high-end systems, largely because of their generality and cost effectiveness. However, the recent development of massively parallel vector systems is having a significant effect on the supercomputing landscape. In this paper, we compare the performance of the recently released Cray X1 vector system with that of the cacheless NEC SX-6 vector machine, and the superscalar cache-based IBM Power3 and Power4 architectures for scientific applications. Overall results demonstrate that the X1 is quite promising, but performance improvements are expected as the hardware, systems software, and numerical libraries mature. Code reengineering to effectively utilize the complex architecture may also lead to significant efficiency enhancements.

  20. A Parallel Particle Swarm Optimization Algorithm Accelerated by Asynchronous Evaluations

    NASA Technical Reports Server (NTRS)

    Venter, Gerhard; Sobieszczanski-Sobieski, Jaroslaw

    2005-01-01

    A parallel Particle Swarm Optimization (PSO) algorithm is presented. Particle swarm optimization is a fairly recent addition to the family of non-gradient based, probabilistic search algorithms that is based on a simplified social model and is closely tied to swarming theory. Although PSO algorithms present several attractive properties to the designer, they are plagued by high computational cost as measured by elapsed time. One approach to reduce the elapsed time is to make use of coarse-grained parallelization to evaluate the design points. Previous parallel PSO algorithms were mostly implemented in a synchronous manner, where all design points within a design iteration are evaluated before the next iteration is started. This approach leads to poor parallel speedup in cases where a heterogeneous parallel environment is used and/or where the analysis time depends on the design point being analyzed. This paper introduces an asynchronous parallel PSO algorithm that greatly improves the parallel efficiency. The asynchronous algorithm is benchmarked on a cluster assembled of Apple Macintosh G5 desktop computers, using the multi-disciplinary optimization of a typical transport aircraft wing as an example.
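The synchronous-versus-asynchronous distinction described above can be sketched in a serial simulation: the only difference is when the swarm's global best is refreshed. Everything below (parameter values, the sphere test function, the function names) is an illustrative assumption, not the paper's cluster implementation.

```python
# Minimal PSO sketch contrasting synchronous and asynchronous global-best
# updates. In a real parallel setting the asynchronous variant starts a new
# evaluation as soon as any worker returns; here that is mimicked serially.
import random

def pso(f, dim=2, n_particles=10, iters=100, seed=1,
        w=0.7, c1=1.5, c2=1.5, asynchronous=True):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest_pos, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest_val[i], pbest[i] = val, pos[i][:]
            if asynchronous and val < gbest_val:
                # asynchronous: global best refreshed as each result arrives
                gbest_val, gbest_pos = val, pos[i][:]
        if not asynchronous:
            # synchronous: global best refreshed only after the full sweep
            g = min(range(n_particles), key=lambda i: pbest_val[i])
            if pbest_val[g] < gbest_val:
                gbest_val, gbest_pos = pbest_val[g], pbest[g][:]
    return gbest_val

sphere = lambda x: sum(v * v for v in x)
print(pso(sphere))                      # converges near the minimum at 0
print(pso(sphere, asynchronous=False))  # synchronous variant, same rule
```

In a heterogeneous environment the asynchronous scheme keeps workers busy while the synchronous scheme idles them at the iteration barrier, which is the source of the speedup difference the abstract reports.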

  1. Parallel-Processing Test Bed For Simulation Software

    NASA Technical Reports Server (NTRS)

    Blech, Richard; Cole, Gary; Townsend, Scott

    1996-01-01

    Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-the-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).

  2. Stability of Iowa mutant and wild type Aβ-peptide aggregates

    NASA Astrophysics Data System (ADS)

    Alred, Erik J.; Scheele, Emily G.; Berhanu, Workalemahu M.; Hansmann, Ulrich H. E.

    2014-11-01

    Recent experiments indicate a connection between the structure of amyloid aggregates and their cytotoxicity as related to neurodegenerative diseases. Of particular interest is the Iowa Mutant, which causes early-onset of Alzheimer's disease. While wild-type Amyloid β-peptides form only parallel beta-sheet aggregates, the mutant also forms meta-stable antiparallel beta sheets. Since these structural variations may cause the difference in the pathological effects of the two Aβ-peptides, we have studied in silico the relative stability of the wild type and Iowa mutant in both parallel and antiparallel forms. We compare regular molecular dynamics simulations with such where the viscosity of the samples is reduced, which, we show, leads to higher sampling efficiency. By analyzing and comparing these four sets of all-atom molecular dynamics simulations, we probe the role of the various factors that could lead to the structural differences. Our analysis indicates that the parallel forms of both wild type and Iowa mutant aggregates are stable, while the antiparallel aggregates are meta-stable for the Iowa mutant and not stable for the wild type. The differences result from the direct alignment of hydrophobic interactions in the in-register parallel oligomers, making them more stable than the antiparallel aggregates. The slightly higher thermodynamic stability of the Iowa mutant fibril-like oligomers in its parallel organization over that in antiparallel form is supported by previous experimental measurements showing slow inter-conversion of antiparallel aggregates into parallel ones. Knowledge of the mechanism that selects between parallel and antiparallel conformations and determines their relative stability may open new avenues for the development of therapies targeting familial forms of early-onset Alzheimer's disease.

  3. A time-parallel approach to strong-constraint four-dimensional variational data assimilation

    NASA Astrophysics Data System (ADS)

    Rao, Vishwas; Sandu, Adrian

    2016-05-01

    A parallel-in-time algorithm based on an augmented Lagrangian approach is proposed to solve four-dimensional variational (4D-Var) data assimilation problems. The assimilation window is divided into multiple sub-intervals, which allows the cost function and gradient computations to be parallelized. The solutions to the continuity equations across interval boundaries are added as constraints. The augmented Lagrangian approach leads to a different formulation of the variational data assimilation problem than the weakly constrained 4D-Var. A combination of serial and parallel 4D-Vars to increase performance is also explored. The methodology is illustrated on data assimilation problems involving the Lorenz-96 and the shallow water models.

  4. Serial and parallel attentive visual searches: evidence from cumulative distribution functions of response times.

    PubMed

    Sung, Kyongje

    2008-12-01

    Participants searched a visual display for a target among distractors. Each of 3 experiments tested a condition proposed to require attention and for which certain models propose a serial search. Serial versus parallel processing was tested by examining effects on response time means and cumulative distribution functions. In 2 conditions, the results suggested parallel rather than serial processing, even though the tasks produced significant set-size effects. Serial processing was produced only in a condition with a difficult discrimination and a very large set-size effect. The results support C. Bundesen's (1990) claim that an extreme set-size effect leads to serial processing. Implications for parallel models of visual selection are discussed.

  5. Feasibility study of a pressure-fed engine for a water recoverable space shuttle booster. Volume 1: Executive summary

    NASA Technical Reports Server (NTRS)

    1972-01-01

    The activities leading to a tentative concept selection for a pressure-fed engine and propulsion support are outlined. Multiple engine concepts were evaluated through parallel engine major component and system analyses. Booster vehicle coordination, tradeoffs, and technology/development aspects are included. The concept selected for further evaluation has a regeneratively cooled combustion chamber and nozzle in conjunction with an impinging element injector. The propellants chosen are LOX/RP-1, and combustion stabilizing baffles are used to assure dynamic combustion stability.

  6. Hermes: Seamless delivery of containerized bioinformatics workflows in hybrid cloud (HTC) environments

    NASA Astrophysics Data System (ADS)

    Kintsakis, Athanassios M.; Psomopoulos, Fotis E.; Symeonidis, Andreas L.; Mitkas, Pericles A.

    Hermes introduces a new "describe once, run anywhere" paradigm for the execution of bioinformatics workflows in hybrid cloud environments. It combines the traditional features of parallelization-enabled workflow management systems and of distributed computing platforms in a container-based approach. It offers seamless deployment, overcoming the burden of setting up and configuring the software and network requirements. Most importantly, Hermes fosters the reproducibility of scientific workflows by supporting standardization of the software execution environment, thus leading to consistent scientific workflow results and accelerating scientific output.

  7. A parallel time integrator for noisy nonlinear oscillatory systems

    NASA Astrophysics Data System (ADS)

    Subber, Waad; Sarkar, Abhijit

    2018-06-01

    In this paper, we adapt a parallel time integration scheme to track the trajectories of noisy nonlinear dynamical systems. Specifically, we formulate a parallel algorithm to generate sample paths of a nonlinear oscillator defined by stochastic differential equations (SDEs) using the so-called parareal method for ordinary differential equations (ODEs). The presence of a Wiener process in SDEs causes difficulties in the direct application of any numerical integration technique for ODEs, including the parareal algorithm. The parallel implementation of the algorithm involves two SDE solvers, namely a fine-level scheme to integrate the system in parallel and a coarse-level scheme to generate and correct the required initial conditions to start the fine-level integrators. For the numerical illustration, a randomly excited Duffing oscillator is investigated in order to study the performance of the stochastic parallel algorithm with respect to a range of system parameters. The distributed implementation of the algorithm exploits the Message Passing Interface (MPI).
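The coarse/fine two-level iteration underlying the parareal method can be sketched on a deterministic scalar ODE; the paper's contribution is adapting this skeleton to SDEs driven by a Wiener process, which the sketch below does not attempt. The solvers, step counts, and test problem here are illustrative assumptions.

```python
# Parareal sketch for the scalar ODE y' = -y. A cheap coarse Euler solver
# produces initial conditions for each sub-interval; expensive fine Euler
# sweeps over the sub-intervals are mutually independent (the parallel
# part), and a serial correction sweep combines the two levels.
import math

def euler(f, y0, t0, t1, steps):
    h = (t1 - t0) / steps
    y = y0
    for _ in range(steps):
        y += h * f(y)
    return y

def parareal(f, y0, t0, t1, n_intervals=10, iters=5,
             coarse_steps=1, fine_steps=100):
    ts = [t0 + (t1 - t0) * k / n_intervals for k in range(n_intervals + 1)]
    # initial guess from the cheap coarse solver (serial)
    U = [y0]
    for k in range(n_intervals):
        U.append(euler(f, U[k], ts[k], ts[k + 1], coarse_steps))
    for _ in range(iters):
        # these fine sweeps are independent: in MPI each rank would own one
        F = [euler(f, U[k], ts[k], ts[k + 1], fine_steps)
             for k in range(n_intervals)]
        G_old = [euler(f, U[k], ts[k], ts[k + 1], coarse_steps)
                 for k in range(n_intervals)]
        # serial correction sweep: new coarse + (fine - old coarse)
        new_U = [y0]
        for k in range(n_intervals):
            g_new = euler(f, new_U[k], ts[k], ts[k + 1], coarse_steps)
            new_U.append(g_new + F[k] - G_old[k])
        U = new_U
    return U[-1]

y1 = parareal(lambda y: -y, 1.0, 0.0, 1.0)
print(abs(y1 - math.exp(-1)))  # small: iteration converges to the fine solution
```

For an SDE the same structure applies, but the coarse and fine solvers must see consistent realizations of the driving noise on each sub-interval, which is the difficulty the abstract alludes to.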

  8. Distributed and parallel Ada and the Ada 9X recommendations

    NASA Technical Reports Server (NTRS)

    Volz, Richard A.; Goldsack, Stephen J.; Theriault, R.; Waldrop, Raymond S.; Holzbacher-Valero, A. A.

    1992-01-01

    Recently, the DoD has sponsored work towards a new version of Ada, intended to support the construction of distributed systems. The revised version, often called Ada 9X, will become the new standard sometime in the 1990s. It is intended that Ada 9X should provide language features giving limited support for distributed system construction. The requirements for such features are given. Many of the most advanced computer applications involve embedded systems that are comprised of parallel processors or networks of distributed computers. If Ada is to become the widely adopted language envisioned by many, it is essential that suitable compilers and tools be available to facilitate the creation of distributed and parallel Ada programs for these applications. The major language issues impacting distributed and parallel programming are reviewed, and some principles upon which distributed/parallel language systems should be built are suggested. Based upon these, alternative language concepts for distributed/parallel programming are analyzed.

  9. Partitioning problems in parallel, pipelined and distributed computing

    NASA Technical Reports Server (NTRS)

    Bokhari, S.

    1985-01-01

    The problem of optimally assigning the modules of a parallel program over the processors of a multiple computer system is addressed. A Sum-Bottleneck path algorithm is developed that permits the efficient solution of many variants of this problem under some constraints on the structure of the partitions. In particular, the following problems are solved optimally for a single-host, multiple satellite system: partitioning multiple chain structured parallel programs, multiple arbitrarily structured serial programs and single tree structured parallel programs. In addition, the problems of partitioning chain structured parallel programs across chain connected systems and across shared memory (or shared bus) systems are also solved under certain constraints. All solutions for parallel programs are equally applicable to pipelined programs. These results extend prior research in this area by explicitly taking concurrency into account and permit the efficient utilization of multiple computer architectures for a wide range of problems of practical interest.
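A simplified version of the chain-partitioning problem above (assign a chain of modules contiguously to p processors so that the most heavily loaded processor, the bottleneck, is as light as possible) can be solved by binary search on the bottleneck value. This is a standard textbook formulation, not Bokhari's Sum-Bottleneck path algorithm itself, and it ignores the communication costs the paper accounts for.

```python
# Partition a chain of module weights into at most p contiguous blocks,
# minimizing the heaviest block (the bottleneck processor load).
# Binary search on the bottleneck, with a greedy feasibility check.
def min_bottleneck(weights, p):
    def feasible(cap):
        parts, load = 1, 0
        for w in weights:
            if load + w <= cap:
                load += w
            else:
                parts, load = parts + 1, w
        return parts <= p

    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid      # mid is achievable; try a lighter bottleneck
        else:
            lo = mid + 1  # mid is too tight; relax it
    return lo

# a chain of hypothetical module costs split across 3 processors
print(min_bottleneck([7, 2, 5, 10, 8], 3))  # -> 14, i.e. [7,2,5] [10] [8]
```

The greedy check packs modules left to right, so contiguity of the chain is preserved, matching the "chain structured program on a chain connected system" setting of the abstract.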

  10. Electrostatic focal spot correction for x-ray tubes operating in strong magnetic fields.

    PubMed

    Lillaney, Prasheel; Shin, Mihye; Hinshaw, Waldo; Fahrig, Rebecca

    2014-11-01

    A close proximity hybrid x-ray/magnetic resonance (XMR) imaging system offers several critical advantages over current XMR system installations that have large separation distances (∼5 m) between the imaging fields of view. The two imaging systems can be placed in close proximity to each other if an x-ray tube can be designed to be immune to the magnetic fringe fields outside of the MR bore. One of the major obstacles to robust x-ray tube design is correcting for the effects of the MR fringe field on the x-ray tube focal spot. Any fringe field component orthogonal to the x-ray tube electric field leads to electron drift altering the path of the electron trajectories. The method proposed in this study to correct for the electron drift utilizes an external electric field in the direction of the drift. The electric field is created using two electrodes that are positioned adjacent to the cathode. These electrodes are biased with positive and negative potential differences relative to the cathode. The design of the focusing cup assembly is constrained primarily by the strength of the MR fringe field and high voltage standoff distances between the anode, cathode, and the bias electrodes. From these constraints, a focusing cup design suitable for the close proximity XMR system geometry is derived, and a finite element model of this focusing cup geometry is simulated to demonstrate efficacy. A Monte Carlo simulation is performed to determine any effects of the modified focusing cup design on the output x-ray energy spectrum. An orthogonal fringe field magnitude of 65 mT can be compensated for using bias voltages of +15 and -20 kV. These bias voltages are not sufficient to completely correct for larger orthogonal field magnitudes. Using active shielding coils in combination with the bias electrodes provides complete correction at an orthogonal field magnitude of 88.1 mT. 
Introducing small fields (<10 mT) parallel to the x-ray tube electric field in addition to the orthogonal field does not affect the electrostatic correction technique. However, rotation of the x-ray tube by 30° toward the MR bore increases the parallel magnetic field magnitude (∼72 mT). The presence of this larger parallel field along with the orthogonal field leads to incomplete correction. Monte Carlo simulations demonstrate that the mean energy of the x-ray spectrum is not noticeably affected by the electrostatic correction, but the output flux is reduced by 7.5%. The maximum orthogonal magnetic field magnitude that can be compensated for using the proposed design is 65 mT. Larger orthogonal field magnitudes cannot be completely compensated for because a pure electrostatic approach is limited by the dielectric strength of the vacuum inside the x-ray tube insert. The electrostatic approach also suffers from limitations when there are strong magnetic fields in both the orthogonal and parallel directions because the electrons prefer to stay aligned with the parallel magnetic field. These challenging field conditions can be addressed by using a hybrid correction approach that utilizes both active shielding coils and biasing electrodes.

  11. Electrostatic focal spot correction for x-ray tubes operating in strong magnetic fields

    PubMed Central

    Lillaney, Prasheel; Shin, Mihye; Hinshaw, Waldo; Fahrig, Rebecca

    2014-01-01

    Purpose: A close proximity hybrid x-ray/magnetic resonance (XMR) imaging system offers several critical advantages over current XMR system installations that have large separation distances (∼5 m) between the imaging fields of view. The two imaging systems can be placed in close proximity to each other if an x-ray tube can be designed to be immune to the magnetic fringe fields outside of the MR bore. One of the major obstacles to robust x-ray tube design is correcting for the effects of the MR fringe field on the x-ray tube focal spot. Any fringe field component orthogonal to the x-ray tube electric field leads to electron drift altering the path of the electron trajectories. Methods: The method proposed in this study to correct for the electron drift utilizes an external electric field in the direction of the drift. The electric field is created using two electrodes that are positioned adjacent to the cathode. These electrodes are biased with positive and negative potential differences relative to the cathode. The design of the focusing cup assembly is constrained primarily by the strength of the MR fringe field and high voltage standoff distances between the anode, cathode, and the bias electrodes. From these constraints, a focusing cup design suitable for the close proximity XMR system geometry is derived, and a finite element model of this focusing cup geometry is simulated to demonstrate efficacy. A Monte Carlo simulation is performed to determine any effects of the modified focusing cup design on the output x-ray energy spectrum. Results: An orthogonal fringe field magnitude of 65 mT can be compensated for using bias voltages of +15 and −20 kV. These bias voltages are not sufficient to completely correct for larger orthogonal field magnitudes. Using active shielding coils in combination with the bias electrodes provides complete correction at an orthogonal field magnitude of 88.1 mT. 
Introducing small fields (<10 mT) parallel to the x-ray tube electric field in addition to the orthogonal field does not affect the electrostatic correction technique. However, rotation of the x-ray tube by 30° toward the MR bore increases the parallel magnetic field magnitude (∼72 mT). The presence of this larger parallel field along with the orthogonal field leads to incomplete correction. Monte Carlo simulations demonstrate that the mean energy of the x-ray spectrum is not noticeably affected by the electrostatic correction, but the output flux is reduced by 7.5%. Conclusions: The maximum orthogonal magnetic field magnitude that can be compensated for using the proposed design is 65 mT. Larger orthogonal field magnitudes cannot be completely compensated for because a pure electrostatic approach is limited by the dielectric strength of the vacuum inside the x-ray tube insert. The electrostatic approach also suffers from limitations when there are strong magnetic fields in both the orthogonal and parallel directions because the electrons prefer to stay aligned with the parallel magnetic field. These challenging field conditions can be addressed by using a hybrid correction approach that utilizes both active shielding coils and biasing electrodes. PMID:25370658

  12. Performance Modeling and Measurement of Parallelized Code for Distributed Shared Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry

    1998-01-01

    This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With the increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of the NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non-Uniform Memory Access (ccNUMA) architecture. We report the measurement-based performance of these parallelized benchmarks from four perspectives: efficacy of the parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized versions of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives, but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.

  13. Cloud computing approaches to accelerate drug discovery value chain.

    PubMed

    Garg, Vibhav; Arora, Suchir; Gupta, Chitra

    2011-12-01

    Continued advancements in technology have helped high throughput screening (HTS) evolve from a linear to a parallel approach by performing system-level screening. Advanced experimental methods used for HTS at various steps of drug discovery (i.e. target identification, target validation, lead identification and lead validation) can generate data of the order of terabytes. As a consequence, there is a pressing need to store, manage, mine and analyze these data to identify informational tags. This need in turn challenges computer scientists to offer matching hardware and software infrastructure while managing the varying degrees of desired computational power. Therefore, the potential of "On-Demand Hardware" and "Software as a Service (SaaS)" delivery mechanisms cannot be denied. This on-demand computing, largely referred to as Cloud Computing, is now transforming drug discovery research. Also, the integration of Cloud computing with parallel computing is certainly expanding its footprint in the life sciences community. The speed, efficiency and cost effectiveness have made cloud computing a 'good to have' tool for researchers, providing them significant flexibility and allowing them to focus on the 'what' of science and not the 'how'. Once it reaches maturity, the Discovery Cloud would be best placed to manage drug discovery and clinical development data generated using advanced HTS techniques, thus supporting the vision of personalized medicine.

  14. Molecular mechanisms of aging and immune system regulation in Drosophila.

    PubMed

    Eleftherianos, Ioannis; Castillo, Julio Cesar

    2012-01-01

    Aging is a complex process that involves the accumulation of deleterious changes resulting in overall decline in several vital functions, leading to the progressive deterioration in physiological condition of the organism and eventually causing disease and death. The immune system is the most important host-defense mechanism in humans and is also highly conserved in insects. Extensive research in vertebrates has concluded that aging of the immune function results in increased susceptibility to infectious disease and chronic inflammation. Over the years, interest has grown in studying the molecular interaction between aging and the immune response to pathogenic infections. The fruit fly Drosophila melanogaster is an excellent model system for dissecting the genetic and genomic basis of important biological processes, such as aging and the innate immune system, and deciphering parallel mechanisms in vertebrate animals. Here, we review the recent advances in the identification of key players modulating the relationship between molecular aging networks and immune signal transduction pathways in the fly. Understanding the details of the molecular events involved in aging and immune system regulation will potentially lead to the development of strategies for decreasing the impact of age-related diseases, thus improving human health and life span.

  15. Molecular Mechanisms of Aging and Immune System Regulation in Drosophila

    PubMed Central

    Eleftherianos, Ioannis; Castillo, Julio Cesar

    2012-01-01

    Aging is a complex process that involves the accumulation of deleterious changes resulting in overall decline in several vital functions, leading to the progressive deterioration in physiological condition of the organism and eventually causing disease and death. The immune system is the most important host-defense mechanism in humans and is also highly conserved in insects. Extensive research in vertebrates has concluded that aging of the immune function results in increased susceptibility to infectious disease and chronic inflammation. Over the years, interest has grown in studying the molecular interaction between aging and the immune response to pathogenic infections. The fruit fly Drosophila melanogaster is an excellent model system for dissecting the genetic and genomic basis of important biological processes, such as aging and the innate immune system, and deciphering parallel mechanisms in vertebrate animals. Here, we review the recent advances in the identification of key players modulating the relationship between molecular aging networks and immune signal transduction pathways in the fly. Understanding the details of the molecular events involved in aging and immune system regulation will potentially lead to the development of strategies for decreasing the impact of age-related diseases, thus improving human health and life span. PMID:22949833

  16. Work stealing for GPU-accelerated parallel programs in a global address space framework: WORK STEALING ON GPU-ACCELERATED SYSTEMS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram

    Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to the dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared- and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.
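The work-stealing idea named above can be shown with a toy single-threaded simulation: each worker pops tasks from its own end of a deque and, when idle, steals from the opposite end of the fullest victim's deque. This generic sketch assumes nothing about the paper's CPU-GPU system, its memory domains, or its global address space framework.

```python
# Toy work-stealing scheduler (serial simulation). Owners pop LIFO from one
# end of their own deque; idle workers steal FIFO from the other end of the
# fullest victim, which is the classic way to reduce owner/thief contention.
from collections import deque

def run_work_stealing(task_lists):
    deques = [deque(ts) for ts in task_lists]
    done = [[] for _ in task_lists]      # tasks "executed" by each worker
    active = True
    while active:
        active = False
        for i, dq in enumerate(deques):
            if dq:
                done[i].append(dq.pop())          # owner: LIFO from its end
                active = True
            else:
                # idle: steal from the opposite end of the fullest victim
                victim = max(range(len(deques)), key=lambda j: len(deques[j]))
                if deques[victim]:
                    done[i].append(deques[victim].popleft())
                    active = True
    return done

done = run_work_stealing([[1, 2, 3, 4, 5, 6], []])
print(sorted(done[0] + done[1]))  # every task runs exactly once
```

Even in this toy form, the key property holds: an initially idle worker ends up executing a share of the load without any central dispatcher, which is the load-balancing behavior the abstract's distributed algorithm generalizes to CPU-GPU nodes.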

  17. Parallelized direct execution simulation of message-passing parallel programs

    NASA Technical Reports Server (NTRS)

    Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

    1994-01-01

    As massively parallel computers proliferate, there is growing interest in finding ways by which the performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution in which one directly executes the application code but uses a discrete-event simulator to model details of the presumed parallel machine, such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, the Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.

  18. Parallels in the Process of Achieving Personal Growth by Abusing Parents Through Participation in Group Therapy Programs or in Religious Groups

    ERIC Educational Resources Information Center

    Herrenkohl, Ellen C.

    1978-01-01

    Group therapy participation and religious conversion have been cited as sources of personal growth by a number of formerly abusive parents. The parallels in the dynamics of change for the two kinds of experiences are discussed in the context of the factors thought to lead to abuse. (Author)

  19. Using parallel computing for the display and simulation of the space debris environment

    NASA Astrophysics Data System (ADS)

    Möckel, M.; Wiedemann, C.; Flegel, S.; Gelhaus, J.; Vörsmann, P.; Klinkrad, H.; Krag, H.

    2011-07-01

    Parallelism is becoming the leading paradigm in today's computer architectures. In order to take full advantage of this development, new algorithms have to be specifically designed for parallel execution while many old ones have to be upgraded accordingly. One field in which parallel computing has been firmly established for many years is computer graphics. Calculating and displaying three-dimensional computer generated imagery in real time requires complex numerical operations to be performed at high speed on a large number of objects. Since most of these objects can be processed independently, parallel computing is applicable in this field. Modern graphics processing units (GPUs) have become capable of performing millions of matrix and vector operations per second on multiple objects simultaneously. As a side project, a software tool is currently being developed at the Institute of Aerospace Systems that provides an animated, three-dimensional visualization of both actual and simulated space debris objects. Due to the nature of these objects it is possible to process them individually and independently from each other. Therefore, an analytical orbit propagation algorithm has been implemented to run on a GPU. By taking advantage of all its processing power a huge performance increase, compared to its CPU-based counterpart, could be achieved. For several years efforts have been made to harness this computing power for applications other than computer graphics. Software tools for the simulation of space debris are among those that could profit from embracing parallelism. With recently emerged software development tools such as OpenCL it is possible to transfer the new algorithms used in the visualization outside the field of computer graphics and implement them, for example, into the space debris simulation environment. This way they can make use of parallel hardware such as GPUs and Multi-Core-CPUs for faster computation. 
In this paper the visualization software will be introduced, including a comparison between the serial and the parallel method of orbit propagation. Ways of how to use the benefits of the latter method for space debris simulation will be discussed. An introduction to OpenCL will be given as well as an exemplary algorithm from the field of space debris simulation.
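    The property both records above exploit is that each debris object can be propagated independently of all the others. A minimal sketch of that idea, assuming a simple two-body analytical model rather than the authors' actual propagator: advance each object's mean anomaly and solve Kepler's equation M = E - e sin(E) by Newton iteration, a per-object map with no cross-object dependencies that could equally be farmed out to GPU threads or CPU cores.

```python
import math

def propagate(M0, n, e, t):
    """Propagate one object: advance the mean anomaly, then Newton-solve
    Kepler's equation M = E - e*sin(E) for the eccentric anomaly E."""
    M = (M0 + n * t) % (2 * math.pi)
    E = M if e < 0.8 else math.pi            # standard starting guess
    for _ in range(20):                      # Newton iteration converges fast for e < 1
        E -= (E - e * math.sin(E) - M) / (1 - e * math.cos(E))
    return E

# A toy population of debris objects (mean anomaly, mean motion, eccentricity).
# Each entry is fully independent of the others, so this map parallelizes trivially.
objects = [(0.1 * i, 0.001 + 0.0001 * i, 0.01 * (i % 50)) for i in range(1000)]
t = 5400.0                                   # seconds
anomalies = [propagate(M0, n, e, t) for (M0, n, e) in objects]
print(len(anomalies))  # 1000
```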

  1. Distributed parallel messaging for multiprocessor systems

    DOEpatents

    Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka

    2013-06-04

    A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple packet reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface that enables writing the data of a packet received from the network to the memory system. The transmission side of the messaging unit includes a switch interface for reading from the memory system when injecting packets into the network.

  2. Parallelized Stochastic Cutoff Method for Long-Range Interacting Systems

    NASA Astrophysics Data System (ADS)

    Endo, Eishin; Toga, Yuta; Sasaki, Munetaka

    2015-07-01

    We present a method of parallelizing the stochastic cutoff (SCO) method, which is a Monte-Carlo method for long-range interacting systems. After interactions are eliminated by the SCO method, we subdivide a lattice into noninteracting interpenetrating sublattices. This subdivision enables us to parallelize the Monte-Carlo calculation in the SCO method. Such subdivision is found by numerically solving the vertex coloring of a graph created by the SCO method. We use an algorithm proposed by Kuhn and Wattenhofer to solve the vertex coloring by parallel computation. This method was applied to a two-dimensional magnetic dipolar system on an L × L square lattice to examine its parallelization efficiency. The result showed that, in the case of L = 2304, the speed of computation increased about 102 times by parallel computation with 288 processors.
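    The subdivision step described above amounts to graph coloring: vertices that share no surviving interaction edge receive the same color and may be updated concurrently. The sketch below is a hedged illustration using a greedy sequential coloring on a toy lattice, not the Kuhn-Wattenhofer parallel algorithm the authors actually use.

```python
def greedy_coloring(n, edges):
    """Assign each vertex the smallest color not used by an already-colored neighbor."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b); adj[b].add(a)
    color = {}
    for v in range(n):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

# Toy interaction graph: a 4x4 lattice with nearest-neighbor "surviving" interactions.
n = 16
edges = [(i, i + 1) for i in range(n - 1) if i % 4 != 3] + \
        [(i, i + 4) for i in range(n - 4)]
color = greedy_coloring(n, edges)

# Vertices sharing a color have no interaction between them, so each color class
# can be handed to independent Monte-Carlo workers and updated simultaneously.
classes = {}
for v, c in color.items():
    classes.setdefault(c, []).append(v)
assert all(color[a] != color[b] for a, b in edges)
print(len(classes))   # number of sequential sweeps needed per Monte-Carlo step
```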

  3. Design of on-board parallel computer on nano-satellite

    NASA Astrophysics Data System (ADS)

    You, Zheng; Tian, Hexiang; Yu, Shijie; Meng, Li

    2007-11-01

    This paper presents a scheme for an on-board parallel computer system designed for a nano-satellite. Because a nano-satellite must combine small volume, low weight, low power consumption, and on-board intelligence, the scheme departs from the traditional single-computer and dual-computer architectures and seeks to improve dependability, capability, and intelligence simultaneously. Following an integrated design method, it adopts a shared-memory parallel computer system as the main structure; connects the telemetry, attitude control, and payload systems via an intelligent bus; provides management of static tasks and dynamic task scheduling, together with protection and recovery of on-site status, in light of the parallel algorithms; and establishes mechanisms for fault diagnosis, recovery, and system reconfiguration. The result is an on-board parallel computer system with high dependability, capability, and intelligence, flexible management of hardware resources, a sound software system, and good extensibility, consistent with the concept and trend of integrated electronics design.

  4. Environmental concept for engineering software on MIMD computers

    NASA Technical Reports Server (NTRS)

    Lopez, L. A.; Valimohamed, K.

    1989-01-01

    The issues related to developing an environment in which engineering systems can be implemented on MIMD machines are discussed. The problem is presented in terms of implementing the finite element method under such an environment. However, neither the concepts nor the prototype implementation environment are limited to this application. The topics discussed include: the ability to schedule and synchronize tasks efficiently; granularity of tasks; load balancing; and the use of a high level language to specify parallel constructs, manage data, and achieve portability. The objective of developing a virtual machine concept which incorporates solutions to the above issues leads to a design that can be mapped onto loosely coupled, tightly coupled, and hybrid systems.

  5. Parallel dispatch: a new paradigm of electrical power system dispatch

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Jun Jason; Wang, Fei-Yue; Wang, Qiang

    Modern power systems are evolving into sociotechnical systems with massive complexity, whose real-time operation and dispatch go beyond human capability. Thus, the need to develop and apply new intelligent power system dispatch tools is of great practical significance. In this paper, we introduce the overall business model of power system dispatch, the top-level design approach of an intelligent dispatch system, and the parallel intelligent technology with its dispatch applications. We expect that a new dispatch paradigm, namely parallel dispatch, can be established by incorporating various intelligent technologies, especially the parallel intelligent technology, to enable secure operation of complex power grids, extend system operators' capabilities, suggest optimal dispatch strategies, and provide decision-making recommendations according to power system operational goals.

  6. Experimental implementation of parallel riverbed erosion to study vegetation uprooting by flow

    NASA Astrophysics Data System (ADS)

    Perona, Paolo; Edmaier, Katharina; Crouzy, Benoît

    2014-05-01

    In nature, flow erosion leading to the uprooting of vegetation is often a delayed process that gradually reduces anchoring through root exposure and correspondingly increases drag on the exposed biomass. The process determining scouring or deposition of the riverbed, and consequently plant root exposure, is complex and scale dependent. At the local scale it is hydrodynamically driven and depends on obstacle porosity as well as on the sediment-to-obstacle size ratio. At a larger scale it results from morphodynamic conditions, which mostly depend on riverbed topography and stream bedload transport capacity. In the latter case, ablation of sediment gradually reduces local bed elevation around the obstacle at a scale larger than the obstacle size, and uprooting eventually occurs when flow drag exceeds the residual anchoring. Ideally, one would study the timescales of vegetation uprooting by flow by inducing parallel bed erosion. This condition is not trivial to obtain experimentally because bed elevation adjustments occur in relation to longitudinal changes in sediment supply as described by Exner's equation. In this work, we study the physical conditions leading to parallel bed erosion by reducing the Exner equation, closed for bedload transport, to a nonlinear partial differential equation, and by showing that this is a particular "boundary value" problem. Eventually, we use the data of Edmaier (2014) from a small-scale mobile-bed flume setup to verify the proposed theoretical framework, and to show how such a simple experiment can provide useful insights into the timescales of the uprooting process (Edmaier et al., 2011). REFERENCES - Edmaier, K., P. Burlando, and P. Perona (2011). Mechanisms of vegetation uprooting by flow in alluvial non-cohesive sediment. Hydrology and Earth System Sciences, vol. 15, p. 1615-1627. - Edmaier, K. Uprooting mechanisms of juvenile vegetation by flow. PhD thesis, EPFL, in preparation.

  7. Study of adaptation to altered gravity through systems analysis of motor control.

    PubMed

    Fox, R A; Daunton, N G; Corcoran, M L

    1998-01-01

    Maintenance of posture and production of functional, coordinated movement demand integration of sensory feedback with spinal and supra-spinal circuitry to produce adaptive motor control in altered gravity (G). To investigate neuroplastic processes leading to optimal performance in altered G we have studied motor control in adult rats using a battery of motor function tests following chronic exposure to various treatments (hyper-G, hindlimb suspension, chemical destruction of hair cells, space flight). These treatments differentially affect muscle fibers, vestibular receptors, and behavioral compensations and, in consequence, differentially disrupt air righting, swimming, posture and gait. The time-course of recovery from these disruptions varies depending on the function tested and the duration and type of treatment. These studies, with others (e.g., D'Amelio et al. in this volume), indicate that adaptation to altered gravity involves alterations in multiple sensory-motor systems that change at different rates. We propose that the use of parallel studies under different altered G conditions will most efficiently lead to an understanding of the modifications in central (neural) and peripheral (sensory and neuromuscular) systems that underlie sensory-motor adaptation in active, intact individuals.

  8. Dynamical formation of a hairy black hole in a cavity from the decay of unstable solitons

    NASA Astrophysics Data System (ADS)

    Sanchis-Gual, Nicolas; Degollado, Juan Carlos; Font, José A.; Herdeiro, Carlos; Radu, Eugen

    2017-08-01

    Recent numerical relativity simulations within the Einstein-Maxwell-(charged-)Klein-Gordon (EMcKG) system have shown that the non-linear evolution of a superradiantly unstable Reissner-Nordström black hole (BH) enclosed in a cavity, leads to the formation of a BH with scalar hair. Perturbative evidence for the stability of such hairy BHs has been independently established, confirming they are the true endpoints of superradiant instability. The same EMcKG system admits also charged scalar soliton-type solutions, which can be either stable or unstable. Using numerical relativity techniques, we provide evidence that the time evolution of some of these unstable solitons leads, again, to the formation of a hairy BH. In some other cases, unstable solitons evolve into a (bald) Reissner-Nordström BH. These results establish that the system admits two distinct channels to form hairy BHs at the threshold of superradiance: growing hair from an unstable (bald) BH, or growing a horizon from an unstable (horizonless) soliton. Some parallelism with the case of asymptotically flat boson stars and Kerr BHs with scalar hair is drawn.

  10. System-wide power management control via clock distribution network

    DOEpatents

    Coteus, Paul W.; Gara, Alan; Gooding, Thomas M.; Haring, Rudolf A.; Kopcsay, Gerard V.; Liebsch, Thomas A.; Reed, Don D.

    2015-05-19

    An apparatus, method and computer program product for automatically controlling power dissipation of a parallel computing system that includes a plurality of processors. A computing device issues a command to the parallel computing system. A clock pulse-width modulator encodes the command in a system clock signal to be distributed to the plurality of processors. The plurality of processors in the parallel computing system receive the system clock signal including the encoded command, and adjust power dissipation according to the encoded command.
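    One plausible way to picture a command riding on a clock signal is pulse-width modulation of individual clock periods. The patent's actual encoding is not given in this abstract, so the scheme below (duty-cycle thresholds, period length, the "opcode" itself) is purely illustrative: a short high pulse carries a 0 bit, a long one a 1 bit, and the receiver recovers the bits by measuring high time per period.

```python
def encode(bits, period=8):
    """Encode each command bit as one clock period whose high time (duty cycle)
    carries the bit: short pulse = 0, long pulse = 1. Returns a flat waveform."""
    wave = []
    for b in bits:
        high = 6 if b else 2                 # 75% vs 25% duty cycle
        wave += [1] * high + [0] * (period - high)
    return wave

def decode(wave, period=8):
    """Recover bits by measuring the high time inside each period."""
    bits = []
    for i in range(0, len(wave), period):
        bits.append(1 if sum(wave[i:i + period]) > period // 2 else 0)
    return bits

command = [1, 0, 1, 1, 0, 0, 1, 0]           # a hypothetical "reduce power" opcode
assert decode(encode(command)) == command
print("round trip ok")
```

    The appeal of such a scheme is that the clock distribution network already reaches every processor, so no extra command wiring is needed.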

  11. Critical interactions between the Global Fund-supported HIV programs and the health system in Ghana.

    PubMed

    Atun, Rifat; Pothapregada, Sai Kumar; Kwansah, Janet; Degbotse, D; Lazarus, Jeffrey V

    2011-08-01

    The support of global health initiatives in recipient countries has been vigorously debated. Critics are concerned that disease-specific programs may be creating vertical and parallel service delivery structures that to some extent undermine health systems. This case study of Ghana aimed to explore how the Global Fund-supported HIV program interacts with the health system there and to map the extent and nature of integration of the national disease program across 6 key health systems functions. Qualitative interviews of national stakeholders were conducted to understand the perceptions of the strengths and weaknesses of the relationship between Global Fund-supported activities and the health system and to identify positive synergies and unintended consequences of integration. Ghana has a well-functioning sector-wide approach to financing its health system, with a strong emphasis on integrated care delivery. Ghana has benefited from US $175 million of approved Global Fund support to address the HIV epidemic, accounting for almost 85% of the National AIDS Control Program budget. Investments in infrastructure, human resources, and commodities have enabled HIV interventions to increase exponentially. Global Fund-supported activities have been well integrated into key health system functions to strengthen them, especially financing, planning, service delivery, and demand generation. Yet, with governance and monitoring and evaluation functions, parallel structures to national systems have emerged, leading to inefficiencies. This case study demonstrates that interactions and integration are highly varied across different health system functions, and strong government leadership has facilitated the integration of Global Fund-supported activities within national programs.

  12. Parallel computational fluid dynamics '91; Conference Proceedings, Stuttgart, Germany, Jun. 10-12, 1991

    NASA Technical Reports Server (NTRS)

    Reinsch, K. G. (Editor); Schmidt, W. (Editor); Ecer, A. (Editor); Haeuser, Jochem (Editor); Periaux, J. (Editor)

    1992-01-01

    A conference on parallel computational fluid dynamics was held and produced the related papers. Topics discussed in these papers include: parallel implicit and explicit solvers for compressible flow, parallel computational techniques for the Euler and Navier-Stokes equations, grid generation techniques for parallel computers, and aerodynamic simulation on massively parallel systems.

  13. Data Partitioning and Load Balancing in Parallel Disk Systems

    NASA Technical Reports Server (NTRS)

    Scheuermann, Peter; Weikum, Gerhard; Zabback, Peter

    1997-01-01

    Parallel disk systems provide opportunities for exploiting I/O parallelism in two possible ways, namely via inter-request and intra-request parallelism. In this paper we discuss the main issues in performance tuning of such systems, namely striping and load balancing, and show their relationship to response time and throughput. We outline the main components of an intelligent, self-reliant file system that aims to optimize striping by taking into account the requirements of the applications, and performs load balancing by judicious file allocation and dynamic redistribution of the data when access patterns change. Our system uses simple but effective heuristics that incur little overhead. We present performance experiments based on synthetic workloads and real-life traces.
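    Striping and allocation heuristics of the kind described can be sketched compactly. The class below is an illustrative toy, not the authors' file system: it stripes a file's blocks round-robin across the disks and starts each new file on the currently least-loaded disk, spreading the file's expected "heat" over the disks its blocks land on.

```python
class ParallelDiskSystem:
    """Round-robin striping plus heat-based file allocation (a common heuristic:
    start each new file on the most lightly loaded disk)."""
    def __init__(self, n_disks, stripe_unit=4096):
        self.n = n_disks
        self.unit = stripe_unit
        self.load = [0.0] * n_disks          # running per-disk "heat"
        self.placement = {}

    def allocate(self, name, size, heat):
        n_blocks = -(-size // self.unit)     # ceiling division
        start = min(range(self.n), key=lambda d: self.load[d])
        disks = [(start + i) % self.n for i in range(n_blocks)]
        for d in disks:
            self.load[d] += heat / n_blocks  # spread the file's expected traffic
        self.placement[name] = disks
        return disks

fs = ParallelDiskSystem(n_disks=4)
fs.allocate("a.dat", 20000, heat=10)          # 5 blocks, striped across all 4 disks
fs.allocate("b.dat", 8000, heat=2)            # 2 blocks, started on the coolest disk
print(fs.placement["a.dat"], fs.placement["b.dat"])
```

    A full system would also redistribute blocks when the measured heat drifts from the estimate, which is the dynamic part of the load balancing the abstract mentions.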

  14. Conceptual design of a hybrid parallel mechanism for mask exchanging of TMT

    NASA Astrophysics Data System (ADS)

    Wang, Jianping; Zhou, Hongfei; Li, Kexuan; Zhou, Zengxiang; Zhai, Chao

    2015-10-01

    Mask exchange system is an important part of the Multi-Object Broadband Imaging Echellette (MOBIE) on the Thirty Meter Telescope (TMT). To solve the problem of stiffness changing with the gravity vector of the mask exchange system in the MOBIE, the hybrid parallel mechanism design method was introduced into the whole research. By using the high stiffness and precision of a parallel structure, combined with the large moving range of a serial structure, a conceptual design of a hybrid parallel mask exchange system based on a 3-RPS parallel mechanism was presented. According to the position requirements of the MOBIE, the SolidWorks structure model of the hybrid parallel mask exchange robot was established and an appropriate installation position, free of interference with the related components and the light path in the MOBIE of TMT, was analyzed. Simulation results in SolidWorks suggested that the 3-RPS parallel platform had good stiffness properties in different gravity vector directions. Furthermore, through the research of the mechanism theory, the inverse kinematics solution of the 3-RPS parallel platform was calculated and the mathematical relationship between the attitude angle of the moving platform and the angle of the ball hinges on the moving platform was established, in order to analyze the attitude adjustment ability of the hybrid parallel mask exchange robot. The proposed conceptual design offers guidance for the design of the mask exchange system of the MOBIE on TMT.

  15. Analytical Assessment of Simultaneous Parallel Approach Feasibility from Total System Error

    NASA Technical Reports Server (NTRS)

    Madden, Michael M.

    2014-01-01

    In a simultaneous paired approach to closely-spaced parallel runways, a pair of aircraft flies in close proximity on parallel approach paths. The aircraft pair must maintain a longitudinal separation within a range that avoids wake encounters and, if one of the aircraft blunders, avoids collision. Wake avoidance defines the rear gate of the longitudinal separation. The lead aircraft generates a wake vortex that, with the aid of crosswinds, can travel laterally onto the path of the trail aircraft. As runway separation decreases, the wake has less distance to traverse to reach the path of the trail aircraft. The total system error of each aircraft further reduces this distance. The total system error is often modeled as a probability distribution function. Therefore, Monte-Carlo simulations are a favored tool for assessing a "safe" rear-gate. However, safety for paired approaches typically requires that a catastrophic wake encounter be a rare one-in-a-billion event during normal operation. Using a Monte-Carlo simulation to assert this event rarity with confidence requires a massive number of runs. Such large runs do not lend themselves to rapid turn-around during the early stages of investigation when the goal is to eliminate the infeasible regions of the solution space and to perform trades among the independent variables in the operational concept. One can employ statistical analysis using simplified models more efficiently to narrow the solution space and identify promising trades for more in-depth investigation using Monte-Carlo simulations. These simple, analytical models not only have to address the uncertainty of the total system error but also the uncertainty in navigation sources used to alert an abort of the procedure. 
This paper presents a method for integrating total system error, procedure abort rates, avionics failures, and surveillance errors into a statistical analysis that identifies the likely feasible runway separations for simultaneous paired approaches.
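    The efficiency argument can be made concrete with a one-dimensional toy model (the numbers below are illustrative, not operational values from the paper): if total system error is Gaussian, the probability that a lateral deviation exceeds a threshold has a closed form via the normal tail, while a Monte-Carlo estimate of the same quantity already needs many samples at the 3-sigma level, let alone at one-in-a-billion.

```python
import math
import random

def tail_prob_analytic(d, sigma):
    """P(X > d) for X ~ N(0, sigma^2): a one-line normal-tail evaluation."""
    return 0.5 * math.erfc(d / (sigma * math.sqrt(2)))

def tail_prob_monte_carlo(d, sigma, trials, seed=1):
    rng = random.Random(seed)
    return sum(rng.gauss(0, sigma) > d for _ in range(trials)) / trials

sigma = 30.0          # total system error standard deviation, metres (illustrative)
d = 90.0              # lateral distance the deviation must exceed (3 sigma)
p_exact = tail_prob_analytic(d, sigma)
p_mc = tail_prob_monte_carlo(d, sigma, trials=200_000)
print(f"analytic {p_exact:.5f}  monte-carlo {p_mc:.5f}")
# At 3 sigma the event is already only ~1-in-740; resolving a 1e-9 event by
# sampling would need billions of runs, which is why analytical screening of
# the solution space comes before the confirming Monte-Carlo study.
```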

  16. Line-drawing algorithms for parallel machines

    NASA Technical Reports Server (NTRS)

    Pang, Alex T.

    1990-01-01

    The fact that conventional line-drawing algorithms, when applied directly on parallel machines, can lead to very inefficient codes is addressed. It is suggested that instead of modifying an existing algorithm for a parallel machine, a more efficient implementation can be produced by going back to the invariants in the definition. Popular line-drawing algorithms are compared with two alternatives: distance to a line (a point is on the line if it is sufficiently close to it) and intersection with a line (a point is on the line if it yields an intersection point). For massively parallel single-instruction-multiple-data (SIMD) machines (with thousands of processors and up), the alternatives provide viable line-drawing algorithms. Because of the pixel-per-processor mapping, their performance is independent of the line length and orientation.
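    The "distance to a line" alternative mentioned above is easy to sketch: every pixel independently tests its perpendicular distance to the segment, which is exactly the per-processor test a SIMD machine would run in one step. Below is a minimal serial Python version (the double loop stands in for the processor grid; thresholds and the clamping rule are illustrative choices, not the paper's exact criteria).

```python
def draw_line_distance(x0, y0, x1, y1, width, height, thickness=0.5):
    """Mark every pixel whose perpendicular distance to the segment is small.
    Each pixel's test is independent, so with one pixel per processor every
    test runs simultaneously, regardless of line length or orientation."""
    dx, dy = x1 - x0, y1 - y0
    length = (dx * dx + dy * dy) ** 0.5
    pixels = set()
    for y in range(height):          # this double loop is what the SIMD grid replaces
        for x in range(width):
            # distance from (x, y) to the infinite line through the endpoints
            dist = abs(dy * (x - x0) - dx * (y - y0)) / length
            # projection parameter clamps the test to the segment itself
            t = ((x - x0) * dx + (y - y0) * dy) / (length * length)
            if dist <= thickness and 0.0 <= t <= 1.0:
                pixels.add((x, y))
    return pixels

line = draw_line_distance(0, 0, 7, 7, 8, 8)
print(sorted(line))   # the diagonal pixels (0, 0) .. (7, 7)
```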

  17. Support for Debugging Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)

    2001-01-01

    We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify the program execution without changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.

  18. Relative Debugging of Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)

    2002-01-01

    We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular, the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify the program execution without changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.

  20. DEVELOPMENTAL AND WITHDRAWAL EFFECTS OF ADOLESCENT AAS EXPOSURE ON THE GLUTAMATERGIC SYSTEM IN HAMSTERS

    PubMed Central

    Carrillo, Maria; Ricci, Lesley A.; Melloni, Richard H.

    2011-01-01

    In the Syrian hamster (Mesocricetus auratus) glutamate activity has been implicated in the modulation of adolescent anabolic-androgenic steroid (AAS)-induced aggression. The current study investigated the time course of adolescent AAS-induced neurodevelopmental and withdrawal effects on the glutamatergic system and examined whether these changes paralleled those of adolescent AAS-induced aggression. Glutamate activity in brain areas comprising the aggression circuit in hamsters and aggression were examined following 1, 2, 3 and 4 weeks of AAS treatment or 1, 2, 3 and 4 weeks following the cessation of AAS exposure. In these studies glutamate activity was examined using vesicular glutamate transporter 2 (VGLUT2). The onset of aggression was observed following 2 weeks of exposure to AAS and continued to increase, showing maximal aggression levels after 4 weeks of AAS treatment. This aggressive phenotype was detected after 2 weeks of withdrawal from AAS. The time-course of AAS-induced changes in latero-anterior hypothalamus (LAH)-VGLUT2 closely paralleled increases in aggression. Increases in LAH-VGLUT2 were first detected in animals exposed to AAS for 2 weeks and were maintained up to 3 weeks following the cessation of AAS treatment. AAS treatment also produced developmental and long-term alterations in VGLUT2 expression within other aggression areas. However, AAS-induced changes in glutamate activity within these regions did not coincide with changes in aggression. Together, these data indicate that adolescent AAS treatment leads to alterations in the glutamatergic system in brain areas implicated in aggression control, yet only alterations in LAH-glutamate parallel the time course of AAS-induced changes in the aggressive phenotype. PMID:21500881

  1. Airbreathing Propulsion System Analysis Using Multithreaded Parallel Processing

    NASA Technical Reports Server (NTRS)

    Schunk, Richard Gregory; Chung, T. J.; Rodriguez, Pete (Technical Monitor)

    2000-01-01

    In this paper, parallel processing is used to analyze the mixing and combustion behavior of hypersonic flow. Preliminary work for a sonic transverse hydrogen jet injected from a slot into a Mach 4 airstream in a two-dimensional duct combustor has been completed [Moon and Chung, 1996]. Our aim is to extend this work to a three-dimensional domain using multithreaded domain decomposition parallel processing based on the flowfield-dependent variation theory. Numerical simulations of chemically reacting flows are difficult because of the strong interactions between the turbulent hydrodynamic and chemical processes. The algorithm must provide an accurate representation of the flowfield, since unphysical flowfield calculations will lead to the faulty loss or creation of species mass fraction, or even premature ignition, which in turn alters the flowfield information. Another difficulty arises from the disparity in time scales between the flowfield and chemical reactions, which may require the use of finite rate chemistry. The situation is more complex when there is a disparity in the length scales involved in turbulence. In order to cope with these complicated physical phenomena, it is our plan to utilize the flowfield-dependent variation theory mentioned above, facilitated by large eddy simulation. Undoubtedly, the proposed computation requires the most sophisticated computational strategies. The multithreaded domain decomposition parallel processing will be necessary in order to reduce both computational time and storage. Without special treatments involved in computer engineering, our attempt to analyze the airbreathing combustion appears to be difficult, if not impossible.

  2. GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit

    PubMed Central

    Pronk, Sander; Páll, Szilárd; Schulz, Roland; Larsson, Per; Bjelkmar, Pär; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik

    2013-01-01

    Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on a massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including Windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. Availability: GROMACS is open-source, free software available from http://www.gromacs.org. Contact: erik.lindahl@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23407358

  3. PCLIPS: Parallel CLIPS

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan

    1994-01-01

    A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS, with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes' working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C(sup 3)I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and implementation. Parallel CLIPS has also been used to run several benchmark parallel knowledge bases, such as one to set up a cafeteria. Results from running Parallel CLIPS with parallel knowledge-base partitions indicate that significant speed increases, including superlinear ones in some cases, are possible.
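
The remote assert/retract commands can be pictured as messages applied to another node's working memory. A toy sketch under that assumption (this is not the PCLIPS API; all names are invented):

```python
# Toy model of asserting/retracting facts in a remote node's working
# memory via message passing. Illustrative only; not PCLIPS.
from queue import Queue

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.wm = set()        # working memory (facts)
        self.inbox = Queue()   # pending remote operations

    def remote_assert(self, other, fact):
        other.inbox.put(("assert", fact))

    def remote_retract(self, other, fact):
        other.inbox.put(("retract", fact))

    def drain(self):           # apply pending remote operations locally
        while not self.inbox.empty():
            op, fact = self.inbox.get()
            if op == "assert":
                self.wm.add(fact)
            else:
                self.wm.discard(fact)

a, b = Node(0), Node(1)
a.remote_assert(b, ("tray", "clean"))
b.drain()
print(("tray", "clean") in b.wm)  # -> True
```

Each node would interleave such drains with its own match-recognize-act cycles, which is what confines the parallelism to the rule level.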

  4. Modelling parallel programs and multiprocessor architectures with AXE

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Fineman, Charles E.

    1991-01-01

    AXE, An Experimental Environment for Parallel Systems, was designed to model and simulate parallel systems at the process level. It provides an integrated environment for specifying computation models, multiprocessor architectures, data collection, and performance visualization. AXE is being used at NASA-Ames for developing resource management strategies, parallel problem formulation, multiprocessor architectures, and operating system issues related to the High Performance Computing and Communications Program. AXE's simple, structured user interface enables the user to model parallel programs and machines precisely and efficiently. Its quick turn-around time keeps the user interested and productive. AXE models multicomputers. The user may easily modify various architectural parameters, including the number of sites, connection topologies, and overhead for operating system activities. Parallel computations in AXE are represented as collections of autonomous computing objects known as players. Their use and behavior are described. Performance data of the multiprocessor model can be observed on a color screen. These include CPU and message routing bottlenecks, and the dynamic status of the software.

  5. AZTEC. Parallel Iterative method Software for Solving Linear Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hutchinson, S.; Shadid, J.; Tuminaro, R.

    1995-07-01

    AZTEC is an iterative-method library that greatly simplifies the parallelization process when solving the linear system of equations Ax=b, where A is a user-supplied n x n sparse matrix, b is a user-supplied vector of length n, and x is a vector of length n to be computed. AZTEC is intended as a software tool for users who want to avoid cumbersome parallel programming details but who have large sparse linear systems which require an efficiently utilized parallel processing system. A collection of data transformation tools is provided that allows for easy creation of distributed sparse unstructured matrices for parallel solution.
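
The kind of Krylov iteration such a library parallelizes can be sketched with a small serial conjugate-gradient solve of Ax=b. This is purely illustrative (dense, pure Python, symmetric positive-definite A) and does not use AZTEC's actual interface:

```python
# Toy conjugate-gradient solve of Ax = b, the style of iterative
# method an AZTEC-like library distributes across processors
# (each owning a block of rows). Illustrative sketch only.

def cg(A, b, tol=1e-10, max_iter=100):
    n = len(b)
    x = [0.0] * n
    r = b[:]                        # residual r = b - A x, with x = 0
    p = r[:]                        # search direction
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(cg(A, b))  # close to the exact solution [1/11, 7/11]
```

In a distributed setting, the matrix-vector product and the two dot products per iteration are the communication points, which is why such libraries focus on the data distribution of A.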

  6. Bit-parallel arithmetic in a massively-parallel associative processor

    NASA Technical Reports Server (NTRS)

    Scherson, Isaac D.; Kramer, David A.; Alleyne, Brian D.

    1992-01-01

    A simple but powerful new architecture based on a classical associative processor model is presented. Algorithms for performing the four basic arithmetic operations both for integer and floating point operands are described. For m-bit operands, the proposed architecture makes it possible to execute complex operations in O(m) cycles as opposed to O(m^2) for bit-serial machines. A word-parallel, bit-parallel, massively-parallel computing system can be constructed using this architecture with VLSI technology. The operation of this system is demonstrated for the fast Fourier transform and matrix multiplication.
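
The O(m)-cycle flavor of bit-level arithmetic can be hinted at with a carry-propagation loop: XOR forms the partial sum, AND plus a shift forms the carries, and the loop terminates in at most m iterations for m-bit operands. This is only an analogy to, not an implementation of, the associative-processor scheme in the paper:

```python
# Carry-propagation addition: each pass handles all bit positions
# "in parallel" (one XOR, one AND, one shift), and at most m passes
# are needed for m-bit operands. An analogy for the O(m) bound only.

def bitwise_add(a, b, m=32):
    mask = (1 << m) - 1            # confine results to m bits
    while b:
        a, b = (a ^ b) & mask, ((a & b) << 1) & mask
    return a

print(bitwise_add(13, 29))  # -> 42
```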

  7. Equalizer: a scalable parallel rendering framework.

    PubMed

    Eilemann, Stefan; Makhinya, Maxim; Pajarola, Renato

    2009-01-01

    Continuing improvements in CPU and GPU performance as well as increasing multi-core processor and cluster-based parallelism demand flexible and scalable parallel rendering solutions that can exploit multipipe hardware-accelerated graphics. In fact, to achieve interactive visualization, scalable rendering systems are essential to cope with the rapid growth of data sets. However, parallel rendering systems are non-trivial to develop and often only application-specific implementations have been proposed. The task of developing a scalable parallel rendering framework is even more difficult if it should be generic to support various types of data and visualization applications, and at the same time work efficiently on a cluster with distributed graphics cards. In this paper we introduce a novel system called Equalizer, a toolkit for scalable parallel rendering based on OpenGL which provides an application programming interface (API) to develop scalable graphics applications for a wide range of systems ranging from large distributed visualization clusters and multi-processor multipipe graphics systems to single-processor single-pipe desktop machines. We describe the system architecture, the basic API, discuss its advantages over previous approaches, and present example configurations and usage scenarios as well as scalability results.

  8. Massively parallel GPU-accelerated minimization of classical density functional theory

    NASA Astrophysics Data System (ADS)

    Stopper, Daniel; Roth, Roland

    2017-08-01

    In this paper, we discuss the ability to numerically minimize the grand potential of hard disks in two-dimensional and of hard spheres in three-dimensional space within the framework of classical density functional and fundamental measure theory on modern graphics cards. Our main finding is that a massively parallel minimization leads to an enormous performance gain in comparison to standard sequential minimization schemes. Furthermore, the results indicate that in complex multi-dimensional situations, a heavy parallel minimization of the grand potential seems to be mandatory in order to reach a reasonable balance between accuracy and computational cost.

  9. Two-stage bulk electron heating in the diffusion region of anti-parallel symmetric reconnection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Le, Ari Yitzchak; Egedal, Jan; Daughton, William Scott

    2016-10-13

    Electron bulk energization in the diffusion region during anti-parallel symmetric reconnection entails two stages. First, the inflowing electrons are adiabatically trapped and energized by an ambipolar parallel electric field. Next, the electrons gain energy from the reconnection electric field as they undergo meandering motion. These collisionless mechanisms have been described previously, and they lead to highly structured electron velocity distributions. Furthermore, a simplified control-volume analysis gives estimates for how the net effective heating scales with the upstream plasma conditions in agreement with fully kinetic simulations and spacecraft observations.

  10. Parallel medicinal chemistry approaches to selective HDAC1/HDAC2 inhibitor (SHI-1:2) optimization.

    PubMed

    Kattar, Solomon D; Surdi, Laura M; Zabierek, Anna; Methot, Joey L; Middleton, Richard E; Hughes, Bethany; Szewczak, Alexander A; Dahlberg, William K; Kral, Astrid M; Ozerova, Nicole; Fleming, Judith C; Wang, Hongmei; Secrist, Paul; Harsch, Andreas; Hamill, Julie E; Cruz, Jonathan C; Kenific, Candia M; Chenard, Melissa; Miller, Thomas A; Berk, Scott C; Tempest, Paul

    2009-02-15

    The successful application of both solid and solution phase library synthesis, combined with tight integration into the medicinal chemistry effort, resulted in the efficient optimization of a novel structural series of selective HDAC1/HDAC2 inhibitors by the MRL-Boston Parallel Medicinal Chemistry group. An initial lead from a small parallel library was found to be potent and selective in biochemical assays. Advanced compounds were the culmination of iterative library design and possess excellent biochemical and cellular potency, as well as acceptable PK and efficacy in animal models.

  11. XDATA

    DTIC Science & Technology

    2017-05-01

    Parallelizing PINT: The main focus of our research into the parallelization of the PINT algorithm has been to find appropriately scalable matrix math algorithms...leading eigenvector of the adjacency matrix of the pairwise affinity graph. We reviewed the matrix math implementation currently being used in PINT and...the new versions support a feature called matrix.distributed, which is some level of support for distributed matrix math; however our code is not

  12. Bioreactor design concepts

    NASA Technical Reports Server (NTRS)

    Bowie, William

    1987-01-01

    Two parallel lines of work are underway in the bioreactor laboratory. One of the efforts is devoted to the continued development and utilization of a laboratory research system. That system's design is intended to be fluid and dynamic. The sole purpose of such a device is to allow testing and development of equipment concepts and procedures. Some of the results of those processes are discussed. A second effort is designed to produce a flight-like bioreactor contained in a double middeck locker. The result of that effort has been to freeze a particular bioreactor design in order to allow fabrication of the custom parts. The system is expected to be ready for flight in early 1988. However, continued use of the laboratory system will lead to improvements in the space bioreactor. Those improvements can only be integrated after the initial flight series.

  13. Model of Dynamic Integration of Lean Shop Floor Management Within the Organizational Management System

    NASA Astrophysics Data System (ADS)

    Iuga, Virginia; Kifor, Claudiu

    2014-12-01

    The key to achieving sustainable development lies in customer satisfaction through improved quality, reduced cost, reduced delivery lead times and proper communication. The objective of the lean manufacturing system (LMS) is to identify and eliminate the processes and resources which do not add value to a product. The following paper aims to present a proposal for further development of integrated management systems in organizations through the implementation of lean shop floor management. In the first part of the paper, a dynamic model of the implementation steps will be presented. Furthermore, the paper underlines the importance of implementing a lean culture in parallel with each step of integrating the lean methods and tools. The paper also describes the Toyota philosophy, tools, and the supporting lean culture necessary for implementing an efficient lean system in productive organizations.

  14. A Performance Evaluation of the Cray X1 for Scientific Applications

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak; Borrill, Julian; Canning, Andrew; Carter, Jonathan; Djomehri, M. Jahed; Shan, Hongzhang; Skinner, David

    2003-01-01

    The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build high-end capability and capacity computers because of their generality, scalability, and cost effectiveness. However, the recent development of massively parallel vector systems is having a significant effect on the supercomputing landscape. In this paper, we compare the performance of the recently-released Cray X1 vector system with that of the cacheless NEC SX-6 vector machine, and the superscalar cache-based IBM Power3 and Power4 architectures for scientific applications. Overall results demonstrate that the X1 is quite promising, but performance improvements are expected as the hardware, systems software, and numerical libraries mature. Code reengineering to effectively utilize the complex architecture may also lead to significant efficiency enhancements.

  15. Anticipation from sensation: using anticipating synchronization to stabilize a system with inherent sensory delay.

    PubMed

    Eberle, Henry; Nasuto, Slawomir J; Hayashi, Yoshikatsu

    2018-03-01

    We present a novel way of using a dynamical model for predictive tracking control that can adapt to a wide range of delays without parameter update. This is achieved by incorporating the paradigm of anticipating synchronization (AS), where a 'slave' system predicts a 'master' via delayed self-feedback. By treating the delayed output of the plant as one half of a 'sensory' AS coupling, the plant and an internal dynamical model can be synchronized such that the plant consistently leads the target's motion. We use two simulated robotic systems with differing arrangements of the plant and internal model ('parallel' and 'serial') to demonstrate that this form of control adapts to a wide range of delays without requiring the parameters of the controller to be changed.
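
The underlying idea, a slave system that predicts its master through delayed self-feedback, can be demonstrated with a generic Voss-style anticipating-synchronization sketch on a lightly damped linear oscillator. All parameters here are invented for illustration; this is not the paper's robotic controller:

```python
# Generic anticipating-synchronization sketch: the slave copies the
# master's dynamics f and adds coupling k*(x(t) - y(t - tau)) through
# its own delayed output, so y(t) converges toward x(t + tau).
# Parameters chosen for illustration only.
from collections import deque

def simulate(k=1.0, tau=0.2, dt=0.001, steps=30000):
    d = int(round(tau / dt))                 # delay measured in steps
    def f(s):                                # lightly damped oscillator
        return (s[1], -s[0] - 0.05 * s[1])
    x, y = (1.0, 0.0), (0.0, 0.0)            # master and slave states
    xs, ys = [x], [y]
    hist = deque([(0.0, 0.0)] * d)           # slave's delayed output
    for _ in range(steps):
        yd = hist.popleft()                  # y(t - tau)
        fx, fy = f(x), f(y)
        nx = (x[0] + dt * fx[0], x[1] + dt * fx[1])
        ny = (y[0] + dt * (fy[0] + k * (x[0] - yd[0])),
              y[1] + dt * (fy[1] + k * (x[1] - yd[1])))
        hist.append(y)
        x, y = nx, ny
        xs.append(x)
        ys.append(y)
    return xs, ys, d

xs, ys, d = simulate()
window = range(20000, 25000)
lead = sum(abs(ys[n][0] - xs[n + d][0]) for n in window) / len(window)
now = sum(abs(ys[n][0] - xs[n][0]) for n in window) / len(window)
print(lead < now)   # the slave tracks the master's *future* state best
```

On the synchronization manifold y(t) = x(t + tau) the coupling term vanishes, which is why the slave can consistently lead the master without any explicit model of the delay.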

  16. Breaking the Ice

    NASA Technical Reports Server (NTRS)

    1989-01-01

    Electro-Expulsive Separation System, a low power electro-thermal deicer, was invented by Leonard A. Haslim from the Ames Research Center, who was named 1988 NASA Inventor of the Year for his work. Sold under license by Dataproducts New England, Inc., it consists of an elastic, rubber-like deicer boot on the wing's leading edge with copper ribbons embedded in it. Conductors are separated by slits in between and parallel to the ribbons. When the system is switched on, a bank of capacitors in the power supply discharges into the conductors which induces the conductor pairs to repel each other. This results in a powerful force causing the slit voids to expand explosively, removing ice on the wing. EESS is more flexible, more effective, and easier to maintain than previous systems. Potential ship, bridge and industrial applications are under study.

  17. Scalable load balancing for massively parallel distributed Monte Carlo particle transport

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    O'Brien, M. J.; Brantley, P. S.; Joy, K. I.

    2013-07-01

    In order to run computer simulations efficiently on massively parallel computers with hundreds of thousands or millions of processors, care must be taken that the calculation is load balanced across the processors. Examining the workload of every processor leads to an unscalable algorithm, with run time at least as large as O(N), where N is the number of processors. We present a scalable load balancing algorithm, with run time O(log(N)), that involves iterated processor-pair-wise balancing steps, ultimately leading to a globally balanced workload. We demonstrate scalability of the algorithm up to 2 million processors on the Sequoia supercomputer at Lawrence Livermore National Laboratory.
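
One simple way to see how iterated pairwise exchanges reach a global balance in O(log N) rounds is hypercube averaging, where round k pairs the ranks that differ in bit k. This toy version balances real-valued loads, whereas the paper's algorithm migrates discrete Monte Carlo particles:

```python
# Hypercube-style pairwise balancing sketch: in round k, each rank
# averages its load with the partner whose rank differs in bit k.
# After log2(N) rounds every load equals the global mean.
# Illustrative only; not the paper's particle-migration algorithm.

def hypercube_balance(loads):
    n = len(loads)                 # n must be a power of two
    k = 1
    while k < n:                   # one round per hypercube dimension
        for i in range(n):
            j = i ^ k              # partner differs in exactly one bit
            if i < j:
                avg = (loads[i] + loads[j]) / 2
                loads[i] = loads[j] = avg
        k <<= 1
    return loads

print(hypercube_balance([10.0, 0.0, 3.0, 5.0, 2.0, 8.0, 7.0, 1.0]))
# all entries equal the mean, 4.5, after log2(8) = 3 rounds
```

The point of the sketch is the round count: each processor touches only log2(N) partners, never the full workload table, which is what makes the approach scalable.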

  18. A low-altitude mechanism for mesoscale dynamics, structure, and current filamentation in the discrete aurora

    NASA Technical Reports Server (NTRS)

    Keskinen, M. J.; Chaturvedi, P. K.; Ossakow, S. L.

    1992-01-01

    The 2D nonlinear evolution of the ionization-driven adiabatic auroral arc instability is studied. We find: (1) the adiabatic auroral arc instability can fully develop on time scales of tens to hundreds of seconds and on spatial scales of tens to hundreds of kilometers; (2) the evolution of this instability leads to nonlinear 'hook-shaped' conductivity structures; (3) this instability can lead to parallel current filamentation over a wide range of scale sizes; and (4) the k-spectra of the density, electric field, and parallel current develop into inverse power laws in agreement with satellite observations. Comparison with mesoscale auroral phenomenology and current filamentation structures is made.

  19. Design and implementation of a high performance network security processor

    NASA Astrophysics Data System (ADS)

    Wang, Haixin; Bai, Guoqiang; Chen, Hongyi

    2010-03-01

    The last few years have seen much significant progress in the field of application-specific processors. One example is network security processors (NSPs) that perform various cryptographic operations specified by network security protocols and help to offload the computation-intensive burdens from network processors (NPs). This article presents a high performance NSP system architecture implementation intended for both internet protocol security (IPSec) and secure socket layer (SSL) protocol acceleration, which are widely employed in virtual private network (VPN) and e-commerce applications. The efficient dual one-way pipelined data transfer skeleton and optimised integration scheme of the heterogeneous parallel crypto engine arrays lead to a Gbps-rate NSP, which is programmable with domain-specific descriptor-based instructions. The descriptor-based control flow fragments large data packets and distributes them to the crypto engine arrays, which fully utilises the parallel computation resources and improves the overall system data throughput. A prototyping platform for this NSP design is implemented with a Xilinx XC3S5000-based FPGA chip set. Results show that the design gives a peak throughput for the IPSec ESP tunnel mode of 2.85 Gbps with over 2100 full SSL handshakes per second at a clock rate of 95 MHz.

  20. A hydrodynamic mechanism for spontaneous formation of ordered drop arrays in confined shear flow

    NASA Astrophysics Data System (ADS)

    Singha, Sagnik; Zurita-Gotor, Mauricio; Loewenberg, Michael; Migler, Kalman; Blawzdziewicz, Jerzy

    2017-11-01

    It has been experimentally demonstrated that a drop monolayer driven by a confined shear flow in a Couette device can spontaneously arrange into a flow-oriented parallel chain microstructure. However, the hydrodynamic mechanism of this puzzling self-assembly phenomenon has so far eluded explanation. In a recent publication we suggested that the observed spontaneous drop ordering may arise from hydrodynamic interparticle interactions via a far-field quadrupolar Hele-Shaw flow associated with drop deformation. To verify this conjecture we have developed a simple numerical-simulation model that includes the far-field Hele-Shaw flow quadrupoles and a near-field short-range repulsion. Our simulations show that an initially disordered particle configuration self-organizes into a system of particle chains, similar to the experimentally observed drop-chain structures. The initial stage of chain formation is fast; subsequently, microstructural defects in a partially ordered system are removed by slow annealing, leading to an array of equally spaced parallel chains with a small number of defects. The microstructure evolution is analyzed using angular and spatial order parameters and correlation functions. Supported by NSF Grants No. CBET 1603627 and CBET 1603806.

  1. Parallel Vortex Body Interaction Enabled by Active Flow Control

    NASA Astrophysics Data System (ADS)

    Weingaertner, Andre; Tewes, Philipp; Little, Jesse

    2017-11-01

    An experimental study was conducted to explore the flow physics of parallel vortex body interaction between two NACA 0012 airfoils. Experiments were carried out at chord Reynolds numbers of 740,000. Initially, the leading airfoil was characterized without the target one being installed. Results are in good agreement with thin airfoil theory and data provided in the literature. Afterward, the leading airfoil was fixed at 18° incidence and the target airfoil was installed 6 chord lengths downstream. Plasma actuation (ns-DBD), originating close to the leading edge, was used to control vortex shedding from the leading airfoil at various frequencies (0.04

  2. Computational analysis of stall and separation control in centrifugal compressors

    NASA Astrophysics Data System (ADS)

    Stein, Alexander

    2000-10-01

    A numerical technique for simulating unsteady viscous fluid flow in turbomachinery components has been developed. In this technique, the three-dimensional form of the Reynolds averaged Navier-Stokes equations is solved in a time-accurate manner. The flow solver is used to study fluid dynamic phenomena that lead to instabilities in centrifugal compressors. The results indicate that large flow incidence angles, at reduced flow rates, can cause boundary layer separation near the blade leading edge. This mechanism is identified as the primary factor in the stall inception process. High-pressure jets upstream of the compressor face are studied as a means of controlling compressor instabilities. Steady jets are found to alter the leading edge flow pattern and effectively suppress compressor instabilities. Yawed jets are more effective than parallel jets and an optimum yaw angle exists for each compression system. Numerical simulations utilizing pulsed jets have also been done. Pulsed jets are found to yield additional performance enhancements and lead to a reduction in external air requirements for operating the jets. Jets pulsed at higher frequencies perform better than low-frequency jets. These findings suggest that air injection is a viable means of alleviating compressor instabilities and could impact gas turbine technology. Results concerning the optimization of practical air injection systems and implications for future research are discussed. The flow solver developed in this work, along with the postprocessing tools developed to interpret the results, provide a rational framework for analyzing and controlling current and next generation compression systems.

  3. Toxicological study of injuries of rat’s hippocampus after lead poisoning by synchrotron microradiography and elemental mapping

    NASA Astrophysics Data System (ADS)

    Liang, Feng; Zhang, Guilin; Xiao, Xianghui; Cai, Zhonghou; Lai, Barry; Hwu, Yeukuang; Yan, Chonghuai; Xu, Jian; Li, Yulan; Tan, Mingguang; Zhang, Chuanfu; Li, Yan

    2010-09-01

    The hippocampus, a major component of the brain, is one of the target nervous organs in lead poisoning. In this work, a rat's hippocampal injury caused by lead was studied. The lead concentrations in blood, bone and hippocampus collected from rats subject to lead poisoning were quantified by Inductively Coupled Plasma Mass Spectrometry while morphological information and elemental distributions in the hippocampus were obtained with synchrotron radiation X-ray phase contrast imaging and synchrotron radiation micro-beam X-ray fluorescence, respectively. For comparison, identical characterization of the specimens from the rats in the control group was done in parallel. Results show that the ratios between the lead content in the treated group and that in the control group of the hippocampus, bone, and blood are about 2.66, 236, and 39.6, respectively. Analysis also revealed that some health elements such as S, K, Cl and P increase in the regions with high lead content in the treated hippocampus. Morphological differences between the normal and lead-exposed hippocampus specimens in some local areas were observed. Explicitly, the structure of the lead-exposed hippocampus was tortuous and irregular, and the density of the neurons in the Dentate Gyrus was significantly lower than that from the control group. The study shows that the synchrotron radiation methods are very powerful for investigating structural injury caused by heavy metals in the nervous system.

  4. PRAIS: Distributed, real-time knowledge-based systems made easy

    NASA Technical Reports Server (NTRS)

    Goldstein, David G.

    1990-01-01

    This paper discusses an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS). PRAIS strives for transparently parallelizing production (rule-based) systems, even when under real-time constraints. PRAIS accomplishes these goals by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors.

  5. MARBLE: A system for executing expert systems in parallel

    NASA Technical Reports Server (NTRS)

    Myers, Leonard; Johnson, Coe; Johnson, Dean

    1990-01-01

    This paper details the MARBLE 2.0 system, which provides a parallel environment for cooperating expert systems. The work has been done in conjunction with the development of an intelligent computer-aided design system, ICADS, by the CAD Research Unit of the Design Institute at California Polytechnic State University. MARBLE (Multiple Accessed Rete Blackboard Linked Experts) is a system built on the C Language Integrated Production System (CLIPS) expert system tool. A copied blackboard is used for communication between the shells to establish an architecture which supports cooperating expert systems that execute in parallel. The design of MARBLE is simple, but it provides support for a rich variety of configurations, while making it relatively easy to demonstrate the correctness of its parallel execution features. In its most elementary configuration, individual CLIPS expert systems execute on their own processors and communicate with each other through a modified blackboard. Control of the system as a whole, and specifically of writing to the blackboard, is provided by one of the CLIPS expert systems, an expert control system.

  6. Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

    NASA Technical Reports Server (NTRS)

    Biegel, Bryan A. (Technical Monitor); Jost, G.; Jin, H.; Labarta, J.; Gimenez, J.; Caubet, J.

    2003-01-01

    Parallel programming paradigms include process level parallelism, thread level parallelization, and multilevel parallelism. This viewgraph presentation describes a detailed performance analysis of these paradigms for Shared Memory Architecture (SMA). This analysis uses the Paraver Performance Analysis System. The presentation includes diagrams of a flow of useful computations.

  7. Methods for design and evaluation of parallel computing systems (The PISCES project)

    NASA Technical Reports Server (NTRS)

    Pratt, Terrence W.; Wise, Robert; Haught, Mary Jo

    1989-01-01

    The PISCES project started in 1984 under the sponsorship of the NASA Computational Structural Mechanics (CSM) program. A PISCES 1 programming environment and parallel FORTRAN were implemented in 1984 for the DEC VAX (using UNIX processes to simulate parallel processes). This system was used for experimentation with parallel programs for scientific applications and AI (dynamic scene analysis) applications. PISCES 1 was ported to a network of Apollo workstations by N. Fitzgerald.

  8. System and method for representing and manipulating three-dimensional objects on massively parallel architectures

    DOEpatents

    Karasick, Michael S.; Strip, David R.

    1996-01-01

    A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modelling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modelling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modelling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication.
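    The d-edge structure this record describes can be sketched as follows (an illustrative Python sketch; the field names and the triangle helper are our assumptions, not the patent's actual layout):

    ```python
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class DEdge:
        """Directed edge (d-edge): ties one edge of a 3-D object to
        exactly one face, as the record describes."""
        v_start: tuple  # (x, y, z) of the start vertex
        v_end: tuple    # (x, y, z) of the end vertex
        face_id: int    # the single face this directed edge bounds

    def d_edges_for_triangle(face_id, a, b, c):
        """One processor's share of the model: build the three d-edges
        bounding a single triangular face (counter-clockwise order)."""
        return [DEdge(a, b, face_id), DEdge(b, c, face_id), DEdge(c, a, face_id)]
    ```

    Because each d-edge carries both its vertices and its face, a processor can operate on its faces without asking other processors for adjacency information, which is the point of the design.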

  9. Design considerations for parallel graphics libraries

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1994-01-01

    Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.

  10. Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++

    NASA Technical Reports Server (NTRS)

    Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis

    1994-01-01

    Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.
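    The BLOCK-style data distribution that HPF's directives express can be illustrated with a minimal owner-computes calculation (a hedged Python sketch; the function names are ours, not part of HPF):

    ```python
    def block_owner(i, n, p):
        """Owner of global element i of an n-element array BLOCK-distributed
        over p processors (ceil(n/p) contiguous elements per processor)."""
        block = -(-n // p)  # ceil(n / p)
        return i // block

    def local_range(rank, n, p):
        """Half-open [lo, hi) range of global indices held by processor rank."""
        block = -(-n // p)
        lo = rank * block
        return lo, min(lo + block, n)
    ```

    An HPF compiler uses exactly this kind of mapping, derived from the user's DISTRIBUTE annotations, to decide which processor computes and communicates each array element.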

  11. Thought Leaders during Crises in Massive Social Networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Corley, Courtney D.; Farber, Robert M.; Reynolds, William

    The vast amount of social media data that can be gathered from the internet, coupled with workflows that utilize both commodity systems and massively parallel supercomputers, such as the Cray XMT, opens new vistas for research to support health, defense, and national security. Computer technology now enables the analysis of graph structures containing more than 4 billion vertices joined by 34 billion edges, along with metrics and massively parallel algorithms that exhibit near-linear scalability in the number of processors. The challenge lies in making this massive data and analysis comprehensible to analysts and end-users who require actionable knowledge to carry out their duties. Simply stated, we have developed language- and content-agnostic techniques to reduce large graphs built from vast media corpora into forms people can understand. Specifically, our tools and metrics act as a survey tool to identify 'thought leaders' -- those members that lead or reflect the thoughts and opinions of an online community, independent of the source language.

  12. Optimization of sparse matrix-vector multiplication on emerging multicore platforms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williams, Samuel; Oliker, Leonid; Vuduc, Richard

    2007-01-01

    We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore-specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD dual-core and Intel quad-core designs, the heterogeneous STI Cell, as well as the first scientific study of the highly multithreaded Sun Niagara2. We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural tradeoffs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
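    The SpMV kernel examined in this record, in its standard compressed sparse row (CSR) form, can be written as a plain-Python reference version (for clarity only; the paper's implementations are heavily tuned per platform):

    ```python
    def spmv_csr(values, col_idx, row_ptr, x):
        """y = A @ x for A in CSR form: row i's nonzeros live in
        values[row_ptr[i]:row_ptr[i+1]], with columns in col_idx."""
        n = len(row_ptr) - 1
        y = [0.0] * n
        for i in range(n):
            acc = 0.0
            for k in range(row_ptr[i], row_ptr[i + 1]):
                acc += values[k] * x[col_idx[k]]
            y[i] = acc
        return y
    ```

    The indirect access x[col_idx[k]] is what makes SpMV memory-bound, and it is precisely what the blocking and prefetching optimizations studied in the paper target.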

  13. GPUs in a computational physics course

    NASA Astrophysics Data System (ADS)

    Adler, Joan; Nissim, Gal; Kiswani, Ahmad

    2017-10-01

    In an introductory computational physics class of the type that many of us give, time constraints lead to hard choices on topics. Everyone likes to include their own research in such a class, but an overview of many areas is paramount. Parallel programming using MPI is one important topic. Both the principle and the need to break the “fear barrier” of using a large machine with a queuing system via ssh must be successfully passed on. Due to the plateau in chip development and to power considerations, future HPC hardware choices will include heavy use of GPUs. Thus the need to introduce these at the level of an introductory course has arisen. Just as for parallel coding, explanations of the benefits and simple examples to guide the hesitant first-time user should be selected. Several student projects using GPUs that include how-to pages were proposed at the Technion. Two of the more successful ones were a lattice Boltzmann and a finite element code, and we present these in detail.

  14. Compile-time estimation of communication costs in multicomputers

    NASA Technical Reports Server (NTRS)

    Gupta, Manish; Banerjee, Prithviraj

    1991-01-01

    An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Any strategy for automatic data partitioning needs a mechanism for estimating the performance of a program under a given partitioning scheme, the most crucial part of which involves determining the communication costs incurred by the program. A methodology is described for estimating the communication costs at compile-time as functions of the numbers of processors over which various arrays are distributed. A strategy is described, along with its theoretical basis, for making program transformations that expose opportunities for combining messages, leading to considerable savings in the communication costs. For certain loops with regular dependences, the compiler can detect the possibility of pipelining, and thus estimate communication costs more accurately than it could otherwise. These results are of great significance to any parallelization system supporting numeric applications on multicomputers. In particular, they lay down a framework for effective synthesis of communication on multicomputers from sequential program references.
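    The payoff from combining messages follows from the standard linear communication-cost model that such compile-time estimates typically build on (a sketch; the cost constants below are placeholders, not measured machine parameters):

    ```python
    def transfer_cost(n_messages, total_bytes, startup=100.0, per_byte=0.1):
        """Linear communication model: each message pays a fixed startup
        cost plus a per-byte cost; combining messages cuts the startup term."""
        return n_messages * startup + total_bytes * per_byte

    # Sending 1000 small messages vs. one combined message of the same data:
    separate = transfer_cost(1000, 8000)
    combined = transfer_cost(1, 8000)
    ```

    Because the per-byte term is unchanged, the entire saving comes from eliminating redundant startup costs, which is why message combining dominates for fine-grained communication patterns.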

  15. Three-Point Gear/Lead Screw Positioning

    NASA Technical Reports Server (NTRS)

    Calco, Frank S.

    1993-01-01

    Triple-ganged-lead-screw positioning mechanism drives movable plate toward or away from fixed plate and keeps plates parallel to each other. Designed for use in tuning microwave resonant cavity. Other potential applications include adjustable bed plates and cantilever tail stocks in machine tools, adjustable platforms for optical equipment, and lifting platforms.

  16. Infrared laser transillumination CT imaging system using parallel fiber arrays and optical switches for finger joint imaging

    NASA Astrophysics Data System (ADS)

    Sasaki, Yoshiaki; Emori, Ryota; Inage, Hiroki; Goto, Masaki; Takahashi, Ryo; Yuasa, Tetsuya; Taniguchi, Hiroshi; Devaraj, Balasigamani; Akatsuka, Takao

    2004-05-01

    The heterodyne detection technique, on which the coherent detection imaging (CDI) method is founded, can discriminate and select very weak, highly directional, forward-scattered, coherence-retaining photons that emerge from scattering media in spite of their complex and highly scattering nature. That property enables us to reconstruct tomographic images using the same reconstruction technique as X-ray CT, i.e., the filtered backprojection method. Our group has so far developed a transillumination laser CT imaging method based on the CDI method in the visible and near-infrared regions and on reconstruction from projections, and has reported a variety of in vitro and in vivo tomographic images of biological objects to demonstrate its effectiveness for biomedical use. Since the previous system was not optimized, it took several hours to obtain a single image. For practical use, we developed a prototype CDI-based imaging system using parallel fiber arrays and optical switches to reduce the measurement time significantly. Here, we describe a prototype fiber-optic transillumination laser CT imaging system based on optical heterodyne detection for early diagnosis of rheumatoid arthritis (RA), demonstrating tomographic imaging of an acrylic phantom as well as the fundamental imaging properties. We expect that further refinements of the fiber-optic-based laser CT imaging system could lead to a novel and practical diagnostic tool for rheumatoid arthritis and other joint- and bone-related diseases in the human finger.

  17. Architecture studies and system demonstrations for optical parallel processor for AI and NI

    NASA Astrophysics Data System (ADS)

    Lee, Sing H.

    1988-03-01

    In solving deterministic AI problems, the data search for matching the arguments of a PROLOG expression causes a serious bottleneck when implemented sequentially by electronic systems. To overcome this bottleneck we have developed the concepts for an optical expert system based on a matrix-algebraic formulation, which will be suitable for parallel optical implementation. The optical AI system based on this matrix-algebraic formulation will offer distinct advantages for parallel search, adult learning, etc.

  18. Extended RF shimming: Sequence‐level parallel transmission optimization applied to steady‐state free precession MRI of the heart

    PubMed Central

    Price, Anthony N.; Padormo, Francesco; Hajnal, Joseph V.; Malik, Shaihan J.

    2017-01-01

    Cardiac magnetic resonance imaging (MRI) at high field presents challenges because of the high specific absorption rate and significant transmit field (B1+) inhomogeneities. Parallel transmission MRI offers the ability to correct for both issues at the level of individual radiofrequency (RF) pulses, but must operate within strict hardware and safety constraints. The constraints are themselves affected by sequence parameters, such as the RF pulse duration and TR, meaning that an overall optimal operating point exists for a given sequence. This work seeks to obtain optimal performance by performing a ‘sequence‐level’ optimization in which pulse sequence parameters are included as part of an RF shimming calculation. The method is applied to balanced steady‐state free precession cardiac MRI with the objective of minimizing TR, hence reducing the imaging duration. Results are demonstrated using an eight‐channel parallel transmit system operating at 3 T, with an in vivo study carried out on seven male subjects of varying body mass index (BMI). Compared with single‐channel operation, a mean‐squared‐error shimming approach leads to reduced imaging durations of 32 ± 3% with simultaneous improvement in flip angle homogeneity of 32 ± 8% within the myocardium. PMID:28195684

  19. The immunity-related GTPase Irga6 dimerizes in a parallel head-to-head fashion.

    PubMed

    Schulte, Kathrin; Pawlowski, Nikolaus; Faelber, Katja; Fröhlich, Chris; Howard, Jonathan; Daumke, Oliver

    2016-03-02

    The immunity-related GTPases (IRGs) constitute a powerful cell-autonomous resistance system against several intracellular pathogens. Irga6 is a dynamin-like protein that oligomerizes at the parasitophorous vacuolar membrane (PVM) of Toxoplasma gondii leading to its vesiculation. Based on a previous biochemical analysis, it has been proposed that the GTPase domains of Irga6 dimerize in an antiparallel fashion during oligomerization. We determined the crystal structure of an oligomerization-impaired Irga6 mutant bound to a non-hydrolyzable GTP analog. Contrary to the previous model, the structure shows that the GTPase domains dimerize in a parallel fashion. The nucleotides in the center of the interface participate in dimerization by forming symmetric contacts with each other and with the switch I region of the opposing Irga6 molecule. The latter contact appears to activate GTP hydrolysis by stabilizing the position of the catalytic glutamate 106 in switch I close to the active site. Further dimerization contacts involve switch II, the G4 helix and the trans stabilizing loop. The Irga6 structure features a parallel GTPase domain dimer, which appears to be a unifying feature of all dynamin and septin superfamily members. This study contributes important insights into the assembly and catalytic mechanisms of IRG proteins as prerequisite to understand their anti-microbial action.

  20. Experiments and Analyses of Data Transfers Over Wide-Area Dedicated Connections

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rao, Nageswara S.; Liu, Qiang; Sen, Satyabrata

    Dedicated wide-area network connections are increasingly employed in high-performance computing and big data scenarios. One might expect the performance and dynamics of data transfers over such connections to be easy to analyze due to the lack of competing traffic. However, non-linear transport dynamics and end-system complexities (e.g., multi-core hosts and distributed filesystems) can in fact make analysis surprisingly challenging. We present extensive measurements of memory-to-memory and disk-to-disk file transfers over 10 Gbps physical and emulated connections with 0–366 ms round trip times (RTTs). For memory-to-memory transfers, profiles of both TCP and UDT throughput as a function of RTT show concave and convex regions; large buffer sizes and more parallel flows lead to wider concave regions, which are highly desirable. TCP and UDT both also display complex throughput dynamics, as indicated by their Poincare maps and Lyapunov exponents. For disk-to-disk transfers, we determine that high throughput can be achieved via a combination of parallel I/O threads, parallel network threads, and direct I/O mode. Our measurements also show that Lustre filesystems can be mounted over long-haul connections using LNet routers, although challenges remain in jointly optimizing file I/O and transport method parameters to achieve peak throughput.
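    The benefit of more parallel flows at long RTTs can be illustrated with the elementary window-limited throughput bound (a deliberate simplification of the dynamics measured in this record; parameter values are illustrative):

    ```python
    def throughput_gbps(flows, window_bytes, rtt_ms, capacity_gbps=10.0):
        """Idealized aggregate throughput of window-limited parallel flows:
        each flow moves at most one window per RTT, capped at link capacity."""
        per_flow = window_bytes * 8 / (rtt_ms / 1000.0) / 1e9  # Gbps per flow
        return min(capacity_gbps, flows * per_flow)
    ```

    At 100 ms RTT a single 16 MiB window sustains only about 1.3 Gbps on a 10 Gbps link, while eight such flows saturate it; this is the mechanism behind the wider concave regions observed with more parallel flows.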

  1. Deformation and fracture of explosion-welded Ti/Al plates: A synchrotron-based study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    E, J. C.; Huang, J. Y.; Bie, B. X.

    Here, explosion-welded Ti/Al plates are characterized with energy dispersive spectroscopy and x-ray computed tomography, and exhibit a smooth, well-jointed interface. We perform dynamic and quasi-static uniaxial tension experiments on Ti/Al with the loading direction either perpendicular or parallel to the Ti/Al interface, using a mini split Hopkinson tension bar and a material testing system in conjunction with time-resolved synchrotron x-ray imaging. X-ray imaging and strain-field mapping reveal different deformation mechanisms responsible for anisotropic bulk-scale responses, including yield strength, ductility and rate sensitivity. Deformation and fracture are achieved predominantly in the Al layer for perpendicular loading, but both the Ti and Al layers as well as the interface play a role for parallel loading. The rate sensitivity of Ti/Al follows those of the constituent metals. For perpendicular loading, a single deformation band develops in the Al layer under quasi-static loading, while multiple deformation bands nucleate simultaneously under dynamic loading, leading to a higher dynamic fracture strain. For parallel loading, the interface impedes the growth of deformation and results in increased ductility of Ti/Al under quasi-static loading, while interface fracture occurs under dynamic loading due to the disparity in Poisson's contraction.

  2. Deformation and fracture of explosion-welded Ti/Al plates: A synchrotron-based study

    DOE PAGES

    E, J. C.; Huang, J. Y.; Bie, B. X.; ...

    2016-08-02

    Here, explosion-welded Ti/Al plates are characterized with energy dispersive spectroscopy and x-ray computed tomography, and exhibit a smooth, well-jointed interface. We perform dynamic and quasi-static uniaxial tension experiments on Ti/Al with the loading direction either perpendicular or parallel to the Ti/Al interface, using a mini split Hopkinson tension bar and a material testing system in conjunction with time-resolved synchrotron x-ray imaging. X-ray imaging and strain-field mapping reveal different deformation mechanisms responsible for anisotropic bulk-scale responses, including yield strength, ductility and rate sensitivity. Deformation and fracture are achieved predominantly in the Al layer for perpendicular loading, but both the Ti and Al layers as well as the interface play a role for parallel loading. The rate sensitivity of Ti/Al follows those of the constituent metals. For perpendicular loading, a single deformation band develops in the Al layer under quasi-static loading, while multiple deformation bands nucleate simultaneously under dynamic loading, leading to a higher dynamic fracture strain. For parallel loading, the interface impedes the growth of deformation and results in increased ductility of Ti/Al under quasi-static loading, while interface fracture occurs under dynamic loading due to the disparity in Poisson's contraction.

  3. Improving the scalability of hyperspectral imaging applications on heterogeneous platforms using adaptive run-time data compression

    NASA Astrophysics Data System (ADS)

    Plaza, Antonio; Plaza, Javier; Paz, Abel

    2010-10-01

    Latest generation remote sensing instruments (called hyperspectral imagers) are now able to generate hundreds of images, corresponding to different wavelength channels, for the same area on the surface of the Earth. In previous work, we have reported that the scalability of parallel processing algorithms dealing with these high-dimensional data volumes is affected by the amount of data to be exchanged through the communication network of the system. However, large messages are common in hyperspectral imaging applications since processing algorithms are pixel-based, and each pixel vector to be exchanged through the communication network is made up of hundreds of spectral values. Thus, decreasing the amount of data to be exchanged could improve the scalability and parallel performance. In this paper, we propose a new framework based on intelligent utilization of wavelet-based data compression techniques for improving the scalability of a standard hyperspectral image processing chain on heterogeneous networks of workstations. This type of parallel platform is quickly becoming a standard in hyperspectral image processing due to the distributed nature of collected hyperspectral data as well as its flexibility and low cost. Our experimental results indicate that adaptive lossy compression can lead to improvements in the scalability of the hyperspectral processing chain without sacrificing analysis accuracy, even at sub-pixel precision levels.

  4. Extended RF shimming: Sequence-level parallel transmission optimization applied to steady-state free precession MRI of the heart.

    PubMed

    Beqiri, Arian; Price, Anthony N; Padormo, Francesco; Hajnal, Joseph V; Malik, Shaihan J

    2017-06-01

    Cardiac magnetic resonance imaging (MRI) at high field presents challenges because of the high specific absorption rate and significant transmit field (B1+) inhomogeneities. Parallel transmission MRI offers the ability to correct for both issues at the level of individual radiofrequency (RF) pulses, but must operate within strict hardware and safety constraints. The constraints are themselves affected by sequence parameters, such as the RF pulse duration and TR, meaning that an overall optimal operating point exists for a given sequence. This work seeks to obtain optimal performance by performing a 'sequence-level' optimization in which pulse sequence parameters are included as part of an RF shimming calculation. The method is applied to balanced steady-state free precession cardiac MRI with the objective of minimizing TR, hence reducing the imaging duration. Results are demonstrated using an eight-channel parallel transmit system operating at 3 T, with an in vivo study carried out on seven male subjects of varying body mass index (BMI). Compared with single-channel operation, a mean-squared-error shimming approach leads to reduced imaging durations of 32 ± 3% with simultaneous improvement in flip angle homogeneity of 32 ± 8% within the myocardium. © 2017 The Authors. NMR in Biomedicine published by John Wiley & Sons Ltd.

  5. Massively parallel information processing systems for space applications

    NASA Technical Reports Server (NTRS)

    Schaefer, D. H.

    1979-01-01

    NASA is developing massively parallel systems for ultra high speed processing of digital image data collected by satellite borne instrumentation. Such systems contain thousands of processing elements. Work is underway on the design and fabrication of the 'Massively Parallel Processor', a ground computer containing 16,384 processing elements arranged in a 128 x 128 array. This computer uses existing technology. Advanced work includes the development of semiconductor chips containing thousands of feedthrough paths. Massively parallel image analog to digital conversion technology is also being developed. The goal is to provide compact computers suitable for real-time onboard processing of images.

  6. Parallelization of NAS Benchmarks for Shared Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

    1998-01-01

    This paper presents our experiences of parallelizing the sequential implementation of NAS benchmarks using compiler directives on SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers on SGI Origin2000 support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing sequential implementation of NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.

  7. Design of a dataway processor for a parallel image signal processing system

    NASA Astrophysics Data System (ADS)

    Nomura, Mitsuru; Fujii, Tetsuro; Ono, Sadayasu

    1995-04-01

    Recently, demands for high-speed signal processing have been increasing, especially in the fields of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called the 'dataway processor', designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link operates 8 bits in parallel in full duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel. Therefore, sufficient throughput is available for high-speed digital video signals. The processor is designed in a top-down fashion using a CAD system called 'PARTHENON.' The hardware is fabricated using 0.5-micrometer CMOS technology and comprises about 200 K gates.

  8. Replica-exchange Wang Landau sampling: pushing the limits of Monte Carlo simulations in materials sciences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Perera, Meewanage Dilina N; Li, Ying Wai; Eisenbach, Markus

    We describe the study of thermodynamics of materials using replica-exchange Wang Landau (REWL) sampling, a generic framework for massively parallel implementations of the Wang Landau Monte Carlo method. To evaluate the performance and scalability of the method, we investigate the magnetic phase transition in body-centered cubic (bcc) iron using the classical Heisenberg model parameterized with first principles calculations. We demonstrate that our framework leads to a significant speedup without compromising the accuracy and precision and facilitates the study of much larger systems than is possible with its serial counterpart.
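    The serial Wang-Landau iteration that REWL parallelizes can be sketched for a toy system with a known density of states (our illustrative Python sketch, not the authors' code; here E is the number of up spins among N non-interacting spins, so the exact answer is g(E) = C(N, E)):

    ```python
    import math
    import random

    def wang_landau(n_spins=8, flatness=0.8, f_final=1e-4, seed=1):
        """Estimate ln g(E) by random-walking in energy space, penalizing
        already-visited energies so the histogram of visits becomes flat."""
        rng = random.Random(seed)
        spins = [rng.randint(0, 1) for _ in range(n_spins)]
        E = sum(spins)
        ln_g = [0.0] * (n_spins + 1)   # running estimate of ln g(E)
        hist = [0] * (n_spins + 1)     # visit histogram for flatness test
        ln_f = 1.0                     # modification factor, halved when flat
        while ln_f > f_final:
            for _ in range(10000):
                i = rng.randrange(n_spins)
                E_new = E + (1 - 2 * spins[i])  # a flip changes E by +-1
                # Accept with probability min(1, g(E) / g(E_new))
                if rng.random() < math.exp(min(0.0, ln_g[E] - ln_g[E_new])):
                    spins[i] ^= 1
                    E = E_new
                ln_g[E] += ln_f
                hist[E] += 1
            if min(hist) > flatness * sum(hist) / len(hist):
                hist = [0] * (n_spins + 1)
                ln_f /= 2.0
        return ln_g
    ```

    REWL splits the energy range into overlapping windows, runs walkers like this one in each window concurrently, and exchanges configurations between neighboring windows, which is what yields the near-linear scaling reported above.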

  9. Consciousness-raising in a gender conflict group.

    PubMed

    Joel, Daphna; Yarimi, Dana

    2014-01-01

    This article describes the main processes and themes in consciousness-raising gender conflict groups for undergraduate students who are concurrently taking a course on gender and psychology. The main theme of the course is that gender is a classification system that influences individuals, interactions between individuals, and social institutions. The aim of the groups is to provide students with a safe environment to discuss their thoughts and feelings following their encounter with these ideas. Group leading is based on a combination of principles derived from consciousness-raising groups of the 1970s and a model for working with groups in conflict.

  10. A possible explanation of the parallel tracks in kilohertz quasi-periodic oscillations from low-mass-X-ray binaries

    NASA Astrophysics Data System (ADS)

    Shi, Chang-Sheng; Zhang, Shuang-Nan; Li, Xiang-Dong

    2018-05-01

    We recalculate the modes of the magnetohydrodynamic (MHD) waves in the MHD model (Shi, Zhang & Li 2014) of the kilohertz quasi-periodic oscillations (kHz QPOs) in neutron star low-mass X-ray binaries (NS-LMXBs), in which the compressed magnetosphere is considered. A method of point-by-point scanning over every parameter of a normal LMXB is proposed to determine the wave number in a NS-LMXB. The dependence of the twin kHz QPO frequencies on the accretion rate (Ṁ) is then obtained with the wave number and magnetic field (B*) determined by our method. Based on the MHD model, a new explanation of the parallel tracks is presented: the slowly varying effective magnetic field leads to the shift of parallel tracks in a source. In this study, we obtain a simple power-law relation between the kHz QPO frequencies and Ṁ/B*^2 in those sources. Finally, we study the dependence of the kHz quasi-periodic oscillation frequencies on the spin, mass and radius of a neutron star. We find that the effective magnetic field, spin, mass and radius of a neutron star lead to the parallel tracks in different sources.

  11. Algorithms and programming tools for image processing on the MPP

    NASA Technical Reports Server (NTRS)

    Reeves, A. P.

    1985-01-01

    Topics addressed include: data mapping and rotational algorithms for the Massively Parallel Processor (MPP); Parallel Pascal language; documentation for the Parallel Pascal Development system; and a description of the Parallel Pascal language used on the MPP.

  12. High Performance Input/Output for Parallel Computer Systems

    NASA Technical Reports Server (NTRS)

    Ligon, W. B.

    1996-01-01

    The goal of our project is to study the I/O characteristics of parallel applications used in Earth Science data processing systems such as Regional Data Centers (RDCs) or EOSDIS. Our approach is to study the runtime behavior of typical programs and the effect of key parameters of the I/O subsystem, both under simulation and with direct experimentation on parallel systems. Our three-year activity has focused on two items: developing a test bed that facilitates experimentation with parallel I/O, and studying representative programs from the Earth science data processing application domain. The Parallel Virtual File System (PVFS) has been developed for use on a number of platforms including the Tiger Parallel Architecture Workbench (TPAW) simulator, the Intel Paragon, a cluster of DEC Alpha workstations, and the Beowulf system (at CESDIS). PVFS provides considerable flexibility in configuring I/O in a UNIX-like environment. Access to key performance parameters facilitates experimentation. We have studied several key applications from levels 1, 2 and 3 of the typical RDC processing scenario, including instrument calibration and navigation, image classification, and numerical modeling codes. We have also considered large-scale scientific database codes used to organize image data.
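    The striped file layout used by parallel file systems such as PVFS can be sketched as a round-robin mapping from file offsets to I/O servers (an assumed default-style layout for illustration; the stripe-unit size is a placeholder, not PVFS's configured value):

    ```python
    def stripe_location(offset, n_servers, strip_size=65536):
        """Map a file byte offset to (server, local_offset) under round-robin
        striping: consecutive strip_size chunks cycle across the I/O servers."""
        strip_index = offset // strip_size
        server = strip_index % n_servers
        local = (strip_index // n_servers) * strip_size + offset % strip_size
        return server, local
    ```

    Large sequential reads thus fan out across all servers, which is the main source of the aggregate bandwidth such test beds are built to measure.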

  13. Efficient Implementation of Multigrid Solvers on Message-Passing Parallel Systems

    NASA Technical Reports Server (NTRS)

    Lou, John

    1994-01-01

    We discuss our implementation strategies for finite difference multigrid partial differential equation (PDE) solvers on message-passing systems. Our target architectures are Intel parallel computers: the Delta and the Paragon.

  14. A Next-Generation Parallel File System Environment for the OLCF

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dillow, David A; Fuller, Douglas; Gunasekaran, Raghul

    2012-01-01

    When deployed in 2008/2009, the Spider system at the Oak Ridge National Laboratory's Leadership Computing Facility (OLCF) was the world's largest-scale Lustre parallel file system. Envisioned as a shared parallel file system capable of delivering both the bandwidth and capacity requirements of the OLCF's diverse computational environment, Spider has since become a blueprint for shared Lustre environments deployed worldwide. Designed to support the parallel I/O requirements of the Jaguar XT5 system and other smaller-scale platforms at the OLCF, the upgrade to the Titan XK6 heterogeneous system will begin to push the limits of Spider's original design by mid-2013. With a doubling in total system memory and a 10x increase in FLOPS, Titan will require both higher bandwidth and larger total capacity. Our goal is to provide a 4x increase in total I/O bandwidth, from over 240 GB/sec today to 1 TB/sec, and a doubling in total capacity. While aggregate bandwidth and total capacity remain important capabilities, an equally important goal in our efforts is dramatically increasing metadata performance, currently the Achilles heel of parallel file systems at leadership scale. We present in this paper an analysis of our current I/O workloads, our operational experiences with the Spider parallel file systems, the high-level design of our Spider upgrade, and our efforts in developing benchmarks that synthesize our performance requirements based on our workload characterization studies.

  15. Causes of High-temperature Superconductivity in the Hydrogen Sulfide Electron-phonon System

    NASA Astrophysics Data System (ADS)

    Degtyarenko, N. N.; Mazur, E. A.

    The electron and phonon spectra, as well as the densities of electron and phonon states, of the stable orthorhombic structure of hydrogen sulfide (SH2) at pressures of 100-180 GPa have been calculated. It is found that a set of parallel planes of hydrogen atoms is formed at a pressure of ∼175 GPa as a result of structural changes in the unit cell of the crystal under pressure, with the hydrogen atoms completely concentrated in these planes. As a result, the electronic properties of the system acquire a quasi-two-dimensional character. The features of in-phase and antiphase oscillations of hydrogen atoms in these planes, which lead to two narrow high-energy peaks in the phonon density of states, are investigated.

  16. Reasons for high-temperature superconductivity in the electron-phonon system of hydrogen sulfide

    NASA Astrophysics Data System (ADS)

    Degtyarenko, N. N.; Mazur, E. A.

    2015-08-01

    We have calculated the electron and phonon spectra, as well as the densities of the electron and phonon states, of the stable orthorhombic structure of hydrogen sulfide SH2 in the pressure interval 100-180 GPa. It is found that at a pressure of 175 GPa, a set of parallel planes of hydrogen atoms is formed due to a structural modification of the unit cell under pressure with complete accumulation of all hydrogen atoms in these planes. As a result, the electronic properties of the system become quasi-two-dimensional. We have also analyzed the collective synphase and antiphase vibrations of hydrogen atoms in these planes, leading to the occurrence of two high-energy peaks in the phonon density of states.

  17. Periodic activations of behaviours and emotional adaptation in behaviour-based robotics

    NASA Astrophysics Data System (ADS)

    Burattini, Ernesto; Rossi, Silvia

    2010-09-01

    The possible modulatory influence of motivations and emotions is of great interest in designing robotic adaptive systems. In this paper, an attempt is made to connect the concept of periodic behaviour activations to emotional modulation, in order to link the variability of behaviours to the circumstances in which they are activated. The impact of emotions, described as timed controlled structures, on simple but conflicting reactive behaviours is studied. Through this approach it is shown that the introduction of such asynchronies in the robot control system may lead to an adaptation in the emergent behaviour without an explicit action selection mechanism. The emergent behaviours of a simple robot designed with both a parallel and a hierarchical architecture are evaluated and compared.

  18. A derivation and scalable implementation of the synchronous parallel kinetic Monte Carlo method for simulating long-time dynamics

    NASA Astrophysics Data System (ADS)

    Byun, Hye Suk; El-Naggar, Mohamed Y.; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya

    2017-10-01

    Kinetic Monte Carlo (KMC) simulations are used to study long-time dynamics of a wide variety of systems. Unfortunately, the conventional KMC algorithm is not scalable to larger systems, since its time scale is inversely proportional to the simulated system size. A promising approach to resolving this issue is the synchronous parallel KMC (SPKMC) algorithm, which makes the time scale size-independent. This paper introduces a formal derivation of the SPKMC algorithm based on local transition-state and time-dependent Hartree approximations, as well as its scalable parallel implementation based on a dual linked-list cell method. The resulting algorithm has achieved a weak-scaling parallel efficiency of 0.935 on 1024 Intel Xeon processors for simulating biological electron transfer dynamics in a 4.2 billion-heme system, as well as decent strong-scaling parallel efficiency. The parallel code has been used to simulate a lattice of cytochrome complexes on a bacterial-membrane nanowire, and it is broadly applicable to other problems such as computational synthesis of new materials.
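
    The conventional KMC loop that the paper takes as its starting point can be sketched as follows (illustrative rates; this is the serial residence-time algorithm, not the SPKMC code): each step selects an event with probability proportional to its rate and advances time by an exponential increment with mean 1/R_total, so the time step shrinks as the total rate, and hence the system size, grows.

```python
# Serial residence-time (Gillespie-type) KMC sketch; rates are illustrative.
import math
import random

def kmc_step(rates, rng):
    """One KMC step: pick an event with probability rate/total, advance time."""
    total = sum(rates)
    r = rng.random() * total
    acc = 0.0
    chosen = len(rates) - 1
    for i, rate in enumerate(rates):
        acc += rate
        if r < acc:
            chosen = i
            break
    dt = -math.log(1.0 - rng.random()) / total   # mean time step: 1 / total rate
    return chosen, dt

rng = random.Random(0)
rates = [1.0, 2.0, 7.0]                          # three competing events
t, counts = 0.0, [0, 0, 0]
for _ in range(10000):
    event, dt = kmc_step(rates, rng)
    counts[event] += 1
    t += dt
# Event 2 carries 70% of the total rate, so it fires in ~70% of steps, and
# the mean time step is ~1/10. Doubling the number of events (system size)
# roughly halves the time step, which is the scaling problem SPKMC targets.
```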

  19. Visualizing Parallel Computer System Performance

    NASA Technical Reports Server (NTRS)

    Malony, Allen D.; Reed, Daniel A.

    1988-01-01

    Parallel computer systems are among the most complex of man's creations, making satisfactory performance characterization difficult. Despite this complexity, there are strong, indeed almost irresistible, incentives to quantify parallel system performance using a single metric. The fallacy lies in succumbing to such temptations. A complete performance characterization requires not only an analysis of the system's constituent levels; it also requires both static and dynamic characterizations. Static or average behavior analysis may mask transients that dramatically alter system performance. Although the human visual system is remarkably adept at interpreting and identifying anomalies in false color data, the importance of dynamic, visual scientific data presentation has only recently been recognized. Large, complex parallel systems pose equally vexing performance interpretation problems. Data from hardware and software performance monitors must be presented in ways that emphasize important events while eliding irrelevant details. Design approaches and tools for performance visualization are the subject of this paper.

  20. Extensions to the Parallel Real-Time Artificial Intelligence System (PRAIS) for fault-tolerant heterogeneous cycle-stealing reasoning

    NASA Technical Reports Server (NTRS)

    Goldstein, David

    1991-01-01

    Extensions to an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS) are discussed. PRAIS strives for transparently parallelizing production (rule-based) systems, even under real-time constraints. PRAIS accomplished these goals (presented at the first annual C Language Integrated Production System (CLIPS) conference) by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors. Results using the original PRAIS architecture over a network of Sun 3's, Sun 4's and VAX's are presented. Mechanisms using the producer-consumer model to extend the architecture for fault-tolerance and distributed truth maintenance initiation are also discussed.

  1. PRIMA-X Final Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lorenz, Daniel; Wolf, Felix

    2016-02-17

    The PRIMA-X (Performance Retargeting of Instrumentation, Measurement, and Analysis Technologies for Exascale Computing) project is the successor of the DOE PRIMA (Performance Refactoring of Instrumentation, Measurement, and Analysis Technologies for Petascale Computing) project, which addressed the challenge of creating a core measurement infrastructure that would serve as a common platform both for integrating leading parallel performance systems (notably TAU and Scalasca) and for developing next-generation scalable performance tools. The PRIMA-X project shifts the focus away from refactoring robust performance tools towards re-targeting the parallel performance measurement and analysis architecture for extreme scales. The massive concurrency, asynchronous execution dynamics, hardware heterogeneity, and multi-objective prerequisites (performance, power, resilience) that characterize exascale systems introduce fundamental constraints on the ability to carry forward existing performance methodologies. In particular, per-thread observation techniques must be deemphasized to significantly reduce the otherwise unsustainable flood of redundant performance data. Instead, it will be necessary to assimilate multi-level resource observations into macroscopic performance views, from which resilient performance metrics can be attributed to the computational features of the application. This requires a scalable framework for node-level and system-wide monitoring and runtime analyses of dynamic performance information. The interest in optimizing parallelism parameters with respect to performance and energy further drives the integration of tool capabilities in the exascale environment. Initially, PRIMA-X was a collaborative project between the University of Oregon (lead institution) and the German Research School for Simulation Sciences (GRS). Because Prof. Wolf, the PI at GRS, accepted a position as full professor at Technische Universität Darmstadt (TU Darmstadt) starting February 1st, 2015, the project ended at GRS on January 31st, 2015. This report reflects the work accomplished at GRS until then. The work of GRS is expected to be continued at TU Darmstadt. The first main accomplishment of GRS is the design of different thread-level aggregation techniques. We created a prototype capable of aggregating thread-level information in performance profiles using these techniques. The next step will be the integration of the most promising techniques into the Score-P measurement system and their evaluation. The second main accomplishment is a substantial increase in Score-P’s scalability, achieved by improving the design of the system-tree representation in Score-P’s profile format. We developed a new representation and a distributed algorithm to create the scalable system-tree representation. Finally, we developed a lightweight approach to MPI wait-state profiling. Former algorithms either needed piggy-backing, which can cause significant runtime overhead, or tracing, which comes with its own set of scaling challenges. Our approach works with local data only and is thus scalable with very little overhead.

  2. Functional Constructivism: In Search of Formal Descriptors.

    PubMed

    Trofimova, Irina

    2017-10-01

    The Functional Constructivism (FC) paradigm is an alternative to behaviorism and considers behavior as being generated every time anew, based on an individual's capacities, environmental resources and demands. Walter Freeman's work provided us with evidence supporting the FC principles. In this paper we make parallels between gradual construction processes leading to the formation of individual behavior and habits, and evolutionary processes leading to the establishment of biological systems. Referencing evolutionary theory, several formal descriptors of such processes are proposed. These FC descriptors refer to the most universal aspects for constructing consistent structures: expansion of degrees of freedom, integration processes based on internal and external compatibility between systems and maintenance processes, all given in four different classes of systems: (a) Zone of Proximate Development (poorly defined) systems; (b) peer systems with emerging reproduction of multiple siblings; (c) systems with internalized integration of behavioral elements ('cruise controls'); and (d) systems capable of handling low-probability, not yet present events. The recursive dynamics within this set of descriptors acting on (traditional) downward, upward and horizontal directions of evolution, is conceptualized as diagonal evolution, or di-evolution. Two examples applying these FC descriptors to taxonomy are given: classification of the functionality of neuro-transmitters and temperament traits; classification of mental disorders. The paper is an early step towards finding a formal language describing universal tendencies in highly diverse, complex and multi-level transient systems known in ecology and biology as 'contingency cycles.'

  3. Parallel-aware, dedicated job co-scheduling within/across symmetric multiprocessing nodes

    DOEpatents

    Jones, Terry R.; Watson, Pythagoras C.; Tuel, William; Brenner, Larry; Caffrey, Patrick; Fier, Jeffrey

    2010-10-05

    In a parallel computing environment comprising a network of SMP nodes each having at least one processor, a parallel-aware co-scheduling method and system for improving the performance and scalability of a dedicated parallel job having synchronizing collective operations. The method and system uses a global co-scheduler and an operating system kernel dispatcher adapted to coordinate interfering system and daemon activities on a node and across nodes to promote intra-node and inter-node overlap of said interfering system and daemon activities as well as intra-node and inter-node overlap of said synchronizing collective operations. In this manner, the impact of random short-lived interruptions, such as timer-decrement processing and periodic daemon activity, on synchronizing collective operations is minimized on large processor-count SPMD bulk-synchronous programming styles.

  4. Parallelization strategies for continuum-generalized method of moments on the multi-thread systems

    NASA Astrophysics Data System (ADS)

    Bustamam, A.; Handhika, T.; Ernastuti; Kerami, D.

    2017-07-01

    The Continuum-Generalized Method of Moments (C-GMM) covers the shortfall of the Generalized Method of Moments (GMM), which is not as efficient as the Maximum Likelihood estimator, by using a continuum set of moment conditions in the GMM framework. However, this computation takes a very long time because of the optimization of the regularization parameter. Moreover, these calculations are processed sequentially, even though all modern computers are now supported by hierarchical memory systems and hyperthreading technology, which allow for parallel computing. This paper aims to speed up the calculation of C-GMM by designing a parallel algorithm for C-GMM on multi-thread systems. First, parallel regions are detected in the original C-GMM algorithm. There are two parallel regions that contribute significantly to the reduction of computational time: the outer loop and the inner loop. This parallel algorithm is then implemented with a standard shared-memory application programming interface, Open Multi-Processing (OpenMP). The experiment shows that outer-loop parallelization is the best strategy for any number of observations.
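
    The outer-loop strategy can be sketched in Python (a structural illustration only: the stand-in objective below is not the C-GMM criterion, and unlike OpenMP threads in compiled code, pure-Python threads gain little real speedup under the GIL). Each outer iteration, one candidate regularization value, becomes a task; the inner loop over moment conditions stays sequential inside the task.

```python
# Outer-loop parallelization sketch: one task per candidate regularization
# value, mapped over a thread pool. objective() is a stand-in criterion,
# NOT the actual C-GMM objective.
from concurrent.futures import ThreadPoolExecutor

def objective(reg_param, n_moments=1000):
    # Inner loop over moment conditions runs sequentially within the task.
    return sum((reg_param - k / n_moments) ** 2 for k in range(n_moments))

def best_reg_param(candidates, workers=4):
    # Outer loop parallelized: evaluate every candidate concurrently,
    # then take the one with the smallest criterion value.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(objective, candidates))
    return min(zip(scores, candidates))[1]

print(best_reg_param([i / 100 for i in range(1, 100)]))  # prints 0.5
```

    Outer-loop tasks are large and synchronize only once at the end, which is why this decomposition tends to scale better than parallelizing the fine-grained inner loop.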

  5. LDRD final report on massively-parallel linear programming : the parPCx system.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Parekh, Ojas; Phillips, Cynthia Ann; Boman, Erik Gunnar

    2005-02-01

    This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project ''Massively-Parallel Linear Programming''. We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runs on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver, called parPCx, and give preliminary computational results. We summarize a number of issues related to efficient parallel solution of LPs with interior-point methods, including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas, and we summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Integer and Combinatorial Optimizer). We conclude with directions for long-term future algorithmic research and for near-term development that could improve the performance of parPCx.

  6. Methods for operating parallel computing systems employing sequenced communications

    DOEpatents

    Benner, R.E.; Gustafson, J.L.; Montry, G.R.

    1999-08-10

    A parallel computing system and method are disclosed having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of the parallel computing system by providing a system which can provide efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes with the computing system. 15 figs.

  7. Methods for operating parallel computing systems employing sequenced communications

    DOEpatents

    Benner, Robert E.; Gustafson, John L.; Montry, Gary R.

    1999-01-01

    A parallel computing system and method having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of the parallel computing system by providing a system which can provide efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes with the computing system.

  8. A plane wave model for direct simulation of reflection and transmission by discretely inhomogeneous plane parallel media

    NASA Astrophysics Data System (ADS)

    Mackowski, Daniel; Ramezanpour, Bahareh

    2018-07-01

    A formulation is developed for numerically solving the frequency domain Maxwell's equations in plane parallel layers of inhomogeneous media. As was done in a recent work [1], the plane parallel layer is modeled as an infinite square lattice of W × W × H unit cells, with W being a sample width of the layer and H the layer thickness. As opposed to the 3D volume integral/discrete dipole formulation, the derivation begins with a Fourier expansion of the electric field amplitude in the lateral plane, and leads to a coupled system of 1D ordinary differential equations in the depth direction of the layer. A 1D dyadic Green's function is derived for this system and used to construct a set of coupled 1D integral equations for the field expansion coefficients. The resulting mathematical formulation is considerably simpler and more compact than that derived, for the same system, using the discrete dipole approximation applied to the periodic plane lattice. Furthermore, the fundamental property variable appearing in the formulation is the Fourier transformed complex permittivity distribution in the unit cell, and the method obviates any need to define or calculate a dipole polarizability. Although designed primarily for random media calculations, the method is also capable of predicting the single scattering properties of individual particles; comparisons are presented to demonstrate that the method can accurately reproduce, at scattering angles not too close to 90°, the polarimetric scattering properties of single and multiple spheres. The derivation of the dyadic Green's function allows for an analytical preconditioning of the equations, and it is shown that this can result in significantly accelerated solution times when applied to densely-packed systems of particles. Calculation results demonstrate that the method, when applied to inhomogeneous media, can predict coherent backscattering and polarization opposition effects.

  9. Bivelocity Picture in the Nonrelativistic Limit of Relativistic Hydrodynamics

    NASA Astrophysics Data System (ADS)

    Koide, Tomoi; Ramos, Rudnei O.; Vicente, Gustavo S.

    2015-02-01

    We discuss the nonrelativistic limit of the relativistic Navier-Fourier-Stokes (NFS) theory. The next-to-leading order relativistic corrections to the NFS theory for the Landau-Lifshitz fluid are obtained. While the lowest order truncation of the velocity expansion leads to the usual NFS equations of nonrelativistic fluids, we show that when the next-to-leading order relativistic corrections are included, the equations can be expressed concurrently with two different fluid velocities. One of the fluid velocities is parallel to the conserved charge current (which follows the Eckart definition) and the other one is parallel to the energy current (which follows the Landau-Lifshitz definition). We compare this next-to-leading order relativistic hydrodynamics with bivelocity hydrodynamics, which is one of the generalizations of the NFS theory and is formulated in such a way to include the usual mass velocity and also a new velocity, called the volume velocity. We find that the volume velocity can be identified with the velocity obtained in the Landau-Lifshitz definition. Then, the structure of bivelocity hydrodynamics, which is derived using various nontrivial assumptions, is reproduced in the NFS theory including the next-to-leading order relativistic corrections.

  10. Two-dimensional parallel array technology as a new approach to automated combinatorial solid-phase organic synthesis

    PubMed

    Brennan; Biddison; Frauendorf; Schwarcz; Keen; Ecker; Davis; Tinder; Swayze

    1998-01-01

    An automated, 96-well parallel array synthesizer for solid-phase organic synthesis has been designed and constructed. The instrument employs a unique reagent array delivery format, in which each reagent utilized has a dedicated plumbing system. An inert atmosphere is maintained during all phases of a synthesis, and temperature can be controlled via a thermal transfer plate which holds the injection-molded reaction block. The reaction plate assembly slides in the X-axis direction, while eight nozzle blocks holding the reagent lines slide in the Y-axis direction, allowing for the extremely rapid delivery of any of 64 reagents to 96 wells. In addition, there are six banks of fixed nozzle blocks, which deliver the same reagent or solvent to eight wells at once, for a total of 72 possible reagents. The instrument is controlled by software which allows the straightforward programming of the synthesis of a large number of compounds. This is accomplished by supplying a general synthetic procedure in the form of a command file, which calls upon certain reagents to be added to specific wells via lookup in a sequence file. The bottle position, flow rate, and concentration of each reagent are stored in a separate reagent table file. To demonstrate the utility of the parallel array synthesizer, a small combinatorial library of hydroxamic acids was prepared in high-throughput mode for biological screening. Approximately 1300 compounds were prepared on a 10 μmole scale (3-5 mg) in a few weeks. The resulting crude compounds were generally >80% pure, and were utilized directly for high-throughput screening in antibacterial assays. Several active wells were found, and the activity was verified by solution-phase synthesis of analytically pure material, indicating that the system described herein is an efficient means for the parallel synthesis of compounds for lead discovery. Copyright 1998 John Wiley & Sons, Inc.

  11. Parallelization of interpolation, solar radiation and water flow simulation modules in GRASS GIS using OpenMP

    NASA Astrophysics Data System (ADS)

    Hofierka, Jaroslav; Lacko, Michal; Zubal, Stanislav

    2017-10-01

    In this paper, we describe the parallelization of three complex and computationally intensive modules of GRASS GIS using the OpenMP application programming interface for multi-core computers. These include the v.surf.rst module for spatial interpolation, the r.sun module for solar radiation modeling and the r.sim.water module for water flow simulation. We briefly describe the functionality of the modules and the parallelization approaches used in them. Our approach includes the analysis of each module's functionality, identification of source code segments suitable for parallelization and proper application of OpenMP parallelization code to create efficient threads processing the subtasks. We document the efficiency of the solutions using airborne laser scanning data representing the land surface of the test area and the derived high-resolution digital terrain model grids. We discuss the performance speed-up and parallelization efficiency depending on the number of processor threads. The study showed a substantial increase in computation speed on a standard multi-core computer while maintaining the accuracy of results in comparison with the output of the original modules. The presented parallelization approach demonstrates the simplicity and efficiency of parallelizing open-source GRASS GIS modules using OpenMP, leading to increased performance of this geospatial software on standard multi-core computers.

  12. Negative tunnel magnetoresistance and differential conductance in transport through double quantum dots

    NASA Astrophysics Data System (ADS)

    Trocha, Piotr; Weymann, Ireneusz; Barnaś, Józef

    2009-10-01

    Spin-dependent transport through two coupled single-level quantum dots weakly connected to ferromagnetic leads with collinear magnetizations is considered theoretically. Transport characteristics, including the current, linear and nonlinear conductances, and tunnel magnetoresistance are calculated using the real-time diagrammatic technique in the parallel, serial, and intermediate geometries. The effects due to virtual tunneling processes between the two dots via the leads, associated with off-diagonal coupling matrix elements, are also considered. Negative differential conductance and negative tunnel magnetoresistance have been found in the case of serial and intermediate geometries, while no such behavior has been observed for double quantum dots coupled in parallel. It is also shown that transport characteristics strongly depend on the magnitude of the off-diagonal coupling matrix elements.

  13. Rethinking key–value store for parallel I/O optimization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kougkas, Anthony; Eslami, Hassan; Sun, Xian-He

    2015-01-26

    Key-value stores are being widely used as the storage system for large-scale internet services and cloud storage systems. However, they are rarely used in HPC systems, where parallel file systems are the dominant storage solution. In this study, we examine the architecture differences and performance characteristics of parallel file systems and key-value stores. We propose using key-value stores to optimize overall Input/Output (I/O) performance, especially for workloads that parallel file systems cannot handle well, such as the cases with intense data synchronization or heavy metadata operations. We conducted experiments with several synthetic benchmarks, an I/O benchmark, and a real application. We modeled the performance of these two systems using collected data from our experiments, and we provide a predictive method to identify which system offers better I/O performance given a specific workload. The results show that we can optimize the I/O performance in HPC systems by utilizing key-value stores.
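
    The crossover the study reports can be illustrated with a toy cost model (all constants below are hypothetical, not measured values from the paper): a parallel file system pays a fixed metadata cost per object, which dominates for many small objects, while raw streaming bandwidth dominates for a few large ones.

```python
# Toy cost model (all constants hypothetical) contrasting a parallel file
# system, which pays a per-object metadata cost, with a key-value store,
# which amortizes metadata work at slightly lower streaming bandwidth.

def pfs_time(n_objects, object_bytes, metadata_cost=5e-3, bandwidth=1e9):
    # One metadata operation (open/create) per object, then a streaming write.
    return n_objects * (metadata_cost + object_bytes / bandwidth)

def kv_time(n_objects, object_bytes, per_op_cost=5e-5, bandwidth=8e8):
    # Cheaper per-object bookkeeping, modestly lower streaming bandwidth.
    return n_objects * (per_op_cost + object_bytes / bandwidth)

def pick_backend(n_objects, object_bytes):
    """Predict which storage system completes the workload faster."""
    if kv_time(n_objects, object_bytes) < pfs_time(n_objects, object_bytes):
        return "kv"
    return "pfs"

print(pick_backend(1_000_000, 4_096))       # metadata-dominated: prints kv
print(pick_backend(10, 10_000_000_000))     # bandwidth-dominated: prints pfs
```

    This mirrors the paper's predictive method only in spirit: given a workload description, choose the backend whose modeled cost is lower.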

  14. Development of a parallel demodulation system used for extrinsic Fabry-Perot interferometer and fiber Bragg grating sensors.

    PubMed

    Jiang, Junfeng; Liu, Tiegen; Zhang, Yimo; Liu, Lina; Zha, Ying; Zhang, Fan; Wang, Yunxin; Long, Pin

    2006-01-20

    A parallel demodulation system for extrinsic Fabry-Perot interferometer (EFPI) and fiber Bragg grating (FBG) sensors is presented, which is based on a Michelson interferometer and combines the methods of low-coherence interference and a Fourier-transform spectrum. The parallel demodulation theory is modeled with Fourier-transform spectrum technology, and a signal separation method with an EFPI and FBG is proposed. The design of an optical path difference scanning and sampling method without a reference light is described. Experiments show that the parallel demodulation system has good spectrum demodulation and low-coherence interference demodulation performance. It can realize simultaneous strain and temperature measurements while keeping the whole system configuration less complex.

  15. Monte Carlo simulation of biomolecular systems with BIOMCSIM

    NASA Astrophysics Data System (ADS)

    Kamberaj, H.; Helms, V.

    2001-12-01

    A new Monte Carlo simulation program, BIOMCSIM, is presented that has been developed in particular to simulate the behaviour of biomolecular systems, leading to insights and understanding of their functions. The computational complexity of Monte Carlo simulations of high-density systems, with large molecules like proteins immersed in a solvent medium, or when simulating the dynamics of water molecules in a protein cavity, is enormous. The program presented in this paper seeks to provide these desirable features, putting special emphasis on simulations in grand canonical ensembles. It uses different biasing techniques to increase the convergence of simulations, and periodic load balancing in its parallel version, to maximally utilize the available computer power. In periodic systems, the long-ranged electrostatic interactions can be treated by Ewald summation. The program is modularly organized and implemented in an ANSI C dialect, so as to enhance its modifiability. Its performance is demonstrated in benchmark applications for the proteins BPTI and Cytochrome c Oxidase.
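
    The core Metropolis acceptance step that such a program builds on can be sketched for a toy one-particle system (illustrative only; BIOMCSIM's biasing techniques, Ewald summation, and grand canonical insertion/deletion moves are beyond this sketch, and all parameters are assumptions).

```python
# Minimal Metropolis Monte Carlo for a single particle in a harmonic
# potential E(x) = x^2; demonstrates only the acceptance rule
# min(1, exp(-beta * dE)) that biomolecular MC codes build on.
import math
import random

def metropolis(n_steps, beta=1.0, step=0.5, seed=42):
    rng = random.Random(seed)
    x, energy = 0.0, 0.0
    samples = []
    for _ in range(n_steps):
        trial = x + rng.uniform(-step, step)
        trial_energy = trial * trial
        # Accept downhill moves always, uphill moves with exp(-beta * dE).
        if (trial_energy <= energy
                or rng.random() < math.exp(-beta * (trial_energy - energy))):
            x, energy = trial, trial_energy
        samples.append(x)
    return samples

samples = metropolis(20000)
mean_e = sum(x * x for x in samples) / len(samples)
# Equipartition for E = x^2 at beta = 1 gives a mean energy near 0.5.
print(round(mean_e, 2))
```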

  16. PCLIPS: Parallel CLIPS

    NASA Technical Reports Server (NTRS)

    Gryphon, Coranth D.; Miller, Mark D.

    1991-01-01

    PCLIPS (Parallel CLIPS) is a set of extensions to the C Language Integrated Production System (CLIPS) expert system language. PCLIPS is intended to provide an environment for the development of more complex, extensive expert systems. Multiple CLIPS expert systems are now capable of running simultaneously on separate processors, or separate machines, thus dramatically increasing the scope of solvable tasks within the expert systems. As a tool for parallel processing, PCLIPS allows for an expert system to add to its fact-base information generated by other expert systems, thus allowing systems to assist each other in solving a complex problem. This allows individual expert systems to be more compact and efficient, and thus run faster or on smaller machines.

  17. A psychoanalyst views inception.

    PubMed

    Clemens, Norman A

    2013-05-01

    The author, a psychoanalyst, discusses the 2010 film, Inception, discerning the parallels and differences between cinematic dreaming states as shown in the film and psychoanalytic processes. The movie presents the unknown and un-psychoanalytic phenomena of group shared dreaming, manipulation of other people's dreams with criminal intent, and multiple structured layers of dreaming. In parallel, however, the lead character appears to work through a complicated state of derealization, mourning, guilt, rage, and loss in the course of dreaming.

  18. In silico optimization of pharmacokinetic properties and receptor binding affinity simultaneously: a 'parallel progression approach to drug design' applied to β-blockers.

    PubMed

    Advani, Poonam; Joseph, Blessy; Ambre, Premlata; Pissurlenkar, Raghuvir; Khedkar, Vijay; Iyer, Krishna; Gabhe, Satish; Iyer, Radhakrishnan P; Coutinho, Evans

    2016-01-01

    The present work exploits the potential of in silico approaches for minimizing attrition of leads in the later stages of drug development. We propose a theoretical approach wherein 'parallel' information is generated to simultaneously optimize the pharmacokinetics (PK) and pharmacodynamics (PD) of lead candidates. β-blockers, though in use for many years, have suboptimal PK; hence they are an ideal test series for the 'parallel progression approach'. This approach utilizes molecular modeling tools, viz. hologram quantitative structure-activity relationships, homology modeling, docking, and predictive metabolism and toxicity models. Validated models have been developed for PK parameters such as volume of distribution (log Vd) and clearance (log Cl), which together influence the half-life (t1/2) of a drug. Simultaneously, models for PD in terms of the inhibition constant pKi have been developed. Thus, the PK and PD properties of β-blockers were concurrently analyzed, and after iterative cycling, modifications were proposed that led to compounds with optimized PK and PD. We report some of the resultant re-engineered β-blockers with improved half-lives and pKi values comparable with marketed β-blockers. These were further analyzed by docking studies to evaluate their binding poses. Finally, metabolic and toxicological assessment of these molecules was done through in silico methods. The strategy proposed herein has potential universal applicability and can be used in any drug discovery scenario, provided that the data used are consistent in terms of experimental conditions, endpoints, and methods employed. Thus the 'parallel progression approach' helps to simultaneously fine-tune various properties of the drug and would be an invaluable tool during the drug development process.

  19. German-Korean cooperation for erection and test of industrialized solar technologies

    NASA Astrophysics Data System (ADS)

    Pfeiffer, H.

    1986-01-01

    A combined small solar-wind power station and a solar-thermal experimental plant were built. The plants are designed to demonstrate the effective exploitation of solar energy and wind energy and enhanced availability achievable through combination of these two energy sources. A 14 kW wind energy converter and a 2.5 kW solar-cell generator were operated in parallel. The biaxial tracking system used on the solar generator leads to increased and constant generation of electricity throughout the day. A consumer control system switches the energy generators and the consumers in autonomous mode according to changing supply and demand. The solar powered air conditioning unit operates with an absorption type refrigerating unit, high-output flat collectors and an automatic control system. All design values are achieved on start-up of the plant.

  20. An experimental design method leading to chemical Turing patterns.

    PubMed

    Horváth, Judit; Szalai, István; De Kepper, Patrick

    2009-05-08

    Chemical reaction-diffusion patterns often serve as prototypes for pattern formation in living systems, but only two isothermal single-phase reaction systems have produced sustained stationary reaction-diffusion patterns so far. We designed an experimental method to search for additional systems on the basis of three steps: (i) generate spatial bistability by operating autoactivated reactions in open spatial reactors; (ii) use an independent negative-feedback species to produce spatiotemporal oscillations; and (iii) induce a space-scale separation of the activatory and inhibitory processes with a low-mobility complexing agent. We successfully applied this method to a hydrogen-ion autoactivated reaction, the thiourea-iodate-sulfite (TuIS) reaction, and noticeably produced stationary hexagonal arrays of spots and parallel stripes of pH patterns attributed to a Turing bifurcation. This method could be extended to biochemical reactions.
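    A minimal numerical sketch (not the TuIS chemistry; the kinetic coefficients are invented) of the diffusion-driven Turing instability the authors exploit: a linearized activator-inhibitor pair on a 1D periodic grid in which the inhibitor diffuses much faster than the activator, exactly the space-scale separation step (iii) engineers with a low-mobility complexing agent. Small random perturbations of the uniform state grow into spatial structure; with equal diffusivities they would decay.

```python
import math, random

random.seed(0)
n, dt, steps = 64, 0.02, 1000
Du, Dv = 0.1, 10.0                      # slow activator, fast inhibitor
fu, fv, gu, gv = 1.0, -2.0, 3.0, -4.0   # locally stable kinetics (tr<0, det>0)
u = [random.uniform(-1e-3, 1e-3) for _ in range(n)]
v = [random.uniform(-1e-3, 1e-3) for _ in range(n)]

def lap(x, i):
    """Discrete Laplacian with periodic boundaries."""
    return x[(i - 1) % n] - 2 * x[i] + x[(i + 1) % n]

std0 = math.sqrt(sum(x * x for x in u) / n)
for _ in range(steps):
    # Both comprehensions read the *old* u, v before reassignment.
    u, v = (
        [u[i] + dt * (fu * u[i] + fv * v[i] + Du * lap(u, i)) for i in range(n)],
        [v[i] + dt * (gu * u[i] + gv * v[i] + Dv * lap(v, i)) for i in range(n)],
    )
std1 = math.sqrt(sum(x * x for x in u) / n)
print(std1 > 20 * std0)
```

    The uniform (zero-wavenumber) mode decays because the local kinetics are stable, while a band of finite wavenumbers grows: the hallmark of a Turing bifurcation.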

  1. System and method for representing and manipulating three-dimensional objects on massively parallel architectures

    DOEpatents

    Karasick, M.S.; Strip, D.R.

    1996-01-30

    A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modeling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modeling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modeling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication. 8 figs.
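    A hedged sketch of the directed-edge idea described above (the record names and representation below are illustrative, not taken from the patent): each d-edge record ties one directed edge of a polyhedral model to exactly one incident face, so every face-edge incidence becomes an independent unit of work that a processor can handle without communicating with its neighbors.

```python
from collections import Counter

def d_edges(faces):
    """Yield (tail_vertex, head_vertex, face_id) records, one per
    face-edge incidence, for faces given as vertex-index loops."""
    for fid, loop in enumerate(faces):
        for a, b in zip(loop, loop[1:] + loop[:1]):
            yield (a, b, fid)

# A cube: 8 vertices (0..7), 6 quadrilateral faces, consistently oriented.
cube = [(0, 1, 2, 3), (7, 6, 5, 4), (0, 4, 5, 1),
        (1, 5, 6, 2), (2, 6, 7, 3), (3, 7, 4, 0)]
records = list(d_edges(cube))
# Each undirected edge of a closed 2-manifold is covered by exactly two
# d-edges, one from each incident face, running in opposite directions.
undirected = Counter(frozenset((a, b)) for a, b, _ in records)
print(len(records), len(undirected), set(undirected.values()))
```

    For the cube this yields 24 d-edge records over 12 undirected edges, each covered twice; distributing the records across uniquely labeled processors gives the communication-free parallelism the patent describes.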

  2. Parallelization of the FLAPW method and comparison with the PPW method

    NASA Astrophysics Data System (ADS)

    Canning, Andrew; Mannstadt, Wolfgang; Freeman, Arthur

    2000-03-01

    The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining electronic and magnetic properties of crystals and surfaces. In the past the FLAPW method has been limited to systems of about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell running on up to 512 processors on a Cray T3E parallel supercomputer. Some results will also be presented on a comparison of the plane-wave pseudopotential method and the FLAPW method on large systems.

  3. IOPA: I/O-aware parallelism adaption for parallel programs

    PubMed Central

    Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

    2017-01-01

    With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads. PMID:28278236

  4. IOPA: I/O-aware parallelism adaption for parallel programs.

    PubMed

    Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

    2017-01-01

    With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads.
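    A hedged sketch of the adaptation idea behind IOPA (the paper's actual mechanism and programming interface differ; the throughput model here is a toy): adjust the I/O thread count by hill-climbing on measured throughput, which typically rises with parallelism and then falls once the I/O subsystem saturates.

```python
def throughput(k):
    """Toy throughput model: gains up to about 8 threads, contention beyond."""
    return k * 100.0 / (1.0 + (k / 8.0) ** 2)

def adapt(k=1, rounds=30):
    """Hill-climb the thread count toward the throughput optimum."""
    for _ in range(rounds):
        if throughput(k + 1) > throughput(k):
            k += 1
        elif throughput(max(1, k - 1)) > throughput(k):
            k = max(1, k - 1)
        else:
            break                      # local optimum: keep this pool size
    return k

print(adapt())
```

    In a real system `throughput` would be a periodic measurement over a window of completed requests rather than a closed-form function, but the control loop is the same shape.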

  5. Parallel rendering

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.

  6. Parallel processing data network of master and slave transputers controlled by a serial control network

    DOEpatents

    Crosetto, D.B.

    1996-12-31

    The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor to a plurality of slave processors to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer, a digital signal processor, a parallel transfer controller, and two three-port memory devices. A communication switch within each node connects it to a fast parallel hardware channel through which all high density data arrives or leaves the node. 6 figs.

  7. Parallel processing data network of master and slave transputers controlled by a serial control network

    DOEpatents

    Crosetto, Dario B.

    1996-01-01

    The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node (100) connects it to a fast parallel hardware channel (70) through which all high density data arrives or leaves the node.

  8. Parallel Evolution of Cold Tolerance within Drosophila melanogaster

    PubMed Central

    Braun, Dylan T.; Lack, Justin B.

    2017-01-01

    Drosophila melanogaster originated in tropical Africa before expanding into strikingly different temperate climates in Eurasia and beyond. Here, we find elevated cold tolerance in three distinct geographic regions: beyond the well-studied non-African case, we show that populations from the highlands of Ethiopia and South Africa have significantly increased cold tolerance as well. We observe greater cold tolerance in outbred versus inbred flies, but only in populations with higher inversion frequencies. Each cold-adapted population shows lower inversion frequencies than a closely-related warm-adapted population, suggesting that inversion frequencies may decrease with altitude in addition to latitude. Using the FST-based “Population Branch Excess” statistic (PBE), we found only limited evidence for parallel genetic differentiation at the scale of ∼4 kb windows, specifically between Ethiopian and South African cold-adapted populations. And yet, when we looked for single nucleotide polymorphisms (SNPs) with codirectional frequency change in two or three cold-adapted populations, strong genomic enrichments were observed from all comparisons. These findings could reflect an important role for selection on standing genetic variation leading to “soft sweeps”. One SNP showed sufficient codirectional frequency change in all cold-adapted populations to achieve experiment-wide significance: an intronic variant in the synaptic gene Prosap. Another codirectional outlier SNP, at senseless-2, had a strong association with our cold trait measurements, but in the opposite direction as predicted. More generally, proteins involved in neurotransmission were enriched as potential targets of parallel adaptation. The ability to study cold tolerance evolution in a parallel framework will enhance this classic study system for climate adaptation. PMID:27777283

  9. Hierarchical Fuzzy Control Applied to Parallel Connected UPS Inverters Using Average Current Sharing Scheme

    NASA Astrophysics Data System (ADS)

    Singh, Santosh Kumar; Ghatak Choudhuri, Sumit

    2018-05-01

    Parallel connection of UPS inverters to enhance power rating is a widely accepted practice. Inter-modular circulating currents appear when multiple inverter modules are connected in parallel to supply a variable critical load. Interfacing the modules therefore requires an intensive design with a proper control strategy. The potential of intuitive fuzzy logic (FL) control for systems with imprecise models is well known, and it can thus be utilised in parallel-connected UPS systems. A conventional FL controller is computationally intensive, especially with a higher number of input variables. This paper proposes the application of hierarchical fuzzy logic control to a parallel-connected multi-modular inverter system to reduce the computational burden on the processor for a given switching frequency. Simulated results in the MATLAB environment and experimental verification using a Texas Instruments TMS320F2812 DSP are included to demonstrate the feasibility of the proposed control scheme.
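    A hedged sketch of why hierarchy reduces the computational burden (this is the standard rule-count argument for hierarchical fuzzy systems, not a figure from the paper): a flat fuzzy controller with N inputs and M fuzzy sets per input needs M^N rules, while a cascade of two-input units needs only (N - 1) * M^2.

```python
def flat_rules(n_inputs, m_sets):
    """Rule count of a conventional (flat) fuzzy controller."""
    return m_sets ** n_inputs

def hierarchical_rules(n_inputs, m_sets):
    """Rule count of a cascade of two-input fuzzy units."""
    return (n_inputs - 1) * m_sets ** 2

for n in (2, 4, 6):
    print(n, flat_rules(n, 5), hierarchical_rules(n, 5))
```

    The exponential-versus-linear growth in N is what makes the hierarchical scheme attractive on a fixed-budget DSP at a given switching frequency.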

  10. Massively parallel E-beam inspection: enabling next-generation patterned defect inspection for wafer and mask manufacturing

    NASA Astrophysics Data System (ADS)

    Malloy, Matt; Thiel, Brad; Bunday, Benjamin D.; Wurm, Stefan; Mukhtar, Maseeh; Quoi, Kathy; Kemen, Thomas; Zeidler, Dirk; Eberle, Anna Lena; Garbowski, Tomasz; Dellemann, Gregor; Peters, Jan Hendrik

    2015-03-01

    SEMATECH aims to identify and enable disruptive technologies to meet the ever-increasing demands of semiconductor high volume manufacturing (HVM). As such, a program was initiated in 2012 focused on high-speed e-beam defect inspection as a complement, and eventual successor, to bright field optical patterned defect inspection [1]. The primary goal is to enable a new technology to overcome the key gaps that are limiting modern day inspection in the fab; primarily, throughput and sensitivity to detect ultra-small critical defects. The program specifically targets revolutionary solutions based on massively parallel e-beam technologies, as opposed to incremental improvements to existing e-beam and optical inspection platforms. Wafer inspection is the primary target, but attention is also being paid to next generation mask inspection. During the first phase of the multi-year program multiple technologies were reviewed, a down-selection was made to the top candidates, and evaluations began on proof of concept systems. A champion technology has been selected and as of late 2014 the program has begun to move into the core technology maturation phase in order to enable eventual commercialization of an HVM system. Performance data from early proof of concept systems will be shown along with roadmaps to achieving HVM performance. SEMATECH's vision for moving from early-stage development to commercialization will be shown, including plans for development with industry leading technology providers.

  11. Detecting opportunities for parallel observations on the Hubble Space Telescope

    NASA Technical Reports Server (NTRS)

    Lucks, Michael

    1992-01-01

    The presence of multiple scientific instruments aboard the Hubble Space Telescope provides opportunities for parallel science, i.e., the simultaneous use of different instruments for different observations. Determining whether candidate observations are suitable for parallel execution depends on numerous criteria (some involving quantitative tradeoffs) that may change frequently. A knowledge-based approach is presented for constructing a scoring function to rank candidate pairs of observations for parallel science. In the Parallel Observation Matching System (POMS), spacecraft knowledge and schedulers' preferences are represented using a uniform set of mappings, or knowledge functions. Assessment of parallel science opportunities is achieved via composition of the knowledge functions in a prescribed manner. The knowledge acquisition and explanation facilities of the system are also presented. The methodology is applicable to many other multiple-criteria assessment problems.
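    A hedged sketch of the knowledge-function composition described above (the criteria names, fields, and multiplicative composition rule are invented for illustration; POMS's actual mappings differ): each criterion maps a candidate observation pair to [0, 1], and the composed score ranks pairs for parallel execution.

```python
def pointing_compatible(pair):          # do the observations share a field?
    return 1.0 if pair["same_field"] else 0.0

def duration_overlap(pair):             # fraction of exposure-time overlap
    return max(0.0, min(1.0, pair["overlap"]))

def instrument_conflict(pair):          # hard veto on a shared instrument
    return 0.0 if pair["same_instrument"] else 1.0

CRITERIA = [pointing_compatible, duration_overlap, instrument_conflict]

def score(pair):
    s = 1.0
    for f in CRITERIA:                  # multiplicative: any veto zeroes the score
        s *= f(pair)
    return s

pairs = [
    {"same_field": True, "overlap": 0.9, "same_instrument": False},
    {"same_field": True, "overlap": 0.4, "same_instrument": False},
    {"same_field": True, "overlap": 0.9, "same_instrument": True},
]
ranked = sorted(pairs, key=score, reverse=True)
print([round(score(p), 2) for p in ranked])
```

    Keeping every criterion in the same uniform functional form is what lets schedulers add, drop, or reweight criteria as preferences change, without touching the ranking machinery.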

  12. Convergence issues in domain decomposition parallel computation of hovering rotor

    NASA Astrophysics Data System (ADS)

    Xiao, Zhongyun; Liu, Gang; Mou, Bin; Jiang, Xiong

    2018-05-01

    The implicit LU-SGS time-integration algorithm has been widely used in parallel computation despite its lack of information from adjacent domains. When applied to the parallel computation of hovering rotor flows in a rotating frame, it brings about convergence issues. To remedy the problem, three LU-factorization-based implicit schemes (LU-SGS, DP-LUR and HLU-SGS) are investigated comparatively. A test case of pure grid rotation is designed to verify these algorithms; it shows that the LU-SGS algorithm introduces errors on boundary cells. When partition boundaries are circumferential, errors arise in proportion to grid speed, accumulate as the rotation proceeds, and ultimately lead to computational failure. Meanwhile, the DP-LUR and HLU-SGS methods show good convergence owing to their boundary treatment, which makes them desirable in domain-decomposition parallel computations.

  13. 1060-nm VCSEL-based parallel-optical modules for optical interconnects

    NASA Astrophysics Data System (ADS)

    Nishimura, N.; Nagashima, K.; Kise, T.; Rizky, A. F.; Uemura, T.; Nekado, Y.; Ishikawa, Y.; Nasu, H.

    2015-03-01

    The capability of mounting a parallel-optical module onto a PCB through a solder-reflow process helps reduce the number of piece parts, simplify the assembly process, and minimize the footprint for both AOC and on-board applications. We introduce solder-reflow-capable parallel-optical modules employing 1060-nm InGaAs/GaAs VCSELs, which offer wider modulation bandwidth, longer transmission distance, and higher reliability. We demonstrate 4-channel parallel optical link performance operated at a bit stream of 28 Gb/s 2^31-1 PRBS for each channel and transmitted through a 50-μm-core MMF beyond 500 m. We also introduce a new mounting technology for the parallel-optical module that maintains good coupling and robust electrical connection during the solder-reflow process between an optical module and a polymer-waveguide-embedded PCB.

  14. Parallel image reconstruction for 3D positron emission tomography from incomplete 2D projection data

    NASA Astrophysics Data System (ADS)

    Guerrero, Thomas M.; Ricci, Anthony R.; Dahlbom, Magnus; Cherry, Simon R.; Hoffman, Edward T.

    1993-07-01

    The problem of excessive computational time in 3D positron emission tomography (3D PET) reconstruction is defined, and we present an approach to solving it through the construction of an inexpensive parallel processing system and the adoption of the FAVOR algorithm. Currently, the 3D reconstruction of the 610 images of a total-body procedure would require 80 hours, and the 3D reconstruction of the 620 images of a dynamic study would require 110 hours. An inexpensive parallel processing system for 3D PET reconstruction is constructed by integrating board-level products from multiple vendors. The system achieves its computational performance through the use of 6U VME boards each carrying four i860 processors; the processor boards from five manufacturers are discussed from our perspective. The new 3D PET reconstruction algorithm FAVOR (FAst VOlume Reconstructor), which promises a substantial speed improvement, is adopted. Preliminary results from parallelizing FAVOR are utilized in formulating architectural improvements for this problem. In summary, we are addressing the problem of excessive computational time in 3D PET image reconstruction through the construction of an inexpensive parallel processing system and the parallelization of a 3D reconstruction algorithm that uses the incomplete data set produced by current PET systems.

  15. Parallel dynamics between non-Hermitian and Hermitian systems

    NASA Astrophysics Data System (ADS)

    Wang, P.; Lin, S.; Jin, L.; Song, Z.

    2018-06-01

    We reveal a connection between non-Hermitian and Hermitian systems by studying a family of non-Hermitian and Hermitian Hamiltonians based on exact solutions. In general, for a dynamic process in a non-Hermitian system H, there always exists a parallel dynamic process governed by the corresponding Hermitian conjugate system H†. We show that a linear superposition of the two parallel dynamics is exactly equivalent to the time evolution of a state under a Hermitian Hamiltonian H, and we present the relations between {H, H, H†}.

  16. Dynamic performance of high speed solenoid valve with parallel coils

    NASA Astrophysics Data System (ADS)

    Kong, Xiaowu; Li, Shizhen

    2014-07-01

    Methods of improving the dynamic performance of a high-speed on/off solenoid valve include increasing the magnetic force on the armature, increasing the slew rate of the coil current, and decreasing the mass and stroke of the moving parts. An increase of magnetic force usually leads to a decrease of the current slew rate, which can increase the delay time of the dynamic response of the solenoid valve. Driving the coil with a high voltage can resolve this contradiction, but a high driving voltage also leads to higher cost and reduced safety and reliability. In this paper, a new scheme of parallel coils is investigated, in which the single solenoid coil is replaced by parallel coils with the same ampere-turns. Based on a mathematical model of the high-speed solenoid valve, a theoretical formula for the delay time of the solenoid valve is deduced. Both the theoretical analysis and dynamic simulation show that, as far as the delay time is concerned, the effect of dividing a single coil into N parallel sub-coils is close to that of driving the single coil with N times the original driving voltage. A specific test bench is designed to measure the dynamic performance of the high-speed on/off solenoid valve. The experimental results also prove that both the delay time and the switching time of the solenoid valves can be decreased greatly by adopting the parallel-coil scheme. This research presents a simple and practical method to improve the dynamic performance of high-speed on/off solenoid valves.
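    A hedged numeric sketch of the delay-time trade-off discussed above, using the textbook R-L coil model rather than the paper's full valve model (component values are illustrative): the coil current is i(t) = (V/R)(1 - exp(-Rt/L)), so the time to reach the pull-in threshold current i_th is t_d = -(L/R) ln(1 - i_th R/V), which shrinks sharply as the driving voltage rises. The paper's point is that splitting the coil into N parallel sub-coils achieves a comparable reduction without the high-voltage drive.

```python
import math

def delay(V, R=10.0, L=0.05, i_th=1.0):
    """Time for an R-L coil driven at voltage V to reach current i_th.
    Requires V > i_th * R, otherwise the threshold is never reached."""
    return -(L / R) * math.log(1.0 - i_th * R / V)

t1 = delay(20.0)        # single coil at the base driving voltage
t4 = delay(80.0)        # same coil driven at 4x the voltage
print(round(t1 / t4, 1))
```

    Here quadrupling the voltage cuts the delay by roughly a factor of five, because operation moves onto the steep early part of the exponential current rise.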

  17. The generalized accessibility and spectral gap of lower hybrid waves in tokamaks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Takahashi, Hironori

    1994-03-01

    The generalized accessibility of lower hybrid waves, primarily in the current-drive regime of tokamak plasmas, which may include shifting, either upward or downward, of the parallel refractive index (n∥), is investigated, based upon a cold-plasma dispersion relation and various geometrical constraint (G.C.) relations imposed on the behavior of n∥. It is shown that n∥ upshifting can be bounded and insufficient to bridge a large spectral gap to cause wave damping, depending upon whether the G.C. relation allows the oblique resonance to occur. The traditional n∥ upshifting mechanism caused by the pitch angle of magnetic field lines is shown to lead to contradictions with experimental observations. An upshifting mechanism brought about by the density gradient along field lines is proposed, which is not inconsistent with experimental observations, and provides plausible explanations to some unresolved issues of lower hybrid wave theory, including generation of 'seed electrons.'

  18. A parallel solver for huge dense linear systems

    NASA Astrophysics Data System (ADS)

    Badia, J. M.; Movilla, J. L.; Climente, J. I.; Castillo, M.; Marqués, M.; Mayo, R.; Quintana-Ortí, E. S.; Planelles, J.

    2011-11-01

    HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) that facilitates the parallel solution of very large dense systems for scientists and engineers. The API makes use of parallelism to yield an efficient solution of the systems on a wide range of parallel platforms, from clusters of processors to massively parallel multiprocessors. It exploits out-of-core strategies that leverage secondary memory in order to solve huge linear systems of order 100 000. The API is based on the parallel linear algebra library PLAPACK and on its out-of-core (OOC) extension POOCLAPACK. Both PLAPACK and POOCLAPACK use the Message Passing Interface (MPI) as the communication layer and BLAS to perform the local matrix operations. The API provides a friendly interface to users, hiding almost all the technical aspects related to the parallel execution of the code and the use of secondary memory to solve the systems. In particular, the API can automatically select the best way to store and solve the systems, depending on the dimension of the system, the number of processes, and the main memory of the platform. Experimental results on several parallel platforms report high performance, reaching more than 1 TFLOP with 64 cores to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors.

    New version program summary. Program title: Huge Dense System Solver (HDSS). Catalogue identifier: AEHU_v1_1. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEHU_v1_1.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 87 062. No. of bytes in distributed program, including test data, etc.: 1 069 110. Distribution format: tar.gz. Programming language: Fortran 90, C. Computer: parallel architectures: multiprocessors, computer clusters. Operating system: Linux/Unix. Has the code been vectorized or parallelized?: Yes, includes MPI primitives. RAM: tested for up to 190 GB. Classification: 6.5. External routines: MPI (http://www.mpi-forum.org/), BLAS (http://www.netlib.org/blas/), PLAPACK (http://www.cs.utexas.edu/~plapack/), POOCLAPACK (ftp://ftp.cs.utexas.edu/pub/rvdg/PLAPACK/pooclapack.ps) (code for PLAPACK and POOCLAPACK is included in the distribution). Catalogue identifier of previous version: AEHU_v1_0. Journal reference of previous version: Comput. Phys. Comm. 182 (2011) 533. Does the new version supersede the previous version?: Yes. Nature of problem: huge-scale dense systems of linear equations, Ax = B, beyond standard LAPACK capabilities. Solution method: the linear systems are solved by means of parallelized routines based on the LU factorization, using efficient secondary-storage algorithms when the available main memory is insufficient. Reasons for new version: in many applications a high accuracy must be guaranteed in the solution of very large linear systems, which can be achieved by using double-precision arithmetic. Summary of revisions (version 1.1): can be used to solve linear systems using double-precision arithmetic; new version of the initialization routine; the user can choose the kind of arithmetic and the values of several parameters of the environment. Running time: about 5 hours to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors using double-precision arithmetic on an eight-node commodity cluster with a total of 64 Intel cores.
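    A back-of-envelope sketch (the panel width is an invented example, not an HDSS parameter) of why out-of-core storage is needed at the problem sizes reported above: a dense double-precision system of n equations stores n*n 8-byte entries, which for n = 200 000 far exceeds the 190 GB of RAM tested, so the matrix must be staged through disk in panels of b columns needing only n*b entries in memory at a time.

```python
GB = 1024 ** 3

def full_matrix_gb(n, bytes_per_entry=8):
    """In-core footprint of a dense n x n double-precision matrix."""
    return n * n * bytes_per_entry / GB

def panel_gb(n, b, bytes_per_entry=8):
    """Footprint of one n x b column panel staged from secondary storage."""
    return n * b * bytes_per_entry / GB

n = 200_000
print(round(full_matrix_gb(n)))          # whole matrix, in GB
print(round(panel_gb(n, 4096), 1))       # one 4096-column panel, in GB
```

    The full matrix needs roughly 300 GB, while a single panel fits comfortably in a few GB; the OOC LU factorization cycles such panels through main memory.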

  19. Parallel Computing Using Web Servers and "Servlets".

    ERIC Educational Resources Information Center

    Lo, Alfred; Bloor, Chris; Choi, Y. K.

    2000-01-01

    Describes parallel computing and presents inexpensive ways to implement a virtual parallel computer with multiple Web servers. Highlights include performance measurement of parallel systems; models for using Java and intranet technology including single server, multiple clients and multiple servers, single client; and a comparison of CGI (common…

  20. File-access characteristics of parallel scientific workloads

    NASA Technical Reports Server (NTRS)

    Nieuwejaar, Nils; Kotz, David; Purakayastha, Apratim; Best, Michael; Ellis, Carla Schlatter

    1995-01-01

    Phenomenal improvements in the computational performance of multiprocessors have not been matched by comparable gains in I/O system performance. This imbalance has resulted in I/O becoming a significant bottleneck for many scientific applications. One key to overcoming this bottleneck is improving the performance of parallel file systems. The design of a high-performance parallel file system requires a comprehensive understanding of the expected workload. Unfortunately, until recently, no general workload studies of parallel file systems have been conducted. The goal of the CHARISMA project was to remedy this problem by characterizing the behavior of several production workloads, on different machines, at the level of individual reads and writes. The first set of results from the CHARISMA project describes the workloads observed on an Intel iPSC/860 and a Thinking Machines CM-5. This paper is intended to compare and contrast these two workloads for an understanding of their essential similarities and differences, isolating common trends and platform-dependent variances. Using this comparison, we are able to gain more insight into the general principles that should guide parallel file-system design.

  1. Future in biomolecular computation

    NASA Astrophysics Data System (ADS)

    Wimmer, E.

    1988-01-01

    Large-scale computations for biomolecules are dominated by three levels of theory: rigorous quantum mechanical calculations for molecules with up to about 30 atoms, semi-empirical quantum mechanical calculations for systems with up to several hundred atoms, and force-field molecular dynamics studies of biomacromolecules with 10,000 atoms and more including surrounding solvent molecules. It can be anticipated that increased computational power will allow the treatment of larger systems of ever growing complexity. Due to the scaling of the computational requirements with increasing number of atoms, the force-field approaches will benefit the most from increased computational power. On the other hand, progress in methodologies such as density functional theory will enable us to treat larger systems on a fully quantum mechanical level and a combination of molecular dynamics and quantum mechanics can be envisioned. One of the greatest challenges in biomolecular computation is the protein folding problem. It is unclear at this point, if an approach with current methodologies will lead to a satisfactory answer or if unconventional, new approaches will be necessary. In any event, due to the complexity of biomolecular systems, a hierarchy of approaches will have to be established and used in order to capture the wide ranges of length-scales and time-scales involved in biological processes. In terms of hardware development, speed and power of computers will increase while the price/performance ratio will become more and more favorable. Parallelism can be anticipated to become an integral architectural feature in a range of computers. It is unclear at this point, how fast massively parallel systems will become easy enough to use so that new methodological developments can be pursued on such computers. 
Current trends show that distributed processing, such as the combination of convenient graphics workstations and powerful general-purpose supercomputers, will lead to a new style of computing in which the calculations are monitored and manipulated as they proceed. The combination of a numeric approach with artificial-intelligence approaches can be expected to open up entirely new possibilities. Ultimately, the most exciting aspect of the future in biomolecular computing will be the unexpected discoveries.

  2. Computational strategies for three-dimensional flow simulations on distributed computer systems. Ph.D. Thesis Semiannual Status Report, 15 Aug. 1993 - 15 Feb. 1994

    NASA Technical Reports Server (NTRS)

    Weed, Richard Allen; Sankar, L. N.

    1994-01-01

    An increasing amount of research activity in computational fluid dynamics has been devoted to the development of efficient algorithms for parallel computing systems. The increasing performance-to-price ratio of engineering workstations has led to research into procedures for implementing a parallel computing system composed of distributed workstations. This thesis proposal outlines an ongoing research program to develop efficient strategies for performing three-dimensional flow analysis on distributed computing systems. The PVM parallel programming interface was used to modify an existing three-dimensional flow solver, the TEAM code developed by Lockheed for the Air Force, to function as a parallel flow solver on clusters of workstations. Steady flow solutions were generated for three different wing and body geometries to validate the code and evaluate code performance. The proposed research will extend the parallel code development to determine the most efficient strategies for unsteady flow simulations.

  3. Development of a 3D parallel mechanism robot arm with three vertical-axial pneumatic actuators combined with a stereo vision system.

    PubMed

    Chiang, Mao-Hsiung; Lin, Hao-Ting

    2011-01-01

    This study aimed to develop a novel 3D parallel mechanism robot driven by three vertical-axial pneumatic actuators with a stereo vision system for path tracking control. The mechanical system and the control system are the primary novel parts for developing a 3D parallel mechanism robot. In the mechanical system, a 3D parallel mechanism robot contains three serial chains, a fixed base, a movable platform and a pneumatic servo system. The parallel mechanism is designed and analyzed first for realizing a 3D motion in the X-Y-Z coordinate system of the robot's end-effector. The inverse kinematics and the forward kinematics of the parallel mechanism robot are investigated by using the Denavit-Hartenberg notation (D-H notation) coordinate system. The pneumatic actuators in the three vertical motion axes are modeled. In the control system, the Fourier series-based adaptive sliding-mode controller with H(∞) tracking performance is used to design the path tracking controllers of the three vertical servo pneumatic actuators for realizing 3D path tracking control of the end-effector. Three optical linear scales are used to measure the position of the three pneumatic actuators. The 3D position of the end-effector is then calculated from the measured positions of the three pneumatic actuators by means of the kinematics. However, the calculated 3D position of the end-effector cannot account for the manufacturing and assembly tolerances of the joints and the parallel mechanism, so errors exist between the actual and the calculated 3D position of the end-effector. In order to improve this situation, sensor collaboration is developed in this paper. A stereo vision system is used to collaborate with the three position sensors of the pneumatic actuators. The stereo vision system, combining two CCDs, serves to measure the actual 3D position of the end-effector and calibrate the error between the actual and the calculated 3D position of the end-effector. 
Furthermore, to verify the feasibility of the proposed parallel mechanism robot driven by three vertical pneumatic servo actuators, a full-scale test rig of the proposed parallel mechanism pneumatic robot is set up. Thus, simulations and experiments for different complex 3D motion profiles of the robot end-effector can be successfully achieved. The desired, the actual and the calculated 3D position of the end-effector can be compared in the complex 3D motion control.
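For an idealized version of such a mechanism (three vertical prismatic actuators at fixed base positions, each joined to the end-effector by a rigid link of length L), the inverse and forward kinematics can be sketched as below. The geometry, link length, and base positions are hypothetical stand-ins, not the paper's actual D-H parameters:

```python
# Hedged sketch: kinematics of an idealized 3-vertical-actuator parallel mechanism.
# Each slider at base XY position P[i] with height h[i] connects to the end-effector
# by a link of length L, giving three sphere constraints |x - (P[i], h[i])| = L.
import numpy as np
from scipy.optimize import fsolve

P = np.array([[0.3, 0.0], [-0.15, 0.26], [-0.15, -0.26]])  # hypothetical base layout (m)
L = 0.5                                                     # hypothetical link length (m)

def inverse_kinematics(x, y, z):
    """Slider heights h_i for a desired end-effector position (upper branch)."""
    d2 = np.sum((P - [x, y]) ** 2, axis=1)
    return z + np.sqrt(L**2 - d2)

def forward_kinematics(h, guess=(0.0, 0.0, 0.0)):
    """End-effector position from slider heights via the three link constraints."""
    def constraints(q):
        x, y, z = q
        return np.sum((P - [x, y]) ** 2, axis=1) + (z - h) ** 2 - L**2
    return fsolve(constraints, guess)

h = inverse_kinematics(0.05, -0.02, 0.1)   # IK: pose -> actuator heights
pos = forward_kinematics(h)                # FK: actuator heights -> pose (roundtrip)
```

The roundtrip (IK then FK) recovers the commanded pose; in the paper this calculated pose is what the stereo vision system calibrates against the actual one.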

  4. Development of a 3D Parallel Mechanism Robot Arm with Three Vertical-Axial Pneumatic Actuators Combined with a Stereo Vision System

    PubMed Central

    Chiang, Mao-Hsiung; Lin, Hao-Ting

    2011-01-01

    This study aimed to develop a novel 3D parallel mechanism robot driven by three vertical-axial pneumatic actuators with a stereo vision system for path tracking control. The mechanical system and the control system are the primary novel parts for developing a 3D parallel mechanism robot. In the mechanical system, a 3D parallel mechanism robot contains three serial chains, a fixed base, a movable platform and a pneumatic servo system. The parallel mechanism is designed and analyzed first for realizing a 3D motion in the X-Y-Z coordinate system of the robot’s end-effector. The inverse kinematics and the forward kinematics of the parallel mechanism robot are investigated by using the Denavit-Hartenberg notation (D-H notation) coordinate system. The pneumatic actuators in the three vertical motion axes are modeled. In the control system, the Fourier series-based adaptive sliding-mode controller with H∞ tracking performance is used to design the path tracking controllers of the three vertical servo pneumatic actuators for realizing 3D path tracking control of the end-effector. Three optical linear scales are used to measure the position of the three pneumatic actuators. The 3D position of the end-effector is then calculated from the measured positions of the three pneumatic actuators by means of the kinematics. However, the calculated 3D position of the end-effector cannot account for the manufacturing and assembly tolerances of the joints and the parallel mechanism, so errors exist between the actual and the calculated 3D position of the end-effector. In order to improve this situation, sensor collaboration is developed in this paper. A stereo vision system is used to collaborate with the three position sensors of the pneumatic actuators. The stereo vision system, combining two CCDs, serves to measure the actual 3D position of the end-effector and calibrate the error between the actual and the calculated 3D position of the end-effector. 
Furthermore, to verify the feasibility of the proposed parallel mechanism robot driven by three vertical pneumatic servo actuators, a full-scale test rig of the proposed parallel mechanism pneumatic robot is set up. Thus, simulations and experiments for different complex 3D motion profiles of the robot end-effector can be successfully achieved. The desired, the actual and the calculated 3D position of the end-effector can be compared in the complex 3D motion control. PMID:22247676

  5. Visible quality aluminum and nickel superpolish polishing technology enabling new missions

    NASA Astrophysics Data System (ADS)

    Carrigan, Keith G.

    2011-06-01

    It is now well understood that with US Department of Defense (DoD) budgets shrinking and the Services and Agencies demanding new systems which can be fielded more quickly, cost and schedule are being emphasized more and more. At the same time, the US has ever growing needs for advanced capabilities to support evolving Intelligence, Surveillance and Reconnaissance objectives. In response to this market demand for ever more cost-effective, faster to market, single-channel, athermal optical systems, we have developed new metal polishing technologies which allow for short-lead, low-cost metal substrates to replace more costly, longer-lead material options. In parallel, the commercial marketplace is being driven continually to release better, faster and cheaper electronics. Growth according to Moore's law, enabled by advancements in photolithography, has produced denser memory, higher resolution displays and faster processors. While the quality of these products continues to increase, their price is falling. This seeming paradox is driven by industry advancements in manufacturing technology. The next steps on this curve can be realized via polishing technology which allows low-cost metal substrates to replace costly silicon-based optics for use in ultra-short wavelength systems.

  6. Optics Program Modified for Multithreaded Parallel Computing

    NASA Technical Reports Server (NTRS)

    Lou, John; Bedding, Dave; Basinger, Scott

    2006-01-01

    A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application programming interface that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading is also used in the diffraction propagation of light in MACOS based on pthreads [POSIX Thread, (where "POSIX" signifies a portable operating system for UNIX)]. In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.
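Ray tracing parallelizes so well because each ray is independent of the others, so the trace loop can simply be divided among threads. A structural analogue in Python (the actual MACOS work used OpenMP in the program's own source; `trace_ray` here is a hypothetical stand-in, and Python threads illustrate only the structure, not the reported linear speedup):

```python
# Structural sketch of loop-level parallelization over independent rays,
# analogous to an OpenMP "parallel for" over the ray-tracing loop.
from concurrent.futures import ThreadPoolExecutor
import math

def trace_ray(i):
    # Hypothetical stand-in for one per-ray optical path computation.
    return math.sin(i) ** 2

rays = range(1000)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(trace_ray, rays))   # rays traced concurrently

total = sum(results)
```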

  7. A hybrid algorithm for parallel molecular dynamics simulations

    NASA Astrophysics Data System (ADS)

    Mangiardi, Chris M.; Meyer, R.

    2017-10-01

    This article describes algorithms for the hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-range forces. The parallelization method combines domain decomposition with a thread-based parallelization approach. The goal of the work is to enable efficient simulations of very large (tens of millions of atoms) and inhomogeneous systems on many-core processors with hundreds or thousands of cores and SIMD units with large vector sizes. In order to test the efficiency of the method, simulations of a variety of configurations with up to 74 million atoms have been performed. Results are shown that were obtained on multi-core systems with Sandy Bridge and Haswell processors as well as systems with Xeon Phi many-core processors.

  8. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    NASA Astrophysics Data System (ADS)

    Nash, Thomas

    1989-12-01

    This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC system, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described.

  9. The AIS-5000 parallel processor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schmitt, L.A.; Wilson, S.S.

    1988-05-01

    The AIS-5000 is a commercially available massively parallel processor which has been designed to operate in an industrial environment. It has fine-grained parallelism with up to 1024 processing elements arranged in a single-instruction multiple-data (SIMD) architecture. The processing elements are arranged in a one-dimensional chain that, for computer vision applications, can be as wide as the image itself. This architecture has superior cost/performance characteristics to two-dimensional mesh-connected systems. The design of the processing elements and their interconnections as well as the software used to program the system allow a wide variety of algorithms and applications to be implemented. In this paper, the overall architecture of the system is described. Various components of the system are discussed, including details of the processing elements, data I/O pathways and parallel memory organization. A virtual two-dimensional model for programming image-based algorithms for the system is presented. This model is supported by the AIS-5000 hardware and software and allows the system to be treated as a full-image-size, two-dimensional, mesh-connected parallel processor. Performance benchmarks are given for certain simple and complex functions.
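The SIMD style of such a machine, one instruction applied to every processing element in lockstep, with data exchanged along the one-dimensional chain, can be mimicked with array operations. A toy sketch (NumPy arrays standing in for the chain of PEs; nothing here is AIS-5000 software):

```python
# SIMD-style sketch: one "instruction" (array operation) acts on all PEs at once.
# One pixel of an image row per processing element; neighbor exchange along the
# chain is modeled with a circular shift (a real chain would handle boundaries).
import numpy as np

row = np.arange(8, dtype=float)          # toy image row, one value per PE
left = np.roll(row, 1)                   # each PE reads its left neighbor
right = np.roll(row, -1)                 # each PE reads its right neighbor
smoothed = (left + row + right) / 3.0    # 3-point average executed in lockstep
```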

  10. Code Optimization and Parallelization on the Origins: Looking from Users' Perspective

    NASA Technical Reports Server (NTRS)

    Chang, Yan-Tyng Sherry; Thigpen, William W. (Technical Monitor)

    2002-01-01

    Parallel machines are becoming the main compute engines for high performance computing. Despite their increasing popularity, it is still a challenge for most users to learn the basic techniques to optimize/parallelize their codes on such platforms. In this paper, we present some experiences on learning these techniques for the Origin systems at the NASA Advanced Supercomputing Division. Emphasis of this paper will be on a few essential issues (with examples) that general users should master when they work with the Origins as well as other parallel systems.

  11. Parallel computation using boundary elements in solid mechanics

    NASA Technical Reports Server (NTRS)

    Chien, L. S.; Sun, C. T.

    1990-01-01

    The inherent parallelism of the boundary element method is shown. The boundary element is formulated by assuming the linear variation of displacements and tractions within a line element. Moreover, the MACSYMA symbolic program is employed to obtain the analytical results for influence coefficients. Three computational components are parallelized in this method to show the speedup and efficiency in computation. The global coefficient matrix is first formed concurrently. Then, the parallel Gaussian elimination solution scheme is applied to solve the resulting system of equations. Finally, and more importantly, the domain solutions of a given boundary value problem are calculated simultaneously. Linear speedups and high efficiencies are shown for a demonstration problem solved on the Sequent Symmetry S81 parallel computing system.

  12. [Constrained competition in parallel drug importation: the case of simvastatin in Germany, the Netherlands, and the United Kingdom].

    PubMed

    Costa-Font, Joan; Kanavos, Panos

    2007-01-01

    To examine the effects of parallel simvastatin importation on drug price in three of the main parallel importing countries in the European Union, namely the United Kingdom, Germany, and the Netherlands. To estimate the market share of parallel imported simvastatin and the unit price (both locally produced and parallel imported) adjusted by defined daily dose in the importing country and in the exporting country (Spain). Ordinary least squares regression was used to examine the potential price competition resulting from parallel drug trade between 1997 and 2002. The market share of parallel imported simvastatin progressively expanded (especially in the United Kingdom and Germany) in the period examined, although the price difference between parallel imported and locally sourced simvastatin was not significant. Prices tended to rise in the United Kingdom and Germany and declined in the Netherlands. We found no evidence of pro-competitive effects resulting from the expansion of parallel trade. The development of parallel drug importation in the European Union produced unexpected effects (limited competition) on prices that differ from those expected by the introduction of a new competitor. This is partially the result of drug price regulation's scant incentives for competition and of the lack of transparency in the drug reimbursement system, especially due to the effect of informal discounts (not observable to researchers). The case of simvastatin reveals that savings to the health system from parallel trade are trivial. Finally, of the three countries examined, the only country that shows a moderate downward pattern in simvastatin prices is the Netherlands. This effect can be attributed to the existence of a system that claws back informal discounts.
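The OLS setup described, regressing price on the parallel-import market share, can be sketched as follows. The data below are synthetic and purely illustrative (the study's actual specification, variables, and coefficients are not reproduced here):

```python
# Hedged sketch of an OLS price regression on synthetic data: price explained by
# parallel-import market share plus a linear time trend, fitted by least squares.
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(24)                                    # hypothetical monthly observations
share = rng.uniform(0.0, 0.4, size=t.size)           # parallel-import market share
price = 10.0 - 0.5 * share + 0.02 * t + rng.normal(0, 0.05, t.size)  # synthetic prices

X = np.column_stack([np.ones_like(t, dtype=float), share, t])
beta, *_ = np.linalg.lstsq(X, price, rcond=None)     # [intercept, share effect, trend]
```

A near-zero estimated share coefficient on real data would correspond to the paper's finding of no pro-competitive effect.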

  13. Distributed parallel computing in stochastic modeling of groundwater systems.

    PubMed

    Dong, Yanhui; Li, Guomin; Xu, Haizhen

    2013-03-01

    Stochastic modeling is a rapidly evolving, popular approach to the study of the uncertainty and heterogeneity of groundwater systems. However, the use of Monte Carlo-type simulations to solve practical groundwater problems often encounters computational bottlenecks that hinder the acquisition of meaningful results. To improve the computational efficiency, a system that combines stochastic model generation with MODFLOW-related programs and distributed parallel processing is investigated. The distributed computing framework, called the Java Parallel Processing Framework, is integrated into the system to allow the batch processing of stochastic models in distributed and parallel systems. As an example, the system is applied to the stochastic delineation of well capture zones in the Pinggu Basin in Beijing. Through the use of 50 processing threads on a cluster with 10 multicore nodes, the execution times of 500 realizations are reduced to 3% compared with those of a serial execution. Through this application, the system demonstrates its potential in solving difficult computational problems in practical stochastic modeling. © 2012, The Author(s). Groundwater © 2012, National Ground Water Association.
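The batch-parallel pattern described (independent Monte Carlo realizations dispatched to a pool of workers) can be sketched in miniature. The study used the Java Parallel Processing Framework with MODFLOW runs; the thread pool and toy realization function below are illustrative stand-ins only:

```python
# Sketch of batch processing stochastic realizations in parallel: each seed
# drives one independent model run, and results are collected into an ensemble.
from concurrent.futures import ThreadPoolExecutor
import random

def run_realization(seed):
    # Hypothetical stand-in for one stochastic groundwater model run.
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(100)) / 100.0

seeds = range(50)                      # 50 toy realizations (the study ran 500)
with ThreadPoolExecutor(max_workers=10) as pool:
    outcomes = list(pool.map(run_realization, seeds))

ensemble_mean = sum(outcomes) / len(outcomes)
```

Because realizations are independent, the wall-clock time shrinks roughly with the number of workers, which is the effect behind the reported reduction to 3% of serial execution time.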

  14. Methods and systems to enhance flame holding in a gas turbine engine

    DOEpatents

    Zuo, Baifang [Simpsonville, SC; Lacy, Benjamin Paul [Greer, SC; Stevenson, Christian Xavier [Inman, SC

    2012-01-31

    A fuel nozzle including a swirler assembly that includes a shroud, a hub, and a plurality of vanes extending between the shroud and the hub. Each vane includes a pressure sidewall and an opposite suction sidewall coupled to the pressure sidewall at a leading edge and at a trailing edge. At least one suction side fuel injection orifice is formed adjacent to the leading edge and extends from a first fuel supply passage to the suction sidewall. A fuel injection angle is oriented with respect to the suction sidewall. The suction side fuel injection orifice is configured to discharge fuel outward from the suction sidewall. At least one pressure side fuel injection orifice extends from a second fuel supply passage to the pressure sidewall and is substantially parallel to the trailing edge. The pressure side fuel injection orifice is configured to discharge fuel tangentially from the trailing edge.

  15. Reliability models for dataflow computer systems

    NASA Technical Reports Server (NTRS)

    Kavi, K. M.; Buckles, B. P.

    1985-01-01

    The demands for concurrent operation within a computer system and the representation of parallelism in programming languages have yielded a new form of program representation known as data flow (DENN 74, DENN 75, TREL 82a). A new model based on data flow principles for parallel computations and parallel computer systems is presented. Necessary conditions for liveness and deadlock freeness in data flow graphs are derived. The data flow graph is used as a model to represent asynchronous concurrent computer architectures including data flow computers.
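The central dataflow principle, that a node fires as soon as tokens are present on all of its inputs, with no central program counter, can be sketched as a tiny interpreter. The graph encoding below is a toy invention for illustration, not the paper's formalism:

```python
# Minimal sketch of the dataflow firing rule: repeatedly fire any node whose
# input arcs all carry tokens; execution order emerges from data availability.
def fire(graph, tokens):
    """Run the dataflow graph to completion, returning all produced tokens."""
    fired = True
    while fired:
        fired = False
        for node, (op, inputs, output) in graph.items():
            if output not in tokens and all(i in tokens for i in inputs):
                tokens[output] = op(*(tokens[i] for i in inputs))
                fired = True
    return tokens

# Toy graph: 'add' consumes a and b producing s; 'mul' consumes s and c producing p.
graph = {
    "add": (lambda x, y: x + y, ("a", "b"), "s"),
    "mul": (lambda x, y: x * y, ("s", "c"), "p"),
}
result = fire(graph, {"a": 2, "b": 3, "c": 4})   # p = (2 + 3) * 4
```

A deadlocked graph in this sketch is simply one where some node's inputs can never all be filled, which is the intuition behind the liveness conditions the paper derives.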

  16. Method for resource control in parallel environments using program organization and run-time support

    NASA Technical Reports Server (NTRS)

    Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

    2001-01-01

    A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.

  17. Method for resource control in parallel environments using program organization and run-time support

    NASA Technical Reports Server (NTRS)

    Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

    1999-01-01

    A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.

  18. Adaptive parallel logic networks

    NASA Technical Reports Server (NTRS)

    Martinez, Tony R.; Vidal, Jacques J.

    1988-01-01

    Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.
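Specifying a problem as if-then rules over Boolean conjunctions, then evaluating them combinationally during the data-processing phase, can be sketched as follows. The rule encoding is a toy invention for illustration, not the ASOCS network representation itself:

```python
# Toy sketch: rules as Boolean conjunctions over named inputs; evaluation
# returns the consequents of every rule whose conjunction is satisfied.
rules = [
    ({"a": True, "b": True}, "out_high"),   # if a AND b then out_high
    ({"a": False}, "out_low"),              # if NOT a then out_low
]

def evaluate(rules, inputs):
    """Combinational evaluation: all rules are checked against the same inputs."""
    return [then for cond, then in rules
            if all(inputs[var] == val for var, val in cond.items())]

outputs = evaluate(rules, {"a": True, "b": True})
```

In ASOCS the adaptation phase would instead compile such rules into the network of simple computing elements, so that data processing is a parallel hardware circuit rather than a rule scan.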

  19. Parallel Implementation of a Frozen Flow Based Wavefront Reconstructor

    NASA Astrophysics Data System (ADS)

    Nagy, J.; Kelly, K.

    2013-09-01

    Obtaining high resolution images of space objects from ground based telescopes is challenging, often requiring the use of a multi-frame blind deconvolution (MFBD) algorithm to remove blur caused by atmospheric turbulence. In order for an MFBD algorithm to be effective, it is necessary to obtain a good initial estimate of the wavefront phase. Although wavefront sensors work well in low turbulence situations, they are less effective in high turbulence, such as when imaging in daylight, or when imaging objects that are close to the Earth's horizon. One promising approach, which has been shown to work very well in high turbulence settings, uses a frozen flow assumption on the atmosphere to capture the inherent temporal correlations present in consecutive frames of wavefront data. Exploiting these correlations can lead to more accurate estimation of the wavefront phase, and the associated PSF, which leads to more effective MFBD algorithms. However, with the current serial implementation, the approach can be prohibitively expensive in situations when it is necessary to use a large number of frames. In this poster we describe a parallel implementation that overcomes this constraint. The parallel implementation exploits sparse matrix computations, and uses the Trilinos package developed at Sandia National Laboratories. Trilinos provides a variety of core mathematical software for parallel architectures that has been designed using high quality software engineering practices. The package is open source, and portable to a variety of high-performance computing architectures.
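The frozen flow assumption says consecutive wavefront frames are translations of one phase screen, so a sparse shift operator relates frame t to frame t+1. A one-dimensional toy sketch (SciPy sparse matrices standing in for the Trilinos machinery; the geometry is illustrative):

```python
# Sketch of frozen flow as a sparse linear operator: a one-sample shift matrix
# maps the phase screen at time t to the screen at time t+1.
import numpy as np
from scipy.sparse import eye

n = 16                                    # toy 1-D phase screen length
S = eye(n, k=1, format="csr")             # sparse shift-by-one operator
phase = np.sin(np.linspace(0, 3, n))      # toy phase screen at time t
next_frame = S @ phase                    # screen at time t+1 under frozen flow
```

Because S is sparse, chaining many frames stays cheap, which is what the parallel sparse-matrix implementation exploits.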

  20. Parallel, Asynchronous Executive (PAX): System concepts, facilities, and architecture

    NASA Technical Reports Server (NTRS)

    Jones, W. H.

    1983-01-01

    The Parallel, Asynchronous Executive (PAX) is a software operating system simulation that allows many computers to work on a single problem at the same time. PAX is currently implemented on a UNIVAC 1100/42 computer system. Independent UNIVAC runstreams are used to simulate independent computers. Data are shared among independent UNIVAC runstreams through shared mass-storage files. PAX has achieved the following: (1) applied several computing processes simultaneously to a single, logically unified problem; (2) resolved most parallel processor conflicts by careful work assignment; (3) resolved by means of worker requests to PAX all conflicts not resolved by work assignment; (4) provided fault isolation and recovery mechanisms to meet the problems of an actual parallel, asynchronous processing machine. Additionally, one real-life problem has been constructed for the PAX environment. This is CASPER, a collection of aerodynamic and structural dynamic problem simulation routines. CASPER is not discussed in this report except to provide examples of parallel-processing techniques.

  1. Developing Information Power Grid Based Algorithms and Software

    NASA Technical Reports Server (NTRS)

    Dongarra, Jack

    1998-01-01

    This exploratory study initiated our effort to understand performance modeling on parallel systems. The basic goal of performance modeling is to understand and predict the performance of a computer program or set of programs on a computer system. Performance modeling has numerous applications, including evaluation of algorithms, optimization of code implementations, parallel library development, comparison of system architectures, parallel system design, and procurement of new systems. Our work lays the basis for the construction of parallel libraries that allow for the reconstruction of application codes on several distinct architectures so as to assure performance portability. Following our strategy, once the requirements of applications are well understood, one can then construct a library in a layered fashion. The top level of this library will consist of architecture-independent geometric, numerical, and symbolic algorithms that are needed by the sample of applications. These routines should be written in a language that is portable across the targeted architectures.

  2. Parallel checksumming of data chunks of a shared data object using a log-structured file system

    DOEpatents

    Bent, John M.; Faibish, Sorin; Grider, Gary

    2016-09-06

    Checksum values are generated and used to verify the data integrity. A client executing in a parallel computing system stores a data chunk to a shared data object on a storage node in the parallel computing system. The client determines a checksum value for the data chunk; and provides the checksum value with the data chunk to the storage node that stores the shared object. The data chunk can be stored on the storage node with the corresponding checksum value as part of the shared object. The storage node may be part of a Parallel Log-Structured File System (PLFS), and the client may comprise, for example, a Log-Structured File System client on a compute node or burst buffer. The checksum value can be evaluated when the data chunk is read from the storage node to verify the integrity of the data that is read.
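The write-then-verify idea (client checksums each chunk, the value travels with the chunk, and reads recompute and compare) can be sketched in a few lines. The in-memory `store` dict and `zlib.crc32` are stand-ins for the PLFS storage node and whatever checksum the patent's implementation uses:

```python
# Toy sketch of checksummed chunk storage: a chunk is stored with its checksum,
# and every read recomputes the checksum to detect corruption.
import zlib

store = {}                                       # stand-in for the storage node

def write_chunk(offset, data):
    store[offset] = (data, zlib.crc32(data))     # chunk stored with its checksum

def read_chunk(offset):
    data, checksum = store[offset]
    if zlib.crc32(data) != checksum:             # verify integrity on read
        raise IOError("checksum mismatch: chunk %d is corrupt" % offset)
    return data

write_chunk(0, b"payload-A")
ok = read_chunk(0)
```

In the patent's setting the checksum computation happens on many clients in parallel, one per chunk of the shared object, rather than in a single process as here.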

  3. RTEMS SMP and MTAPI for Efficient Multi-Core Space Applications on LEON3/LEON4 Processors

    NASA Astrophysics Data System (ADS)

    Cederman, Daniel; Hellstrom, Daniel; Sherrill, Joel; Bloom, Gedare; Patte, Mathieu; Zulianello, Marco

    2015-09-01

    This paper presents the final result of a European Space Agency (ESA) activity aimed at improving the software support for LEON processors used in SMP configurations. One of the benefits of using a multicore system in an SMP configuration is that in many instances it is possible to better utilize the available processing resources by load balancing between cores. This however comes with the cost of having to synchronize operations between cores, leading to increased complexity. While in an AMP system one can use multiple instances of operating systems that are only uni-processor capable, an SMP system requires the operating system to be written to support multicore systems. In this activity we have improved and extended the SMP support of the RTEMS real-time operating system and ensured that it fully supports the multicore capable LEON processors. The targeted hardware in the activity has been the GR712RC, a dual-core LEON3FT processor, and the functional prototype of ESA's Next Generation Multiprocessor (NGMP), a quad-core LEON4 processor. The final version of the NGMP is now available as a product under the name GR740. An implementation of the Multicore Task Management API (MTAPI) has been developed as part of this activity to aid in the parallelization of applications for RTEMS SMP. It allows for simplified development of parallel applications using the task-based programming model. An existing space application, the Gaia Video Processing Unit, has been ported to RTEMS SMP using the MTAPI implementation to demonstrate the feasibility and usefulness of multicore processors for space payload software. The activity is funded by ESA under contract 4000108560/13/NL/JK. Gedare Bloom is supported in part by NSF CNS-0934725.
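The task-based programming model that MTAPI exposes (split the work into tasks, submit them to a runtime, wait for completion) can be shown structurally with futures. This is a Python analogue only; the real API is C on RTEMS SMP, and `process_block` is a hypothetical stand-in for one unit of payload processing:

```python
# Structural sketch of the task-based model: tasks are started on a runtime
# that schedules them across cores, then the caller waits for each result.
from concurrent.futures import ThreadPoolExecutor

def process_block(block):
    # Hypothetical stand-in for one unit of payload work (e.g. one data strip).
    return sum(block)

blocks = [list(range(i, i + 10)) for i in range(0, 100, 10)]
with ThreadPoolExecutor(max_workers=4) as runtime:
    tasks = [runtime.submit(process_block, b) for b in blocks]   # start tasks
    results = [t.result() for t in tasks]                        # wait for completion

total = sum(results)
```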

  4. Biocellion: accelerating computer simulation of multicellular biological system models

    PubMed Central

    Kang, Seunghwa; Kahan, Simon; McDermott, Jason; Flann, Nicholas; Shmulevich, Ilya

    2014-01-01

    Motivation: Biological system behaviors are often the outcome of complex interactions among a large number of cells and their biotic and abiotic environment. Computational biologists attempt to understand, predict and manipulate biological system behavior through mathematical modeling and computer simulation. Discrete agent-based modeling (in combination with high-resolution grids to model the extracellular environment) is a popular approach for building biological system models. However, the computational complexity of this approach forces computational biologists to resort to coarser resolution approaches to simulate large biological systems. High-performance parallel computers have the potential to address the computing challenge, but writing efficient software for parallel computers is difficult and time-consuming. Results: We have developed Biocellion, a high-performance software framework, to solve this computing challenge using parallel computers. To support a wide range of multicellular biological system models, Biocellion asks users to provide their model specifics by filling the function body of pre-defined model routines. Using Biocellion, modelers without parallel computing expertise can efficiently exploit parallel computers with less effort than writing sequential programs from scratch. We simulate cell sorting, microbial patterning and a bacterial system in soil aggregate as case studies. Availability and implementation: Biocellion runs on x86 compatible systems with the 64 bit Linux operating system and is freely available for academic use. Visit http://biocellion.com for additional information. Contact: seunghwa.kang@pnnl.gov PMID:25064572

  5. Principles for problem aggregation and assignment in medium scale multiprocessors

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Saltz, Joel H.

    1987-01-01

    One of the most important issues in parallel processing is the mapping of workload to processors. This paper considers a large class of problems having a high degree of potential fine grained parallelism, and execution requirements that are either not predictable, or are too costly to predict. The main issues in mapping such a problem onto medium scale multiprocessors are those of aggregation and assignment. We study a method of parameterized aggregation that makes few assumptions about the workload. The mapping of aggregate units of work onto processors is uniform, and exploits locality of workload intensity to balance the unknown workload. In general, a finer aggregate granularity leads to a better balance at the price of increased communication/synchronization costs; the aggregation parameters can be adjusted to find a reasonable granularity. The effectiveness of this scheme is demonstrated on three model problems: an adaptive one-dimensional fluid dynamics problem with message passing, a sparse triangular linear system solver on both a shared memory and a message-passing machine, and a two-dimensional time-driven battlefield simulation employing message passing. Using the model problems, the tradeoffs are studied between balanced workload and the communication/synchronization costs. Finally, an analytical model is used to explain why the method balances workload and minimizes the variance in system behavior.
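The granularity tradeoff described above can be sketched with a toy model; the cyclic assignment, the skewed cost distribution and all parameter values below are illustrative assumptions, not the paper's actual scheme.

```python
def aggregate_and_assign(num_units, granularity, num_procs):
    """Group fine-grained work units into contiguous aggregates of
    `granularity` units, then deal the aggregates out to processors in a
    wrapped (cyclic) fashion so that a local burst of workload intensity
    is spread across all processors."""
    aggregates = [list(range(start, min(start + granularity, num_units)))
                  for start in range(0, num_units, granularity)]
    assignment = {p: [] for p in range(num_procs)}
    for i, agg in enumerate(aggregates):
        assignment[i % num_procs].extend(agg)
    return aggregates, assignment

# Hypothetical workload: unit costs are unknown a priori; a skewed
# distribution stands in for a workload "hot spot" in units 0..31.
costs = [10 if u < 32 else 1 for u in range(128)]

for g in (32, 8, 2):
    _, assign = aggregate_and_assign(128, g, 4)
    loads = [sum(costs[u] for u in units) for units in assign.values()]
    print(f"granularity={g:2d}  load imbalance={max(loads) - min(loads)}")
```

Coarse aggregates place the whole hot spot on one processor, while finer aggregates spread it evenly; the remaining knob is the communication cost per aggregate, which grows as the granularity shrinks.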

  6. A parallel expert system for the control of a robotic air vehicle

    NASA Technical Reports Server (NTRS)

    Shakley, Donald; Lamont, Gary B.

    1988-01-01

Expert systems can be used to govern the intelligent control of vehicles, for example the Robotic Air Vehicle (RAV). Due to the nature of the RAV system, the associated expert system needs to perform in a demanding real-time environment. The use of a parallel processing capability to support the expert system's computational requirements is critical in this application. Thus, algorithms for parallel real-time expert systems must be designed, analyzed, and synthesized. The design process incorporates a consideration of the rule-set/fact-set size along with representation issues. These issues are examined with reference to information movement and various inference mechanisms. Also examined is the process of porting the RAV expert system functions from the TI Explorer, where they are implemented in the Automated Reasoning Tool (ART), to the iPSC Hypercube, where the system is synthesized using Concurrent Common LISP (CCLISP). The transformation process for the ART to CCLISP conversion is described. The performance characteristics of the parallel implementation of these expert systems on the iPSC Hypercube are compared to the TI Explorer implementation.

  7. Multibus-based parallel processor for simulation

    NASA Technical Reports Server (NTRS)

    Ogrady, E. P.; Wang, C.-H.

    1983-01-01

    A Multibus-based parallel processor simulation system is described. The system is intended to serve as a vehicle for gaining hands-on experience, testing system and application software, and evaluating parallel processor performance during development of a larger system based on the horizontal/vertical-bus interprocessor communication mechanism. The prototype system consists of up to seven Intel iSBC 86/12A single-board computers which serve as processing elements, a multiple transmission controller (MTC) designed to support system operation, and an Intel Model 225 Microcomputer Development System which serves as the user interface and input/output processor. All components are interconnected by a Multibus/IEEE 796 bus. An important characteristic of the system is that it provides a mechanism for a processing element to broadcast data to other selected processing elements. This parallel transfer capability is provided through the design of the MTC and a minor modification to the iSBC 86/12A board. The operation of the MTC, the basic hardware-level operation of the system, and pertinent details about the iSBC 86/12A and the Multibus are described.

  8. Double lead spiral platen parallel jaw end effector

    NASA Technical Reports Server (NTRS)

    Beals, David C.

    1989-01-01

    The double lead spiral platen parallel jaw end effector is an extremely powerful, compact, and highly controllable end effector that represents a significant improvement in gripping force and efficiency over the LaRC Puma (LP) end effector. The spiral end effector is very simple in its design and has relatively few parts. The jaw openings are highly predictable and linear, making it an ideal candidate for remote control. The finger speed is within acceptable working limits and can be modified to meet the user needs; for instance, greater finger speed could be obtained by increasing the pitch of the spiral. The force relaxation is comparable to the other tested units. Optimization of the end effector design would involve a compromise of force and speed for a given application.

  9. A Tutorial on Parallel and Concurrent Programming in Haskell

    NASA Astrophysics Data System (ADS)

    Peyton Jones, Simon; Singh, Satnam

This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs, with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs, which allows programmers to use rich data types in data-parallel programs that are automatically transformed into flat data-parallel versions for efficient execution on multi-core processors.

  10. An iterative method for systems of nonlinear hyperbolic equations

    NASA Technical Reports Server (NTRS)

    Scroggs, Jeffrey S.

    1989-01-01

An iterative algorithm for the efficient solution of systems of nonlinear hyperbolic equations is presented. Parallelism is evident at several levels. In the formation of the iteration, the equations are decoupled, thereby providing large-grain parallelism. Parallelism may also be exploited within the solves for each equation. Convergence of the iteration is established via a bounding function argument. Experimental results in two dimensions are presented.
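The large-grain decoupling idea can be illustrated on a toy fixed-point system; the two scalar equations below are hypothetical stand-ins for the paper's hyperbolic systems, and the sweep structure is only a sketch.

```python
import math

def decoupled_iteration(f_list, x0, tol=1e-12, max_iter=200):
    """Solve the coupled fixed-point system x_i = f_i(x) by freezing the
    coupling terms at the previous iterate, so each equation can be
    solved independently (and hence in parallel) within a sweep."""
    x = list(x0)
    for _ in range(max_iter):
        x_new = [f(x) for f in f_list]   # each solve sees only old values
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x

# Hypothetical 2-equation model problem:  u = cos(v)/2,  v = sin(u)/2.
u, v = decoupled_iteration([lambda x: math.cos(x[1]) / 2,
                            lambda x: math.sin(x[0]) / 2], [0.0, 0.0])
print(u, v)
```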

  11. Studies on the π-π stacking features of imidazole units present in a series of 5-amino-1-alkylimidazole-4-carboxamides

    NASA Astrophysics Data System (ADS)

    Ray, Sibdas; Das, Aniruddha

    2015-06-01

Reaction of 2-ethoxymethyleneamino-2-cyanoacetamide with primary alkyl amines in acetonitrile solvent affords 1-substituted-5-aminoimidazole-4-carboxamides. Single crystal X-ray diffraction studies of these imidazole compounds show that there are both anti-parallel and syn-parallel π-π stackings between two imidazole units in parallel-displaced (PD) conformations, and that the distance between two π-π stacked imidazole units depends mainly on the anti/syn-parallel nature and to some extent on the alkyl group attached to N-1 of imidazole; molecules with anti-parallel PD-stacking arrangements of the imidazole units have vertical π-π stacking distances short enough to impart stabilization, whereas imidazole units with syn-parallel stacking arrangements have much larger π-π stacking distances. DFT studies on a pair of anti-parallel imidazole units of such an AICA lead to curves of π-π stacking stabilization energy vs. π-π stacking distance that resemble the Morse potential energy diagram for a diatomic molecule, which makes it possible to determine the minimum π-π stacking distance corresponding to the maximum stacking stabilization energy between the pair of imidazole units. On the other hand, the DFT-based curve of π-π stacking stabilization energy vs. π-π stacking distance for a pair of syn-parallel imidazole units is shown to have an exponential nature.
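The Morse-like shape of the stacking-energy curve can be reproduced with the standard Morse function; the parameter values below are invented for illustration (depth in kcal/mol, distances in ångström) and are not the paper's fitted DFT values.

```python
import math

def morse(r, D_e, a, r_e):
    """Morse-type curve with well depth D_e, width parameter a and
    equilibrium separation r_e; returns the (negative) stabilization
    energy, with the minimum -D_e attained at r = r_e."""
    return D_e * (1.0 - math.exp(-a * (r - r_e))) ** 2 - D_e

# Hypothetical parameters standing in for a fitted anti-parallel
# stacking curve (assumptions, not values from the paper).
D_e, a, r_e = 5.0, 1.8, 3.4

# Scan stacking distances and locate the minimum numerically,
# mimicking the "read the optimum off the curve" step in the text.
grid = [2.8 + 0.001 * i for i in range(1600)]   # 2.8 ... 4.4 angstrom
r_min = min(grid, key=lambda r: morse(r, D_e, a, r_e))
print(r_min, morse(r_min, D_e, a, r_e))
```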

  12. A nonrecursive order N preconditioned conjugate gradient: Range space formulation of MDOF dynamics

    NASA Technical Reports Server (NTRS)

    Kurdila, Andrew J.

    1990-01-01

    While excellent progress has been made in deriving algorithms that are efficient for certain combinations of system topologies and concurrent multiprocessing hardware, several issues must be resolved to incorporate transient simulation in the control design process for large space structures. Specifically, strategies must be developed that are applicable to systems with numerous degrees of freedom. In addition, the algorithms must have a growth potential in that they must also be amenable to implementation on forthcoming parallel system architectures. For mechanical system simulation, this fact implies that algorithms are required that induce parallelism on a fine scale, suitable for the emerging class of highly parallel processors; and transient simulation methods must be automatically load balancing for a wider collection of system topologies and hardware configurations. These problems are addressed by employing a combination range space/preconditioned conjugate gradient formulation of multi-degree-of-freedom dynamics. The method described has several advantages. In a sequential computing environment, the method has the features that: by employing regular ordering of the system connectivity graph, an extremely efficient preconditioner can be derived from the 'range space metric', as opposed to the system coefficient matrix; because of the effectiveness of the preconditioner, preliminary studies indicate that the method can achieve performance rates that depend linearly upon the number of substructures, hence the title 'Order N'; and the method is non-assembling. Furthermore, the approach is promising as a potential parallel processing algorithm in that the method exhibits a fine parallel granularity suitable for a wide collection of combinations of physical system topologies/computer architectures; and the method is easily load balanced among processors, and does not rely upon system topology to induce parallelism.
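The preconditioned conjugate gradient iteration at the heart of the method can be sketched as follows; this minimal version uses a simple Jacobi (diagonal) preconditioner as a stand-in for the paper's range-space metric preconditioner, and a small dense SPD system in place of a structural model.

```python
def pcg(A, b, M_inv_diag, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for A x = b, with A a dense
    SPD matrix (nested lists) and a diagonal preconditioner M^-1."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                                  # residual b - A x, x = 0
    z = [mi * ri for mi, ri in zip(M_inv_diag, r)]
    p = z[:]
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(pi * ai for pi, ai in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        if max(abs(ri) for ri in r) < tol:
            break
        z = [mi * ri for mi, ri in zip(M_inv_diag, r)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

# Toy SPD system (an assumption for illustration only).
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = pcg(A, b, [1.0 / A[i][i] for i in range(3)])
print(x)
```

The matrix-vector product and the vector updates inside the loop are the fine-grained, easily load-balanced operations the abstract refers to.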

  13. Acoustic Resonator Optimisation for Airborne Particle Manipulation

    NASA Astrophysics Data System (ADS)

    Devendran, Citsabehsan; Billson, Duncan R.; Hutchins, David A.; Alan, Tuncay; Neild, Adrian

Advances in micro-electromechanical systems (MEMS) technology and biomedical research necessitate micro-machined manipulators to capture, handle and position delicate micron-sized particles. To this end, a parallel-plate acoustic resonator system has been investigated for the purposes of manipulation and entrapment of micron-sized particles in air. Numerical and finite element modelling was performed to optimise the design of the layered acoustic resonator. To obtain an optimised resonator design, careful consideration of the effects of thickness and material properties is required. Furthermore, the effect of acoustic attenuation, which is frequency dependent, is also considered within this study, leading to an optimum operational frequency range. Finally, experimental results demonstrated good levitation and capture of particles of various properties and sizes, down to as small as 14.8 μm.

  14. Free radical-mediated systemic immunity in plants.

    PubMed

    Wendehenne, David; Gao, Qing-Ming; Kachroo, Aardra; Kachroo, Pradeep

    2014-08-01

    Systemic acquired resistance (SAR) is a form of defense that protects plants against a broad-spectrum of secondary infections by related or unrelated pathogens. SAR related research has witnessed considerable progress in recent years and a number of chemical signals and proteins contributing to SAR have been identified. All of these diverse constituents share their requirement for the phytohormone salicylic acid, an essential downstream component of the SAR pathway. However, recent work demonstrating the essential parallel functioning of nitric oxide (NO)-derived and reactive oxygen species (ROS)-derived signaling together with SA provides important new insights in the overlapping pathways leading to SAR. This review discusses the potential significance of branched pathways and the relative contributions of NO/ROS-derived and SA-derived pathways in SAR. Copyright © 2014 Elsevier Ltd. All rights reserved.

  15. SEASAT study documentation

    NASA Technical Reports Server (NTRS)

    1974-01-01

    The proposed spacecraft consists of a bus module, containing all subsystems required for support of the sensors, and a payload module containing all of the sensor equipment. The two modules are bolted together to form the spacecraft, and electrical interfaces are accomplished via mated connectors at the interface plane. This approach permits independent parallel assembly and test operations on each module up until mating for final spacecraft integration and test operations. Proposed program schedules recognize the need to refine sensor/spacecraft interfaces prior to proceeding with procurement, reflect the lead times estimated by suppliers for delivery of equipment, reflect a comprehensive test program, and provide flexibility for unanticipated problems. The spacecraft systems are described in detail along with aerospace ground equipment, ground handling equipment, the launch vehicle, imaging radar incorporation, and systems tests.

  16. Reasons for high-temperature superconductivity in the electron–phonon system of hydrogen sulfide

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Degtyarenko, N. N.; Mazur, E. A., E-mail: eugen-masur@mail.ru

We have calculated the electron and phonon spectra, as well as the densities of the electron and phonon states, of the stable orthorhombic structure of hydrogen sulfide SH₂ in the pressure interval 100–180 GPa. It is found that at a pressure of 175 GPa, a set of parallel planes of hydrogen atoms is formed due to a structural modification of the unit cell under pressure, with complete accumulation of all hydrogen atoms in these planes. As a result, the electronic properties of the system become quasi-two-dimensional. We have also analyzed the collective synphase and antiphase vibrations of hydrogen atoms in these planes, leading to the occurrence of two high-energy peaks in the phonon density of states.

  17. Neuronal regulation of homeostasis by nutrient sensing.

    PubMed

    Lam, Tony K T

    2010-04-01

In type 2 diabetes and obesity, the homeostatic control of glucose and energy balance is impaired, leading to hyperglycemia and hyperphagia. Recent studies indicate that nutrient-sensing mechanisms in the body activate negative-feedback systems to regulate energy and glucose homeostasis through a neuronal network. Direct metabolic signaling within the intestine activates gut-brain and gut-brain-liver axes to regulate energy and glucose homeostasis, respectively. In parallel, direct metabolism of nutrients within the hypothalamus regulates food intake and blood glucose levels. These findings highlight the importance of the central nervous system in mediating the ability of nutrient sensing to maintain homeostasis. Furthermore, they provide a physiological and neuronal framework by which enhancing or restoring nutrient sensing in the intestine and the brain could normalize energy and glucose homeostasis in diabetes and obesity.

  18. Analysis of the leading edge effects on the boundary layer transition

    NASA Technical Reports Server (NTRS)

    Chow, Pao-Liu

    1990-01-01

    A general theory of boundary layer control by surface heating is presented. Some analytical results for a simplified model, i.e., the optimal control of temperature fluctuations in a shear flow are described. The results may provide a clue to the effectiveness of the active feedback control of a boundary layer flow by wall heating. In a practical situation, the feedback control may not be feasible from the instrumentational point of view. In this case the vibrational control introduced in systems science can provide a useful alternative. This principle is briefly explained and applied to the control of an unstable wavepacket in a parallel shear flow.

  19. Optimal resonance configuration for ultrasonic wireless power transmission to millimeter-sized biomedical implants.

    PubMed

    Miao Meng; Kiani, Mehdi

    2016-08-01

    In order to achieve efficient wireless power transmission (WPT) to biomedical implants with millimeter (mm) dimensions, ultrasonic WPT links have recently been proposed. Operating both transmitter (Tx) and receiver (Rx) ultrasonic transducers at their resonance frequency (fr) is key in improving power transmission efficiency (PTE). In this paper, different resonance configurations for Tx and Rx transducers, including series and parallel resonance, have been studied to help the designers of ultrasonic WPT links to choose the optimal resonance configuration for Tx and Rx that maximizes PTE. The geometries for disk-shaped transducers of four different sets of links, operating at series-series, series-parallel, parallel-series, and parallel-parallel resonance configurations in Tx and Rx, have been found through finite-element method (FEM) simulation tools for operation at fr of 1.4 MHz. Our simulation results suggest that operating the Tx transducer with parallel resonance increases PTE, while the resonance configuration of the mm-sized Rx transducer highly depends on the load resistance, Rl. For applications that involve large Rl in the order of tens of kΩ, a parallel resonance for a mm-sized Rx leads to higher PTE, while series resonance is preferred for Rl in the order of several kΩ and below.
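The intuition behind the series-versus-parallel choice can be illustrated with the usual lumped-circuit analogy rather than the paper's FEM transducer models: a series RLC branch has minimum impedance at resonance (suiting loads of a few kΩ and below), while a parallel RLC tank has maximum impedance (suiting loads of tens of kΩ). The element values below are arbitrary assumptions chosen to give f_r ≈ 1.4 MHz.

```python
import math

def series_rlc_impedance(f, R, L, C):
    """Impedance of R, L and C connected in series at frequency f."""
    w = 2 * math.pi * f
    return R + 1j * w * L + 1 / (1j * w * C)

def parallel_rlc_impedance(f, R, L, C):
    """Impedance of R, L and C all connected in parallel at frequency f."""
    w = 2 * math.pi * f
    return 1 / (1 / R + 1 / (1j * w * L) + 1j * w * C)

f_r = 1.4e6                                   # target resonance, Hz
L_ = 10e-6                                    # assumed 10 uH
C_ = 1 / ((2 * math.pi * f_r) ** 2 * L_)      # so that f_r = 1/(2*pi*sqrt(L*C))

z_series = series_rlc_impedance(f_r, 50.0, L_, C_)      # small series loss R
z_parallel = parallel_rlc_impedance(f_r, 50e3, L_, C_)  # large shunt R
print(abs(z_series), abs(z_parallel))
```

At resonance the reactances cancel, leaving only the resistive part: a low impedance for the series branch and a high impedance for the parallel tank, which is the load-matching argument made in the abstract.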

  20. Does reimportation reduce price differences for prescription drugs? Lessons from the European Union.

    PubMed

    Kyle, Margaret K; Allsbrook, Jennifer S; Schulman, Kevin A

    2008-08-01

    To examine the effect of parallel trade on patterns of price dispersion for prescription drugs in the European Union. Longitudinal data from an IMS Midas database of prices and units sold for drugs in 36 categories in 30 countries from 1993 through 2004. The main outcome measures were mean price differentials and other measures of price dispersion within European Union countries compared with within non-European Union countries. We identified drugs subject to parallel trade using information provided by IMS and by checking membership lists of parallel import trade associations and lists of approved parallel imports. Parallel trade was not associated with substantial reductions in price dispersion in European Union countries. In descriptive and regression analyses, about half of the price differentials exceeded 50 percent in both European Union and non-European Union countries over time, and price distributions among European Union countries did not show a dramatic change concurrent with the adoption of parallel trade. In regression analysis, we found that although price differentials decreased after 1995 in most countries, they decreased less in the European Union than elsewhere. Parallel trade for prescription drugs does not automatically reduce international price differences. Future research should explore how other regulatory schemes might lead to different results elsewhere.

  1. High-rate serial interconnections for embedded and distributed systems with power and resource constraints

    NASA Astrophysics Data System (ADS)

    Sheynin, Yuriy; Shutenko, Felix; Suvorova, Elena; Yablokov, Evgenej

    2008-04-01

High-rate interconnections are important subsystems in modern data processing and control systems of many classes. They are especially important in prospective embedded and on-board systems, which tend to be multicomponent systems with parallel or distributed architectures [1]. Modular-architecture systems of previous generations were based on parallel busses that were widely used and standardised: VME, PCI, CompactPCI, etc. Bus evolution consisted of improving bus protocol efficiency (burst transactions, split transactions, etc.) and increasing operating frequencies. However, due to the multi-drop nature of busses and multi-wire skew problems, parallel bus speedup became more and more limited. For embedded and on-board systems, an additional reason for this trend was the weight, size and power constraints of an interconnection and its components. Parallel interfaces have become technologically more challenging as their respective clock frequencies have increased to keep pace with the bandwidth requirements of their attached storage devices. Since each interface uses a data clock to gate and validate the parallel data (which is normally 8 bits or 16 bits wide), the clock frequency need only be equivalent to the byte rate or word rate being transmitted. In other words, for a given transmission frequency, the wider the data bus, the slower the clock. As the clock frequency increases, more high-frequency energy is available in each of the data lines, and a portion of this energy is dissipated in radiation. Each data line not only transmits this energy but also receives some from its neighbours. This form of mutual interference is commonly called "cross-talk," and the signal distortion it produces can become another major contributor to loss of data integrity unless compensated by appropriate cable designs.
Other transmission problems such as frequency-dependent attenuation and signal reflections, while also applicable to serial interfaces, are more troublesome in parallel interfaces due to the number of additional cable conductors involved. In order to compensate for these drawbacks, higher quality cables, shorter cable runs and fewer devices on the bus have been the norm. Finally, the physical bulk of the parallel cables makes them more difficult to route inside an enclosure, hinders cooling airflow and is incompatible with the trend toward smaller form-factor devices. Parallel busses have served systems for the past 20 years, but the accumulated problems dictate the need for change, and the technology is available to spur the transition. The general trend in high-rate interconnections has turned from parallel bussing to scalable interconnections with a network architecture and high-rate point-to-point links. Analysis showed that data links with serial information transfer could achieve higher throughput and efficiency, and this has been confirmed in various research and practical designs. Serial interfaces offer an improvement over older parallel interfaces: better performance, better scalability, and also better reliability, as parallel interfaces are at their limits of speed for reliable data transfer. The trend is reflected in the evolution of the major standards families: e.g. from PCI/PCI-X parallel bussing to the PCI Express interconnection architecture with serial lines, from the CompactPCI parallel bus to the ATCA (Advanced Telecommunications Computing Architecture) specification with serial links and network topologies, etc. In the article we consider a general set of characteristics and features of serial interconnections and give a brief overview of serial interconnection specifications. In more detail, we present the SpaceWire interconnection technology. Having been developed for space on-board system applications, SpaceWire has important features and characteristics that make it a prospective interconnection for a wide range of embedded systems.
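The clock-rate-versus-bus-width relation stated above is simple arithmetic; a quick sketch, assuming a hypothetical 200 MB/s payload rate:

```python
def required_clock_hz(throughput_bytes_per_s, bus_width_bits):
    """Clock needed to sustain a payload rate over a data path of the
    given width: each clock edge transfers bus_width_bits of data."""
    return throughput_bytes_per_s * 8 / bus_width_bits

# 200 MB/s payload: a 16-bit parallel bus needs a 100 MHz word clock,
# while a single serial lane needs a 1.6 GHz bit clock.
for width_bits in (32, 16, 8, 1):
    mhz = required_clock_hz(200e6, width_bits) / 1e6
    print(f"{width_bits:2d}-bit path -> {mhz:6.0f} MHz clock")
```

The serial lane pays with a much higher line rate, but over a single well-controlled conductor pair instead of many skew-prone parallel lines.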

  2. Aerodynamic simulation on massively parallel systems

    NASA Technical Reports Server (NTRS)

    Haeuser, Jochem; Simon, Horst D.

    1992-01-01

    This paper briefly addresses the computational requirements for the analysis of complete configurations of aircraft and spacecraft currently under design to be used for advanced transportation in commercial applications as well as in space flight. The discussion clearly shows that massively parallel systems are the only alternative which is both cost effective and on the other hand can provide the necessary TeraFlops, needed to satisfy the narrow design margins of modern vehicles. It is assumed that the solution of the governing physical equations, i.e., the Navier-Stokes equations which may be complemented by chemistry and turbulence models, is done on multiblock grids. This technique is situated between the fully structured approach of classical boundary fitted grids and the fully unstructured tetrahedra grids. A fully structured grid best represents the flow physics, while the unstructured grid gives best geometrical flexibility. The multiblock grid employed is structured within a block, but completely unstructured on the block level. While a completely unstructured grid is not straightforward to parallelize, the above mentioned multiblock grid is inherently parallel, in particular for multiple instruction multiple datastream (MIMD) machines. In this paper guidelines are provided for setting up or modifying an existing sequential code so that a direct parallelization on a massively parallel system is possible. Results are presented for three parallel systems, namely the Intel hypercube, the Ncube hypercube, and the FPS 500 system. Some preliminary results for an 8K CM2 machine will also be mentioned. The code run is the two dimensional grid generation module of Grid, which is a general two dimensional and three dimensional grid generation code for complex geometries. A system of nonlinear Poisson equations is solved. This code is also a good testcase for complex fluid dynamics codes, since the same datastructures are used. 
All systems provided good speedups, but message-passing MIMD systems seem to be best suited for large multiblock applications.
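A minimal sketch of the block-local part of such a solver, assuming a plain Jacobi sweep on a single structured block (the actual code solves nonlinear Poisson grid-generation equations; this linear Laplace block is only an illustration of the per-block data structure):

```python
def jacobi_block(u, f, h, sweeps):
    """A few Jacobi sweeps of the 2D Poisson problem -laplace(u) = f on
    one grid block; interior points are updated, boundary rows and
    columns are held fixed. In a multiblock code each block runs this
    independently, then blocks exchange boundary values -- which is the
    inherently parallel step mentioned in the abstract."""
    ny, nx = len(u), len(u[0])
    for _ in range(sweeps):
        new = [row[:] for row in u]
        for i in range(1, ny - 1):
            for j in range(1, nx - 1):
                new[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] +
                                    u[i][j-1] + u[i][j+1] + h * h * f[i][j])
        u = new
    return u

# Laplace test block (f = 0) with a hot left edge: values relax smoothly.
n = 10
u0 = [[1.0 if j == 0 else 0.0 for j in range(n)] for i in range(n)]
f0 = [[0.0] * n for _ in range(n)]
u = jacobi_block(u0, f0, 1.0 / (n - 1), 200)
print(round(u[5][1], 4))
```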

  3. Extrinsic Rashba spin-orbit coupling effect on silicene spin polarized field effect transistors

    NASA Astrophysics Data System (ADS)

    Pournaghavi, Nezhat; Esmaeilzadeh, Mahdi; Abrishamifar, Adib; Ahmadi, Somaieh

    2017-04-01

Regarding spin field effect transistor (spin FET) challenges such as the mismatch effect in spin injection and insufficient spin lifetime, we propose a silicene-based device which can be a promising candidate to overcome some of those problems. Using the non-equilibrium Green's function method, we investigate the spin-dependent conductance in a zigzag silicene nanoribbon connected to two magnetized leads which are supposed to be in either parallel or anti-parallel configurations. For both configurations, a controllable spin current can be obtained when the Rashba effect is present; thus, we can have a spin filter device. In addition, for the anti-parallel configuration, in the absence of the Rashba effect, there is an intrinsic energy gap in the system (OFF-state); while, in the presence of the Rashba effect, electrons with flipped spin can pass through the channel and make the ON-state. The current-voltage (I-V) characteristics, which can be tuned by changing the gate voltage or Rashba strength, are studied. More importantly, reducing the mismatch conductivity as well as the energy consumption makes the silicene-based spin FET more efficient relative to the spin FET based on a two-dimensional electron gas proposed by Datta and Das. Also, we show that, under the same conditions, the current and I_on/I_off ratio of the silicene-based spin FET are significantly greater than those of the graphene-based one.

  4. The influence of hand positions on biomechanical injury risk factors at the wrist joint during the round-off skills in female gymnastics.

    PubMed

    Farana, Roman; Jandacka, Daniel; Uchytil, Jaroslav; Zahradnik, David; Irwin, Gareth

    2017-01-01

The aim of this study was to examine the biomechanical injury risk factors at the wrist, including joint kinetics, kinematics and stiffness in the first and second contact limb for parallel and T-shape round-off (RO) techniques. Seven international-level female gymnasts performed 10 trials of the RO to back handspring with parallel and T-shape hand positions. Synchronised kinematic (3D motion analysis system; 247 Hz) and kinetic (two force plates; 1235 Hz) data were collected for each trial. A two-way repeated-measures analysis of variance (ANOVA) assessed differences in the kinematic and kinetic parameters between the techniques for each contact limb. The main findings highlighted that in both RO techniques, the second contact limb wrist joint is exposed to higher mechanical loads than the first contact limb, demonstrated by increased axial compression force and loading rate. In the parallel technique, the second contact limb wrist joint is exposed to a higher axial compression load. Differences between wrist joint kinetics highlight that the T-shape technique may potentially reduce these biophysical loads and consequently protect the second contact limb wrist joint from overload and biological failure. Highlighting the biomechanical risk factors helps make the process of technique selection more objective and safe.

  5. A novel milliliter-scale chemostat system for parallel cultivation of microorganisms in stirred-tank bioreactors.

    PubMed

    Schmideder, Andreas; Severin, Timm Steffen; Cremer, Johannes Heinrich; Weuster-Botz, Dirk

    2015-09-20

A pH-controlled parallel stirred-tank bioreactor system was modified for parallel continuous cultivation on a 10 mL scale by connecting multichannel peristaltic pumps for feeding and medium removal with micro-pipes (250 μm inner diameter). Parallel chemostat processes with Escherichia coli as an example showed high reproducibility with regard to culture volume and flow rates as well as dry cell weight, dissolved oxygen concentration and pH control at steady states (n=8, coefficient of variation <5%). Reliable estimation of the kinetic growth parameters of E. coli was easily achieved within one parallel experiment by preselecting ten different steady states. Scalability of milliliter-scale steady-state results was demonstrated by chemostat studies with a stirred-tank bioreactor on a liter scale. Thus, parallel and continuously operated stirred-tank bioreactors on a milliliter scale facilitate time-saving and cost-reducing steady-state studies with microorganisms. The applied continuous bioreactor system overcomes the drawbacks of existing miniaturized bioreactors, like poor mass transfer and insufficient process control. Copyright © 2015 Elsevier B.V. All rights reserved.
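The steady-state logic that makes preselected dilution rates so convenient follows from setting the specific growth rate equal to the dilution rate; a sketch assuming standard Monod kinetics, with invented E. coli-like parameter values (not from the paper):

```python
def chemostat_steady_state(D, mu_max, K_s, Y_xs, S_in):
    """Steady-state substrate and biomass concentrations of an ideal
    chemostat with Monod growth kinetics. At steady state mu = D, so
        S* = D * K_s / (mu_max - D),   X* = Y_xs * (S_in - S*).
    Washout occurs when D >= mu_max * S_in / (K_s + S_in)."""
    if D >= mu_max * S_in / (K_s + S_in):
        return S_in, 0.0                  # washout: no biomass retained
    S = D * K_s / (mu_max - D)
    X = Y_xs * (S_in - S)
    return S, X

# Assumed parameters: mu_max 0.7 1/h, K_s 0.1 g/L, yield 0.5 g/g,
# feed 10 g/L substrate -- each dilution rate D gives one steady state.
for D in (0.1, 0.3, 0.5):
    S, X = chemostat_steady_state(D, 0.7, 0.1, 0.5, 10.0)
    print(f"D={D:.1f} 1/h  S*={S:.3f} g/L  X*={X:.3f} g/L")
```

Running several such dilution rates in parallel on the milliliter scale is exactly what lets one experiment trace out the whole growth curve.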

  6. The implementation of an aeronautical CFD flow code onto distributed memory parallel systems

    NASA Astrophysics Data System (ADS)

    Ierotheou, C. S.; Forsey, C. R.; Leatham, M.

    2000-04-01

    The parallelization of an industrially important in-house computational fluid dynamics (CFD) code for calculating the airflow over complex aircraft configurations using the Euler or Navier-Stokes equations is presented. The code discussed is the flow solver module of the SAUNA CFD suite. This suite uses a novel grid system that may include block-structured hexahedral or pyramidal grids, unstructured tetrahedral grids or a hybrid combination of both. To assist in the rapid convergence to a solution, a number of convergence acceleration techniques are employed, including implicit residual smoothing and a multigrid full approximation storage scheme (FAS). Key features of the parallelization approach are the use of domain decomposition and encapsulated message passing to enable execution in parallel using a single program multiple data (SPMD) paradigm. In the case where a hybrid grid is used, a unified grid partitioning scheme is employed to define the decomposition of the mesh. The parallel code has been tested using both structured and hybrid grids on a number of different distributed memory parallel systems and is now routinely used to perform industrial-scale aeronautical simulations.

  7. Parallel computing on Unix workstation arrays

    NASA Astrophysics Data System (ADS)

    Reale, F.; Bocchino, F.; Sciortino, S.

    1994-12-01

    We have tested arrays of general-purpose Unix workstations used as MIMD systems for massive parallel computations. In particular, we have solved numerically a demanding test problem with a 2D hydrodynamic code, developed to study astrophysical flows, by executing it on arrays either of DECstations 5000/200 on an Ethernet LAN, or of DECstations 3000/400, equipped with powerful Alpha processors, on an FDDI LAN. The code is appropriate for data-domain decomposition, and we have used a library for parallelization previously developed in our Institute, easily extended to work on Unix workstation arrays by using the PVM software toolset. We have compared the parallel efficiencies obtained on arrays of several processors to those obtained on a dedicated MIMD parallel system, namely a Meiko Computing Surface (CS-1) equipped with Intel i860 processors. We discuss the feasibility of using non-dedicated parallel systems and conclude that the convenience depends essentially on the size of the computational domain as compared to the relative processor power and network bandwidth. We point out that for future perspectives a parallel development of processor and network technology is important, and that the software still offers great opportunities for improvement, especially in terms of latency times in the message-passing protocols. In conditions of significant gain in terms of speedup, such workstation arrays represent a cost-effective approach to massive parallel computations.
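
    The trade-off described above, domain size versus processor power and network bandwidth, can be made concrete with a toy speedup model for a data-domain decomposition; all constants below are illustrative, not the paper's measurements:

```python
import math

# Toy performance model: a square 2D domain is split into p strips; compute
# time scales with the subdomain size, communication with the strip boundary
# each worker must exchange per step. All constants are illustrative.

def speedup(n_cells, p, t_calc, t_byte, bytes_per_halo_cell):
    """Speedup for a square domain of n_cells split into p strips."""
    t_serial = n_cells * t_calc                       # one workstation
    side = math.isqrt(n_cells)                        # boundary length
    t_halo = 2 * side * bytes_per_halo_cell * t_byte  # two neighbour exchanges
    return t_serial / (t_serial / p + t_halo)

for p in (2, 4, 8, 16):
    s = speedup(512 * 512, p, t_calc=1e-6, t_byte=1e-7, bytes_per_halo_cell=16)
    print(f"p={p:2d}  speedup={s:5.2f}  efficiency={s / p:.0%}")
```

    Shrinking the domain or slowing the network (larger t_byte) drags the efficiency down, which is the paper's point about when a non-dedicated workstation array is worthwhile.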

  8. Automatic Multilevel Parallelization Using OpenMP

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this paper we describe the extension of the CAPO (CAPtools (Computer Aided Parallelization Toolkit) OpenMP) parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report some results for several benchmark codes and one full application that have been parallelized using our system.

  9. A Fault Oblivious Extreme-Scale Execution Environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McKie, Jim

    The FOX project, funded under the ASCR X-stack I program, developed systems software and runtime libraries for a new approach to the data and work distribution for massively parallel, fault oblivious application execution. Our work was motivated by the premise that exascale computing systems will provide a thousand-fold increase in parallelism and a proportional increase in failure rate relative to today’s machines. To deliver the capability of exascale hardware, the systems software must provide the infrastructure to support existing applications while simultaneously enabling efficient execution of new programming models that naturally express dynamic, adaptive, irregular computation; coupled simulations; and massive data analysis in a highly unreliable hardware environment with billions of threads of execution. Our OS research has prototyped new methods to provide efficient resource sharing, synchronization, and protection in a many-core compute node. We have experimented with alternative task/dataflow programming models and shown scalability in some cases to hundreds of thousands of cores. Much of our software is in active development through open source projects. Concepts from FOX are being pursued in next generation exascale operating systems. Our OS work focused on adaptive, application-tailored OS services optimized for multi- and many-core processors. We developed a new operating system NIX that supports role-based allocation of cores to processes, which was released to open source. We contributed to the IBM FusedOS project, which promoted the concept of latency-optimized and throughput-optimized cores. We built a task queue library based on a distributed, fault tolerant key-value store and identified scaling issues. A second fault tolerant task parallel library was developed, based on the Linda tuple space model, that used low level interconnect primitives for optimized communication.
We designed fault tolerance mechanisms for task parallel computations employing work stealing for load balancing that scaled to the largest existing supercomputers. Finally, we implemented the Elastic Building Blocks runtime, a library to manage object-oriented distributed software components. To support the research, we won two INCITE awards for time on Intrepid (BG/P) and Mira (BG/Q). Much of our work has had impact in the OS and runtime community through the ASCR Exascale OS/R workshop and report, leading to the research agenda of the Exascale OS/R program. Our project was, however, also affected by attrition of multiple PIs. While the PIs continued to participate and offer guidance as time permitted, losing these key individuals was unfortunate both for the project and for the DOE HPC community.

  10. 3D Data Denoising via Nonlocal Means Filter by Using Parallel GPU Strategies

    PubMed Central

    Cuomo, Salvatore; De Michele, Pasquale; Piccialli, Francesco

    2014-01-01

    The Nonlocal Means (NLM) algorithm is widely considered a state-of-the-art denoising filter in many research fields. Its high computational complexity has led researchers to develop parallel programming approaches and to use massively parallel architectures such as GPUs. In recent years, GPU devices have made it possible to achieve reasonable running times by filtering 3D datasets slice-by-slice with a 2D NLM algorithm. In our approach we design and implement a fully 3D NonLocal Means parallel approach, adopting different algorithm mapping strategies on a GPU architecture and a multi-GPU framework, in order to demonstrate its high applicability and scalability. The experimental results we obtained encourage the use of our approach in a large spectrum of applicative scenarios such as magnetic resonance imaging (MRI) or video sequence denoising. PMID:25045397
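
    For reference, a minimal serial pixelwise NLM on a 2D array: the fully independent per-pixel loop is what maps one output pixel to one GPU thread. This sketch is not the authors' 3D multi-GPU code, and the parameter values are illustrative:

```python
import numpy as np

def nlm_denoise(img, patch=1, search=3, h=0.4):
    """Pixelwise Non-Local Means on a 2D array.

    Each output pixel is a weighted average of pixels in a search window,
    weighted by patch similarity. The outer (i, j) loop iterations are
    independent -- exactly the data parallelism exploited on GPUs.
    """
    pad = patch + search
    padded = np.pad(img, pad, mode="reflect")
    out = np.zeros_like(img, dtype=float)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            ci, cj = i + pad, j + pad
            ref = padded[ci - patch:ci + patch + 1, cj - patch:cj + patch + 1]
            wsum, acc = 0.0, 0.0
            for di in range(-search, search + 1):
                for dj in range(-search, search + 1):
                    ni, nj = ci + di, cj + dj
                    cand = padded[ni - patch:ni + patch + 1,
                                  nj - patch:nj + patch + 1]
                    d2 = np.mean((ref - cand) ** 2)   # patch distance
                    w = np.exp(-d2 / (h * h))          # similarity weight
                    wsum += w
                    acc += w * padded[ni, nj]
            out[i, j] = acc / wsum
    return out

# Illustrative use: smooth a small noisy image.
rng = np.random.default_rng(1)
noisy = rng.normal(0.0, 0.2, (12, 12))
smooth = nlm_denoise(noisy)
```

    A 3D variant replaces the 2D patches and search window with 3D ones, which multiplies the cost and is why the authors turn to multi-GPU mapping strategies.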

  11. Seeing the forest for the trees: Networked workstations as a parallel processing computer

    NASA Technical Reports Server (NTRS)

    Breen, J. O.; Meleedy, D. M.

    1992-01-01

    Unlike traditional 'serial' processing computers, in which one central processing unit performs one instruction at a time, parallel processing computers contain several processing units, thereby performing several instructions at once. Many of today's fastest supercomputers achieve their speed by employing thousands of processing elements working in parallel. Few institutions can afford these state-of-the-art parallel processors, but many already have the makings of a modest parallel processing system. Workstations on existing high-speed networks can be harnessed as nodes in a parallel processing environment, bringing the benefits of parallel processing to many. While such a system cannot rival the industry's latest machines, many common tasks can be accelerated greatly by spreading the processing burden and exploiting idle network resources. We study several aspects of this approach, from algorithms to select nodes to speed gains in specific tasks. With ever-increasing volumes of astronomical data, it becomes all the more necessary to utilize our computing resources fully.

  12. Two schemes for rapid generation of digital video holograms using PC cluster

    NASA Astrophysics Data System (ADS)

    Park, Hanhoon; Song, Joongseok; Kim, Changseob; Park, Jong-Il

    2017-12-01

    Computer-generated holography (CGH), which is a process of generating digital holograms, is computationally expensive. Recently, several methods/systems of parallelizing the process using graphic processing units (GPUs) have been proposed. Indeed, use of multiple GPUs or a personal computer (PC) cluster (each PC with GPUs) enabled great improvements in the process speed. However, extant literature has less often explored systems involving rapid generation of multiple digital holograms and specialized systems for rapid generation of a digital video hologram. This study proposes a system that uses a PC cluster and is able to more efficiently generate a video hologram. The proposed system is designed to simultaneously generate multiple frames and accelerate the generation by parallelizing the CGH computations across a number of frames, as opposed to separately generating each individual frame while parallelizing the CGH computations within each frame. The proposed system also enables the subprocesses for generating each frame to execute in parallel through multithreading. With these two schemes, the proposed system significantly reduced the data communication time for generating a digital hologram when compared with that of the state-of-the-art system.
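
    The across-frames scheme can be sketched as a pool of workers, each generating whole frames independently; the frame kernel below is a cheap stand-in for the actual CGH computation, not the authors' implementation:

```python
import math
from concurrent.futures import ThreadPoolExecutor

def generate_frame(frame_idx, n_points=64):
    """Stand-in for CGH: a cheap per-frame computation. A real system
    evaluates the hologram per pixel on GPUs; this kernel only
    illustrates the scheduling."""
    return [math.sin(0.1 * frame_idx + 0.01 * k) for k in range(n_points)]

def render_video(n_frames, workers=4):
    # Scheme shown: parallelize ACROSS frames (one task per whole frame),
    # rather than splitting each frame's pixels among all workers.
    # The paper argues the across-frames scheme needs less data
    # communication per hologram.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(generate_frame, range(n_frames)))

frames = render_video(8)
```

    In a PC cluster the "workers" would be cluster nodes with GPUs rather than threads, and each node would additionally multithread the subprocesses of its own frame, which is the second scheme the paper proposes.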

  13. Investigation of Parallel Radiofrequency Transmission for the Reduction of Heating in Long Conductive Leads in 3 Tesla Magnetic Resonance Imaging

    PubMed Central

    McElcheran, Clare E.; Yang, Benson; Anderson, Kevan J. T.; Golenstani-Rad, Laleh; Graham, Simon J.

    2015-01-01

    Deep Brain Stimulation (DBS) is increasingly used to treat a variety of brain diseases by sending electrical impulses to deep brain nuclei through long, electrically conductive leads. Magnetic resonance imaging (MRI) of patients pre- and post-implantation is desirable to target and position the implant, to evaluate possible side-effects and to examine DBS patients who have other health conditions. Although MRI is the preferred modality for pre-operative planning, MRI post-implantation is limited due to the risk of high local power deposition, and therefore tissue heating, at the tip of the lead. The localized power deposition arises from currents induced in the leads caused by coupling with the radiofrequency (RF) transmission field during imaging. In the present work, parallel RF transmission (pTx) is used to tailor the RF electric field to suppress coupling effects. Electromagnetic simulations were performed for three pTx coil configurations with 2, 4, and 8-elements, respectively. Optimal input voltages to minimize coupling, while maintaining RF magnetic field homogeneity, were determined for all configurations using a Nelder-Mead optimization algorithm. Resulting electric and magnetic fields were compared to that of a 16-rung birdcage coil. Experimental validation was performed with a custom-built 4-element pTx coil. In simulation, 95-99% reduction of the electric field at the tip of the lead was observed between the various pTx coil configurations and the birdcage coil. Maximal reduction in E-field was obtained with the 8-element pTx coil. Magnetic field homogeneity was comparable to the birdcage coil for the 4- and 8-element pTx configurations. In experiment, a temperature increase of 2±0.15°C was observed at the tip of the wire using the birdcage coil, whereas negligible increase (0.2±0.15°C) was observed with the optimized pTx system. 
Although further research is required, these initial results suggest that the concept of optimizing pTx to reduce DBS heating effects holds considerable promise. PMID:26237218

  14. Investigation of Parallel Radiofrequency Transmission for the Reduction of Heating in Long Conductive Leads in 3 Tesla Magnetic Resonance Imaging.

    PubMed

    McElcheran, Clare E; Yang, Benson; Anderson, Kevan J T; Golenstani-Rad, Laleh; Graham, Simon J

    2015-01-01

    Deep Brain Stimulation (DBS) is increasingly used to treat a variety of brain diseases by sending electrical impulses to deep brain nuclei through long, electrically conductive leads. Magnetic resonance imaging (MRI) of patients pre- and post-implantation is desirable to target and position the implant, to evaluate possible side-effects and to examine DBS patients who have other health conditions. Although MRI is the preferred modality for pre-operative planning, MRI post-implantation is limited due to the risk of high local power deposition, and therefore tissue heating, at the tip of the lead. The localized power deposition arises from currents induced in the leads caused by coupling with the radiofrequency (RF) transmission field during imaging. In the present work, parallel RF transmission (pTx) is used to tailor the RF electric field to suppress coupling effects. Electromagnetic simulations were performed for three pTx coil configurations with 2, 4, and 8-elements, respectively. Optimal input voltages to minimize coupling, while maintaining RF magnetic field homogeneity, were determined for all configurations using a Nelder-Mead optimization algorithm. Resulting electric and magnetic fields were compared to that of a 16-rung birdcage coil. Experimental validation was performed with a custom-built 4-element pTx coil. In simulation, 95-99% reduction of the electric field at the tip of the lead was observed between the various pTx coil configurations and the birdcage coil. Maximal reduction in E-field was obtained with the 8-element pTx coil. Magnetic field homogeneity was comparable to the birdcage coil for the 4- and 8-element pTx configurations. In experiment, a temperature increase of 2±0.15°C was observed at the tip of the wire using the birdcage coil, whereas negligible increase (0.2±0.15°C) was observed with the optimized pTx system. 
Although further research is required, these initial results suggest that the concept of optimizing pTx to reduce DBS heating effects holds considerable promise.

  15. Enabling Requirements-Based Programming for Highly-Dependable Complex Parallel and Distributed Systems

    NASA Technical Reports Server (NTRS)

    Hinchey, Michael G.; Rash, James L.; Rouff, Christopher A.

    2005-01-01

    The manual application of formal methods in system specification has produced successes, but in the end, despite any claims and assertions by practitioners, there is no provable relationship between a manually derived system specification or formal model and the customer's original requirements. Complex parallel and distributed systems present the worst-case implications for today's dearth of viable approaches for achieving system dependability. No avenue other than formal methods constitutes a serious contender for resolving the problem, and so recognition of requirements-based programming has come at a critical juncture. We describe a new, NASA-developed automated requirements-based programming method that can be applied to certain classes of systems, including complex parallel and distributed systems, to achieve a high degree of dependability.

  16. Method and apparatus of parallel computing with simultaneously operating stream prefetching and list prefetching engines

    DOEpatents

    Boyle, Peter A.; Christ, Norman H.; Gara, Alan; Mawhinney, Robert D.; Ohmacht, Martin; Sugavanam, Krishnan

    2012-12-11

    A prefetch system improves a performance of a parallel computing system. The parallel computing system includes a plurality of computing nodes. A computing node includes at least one processor and at least one memory device. The prefetch system includes at least one stream prefetch engine and at least one list prefetch engine. The prefetch system operates those engines simultaneously. After the at least one processor issues a command, the prefetch system passes the command to a stream prefetch engine and a list prefetch engine. The prefetch system operates the stream prefetch engine and the list prefetch engine to prefetch data to be needed in subsequent clock cycles in the processor in response to the passed command.

  17. Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments

    DOE PAGES

    Yim, Won Cheol; Cushman, John C.

    2017-07-22

    Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs performs searches very rapidly, they have the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and it overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. Thus, this freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.
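
    The query-distribution approach reduces to a mechanical step: split the query FASTA into balanced chunks, submit one BLAST job per chunk, and concatenate the outputs. A sketch of the splitting step (the chunking policy is illustrative, not DCBLAST's actual code):

```python
def split_fasta(fasta_text, n_chunks):
    """Split a multi-FASTA string into n_chunks FASTA strings.

    Records are dealt round-robin so chunk sizes stay balanced even when
    the input is sorted by sequence length.
    """
    records, current = [], []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if current:
                records.append("\n".join(current))
            current = [line]
        elif line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current))
    chunks = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append(rec)
    return ["\n".join(c) for c in chunks]

# Each chunk would then be written to its own file and submitted as an
# independent cluster job, e.g. `blastn -query chunk_3.fa -db nt -out
# chunk_3.out`, with the per-chunk outputs concatenated afterwards.
chunks = split_fasta(">q1\nACGT\n>q2\nTTGA\n>q3\nGGCC\n", 2)
```
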

  18. Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yim, Won Cheol; Cushman, John C.

    Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs performs searches very rapidly, they have the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and it overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. Thus, this freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.

  19. On the parallel solution of parabolic equations

    NASA Technical Reports Server (NTRS)

    Gallopoulos, E.; Saad, Youcef

    1989-01-01

    Parallel algorithms for the solution of linear parabolic problems are proposed. The first of these methods is based on using polynomial approximation to the exponential. It does not require solving any linear systems and is highly parallelizable. The two other methods proposed are based on Pade and Chebyshev approximations to the matrix exponential. The parallelization of these methods is achieved by using partial fraction decomposition techniques to solve the resulting systems and thus offers the potential for increased time parallelism in time dependent problems. Experimental results from the Alliant FX/8 and the Cray Y-MP/832 vector multiprocessors are also presented.
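
    The partial-fraction idea can be made concrete: writing a rational approximant R(z) ≈ e^(-z) as a sum of simple poles, R(z) = Σ_j α_j/(z - θ_j), turns u(t) = e^(-tA)u₀ into independent shifted linear solves, one per pole, which is where the added parallelism comes from. A sketch using the low-order approximant 1/(1 + z + z²/2), not the higher-order Padé or Chebyshev approximants used in the paper:

```python
import numpy as np

# R(z) = 1/(1 + z + z**2/2), a low-order rational approximant to exp(-z).
# Partial fractions: R(z) = sum_j alpha_j / (z - theta_j), where theta_j are
# the (complex) roots of 1 + z + z**2/2. Each term is an independent shifted
# solve, so the terms can be computed in parallel.
theta = np.array([-1.0 + 1.0j, -1.0 - 1.0j])   # poles
alpha = np.array([0.0 - 1.0j, 0.0 + 1.0j])     # residues 2/(theta_j - theta_k)

def expm_action(A, u0, t):
    """Approximate exp(-t*A) @ u0 via the partial-fraction form of R."""
    n = A.shape[0]
    acc = np.zeros(n, dtype=complex)
    for a, th in zip(alpha, theta):   # independent solves -> parallelizable
        acc += a * np.linalg.solve(t * A - th * np.eye(n), u0)
    return acc.real                   # imaginary parts cancel for real A, u0

# Model parabolic operator: 1D Laplacian on a coarse grid.
n = 5
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
u0 = np.ones(n)
u = expm_action(A, u0, t=0.05)
```

    Higher-order approximants follow the same pattern with more pole/residue pairs (and correspondingly better accuracy); each pair still yields one independent linear solve.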

  20. Knowledge representation into Ada parallel processing

    NASA Technical Reports Server (NTRS)

    Masotto, Tom; Babikyan, Carol; Harper, Richard

    1990-01-01

    The Knowledge Representation into Ada Parallel Processing project is a joint NASA and Air Force funded project to demonstrate the execution of intelligent systems in Ada on the Charles Stark Draper Laboratory fault-tolerant parallel processor (FTPP). Two applications were demonstrated - a portion of the adaptive tactical navigator and a real time controller. Both systems are implemented as Activation Framework Objects on the Activation Framework intelligent scheduling mechanism developed by Worcester Polytechnic Institute. The implementations, results of performance analyses showing speedup due to parallelism and initial efficiency improvements are detailed and further areas for performance improvements are suggested.

  1. Cognitive and artificial representations in handwriting recognition

    NASA Astrophysics Data System (ADS)

    Lenaghan, Andrew P.; Malyan, Ron

    1996-03-01

    Both cognitive processes and artificial recognition systems may be characterized by the forms of representation they build and manipulate. This paper looks at how handwriting is represented in current recognition systems and at the psychological evidence for its representation in the cognitive processes responsible for reading. Empirical psychological work on feature extraction in early visual processing is surveyed to show that a sound psychological basis for feature extraction exists and to describe the features this approach leads to. The first stage of the development of an architecture for a handwriting recognition system, which has been strongly influenced by the psychological evidence for the cognitive processes and representations used in early visual processing, is reported. This architecture builds a number of parallel low-level feature maps from raw data. These feature maps are thresholded and a region labeling algorithm is used to generate sets of features. Fuzzy logic is used to quantify the uncertainty in the presence of individual features.

  2. Business model for sensor-based fall recognition systems.

    PubMed

    Fachinger, Uwe; Schöpke, Birte

    2014-01-01

    AAL systems require, in addition to sophisticated and reliable technology, adequate business models for their launch and sustainable establishment. This paper presents the basic features of alternative business models for a sensor-based fall recognition system which was developed within the context of the "Lower Saxony Research Network Design of Environments for Ageing" (GAL). The models were developed in parallel with the R&D process, with successive adaptation and concretization. An overview of the basic features (i.e. nine partial models) of the business model is given and the mutually exclusive alternatives for each partial model are presented. The partial models are interconnected and the combinations of compatible alternatives lead to consistent alternative business models. However, in the current state, only initial concepts of alternative business models can be deduced. The next step will be to gather additional information to work out more detailed models.

  3. Misalignments calibration in small-animal PET scanners based on rotating planar detectors and parallel-beam geometry.

    PubMed

    Abella, M; Vicente, E; Rodríguez-Ruano, A; España, S; Lage, E; Desco, M; Udias, J M; Vaquero, J J

    2012-11-21

    Technological advances have improved the assembly process of PET detectors, resulting in quite small mechanical tolerances. However, in high-spatial-resolution systems, even submillimetric misalignments of the detectors may lead to a notable degradation of image resolution and artifacts. Therefore, the exact characterization of misalignments is critical for optimum reconstruction quality in such systems. This subject has been widely studied for CT and SPECT scanners based on cone beam geometry, but this is not the case for PET tomographs based on rotating planar detectors. The purpose of this work is to analyze misalignment effects in these systems and to propose a robust and easy-to-implement protocol for geometric characterization. The result of the proposed calibration method, which requires no more than a simple calibration phantom, can then be used to generate a correct 3D-sinogram from the acquired list mode data.

  4. Computational Challenges of 3D Radiative Transfer in Atmospheric Models

    NASA Astrophysics Data System (ADS)

    Jakub, Fabian; Bernhard, Mayer

    2017-04-01

    The computation of radiative heating and cooling rates is one of the most expensive components in today's atmospheric models. The high computational cost stems not only from the laborious integration over a wide range of the electromagnetic spectrum but also from the fact that solving the integro-differential radiative transfer equation for monochromatic light is already rather involved. This has led to the advent of numerous approximations and parameterizations to reduce the cost of the solver. One of the most prominent is the so-called independent pixel approximation (IPA), in which horizontal energy transfer is neglected altogether and radiation may only propagate in the vertical direction (1D). Recent studies indicate that the IPA introduces significant errors in high-resolution simulations and affects the evolution and development of convective systems. However, using fully 3D solvers such as Monte Carlo methods is not feasible even on state-of-the-art supercomputers. The parallelization of atmospheric models is often realized by a horizontal domain decomposition, and hence horizontal transfer of energy necessitates communication. For example, a cloud at low sun angles will cast a long shadow that may need to be communicated across a multitude of processors. Especially light in the solar spectral range may travel long distances through the atmosphere. Concerning highly parallel simulations, it is vital that 3D radiative transfer solvers put a special emphasis on parallel scalability. We will present an introduction to the intricacies of computing 3D radiative heating and cooling rates, as well as report on the parallel performance of the TenStream solver. The TenStream is a 3D radiative transfer solver using the PETSc framework to iteratively solve a set of partial differential equations. We investigate two matrix preconditioners: (a) geometric algebraic multigrid preconditioning (MG+GAMG) and (b) block Jacobi incomplete LU (ILU) factorization.
The TenStream solver is tested for up to 4096 cores and shows a parallel scaling efficiency of 80-90% on various supercomputers.

  5. Effective Parallel Algorithm Animation

    DTIC Science & Technology

    1994-03-01


  6. Parallel Ray Tracing Using the Message Passing Interface

    DTIC Science & Technology

    2007-09-01

    Ray-tracing software is available for lens design and for general optical systems modeling. It tends to be designed to run on a single processor. Index terms: National Aeronautics and Space Administration (NASA), optical ray tracing, parallel computing, parallel processing, prime numbers, ray tracing.

  7. Optoelectronic associative recall using motionless-head parallel readout optical disk

    NASA Astrophysics Data System (ADS)

    Marchand, P. J.; Krishnamoorthy, A. V.; Ambs, P.; Esener, S. C.

    1990-12-01

    High data rates, low retrieval times, and simple implementation are presently shown to be obtainable by means of a motionless-head 2D parallel-readout system for optical disks. Since the optical disk obviates mechanical head motions for access, focusing, and tracking, addressing is performed exclusively through the disk's rotation. Attention is given to a high-performance associative memory system configuration which employs a parallel readout disk.

  8. Parallel-Vector Algorithm For Rapid Structural Anlysis

    NASA Technical Reports Server (NTRS)

    Agarwal, Tarun R.; Nguyen, Duc T.; Storaasli, Olaf O.

    1993-01-01

    New algorithm developed to overcome deficiency of skyline storage scheme by use of variable-band storage scheme. Exploits both parallel and vector capabilities of modern high-performance computers. Gives engineers and designers opportunity to include more design variables and constraints during optimization of structures. Enables use of more refined finite-element meshes to obtain improved understanding of complex behaviors of aerospace structures leading to better, safer designs. Not only attractive for current supercomputers but also for next generation of shared-memory supercomputers.

  9. Micro/Nanoscale Parallel Patterning of Functional Biomolecules, Organic Fluorophores and Colloidal Nanocrystals

    PubMed Central

    2009-01-01

    We describe the design and optimization of a reliable strategy that combines self-assembly and lithographic techniques, leading to very precise micro-/nanopositioning of biomolecules for the realization of micro- and nanoarrays of functional DNA and antibodies. Moreover, based on the covalent immobilization of stable and versatile SAMs of programmable chemical reactivity, this approach constitutes a general platform for the parallel site-specific deposition of a wide range of molecules such as organic fluorophores and water-soluble colloidal nanocrystals. PMID:20596482

  10. An experimental investigation of delta wing vortex flow with and without external jet blowing

    NASA Technical Reports Server (NTRS)

    Iwanski, Kenneth P.; Ng, T. Terry; Nelson, Robert C.

    1989-01-01

    A visual and quantitative study of the vortex flow field over a 70-deg delta wing with an external jet blowing parallel to and at the leading edge was conducted. In the experiment, the vortex core was visually marked with TiCl4, and LDA was used to measure the velocity parallel and normal to the wing surface. It is found that jet blowing moved vortex breakdown farther downstream from its natural position and influenced the breakdown characteristics.

  11. Conversion between parallel and antiparallel β -sheets in wild-type and Iowa mutant Aβ40 fibrils

    NASA Astrophysics Data System (ADS)

    Xi, Wenhui; Hansmann, Ulrich H. E.

    2018-01-01

    Using a variant of Hamiltonian replica exchange, we study, for wild-type and Iowa mutant Aβ40, the conversion between fibrils with antiparallel β-sheets and those with parallel β-sheets. We show that the wild type and the mutant form distinct salt bridges that in turn stabilize different fibril organizations. The conversion between the two fibril forms leads to the release of small aggregates that in the Iowa mutant may shift the equilibrium from fibrils to more toxic oligomers.

  12. A transient FETI methodology for large-scale parallel implicit computations in structural mechanics

    NASA Technical Reports Server (NTRS)

    Farhat, Charbel; Crivelli, Luis; Roux, Francois-Xavier

    1992-01-01

    Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because explicit schemes are also easier to parallelize than implicit ones. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet -- and perhaps will never -- be offset by the speed of parallel hardware. Therefore, it is essential to develop efficient and robust alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating low-frequency dynamics. Here we present a domain decomposition method for implicit schemes that requires significantly less storage than factorization algorithms, that is several times faster than other popular direct and iterative methods, that can be easily implemented on both shared and local memory parallel processors, and that is both computationally and communication-wise efficient. The proposed transient domain decomposition method is an extension of the method of Finite Element Tearing and Interconnecting (FETI) developed by Farhat and Roux for the solution of static problems. Serial and parallel performance results on the CRAY Y-MP/8 and the iPSC-860/128 systems are reported and analyzed for realistic structural dynamics problems. These results establish the superiority of the FETI method over both the serial/parallel conjugate gradient algorithm with diagonal scaling and the serial/parallel direct method, and contrast the computational power of the iPSC-860/128 parallel processor with that of the CRAY Y-MP/8 system.

  13. Solving very large, sparse linear systems on mesh-connected parallel computers

    NASA Technical Reports Server (NTRS)

    Opsahl, Torstein; Reif, John

    1987-01-01

    The implementation of Pan and Reif's Parallel Nested Dissection (PND) algorithm on mesh-connected parallel computers is described. This is the first known algorithm that allows very large, sparse linear systems of equations to be solved efficiently in polylog time using a small number of processors. How the processor bound of PND can be matched to the number of processors available on a given parallel computer, by slowing down the algorithm by constant factors, is described. Also, for the important class of problems where G(A) is a grid graph, a unique memory mapping that reduces the inter-processor communication requirements of PND to those that can be executed on mesh-connected parallel machines is detailed. A description of an implementation on the Goodyear Massively Parallel Processor (MPP), located at Goddard, is given. Also, a detailed discussion of data mappings and performance issues is given.

  14. Parallelization of the FLAPW method

    NASA Astrophysics Data System (ADS)

    Canning, A.; Mannstadt, W.; Freeman, A. J.

    2000-08-01

    The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining structural, electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work, we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel supercomputer.

  15. Handling Big Data in Medical Imaging: Iterative Reconstruction with Large-Scale Automated Parallel Computation

    PubMed Central

    Lee, Jae H.; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T.; Seo, Youngho

    2014-01-01

    The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximization (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to-program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing the MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge of computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains, with the goal of eventually making it usable in a clinical setting. PMID:27081299

  16. Handling Big Data in Medical Imaging: Iterative Reconstruction with Large-Scale Automated Parallel Computation.

    PubMed

    Lee, Jae H; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T; Seo, Youngho

    2014-11-01

    The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximization (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to-program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing the MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge of computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains, with the goal of eventually making it usable in a clinical setting.
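
The MLEM update described in the two records above is compact enough to sketch serially in NumPy; Spark/GraphX would distribute the matrix-vector products, but the per-iteration arithmetic is the same. A minimal sketch with an invented toy system matrix (not the papers' SPECT system model):

```python
import numpy as np

def mlem(A, y, n_iter=500):
    """Maximum-likelihood expectation-maximization (MLEM) for y ~ A @ x.

    A : (m, n) nonnegative system matrix; y : (m,) measured counts.
    The multiplicative update x <- x * (A.T @ (y / (A @ x))) / (A.T @ 1)
    keeps the estimate nonnegative at every iteration."""
    m, n = A.shape
    x = np.ones(n)                      # uniform nonnegative start
    sens = A.T @ np.ones(m)             # sensitivity (normalization) image
    for _ in range(n_iter):
        proj = A @ x                    # forward projection
        x *= (A.T @ (y / proj)) / sens  # back-project the ratio, normalize
    return x

rng = np.random.default_rng(0)
A = rng.uniform(0.1, 1.0, size=(20, 5))
x_true = np.array([1.0, 2.0, 0.5, 3.0, 1.5])
y = A @ x_true                          # noise-free "projections"
x_hat = mlem(A, y)
```

The two dominant operations, `A @ x` and `A.T @ r`, are exactly the sparse products that a graph-parallel system like GraphX maps onto message passing along graph edges.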

  17. Algorithms for parallel flow solvers on message passing architectures

    NASA Technical Reports Server (NTRS)

    Vanderwijngaart, Rob F.

    1995-01-01

    The purpose of this project has been to identify and test suitable technologies for implementation of fluid flow solvers -- possibly coupled with structures and heat equation solvers -- on MIMD parallel computers. In the course of this investigation much attention has been paid to efficient domain decomposition strategies for ADI-type algorithms. Multi-partitioning derives its efficiency from the assignment of several blocks of grid points to each processor in the parallel computer. A coarse-grain parallelism is obtained, and a near-perfect load balance results. In uni-partitioning every processor receives responsibility for exactly one block of grid points instead of several. This necessitates fine-grain pipelined program execution in order to obtain a reasonable load balance. Although fine-grain parallelism is less desirable on many systems, especially high-latency networks of workstations, uni-partition methods are still in wide use in production codes for flow problems. Consequently, it remains important to achieve good efficiency with this technique, even though it has essentially been superseded by multi-partitioning for parallel ADI-type algorithms. Another reason for the concentration on improving the performance of pipeline methods is their applicability in other types of flow solver kernels with stronger implied data dependence. Analytical expressions can be derived for the size of the dynamic load imbalance incurred in traditional pipelines. From these, the optimal first-processor retardation that leads to the shortest total completion time for the pipeline process can be determined. Theoretical predictions of pipeline performance with and without optimization match experimental observations on the iPSC/860 very well. Analysis of pipeline performance also highlights the effect of careless grid partitioning in flow solvers that employ pipeline algorithms.
If grid blocks at boundaries are not at least as large in the wall-normal direction as those immediately adjacent to them, then the first processor in the pipeline will receive a computational load that is less than that of subsequent processors, magnifying the pipeline slowdown effect. Extra compensation is needed for grid boundary effects, even if all grid blocks are equally sized.

  18. Micro-Macro Simulation of Viscoelastic Fluids in Three Dimensions

    NASA Astrophysics Data System (ADS)

    Rüttgers, Alexander; Griebel, Michael

    2012-11-01

    The development of the chemical industry resulted in various complex fluids that cannot be correctly described by classical fluid mechanics. For instance, this includes paint, engine oils with polymeric additives and toothpaste. We currently perform multiscale viscoelastic flow simulations for which we have coupled our three-dimensional Navier-Stokes solver NaSt3dGPF with the stochastic Brownian configuration field method on the micro-scale. In this method, we represent a viscoelastic fluid as a dumbbell system immersed in a three-dimensional Newtonian liquid which leads to a six-dimensional problem in space. The approach requires large computational resources and therefore depends on an efficient parallelisation strategy. Our flow solver is parallelised with a domain decomposition approach using MPI. It shows excellent scale-up results for up to 128 processors. In this talk, we present simulation results for viscoelastic fluids in square-square contractions due to their relevance for many engineering applications such as extrusion. Another aspect of the talk is the parallel implementation in NaSt3dGPF and the parallel scale-up and speed-up behaviour.

  19. The Potsdam Parallel Ice Sheet Model (PISM-PIK) - Part 2: Dynamic equilibrium simulation of the Antarctic ice sheet

    NASA Astrophysics Data System (ADS)

    Martin, M. A.; Winkelmann, R.; Haseloff, M.; Albrecht, T.; Bueler, E.; Khroulev, C.; Levermann, A.

    2010-08-01

    We present a dynamic equilibrium simulation of the ice sheet-shelf system on Antarctica with the Potsdam Parallel Ice Sheet Model (PISM-PIK). The simulation is initialized with present-day conditions for topography and ice thickness and then run to steady state with constant present-day surface mass balance. Surface temperature and basal melt distribution are parameterized. Grounding lines and calving fronts are free to evolve, and their modeled equilibrium state is compared to observational data. A physically-motivated dynamic calving law based on horizontal spreading rates allows for realistic calving fronts for various types of shelves. Steady-state dynamics including surface velocity and ice flux are analyzed for whole Antarctica and the Ronne-Filchner and Ross ice shelf areas in particular. The results show that the different flow regimes in sheet and shelves, and the transition zone between them, are captured reasonably well, supporting the approach of superposition of SIA and SSA for the representation of fast motion of grounded ice. This approach also leads to a natural emergence of streams in this new 3-D marine ice sheet model.

  20. Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williams, Samuel; Oliker, Leonid; Vuduc, Richard

    2008-10-16

    We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore-specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD quad-core, AMD dual-core, and Intel quad-core designs, the heterogeneous STI Cell, as well as one of the first scientific studies of the highly multithreaded Sun Victoria Falls (a Niagara2 SMP). We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural trade-offs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
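
The SpMV kernel studied above is small enough to state exactly; the multicore optimizations surveyed in the record (blocking, NUMA-aware placement, SIMDization) all wrap this same loop. A serial reference version in CSR format, with invented toy data:

```python
import numpy as np

def spmv_csr(data, indices, indptr, x):
    """Sparse matrix-vector multiply y = A @ x with A in CSR format.
    data/indices hold the nonzeros and their column ids; the slice
    indptr[i]:indptr[i+1] delimits row i. Each row is an independent
    dot product, which is what makes SpMV straightforward to split
    across cores (a contiguous chunk of rows per thread)."""
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        lo, hi = indptr[i], indptr[i + 1]
        y[i] = np.dot(data[lo:hi], x[indices[lo:hi]])
    return y

# CSR encoding of the 3x3 matrix [[2, 0, 1], [0, 3, 0], [4, 0, 5]]
data    = np.array([2., 1., 3., 4., 5.])
indices = np.array([0, 2, 1, 0, 2])
indptr  = np.array([0, 2, 3, 5])
x = np.array([1., 1., 1.])
y = spmv_csr(data, indices, indptr, x)   # -> [3. 3. 9.]
```

The kernel is memory-bound: each nonzero is touched once per multiply, so the optimizations in the record aim at bandwidth and locality rather than arithmetic.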

  1. Parallel multipoint recording of aligned and cultured neurons on micro channel array toward cellular network analysis.

    PubMed

    Tonomura, Wataru; Moriguchi, Hiroyuki; Jimbo, Yasuhiko; Konishi, Satoshi

    2010-08-01

    This paper describes an advanced Micro Channel Array (MCA) for recording electrophysiological signals of neuronal networks at multiple points simultaneously. The developed MCA is designed for neuronal network analysis which has been studied by the co-authors using the Micro Electrode Arrays (MEA) system, and employs the principles of extracellular recordings. A prerequisite for extracellular recordings with good signal-to-noise ratio is a tight contact between cells and electrodes. The MCA described herein has the following advantages. The electrodes integrated around individual micro channels are electrically isolated to enable parallel multipoint recording. Reliable clamping of a targeted cell through micro channels is expected to improve the cellular selectivity and the attachment between the cell and the electrode toward steady electrophysiological recordings. We cultured hippocampal neurons on the developed MCA. As a result, the spontaneous and evoked spike potentials could be recorded by sucking and clamping the cells at multiple points. In this paper, we describe the design and fabrication of the MCA and the successful electrophysiological recordings leading to the development of an effective cellular network analysis device.

  2. Large Negative Differential of Heat Generation in a Two-Level Quantum Dot Coupled to Ferromagnetic Leads

    NASA Astrophysics Data System (ADS)

    Peng, Ya-Jing; Zheng, Jun; Chi, Feng

    2015-12-01

    Heat current exchanged between a two-level quantum dot (QD) and a phonon reservoir coupled to it is studied within the nonequilibrium Green's function method. We consider that the QD is connected to left and right ferromagnetic leads. It is found that the negative differential of the heat generation (NDHG) phenomenon, i.e., the intensity of the heat generation decreasing with increasing bias voltage, is obviously enhanced as compared to that in a single-level QD system. The NDHG can emerge in the absence of negative differential conductance of the electric current, and occurs in different bias voltage regions when the magnetic moments of the two leads are arranged in parallel or antiparallel configurations. The characteristics of the found phenomena can be understood by examining the change of the electron number on the dot. Supported by the National Natural Science Foundation of China under Grant No. 61274101, the Liaoning Excellent Talents Program (LJQ2013118), and the Foundation of the State Key Laboratory of Explosion Science and Technology of Beijing Institute of Technology (KFJJ14-08M)

  3. Görtler instability of the axisymmetric boundary layer along a cone

    NASA Astrophysics Data System (ADS)

    ITOH, Nobutake

    2014-10-01

    Exact partial differential equations are derived to describe Görtler instability, caused by a weakly concave wall, of axisymmetric boundary layers with similar velocity profiles that are decomposed into a sequence of ordinary differential systems on the assumption that the solution can be expanded into inverse powers of local Reynolds number. The leading terms of the series solution are determined by solving a non-parallel version of Görtler’s eigenvalue problem and lead to a neutral stability curve and finite values of critical Görtler number and wave number for stationary and longitudinal vortices. Higher-order terms of the series solution indicate Reynolds-number dependence of Görtler instability and a limited validity of Görtler’s approximation based on the leading terms only. The present formulation is simply applicable to two-dimensional boundary layers of similar profiles, and critical Görtler number and wave number of the Blasius boundary layer on a flat plate are given by G2c = 1.23 and β2c = 0.288, respectively, if the momentum thickness is chosen as the reference length.

  4. Plasma Irregularities on the Leading and Trailing Edges of Polar Cap Patches

    NASA Astrophysics Data System (ADS)

    Lamarche, L. J.; Varney, R. H.; Gillies, R.; Chartier, A.; Mitchell, C. N.

    2017-12-01

    Plasma irregularities in the polar cap have often been attributed to the gradient drift instability (GDI). Traditional fluid theories of GDI predict irregularity growth only on the trailing edge of polar patches, where the plasma density gradient is parallel to the plasma drift velocity; however, many observations show that irregularities also form on the leading edge of patches. We consider decameter-scale irregularities detected by polar-latitude SuperDARN (Super Dual Auroral Radar Network) radars with any relationship between the background density gradients and drift velocity. Global electron density from the Multi-Instrument Data Analysis System (MIDAS), a GPS tomography routine, is used to provide context for where irregularities are observed relative to polar patches, and finer-scale background density gradients are found from joint 3D imaging by both the North and Canada faces of the Resolute Bay Incoherent Scatter Radars (RISR-N and RISR-C). Shear-based instabilities are considered as mechanisms by which plasma irregularities could form on the leading edge of patches. Theoretical predictions of instability growth from both GDI and shear instabilities are compared with irregularity observations for the October 13, 2016 storm.

  5. Compton Scattering Cross Sections in Strong Magnetic Fields: Advances for Neutron Star Applications

    NASA Astrophysics Data System (ADS)

    Eiles, Matthew; Gonthier, P. L.; Baring, M. G.; Wadiasingh, Z.

    2013-04-01

    Various telescopes including RXTE, INTEGRAL and Suzaku have detected non-thermal X-ray emission in the 10 - 200 keV band from strongly magnetic neutron stars. Inverse Compton scattering, a quantum-electrodynamical process, is believed to be a leading candidate for the production of this intense X-ray radiation. Magnetospheric conditions are such that electrons may well possess ultra-relativistic energies, which lead to attractive simplifications of the cross section. We have recently addressed such a case by developing compact analytic expressions using correct spin-dependent widths and Sokolov & Ternov (ST) basis states, focusing specifically on ground state-to-ground state scattering. However, inverse Compton scattering can cool electrons down to mildly-relativistic energies, necessitating the development of a more general case where the incoming photons acquire nonzero incident angles relative to the field in the rest frame of the electron, and the intermediate state can be excited to arbitrary Landau levels. In this paper, we develop results pertaining to this general case using ST formalism, and treating the plethora of harmonic resonances associated with various cyclotron transitions between Landau states. Four possible scattering modes (parallel-parallel, perpendicular-perpendicular, parallel-perpendicular, and perpendicular-parallel) encapsulate the polarization dependence of the cross section. We present preliminary analytic and numerical investigations of the magnitude of the extra Landau state contributions to obtain the full cross section, and compare these new analytic developments with the spin-averaged cross sections, which we develop in parallel. Results will find application to various neutron star problems, including computation of Eddington luminosities in the magnetospheres of magnetars. 
We express our gratitude for the generous support of the Michigan Space Grant Consortium, of the National Science Foundation (REU and RUI), and the NASA Astrophysics Theory and Fundamental Program.

  6. Experimental Studies Of Pilot Performance At Collision Avoidance During Closely Spaced Parallel Approaches

    NASA Technical Reports Server (NTRS)

    Pritchett, Amy R.; Hansman, R. John

    1997-01-01

    Efforts to increase airport capacity include studies of aircraft systems that would enable simultaneous approaches to closely spaced parallel runways in Instrument Meteorological Conditions (IMC). The time-critical nature of a parallel approach raises key design issues for current and future collision avoidance systems. Two part-task flight simulator studies have examined the procedural and display issues inherent in such a time-critical task, the interaction of the pilot with a collision avoidance system, and the alerting criteria and avoidance maneuvers preferred by subjects.

  7. Small file aggregation in a parallel computing system

    DOEpatents

    Faibish, Sorin; Bent, John M.; Tzelnic, Percy; Grider, Gary; Zhang, Jingwang

    2014-09-02

    Techniques are provided for small file aggregation in a parallel computing system. An exemplary method for storing a plurality of files generated by a plurality of processes in a parallel computing system comprises aggregating the plurality of files into a single aggregated file; and generating metadata for the single aggregated file. The metadata comprises an offset and a length of each of the plurality of files in the single aggregated file. The metadata can be used to unpack one or more of the files from the single aggregated file.
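
The offset-and-length metadata scheme described in the patent abstract can be illustrated in a few lines. This is an explanatory sketch of the general idea, not the patented implementation; the names and file contents are invented:

```python
import io

def aggregate(files):
    """Pack many small 'files' (name -> bytes) into one blob plus
    metadata recording each file's (offset, length) inside the blob.
    A parallel file system then sees one large file instead of many
    tiny ones, which is the point of small-file aggregation."""
    blob = io.BytesIO()
    meta = {}
    for name, payload in files.items():
        meta[name] = (blob.tell(), len(payload))   # (offset, length)
        blob.write(payload)
    return blob.getvalue(), meta

def unpack(blob, meta, name):
    """Recover a single file from the aggregated blob via its metadata."""
    offset, length = meta[name]
    return blob[offset:offset + length]

# e.g. per-process output files from three MPI ranks (invented data)
files = {"rank0.out": b"alpha", "rank1.out": b"beta", "rank2.out": b"gamma!"}
blob, meta = aggregate(files)
restored = unpack(blob, meta, "rank1.out")   # -> b"beta"
```

Because the metadata carries explicit offsets, any subset of the original files can be unpacked without scanning the whole aggregate.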

  8. Effect of brain-derived neurotrophic factor (BDNF) on hepatocyte metabolism.

    PubMed

    Genzer, Yoni; Chapnik, Nava; Froy, Oren

    2017-07-01

    Brain-derived neurotrophic factor (BDNF) plays crucial roles in the development, maintenance, plasticity and homeostasis of the central and peripheral nervous systems. Perturbing BDNF signaling in mouse brain results in hyperphagia, obesity, hyperinsulinemia and hyperglycemia. Currently, little is known about whether BDNF affects liver tissue directly. Our aim was to determine the metabolic signaling pathways activated after BDNF treatment in hepatocytes. Unlike its effect in the brain, BDNF did not lead to activation of the liver AKT pathway. However, AMP-activated protein kinase (AMPK) was ∼3 times more active and fatty acid synthase (FAS) ∼2-fold less active, suggesting increased fatty acid oxidation and reduced fatty acid synthesis. In addition, cAMP response element binding protein (CREB) was ∼3.5-fold less active, together with its output, the gluconeogenic transcript phosphoenolpyruvate carboxykinase (Pepck), suggesting reduced gluconeogenesis. The levels of glycogen synthase kinase 3b (GSK3b) were ∼3-fold higher, suggesting increased glycogen synthesis. In parallel, the expression levels of the clock genes Bmal1 and Cry1, whose protein products also play a metabolic role, were ∼2-fold increased and decreased, respectively. In conclusion, BDNF binding to hepatocytes leads to activation of catabolic pathways, such as fatty acid oxidation. In parallel, gluconeogenesis is inhibited, while glycogen storage is triggered. This metabolic state mimics the state after breakfast, in which the liver continues to oxidize fat, stops gluconeogenesis and replenishes glycogen stores. Copyright © 2017 Elsevier Ltd. All rights reserved.

  9. Activation of parallel fiber feedback by spatially diffuse stimuli reduces signal and noise correlations via independent mechanisms in a cerebellum-like structure.

    PubMed

    Simmonds, Benjamin; Chacron, Maurice J

    2015-01-01

    Correlations between the activities of neighboring neurons are observed ubiquitously across systems and species and are dynamically regulated by several factors such as the stimulus' spatiotemporal extent as well as by the brain's internal state. Using the electrosensory system of gymnotiform weakly electric fish, we recorded the activities of pyramidal cell pairs within the electrosensory lateral line lobe (ELL) under spatially localized and diffuse stimulation. We found that both signal and noise correlations were markedly reduced (>40%) under the latter stimulation. Through a network model incorporating key anatomical features of the ELL, we reveal how activation of diffuse parallel fiber feedback from granule cells by spatially diffuse stimulation can explain both the reduction in signal as well as the reduction in noise correlations seen experimentally through independent mechanisms. First, we show that burst-timing dependent plasticity, which leads to a negative image of the stimulus and thereby reduces single neuron responses, decreases signal but not noise correlations. Second, we show trial-to-trial variability in the responses of single granule cells to sensory input reduces noise but not signal correlations. Thus, our model predicts that the same feedback pathway can simultaneously reduce both signal and noise correlations through independent mechanisms. To test this prediction experimentally, we pharmacologically inactivated parallel fiber feedback onto ELL pyramidal cells. In agreement with modeling predictions, we found that inactivation increased both signal and noise correlations but that there was no significant relationship between magnitude of the increase in signal correlations and the magnitude of the increase in noise correlations. The mechanisms reported in this study are expected to be generally applicable to the cerebellum as well as other cerebellum-like structures. 
We further discuss the implications of such decorrelation on the neural coding strategies used by the electrosensory and by other systems to process natural stimuli.
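
The signal and noise correlations discussed in the record above have a standard operational definition that a short sketch makes concrete: signal correlation is the correlation of the trial-averaged responses of two cells across stimuli, while noise correlation is the correlation of their single-trial residuals once those averages are removed. This is a generic illustration on synthetic data, not the paper's analysis:

```python
import numpy as np

def signal_and_noise_corr(a, b):
    """a, b : (stimuli, trials) response arrays for two cells.
    Signal correlation: correlation of trial-averaged responses
    across stimuli. Noise correlation: correlation of single-trial
    residuals (per-stimulus means removed), pooled across stimuli."""
    sig = np.corrcoef(a.mean(axis=1), b.mean(axis=1))[0, 1]
    ra = a - a.mean(axis=1, keepdims=True)
    rb = b - b.mean(axis=1, keepdims=True)
    noise = np.corrcoef(ra.ravel(), rb.ravel())[0, 1]
    return sig, noise

rng = np.random.default_rng(1)
stim = rng.normal(size=(8, 1))            # shared stimulus drive, 8 stimuli
shared = rng.normal(size=(8, 50))         # trial-to-trial noise common to both cells
a = stim + 0.5 * shared + 0.5 * rng.normal(size=(8, 50))
b = stim + 0.5 * shared + 0.5 * rng.normal(size=(8, 50))
sig, noise = signal_and_noise_corr(a, b)  # both positive for this toy pair
```

Because the two quantities are computed from independent components of the response (the stimulus-locked mean vs. the residuals), a feedback pathway can in principle reduce one without the other, which is the prediction the model in the record tests.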

  10. Systematic review automation technologies

    PubMed Central

    2014-01-01

    Systematic reviews, a cornerstone of evidence-based medicine, are not produced quickly enough to support clinical practice. The cost of production, availability of the requisite expertise and timeliness are often quoted as major contributors to the delay. This detailed survey of the state of the art of information systems designed to support or automate individual tasks of the systematic review, and in particular systematic reviews of randomized controlled clinical trials, reveals trends toward the convergence of several parallel research projects. We surveyed literature describing informatics systems that support or automate the processes of systematic review or each of the tasks of the systematic review. Several projects focus on automating, simplifying and/or streamlining specific tasks of the systematic review. Some tasks are already fully automated while others are still largely manual. In this review, we describe each task and the effect that its automation would have on the entire systematic review process, summarize the existing information system support for each task, and highlight where further research is needed to realize automation of the task. Integration of the systems that automate systematic review tasks may lead to a revised systematic review workflow. We envisage that the optimized workflow will lead to a system in which each systematic review is described as a computer program that automatically retrieves relevant trials, appraises them, extracts and synthesizes data, evaluates the risk of bias, performs meta-analysis calculations, and produces a report in real time. PMID:25005128

  11. Do all roads lead to Rome? The role of neuro-immune interactions before birth in the programming of offspring obesity

    PubMed Central

    Jasoni, Christine L.; Sanders, Tessa R.; Kim, Dong Won

    2015-01-01

    The functions of the nervous system can be powerfully modulated by the immune system. Although traditionally considered to be quite separate, neuro-immune interactions are increasingly recognized as critical for both normal and pathological nervous system function in the adult. However, a growing body of information supports a critical role for neuro-immune interactions before birth, particularly in the prenatal programming of later-life neurobehavioral disease risk. This review will focus on maternal obesity, as it represents an environment of pathological immune system function during pregnancy that elevates offspring neurobehavioral disease risk. We will first delineate the normal role of the immune system during pregnancy, including the role of the placenta as both a barrier and relayer of inflammatory information between the maternal and fetal environments. This will be followed by the current exciting findings of how immuno-modulatory molecules may elevate offspring risk of neurobehavioral disease by altering brain development and, consequently, later life function. Finally, by drawing parallels with pregnancy complications other than obesity, we will suggest that aberrant immune activation, irrespective of its origin, may lead to neuro-immune interactions that otherwise would not exist in the developing brain. These interactions could conceivably derail normal brain development and/or later life function, and thereby elevate risk for obesity and other neurobehavioral disorders later in the offspring's life. PMID:25691854

  12. A parallel algorithm for 2D visco-acoustic frequency-domain full-waveform inversion: application to a dense OBS data set

    NASA Astrophysics Data System (ADS)

    Sourbier, F.; Operto, S.; Virieux, J.

    2006-12-01

    We present a distributed-memory parallel algorithm for 2D visco-acoustic full-waveform inversion of wide-angle seismic data. Our code is written in Fortran 90 and uses MPI for parallelism. The algorithm was applied to a real wide-angle data set recorded by 100 OBSs with a 1-km spacing in the eastern Nankai trough (Japan) to image the deep structure of the subduction zone. Full-waveform inversion is applied sequentially to discrete frequencies, proceeding from the low to the high frequencies. The inverse problem is solved with a classic gradient method. Full-waveform modeling is performed with a frequency-domain finite-difference method. In the frequency domain, solving the wave equation requires the resolution of a large unsymmetric system of linear equations. We use the massively parallel direct solver MUMPS (http://www.enseeiht.fr/irit/apo/MUMPS) for distributed-memory computers to solve this system. The MUMPS solver is based on a multifrontal method for the parallel factorization. The MUMPS algorithm is subdivided into three main steps. First, a symbolic analysis step performs re-ordering of the matrix coefficients to minimize the fill-in of the matrix during the subsequent factorization, and estimates the assembly tree of the matrix. Second, the factorization is performed with dynamic scheduling to accommodate numerical pivoting, and provides the LU factors distributed over all the processors. Third, the resolution is performed for multiple sources. To compute the gradient of the cost function, two simulations per shot are required (one to compute the forward wavefield and one to back-propagate the residuals). The multi-source resolutions can be performed in parallel with MUMPS. In the end, each processor stores in core a sub-domain of all the solutions. These distributed solutions can be exploited to compute the gradient of the cost function in parallel. Since the gradient of the cost function is a weighted stack of the shot and residual solutions of MUMPS, each processor computes the corresponding sub-domain of the gradient. The gradient is then centralized on the master processor using a collective communication. The gradient is scaled by the diagonal elements of the Hessian matrix. This scaling is computed only once per frequency, before the first iteration of the inversion. Estimation of the diagonal terms of the Hessian requires one simulation per non-redundant shot and receiver position. The same strategy as the one used for the gradient is used to compute the diagonal Hessian in parallel. This algorithm was applied to a dense wide-angle data set recorded by 100 OBSs in the eastern Nankai trough, offshore Japan. Thirteen frequencies ranging from 3 to 15 Hz were inverted. Twenty iterations per frequency were computed, leading to 260 tomographic velocity models of increasing resolution. The velocity model dimensions are 105 km x 25 km, corresponding to a finite-difference grid of 4201 x 1001 points with a 25-m grid interval. The number of shots was 1005 and the number of inverted OBS gathers was 93. The inversion requires 20 days on 6 32-bit bi-processor nodes with 4 Gbytes of RAM per node when only the LU factorization is performed in parallel. Preliminary estimates of the time required to perform the inversion with the fully parallelized code are 6 and 4 days using 20 and 50 processors, respectively.
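    The gradient assembly described above (a weighted stack of forward and back-propagated wavefields, computed per subdomain and then centralized) can be sketched as follows. This is an illustrative numpy toy, not the authors' Fortran/MPI code: the fake ranks, field arrays and the zero-lag cross-correlation imaging condition are stand-ins, and the "collective communication" is just a concatenation over non-overlapping subdomains.

```python
import numpy as np

# Hypothetical sketch: each "rank" holds a subdomain of the forward (u) and
# back-propagated residual (r) wavefields for every shot; the local gradient
# is the stack Re(conj(u) * r) over shots, and a gather assembles the full
# gradient on the master, mirroring an MPI collective.

def local_gradient(u_shots, r_shots, omega):
    """Stack shot/residual solutions on one subdomain at frequency omega."""
    g = np.zeros(u_shots.shape[1:])
    for u, r in zip(u_shots, r_shots):
        g += -omega**2 * np.real(np.conj(u) * r)  # zero-lag cross-correlation
    return g

rng = np.random.default_rng(0)
n_shots, sub = 4, 16                      # toy sizes, not the 4201 x 1001 grid
ranks = []                                # one (u, r) pair per fake rank
for _ in range(3):
    u = rng.standard_normal((n_shots, sub)) + 1j * rng.standard_normal((n_shots, sub))
    r = rng.standard_normal((n_shots, sub)) + 1j * rng.standard_normal((n_shots, sub))
    ranks.append((u, r))

# Each rank computes its sub-domain of the gradient; centralizing on the
# master is a concatenation here, since the subdomains do not overlap.
full_gradient = np.concatenate([local_gradient(u, r, omega=2 * np.pi * 5.0)
                                for u, r in ranks])
```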

  13. Biocellion: accelerating computer simulation of multicellular biological system models.

    PubMed

    Kang, Seunghwa; Kahan, Simon; McDermott, Jason; Flann, Nicholas; Shmulevich, Ilya

    2014-11-01

    Biological system behaviors are often the outcome of complex interactions among a large number of cells and their biotic and abiotic environment. Computational biologists attempt to understand, predict and manipulate biological system behavior through mathematical modeling and computer simulation. Discrete agent-based modeling (in combination with high-resolution grids to model the extracellular environment) is a popular approach for building biological system models. However, the computational complexity of this approach forces computational biologists to resort to coarser resolution approaches to simulate large biological systems. High-performance parallel computers have the potential to address the computing challenge, but writing efficient software for parallel computers is difficult and time-consuming. We have developed Biocellion, a high-performance software framework, to solve this computing challenge using parallel computers. To support a wide range of multicellular biological system models, Biocellion asks users to provide their model specifics by filling the function body of pre-defined model routines. Using Biocellion, modelers without parallel computing expertise can efficiently exploit parallel computers with less effort than writing sequential programs from scratch. We simulate cell sorting, microbial patterning and a bacterial system in soil aggregate as case studies. Biocellion runs on x86 compatible systems with the 64 bit Linux operating system and is freely available for academic use. Visit http://biocellion.com for additional information. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
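    The fill-in-the-routine-body style described above can be illustrated with a toy framework. The class and routine names below are hypothetical, not Biocellion's actual API; they only mirror the idea that the engine owns the simulation loop (and its parallelization) while the modeler supplies the model-specific callback bodies.

```python
# Illustrative sketch only: the framework defines the loop and asks the
# modeler to fill in pre-defined routines (names are invented, not Biocellion's).

class AgentModel:
    def init_agents(self):            # called once at start-up
        raise NotImplementedError
    def update_agent(self, state):    # called for every cell each step
        raise NotImplementedError

class Framework:
    """Stands in for the engine that would run agents in parallel."""
    def run(self, model, steps):
        agents = model.init_agents()
        for _ in range(steps):
            # a real engine would partition this loop across processors
            agents = [model.update_agent(a) for a in agents]
        return agents

class GrowthModel(AgentModel):
    def init_agents(self):
        return [1.0] * 8              # eight cells of unit size
    def update_agent(self, size):
        return size * 1.1             # each cell grows 10% per step

final = Framework().run(GrowthModel(), steps=3)
```

The point of the pattern is that the per-agent loop body is the only thing the modeler writes, so the engine is free to distribute it without the modeler knowing any parallel programming.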

  14. Parallel interconnect for a novel system approach to short distance high information transfer data links

    NASA Astrophysics Data System (ADS)

    Raskin, Glenn; Lebby, Michael S.; Carney, F.; Kazakia, M.; Schwartz, Daniel B.; Gaw, Craig A.

    1997-04-01

    The OPTOBUS™ family of products provides high performance parallel interconnection utilizing optical links in a 10-bit wide bi-directional configuration. The link is architected to be 'transparent' in that it is totally asynchronous and dc coupled, so that it can be treated as a perfect cable with extremely low skew and no losses. An optical link consists of two identical transceiver modules and a pair of connectorized 62.5 micrometer multimode fiber ribbon cables. The OPTOBUS™ I link provides bi-directional functionality at 4 Gbps (400 Mbps per channel), while the OPTOBUS™ II link will offer the same capability at 8 Gbps (800 Mbps per channel). The transparent structure of the OPTOBUS™ links allows for an arbitrary data stream regardless of its structure. Both the OPTOBUS™ I and OPTOBUS™ II transceiver modules are packaged as partially populated 14 by 14 pin grid arrays (PGA) with optical receptacles on one side of the module. The modules themselves are composed of several elements, including passives, integrated circuits, optoelectronic devices and optical interface units (OIUs), which consist of polymer waveguides and a specially designed lead frame. The initial offering of the modules' electrical interface utilizes differential CML. The CML line driver sinks 5 mA of current into one of two pins. When terminated with 50 ohm pull-up resistors tied to a voltage between VCC and VCC-2, the result is a differential swing of plus or minus 250 mV, capable of driving standard PECL I/Os. Future offerings of the OPTOBUS™ links will incorporate LVDS and PECL interfaces as well as CML. The integrated circuits are silicon based. For OPTOBUS™ I links, a 1.5 micrometer drawn-emitter NPN bipolar process is used for the receiver and an enhanced 0.8 micrometer CMOS process for the laser driver. For OPTOBUS™ II links, a 0.8 micrometer drawn-emitter NPN bipolar process is used for the receiver and the driver IC utilizes 0.8 micrometer BiCMOS technology. The OPTOBUS™ architecture uses AlGaAs vertical cavity surface emitting lasers (VCSELs) at 850 nm in conjunction with unique opto-electronic packaging concepts. Most laser based transmitter subsystems are incapable of carrying an arbitrary NRZ data stream at high data rates. The receiver subsystem utilizes a conventional GaAs PIN photo-detector. In parallel interconnect systems, the design must take into account the simultaneous switching noise from neighboring channels. If not well controlled, the high density of the multiple interconnects can limit the sensitivity and therefore the performance of the system. The packaging approach of the VCSEL and PIN arrays allows for high bandwidths and provides the coupling mechanisms necessary to interface to the 62.5 micrometer multimode fiber. To support extremely high-speed electrical signals, the OPTOBUS™ package utilizes a multilayer tape automated bonded (TAB) lead frame. The lead frame contains separate signal and ground layers. The ground layer successfully provides a pseudo-coaxial environment (low inductance and effective signal coupling to the ground plane).

  15. The Goddard Space Flight Center Program to develop parallel image processing systems

    NASA Technical Reports Server (NTRS)

    Schaefer, D. H.

    1972-01-01

    Parallel image processing which is defined as image processing where all points of an image are operated upon simultaneously is discussed. Coherent optical, noncoherent optical, and electronic methods are considered parallel image processing techniques.

  16. Design Method of Digital Optimal Control Scheme and Multiple Paralleled Bridge Type Current Amplifier for Generating Gradient Magnetic Fields in MRI Systems

    NASA Astrophysics Data System (ADS)

    Watanabe, Shuji; Takano, Hiroshi; Fukuda, Hiroya; Hiraki, Eiji; Nakaoka, Mutsuo

    This paper deals with a digital control scheme for a multiple-paralleled high-frequency switching current amplifier with four-quadrant choppers for generating gradient magnetic fields in MRI (Magnetic Resonance Imaging) systems. In order to track highly precise current patterns in the Gradient Coils (GC), the proposed current amplifier cancels the switching current ripples in the GC against each other, and optimum switching gate pulse patterns are designed without influence from the large filter current ripple amplitude. The optimal control implementation and linear control theory for GC current amplifiers have a natural affinity for each other and yield excellent characteristics. The digital control system can be realized easily through digital implementation on DSPs or microprocessors. Microprocessors operating in multiple-parallel fashion realize a two-fold or higher paralleled GC current pattern tracking amplifier with an optimal control design, and excellent results are given for improving the image quality of MRI systems.
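    The ripple-cancellation idea referred to in the abstract above can be illustrated numerically: two choppers switched with carriers half a switching period apart produce equal and opposite triangular ripple currents, which cancel in the shared coil. This is a generic sketch of interleaving, assuming ideal symmetric triangular ripple; it is not the paper's actual control design.

```python
import numpy as np

# Illustrative only: two parallel four-quadrant choppers with carriers
# shifted by 180 degrees; their triangular ripples are antisymmetric under a
# half-period shift, so the combined coil ripple cancels.

def triangle(x):
    """Unit-amplitude triangle wave with period 1."""
    return 2.0 * np.abs(2.0 * np.mod(x, 1.0) - 1.0) - 1.0

t = np.linspace(0.0, 4.0, 2001)                 # four switching periods
ripple_a = triangle(t)                          # chopper A ripple
ripple_b = triangle(t + 0.5)                    # chopper B, carrier shifted 180 deg

single = np.max(np.abs(ripple_a))               # ripple seen with one chopper
combined = np.max(np.abs(ripple_a + ripple_b))  # ripple of the interleaved pair
```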

  17. Dynamic file-access characteristics of a production parallel scientific workload

    NASA Technical Reports Server (NTRS)

    Kotz, David; Nieuwejaar, Nils

    1994-01-01

    Multiprocessors have permitted astounding increases in computational performance, but many cannot meet the intense I/O requirements of some scientific applications. An important component of any solution to this I/O bottleneck is a parallel file system that can provide high-bandwidth access to tremendous amounts of data in parallel to hundreds or thousands of processors. Most successful systems are based on a solid understanding of the expected workload, but thus far there have been no comprehensive workload characterizations of multiprocessor file systems. This paper presents the results of a three-week tracing study in which all file-related activity on a massively parallel computer was recorded. Our instrumentation differs from previous efforts in that it collects information about every I/O request and about the mix of jobs running in a production environment. We also present the results of a trace-driven caching simulation and recommendations for designers of multiprocessor file systems.

  18. Storing files in a parallel computing system based on user-specified parser function

    DOEpatents

    Faibish, Sorin; Bent, John M; Tzelnic, Percy; Grider, Gary; Manzanares, Adam; Torres, Aaron

    2014-10-21

    Techniques are provided for storing files in a parallel computing system based on a user-specified parser function. A plurality of files generated by a distributed application in a parallel computing system are stored by obtaining a parser from the distributed application for processing the plurality of files prior to storage; and storing one or more of the plurality of files in one or more storage nodes of the parallel computing system based on the processing by the parser. The plurality of files comprise one or more of a plurality of complete files and a plurality of sub-files. The parser can optionally store only those files that satisfy one or more semantic requirements of the parser. The parser can also extract metadata from one or more of the files and the extracted metadata can be stored with one or more of the plurality of files and used for searching for files.
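    The technique described in the abstract above can be sketched in a few lines: the application supplies a parser that both filters files against a semantic requirement and extracts metadata stored alongside them. All names and the placement policy below are illustrative, not from the patent.

```python
# Hypothetical sketch of user-specified-parser storage: the application's
# parser decides which files to keep and what metadata to attach.

def app_parser(name, data):
    """Keep only non-empty files and pull out a searchable header field."""
    if not data:                       # semantic requirement: skip empty files
        return None
    return {"header": data.split("\n", 1)[0]}

def store_files(files, parser, nodes):
    """Route each accepted file (plus parser metadata) to a storage node."""
    stored = {}
    for i, (name, data) in enumerate(files.items()):
        meta = parser(name, data)
        if meta is None:
            continue                   # rejected by the parser
        node = nodes[i % len(nodes)]   # trivial round-robin placement
        stored.setdefault(node, []).append((name, data, meta))
    return stored

files = {"a.txt": "alpha\nbody", "b.txt": "", "c.txt": "gamma\nbody"}
placed = store_files(files, app_parser, nodes=["node0", "node1"])
```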

  19. Methods and apparatus for capture and storage of semantic information with sub-files in a parallel computing system

    DOEpatents

    Faibish, Sorin; Bent, John M; Tzelnic, Percy; Grider, Gary; Torres, Aaron

    2015-02-03

    Techniques are provided for storing files in a parallel computing system using sub-files with semantically meaningful boundaries. A method is provided for storing at least one file generated by a distributed application in a parallel computing system. The file comprises one or more of a complete file and a plurality of sub-files. The method comprises the steps of obtaining a user specification of semantic information related to the file; providing the semantic information as a data structure description to a data formatting library write function; and storing the semantic information related to the file with one or more of the sub-files in one or more storage nodes of the parallel computing system. The semantic information provides a description of data in the file. The sub-files can be replicated based on semantically meaningful boundaries.

  20. Methods and apparatus for multi-resolution replication of files in a parallel computing system using semantic information

    DOEpatents

    Faibish, Sorin; Bent, John M.; Tzelnic, Percy; Grider, Gary; Torres, Aaron

    2015-10-20

    Techniques are provided for storing files in a parallel computing system using different resolutions. A method is provided for storing at least one file generated by a distributed application in a parallel computing system. The file comprises one or more of a complete file and a sub-file. The method comprises the steps of obtaining semantic information related to the file; generating a plurality of replicas of the file with different resolutions based on the semantic information; and storing the file and the plurality of replicas of the file in one or more storage nodes of the parallel computing system. The different resolutions comprise, for example, a variable number of bits and/or a different sub-set of data elements from the file. A plurality of the sub-files can be merged to reproduce the file.
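    The two kinds of resolution the abstract mentions (a variable number of bits and a different sub-set of data elements) are easy to make concrete. The sketch below is illustrative only, assuming a numeric array as the file's contents; the function name is invented.

```python
import numpy as np

# Illustrative sketch: generate lower-resolution replicas of a dataset, one
# with fewer bits per element and one keeping a sub-set of the elements.

def make_replicas(data):
    full = np.asarray(data, dtype=np.float64)
    return {
        "full": full,                           # original resolution
        "half_bits": full.astype(np.float32),   # fewer bits per element
        "subsampled": full[::4],                # every 4th data element
    }

replicas = make_replicas(np.linspace(0.0, 1.0, 64))
```

Each replica could then be placed on a different storage node, with the cheap replicas serving approximate queries and the full-resolution copy reserved for exact reproduction.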

  1. Effect of parallel refraction on magnetospheric upper hybrid waves

    NASA Technical Reports Server (NTRS)

    Engel, J.; Kennel, C. F.

    1984-01-01

    Large amplitude (not less than 10 mV/m) electrostatic plasma waves near the upper hybrid (UH) frequency have been observed from 0 to 50 deg magnetic latitude (MLAT) during satellite plasma-pause crossings. A three-dimensional numerical ray-tracing calculation, based on an electron distribution measured during a GEOS 1 dayside intense upper-hybrid wave event, suggests how UH waves might achieve such large amplitudes away from the geomagnetic equator. Refractive effects largely control the wave amplification and, in particular, the unavoidable refraction due to parallel geomagnetic field gradients restricts growth to levels below those observed. However, a cold electron density gradient parallel to the field can lead to upper hybrid wave growth that can account for the observed emission levels.

  2. AZTEC: A parallel iterative package for solving linear systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hutchinson, S.A.; Shadid, J.N.; Tuminaro, R.S.

    1996-12-31

    We describe a parallel linear system package, AZTEC. The package incorporates a number of parallel iterative methods (e.g. GMRES, biCGSTAB, CGS, TFQMR) and preconditioners (e.g. Jacobi, Gauss-Seidel, polynomial, domain decomposition with LU or ILU within subdomains). Additionally, AZTEC allows for the reuse of previous preconditioning factorizations within Newton schemes for nonlinear methods. Currently, a number of different users are using this package to solve a variety of PDE applications.
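    As a concrete illustration of the solver/preconditioner pairing AZTEC provides, here is a minimal Jacobi-preconditioned Krylov solve in numpy, using conjugate gradient on a symmetric system for brevity. This is a sketch of the idea, not AZTEC code: AZTEC's own methods (GMRES, BiCGSTAB, CGS, TFQMR) handle unsymmetric systems and run distributed.

```python
import numpy as np

# Minimal Jacobi-preconditioned conjugate gradient, in the spirit of pairing
# a Krylov method with a cheap preconditioner as AZTEC does.

def jacobi_pcg(A, b, tol=1e-10, maxiter=200):
    Minv = 1.0 / np.diag(A)              # Jacobi preconditioner: diag(A)^-1
    x = np.zeros_like(b)
    r = b - A @ x
    z = Minv * r
    p = z.copy()
    for _ in range(maxiter):
        Ap = A @ p
        alpha = (r @ z) / (p @ Ap)
        x += alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) < tol:
            return x
        z_new = Minv * r_new
        beta = (r_new @ z_new) / (r @ z)
        p = z_new + beta * p
        r, z = r_new, z_new
    return x

n = 50
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # SPD tridiagonal test matrix
x = jacobi_pcg(A, np.ones(n))
residual = np.linalg.norm(np.ones(n) - A @ x)
```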

  3. Parallel Electrochemical Treatment System and Application for Identifying Acid-Stable Oxygen Evolution Electrocatalysts

    DOE PAGES

    Jones, Ryan J. R.; Shinde, Aniketa; Guevarra, Dan; ...

    2015-01-05

    Many energy technologies require electrochemical stability or preactivation of functional materials. Due to the long experiment duration required for either electrochemical preactivation or evaluation of operational stability, parallel screening is required to enable high throughput experimentation. We found that imposing operational electrochemical conditions on a library of materials in parallel creates several opportunities for experimental artifacts. We discuss the electrochemical engineering principles and operational parameters that mitigate artifacts in the parallel electrochemical treatment system. We also demonstrate the effects of resistive losses within the planar working electrode through a combination of finite element modeling and illustrative experiments. Operation of the parallel-plate, membrane-separated electrochemical treatment system is demonstrated by exposing a composition library of mixed metal oxides to oxygen evolution conditions in 1 M sulfuric acid for 2 h. This application is particularly important because the electrolysis and photoelectrolysis of water are promising future energy technologies inhibited by the lack of highly active, acid-stable catalysts containing only earth-abundant elements.

  4. Compact holographic optical neural network system for real-time pattern recognition

    NASA Astrophysics Data System (ADS)

    Lu, Taiwei; Mintzer, David T.; Kostrzewski, Andrew A.; Lin, Freddie S.

    1996-08-01

    One of the important characteristics of artificial neural networks is their capability for massive interconnection and parallel processing. Recently, specialized electronic neural network processors and VLSI neural chips have been introduced in the commercial market. The number of parallel channels they can handle is limited because of the limited parallel interconnections that can be implemented with 1D electronic wires. High-resolution pattern recognition problems can require a large number of neurons for parallel processing of an image. This paper describes a holographic optical neural network (HONN) that is based on high-resolution volume holographic materials and is capable of performing massive 3D parallel interconnection of tens of thousands of neurons. A HONN with more than 16,000 neurons packaged in an attaché case has been developed. Rotation-, shift-, and scale-invariant pattern recognition operations have been demonstrated with this system. System parameters such as the signal-to-noise ratio, dynamic range, and processing speed are discussed.

  5. Massively parallel algorithms for real-time wavefront control of a dense adaptive optics system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fijany, A.; Milman, M.; Redding, D.

    1994-12-31

    In this paper, massively parallel algorithms and architectures for real-time wavefront control of a dense adaptive optics system (SELENE) are presented. The authors have already shown that the computation of a near-optimal control algorithm for SELENE can be reduced to the solution of a discrete Poisson equation on a regular domain. Although this represents an optimal computation, due to the large size of the system and the high sampling rate requirement, the implementation of this control algorithm poses a computationally challenging problem, since it demands a sustained computational throughput on the order of 10 GFlops. They develop a novel algorithm, designated the Fast Invariant Imbedding algorithm, which offers a massive degree of parallelism with simple communication and synchronization requirements. Due to these features, this algorithm is significantly more efficient than other fast Poisson solvers for implementation on massively parallel architectures. The authors also discuss two massively parallel, algorithmically specialized architectures for low-cost and optimal implementation of the Fast Invariant Imbedding algorithm.
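    The reduction to a discrete Poisson equation explains why fast solvers matter here. The sketch below solves the 1D discrete Poisson system by sine-transform diagonalization; it illustrates the mathematical structure only (the transform is written as a dense matrix for clarity, where an FFT-based DST would make it O(n log n)) and is not the paper's Fast Invariant Imbedding algorithm.

```python
import numpy as np

# The 1D discrete Poisson matrix T = tridiag(-1, 2, -1) is diagonalized by
# the sine transform, so a solve is two transforms plus a diagonal scaling.

n = 63
k = np.arange(1, n + 1)
S = np.sin(np.pi * np.outer(k, k) / (n + 1))       # sine-transform matrix
eig = 2.0 - 2.0 * np.cos(np.pi * k / (n + 1))      # eigenvalues of T

def poisson_solve(f):
    """Solve T u = f using T = S diag(eig) S^-1 with S^-1 = (2/(n+1)) S."""
    f_hat = S @ f * (2.0 / (n + 1))                # forward sine transform
    return S @ (f_hat / eig)                       # scale and transform back

f = np.random.default_rng(1).standard_normal(n)
u = poisson_solve(f)

# Check against the explicit tridiagonal system
T = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
err = np.linalg.norm(T @ u - f)
```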

  6. Parallelized reliability estimation of reconfigurable computer networks

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Das, Subhendu; Palumbo, Dan

    1990-01-01

    A parallelized system, ASSURE, for computing the reliability of embedded avionics flight control systems which are able to reconfigure themselves in the event of failure is described. ASSURE accepts a grammar that describes a reliability semi-Markov state-space. From this it creates a parallel program that simultaneously generates and analyzes the state-space, placing upper and lower bounds on the probability of system failure. ASSURE is implemented on a 32-node Intel iPSC/860, and has achieved high processor efficiencies on real problems. Through a combination of improved algorithms, exploitation of parallelism, and use of an advanced microprocessor architecture, ASSURE has reduced the execution time on substantial problems by a factor of one thousand over previous workstation implementations. Furthermore, ASSURE's parallel execution rate on the iPSC/860 is an order of magnitude faster than its serial execution rate on a Cray-2 supercomputer. While dynamic load balancing is necessary for ASSURE's good performance, it is needed only infrequently; the particular method of load balancing used does not substantially affect performance.

  7. Advanced propulsion system for hybrid vehicles

    NASA Technical Reports Server (NTRS)

    Norrup, L. V.; Lintz, A. T.

    1980-01-01

    A number of hybrid propulsion systems were evaluated for application in several different vehicle sizes. A conceptual design was prepared for the most promising configuration. Various system configurations were parametrically evaluated and compared, design tradeoffs performed, and a conceptual design produced. Fifteen vehicle/propulsion systems concepts were parametrically evaluated to select two systems and one vehicle for detailed design tradeoff studies. A single hybrid propulsion system concept and vehicle (five-passenger family sedan) were selected for optimization based on the results of the tradeoff studies. The final propulsion system consists of a 65 kW spark-ignition heat engine, a mechanical continuously variable traction transmission, a 20 kW permanent magnet axial-gap traction motor, a variable frequency inverter, a 386 kg lead-acid improved state-of-the-art battery, and a transaxle. The system was configured with a parallel power path between the heat engine and battery. It has two automatic operational modes: electric mode and heat engine mode. Power is always shared between the heat engine and battery during acceleration periods. In both modes, regenerative braking energy is absorbed by the battery.

  8. Parallel Computation of the Regional Ocean Modeling System (ROMS)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, P; Song, Y T; Chao, Y

    2005-04-05

    The Regional Ocean Modeling System (ROMS) is a regional ocean general circulation modeling system solving the free-surface, hydrostatic, primitive equations over varying topography. It is free software distributed world-wide for studying both complex coastal ocean problems and the basin-to-global scale ocean circulation. The original ROMS code could only be run on shared-memory systems. With the increasing need to simulate larger model domains with finer resolutions and on a variety of computer platforms, there is a need in the ocean-modeling community for a ROMS code that can be run on any parallel computer ranging from 10 to hundreds of processors. Recently, we have explored parallelization for ROMS using the MPI programming model. In this paper, an efficient parallelization strategy for such a large-scale scientific software package, based on an existing shared-memory computing model, is presented. In addition, scientific applications and data-performance issues on a couple of SGI systems, including Columbia, the world's third-fastest supercomputer, are discussed.

  9. Parallel and fault-tolerant algorithms for hypercube multiprocessors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aykanat, C.

    1988-01-01

    Several techniques for increasing the performance of parallel algorithms on distributed-memory message-passing multiprocessor systems are investigated. These techniques are effectively implemented for the parallelization of the Scaled Conjugate Gradient (SCG) algorithm on a hypercube-connected message-passing multiprocessor. Significant performance improvement is achieved by using these techniques. The SCG algorithm is used for the solution phase of an FE modeling system. Almost linear speed-up is achieved, and it is shown that the hypercube topology is scalable for an FE class of problems. The SCG algorithm is also shown to be suitable for vectorization, and near-supercomputer performance is achieved on a vector hypercube multiprocessor by exploiting both parallelization and vectorization. Fault-tolerance issues for the parallel SCG algorithm and for the hypercube topology are also addressed.

  10. Using OpenMP vs. Threading Building Blocks for Medical Imaging on Multi-cores

    NASA Astrophysics Data System (ADS)

    Kegel, Philipp; Schellmann, Maraike; Gorlatch, Sergei

    We compare two parallel programming approaches for multi-core systems: the well-known OpenMP and the recently introduced Threading Building Blocks (TBB) library by Intel®. The comparison is made using the parallelization of a real-world numerical algorithm for medical imaging. We develop several parallel implementations, and compare them w.r.t. programming effort, programming style and abstraction, and runtime performance. We show that TBB requires a considerable program re-design, whereas with OpenMP simple compiler directives are sufficient. While TBB appears to be less appropriate for parallelizing existing implementations, it fosters a good programming style and higher abstraction level for newly developed parallel programs. Our experimental measurements on a dual quad-core system demonstrate that OpenMP slightly outperforms TBB in our implementation.

  11. Cloud object store for checkpoints of high performance computing applications using decoupling middleware

    DOEpatents

    Bent, John M.; Faibish, Sorin; Grider, Gary

    2016-04-19

    Cloud object storage is enabled for checkpoints of high performance computing applications using a middleware process. A plurality of files, such as checkpoint files, generated by a plurality of processes in a parallel computing system are stored by obtaining said plurality of files from said parallel computing system; converting said plurality of files to objects using a log structured file system middleware process; and providing said objects for storage in a cloud object storage system. The plurality of processes may run, for example, on a plurality of compute nodes. The log structured file system middleware process may be embodied, for example, as a Parallel Log-Structured File System (PLFS). The log structured file system middleware process optionally executes on a burst buffer node.
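    The decoupling idea in the patent abstract above can be sketched compactly: per-process checkpoint files are converted by a middleware step into keyed objects for a cloud object store. The key layout, function names and the in-memory store below are illustrative, not PLFS's actual format.

```python
# Hypothetical sketch of checkpoint-to-object conversion middleware.

def files_to_objects(job_id, checkpoint_files):
    """Map {rank: bytes} checkpoint data to object-store key/value pairs."""
    objects = {}
    for rank, data in checkpoint_files.items():
        key = f"{job_id}/ckpt/rank{rank:04d}"   # one object per process file
        objects[key] = data
    return objects

class FakeObjectStore:
    """Stands in for a cloud object store's put/get interface."""
    def __init__(self):
        self._bucket = {}
    def put(self, key, value):
        self._bucket[key] = value
    def get(self, key):
        return self._bucket[key]

store = FakeObjectStore()
ckpts = {0: b"state-of-rank-0", 1: b"state-of-rank-1"}
for key, value in files_to_objects("job42", ckpts).items():
    store.put(key, value)
```

The decoupling means the compute nodes only ever see a file interface; the conversion to objects can happen later, for example on a burst buffer node as the abstract notes.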

  12. Goertler vortices in growing boundary layers: The leading edge receptivity problem, linear growth and the nonlinear breakdown stage

    NASA Technical Reports Server (NTRS)

    Hall, Philip

    1989-01-01

    Goertler vortices are thought to be the cause of transition in many fluid flows of practical importance. A review of the different stages of vortex growth is given. In the linear regime, nonparallel effects completely govern this growth, and parallel flow theories do not capture the essential features of the development of the vortices. A detailed comparison between the parallel and nonparallel theories is given and it is shown that at small vortex wavelengths, the parallel flow theories have some validity; otherwise nonparallel effects are dominant. New results for the receptivity problem for Goertler vortices are given; in particular, vortices induced by free stream perturbations impinging on the leading edge of the walls are considered. It is found that the most dangerous mode of this type can be isolated and its neutral curve is determined. This curve agrees very closely with the available experimental data. A discussion of the different regimes of growth of nonlinear vortices is also given. Again it is shown that, unless the vortex wavelength is small, nonparallel effects are dominant. Some new results for nonlinear vortices of O(1) wavelengths are given and compared to experimental observations.

  13. Low-Speed Investigation of Upper-Surface Leading-Edge Blowing on a High-Speed Civil Transport Configuration

    NASA Technical Reports Server (NTRS)

    Banks, Daniel W.; Laflin, Brenda E. Gile; Kemmerly, Guy T.; Campbell, Bryan A.

    1999-01-01

    The paper identifies speed, agility, human interface, generation of sensitivity information, task decomposition, and data transmission (including storage) as important attributes for a computer environment to have in order to support engineering design effectively. It is argued that when examined in terms of these attributes the presently available environment can be shown to be inadequate. A radical improvement is needed, and it may be achieved by combining new methods that have recently emerged from multidisciplinary design optimisation (MDO) with massively parallel processing computer technology. The caveat is that, for successful use of that technology in engineering computing, new paradigms for computing will have to be developed - specifically, innovative algorithms that are intrinsically parallel so that their performance scales up linearly with the number of processors. It may be speculated that the idea of simulating a complex behaviour by interaction of a large number of very simple models may be an inspiration for the above algorithms; the cellular automata are an example. Because of the long lead time needed to develop and mature new paradigms, development should begin now, even though the widespread availability of massively parallel processing is still a few years away.

  14. GPU-completeness: theory and implications

    NASA Astrophysics Data System (ADS)

    Lin, I.-Jong

    2011-01-01

    This paper formalizes a major insight into a class of algorithms that relate parallelism and performance. The purpose of this paper is to define a class of algorithms that trades off parallelism for quality of result (e.g. visual quality, compression rate), and we propose a similar method for algorithmic classification based on NP-Completeness techniques, applied toward parallel acceleration. We will define this class of algorithm as "GPU-Complete" and will postulate the necessary properties of the algorithms for admission into this class. We will also formally relate his algorithmic space and imaging algorithms space. This concept is based upon our experience in the print production area where GPUs (Graphic Processing Units) have shown a substantial cost/performance advantage within the context of HPdelivered enterprise services and commercial printing infrastructure. While CPUs and GPUs are converging in their underlying hardware and functional blocks, their system behaviors are clearly distinct in many ways: memory system design, programming paradigms, and massively parallel SIMD architecture. There are applications that are clearly suited to each architecture: for CPU: language compilation, word processing, operating systems, and other applications that are highly sequential in nature; for GPU: video rendering, particle simulation, pixel color conversion, and other problems clearly amenable to massive parallelization. While GPUs establishing themselves as a second, distinct computing architecture from CPUs, their end-to-end system cost/performance advantage in certain parts of computation inform the structure of algorithms and their efficient parallel implementations. While GPUs are merely one type of architecture for parallelization, we show that their introduction into the design space of printing systems demonstrate the trade-offs against competing multi-core, FPGA, and ASIC architectures. 
While each architecture has its own optimal application, we believe that the selection of architecture can be defined in terms of properties of GPU-Completeness. For a well-defined subset of algorithms, GPU-Completeness is intended to connect parallelism, algorithms, and efficient architectures into a unified framework, showing that multiple layers of parallel implementation are guided by the same underlying trade-off.

  15. Parallel Markov chain Monte Carlo - bridging the gap to high-performance Bayesian computation in animal breeding and genetics.

    PubMed

    Wu, Xiao-Lin; Sun, Chuanyu; Beissinger, Timothy M; Rosa, Guilherme Jm; Weigel, Kent A; Gatti, Natalia de Leon; Gianola, Daniel

    2012-09-25

    Most Bayesian models for the analysis of complex traits are not analytically tractable and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which use whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that can arise from serial computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Parallel Markov chain Monte Carlo algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point, including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains; some variants are discussed as well. Features and strategies of parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models, which not only leads to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that the use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs.
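
    The "multiple chains" strategy described above can be sketched in a few lines. Everything below (the Metropolis sampler, the standard-normal target, seeds, and chain lengths) is an illustrative assumption, not the paper's method, and a thread pool stands in for the MPI processes a real system would use:

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def run_chain(seed, n_iter=20000, step=1.0):
    """One Metropolis chain targeting a standard normal posterior (toy example)."""
    rng = random.Random(seed)
    x, samples = 0.0, []
    for _ in range(n_iter):
        prop = x + rng.gauss(0.0, step)
        # Accept with probability min(1, pi(prop)/pi(x)) for pi = N(0, 1).
        if rng.random() < math.exp(min(0.0, 0.5 * (x * x - prop * prop))):
            x = prop
        samples.append(x)
    return samples[n_iter // 2:]  # discard the first half as burn-in

# Multiple-chains parallel MCMC: each worker runs an independent chain;
# real deployments would use processes or MPI ranks instead of threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    chains = list(pool.map(run_chain, [1, 2, 3, 4]))

# Pool the post-burn-in draws from all chains for inference.
pooled = [x for chain in chains for x in chain]
mean = sum(pooled) / len(pooled)
var = sum((x - mean) ** 2 for x in pooled) / len(pooled)
```

    The within-chain approach would instead split each iteration's work (e.g. the likelihood evaluation over records) across workers, keeping a single chain.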

  16. Parallel Markov chain Monte Carlo - bridging the gap to high-performance Bayesian computation in animal breeding and genetics

    PubMed Central

    2012-01-01

    Background Most Bayesian models for the analysis of complex traits are not analytically tractable and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which use whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that can arise from serial computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Results Parallel Markov chain Monte Carlo algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point, including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains; some variants are discussed as well. Features and strategies of parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Conclusions Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models, which not only leads to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that the use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs. PMID:23009363

  17. Automatic Multilevel Parallelization Using OpenMP

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this paper we describe the extension of the CAPO parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report first results for several benchmark codes and one full application that have been parallelized using our system.

  18. Fast I/O for Massively Parallel Applications

    NASA Technical Reports Server (NTRS)

    OKeefe, Matthew T.

    1996-01-01

    The two primary goals of this report were the design, construction and modeling of parallel disk arrays for scientific visualization and animation, and a study of the I/O requirements of highly parallel applications. In addition, further work was performed on the parallel display systems required to project and animate the very high-resolution frames resulting from our supercomputing simulations in ocean circulation and compressible gas dynamics.

  19. Performance Analysis and Optimization on the UCLA Parallel Atmospheric General Circulation Model Code

    NASA Technical Reports Server (NTRS)

    Lou, John; Ferraro, Robert; Farrara, John; Mechoso, Carlos

    1996-01-01

    An analysis is presented of several factors influencing the performance of a parallel implementation of the UCLA atmospheric general circulation model (AGCM) on massively parallel computer systems. Several modifications to the original parallel AGCM code aimed at improving its numerical efficiency, interprocessor communication cost, load balance, and single-node code performance are discussed.

  20. Design of an Input-Parallel Output-Parallel LLC Resonant DC-DC Converter System for DC Microgrids

    NASA Astrophysics Data System (ADS)

    Juan, Y. L.; Chen, T. R.; Chang, H. M.; Wei, S. E.

    2017-11-01

    Compared with a centralized power system, a distributed modular power system is composed of several power modules with lower individual power capacity that together provide sufficient total capacity for the load demand. The current stress on the power components in each module can then be reduced, and the flexibility of system setup is also enhanced. However, the parallel-connected power modules in a conventional system are usually controlled to share the power flow equally, which results in lower efficiency under light-load conditions. In this study, a modular power conversion system for a DC microgrid is developed with 48 V dc low-voltage input and 380 V dc high-voltage output. In the developed control strategy, the number of power modules enabled to share the power flow is decided according to the output power under light-load demand. Finally, three 350 W power modules are constructed and parallel-connected to set up a modular power conversion system. The experimental results show that, compared with the conventional system, the efficiency of the developed power system under light-load conditions is greatly improved. The modular design of the power system also decreases the ratio of power loss to system capacity.
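
    The load-dependent module-enabling idea can be illustrated with a toy control rule. The 90% loading margin, the equal sharing among enabled modules, and the clamping behavior below are assumptions for illustration; the abstract does not give the paper's actual decision logic:

```python
import math

def modules_to_enable(load_w, module_rating_w=350.0, n_modules=3, margin=0.9):
    """Hypothetical phase-shedding rule: enable only as many modules as the
    load requires, keeping each enabled module below `margin` of its rating."""
    usable = module_rating_w * margin
    needed = max(1, math.ceil(load_w / usable))
    return min(needed, n_modules)  # clamp to the installed module count

def per_module_power(load_w, **kw):
    """Enabled modules share the load equally (input-parallel output-parallel)."""
    n = modules_to_enable(load_w, **kw)
    return n, load_w / n
```

    At 100 W only one 350 W module runs near a more efficient operating point, instead of three modules each at about 33 W.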

  1. The role of bed-parallel slip in the development of complex normal fault zones

    NASA Astrophysics Data System (ADS)

    Delogkos, Efstratios; Childs, Conrad; Manzocchi, Tom; Walsh, John J.; Pavlides, Spyros

    2017-04-01

    Normal faults exposed in Kardia lignite mine, Ptolemais Basin, NW Greece formed at the same time as bed-parallel slip-surfaces, so that while the normal faults grew they were intermittently offset by bed-parallel slip. Following offset by a bed-parallel slip-surface, further fault growth is accommodated by reactivation on one or both of the offset fault segments. Where one fault is reactivated the site of bed-parallel slip is a bypassed asperity. Where both faults are reactivated, they propagate past each other to form a volume between overlapping fault segments that displays many of the characteristics of relay zones, including elevated strains and transfer of displacement between segments. Unlike conventional relay zones, however, these structures contain either a repeated or a missing section of stratigraphy which has a thickness equal to the throw of the fault at the time of the bed-parallel slip event, and the displacement profiles along the relay-bounding fault segments have discrete steps at their intersections with bed-parallel slip-surfaces. With further increase in displacement, the overlapping fault segments connect to form a fault-bound lens. Conventional relay zones form during initial fault propagation, but with coeval bed-parallel slip, relay-like structures can form later in the growth of a fault. Geometrical restoration of cross-sections through selected faults shows that repeated bed-parallel slip events during fault growth can lead to complex internal fault zone structure that masks its origin. Bed-parallel slip, in this case, is attributed to flexural-slip arising from hanging-wall rollover associated with a basin-bounding fault outside the study area.

  2. 100 Gbps Wireless System and Circuit Design Using Parallel Spread-Spectrum Sequencing

    NASA Astrophysics Data System (ADS)

    Scheytt, J. Christoph; Javed, Abdul Rehman; Bammidi, Eswara Rao; KrishneGowda, Karthik; Kallfass, Ingmar; Kraemer, Rolf

    2017-09-01

    In this article, mixed analog/digital signal processing techniques based on parallel spread-spectrum sequencing (PSSS) and radio frequency (RF) carrier synchronization for ultra-broadband wireless communication are investigated at the system and circuit levels.

  3. A SPECT Scanner for Rodent Imaging Based on Small-Area Gamma Cameras

    NASA Astrophysics Data System (ADS)

    Lage, Eduardo; Villena, José L.; Tapias, Gustavo; Martinez, Naira P.; Soto-Montenegro, Maria L.; Abella, Mónica; Sisniega, Alejandro; Pino, Francisco; Ros, Domènec; Pavia, Javier; Desco, Manuel; Vaquero, Juan J.

    2010-10-01

    We developed a cost-effective SPECT scanner prototype (rSPECT) for in vivo imaging of rodents based on small-area gamma cameras. Each detector consists of a position-sensitive photomultiplier tube (PS-PMT) coupled to a 30 x 30 NaI(Tl) scintillator array and electronics attached to the PS-PMT sockets for adapting the detector signals to an in-house developed data acquisition system. The detector components are enclosed in a lead-shielded case with a receptacle to insert the collimators. System performance was assessed using 99mTc for a high-resolution parallel-hole collimator, and for a 0.75-mm pinhole collimator with a 60° aperture angle and a 42-mm collimator length. The energy resolution is about 10.7% of the photopeak energy. The overall system sensitivity is about 3 cps/μCi/detector and planar spatial resolution ranges from 2.4 mm at 1 cm source-to-collimator distance to 4.1 mm at 4.5 cm with parallel-hole collimators. With pinhole collimators planar spatial resolution ranges from 1.2 mm at 1 cm source-to-collimator distance to 2.4 mm at 4.5 cm; sensitivity at these distances ranges from 2.8 to 0.5 cps/μCi/detector. Tomographic hot-rod phantom images are presented together with images of bone, myocardium and brain of living rodents to demonstrate the feasibility of preclinical small-animal studies with the rSPECT.

  4. A paralleled readout system for an electrical DNA-hybridization assay based on a microstructured electrode array

    NASA Astrophysics Data System (ADS)

    Urban, Matthias; Möller, Robert; Fritzsche, Wolfgang

    2003-02-01

    DNA analytics is a growing field based on the increasing knowledge about the genome with special implications for the understanding of molecular bases for diseases. Driven by the need for cost-effective and high-throughput methods for molecular detection, DNA chips are an interesting alternative to more traditional analytical methods in this field. The standard readout principle for DNA chips is fluorescence based. Fluorescence is highly sensitive and broadly established, but shows limitations regarding quantification (due to signal and/or dye instability) and the need for sophisticated (and therefore high-cost) equipment. This article introduces a readout system for an alternative detection scheme based on electrical detection of nanoparticle-labeled DNA. If labeled DNA is present in the analyte solution, it will bind on complementary capture DNA immobilized in a microelectrode gap. A subsequent metal enhancement step leads to a deposition of conductive material on the nanoparticles, and finally an electrical contact between the electrodes. This detection scheme offers the potential for a simple (low-cost as well as robust) and highly miniaturizable method, which could be well-suited for point-of-care applications in the context of lab-on-a-chip technologies. The demonstrated apparatus allows a parallel readout of an entire array of microstructured measurement sites. The readout is combined with data-processing by an embedded personal computer, resulting in an autonomous instrument that measures and presents the results. The design and realization of such a system is described, and first measurements are presented.

  5. Parallel Plate System for Collecting Data Used to Determine Viscosity

    NASA Technical Reports Server (NTRS)

    Ethridge, Edwin C. (Inventor); Kaukler, William (Inventor)

    2013-01-01

    A parallel-plate system collects data used to determine viscosity. A first plate is coupled to a translator so that the first plate can be moved along a first direction. A second plate has a pendulum device coupled thereto such that the second plate is suspended above and parallel to the first plate. The pendulum device constrains movement of the second plate to a second direction that is aligned with the first direction and is substantially parallel thereto. A force measuring device is coupled to the second plate for measuring force along the second direction caused by movement of the second plate.
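
    The patent abstract does not state how viscosity is extracted from the collected force data. For a Newtonian fluid sheared between parallel plates, the standard planar Couette relation η = F·h / (A·v) would apply; a sketch under that assumption (the gap h and wetted area A are not given in the abstract):

```python
def viscosity_from_drag(force_n, gap_m, area_m2, speed_m_s):
    """Newtonian estimate eta = F * h / (A * v) for planar Couette flow
    between parallel plates: F is the measured shear force, h the plate
    gap, A the wetted plate area, v the relative plate speed."""
    return force_n * gap_m / (area_m2 * speed_m_s)
```

    For example, a 0.018 N drag force across a 1 mm gap, with 0.01 m² of wetted area moving at 2 mm/s, corresponds to a viscosity of 0.9 Pa·s.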

  6. Experiences with hypercube operating system instrumentation

    NASA Technical Reports Server (NTRS)

    Reed, Daniel A.; Rudolph, David C.

    1989-01-01

    The difficulties in conceptualizing the interactions among a large number of processors make it difficult both to identify the sources of inefficiencies and to determine how a parallel program could be made more efficient. This paper describes an instrumentation system that can trace the execution of distributed memory parallel programs by recording the occurrence of parallel program events. The resulting event traces can be used to compile summary statistics that provide a global view of program performance. In addition, visualization tools permit the graphic display of event traces. Visual presentation of performance data is particularly useful, indeed, necessary for large-scale parallel computers; the enormous volume of performance data mandates visual display.
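
    The pipeline from recorded event traces to summary statistics can be sketched as follows. The trace record layout (timestamp, processor, event) and the events themselves are hypothetical, not the instrumentation system's actual format:

```python
from collections import defaultdict

# Hypothetical trace records: (timestamp, processor id, event name).
trace = [
    (0.0, 0, "send"), (0.1, 1, "recv"), (0.2, 0, "compute"),
    (0.5, 1, "compute"), (0.9, 0, "send"), (1.0, 1, "recv"),
]

def summarize(events):
    """Compile per-processor event counts from a trace, the kind of
    global summary statistic built from recorded program events."""
    counts = defaultdict(lambda: defaultdict(int))
    for _, proc, ev in events:
        counts[proc][ev] += 1
    return {p: dict(c) for p, c in counts.items()}
```

    A visualization tool would instead replay the same records along a time axis, one row per processor.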

  7. Rapid code acquisition algorithms employing PN matched filters

    NASA Technical Reports Server (NTRS)

    Su, Yu T.

    1988-01-01

    The performance of four algorithms using pseudonoise matched filters (PNMFs) for direct-sequence spread-spectrum systems is analyzed. They are: parallel search with fixed-dwell detector (PL-FDD), parallel search with sequential detector (PL-SD), parallel-serial search with fixed-dwell detector (PS-FDD), and parallel-serial search with sequential detector (PS-SD). The operating characteristic for each detector and the mean acquisition time for each algorithm are derived. All the algorithms are studied in conjunction with the noncoherent integration technique, which enables the system to operate in the presence of data modulation. Several previous proposals using PNMFs are seen as special cases of the present algorithms.
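
    The parallel (PL-type) search can be pictured as a bank of matched filters, one per candidate code phase, evaluated at once. The sketch below is a generic illustration, not the paper's detectors: the LFSR polynomial, noise level, and threshold-free argmax decision are all assumptions:

```python
import random

def pn_sequence(degree=5, taps=(5, 3)):
    """±1 m-sequence from a Fibonacci LFSR (x^5 + x^3 + 1, period 31)."""
    state = [1, 0, 0, 1, 0]  # any nonzero seed
    chips = []
    for _ in range(2 ** degree - 1):
        chips.append(1 if state[-1] else -1)
        fb = state[taps[0] - 1] ^ state[taps[1] - 1]
        state = [fb] + state[:-1]
    return chips

def parallel_acquire(received, code):
    """Bank of PN matched filters: correlate the received chips against
    every cyclic code phase and pick the strongest output."""
    n = len(code)
    corr = [sum(received[i] * code[(i + k) % n] for i in range(n))
            for k in range(n)]
    return max(range(n), key=lambda k: corr[k])

rng = random.Random(42)
code = pn_sequence()
true_phase = 7
# Received signal: the code advanced by true_phase, plus Gaussian noise.
received = [code[(i + true_phase) % 31] + rng.gauss(0.0, 0.5)
            for i in range(31)]
est = parallel_acquire(received, code)
```

    A parallel-serial (PS-type) search would evaluate the correlator bank over only a subset of phases per dwell, trading hardware for acquisition time.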

  8. Observing with HST V: Improvements to the Scheduling of HST Parallel Observations

    NASA Astrophysics Data System (ADS)

    Taylor, D. K.; Vanorsow, D.; Lucks, M.; Henry, R.; Ratnatunga, K.; Patterson, A.

    1994-12-01

    Recent improvements to the Hubble Space Telescope (HST) ground system have significantly increased the frequency of pure parallel observations, i.e. the simultaneous use of multiple HST instruments by different observers. Opportunities for parallel observations are limited by a variety of timing, hardware, and scientific constraints. Formerly, such opportunities were heuristically predicted prior to the construction of the primary schedule (or calendar), and the lack of complete information resulted in high rates of scheduling failures and missed opportunities. In the current process the search for parallel opportunities is delayed until the primary schedule is complete, at which point new software tools are employed to identify places where parallel observations are supported. The result has been a considerable increase in parallel throughput. A new technique, known as "parallel crafting," is currently under development to further streamline the parallel scheduling process. This radically new method will replace the standard exposure logsheet with a set of abstract rules from which observation parameters will be constructed "on the fly" to best match the constraints of the parallel opportunity. Currently, parallel observers must specify a huge (and highly redundant) set of exposure types in order to cover all possible types of parallel opportunities. Crafting rules permit the observer to express timing, filter, and splitting preferences in a far more succinct manner. The issue of coordinated parallel observations (the same PI using different instruments simultaneously), long a troublesome aspect of the ground system, is also being addressed. For Cycle 5, the Phase II Proposal Instructions now have an exposure-level PAR WITH special requirement. While only the primary's alignment will be scheduled on the calendar, new commanding will provide for parallel exposures with both instruments.

  9. Boundedness and exponential convergence in a chemotaxis model for tumor invasion

    NASA Astrophysics Data System (ADS)

    Jin, Hai-Yang; Xiang, Tian

    2016-12-01

    We revisit the following chemotaxis system modeling tumor invasion: u_t = Δu - ∇·(u∇v), v_t = Δv + wz, w_t = -wz, z_t = Δz - z + u, for x ∈ Ω, t > 0, in a smooth bounded domain Ω ⊂ R^n (n ≥ 1) with homogeneous Neumann boundary and initial conditions. This model was recently proposed by Fujie et al (2014 Adv. Math. Sci. Appl. 24 67-84) as a model for tumor invasion with the role of extracellular matrix incorporated, and was analyzed later by Fujie et al (2016 Discrete Contin. Dyn. Syst. 36 151-69), showing uniform boundedness and convergence for n ≤ 3. In this work, we first show that the L^∞-boundedness of the system can be reduced to the boundedness of ||u(·,t)||_{L^{n/4+ε}(Ω)} for some ε > 0 alone, and then, for n ≥ 4, if the initial data ||u_0||_{L^{n/4}}, ||z_0||_{L^{n/2}} and ||…

  10. Does Reimportation Reduce Price Differences for Prescription Drugs? Lessons from the European Union

    PubMed Central

    Kyle, Margaret K; Allsbrook, Jennifer S; Schulman, Kevin A

    2008-01-01

    Objective To examine the effect of parallel trade on patterns of price dispersion for prescription drugs in the European Union. Data Sources Longitudinal data from an IMS Midas database of prices and units sold for drugs in 36 categories in 30 countries from 1993 through 2004. Study Design The main outcome measures were mean price differentials and other measures of price dispersion within European Union countries compared with within non-European Union countries. Data Collection/Extraction Methods We identified drugs subject to parallel trade using information provided by IMS and by checking membership lists of parallel import trade associations and lists of approved parallel imports. Principal Findings Parallel trade was not associated with substantial reductions in price dispersion in European Union countries. In descriptive and regression analyses, about half of the price differentials exceeded 50 percent in both European Union and non-European Union countries over time, and price distributions among European Union countries did not show a dramatic change concurrent with the adoption of parallel trade. In regression analysis, we found that although price differentials decreased after 1995 in most countries, they decreased less in the European Union than elsewhere. Conclusions Parallel trade for prescription drugs does not automatically reduce international price differences. Future research should explore how other regulatory schemes might lead to different results elsewhere. PMID:18355258

  11. Multilevel Parallelization of AutoDock 4.2.

    PubMed

    Norgan, Andrew P; Coffman, Paul K; Kocher, Jean-Pierre A; Katzmann, David J; Sosa, Carlos P

    2011-04-28

    Virtual (computational) screening is an increasingly important tool for drug discovery. AutoDock is a popular open-source application for performing molecular docking, the prediction of ligand-receptor interactions. AutoDock is a serial application, though several previous efforts have parallelized various aspects of the program. In this paper, we report on a multi-level parallelization of AutoDock 4.2 (mpAD4). Using MPI and OpenMP, AutoDock 4.2 was parallelized for use on MPI-enabled systems and to multithread the execution of individual docking jobs. In addition, code was implemented to reduce input/output (I/O) traffic by reusing grid maps at each node from docking to docking. Performance of mpAD4 was examined on two multiprocessor computers. Using MPI with OpenMP multithreading, mpAD4 scales with near linearity on the multiprocessor systems tested. In situations where I/O is limiting, reuse of grid maps reduces both system I/O and overall screening time. Multithreading of AutoDock's Lamarckian Genetic Algorithm with OpenMP increases the speed of execution of individual docking jobs, and when combined with MPI parallelization can significantly reduce the execution time of virtual screens. This work is significant in that mpAD4 speeds the execution of certain molecular docking workloads and allows the user to optimize the degree of system-level (MPI) and node-level (OpenMP) parallelization to best fit both workloads and computational resources.
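
    The two-level MPI+OpenMP scheme can be mimicked in miniature: an outer pool distributes docking jobs across "ranks", and each job internally parallelizes its scoring work. Everything here (the toy scoring function, pose grid, and strided job distribution) is an illustrative assumption, with thread pools standing in for both MPI ranks and OpenMP threads:

```python
from concurrent.futures import ThreadPoolExecutor

def score_pose(ligand, pose):
    """Hypothetical stand-in for AutoDock's scoring function."""
    return -((ligand * 0.1 - pose) ** 2)

def dock_one_ligand(ligand, n_threads=4):
    """Node-level parallelism (OpenMP analogue): score candidate poses
    concurrently and keep the best one."""
    poses = [p * 0.05 for p in range(40)]
    with ThreadPoolExecutor(n_threads) as ex:
        scores = list(ex.map(lambda p: score_pose(ligand, p), poses))
    best = max(range(len(poses)), key=scores.__getitem__)
    return ligand, poses[best], scores[best]

def virtual_screen(ligands, n_ranks=2):
    """System-level parallelism (MPI analogue): strided distribution of
    docking jobs across n_ranks workers."""
    chunks = [ligands[r::n_ranks] for r in range(n_ranks)]
    with ThreadPoolExecutor(n_ranks) as ex:
        parts = ex.map(lambda c: [dock_one_ligand(l) for l in c], chunks)
        return sorted(r for part in parts for r in part)

results = virtual_screen(list(range(8)))
```

    Tuning the split between outer workers and inner threads mirrors the MPI-rank/OpenMP-thread trade-off the paper exposes to the user.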

  12. NETRA: A parallel architecture for integrated vision systems. 1: Architecture and organization

    NASA Technical Reports Server (NTRS)

    Choudhary, Alok N.; Patel, Janak H.; Ahuja, Narendra

    1989-01-01

    Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is considered to be a system that uses vision algorithms from all levels of processing for a high level application (such as object recognition). A model of computation is presented for parallel processing for an IVS. Using the model, desired features and capabilities of a parallel architecture suitable for IVSs are derived. Then a multiprocessor architecture (called NETRA) is presented. This architecture is highly flexible without the use of complex interconnection schemes. The topology of NETRA is recursively defined and hence is easily scalable from small to large systems. Homogeneity of NETRA permits fault tolerance and graceful degradation under faults. It is a recursively defined tree-type hierarchical architecture where each of the leaf nodes consists of a cluster of processors connected with a programmable crossbar with selective broadcast capability to provide for desired flexibility. A qualitative evaluation of NETRA is presented. Then general schemes are described to map parallel algorithms onto NETRA. Algorithms are classified according to their communication requirements for parallel processing. An extensive analysis of inter-cluster communication strategies in NETRA is presented, and parameters affecting performance of parallel algorithms when mapped on NETRA are discussed. Finally, a methodology to evaluate performance of algorithms on NETRA is described.

  13. Parallelizing Timed Petri Net simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1993-01-01

    The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.

  14. Parallel machine architecture and compiler design facilities

    NASA Technical Reports Server (NTRS)

    Kuck, David J.; Yew, Pen-Chung; Padua, David; Sameh, Ahmed; Veidenbaum, Alex

    1990-01-01

    The objective is to provide an integrated simulation environment for studying and evaluating various issues in designing parallel systems, including machine architectures, parallelizing compiler techniques, and parallel algorithms. The status of the Delta project (whose objective is to provide a facility for rapid prototyping of parallelizing compilers that can target different machine architectures) is summarized. Included are surveys of the program manipulation tools developed, the environmental software supporting Delta, and the compiler research projects in which Delta has played a role.

  15. The OpenMP Implementation of NAS Parallel Benchmarks and its Performance

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry

    1999-01-01

    As the new ccNUMA architecture became popular in recent years, parallel programming with compiler directives on these machines has evolved to accommodate new needs. In this study, we examine the effectiveness of OpenMP directives for parallelizing the NAS Parallel Benchmarks. Implementation details will be discussed and performance will be compared with the MPI implementation. We have demonstrated that OpenMP can achieve very good results for parallelization on a shared memory system, but effective use of memory and cache is very important.

  16. Current noise generated by spin imbalance in presence of spin relaxation

    NASA Astrophysics Data System (ADS)

    Khrapai, V. S.; Nagaev, K. E.

    2017-01-01

    We calculate current (shot) noise in a metallic diffusive conductor generated by spin imbalance in the absence of a net electric current. This situation is modeled in an idealized three-terminal setup with two biased ferromagnetic leads (F-leads) and one normal lead (N-lead). Parallel magnetization of the F-leads gives rise to spin imbalance and finite shot noise at the N-lead. Finite spin relaxation results in an increase in the shot noise, which depends on the ratio of the length of the conductor (L) to the spin relaxation length (l_s). For L >> l_s the shot noise increases by a factor of two and coincides with the case of antiparallel magnetization of the F-leads.

  17. Parallel evolution of image processing tools for multispectral imagery

    NASA Astrophysics Data System (ADS)

    Harvey, Neal R.; Brumby, Steven P.; Perkins, Simon J.; Porter, Reid B.; Theiler, James P.; Young, Aaron C.; Szymanski, John J.; Bloch, Jeffrey J.

    2000-11-01

    We describe the implementation and performance of a parallel, hybrid evolutionary-algorithm-based system, which optimizes image processing tools for feature-finding tasks in multi-spectral imagery (MSI) data sets. Our system uses an integrated spatio-spectral approach and is capable of combining suitably-registered data from different sensors. We investigate the speed-up obtained by parallelization of the evolutionary process via multiple processors (a workstation cluster) and develop a model for prediction of run-times for different numbers of processors. We demonstrate our system on Landsat Thematic Mapper MSI, covering the recent Cerro Grande fire at Los Alamos, NM, USA.
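
    The abstract does not reproduce the paper's run-time prediction model; a common form such a model takes is Amdahl's law, where a fixed serial fraction bounds the attainable speedup as processors are added. A sketch under that assumption (the model form and the serial fraction are not from the paper):

```python
def predicted_runtime(t1, p, serial_fraction):
    """Amdahl-style model: only the parallelizable part of the
    single-processor time t1 shrinks with processor count p."""
    return t1 * (serial_fraction + (1.0 - serial_fraction) / p)

def speedup(p, serial_fraction):
    """Predicted speedup relative to one processor."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)
```

    With a 10% serial fraction, 10 processors yield only about a 5.3x speedup, and no processor count can exceed 10x.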

  18. Parallel Gaussian elimination of a block tridiagonal matrix using multiple microcomputers

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.

    1989-01-01

    The solution of a block tridiagonal matrix using parallel processing is demonstrated. The multiprocessor system on which results were obtained and the software environment used to program that system are described. Theoretical partitioning and resource allocation for the Gaussian elimination method used to solve the matrix are discussed. The results obtained from running 1-, 2- and 3-processor versions of the block tridiagonal solver are presented. The PASCAL source code for these solvers is given in the appendix, and may be transportable to other shared-memory parallel processors provided that the synchronization routines are reproduced on the target system.

  19. Applications and accuracy of the parallel diagonal dominant algorithm

    NASA Technical Reports Server (NTRS)

    Sun, Xian-He

    1993-01-01

    The Parallel Diagonal Dominant (PDD) algorithm is a highly efficient, ideally scalable tridiagonal solver. In this paper, a detailed study of the PDD algorithm is given. First the PDD algorithm is introduced. Then the algorithm is extended to solve periodic tridiagonal systems. A variant, the reduced PDD algorithm, is also proposed. Accuracy analysis is provided for a class of tridiagonal systems, the symmetric, and anti-symmetric Toeplitz tridiagonal systems. Implementation results show that the analysis gives a good bound on the relative error, and the algorithm is a good candidate for the emerging massively parallel machines.
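
    In the PDD algorithm each processor solves a local tridiagonal system and the small interface coupling is treated approximately; that PDD-specific interface treatment is omitted here. As the serial building block, a standard Thomas solver for the kind of symmetric Toeplitz system the paper analyzes (a sketch, not the PDD algorithm itself):

```python
def thomas_solve(a, b, c, d):
    """Serial Thomas algorithm for a tridiagonal system Ax = d, with
    a = sub-diagonal (length n-1), b = diagonal (length n),
    c = super-diagonal (length n-1), d = right-hand side (length n)."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i - 1] * cp[i - 1]   # nonzero for diagonally dominant A
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i - 1] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

    PDD partitions the rows among processors, runs a solve like this on each partition, and exploits diagonal dominance to drop the terms that would otherwise require a global reduced solve.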

  20. Visualization Co-Processing of a CFD Simulation

    NASA Technical Reports Server (NTRS)

    Vaziri, Arsi

    1999-01-01

    OVERFLOW, a widely used CFD simulation code, is combined with a visualization system, pV3, to experiment with an environment for simulation/visualization co-processing on an SGI Origin 2000 (O2K) system. The shared-memory version of the solver is used with the O2K 'pfa' preprocessor invoked to automatically discover parallelism in the source code; no other explicit parallelism is enabled. In order to study the scaling and performance of the visualization co-processing system, sample runs are made with processor groups ranging from 1 to 254 processors. The data exchange between the visualization system and the simulation system is rapid enough for user interactivity when the problem size is small. This shared-memory version of OVERFLOW, with minimal parallelization, does not scale well to an increasing number of available processors. The visualization task takes about 18 to 30% of the total processing time and does not appear to be a major contributor to the poor scaling; improper load balancing and inter-processor communication overhead are contributors to this poor performance. Work is in progress aimed at obtaining improved parallel performance of the solver and removing the limitations of serial data transfer to pV3 by examining various parallelization/communication strategies, including the use of explicit message passing.
