xyce parallel electronic: Topics by Science.gov

Sample records for xyce parallel electronic

Xyce parallel electronic simulator : reference guide.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.

2011-05-01

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to exhaustively list (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. It is targeted specifically to run on large-scale parallel computing platforms but also runs well on a variety of architectures including single processor workstations. It also aims to support a variety of devices and models specific to Sandia needs. This document is intended to complement the Xyce Users Guide. It contains comprehensive, detailed information about a number of topics pertinent to the usage of Xyce. Included in this document is a netlist reference for the input-file commands and elements supported within Xyce; a command line reference, which describes the available command line arguments for Xyce; and quick-references for users of other circuit codes, such as Orcad's PSpice and Sandia's ChileSPICE.
Xyce parallel electronic simulator : users' guide.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.

2011-05-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques; (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a
Xyce parallel electronic simulator design.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting

2010-09-01

This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to ensure a high level of code quality and robustness is essential. Version control, issue tracking, customer support, C++ style guidelines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, and the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathematicians and computer scientists. In addition to diversity of background, it is to be expected on long term projects for there to be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.
Xyce Parallel Electronic Simulator Users Guide Version 6.2.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R.; Mei, Ting; Russo, Thomas V.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. (2) A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. (3) Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks The information herein is subject to change without notice. Copyright © 2002-2014 Sandia Corporation. All rights reserved. Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Portions of the Xyce™ code are: Copyright © 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are
Xyce Parallel Electronic Simulator Users Guide Version 6.4

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R.; Mei, Ting; Russo, Thomas V.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. (2) A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. (3) Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks The information herein is subject to change without notice. Copyright © 2002-2015 Sandia Corporation. All rights reserved. Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Portions of the Xyce™ code are: Copyright © 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are
Xyce Parallel Electronic Simulator : reference guide, version 2.0.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to exhaustively list (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.
Xyce parallel electronic simulator reference guide, version 6.0.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.

2013-08-01

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1]. The focus of this document is to exhaustively list (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1].
Xyce Parallel Electronic Simulator Users' Guide Version 6.7.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R.; Aadithya, Karthik Venkatraman; Mei, Ting

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. (2) A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. (3) Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The information herein is subject to change without notice. Copyright © 2002-2017 Sandia Corporation. All rights reserved. Trademarks Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks
Xyce parallel electronic simulator reference guide, version 6.1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.

2014-03-01

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1]. The focus of this document is to exhaustively list (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1].
Xyce Parallel Electronic Simulator : users' guide, version 2.0.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont

2004-06-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator capable of simulating electrical circuits at a variety of abstraction levels. Primarily, Xyce has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. (4) A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). (5) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. One feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the
Xyce parallel electronic simulator reference guide, Version 6.0.1.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.

2014-01-01

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1]. The focus of this document is to exhaustively list (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1].
Xyce Parallel Electronic Simulator Reference Guide Version 6.7.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R.; Aadithya, Karthik Venkatraman; Mei, Ting

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to exhaustively list (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1]. The information herein is subject to change without notice. Copyright © 2002-2017 Sandia Corporation. All rights reserved. Trademarks Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. All other trademarks are property of their respective owners. Contacts World Wide Web http://xyce.sandia.gov https://info.sandia.gov/xyce (Sandia only) Email xyce@sandia.gov (outside Sandia) xyce-sandia@sandia.gov (Sandia only) Bug Reports (Sandia only) http://joseki-vm.sandia.gov/bugzilla http://morannon.sandia.gov/bugzilla
Xyce Parallel Electronic Simulator Reference Guide Version 6.4

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R.; Mei, Ting; Russo, Thomas V.

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to exhaustively list (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1]. Trademarks The information herein is subject to change without notice. Copyright © 2002-2015 Sandia Corporation. All rights reserved. Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Portions of the Xyce™ code are: Copyright © 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59 All rights reserved. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts Bug Reports (Sandia only) http://joseki.sandia.gov/bugzilla http://charleston.sandia.gov/bugzilla World Wide Web http://xyce.sandia.gov http://charleston.sandia.gov/xyce (Sandia only) Email xyce@sandia.gov (outside Sandia) xyce-sandia@sandia.gov (Sandia only)
Xyce Parallel Electronic Simulator Users' Guide Version 6.6.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R.; Aadithya, Karthik Venkatraman; Mei, Ting

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. (2) A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. (3) Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The information herein is subject to change without notice. Copyright © 2002-2016 Sandia Corporation. All rights reserved. Acknowledgements The BSIM Group at the University of California, Berkeley developed the BSIM3, BSIM4, BSIM6, BSIM-CMG and BSIM-SOI models. The BSIM3 is Copyright © 1999, Regents of the University of California. The BSIM4 is Copyright © 2006, Regents of the University of California. The BSIM6 is Copyright © 2015, Regents of the University of California. The BSIM-CMG is Copyright
Xyce Parallel Electronic Simulator - Users' Guide Version 2.1.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hutchinson, Scott A; Hoekstra, Robert J.; Russo, Thomas V.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an "in-house" capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability
Xyce parallel electronic simulator users guide, version 6.1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers; A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce Parallel Electronic Simulator Users' Guide Version 6.8

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R.; Aadithya, Karthik Venkatraman; Mei, Ting

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. (2) A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. (3) Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce parallel electronic simulator users guide, version 6.0.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. (2) A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. (3) Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce Parallel Electronic Simulator Reference Guide Version 6.6.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R.; Aadithya, Karthik Venkatraman; Mei, Ting

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to exhaustively list (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1]. The information herein is subject to change without notice. Copyright © 2002-2016 Sandia Corporation. All rights reserved. Acknowledgements The BSIM Group at the University of California, Berkeley developed the BSIM3, BSIM4, BSIM6, BSIM-CMG and BSIM-SOI models. The BSIM3 is Copyright © 1999, Regents of the University of California. The BSIM4 is Copyright © 2006, Regents of the University of California. The BSIM6 is Copyright © 2015, Regents of the University of California. The BSIM-CMG is Copyright © 2012 and 2016, Regents of the University of California. The BSIM-SOI is Copyright © 1990, Regents of the University of California. All rights reserved. The Mextram model has been developed by NXP Semiconductors until 2007, Delft University of Technology from 2007 to 2014, and Auburn University since April 2015. Copyrights © of Mextram are with Delft University of Technology, NXP Semiconductors and Auburn University. The MIT VS Model Research Group developed the MIT Virtual Source (MVS) model. Copyright © 2013 Massachusetts Institute of Technology (MIT). The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. Trademarks Xyce™ Electronic Simulator and Xyce™ are trademarks of Sandia Corporation. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and Tec
Xyce parallel electronic simulator users' guide, Version 6.0.1.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. (2) A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. (3) Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

Xyce

DOE Office of Scientific and Technical Information (OSTI.GOV)

Thornquist, Heidi K.; Fixel, Deborah A.; Fett, David Brian

The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) models, including several age- and radiation-aware devices. It supports a variety of computing platforms, both serial and parallel. Lastly, it uses a variety of modern solution algorithms, including dynamic parallel load balancing and iterative solvers.
Xyce release and distribution management : version 1.2.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hutchinson, Scott Alan; Williamson, Charles Michael

2003-10-01

This document presents a high-level description of the Xyce™ Parallel Electronic Simulator Release and Distribution Management Process. The purpose of this process is to standardize the manner in which all Xyce software products progress toward release and how releases are made available to customers. Rigorous Release Management will assure that Xyce releases are created in such a way that the elements comprising the release are traceable and the release itself is reproducible. Distribution Management describes what is to be done with a Xyce release that is eligible for distribution.
Xyce™ Parallel Electronic Simulator Reference Guide Version 6.8

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R.; Aadithya, Karthik Venkatraman; Mei, Ting

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to list exhaustively (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.
Application Note: Power Grid Modeling With Xyce.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sholander, Peter E.

This application note describes how to model steady-state power flows and transient events in electric power grids with the SPICE-compatible Xyce™ Parallel Electronic Simulator developed at Sandia National Labs. This application note provides a brief tutorial on the basic devices (branches, bus shunts, transformers and generators) found in power grids. The focus is on the features supported and assumptions made by the Xyce models for power grid elements. It then provides a detailed explanation, including working Xyce netlists, for simulating some simple power grid examples such as the IEEE 14-bus test case.
Xyce™ Parallel Electronic Simulator Reference Guide, Version 6.5

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R.; Aadithya, Karthik V.; Mei, Ting

2016-06-01

This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users’ Guide. The focus of this document is to list exhaustively (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users’ Guide. The information herein is subject to change without notice. Copyright © 2002-2016 Sandia Corporation. All rights reserved.
Simulating neural systems with Xyce.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schiek, Richard Louis; Thornquist, Heidi K.; Mei, Ting

2012-12-01

Sandia's parallel circuit simulator, Xyce, can address large scale neuron simulations in a new way, extending the range within which one can perform high-fidelity, multi-compartment neuron simulations. This report documents the implementation of neuron devices in Xyce and their use in the simulation and analysis of neuron systems.
Xyce™ Parallel Electronic Simulator Users' Guide, Version 6.5.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Keiter, Eric R.; Aadithya, Karthik V.; Mei, Ting

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The information herein is subject to change without notice. Copyright © 2002-2016 Sandia Corporation. All rights reserved.
Comparison of ATLOG and Xyce for Bell Labs Electromagnetic Pulse Excitation of Finite-Long Dissipative Conductors over a Ground Plane.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Campione, Salvatore; Warne, Larry K.; Schiek, Richard

This report details the modeling results for the response of a finite-length dissipative conductor interacting with a conducting ground to the Bell Labs electromagnetic pulse excitation. We use both a frequency-domain and a time-domain method based on transmission line theory through a code we call ATLOG - Analytic Transmission Line Over Ground. Results are compared to the circuit simulator Xyce for selected cases.
Real-time electron dynamics for massively parallel excited-state simulations

NASA Astrophysics Data System (ADS)

Andrade, Xavier

The simulation of the real-time dynamics of electrons, based on time dependent density functional theory (TDDFT), is a powerful approach to study electronic excited states in molecular and crystalline systems. What makes the method attractive is its flexibility to simulate different kinds of phenomena beyond the linear-response regime, including strongly-perturbed electronic systems and non-adiabatic electron-ion dynamics. Electron-dynamics simulations are also attractive from a computational point of view. They can run efficiently on massively parallel architectures due to the low communication requirements. Our implementations of electron dynamics, based on the codes Octopus (real-space) and Qball (plane-waves), allow us to simulate systems composed of thousands of atoms and to obtain good parallel scaling up to 1.6 million processor cores. Due to the versatility of real-time electron dynamics and its parallel performance, we expect it to become the method of choice to apply the capabilities of exascale supercomputers for the simulation of electronic excited states.
Simulating electron wave dynamics in graphene superlattices exploiting parallel processing advantages

NASA Astrophysics Data System (ADS)

Rodrigues, Manuel J.; Fernandes, David E.; Silveirinha, Mário G.; Falcão, Gabriel

2018-01-01

This work introduces a parallel computing framework to characterize the propagation of electron waves in graphene-based nanostructures. The electron wave dynamics is modeled using both "microscopic" and effective medium formalisms and the numerical solution of the two-dimensional massless Dirac equation is determined using a Finite-Difference Time-Domain scheme. The propagation of electron waves in graphene superlattices with localized scattering centers is studied, and the role of the symmetry of the microscopic potential in the electron velocity is discussed. The computational methodologies target the parallel capabilities of heterogeneous multi-core CPU and multi-GPU environments and are built with the OpenCL parallel programming framework which provides a portable, vendor agnostic and high throughput-performance solution. The proposed heterogeneous multi-GPU implementation achieves speedup ratios up to 75x when compared to multi-thread and multi-core CPU execution, reducing simulation times from several hours to a couple of minutes.
Electron Cooling and Isotropization during Magnetotail Current Sheet Thinning: Implications for Parallel Electric Fields

NASA Astrophysics Data System (ADS)

Lu, San; Artemyev, A. V.; Angelopoulos, V.

2017-11-01

Magnetotail current sheet thinning is a distinctive feature of substorm growth phase, during which magnetic energy is stored in the magnetospheric lobes. Investigation of charged particle dynamics in such thinning current sheets is believed to be important for understanding the substorm energy storage and the current sheet destabilization responsible for substorm expansion phase onset. We use Time History of Events and Macroscale Interactions during Substorms (THEMIS) B and C observations in 2008 and 2009 at 18 - 25 RE to show that during magnetotail current sheet thinning, the electron temperature decreases (cooling), and the parallel temperature decreases faster than the perpendicular temperature, leading to a decrease of the initially strong electron temperature anisotropy (isotropization). This isotropization cannot be explained by pure adiabatic cooling or by pitch angle scattering. We use test particle simulations to explore the mechanism responsible for the cooling and isotropization. We find that during the thinning, a fast decrease of a parallel electric field (directed toward the Earth) can speed up the electron parallel cooling, causing it to exceed the rate of perpendicular cooling, and thus lead to isotropization, consistent with observation. If the parallel electric field is too small or does not change fast enough, the electron parallel cooling is slower than the perpendicular cooling, so the parallel electron anisotropy grows, contrary to observation. The same isotropization can also be accomplished by an increasing parallel electric field directed toward the equatorial plane. Our study reveals the existence of a large-scale parallel electric field, which plays an important role in magnetotail particle dynamics during the current sheet thinning process.
Zener Diode Compact Model Parameter Extraction Using Xyce-Dakota Optimization.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Buchheit, Thomas E.; Wilcox, Ian Zachary; Sandoval, Andrew J

This report presents a detailed process for compact model parameter extraction for DC circuit Zener diodes. Following the traditional approach of Zener diode parameter extraction, a circuit model representation is defined and then used to capture the different operational regions of a real diode's electrical behavior. The circuit model contains 9 parameters represented by resistors and characteristic diodes as circuit model elements. The process of initial parameter extraction, the identification of parameter values for the circuit model elements, is presented in a way that isolates the dependencies between certain electrical parameters and highlights both the empirical nature of the extraction and the portions of the real diode physical behavior which the parameters are intended to represent. Optimization of the parameters, a necessary part of a robust parameter extraction process, is demonstrated using a 'Xyce-Dakota' workflow, discussed in more detail in the report. Among other realizations during this systematic approach of electrical model parameter extraction, non-physical solutions are possible and can be difficult to avoid because of the interdependencies between the different parameters. The process steps described are fairly general and can be leveraged for other types of semiconductor device model extractions. Also included in the report are recommendations for experiment setups for generating an optimum dataset for model extraction and the Parameter Identification and Ranking Table (PIRT) for Zener diodes.
Two-stage bulk electron heating in the diffusion region of anti-parallel symmetric reconnection

DOE Office of Scientific and Technical Information (OSTI.GOV)

Le, Ari Yitzchak; Egedal, Jan; Daughton, William Scott

2016-10-13

Electron bulk energization in the diffusion region during anti-parallel symmetric reconnection entails two stages. First, the inflowing electrons are adiabatically trapped and energized by an ambipolar parallel electric field. Next, the electrons gain energy from the reconnection electric field as they undergo meandering motion. These collisionless mechanisms have been described previously, and they lead to highly structured electron velocity distributions. Furthermore, a simplified control-volume analysis gives estimates for how the net effective heating scales with the upstream plasma conditions in agreement with fully kinetic simulations and spacecraft observations.
Acceleration of auroral electrons in parallel electric fields

NASA Technical Reports Server (NTRS)

Kaufmann, R. L.; Walker, D. N.; Arnoldy, R. L.

1976-01-01

Rocket observations of auroral electrons are compared with the predictions of a number of theoretical acceleration mechanisms that involve an electric field parallel to the earth's magnetic field. The theoretical models are discussed in terms of required plasma sources, the location of the acceleration region, and properties of necessary wave-particle scattering mechanisms. We have been unable to find any steady state scatter-free electric field configuration that predicts electron flux distributions in agreement with the observations. The addition of a fluctuating electric field or wave-particle scattering several thousand kilometers above the rocket can modify the theoretical flux distributions so that they agree with measurements. The presence of very narrow energy peaks in the flux contours implies a characteristic temperature of several tens of electron volts or less for the source of field-aligned auroral electrons and a temperature of several hundred electron volts or less for the relatively isotropic 'monoenergetic' auroral electrons. The temperature of the field-aligned electrons is more representative of the magnetosheath or possibly the ionosphere as a source region than of the plasma sheet.
A parallel orbital-updating based plane-wave basis method for electronic structure calculations

NASA Astrophysics Data System (ADS)

Pan, Yan; Dai, Xiaoying; de Gironcoli, Stefano; Gong, Xin-Gao; Rignanese, Gian-Marco; Zhou, Aihui

2017-11-01

Motivated by the recently proposed parallel orbital-updating approach in real space method [1], we propose a parallel orbital-updating based plane-wave basis method for electronic structure calculations, for solving the corresponding eigenvalue problems. In addition, we propose two new modified parallel orbital-updating methods. Compared to the traditional plane-wave methods, our methods allow for two-level parallelization, which is particularly interesting for large scale parallelization. Numerical experiments show that these new methods are more reliable and efficient for large scale calculations on modern supercomputers.
Evidence for Field-parallel Electron Acceleration in Solar Flares

DOE Office of Scientific and Technical Information (OSTI.GOV)

Haerendel, G.

It is proposed that the coincidence of higher brightness and upward electric current observed by Janvier et al. during a flare indicates electron acceleration by field-parallel potential drops sustained by extremely strong field-aligned currents of the order of 10⁴ A m⁻². A consequence of this is the concentration of the currents in sheets with widths of the order of 1 m. The high current density suggests that the field-parallel potential drops are maintained by current-driven anomalous resistivity. The origin of these currents remains a strong challenge for theorists.
Cavity-photon contribution to the effective interaction of electrons in parallel quantum dots

NASA Astrophysics Data System (ADS)

Gudmundsson, Vidar; Sitek, Anna; Abdullah, Nzar Rauf; Tang, Chi-Shung; Manolescu, Andrei

2016-05-01

A single cavity photon mode is expected to modify the Coulomb interaction of an electron system in the cavity. Here we investigate this phenomenon in a parallel double quantum dot system. We explore properties of the closed system and the system after it has been opened up for electron transport. We show how results for both cases support the idea that the effective electron-electron interaction becomes more repulsive in the presence of a cavity photon field. This can be understood in terms of the cavity photons dressing the polarization terms in the effective mutual electron interaction leading to nontrivial delocalization or polarization of the charge in the double parallel dot potential. In addition, we find that the effective repulsion of the electrons can be reduced by quadrupolar collective oscillations excited by an external classical dipole electric field.
The structure of the electron diffusion region during asymmetric anti-parallel magnetic reconnection

NASA Astrophysics Data System (ADS)

Swisdak, M.; Drake, J. F.; Price, L.; Burch, J. L.; Cassak, P.

2017-12-01

The structure of the electron diffusion region during asymmetric magnetic reconnection is explored with high-resolution particle-in-cell simulations that focus on a magnetopause event observed by the Magnetospheric Multiscale Mission (MMS). A major surprise is the development of a standing, oblique whistler-like structure with regions of intense positive and negative dissipation. This structure arises from high-speed electrons that flow along the magnetosheath magnetic separatrices, converge in the dissipation region and jet across the x-line into the magnetosphere. The jet produces a region of negative charge and generates intense parallel electric fields that eject the electrons downstream along the magnetospheric separatrices. The ejected electrons produce the parallel velocity-space crescents documented by MMS.
Electron parallel closures for various ion charge numbers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ji, Jeong-Young; Held, Eric D.; Kim, Sang-Kyeun

2016-03-15

Electron parallel closures for the ion charge number Z = 1 [J.-Y. Ji and E. D. Held, Phys. Plasmas 21, 122116 (2014)] are extended for 1 ≤ Z ≤ 10. Parameters are computed for various Z with the same form of the Z = 1 kernels adopted. The parameters are smoothly varying in Z and hence can be used to interpolate parameters and closures for noninteger, effective ion charge numbers.
Parallel electron force balance and the L-H transition

DOE PAGES

Stoltzfus-Dueck, T.

2016-05-23

In one popular paradigm for the L-H transition, energy transfer to the mean flows directly depletes turbulence fluctuation energy, resulting in suppression of the turbulence and a corresponding transport bifurcation. To quantitatively evaluate this mechanism, one must remember that electron parallel force balance couples nonzonal velocity fluctuations with electron pressure fluctuations on rapid timescales, comparable with the electron transit time. For this reason, energy in the nonzonal velocity stays in a fairly fixed ratio to the free energy in electron density fluctuations, at least for frequency scales much slower than electron transit. Furthermore, in order for direct depletion of the energy in turbulent fluctuations to cause the L-H transition, energy transfer via Reynolds stress must therefore drain enough energy to significantly reduce the sum of the free energy in nonzonal velocities and electron pressure fluctuations. At low k⊥, the electron thermal free energy is much larger than the energy in nonzonal velocities, posing a stark challenge for this model of the L-H transition.

Quasi-parallel precession diffraction: Alignment method for scanning transmission electron microscopes.

PubMed

Plana-Ruiz, S; Portillo, J; Estradé, S; Peiró, F; Kolb, Ute; Nicolopoulos, S

2018-06-06

A general method to set illuminating conditions for selectable beam convergence and probe size is presented in this work for Transmission Electron Microscopes (TEM) fitted with µs/pixel fast beam scanning control, (S)TEM, and an annular dark field detector. The case of interest of beam convergence and probe size, which enables diffraction pattern indexation, is then used as a starting point in this work to add 100 Hz precession to the beam while imaging the specimen at a fast rate and keeping the projector system in diffraction mode. The described systematic alignment method for the adjustment of beam precession on the specimen plane while scanning at fast rates is mainly based on the sharpness of the precessed STEM image. The complete alignment method for parallel condition and precession, Quasi-Parallel PED-STEM, is presented in block diagram scheme, as it has been tested on a variety of instruments. The immediate application of this methodology is that it renders the TEM column ready for the acquisition of Precessed Electron Diffraction Tomographies (EDT) as well as for the acquisition of slow Precessed Scanning Nanometer Electron Diffraction (SNED). Examples of the quality of the Precessed Electron Diffraction (PED) patterns and PED-STEM alignment images are presented with corresponding probe sizes and convergence angles. Copyright © 2018. Published by Elsevier B.V.
3-D readout-electronics packaging for high-bandwidth massively paralleled imager

DOEpatents

Kwiatkowski, Kris; Lyke, James

2007-12-18

Dense, massively parallel signal processing electronics are co-packaged behind associated sensor pixels. Microchips containing a linear or bilinear arrangement of photo-sensors, together with associated complex electronics, are integrated into a simple 3-D structure (a "mirror cube"). An array of photo-sensitive cells is disposed on a stacked CMOS chip's surface at a 45° angle from light-reflecting mirror surfaces formed on a neighboring CMOS chip surface. Image processing electronics are held within the stacked CMOS chip layers. Electrical connections couple each of said stacked CMOS chip layers and a distribution grid, the connections for distributing power and signals to components associated with each stacked CMOS chip layer.
GPAW - massively parallel electronic structure calculations with Python-based software.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Enkovaara, J.; Romero, N.; Shende, S.

2011-01-01

Electronic structure calculations are a widely used tool in materials science and a large consumer of supercomputing resources. Traditionally, the software packages for these kinds of simulations have been implemented in compiled languages, where Fortran in its different versions has been the most popular choice. While dynamic, interpreted languages, such as Python, can increase the efficiency of the programmer, they cannot compete directly with the raw performance of compiled languages. However, by using an interpreted language together with a compiled language, it is possible to have most of the productivity-enhancing features together with good numerical performance. We have used this approach in implementing the electronic structure simulation software GPAW using the combination of the Python and C programming languages. While the chosen approach works well in standard workstations and Unix environments, massively parallel supercomputing systems can present some challenges in porting, debugging and profiling the software. In this paper we describe some details of the implementation and discuss the advantages and challenges of the combined Python/C approach. We show that despite the challenges it is possible to obtain good numerical performance and good parallel scalability with Python-based software.
Measurement of large parallel and perpendicular electric fields on electron spatial scales in the terrestrial bow shock.

PubMed

Bale, S D; Mozer, F S

2007-05-18

Large parallel (parallel electric fields in a collisionless shock. These fields exist on spatial scales comparable to or less than the electron skin depth (a few kilometers) and correspond to magnetic-field-aligned potentials of tens of volts and perpendicular potentials up to a kilovolt. The perpendicular fields are amongst the largest ever measured in space, with energy densities ε0E²/(n kB Te) of the order of 10%. The measured parallel electric field implies that the electrons are demagnetized, which may result in stochastic (rather than coherent) electron heating.
Electronic scraps--recovering of valuable materials from parallel wire cables.

PubMed

de Araújo, Mishene Christie Pinheiro Bezerra; Chaves, Arthur Pinto; Espinosa, Denise Crocce Romano; Tenório, Jorge Alberto Soares

2008-11-01

Every year, the number of discarded electro-electronic products is increasing. For this reason recycling is needed, to avoid wasting non-renewable natural resources. The objective of this work is to study the recycling of materials from parallel wire cable through unit operations of mineral processing. Parallel wire cables are basically composed of polymer and copper. The following unit operations were tested: grinding, size classification, dense medium separation, electrostatic separation, scrubbing, panning, and elutriation. It was observed that the operations used obtained copper and PVC concentrates with a low degree of cross contamination. It was concluded that total liberation of the materials was accomplished after grinding to less than 3 mm, using a cage mill. Separation using panning and elutriation presented the best results in terms of recovery and cross contamination.
Accelerated Electron-Beam Formation with a High Capture Coefficient in a Parallel Coupled Accelerating Structure

NASA Astrophysics Data System (ADS)

Chernousov, Yu. D.; Shebolaev, I. V.; Ikryanov, I. M.

2018-01-01

An electron beam with a high (close to 100%) coefficient of electron capture into the regime of acceleration has been obtained in a linear electron accelerator based on a parallel coupled slow-wave structure, electron gun with microwave-controlled injection current, and permanent-magnet beam-focusing system. The high capture coefficient was due to the properties of the accelerating structure, beam-focusing system, and electron-injection system. Main characteristics of the proposed systems are presented.
Parallel processing implementation for the coupled transport of photons and electrons using OpenMP

NASA Astrophysics Data System (ADS)

Doerner, Edgardo

2016-05-01

In this work the use of OpenMP to implement the parallel processing of the Monte Carlo (MC) simulation of the coupled transport of photons and electrons is presented. This implementation was carried out using a modified EGSnrc platform which enables the use of the Microsoft Visual Studio 2013 (VS2013) environment, together with the development tools available in the Intel Parallel Studio XE 2015 (XE2015). The performance study of this new implementation was carried out on a desktop PC with a multi-core CPU, taking as a reference the performance of the original platform. The results were satisfactory, both in scalability and in parallelization efficiency.
A description of electron heating with an electrostatic potential jump in a parallel, collisionless, fire hose shock

NASA Technical Reports Server (NTRS)

Ellison, Donald C.; Jones, Frank C.

1988-01-01

The electron heating required if protons scatter elastically in a parallel, collisionless shock is calculated. Near-elastic proton scattering off large amplitude background magnetic field fluctuations might be expected if the waves responsible for the shock dissipation are generated by the fire hose instability. The effects of an electrostatic potential jump in the shock layer are included by assuming that the energy lost by protons in traversing the potential jump is converted into electron thermal pressure. It is found that the electron temperature increase is a strong function of the potential jump. Comparison is made to the parallel shock plasma simulation of Quest (1987).
Ion and Electron Energization in Guide Field Reconnection Outflows with Kinetic Riemann Simulations and Parallel Shock Simulations

NASA Astrophysics Data System (ADS)

Zhang, Q.; Drake, J. F.; Swisdak, M.

2017-12-01

How ions and electrons are energized in magnetic reconnection outflows is an essential topic throughout the heliosphere. Here we carry out guide field PIC Riemann simulations to explore the ion and electron energization mechanisms far downstream of the x-line. Riemann simulations, with their simple magnetic geometry, facilitate the study of the reconnection outflow far downstream of the x-line in much more detail than is possible with conventional reconnection simulations. We find that the ions get accelerated at rotational discontinuities, counter stream, and give rise to two slow shocks. We demonstrate that the energization mechanism at the slow shocks is essentially the same as that of parallel electrostatic shocks. Also, the electron-confining electric potential at the slow shocks is driven by the counterstreaming beams, which tend to break the quasi-neutrality. Based on this picture, we build a kinetic model to self-consistently predict the downstream ion and electron temperatures. Additional explorations using parallel shock simulations also imply that in a very low beta (0.001 to 0.01 for a modest guide field) regime, electron energization will be insignificant compared to the ion energization. Our model and the parallel shock simulations might be used as simple tools to understand and estimate the energization of ions and electrons and the energy partition far downstream of the x-line.
A Generalized Electron Heat Flow Relation and its Connection to the Thermal Force and the Solar Wind Parallel Electric Field

NASA Astrophysics Data System (ADS)

Scudder, J. D.

2017-12-01

En route to a new formulation of the heat law for the solar wind plasma, the role of the invariably neglected, but omnipresent, thermal force for the multi-fluid physics of the corona and solar wind expansion will be discussed. This force (a) controls the size of the collisional ion-electron energy exchange, favoring the thermal over the suprathermal electrons; (b) occurs whenever heat flux occurs; (c) remains after the electron and ion fluids come to a no-slip, zero parallel current equilibrium; (d) enhances the equilibrium parallel electric field; but (e) has a size that is theoretically independent of the electron collision frequency, allowing its importance to persist far up into the corona where collisions are invariably ignored in first approximation. The constituent parts of the thermal force allow the derivation of a new generalized electron heat flow relation that will be presented. It depends on the separate field-aligned divergences of electron and ion pressures and the gradients of the ion gravitational potential and parallel flow energies and is based upon a multi-component electron distribution function. The new terms in this heat law explicitly incorporate the astrophysical context of gradients, acceleration and external forces that make demands on the parallel electric field and quasi-neutrality; essentially all of these effects are missing in traditional formulations.
Massively parallel first-principles simulation of electron dynamics in materials

DOE PAGES

Draeger, Erik W.; Andrade, Xavier; Gunnels, John A.; ...

2017-08-01

Here we present a highly scalable, parallel implementation of first-principles electron dynamics coupled with molecular dynamics (MD). By using optimized kernels, network topology aware communication, and by fully distributing all terms in the time-dependent Kohn–Sham equation, we demonstrate unprecedented time to solution for disordered aluminum systems of 2000 atoms (22,000 electrons) and 5400 atoms (59,400 electrons), with wall clock time as low as 7.5 s per MD time step. Despite a significant amount of non-local communication required in every iteration, we achieved excellent strong scaling and sustained performance on the Sequoia Blue Gene/Q supercomputer at LLNL. We obtained up to 59% of the theoretical sustained peak performance on 16,384 nodes and performance of 8.75 Petaflop/s (43% of theoretical peak) on the full 98,304 node machine (1,572,864 cores). Lastly, scalable explicit electron dynamics allows for the study of phenomena beyond the reach of standard first-principles MD, in particular, materials subject to strong or rapid perturbations, such as pulsed electromagnetic radiation, particle irradiation, or strong electric currents.
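The machine figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is only a consistency check on the numbers stated above; the variable names are illustrative and the "implied peak" is derived from the quoted 43% figure, not taken from any machine specification.

```python
# Figures quoted in the abstract (Sequoia Blue Gene/Q run).
nodes = 98_304
cores = 1_572_864
sustained_pflops = 8.75      # Petaflop/s on the full machine
fraction_of_peak = 0.43      # "43% of theoretical peak"

# Cores per node follows directly from the two counts.
cores_per_node = cores // nodes

# Peak performance implied by the sustained rate and the quoted fraction.
implied_peak_pflops = sustained_pflops / fraction_of_peak

# Electrons treated explicitly per aluminum atom in the two test systems.
electrons_per_atom_small = 22_000 / 2000
electrons_per_atom_large = 59_400 / 5400

print(cores_per_node, round(implied_peak_pflops, 2))
print(electrons_per_atom_small, electrons_per_atom_large)
```

Both test systems work out to the same number of electrons per atom, which is what one would expect if the same pseudopotential was used for both.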
Massively parallel first-principles simulation of electron dynamics in materials

DOE Office of Scientific and Technical Information (OSTI.GOV)

Draeger, Erik W.; Andrade, Xavier; Gunnels, John A.

Here we present a highly scalable, parallel implementation of first-principles electron dynamics coupled with molecular dynamics (MD). By using optimized kernels, network topology aware communication, and by fully distributing all terms in the time-dependent Kohn–Sham equation, we demonstrate unprecedented time to solution for disordered aluminum systems of 2000 atoms (22,000 electrons) and 5400 atoms (59,400 electrons), with wall clock time as low as 7.5 s per MD time step. Despite a significant amount of non-local communication required in every iteration, we achieved excellent strong scaling and sustained performance on the Sequoia Blue Gene/Q supercomputer at LLNL. We obtained up to 59% of the theoretical sustained peak performance on 16,384 nodes and performance of 8.75 Petaflop/s (43% of theoretical peak) on the full 98,304 node machine (1,572,864 cores). Lastly, scalable explicit electron dynamics allows for the study of phenomena beyond the reach of standard first-principles MD, in particular, materials subject to strong or rapid perturbations, such as pulsed electromagnetic radiation, particle irradiation, or strong electric currents.
Cold Electrons as the Drivers of Parallel, Electrostatic Waves in Asymmetric Reconnection

NASA Astrophysics Data System (ADS)

Holmes, J.; Ergun, R.; Newman, D. L.; Wilder, F. D.; Schwartz, S. J.; Goodrich, K.; Eriksson, S.; Torbert, R. B.; Russell, C. T.; Lindqvist, P. A.; Giles, B. L.; Pollock, C. J.; Le Contel, O.; Strangeway, R. J.; Burch, J. L.

2016-12-01

The Magnetospheric MultiScale mission (MMS) has observed several instances of asymmetric reconnection at Earth's magnetopause, where plasma from the magnetosheath encounters that of the magnetosphere. On Earth's dayside, the magnetosphere is often made up of a two-component distribution of cold (<< 10 eV) and hot (~1 keV) plasma, sometimes including the cold ion plume. Magnetosheath plasma is primarily warm (~100 eV) post-shock solar wind. Where they meet, magnetopause reconnection alters the magnetic topology such that these two populations are left cohabiting a field line and rapidly mix. There have been several events observed by MMS where the Fast Plasma Instrument (FPI) clearly shows cold ions near the diffusion region impinging upon the warm magnetosheath population. In many of these, we also see patches of strong electrostatic waves parallel to the magnetic field, a smoking gun for rapid mixing via nonlinear processes. Cold ions alone are too slow to create the same waves; solving for roots of a simplified dispersion relation shows the electron population damps out the ion modes. From this, we infer the presence of cold electrons; in one notable case found by Wilder et al. 2016 (in review), they have been observed directly by FPI. Vlasov simulations of plasma mixing for a number of these events closely reproduce the observed electric field signatures. We conclude from numerical analysis and direct MMS observations that cold plasma mixing, including cold electrons, is the primary driver of parallel electrostatic waves observed near the electron diffusion region in asymmetric magnetic reconnection.
Beam quality corrections for parallel-plate ion chambers in electron reference dosimetry

NASA Astrophysics Data System (ADS)

Zink, K.; Wulff, J.

2012-04-01

Current dosimetry protocols (AAPM, IAEA, IPEM, DIN) recommend parallel-plate ionization chambers for dose measurements in clinical electron beams. This study presents detailed Monte Carlo simulations of beam quality correction factors for four different types of parallel-plate chambers: NACP-02, Markus, Advanced Markus and Roos. These chambers differ in constructive details which should have notable impact on the resulting perturbation corrections, hence on the beam quality corrections. The results reveal deviations to the recommended beam quality corrections given in the IAEA TRS-398 protocol in the range of 0%-2% depending on energy and chamber type. For well-guarded chambers, these deviations could be traced back to a non-unity and energy-dependent wall perturbation correction. In the case of the guardless Markus chamber, a nearly energy-independent beam quality correction is resulting as the effects of wall and cavity perturbation compensate each other. For this chamber, the deviations to the recommended values are the largest and may exceed 2%. From calculations of type-B uncertainties including effects due to uncertainties of the underlying cross-sectional data as well as uncertainties due to the chamber material composition and chamber geometry, the overall uncertainty of calculated beam quality correction factors was estimated to be <0.7%. Due to different chamber positioning recommendations given in the national and international dosimetry protocols, an additional uncertainty in the range of 0.2%-0.6% is present. According to the IAEA TRS-398 protocol, the uncertainty in clinical electron dosimetry using parallel-plate ion chambers is 1.7%. This study may help to reduce this uncertainty significantly.
Current correlations for the transport of interacting electrons through parallel quantum dots in a photon cavity

NASA Astrophysics Data System (ADS)

Gudmundsson, Vidar; Abdullah, Nzar Rauf; Sitek, Anna; Goan, Hsi-Sheng; Tang, Chi-Shung; Manolescu, Andrei

2018-06-01

We calculate the current correlations for the steady-state electron transport through multi-level parallel quantum dots embedded in a short quantum wire, that is placed in a non-perfect photon cavity. We account for the electron-electron Coulomb interaction, and the para- and diamagnetic electron-photon interactions with a stepwise scheme of configuration interactions and truncation of the many-body Fock spaces. In the spectral density of the temporal current-current correlations we identify all the transitions, radiative and non-radiative, active in the system in order to maintain the steady state. We observe strong signs of two types of Rabi oscillations.
Electroluminescence Caused by the Transport of Interacting Electrons through Parallel Quantum Dots in a Photon Cavity

NASA Astrophysics Data System (ADS)

Gudmundsson, Vidar; Abdullah, Nzar Rauf; Sitek, Anna; Goan, Hsi-Sheng; Tang, Chi-Shung; Manolescu, Andrei

2018-02-01

We show that a Rabi-splitting of the states of strongly interacting electrons in parallel quantum dots embedded in a short quantum wire placed in a photon cavity can be produced by either the para- or the dia-magnetic electron-photon interactions when the geometry of the system is properly accounted for and the photon field is tuned close to a resonance with the electron system. We use these two resonances to explore the electroluminescence caused by the transport of electrons through the one- and two-electron ground states of the system and their corresponding conventional and vacuum electroluminescence as the central system is opened up by coupling it to external leads acting as electron reservoirs. Our analysis indicates that high-order electron-photon processes are necessary to adequately construct the cavity-photon dressed electron states needed to describe both types of electroluminescence.
Parallel image-acquisition in continuous-wave electron paramagnetic resonance imaging with a surface coil array: Proof-of-concept experiments

NASA Astrophysics Data System (ADS)

Enomoto, Ayano; Hirata, Hiroshi

2014-02-01

This article describes a feasibility study of parallel image-acquisition using a two-channel surface coil array in continuous-wave electron paramagnetic resonance (CW-EPR) imaging. Parallel EPR imaging was performed by multiplexing of EPR detection in the frequency domain. The parallel acquisition system consists of two surface coil resonators and radiofrequency (RF) bridges for EPR detection. To demonstrate the feasibility of this method of parallel image-acquisition with a surface coil array, three-dimensional EPR imaging was carried out using a tube phantom. Technical issues in the multiplexing method of EPR detection were also clarified. We found that degradation in the signal-to-noise ratio due to the interference of RF carriers is a key problem to be solved.
Electron acceleration to high energies at quasi-parallel shock waves in the solar corona

NASA Technical Reports Server (NTRS)

Mann, G.; Classen, H.-T.

1995-01-01

In the solar corona, shock waves are generated by flares and/or coronal mass ejections. They manifest themselves in solar type 2 radio bursts appearing as emission stripes with a slow drift from high to low frequencies in dynamic radio spectra. Their nonthermal radio emission indicates that electrons are accelerated to suprathermal and/or relativistic velocities at these shocks. As is well known from extraterrestrial in-situ measurements, supercritical, quasi-parallel, collisionless shocks are accompanied by so-called SLAMS (short large amplitude magnetic field structures). These SLAMS can act as strong magnetic mirrors, at which charged particles can be reflected and accelerated. Thus, thermal electrons gain energy due to multiple reflections between two SLAMS and reach suprathermal and relativistic velocities. This mechanism of accelerating electrons is discussed for circumstances in the solar corona and may be responsible for the so-called 'herringbones' observed in solar type 2 radio bursts.
Parallel electric fields detected via conjugate electron echoes during the Echo 7 sounding rocket flight

NASA Technical Reports Server (NTRS)

Nemzek, R. J.; Winckler, J. R.

1991-01-01

Electron detectors on the Echo 7 active sounding rocket experiment measured 'conjugate echoes' resulting from artificial electron beam injections. Analysis of the drift motion of the electrons after a complete bounce leads to measurements of the magnetospheric convection electric field mapped to ionospheric altitudes. The magnetospheric field was highly variable, changing by tens of mV/m on time scales of as little as hundreds of milliseconds. While the smallest-scale magnetospheric field irregularities were mapped out by ionospheric conductivity, larger-scale features were enhanced by up to 50 mV/m in the ionosphere. The mismatch between magnetospheric and ionospheric convection fields indicates a violation of the equipotential field line condition. The parallel fields occurred in regions roughly 10 km across and probably supported a total potential drop of 10-100 V.
The ELPA library: scalable parallel eigenvalue solutions for electronic structure theory and computational science.

PubMed

Marek, A; Blum, V; Johanni, R; Havu, V; Lang, B; Auckenthaler, T; Heinecke, A; Bungartz, H-J; Lederer, H

2014-05-28

Obtaining the eigenvalues and eigenvectors of large matrices is a key problem in electronic structure theory and many other areas of computational science. The computational effort formally scales as O(N^3) with the size of the investigated problem, N (e.g. the electron count in electronic structure theory), and thus often defines the system size limit that practical calculations cannot overcome. In many cases, more than just a small fraction of the possible eigenvalue/eigenvector pairs is needed, so that iterative solution strategies that focus only on a few eigenvalues become ineffective. Likewise, it is not always desirable or practical to circumvent the eigenvalue solution entirely. We here review some current developments regarding dense eigenvalue solvers and then focus on the Eigenvalue soLvers for Petascale Applications (ELPA) library, which facilitates the efficient algebraic solution of symmetric and Hermitian eigenvalue problems for dense matrices that have real-valued and complex-valued matrix entries, respectively, on parallel computer platforms. ELPA addresses standard as well as generalized eigenvalue problems, relying on the well documented matrix layout of the Scalable Linear Algebra PACKage (ScaLAPACK) library but replacing all actual parallel solution steps with subroutines of its own. For these steps, ELPA significantly outperforms the corresponding ScaLAPACK routines and proprietary libraries that implement the ScaLAPACK interface (e.g. Intel's MKL). The most time-critical step is the reduction of the matrix to tridiagonal form and the corresponding backtransformation of the eigenvectors. ELPA offers both a one-step tridiagonalization (successive Householder transformations) and a two-step transformation that is more efficient especially towards larger matrices and larger numbers of CPU cores. ELPA is based on the MPI standard, with an early hybrid MPI-OpenMP implementation available as well. Scalability beyond 10,000 CPU cores for problem
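The one-step route mentioned in this abstract (reduce to tridiagonal form with successive Householder transformations, solve the tridiagonal problem, back-transform the eigenvectors) can be illustrated with a small serial sketch. This is not ELPA's code or API: the function names are hypothetical, the reflectors are applied as dense matrices for readability (far more costly than a production implementation), and NumPy's dense `eigh` stands in for a dedicated tridiagonal solver.

```python
import numpy as np

def householder_tridiagonalize(a):
    """Reduce a real symmetric matrix to tridiagonal t with a = q @ t @ q.T."""
    t = np.array(a, dtype=float)
    n = t.shape[0]
    q = np.eye(n)
    for k in range(n - 2):
        x = t[k + 1:, k]
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue  # column k is already in tridiagonal form
        v = x.copy()
        v[0] += np.copysign(norm_x, x[0])  # sign choice avoids cancellation
        v /= np.linalg.norm(v)
        h = np.eye(n)
        h[k + 1:, k + 1:] -= 2.0 * np.outer(v, v)  # Householder reflector
        t = h @ t @ h   # similarity transform preserves the eigenvalues
        q = q @ h       # accumulate reflectors for the back-transformation
    return t, q

def eigh_via_tridiagonal(a):
    """Eigenpairs of symmetric a via tridiagonalization and back-transformation."""
    t, q = householder_tridiagonalize(a)
    w, y = np.linalg.eigh(t)  # stand-in for a tridiagonal eigensolver
    return w, q @ y           # back-transform eigenvectors of t to those of a

rng = np.random.default_rng(0)
m = rng.standard_normal((6, 6))
a_sym = (m + m.T) / 2
w, v = eigh_via_tridiagonal(a_sym)
```

The two-step variant described in the abstract would first reduce to a banded matrix and then to tridiagonal form; that is not shown here.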

X-Ray and TeV Gamma-Ray Emission from Parallel Electron-Positron or Electron-Proton Beams in BL Lacertae Objects

NASA Astrophysics Data System (ADS)

Krawczynski, H.

2007-04-01

In this paper we discuss models of the X-ray and TeV γ-ray emission from BL Lac objects based on parallel electron-positron or electron-proton beams that form close to the central black hole, due to the strong electric fields generated by the accretion disk and possibly also by the black hole itself. Fitting the energy spectrum of the BL Lac object Mrk 501, we obtain tight constraints on the beam properties. Launching a sufficiently energetic beam requires rather strong magnetic fields close to the black hole (~100-1000 G). However, the model fits imply that the magnetic field in the emission region is only ~0.02 G. Thus, the particles are accelerated close to the black hole and propagate a considerable distance before instabilities trigger the dissipation of energy through synchrotron and self-Compton emission. We discuss various approaches to generate enough power to drive the jet and, at the same time, to accelerate particles to ~20 TeV energies. Although the parallel beam model has its own problems, it explains some of the long-standing problems that plague models based on Fermi-type particle acceleration, such as the presence of a very high minimum Lorentz factor of accelerated particles. We conclude with a brief discussion of the implications of the model for the difference between the processes of jet formation in BL Lac-type objects and those in quasars.
X-ray and TeV Gamma-Ray Emission from Parallel Electron-Positron or Electron-Proton Beams in BL Lac Objects

NASA Astrophysics Data System (ADS)

Krawczynski, Henric

2007-04-01

In this contribution we discuss models of the X-ray and TeV gamma-ray emission from BL Lac objects based on parallel electron-positron or electron-proton beams that form close to the central black hole owing to the strong electric fields generated by the accretion disk and possibly also by the black hole itself. Fitting the energy spectrum of the BL Lac object Mrk 501, we obtain tight constraints on the beam properties. Launching a sufficiently energetic beam requires rather strong magnetic fields close to the black hole (~100-1000 G). However, the model fits imply that the magnetic field in the emission region is only ~0.02 G. Thus, the particles are accelerated close to the black hole and propagate a considerable distance before instabilities trigger the dissipation of energy through synchrotron and self-Compton emission. We discuss various approaches to generate enough power to drive the jet and, at the same time, to accelerate particles to ~20 TeV energies. Although the parallel beam model has its own problems, it explains some of the long-standing problems that plague models based on Fermi-type particle acceleration, like the presence of a very high minimum Lorentz factor of accelerated particles. We conclude with a brief discussion of the implications of the model for the difference between the processes of jet formation in BL Lac-type objects and in quasars.
Spatial dimensions of the electron diffusion region in anti-parallel magnetic reconnection

NASA Astrophysics Data System (ADS)

Nakamura, Takuma; Nakamura, Rumi; Hasegawa, Hiroshi

2016-03-01

Spatial dimensions of the detailed structures of the electron diffusion region in anti-parallel magnetic reconnection were analyzed based on two-dimensional fully kinetic particle-in-cell simulations. The electron diffusion region in this study is defined as the region where the positive reconnection electric field is sustained by the electron inertial and non-gyrotropic pressure components. Past kinetic studies demonstrated that the dimensions of the whole electron diffusion region and the inner non-gyrotropic region are scaled by the electron inertial length d_e and the width of the electron meandering motion, respectively. In this study, we successfully obtained more precise scalings of the dimensions of these two regions than the previous studies by performing simulations with sufficiently small grid spacing (1/16-1/8 d_e) and a sufficient number of particles (800 particles cell^-1 on average) under different conditions changing the ion-to-electron mass ratio, the background density and the electron β_e (temperature). The obtained scalings are adequately supported by some theories considering spatial variations of field and plasma parameters within the diffusion region. In the reconnection inflow direction, the dimensions of both regions are proportional to d_e based on the background density. Both dimensions also depend on β_e based on the background values, but the dependence in the inner region (~0.375th power) is larger than the whole region (0.125th power) reflecting the orbits of meandering and accelerated electrons within the inner region. In the outflow direction, almost only the non-gyrotropic component sustains the positive reconnection electric field. The dimension of this single-scale diffusion region is proportional to the ion-electron hybrid inertial length (d_i d_e)^(1/2) based on the background density and weakly depends on the background β_e with the 0.25th power. These firm scalings allow us to predict observable dimensions in real space which are
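The length scales named in this abstract follow from standard plasma definitions: the electron inertial length is d_e = c/ω_pe, the ion inertial length is d_i = d_e (m_i/m_e)^(1/2), and the hybrid length is (d_i d_e)^(1/2). The sketch below evaluates them for a given density; the function names and the example density of 1 cm^-3 are assumptions chosen for illustration and are not values from the study.

```python
import math

# Physical constants (SI units).
C = 299_792_458.0            # speed of light, m/s
E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
M_E = 9.1093837015e-31       # electron mass, kg

def electron_inertial_length(density_m3):
    """d_e = c / omega_pe for an electron number density in m^-3."""
    omega_pe = math.sqrt(density_m3 * E_CHARGE**2 / (EPS0 * M_E))
    return C / omega_pe

def ion_inertial_length(density_m3, mass_ratio):
    """d_i = d_e * sqrt(m_i / m_e) for singly charged ions."""
    return electron_inertial_length(density_m3) * math.sqrt(mass_ratio)

def hybrid_inertial_length(density_m3, mass_ratio):
    """Ion-electron hybrid length (d_i * d_e)^(1/2)."""
    d_e = electron_inertial_length(density_m3)
    d_i = ion_inertial_length(density_m3, mass_ratio)
    return math.sqrt(d_i * d_e)

# Illustrative values only: 1 cm^-3 and the proton-to-electron mass ratio.
density = 1.0e6
ratio = 1836.15
d_e = electron_inertial_length(density)
d_hybrid = hybrid_inertial_length(density, ratio)
grid_spacings = (d_e / 16, d_e / 8)  # the 1/16-1/8 d_e range quoted above
```

Because d_i is d_e scaled by the square root of the mass ratio, the hybrid length reduces to d_e (m_i/m_e)^(1/4), so it depends only weakly on the mass ratio used in a simulation.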
On-top density functionals for the short-range dynamic correlation between electrons of opposite and parallel spin

NASA Astrophysics Data System (ADS)

Hollett, Joshua W.; Pegoretti, Nicholas

2018-04-01

Separate, one-parameter, on-top density functionals are derived for the short-range dynamic correlation between opposite and parallel-spin electrons, in which the electron-electron cusp is represented by an exponential function. The combination of both functionals is referred to as the Opposite-spin exponential-cusp and Fermi-hole correction (OF) functional. The two parameters of the OF functional are set by fitting the ionization energies and electron affinities, of the atoms He to Ar, predicted by ROHF in combination with the OF functional to the experimental values. For ionization energies, the overall performance of ROHF-OF is better than completely renormalized coupled-cluster [CR-CC(2,3)] and better than, or as good as, conventional density functional methods. For electron affinities, the overall performance of ROHF-OF is less impressive. However, for both ionization energies and electron affinities of third row atoms, the mean absolute error of ROHF-OF is only 3 kJ mol^-1.
Study of electrostatic electron cyclotron parallel flow velocity shear instability in the magnetosphere of Saturn

NASA Astrophysics Data System (ADS)

Kandpal, Praveen; Pandey, R. S.

2018-05-01

In the present paper, the electrostatic electron cyclotron parallel flow velocity shear instability in the presence of a perpendicular inhomogeneous DC electric field is studied in the magnetosphere of Saturn. The variation of the dimensionless growth rate of electron cyclotron waves with respect to k_⊥ ρ_e has been examined for various plasma parameters. The effects of velocity shear scale length (A_e), inhomogeneity (P/a), the ratio of ion to electron temperature (T_i/T_e) and density gradient (ε_n ρ_e) on the growth of electron cyclotron waves in the inner magnetosphere of Saturn have been studied and analyzed. The mathematical formulation and computation of the dispersion relation and growth rate have been done using the method of characteristic solution and the kinetic approach. This theoretical analysis uses relevant data from the Cassini spacecraft in the inner magnetosphere of Saturn. We have considered ambient magnetic field data and other relevant data for this study at the radial distance of ~4.82-5.00 R_s. In our study, velocity shear and the ion to electron temperature ratio have been observed to be the major sources of free energy for the electron cyclotron instability. The inhomogeneity of the electric field has a small but noticeable impact on the growth rate of the electrostatic electron cyclotron instability. The density gradient has been observed to have a stabilizing effect on the electron cyclotron instability.
Parallel closure theory for toroidally confined plasmas

NASA Astrophysics Data System (ADS)

Ji, Jeong-Young; Held, Eric D.

2017-10-01

We solve a system of general moment equations to obtain parallel closures for electrons and ions in an axisymmetric toroidal magnetic field. Magnetic field gradient terms are kept and treated using the Fourier series method. Assuming lowest order density (pressure) and temperature to be flux labels, the parallel heat flow, friction, and viscosity are expressed in terms of radial gradients of the lowest-order temperature and pressure, parallel gradients of temperature and parallel flow, and the relative electron-ion parallel flow velocity. Convergence of closure quantities is demonstrated as the number of moments and Fourier modes are increased. Properties of the moment equations in the collisionless limit are also discussed. Combining closures with fluid equations parallel mass flow and electric current are also obtained. Work in collaboration with the PSI Center and supported by the U.S. DOE under Grant Nos. DE-SC0014033, DE-SC0016256, and DE-FG02-04ER54746.
PARAMO: A Parallel Predictive Modeling Platform for Healthcare Analytic Research using Electronic Health Records

PubMed Central

Ng, Kenney; Ghoting, Amol; Steinhubl, Steven R.; Stewart, Walter F.; Malin, Bradley; Sun, Jimeng

2014-01-01

Objective: Healthcare analytics research increasingly involves the construction of predictive models for disease targets across varying patient cohorts using electronic health records (EHRs). To facilitate this process, it is critical to support a pipeline of tasks: 1) cohort construction, 2) feature construction, 3) cross-validation, 4) feature selection, and 5) classification. To develop an appropriate model, it is necessary to compare and refine models derived from a diversity of cohorts, patient-specific features, and statistical frameworks. The goal of this work is to develop and evaluate a predictive modeling platform that can be used to simplify and expedite this process for health data. Methods: To support this goal, we developed a PARAllel predictive MOdeling (PARAMO) platform which 1) constructs a dependency graph of tasks from specifications of predictive modeling pipelines, 2) schedules the tasks in a topological ordering of the graph, and 3) executes those tasks in parallel. We implemented this platform using Map-Reduce to enable independent tasks to run in parallel in a cluster computing environment. Different task scheduling preferences are also supported. Results: We assess the performance of PARAMO on various workloads using three datasets derived from the EHR systems in place at Geisinger Health System and Vanderbilt University Medical Center and an anonymous longitudinal claims database. We demonstrate significant gains in computational efficiency against a standard approach. In particular, PARAMO can build 800 different models on a 300,000 patient data set in 3 hours in parallel compared to 9 days if running sequentially. Conclusion: This work demonstrates that an efficient parallel predictive modeling platform can be developed for EHR data. This platform can facilitate large-scale modeling endeavors and speed-up the research workflow and reuse of health information. This platform is only a first step and provides the foundation for our ultimate
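The three steps described in the Methods (build a dependency graph, order it topologically, run independent tasks in parallel) can be sketched in a few lines. The abstract states that PARAMO itself is built on Map-Reduce; the sketch below swaps in Python's standard `graphlib` and a thread pool purely for illustration, and the task names, `run_pipeline`, and `build_example` are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

def run_pipeline(dependencies, actions, max_workers=4):
    """Run tasks in dependency order, executing independent tasks concurrently.

    dependencies maps each task name to the set of tasks it depends on;
    actions maps each task name to a zero-argument callable.
    Returns the task names in the order they finished.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    finished = []
    running = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose predecessors have all completed.
            for task in sorter.get_ready():
                running[pool.submit(actions[task])] = task
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                task = running.pop(future)
                future.result()    # re-raise any exception from the task
                finished.append(task)
                sorter.done(task)  # unlocks the tasks that depend on it
    return finished

def build_example(cohorts=("A", "B")):
    """Hypothetical pipeline: the five stages from the abstract, per cohort."""
    stages = ["cohort", "features", "cross_validation",
              "feature_selection", "classification"]
    deps, acts = {}, {}
    for c in cohorts:
        for i, stage in enumerate(stages):
            name = f"{stage}_{c}"
            deps[name] = {f"{stages[i - 1]}_{c}"} if i else set()
            acts[name] = lambda: None  # placeholder for real work
    return deps, acts

deps, acts = build_example()
order = run_pipeline(deps, acts)
```

In this toy graph the two cohort chains share no edges, so their stages can run side by side while each chain still runs in stage order.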
PARAMO: a PARAllel predictive MOdeling platform for healthcare analytic research using electronic health records.

PubMed

Ng, Kenney; Ghoting, Amol; Steinhubl, Steven R; Stewart, Walter F; Malin, Bradley; Sun, Jimeng

2014-04-01

Healthcare analytics research increasingly involves the construction of predictive models for disease targets across varying patient cohorts using electronic health records (EHRs). To facilitate this process, it is critical to support a pipeline of tasks: (1) cohort construction, (2) feature construction, (3) cross-validation, (4) feature selection, and (5) classification. To develop an appropriate model, it is necessary to compare and refine models derived from a diversity of cohorts, patient-specific features, and statistical frameworks. The goal of this work is to develop and evaluate a predictive modeling platform that can be used to simplify and expedite this process for health data. To support this goal, we developed a PARAllel predictive MOdeling (PARAMO) platform which (1) constructs a dependency graph of tasks from specifications of predictive modeling pipelines, (2) schedules the tasks in a topological ordering of the graph, and (3) executes those tasks in parallel. We implemented this platform using Map-Reduce to enable independent tasks to run in parallel in a cluster computing environment. Different task scheduling preferences are also supported. We assess the performance of PARAMO on various workloads using three datasets derived from the EHR systems in place at Geisinger Health System and Vanderbilt University Medical Center and an anonymous longitudinal claims database. We demonstrate significant gains in computational efficiency against a standard approach. In particular, PARAMO can build 800 different models on a 300,000 patient data set in 3 h in parallel compared to 9 days if running sequentially. This work demonstrates that an efficient parallel predictive modeling platform can be developed for EHR data. This platform can facilitate large-scale modeling endeavors and speed-up the research workflow and reuse of health information. This platform is only a first step and provides the foundation for our ultimate goal of building analytic pipelines
Stabilization of lower hybrid drift modes by finite parallel wavenumber and electron temperature gradients in field-reversed configurations

NASA Astrophysics Data System (ADS)

Farengo, R.; Guzdar, P. N.; Lee, Y. C.

1989-08-01

The effect of finite parallel wavenumber and electron temperature gradients on the lower hybrid drift instability is studied in the parameter regime corresponding to the TRX-2 device [Fusion Technol. 9, 48 (1986)]. Perturbations in the electrostatic potential and all three components of the vector potential are considered and finite beta electron orbit modifications are included. The electron temperature gradient decreases the growth rate of the instability but, for k_z = 0, unstable modes exist for η_e (= T_e' n_0 / T_e n_0') > 6. Since finite k_z effects completely stabilize the mode at small values of k_z/k_y (≈ 5×10^-3), magnetic shear could be responsible for stabilizing the lower hybrid drift instability in field-reversed configurations.
Parallel Nanoshaping of Brittle Semiconductor Nanowires for Strained Electronics.

PubMed

Hu, Yaowu; Li, Ji; Tian, Jifa; Xuan, Yi; Deng, Biwei; McNear, Kelly L; Lim, Daw Gen; Chen, Yong; Yang, Chen; Cheng, Gary J

2016-12-14

Semiconductor nanowires (SCNWs) provide greater tunability of electro-optical properties than their bulk counterparts (e.g., polycrystalline thin films) due to size effects. Nanoscale straining of SCNWs is desirable to enable new ways to tune the properties of SCNWs, such as electronic transport, band structure, and quantum properties. However, two bottlenecks prevent real applications of strain engineering of SCNWs: strainability and scalability. Unlike metallic nanowires, which are highly flexible and mechanically robust for parallel shaping, SCNWs are brittle in nature and could easily break at strains slightly higher than their elastic limits. In addition, the ability to generate nanoshaping in large scale is limited with the current technologies, such as the straining of nanowires with sophisticated manipulators, nanocombing NWs with U-shaped trenches, or buckling NWs with prestretched elastic substrates, which are incompatible with semiconductor technology. Here we present a top-down fabrication methodology to achieve large scale nanoshaping of SCNWs in parallel with tunable elastic strains. This method uses a nanosecond pulsed laser to generate shock pressure and conformably deform the SCNWs onto 3D-nanostructured silicon substrates in a scalable and ultrafast manner. A polymer dielectric nanolayer is integrated in the process for cushioning the high strain-rate deformation, suppressing the generation of dislocations or cracks, and providing a self-preserving mechanism for elastic strain storage in SCNWs. The elastic strain limits have been studied as functions of laser intensity, dimensions of nanowires, and the geometry of nanomolds. As a result of 3D straining, the inhomogeneous elastic strains in GeNWs result in notable Raman peak shifts and broadening, which bring more tunability of the electrical-optical properties in SCNWs than traditional strain engineering. We have achieved the first 3D nanostraining enhanced germanium field
An Optimization System with Parallel Processing for Reducing Common-Mode Current on Electronic Control Unit

NASA Astrophysics Data System (ADS)

Okazaki, Yuji; Uno, Takanori; Asai, Hideki

In this paper, we propose an optimization system with parallel processing for reducing electromagnetic interference (EMI) on an electronic control unit (ECU). We adopt simulated annealing (SA), genetic algorithm (GA) and taboo search (TS) to seek optimal solutions, and a Spice-like circuit simulator to analyze common-mode current. Therefore, the proposed system can determine the adequate combinations of the parasitic inductance and capacitance values on a printed circuit board (PCB) efficiently and practically, to reduce EMI caused by the common-mode current. Finally, we apply the proposed system to an example circuit to verify the validity and efficiency of the system.
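Of the three search methods named in this abstract, simulated annealing is the simplest to sketch. The example below is a generic SA loop over two continuous parameters; it is not the authors' system. In particular, `toy_common_mode_cost` is a hypothetical stand-in for the circuit-simulator evaluation (a plain quadratic in normalized units), and all names, step sizes, and the cooling schedule are assumptions.

```python
import math
import random

def simulated_annealing(cost, start, step, n_iter=5000,
                        t_start=1.0, t_end=1e-3, seed=0):
    """Minimize cost(params) with a geometric cooling schedule."""
    rng = random.Random(seed)
    current = list(start)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    for i in range(n_iter):
        # Temperature decays geometrically from t_start to t_end.
        temp = t_start * (t_end / t_start) ** (i / (n_iter - 1))
        # Propose a random move within +/- step of each parameter.
        candidate = [x + rng.uniform(-s, s) for x, s in zip(current, step)]
        cand_cost = cost(candidate)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if (cand_cost < current_cost
                or rng.random() < math.exp((current_cost - cand_cost) / temp)):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return best, best_cost

def toy_common_mode_cost(params):
    """Hypothetical objective in normalized units; a real system would call
    a circuit simulator here and return the common-mode current."""
    inductance, capacitance = params
    return (inductance - 2.0) ** 2 + (capacitance - 5.0) ** 2

start_point = [0.0, 0.0]
best_params, best_value = simulated_annealing(
    toy_common_mode_cost, start=start_point, step=[0.5, 0.5])
```

Since each cost evaluation is independent of the others, several annealing runs with different seeds could be farmed out to separate workers, which is one plausible place for the parallel processing the abstract mentions.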
An efficient 3-dim FFT for plane wave electronic structure calculations on massively parallel machines composed of multiprocessor nodes

NASA Astrophysics Data System (ADS)

Goedecker, Stefan; Boulet, Mireille; Deutsch, Thierry

2003-08-01

Three-dimensional Fast Fourier Transforms (FFTs) are the main computational task in plane wave electronic structure calculations. Obtaining high performance on a large number of processors is non-trivial on the latest generation of parallel computers, which consist of nodes made up of shared-memory multiprocessors. A non-dogmatic method for obtaining high performance for such 3-dim FFTs in a combined MPI/OpenMP programming paradigm will be presented. Exploiting the peculiarities of plane wave electronic structure calculations, speedups of up to 160 and speeds of up to 130 Gflops were obtained on 256 processors.
Symmetry Breaking by Parallel Flow Shear

NASA Astrophysics Data System (ADS)

Li, Jiacong; Diamond, Patrick

2015-11-01

Plasma rotation is important in reducing turbulent transport, suppressing MHD instabilities, and is beneficial to confinement. Intrinsic rotation without an external momentum input is of interest for its plausible application on ITER. k∥ spectrum asymmetry is required for residual Reynolds stress that drives the intrinsic rotation. Parallel flows are reported in linear devices without magnetic shear. In CSDX, parallel flows are mostly peaked in the core [Thakur et al., 2014]; more robust flows and reversed profiles are seen in PANTA [Oldenburger et al., 2012]. A novel mechanism for symmetry breaking in momentum transport is proposed. Magnetic shear or mean flow profile are not required. A seed parallel flow shear (PFS) sets the sign of residual stress by selecting certain modes to grow faster. The resulting spectrum imbalance leads to a nonzero residual stress, which further drives a parallel flow with ∇n as the free energy source, adding to the shear until saturated by diffusion. The balanced flow gradient is set by Π_∥^Res / χ_ϕ. Residual stress is calculated for ITG turbulence and collisional drift wave turbulence where electron-ion and electron-neutral collisions are discussed and compared. Numerical simulation is proposed for testing the effect of PFS.
Parallelization of the FLAPW method

NASA Astrophysics Data System (ADS)

Canning, A.; Mannstadt, W.; Freeman, A. J.

2000-08-01

The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining structural, electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work, we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel supercomputer.
Dependence of synergy current driven by lower hybrid wave and electron cyclotron wave on the frequency and parallel refractive index of electron cyclotron wave for Tokamaks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Huang, J.; Chen, S. Y., E-mail: sychen531@163.com; Tang, C. J.

2014-01-15

The physical mechanism of the synergy current driven by lower hybrid wave (LHW) and electron cyclotron wave (ECW) in tokamaks is investigated using theoretical analysis and simulation methods in the present paper. Research shows that the synergy relationship between the two waves in velocity space strongly depends on the frequency ω and parallel refractive index N_∥ of ECW. For a given spectrum of LHW, the parameter range of ECW, in which the synergy current exists, can be predicted by theoretical analysis, and these results are consistent with the simulation results. It is shown that the synergy effect is mainly caused by the electrons accelerated by both ECW and LHW, and the acceleration of these electrons requires that there is overlap of the resonance regions of the two waves in velocity space.
A Parallel Genetic Algorithm for Automated Electronic Circuit Design

NASA Technical Reports Server (NTRS)

Long, Jason D.; Colombano, Silvano P.; Haith, Gary L.; Stassinopoulos, Dimitris

2000-01-01

Parallelized versions of genetic algorithms (GAs) are popular primarily for three reasons: the GA is an inherently parallel algorithm, typical GA applications are very compute intensive, and powerful computing platforms, especially Beowulf-style computing clusters, are becoming more affordable and easier to implement. In addition, the low communication bandwidth required allows the use of inexpensive networking hardware such as standard office ethernet. In this paper we describe a parallel GA and its use in automated high-level circuit design. Genetic algorithms are a type of trial-and-error search technique that is guided by principles of Darwinian evolution. Just as the genetic material of two living organisms can intermix to produce offspring that are better adapted to their environment, GAs expose genetic material, frequently strings of 1s and 0s, to the forces of artificial evolution: selection, mutation, recombination, etc. GAs start with a pool of randomly-generated candidate solutions which are then tested and scored with respect to their utility. Solutions are then bred by probabilistically selecting high quality parents and recombining their genetic representations to produce offspring solutions. Offspring are typically subjected to a small amount of random mutation. After a pool of offspring is produced, this process iterates until a satisfactory solution is found or an iteration limit is reached. Genetic algorithms have been applied to a wide variety of problems in many fields, including chemistry, biology, and many engineering disciplines. There are many styles of parallelism used in implementing parallel GAs. One such method is called the master-slave or processor farm approach. In this technique, slave nodes are used solely to compute fitness evaluations (the most time consuming part). The master processor collects fitness scores from the nodes and performs the genetic operators (selection, reproduction, variation, etc.). Because of dependency
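The processor-farm scheme described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the thread pool stands in for the worker nodes of a cluster, the OneMax `fitness` function is a hypothetical placeholder for an expensive circuit evaluation, and the tournament selection, one-point crossover, and 2% mutation rate are illustrative assumptions.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fitness(bits):
    """Toy fitness (OneMax): number of 1 bits.

    Hypothetical placeholder for an expensive circuit evaluation.
    """
    return sum(bits)


def evolve(pop_size=20, length=16, generations=30, workers=4, seed=0):
    """Processor-farm GA: workers only score candidates, the master breeds."""
    rng = random.Random(seed)  # only the master thread touches the RNG
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # Farm out fitness evaluations; map() returns scores in input order.
            scores = list(pool.map(fitness, pop))

            def pick():
                # Binary tournament selection (an assumed selection scheme).
                a, b = rng.randrange(pop_size), rng.randrange(pop_size)
                return pop[a] if scores[a] >= scores[b] else pop[b]

            next_pop = []
            while len(next_pop) < pop_size:
                p1, p2 = pick(), pick()
                cut = rng.randrange(1, length)       # one-point crossover
                child = p1[:cut] + p2[cut:]
                # Flip each bit with a small probability (mutation).
                child = [b ^ 1 if rng.random() < 0.02 else b for b in child]
                next_pop.append(child)
            pop = next_pop

        scores = list(pool.map(fitness, pop))

    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


if __name__ == "__main__":
    individual, score = evolve()
    print(score, individual)
```

Because the workers never touch the random number generator, a run is reproducible for a given seed regardless of how the evaluations are scheduled.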
Module Six: Parallel Circuits; Basic Electricity and Electronics Individualized Learning System.

ERIC Educational Resources Information Center

Bureau of Naval Personnel, Washington, DC.

In this module the student will learn the rules that govern the characteristics of parallel circuits; the relationships between voltage, current, resistance and power; and the results of common troubles in parallel circuits. The module is divided into four lessons: rules of voltage and current, rules for resistance and power, variational analysis,…
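The rules named in this abstract can be illustrated with a short worked sketch, assuming ideal resistors across an ideal voltage source. The function name and the example values are illustrative and are not taken from the module itself.

```python
def parallel_circuit(voltage, resistances):
    """Apply the basic parallel-circuit rules to ideal resistors.

    Every branch sees the full source voltage, branch currents add,
    and the reciprocals of the branch resistances add.
    """
    branch_currents = [voltage / r for r in resistances]   # Ohm's law per branch
    total_current = sum(branch_currents)                    # currents add
    equivalent_resistance = 1.0 / sum(1.0 / r for r in resistances)
    total_power = voltage * total_current                   # P = V * I
    return {
        "branch_currents": branch_currents,
        "total_current": total_current,
        "equivalent_resistance": equivalent_resistance,
        "total_power": total_power,
    }


if __name__ == "__main__":
    # 12 V source across 6, 3 and 2 ohm branches (illustrative values).
    for name, value in parallel_circuit(12.0, [6.0, 3.0, 2.0]).items():
        print(name, value)
```

For these values the branch currents are 2 A, 4 A and 6 A, so the equivalent resistance (1 ohm) is smaller than the smallest branch resistance, as the rules require.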
Parallel and perpendicular velocity sheared flows driven tripolar vortices in an inhomogeneous electron-ion quantum magnetoplasma

NASA Astrophysics Data System (ADS)

Mirza, Arshad M.; Masood, W.

2011-12-01

Nonlinear equations governing the dynamics of finite amplitude drift-ion acoustic-waves are derived by taking into account sheared ion flows parallel and perpendicular to the ambient magnetic field in a quantum magnetoplasma comprised of electrons and ions. It is shown that stationary solution of the nonlinear equations can be represented in the form of a tripolar vortex for specific profiles of the equilibrium sheared flows. The tripolar vortices are, however, observed to form on very short scales in dense quantum plasmas. The relevance of the present investigation with regard to dense astrophysical environments is also pointed out.
Parallel Quantum Circuit in a Tunnel Junction

NASA Astrophysics Data System (ADS)

Faizy Namarvar, Omid; Dridi, Ghassen; Joachim, Christian; GNS theory Group Team

Between two metallic nanopads, adding identical and independent electron transfer paths in parallel increases the effective electronic coupling between the two nanopads through the quantum circuit defined by those paths. Measuring this increase of effective coupling using the tunnelling current intensity can lead, for example for 2 paths in parallel, to the now standard G = G1 + G2 + 2√(G1·G2) conductance superposition law (1). This is only valid for the tunnelling regime (2). For large electronic coupling to the nanopads (or at resonance), G can saturate and even decay as a function of the number of parallel paths added in the quantum circuit (3). We provide here the explanation of this phenomenon: the measurement of the effective Rabi oscillation frequency using the current intensity is constrained by the normalization principle of quantum mechanics. This limits the quantum conductance G for example to go when there is only one channel per metallic nanopads. This effect has important consequences for the design of Boolean logic gates at the atomic scale using atomic scale or intramolecular circuits. References: This has the financial support by European PAMS project.
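The two-path superposition law quoted in this abstract can be compared numerically with the classical rule for conductances in parallel. The sketch below assumes the tunnelling regime, where the abstract says the law holds; the function names and the example values are illustrative.

```python
import math


def tunnel_superposition(g1, g2):
    """Two-path tunnelling-regime law: G = G1 + G2 + 2*sqrt(G1*G2)."""
    return g1 + g2 + 2.0 * math.sqrt(g1 * g2)


def classical_parallel(g1, g2):
    """Classical (Kirchhoff) rule: conductances in parallel simply add."""
    return g1 + g2


if __name__ == "__main__":
    g = 1.0e-3  # conductance of one path, arbitrary units
    print(tunnel_superposition(g, g))   # four times the single-path value
    print(classical_parallel(g, g))     # twice the single-path value
```

The law is equivalent to G = (√G1 + √G2)², so two identical paths give four times the single-path conductance rather than the classical factor of two.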
Photon escape probabilities in a semi-infinite plane-parallel medium. [from electron plasma surrounding galactic X-ray sources

NASA Technical Reports Server (NTRS)

Williams, A. C.; Elsner, R. F.; Weisskopf, M. C.; Darbro, W.

1984-01-01

It is shown in this work how to obtain the probabilities of photons escaping from a cold electron plasma environment after having undergone an arbitrary number of scatterings. This is done by retaining the exact differential cross section for Thomson scattering as opposed to using its polarization and angle averaged form. The results are given in the form of recursion relations. The geometry used is the semi-infinite plane-parallel geometry with a photon source located on a plane at an arbitrary optical depth below the surface. Analytical expressions are given for the probabilities which are accurate over a wide range of initial optical depth. These results can be used to model compact X-ray galactic sources which are surrounded by an electron-rich plasma.

On the wall perturbation correction for a parallel-plate NACP-02 chamber in clinical electron beams.

PubMed

Zink, K; Wulff, J

2011-02-01

In recent years, several Monte Carlo studies have been published concerning the perturbation corrections of a parallel-plate chamber in clinical electron beams. In these studies, a strong depth dependence of the relevant correction factors (p_wall and p_cav) for depths beyond the reference depth is recognized, and it has been shown that the variation with depth is sensitive to the choice of the chamber's effective point of measurement. Recommendations concerning the positioning of parallel-plate ionization chambers in clinical electron beams are not the same for all current dosimetry protocols. The IAEA TRS-398 as well as the IPEM protocol and the German protocol DIN 6800-2 interpret the depth of measurement within the phantom as the water equivalent depth, i.e., the nonwater equivalence of the entrance window has to be accounted for by shifting the chamber by an amount Δz. This positioning should ensure that the primary electrons traveling from the surface of the water phantom through the entrance window to the chamber's reference point sustain the same energy loss as the primary electrons in the undisturbed phantom. The objective of the present study is the determination of the shift Δz for a NACP-02 chamber and the calculation of the resulting wall perturbation correction as a function of depth. Moreover, the contributions of the different chamber walls to the wall perturbation correction are identified. The dose and fluence within the NACP-02 chamber and a wall-less air cavity are calculated using the Monte Carlo code EGSnrc in a water phantom at different depths for different clinical electron beams. In order to determine the necessary shift to account for the nonwater equivalence of the entrance window, the chamber is shifted in steps Δz around the depth of measurement. The optimal shift Δz is determined from a comparison of the spectral fluence within the chamber and the bare cavity. The wall perturbation correction is calculated as the ratio
Efficient, massively parallel eigenvalue computation

NASA Technical Reports Server (NTRS)

Huo, Yan; Schreiber, Robert

1993-01-01

In numerical simulations of disordered electronic systems, one of the most common approaches is to diagonalize random Hamiltonian matrices and to study the eigenvalues and eigenfunctions of a single electron in the presence of a random potential. An effort to implement a matrix diagonalization routine for real symmetric dense matrices on massively parallel SIMD computers, the MasPar MP-1 and MP-2 systems, is described. Results of numerical tests and timings are also presented.
Antiresonance and decoupling in electronic transport through parallel-coupled quantum-dot structures with laterally-coupled Majorana zero modes

NASA Astrophysics Data System (ADS)

Zhang, Ya-Jing; Zhang, Lian-Lian; Jiang, Cui; Gong, Wei-Jiang

2018-02-01

We theoretically investigate the electronic transport through a parallel-coupled multi-quantum-dot system, in which the terminal dots of a one-dimensional quantum-dot chain are embodied in the two arms of an Aharonov-Bohm interferometer. It is found that in structures with an odd (even) number of dots, all the even (odd) molecular states have the opportunity to decouple from the leads, and in this process antiresonances occur at positions that coincide with the odd (even)-numbered eigenenergies of the sub-molecule without terminal dots. Next, when Majorana zero modes are introduced to couple laterally to the terminal dots, the antiresonance and decoupling phenomena still coexist in the quantum transport process. Such a result can be helpful in understanding the special influence of Majorana zero modes on the electronic transport through quantum-dot systems.
In-situ Isotopic Analysis at Nanoscale using Parallel Ion Electron Spectrometry: A Powerful New Paradigm for Correlative Microscopy

NASA Astrophysics Data System (ADS)

Yedra, Lluís; Eswara, Santhana; Dowsett, David; Wirtz, Tom

2016-06-01

Isotopic analysis is of paramount importance across the entire gamut of scientific research. To advance the frontiers of knowledge, a technique for nanoscale isotopic analysis is indispensable. Secondary Ion Mass Spectrometry (SIMS) is a well-established technique for analyzing isotopes, but its spatial resolution is fundamentally limited. Transmission Electron Microscopy (TEM) is a well-known method for high-resolution imaging down to the atomic scale. However, isotopic analysis in TEM is not possible. Here, we introduce a powerful new paradigm for in-situ correlative microscopy called the Parallel Ion Electron Spectrometry by synergizing SIMS with TEM. We demonstrate this technique by distinguishing lithium carbonate nanoparticles according to the isotopic label of lithium, viz. ⁶Li and ⁷Li, and imaging them at high resolution by TEM, adding a new dimension to correlative microscopy.
Effect of parallel electric fields on the ponderomotive stabilization of MHD instabilities

DOE Office of Scientific and Technical Information (OSTI.GOV)

Litwin, C.; Hershkowitz, N.

The contribution of the wave electric field component E_∥, parallel to the magnetic field, to the ponderomotive stabilization of curvature driven instabilities is evaluated and compared to the transverse component contribution. For the experimental density range, in which the stability is primarily determined by the m = 1 magnetosonic wave, this contribution is found to be dominant and stabilizing when the electron temperature is neglected. For sufficiently high electron temperatures the dominant fast wave is found to be axially evanescent. In the same limit, E_∥ becomes radially oscillating. It is concluded that the increased electron temperature near the plasma surface reduces the magnitude of ponderomotive effects.
One-dimension modeling on the parallel-plate ion extraction process based on a non-electron-equilibrium fluid model

NASA Astrophysics Data System (ADS)

Li, He-Ping; Chen, Jian; Guo, Heng; Jiang, Dong-Jun; Zhou, Ming-Sheng; Department of Engineering Physics Team

2017-10-01

Ion extraction from a plasma under an externally applied electric field involves multi-particle and multi-field interactions, and has wide applications in the fields of materials processing, etching, chemical analysis, etc. In order to develop high-efficiency ion extraction methods, it is indispensable to establish a feasible model to understand the non-equilibrium transport processes of the charged particles and the evolution of the space charge sheath during the extraction process. Most previous studies of the ion extraction process are based on the electron-equilibrium fluid model, which assumes that the electrons are in the thermodynamic equilibrium state. However, neglecting the electron movement during the sheath formation process may lead to some confusion. In this study, a non-electron-equilibrium model is established to describe the transport of the charged particles in a parallel-plate ion extraction process. The numerical results show that the formation of the Child-Langmuir sheath is mainly caused by the charge separation. Thus, the sheath shielding effect will be significantly weakened if the charge separation is suppressed during the extraction process of the charged particles.
Solution-phase parallel synthesis of aryloxyimino amides via a novel multicomponent reaction among aromatic (Z)-chlorooximes, isocyanides, and electron-deficient phenols.

PubMed

Mercalli, Valentina; Giustiniano, Mariateresa; Del Grosso, Erika; Varese, Monica; Cassese, Hilde; Massarotti, Alberto; Novellino, Ettore; Tron, Gian Cesare

2014-11-10

A library of 41 aryloxyimino amides was prepared via solution phase parallel synthesis by extending the multicomponent reaction of (Z)-chlorooximes and isocyanides to the use of electron-deficient phenols. The resulting aryloxyiminoamide derivatives can be used as intermediates for the synthesis of benzo[d]isoxazole-3-carboxamides, dramatically reducing the number of synthetic steps required by other methods reported in literature.
Nongyrotropic Electrons in Guide Field Reconnection

NASA Technical Reports Server (NTRS)

Wendel, D. E.; Hesse, M.; Bessho, N.; Adrian, M. L.; Kuznetsova, M.

2016-01-01

We apply a scalar measure of nongyrotropy to the electron pressure tensor in a 2D particle-in-cell simulation of guide field reconnection and assess the corresponding electron distributions and the forces that account for the nongyrotropy. The scalar measure reveals that the nongyrotropy lies in bands that straddle the electron diffusion region and the separatrices, in the same regions where there are parallel electric fields. Analysis of electron distributions and fields shows that the nongyrotropy along the inflow and outflow separatrices emerges as a result of multiple populations of electrons influenced differently by large and small-scale parallel electric fields and by gradients in the electric field. The relevant parallel electric fields include large-scale potential ramps emanating from the x-line and sub-ion inertial scale bipolar electron holes. Gradients in the perpendicular electric field modify electrons differently depending on their phase, thus producing nongyrotropy. Magnetic flux violation occurs along portions of the separatrices that coincide with the parallel electric fields. An inductive electric field in the electron E×B drift frame thus develops, which has the effect of enhancing nongyrotropies already produced by other mechanisms and under certain conditions producing their own nongyrotropy. Particle tracing of electrons from nongyrotropic populations along the inflows and outflows shows that the striated structure of nongyrotropy corresponds to electrons arriving from different source regions. We also show that the relevant parallel electric fields receive important contributions not only from the nongyrotropic portion of the electron pressure tensor but from electron spatial and temporal inertial terms as well.
Military Curricula for Vocational & Technical Education. Basic Electricity and Electronics Individualized Learning System. CANTRAC A-100-0010. Module Six: Parallel Circuits. Study Booklet.

ERIC Educational Resources Information Center

Chief of Naval Education and Training Support, Pensacola, FL.

This individualized learning module on parallel circuits is one in a series of modules for a course in basic electricity and electronics. The course is one of a number of military-developed curriculum packages selected for adaptation to vocational instructional and curriculum development in a civilian setting. Four lessons are included in the…
Magnitude of parallel pseudo potential in a magnetosonic shock wave

NASA Astrophysics Data System (ADS)

Ohsawa, Yukiharu

2018-05-01

The parallel pseudo potential F, which is the integral of the parallel electric field along the magnetic field, in a large-amplitude magnetosonic pulse (shock wave) is theoretically studied. Particle simulations revealed in the late 1990's that the product of the elementary charge and F can be much larger than the electron temperature in shock waves, i.e., the parallel electric field can be quite strong. However, no theory was presented for this unexpected result. This paper first revisits the small-amplitude theory for F and then investigates the parallel pseudo potential F in large-amplitude pulses based on the two-fluid model with finite thermal pressures. It is found that the magnitude of F in a shock wave is determined by the wave amplitude, the electron temperature, and the kinetic energy of an ion moving with the Alfvén speed. This theoretically obtained expression for F is nearly identical to the empirical relation for F discovered in the previous simulation work.
Parallel Electric Field on Auroral Magnetic Field Lines.

NASA Astrophysics Data System (ADS)

Yeh, Huey-Ching Betty

1982-03-01

The interaction of Birkeland (magnetic-field-aligned) current carriers and the Earth's magnetic field results in electrostatic potential drops along magnetic field lines. The statistical distributions of the field-aligned potential difference φ_∥ were determined from the energy spectra of electron inverted "V" events observed at ionospheric altitude for different conditions of geomagnetic activity as indicated by the AE index. Data of 1270 electron inverted "V"'s were obtained from Low-Energy Electron measurements of the Atmosphere Explorer-C and -D Satellite (despun mode) in the interval January 1974-April 1976. In general, φ_∥ is largest in the dusk to pre-midnight sector, smaller in the post-midnight to dawn sector, and smallest in the near noon sector during quiet and disturbed geomagnetic conditions; there is a steady dusk-dawn-noon asymmetry of the global φ_∥ distribution. As the geomagnetic activity level increases, the φ_∥ pattern expands to lower invariant latitudes, and the magnitude of φ_∥ in the 13-24 magnetic local time sector increases significantly. The spatial structure and intensity variation of the global φ_∥ distribution are statistically more variable, and the magnitudes of φ_∥ have smaller correlation with the AE-index, in the post-midnight to dawn sector. A strong correlation is found to exist between upward Birkeland current systems and global parallel potential drops, and between auroral electron precipitation patterns and parallel potential drops, regarding their morphology, their intensity and their dependence on geomagnetic activity. An analysis of the fine-scale simultaneous current-voltage relationship for upward Birkeland currents in Region 1 shows that typical field-aligned potential drops are consistent with model predictions based on linear acceleration of the charge carriers through an electrostatic potential drop along convergent magnetic field
Massively parallel implementations of coupled-cluster methods for electron spin resonance spectra. I. Isotropic hyperfine coupling tensors in large radicals

DOE Office of Scientific and Technical Information (OSTI.GOV)

Verma, Prakash; Morales, Jorge A., E-mail: jorge.morales@ttu.edu; Perera, Ajith

2013-11-07

Coupled cluster (CC) methods provide highly accurate predictions of molecular properties, but their high computational cost has precluded their routine application to large systems. Fortunately, recent computational developments in the ACES III program by the Bartlett group [the OED/ERD atomic integral package, the super instruction processor, and the super instruction architecture language] permit overcoming that limitation by providing a framework for massively parallel CC implementations. In that scheme, we are further extending those parallel CC efforts to systematically predict the three main electron spin resonance (ESR) tensors (A-, g-, and D-tensors) to be reported in a series of papers. In this paper inaugurating that series, we report our new ACES III parallel capabilities that calculate isotropic hyperfine coupling constants in 38 neutral, cationic, and anionic radicals that include the ¹¹B, ¹⁷O, ⁹Be, ¹⁹F, ¹H, ¹³C, ³⁵Cl, ³³S, ¹⁴N, ³¹P, and ⁶⁷Zn nuclei. Present parallel calculations are conducted at the Hartree-Fock (HF), second-order many-body perturbation theory [MBPT(2)], CC singles and doubles (CCSD), and CCSD with perturbative triples [CCSD(T)] levels using Roos augmented double- and triple-zeta atomic natural orbitals basis sets. HF results consistently overestimate isotropic hyperfine coupling constants. However, inclusion of electron correlation effects in the simplest way via MBPT(2) provides significant improvements in the predictions, but not without occasional failures. In contrast, CCSD results are consistently in very good agreement with experimental results. Inclusion of perturbative triples to CCSD via CCSD(T) leads to small improvements in the predictions, which might not compensate for the extra computational effort at a non-iterative N⁷-scaling in CCSD(T). The importance of these accurate computations of isotropic hyperfine coupling constants to elucidate
The Goddard Space Flight Center Program to develop parallel image processing systems

NASA Technical Reports Server (NTRS)

Schaefer, D. H.

1972-01-01

Parallel image processing, defined as image processing in which all points of an image are operated upon simultaneously, is discussed. Coherent optical, noncoherent optical, and electronic methods are considered as parallel image processing techniques.
Parallel pivoting combined with parallel reduction

NASA Technical Reports Server (NTRS)

Alaghband, Gita

1987-01-01

Parallel algorithms for triangularization of large, sparse, and unsymmetric matrices are presented. The method combines the parallel reduction with a new parallel pivoting technique, control over generations of fill-ins and a check for numerical stability, all done in parallel with the work being distributed over the active processes. The parallel technique uses the compatibility relation between pivots to identify parallel pivot candidates and uses the Markowitz number of pivots to minimize fill-in. This technique is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds.
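The two ingredients named in this abstract, Markowitz numbers and a compatibility relation between pivots, can be sketched on a sparsity pattern alone. The sketch rests on assumptions that the abstract does not spell out. The Markowitz number of entry (i, j) is taken as (r_i - 1)(c_j - 1), where r_i and c_j count the nonzeros in row i and column j. Two pivots are taken to be compatible when they share no row or column and the two cross entries are zero. The greedy selection is illustrative, and the numerical stability check mentioned in the abstract is omitted.

```python
def markowitz_counts(nonzeros):
    """Markowitz number (r_i - 1) * (c_j - 1) for every nonzero (i, j)."""
    rows, cols = {}, {}
    for i, j in nonzeros:
        rows[i] = rows.get(i, 0) + 1
        cols[j] = cols.get(j, 0) + 1
    return {(i, j): (rows[i] - 1) * (cols[j] - 1) for i, j in nonzeros}


def compatible(p, q, nonzeros):
    """Assumed compatibility test: eliminating with p leaves q's row and column untouched."""
    (i, j), (k, l) = p, q
    return (i != k and j != l
            and (i, l) not in nonzeros
            and (k, j) not in nonzeros)


def parallel_pivot_set(nonzeros):
    """Greedily collect mutually compatible pivots, lowest Markowitz number first."""
    counts = markowitz_counts(nonzeros)
    chosen = []
    for cand in sorted(nonzeros, key=lambda e: (counts[e], e)):
        if all(compatible(cand, p, nonzeros) for p in chosen):
            chosen.append(cand)
    return chosen


if __name__ == "__main__":
    # Sparsity pattern of a small 4x4 unsymmetric matrix (illustrative).
    pattern = {(0, 0), (0, 2), (1, 1), (2, 0), (2, 2), (2, 3), (3, 1), (3, 3)}
    print(parallel_pivot_set(pattern))
```

The pivots returned for one step could be eliminated concurrently; a full factorization would update the pattern with any fill-in and repeat the selection, consistent with the abstract's point that the technique is applied dynamically rather than as a preordering.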
A Parallel Genetic Algorithm for Automated Electronic Circuit Design

NASA Technical Reports Server (NTRS)

Lohn, Jason D.; Colombano, Silvano P.; Haith, Gary L.; Stassinopoulos, Dimitris; Norvig, Peter (Technical Monitor)

2000-01-01

We describe a parallel genetic algorithm (GA) that automatically generates circuit designs using evolutionary search. A circuit-construction programming language is introduced and we show how evolution can generate practical analog circuit designs. Our system allows circuit size (number of devices), circuit topology, and device values to be evolved. We present experimental results as applied to analog filter and amplifier design tasks.
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe

A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals for the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. In conclusion, the chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations

DOE PAGES

Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe; ...

2017-11-14

A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals for the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. In conclusion, the chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations

NASA Astrophysics Data System (ADS)

Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe; Gagliardi, Laura; de Jong, Wibe A.

2017-11-01

A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals for the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. The chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.
Parallel 3D Finite Element Numerical Modelling of DC Electron Guns

DOE Office of Scientific and Technical Information (OSTI.GOV)

Prudencio, E.; Candel, A.; Ge, L.

2008-02-04

In this paper we present Gun3P, a parallel 3D finite element application that the Advanced Computations Department at the Stanford Linear Accelerator Center is developing for the analysis of beam formation in DC guns and beam transport in klystrons. Gun3P is targeted specially to complex geometries that cannot be described by 2D models and cannot be easily handled by finite difference discretizations. Its parallel capability allows simulations with more accuracy and less processing time than packages currently available. We present simulation results for the L-band Sheet Beam Klystron DC gun, in which case Gun3P is able to reduce simulation time from days to some hours.
Influence of a parallel magnetic field on the microwave photoconductivity in a high-mobility two-dimensional electron system

NASA Astrophysics Data System (ADS)

Yang, C. L.; Du, R. R.; Pfeiffer, L. N.; West, K. W.

2006-07-01

Using a two-axis magnet, we have studied experimentally the influence of a parallel magnetic field (B//) on microwave-induced resistance oscillations (MIROs) and zero-resistance states (ZRS) previously discovered in a high-mobility two-dimensional electron system. We have observed a strong suppression of MIRO/ZRS by a modest B// ~ 1 T. In Hall bar samples, magnetoplasmon resonance (MPR) has also been observed concurrently with the MIRO/ZRS. In contrast to the suppression of MIRO/ZRS, the MPR peak is apparently enhanced by B//. These findings cannot be explained by a simple modification of single-particle energy spectrum and/or scattering parameters by B//.

DGDFT: A massively parallel method for large scale density functional theory calculations.

PubMed

Hu, Wei; Lin, Lin; Yang, Chao

2015-09-28

We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10^-4 Hartree/atom in terms of the error of energy and 6.2 × 10^-4 Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14 000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.
Effect of parallel refraction on magnetospheric upper hybrid waves

NASA Technical Reports Server (NTRS)

Engel, J.; Kennel, C. F.

1984-01-01

Large amplitude (not less than 10 mV/m) electrostatic plasma waves near the upper hybrid (UH) frequency have been observed from 0 to 50 deg magnetic latitude (MLAT) during satellite plasma-pause crossings. A three-dimensional numerical ray-tracing calculation, based on an electron distribution measured during a GEOS 1 dayside intense upper-hybrid wave event, suggests how UH waves might achieve such large amplitudes away from the geomagnetic equator. Refractive effects largely control the wave amplification and, in particular, the unavoidable refraction due to parallel geomagnetic field gradients restricts growth to levels below those observed. However, a cold electron density gradient parallel to the field can lead to upper hybrid wave growth that can account for the observed emission levels.
Massively parallel sparse matrix function calculations with NTPoly

NASA Astrophysics Data System (ADS)

Dawson, William; Nakajima, Takahito

2018-04-01

We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization-free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication-avoiding sparse matrix multiplication algorithm. OpenMP task parallelization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.
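The idea of evaluating a matrix function through a polynomial expansion built only from sparse matrix products can be sketched as follows. This is a minimal illustration, not NTPoly's interface: the function names, the dictionary storage format, the truncated Taylor series for exp(A), and the drop threshold are all assumptions chosen for brevity, and production libraries typically prefer better-conditioned expansions.

```python
def sparse_matmul(a, b, threshold=1e-12):
    """Multiply two sparse matrices stored as {(row, col): value} dicts.

    Hypothetical helper for illustration; entries below `threshold`
    are dropped so that the product stays sparse.
    """
    # Index b by row so each nonzero of a only meets matching nonzeros of b.
    b_rows = {}
    for (k, j), v in b.items():
        b_rows.setdefault(k, []).append((j, v))
    out = {}
    for (i, k), av in a.items():
        for j, bv in b_rows.get(k, []):
            out[(i, j)] = out.get((i, j), 0.0) + av * bv
    return {ij: v for ij, v in out.items() if abs(v) > threshold}


def sparse_exp(a, n, terms=20):
    """Approximate exp(A) for an n x n sparse matrix by a truncated Taylor series."""
    result = {(i, i): 1.0 for i in range(n)}  # identity: the k = 0 term
    term = dict(result)
    for k in range(1, terms):
        # term_k = term_{k-1} * A / k, i.e. A^k / k!
        term = {ij: v / k for ij, v in sparse_matmul(term, a).items()}
        for ij, v in term.items():
            result[ij] = result.get(ij, 0.0) + v
    return result


# Example: A = [[0, 1], [1, 0]] has exp(A) = [[cosh 1, sinh 1], [sinh 1, cosh 1]].
expA = sparse_exp({(0, 1): 1.0, (1, 0): 1.0}, n=2)
```

Because every step is a sparse product followed by thresholding, the cost is governed by the number of retained nonzeros rather than by the full matrix dimension, which is the property the abstract relies on.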
Multi-threading: A new dimension to massively parallel scientific computation

NASA Astrophysics Data System (ADS)

Nielsen, Ida M. B.; Janssen, Curtis L.

2000-06-01

Multi-threading is becoming widely available for Unix-like operating systems, and the application of multi-threading opens new ways for performing parallel computations with greater efficiency. We here briefly discuss the principles of multi-threading and illustrate the application of multi-threading for a massively parallel direct four-index transformation of electron repulsion integrals. Finally, other potential applications of multi-threading in scientific computing are outlined.
DL_MG: A Parallel Multigrid Poisson and Poisson-Boltzmann Solver for Electronic Structure Calculations in Vacuum and Solution.

PubMed

Womack, James C; Anton, Lucian; Dziedzic, Jacek; Hasnip, Phil J; Probert, Matt I J; Skylaris, Chris-Kriton

2018-03-13

The solution of the Poisson equation is a crucial step in electronic structure calculations, yielding the electrostatic potential, a key component of the quantum mechanical Hamiltonian. In recent decades, theoretical advances and increases in computer performance have made it possible to simulate the electronic structure of extended systems in complex environments. This requires the solution of more complicated variants of the Poisson equation, featuring nonhomogeneous dielectric permittivities, ionic concentrations with nonlinear dependencies, and diverse boundary conditions. The analytic solutions generally used to solve the Poisson equation in vacuum (or with homogeneous permittivity) are not applicable in these circumstances, and numerical methods must be used. In this work, we present DL_MG, a flexible, scalable, and accurate solver library, developed specifically to tackle the challenges of solving the Poisson equation in modern large-scale electronic structure calculations on parallel computers. Our solver is based on the multigrid approach and uses an iterative high-order defect correction method to improve the accuracy of solutions. Using two chemically relevant model systems, we tested the accuracy and computational performance of DL_MG when solving the generalized Poisson and Poisson-Boltzmann equations, demonstrating excellent agreement with analytic solutions and efficient scaling to ∼10^9 unknowns and 100s of CPU cores. We also applied DL_MG in actual large-scale electronic structure calculations, using the ONETEP linear-scaling electronic structure package to study a 2615 atom protein-ligand complex with routinely available computational resources. In these calculations, the overall execution time with DL_MG was not significantly greater than the time required for calculations using a conventional FFT-based solver.
Partitioning of electron flux between the respiratory chains of the yeast Candida parapsilosis: parallel working of the two chains.

PubMed

Guerin, M G; Camougrand, N M

1994-02-08

Partitioning of the electron flux between the classical and the alternative respiratory chains of the yeast Candida parapsilosis was measured as a function of the oxidation rate and of the Q-pool redox poise. At low respiration rate, electrons from external NADH travelled preferentially through the alternative pathway as indicated by the antimycin A-insensitivity of electron flow. Inhibition of the alternative pathway by SHAM restored full antimycin A-sensitivity to the remaining electron flow. The dependence of the respiratory rate on the redox poise of the quinone pool was investigated when the electron flux was mediated either by the main respiratory chain (growth in the absence of antimycin A) or by the second respiratory chain (growth in the presence of antimycin A). In the former case, a linear relationship was found between these two parameters. In contrast, in the latter case, the relationship between Q-pool reduction level and electron flux was non-linear, but it could be resolved into two distinct curves. This second quinone is not reducible in the presence of antimycin A but only in the presence of high concentrations of myxothiazol or cyanide. Since two quinone species exist in C. parapsilosis, UQ9 and Qx (C33H54O4), we hypothesized that these two curves could correspond to the functioning of the second quinone engaged during the alternative pathway activity. Partitioning of electrons between both respiratory chains could occur upstream of complex III with the second chain functioning in parallel to the main one, and with the additional possibility of merging into the main one at the complex IV level.
Monte Carlo study of electron relaxation in graphene with spin polarized, degenerate electron gas in presence of electron-electron scattering

NASA Astrophysics Data System (ADS)

Borowik, Piotr; Thobel, Jean-Luc; Adamowicz, Leszek

2017-12-01

The Monte Carlo simulation method is applied to study the relaxation of excited electrons in monolayer graphene. The presence of a spin-polarized background electron population, with density corresponding to highly degenerate conditions, is assumed. Formulas for electron-electron scattering rates that properly account for the presence of electrons in the two energetically degenerate, inequivalent valleys of this material are presented. The electron relaxation process can be divided into two phases, thermalization and cooling, which can be clearly distinguished when examining the standard deviation of the electron energy distribution. The influence of the exchange effect in interactions between electrons with parallel spins is shown to be important only in transient conditions, especially during the thermalization phase.
Rocket measurement of auroral partial parallel distribution functions

NASA Astrophysics Data System (ADS)

Lin, C.-A.

1980-01-01

The auroral partial parallel distribution functions are obtained by using the observed energy spectra of electrons. The experiment package was launched by a Nike-Tomahawk rocket from Poker Flat, Alaska over a bright auroral band and covered an altitude range of up to 180 km. Calculated partial distribution functions are presented with emphasis on their slopes. The implications of the slopes are discussed. It should be pointed out that the slope of the partial parallel distribution function obtained from one energy spectrum will be changed by superposing another energy spectrum on it.
Electronics Book II.

ERIC Educational Resources Information Center

Johnson, Dennis; And Others

This manual, the second of three curriculum guides for an electronics course, is intended for use in a program combining vocational English as a second language (VESL) with bilingual vocational education. Ten units cover the electrical team, Ohm's law, Watt's law, series resistive circuits, parallel resistive circuits, series parallel circuits,…
Parallel, distributed and GPU computing technologies in single-particle electron microscopy.

PubMed

Schmeisser, Martin; Heisen, Burkhard C; Luettich, Mario; Busche, Boris; Hauer, Florian; Koske, Tobias; Knauber, Karl-Heinz; Stark, Holger

2009-07-01

Most known methods for the determination of the structure of macromolecular complexes are limited or at least restricted at some point by their computational demands. Recent developments in information technology such as multicore, parallel and GPU processing can be used to overcome these limitations. In particular, graphics processing units (GPUs), which were originally developed for rendering real-time effects in computer games, are now ubiquitous and provide unprecedented computational power for scientific applications. Each parallel-processing paradigm alone can improve overall performance; the increased computational performance obtained by combining all paradigms, unleashing the full power of today's technology, makes certain applications feasible that were previously virtually impossible. In this article, state-of-the-art paradigms are introduced, the tools and infrastructure needed to apply these paradigms are presented and a state-of-the-art infrastructure and solution strategy for moving scientific applications to the next generation of computer hardware is outlined.
Characterization of beryllium-boron-bearing materials by Parallel Electron Energy-Loss Spectroscopy (PEELS)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Garvie, L.A.J.; Buseck, P.R.; Rez, P.

The stoichiometry of a set of commercial and laboratory-synthesized Be-B-bearing materials was determined by parallel electron energy-loss spectroscopy (PEELS). The Be and B K edges are well separated in energy, allowing easy determination of the elemental ratios. PEELS analyses of materials with reported compositions of "Be2B" and "Be2B3" provided stoichiometries near Be2.8B and Be4B5, respectively. We confirmed an earlier report of Be0.6(±0.1)B5.9(±0.7)C, a material with the α-rhombohedral B structure. We also synthesized and characterized Be0.5(±0.1)B4.9(±0.3)N, a new ternary material with the α-rhombohedral B structure with refined hexagonal cell parameters of a_h = 5.487(5) Å and c_h = 12.486(14) Å. By comparison of the features at the core-loss edges of Be0.5(±0.1)B4.9(±0.3)N, we conclude that most N forms N2 pairs between the icosahedra and Be substitutes for B within the icosahedra.
Full Parallel Implementation of an All-Electron Four-Component Dirac-Kohn-Sham Program.

PubMed

Rampino, Sergio; Belpassi, Leonardo; Tarantelli, Francesco; Storchi, Loriano

2014-09-09

A full distributed-memory implementation of the Dirac-Kohn-Sham (DKS) module of the program BERTHA (Belpassi et al., Phys. Chem. Chem. Phys. 2011, 13, 12368-12394) is presented, where the self-consistent field (SCF) procedure is replicated on all the parallel processes, each process working on subsets of the global matrices. The key feature of the implementation is an efficient procedure for switching between two matrix distribution schemes, one (integral-driven) optimal for the parallel computation of the matrix elements and another (block-cyclic) optimal for the parallel linear algebra operations. This approach, making both CPU-time and memory scalable with the number of processors used, virtually overcomes at once both time and memory barriers associated with DKS calculations. Performance, portability, and numerical stability of the code are illustrated on the basis of test calculations on three gold clusters of increasing size, an organometallic compound, and a perovskite model. The calculations are performed on a Beowulf and a BlueGene/Q system.
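The block-cyclic scheme mentioned above is the standard layout used by dense parallel linear algebra libraries: the matrix is cut into fixed-size blocks that are dealt out to processes round-robin along each dimension. A minimal sketch of the index mapping follows, assuming 0-based indices and that the first block sits on process 0. The function names are hypothetical and this is not BERTHA's code.

```python
def block_cyclic_owner(g, block, nprocs):
    """Map a 0-based global index to (owning process, local index).

    Blocks of `block` consecutive indices are dealt to processes
    0, 1, ..., nprocs - 1 in round-robin order.
    """
    block_id, offset = divmod(g, block)
    owner = block_id % nprocs          # which process holds this block
    local_block = block_id // nprocs   # how many blocks precede it locally
    return owner, local_block * block + offset


def owner_2d(i, j, block, prow, pcol):
    """Process-grid coordinates owning element (i, j) on a prow x pcol grid."""
    return (block_cyclic_owner(i, block, prow)[0],
            block_cyclic_owner(j, block, pcol)[0])


# With blocks of 2 on 3 processes, indices 0..7 are owned by 0,0,1,1,2,2,0,0.
owners = [block_cyclic_owner(g, 2, 3)[0] for g in range(8)]
```

Dealing blocks cyclically keeps the load balanced as factorizations shrink the active part of the matrix, which is why it suits the linear algebra phase, whereas an integral-driven layout follows the cost of computing the matrix elements instead.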
Flexure mechanism-based parallelism measurements for chip-on-glass bonding

NASA Astrophysics Data System (ADS)

Jung, Seung Won; Yun, Won Soo; Jin, Songwan; Kim, Bo Sun; Jeong, Young Hun

2011-08-01

Recently, liquid crystal displays (LCDs) have played vital roles in a variety of electronic devices such as televisions, cellular phones, and desktop/laptop monitors because of their enhanced volume, performance, and functionality. However, there is still a need for thinner LCD panels due to the trend of miniaturization in electronic applications. Thus, chip-on-glass (COG) bonding has become one of the most important aspects in the LCD panel manufacturing process. In this study, a novel sensor was developed to measure the parallelism between the tooltip planes of the bonding head and the backup of the COG main bonder, which has previously been estimated by prescale pressure films in industry. The sensor developed in this study is based on a flexure mechanism, and it can measure the total pressing force and the inclination angles in two directions that satisfy the quantitative definition of parallelism. To improve the measurement accuracy, the sensor was calibrated based on the estimation of the total pressing force and the inclination angles using the least-squares method. To verify the accuracy of the sensor, the estimation results for parallelism were compared with those from prescale pressure film measurements. In addition, the influence of parallelism on the bonding quality was experimentally demonstrated. The sensor was successfully applied to the measurement of parallelism in the COG-bonding process with an accuracy of more than three times that of the conventional method using prescale pressure films.
Parallelized direct execution simulation of message-passing parallel programs

NASA Technical Reports Server (NTRS)

Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

1994-01-01

As massively parallel computers proliferate, there is growing interest in finding ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
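The discrete-event part of such an approach can be illustrated with a toy sequential event loop in which each message send schedules a matching receive after a fixed network latency. This is only a sketch of the general technique: the event tuple layout, the constant-latency network model, and the function name are assumptions, and it does not reproduce LAPSE or its parallelization.

```python
import heapq


def simulate(events, latency):
    """Process (time, kind, src, dst) events in timestamp order.

    A "send" at time t schedules a "recv" at time t + latency, standing in
    for a (hypothetical) constant-latency network model. Returns the events
    in the order the simulator processed them.
    """
    queue = list(events)
    heapq.heapify(queue)  # priority queue ordered by timestamp
    log = []
    while queue:
        time, kind, src, dst = heapq.heappop(queue)
        log.append((time, kind, src, dst))
        if kind == "send":
            heapq.heappush(queue, (time + latency, "recv", src, dst))
    return log


# Two sends issued out of order are replayed in simulated-time order.
trace = simulate([(2.0, "send", 1, 0), (0.0, "send", 0, 1)], latency=1.5)
```

In a direct-execution setting the "send" timestamps would come from timing the real application code, while the simulator supplies only the machine-dependent delays.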
Parallel, distributed and GPU computing technologies in single-particle electron microscopy

PubMed Central

Schmeisser, Martin; Heisen, Burkhard C.; Luettich, Mario; Busche, Boris; Hauer, Florian; Koske, Tobias; Knauber, Karl-Heinz; Stark, Holger

2009-01-01

Most known methods for the determination of the structure of macromolecular complexes are limited or at least restricted at some point by their computational demands. Recent developments in information technology such as multicore, parallel and GPU processing can be used to overcome these limitations. In particular, graphics processing units (GPUs), which were originally developed for rendering real-time effects in computer games, are now ubiquitous and provide unprecedented computational power for scientific applications. Each parallel-processing paradigm alone can improve overall performance; the increased computational performance obtained by combining all paradigms, unleashing the full power of today’s technology, makes certain applications feasible that were previously virtually impossible. In this article, state-of-the-art paradigms are introduced, the tools and infrastructure needed to apply these paradigms are presented and a state-of-the-art infrastructure and solution strategy for moving scientific applications to the next generation of computer hardware is outlined. PMID:19564686
Studies in optical parallel processing. [All optical and electro-optic approaches]

NASA Technical Reports Server (NTRS)

Lee, S. H.

1978-01-01

Threshold and A/D devices for converting a gray scale image into a binary one were investigated for all-optical and opto-electronic approaches to parallel processing. Integrated optical logic circuits (IOC) and optical parallel logic devices (OPAL) were studied as an approach to processing optical binary signals. In the IOC logic scheme, a single row of an optical image is coupled into the IOC substrate at a time through an array of optical fibers. Parallel processing is carried out, on each image element of these rows, in the IOC substrate and the resulting output exits via a second array of optical fibers. The OPAL system for parallel processing, which uses a Fabry-Perot interferometer for image thresholding and analog-to-digital conversion, achieves a higher degree of parallel processing than is possible with IOC.
Magnetospheric Multiscale Observations of the Electron Diffusion Region of Large Guide Field Magnetic Reconnection

NASA Technical Reports Server (NTRS)

Eriksson, S.; Wilder, F. D.; Ergun, R. E.; Schwartz, S. J.; Cassak, P. A.; Burch, J. L.; Chen, Li-Jen; Torbert, R. B.; Phan, T. D.; Lavraud, B.;

2016-01-01

We report observations from the Magnetospheric Multiscale (MMS) satellites of a large guide field magnetic reconnection event. The observations suggest that two of the four MMS spacecraft sampled the electron diffusion region, whereas the other two spacecraft detected the exhaust jet from the event. The guide magnetic field amplitude is approximately 4 times that of the reconnecting field. The event is accompanied by a significant parallel electric field (E∥) that is larger than predicted by simulations. The high-speed (approximately 300 km/s) crossing of the electron diffusion region limited the data set to one complete electron distribution inside of the electron diffusion region, which shows significant parallel heating. The data suggest that E∥ is balanced by a combination of electron inertia and a parallel gradient of the gyrotropic electron pressure.

Parallelism in integrated fluidic circuits

NASA Astrophysics Data System (ADS)

Bousse, Luc J.; Kopf-Sill, Anne R.; Parce, J. W.

1998-04-01

Many research groups around the world are working on integrated microfluidics. The goal of these projects is to automate and integrate the handling of liquid samples and reagents for measurement and assay procedures in chemistry and biology. Ultimately, it is hoped that this will lead to a revolution in chemical and biological procedures similar to that caused in electronics by the invention of the integrated circuit. The optimal size scale of channels for liquid flow is determined by basic constraints to be somewhere between 10 and 100 micrometers. In larger channels, mixing by diffusion takes too long; in smaller channels, the number of molecules present is so low it makes detection difficult. At Caliper, we are making fluidic systems in glass chips with channels in this size range, based on electroosmotic flow, and fluorescence detection. One application of this technology is rapid assays for drug screening, such as enzyme assays and binding assays. A further challenge in this area is to perform multiple functions on a chip in parallel, without a large increase in the number of inputs and outputs. A first step in this direction is a fluidic serial-to-parallel converter. Fluidic circuits will be shown with the ability to distribute an incoming serial sample stream to multiple parallel channels.
NOTE: Calibration of low-energy electron beams from a mobile linear accelerator with plane-parallel chambers using both TG-51 and TG-21 protocols

NASA Astrophysics Data System (ADS)

Beddar, A. S.; Tailor, R. C.

2004-04-01

A new approach to intraoperative radiation therapy led to the development of mobile linear electron accelerators that provide lower electron energy beams than the usual conventional accelerators commonly encountered in radiotherapy. Such mobile electron accelerators produce electron beams that have nominal energies of 4, 6, 9 and 12 MeV. This work compares the absorbed dose output calibrations using both the AAPM TG-51 and TG-21 dose calibration protocols for two types of ion chambers: a plane-parallel (PP) ionization chamber and a cylindrical ionization chamber. Our results indicate that the use of a 'Markus' PP chamber causes 2-3% overestimation in dose output determination if accredited dosimetry-calibration laboratory based chamber factors $\big(N_{D,w}^{{}^{60}\mathrm{Co}}, N_x\big)$ are used. However, if the ionization chamber factors are derived using a cross-comparison at a high-energy electron beam, then a good agreement is obtained (within 1%) with a calibrated cylindrical chamber over the entire energy range down to 4 MeV. Furthermore, even though the TG-51 does not recommend using cylindrical chambers at the low energies, our results show that the cylindrical chamber has a good agreement with the PP chamber not only at 6 MeV but also down to 4 MeV electron beams.
Study on acceleration processes of the radiation belt electrons through interaction with sub-packet chorus waves in parallel propagation

NASA Astrophysics Data System (ADS)

Hiraga, R.; Omura, Y.

2017-12-01

Recent observations show that chorus waves include fine structures such as amplitude fluctuations (i.e. sub-packet structure), and it has not yet been verified in detail how energetic electrons are efficiently accelerated under these wave features. In this study, we first focus on the acceleration process of a single electron: how it experiences an efficient energy increase by interaction with sub-packet chorus waves in parallel propagation along the Earth's magnetic field. In order to reproduce the chorus waves as seen in the latest observations by the Van Allen Probes (Foster et al. 2017), the wave model amplitude in our simulation is structured such that when the wave amplitude nonlinearly grows to reach the optimum amplitude, it starts decreasing until crossing the threshold. Once it crosses the threshold, the wave dissipates and a new wave rises to repeat the nonlinear growth and damping in the same manner. The multiple occurrence of this growth-damping cycle forms a sawtooth-like amplitude variation called a sub-packet. This amplitude variation also affects the wave frequency behavior, which is derived from the chorus wave equations as a function of the wave amplitude (Omura et al. 2009). It is also reasonable to assume that when a wave packet diminishes and the next wave rises, it has a random phase independent of the previous wave. This randomness (discontinuity) in phase variation is included in the simulation. Through interaction with such waves, the dynamics of energetic electrons were tracked. As a result, some electrons underwent an efficient acceleration process defined as successive entrapping, in which an electron successfully continues to surf the trapping potential generated by consecutive wave packets. When successive entrapping occurs, an electron that is trapped and then de-trapped (escapes the trapping potential) by a single wave packet falls into another trapping potential generated by the next wave sub-packet and is continuously accelerated. The occurrence of successive

Massively Parallel Real-Time TDDFT Simulations of Electronic Stopping Processes

NASA Astrophysics Data System (ADS)

Yost, Dillon; Lee, Cheng-Wei; Draeger, Erik; Correa, Alfredo; Schleife, Andre; Kanai, Yosuke

Electronic stopping describes transfer of kinetic energy from fast-moving charged particles to electrons, producing massive electronic excitations in condensed matter. Understanding this phenomenon for ion irradiation has implications in modern technologies, ranging from nuclear reactors, to semiconductor devices for aerospace missions, to proton-based cancer therapy. Recent advances in high-performance computing allow us to achieve an accurate parameter-free description of these phenomena through numerical simulations. Here we discuss results from our recently-developed large-scale real-time TDDFT implementation for electronic stopping processes in important example materials such as metals, semiconductors, liquid water, and DNA. We will illustrate important insight into the physics underlying electronic stopping and we discuss current limitations of our approach both regarding physical and numerical approximations. This work is supported by the DOE through the INCITE awards and by the NSF. Part of this work was performed under the auspices of U.S. DOE by LLNL under Contract DE-AC52-07NA27344.
Multipactor saturation in parallel-plate waveguides

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sorolla, E.; Mattes, M.

2012-07-15

The saturation stage of a multipactor discharge is considered of interest, since it can guide towards a criterion to assess the multipactor onset. The electron cloud under multipactor regime within a parallel-plate waveguide is modeled by a thin continuous distribution of charge and the equations of motion are calculated taking into account the space charge effects. The saturation is identified by the interaction of the electron cloud with its image charge. The stability of the electron population growth is analyzed and two mechanisms of saturation to explain the steady-state multipactor for voltages just above the onset threshold are identified. The impact energy in the collision against the metal plates decreases during the electron population growth due to the attraction of the electron sheet on the image through the initial plate. When this growth remains stable till the impact energy reaches the first cross-over point, the electron surface density tends to a constant value. When the stability is broken before reaching the first cross-over point the surface charge density oscillates chaotically bounded within a certain range. In this case, an expression to calculate the maximum electron surface charge density is found whose predictions agree with the simulations when the voltage is not too high.
Some thoughts about parallel process and psychotherapy supervision: when is a parallel just a parallel?

PubMed

Watkins, C Edward

2012-09-01

In a way not done before, Tracey, Bludworth, and Glidden-Tracey ("Are there parallel processes in psychotherapy supervision: An empirical examination," Psychotherapy, 2011, advance online publication, doi.10.1037/a0026246) have shown us that parallel process in psychotherapy supervision can indeed be rigorously and meaningfully researched, and their groundbreaking investigation provides a nice prototype for future supervision studies to emulate. In what follows, I offer a brief complementary comment to Tracey et al., addressing one matter that seems to be a potentially important conceptual and empirical parallel process consideration: When is a parallel just a parallel? PsycINFO Database Record (c) 2012 APA, all rights reserved.
The formation of quasi-parallel shocks. [in space, solar and astrophysical plasmas]

NASA Technical Reports Server (NTRS)

Cargill, Peter J.

1991-01-01

In a collisionless plasma, the coupling between a piston and the plasma must take place through either laminar or turbulent electromagnetic fields. Of the three types of coupling (laminar, Larmor and turbulent), shock formation in the parallel regime is dominated by the latter and in the quasi-parallel regime by a combination of all three, depending on the piston. In the quasi-perpendicular regime, there is usually a good separation between piston and shock. This is not true in the quasi-parallel and parallel regime. Hybrid numerical simulations for hot plasma pistons indicate that when the electrons are hot, a shock forms, but does not cleanly decouple from the piston. For hot ion pistons, no shock forms in the parallel limit: in the quasi-parallel case, a shock forms, but there is severe contamination from hot piston ions. These results suggest that the properties of solar and astrophysical shocks, such as particle acceleration, cannot be readily separated from their driving mechanism.
Megavolt parallel potentials arising from double-layer streams in the Earth's outer radiation belt.

PubMed

Mozer, F S; Bale, S D; Bonnell, J W; Chaston, C C; Roth, I; Wygant, J

2013-12-06

Huge numbers of double layers carrying electric fields parallel to the local magnetic field line have been observed on the Van Allen probes in connection with in situ relativistic electron acceleration in the Earth's outer radiation belt. For one case with adequate high time resolution data, 7000 double layers were observed in an interval of 1 min to produce a 230,000 V net parallel potential drop crossing the spacecraft. Lower resolution data show that this event lasted for 6 min and that more than 1,000,000 volts of net parallel potential crossed the spacecraft during this time. A double layer traverses the length of a magnetic field line in about 15 s and the orbital motion of the spacecraft perpendicular to the magnetic field was about 700 km during this 6 min interval. Thus, the instantaneous parallel potential along a single magnetic field line was the order of tens of kilovolts. Electrons on the field line might experience many such potential steps in their lifetimes to accelerate them to energies where they serve as the seed population for relativistic acceleration by coherent, large amplitude whistler mode waves. Because the double-layer speed of 3100 km/s is the order of the electron acoustic speed (and not the ion acoustic speed) of a 25 eV plasma, the double layers may result from a new electron acoustic mode. Acceleration mechanisms involving double layers may also be important in planetary radiation belts such as Jupiter, Saturn, Uranus, and Neptune, in the solar corona during flares, and in astrophysical objects.
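The arithmetic behind the figures quoted above can be checked with a short sketch. It assumes the net potential accumulates at a constant rate over the interval, and it uses the simple characteristic speed sqrt(T/m) as a stand-in for the electron and ion acoustic speeds; both are simplifying assumptions made for illustration, not the authors' analysis.

```python
import math

# Figures taken from the abstract.
NET_POTENTIAL_V = 230_000.0   # net parallel potential seen in the 1 min interval
INTERVAL_S = 60.0
N_DOUBLE_LAYERS = 7000
TRANSIT_TIME_S = 15.0         # time for a double layer to traverse the field line
EVENT_DURATION_S = 6 * 60.0

# Physical constants (SI).
EV_J = 1.602176634e-19
M_ELECTRON_KG = 9.1093837015e-31
M_PROTON_KG = 1.67262192369e-27

per_layer_v = NET_POTENTIAL_V / N_DOUBLE_LAYERS      # roughly 33 V per double layer
rate_v_per_s = NET_POTENTIAL_V / INTERVAL_S          # assumed constant (illustrative)
instantaneous_v = rate_v_per_s * TRANSIT_TIME_S      # potential present on one field line
event_total_v = rate_v_per_s * EVENT_DURATION_S      # extrapolation over the 6 min event


def characteristic_speed_km_s(temperature_ev, mass_kg):
    """sqrt(T/m) in km/s; an order-of-magnitude proxy for an acoustic speed."""
    return math.sqrt(temperature_ev * EV_J / mass_kg) / 1e3


electron_speed = characteristic_speed_km_s(25.0, M_ELECTRON_KG)
ion_speed = characteristic_speed_km_s(25.0, M_PROTON_KG)

print(f"per double layer: {per_layer_v:.0f} V")
print(f"instantaneous potential on a field line: {instantaneous_v / 1e3:.1f} kV")
print(f"extrapolated 6 min total: {event_total_v / 1e6:.2f} MV")
print(f"electron scale: {electron_speed:.0f} km/s, ion scale: {ion_speed:.0f} km/s")
```

Under these assumptions the instantaneous potential comes out in the tens of kilovolts and the six-minute total exceeds one megavolt, consistent with the abstract. The observed 3100 km/s double-layer speed is within a factor of about 1.5 of the electron scale and well over an order of magnitude above the ion scale.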
Massive parallel 3D PIC simulation of negative ion extraction

NASA Astrophysics Data System (ADS)

Revel, Adrien; Mochalskyy, Serhiy; Montellano, Ivar Mauricio; Wünderlich, Dirk; Fantz, Ursel; Minea, Tiberiu

2017-09-01

The 3D PIC-MCC code ONIX is dedicated to modeling Negative hydrogen/deuterium Ion (NI) extraction and co-extraction of electrons from radio-frequency driven, low pressure plasma sources. It provides valuable insight on the complex phenomena involved in the extraction process. In previous calculations, a mesh size larger than the Debye length was used, implying numerical electron heating. Important steps have been achieved in terms of computation performance and parallelization efficiency allowing successful massive parallel calculations (4096 cores), imperative to resolve the Debye length. In addition, the numerical algorithms have been improved in terms of grid treatment, i.e., the electric field near the complex geometry boundaries (plasma grid) is calculated more accurately. The revised model preserves the full 3D treatment, but can take advantage of a highly refined mesh. ONIX was used to investigate the role of the mesh size, the re-injection scheme for lost particles (extracted or wall absorbed), and the electron thermalization process on the calculated extracted current and plasma characteristics. It is demonstrated that all numerical schemes give the same NI current distribution for extracted ions. Concerning the electrons, the pair-injection technique is found well-adapted to simulate the sheath in front of the plasma grid.
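The mesh requirement mentioned above (cell size no larger than the Debye length) can be illustrated with a minimal sketch. The plasma parameters and domain length below are hypothetical values chosen only for illustration; they are not taken from ONIX or from the paper.

```python
import math

EPSILON_0 = 8.8541878128e-12      # vacuum permittivity, F/m
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def debye_length_m(electron_temp_ev, electron_density_m3):
    """Electron Debye length: sqrt(eps0 * T_e / (n_e * e)) with T_e in eV."""
    return math.sqrt(
        EPSILON_0 * electron_temp_ev / (electron_density_m3 * ELEMENTARY_CHARGE)
    )


def cells_to_resolve(domain_length_m, electron_temp_ev, electron_density_m3):
    """Minimum number of cells along one axis so that the cell size <= Debye length."""
    return math.ceil(
        domain_length_m / debye_length_m(electron_temp_ev, electron_density_m3)
    )


# Hypothetical parameters: 2 eV electrons at 1e17 m^-3 in a 2 cm long domain.
lambda_d = debye_length_m(2.0, 1e17)
cells = cells_to_resolve(0.02, 2.0, 1e17)
print(f"Debye length: {lambda_d * 1e6:.1f} micrometres, cells per axis: {cells}")
```

Because the cell count applies to each of three axes, the total number of cells grows with the cube of this figure, which is why resolving the Debye length in 3D calls for large parallel runs.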
Electron-acoustic solitons and double layers in the inner magnetosphere

DOE PAGES

Vasko, I. Y.; Agapitov, O. V.; Mozer, F. S.; ...

2017-05-28

The Van Allen Probes observe generally two types of electrostatic solitary waves (ESW) contributing to the broadband electrostatic wave activity in the nightside inner magnetosphere. ESW with symmetric bipolar parallel electric field are electron phase space holes. The nature of ESW with asymmetric bipolar (and almost unipolar) parallel electric field has remained puzzling. To address their nature, we consider a particular event observed by Van Allen Probes to argue that during the broadband wave activity electrons with energy above 200 eV provide the dominant contribution to the total electron density, while the density of cold electrons (below a few eV) is less than a few tenths of the total electron density. We show that velocities of the asymmetric ESW are close to the velocity of electron-acoustic waves (existing due to the presence of cold and hot electrons) and follow the Korteweg-de Vries (KdV) dispersion relation derived for the observed plasma conditions (electron energy spectrum is a power law between about 100 eV and 10 keV and Maxwellian above 10 keV). The ESW spatial scales are in general agreement with the KdV theory. We interpret the asymmetric ESW in terms of electron-acoustic solitons and double layers (shock waves).
Electron-acoustic solitons and double layers in the inner magnetosphere

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vasko, I. Y.; Agapitov, O. V.; Mozer, F. S.

The Van Allen Probes observe generally two types of electrostatic solitary waves (ESW) contributing to the broadband electrostatic wave activity in the nightside inner magnetosphere. ESW with symmetric bipolar parallel electric field are electron phase space holes. The nature of ESW with asymmetric bipolar (and almost unipolar) parallel electric field has remained puzzling. To address their nature, we consider a particular event observed by Van Allen Probes to argue that during the broadband wave activity electrons with energy above 200 eV provide the dominant contribution to the total electron density, while the density of cold electrons (below a few eV) is less than a few tenths of the total electron density. We show that velocities of the asymmetric ESW are close to the velocity of electron-acoustic waves (existing due to the presence of cold and hot electrons) and follow the Korteweg-de Vries (KdV) dispersion relation derived for the observed plasma conditions (electron energy spectrum is a power law between about 100 eV and 10 keV and Maxwellian above 10 keV). The ESW spatial scales are in general agreement with the KdV theory. We interpret the asymmetric ESW in terms of electron-acoustic solitons and double layers (shock waves).
Experimental determination of pCo perturbation factors for plane-parallel chambers

NASA Astrophysics Data System (ADS)

Kapsch, R. P.; Bruggmoser, G.; Christ, G.; Dohm, O. S.; Hartmann, G. H.; Schüle, E.

2007-12-01

For plane-parallel chambers used in electron dosimetry, modern dosimetry protocols recommend a cross-calibration against a calibrated cylindrical chamber. The rationale for this is the unacceptably large (up to 3-4%) chamber-to-chamber variations of the perturbation factors (pwall)Co, which have been reported for plane-parallel chambers of a given type. In some recent publications, it was shown that this is no longer the case for modern plane-parallel chambers. The aims of the present study are to obtain reliable information about the variation of the perturbation factors for modern types of plane-parallel chambers, and, if this variation is found to be acceptably small, to determine type-specific mean values for these perturbation factors which can be used for absorbed dose measurements in electron beams using plane-parallel chambers. In an extensive multi-center study, the individual perturbation factors pCo (which are usually assumed to be equal to (pwall)Co) for a total of 35 plane-parallel chambers of the Roos type, 15 chambers of the Markus type and 12 chambers of the Advanced Markus type were determined. From a total of 188 cross-calibration measurements, variations of the pCo values for different chambers of the same type of at most 1.0%, 0.9% and 0.6% were found for the chambers of the Roos, Markus and Advanced Markus types, respectively. The mean pCo values obtained from all measurements are $\bar{p}_{\mathrm{Co}}^{\mathrm{Roos}} = 1.0198$, $\bar{p}_{\mathrm{Co}}^{\mathrm{Markus}} = 1.0175$ and $\bar{p}_{\mathrm{Co}}^{\mathrm{Advanced}} = 1.0155$; the relative experimental standard deviation of the individual pCo values is less than 0.24% for all chamber types; the relative standard uncertainty of the mean pCo values is 1.1%.
Design of on-board parallel computer on nano-satellite

NASA Astrophysics Data System (ADS)

You, Zheng; Tian, Hexiang; Yu, Shijie; Meng, Li

2007-11-01

This paper presents a design for an on-board parallel computer system for a nano-satellite. Because a nano-satellite must be small, light, low-power, and intelligent, the design abandons the traditional single-computer and dual-computer systems and aims to improve dependability, capability, and intelligence simultaneously. Following an integrated design approach, it uses a shared-memory parallel computer system as the main structure; connects the telemetry system, attitude control system, and payload system through an intelligent bus; provides a management layer, based on parallel algorithms, that handles static tasks and dynamic task scheduling and protects and recovers the on-site status; and establishes mechanisms for fault diagnosis, restoration, and system restructuring. The result is an on-board parallel computer system with high dependability, capability, and intelligence, flexible management of hardware resources, an excellent software system, and high extensibility, consistent with the concept of and the trend toward integrated electronic design.
THEMIS Observations of the Magnetopause Electron Diffusion Region: Large Amplitude Waves and Heated Electrons

NASA Technical Reports Server (NTRS)

Tang, Xiangwei; Cattell, Cynthia; Dombeck, John; Dai, Lei; Wilson, Lynn B. III; Breneman, Aaron; Hupack, Adam

2013-01-01

We present the first observations of large amplitude waves in a well-defined electron diffusion region based on the criteria described by Scudder et al. at the subsolar magnetopause using data from one Time History of Events and Macroscale Interactions during Substorms (THEMIS) satellite. These waves, identified as whistler mode waves, electrostatic solitary waves, lower hybrid waves, and electrostatic electron cyclotron waves, are observed in the same 12 s waveform capture and in association with signatures of active magnetic reconnection. The large amplitude waves in the electron diffusion region are coincident with abrupt increases in electron parallel temperature, suggesting strong wave heating. The whistler mode waves, which are at the electron scale and which enable us to probe electron dynamics in the diffusion region, were analyzed in detail. The energetic electrons (approximately 30 keV) within the electron diffusion region have anisotropic distributions with T_e⊥/T_e∥ > 1 that may provide the free energy for the whistler mode waves. The energetic anisotropic electrons may be produced during the reconnection process. The whistler mode waves propagate away from the center of the "X-line" along magnetic field lines, suggesting that the electron diffusion region is a possible source region of the whistler mode waves.
Parallel rendering

NASA Technical Reports Server (NTRS)

Crockett, Thomas W.

1995-01-01

This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.
Parallel Processing of Broad-Band PPM Signals

NASA Technical Reports Server (NTRS)

Gray, Andrew; Kang, Edward; Lay, Norman; Vilnrotter, Victor; Srinivasan, Meera; Lee, Clement

2010-01-01

A parallel-processing algorithm and a hardware architecture to implement the algorithm have been devised for timeslot synchronization in the reception of pulse-position-modulated (PPM) optical or radio signals. As in the cases of some prior algorithms and architectures for parallel, discrete-time, digital processing of signals other than PPM, an incoming broadband signal is divided into multiple parallel narrower-band signals by means of sub-sampling and filtering. The number of parallel streams is chosen so that the frequency content of the narrower-band signals is low enough to enable processing by relatively low-speed complementary metal oxide semiconductor (CMOS) electronic circuitry. The algorithm and architecture are intended to satisfy requirements for time-varying time-slot synchronization and post-detection filtering, with correction of timing errors independent of estimation of timing errors. They are also intended to afford flexibility for dynamic reconfiguration and upgrading. The architecture is implemented in a reconfigurable CMOS processor in the form of a field-programmable gate array. The algorithm and its hardware implementation incorporate three separate time-varying filter banks for three distinct functions: correction of sub-sample timing errors, post-detection filtering, and post-detection estimation of timing errors. The design of the filter bank for correction of timing errors, the method of estimating timing errors, and the design of a feedback-loop filter are governed by a host of parameters. The most critical of these, with regard to processing very broadband signals with CMOS hardware, is the number of parallel streams (equivalently, the rate-reduction parameter).
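The sub-sampling step described above can be sketched as follows. This is only an illustration of splitting one fast sample stream into several slower ones; the filter banks are omitted, and the function names and the sample and clock rates are hypothetical, not taken from the reported architecture.

```python
import math


def min_streams(sample_rate_hz, max_stream_rate_hz):
    """Smallest number of parallel streams that keeps each stream within the rate limit."""
    return math.ceil(sample_rate_hz / max_stream_rate_hz)


def split_into_streams(samples, num_streams):
    """Sub-sample: stream k receives samples k, k + M, k + 2M, ... (M = num_streams)."""
    return [samples[k::num_streams] for k in range(num_streams)]


def merge_streams(streams):
    """Inverse of split_into_streams: interleave the streams back into one sequence."""
    num_streams = len(streams)
    total = sum(len(stream) for stream in streams)
    return [streams[i % num_streams][i // num_streams] for i in range(total)]


# Hypothetical rates: a 2 GS/s input and logic that handles 250 MS/s per stream.
num_streams = min_streams(2e9, 250e6)
samples = list(range(20))
streams = split_into_streams(samples, num_streams)
print(num_streams, streams[0], streams[1])
```

Each stream runs at the input rate divided by the number of streams, which is why that number acts as the rate-reduction parameter.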
Development and Application of a Parallel LCAO Cluster Method

NASA Astrophysics Data System (ADS)

Patton, David C.

1997-08-01

CPU-intensive steps in the SCF electronic structure calculations of clusters and molecules with a first-principles LCAO method have been fully parallelized via a message passing paradigm. Identification of the parts of the code that are composed of many independent compute-intensive steps is discussed in detail as they are the most readily parallelized. Most of the parallelization involves spatially decomposing numerical operations on a mesh. One exception is the solution of Poisson's equation which relies on distribution of the charge density and multipole methods. The method we use to parallelize this part of the calculation is quite novel and is covered in detail. We present a general method for dynamically load-balancing a parallel calculation and discuss how we use this method in our code. The results of benchmark calculations of the IR and Raman spectra of PAH molecules such as anthracene (C_14H_10) and tetracene (C_18H_12) are presented. These benchmark calculations were performed on an IBM SP2 and a SUN Ultra HPC server with both MPI and PVM. Scalability and speedup for these calculations are analyzed to determine the efficiency of the code. In addition, performance and usage issues for MPI and PVM are presented.
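The abstract does not describe the load-balancing method itself. As a generic illustration of dynamic load balancing, the sketch below uses a shared work queue from which idle workers pull the next task, so faster workers naturally take on more tasks. It uses Python threads in place of MPI or PVM processes, and all names are hypothetical.

```python
import queue
import threading


def run_dynamic(tasks, num_workers, work_fn):
    """Process tasks with workers that each pull the next task as soon as they are idle."""
    pending = queue.Queue()
    for index, task in enumerate(tasks):
        pending.put((index, task))

    results = [None] * len(tasks)
    tasks_done = [0] * num_workers  # how many tasks each worker ended up handling

    def worker(worker_id):
        while True:
            try:
                index, task = pending.get_nowait()
            except queue.Empty:
                return  # no work left
            results[index] = work_fn(task)
            tasks_done[worker_id] += 1

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, tasks_done


def mesh_block_cost(points):
    """Stand-in for an uneven, compute-intensive step on one block of mesh points."""
    return sum(i * i for i in range(points))


block_sizes = [10, 5000, 20, 3000, 15, 4000, 25, 10]
results, tasks_done = run_dynamic(block_sizes, 3, mesh_block_cost)
print(tasks_done)
```

With a static split, a worker handed several large blocks would finish long after the others; pulling from a queue avoids having to predict block costs in advance.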
Magnetospheric Multiscale Observations of Large-Amplitude Parallel, Electrostatic Waves Associated with Magnetic Reconnection at the Magnetopause

NASA Technical Reports Server (NTRS)

Ergun, R. E.; Holmes, J. C.; Goodrich, K. A.; Wilder, F. D.; Stawarz, J. E.; Eriksson, S.; Newman, D. L.; Schwartz, S. J.; Goldman, M. V.; Sturner, A. P.;

2016-01-01

We report observations from the Magnetospheric Multiscale satellites of large-amplitude, parallel, electrostatic waves associated with magnetic reconnection at the Earth's magnetopause. The observed waves have parallel electric fields (E_∥) with amplitudes on the order of 100 mV/m and display nonlinear characteristics that suggest a possible net E_∥. These waves are observed within the ion diffusion region and adjacent to (within several electron skin depths) the electron diffusion region. They are in or near the magnetosphere side current layer. Simulation results support the interpretation that the strong electrostatic linear and nonlinear wave activity is driven by a two-stream instability, which is a consequence of mixing cold (less than 10 eV) plasma in the magnetosphere with warm (approximately 100 eV) plasma from the magnetosheath on a freshly reconnected magnetic field line. The frequent observation of these waves suggests that cold plasma is often present near the magnetopause.

PCLIPS: Parallel CLIPS

NASA Technical Reports Server (NTRS)

Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan

1994-01-01

A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes' working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C^3I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and implementation. Parallel CLIPS has also been used to run several benchmark parallel knowledge bases such as one to set up a cafeteria. Results from running Parallel CLIPS with parallel knowledge base partitions indicate that significant speed increases, including superlinear in some cases, are possible.
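For reference, "superlinear" means that the speedup exceeds the number of nodes used. A minimal sketch of the standard definitions follows; the timings are hypothetical and are not results from the report.

```python
def speedup(serial_time_s, parallel_time_s):
    """Speedup S = T_serial / T_parallel."""
    return serial_time_s / parallel_time_s


def efficiency(serial_time_s, parallel_time_s, num_nodes):
    """Parallel efficiency E = S / p."""
    return speedup(serial_time_s, parallel_time_s) / num_nodes


def is_superlinear(serial_time_s, parallel_time_s, num_nodes):
    """True when the speedup is greater than the node count (efficiency above 1)."""
    return speedup(serial_time_s, parallel_time_s) > num_nodes


# Hypothetical timings: 120 s on one node, 12 s on 8 nodes.
print(speedup(120.0, 12.0), efficiency(120.0, 12.0, 8), is_superlinear(120.0, 12.0, 8))
```

In a rule-based system this can happen when partitioning the knowledge base shrinks the amount of matching each node performs by more than the node count alone would suggest.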
Electronic Neural Networks

NASA Technical Reports Server (NTRS)

Thakoor, Anil

1990-01-01

Viewgraphs on electronic neural networks for space station are presented. Topics covered include: electronic neural networks; electronic implementations; VLSI/thin film hybrid hardware for neurocomputing; computations with analog parallel processing; features of neuroprocessors; applications of neuroprocessors; neural network hardware for terrain trafficability determination; a dedicated processor for path planning; neural network system interface; neural network for robotic control; error backpropagation algorithm for learning; resource allocation matrix; global optimization neuroprocessor; and electrically programmable read only thin-film synaptic array.
Magnetospheric Multiscale Satellites Observations of Parallel Electric Fields Associated with Magnetic Reconnection

NASA Technical Reports Server (NTRS)

Ergun, R. E.; Goodrich, K. A.; Wilder, F. D.; Holmes, J. C.; Stawarz, J. E.; Eriksson, S.; Sturner, A. P.; Malaspina, D. M.; Usanova, M. E.; Torbert, R. B.;

2016-01-01

We report observations from the Magnetospheric Multiscale satellites of parallel electric fields (E_∥) associated with magnetic reconnection in the subsolar region of the Earth's magnetopause. E_∥ events near the electron diffusion region have amplitudes on the order of 100 millivolts per meter, which are significantly larger than those predicted for an antiparallel reconnection electric field. This Letter addresses specific types of E_∥ events, which appear as large-amplitude, near unipolar spikes that are associated with tangled, reconnected magnetic fields. These E_∥ events are primarily in or near a current layer near the separatrix and are interpreted to be double layers that may be responsible for secondary reconnection in tangled magnetic fields or flux ropes. These results are telling of the three-dimensional nature of magnetopause reconnection and indicate that magnetopause reconnection may be often patchy and/or drive turbulence along the separatrix that results in flux ropes and/or tangled magnetic fields.

Parallelization of the FLAPW method and comparison with the PPW method

NASA Astrophysics Data System (ADS)

Canning, Andrew; Mannstadt, Wolfgang; Freeman, Arthur

2000-03-01

The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining electronic and magnetic properties of crystals and surfaces. In the past the FLAPW method has been limited to systems of about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell running on up to 512 processors on a Cray T3E parallel supercomputer. Some results will also be presented on a comparison of the plane-wave pseudopotential method and the FLAPW method on large systems.
Relativistic effects in the energy loss of a fast charged particle moving parallel to a two-dimensional electron gas

NASA Astrophysics Data System (ADS)

Mišković, Zoran L.; Akbari, Kamran; Segui, Silvina; Gervasoni, Juana L.; Arista, Néstor R.

2018-05-01

We present a fully relativistic formulation for the energy loss rate of a charged particle moving parallel to a sheet containing two-dimensional electron gas, allowing that its in-plane polarization may be described by different longitudinal and transverse conductivities. We apply our formulation to the case of a doped graphene layer in the terahertz range of frequencies, where excitation of the Dirac plasmon polariton (DPP) in graphene plays a major role. By using the Drude model with zero damping we evaluate the energy loss rate due to excitation of the DPP, and show that the retardation effects are important when the incident particle speed and its distance from graphene both increase. Interestingly, the retarded energy loss rate obtained in this manner may be both larger and smaller than its non-retarded counterpart for different combinations of the particle speed and distance.

Architecture studies and system demonstrations for optical parallel processor for AI and NI

NASA Astrophysics Data System (ADS)

Lee, Sing H.

1988-03-01

In solving deterministic AI problems, the data search for matching the arguments of a PROLOG expression causes a serious bottleneck when implemented sequentially by electronic systems. To overcome this bottleneck we have developed the concepts for an optical expert system based on a matrix-algebraic formulation, which will be suitable for parallel optical implementation. The optical AI system based on the matrix-algebraic formulation will offer distinct advantages for parallel search, adult learning, etc.
Parallelization of a Monte Carlo particle transport simulation code

NASA Astrophysics Data System (ADS)

Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

2010-05-01

We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language to improve code portability. Several pseudo-random number generators have also been integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem sizes, which are limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow the study of higher particle energies with the use of more accurate physical models, and improve statistics as more particle tracks can be simulated in low response time.
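The general pattern described above, in which each worker tracks its own share of particles with its own random stream and the tallies are combined at the end, can be sketched as follows. This is a toy illustration and not MC4: the "physics" is a single exponential free-path sample per particle, threads stand in for MPI ranks, and per-worker seeds stand in for a parallel generator library such as SPRNG or DCMT.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_total_path(n_particles, seed, mean_free_path=1.0):
    """Toy tally: sum of exponentially distributed free paths for n_particles."""
    rng = random.Random(seed)  # private stream, so workers never share generator state
    return sum(rng.expovariate(1.0 / mean_free_path) for _ in range(n_particles))


def parallel_mean_path(n_particles, n_workers, base_seed=1234):
    """Split the particles across workers, run them concurrently, combine the tallies."""
    share = n_particles // n_workers
    counts = [share] * n_workers
    counts[-1] += n_particles - share * n_workers  # last worker takes the remainder
    seeds = [base_seed + worker for worker in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        totals = list(pool.map(simulate_total_path, counts, seeds))
    return sum(totals) / n_particles


serial_estimate = parallel_mean_path(20_000, 1)
parallel_estimate = parallel_mean_path(20_000, 4)
print(f"serial: {serial_estimate:.3f}  parallel: {parallel_estimate:.3f}")
```

The two estimates differ only by statistical noise, which mirrors the kind of serial-versus-parallel equivalence check the abstract reports. Fixed seeds make each run reproducible.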
Quantum supercharger library: hyper-parallelism of the Hartree-Fock method.

PubMed

Fernandes, Kyle D; Renison, C Alicia; Naidoo, Kevin J

2015-07-05

We present here a set of algorithms that completely rewrites the Hartree-Fock (HF) computations common to many legacy electronic structure packages (such as GAMESS-US, GAMESS-UK, and NWChem) into a massively parallel compute scheme that takes advantage of hardware accelerators such as Graphical Processing Units (GPUs). The HF compute algorithm is core to a library of routines that we name the Quantum Supercharger Library (QSL). We briefly evaluate the QSL's performance and report that it accelerates an HF 6-31G Self-Consistent Field (SCF) computation by up to 20 times for medium sized molecules (such as a buckyball) when compared with mature Central Processing Unit algorithms available in the legacy codes in regular use by researchers. It achieves this acceleration by massive parallelization of the one- and two-electron integrals and optimization of the SCF and Direct Inversion in the Iterative Subspace routines through the use of GPU linear algebra libraries. © 2015 Wiley Periodicals, Inc.
IOPA: I/O-aware parallelism adaption for parallel programs

PubMed Central

Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

2017-01-01

With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads. PMID:28278236
IOPA: I/O-aware parallelism adaption for parallel programs.

PubMed

Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

2017-01-01

With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads.
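The trade-off described above, where too few I/O threads underuse the device and too many cause contention, can be illustrated with a simple hill-climbing sketch. This is not IOPA's mechanism or programming interface; the function names and the synthetic throughput model are hypothetical.

```python
def adapt_thread_count(measure_throughput, start=1, max_threads=64):
    """Add I/O threads one at a time while the measured throughput keeps improving."""
    threads = start
    best = measure_throughput(threads)
    while threads < max_threads:
        candidate = threads + 1
        observed = measure_throughput(candidate)
        if observed <= best:
            break  # the extra thread did not help, so keep the current count
        threads, best = candidate, observed
    return threads


def synthetic_throughput_mb_s(threads):
    """Hypothetical device: 100 MB/s per thread, capped at 450 MB/s, with a
    5 MB/s contention penalty for every thread beyond the fifth."""
    return min(100 * threads, 450) - 5 * max(0, threads - 5)


chosen = adapt_thread_count(synthetic_throughput_mb_s)
print(chosen)
```

In a real program the measurement function would time actual reads or writes over a short window, and the search would be repeated as the workload or the storage device changes.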
High-Performance Psychometrics: The Parallel-E Parallel-M Algorithm for Generalized Latent Variable Models. Research Report. ETS RR-16-34

ERIC Educational Resources Information Center

von Davier, Matthias

2016-01-01

This report presents results on a parallel implementation of the expectation-maximization (EM) algorithm for multidimensional latent variable models. The developments presented here are based on code that parallelizes both the E step and the M step of the parallel-E parallel-M algorithm. Examples presented in this report include item response…
The new landscape of parallel computer architecture

NASA Astrophysics Data System (ADS)

Shalf, John

2007-07-01

The past few years have seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models.
Synthesizing parallel imaging applications using the CAP (computer-aided parallelization) tool

NASA Astrophysics Data System (ADS)

Gennart, Benoit A.; Mazzariol, Marc; Messerli, Vincent; Hersch, Roger D.

1997-12-01

Imaging applications such as filtering, image transforms and compression/decompression require vast amounts of computing power when applied to large data sets. These applications would potentially benefit from the use of parallel processing. However, dedicated parallel computers are expensive and their processing power per node lags behind that of the most recent commodity components. Furthermore, developing parallel applications remains a difficult task: writing and debugging the application is difficult (deadlocks), programs may not be portable from one parallel architecture to the other, and performance often falls short of expectations. In order to facilitate the development of parallel applications, we propose the CAP computer-aided parallelization tool, which enables application programmers to specify at a high level of abstraction the flow of data between pipelined-parallel operations. In addition, the CAP tool supports the programmer in developing parallel imaging and storage operations. CAP enables efficient combination of parallel storage access routines and sequential image processing operations. This paper shows how processing and I/O intensive imaging applications must be implemented to take advantage of parallelism and pipelining between data access and processing. This paper's contribution is (1) to show how such implementations can be compactly specified in CAP, and (2) to demonstrate that CAP-specified applications achieve the performance of custom parallel code. The paper analyzes theoretically the performance of CAP-specified applications and demonstrates the accuracy of the theoretical analysis through experimental measurements.
Mixing the Solar Wind Proton and Electron Scales: Effects of Electron Temperature Anisotropy on the Oblique Proton Firehose Instability

NASA Technical Reports Server (NTRS)

Maneva, Y.; Lazar, M.; Vinas, A.; Poedts, S.

2016-01-01

The double adiabatic expansion of the nearly collisionless solar wind plasma creates conditions for the firehose instability to develop and efficiently prevent the further increase of the plasma temperature in the direction parallel to the interplanetary magnetic field. The conditions imposed by the firehose instability have been extensively studied using idealized approaches that ignore the mutual effects of electrons and protons. Recently, more realistic approaches have been proposed that take into account the interplay between electrons and protons, unveiling new regimes of the parallel oscillatory modes. However, for oblique wave propagation the instability develops distinct branches that grow much faster and may therefore be more efficient than the parallel firehose instability in constraining the temperature anisotropy of the plasma particles. This paper reports for the first time on the effects of electron plasma properties on the oblique proton firehose (PFH) instability and provides a comprehensive vision of the entire unstable wave-vector spectrum, unifying the proton and the smaller electron scales. The plasma β and temperature anisotropy regimes considered here are specific for the solar wind and magnetospheric conditions, and enable the electrons and protons to interact via the excited electromagnetic fluctuations. For the selected parameters, simultaneous electron and PFH instabilities can be observed with a dispersion spectrum of the electron firehose (EFH) extending toward the proton scales. Growth rates of the PFH instability are markedly boosted by the anisotropic electrons, especially in the oblique direction where the EFH growth rates are orders of magnitude higher.
The language parallel Pascal and other aspects of the massively parallel processor

NASA Technical Reports Server (NTRS)

Reeves, A. P.; Bruner, J. D.

1982-01-01

A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.
The computer-aided parallel external fixator for complex lower limb deformity correction.

PubMed

Wei, Mengting; Chen, Jianwen; Guo, Yue; Sun, Hao

2017-12-01

Since parameters of the parallel external fixator are difficult to measure and calculate in real applications, this study developed computer software that can help the doctor measure parameters using digital technology and generate an electronic prescription for deformity correction. According to Paley's deformity measurement method, we provided digital measurement techniques. In addition, we proposed a deformity correction algorithm to calculate the elongations of the six struts and developed electronic prescription software. At the same time, a three-dimensional simulation of the parallel external fixator and deformed fragment was made using virtual reality modeling language technology. From 2013 to 2015, fifteen patients with complex lower limb deformity were treated with parallel external fixators and the self-developed computer software. All of the cases had unilateral limb deformity. The deformities were caused by old osteomyelitis in nine cases and traumatic sequelae in six cases. A doctor measured the related angulation, displacement and rotation on postoperative radiographs using the digital measurement techniques. Measurement data were input into the electronic prescription software to calculate the daily adjustment elongations of the struts. Daily strut adjustments were conducted according to the data calculated. The frame was removed when expected results were achieved. Patients lived independently during the adjustment. The mean follow-up was 15 months (range 10-22 months). The duration of frame fixation from the time of application to the time of removal averaged 8.4 months (range 2.5-13.1 months). All patients were satisfied with the corrected limb alignment. No cases of wound infections or complications occurred. Using the computer-aided parallel external fixator for the correction of lower limb deformities can achieve satisfactory outcomes. The correction process can be simplified and is precise and digitized, which will greatly improve the
Beyond 2D: Parallel Electric Fields and Dissipation in Guide Field Reconnection

NASA Astrophysics Data System (ADS)

Wilder, F. D.; Ergun, R.; Ahmadi, N.; Goodrich, K.; Eriksson, S.; Shimoda, E.; Burch, J. L.; Phan, T.; Torbert, R. B.; Strangeway, R. J.; Giles, B. L.; Lindqvist, P. A.; Khotyaintsev, Y. V.

2017-12-01

In 2015, NASA launched the Magnetospheric Multiscale (MMS) mission to study the phenomenon of magnetic reconnection down to the electron scale. Advantages of MMS include a 20 s spin period and long axial booms, which together allow for measurement of 3-D electric fields with accuracy down to 1 mV/m. During the two dayside phases of the prime mission, MMS has observed multiple electron and ion diffusion region events at the Earth's subsolar and flank magnetopause, as well as in the magnetosheath, providing an option to study both symmetric and asymmetric reconnection at a variety of guide field strengths. We present a review of parallel electric fields observed by MMS during diffusion region events, and discuss their implications for simulations and laboratory observations of reconnection. We find that as the guide field increases, the dissipation in the diffusion region transitions from being due to currents and fields perpendicular to the background magnetic field, to being associated with parallel electric fields and currents. Additionally, the observed parallel electric fields are significantly larger than those predicted by simulations of reconnection under strong guide field conditions.
Implementation of highly parallel and large scale GW calculations within the OpenAtom software

NASA Astrophysics Data System (ADS)

Ismail-Beigi, Sohrab

The need to describe electronic excitations with better accuracy than provided by band structures produced by Density Functional Theory (DFT) has been a long-term enterprise for the computational condensed matter and materials theory communities. In some cases, appropriate theoretical frameworks have existed for some time but have been difficult to apply widely due to computational cost. For example, the GW approximation incorporates a great deal of important non-local and dynamical electronic interaction effects but has been too computationally expensive for routine use in large materials simulations. OpenAtom is an open source massively parallel ab initio density functional software package based on plane waves and pseudopotentials (http://charm.cs.uiuc.edu/OpenAtom/) that takes advantage of the Charm++ parallel framework. At present, it is developed via a three-way collaboration, funded by an NSF SI2-SSI grant (ACI-1339804), between Yale (Ismail-Beigi), IBM T. J. Watson (Glenn Martyna) and the University of Illinois at Urbana Champaign (Laxmikant Kale). We will describe the project and our current approach towards implementing large scale GW calculations with OpenAtom. Potential applications of large scale parallel GW software for problems involving electronic excitations in semiconductor and/or metal oxide systems will also be pointed out.
Parallel integer sorting with medium and fine-scale parallelism

NASA Technical Reports Server (NTRS)

Dagum, Leonardo

1993-01-01

Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort is designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.
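As a rough illustration of the bucket-style integer sorting that the abstract uses as its baseline, the sketch below range-partitions keys into one bucket per (simulated) processor, sorts each bucket locally, and concatenates the results. It is not the queue-sort or barrel-sort algorithm of the paper, whose details are not given in this record; the name `bucket_partition_sort` and its parameters are hypothetical.

```python
def bucket_partition_sort(keys, num_buckets, max_key):
    """Sort non-negative integers in [0, max_key] by range partitioning.

    Each bucket stands in for one processor that owns a contiguous key range.
    This is an illustrative serial sketch, not the paper's algorithm.
    """
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    # Width of the key range owned by each bucket (ceiling of (max_key+1)/num_buckets).
    width = (max_key + num_buckets) // num_buckets
    buckets = [[] for _ in range(num_buckets)]
    for k in keys:
        if not 0 <= k <= max_key:
            raise ValueError("key out of range")
        buckets[k // width].append(k)  # "send" the key to its owning bucket
    result = []
    for bucket in buckets:
        # Buckets cover increasing key ranges, so sorted buckets concatenate in order.
        result.extend(sorted(bucket))
    return result
```

In a real message-passing setting the scatter step would be a communication phase and the per-bucket sorts would run concurrently; here both are simulated in a single process.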
Parallel 3-D numerical simulation of dielectric barrier discharge plasma actuators

NASA Astrophysics Data System (ADS)

Houba, Tomas

Dielectric barrier discharge plasma actuators have shown promise in a range of applications including flow control, sterilization and ozone generation. Developing numerical models of plasma actuators is of great importance, because a high-fidelity parallel numerical model allows new design configurations to be tested rapidly. Additionally, it provides a better understanding of the plasma actuator physics which is useful for further innovation. The physics of plasma actuators is studied numerically. A loosely coupled approach is utilized for the coupling of the plasma to the neutral fluid. The state of the art in numerical plasma modeling is advanced by the development of a parallel, three-dimensional, first-principles model with detailed air chemistry. The model incorporates 7 charged species and 18 reactions, along with a solution of the electron energy equation. To the author's knowledge, a parallel three-dimensional model of a gas discharge with a detailed air chemistry model and the solution of electron energy is unique. Three representative geometries are studied using the gas discharge model. The discharge of gas between two parallel electrodes is used to validate the air chemistry model developed for the gas discharge code. The gas discharge model is then applied to the discharge produced by placing a dc powered wire and grounded plate electrodes in a channel. Finally, a three-dimensional simulation of gas discharge produced by electrodes placed inside a riblet is carried out. The body force calculated with the gas discharge model is loosely coupled with a fluid model to predict the induced flow inside the riblet.
Bilingual parallel programming

DOE Office of Scientific and Technical Information (OSTI.GOV)

Foster, I.; Overbeek, R.

1990-01-01

Numerous experiments have demonstrated that computationally intensive algorithms support adequate parallelism to exploit the potential of large parallel machines. Yet successful parallel implementations of serious applications are rare. The limiting factor is clearly programming technology. None of the approaches to parallel programming that have been proposed to date -- whether parallelizing compilers, language extensions, or new concurrent languages -- seem to adequately address the central problems of portability, expressiveness, efficiency, and compatibility with existing software. In this paper, we advocate an alternative approach to parallel programming based on what we call bilingual programming. We present evidence that this approach provides an effective solution to parallel programming problems. The key idea in bilingual programming is to construct the upper levels of applications in a high-level language while coding selected low-level components in low-level languages. This approach permits the advantages of a high-level notation (expressiveness, elegance, conciseness) to be obtained without the cost in performance normally associated with high-level approaches. In addition, it provides a natural framework for reusing existing code.
MIXING THE SOLAR WIND PROTON AND ELECTRON SCALES: EFFECTS OF ELECTRON TEMPERATURE ANISOTROPY ON THE OBLIQUE PROTON FIREHOSE INSTABILITY

DOE Office of Scientific and Technical Information (OSTI.GOV)

Maneva, Y.; Lazar, M.; Poedts, S.

2016-11-20

The double adiabatic expansion of the nearly collisionless solar wind plasma creates conditions for the firehose instability to develop and efficiently prevent the further increase of the plasma temperature in the direction parallel to the interplanetary magnetic field. The conditions imposed by the firehose instability have been extensively studied using idealized approaches that ignore the mutual effects of electrons and protons. Recently, more realistic approaches have been proposed that take into account the interplay between electrons and protons, unveiling new regimes of the parallel oscillatory modes. However, for oblique wave propagation the instability develops distinct branches that grow much faster and may therefore be more efficient than the parallel firehose instability in constraining the temperature anisotropy of the plasma particles. This paper reports for the first time on the effects of electron plasma properties on the oblique proton firehose (PFH) instability and provides a comprehensive vision of the entire unstable wave-vector spectrum, unifying the proton and the smaller electron scales. The plasma β and temperature anisotropy regimes considered here are specific for the solar wind and magnetospheric conditions, and enable the electrons and protons to interact via the excited electromagnetic fluctuations. For the selected parameters, simultaneous electron and PFH instabilities can be observed with a dispersion spectrum of the electron firehose (EFH) extending toward the proton scales. Growth rates of the PFH instability are markedly boosted by the anisotropic electrons, especially in the oblique direction where the EFH growth rates are orders of magnitude higher.
Kinetic theory of turbulence for parallel propagation revisited: Low-to-intermediate frequency regime

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yoon, Peter H., E-mail: yoonp@umd.edu; School of Space Research, Kyung Hee University, Yongin, Gyeonggi 446-701

2015-09-15

A previous paper [P. H. Yoon, “Kinetic theory of turbulence for parallel propagation revisited: Formal results,” Phys. Plasmas 22, 082309 (2015)] revisited the second-order nonlinear kinetic theory for turbulence propagating in directions parallel/anti-parallel to the ambient magnetic field, in which the original work according to Yoon and Fang [Phys. Plasmas 15, 122312 (2008)] was refined, following the paper by Gaelzer et al. [Phys. Plasmas 22, 032310 (2015)]. The main finding involved the dimensional correction pertaining to discrete-particle effects in Yoon and Fang's theory. However, the final result was presented in terms of formal linear and nonlinear susceptibility response functions. In the present paper, the formal equations are explicitly written down for the case of the low-to-intermediate frequency regime by making use of approximate forms for the response functions. The resulting equations are sufficiently concrete so that they can readily be solved by numerical means or analyzed by theoretical means. The derived set of equations describes nonlinear interactions of quasi-parallel modes whose frequency range covers the Alfvén wave range to ion-cyclotron mode, but is sufficiently lower than the electron cyclotron mode. The application of the present formalism may range from the nonlinear evolution of whistler anisotropy instability in the high-beta regime to the nonlinear interaction of electrons with whistler-range turbulence.
Scalable parallel communications

NASA Technical Reports Server (NTRS)

Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.

1992-01-01

Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulation studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCP's running in parallel provide high bandwidth
Parallel simulation today

NASA Technical Reports Server (NTRS)

Nicol, David; Fujimoto, Richard

1992-01-01

This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.

PCTDSE: A parallel Cartesian-grid-based TDSE solver for modeling laser-atom interactions

NASA Astrophysics Data System (ADS)

Fu, Yongsheng; Zeng, Jiaolong; Yuan, Jianmin

2017-01-01

We present a parallel Cartesian-grid-based time-dependent Schrödinger equation (TDSE) solver for modeling laser-atom interactions. It can simulate the single-electron dynamics of atoms in arbitrary time-dependent vector potentials. We use a split-operator method combined with fast Fourier transforms (FFT), on a three-dimensional (3D) Cartesian grid. Parallelization is realized using a 2D decomposition strategy based on the Message Passing Interface (MPI) library, which results in a good parallel scaling on modern supercomputers. We give simple applications for the hydrogen atom using the benchmark problems coming from the references and obtain repeatable results. The extensions to other laser-atom systems are straightforward with minimal modifications of the source code.
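The split-operator idea mentioned in the abstract can be sketched in one dimension as follows. This is a serial, minimal illustration under stated assumptions (atomic units with ħ = m = 1, a scalar potential rather than a vector potential, a naive O(N²) discrete Fourier transform in place of an FFT library, and no MPI decomposition); it is not the PCTDSE code, and the names `dft`, `split_operator_step`, and `norm` are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (stand-in for an FFT library call)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        s = 0j
        for j in range(n):
            s += x[j] * cmath.exp(sign * 2j * math.pi * j * k / n)
        out.append(s / n if inverse else s)
    return out


def split_operator_step(psi, potential, dx, dt):
    """Advance psi by one time step dt with Strang splitting (V/2, T, V/2)."""
    n = len(psi)
    # Half step in the potential, applied pointwise in position space.
    half_v = [cmath.exp(-0.5j * v * dt) for v in potential]
    psi = [h * p for h, p in zip(half_v, psi)]
    # Full kinetic step, applied pointwise in momentum space.
    psi_k = dft(psi)
    for m in range(n):
        freq = m if m < n // 2 else m - n      # signed frequency index
        k = 2.0 * math.pi * freq / (n * dx)    # wavenumber on the periodic grid
        psi_k[m] *= cmath.exp(-0.5j * k * k * dt)
    psi = dft(psi_k, inverse=True)
    # Second half step in the potential.
    return [h * p for h, p in zip(half_v, psi)]


def norm(psi, dx):
    """Discrete approximation of the integral of |psi|^2."""
    return sum(abs(p) ** 2 for p in psi) * dx
```

Each factor applied in the step is a pure phase, so the discrete norm should be preserved up to rounding error, which makes `norm` a convenient sanity check when experimenting with grid size and time step.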
Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

2000-01-01

The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

DOEpatents

Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E

2014-02-11

Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

DOEpatents

Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

2014-08-12

Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Parallel compression/decompression-based datapath architecture for multibeam mask writers

NASA Astrophysics Data System (ADS)

Chaudhary, Narendra; Savari, Serap A.

2017-06-01

Multibeam electron beam systems will be used in the future for mask writing and for complementary lithography. The major challenges of the multibeam systems are in meeting throughput requirements and in handling the large data volumes associated with writing grayscale data on the wafer. In terms of future communications and computational requirements, Amdahl's Law suggests that a simple increase of computation power and parallelism may not be a sustainable solution. We propose a parallel data compression algorithm to exploit the sparsity of mask data and a grayscale video-like representation of data. To improve the communication and computational efficiency of these systems at the write time, we propose an alternate datapath architecture partly motivated by multibeam direct write lithography and partly motivated by the circuit testing literature, where parallel decompression reduces clock cycles. We explain a deflection plate architecture inspired by NuFlare Technology's multibeam mask writing system and how our datapath architecture can be easily added to it to improve performance.
Parallel compression/decompression-based datapath architecture for multibeam mask writers

NASA Astrophysics Data System (ADS)

Chaudhary, Narendra; Savari, Serap A.

2017-10-01

Multibeam electron beam systems will be used in the future for mask writing and for complementary lithography. The major challenges of the multibeam systems are in meeting throughput requirements and in handling the large data volumes associated with writing grayscale data on the wafer. In terms of future communications and computational requirements, Amdahl's law suggests that a simple increase of computation power and parallelism may not be a sustainable solution. We propose a parallel data compression algorithm to exploit the sparsity of mask data and a grayscale video-like representation of data. To improve the communication and computational efficiency of these systems at the write time, we propose an alternate datapath architecture partly motivated by multibeam direct-write lithography and partly motivated by the circuit testing literature, where parallel decompression reduces clock cycles. We explain a deflection plate architecture inspired by NuFlare Technology's multibeam mask writing system and how our datapath architecture can be easily added to it to improve performance.
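The appeal to Amdahl's law can be made concrete with a short calculation. The sketch below evaluates the standard formula, speedup = 1 / ((1 - p) + p / n), for a workload whose parallelizable fraction is p on n processors; the function name `amdahl_speedup` and the sample fractions are illustrative assumptions and do not come from the paper.

```python
def amdahl_speedup(parallel_fraction, num_processors):
    """Ideal speedup predicted by Amdahl's law.

    parallel_fraction: share of the runtime that can be parallelized (0..1).
    num_processors:    number of processors applied to the parallel share.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if num_processors < 1:
        raise ValueError("num_processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_processors)


if __name__ == "__main__":
    # With 90% parallel work, the speedup saturates below 1 / 0.1 = 10
    # no matter how many processors are added.
    for n in (1, 8, 64, 1024):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

The serial fraction bounds the achievable speedup, which is the sense in which simply adding computation power and parallelism may not be sustainable.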
Energy Dependence of Electron-Scale Currents and Dissipation During Magnetopause Reconnection

NASA Astrophysics Data System (ADS)

Shuster, J. R.; Gershman, D. J.; Giles, B. L.; Dorelli, J.; Avanov, L. A.; Chen, L. J.; Wang, S.; Bessho, N.; Torbert, R. B.; Farrugia, C. J.; Argall, M. R.; Strangeway, R. J.; Schwartz, S. J.

2017-12-01

We investigate the electron-scale physics of reconnecting current structures observed at the magnetopause during Phase 1B of the Magnetospheric Multiscale (MMS) mission when the spacecraft separation was less than 10 km. Using single-spacecraft measurements of the current density vector Jplasma = en(vi - ve) enabled by the accuracy of the Fast Plasma Investigation (FPI) electron moments as demonstrated by Phan et al. [2016], we consider perpendicular (J⊥1 and J⊥2) and parallel (J//) currents and their corresponding kinetic electron signatures. These currents can correspond to a variety of structures in the electron velocity distribution functions measured by FPI, including perpendicular and parallel crescents like those first reported by Burch et al. [2016], parallel electron beams, counter-streaming electron populations, or sometimes simply a bulk velocity shift. By integrating the distribution function over only its angular dimensions, we compute energy-dependent 'partial' moments and employ them to characterize the energy dependence of velocities, currents, and dissipation associated with magnetic reconnection diffusion regions caught by MMS. Our technique aids in visualizing and elucidating the plasma energization mechanisms that operate during collisionless reconnection.
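The single-spacecraft current density J = en(vi - ve) and its split into components parallel and perpendicular to the magnetic field can be sketched as below. This is a minimal illustration in SI units; the function names are hypothetical, the perpendicular part is returned as a single vector rather than in the J⊥1/J⊥2 basis used by the authors (which this record does not define), and the sample values in the test are invented for the example rather than MMS data.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge in coulombs


def current_density(n, v_ion, v_electron):
    """J = e * n * (v_i - v_e); n in m^-3, velocities in m/s, result in A/m^2."""
    return tuple(E_CHARGE * n * (vi - ve) for vi, ve in zip(v_ion, v_electron))


def split_parallel_perp(j, b):
    """Return (J_parallel, J_perp_vector) relative to the magnetic field b."""
    b_mag = math.sqrt(sum(c * c for c in b))
    if b_mag == 0.0:
        raise ValueError("magnetic field must be non-zero")
    b_hat = tuple(c / b_mag for c in b)
    # Signed parallel component: projection of J onto the field direction.
    j_par = sum(jc * bc for jc, bc in zip(j, b_hat))
    # Perpendicular remainder after removing the parallel projection.
    j_perp = tuple(jc - j_par * bc for jc, bc in zip(j, b_hat))
    return j_par, j_perp
```

For example, electrons streaming anti-parallel to the field with stationary ions give a purely field-aligned current with a vanishing perpendicular part.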
Turbulence-driven anisotropic electron tail generation during magnetic reconnection

NASA Astrophysics Data System (ADS)

DuBois, A. M.; Scherer, A.; Almagri, A. F.; Anderson, J. K.; Pandya, M. D.; Sarff, J. S.

2018-05-01

Magnetic reconnection (MR) plays an important role in particle transport, energization, and acceleration in space, astrophysical, and laboratory plasmas. In the Madison Symmetric Torus reversed field pinch, discrete MR events release large amounts of energy from the equilibrium magnetic field, a fraction of which is transferred to electrons and ions. Previous experiments revealed an anisotropic electron tail that favors the perpendicular direction and is symmetric in the parallel. New profile measurements of x-ray emission show that the tail distribution is localized near the magnetic axis, consistent with modeling of the bremsstrahlung emission. The tail appears first near the magnetic axis and then spreads radially, and the dynamics in the anisotropy and diffusion are discussed. The data presented imply that the electron tail formation likely results from a turbulent wave-particle interaction and provide evidence that high energy electrons are escaping the core-localized region through pitch angle scattering into the parallel direction, followed by stochastic parallel transport to the plasma edge. New measurements also show a strong correlation between high energy x-ray measurements and tearing mode dynamics, suggesting that the coupling between core and edge tearing modes is essential for energetic electron tail formation.
The calibration of plane parallel ionisation chambers for the measurement of absorbed dose in electron beams of low to medium energies. Part 2: The PTW/MARKUS chamber.

PubMed

Cross, P; Freeman, N

1997-06-01

The purpose of Part 2 of this study of calibration methods for plane parallel ionisation chambers was to determine the feasibility of calibrating the MARKUS chamber in beams other than the standard AAPM TG39 reference beams of 60Co and a high energy electron beam (E0 ≥ 15 MeV). A previous study of the NACP chamber had demonstrated an acceptable level of accuracy with corresponding spread of -0.5% to +0.8% for its calibration in non-standard situations (medium to low energy electron and photon beams). For non-standard situations the spread in NDMARKUS values was found to be ±2.5%. The results suggest that user calibrations of the MARKUS chamber in non-standard situations are associated with more uncertainties than is the case with the NACP chamber.
Solution-processed parallel tandem polymer solar cells using silver nanowires as intermediate electrode.

PubMed

Guo, Fei; Kubis, Peter; Li, Ning; Przybilla, Thomas; Matt, Gebhard; Stubhan, Tobias; Ameri, Tayebeh; Butz, Benjamin; Spiecker, Erdmann; Forberich, Karen; Brabec, Christoph J

2014-12-23

Tandem architecture is the most relevant concept to overcome the efficiency limit of single-junction photovoltaic solar cells. Series-connected tandem polymer solar cells (PSCs) have advanced rapidly during the past decade. In contrast, the development of parallel-connected tandem cells is lagging far behind due to the big challenge in establishing an efficient interlayer with high transparency and high in-plane conductivity. Here, we report all-solution fabrication of parallel tandem PSCs using silver nanowires as intermediate charge collecting electrode. Through a rational interface design, a robust interlayer is established, enabling the efficient extraction and transport of electrons from subcells. The resulting parallel tandem cells exhibit high fill factors of ∼60% and enhanced current densities which are identical to the sum of the current densities of the subcells. These results suggest that solution-processed parallel tandem configuration provides an alternative avenue toward high performance photovoltaic devices.
Bit-parallel arithmetic in a massively-parallel associative processor

NASA Technical Reports Server (NTRS)

Scherson, Isaac D.; Kramer, David A.; Alleyne, Brian D.

1992-01-01

A simple but powerful new architecture based on a classical associative processor model is presented. Algorithms for performing the four basic arithmetic operations both for integer and floating point operands are described. For m-bit operands, the proposed architecture makes it possible to execute complex operations in O(m) cycles as opposed to O(m^2) for bit-serial machines. A word-parallel, bit-parallel, massively-parallel computing system can be constructed using this architecture with VLSI technology. The operation of this system is demonstrated for the fast Fourier transform and matrix multiplication.
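The gap between linear and quadratic cycle counts for m-bit operands can be illustrated with a toy cost model of shift-and-add multiplication. The sketch assumes that a bit-serial machine spends m cycles on each m-bit addition while a bit-parallel machine spends one; this cost model and the name `shift_add_multiply` are assumptions made for illustration, not a description of the associative processor in the paper.

```python
def shift_add_multiply(a, b, m, bit_parallel):
    """Multiply two m-bit unsigned integers by shift-and-add.

    Returns (product, cycles) under a toy cost model: every multiplier bit
    takes one addition slot, costing 1 cycle on a bit-parallel machine and
    m cycles on a bit-serial one.
    """
    if not (0 <= a < (1 << m) and 0 <= b < (1 << m)):
        raise ValueError("operands must fit in m bits")
    add_cost = 1 if bit_parallel else m
    product = 0
    cycles = 0
    for i in range(m):
        if (b >> i) & 1:
            product += a << i   # add the shifted partial product
        cycles += add_cost      # one addition slot per multiplier bit
    return product, cycles
```

Under this model the bit-parallel count grows as m and the bit-serial count as m squared, matching the orders of growth quoted in the abstract.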
Progress on complementary patterning using plasmon-excited electron beamlets (Conference Presentation)

NASA Astrophysics Data System (ADS)

Du, Zhidong; Chen, Chen; Pan, Liang

2017-04-01

Maskless lithography using parallel electron beamlets is a promising solution for next generation scalable maskless nanolithography. Researchers have focused on this goal but have been unable to find a robust technology to generate and control high-quality electron beamlets with satisfactory brightness and uniformity. In this work, we will aim to address this challenge by developing a revolutionary surface-plasmon-enhanced-photoemission (SPEP) technology to generate massively-parallel electron beamlets for maskless nanolithography. The new technology is built upon our recent breakthroughs in plasmonic lenses, which will be used to excite and focus surface plasmons to generate massively-parallel electron beamlets through photoemission. Specifically, the proposed SPEP device consists of an array of plasmonic lens and electrostatic micro-lens pairs, each pair independently producing an electron beamlet. During lithography, a spatial optical modulator will dynamically project light onto individual plasmonic lenses to control the switching and brightness of electron beamlets. The photons incident onto each plasmonic lens are concentrated into a diffraction-unlimited spot as localized surface plasmons to excite the local electrons to near their vacuum levels. Meanwhile, the electrostatic micro-lens extracts the excited electrons to form a focused beamlet, which can be rastered across a wafer to perform lithography. Studies showed that surface plasmons can enhance the photoemission by orders of magnitude. This SPEP technology can scale up the maskless lithography process to write at wafers per hour. In this talk, we will report the mechanism of the strong electron-photon couplings and the locally enhanced photoexcitation, design of a SPEP device, overview of our proof-of-concept study, and demonstrated parallel lithography of 20-50 nm features.
MMS observations and hybrid simulations of rippled and reforming quasi-parallel shocks

NASA Astrophysics Data System (ADS)

Gingell, I.; Schwartz, S. J.; Burgess, D.; Johlander, A.; Russell, C. T.; Burch, J. L.; Ergun, R.; Fuselier, S. A.; Gershman, D. J.; Giles, B. L.; Goodrich, K.; Khotyaintsev, Y. V.; Lavraud, B.; Lindqvist, P. A.; Strangeway, R. J.; Trattner, K. J.; Torbert, R. B.; Wilder, F. D.

2017-12-01

Surface ripples, i.e. deviations in the nominal local shock orientation, are expected to propagate in the ramp and overshoot of collisionless shocks. These ripples have typically been associated with observations and simulations of quasi-perpendicular shocks. We present observations of a crossing of Earth's marginally quasi-parallel (θBn ~ 45°) bow shock by the MMS spacecraft on 2015-11-27 06:01:44 UTC, for which we identify signatures consistent with a propagating surface ripple. In order to demonstrate the differences between ripples at quasi-perpendicular and quasi-parallel shocks, we also present two-dimensional hybrid simulations over a range of shock normal angles θBn under the observed solar wind conditions. We show that in the quasi-parallel cases surface ripples are transient phenomena modulated by the cyclic reformation of the shock front. These ripples develop faster than an ion gyroperiod and only during the period of the reformation cycle when a newly developed shock ramp is unaffected by turbulence in the foot. We conclude that the change of properties of the surface ripple observed by MMS while crossing Earth's quasi-parallel bow shock is consistent with the influence of cyclic reformation on shock structure. Given that both surface ripples and cyclic reformation are expected to affect the acceleration of electrons within the shock, the interaction of these phenomena and any other sources of shock non-stationarity are important for models of particle acceleration. We therefore discuss signatures of electron heating and acceleration in several rippled shocks observed by MMS.
Are supernova remnants quasi-parallel or quasi-perpendicular accelerators

NASA Technical Reports Server (NTRS)

Spangler, S. R.; Leckband, J. A.; Cairns, I. H.

1989-01-01

Observations of shock waves in the solar system which show a pronounced difference in the plasma wave and particle environment depending on whether the shock is propagating along or perpendicular to the interplanetary magnetic field are discussed. Theories for particle acceleration developed for quasi-parallel and quasi-perpendicular shocks, when extended to the interstellar medium suggest that the relativistic electrons in radio supernova remnants are accelerated by either the Q parallel or Q perpendicular mechanisms. A model for the galactic magnetic field and published maps of supernova remnants were used to search for a dependence of structure on the angle Phi. Results show no tendency for the remnants as a whole to favor the relationship expected for either mechanism, although individual sources resemble model remnants of one or the other acceleration process.
A scalable parallel black oil simulator on distributed memory parallel computers

NASA Astrophysics Data System (ADS)

Wang, Kun; Liu, Hui; Chen, Zhangxin

2015-11-01

This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.
Scaling of Electron Heating During Magnetic Reconnection

NASA Astrophysics Data System (ADS)

Ohia, O.; Le, A.; Daughton, W. S.; Egedal, J.

2016-12-01

While magnetic reconnection plays a major role in accelerating and heating magnetospheric plasma, it remains poorly understood how the level of particle energization depends on the plasma conditions. Meanwhile, a recent survey of THEMIS magnetopause reconnection observations [Phan et al. GRL 2013] and a numerical study [Shay et al. PoP 2014] found empirically that the electron heating scales with the square of the upstream Alfven speed. Equivalently for weak guide fields, the fractional electron temperature increase is inversely proportional to the upstream electron beta (ratio of electron to magnetic pressure). We present models for symmetric reconnection with moderate [Ohia et al., GRL 2015] or zero guide field that predict the electron bulk heating. In the models, adiabatically trapped electrons gain energy from parallel electric fields in the inflowing region. For purely anti-parallel reconnection, meandering electrons receive additional energy from the reconnection electric field. The predicted scalings are in quantitative agreement with fluid and kinetic simulations, as well as spacecraft observations. Using kinetic simulations, we extend this work to explore how the layer dynamics and electron bulk heating vary as functions of the magnetic shear and plasma and magnetic pressure asymmetry across the reconnection layer. These results are pertinent to recent Magnetospheric Multiscale (MMS) Mission measurements of electron dynamics during dayside magnetopause reconnection.
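The equivalence stated in this abstract between the Alfvén-speed scaling and the inverse-beta scaling can be checked in one line. This is a sketch under stated assumptions: temperature is in energy units, n, B and m_i are the upstream density, magnetic field and ion mass, and α is a hypothetical proportionality constant that the abstract does not name.

```latex
\Delta T_e = \alpha\, m_i V_A^2, \qquad
V_A^2 = \frac{B^2}{\mu_0 n m_i}, \qquad
\beta_e = \frac{n T_e}{B^2 / 2\mu_0}
\;\Longrightarrow\;
\frac{\Delta T_e}{T_e} = \alpha\,\frac{B^2}{\mu_0 n T_e} = \frac{2\alpha}{\beta_e}.
```

The ion mass cancels, so the fractional temperature increase depends on the upstream conditions only through the electron beta.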
Highly parallel implementation of non-adiabatic Ehrenfest molecular dynamics

NASA Astrophysics Data System (ADS)

Kanai, Yosuke; Schleife, Andre; Draeger, Erik; Anisimov, Victor; Correa, Alfredo

2014-03-01

While the adiabatic Born-Oppenheimer approximation tremendously lowers computational effort, many questions in modern physics, chemistry, and materials science require an explicit description of coupled non-adiabatic electron-ion dynamics. Electronic stopping, i.e. the energy transfer of a fast projectile atom to the electronic system of the target material, is a notorious example. We recently implemented real-time time-dependent density functional theory based on the plane-wave pseudopotential formalism in the Qbox/qb@ll codes. We demonstrate that explicit integration using a fourth-order Runge-Kutta scheme is very suitable for modern highly parallelized supercomputers. Applying the new implementation to systems with hundreds of atoms and thousands of electrons, we achieved excellent performance and scalability on a large number of nodes both on the BlueGene based "Sequoia" system at LLNL as well as the Cray architecture of "Blue Waters" at NCSA. As an example, we discuss our work on computing the electronic stopping power of aluminum and gold for hydrogen projectiles, showing an excellent agreement with experiment. These first-principles calculations allow us to gain important insight into the fundamental physics of electronic stopping.
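For readers unfamiliar with the explicit fourth-order Runge-Kutta scheme mentioned in this abstract, the following is a minimal sketch on a scalar model problem. It is not the Qbox/qb@ll implementation: the function names `rk4_step` and `propagate` are hypothetical, and the test equation dy/dt = -i·y merely stands in for a time-dependent Schrödinger-type equation.

```python
import cmath


def rk4_step(f, t, y, h):
    """Advance y(t) by one classical fourth-order Runge-Kutta step of size h."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def propagate(f, y0, t_end, steps):
    """Integrate dy/dt = f(t, y) from t = 0 to t_end with a fixed step count."""
    h = t_end / steps
    t, y = 0.0, y0
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


# Model problem (an assumption for illustration): dy/dt = -i*y, exact solution exp(-i*t).
y_final = propagate(lambda t, y: -1j * y, 1 + 0j, 1.0, 100)
error = abs(y_final - cmath.exp(-1j))
```

Each step needs only four evaluations of the right-hand side and no linear solve, which is one plausible reason an explicit scheme of this kind maps well onto many nodes.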
Parallel algorithms for mapping pipelined and parallel computations

NASA Technical Reports Server (NTRS)

Nicol, David M.

1988-01-01

Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm^3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm^2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
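To make the mapping problem concrete, here is a minimal sketch of one simple variant: a chain of m modules with integer execution costs is split into at most n contiguous groups so that the most heavily loaded processor is as light as possible. This is a generic binary-search-with-greedy-probe approach, not the algorithm from the paper, and the names `fits` and `min_bottleneck` are hypothetical.

```python
def fits(weights, n_procs, cap):
    """Greedy probe: can the chain be cut into at most n_procs groups of load <= cap?"""
    used, load = 1, 0
    for w in weights:
        if w > cap:
            return False
        if load + w > cap:
            used += 1      # start a new processor with this module
            load = w
        else:
            load += w
    return used <= n_procs


def min_bottleneck(weights, n_procs):
    """Smallest achievable maximum load over contiguous partitions (integer weights)."""
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if fits(weights, n_procs, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


bottleneck = min_bottleneck([4, 2, 7, 1, 5, 3], 3)
```

For the six-module chain above on three processors, the groups (4, 2), (7, 1) and (5, 3) give a bottleneck of 8, and no contiguous three-way split does better.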
RRAM-based parallel computing architecture using k-nearest neighbor classification for pattern recognition

NASA Astrophysics Data System (ADS)

Jiang, Yuning; Kang, Jinfeng; Wang, Xinan

2017-03-01

Resistive switching memory (RRAM) is considered as one of the most promising devices for parallel computing solutions that may overcome the von Neumann bottleneck of today’s electronic systems. However, the existing RRAM-based parallel computing architectures suffer from practical problems such as device variations and extra computing circuits. In this work, we propose a novel parallel computing architecture for pattern recognition by implementing k-nearest neighbor classification on metal-oxide RRAM crossbar arrays. Metal-oxide RRAM with gradual RESET behaviors is chosen as both the storage and computing components. The proposed architecture is tested by the MNIST database. High speed (~100 ns per example) and high recognition accuracy (97.05%) are obtained. The influence of several non-ideal device properties is also discussed, and it turns out that the proposed architecture shows great tolerance to device variations. This work paves a new way to achieve RRAM-based parallel computing hardware systems with high performance.
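The classification rule that this abstract maps onto RRAM crossbars can be sketched in ordinary software. The example below is only an illustration of k-nearest neighbor voting with squared Euclidean distance; it says nothing about the hardware realization, and `knn_classify` and the toy data are hypothetical.

```python
from collections import Counter


def knn_classify(train, query, k=3):
    """Label a query by majority vote among its k nearest training vectors."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(train, key=lambda item: sq_dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy two-class data set (an assumption for illustration, not MNIST).
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
label_near_origin = knn_classify(train, (0.2, 0.1))
label_far = knn_classify(train, (5.5, 5.5))
```

The distance computations for different training vectors are independent of one another, which is the part a parallel architecture can evaluate simultaneously.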
Parallel computation with molecular-motor-propelled agents in nanofabricated networks.

PubMed

Nicolau, Dan V; Lard, Mercy; Korten, Till; van Delft, Falco C M J M; Persson, Malin; Bengtsson, Elina; Månsson, Alf; Diez, Stefan; Linke, Heiner; Nicolau, Dan V

2016-03-08

The combinatorial nature of many important mathematical problems, including nondeterministic-polynomial-time (NP)-complete problems, places a severe limitation on the problem size that can be solved with conventional, sequentially operating electronic computers. There have been significant efforts in conceiving parallel-computation approaches in the past, for example: DNA computation, quantum computation, and microfluidics-based computation. However, these approaches have not proven, so far, to be scalable and practical from a fabrication and operational perspective. Here, we report the foundations of an alternative parallel-computation system in which a given combinatorial problem is encoded into a graphical, modular network that is embedded in a nanofabricated planar device. Exploring the network in a parallel fashion using a large number of independent, molecular-motor-propelled agents then solves the mathematical problem. This approach uses orders of magnitude less energy than conventional computers, thus addressing issues related to power consumption and heat dissipation. We provide a proof-of-concept demonstration of such a device by solving, in a parallel fashion, the small instance {2, 5, 9} of the subset sum problem, which is a benchmark NP-complete problem. Finally, we discuss the technical advances necessary to make our system scalable with presently available technology.
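For reference, the subset sum instance {2, 5, 9} from this abstract can be enumerated on a conventional sequential computer as follows. This sketch only shows what the problem asks; it does not model the molecular-motor network, and the function names are hypothetical.

```python
def reachable_sums(values):
    """Return every total obtainable by summing some subset of values."""
    sums = {0}  # the empty subset
    for v in values:
        sums |= {s + v for s in sums}
    return sums


def has_subset_with_sum(values, target):
    return target in reachable_sums(values)


all_sums = sorted(reachable_sums([2, 5, 9]))
```

The set of candidate subsets doubles with each added number, which is the combinatorial growth the abstract refers to.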

Parallel computing works

DOE Office of Scientific and Technical Information (OSTI.GOV)

Not Available

An account of the Caltech Concurrent Computation Program (C³P), a five year project that focused on answering the question: "Can parallel computers be used to do large-scale scientific computations?" As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C³P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C³P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.
High-performance parallel interface to synchronous optical network gateway

DOEpatents

St. John, Wallace B.; DuBois, David H.

1996-01-01

A system of sending and receiving gateways interconnects high speed data interfaces, e.g., HIPPI interfaces, through fiber optic links, e.g., a SONET network. An electronic stripe distributor distributes bytes of data from a first interface at the sending gateway onto parallel fiber optics of the fiber optic link to form transmitted data. An electronic stripe collector receives the transmitted data on the parallel fiber optics and reforms the data into a format effective for input to a second interface at the receiving gateway. Preferably, an error correcting syndrome is constructed at the sending gateway and sent with a data frame so that transmission errors can be detected and corrected on a real-time basis. Since the high speed data interface operates faster than any of the fiber optic links, the transmission rate must be adapted to match the available number of fiber optic links, so the sending and receiving gateways monitor the availability of fiber links and adjust the data throughput accordingly. In another aspect, the receiving gateway must have sufficient available buffer capacity to accept an incoming data frame. A credit-based flow control system provides for continuously updating the sending gateway on the available buffer capacity at the receiving gateway.
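The stripe distributor and stripe collector described in this abstract can be pictured with a small sketch. It assumes a plain round-robin assignment of bytes to links, which the abstract does not specify, and it omits the error-correcting syndrome and the credit-based flow control; `stripe` and `collect` are hypothetical names.

```python
def stripe(data, n_links):
    """Distribute bytes round-robin: byte i goes to link i % n_links."""
    return [data[i::n_links] for i in range(n_links)]


def collect(stripes, total_len):
    """Reassemble the original byte order from the per-link streams."""
    n_links = len(stripes)
    out = bytearray(total_len)
    for i, s in enumerate(stripes):
        out[i::n_links] = s  # put link i's bytes back at positions i, i+n, ...
    return bytes(out)


frame = b"HIPPI-over-SONET"
per_link = stripe(frame, 3)
rebuilt = collect(per_link, len(frame))
```

If the number of available links changes, calling `stripe` with a different `n_links` changes how many bytes each link carries per frame, which is the sense in which throughput follows link availability.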
High-performance parallel interface to synchronous optical network gateway

DOEpatents

St. John, W.B.; DuBois, D.H.

1996-12-03

Disclosed is a system of sending and receiving gateways that interconnects high speed data interfaces, e.g., HIPPI interfaces, through fiber optic links, e.g., a SONET network. An electronic stripe distributor distributes bytes of data from a first interface at the sending gateway onto parallel fiber optics of the fiber optic link to form transmitted data. An electronic stripe collector receives the transmitted data on the parallel fiber optics and reforms the data into a format effective for input to a second interface at the receiving gateway. Preferably, an error correcting syndrome is constructed at the sending gateway and sent with a data frame so that transmission errors can be detected and corrected on a real-time basis. Since the high speed data interface operates faster than any of the fiber optic links, the transmission rate must be adapted to match the available number of fiber optic links, so the sending and receiving gateways monitor the availability of fiber links and adjust the data throughput accordingly. In another aspect, the receiving gateway must have sufficient available buffer capacity to accept an incoming data frame. A credit-based flow control system provides for continuously updating the sending gateway on the available buffer capacity at the receiving gateway. 7 figs.
Characterization of reticulated vitreous carbon foam using a Frisch-grid parallel-plate ionization chamber

NASA Astrophysics Data System (ADS)

Edwards, Nathaniel S.; Conley, Jerrod C.; Reichenberger, Michael A.; Nelson, Kyle A.; Tiner, Christopher N.; Hinson, Niklas J.; Ugorowski, Philip B.; Fronk, Ryan G.; McGregor, Douglas S.

2018-06-01

The propagation of electrons through several linear pore densities of reticulated vitreous carbon (RVC) foam was studied using a Frisch-grid parallel-plate ionization chamber pressurized to 1 psig of P-10 proportional gas. The operating voltages of the electrodes contained within the Frisch-grid parallel-plate ionization chamber were defined by measuring counting curves using a collimated 241Am alpha-particle source with and without a Frisch grid. RVC foam samples with linear pore densities of 5, 10, 20, 30, 45, 80, and 100 pores per linear inch were separately positioned between the cathode and anode. Pulse-height spectra and count rates from a collimated 241Am alpha-particle source positioned between the cathode and each RVC foam sample were measured and compared to a measurement without an RVC foam sample. The Frisch grid was positioned in between the RVC foam sample and the anode. The measured pulse-height spectra were indiscernible from background and resulted in negligible net count rates for all RVC foam samples. The Frisch grid parallel-plate ionization chamber measurement results indicate that electrons do not traverse the bulk of RVC foam and consequently do not produce a pulse.
Modelling and simulation of parallel triangular triple quantum dots (TTQD) by using SIMON 2.0

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fathany, Maulana Yusuf, E-mail: myfathany@gmail.com; Fuada, Syifaul, E-mail: fsyifaul@gmail.com; Lawu, Braham Lawas, E-mail: bram-labs@rocketmail.com

2016-04-19

This research presents an analysis of the modeling of Parallel Triple Quantum Dots (TQD) using SIMON (SIMulation Of Nano-structures). The Single Electron Transistor (SET) is used as the basic concept of the modeling. We design the Parallel TQD structure from metal with a triangular geometry, which we call Triangular Triple Quantum Dots (TTQD). We simulate it under several scenarios with different parameters, such as different capacitance values, various gate voltages, and different thermal conditions.
Parallelized implicit propagators for the finite-difference Schrödinger equation

NASA Astrophysics Data System (ADS)

Parker, Jonathan; Taylor, K. T.

1995-08-01

We describe the application of block Gauss-Seidel and block Jacobi iterative methods to the design of implicit propagators for finite-difference models of the time-dependent Schrödinger equation. The block-wise iterative methods discussed here are mixed direct-iterative methods for solving simultaneous equations, in the sense that direct methods (e.g. LU decomposition) are used to invert certain block sub-matrices, and iterative methods are used to complete the solution. We describe parallel variants of the basic algorithm that are well suited to the medium- to coarse-grained parallelism of work-station clusters, and MIMD supercomputers, and we show that under a wide range of conditions, fine-grained parallelism of the computation can be achieved. Numerical tests are conducted on a typical one-electron atom Hamiltonian. The methods converge robustly to machine precision (15 significant figures), in some cases in as few as 6 or 7 iterations. The rate of convergence is nearly independent of the finite-difference grid-point separations.
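The mixed direct-iterative idea described in this abstract can be sketched as a block Jacobi iteration in which each diagonal block is solved directly and the coupling between blocks is handled iteratively. This is a minimal illustration on a small real, diagonally dominant tridiagonal system, not the authors' propagator (which would involve a complex matrix); `solve_dense` and `block_jacobi` are hypothetical names, and Gaussian elimination stands in for an LU factorization.

```python
def solve_dense(a, b):
    """Direct solve of a small dense system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def block_jacobi(a, b, block, iters=200, tol=1e-12):
    """Block Jacobi: direct solves on diagonal blocks, iteration on the coupling."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x_new = x[:]
        for s in range(0, n, block):          # each block is independent of the others
            e = min(s + block, n)
            rhs = [b[i] - sum(a[i][j] * x[j] for j in range(n) if not s <= j < e)
                   for i in range(s, e)]
            diag = [[a[i][j] for j in range(s, e)] for i in range(s, e)]
            x_new[s:e] = solve_dense(diag, rhs)
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x


# Tridiagonal test matrix (4 on the diagonal, -1 beside it); exact solution is all ones.
n = 6
A = [[4.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
rhs = [sum(row) for row in A]
solution = block_jacobi(A, rhs, block=2)
```

Because every block update in one sweep uses only the previous iterate, the block solves can run concurrently, which is the coarse-grained parallelism the abstract has in mind. A block Gauss-Seidel variant would instead use already-updated blocks within the same sweep.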
Template based parallel checkpointing in a massively parallel computer system

DOEpatents

Archer, Charles Jens [Rochester, MN]; Inglett, Todd Alan [Rochester, MN]

2009-01-13

A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
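The comparison against a template checkpoint can be illustrated with a simplified sketch. It assumes fixed, aligned blocks and SHA-256 checksums, which is cruder than the rolling-checksum matching of rsync and is not taken from the patent; the block size and all function names are hypothetical, and compression and broadcast are left out.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny so the example is easy to follow


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def checksum(block):
    return hashlib.sha256(block).hexdigest()


def checkpoint_delta(node_data, template):
    """Return {block index: block} for blocks whose checksum differs from the template."""
    template_sums = [checksum(b) for b in split_blocks(template)]
    delta = {}
    for i, block in enumerate(split_blocks(node_data)):
        if i >= len(template_sums) or checksum(block) != template_sums[i]:
            delta[i] = block
    return delta


def restore(template, delta, length):
    """Rebuild a node's checkpoint from the template plus its stored blocks."""
    blocks = split_blocks(template)
    for i in sorted(delta):
        if i < len(blocks):
            blocks[i] = delta[i]
        else:
            blocks.append(delta[i])
    return b"".join(blocks)[:length]


template = b"AAAABBBBCCCCDDDD"
node = b"AAAAXBBBCCCCDDDDEE"
delta = checkpoint_delta(node, template)
recovered = restore(template, delta, len(node))
```

Only the blocks in `delta` would need to be transmitted and stored for this node, which is where the reduction in checkpoint data comes from when most nodes hold nearly identical state.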
Plasma and energetic particle structure of a collisionless quasi-parallel shock

NASA Technical Reports Server (NTRS)

Kennel, C. F.; Scarf, F. L.; Coroniti, F. V.; Russell, C. T.; Smith, E. J.; Wenzel, K. P.; Reinhard, R.; Sanderson, T. R.; Feldman, W. C.; Parks, G. K.

1983-01-01

The quasi-parallel interplanetary shock of November 11-12, 1978 was studied from both the collisionless shock and energetic particle points of view using measurements of the interplanetary magnetic and electric fields, solar wind electrons, plasma and MHD waves, and intermediate and high energy ions obtained on ISEE-1, -2, and -3. The interplanetary environment through which the shock was propagating when it encountered the three spacecraft was characterized; the observations of this shock are documented and current theories of quasi-parallel shock structure and particle acceleration are tested. These observations tend to confirm present self consistent theories of first order Fermi acceleration by shocks and of collisionless shock dissipation involving firehose instability.
Research in parallel computing

NASA Technical Reports Server (NTRS)

Ortega, James M.; Henderson, Charles

1994-01-01

This report summarizes work on parallel computations for NASA Grant NAG-1-1529 for the period 1 Jan. - 30 June 1994. Short summaries on highly parallel preconditioners, target-specific parallel reductions, and simulation of delta-cache protocols are provided.
Distinct Particle Morphologies Revealed through Comparative Parallel Analyses of Retrovirus-Like Particles.

PubMed

Martin, Jessica L; Cao, Sheng; Maldonado, Jose O; Zhang, Wei; Mansky, Louis M

2016-09-15

The Gag protein is the main retroviral structural protein, and its expression alone is usually sufficient for production of virus-like particles (VLPs). In this study, we sought to investigate, in parallel comparative analyses, Gag cellular distribution, VLP size, and basic morphological features using Gag expression constructs (Gag or Gag-YFP, where YFP is yellow fluorescent protein) created from all representative retroviral genera: Alpharetrovirus, Betaretrovirus, Deltaretrovirus, Epsilonretrovirus, Gammaretrovirus, Lentivirus, and Spumavirus. We analyzed Gag cellular distribution by confocal microscopy, VLP budding by thin-section transmission electron microscopy (TEM), and general morphological features of the VLPs by cryogenic transmission electron microscopy (cryo-TEM). Punctate Gag was observed near the plasma membrane for all Gag constructs tested except for the representative Beta- and Epsilonretrovirus Gag proteins. This is the first report of Epsilonretrovirus Gag localizing to the nucleus of HeLa cells. While VLPs were not produced by the representative Beta- and Epsilonretrovirus Gag proteins, the other Gag proteins produced VLPs as confirmed by TEM, and morphological differences were observed by cryo-TEM. In particular, we observed Deltaretrovirus-like particles with flat regions of electron density that did not follow viral membrane curvature, Lentivirus-like particles with a narrow range and consistent electron density, suggesting a tightly packed Gag lattice, and Spumavirus-like particles with large envelope protein spikes and no visible electron density associated with a Gag lattice. Taken together, these parallel comparative analyses demonstrate for the first time the distinct morphological features that exist among retrovirus-like particles. Investigation of these differences will provide greater insights into the retroviral assembly pathway. Comparative analysis among retroviruses has been critically important in enhancing our understanding of
An in situ Comparison of Electron Acceleration at Collisionless Shocks under Differing Upstream Magnetic Field Orientations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Masters, A.; Dougherty, M. K.; Sulaiman, A. H.

A leading explanation for the origin of Galactic cosmic rays is acceleration at high-Mach number shock waves in the collisionless plasma surrounding young supernova remnants. Evidence for this is provided by multi-wavelength non-thermal emission thought to be associated with ultrarelativistic electrons at these shocks. However, the dependence of the electron acceleration process on the orientation of the upstream magnetic field with respect to the local normal to the shock front (quasi-parallel/quasi-perpendicular) is debated. Cassini spacecraft observations at Saturn’s bow shock have revealed examples of electron acceleration under quasi-perpendicular conditions, and the first in situ evidence of electron acceleration at a quasi-parallel shock. Here we use Cassini data to make the first comparison between energy spectra of locally accelerated electrons under these differing upstream magnetic field regimes. We present data taken during a quasi-perpendicular shock crossing on 2008 March 8 and during a quasi-parallel shock crossing on 2007 February 3, highlighting that both were associated with electron acceleration to at least MeV energies. The magnetic signature of the quasi-perpendicular crossing has a relatively sharp upstream–downstream transition, and energetic electrons were detected close to the transition and immediately downstream. The magnetic transition at the quasi-parallel crossing is less clear, energetic electrons were encountered upstream and downstream, and the electron energy spectrum is harder above ∼100 keV. We discuss whether the acceleration is consistent with diffusive shock acceleration theory in each case, and suggest that the quasi-parallel spectral break is due to an energy-dependent interaction between the electrons and short, large-amplitude magnetic structures.
Collisionless dissipation processes in quasi-parallel shocks. [in solar wind

NASA Technical Reports Server (NTRS)

Quest, K. B.; Forslund, D. W.; Brackbill, J. U.; Lee, K.

1983-01-01

The evolution of collisionless, quasi-parallel shocks (the angle between the shock normal and the upstream magnetic field being less than 45 deg) is examined using two dimensional particle simulations. Reflected ions upstream from the shock are observed with average guiding center velocity and gyrational energy which agree well with the prediction of simple specular reflection. Strong ion heating through the shock ramp is apparently caused by large amplitude whistler turbulence. A flux of suprathermal electrons is also observed along the magnetic field direction. Much stronger ion heating occurs in the shock than electron heating. The relevance of this work to the earth's bow shock is discussed.
Electron trapping in rad-hard RCA IC's irradiated with electrons and gamma rays

NASA Technical Reports Server (NTRS)

Danchenko, V.; Brashears, S. S.; Fang, P. H.

1984-01-01

Enhanced electron trapping has been observed in n-channels of rad-hard CMOS devices due to electron and gamma-ray irradiation. Room-temperature annealing results in a positive shift in the threshold potential far beyond its initial value. The slope of the annealing curve immediately after irradiation was found to depend strongly on the gate bias applied during irradiation. Some dependence was also observed on the electron dose rate. No clear dependence on energy and shielding over a delidded device was observed. The threshold shift is probably due to electron trapping at the radiation-induced interface states and tunneling of electrons through the oxide-silicon energy barrier to fill the radiation-induced electron traps. A mathematical analysis, based on two parallel annealing kinetics, hole annealing and electron trapping, is applied to the data for various electron dose rates.
Parallel design patterns for a low-power, software-defined compressed video encoder

NASA Astrophysics Data System (ADS)

Bruns, Michael W.; Hunt, Martin A.; Prasad, Durga; Gunupudi, Nageswara R.; Sonachalam, Sekar

2011-06-01

Video compression algorithms such as H.264 offer much potential for parallel processing that is not always exploited by the technology of a particular implementation. Consumer mobile encoding devices often achieve real-time performance and low power consumption through parallel processing in Application Specific Integrated Circuit (ASIC) technology, but many other applications require a software-defined encoder. High quality compression features needed for some applications such as 10-bit sample depth or 4:2:2 chroma format often go beyond the capability of a typical consumer electronics device. An application may also need to efficiently combine compression with other functions such as noise reduction, image stabilization, real time clocks, GPS data, mission/ESD/user data or software-defined radio in a low power, field upgradable implementation. Low power, software-defined encoders may be implemented using a massively parallel memory-network processor array with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. A dataflow programming methodology may be used to express all of the encoding processes including motion compensation, transform and quantization, and entropy coding. This is a declarative programming model in which the parallelism of the compression algorithm is expressed as a hierarchical graph of tasks with message communication. Data parallel and task parallel design patterns are supported without the need for explicit global synchronization control. An example is described of an H.264 encoder developed for a commercially available, massively parallel memory-network processor device.
Re-forming supercritical quasi-parallel shocks. I - One- and two-dimensional simulations

NASA Technical Reports Server (NTRS)

Thomas, V. A.; Winske, D.; Omidi, N.

1990-01-01

The process of reforming supercritical quasi-parallel shocks is investigated using one-dimensional and two-dimensional hybrid (particle ion, massless fluid electron) simulations both of shocks and of simpler two-stream interactions. It is found that the supercritical quasi-parallel shock is not steady. Instead of a well-defined shock ramp between upstream and downstream states that remains at a fixed position in the flow, the ramp periodically steepens, broadens, and then reforms upstream of its former position. It is concluded that the wave generation process is localized at the shock ramp and that the reformation process proceeds in the absence of upstream perturbations intersecting the shock.
Experimental verification of the role of electron pressure in fast magnetic reconnection with a guide field

DOE PAGES

Fox, W.; Sciortino, F.; v. Stechow, A.; ...

2017-03-21

We report detailed laboratory observations of the structure of a reconnection current sheet in a two-fluid plasma regime with a guide magnetic field. We observe and quantitatively analyze the quadrupolar electron pressure variation in the ion-diffusion region, as originally predicted by extended magnetohydrodynamics simulations. The projection of the electron pressure gradient parallel to the magnetic field contributes significantly to balancing the parallel electric field, and the resulting cross-field electron jets in the reconnection layer are diamagnetic in origin. Furthermore, these results demonstrate how parallel and perpendicular force balance are coupled in guide field reconnection and confirm basic theoretical models of the importance of electron pressure gradients for obtaining fast magnetic reconnection.
The Galley Parallel File System

NASA Technical Reports Server (NTRS)

Nieuwejaar, Nils; Kotz, David

1996-01-01

As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. The interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. We discuss Galley's file structure and application interface, as well as an application that has been implemented using that interface.
Drift waves, intense parallel electric fields, and turbulence associated with asymmetric magnetic reconnection at the magnetopause

NASA Astrophysics Data System (ADS)

Ergun, R. E.; Chen, L.-J.; Wilder, F. D.; Ahmadi, N.; Eriksson, S.; Usanova, M. E.; Goodrich, K. A.; Holmes, J. C.; Sturner, A. P.; Malaspina, D. M.; Newman, D. L.; Torbert, R. B.; Argall, M. R.; Lindqvist, P.-A.; Burch, J. L.; Webster, J. M.; Drake, J. F.; Price, L.; Cassak, P. A.; Swisdak, M.; Shay, M. A.; Graham, D. B.; Strangeway, R. J.; Russell, C. T.; Giles, B. L.; Dorelli, J. C.; Gershman, D.; Avanov, L.; Hesse, M.; Lavraud, B.; Le Contel, O.; Retino, A.; Phan, T. D.; Goldman, M. V.; Stawarz, J. E.; Schwartz, S. J.; Eastwood, J. P.; Hwang, K.-J.; Nakamura, R.; Wang, S.

2017-04-01

Observations of magnetic reconnection at Earth's magnetopause often display asymmetric structures that are accompanied by strong magnetic field (B) fluctuations and large-amplitude parallel electric fields (E||). The B turbulence is most intense at frequencies above the ion cyclotron frequency and below the lower hybrid frequency. The B fluctuations are consistent with a thin, oscillating current sheet that is corrugated along the electron flow direction (along the X line), which is a type of electromagnetic drift wave. Near the X line, electron flow is primarily due to a Hall electric field, which diverts ion flow in asymmetric reconnection and accompanies the instability. Importantly, the drift waves appear to drive strong parallel currents which, in turn, generate large-amplitude (~100 mV/m) E|| in the form of nonlinear waves and structures. These observations suggest that turbulence may be common in asymmetric reconnection, penetrate into the electron diffusion region, and possibly influence the magnetic reconnection process.
Quasi-parallel whistler mode waves observed by THEMIS during near-earth dipolarizations

NASA Astrophysics Data System (ADS)

Le Contel, O.; Roux, A.; Jacquey, C.; Robert, P.; Berthomier, M.; Chust, T.; Grison, B.; Angelopoulos, V.; Sibeck, D.; Chaston, C. C.; Cully, C. M.; Ergun, B.; Glassmeier, K.-H.; Auster, U.; McFadden, J.; Carlson, C.; Larson, D.; Bonnell, J. W.; Mende, S.; Russell, C. T.; Donovan, E.; Mann, I.; Singer, H.

2009-06-01

We report on quasi-parallel whistler emissions detected by the near-earth satellites of the THEMIS mission before, during, and after local dipolarization. These emissions are associated with an electron temperature anisotropy α=T⊥e/T||e>1 consistent with the linear theory of whistler mode anisotropy instability. When the whistler mode emissions are observed, the measured electron anisotropy varies inversely with β||e (the ratio of the electron parallel pressure to the magnetic pressure) as predicted by Gary and Wang (1996). Narrow band whistler emissions correspond to the small α existing before dipolarization whereas the broad band emissions correspond to large α observed during and after dipolarization. The energy in the whistler mode is leaving the current sheet and is propagating along the background magnetic field, towards the Earth. A simple time-independent description based on Liouville's theorem indicates that the electron temperature anisotropy decreases with the distance along the magnetic field from the equator. Once this variation of α is taken into account, the linear theory predicts an equatorial origin for the whistler mode. The linear theory is also consistent with the observed bandwidth of wave emissions. Yet, the anisotropy required to be fully consistent with the observations is somewhat larger than the measured one. Although the discrepancy remains within the instrumental error bars, this could be due to time-dependent effects which have been neglected. The possible role of the whistler waves in the substorm process is discussed.
Parallel Density-Based Clustering for Discovery of Ionospheric Phenomena

NASA Astrophysics Data System (ADS)

Pankratius, V.; Gowanlock, M.; Blair, D. M.

2015-12-01

Ionospheric total electron content maps derived from global networks of dual-frequency GPS receivers can reveal a plethora of ionospheric features in real-time and are key to space weather studies and natural hazard monitoring. However, growing data volumes from expanding sensor networks are making manual exploratory studies challenging. As the community is heading towards Big Data ionospheric science, automation and Computer-Aided Discovery become indispensable tools for scientists. One problem of machine learning methods is that they require domain-specific adaptations in order to be effective and useful for scientists. Addressing this problem, our Computer-Aided Discovery approach allows scientists to express various physical models as well as perturbation ranges for parameters. The search space is explored through an automated system and parallel processing of batched workloads, which finds corresponding matches and similarities in empirical data. We discuss density-based clustering as a particular method we employ in this process. Specifically, we adapt Density-Based Spatial Clustering of Applications with Noise (DBSCAN). This algorithm groups geospatial data points based on density. Clusters of points can be of arbitrary shape, and the number of clusters is not predetermined by the algorithm; only two input parameters need to be specified: (1) a distance threshold, (2) a minimum number of points within that threshold. We discuss an implementation of DBSCAN for batched workloads that is amenable to parallelization on manycore architectures such as Intel's Xeon Phi accelerator with 60+ general-purpose cores. This manycore parallelization can cluster large volumes of ionospheric total electron content data quickly. Potential applications for cluster detection include the visualization, tracing, and examination of traveling ionospheric disturbances or other propagating phenomena. Acknowledgments. We acknowledge support from NSF ACI-1442997 (PI V. Pankratius).
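
The two DBSCAN inputs named in the abstract above (a distance threshold and a minimum number of points within that threshold) can be illustrated with a minimal serial sketch. This is not the authors' batched manycore implementation; the names `region_query`, `dbscan`, `eps`, and `min_pts` are assumptions chosen for illustration, and the brute-force neighbor search is used only to keep the example short.

```python
from math import dist  # Euclidean distance, Python 3.8+


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    eps     -- distance threshold (hypothetical parameter name)
    min_pts -- minimum number of points within eps for a core point
    """
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11),
              (50, 50)]
    print(dbscan(sample, eps=1.5, min_pts=3))
```

Because each run depends only on its own `(eps, min_pts)` pair and input batch, many such runs are independent of one another, which is one plausible reading of why batched workloads suit manycore hardware.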

Parallel digital forensics infrastructure.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liebrock, Lorie M.; Duggan, David Patrick

2009-10-01

This report documents the architecture and implementation of a Parallel Digital Forensics infrastructure. This infrastructure is necessary for supporting the design, implementation, and testing of new classes of parallel digital forensics tools. Digital Forensics has become extremely difficult with data sets of one terabyte and larger. The only way to overcome the processing time of these large sets is to identify and develop new parallel algorithms for performing the analysis. To support algorithm research, a flexible base infrastructure is required. A candidate architecture for this base infrastructure was designed, instantiated, and tested by this project, in collaboration with New Mexico Tech. Previous infrastructures were not designed and built specifically for the development and testing of parallel algorithms. With the size of forensics data sets only expected to increase significantly, this type of infrastructure support is necessary for continued research in parallel digital forensics. This report documents the implementation of the parallel digital forensics (PDF) infrastructure architecture and implementation.
Parallel processing and expert systems

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Lau, Sonie

1991-01-01

Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism of expert system. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited.
Electron heating in a Monte Carlo model of a high Mach number, supercritical, collisionless shock

NASA Technical Reports Server (NTRS)

Ellison, Donald C.; Jones, Frank C.

1987-01-01

Preliminary work in the investigation of electron injection and acceleration at parallel shocks is presented. A simple model of electron heating that is derived from a unified shock model which includes the effects of an electrostatic potential jump is described. The unified shock model provides a kinetic description of the injection and acceleration of ions and a fluid description of electron heating at high Mach number, supercritical, and parallel shocks.
Computer-Aided Parallelizer and Optimizer

NASA Technical Reports Server (NTRS)

Jin, Haoqiang

2011-01-01

The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.
Data communications in a parallel active messaging interface of a parallel computer

DOEpatents

Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

2013-11-12

Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call site statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.
Direct determination of k_Q factors for cylindrical and plane-parallel ionization chambers in high-energy electron beams from 6 MeV to 20 MeV.

PubMed

Krauss, A; Kapsch, R-P

2018-02-06

For the ionometric determination of the absorbed dose to water, D_w, in high-energy electron beams from a clinical accelerator, beam quality dependent correction factors, k_Q, are required. By using a water calorimeter, these factors can be determined experimentally and potentially with lower standard uncertainties than those of the calculated k_Q factors, which are tabulated in various dosimetry protocols. However, one of the challenges of water calorimetry in electron beams is the small measurement depths in water, together with the steep dose gradients present especially at lower energies. In this investigation, water calorimetry was implemented in electron beams to determine k_Q factors for different types of cylindrical and plane-parallel ionization chambers (NE2561, NE2571, FC65-G, TM34001) in 10 cm × 10 cm electron beams from 6 MeV to 20 MeV (corresponding beam quality index R_50 ranging from 1.9 cm to 7.5 cm). The measurements were carried out using the linear accelerator facility of the Physikalisch-Technische Bundesanstalt. Relative standard uncertainties for the k_Q factors between 0.50% for the 20 MeV beam and 0.75% for the 6 MeV beam were achieved. For electron energies above 8 MeV, general agreement was found between the relative electron energy dependencies of the k_Q factors measured and those derived from the AAPM TG-51 protocol and recent Monte Carlo-based studies, as well as those from other experimental investigations. However, towards lower energies, discrepancies of up to 2.0% occurred for the k_Q factors of the TM34001 and the NE2571 chamber.
Direct determination of k_Q factors for cylindrical and plane-parallel ionization chambers in high-energy electron beams from 6 MeV to 20 MeV

NASA Astrophysics Data System (ADS)

Krauss, A.; Kapsch, R.-P.

2018-02-01

For the ionometric determination of the absorbed dose to water, D_w, in high-energy electron beams from a clinical accelerator, beam quality dependent correction factors, k_Q, are required. By using a water calorimeter, these factors can be determined experimentally and potentially with lower standard uncertainties than those of the calculated k_Q factors, which are tabulated in various dosimetry protocols. However, one of the challenges of water calorimetry in electron beams is the small measurement depths in water, together with the steep dose gradients present especially at lower energies. In this investigation, water calorimetry was implemented in electron beams to determine k_Q factors for different types of cylindrical and plane-parallel ionization chambers (NE2561, NE2571, FC65-G, TM34001) in 10 cm × 10 cm electron beams from 6 MeV to 20 MeV (corresponding beam quality index R_50 ranging from 1.9 cm to 7.5 cm). The measurements were carried out using the linear accelerator facility of the Physikalisch-Technische Bundesanstalt. Relative standard uncertainties for the k_Q factors between 0.50% for the 20 MeV beam and 0.75% for the 6 MeV beam were achieved. For electron energies above 8 MeV, general agreement was found between the relative electron energy dependencies of the k_Q factors measured and those derived from the AAPM TG-51 protocol and recent Monte Carlo-based studies, as well as those from other experimental investigations. However, towards lower energies, discrepancies of up to 2.0% occurred for the k_Q factors of the TM34001 and the NE2571 chamber.
MMS Observations of Parallel Electric Fields During a Quasi-Perpendicular Bow Shock Crossing

NASA Astrophysics Data System (ADS)

Goodrich, K.; Schwartz, S. J.; Ergun, R.; Wilder, F. D.; Holmes, J.; Burch, J. L.; Gershman, D. J.; Giles, B. L.; Khotyaintsev, Y. V.; Le Contel, O.; Lindqvist, P. A.; Strangeway, R. J.; Russell, C.; Torbert, R. B.

2016-12-01

Previous observations of the terrestrial bow shock have frequently shown large-amplitude fluctuations in the parallel electric field. These parallel electric fields are seen as both nonlinear solitary structures, such as double layers and electron phase-space holes, and short-wavelength waves, which can reach amplitudes greater than 100 mV/m. The Magnetospheric Multi-Scale (MMS) Mission has crossed the Earth's bow shock more than 200 times. The parallel electric field signatures observed in these crossings are seen in very discrete packets and evolve over time scales of less than a second, indicating the presence of a wealth of kinetic-scale activity. The high time resolution of the Fast Particle Instrument (FPI) available on MMS offers greater detail of the kinetic-scale physics that occur at bow shocks than ever before, allowing greater insight into the overall effect of these observed electric fields. We present a characterization of these parallel electric fields found in a single bow shock event and how it reflects the kinetic-scale activity that can occur at the terrestrial bow shock.
Parallel Algorithms and Patterns

DOE Office of Scientific and Technical Information (OSTI.GOV)

Robey, Robert W.

2016-06-16

This is a PowerPoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include: Sorting, searching, optimization, matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are: Reductions, prefix scans, ghost cell updates. We only touch on parallel patterns in this presentation. The topic really deserves its own detailed discussion, which Gabe Rockefeller would like to develop.
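
Two of the patterns named in the abstract above, reductions and prefix scans, can be sketched as follows. This is an illustrative serial model, not material from the presentation: the function names `tree_reduce` and `inclusive_scan` are assumptions, and the scan shown is the Hillis-Steele formulation, one of several possible. In each pass every output element depends only on values from the previous pass, which is what makes the passes suitable for concurrent execution. Both functions assume `op` is associative.

```python
from operator import add


def tree_reduce(values, op):
    """Pairwise (tree) reduction: each pass combines independent adjacent pairs."""
    level = list(values)
    if not level:
        raise ValueError("cannot reduce an empty sequence")
    while len(level) > 1:
        # Every pair in this pass could be combined concurrently.
        nxt = [op(level[k], level[k + 1]) for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element carries over unchanged
        level = nxt
    return level[0]


def inclusive_scan(values, op):
    """Hillis-Steele inclusive scan: about log2(n) passes over the data."""
    out = list(values)
    stride = 1
    while stride < len(out):
        prev = out[:]  # read from the previous pass only
        for k in range(stride, len(out)):
            out[k] = op(prev[k - stride], prev[k])
        stride *= 2
    return out


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]
    print(tree_reduce(data, add))     # sum of all elements
    print(inclusive_scan(data, add))  # running totals
```

The tree reduction needs about log2(n) passes instead of n - 1 sequential steps, at the cost of requiring an associative operator so that regrouping does not change the result.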
Application Portable Parallel Library

NASA Technical Reports Server (NTRS)

Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

1995-01-01

Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also includes heterogeneous collection of networked computers.) Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.
Parallel Logic Programming and Parallel Systems Software and Hardware

DTIC Science & Technology

1989-07-29

Conference, Dallas TX. January 1985. (55) [Rous75] Roussel, P., "PROLOG: Manuel de Reference et d’Utilisation", Groupe d’Intelligence Artificielle, Universite d...completed. Tools were provided for software development using artificial intelligence techniques. AI software for massively parallel architectures was...using artificial intelligence techniques. AI software for massively parallel architectures was started. 1. Introduction We describe research conducted
Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R

Methods, apparatuses, and computer program products for endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface (`PAMI`) of a parallel computer are provided. Embodiments include establishing by a parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry. Embodiments also include registering in each endpoint in the geometry a dispatch callback function for a collective operation and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
Observations of ionospheric electron beams in the plasma sheet.

PubMed

Zheng, H; Fu, S Y; Zong, Q G; Pu, Z Y; Wang, Y F; Parks, G K

2012-11-16

Electrons streaming along the magnetic field direction are frequently observed in the plasma sheet of Earth's geomagnetic tail. The impact of these field-aligned electrons on the dynamics of the geomagnetic tail is however not well understood. Here we report the first detection of field-aligned electrons with fluxes increasing at ~1 keV forming a "cool" beam just prior to the dissipation of energy in the current sheet. These field-aligned beams at ~15 R_E in the plasma sheet are nearly identical to those commonly observed at auroral altitudes, suggesting the beams are auroral electrons accelerated upward by electric fields parallel (E||) to the geomagnetic field. The density of the beams relative to the ambient electron density is δn_b/n_e ~ 5-13% and the current carried by the beams is ~10^-8 to 10^-7 A m^-2. These beams in high β plasmas with large density and temperature gradients appear to satisfy the Bohm criteria to initiate current driven instabilities.
Solar wind interaction with Venus and Mars in a parallel hybrid code

NASA Astrophysics Data System (ADS)

Jarvinen, Riku; Sandroos, Arto

2013-04-01

We discuss the development and applications of a new parallel hybrid simulation, where ions are treated as particles and electrons as a charge-neutralizing fluid, for the interaction between the solar wind and Venus and Mars. The new simulation code under construction is based on the algorithm of the sequential global planetary hybrid model developed at the Finnish Meteorological Institute (FMI) and on the Corsair parallel simulation platform also developed at the FMI. The FMI's sequential hybrid model has been used for studies of plasma interactions of several unmagnetized and weakly magnetized celestial bodies for more than a decade. Especially, the model has been used to interpret in situ particle and magnetic field observations from plasma environments of Mars, Venus and Titan. Further, Corsair is an open source MPI (Message Passing Interface) particle and mesh simulation platform, mainly aimed for simulations of diffusive shock acceleration in solar corona and interplanetary space, but which is now also being extended for global planetary hybrid simulations. In this presentation we discuss challenges and strategies of parallelizing a legacy simulation code as well as possible applications and prospects of a scalable parallel hybrid model for the solar wind interactions of Venus and Mars.
Electromagnetic cyclotron-loss-cone instability associated with weakly relativistic electrons

NASA Technical Reports Server (NTRS)

Wong, H. K.; Wu, C. S.; Ke, F. J.; Schneider, R. S.; Ziebell, L. F.

1982-01-01

The amplification of fast extraordinary mode waves at frequencies very close to the electron cyclotron frequency, due to the presence of a population of energetic electrons with a loss-cone type distribution, is studied. Low-energy background electrons are included in the analysis. Two types of loss-cone distribution functions are considered, and it is found that the maximum growth rates for both distribution functions are of the same order of magnitude. When the thermal effects of the energetic electrons are included in the dispersion equation, the real frequencies of the waves are lower than those obtained by using the cold plasma approximation. This effect tends to enhance the growth rate. An idealized case including a parallel electric field such that the distribution function of the trapped energetic electrons is modified is also considered. It is assumed that the parallel electric field can remove the low-energy background electrons away from the source region of radiation. Both these effects increase the growth rate.
Neural Parallel Engine: A toolbox for massively parallel neural signal processing.

PubMed

Tam, Wing-Kin; Yang, Zhi

2018-05-01

Large-scale neural recordings provide detailed information on neuronal activities and can help elicit the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts depending on the algorithms. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing only focus on a few rudimentary algorithms, are not well-optimized and often do not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings up to thousands of channels.
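
The idea of peak detection through a compact operation, mentioned in the abstract above, can be sketched in three data-parallel steps: flag candidate samples, run an exclusive prefix sum over the flags to obtain output positions, and scatter the flagged indices. This is a serial illustration of the general stream-compaction pattern, not the NPE implementation; the function name `detect_peaks_compact`, the threshold parameter, and the strict local-maximum rule are all assumptions.

```python
def detect_peaks_compact(signal, threshold):
    """Return indices of strict local maxima above threshold via flag/scan/scatter."""
    n = len(signal)

    # Step 1 (map): each sample is tested independently of the others.
    flags = [
        1 if (0 < k < n - 1
              and signal[k] > threshold
              and signal[k] > signal[k - 1]
              and signal[k] > signal[k + 1])
        else 0
        for k in range(n)
    ]

    # Step 2 (exclusive scan): offsets[k] counts the flagged samples before k,
    # which is exactly the output slot for sample k if it is flagged.
    offsets = []
    total = 0
    for f in flags:
        offsets.append(total)
        total += f

    # Step 3 (scatter): each flagged sample writes to its own slot, so no two
    # writers collide and the writes could proceed concurrently.
    peaks = [0] * total
    for k in range(n):
        if flags[k]:
            peaks[offsets[k]] = k
    return peaks


if __name__ == "__main__":
    trace = [0, 1, 5, 1, 0, 7, 2, 0, 3, 0]
    print(detect_peaks_compact(trace, threshold=2))
```

On a GPU the scan in step 2 would typically be replaced by a parallel prefix sum; the serial loop here only models its result.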
Data communications in a parallel active messaging interface of a parallel computer

DOEpatents

Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

2014-02-11

Data communications in a parallel active messaging interface ('PAMI') of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution of a compute node, including specification of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications instruction, the instruction characterized by instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint.
Developing software to use parallel processing effectively. Final report, June-December 1987

DOE Office of Scientific and Technical Information (OSTI.GOV)

Center, J.

1988-10-01

This report describes the difficulties involved in writing efficient parallel programs and describes the hardware and software support currently available for generating software that utilizes processing effectively. Historically, the processing rate of single-processor computers has increased by one order of magnitude every five years. However, this pace is slowing since electronic circuitry is coming up against physical barriers. Unfortunately, the complexity of engineering and research problems continues to require ever more processing power (far in excess of the maximum estimated 3 Gflops achievable by single-processor computers). For this reason, parallel-processing architectures are receiving considerable interest, since they offer high performance more cheaply than a single-processor supercomputer, such as the Cray.
An explanation for parallel electric field pulses observed over thunderstorms

NASA Astrophysics Data System (ADS)

Kelley, M. C.; Barnum, B. H.

2009-10-01

Every electric field instrument flown on sounding rockets over a thunderstorm has detected pulses of electric fields parallel to the Earth's magnetic field associated with every strike. This paper describes the ionospheric signatures found during a flight from Wallops Island, Virginia, on 2 September 1995. The electric field results in a drifting Maxwellian corresponding to energies up to 1 eV. The distribution function relaxes because of elastic and inelastic collisions, resulting in electron heating up to 4000-5000 K and potentially observable red line emissions and enhanced ISR electron temperatures. The field strength scales with the current in cloud-to-ground strikes and falls off as r^-1 with distance. Pulses of both polarities are found, although most electric fields are downward, parallel to the magnetic field. The pulse may be the reaction of ambient plasma to a current pulse carried at the whistler packet's highest group velocity. The charge source required to produce the electric field is very likely electrons of a few keV traveling at the packet velocity. We conjecture that the current source is the divergence of the current flowing at mesospheric heights, the phenomenon called an elve. The whistler packet's effective radiated power is as high as 25 mW at ionospheric heights, comparable to some ionospheric heater transmissions. Comparing the Poynting flux at the base of the ionosphere with flux an equal distance away along the ground, some 30 dB are lost in the mesosphere. Another 10 dB are lost in the transition from free space to the whistler mode.
Representing and computing regular languages on massively parallel networks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miller, M.I.; O'Sullivan, J.A.; Boysam, B.

1991-01-01

This paper proposes a general method for incorporating rule-based constraints corresponding to regular languages into stochastic inference problems, thereby allowing for a unified representation of stochastic and syntactic pattern constraints. The authors' approach first establishes the formal connection of rules to Chomsky grammars, and generalizes the original work of Shannon on the encoding of rule-based channel sequences to Markov chains of maximum entropy. This maximum entropy probabilistic view leads to Gibbs representations with potentials which have their number of minima growing at precisely the exponential rate at which the language of deterministically constrained sequences grows. These representations are coupled to stochastic diffusion algorithms, which sample the language-constrained sequences by visiting the energy minima according to the underlying Gibbs probability law. The coupling to stochastic search methods yields the all-important practical result that fully parallel stochastic cellular automata may be derived to generate samples from the rule-based constraint sets. The production rules and neighborhood state structure of the language of sequences directly determine the necessary connection structures of the required parallel computing surface. Representations of this type have been mapped to the DAP-510 massively-parallel processor consisting of 1024 mesh-connected bit-serial processing elements for performing automated segmentation of electron-micrograph images.

Parallels in History.

ERIC Educational Resources Information Center

Mugleston, William F.

2000-01-01

Believes that by focusing on the recurrent situations and problems, or parallels, throughout history, students will understand the relevance of history to their own times and lives. Provides suggestions for parallels in history that may be introduced within lectures or as a means to class discussions. (CMK)
Cyclic behavior at quasi-parallel collisionless shocks

NASA Technical Reports Server (NTRS)

Burgess, D.

1989-01-01

Large scale one-dimensional hybrid simulations with resistive electrons have been carried out of a quasi-parallel high-Mach-number collisionless shock. The shock initially appears stable, but then exhibits cyclic behavior. For the magnetic field, the cycle consists of a period when the transition from upstream to downstream is steep and well defined, followed by a period when the shock transition is extended and perturbed. This cyclic shock solution results from upstream perturbations caused by backstreaming gyrating ions convecting into the shock. The cyclic reformation of a sharp shock transition can allow ions, at one time upstream because of reflection or leakage, to contribute to the shock thermalization.
Electron Currents and Heating in the Ion Diffusion Region of Asymmetric Reconnection

NASA Technical Reports Server (NTRS)

Graham, D. B.; Khotyaintsev, Yu. V.; Norgren, C.; Vaivads, A.; Andre, M.; Lindqvist, P. A.; Marklund, G. T.; Ergun, R. E.; Paterson, W. R.; Gershman, D. J.;

2016-01-01

In this letter the structure of the ion diffusion region of magnetic reconnection at Earth's magnetopause is investigated using the Magnetospheric Multiscale (MMS) spacecraft. The ion diffusion region is characterized by a strong DC electric field, approximately equal to the Hall electric field, intense currents, and electron heating parallel to the background magnetic field. Current structures well below ion spatial scales are resolved, and the electron motion associated with lower hybrid drift waves is shown to contribute significantly to the total current density. The electron heating is shown to be consistent with large-scale parallel electric fields trapping and accelerating electrons, rather than wave-particle interactions. These results show that sub-ion scale processes occur in the ion diffusion region and are important for understanding electron heating and acceleration.

Parallel Architectures and Parallel Algorithms for Integrated Vision Systems. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Choudhary, Alok Nidhi

1989-01-01

Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform for a high level application (e.g., object recognition). An IVS normally involves algorithms from low level, intermediate level, and high level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.
Integrated Task and Data Parallel Programming

NASA Technical Reports Server (NTRS)

Grimshaw, A. S.

1998-01-01

This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments: In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda: Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities: During the fall I collaborated
Tunable color parallel tandem organic light emitting devices with carbon nanotube and metallic sheet interlayers

NASA Astrophysics Data System (ADS)

Oliva, Jorge; Papadimitratos, Alexios; Desirena, Haggeo; De la Rosa, Elder; Zakhidov, Anvar A.

2015-11-01

Parallel tandem organic light emitting devices (OLEDs) were fabricated with transparent multiwall carbon nanotube sheets (MWCNT) and thin metal films (Al, Ag) as interlayers. In parallel monolithic tandem architecture, the MWCNT (or metallic films) interlayers are an active electrode which injects similar charges into subunits. In the case of parallel tandems with common anode (C.A.) of this study, holes are injected into top and bottom subunits from the common interlayer electrode; whereas in the configuration of common cathode (C.C.), electrons are injected into the top and bottom subunits. Both subunits of the tandem can thus be monolithically connected functionally in an active structure in which each subunit can be electrically addressed separately. Our tandem OLEDs have a polymer as emitter in the bottom subunit and a small molecule emitter in the top subunit. We also compared the performance of the parallel tandem with that of in series and the additional advantages of the parallel architecture over the in-series were: tunable chromaticity, lower voltage operation, and higher brightness. Finally, we demonstrate that processing of the MWCNT sheets as a common anode in parallel tandems is an easy and low cost process, since their integration as electrodes in OLEDs is achieved by simple dry lamination process.
Data communications in a parallel active messaging interface of a parallel computer

DOEpatents

Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

2013-10-29

Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a data communications instruction, the instruction characterized by an instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint.
Fully Parallel MHD Stability Analysis Tool

NASA Astrophysics Data System (ADS)

Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

2014-10-01

Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by the fusion community. The parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Initial results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.
Sounding rocket study of auroral electron precipitation

DOE Office of Scientific and Technical Information (OSTI.GOV)

McFadden, J.P.

1985-01-01

Measurements of energetic electrons in the auroral zone have proved to be one of the most useful tools in investigating the phenomena of auroral arc formation. This dissertation presents a detailed analysis of the electron data from two sounding rocket campaigns and interprets the measurements in terms of existing auroral models. The Polar Cusp campaign consisted of a single rocket launched from Cape Parry, Canada into the afternoon auroral zone at 1:31:13 UT on January 21, 1982. The results include the measurement of a narrow, magnetic field aligned electron flux at the edge of an arc. This electron precipitation was found to have a remarkably constant 1.2 eV temperature perpendicular to the magnetic field over a 200 to 900 eV energy range. The payload also made simultaneous measurements of both energetic electrons and 3-MHz plasma waves in an auroral arc. Analysis has shown that the waves are propagating in the upper hybrid band and should be generated by a positive slope in the parallel electron distribution. A correlation was found between the 3-MHz waves and small positive slopes in the parallel electron distribution, but experimental uncertainties in the electron measurement were large enough to influence the analysis. The BIDARCA campaign consisted of two sounding rockets launched from Poker Flat and Fort Yukon, Alaska at 9:09:00 UT and 9:10:40 UT on February 7, 1984.
Parallel Optical Random Access Memory (PORAM)

NASA Technical Reports Server (NTRS)

Alphonse, G. A.

1989-01-01

It is shown that the need to minimize component count, power and size, and to maximize packing density requires a parallel optical random access memory to be designed in a two-level hierarchy: a modular level and an interconnect level. Three module designs are proposed, in the order of research and development requirements. The first uses state-of-the-art components, including individually addressed laser diode arrays, acousto-optic (AO) deflectors and a magneto-optic (MO) storage medium, aimed at moderate size, moderate power, and high packing density. The next design level uses an electron-trapping (ET) medium to reduce optical power requirements. The third design uses a beam-steering grating surface emitter (GSE) array to reduce size further and minimize the number of components.
File concepts for parallel I/O

NASA Technical Reports Server (NTRS)

Crockett, Thomas W.

1989-01-01

The subject of input/output (I/O) was often neglected in the design of parallel computer systems, although for many problems I/O rates will limit the speedup attainable. The I/O problem is addressed by considering the role of files in parallel systems. The notion of parallel files is introduced. Parallel files provide for concurrent access by multiple processes, and utilize parallelism in the I/O system to improve performance. Parallel files can also be used conventionally by sequential programs. A set of standard parallel file organizations is proposed, and organizations using multiple storage devices are suggested. Problem areas are also identified and discussed.
Electron Jet of Asymmetric Reconnection

NASA Technical Reports Server (NTRS)

Khotyaintsev, Yu. V.; Graham, D. B.; Norgren, C.; Eriksson, E.; Li, W.; Johlander, A.; Vaivads, A.; Andre, M.; Pritchett, P. L.; Retino, A.;

2016-01-01

We present Magnetospheric Multiscale observations of an electron-scale current sheet and electron outflow jet for asymmetric reconnection with guide field at the subsolar magnetopause. The electron jet observed within the reconnection region has an electron Mach number of 0.35 and is associated with electron agyrotropy. The jet is unstable to an electrostatic instability which generates intense waves with E_∥ amplitudes reaching up to 300 mV/m and potentials up to 20% of the electron thermal energy. We see evidence of interaction between the waves and the electron beam, leading to quick thermalization of the beam and stabilization of the instability. The wave phase speed is comparable to the ion thermal speed, suggesting that the instability is of Buneman type, and therefore introduces electron-ion drag and leads to braking of the electron flow. Our observations demonstrate that electrostatic turbulence plays an important role in the electron-scale physics of asymmetric reconnection.

Non-Cartesian Parallel Imaging Reconstruction

PubMed Central

Wright, Katherine L.; Hamilton, Jesse I.; Griswold, Mark A.; Gulani, Vikas; Seiberlich, Nicole

2014-01-01

Non-Cartesian parallel imaging has played an important role in reducing data acquisition time in MRI. The use of non-Cartesian trajectories can enable more efficient coverage of k-space, which can be leveraged to reduce scan times. These trajectories can be undersampled to achieve even faster scan times, but the resulting images may contain aliasing artifacts. Just as Cartesian parallel imaging can be employed to reconstruct images from undersampled Cartesian data, non-Cartesian parallel imaging methods can mitigate aliasing artifacts by using additional spatial encoding information in the form of the non-homogeneous sensitivities of multi-coil phased arrays. This review will begin with an overview of non-Cartesian k-space trajectories and their sampling properties, followed by an in-depth discussion of several selected non-Cartesian parallel imaging algorithms. Three representative non-Cartesian parallel imaging methods will be described, including Conjugate Gradient SENSE (CG SENSE), non-Cartesian GRAPPA, and Iterative Self-Consistent Parallel Imaging Reconstruction (SPIRiT). After a discussion of these three techniques, several potential promising clinical applications of non-Cartesian parallel imaging will be covered. PMID:24408499
An Efficient Fuzzy Controller Design for Parallel Connected Induction Motor Drives

NASA Astrophysics Data System (ADS)

Usha, S.; Subramani, C.

2018-04-01

Induction motors are highly nonlinear and have complex time-varying dynamics, which makes speed control of an induction motor a challenging problem in industry. However, owing to recent advances in power electronic devices and intelligent controllers, speed control of the induction motor can now be achieved while accounting for its nonlinear characteristics. Conventionally, a single inverter is used to run one induction motor in industry. In traction applications, two or more induction motors are operated in parallel to reduce the size and cost of the induction motors. In this application, the parallel-connected induction motors can be driven by a single inverter unit. Stability problems may arise in parallel operation under low-speed operating conditions. Hence, the speed deviations should be reduced with the help of suitable controllers. The speed control of the parallel-connected system is performed by a PID controller and a fuzzy logic controller. In this paper, the speed responses of an induction motor rated at 1 HP, 1440 rpm, and 50 Hz with these controllers are compared in terms of time-domain specifications. The stability of the system is also analyzed at low speed using the MATLAB platform. A hardware model is developed for speed control using the fuzzy logic controller, which exhibited superior performance over the other controller.
Graph-based linear scaling electronic structure theory.

PubMed

Niklasson, Anders M N; Mniszewski, Susan M; Negre, Christian F A; Cawkwell, Marc J; Swart, Pieter J; Mohd-Yusof, Jamal; Germann, Timothy C; Wall, Michael E; Bock, Nicolas; Rubensson, Emanuel H; Djidjev, Hristo

2016-06-21

We show how graph theory can be combined with quantum theory to calculate the electronic structure of large complex systems. The graph formalism is general and applicable to a broad range of electronic structure methods and materials, including challenging systems such as biomolecules. The methodology combines well-controlled accuracy, low computational cost, and natural low-communication parallelism. This combination addresses substantial shortcomings of linear scaling electronic structure theory, in particular with respect to quantum-based molecular dynamics simulations.
Graph-based linear scaling electronic structure theory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Niklasson, Anders M. N., E-mail: amn@lanl.gov; Negre, Christian F. A.; Cawkwell, Marc J.

2016-06-21

We show how graph theory can be combined with quantum theory to calculate the electronic structure of large complex systems. The graph formalism is general and applicable to a broad range of electronic structure methods and materials, including challenging systems such as biomolecules. The methodology combines well-controlled accuracy, low computational cost, and natural low-communication parallelism. This combination addresses substantial shortcomings of linear scaling electronic structure theory, in particular with respect to quantum-based molecular dynamics simulations.
Electron acceleration in a secondary magnetic island formed during magnetic reconnection with a guide field

NASA Astrophysics Data System (ADS)

Wang, Huanyu; Lu, Quanming; Huang, Can; Wang, Shui

2017-05-01

Secondary magnetic islands may be generated in the vicinity of an X line during magnetic reconnection. In this paper, by performing two-dimensional (2-D) particle-in-cell simulations, we investigate the role of a secondary magnetic island in electron acceleration during magnetic reconnection with a guide field. The electron motions are found to be adiabatic, and we analyze the contributions of the parallel electric field and Fermi and betatron mechanisms to electron acceleration in the secondary island during the evolution of magnetic reconnection. When the secondary island is formed, electrons are accelerated by the parallel electric field due to the existence of the reconnection electric field in the electron current sheet. Electrons can be accelerated by both the parallel electric field and Fermi mechanism when the secondary island begins to merge with the primary magnetic island, which is formed simultaneously with the appearance of X lines. With the increase in the guide field, the contributions of the Fermi mechanism to electron acceleration become less and less important. When the guide field is sufficiently large, the contribution of the Fermi mechanism is almost negligible.
Tunneling of Bloch electrons through vacuum barrier

NASA Astrophysics Data System (ADS)

Mazin, I. I.

2001-08-01

Tunneling of Bloch electrons through a vacuum barrier introduces new physical effects in comparison with the textbook case of free (plane wave) electrons. For the latter, the exponential decay rate in the vacuum is minimal for electrons with the parallel component of momentum k_∥ = 0, and the prefactor is defined by the electron momentum component in the direction normal to the surface. However, the decay rate of Bloch electrons may be minimal at an arbitrary k_∥ ("hot spots"), and the prefactor is determined by the electron's group velocity, rather than by its quasimomentum. We illustrate this by first-principles calculations for the (110) Pd surface.
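For orientation, the textbook free-electron case mentioned in this abstract can be written out as follows. This is a standard sketch, not taken from the paper: it assumes a rectangular barrier of height U and width d, an electron of mass m and total energy E < U, and a wave function that stays a plane wave parallel to the surface. The symbols U, d, m, E and κ are introduced here for illustration only.

```latex
% Free electron in the barrier: plane wave along the surface, evanescent along z
\psi(\mathbf{r}) \propto e^{i\mathbf{k}_\parallel \cdot \mathbf{r}_\parallel}\, e^{-\kappa z},
\qquad
\kappa(k_\parallel) = \sqrt{\frac{2m\,(U - E)}{\hbar^{2}} + k_\parallel^{2}}

% Tunneling probability across a barrier of width d (exponential factor only)
T(k_\parallel) \propto e^{-2\kappa(k_\parallel)\, d}
```

At fixed E the decay constant κ grows monotonically with k_∥, so it is smallest at k_∥ = 0, which is the free-electron statement in the abstract. The abstract's point is that this monotonic dependence need not hold for Bloch electrons.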
Parallel k-means++

DOE Office of Scientific and Technical Information (OSTI.GOV)

A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data, by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.
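The seed selection step that this record parallelizes can be sketched serially as below. This is a minimal illustration of the k-means++ seeding rule (each new seed is drawn with probability proportional to its squared distance from the nearest seed already chosen), not the authors' C++ code. The function names `sq_dist` and `kmeanspp_seeds` are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeds(points, k, rng=None):
    """Pick k initial centers with k-means++ (D^2-weighted) sampling.

    Hypothetical helper for illustration; `points` is a list of tuples.
    """
    rng = rng or random.Random()
    # The first seed is chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest seed.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            raise ValueError("fewer distinct points than requested seeds")
        # Draw the next seed with probability proportional to d2.
        nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Each entry is updated independently of the others, so this is the
        # loop a GPU or OpenMP implementation would spread across threads.
        d2 = [min(old, sq_dist(p, nxt)) for p, old in zip(points, d2)]
    return centers
```

A call such as `kmeanspp_seeds(points, 3, random.Random(0))` returns three of the input points. Passing a seeded `random.Random` keeps the selection reproducible.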
Fully Parallel MHD Stability Analysis Tool

NASA Astrophysics Data System (ADS)

Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

2015-11-01

Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by the fusion community. The parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Results of MARS parallelization and of the development of a new fixed-boundary equilibrium code adapted for MARS input will be reported. Work is supported by the U.S. DOE SBIR program.

Design Sketches For Optical Crossbar Switches Intended For Large-Scale Parallel Processing Applications

NASA Astrophysics Data System (ADS)

Hartmann, Alfred; Redfield, Steve

1989-04-01

This paper discusses design of large-scale (1000 x 1000) optical crossbar switching networks for use in parallel processing supercomputers. Alternative design sketches for an optical crossbar switching network are presented using free-space optical transmission with either a beam spreading/masking model or a beam steering model for internodal communications. The performances of alternative multiple access channel communications protocols (unslotted and slotted ALOHA and carrier sense multiple access (CSMA)) are compared with the performance of the classic arbitrated bus crossbar of conventional electronic parallel computing. These comparisons indicate an almost inverse relationship between ease of implementation and speed of operation. Practical issues of optical system design are addressed, and an optically addressed, composite spatial light modulator design is presented for fabrication to arbitrarily large scale. The wide range of switch architecture, communications protocol, optical systems design, device fabrication, and system performance problems presented by these design sketches poses a serious challenge to practical exploitation of highly parallel optical interconnects in advanced computer designs.
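As background for the protocol comparison in this record, the classical throughput formulas for unslotted (pure) and slotted ALOHA can be evaluated as below. This is a sketch under the usual textbook assumptions (Poisson offered load G in frames per frame time, fixed frame length, no capture). The paper's own channel model may differ, and the function names are hypothetical.

```python
import math


def pure_aloha_throughput(g):
    """Unslotted ALOHA: S = G * exp(-2G); a frame is vulnerable for two frame times."""
    return g * math.exp(-2.0 * g)


def slotted_aloha_throughput(g):
    """Slotted ALOHA: S = G * exp(-G); the vulnerable period is one slot."""
    return g * math.exp(-g)


def best_load(throughput, loads):
    """Return the offered load from `loads` that maximizes `throughput`."""
    return max(loads, key=throughput)


if __name__ == "__main__":
    loads = [i / 100.0 for i in range(1, 301)]
    for name, fn in (("unslotted", pure_aloha_throughput),
                     ("slotted", slotted_aloha_throughput)):
        g = best_load(fn, loads)
        print(f"{name}: peak throughput {fn(g):.3f} at G = {g:.2f}")
```

Under these assumptions slotting doubles the peak channel utilization (1/e versus 1/(2e)) at the cost of requiring slot synchronization, which is one instance of the trade-off between ease of implementation and speed that the abstract describes.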
Intrachain versus interchain electron transport in poly(fluorene-alt-benzothiadiazole): a quantum-chemical insight.

PubMed

Van Vooren, Antoine; Kim, Ji-Seon; Cornil, Jérôme

2008-05-16

Poly(9,9-di-n-octylfluorene-alt-benzothiadiazole) [F8BT] displays very different charge-transport properties for holes versus electrons when comparing annealed and pristine thin films and transport parallel (intrachain) and perpendicular (interchain) to the polymer axes. The present theoretical contribution focuses on the electron-transport properties of F8BT chains and compares the efficiency of intrachain versus interchain transport in the hopping regime. The theoretical results rationalize significantly lowered electron mobility in annealed F8BT thin films and the smaller mobility anisotropy (μ_∥/μ_⊥) measured for electrons in aligned films (i.e. 5-7 compared to 10-15 for holes).
Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++

NASA Technical Reports Server (NTRS)

Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis

1994-01-01

Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.
Nonadiabatic electron response in the Hasegawa-Wakatani equations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stoltzfus-Dueck, T.; Scott, B. D.; Krommes, J. A.

2013-08-15

Tokamak edge turbulence is strongly influenced by parallel electron physics, which relaxes density and potential fluctuations towards electron adiabatic response. Beginning with the paradigmatic Hasegawa-Wakatani equations (HWEs) for resistive tokamak edge turbulence, a unique decomposition of the electric potential (φ) into adiabatic (a) and nonadiabatic (b) portions is derived, based on the requirement that a neither drive nor respond to the parallel current j_∥. The form of the decomposition clarifies that, at perpendicular scales large relative to the sound radius, the electron adiabatic response controls the nonzonal φ, not the fluctuating density n. Simple energy balance arguments allow one to rigorously bound the ratio of rms nonzonal nonadiabatic fluctuations (b̃) relative to adiabatic ones (ã). The role of the vorticity nonlinearity in transferring energy between adiabatic and nonadiabatic fluctuations aids intuitive understanding of self-sustained turbulence in the HWEs. When the normalized parallel resistivity is weak, b̃ becomes effectively slaved, allowing the reduction to an approximate one-field model that remains valid for strong turbulence. In addition to guiding physical intuition, the one-field reduction should greatly ease further analytical manipulations. Direct numerical simulation of the 2D HWEs confirms the convergence of the asymptotic formula for b̃.
Neural assembly models derived through nano-scale measurements.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fan, Hongyou; Branda, Catherine; Schiek, Richard Louis

2009-09-01

This report summarizes accomplishments of a three-year project focused on developing technical capabilities for measuring and modeling neuronal processes at the nanoscale. It was successfully demonstrated that nanoprobes could be engineered that were biocompatible, and could be biofunctionalized, that responded within the range of voltages typically associated with a neuronal action potential. Furthermore, the Xyce parallel circuit simulator was employed and models incorporated for simulating the ion channel and cable properties of neuronal membranes. The ultimate objective of the project had been to employ nanoprobes in vivo, with the nematode C. elegans, and derive a simulation based on the resulting data. Techniques were developed allowing the nanoprobes to be injected into the nematode and the neuronal response recorded. To the authors' knowledge, this is the first occasion in which nanoparticles have been successfully employed as probes for recording neuronal response in an in vivo animal experimental protocol.
Perpendicular dynamics of runaway electrons in tokamak plasmas

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fernandez-Gomez, I.; Martin-Solis, J. R.; Sanchez, R.

2012-10-15

In this paper, it will be shown that the runaway phenomenon in tokamak plasmas cannot be reduced to a one-dimensional problem, based on the competition between electric field acceleration and collisional friction losses in the parallel direction. A Langevin approach, including collisional diffusion in velocity space, will be used to analyze the two-dimensional runaway electron dynamics. An investigation of the runaway probability in velocity space will yield a criterion for runaway, which will be shown to be consistent with the results provided by the simpler test particle description of the runaway dynamics [Fuchs et al., Phys. Fluids 29, 2931 (1986)]. Electron perpendicular collisional scattering will be found to play an important role, relaxing the conditions for runaway. Moreover, electron pitch angle scattering perpendicularly broadens the runaway distribution function, increasing the electron population in the runaway plateau region in comparison with what would be expected from electron acceleration in the parallel direction only. The perpendicular broadening of the runaway distribution function, its dependence on the plasma parameters, and the resulting enhancement of the runaway production rate will be discussed.
Low-dose electron energy-loss spectroscopy using electron counting direct detectors.

PubMed

Maigné, Alan; Wolf, Matthias

2018-03-01

Since the development of parallel electron energy loss spectroscopy (EELS), charge-coupled devices (CCDs) have been the default detectors for EELS. With the recent development of electron-counting direct-detection cameras, micrographs can be acquired under very low electron doses at significantly improved signal-to-noise ratio. In spectroscopy, in particular in combination with a monochromator, the signal can be extremely weak and the detection limit is principally defined by noise introduced by the detector. Here we report the use of an electron-counting direct-detection camera for EEL spectroscopy. We studied the oxygen K edge of amorphous ice and obtained a signal-to-noise ratio up to 10 times higher than with a conventional CCD. We report the application of electron counting to record time-resolved EEL spectra of a biological protein embedded in amorphous ice, revealing chemical changes observed in situ during exposure to the electron beam. Changes in the fine structure of the nitrogen K and carbon K edges were recorded during irradiation. A concentration of 3 at% nitrogen was detected with a total electron dose of only 1.7 e⁻/Å², extending the boundaries of EELS signal detection at low electron doses.
Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

NASA Astrophysics Data System (ADS)

Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.

2017-07-01

Calculation of matrix-vector multiplication in real-world problems often involves large matrices of arbitrary size. Therefore, parallelization is needed to speed up a calculation that usually takes a long time. Graph partitioning techniques discussed in previous studies cannot be used to parallelize matrix-vector multiplication for matrices of arbitrary size, because graph partitioning assumes a square and symmetric matrix. Hypergraph partitioning techniques overcome this shortcoming. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and runs on the GPU (graphics processing unit).
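As a rough illustration of why hypergraphs suit rectangular or nonsymmetric matrices, the sketch below uses the column-net model, a standard formulation from the hypergraph partitioning literature and not necessarily the one used in this paper. Rows are vertices, each column is a net joining the rows that have a nonzero in it, and the communication volume of a row partition is the sum over nets of (number of parts touched minus one). The sparsity pattern, the two partitions, and the function name are hypothetical.

```python
from collections import defaultdict

def comm_volume(nonzeros, row_part):
    """Connectivity-1 communication volume of a row partition.

    nonzeros: iterable of (row, col) positions of nonzero entries.
    row_part: dict mapping each row index to its part (processor) id.
    """
    parts_per_col = defaultdict(set)   # net (column) -> set of parts touching it
    for r, c in nonzeros:
        parts_per_col[c].add(row_part[r])
    # A column touched by k parts needs its x entry sent to k - 1 of them.
    return sum(len(p) - 1 for p in parts_per_col.values())

# Hypothetical 4 x 3 rectangular sparsity pattern (no symmetry required).
nz = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2), (3, 2)]
balanced_a = {0: 0, 1: 0, 2: 1, 3: 1}   # rows 0,1 on part 0; rows 2,3 on part 1
balanced_b = {0: 0, 1: 1, 2: 0, 3: 1}   # same load, interleaved rows
print(comm_volume(nz, balanced_a), comm_volume(nz, balanced_b))
```

Both partitions give each part two rows, yet the first cuts fewer column nets, which is the quantity a hypergraph partitioner tries to minimize.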
Relativistic electrons and whistlers in Jupiter's magnetosphere

NASA Technical Reports Server (NTRS)

Barbosa, D. D.; Coroniti, F. V.

1976-01-01

The paper examines some of the consequences of relativistic electrons in stably trapped equilibrium with parallel propagating whistlers in the inner magnetosphere of Jupiter. Approximate scaling laws for the stably trapped electron flux and equilibrium wave intensity are derived, and the equatorial growth rate for whistlers is determined. It is shown that fluxes are near the stably trapped limit, which suggests that whistler intensities may be high enough to cause significant diffusion of electrons, accounting for the observed reduction of phase space densities.
Parallel Computing Strategies for Irregular Algorithms

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

2002-01-01

Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
Implementation of Multivariable Logic Functions in Parallel by Electrically Addressing a Molecule of Three Dopants in Silicon.

PubMed

Fresch, Barbara; Bocquel, Juanita; Hiluf, Dawit; Rogge, Sven; Levine, Raphael D; Remacle, Françoise

2017-07-05

To realize low-power, compact logic circuits, one can explore parallel operation on single nanoscale devices. An added incentive is to use multivalued (as distinct from Boolean) logic. Here, we theoretically demonstrate that the computation of all the possible outputs of a multivariate, multivalued logic function can be implemented in parallel by electrical addressing of a molecule made up of three interacting dopant atoms embedded in Si. The electronic states of the dopant molecule are addressed by pulsing a gate voltage. By simulating the time evolution of the non stationary electronic density built by the gate voltage, we show that one can implement a molecular decision tree that provides in parallel all the outputs for all the inputs of the multivariate, multivalued logic function. The outputs are encoded in the populations and in the bond orders of the dopant molecule, which can be measured using an STM tip. We show that the implementation of the molecular logic tree is equivalent to a spectral function decomposition. The function that is evaluated can be field-programmed by changing the time profile of the pulsed gate voltage. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Calculating electronic tunnel currents in networks of disordered irregularly shaped nanoparticles by mapping networks to arrays of parallel nonlinear resistors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aghili Yajadda, Mir Massoud

2014-10-21

We have shown both theoretically and experimentally that tunnel currents in networks of disordered irregularly shaped nanoparticles (NPs) can be calculated by considering the networks as arrays of parallel nonlinear resistors. Each resistor is described by a one-dimensional or a two-dimensional array of equal-size nanoparticles in which the tunnel junction gaps between nanoparticles are assumed to be equal. The number of tunnel junctions between two contact electrodes and the tunnel junction gaps between nanoparticles are found to be functions of Coulomb blockade energies. In addition, the tunnel barriers between nanoparticles were considered to be tilted at high voltages. Furthermore, the role of the thermal expansion coefficient of the tunnel junction gaps on the tunnel current is taken into account. The model calculations fit very well to the experimental data of a network of disordered gold nanoparticles, a forest of multi-wall carbon nanotubes, and a network of few-layer graphene nanoplates over a wide temperature range (5-300 K) at low and high DC bias voltages (0.001 mV–50 V). Our investigations indicate that, although electron cotunneling in networks of disordered irregularly shaped NPs may occur, non-Arrhenius behavior at low temperatures cannot be described by the cotunneling model due to size distribution in the networks and irregular shape of nanoparticles. Non-Arrhenius behavior of the samples at the zero bias voltage limit was attributed to the disorder in the samples. Unlike the electron cotunneling model, we found that the crossover from Arrhenius to non-Arrhenius behavior occurs at two temperatures, one at a high temperature and the other at a low temperature.
Magnetospheric Multiscale observations of large-amplitude, parallel, electrostatic waves associated with magnetic reconnection at the magnetopause

NASA Astrophysics Data System (ADS)

Ergun, R. E.; Holmes, J. C.; Goodrich, K. A.; Wilder, F. D.; Stawarz, J. E.; Eriksson, S.; Newman, D. L.; Schwartz, S. J.; Goldman, M. V.; Sturner, A. P.; Malaspina, D. M.; Usanova, M. E.; Torbert, R. B.; Argall, M.; Lindqvist, P.-A.; Khotyaintsev, Y.; Burch, J. L.; Strangeway, R. J.; Russell, C. T.; Pollock, C. J.; Giles, B. L.; Dorelli, J. J. C.; Avanov, L.; Hesse, M.; Chen, L. J.; Lavraud, B.; Le Contel, O.; Retino, A.; Phan, T. D.; Eastwood, J. P.; Oieroset, M.; Drake, J.; Shay, M. A.; Cassak, P. A.; Nakamura, R.; Zhou, M.; Ashour-Abdalla, M.; André, M.

2016-06-01

We report observations from the Magnetospheric Multiscale satellites of large-amplitude, parallel, electrostatic waves associated with magnetic reconnection at the Earth's magnetopause. The observed waves have parallel electric fields (E||) with amplitudes on the order of 100 mV/m and display nonlinear characteristics that suggest a possible net E||. These waves are observed within the ion diffusion region and adjacent to (within several electron skin depths) the electron diffusion region. They are in or near the magnetosphere side current layer. Simulation results support that the strong electrostatic linear and nonlinear wave activities appear to be driven by a two stream instability, which is a consequence of mixing cold (<10 eV) plasma in the magnetosphere with warm (~100 eV) plasma from the magnetosheath on a freshly reconnected magnetic field line. The frequent observation of these waves suggests that cold plasma is often present near the magnetopause.
Non-isomorphic radial wavenumber dependencies of residual zonal flows in ion and electron Larmor radius scales, and effects of initial parallel flow and electromagnetic potentials in a circular tokamak

NASA Astrophysics Data System (ADS)

Yamagishi, Osamu

2018-04-01

Radial wavenumber dependencies of the residual zonal potential for E × B flow in a circular, large aspect ratio tokamak are investigated by means of collisionless gyrokinetic simulations of the Rosenbluth-Hinton (RH) test and a semi-analytic approach using an analytic solution of the gyrokinetic equation [Rosenbluth and Hinton 1998 Phys. Rev. Lett. 80 724]. By increasing the radial wavenumber from an ion Larmor radius scale $k_r \rho_i \lesssim 1$ to an electron Larmor radius scale $k_r \rho_e \lesssim 1$, the well-known level $\sim O[1/(1+1.6 q^2/\sqrt{r/R_0})]$ is retained, while the level remains O(1) when the wavenumber is decreased from the electron to the ion Larmor radius scale, if the physically same adiabatic assumption is presumed for species other than the main species that is treated kinetically. The conclusion is not modified by treating both species kinetically, so that in the intermediate scale between the ion and electron Larmor radius scales it seems difficult to determine the level uniquely. The toroidal momentum conservation property in the RH test is also investigated by including an initial parallel flow in addition to the perpendicular flow. It is shown that by taking a balance between the initial parallel flow and perpendicular flows, which include both E × B flow and diamagnetic flow in the initial condition, the mechanical toroidal angular momentum is approximately conserved despite the toroidal symmetry breaking due to the finite radial wavenumber zonal modes. The effect of electromagnetic potentials is also investigated. When the electromagnetic potentials are applied initially, fast oscillations which are faster than the geodesic acoustic modes are introduced in the decay phase of the zonal modes. Although the residual level in the long time limit is not modified, this can make the time required to reach the stationary zonal flows longer and may weaken the effectiveness of the turbulent transport suppression by the zonal flows.
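The "well-known level" quoted in the abstract is the long-wavelength Rosenbluth-Hinton residual, 1/(1 + 1.6 q²/√(r/R0)), where q is the safety factor and r/R0 the inverse aspect ratio. A minimal numerical sketch follows; the function name and the sample values of q and r/R0 are illustrative assumptions, not values taken from the paper.

```python
import math

def rh_residual(q, inv_aspect):
    """Long-wavelength residual level 1 / (1 + 1.6 q^2 / sqrt(r/R0))."""
    return 1.0 / (1.0 + 1.6 * q**2 / math.sqrt(inv_aspect))

# Illustrative scan over safety factor at a fixed inverse aspect ratio r/R0 = 0.1.
for q in (1.0, 1.4, 2.0):
    print(q, round(rh_residual(q, 0.1), 4))
```

The residual falls as q grows and rises with r/R0, so only a small fraction of the initial zonal potential survives at these sample values, in contrast to the O(1) level the abstract reports for the opposite wavenumber ordering.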
CRUNCH_PARALLEL

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shumaker, Dana E.; Steefel, Carl I.

The code CRUNCH_PARALLEL is a parallel version of the CRUNCH code. CRUNCH code version 2.0 was previously released by LLNL (UCRL-CODE-200063). CRUNCH is a general-purpose reactive transport code developed by Steefel and Yabusaki (Steefel and Yabusaki, 1996). The code handles non-isothermal transport and reaction in one, two, and three dimensions. The reaction algorithm is generic in form, handling an arbitrary number of aqueous and surface complexation reactions as well as mineral dissolution/precipitation. A standardized database is used containing thermodynamic and kinetic data. The code includes advective, dispersive, and diffusive transport.
Parallel processing and expert systems

NASA Technical Reports Server (NTRS)

Lau, Sonie; Yan, Jerry C.

1991-01-01

Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.
CSM parallel structural methods research

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.

1989-01-01

Parallel structural methods, research team activities, advanced architecture computers for parallel computational structural mechanics (CSM) research, the FLEX/32 multicomputer, a parallel structural analyses testbed, blade-stiffened aluminum panel with a circular cutout and the dynamic characteristics of a 60 meter, 54-bay, 3-longeron deployable truss beam are among the topics discussed.
Effects of a parallel resistor on electrical characteristics of a piezoelectric transformer in open-circuit transient state.

PubMed

Chang, Kuo-Tsai

2007-01-01

This paper investigates electrical transient characteristics of a Rosen-type piezoelectric transformer (PT), including maximum voltages, time constants, energy losses and average powers, and their improvements immediately after turning OFF. A parallel resistor connected to both input terminals of the PT is needed to improve the transient characteristics. An equivalent circuit for the PT is first given. Then, an open-circuit voltage, involving a direct current (DC) component and an alternating current (AC) component, and its related energy losses are derived from the equivalent circuit with initial conditions. Moreover, an AC power control system, including a DC-to-AC resonant inverter, a control switch and electronic instruments, is constructed to determine the electrical characteristics of the OFF transient state. Furthermore, the effects of the parallel resistor on the transient characteristics at different parallel resistances are measured. The advantages of adding the parallel resistor also are discussed. From the measured results, the DC time constant is greatly decreased from 9 to 0.04 ms by a 10 kΩ parallel resistance under open output.
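A back-of-envelope reading of the quoted time constants, assuming (the abstract does not say this) that the DC component decays as a single first-order RC discharge of an input capacitance C through a leakage resistance R_leak, with the added resistor R_p in parallel. Under that assumption tau0 = R_leak·C and tau1 = (R_leak ∥ R_p)·C, which can be inverted for the two unknowns. The function name is hypothetical and the resulting values are illustrative only, not results from the paper.

```python
def infer_rc(tau0, tau1, r_p):
    """Infer leakage resistance and capacitance from two time constants.

    Assumes tau0 = R_leak * C and tau1 = (R_leak || r_p) * C.
    """
    r_leak = r_p * (tau0 / tau1 - 1.0)   # from tau1 / tau0 = r_p / (R_leak + r_p)
    c = tau0 / r_leak
    return r_leak, c

# Numbers quoted in the abstract: 9 ms -> 0.04 ms with a 10 kOhm resistor.
r_leak, c = infer_rc(tau0=9e-3, tau1=0.04e-3, r_p=10e3)
# Consistency check: recompute the shortened time constant from the inferred values.
tau1_check = (r_leak * 10e3 / (r_leak + 10e3)) * c
print(r_leak, c, tau1_check)
```

Under this simplified model the 225-fold drop in the time constant simply reflects the parallel resistor being far smaller than the assumed leakage path.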
Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.

2000-01-01

Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining
Ion resonances and ELF wave production by an electron beam injected into the ionosphere - ECHO 6

NASA Astrophysics Data System (ADS)

Winckler, J. R.; Steffen, J. E.; Malcolm, P. R.; Erickson, K. N.; Abe, Y.; Swanson, R. L.

1984-09-01

Two effects observed with electron antennas ejected from a sounding rocket launched into the ionosphere in March 1983 carrying electron beam guns are discussed. The sensor packages were ejected and travelled parallel to the vehicle trajectory. Electric potentials were measured between the single probes and a plasma diagnostic package while the gun injected electrons into the ionosphere in perpendicular and parallel 1 kHz directions. Signal pulses over the dc-1250 kHz range were detected. A kHz gun frequency caused a signal that decreased by two orders of magnitude between 45-90 m from the beam field line. However, the signal was detectable at 1 mV/m at 120 m, supporting earlier data that indicated that pulsed electron beams can cause ELF waves in space. Beam injection parallel to the magnetic field produced an 840 Hz resonance that could be quenched by activation of a transverse beam.

Simplified Parallel Domain Traversal

DOE Office of Scientific and Technical Information (OSTI.GOV)

Erickson III, David J

2011-01-01

Many data-intensive scientific analysis techniques require global domain traversal, which over the years has been a bottleneck for efficient parallelization across distributed-memory architectures. Inspired by MapReduce and other simplified parallel programming approaches, we have designed DStep, a flexible system that greatly simplifies efficient parallelization of domain traversal techniques at scale. In order to deliver both simplicity to users and scalability on HPC platforms, we introduce a novel two-tiered communication architecture for managing and exploiting asynchronous communication loads. We also integrate our design with advanced parallel I/O techniques that operate directly on native simulation output. We demonstrate DStep by performing teleconnection analysis across ensemble runs of terascale atmospheric CO2 and climate data, and we show scalability results on up to 65,536 IBM BlueGene/P cores.
The NAS parallel benchmarks

NASA Technical Reports Server (NTRS)

Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)

1993-01-01

A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
Field programmable chemistry: integrated chemical and electronic processing of informational molecules towards electronic chemical cells.

PubMed

Wagler, Patrick F; Tangen, Uwe; Maeke, Thomas; McCaskill, John S

2012-07-01

The topic addressed is that of combining self-constructing chemical systems with electronic computation to form unconventional embedded computation systems performing complex nano-scale chemical tasks autonomously. The hybrid route to complex programmable chemistry, and ultimately to artificial cells based on novel chemistry, requires a solution of the two-way massively parallel coupling problem between digital electronics and chemical systems. We present a chemical microprocessor technology and show how it can provide a generic programmable platform for complex molecular processing tasks in Field Programmable Chemistry, including steps towards the grand challenge of constructing the first electronic chemical cells. Field programmable chemistry employs a massively parallel field of electrodes, under the control of latched voltages, which are used to modulate chemical activity. We implement such a field programmable chemistry which links to chemistry in rather generic, two-phase microfluidic channel networks that are separated into weakly coupled domains. Electric fields, produced by the high-density array of electrodes embedded in the channel floors, are used to control the transport of chemicals across the hydrodynamic barriers separating domains. In the absence of electric fields, separate microfluidic domains are essentially independent with only slow diffusional interchange of chemicals. Electronic chemical cells, based on chemical microprocessors, exploit a spatially resolved sandwich structure in which the electronic and chemical systems are locally coupled through homogeneous fine-grained actuation and sensor networks and play symmetric and complementary roles. We describe how these systems are fabricated, experimentally test their basic functionality, simulate their potential (e.g. for feed forward digital electrophoretic (FFDE) separation) and outline the application to building electronic chemical cells. Copyright © 2012 Elsevier Ireland Ltd. All rights
Parallel flow diffusion battery

DOEpatents

Yeh, H.C.; Cheng, Y.S.

1984-01-01

A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.
Parallel flow diffusion battery

DOEpatents

Yeh, Hsu-Chi; Cheng, Yung-Sung

1984-08-07

A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.
Current structure and flow pattern on the electron separatrix in reconnection region

NASA Astrophysics Data System (ADS)

Guo, Ruilong; Pu, Zuyin; Wei, Yong

2017-12-01

Results from 2.5D particle-in-cell (PIC) simulations of symmetric reconnection with negligible guide field reveal that the accessible boundary of the electrons accelerated in the magnetic reconnection region is displayed by enhanced electron nongyrotropy downstream from the X-line. The boundary, hereafter termed the electron separatrix, occurs a few $d_e$ (electron inertial lengths) away from the exhaust side of the magnetic separatrix. On the inflow side of the electron separatrix, the current is mainly carried by parallel accelerated electrons, serving as the inflow region patch of the Hall current. The out-of-plane current density is enhanced at the electron separatrix. The dominant current carriers are the electrons, whose nongyrotropic distribution functions contribute significantly to the perpendicular electron velocity by increasing the electron diamagnetic drift velocity. When crossing the separatrix region where the Hall electric field is enhanced, the electron velocity orientation changes dramatically, which could be a diagnostic indicator to detect the electron separatrix. In the exhaust region, ions are the main carriers for the out-of-plane current, while the parallel current is still mainly carried by electrons. The current density peak in the separatrix region implies that a thin current sheet is formed apart from the neutral line, which can evolve to the bifurcated current sheet.
Particle simulation of plasmas on the massively parallel processor

NASA Technical Reports Server (NTRS)

Gledhill, I. M. A.; Storey, L. R. O.

1987-01-01

Particle simulations, in which collective phenomena in plasmas are studied by following the self consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two dimensional simulation space is mapped directly onto the processor network; a Fast Fourier Transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.
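To make the Eulerian storage idea concrete, here is a minimal serial sketch, assuming a unit cell size and hypothetical names throughout; it is not the code described in the report. Each particle is filed under the grid cell it currently occupies, positions wrap periodically on the 128 x 128 grid, and a particle is re-filed whenever a move carries it into another cell (on the parallel machine this would correspond to moving its data to another local memory).

```python
N = 128  # grid points per side, as in the abstract

def cell_of(x, y):
    """Grid cell owning position (x, y), with periodic wrap."""
    return int(x % N), int(y % N)

def push(cells, dt):
    """Advance all particles by dt and re-file them by owning cell."""
    new_cells = {}
    for particles in cells.values():
        for x, y, vx, vy in particles:
            # Free-streaming move; field forces are omitted in this sketch.
            x, y = (x + vx * dt) % N, (y + vy * dt) % N
            new_cells.setdefault(cell_of(x, y), []).append((x, y, vx, vy))
    return new_cells

# One particle near a corner, moving out through two periodic boundaries.
cells = {(127, 0): [(127.5, 0.5, 1.0, -1.0)]}
cells = push(cells, dt=1.0)
print(cells)
```

The dictionary keyed by cell stands in for per-processor local memory; a real particle code would also gather fields to the particle and deposit charge and current on the grid each step.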
Parallel consistent labeling algorithms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Samal, A.; Henderson, T.

Mackworth and Freuder have analyzed the time complexity of several constraint satisfaction algorithms. Mohr and Henderson have given new algorithms, AC-4 and PC-3, for arc and path consistency, respectively, and have shown that the arc consistency algorithm is optimal in time complexity and of the same order space complexity as the earlier algorithms. In this paper, they give parallel algorithms for solving node and arc consistency. They show that any parallel algorithm for enforcing arc consistency in the worst case must have O(na) sequential steps, where n is the number of nodes, and a is the number of labels per node. They give several parallel algorithms to do arc consistency. It is also shown that they all have optimal time complexity. The results of running the parallel algorithms on a BBN Butterfly multiprocessor are also presented.
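For readers unfamiliar with arc consistency, the sketch below shows what the property means using the simpler, sequential AC-3 procedure rather than AC-4 or the parallel algorithms discussed in the paper. The constraint encoding, the function name, and the two-variable example are all hypothetical.

```python
from collections import deque

def ac3(domains, constraints):
    """Prune domains in place until every arc is consistent.

    domains: dict var -> set of labels.
    constraints: dict (x, y) -> predicate(label_x, label_y); list both directions.
    Returns False if some domain becomes empty, True otherwise.
    """
    queue = deque(constraints)
    while queue:
        x, y = queue.popleft()
        ok = constraints[(x, y)]
        # Labels of x with no supporting label in y's domain are removed.
        dead = {a for a in domains[x] if not any(ok(a, b) for b in domains[y])}
        if dead:
            domains[x] -= dead
            if not domains[x]:
                return False
            # Arcs pointing at x (other than from y) must be rechecked.
            queue.extend(arc for arc in constraints if arc[1] == x and arc[0] != y)
    return True

# Two nodes with three labels each and the constraint a < b.
doms = {"a": {1, 2, 3}, "b": {1, 2, 3}}
cons = {("a", "b"): lambda p, q: p < q, ("b", "a"): lambda p, q: p > q}
consistent = ac3(doms, cons)
print(consistent, doms)
```

Label 3 is dropped from a and label 1 from b because neither has a supporting label on the other node; each such removal can trigger further rechecks, which is why chains of dependent deletions limit how far the work can be parallelized.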
High-resolution, high-throughput imaging with a multibeam scanning electron microscope.

PubMed

Eberle, A L; Mikula, S; Schalek, R; Lichtman, J; Knothe Tate, M L; Zeidler, D

2015-08-01

Electron-electron interactions and detector bandwidth limit the maximal imaging speed of single-beam scanning electron microscopes. We use multiple electron beams in a single column and detect secondary electrons in parallel to increase the imaging speed by close to two orders of magnitude and demonstrate imaging for a variety of samples ranging from biological brain tissue to semiconductor wafers. © 2015 The Authors Journal of Microscopy © 2015 Royal Microscopical Society.
Partitioning in parallel processing of production systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Oflazer, K.

1987-01-01

This thesis presents research on certain issues related to parallel processing of production systems. It first presents a parallel production system interpreter that has been implemented on a four-processor multiprocessor. This parallel interpreter is based on Forgy's OPS5 interpreter and exploits production-level parallelism in production systems. Runs on the multiprocessor system indicate that it is possible to obtain speed-up of around 1.7 in the match computation for certain production systems when productions are split into three sets that are processed in parallel. The next issue addressed is that of partitioning a set of rules to processors in a parallel interpreter with production-level parallelism, and the extent of additional improvement in performance. The partitioning problem is formulated and an algorithm for approximate solutions is presented. The thesis next presents a parallel processing scheme for OPS5 production systems that allows some redundancy in the match computation. This redundancy enables the processing of a production to be divided into units of medium granularity, each of which can be processed in parallel. Subsequently, a parallel processor architecture for implementing the parallel processing algorithm is presented.
Growth of electron plasma waves above and below f(p) in the electron foreshock

NASA Technical Reports Server (NTRS)

Cairns, Iver H.; Fung, Shing F.

1988-01-01

This paper investigates the conditions required for electron beams to drive wave growth significantly above and below the electron plasma frequency, f(p), by numerically solving the linear dispersion equation. It is shown that kinetic growth well below f(p) may occur over a broad range of frequencies due to the beam instability, when the electron beam is slow, dilute, and relatively cold. Alternatively, a cold or sharp feature at low parallel velocities in the distribution function may drive kinetic growth significantly below f(p). Kinetic broadband growth significantly above f(p) is explained in terms of faster warmer beams. A unified qualitative theory for the narrow-band and broad-band waves is proposed.
The NAS parallel benchmarks

NASA Technical Reports Server (NTRS)

Bailey, D. H.; Barszcz, E.; Barton, J. T.; Carter, R. L.; Lasinski, T. A.; Browning, D. S.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Schreiber, R. S.

1991-01-01

A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers in the framework of the NASA Ames Numerical Aerodynamic Simulation (NAS) Program. These consist of five 'parallel kernel' benchmarks and three 'simulated application' benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.
Sublattice parallel replica dynamics.

PubMed

Martínez, Enrique; Uberuaga, Blas P; Voter, Arthur F

2014-06-01

Exascale computing presents a challenge for the scientific community as new algorithms must be developed to take full advantage of the new computing paradigm. Atomistic simulation methods that offer full fidelity to the underlying potential, i.e., molecular dynamics (MD) and parallel replica dynamics, fail to use the whole machine speedup, leaving a region in time and sample size space that is unattainable with current algorithms. In this paper, we present an extension of the parallel replica dynamics algorithm [A. F. Voter, Phys. Rev. B 57, R13985 (1998)] by combining it with the synchronous sublattice approach of Shim and Amar [Phys. Rev. B 71, 125432 (2005)], thereby exploiting event locality to improve the algorithm scalability. This algorithm is based on a domain decomposition in which events happen independently in different regions in the sample. We develop an analytical expression for the speedup given by this sublattice parallel replica dynamics algorithm and compare it with parallel MD and traditional parallel replica dynamics. We demonstrate how this algorithm, which introduces a slight additional approximation of event locality, enables the study of physical systems unreachable with traditional methodologies and promises to better utilize the resources of current high performance and future exascale computers.
The Galley Parallel File System

NASA Technical Reports Server (NTRS)

Nieuwejaar, Nils; Kotz, David

1996-01-01

Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.
Parallel computation with the force

NASA Technical Reports Server (NTRS)

Jordan, H. F.

1985-01-01

A methodology, called the force, supports the construction of programs to be executed in parallel by a force of processes. The number of processes in the force is unspecified, but potentially very large. The force idea is embodied in a set of macros which produce multiprocessor FORTRAN code and has been studied on two shared memory multiprocessors of fairly different character. The method has simplified the writing of highly parallel programs within a limited class of parallel algorithms and is being extended to cover a broader class. The individual parallel constructs which comprise the force methodology are discussed. Of central concern are their semantics, implementation on different architectures and performance implications.
Outrunning damage: Electrons vs X-rays-timescales and mechanisms.

PubMed

Spence, John C H

2017-07-01

Toward the end of his career, Zewail developed strong interest in fast electron spectroscopy and imaging, a field to which he made important contributions toward his aim of making molecular movies free of radiation damage. We therefore compare here the atomistic mechanisms leading to destruction of protein samples in diffract-and-destroy experiments for the cases of high-energy electron beam irradiation and X-ray laser pulses. The damage processes and their time-scales are compared and relevant elastic, inelastic, and photoelectron cross sections are given. Inelastic mean-free paths for ejected electrons at very low energies in insulators are compared with the bioparticle size. The dose rate and structural damage rate for electrons are found to be much lower, allowing longer pulses, reduced beam current, and Coulomb interactions for the formation of smaller probes. High-angle electron scattering from the nucleus, which has no parallel in the X-ray case, tracks the slowly moving nuclei during the explosion, just as the gain of the XFEL (X-ray free-electron laser) has no parallel in the electron case. Despite reduced damage and much larger elastic scattering cross sections in the electron case, leading to not dissimilar elastic scattering rates (when account is taken of the greatly increased incident XFEL fluence), progress for single-particle electron diffraction is seen to depend on the effort to reduce emittance growth due to Coulomb interactions, and so allow formation of intense sub-micron beams no larger than a virus.
Outrunning damage: Electrons vs X-rays—timescales and mechanisms

PubMed Central

Spence, John C. H.

2017-01-01

Toward the end of his career, Zewail developed strong interest in fast electron spectroscopy and imaging, a field to which he made important contributions toward his aim of making molecular movies free of radiation damage. We therefore compare here the atomistic mechanisms leading to destruction of protein samples in diffract-and-destroy experiments for the cases of high-energy electron beam irradiation and X-ray laser pulses. The damage processes and their time-scales are compared and relevant elastic, inelastic, and photoelectron cross sections are given. Inelastic mean-free paths for ejected electrons at very low energies in insulators are compared with the bioparticle size. The dose rate and structural damage rate for electrons are found to be much lower, allowing longer pulses, reduced beam current, and Coulomb interactions for the formation of smaller probes. High-angle electron scattering from the nucleus, which has no parallel in the X-ray case, tracks the slowly moving nuclei during the explosion, just as the gain of the XFEL (X-ray free-electron laser) has no parallel in the electron case. Despite reduced damage and much larger elastic scattering cross sections in the electron case, leading to not dissimilar elastic scattering rates (when account is taken of the greatly increased incident XFEL fluence), progress for single-particle electron diffraction is seen to depend on the effort to reduce emittance growth due to Coulomb interactions, and so allow formation of intense sub-micron beams no larger than a virus. PMID:28653018
Highly-Parallel, Highly-Compact Computing Structures Implemented in Nanotechnology

NASA Technical Reports Server (NTRS)

Crawley, D. G.; Duff, M. J. B.; Fountain, T. J.; Moffat, C. D.; Tomlinson, C. D.

1995-01-01

In this paper, we describe work in which we are evaluating how the evolving properties of nano-electronic devices could best be utilized in highly parallel computing structures. Because of their combination of high performance, low power, and extreme compactness, such structures would have obvious applications in spaceborne environments, both for general mission control and for on-board data analysis. However, the anticipated properties of nano-devices mean that the optimum architecture for such systems is by no means certain. Candidates include single instruction multiple datastream (SIMD) arrays, neural networks, and multiple instruction multiple datastream (MIMD) assemblies.
Design considerations for parallel graphics libraries

NASA Technical Reports Server (NTRS)

Crockett, Thomas W.

1994-01-01

Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.
A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)

NASA Technical Reports Server (NTRS)

Straeter, T. A.; Markos, A. T.

1975-01-01

A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions ensuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.

Automatic Multilevel Parallelization Using OpenMP

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

2002-01-01

In this paper we describe the extension of the CAPO (CAPtools (Computer Aided Parallelization Toolkit) OpenMP) parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report some results for several benchmark codes and one full application that have been parallelized using our system.
Nonlinear Electron Acoustic Waves in the Inner Magnetosphere

NASA Astrophysics Data System (ADS)

Dillard, C. S.; Vasko, I.; Mozer, F.; Agapitov, O. V.

2017-12-01

The Van Allen Probes observe intense broad-band electrostatic wave activity in the inner magnetosphere. The high-resolution electric field measurements show that this broad-band wave activity consists of large-amplitude electrostatic solitary waves propagating generally along the background magnetic field with velocities of a few thousand km/s. There are generally two types of observed solitary waves. The solitary waves with a bipolar parallel electric field are interpreted as electron phase space holes, while the nature of solitary waves with an asymmetric parallel electric field has remained puzzling. In the present work we show that asymmetric solitary waves propagate with velocities (1000-5000 km/s) and have spatial scales (100 m-1 km) similar to those of electron-acoustic waves, which exist due to a two-temperature electron population. Through numerical fluid simulation we show that the spikes are produced from an initially harmonic electron-acoustic perturbation by nonlinear steepening. Through analysis of the modified KdV equation we show that the steepening is eventually arrested by collisionless Landau dissipation, resulting in the formation of the observed asymmetric spikes (shocklets).
A derivation and scalable implementation of the synchronous parallel kinetic Monte Carlo method for simulating long-time dynamics

NASA Astrophysics Data System (ADS)

Byun, Hye Suk; El-Naggar, Mohamed Y.; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya

2017-10-01

Kinetic Monte Carlo (KMC) simulations are used to study long-time dynamics of a wide variety of systems. Unfortunately, the conventional KMC algorithm is not scalable to larger systems, since its time scale is inversely proportional to the simulated system size. A promising approach to resolving this issue is the synchronous parallel KMC (SPKMC) algorithm, which makes the time scale size-independent. This paper introduces a formal derivation of the SPKMC algorithm based on local transition-state and time-dependent Hartree approximations, as well as its scalable parallel implementation based on a dual linked-list cell method. The resulting algorithm has achieved a weak-scaling parallel efficiency of 0.935 on 1024 Intel Xeon processors for simulating biological electron transfer dynamics in a 4.2 billion-heme system, as well as decent strong-scaling parallel efficiency. The parallel code has been used to simulate a lattice of cytochrome complexes on a bacterial-membrane nanowire, and it is broadly applicable to other problems such as computational synthesis of new materials.
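The scaling problem mentioned in this abstract can be illustrated with the textbook rejection-free KMC step. This is a sketch of the conventional serial algorithm under simplifying assumptions, not the SPKMC method of the paper; the function names and the uniform per-site rate are hypothetical.

```python
import math
import random


def kmc_step(rates, rng):
    """One conventional (serial) KMC step over a list of event rates.

    Returns the index of the chosen event and the simulated time increment.
    """
    total = sum(rates)
    # Choose an event with probability proportional to its rate.
    target = rng.random() * total
    cumulative = 0.0
    chosen = len(rates) - 1  # fallback guards against floating-point round-off
    for i, rate in enumerate(rates):
        cumulative += rate
        if target < cumulative:
            chosen = i
            break
    # Exponentially distributed waiting time with mean 1 / total.
    # 1 - random() lies in (0, 1], so the logarithm is always defined.
    dt = -math.log(1.0 - rng.random()) / total
    return chosen, dt


def mean_time_step(n_sites, rate_per_site=1.0):
    """Expected time advance per step when every site contributes the same rate."""
    return 1.0 / (n_sites * rate_per_site)
```

Because the total rate grows with the number of sites, `mean_time_step` shrinks as 1/N: each step executes a single event somewhere in the system, so a larger system needs proportionally more steps to cover the same simulated time.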
Multicoil resonance-based parallel array for smart wireless power delivery.

PubMed

Mirbozorgi, S A; Sawan, M; Gosselin, B

2013-01-01

This paper presents a novel resonance-based multicoil structure as a smart power surface to wirelessly power up apparatus such as mobile devices, animal headstages, implanted devices, etc. The proposed powering system is based on a 4-coil resonance-based inductive link, the resonance coil of which is formed by an array of several paralleled coils as a smart power transmitter. The power transmitter employs simple circuit connections and includes only one power driver circuit per multicoil resonance-based array, which enables higher power transfer efficiency and power delivery to the load. The power transmitted by the driver circuit is proportional to the load seen by the individual coil in the array. Thus, the transmitted power scales with the load of the electric/electronic system to be powered, and is not divided equally among all the parallel coils that form the array. Instead, only the loaded coils of the parallel array transmit a significant part of the total transmitted power to the receiver. Such adaptive behavior enables superior power, size and cost efficiency than other solutions since it does not need complex detection circuitry to find the location of the load. The performance of the proposed structure is verified by measurement results. Natural load detection and coverage of an area 4 times bigger than conventional topologies, with a power transfer efficiency of 55%, are the novelties of the presented paper.
Competition Between Electromagnetic Modes in a Free-Electron Maser

DTIC Science & Technology

1994-02-28

...electron perpendicular momentum familiar from gyrotron theory [11]. The electron mass is me, initial electron velocity perpendicular and parallel to the...are of zeroth order. Similarly, ... Using matrix notation, we can write ...disks were in turn electron beam welded to stainless steel flanges. While Kovar was needed to provide a good brazing interface, the mass of the material...
Using Parallel Processing for Problem Solving.

DTIC Science & Technology

1979-12-01

...are the basic parallel processing primitive. Different goals of the system can be pursued in parallel by placing them in separate activities. Language primitives are provided for manipulating running activities. Viewpoints are a generalization of context...
Support for Debugging Automatically Parallelized Programs

NASA Technical Reports Server (NTRS)

Hood, Robert; Jost, Gabriele; Biegel, Bryan (Technical Monitor)

2001-01-01

This viewgraph presentation provides information on the technical aspects of debugging computer code that has been automatically converted for use in a parallel computing system. Shared memory parallelization and distributed memory parallelization entail separate and distinct challenges for a debugging program. A prototype system has been developed which integrates various tools for the debugging of automatically parallelized programs including the CAPTools Database which provides variable definition information across subroutines as well as array distribution information.
Introduction to Electronics. Training Workbook.

ERIC Educational Resources Information Center

Anoka-Hennepin Technical Coll., Minneapolis, MN.

This workbook is intended for students enrolled in a 3-day introductory course to electronics developed during a project to retrain defense industry workers at risk of job loss or dislocation because of conversion of the defense industry. The workbook begins with a course outline and is divided into three sections that parallel the following…
Parallel-SymD: A Parallel Approach to Detect Internal Symmetry in Protein Domains.

PubMed

Jha, Ashwani; Flurchick, K M; Bikdash, Marwan; Kc, Dukka B

2016-01-01

Internally symmetric proteins are proteins that have a symmetrical structure in their monomeric single-chain form. Around 10-15% of the protein domains can be regarded as having some sort of internal symmetry. In this regard, we previously published SymD (symmetry detection), an algorithm that determines whether a given protein structure has internal symmetry by attempting to align the protein to its own copy after the copy is circularly permuted by all possible numbers of residues. SymD has proven to be a useful algorithm to detect symmetry. In this paper, we present a new parallelized algorithm called Parallel-SymD for detecting symmetry of proteins on clusters of computers. The achieved speedup of the new Parallel-SymD algorithm scales well with the number of computing processors. Scaling is better for proteins with a larger number of residues. For a protein of 509 residues, a speedup of 63 was achieved on a parallel system with 100 processors.
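Two ideas in this abstract are easy to make concrete: the circular permutation that SymD applies before self-alignment, and the parallel efficiency implied by the reported speedup. The sketch below assumes a sequence represented as a plain string and uses hypothetical helper names. It does not reproduce SymD's alignment or scoring.

```python
def circular_permutations(sequence):
    """Yield (shift, rotated copy) for every non-trivial circular shift."""
    n = len(sequence)
    for shift in range(1, n):
        # Move the first `shift` residues to the end of the chain.
        yield shift, sequence[shift:] + sequence[:shift]


def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Reported figures: a speedup of 63 on 100 processors for a 509-residue protein.
reported_efficiency = parallel_efficiency(63, 100)
```

Each shift can be aligned against the original independently of the others, which is presumably what makes the search amenable to distribution across processors. With the reported figures, the efficiency works out to 63/100 = 0.63.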
Parallel-SymD: A Parallel Approach to Detect Internal Symmetry in Protein Domains

PubMed Central

Jha, Ashwani; Flurchick, K. M.; Bikdash, Marwan

2016-01-01

Internally symmetric proteins are proteins that have a symmetrical structure in their monomeric single-chain form. Around 10–15% of the protein domains can be regarded as having some sort of internal symmetry. In this regard, we previously published SymD (symmetry detection), an algorithm that determines whether a given protein structure has internal symmetry by attempting to align the protein to its own copy after the copy is circularly permuted by all possible numbers of residues. SymD has proven to be a useful algorithm to detect symmetry. In this paper, we present a new parallelized algorithm called Parallel-SymD for detecting symmetry of proteins on clusters of computers. The achieved speedup of the new Parallel-SymD algorithm scales well with the number of computing processors. Scaling is better for proteins with a larger number of residues. For a protein of 509 residues, a speedup of 63 was achieved on a parallel system with 100 processors. PMID:27747230
Turbomachinery CFD on parallel computers

NASA Technical Reports Server (NTRS)

Blech, Richard A.; Milner, Edward J.; Quealy, Angela; Townsend, Scott E.

1992-01-01

The role of multistage turbomachinery simulation in the development of propulsion system models is discussed. Particularly, the need for simulations with higher fidelity and faster turnaround time is highlighted. It is shown how such fast simulations can be used in engineering-oriented environments. The use of parallel processing to achieve the required turnaround times is discussed. Current work by several researchers in this area is summarized. Parallel turbomachinery CFD research at the NASA Lewis Research Center is then highlighted. These efforts are focused on implementing the average-passage turbomachinery model on MIMD, distributed memory parallel computers. Performance results are given for inviscid, single blade row and viscous, multistage applications on several parallel computers, including networked workstations.
Parallel CE/SE Computations via Domain Decomposition

NASA Technical Reports Server (NTRS)

Himansu, Ananda; Jorgenson, Philip C. E.; Wang, Xiao-Yen; Chang, Sin-Chung

2000-01-01

This paper describes the parallelization strategy and achieved parallel efficiency of an explicit time-marching algorithm for solving conservation laws. The Space-Time Conservation Element and Solution Element (CE/SE) algorithm for solving the 2D and 3D Euler equations is parallelized with the aid of domain decomposition. The parallel efficiency of the resultant algorithm on a Silicon Graphics Origin 2000 parallel computer is checked.
Parallel image compression

NASA Technical Reports Server (NTRS)

Reif, John H.

1987-01-01

A parallel compression algorithm for the 16,384 processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless text compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.
Massively parallel and linear-scaling algorithm for second-order Moller–Plesset perturbation theory applied to the study of supramolecular wires

DOE PAGES

Kjaergaard, Thomas; Baudin, Pablo; Bykov, Dmytro; ...

2016-11-16

Here, we present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide–Expand–Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide–Expand–Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI+X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the “resolution of the identity second-order Moller–Plesset perturbation theory” (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.
Parallel momentum input by tangential neutral beam injections in stellarator and heliotron plasmas

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nishimura, S., E-mail: nishimura.shin@lhd.nifs.ac.jp; Nakamura, Y.; Nishioka, K.

The configuration dependence of parallel momentum inputs to target plasma particle species by tangentially injected neutral beams is investigated in non-axisymmetric stellarator/heliotron model magnetic fields by assuming the existence of magnetic flux-surfaces. In parallel friction integrals of the full Rosenbluth-MacDonald-Judd collision operator in thermal particles' kinetic equations, numerically obtained eigenfunctions are used for excluding trapped fast ions that cannot contribute to the friction integrals. It is found that the momentum inputs to thermal ions strongly depend on magnetic field strength modulations on the flux-surfaces, while the input to electrons is insensitive to the modulation. In future plasma flow studies requiring flow calculations of all particle species in more general non-symmetric toroidal configurations, the eigenfunction method investigated here will be useful.
Electron energy distributions measured during electron beam/plasma interactions. [in E region

NASA Technical Reports Server (NTRS)

Jost, R. J.; Anderson, H. R.; Mcgarity, J. O.

1980-01-01

In the large vacuum facility at the NASA-Johnson Space Center an electron beam was projected 20 m parallel to B from a gun with variable accelerating potential (1.0 to 2.5 kV) to an aluminum target. The ionospheric neutral pressure and field were approximated. Beam electron energy distributions were measured directly using an electrostatic deflection analyzer and indirectly with a detector that responded to the X-rays produced by electron impact on the target. At low currents the distribution is sharply peaked at the acceleration potential. At high currents a beam plasma discharge occurs and electrons are redistributed in energy so that the former energy peak broadens to 10-15 percent FWHM with a strongly enhanced low energy tail. At the 10% of maximum point the energy spectrum ranges from less than 1/2 to 1.2 times the gun energy. The effect is qualitatively the same at all pitch angles and locations sampled.
Electron Energization and Mixing Observed by MMS in the Vicinity of an Electron Diffusion Region During Magnetopause Reconnection

NASA Technical Reports Server (NTRS)

Chen, Li-Jen; Hesse, Michael; Wang, Shan; Gershman, Daniel; Ergun, Robert; Pollock, Craig; Torbert, Roy; Bessho, Naoki; Daughton, William; Dorelli, John;

2016-01-01

Measurements from the Magnetospheric Multiscale (MMS) mission are reported to show distinct features of electron energization and mixing in the diffusion region of the terrestrial magnetopause reconnection. At the ion jet and magnetic field reversals, distribution functions exhibiting signatures of accelerated meandering electrons are observed at an electron out-of-plane flow peak. The meandering signatures manifested as triangular and crescent structures are established features of the electron diffusion region (EDR). Effects of meandering electrons on the electric field normal to the reconnection layer are detected. Parallel acceleration and mixing of the inflowing electrons with exhaust electrons shape the exhaust flow pattern. In the EDR vicinity, the measured distribution functions indicate that locally, the electron energization and mixing physics is captured by two-dimensional reconnection, yet to account for the simultaneous four-point measurements, translational invariance in the third dimension must be violated on the ion-skin-depth scale.

Electron energization and mixing observed by MMS in the vicinity of an electron diffusion region during magnetopause reconnection

NASA Astrophysics Data System (ADS)

Chen, Li-Jen; Hesse, Michael; Wang, Shan; Gershman, Daniel; Ergun, Robert; Pollock, Craig; Torbert, Roy; Bessho, Naoki; Daughton, William; Dorelli, John; Giles, Barbara; Strangeway, Robert; Russell, Christopher; Khotyaintsev, Yuri; Burch, Jim; Moore, Thomas; Lavraud, Benoit; Phan, Tai; Avanov, Levon

2016-06-01

Measurements from the Magnetospheric Multiscale (MMS) mission are reported to show distinct features of electron energization and mixing in the diffusion region of the terrestrial magnetopause reconnection. At the ion jet and magnetic field reversals, distribution functions exhibiting signatures of accelerated meandering electrons are observed at an electron out-of-plane flow peak. The meandering signatures manifested as triangular and crescent structures are established features of the electron diffusion region (EDR). Effects of meandering electrons on the electric field normal to the reconnection layer are detected. Parallel acceleration and mixing of the inflowing electrons with exhaust electrons shape the exhaust flow pattern. In the EDR vicinity, the measured distribution functions indicate that locally, the electron energization and mixing physics is captured by two-dimensional reconnection, yet to account for the simultaneous four-point measurements, translational invariance in the third dimension must be violated on the ion-skin-depth scale.
Parallel processing considerations for image recognition tasks

NASA Astrophysics Data System (ADS)

Simske, Steven J.

2011-01-01

Many image recognition tasks are well-suited to parallel processing. The most obvious example is that many imaging tasks require the analysis of multiple images. From this standpoint, then, parallel processing need be no more complicated than assigning individual images to individual processors. However, there are three less trivial categories of parallel processing that will be considered in this paper: parallel processing (1) by task; (2) by image region; and (3) by meta-algorithm. Parallel processing by task allows the assignment of multiple workflows (as diverse as optical character recognition [OCR], document classification, and barcode reading) to parallel pipelines. This can substantially decrease time to completion for the document tasks. For this approach, each parallel pipeline is generally performing a different task. Parallel processing by image region allows a larger imaging task to be sub-divided into a set of parallel pipelines, each performing the same task but on a different data set. This type of image analysis is readily addressed by a map-reduce approach. Examples include document skew detection and multiple face detection and tracking. Finally, parallel processing by meta-algorithm allows different algorithms to be deployed on the same image simultaneously. This approach may result in improved accuracy.
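Parallel processing by image region, the second category above, can be sketched as a small map-reduce. The example assumes a grayscale image stored as a list of rows of integer pixel values and counts dark pixels as a stand-in for a real recognition task. The function names, the threshold, and the choice of a thread pool are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(image, n_regions):
    """Split an image (list of pixel rows) into at most n_regions horizontal strips."""
    n_rows = len(image)
    strip = max(1, -(-n_rows // n_regions))  # ceiling division, at least one row
    return [image[i:i + strip] for i in range(0, n_rows, strip)]


def count_dark(region, threshold=128):
    """Map step: count the pixels in one region that are darker than the threshold."""
    return sum(1 for row in region for pixel in row if pixel < threshold)


def count_dark_parallel(image, n_regions=4, threshold=128):
    """Run the same task on every strip concurrently, then reduce by summation."""
    regions = split_rows(image, n_regions)
    with ThreadPoolExecutor(max_workers=n_regions) as pool:
        partial_counts = list(pool.map(lambda r: count_dark(r, threshold), regions))
    return sum(partial_counts)
```

Threads keep the sketch simple and portable. For CPU-bound pure-Python work, a process pool with a picklable module-level task function would be the usual substitute, since CPython threads do not run such code on multiple cores at once.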
New NAS Parallel Benchmarks Results

NASA Technical Reports Server (NTRS)

Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)

1997-01-01

NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.

Expressing Parallelism with ROOT

NASA Astrophysics Data System (ADS)

Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.

2017-10-01

The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
Massively parallel processor computer

NASA Technical Reports Server (NTRS)

Fung, L. W. (Inventor)

1983-01-01

An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array is described. It comprises a large number (e.g., 16,384 in a 128 x 128 array) of parallel processing elements operating simultaneously and independently on single bit slices of a corresponding array of incoming data streams under control of a single set of instructions. Each of the processing elements comprises a bidirectional data bus in communication with a register for storing single bit slices together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetical computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding of bits vertically or horizontally to neighboring processing elements.
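The bit-slice operation described in this record can be illustrated with a small Python sketch. It is not the patented hardware design: the function names, the list-based stand-in for the array of processing elements, and the sample values are all assumptions chosen for illustration. Each loop iteration applies one instruction to a single bit slice of every element, and `shift_right` mimics passing data to a neighboring element.

```python
def bit_plane(values, bit):
    """Extract one bit slice (0 or 1 per element) from an array of integers."""
    return [(v >> bit) & 1 for v in values]

def bit_serial_add(a, b, width):
    """Add two equal-length arrays one bit slice at a time (result mod 2**width)."""
    carry = [0] * len(a)
    total = [0] * len(a)
    for bit in range(width):
        pa, pb = bit_plane(a, bit), bit_plane(b, bit)
        # Every element applies the same instruction to its own single bit.
        s = [x ^ y ^ c for x, y, c in zip(pa, pb, carry)]
        carry = [(x & y) | (c & (x ^ y)) for x, y, c in zip(pa, pb, carry)]
        total = [t | (s_bit << bit) for t, s_bit in zip(total, s)]
    return total

def shift_right(row, fill=0):
    """Move each element's value to its right-hand neighbor."""
    return [fill] + row[:-1]

sums = bit_serial_add([3, 5, 255], [1, 6, 1], width=8)
shifted = shift_right([1, 2, 3])
```

In this sketch the cost of an addition grows with the word width rather than with the number of elements, which is the trade-off a bit-serial array makes when the element count is very large.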
Laser-induced free-free transitions in elastic electron scattering from CO2

NASA Astrophysics Data System (ADS)

Musa, Mohamed; MacDonald, Amy; Tidswell, Lisa; Holmes, Jim; St. Francis Xavier Laser Scattering Lab Team

2011-03-01

This report presents measurements of laser-induced free-free transitions of electrons scattered from CO2 molecules in the ground electronic state at incident electron energies of 3.8 and 5.8 eV under a pulsed CO2 laser field. The differential cross sections of free-free transitions involving absorption and emission of up to two photons were measured at various scattering angles with the polarization of the laser either parallel with or perpendicular to the momentum change vector of the scattered electrons. The results of the parallel geometry are found to be in qualitative agreement with the predictions of the Kroll-Watson approximation within the experimental uncertainty whereas those of the perpendicular geometry show marked discrepancy with the Kroll-Watson predictions. This work was supported by the Natural Sciences and Engineering Research Council of Canada and the St. Francis Xavier University Council for Research.
Sub-GeV dark matter detection with electron recoils in carbon nanotubes

NASA Astrophysics Data System (ADS)

Cavoto, G.; Luchetta, F.; Polosa, A. D.

2018-01-01

Directional detection of Dark Matter particles (DM) in the MeV mass range could be accomplished by studying electron recoils in large arrays of parallel carbon nanotubes. In a scattering process with a lattice electron, a DM particle might transfer sufficient energy to eject it from the nanotube surface. An external electric field is added to drive the electron from the open ends of the array to the detection region. The anisotropic response of this detection scheme, as a function of the orientation of the target with respect to the DM wind, is calculated, and it is concluded that no direct measurement of the electron ejection angle is needed to explore significant regions of the light DM exclusion plot. A compact sensor, in which the cathode element is substituted with a dense array of parallel carbon nanotubes, could serve as the basic detection unit.
Parallel computing for probabilistic fatigue analysis

NASA Technical Reports Server (NTRS)

Sues, Robert H.; Lua, Yuan J.; Smith, Mark D.

1993-01-01

This paper presents the results of Phase I research to investigate the most effective parallel processing software strategies and hardware configurations for probabilistic structural analysis. We investigate the efficiency of both shared and distributed-memory architectures via a probabilistic fatigue life analysis problem. We also present a parallel programming approach, the virtual shared-memory paradigm, that is applicable across both types of hardware. Using this approach, problems can be solved on a variety of parallel configurations, including networks of single or multiprocessor workstations. We conclude that it is possible to effectively parallelize probabilistic fatigue analysis codes; however, special strategies will be needed to achieve large-scale parallelism to keep large number of processors busy and to treat problems with the large memory requirements encountered in practice. We also conclude that distributed-memory architecture is preferable to shared-memory for achieving large scale parallelism; however, in the future, the currently emerging hybrid-memory architectures will likely be optimal.
Velocity diagnostics of electron beams within a 140 GHz gyrotron

NASA Astrophysics Data System (ADS)

Polevoy, Jeffrey Todd

1989-06-01

Experimental measurements of the average axial velocity v_∥ of the electron beam within the M.I.T. 140 GHz MW gyrotron have been performed. The method involves the simultaneous measurement of the radial electrostatic potential of the electron beam V_p and the beam current I_b. V_p is measured through the use of a capacitive probe installed near or within the gyrotron cavity, while I_b is measured with a previously installed Rogowski coil. Three capacitive probes have been designed and built, and two have operated within the gyrotron. The probe results are repeatable and consistent with theory. The measurements of v_∥ and calculations of the corresponding transverse to longitudinal beam velocity ratio α = v_⊥/v_∥ at the cavity have been made at various gyrotron operation parameters. These measurements will provide insight into the causes of discrepancies between theoretical RF interaction efficiencies and experimental efficiencies obtained in experiments with the M.I.T. 140 GHz MW gyrotron. The expected values of v_∥ and α are determined through the use of a computer code (EGUN) which is used to model the cathode and anode regions of the gyrotron. It also computes the trajectories and velocities of the electrons within the gyrotron. There is good correlation between the expected and measured values of α at low α, with the expected values from EGUN often falling within the standard errors of the measured values.
Exploiting Symmetry on Parallel Architectures.

NASA Astrophysics Data System (ADS)

Stiller, Lewis Benjamin

1995-01-01

This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and it discovered a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.
Architectures for reasoning in parallel

NASA Technical Reports Server (NTRS)

Hall, Lawrence O.

1989-01-01

The research conducted has dealt with rule-based expert systems. The algorithms that may lead to effective parallelization of them were investigated. Both the forward and backward chained control paradigms were investigated in the course of this work. The best computer architecture for the developed and investigated algorithms has been researched. Two experimental vehicles were developed to facilitate this research. They are Backpac, a parallel backward chained rule-based reasoning system, and Datapac, a parallel forward chained rule-based reasoning system. Both systems have been written in Multilisp, a version of Lisp which contains the parallel construct, future. Applying the future function to a function causes the function to become a task parallel to the spawning task. Additionally, Backpac and Datapac have been run on several disparate parallel processors. The machines are an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32 processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines. The Multimax has all its processors hung off a common bus. All are shared memory machines, but have different schemes for sharing the memory and different locales for the shared memory. The main results of the investigations come from experiments on the 10 processor Encore and the Concert with partitions of 32 or fewer processors. Additionally, experiments have been run with a stripped down version of EMYCIN.
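The `future` construct mentioned in this record has a loose analogue in Python's `concurrent.futures`. The sketch below is not Backpac or Datapac code: the rule set, `rule_fires`, and `forward_chain` are hypothetical names invented here, and Python threads only approximate the task parallelism Multilisp provides. It shows a forward-chaining step in which each rule test is spawned as a future and its value is awaited only when needed.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rule base: (premises, conclusion) pairs.
RULES = [
    (("a", "b"), "c"),
    (("c",), "d"),
    (("x",), "y"),
]

def rule_fires(rule, facts):
    """True if every premise of the rule is among the known facts."""
    premises, _ = rule
    return all(p in facts for p in premises)

def forward_chain(initial_facts, rules, workers=4):
    """Repeat parallel rule tests until no new fact is derived."""
    facts = set(initial_facts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            snapshot = frozenset(facts)
            # submit() returns at once with a future, much as `future` does.
            pending = [(rule, pool.submit(rule_fires, rule, snapshot))
                       for rule in rules]
            # result() blocks only at the point where the value is needed.
            new = {rule[1] for rule, fut in pending if fut.result()} - facts
            if not new:
                return facts
            facts |= new

derived = forward_chain({"a", "b"}, RULES)
```

Rule tests are never submitted from inside another task in this sketch, so the fixed-size pool cannot deadlock waiting on itself.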
Efficiency of parallel direct optimization

NASA Technical Reports Server (NTRS)

Janies, D. A.; Wheeler, W. C.

2001-01-01

Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. © 2001 The Willi Hennig Society.
Fast l₁-SPIRiT compressed sensing parallel imaging MRI: scalable parallel implementation and clinically feasible runtime.

PubMed

Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

2012-06-01

We present l₁-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative self-consistent parallel imaging (SPIRiT). Like many iterative magnetic resonance imaging reconstructions, l₁-SPIRiT's image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing l₁-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of l₁-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT spoiled gradient echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions.
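The soft-thresholding step named in this record can be sketched in a few lines of Python. This is an assumption-laden illustration, not the authors' implementation: it takes "cross-channel joint sparsity" to mean that the coefficients at one wavelet position are shrunk together according to their combined magnitude across channels, and the function name, data layout, and threshold `lam` are hypothetical.

```python
import math

def joint_soft_threshold(coeffs, lam):
    """Shrink coefficients jointly across channels.

    coeffs[c][k] is coefficient k of channel c (real or complex).
    All channels at position k share one shrinkage factor.
    """
    n_channels = len(coeffs)
    n_coeffs = len(coeffs[0])
    out = [[0.0] * n_coeffs for _ in range(n_channels)]
    for k in range(n_coeffs):
        # Combined magnitude of position k over every channel.
        norm = math.sqrt(sum(abs(coeffs[c][k]) ** 2 for c in range(n_channels)))
        # Positions whose joint magnitude is below lam are set to zero.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for c in range(n_channels):
            out[c][k] = scale * coeffs[c][k]
    return out

shrunk = joint_soft_threshold([[3.0, 0.1], [4.0, 0.1]], lam=1.0)
```

Each position `k` is independent of the others, so the outer loop is the natural unit to distribute across CPU cores or GPU threads.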
The effect of cell design and test criteria on the series/parallel performance of nickel cadmium cells and batteries

NASA Technical Reports Server (NTRS)

Halpert, G.; Webb, D. A.

1983-01-01

Three batteries were operated in parallel from a common bus during charge and discharge. SMM utilized NASA Standard 20AH cells and batteries, and LANDSAT-D NASA 50AH cells and batteries of a similar design. Each battery consisted of 22 series connected cells providing the nominal 28V bus. The three batteries were charged in parallel using the voltage limit/current taper mode wherein the voltage limit was temperature compensated. Discharge occurred on the demand of the spacecraft instruments and electronics. Both flights were planned for three to five year missions. The series/parallel configuration of cells and batteries for the 3-5 yr mission required a well controlled product with built-in reliability and uniformity. Examples of how component, cell and battery selection methods affect the uniformity of the series/parallel operation of the batteries both in testing and in flight are given.
An object-oriented approach to nested data parallelism

NASA Technical Reports Server (NTRS)

Sheffler, Thomas J.; Chatterjee, Siddhartha

1994-01-01

This paper describes an implementation technique for integrating nested data parallelism into an object-oriented language. Data-parallel programming employs sets of data called 'collections' and expresses parallelism as operations performed over the elements of a collection. When the elements of a collection are also collections, then there is the possibility for 'nested data parallelism.' Few current programming languages support nested data parallelism however. In an object-oriented framework, a collection is a single object. Its type defines the parallel operations that may be applied to it. Our goal is to design and build an object-oriented data-parallel programming environment supporting nested data parallelism. Our initial approach is built upon three fundamental additions to C++. We add new parallel base types by implementing them as classes, and add a new parallel collection type called a 'vector' that is implemented as a template. Only one new language feature is introduced: the 'foreach' construct, which is the basis for exploiting elementwise parallelism over collections. The strength of the method lies in the compilation strategy, which translates nested data-parallel C++ into ordinary C++. Extracting the potential parallelism in nested 'foreach' constructs is called 'flattening' nested parallelism. We show how to flatten 'foreach' constructs using a simple program transformation. Our prototype system produces vector code which has been successfully run on workstations, a CM-2, and a CM-5.
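The idea of flattening can be illustrated outside C++ with a short Python sketch. It is not the authors' compiler transformation: `flatten`, `unflatten`, and `nested_foreach` are hypothetical helpers, and the segment-length representation is an assumption made for this example. A nested collection is split into flat data plus segment lengths, one flat elementwise pass replaces the nested loops, and the nesting is then restored.

```python
def flatten(nested):
    """Split a nested collection into flat data plus segment lengths."""
    data = [x for segment in nested for x in segment]
    lengths = [len(segment) for segment in nested]
    return data, lengths

def unflatten(data, lengths):
    """Rebuild the nested collection from flat data and segment lengths."""
    out, start = [], 0
    for n in lengths:
        out.append(data[start:start + n])
        start += n
    return out

def nested_foreach(nested, fn):
    """Apply fn to every inner element using a single flat pass."""
    data, lengths = flatten(nested)
    # One flat elementwise pass stands in for the two nested loops; this is
    # the pass a data-parallel backend could run as one vector operation.
    result = [fn(x) for x in data]
    return unflatten(result, lengths)

squares = nested_foreach([[1, 2], [3], [], [4, 5, 6]], lambda x: x * x)
```

Because the flat pass sees all inner elements at once, uneven segment sizes no longer limit how much work can proceed in parallel.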
Performance Evaluation in Network-Based Parallel Computing

NASA Technical Reports Server (NTRS)

Dezhgosha, Kamyar

1996-01-01

Network-based parallel computing is emerging as a cost-effective alternative for solving many problems which require use of supercomputers or massively parallel computers. The primary objective of this project has been to conduct experimental research on performance evaluation for clustered parallel computing. First, a testbed was established by augmenting our existing SUNSPARCs' network with PVM (Parallel Virtual Machine), which is a software system for linking clusters of machines. Second, a set of three basic applications was selected. The applications consist of a parallel search, a parallel sort, and a parallel matrix multiplication. These application programs were implemented in the C programming language under PVM. Third, we conducted performance evaluation under various configurations and problem sizes. Alternative parallel computing models and workload allocations for application programs were explored. The performance metric was limited to elapsed time or response time, which in the context of parallel computing can be expressed in terms of speedup. The results reveal that the overhead of communication latency between processes is in many cases the restricting factor to performance. That is, coarse-grain parallelism, which requires less frequent communication between processes, will result in higher performance in network-based computing. Finally, we are in the final stages of installing an Asynchronous Transfer Mode (ATM) switch and four ATM interfaces (each 155 Mbps) which will allow us to extend our study to newer applications, performance metrics, and configurations.
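The speedup metric and the grain-size effect described in this record can be made concrete with a toy Python model. Nothing below comes from the report's measurements: `modeled_time` assumes perfectly divisible work plus a fixed cost per message, and every number is invented for illustration.

```python
def speedup(t_serial, t_parallel):
    """Speedup is serial elapsed time divided by parallel elapsed time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, workers):
    """Fraction of ideal speedup actually achieved."""
    return speedup(t_serial, t_parallel) / workers

def modeled_time(work, workers, messages, latency):
    """Toy model: evenly divided compute plus a fixed cost per message."""
    return work / workers + messages * latency

work, workers, latency = 100.0, 4, 0.5           # hypothetical values
fine = modeled_time(work, workers, 40, latency)   # many small messages
coarse = modeled_time(work, workers, 4, latency)  # few large messages
fine_speedup = speedup(work, fine)
coarse_speedup = speedup(work, coarse)
```

Under these assumed numbers the fine-grain run takes 45 time units and the coarse-grain run 27, so the coarse-grain speedup is the higher of the two, matching the qualitative conclusion stated in the abstract.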
Simulation Exploration through Immersive Parallel Planes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brunhart-Lupo, Nicholas J; Bush, Brian W; Gruchalla, Kenny M

We present a visualization-driven simulation system that tightly couples systems dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic as an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinates, which map the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, each individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest: a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selection, and filter time. The brushing and selection actions are used to both explore existing data as well as to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.
The electron foreshock

NASA Technical Reports Server (NTRS)

Fitzenreiter, R. J.

1995-01-01

An overview of the observations of backstreaming electrons in the foreshock and the mechanisms that have been proposed to explain their properties will be presented. A primary characteristic of observed foreshock electrons is that their velocity distributions are spatially structured in a systematic way depending on distance from the magnetic field line which is tangent to the shock. There are two interrelated aspects to explaining the structure of velocity distributions in the foreshock, one involving the acceleration mechanism and the other, propagation from the source to the observing point. First, the source distribution of electrons energized by the shock must be determined along the shock surface. Proposed acceleration mechanisms include magnetic mirroring of incoming solar wind particles and mechanisms involving transmission of particles through the shock. Secondly, the kinematics of observable electrons streaming away from a curved shock with an initial parallel velocity and a downstream perpendicular velocity component due to the motional electric field must be determined. This is the context in which the observations and their explanations will be reviewed.
THE MECHANISMS OF ELECTRON ACCELERATION DURING MULTIPLE X LINE MAGNETIC RECONNECTION WITH A GUIDE FIELD

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Huanyu; Lu, Quanming; Huang, Can

2016-04-20

The interactions between magnetic islands are considered to play an important role in electron acceleration during magnetic reconnection. In this paper, two-dimensional particle-in-cell simulations are performed to study electron acceleration during multiple X line reconnection with a guide field. Because the electrons remain almost magnetized, we can analyze the contributions of the parallel electric field, Fermi, and betatron mechanisms to electron acceleration during the evolution of magnetic reconnection through comparison with a guide-center theory. The results show that with the magnetic reconnection proceeding, two magnetic islands are formed in the simulation domain. Next, the electrons are accelerated by both the parallel electric field in the vicinity of the X lines and the Fermi mechanism due to the contraction of the two magnetic islands. Then, the two magnetic islands begin to merge into one, and, in such a process, the electrons can be accelerated by both the parallel electric field and betatron mechanisms. During the betatron acceleration, the electrons are locally accelerated in the regions where the magnetic field is piled up by the high-speed flow from the X line. At last, when the coalescence of the two islands into one big island finishes, the electrons can be further accelerated by the Fermi mechanism because of the contraction of the big island. With the increase of the guide field, the contributions of the Fermi and betatron mechanisms to electron acceleration become less and less important. When the guide field is sufficiently large, the contributions of the Fermi and betatron mechanisms are almost negligible.
Hypercluster Parallel Processor

NASA Technical Reports Server (NTRS)

Blech, Richard A.; Cole, Gary L.; Milner, Edward J.; Quealy, Angela

1992-01-01

Hypercluster computer system includes multiple digital processors, operation of which coordinated through specialized software. Configurable according to various parallel-computing architectures of shared-memory or distributed-memory class, including scalar computer, vector computer, reduced-instruction-set computer, and complex-instruction-set computer. Designed as flexible, relatively inexpensive system that provides single programming and operating environment within which one can investigate effects of various parallel-computing architectures and combinations on performance in solution of complicated problems like those of three-dimensional flows in turbomachines. Hypercluster software and architectural concepts are in public domain.
Limitations of silicon diodes for clinical electron dosimetry.

PubMed

Song, Haijun; Ahmad, Munir; Deng, Jun; Chen, Zhe; Yue, Ning J; Nath, Ravinder

2006-01-01

This work investigates the relevance of several factors affecting the response of silicon diode dosemeters in depth-dose scans of electron beams. These factors are electron energy, instantaneous dose rate, dose per pulse, photon/electron dose ratio and electron scattering angle (directional response). Data from the literature and our own experiments indicate that the impact of these factors may be up to +/-15%. Thus, the different factors would have to cancel out perfectly at all depths in order to produce true depth-dose curves. There are reports of good agreement between depth-doses measured with diodes and ionisation chambers. However, our measurements with a Scantronix electron field detector (EFD) diode and with a plane-parallel ionisation chamber show discrepancies both in the build-up and in the low-dose regions, with a ratio up to 1.4. Moreover, the absolute sensitivity of two diodes of the same EFD model was found to differ by a factor of 3, and this ratio was not constant but changed with depth between 5 and 15% in the low-dose regions of some clinical electron beams. Owing to these inhomogeneities among diodes even of the same model, corrections for each factor would have to be diode-specific and beam-specific. All these corrections would have to be determined using parallel plane chambers, as recommended by AAPM TG-25, which would be unrealistic in clinical practice. Our conclusion is that in general diodes are not reliable in the measurement of depth-dose curves of clinical electron beams.
Neoclassical parallel flow calculation in the presence of external parallel momentum sources in Heliotron J

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nishioka, K.; Nakamura, Y.; Nishimura, S.

A moment approach to calculate neoclassical transport in non-axisymmetric torus plasmas composed of multiple ion species is extended to include the external parallel momentum sources due to unbalanced tangential neutral beam injections (NBIs). The momentum sources that are included in the parallel momentum balance are calculated from the collision operators of background particles with fast ions. This method is applied for the clarification of the physical mechanism of the neoclassical parallel ion flows and the multi-ion species effect on them in Heliotron J NBI plasmas. It is found that parallel ion flow can be determined by the balance between the parallel viscosity and the external momentum source in the region where the external source is much larger than the thermodynamic force driven source in the collisional plasmas. This is because the friction between C⁶⁺ and D⁺ prevents a large difference between C⁶⁺ and D⁺ flow velocities in such plasmas. The C⁶⁺ flow velocities, which are measured by the charge exchange recombination spectroscopy system, are numerically evaluated with this method. It is shown that the experimentally measured C⁶⁺ impurity flow velocities do not clearly contradict the neoclassical estimations, and the dependence of parallel flow velocities on the magnetic field ripples is consistent in both results.
Feasibility of optically interconnected parallel processors using wavelength division multiplexing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Deri, R.J.; De Groot, A.J.; Haigh, R.E.

1996-03-01

New national security demands require enhanced computing systems for nearly ab initio simulations of extremely complex systems and analyzing unprecedented quantities of remote sensing data. This computational performance is being sought using parallel processing systems, in which many less powerful processors are ganged together to achieve high aggregate performance. Such systems require increased capability to communicate information between individual processor and memory elements. As it is likely that the limited performance of today's electronic interconnects will prevent the system from achieving its ultimate performance, there is great interest in using fiber optic technology to improve interconnect communication. However, little information is available to quantify the requirements on fiber optical hardware technology for this application. Furthermore, we have sought to explore interconnect architectures that use the complete communication richness of the optical domain rather than using optics as a simple replacement for electronic interconnects. These considerations have led us to study the performance of a moderate size parallel processor with optical interconnects using multiple optical wavelengths. We quantify the bandwidth, latency, and concurrency requirements which allow a bus-type interconnect to achieve scalable computing performance using up to 256 nodes, each operating at GFLOP performance. Our key conclusion is that scalable performance, to approximately 150 GFLOPS, is achievable for several scientific codes using an optical bus with a small number of WDM channels (8 to 32), only one WDM channel received per node, and achievable optoelectronic bandwidth and latency requirements. 21 refs., 10 figs.

Parallel computing of physical maps--a comparative study in SIMD and MIMD parallelism.

PubMed

Bhandarkar, S M; Chirravuri, S; Arnold, J

1996-01-01

Ordering clones from a genomic library into physical maps of whole chromosomes presents a central computational problem in genetics. Chromosome reconstruction via clone ordering is usually isomorphic to the NP-complete Optimal Linear Arrangement problem. Parallel SIMD and MIMD algorithms for simulated annealing based on Markov chain distribution are proposed and applied to the problem of chromosome reconstruction via clone ordering. Perturbation methods and problem-specific annealing heuristics are proposed and described. The SIMD algorithms are implemented on a 2048 processor MasPar MP-2 system which is an SIMD 2-D toroidal mesh architecture whereas the MIMD algorithms are implemented on an 8 processor Intel iPSC/860 which is an MIMD hypercube architecture. A comparative analysis of the various SIMD and MIMD algorithms is presented in which the convergence, speedup, and scalability characteristics of the various algorithms are analyzed and discussed. On a fine-grained, massively parallel SIMD architecture with a low synchronization overhead such as the MasPar MP-2, a parallel simulated annealing algorithm based on multiple periodically interacting searches performs the best. For a coarse-grained MIMD architecture with high synchronization overhead such as the Intel iPSC/860, a parallel simulated annealing algorithm based on multiple independent searches yields the best results. In either case, distribution of clonal data across multiple processors is shown to exacerbate the tendency of the parallel simulated annealing algorithm to get trapped in a local optimum.
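The "multiple independent searches" strategy named in this record can be sketched in Python on a generic linear-arrangement problem. This is not the authors' algorithm or data: the cost function, swap perturbation, cooling schedule, and the small path graph are all assumptions for illustration, and the searches run one after another here, whereas each could be assigned to its own processor.

```python
import math
import random

def cost(order, edges):
    """Linear-arrangement cost: total distance between linked vertices."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(abs(pos[u] - pos[v]) for u, v in edges)

def anneal(n, edges, seed, steps=2000, t0=2.0, cooling=0.995):
    """One simulated-annealing search with its own random stream."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    current = cost(order, edges)
    best_order, best = order[:], current
    temp = t0
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        order[i], order[j] = order[j], order[i]       # perturb: swap two
        candidate = cost(order, edges)
        delta = candidate - current
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate                        # accept the move
            if current < best:
                best_order, best = order[:], current
        else:
            order[i], order[j] = order[j], order[i]   # reject: undo swap
        temp *= cooling
    return best, best_order

def independent_searches(n, edges, seeds):
    """Run one search per seed with no interaction; keep the best result."""
    return min(anneal(n, edges, s) for s in seeds)

edges = [(i, i + 1) for i in range(7)]                # hypothetical path graph
best_cost, best_order = independent_searches(8, edges, seeds=[0, 1, 2, 3])
```

The searches exchange no information, so the only synchronization is the final `min`, which is why this variant suits machines where synchronization is expensive.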
Massively parallel and linear-scaling algorithm for second-order Møller-Plesset perturbation theory applied to the study of supramolecular wires

NASA Astrophysics Data System (ADS)

Kjærgaard, Thomas; Baudin, Pablo; Bykov, Dmytro; Eriksen, Janus Juul; Ettenhuber, Patrick; Kristensen, Kasper; Larkin, Jeff; Liakh, Dmitry; Pawłowski, Filip; Vose, Aaron; Wang, Yang Min; Jørgensen, Poul

2017-03-01

We present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide-Expand-Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide-Expand-Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI +X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the "resolution of the identity second-order Møller-Plesset perturbation theory" (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.
Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R

Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
Expressing Parallelism with ROOT

DOE Office of Scientific and Technical Information (OSTI.GOV)

Piparo, D.; Tejedor, E.; Guiraud, E.

The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. the multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
Parallel programming of industrial applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Heroux, M; Koniges, A; Simon, H

1998-07-21

In the introductory material, we overview the typical MPP environment for real application computing and the special tools available such as parallel debuggers and performance analyzers. Next, we draw from a series of real applications codes and discuss the specific challenges and problems that are encountered in parallelizing these individual applications. The application areas drawn from include biomedical sciences, materials processing and design, plasma and fluid dynamics, and others. We show how it was possible to get a particular application to run efficiently and what steps were necessary. Finally we end with a summary of the lessons learned from these applications and predictions for the future of industrial parallel computing. This tutorial is based on material from a forthcoming book entitled: "Industrial Strength Parallel Computing" to be published by Morgan Kaufmann Publishers (ISBN l-55860-54).
First principles study of electronic properties, interband transitions and electron energy loss of α-graphyne

NASA Astrophysics Data System (ADS)

Behzad, Somayeh

2016-04-01

The electronic and optical properties of α-graphyne sheet are investigated by using density functional theory. The results confirm that α-graphyne sheet is a zero-gap semimetal. The optical properties of the α-graphyne sheet such as dielectric function, refraction index, electron energy loss function, reflectivity, absorption coefficient and extinction index are calculated for both parallel and perpendicular electric field polarizations. The optical spectra are strongly anisotropic along these two polarizations. For (E ∥ x), absorption edge is at 0 eV, while there is no absorption below 8 eV for (E ∥ z).
Shift-and-invert parallel spectral transformation eigensolver: Massively parallel performance for density-functional based tight-binding

DOE PAGES

Zhang, Hong; Zapol, Peter; Dixon, David A.; ...

2015-11-17

The Shift-and-invert parallel spectral transformations (SIPs), a computational approach to solve sparse eigenvalue problems, is developed for massively parallel architectures with exceptional parallel scalability and robustness. The capabilities of SIPs are demonstrated by diagonalization of density-functional based tight-binding (DFTB) Hamiltonian and overlap matrices for single-wall metallic carbon nanotubes, diamond nanowires, and bulk diamond crystals. The largest (smallest) example studied is a 128,000 (2000) atom nanotube for which ~330,000 (~5600) eigenvalues and eigenfunctions are obtained in ~190 (~5) seconds when parallelized over 266,144 (16,384) Blue Gene/Q cores. Weak scaling and strong scaling of SIPs are analyzed and the performance of SIPs is compared with other novel methods. Different matrix ordering methods are investigated to reduce the cost of the factorization step, which dominates the time-to-solution at the strong scaling limit. As a result, a parallel implementation of assembling the density matrix from the distributed eigenvectors is demonstrated.
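The spectral transformation behind shift-and-invert can be illustrated on a tiny dense matrix. The sketch below is not the SIPs code: it assumes a standard (not generalized) symmetric eigenproblem, a single shift, and a small hand-written Gaussian elimination in place of a sparse factorization. The function names `solve` and `eigenvalue_near` are hypothetical. The idea shown is that an eigenvalue λ of A closest to the shift σ becomes the dominant eigenvalue μ = 1/(λ − σ) of (A − σI)⁻¹, so power iteration on the inverse recovers it as λ = σ + 1/μ.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def eigenvalue_near(a, sigma, iterations=50):
    """Eigenvalue of symmetric matrix a closest to sigma (sigma must not be an eigenvalue)."""
    n = len(a)
    shifted = [[a[i][j] - (sigma if i == j else 0.0) for j in range(n)] for i in range(n)]
    v = [1.0 / (i + 1) for i in range(n)]  # arbitrary nonzero starting vector
    mu = 0.0
    for _ in range(iterations):
        w = solve(shifted, v)            # apply (a - sigma*I)^-1 to v
        mu = dot(v, w) / dot(v, v)       # Rayleigh quotient of the inverse operator
        norm = math.sqrt(dot(w, w))
        v = [wi / norm for wi in w]
    return sigma + 1.0 / mu
```

In a production solver the shifted matrix would be factorized once per shift and reused across iterations, which is why the abstract singles out the factorization step as the dominant cost; different shifts can be processed independently.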
Shift-and-invert parallel spectral transformation eigensolver: Massively parallel performance for density-functional based tight-binding

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Hong; Zapol, Peter; Dixon, David A.

The Shift-and-invert parallel spectral transformations (SIPs), a computational approach to solve sparse eigenvalue problems, is developed for massively parallel architectures with exceptional parallel scalability and robustness. The capabilities of SIPs are demonstrated by diagonalization of density-functional based tight-binding (DFTB) Hamiltonian and overlap matrices for single-wall metallic carbon nanotubes, diamond nanowires, and bulk diamond crystals. The largest (smallest) example studied is a 128,000 (2000) atom nanotube for which ~330,000 (~5600) eigenvalues and eigenfunctions are obtained in ~190 (~5) seconds when parallelized over 266,144 (16,384) Blue Gene/Q cores. Weak scaling and strong scaling of SIPs are analyzed and the performance of SIPs is compared with other novel methods. Different matrix ordering methods are investigated to reduce the cost of the factorization step, which dominates the time-to-solution at the strong scaling limit. As a result, a parallel implementation of assembling the density matrix from the distributed eigenvectors is demonstrated.
Parallel Event Analysis Under Unix

NASA Astrophysics Data System (ADS)

Looney, S.; Nilsson, B. S.; Oest, T.; Pettersson, T.; Ranjard, F.; Thibonnier, J.-P.

The ALEPH experiment at LEP, the CERN CN division and Digital Equipment Corp. have, in a joint project, developed a parallel event analysis system. The parallel physics code is identical to ALEPH's standard analysis code, ALPHA; only the organisation of input/output is changed. The user may switch between sequential and parallel processing by simply changing one input "card". The initial implementation runs on an 8-node DEC 3000/400 farm, using the PVM software, and exhibits a near-perfect speed-up linearity, reducing the turn-around time by a factor of 8.
Tensor contraction engine: Abstraction and automated parallel implementation of configuration-interaction, coupled-cluster, and many-body perturbation theories

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hirata, So

2003-11-20

We develop a symbolic manipulation program and program generator (Tensor Contraction Engine or TCE) that automatically derives the working equations of a well-defined model of second-quantized many-electron theories and synthesizes efficient parallel computer programs on the basis of these equations. Provided an ansatz of a many-electron theory model, TCE performs valid contractions of creation and annihilation operators according to Wick's theorem, consolidates identical terms, and reduces the expressions into the form of multiple tensor contractions acted by permutation operators. Subsequently, it determines the binary contraction order for each multiple tensor contraction with the minimal operation and memory cost, factorizes common binary contractions (defines intermediate tensors), and identifies reusable intermediates. The resulting ordered list of binary tensor contractions, additions, and index permutations is translated into an optimized program that is combined with the NWChem and UTChem computational chemistry software packages. The programs synthesized by TCE take advantage of spin symmetry, Abelian point-group symmetry, and index permutation symmetry at every stage of calculations to minimize the number of arithmetic operations and storage requirement, adjust the peak local memory usage by index range tiling, and support parallel I/O interfaces and dynamic load balancing for parallel executions. We demonstrate the utility of TCE through automatic derivation and implementation of parallel programs for various models of configuration-interaction theory (CISD, CISDT, CISDTQ), many-body perturbation theory [MBPT(2), MBPT(3), MBPT(4)], and coupled-cluster theory (LCCD, CCD, LCCSD, CCSD, QCISD, CCSDT, and CCSDTQ).
Fast imaging with inelastically scattered electrons by off-axis chromatic confocal electron microscopy.

PubMed

Zheng, Changlin; Zhu, Ye; Lazar, Sorin; Etheridge, Joanne

2014-04-25

We introduce off-axis chromatic scanning confocal electron microscopy, a technique for fast mapping of inelastically scattered electrons in a scanning transmission electron microscope without a spectrometer. The off-axis confocal mode enables the inelastically scattered electrons to be chromatically dispersed both parallel and perpendicular to the optic axis. This enables electrons with different energy losses to be separated and detected in the image plane, enabling efficient energy filtering in a confocal mode with an integrating detector. We describe the experimental configuration and demonstrate the method with nanoscale core-loss chemical mapping of silver (M4,5) in an aluminium-silver alloy and atomic scale imaging of the low intensity core-loss La (M4,5@840 eV) signal in LaB6. Scan rates up to 2 orders of magnitude faster than conventional methods were used, enabling a corresponding reduction in radiation dose and increase in the field of view. If coupled with the enhanced depth and lateral resolution of the incoherent confocal configuration, this offers an approach for nanoscale three-dimensional chemical mapping.
Collectively loading an application in a parallel computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.

Collectively loading an application in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: identifying, by a parallel computer control system, a subset of compute nodes in the parallel computer to execute a job; selecting, by the parallel computer control system, one of the subset of compute nodes in the parallel computer as a job leader compute node; retrieving, by the job leader compute node from computer memory, an application for executing the job; and broadcasting, by the job leader to the subset of compute nodes in the parallel computer, the application for executing the job.
Electronic and Solid State Sciences Program Summary, FY 1979.

DTIC Science & Technology

1979-01-01

studies of the interaction of the electromagnetic field with heat conducting and electrically non-conducting and conducting polarizable and magnetizable...Physical Review Letters, 42, 401-404 (1979). 9. "The low temperature electronic specific heat of disordered one dimensional chains", by P. S...technique exploits parallel photoheating and dc electrical-heating experiments. The CO laser hot electron studies have provided information on the
Electron Scattering by High-Frequency Whistler Waves at Earth's Bow Shock

NASA Technical Reports Server (NTRS)

Oka, M.; Wilson, L. B., III; Phan, T. D.; Hull, A. J.; Amano, T.; Hoshino, M.; Argall, M. R.; Le Contel, O.; Agapitov, O.; Gersham, D. J.;

2017-01-01

Electrons are accelerated to non-thermal energies at shocks in space and astrophysical environments. While different mechanisms of electron acceleration have been proposed, it remains unclear how non-thermal electrons are produced out of the thermal plasma pool. Here, we report in situ evidence of pitch-angle scattering of non-thermal electrons by whistler waves at Earth's bow shock. On 2015 November 4, the Magnetospheric Multiscale (MMS) mission crossed the bow shock with an Alfvén Mach number of approximately 11 and a shock angle of approximately 84°. In the ramp and overshoot regions, MMS revealed bursty enhancements of non-thermal (0.5–2 keV) electron flux, correlated with high-frequency (0.2–0.4 Ω_ce, where Ω_ce is the cyclotron frequency) parallel-propagating whistler waves. The electron velocity distribution (measured at 30 ms cadence) showed an enhanced gradient of phase-space density at and around the region where the electron velocity component parallel to the magnetic field matched the resonant energy inferred from the wave frequency range. The flux of 0.5 keV electrons (measured at 1 ms cadence) showed fluctuations with the same frequency. These features indicate that non-thermal electrons were pitch-angle scattered by cyclotron resonance with the high-frequency whistler waves. However, the precise role of the pitch-angle scattering by the higher-frequency whistler waves and possible nonlinear effects in the electron acceleration process remains unclear.

Parallel-In-Time For Moving Meshes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Falgout, R. D.; Manteuffel, T. A.; Southworth, B.

2016-02-04

With steadily growing computational resources available, scientists must develop effective ways to utilize the increased resources. High performance, highly parallel software has become a standard. However until recent years parallelism has focused primarily on the spatial domain. When solving a space-time partial differential equation (PDE), this leads to a sequential bottleneck in the temporal dimension, particularly when taking a large number of time steps. The XBraid parallel-in-time library was developed as a practical way to add temporal parallelism to existing sequential codes with only minor modifications. In this work, a rezoning-type moving mesh is applied to a diffusion problem and formulated in a parallel-in-time framework. Tests and scaling studies are run using XBraid and demonstrate excellent results for the simple model problem considered herein.
Integrated Task And Data Parallel Programming: Language Design

NASA Technical Reports Server (NTRS)

Grimshaw, Andrew S.; West, Emily A.

1998-01-01

This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments: In February I presented a paper at Frontiers '95 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda: Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities: During the fall I collaborated
Performance of the Galley Parallel File System

NASA Technical Reports Server (NTRS)

Nieuwejaar, Nils; Kotz, David

1996-01-01

As the input/output (I/O) needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. This interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. Initial experiments, reported in this paper, indicate that Galley is capable of providing high-performance I/O to applications that access data in patterns that have been observed to be common.
10-channel fiber array fabrication technique for parallel optical coherence tomography system

NASA Astrophysics Data System (ADS)

Arauz, Lina J.; Luo, Yuan; Castillo, Jose E.; Kostuk, Raymond K.; Barton, Jennifer

2007-02-01

Optical Coherence Tomography (OCT) shows great promise for low intrusive biomedical imaging applications. A parallel OCT system is a novel technique that replaces mechanical transverse scanning with electronic scanning. This will reduce the time required to acquire image data. In this system an array of small diameter fibers is required to obtain an image in the transverse direction. Each fiber in the array is configured in an interferometer and is used to image one pixel in the transverse direction. In this paper we describe a technique to package 15μm diameter fibers on a silicon-silica substrate to be used in a 2mm endoscopic probe tip. Single mode fibers are etched to reduce the cladding diameter from 125μm to 15μm. Etched fibers are placed into a 4mm by 150μm trench in a silicon-silica substrate and secured with UV glue. Active alignment was used to simplify the layout of the fibers and minimize unwanted horizontal displacement of the fibers. A 10-channel fiber array was built, tested and later incorporated into a parallel optical coherence system. This paper describes the packaging, testing, and operation of the array in a parallel OCT system.
Noncovalent Molecular Electronics.

PubMed

Gryn'ova, G; Corminboeuf, C

2018-05-03

Molecular electronics covers several distinctly different conducting architectures, including organic semiconductors and single-molecule junctions. The noncovalent interactions, abundant in the former, are also often found in the latter, i.e., the dimer junctions. In the present work, we draw the parallel between the two types of noncovalent molecular electronics for a range of π-conjugated heteroaromatic molecules. In silico modeling allows us to distill the factors that arise from the chemical nature of their building blocks and from their mutual arrangement. We find that the same compounds are consistently the worst and the best performers in the two types of electronic assemblies, emphasizing the universal imprint of the underlying chemistry of the molecular cores on their diverse charge transport characteristics. The interplay between molecular and intermolecular factors creates a spectrum of noncovalent conductive architectures, which can be manipulated using the design strategies based upon the established relationships between chemistry and transport.
Parallel computing on Unix workstation arrays

NASA Astrophysics Data System (ADS)

Reale, F.; Bocchino, F.; Sciortino, S.

1994-12-01

We have tested arrays of general-purpose Unix workstations used as MIMD systems for massive parallel computations. In particular we have solved numerically a demanding test problem with a 2D hydrodynamic code, generally developed to study astrophysical flows, by executing it on arrays either of DECstations 5000/200 on Ethernet LAN, or of DECstations 3000/400, equipped with powerful Alpha processors, on FDDI LAN. The code is appropriate for data-domain decomposition, and we have used a library for parallelization previously developed in our Institute, and easily extended to work on Unix workstation arrays by using the PVM software toolset. We have compared the parallel efficiencies obtained on arrays of several processors to those obtained on a dedicated MIMD parallel system, namely a Meiko Computing Surface (CS-1), equipped with Intel i860 processors. We discuss the feasibility of using non-dedicated parallel systems and conclude that the convenience depends essentially on the size of the computational domain as compared to the relative processor power and network bandwidth. We point out that for future perspectives a parallel development of processor and network technology is important, and that the software still offers great opportunities of improvement, especially in terms of latency times in the message-passing protocols. In conditions of significant gain in terms of speedup, such workstation arrays represent a cost-effective approach to massive parallel computations.

Revisiting Parallel Cyclic Reduction and Parallel Prefix-Based Algorithms for Block Tridiagonal System of Equations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Seal, Sudip K; Perumalla, Kalyan S; Hirshman, Steven Paul

2013-01-01

Simulations that require solutions of block tridiagonal systems of equations rely on fast parallel solvers for runtime efficiency. Leading parallel solvers that are highly effective for general systems of equations, dense or sparse, are limited in scalability when applied to block tridiagonal systems. This paper presents scalability results as well as detailed analyses of two parallel solvers that exploit the special structure of block tridiagonal matrices to deliver superior performance, often by orders of magnitude. A rigorous analysis of their relative parallel runtimes is shown to reveal the existence of a critical block size that separates the parameter space spanned by the number of block rows, the block size and the processor count, into distinct regions that favor one or the other of the two solvers. Dependence of this critical block size on the above parameters as well as on machine-specific constants is established. These formal insights are supported by empirical results on up to 2,048 cores of a Cray XT4 system. To the best of our knowledge, this is the highest reported scalability for parallel block tridiagonal solvers to date.
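For orientation, parallel cyclic reduction can be sketched for the scalar tridiagonal case. This is a simplified illustration under stated assumptions, not the solvers analyzed in the paper: the blocks are 1×1, the function name `pcr_solve` is hypothetical, and the inner loop runs serially although every row update within a step is independent. In the block version, each scalar division would become a solve with a diagonal block.

```python
def pcr_solve(a, b, c, d):
    """Solve a tridiagonal system by parallel cyclic reduction.

    a: sub-diagonal (a[0] must be 0), b: diagonal,
    c: super-diagonal (c[-1] must be 0), d: right-hand side.
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    stride = 1
    while stride < n:
        na, nb, nc, nd = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        # Every row i is updated from rows i - stride and i + stride only,
        # so all n updates in this loop could run concurrently.
        for i in range(n):
            lo, hi = i - stride, i + stride
            nb[i], nd[i] = b[i], d[i]
            if lo >= 0:
                alpha = -a[i] / b[lo]      # eliminates the coupling to x[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
                na[i] = alpha * a[lo]      # new coupling to x[i - 2*stride]
            if hi < n:
                gamma = -c[i] / b[hi]      # eliminates the coupling to x[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
                nc[i] = gamma * c[hi]      # new coupling to x[i + 2*stride]
        a, b, c, d = na, nb, nc, nd
        stride *= 2
    # After about log2(n) steps every equation involves a single unknown.
    return [d[i] / b[i] for i in range(n)]
```

The sketch performs no pivoting, so it is only appropriate for systems such as diagonally dominant ones where the diagonal entries stay nonzero throughout the reduction.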
On the generation of double layers from ion- and electron-acoustic instabilities

NASA Astrophysics Data System (ADS)

Fu, Xiangrong; Cowee, Misa M.; Gary, S. Peter; Winske, Dan

2016-03-01

A plasma double layer (DL) is a nonlinear electrostatic structure that carries a uni-polar electric field parallel to the background magnetic field due to local charge separation. Past studies showed that DLs observed in space plasmas are mostly associated with the ion acoustic instability. Recent Van Allen Probes observations of parallel electric field structures traveling much faster than the ion acoustic speed have motivated a computational study to test the hypothesis that a new type of DLs—electron acoustic DLs—generated from the electron acoustic instability are responsible for these electric fields. Nonlinear particle-in-cell simulations yield negative results, i.e., the hypothetical electron acoustic DLs cannot be formed in a way similar to ion acoustic DLs. Linear theory analysis and the simulations show that the frequencies of electron acoustic waves are too high for ions to respond and maintain charge separation required by DLs. However, our results do show that local density perturbations in a two-electron-component plasma can result in unipolar-like electric field structures that propagate at the electron thermal speed, suggesting another potential explanation for the observations.
On the generation of double layers from ion- and electron-acoustic instabilities

DOE PAGES

Fu, Xiangrong; Cowee, Misa M.; Gary, Stephen Peter; ...

2016-03-17

A plasma double layer (DL) is a nonlinear electrostatic structure that carries a uni-polar electric field parallel to the background magnetic field due to local charge separation. Past studies showed that DLs observed in space plasmas are mostly associated with the ion acoustic instability. Recent Van Allen Probes observations of parallel electric fields traveling much faster than the ion acoustic speed have motivated a computational study to test the hypothesis that a new type of DLs – electron acoustic DLs – generated from the electron acoustic instability are responsible for these electric fields. Nonlinear particle-in-cell simulations yield negative results, i.e. the hypothetical electron acoustic DLs cannot be formed in a way similar to ion acoustic DLs. We find that linear theory analysis and the simulations show that the frequencies of electron acoustic waves are too high for ions to respond and maintain charge separation required by DLs. However, our results do show that local density perturbations in a two-electron-component plasma can result in unipolar-like electric fields that propagate at the electron thermal speed, suggesting another potential explanation for the observations.
Automatic Multilevel Parallelization Using OpenMP

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

2002-01-01

In this paper we describe the extension of the CAPO parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report first results for several benchmark codes and one full application that have been parallelized using our system.
Parallel and Multivalued Logic by the Two-Dimensional Photon-Echo Response of a Rhodamine–DNA Complex

PubMed Central

2015-01-01

Implementing parallel and multivalued logic operations at the molecular scale has the potential to improve the miniaturization and efficiency of a new generation of nanoscale computing devices. Two-dimensional photon-echo spectroscopy is capable of resolving dynamical pathways on electronic and vibrational molecular states. We experimentally demonstrate the implementation of molecular decision trees, logic operations where all possible values of inputs are processed in parallel and the outputs are read simultaneously, by probing the laser-induced dynamics of populations and coherences in a rhodamine dye mounted on a short DNA duplex. The inputs are provided by the bilinear interactions between the molecule and the laser pulses, and the output values are read from the two-dimensional molecular response at specific frequencies. Our results highlight how ultrafast dynamics between multiple molecular states induced by light–matter interactions can be used as an advantage for performing complex logic operations in parallel, operations that are faster than electrical switching. PMID:25984269
Magnetic Field Would Reduce Electron Backstreaming in Ion Thrusters

NASA Technical Reports Server (NTRS)

Foster, John E.

2003-01-01

strong to impede backstreaming electrons, but not so strong as to significantly perturb ion trajectories. An electromagnet or permanent magnetic circuit can be used to impose the transverse magnetic field downstream of the accelerator-grid electrode. For example, in the case of an accelerator grid containing straight, parallel rows of apertures, one can apply nearly uniform magnetic fields across all the apertures by the use of permanent magnets of alternating polarity connected to pole pieces laid out parallel to the rows, as shown in the left part of the figure. For low-temperature operation, the pole pieces can be replaced with bar magnets of alternating polarity. Alternatively, for the same accelerator grid, one could use an electromagnet in the form of current-carrying rods laid out parallel to the rows.
Stability of tapered and parallel-walled dental implants: A systematic review and meta-analysis.

PubMed

Atieh, Momen A; Alsabeeha, Nabeel; Duncan, Warwick J

2018-05-15

Clinical trials have suggested that dental implants with a tapered configuration have improved stability at placement, allowing immediate placement and/or loading. The aim of this systematic review and meta-analysis was to evaluate the implant stability of tapered dental implants compared to standard parallel-walled dental implants. Applying the guidelines of Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement, randomized controlled trials (RCTs) were searched for in electronic databases and complemented by hand searching. The risk of bias was assessed using the Cochrane Collaboration's Risk of Bias tool and data were analyzed using statistical software. A total of 1199 studies were identified, of which, five trials were included with 336 dental implants in 303 participants. Overall meta-analysis showed that tapered dental implants had higher implant stability values than parallel-walled dental implants at insertion and 8 weeks but the difference was not statistically significant. Tapered dental implants had significantly less marginal bone loss compared to parallel-walled dental implants. No significant differences in implant failure rate were found between tapered and parallel-walled dental implants. There is limited evidence to demonstrate the effectiveness of tapered dental implants in achieving greater implant stability compared to parallel-walled dental implants. Superior short-term results in maintaining peri-implant marginal bone with tapered dental implants are possible. Further properly designed RCTs are required to endorse the supposed advantages of tapered dental implants in immediate loading protocol and other complex clinical scenarios. © 2018 Wiley Periodicals, Inc.
Fast ℓ1-SPIRiT Compressed Sensing Parallel Imaging MRI: Scalable Parallel Implementation and Clinically Feasible Runtime

PubMed Central

Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

2012-01-01

We present ℓ1-SPIRiT, a simple algorithm for autocalibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative Self-Consistent Parallel Imaging (SPIRiT). Like many iterative MRI reconstructions, ℓ1-SPIRiT’s image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing ℓ1-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of ℓ1-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT Spoiled Gradient Echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions. PMID:22345529
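As a rough illustration of the soft-thresholding step named in the abstract above, the following Python sketch applies a joint (cross-channel) soft threshold to a small set of coefficients. The function name `joint_soft_threshold`, the data layout (one list of channel values per coefficient position), and the threshold `lam` are assumptions made for this example; they are not taken from the ℓ1-SPIRiT implementation.

```python
import math

def joint_soft_threshold(coeffs, lam):
    """Shrink each coefficient position jointly across channels.

    coeffs: list of positions, each a list of per-channel values (real or complex).
    lam:    non-negative threshold (hypothetical parameter name).
    """
    out = []
    for channels in coeffs:
        # Joint magnitude: l2 norm over the channels at this position.
        norm = math.sqrt(sum(abs(c) ** 2 for c in channels))
        # Scale by max(0, 1 - lam/norm); positions with norm <= lam become zero.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * c for c in channels])
    return out

# Two coefficient positions, two channels each.
shrunk = joint_soft_threshold([[3.0, 4.0], [0.3, 0.4]], lam=1.0)
```

In this toy input the first position has joint magnitude 5 and is scaled by 0.8, while the second has joint magnitude 0.5, below the threshold, and is set to zero in every channel. Because each position is processed independently, the loop is a natural candidate for data-parallel execution.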
A parallel variable metric optimization algorithm

NASA Technical Reports Server (NTRS)

Straeter, T. A.

1973-01-01

An algorithm designed to exploit the parallel computing or vector streaming (pipeline) capabilities of computers is presented. When p is the degree of parallelism, then one cycle of the parallel variable metric algorithm is defined as follows: first, the function and its gradient are computed in parallel at p different values of the independent variable; then the metric is modified by p rank-one corrections; and finally, a single univariate minimization is carried out in the Newton-like direction. Several properties of this algorithm are established. The convergence of the iterates to the solution is proved for a quadratic functional on a real separable Hilbert space. For a finite-dimensional space the convergence is in one cycle when p equals the dimension of the space. Results of numerical experiments indicate that the new algorithm will exploit parallel or pipeline computing capabilities to effect faster convergence than serial techniques.
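The three-step cycle described above can be sketched in Python for a small quadratic. This is only an illustration under stated assumptions: the abstract does not say which rank-one formula is used, so the sketch substitutes the standard symmetric rank-one (SR1) update of an inverse-Hessian estimate, and the line search uses a closed form that is exact only for quadratics. The names `parallel_vm_cycle`, `probes`, and `H` are hypothetical.

```python
def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def parallel_vm_cycle(grad, x0, probes, H):
    """One cycle: p gradient evaluations, p rank-one corrections, one line search."""
    n = len(x0)
    g0 = grad(x0)
    # Step 1: gradients at p probe points. These evaluations are independent,
    # so they could be farmed out to p processors; here they run serially.
    grads = [grad([a + s for a, s in zip(x0, step)]) for step in probes]
    # Step 2: p symmetric rank-one (SR1) corrections to the inverse-Hessian estimate H.
    for step, g in zip(probes, grads):
        y = [gi - g0i for gi, g0i in zip(g, g0)]
        r = [si - hyi for si, hyi in zip(step, mat_vec(H, y))]
        denom = dot(r, y)
        if abs(denom) > 1e-12:  # skip the update when it is ill-defined
            H = [[H[i][j] + r[i] * r[j] / denom for j in range(n)] for i in range(n)]
    # Step 3: one-dimensional minimization along the Newton-like direction d = -H g0.
    # Assumption: the step length below is exact only for quadratic functions.
    d = [-v for v in mat_vec(H, g0)]
    g_trial = grad([a + di for a, di in zip(x0, d)])
    curvature = dot(d, [a - b for a, b in zip(g_trial, g0)])
    if abs(curvature) < 1e-15:
        return x0, H
    alpha = -dot(g0, d) / curvature
    return [a + alpha * di for a, di in zip(x0, d)], H

# Quadratic f(x) = 0.5 x^T A x - b^T x, so grad f(x) = A x - b.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
grad = lambda x: [gi - bi for gi, bi in zip(mat_vec(A, x), b)]
identity = [[1.0, 0.0], [0.0, 1.0]]
x1, H1 = parallel_vm_cycle(grad, [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], identity)
```

With p = 2 probe directions in a two-dimensional space, the two corrections recover the inverse of A, and the single cycle lands on the minimizer (1/11, 7/11), which matches the one-cycle convergence claim for p equal to the dimension.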
Genetic Parallel Programming: design and implementation.

PubMed

Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong

2006-01-01

This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.
Parallel programming with Easy Java Simulations

NASA Astrophysics Data System (ADS)

Esquembre, F.; Christian, W.; Belloni, M.

2018-01-01

Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we decrease the barrier to parallel programming by using a Java-based programming environment to treat problems in the usual undergraduate curriculum. We use the Easy Java Simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.
Parallel auto-correlative statistics with VTK.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pebay, Philippe Pierre; Bennett, Janine Camille

2013-08-01

This report summarizes existing statistical engines in VTK and presents both the serial and parallel auto-correlative statistics engines. It is a sequel to [PT08, BPRT09b, PT09, BPT09, PT10] which studied the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, and order statistics engines. The ease of use of the new parallel auto-correlative statistics engine is illustrated by the means of C++ code snippets and algorithm verification is provided. This report justifies the design of the statistics engines with parallel scalability in mind, and provides scalability and speed-up analysis results for the autocorrelative statistics engine.
Neuromimetic Circuits with Synaptic Devices Based on Strongly Correlated Electron Systems

NASA Astrophysics Data System (ADS)

Ha, Sieu D.; Shi, Jian; Meroz, Yasmine; Mahadevan, L.; Ramanathan, Shriram

2014-12-01

Strongly correlated electron systems such as the rare-earth nickelates (RNiO3, where R denotes a rare-earth element) can exhibit synapselike continuous long-term potentiation and depression when gated with ionic liquids, exploiting the extreme sensitivity of coupled charge, spin, orbital, and lattice degrees of freedom to stoichiometry. We present experimental real-time, device-level classical conditioning and unlearning using nickelate-based synaptic devices in an electronic circuit compatible with both excitatory and inhibitory neurons. We establish a physical model for the device behavior based on electric-field-driven coupled ionic-electronic diffusion that can be utilized for design of more complex systems. We use the model to simulate a variety of associative and nonassociative learning mechanisms, as well as a feedforward recurrent network for storing memory. Our circuit intuitively parallels biological neural architectures, and it can be readily generalized to other forms of cellular learning and extinction. The simulation of neural function with electronic device analogs may provide insight into biological processes such as decision making, learning, and adaptation, while facilitating advanced parallel information processing in hardware.
Capabilities of Fully Parallelized MHD Stability Code MARS

NASA Astrophysics Data System (ADS)

Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

2016-10-01

Results of full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. A parallel version of MARS, named PMARS, has recently been developed at FAR-TECH. Parallelized MARS is an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both the fluid and kinetic plasma models implemented in MARS. Parallelization of the code included parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse vector iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the MARS algorithm using parallel libraries and procedures. Parallelized MARS is capable of calculating eigenmodes with significantly increased spatial resolution: up to 5,000 adapted radial grid points with up to 500 poloidal harmonics. Such resolution is sufficient for simulation of kink, tearing and peeling-ballooning instabilities with physically relevant parameters. Work is supported by the U.S. DOE SBIR program.
Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications.

PubMed

Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J

2004-09-01

We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster, consisting of 800 MHz Intel Pentium III processors, shows an almost linear speedup up to 32 processors for simulating 1 × 10^8 or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability), which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors, on increasing the problem size up to 8 × 10^8 histories. For a smaller number of histories (1 × 10^8) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 × 10^8 histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.
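To make the efficiency figure above concrete, the following short Python sketch converts a parallel efficiency into the implied speedup. It assumes the conventional definition, efficiency = speedup / number of processors, which the abstract does not state explicitly; the function name is hypothetical.

```python
def speedup_from_efficiency(efficiency, processors):
    """Speedup implied by a parallel efficiency, assuming efficiency = speedup / processors."""
    return efficiency * processors

# Athlon cluster figure quoted in the abstract: 83.9% efficiency on 24 processors.
athlon_speedup = speedup_from_efficiency(0.839, 24)
```

Under that assumption, 83.9% efficiency on 24 processors corresponds to a speedup of roughly 20 over a single processor, compared with the ideal value of 24.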
Massive parallelization of serial inference algorithms for a complex generalized linear model

PubMed Central

Suchard, Marc A.; Simpson, Shawn E.; Zorych, Ivan; Ryan, Patrick; Madigan, David

2014-01-01

Following a series of high-profile drug safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases such as claims databases or electronic health record systems are attracting particular attention in this regard, but present significant methodological and computational concerns. In this paper we show how high-performance statistical computation, including graphics processing units, relatively inexpensive highly parallel computing devices, can enable complex methods in large databases. We focus on optimization and massive parallelization of cyclic coordinate descent approaches to fit a conditioned generalized linear model involving tens of millions of observations and thousands of predictors in a Bayesian context. We find orders-of-magnitude improvement in overall run-time. Coordinate descent approaches are ubiquitous in high-dimensional statistics and the algorithms we propose open up exciting new methodological possibilities with the potential to significantly improve drug safety. PMID:25328363
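For readers unfamiliar with cyclic coordinate descent, the Python sketch below shows the basic sweep on a deliberately simpler problem: ridge-penalized least squares rather than the conditioned generalized linear model fitted in the paper. The function name, the penalty parameter `lam`, and the toy data are assumptions for illustration, and the code is serial; it does not reproduce the authors' GPU implementation.

```python
def cyclic_coordinate_descent(X, y, lam=0.0, sweeps=100):
    """Minimize 0.5*||y - X b||^2 + 0.5*lam*||b||^2 one coordinate at a time.

    Assumes every column of X is nonzero (or lam > 0), so each update is well defined.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            # Residual with coordinate j's current contribution added back.
            partial = [residual[i] + col[i] * beta[j] for i in range(n)]
            # Closed-form minimizer over beta[j] with the other coordinates fixed.
            new_bj = sum(c * r for c, r in zip(col, partial)) / (sum(c * c for c in col) + lam)
            residual = [partial[i] - col[i] * new_bj for i in range(n)]
            beta[j] = new_bj
    return beta

# Toy data with an exact fit at beta = (1, 2).
beta = cyclic_coordinate_descent([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 2.0, 3.0])
```

Each coordinate update needs only inner products over the observations, which is the kind of reduction that maps well onto highly parallel hardware when the number of observations is large.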
Parallel Quantum Circuit in a Tunnel Junction

NASA Astrophysics Data System (ADS)

Faizy Namarvar, Omid; Dridi, Ghassen; Joachim, Christian

2016-07-01

Spectral analysis of 1 and 2-states per line quantum bus is normally sufficient to determine the effective Vab(N) electronic coupling between the emitter and receiver states through the bus as a function of the number N of parallel lines. When Vab(N) is difficult to determine, a Heisenberg-Rabi time dependent quantum exchange process must be triggered through the bus to capture the secular oscillation frequency Ωab(N) between those states. Two different linear and regimes are demonstrated for Ωab(N) as a function of N. When the initial preparation is replaced by coupling of the quantum bus to semi-infinite electrodes, the resulting quantum transduction process is not faithfully following the Ωab(N) variations. Because of the electronic transparency normalisation to unity and of the low pass filter character of this transduction, large Ωab(N) cannot be captured by the tunnel junction. The broadly used concept of electrical contact between a metallic nanopad and a molecular device must be better described as a quantum transduction process. At small coupling and when N is small enough not to compensate for this small coupling, an N^2 power law is preserved for Ωab(N) and for Vab(N).
Parallel Quantum Circuit in a Tunnel Junction

PubMed Central

Faizy Namarvar, Omid; Dridi, Ghassen; Joachim, Christian

2016-01-01

Spectral analysis of 1 and 2-states per line quantum bus is normally sufficient to determine the effective Vab(N) electronic coupling between the emitter and receiver states through the bus as a function of the number N of parallel lines. When Vab(N) is difficult to determine, a Heisenberg-Rabi time dependent quantum exchange process must be triggered through the bus to capture the secular oscillation frequency Ωab(N) between those states. Two different linear and regimes are demonstrated for Ωab(N) as a function of N. When the initial preparation is replaced by coupling of the quantum bus to semi-infinite electrodes, the resulting quantum transduction process is not faithfully following the Ωab(N) variations. Because of the electronic transparency normalisation to unity and of the low pass filter character of this transduction, large Ωab(N) cannot be captured by the tunnel junction. The broadly used concept of electrical contact between a metallic nanopad and a molecular device must be better described as a quantum transduction process. At small coupling and when N is small enough not to compensate for this small coupling, an N^2 power law is preserved for Ωab(N) and for Vab(N). PMID:27453262
Parallel Quantum Circuit in a Tunnel Junction.

PubMed

Faizy Namarvar, Omid; Dridi, Ghassen; Joachim, Christian

2016-07-25

Spectral analysis of 1 and 2-states per line quantum bus is normally sufficient to determine the effective Vab(N) electronic coupling between the emitter and receiver states through the bus as a function of the number N of parallel lines. When Vab(N) is difficult to determine, a Heisenberg-Rabi time dependent quantum exchange process must be triggered through the bus to capture the secular oscillation frequency Ωab(N) between those states. Two different linear and regimes are demonstrated for Ωab(N) as a function of N. When the initial preparation is replaced by coupling of the quantum bus to semi-infinite electrodes, the resulting quantum transduction process is not faithfully following the Ωab(N) variations. Because of the electronic transparency normalisation to unity and of the low pass filter character of this transduction, large Ωab(N) cannot be captured by the tunnel junction. The broadly used concept of electrical contact between a metallic nanopad and a molecular device must be better described as a quantum transduction process. At small coupling and when N is small enough not to compensate for this small coupling, an N^2 power law is preserved for Ωab(N) and for Vab(N).
Implementation and performance of parallel Prolog interpreter

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wei, S.; Kale, L.V.; Balkrishna, R.

1988-01-01

In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE-OR process model, which exploits both AND and OR parallelism in logic programs. It is machine independent, as it runs on top of the chare-kernel, a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark programs on parallel machines, including shared memory systems (an Alliant FX/8, a Sequent, and a MultiMax) and a non-shared memory system (an Intel iPSC/32 hypercube), in addition to its performance on a multiprocessor simulation system.

Force user's manual: A portable, parallel FORTRAN

NASA Technical Reports Server (NTRS)

Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.

1990-01-01

The use of Force, a parallel, portable FORTRAN for shared memory parallel computers, is described. Force simplifies writing code for parallel computers and, once the parallel code is written, it is easily ported to computers on which Force is installed. Although Force is nearly the same for all computers, specific details are included for the Cray-2, Cray-YMP, Convex 220, Flex/32, Encore, Sequent, and Alliant computers on which it is installed.
Effective Parallel Algorithm Animation

DTIC Science & Technology

1994-03-01

parallel computer. The system incorporates the ...that produce meaningful animations. The following sections outline characteristics ...Event Simulation. Technical Report, Georgia Institute of Technology, 1992. 22. Garey, Michael R. and David S. Johnson. Computers and Intractability: A
Parallel Algorithms for the Exascale Era

DOE Office of Scientific and Technical Information (OSTI.GOV)

Robey, Robert W.

New parallel algorithms are needed to reach the Exascale level of parallelism with millions of cores. We look at some of the research developed by students in projects at LANL. The research blends ideas from the early days of computing while weaving in the fresh approach brought by students new to the field of high performance computing. We look at reproducibility of global sums and why it is important to parallel computing. Next we look at how the concept of hashing has led to the development of more scalable algorithms suitable for next-generation parallel computers. Nearly all of this work has been done by undergraduates and published in leading scientific journals.
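The reproducibility issue with global sums comes from floating-point addition not being associative, so the order in which partial sums are combined can change the result. The Python sketch below illustrates the effect with a hand-picked set of values and contrasts a plain running sum with the standard library's `math.fsum`, which returns a correctly rounded, order-independent result. It is an illustration of the problem only and does not represent the specific techniques developed in the LANL projects.

```python
import math

def naive_sum(values):
    """Left-to-right running sum in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total

# The same four numbers in two different orders, as might happen when
# partial sums arrive from processors in a different sequence.
order_a = [1e16, 1.0, -1e16, 1.0]
order_b = [1.0, 1.0, 1e16, -1e16]

naive_a = naive_sum(order_a)   # the first 1.0 is absorbed by 1e16 and lost
naive_b = naive_sum(order_b)   # both 1.0 terms survive
exact_a = math.fsum(order_a)   # order-independent, correctly rounded
exact_b = math.fsum(order_b)
```

The two naive sums differ even though the inputs are identical as a set, whereas the two `math.fsum` results agree. An explicit loop is used for the naive sum because the built-in `sum` in recent Python versions already applies compensated summation to floats.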
Parallel Energy Transport in Detached DIII-D Divertor Plasmas

NASA Astrophysics Data System (ADS)

Leonard, A. W.; Lore, J. D.; Canik, J. M.; McLean, A. G.; Makowski, M. A.

2017-10-01

A comparison of experiment and modeling of detached divertor plasmas is examined in the context of parallel energy transport. Estimates of the power carried by electron thermal conduction versus plasma convection are inferred experimentally from power balance measurements of radiated power and target plate heat flux combined with Thomson scattering measurements of the Te profile along the divertor leg. Experimental profiles of Te exhibit relatively low gradients with Te < 15 eV from the X-point to the target, implying transport dominated by convection. In contrast, fluid modeling with SOLPS produces sharp Te gradients for Te > 3 eV, characteristic of transport dominated by electron conduction through the bulk of the divertor. This discrepancy, with experimental transport dominated by convection and modeled transport dominated by conduction, has significant implications for the radiative capacity of divertor plasmas and may explain at least part of the difficulty for fluid modeling to obtain the experimentally observed radiative losses. Comparisons are also made for helium plasmas where the match between experiment and modeling is much better. Work supported by the US DOE under DE-FC02-04ER54698.
Relationship between field-aligned currents and inverted-V parallel potential drops observed at midaltitudes

NASA Astrophysics Data System (ADS)

Sakanoi, T.; Fukunishi, H.; Mukai, T.

1995-10-01

The inverted-V field-aligned acceleration region existing in the altitude range of several thousand kilometers plays an essential role for the magnetosphere-ionosphere coupling system. The adiabatic plasma theory predicts a linear relationship between field-aligned current density (J∥) and parallel potential drop (Φ∥), that is, J∥=KΦ∥, where K is the field-aligned conductance. We examined this relationship using the charged particle and magnetic field data obtained from the Akebono (Exos D) satellite. The potential drop above the satellite was derived from the peak energy of downward electrons, while the potential drop below the satellite was derived from two different methods: the peak energy of upward ions and the energy-dependent widening of electron loss cone. On the other hand, field-aligned current densities in the inverted-V region were estimated from the Akebono magnetometer data. Using these potential drops and field-aligned current densities, we estimated the linear field-aligned conductance KJΦ. Further, we obtained the corrected field-aligned conductance KCJΦ by applying the full Knight's formula to the current-voltage relationship. We also independently estimated the field-aligned conductance KTN from the number density and the thermal temperature of magnetospheric source electrons which were obtained by fitting accelerated Maxwellian functions for precipitating electrons. The results are summarized as follows: (1) The latitudinal dependence of parallel potential drops is characterized by a narrow V-shaped structure with a width of 0.4°-1.0°. (2) Although the inverted-V potential region exactly corresponds to the upward field aligned current region, the latitudinal dependence of upward current intensity is an inverted-U shape rather than an inverted-V shape. Thus it is suggested that the field-aligned conductance KCJΦ changes with a V-shaped latitudinal dependence. In many cases, KCJΦ values at the edge of the inverted-V region are
Learning in Parallel: Using Parallel Corpora to Enhance Written Language Acquisition at the Beginning Level

ERIC Educational Resources Information Center

Bluemel, Brody

2014-01-01

This article illustrates the pedagogical value of incorporating parallel corpora in foreign language education. It explores the development of a Chinese/English parallel corpus designed specifically for pedagogical application. The corpus tool was created to aid language learners in reading comprehension and writing development by making foreign…
Collisions between quasi-parallel shocks

NASA Technical Reports Server (NTRS)

Cargill, Peter J.

1991-01-01

The collision between pairs of quasi-parallel shocks is examined using hybrid numerical simulations. In the interaction, the two shocks are transmitted through each other leaving behind a hot plasma with a population of particles with energies in excess of 40 E0, where E0 is the kinetic energy of particles in the shock frame prior to the collision. The energization is more efficient for quasi-parallel shocks than parallel shocks. Collisions between shocks of equal strengths are more efficient than those that are unequal. The results are of importance for phenomena during the impulsive phase of solar flares, in the distant solar wind and at planetary bow shocks.
Electron Scattering by High-frequency Whistler Waves at Earth’s Bow Shock

NASA Astrophysics Data System (ADS)

Oka, M.; Wilson, L. B., III; Phan, T. D.; Hull, A. J.; Amano, T.; Hoshino, M.; Argall, M. R.; Le Contel, O.; Agapitov, O.; Gershman, D. J.; Khotyaintsev, Y. V.; Burch, J. L.; Torbert, R. B.; Pollock, C.; Dorelli, J. C.; Giles, B. L.; Moore, T. E.; Saito, Y.; Avanov, L. A.; Paterson, W.; Ergun, R. E.; Strangeway, R. J.; Russell, C. T.; Lindqvist, P. A.

2017-06-01

Electrons are accelerated to non-thermal energies at shocks in space and astrophysical environments. While different mechanisms of electron acceleration have been proposed, it remains unclear how non-thermal electrons are produced out of the thermal plasma pool. Here, we report in situ evidence of pitch-angle scattering of non-thermal electrons by whistler waves at Earth’s bow shock. On 2015 November 4, the Magnetospheric Multiscale (MMS) mission crossed the bow shock with an Alfvén Mach number ~11 and a shock angle ~84°. In the ramp and overshoot regions, MMS revealed bursty enhancements of non-thermal (0.5-2 keV) electron flux, correlated with high-frequency (0.2-0.4 Ωce, where Ωce is the cyclotron frequency) parallel-propagating whistler waves. The electron velocity distribution (measured at 30 ms cadence) showed an enhanced gradient of phase-space density at and around the region where the electron velocity component parallel to the magnetic field matched the resonant energy inferred from the wave frequency range. The flux of 0.5 keV electrons (measured at 1 ms cadence) showed fluctuations with the same frequency. These features indicate that non-thermal electrons were pitch-angle scattered by cyclotron resonance with the high-frequency whistler waves. However, the precise role of the pitch-angle scattering by the higher-frequency whistler waves and possible nonlinear effects in the electron acceleration process remains unclear.
Identifying, Quantifying, Extracting and Enhancing Implicit Parallelism

ERIC Educational Resources Information Center

Agarwal, Mayank

2009-01-01

The shift of the microprocessor industry towards multicore architectures has placed a huge burden on the programmers by requiring explicit parallelization for performance. Implicit Parallelization is an alternative that could ease the burden on programmers by parallelizing applications "under the covers" while maintaining sequential semantics…
Bayer image parallel decoding based on GPU

NASA Astrophysics Data System (ADS)

Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua

2012-11-01

In photoelectrical tracking systems, Bayer images are traditionally decompressed on the CPU. However, this is too slow when the images become large, for example, 2K×2K×16bit. In order to accelerate Bayer image decoding, this paper introduces a parallel speedup method for NVIDIA's Graphics Processing Unit (GPU), which supports the CUDA architecture. The decoding procedure can be divided into three parts: the first is the serial part, the second is the task-parallelism part, and the last is the data-parallelism part, including inverse quantization, inverse discrete wavelet transform (IDWT), and image post-processing. To reduce the execution time, the task-parallelism part is optimized with OpenMP techniques. The data-parallelism part gains efficiency by executing on the GPU as a CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, coalesced memory access optimization, and texture memory optimization. In particular, the IDWT can be sped up significantly by rewriting the 2D (two-dimensional) serial IDWT as a 1D parallel IDWT. In experiments with a 1K×1K×16bit Bayer image, the data-parallelism part is more than 10 times faster than the CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental results show that it achieves a 3 to 5 times speed increase compared to the CPU serial method.
Ultralow-Power Electronic Trapping of Nanoparticles with Sub-10 nm Gold Nanogap Electrodes.

PubMed

Barik, Avijit; Chen, Xiaoshu; Oh, Sang-Hyun

2016-10-12

We demonstrate nanogap electrodes for rapid, parallel, and ultralow-power trapping of nanoparticles. Our device pushes the limit of dielectrophoresis by shrinking the separation between gold electrodes to sub-10 nm, thereby creating strong trapping forces at biases as low as the 100 mV ranges. Using high-throughput atomic layer lithography, we manufacture sub-10 nm gaps between 0.8 mm long gold electrodes and pattern them into individually addressable parallel electronic traps. Unlike pointlike junctions made by electron-beam lithography or larger micron-gap electrodes that are used for conventional dielectrophoresis, our sub-10 nm gold nanogap electrodes provide strong trapping forces over a mm-scale trapping zone. Importantly, our technology solves the key challenges associated with traditional dielectrophoresis experiments, such as high voltages that cause heat generation, bubble formation, and unwanted electrochemical reactions. The strongly enhanced fields around the nanogap induce particle-transport speed exceeding 10 μm/s and enable the trapping of 30 nm polystyrene nanoparticles using an ultralow bias of 200 mV. We also demonstrate rapid electronic trapping of quantum dots and nanodiamond particles on arrays of parallel traps. Our sub-10 nm gold nanogap electrodes can be combined with plasmonic sensors or nanophotonic circuitry, and their low-power electronic operation can potentially enable high-density integration on a chip as well as portable biosensing.
Parallel machine architecture and compiler design facilities

NASA Technical Reports Server (NTRS)

Kuck, David J.; Yew, Pen-Chung; Padua, David; Sameh, Ahmed; Veidenbaum, Alex

1990-01-01

The objective is to provide an integrated simulation environment for studying and evaluating various issues in designing parallel systems, including machine architectures, parallelizing compiler techniques, and parallel algorithms. The status of the Delta project (whose objective is to provide a facility to allow rapid prototyping of parallelized compilers that can target different machine architectures) is summarized. Included are surveys of the program manipulation tools developed, the environmental software supporting Delta, and the compiler research projects in which Delta has played a role.
A Tutorial on Parallel and Concurrent Programming in Haskell

NASA Astrophysics Data System (ADS)

Peyton Jones, Simon; Singh, Satnam

This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs which allows programmers to use rich data types in data parallel programs which are automatically transformed into flat data parallel versions for efficient execution on multi-core processors.
Parallel Decomposition of the Fictitious Lagrangian Algorithm and its Accuracy for Molecular Dynamics Simulations of Semiconductors.

NASA Astrophysics Data System (ADS)

Yeh, Mei-Ling

We have performed a parallel decomposition of the fictitious Lagrangian method for molecular dynamics with a tight-binding total energy expression onto the hypercube computer. This is the first time in the literature that the dynamical simulation of semiconducting systems containing more than 512 silicon atoms has become possible with the electrons treated as quantum particles. With the utilization of the Intel Paragon system, our timing analysis predicts that our code is expected to perform realistic simulations on very large systems consisting of thousands of atoms with time requirements of the order of tens of hours. Timing results and performance analysis of our parallel code are presented in terms of calculation time, communication time, and setup time. The accuracy of the fictitious Lagrangian method in molecular dynamics simulation is also investigated, especially the conservation of the total energy of the ions. We find that the accuracy of the fictitious Lagrangian scheme in small silicon cluster and very large silicon system simulations is good for as long as the simulations proceed, even though we quench the electronic coordinates to the Born-Oppenheimer surface only in the beginning of the run. The kinetic energy of electrons does not increase as time goes on, and the energy conservation of the ionic subsystem remains very good. This means that, as far as the ionic subsystem is concerned, the electrons are on the average in the true quantum ground states. We also tie up some odds and ends regarding a few remaining questions about the fictitious Lagrangian method, such as the difference between the results obtained from the Gram-Schmidt and SHAKE methods of orthonormalization, and differences between simulations where the electrons are quenched to the Born-Oppenheimer surface only once compared with periodic quenching.
Parallel Computing Using Web Servers and "Servlets".

ERIC Educational Resources Information Center

Lo, Alfred; Bloor, Chris; Choi, Y. K.

2000-01-01

Describes parallel computing and presents inexpensive ways to implement a virtual parallel computer with multiple Web servers. Highlights include performance measurement of parallel systems; models for using Java and intranet technology including single server, multiple clients and multiple servers, single client; and a comparison of CGI (common…
On high-latitude convection field inhomogeneities, parallel electric fields and inverted-V precipitation events

NASA Technical Reports Server (NTRS)

Lennartsson, W.

1977-01-01

A simple model of a static electric field with a component parallel to the magnetic field is proposed for calculating the electric field and current distributions at various altitudes when the horizontal distribution of the convection electric field is given at a certain altitude above the auroral ionosphere. The model is shown to be compatible with satellite observations of inverted-V electron precipitation structures and associated irregularities in the convection electric field.
Linearly exact parallel closures for slab geometry

NASA Astrophysics Data System (ADS)

Ji, Jeong-Young; Held, Eric D.; Jhang, Hogun

2013-08-01

Parallel closures are obtained by solving a linearized kinetic equation with a model collision operator using the Fourier transform method. The closures expressed in wave number space are exact for time-dependent linear problems to within the limits of the model collision operator. In the adiabatic, collisionless limit, an inverse Fourier transform is performed to obtain integral (nonlocal) parallel closures in real space; parallel heat flow and viscosity closures for density, temperature, and flow velocity equations replace Braginskii's parallel closure relations, and parallel flow velocity and heat flow closures for density and temperature equations replace Spitzer's parallel transport relations. It is verified that the closures reproduce the exact linear response function of Hammett and Perkins [Phys. Rev. Lett. 64, 3019 (1990)] for Landau damping given a temperature gradient. In contrast to their approximate closures where the vanishing viscosity coefficient numerically gives an exact response, our closures relate the heat flow and nonvanishing viscosity to temperature and flow velocity (gradients).
Reconstruction for time-domain in vivo EPR 3D multigradient oximetric imaging--a parallel processing perspective.

PubMed

Dharmaraj, Christopher D; Thadikonda, Kishan; Fletcher, Anthony R; Doan, Phuc N; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A; Cook, John A; Mitchell, James B; Subramanian, Sankaran; Krishna, Murali C

2009-01-01

Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speed up factor of 46.66 as against the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23 x 23 x 23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that the parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time.
Implementing Shared Memory Parallelism in MCBEND

NASA Astrophysics Data System (ADS)

Bird, Adam; Long, David; Dobson, Geoff

2017-09-01

MCBEND is a general purpose radiation transport Monte Carlo code from AMEC Foster Wheeler's ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To more effectively utilise parallel hardware, OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND and assesses the performance of the parallel method implemented in MCBEND.
Configuration affects parallel stent grafting results.

PubMed

Tanious, Adam; Wooster, Mathew; Armstrong, Paul A; Zwiebel, Bruce; Grundy, Shane; Back, Martin R; Shames, Murray L

2018-05-01

A number of adjunctive "off-the-shelf" procedures have been described to treat complex aortic diseases. Our goal was to evaluate parallel stent graft configurations and to determine an optimal formula for these procedures. This is a retrospective review of all patients at a single medical center treated with parallel stent grafts from January 2010 to September 2015. Outcomes were evaluated on the basis of parallel graft orientation, type, and main body device. Primary end points included parallel stent graft compromise and overall endovascular aneurysm repair (EVAR) compromise. There were 78 patients treated with a total of 144 parallel stents for a variety of pathologic processes. There was a significant correlation between main body oversizing and snorkel compromise (P = .0195) and overall procedural complication (P = .0019) but not with endoleak rates. Patients were organized into the following oversizing groups for further analysis: 0% to 10%, 10% to 20%, and >20%. Those oversized into the 0% to 10% group had the highest rate of overall EVAR complication (73%; P = .0003). There were no significant correlations between any one particular configuration and overall procedural complication. There was also no significant correlation between total number of parallel stents employed and overall complication. Composite EVAR configuration had no significant correlation with individual snorkel compromise, endoleak, or overall EVAR or procedural complication. The configuration most prone to individual snorkel compromise and overall EVAR complication was a four-stent configuration with two stents in an antegrade position and two stents in a retrograde position (60% complication rate). The configuration most prone to endoleak was one or two stents in retrograde position (33% endoleak rate), followed by three stents in an all-antegrade position (25%). There was a significant correlation between individual stent configuration and stent compromise (P = .0385), with 31

Two-stage Electron Acceleration by 3D Collisionless Guide-field Magnetic Reconnection

NASA Astrophysics Data System (ADS)

Buechner, J.; Munoz, P.

2017-12-01

We discuss a two-stage process of electron acceleration near X-lines of 3D collisionless guide-field magnetic reconnection. Non-relativistic electrons are first pre-accelerated by magnetic-field-aligned (parallel) electric fields. At the nonlinear stage of 3D guide-field magnetic reconnection electric and magnetic fields become filamentary structured due to streaming instabilities. This causes an additional curvature-driven electron acceleration in the guide-field direction. The resulting spectrum of the accelerated electrons follows a power law.
A conservative scheme for electromagnetic simulation of magnetized plasmas with kinetic electrons

NASA Astrophysics Data System (ADS)

Bao, J.; Lin, Z.; Lu, Z. X.

2018-02-01

A conservative scheme has been formulated and verified for gyrokinetic particle simulations of electromagnetic waves and instabilities in magnetized plasmas. An electron continuity equation derived from the drift kinetic equation is used to time advance the electron density perturbation by using the perturbed mechanical flow calculated from the parallel vector potential, and the parallel vector potential is solved by using the perturbed canonical flow from the perturbed distribution function. In gyrokinetic particle simulations using this new scheme, the shear Alfvén wave dispersion relation in the shearless slab and continuum damping in the sheared cylinder have been recovered. The new scheme overcomes the stringent requirement in the conventional perturbative simulation method that perpendicular grid size needs to be as small as electron collisionless skin depth even for the long wavelength Alfvén waves. The new scheme also avoids the problem in the conventional method that an unphysically large parallel electric field arises due to the inconsistency between electrostatic potential calculated from the perturbed density and vector potential calculated from the perturbed canonical flow. Finally, the gyrokinetic particle simulations of the Alfvén waves in sheared cylinder have superior numerical properties compared with the fluid simulations, which suffer from numerical difficulties associated with singular mode structures.
Radiation Hardened Electronics for Extreme Environments

NASA Technical Reports Server (NTRS)

Keys, Andrew S.; Watson, Michael D.

2007-01-01

The Radiation Hardened Electronics for Space Environments (RHESE) project consists of a series of tasks designed to develop and mature a broad spectrum of radiation hardened and low temperature electronics technologies. Three approaches are being taken to address radiation hardening: improved material hardness, design techniques to improve radiation tolerance, and software methods to improve radiation tolerance. Within these approaches various technology products are being addressed including Field Programmable Gate Arrays (FPGA), Field Programmable Analog Arrays (FPAA), MEMS Serial Processors, Reconfigurable Processors, and Parallel Processors. In addition to radiation hardening, low temperature extremes are addressed with a focus on material and design approaches.
Matching pursuit parallel decomposition of seismic data

NASA Astrophysics Data System (ADS)

Li, Chuanhui; Zhang, Fanchang

2017-07-01

In order to improve the computation speed of matching pursuit decomposition of seismic data, a matching pursuit parallel algorithm is designed in this paper. We pick a fixed number of envelope peaks from the current signal in every iteration according to the number of compute nodes and assign them to the compute nodes on average to search the optimal Morlet wavelets in parallel. With the help of parallel computer systems and Message Passing Interface, the parallel algorithm gives full play to the advantages of parallel computing to significantly improve the computation speed of the matching pursuit decomposition and also has good expandability. Besides, searching only one optimal Morlet wavelet by every compute node in every iteration is the most efficient implementation.
Parallel tempering for the traveling salesman problem

DOE Office of Scientific and Technical Information (OSTI.GOV)

Percus, Allon; Wang, Richard; Hyman, Jeffrey

We explore the potential of parallel tempering as a combinatorial optimization method, applying it to the traveling salesman problem. We compare simulation results of parallel tempering with a benchmark implementation of simulated annealing, and study how different choices of parameters affect the relative performance of the two methods. We find that a straightforward implementation of parallel tempering can outperform simulated annealing in several crucial respects. When parameters are chosen appropriately, both methods yield close approximation to the actual minimum distance for an instance with 200 nodes. However, parallel tempering yields more consistently accurate results when a series of independent simulations are performed. Our results suggest that parallel tempering might offer a simple but powerful alternative to simulated annealing for combinatorial optimization problems.
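The parallel tempering idea described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 2-opt segment-reversal move, the temperature ladder, the sweep count, and all function names (`tour_length`, `parallel_tempering`) are assumptions chosen for brevity, and the replicas are updated in a serial loop rather than on separate processors.

```python
import math
import random


def tour_length(tour, pts):
    """Length of the closed tour visiting pts in the given order."""
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))


def parallel_tempering(pts, temps, sweeps=500, seed=0):
    """Run one replica per temperature (temps ascending) and return the best tour seen."""
    rng = random.Random(seed)
    n = len(pts)
    replicas = [list(range(n)) for _ in temps]
    energies = [tour_length(t, pts) for t in replicas]
    best_tour, best_len = replicas[0][:], energies[0]
    for _ in range(sweeps):
        # Metropolis update in every replica: propose reversing a random segment.
        for k, temp in enumerate(temps):
            i, j = sorted(rng.sample(range(n), 2))
            cur = replicas[k]
            cand = cur[:i] + cur[i:j + 1][::-1] + cur[j + 1:]
            e = tour_length(cand, pts)
            if e <= energies[k] or rng.random() < math.exp((energies[k] - e) / temp):
                replicas[k], energies[k] = cand, e
                if e < best_len:
                    best_tour, best_len = cand[:], e
        # Exchange step: try to swap configurations of neighbouring temperatures.
        for k in range(len(temps) - 1):
            delta = (1 / temps[k] - 1 / temps[k + 1]) * (energies[k] - energies[k + 1])
            if delta >= 0 or rng.random() < math.exp(delta):
                replicas[k], replicas[k + 1] = replicas[k + 1], replicas[k]
                energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return best_tour, best_len


# Hypothetical 12-city instance with random coordinates in the unit square.
_rng = random.Random(1)
cities = [(_rng.random(), _rng.random()) for _ in range(12)]
tour, length = parallel_tempering(cities, temps=[0.01, 0.05, 0.2, 1.0])
```

The exchange step is what distinguishes the method from running several independent annealers: a swap between neighbouring temperatures is accepted with probability min(1, exp[(1/T_k - 1/T_{k+1})(E_k - E_{k+1})]), so good tours found at high temperature can migrate to the cold replicas.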
On the generation of double layers from ion- and electron-acoustic instabilities

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fu, Xiangrong, E-mail: xrfu@lanl.gov; Cowee, Misa M.; Winske, Dan

2016-03-15

A plasma double layer (DL) is a nonlinear electrostatic structure that carries a uni-polar electric field parallel to the background magnetic field due to local charge separation. Past studies showed that DLs observed in space plasmas are mostly associated with the ion acoustic instability. Recent Van Allen Probes observations of parallel electric field structures traveling much faster than the ion acoustic speed have motivated a computational study to test the hypothesis that a new type of DLs (electron acoustic DLs) generated from the electron acoustic instability are responsible for these electric fields. Nonlinear particle-in-cell simulations yield negative results, i.e., the hypothetical electron acoustic DLs cannot be formed in a way similar to ion acoustic DLs. Linear theory analysis and the simulations show that the frequencies of electron acoustic waves are too high for ions to respond and maintain charge separation required by DLs. However, our results do show that local density perturbations in a two-electron-component plasma can result in unipolar-like electric field structures that propagate at the electron thermal speed, suggesting another potential explanation for the observations.
Parallel evolutionary computation in bioinformatics applications.

PubMed

Pinho, Jorge; Sobral, João Luis; Rocha, Miguel

2013-05-01

A large number of optimization problems within the field of Bioinformatics require methods able to handle its inherent complexity (e.g. NP-hard problems) and also demand increased computational efforts. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java based library that offers a large set of metaheuristic methods (such as Evolutionary Algorithms) and also addresses the issue of its efficient execution on a wide range of parallel architectures. The proposed approach focuses on the easiness of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation for several architectures, making use of Aspect-Oriented Programming. The pluggable nature of parallelism related modules allows the user to easily configure its environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Support for Debugging Automatically Parallelized Programs

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)

2001-01-01

We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify the program execution without changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.
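The comparison at the heart of relative debugging can be sketched in a few lines. This is only an illustration of the idea, under assumptions not taken from the abstract: snapshots are plain dictionaries of lists, arrays in the parallel run are block-distributed, and the names `gather_block` and `first_divergence` are hypothetical.

```python
import math


def gather_block(chunks):
    """Reassemble an array that was block-distributed across processes."""
    return [x for chunk in chunks for x in chunk]


def first_divergence(serial_snapshots, parallel_snapshots, rel_tol=1e-12):
    """Return (step, variable, index) of the first mismatch, or None if the runs agree.

    serial_snapshots: list of {name: [values]} taken at matching program points.
    parallel_snapshots: list of {name: [[values of process 0], [process 1], ...]}.
    """
    for step, (ser, par) in enumerate(zip(serial_snapshots, parallel_snapshots)):
        for name, ref in ser.items():
            got = gather_block(par[name])
            for idx, (a, b) in enumerate(zip(ref, got)):
                if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=1e-12):
                    return step, name, idx
            if len(ref) != len(got):
                # One run produced more elements than the other.
                return step, name, min(len(ref), len(got))
    return None


# Hypothetical traces: the parallel run diverges at step 1, element 3 of "u".
serial = [{"u": [1.0, 2.0, 3.0, 4.0]}, {"u": [2.0, 4.0, 6.0, 8.0]}]
parallel = [{"u": [[1.0, 2.0], [3.0, 4.0]]}, {"u": [[2.0, 4.0], [6.0, 8.5]]}]
where = first_divergence(serial, parallel)
```

Knowing how each array is distributed is what makes the comparison possible, which is why the described system takes that information from the parallelization tool instead of asking the user for it.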
Parallel MR imaging: a user's guide.

PubMed

Glockner, James F; Hu, Houchun H; Stanley, David W; Angelos, Lisa; King, Kevin

2005-01-01

Parallel imaging is a recently developed family of techniques that take advantage of the spatial information inherent in phased-array radiofrequency coils to reduce acquisition times in magnetic resonance imaging. In parallel imaging, the number of sampled k-space lines is reduced, often by a factor of two or greater, thereby significantly shortening the acquisition time. Parallel imaging techniques have only recently become commercially available, and the wide range of clinical applications is just beginning to be explored. The potential clinical applications primarily involve reduction in acquisition time, improved spatial resolution, or a combination of the two. Improvements in image quality can be achieved by reducing the echo train lengths of fast spin-echo and single-shot fast spin-echo sequences. Parallel imaging is particularly attractive for cardiac and vascular applications and will likely prove valuable as 3-T body and cardiovascular imaging becomes part of standard clinical practice. Limitations of parallel imaging include reduced signal-to-noise ratio and reconstruction artifacts. It is important to consider these limitations when deciding when to use these techniques. (c) RSNA, 2005.
Relative Debugging of Automatically Parallelized Programs

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)

2002-01-01

We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular, the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify the program execution without changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.
PCLIPS: Parallel CLIPS

NASA Technical Reports Server (NTRS)

Gryphon, Coranth D.; Miller, Mark D.

1991-01-01

PCLIPS (Parallel CLIPS) is a set of extensions to the C Language Integrated Production System (CLIPS) expert system language. PCLIPS is intended to provide an environment for the development of more complex, extensive expert systems. Multiple CLIPS expert systems are now capable of running simultaneously on separate processors, or separate machines, thus dramatically increasing the scope of solvable tasks within the expert systems. As a tool for parallel processing, PCLIPS allows for an expert system to add to its fact-base information generated by other expert systems, thus allowing systems to assist each other in solving a complex problem. This allows individual expert systems to be more compact and efficient, and thus run faster or on smaller machines.
Multitasking TORT under UNICOS: Parallel performance models and measurements

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barnett, A.; Azmy, Y.Y.

1999-09-27

The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The results of the comparison of parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.
Multitasking TORT Under UNICOS: Parallel Performance Models and Measurements

DOE Office of Scientific and Technical Information (OSTI.GOV)

Azmy, Y.Y.; Barnett, D.A.

1999-09-27

The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The results of the comparison of parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.
Lineation-parallel c-axis Fabric of Quartz Formed Under Water-rich Conditions

NASA Astrophysics Data System (ADS)

Wang, Y.; Zhang, J.; Li, P.

2014-12-01

The crystallographic preferred orientation (CPO) of quartz is of great significance because it records much valuable information pertinent to the deformation of quartz-rich rocks in the continental crust. The lineation-parallel c-axis CPO (i.e., c-axis forming a maximum parallel to the lineation) in naturally deformed quartz is generally considered to form under high temperature (> ~550 °C) conditions. However, most laboratory deformation experiments on quartzite failed to produce such a CPO at high temperatures up to 1200 °C. Here we reported a new occurrence of the lineation-parallel c-axis CPO of quartz from kyanite-quartz veins in eclogite. Optical microstructural observations, Fourier transform infrared (FTIR) and electron backscattered diffraction (EBSD) techniques were integrated to illuminate the nature of quartz CPOs. Quartz exhibits mostly straight to slightly curved grain boundaries, modest intracrystalline plasticity, and significant shape preferred orientation (SPO) and CPOs, indicating dislocation creep dominated the deformation of quartz. Kyanite grains in the veins are mostly strain-free, suggestive of their higher strength than quartz. The pronounced SPO and CPOs in kyanite were interpreted to originate from anisotropic crystal growth and/or mechanical rotation during vein-parallel shearing. FTIR results show quartz contains a trivial amount of structurally bound water (several tens of H/10^6 Si), while kyanite has a water content of 384-729 H/10^6 Si; however, petrographic observations suggest quartz from the veins were practically deformed under water-rich conditions. We argue that the observed lineation-parallel c-axis fabric in quartz was inherited from preexisting CPOs as a result of anisotropic grain growth under stress facilitated by water, rather than being due to a dominant c-slip. The preservation of the quartz CPOs probably benefited from the preexisting quartz CPOs which renders most quartz grains unsuitably oriented for an easy a-slip at
Parallel Processing at the High School Level.

ERIC Educational Resources Information Center

Sheary, Kathryn Anne

This study investigated the ability of high school students to cognitively understand and implement parallel processing. Data indicates that most parallel processing is being taught at the university level. Instructional modules on C, Linux, and the parallel processing language, P4, were designed to show that high school students are highly…
Simulation Exploration through Immersive Parallel Planes: Preprint

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brunhart-Lupo, Nicholas; Bush, Brian W.; Gruchalla, Kenny

We present a visualization-driven simulation system that tightly couples systems dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic as an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinates, which map the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, each individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest: a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selection, and filter time. The brushing and selection actions are used to both explore existing data as well as to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.
Parallel integrated frame synchronizer chip

NASA Technical Reports Server (NTRS)

Solomon, Jeffrey Michael (Inventor); Ghuman, Parminder Singh (Inventor); Bennett, Toby Dennis (Inventor)

2000-01-01

A parallel integrated frame synchronizer which implements a sequential pipeline process wherein serial data in the form of telemetry data or weather satellite data enters the synchronizer by means of a front-end subsystem and passes to a parallel correlator subsystem or a weather satellite data processing subsystem. When in a CCSDS mode, data from the parallel correlator subsystem passes through a window subsystem, then to a data alignment subsystem and then to a bit transition density (BTD)/cyclical redundancy check (CRC) decoding subsystem. Data from the BTD/CRC decoding subsystem or data from the weather satellite data processing subsystem is then fed to an output subsystem where it is output from a data output port.
Parallelization of NAS Benchmarks for Shared Memory Multiprocessors

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

1998-01-01

This paper presents our experiences of parallelizing the sequential implementation of NAS benchmarks using compiler directives on SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers on SGI Origin2000 support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing sequential implementation of NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.
Massively parallel multicanonical simulations

NASA Astrophysics Data System (ADS)

Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard

2018-03-01

Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 10^4 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.
Two-Electron Transfer Pathways.

PubMed

Lin, Jiaxing; Balamurugan, D; Zhang, Peng; Skourtis, Spiros S; Beratan, David N

2015-06-18

The frontiers of electron-transfer chemistry demand that we develop theoretical frameworks to describe the delivery of multiple electrons, atoms, and ions in molecular systems. When electrons move over long distances through high barriers, where the probability for thermal population of oxidized or reduced bridge-localized states is very small, the electrons will tunnel from the donor (D) to acceptor (A), facilitated by bridge-mediated superexchange interactions. If the stable donor and acceptor redox states on D and A differ by two electrons, it is possible that the electrons will propagate coherently from D to A. While structure-function relations for single-electron superexchange in molecules are well established, strategies to manipulate the coherent flow of multiple electrons are largely unknown. In contrast to one-electron superexchange, two-electron superexchange involves both one- and two-electron virtual intermediate states, the number of virtual intermediates increases very rapidly with system size, and multiple classes of pathways interfere with one another. In the study described here, we developed simple superexchange models for two-electron transfer. We explored how the bridge structure and energetics influence multielectron superexchange, and we compared two-electron superexchange interactions to single-electron superexchange. Multielectron superexchange introduces interference between singly and doubly oxidized (or reduced) bridge virtual states, so that even simple linear donor-bridge-acceptor systems have pathway topologies that resemble those seen for one-electron superexchange through bridges with multiple parallel pathways. The simple model systems studied here exhibit a richness that is amenable to experimental exploration by manipulating the multiple pathways, pathway crosstalk, and changes in the number of donor and acceptor species. The features that emerge from these studies may assist in developing new strategies to deliver multiple

Self-consistent quasi-static parallel electric field associated with substorm growth phase

NASA Astrophysics Data System (ADS)

Le Contel, O.; Pellat, R.; Roux, A.

2000-06-01

energy and/or small ky, while the second regime (ωd>ω) is adapted to large energies and/or large ky. In particular, in the limit ωd<ω and |vd|<|uy|, where uy is the diamagnetic velocity proportional to the pressure gradient, we find a parallel electric field proportional to the pressure gradient and directed toward the ionosphere in the dusk sector and toward the equator in the dawn sector. This parallel electric field corresponds to a potential drop of a few hundred volts that can accelerate electrons and produce a differential drift between electrons and ions.
Introducing parallelism to histogramming functions for GEM systems

NASA Astrophysics Data System (ADS)

Krawczyk, Rafał D.; Czarski, Tomasz; Kolasinski, Piotr; Pozniak, Krzysztof T.; Linczuk, Maciej; Byszuk, Adrian; Chernyshova, Maryna; Juszczyk, Bartlomiej; Kasprowicz, Grzegorz; Wojenski, Andrzej; Zabolotny, Wojciech

2015-09-01

This article assesses the potential for parallelizing histogramming algorithms in a GEM detector system. Histogramming and preprocessing algorithms in MATLAB were analyzed with regard to adding parallelism. A preliminary implementation of parallel strip histogramming resulted in a speedup. An analysis of the algorithms' parallelizability is presented, and potential hardware and software support for implementing the parallel algorithm is discussed.
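
As a rough illustration of how strip histogramming can be parallelized, the sketch below (in Python rather than the MATLAB used in the article) splits a list of hit strip indices into chunks, builds one partial histogram per chunk in a worker pool, and sums the partial histograms. The function names, the chunking scheme, and the use of a thread pool are assumptions made for illustration, not the authors' implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_histogram(hits, n_strips):
    """Count hits per strip for one chunk of data."""
    counts = [0] * n_strips
    for strip in hits:
        counts[strip] += 1
    return counts


def parallel_histogram(hits, n_strips, n_workers=4):
    """Split hits into chunks, histogram each chunk, then merge the results."""
    chunk = max(1, (len(hits) + n_workers - 1) // n_workers)
    chunks = [hits[i:i + chunk] for i in range(0, len(hits), chunk)]
    # Threads only illustrate the structure here; in CPython a process pool
    # would be needed to obtain a real speedup for CPU-bound counting.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda c: partial_histogram(c, n_strips), chunks))
    # Merge step: element-wise sum of the partial histograms.
    total = [0] * n_strips
    for part in partials:
        for i, count in enumerate(part):
            total[i] += count
    return total
```

Because each chunk writes only to its own partial histogram, no locking is needed until the final merge, which is the property that tends to make histogramming a good candidate for parallelization.
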
Code Parallelization with CAPO: A User Manual

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry; Biegel, Bryan (Technical Monitor)

2001-01-01

A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools developed at the University of Greenwich, to generate OpenMP parallel codes for shared memory architectures. This is an interactive toolkit to transform a serial Fortran application code to an equivalent parallel version of the software - in a small fraction of the time normally required for a manual parallelization. We first discuss the way in which loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using the in-depth interprocedural analysis. The use of the toolkit on a number of application codes ranging from benchmark to real-world application codes is presented. This will demonstrate the great potential of using the toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of processors. The second part of the document gives references to the parameters and the graphic user interface implemented in the toolkit. Finally a set of tutorials is included for hands-on experiences with this toolkit.
Plasma and energetic particle structure upstream of a quasi-parallel interplanetary shock

NASA Technical Reports Server (NTRS)

Kennel, C. F.; Scarf, F. L.; Coroniti, F. V.; Russell, C. T.; Wenzel, K.-P.; Sanderson, T. R.; Van Nes, P.; Smith, E. J.; Tsurutani, B. T.; Scudder, J. D.

1984-01-01

ISEE 1, 2 and 3 data from 1978 on interplanetary magnetic fields, shock waves and particle energetics are examined to characterize a quasi-parallel shock. The intense shock studied exhibited a 640 km/sec velocity. The data covered 1-147 keV protons and electrons and ions with energies exceeding 30 keV in regions both upstream and downstream of the shock, and also the magnitudes of ion-acoustic and MHD waves. The energetic particles and MHD waves began being detected 5 hr before the shock. Intense halo electron fluxes appeared ahead of the shock. A closed magnetic field structure was produced with a front end 700 earth radii from the shock. The energetic protons were cut off from the interior of the magnetic bubble, which contained a markedly increased density of 2-6 keV protons as well as the shock itself.
Parallelizing Timed Petri Net simulations

NASA Technical Reports Server (NTRS)

Nicol, David M.

1993-01-01

The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.
Parallel fabrication of macroporous scaffolds.

PubMed

Dobos, Andrew; Grandhi, Taraka Sai Pavan; Godeshala, Sudhakar; Meldrum, Deirdre R; Rege, Kaushal

2018-07-01

Scaffolds generated from naturally occurring and synthetic polymers have been investigated in several applications because of their biocompatibility and tunable chemo-mechanical properties. Existing methods for generation of 3D polymeric scaffolds typically cannot be parallelized, suffer from low throughputs, and do not allow for quick and easy removal of the fragile structures that are formed. Current molds used in hydrogel and scaffold fabrication using solvent casting and porogen leaching are often single-use and do not facilitate 3D scaffold formation in parallel. Here, we describe a simple device and related approaches for the parallel fabrication of macroporous scaffolds. This approach was employed for the generation of macroporous and non-macroporous materials in parallel, in higher throughput and allowed for easy retrieval of these 3D scaffolds once formed. In addition, macroporous scaffolds with interconnected as well as non-interconnected pores were generated, and the versatility of this approach was employed for the generation of 3D scaffolds from diverse materials including an aminoglycoside-derived cationic hydrogel ("Amikagel"), poly(lactic-co-glycolic acid) or PLGA, and collagen. Macroporous scaffolds generated using the device were investigated for plasmid DNA binding and cell loading, indicating the use of this approach for developing materials for different applications in biotechnology. Our results demonstrate that the device-based approach is a simple technology for generating scaffolds in parallel, which can enhance the toolbox of current fabrication techniques. © 2018 Wiley Periodicals, Inc.
Parallel solution of sparse one-dimensional dynamic programming problems

NASA Technical Reports Server (NTRS)

Nicol, David M.

1989-01-01

Parallel computation offers the potential for quickly solving large computational problems. However, it is often a non-trivial task to effectively use parallel computers. Solution methods must sometimes be reformulated to exploit parallelism; the reformulations are often more complex than their slower serial counterparts. We illustrate these points by studying the parallelization of sparse one-dimensional dynamic programming problems, those which do not obviously admit substantial parallelization. We propose a new method for parallelizing such problems, develop analytic models which help us to identify problems which parallelize well, and compare the performance of our algorithm with existing algorithms on a multiprocessor.
Adaptive parallel logic networks

NASA Technical Reports Server (NTRS)

Martinez, Tony R.; Vidal, Jacques J.

1988-01-01

Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.
Event parallelism: Distributed memory parallel computing for high energy physics experiments

NASA Astrophysics Data System (ADS)

Nash, Thomas

1989-12-01

This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXes. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on a powerful RISC system, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point-to-point, rather than bussed, communication will be required. Developments in this direction are described.
Reduced-Order Structure-Preserving Model for Parallel-Connected Three-Phase Grid-Tied Inverters: Preprint

DOE Office of Scientific and Technical Information (OSTI.GOV)

Johnson, Brian B; Purba, Victor; Jafarpour, Saber

Given that next-generation infrastructures will contain large numbers of grid-connected inverters and these interfaces will be satisfying a growing fraction of system load, it is imperative to analyze the impacts of power electronics on such systems. However, since each inverter model has a relatively large number of dynamic states, it would be impractical to execute complex system models where the full dynamics of each inverter are retained. To address this challenge, we derive a reduced-order structure-preserving model for parallel-connected grid-tied three-phase inverters. Here, each inverter in the system is assumed to have a full-bridge topology, LCL filter at the point of common coupling, and the control architecture for each inverter includes a current controller, a power controller, and a phase-locked loop for grid synchronization. We outline a structure-preserving reduced-order inverter model for the setting where the parallel inverters are each designed such that the filter components and controller gains scale linearly with the power rating. By structure preserving, we mean that the reduced-order three-phase inverter model is also composed of an LCL filter, a power controller, current controller, and PLL. That is, we show that the system of parallel inverters can be modeled exactly as one aggregated inverter unit and this equivalent model has the same number of dynamical states as an individual inverter in the paralleled system. Numerical simulations validate the reduced-order models.
Increasing processor utilization during parallel computation rundown

NASA Technical Reports Server (NTRS)

Jones, W. H.

1986-01-01

Some parallel processing environments provide for asynchronous execution and completion of general purpose parallel computations from a single computational phase. When all the computations from such a phase are complete, a new parallel computational phase is begun. Depending upon the granularity of the parallel computations to be performed, there may be a shortage of available work as a particular computational phase draws to a close (computational rundown). This can result in the waste of computing resources and the delay of the overall problem. In many practical instances, strict sequential ordering of phases of parallel computation is not totally required. In such cases, the beginning of one phase can be correctly computed before the end of a previous phase is completed. This allows additional work to be generated somewhat earlier to keep computing resources busy during each computational rundown. The conditions under which this can occur are identified and the frequency of occurrence of such overlapping in an actual parallel Navier-Stokes code is reported. A language construct is suggested and possible control strategies for the management of such computational phase overlapping are discussed.
Kinetic Alfven turbulence: Electron and ion heating by particle-in-cell simulations

NASA Astrophysics Data System (ADS)

Gary, S. P.; Hughes, R. S.; Wang, J.; Parashar, T. N.

2017-12-01

Three-dimensional particle-in-cell simulations of the forward cascade of decaying kinetic Alfvén turbulence have been carried out as an initial-value problem on a collisionless, homogeneous, magnetized, electron-ion plasma model with β_e = β_i = 0.50 and m_i/m_e = 100, where subscripts e and i represent electrons and ions respectively. Initial anisotropic narrowband spectra of relatively long wavelength modes with approximately gyrotropic distributions in k_⊥ undergo a forward cascade to broadband spectra of magnetic fluctuations at shorter wavelengths. Maximum electron and ion heating rates are computed as functions of the initial fluctuating magnetic field energy density ε_o on the range 0.05 < ε_o < 0.50. In contrast to dissipation by whistler turbulence, the maximum ion heating rate due to kinetic Alfvén turbulence is substantially greater than the maximum electron heating rate. Furthermore, ion heating as well as electron heating due to kinetic Alfvén turbulence scale approximately with ε_o. Finally, electron heating leads to anisotropies of the type T_∥e > T_⊥e, where the parallel and perpendicular symbols refer to directions parallel and perpendicular, respectively, to the background magnetic field, whereas the heated ions remain relatively isotropic. This implies that, for the range of ε_o values considered, the Landau wave-particle resonance is a likely heating mechanism for the electrons and may also contribute to ion heating.
On the interplay of morphology and electronic conductivity of rotationally spun carbon fiber mats

NASA Astrophysics Data System (ADS)

Opitz, Martin; Go, Dennis; Lott, Philipp; Müller, Sandra; Stollenwerk, Jochen; Kuehne, Alexander J. C.; Roling, Bernhard

2017-09-01

Carbon-based materials are used as electrode materials in a wide range of electrochemical applications, e.g., in batteries, supercapacitors, and fuel cells. For these applications, the electronic conductivity of the materials plays an important role. Currently, porous carbon materials with complex morphologies and hierarchical pore structures are in the focus of research. The complex morphologies influence the electronic transport and may lead to an anisotropic electronic conductivity. In this paper, we unravel the influence of the morphology of rotationally spun carbon fiber mats on their electronic conductivity. By combining experiments with finite-element simulations, we compare and evaluate different electrode setups for conductivity measurements. While the "bar-type method" with two parallel electrodes on the same face of the sample yields information about the intrinsic conductivity of the carbon fibers, the "parallel-plate method" with two electrodes on opposite faces gives information about the electronic transport orthogonal to the faces. Results obtained for the van-der-Pauw method suggest that this method is not well suited for understanding morphology-transport relations in these materials.
Research in Parallel Algorithms and Software for Computational Aerosciences

NASA Technical Reports Server (NTRS)

Domel, Neal D.

1996-01-01

Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Organization of the channel-switching process in parallel computer systems based on a matrix optical switch

NASA Technical Reports Server (NTRS)

Golomidov, Y. V.; Li, S. K.; Popov, S. A.; Smolov, V. B.

1986-01-01

After a classification and analysis of electronic and optoelectronic switching devices, the design principles and structure of a matrix optical switch are described. The switching and pair-exclusion operations in this type of switch are examined, and a method for the optical switching of communication channels is elaborated. Finally, attention is given to the structural organization of a parallel computer system with a matrix optical switch.
Electron Heating at Kinetic Scales in Magnetosheath Turbulence

NASA Technical Reports Server (NTRS)

Chasapis, Alexandros; Matthaeus, W. H.; Parashar, T. N.; Lecontel, O.; Retino, A.; Breuillard, H.; Khotyaintsev, Y.; Vaivads, A.; Lavraud, B.; Eriksson, E.;

2017-01-01

We present a statistical study of coherent structures at kinetic scales, using data from the Magnetospheric Multiscale mission in the Earth's magnetosheath. We implemented the multi-spacecraft partial variance of increments (PVI) technique to detect these structures, which are associated with intermittency at kinetic scales. We examine the properties of the electron heating occurring within such structures. We find that, statistically, structures with a high PVI index are regions of significant electron heating. We also focus on one such structure, a current sheet, which shows some signatures consistent with magnetic reconnection. Strong parallel electron heating coincides with whistler emissions at the edges of the current sheet.
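
For orientation, the PVI index is commonly defined as the magnitude of a magnetic field increment normalized by the root-mean-square increment over the averaging interval. The Python sketch below computes a single-series, time-lagged version of that quantity; the study itself uses multi-spacecraft increments, and the function name, the lag, and the sample data here are illustrative assumptions.

```python
import math


def pvi(b, lag=1):
    """PVI series for a list of (bx, by, bz) samples.

    PVI_i = |dB_i| / sqrt(mean(|dB|^2)), with dB_i = B[i + lag] - B[i].
    """
    if len(b) <= lag:
        raise ValueError("need more samples than the lag")
    incr_sq = []
    for i in range(len(b) - lag):
        d = [b[i + lag][k] - b[i][k] for k in range(3)]
        incr_sq.append(sum(x * x for x in d))
    mean_sq = sum(incr_sq) / len(incr_sq)
    if mean_sq == 0.0:
        # A perfectly constant field has no increments to normalize.
        return [0.0] * len(incr_sq)
    return [math.sqrt(s / mean_sq) for s in incr_sq]


# Hypothetical field samples with one sharp reversal (a current-sheet-like step).
samples = [(1.0, 0.0, 0.0)] * 5 + [(-1.0, 0.0, 0.0)] * 5
series = pvi(samples)
```

In this toy series only the increment that spans the reversal is nonzero, so it stands out with a large PVI value while the rest of the series stays at zero, which is the sense in which a high PVI index flags a coherent structure.
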

Parallel computations and control of adaptive structures

NASA Technical Reports Server (NTRS)

Park, K. C.; Alvin, Kenneth F.; Belvin, W. Keith; Chong, K. P. (Editor); Liu, S. C. (Editor); Li, J. C. (Editor)

1991-01-01

The equations of motion for structures with adaptive elements for vibration control are presented for parallel computations to be used as a software package for real-time control of flexible space structures. A brief introduction of the state-of-the-art parallel computational capability is also presented. Time marching strategies are developed for an effective use of massive parallel mapping, partitioning, and the necessary arithmetic operations. An example is offered for the simulation of control-structure interaction on a parallel computer, and the impact of the approach on applications in disciplines other than the aerospace industry is assessed.
Storm Time Evolution of Outer Radiation Belt Relativistic Electrons by a Nearly Continuous Distribution of Chorus

NASA Astrophysics Data System (ADS)

Yang, Chang; Xiao, Fuliang; He, Yihua; Liu, Si; Zhou, Qinghua; Guo, Mingyue; Zhao, Wanli

2018-03-01

During the 13-14 November 2012 storm, Van Allen Probe A simultaneously observed a 10 h period of enhanced chorus (including quasi-parallel and oblique propagation components) and relativistic electron fluxes over a broad range of L = 3-6 and magnetic local time = 2-10 within a complete orbit cycle. By adopting a Gaussian fit to the observed wave spectra, we obtain the wave parameters and calculate the bounce-averaged diffusion coefficients. We solve the Fokker-Planck diffusion equation to simulate flux evolutions of relativistic (1.8-4.2 MeV) electrons during two intervals when Probe A passed the location L = 4.3 along its orbit. The simulation results show that chorus with combined quasi-parallel and oblique components can produce a more pronounced flux enhancement in the pitch angle range ~45°-80°, consistent with the observations. The current results provide the first evidence on how relativistic electron fluxes vary under the drive of almost continuously distributed chorus with both quasi-parallel and oblique components within a complete orbit of Van Allen Probe.
A parallel adaptive mesh refinement algorithm

NASA Technical Reports Server (NTRS)

Quirk, James J.; Hanebutte, Ulf R.

1993-01-01

Over recent years, Adaptive Mesh Refinement (AMR) algorithms which dynamically match the local resolution of the computational grid to the numerical solution being sought have emerged as powerful tools for solving problems that contain disparate length and time scales. In particular, several workers have demonstrated the effectiveness of employing an adaptive, block-structured hierarchical grid system for simulations of complex shock wave phenomena. Unfortunately, from the parallel algorithm developer's viewpoint, this class of scheme is quite involved; these schemes cannot be distilled down to a small kernel upon which various parallelizing strategies may be tested. However, because of their block-structured nature such schemes are inherently parallel, so all is not lost. In this paper we describe the method by which Quirk's AMR algorithm has been parallelized. This method is built upon just a few simple message passing routines and so it may be implemented across a broad class of MIMD machines. Moreover, the method of parallelization is such that the original serial code is left virtually intact, and so we are left with just a single product to support. The importance of this fact should not be underestimated given the size and complexity of the original algorithm.
Equalizer: a scalable parallel rendering framework.

PubMed

Eilemann, Stefan; Makhinya, Maxim; Pajarola, Renato

2009-01-01

Continuing improvements in CPU and GPU performance as well as increasing multi-core processor and cluster-based parallelism demand flexible and scalable parallel rendering solutions that can exploit multipipe hardware accelerated graphics. In fact, to achieve interactive visualization, scalable rendering systems are essential to cope with the rapid growth of data sets. However, parallel rendering systems are non-trivial to develop and often only application specific implementations have been proposed. The task of developing a scalable parallel rendering framework is even more difficult if it should be generic to support various types of data and visualization applications, and at the same time work efficiently on a cluster with distributed graphics cards. In this paper we introduce a novel system called Equalizer, a toolkit for scalable parallel rendering based on OpenGL which provides an application programming interface (API) to develop scalable graphics applications for a wide range of systems ranging from large distributed visualization clusters and multi-processor multipipe graphics systems to single-processor single-pipe desktop machines. We describe the system architecture and the basic API, discuss its advantages over previous approaches, and present example configurations and usage scenarios as well as scalability results.

Broadcasting a message in a parallel computer

DOEpatents

Berg, Jeremy E [Rochester, MN]; Faraj, Ahmad A [Rochester, MN]

2011-08-02

Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network is optimized for point to point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group is assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.
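
A minimal sketch of the idea, assuming that a single plane of the network can be modeled as a two-dimensional mesh of rows x cols nodes with the logical root at corner (0, 0): a boustrophedon ("snake") ordering visits every node exactly once, each step moves to a mesh neighbor, and the message is forwarded hop by hop along that order. The mesh model, the root position, and the function names are illustrative assumptions and are not taken from the patent.

```python
def snake_path(rows, cols):
    """Hamiltonian path over a rows x cols mesh, starting at (0, 0)."""
    path = []
    for r in range(rows):
        # Even rows run left to right, odd rows right to left, so the last
        # node of one row is adjacent to the first node of the next row.
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in cs)
    return path


def broadcast(rows, cols, message):
    """Forward the root's message along the path; return each node's copy."""
    path = snake_path(rows, cols)
    received = {path[0]: message}  # the logical root already holds the message
    for sender, receiver in zip(path, path[1:]):
        # Each hop is a point-to-point send between adjacent mesh nodes.
        received[receiver] = received[sender]
    return received
```

Every transfer in this sketch is a neighbor-to-neighbor send, which matches a network optimized for point to point communication, at the cost of a number of sequential hops equal to the number of nodes minus one.
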
Electron acceleration by surface plasma waves in double metal surface structure

NASA Astrophysics Data System (ADS)

Liu, C. S.; Kumar, Gagan; Singh, D. B.; Tripathi, V. K.

2007-12-01

Two parallel metal sheets, separated by a vacuum region, support a surface plasma wave whose amplitude is maximum on the two parallel interfaces and minimum in the middle. This mode can be excited by a laser using a glass prism. An electron beam launched into the middle region experiences a longitudinal ponderomotive force due to the surface plasma wave and gets accelerated to velocities of the order of phase velocity of the surface wave. The scheme is viable to achieve beams of tens of keV energy. In the case of a surface plasma wave excited on a single metal-vacuum interface, the field gradient normal to the interface pushes the electrons away from the high field region, limiting the acceleration process. The acceleration energy thus achieved is in agreement with the experimental observations.
Propagation of electromagnetic waves parallel to the magnetic field in the nightside Venus ionosphere

NASA Technical Reports Server (NTRS)

Huba, J. D.; Rowland, H. L.

1993-01-01

The propagation of electromagnetic waves parallel to the magnetic field in the nightside Venus ionosphere is presented in a theoretical and numerical analysis. The model assumes a source of electromagnetic radiation in the Venus atmosphere, such as that produced by lightning. Specifically addressed is wave propagation in the altitude range z = 130-160 km at the four frequencies detectable by the Pioneer Venus Orbiter Electric Field Detector: 100 Hz, 730 Hz, 5.4 kHz, and 30 kHz. Parameterizations of the wave intensities, peak electron density, and Poynting flux as a function of magnetic field are presented. The waves are found to propagate most easily in conditions of low electron density and high magnetic field. The results of the model are consistent with observational data.
Electron/ion whistler instabilities and magnetic noise bursts

NASA Technical Reports Server (NTRS)

Akimoto, K.; Gary, S. Peter; Omidi, N.

1987-01-01

Two whistler instabilities are investigated by means of the linear Vlasov dispersion equation. They are called the electron/ion parallel and oblique whistler instabilities, and are driven by electron/ion relative drifts along the magnetic field. It is demonstrated that the enhanced fluctuations from these instabilities can explain several properties of magnetic noise bursts in and near the plasma sheet in the presence of ion beams and/or field-aligned currents. At sufficiently high plasma beta, these instabilities may affect the current system in the magnetotail.
The AIS-5000 parallel processor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schmitt, L.A.; Wilson, S.S.

1988-05-01

The AIS-5000 is a commercially available massively parallel processor which has been designed to operate in an industrial environment. It has fine-grained parallelism with up to 1024 processing elements arranged in a single-instruction multiple-data (SIMD) architecture. The processing elements are arranged in a one-dimensional chain that, for computer vision applications, can be as wide as the image itself. This architecture has cost/performance characteristics superior to those of two-dimensional mesh-connected systems. The design of the processing elements and their interconnections as well as the software used to program the system allow a wide variety of algorithms and applications to be implemented. In this paper, the overall architecture of the system is described. Various components of the system are discussed, including details of the processing elements, data I/O pathways and parallel memory organization. A virtual two-dimensional model for programming image-based algorithms for the system is presented. This model is supported by the AIS-5000 hardware and software and allows the system to be treated as a full-image-size, two-dimensional, mesh-connected parallel processor. Performance benchmarks are given for certain simple and complex functions.
Fast I/O for Massively Parallel Applications

NASA Technical Reports Server (NTRS)

OKeefe, Matthew T.

1996-01-01

The two primary goals for this report were the design, construction and modeling of parallel disk arrays for scientific visualization and animation, and a study of the I/O requirements of highly parallel applications. In addition, further work was carried out on the parallel display systems required to project and animate the very high-resolution frames resulting from our supercomputing simulations in ocean circulation and compressible gas dynamics.
Parallel Algorithms for Groebner-Basis Reduction

DTIC Science & Technology

1987-09-25

Technical report: Parallel Algorithms for Groebner-Basis Reduction (Productivity Engineering in the UNIX Environment).
Research in Parallel Algorithms and Software for Computational Aerosciences

NASA Technical Reports Server (NTRS)

Domel, Neal D.

1996-01-01

Phase 1 is complete for the development of a computational fluid dynamics (CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
A conservative scheme of drift kinetic electrons for gyrokinetic simulation of kinetic-MHD processes in toroidal plasmas

NASA Astrophysics Data System (ADS)

Bao, J.; Liu, D.; Lin, Z.

2017-10-01

A conservative scheme of drift kinetic electrons for gyrokinetic simulations of kinetic-magnetohydrodynamic processes in toroidal plasmas has been formulated and verified. Both vector potential and electron perturbed distribution function are decomposed into adiabatic part with analytic solution and non-adiabatic part solved numerically. The adiabatic parallel electric field is solved directly from the electron adiabatic response, resulting in a high degree of accuracy. The consistency between electrostatic potential and parallel vector potential is enforced by using the electron continuity equation. Since particles are only used to calculate the non-adiabatic response, which is used to calculate the non-adiabatic vector potential through Ohm's law, the conservative scheme minimizes the electron particle noise and mitigates the cancellation problem. Linear dispersion relations of the kinetic Alfvén wave and the collisionless tearing mode in cylindrical geometry have been verified in gyrokinetic toroidal code simulations, which show that the perpendicular grid size can be larger than the electron collisionless skin depth when the mode wavelength is longer than the electron skin depth.
Critical role of electron heat flux on Bohm criterion

DOE PAGES

Tang, Xianzhu; Guo, Zehua

2016-12-05

The Bohm criterion, originally derived for an isothermal-electron and cold-ion plasma, is often used as a rule of thumb for more general plasmas. Here, we establish a more precise determination of the Bohm criterion that is quantitatively useful for understanding and modeling collisional plasmas that still have a collisional mean free path much greater than the plasma Debye length. Specifically, it is shown that electron heat flux, rather than the isothermal electron assumption, is what sets the Bohm speed to be $$\sqrt{k_B(T_{e\parallel}+3T_{i\parallel})/m_i}$$ with $T_{e,i\parallel}$ the electron and ion parallel temperature at the sheath entrance and $m_i$ the ion mass.
Critical role of electron heat flux on Bohm criterion

NASA Astrophysics Data System (ADS)

Tang, Xian-Zhu; Guo, Zehua

2016-12-01

The Bohm criterion, originally derived for an isothermal-electron and cold-ion plasma, is often used as a rule of thumb for more general plasmas. Here, we establish a more precise determination of the Bohm criterion that is quantitatively useful for understanding and modeling collisional plasmas that still have a collisional mean free path much greater than the plasma Debye length. Specifically, it is shown that electron heat flux, rather than the isothermal electron assumption, is what sets the Bohm speed to be $$\sqrt{k_B(T_{e\parallel}+3T_{i\parallel})/m_i}$$ with $T_{e,i\parallel}$ the electron and ion parallel temperature at the sheath entrance and $m_i$ the ion mass.
Phase-space dynamics of runaway electrons in magnetic fields

DOE PAGES

Guo, Zehua; McDevitt, Christopher Joseph; Tang, Xian-Zhu

2017-02-16

Dynamics of runaway electrons in magnetic fields are governed by the competition of three dominant physical processes: parallel electric field acceleration, Coulomb collisions, and synchrotron radiation. Examination of the energy and pitch-angle flows reveals that the presence of local vortex structure and global circulation is crucial to the saturation of primary runaway electrons. Models for the vortex structure, which has an O-point to X-point connection, and the bump of runaway electron distribution in energy space have been developed and compared against the simulation data. Lastly, identification of these velocity-space structures opens a new avenue to re-examine the conventional understanding of runaway electron dynamics in magnetic fields.
Address tracing for parallel machines

NASA Technical Reports Server (NTRS)

Stunkel, Craig B.; Janssens, Bob; Fuchs, W. Kent

1991-01-01

Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.
Low inductance power electronics assembly

DOEpatents

Herron, Nicholas Hayden; Mann, Brooks S.; Korich, Mark D.; Chou, Cindy; Tang, David; Carlson, Douglas S.; Barry, Alan L.

2012-10-02

A power electronics assembly is provided. A first support member includes a first plurality of conductors. A first plurality of power switching devices are coupled to the first support member. A first capacitor is coupled to the first support member. A second support member includes a second plurality of conductors. A second plurality of power switching devices are coupled to the second support member. A second capacitor is coupled to the second support member. The first and second pluralities of conductors, the first and second pluralities of power switching devices, and the first and second capacitors are electrically connected such that the first plurality of power switching devices is connected in parallel with the first capacitor and the second capacitor and the second plurality of power switching devices is connected in parallel with the second capacitor and the first capacitor.
INVITED TOPICAL REVIEW: Parallel magnetic resonance imaging

NASA Astrophysics Data System (ADS)

Larkman, David J.; Nunes, Rita G.

2007-04-01

Parallel imaging has been the single biggest innovation in magnetic resonance imaging in the last decade. The use of multiple receiver coils to augment the time consuming Fourier encoding has reduced acquisition times significantly. This increase in speed comes at a time when other approaches to acquisition time reduction were reaching engineering and human limits. A brief summary of spatial encoding in MRI is followed by an introduction to the problem parallel imaging is designed to solve. There are a large number of parallel reconstruction algorithms; this article reviews a cross-section, SENSE, SMASH, g-SMASH and GRAPPA, selected to demonstrate the different approaches. Theoretical (the g-factor) and practical (coil design) limits to acquisition speed are reviewed. The practical implementation of parallel imaging is also discussed, in particular coil calibration. How to recognize potential failure modes and their associated artefacts is shown. Well-established applications including angiography, cardiac imaging and applications using echo planar imaging are reviewed and we discuss what makes a good application for parallel imaging. Finally, active research areas where parallel imaging is being used to improve data quality by repairing artefacted images are also reviewed.
Scalable Parallel Density-based Clustering and Applications

NASA Astrophysics Data System (ADS)

Patwary, Mostofa Ali

2014-04-01

Recently, density-based clustering algorithms (DBSCAN and OPTICS) have attracted significant attention from the scientific community due to their unique capability of discovering arbitrarily shaped clusters and eliminating noise data. These algorithms have several applications that require high performance computing, including finding halos and subhalos (clusters) in massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelization of these algorithms is extremely challenging, as they exhibit an inherently sequential data access order and unbalanced workloads, resulting in low parallel efficiency. To break the data access sequentiality and to achieve high parallelism, we develop new parallel algorithms, both for DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarities between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform the existing algorithms, achieving speedups up to 27.5 on 40 cores on shared memory architecture and speedups up to 5,765 using 8,192 cores on distributed memory architecture. In our experiments, we found that while achieving the scalability, our algorithms produce clustering results with comparable quality to the classical algorithms.
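The connection between DBSCAN and connected components mentioned in the abstract above can be illustrated with a small sequential sketch. This is not the author's parallel implementation: it assumes a brute-force neighbor search and a plain union-find structure, and the names `dbscan_union_find`, `eps` and `min_pts` are illustrative choices. Core points that lie within `eps` of each other are merged into one component, and border points are then attached to a neighboring core point's cluster.

```python
from math import dist


def find(parent, i):
    """Return the root of i, compressing the path on the way."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union(parent, a, b):
    """Merge the sets that contain a and b."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def dbscan_union_find(points, eps, min_pts):
    """Label points with cluster ids (noise = -1) using union-find."""
    n = len(points)
    # Brute-force eps-neighborhoods (each one includes the point itself).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_pts for nb in neighbors]
    parent = list(range(n))
    # Core points within eps of each other belong to one component.
    for i in range(n):
        if core[i]:
            for j in neighbors[i]:
                if core[j]:
                    union(parent, i, j)
    labels = [-1] * n
    ids = {}
    for i in range(n):
        if core[i]:
            labels[i] = ids.setdefault(find(parent, i), len(ids))
    # Border points join the cluster of any neighboring core point.
    for i in range(n):
        if not core[i]:
            for j in neighbors[i]:
                if core[j]:
                    labels[i] = labels[j]
                    break
    return labels
```

Because the union operations can be applied in any order, the merge step is a natural place to split work across threads or processes, which is the property the abstract alludes to; how the author actually partitions and merges is not shown here.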
Parallel adaptive wavelet collocation method for PDEs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nejadmalayeri, Alireza, E-mail: Alireza.Nejadmalayeri@gmail.com; Vezolainen, Alexei, E-mail: Alexei.Vezolainen@Colorado.edu; Brown-Dymkoski, Eric, E-mail: Eric.Browndymkoski@Colorado.edu

2015-10-01

A parallel adaptive wavelet collocation method for solving a large class of Partial Differential Equations is presented. The parallelization is achieved by developing an asynchronous parallel wavelet transform, which allows one to perform parallel wavelet transform and derivative calculations with only one data synchronization at the highest level of resolution. The data are stored using a tree-like structure with tree roots starting at an a priori defined level of resolution. Both static and dynamic domain partitioning approaches are developed. For the dynamic domain partitioning, trees are considered to be the minimum quanta of data to be migrated between the processes. This allows fully automated and efficient handling of non-simply connected partitioning of a computational domain. Dynamic load balancing is achieved via domain repartitioning during the grid adaptation step and reassigning trees to the appropriate processes to ensure approximately the same number of grid points on each process. The parallel efficiency of the approach is discussed based on parallel adaptive wavelet-based Coherent Vortex Simulations of homogeneous turbulence with linear forcing at effective non-adaptive resolutions up to $2048^3$ using as many as 2048 CPU cores.
Empirical valence bond models for reactive potential energy surfaces: a parallel multilevel genetic program approach.

PubMed

Bellucci, Michael A; Coker, David F

2011-07-28

We describe a new method for constructing empirical valence bond potential energy surfaces using a parallel multilevel genetic program (PMLGP). Genetic programs can be used to perform an efficient search through function space and parameter space to find the best functions and sets of parameters that fit energies obtained by ab initio electronic structure calculations. Building on the traditional genetic program approach, the PMLGP utilizes a hierarchy of genetic programming on two different levels. The lower level genetic programs are used to optimize coevolving populations in parallel while the higher level genetic program (HLGP) is used to optimize the genetic operator probabilities of the lower level genetic programs. The HLGP allows the algorithm to dynamically learn the mutation or combination of mutations that most effectively increase the fitness of the populations, causing a significant increase in the algorithm's accuracy and efficiency. The algorithm's accuracy and efficiency are tested against a standard parallel genetic program with a variety of one-dimensional test cases. Subsequently, the PMLGP is utilized to obtain an accurate empirical valence bond model for proton transfer in 3-hydroxy-gamma-pyrone in gas phase and protic solvent. © 2011 American Institute of Physics
Temperature Control with Two Parallel Small Loop Heat Pipes for GLM Program

NASA Technical Reports Server (NTRS)

Khrustalev, Dmitry; Stouffer, Chuck; Ku, Jentung; Hamilton, Jon; Anderson, Mark

2014-01-01

The concept of temperature control of an electronic component using a single Loop Heat Pipe (LHP) is well established for Aerospace applications. Using two LHPs is often desirable for redundancy/reliability reasons or for increasing the overall heat source-sink thermal conductance. This effort elaborates on temperature controlling operation of a thermal system that includes two small ammonia LHPs thermally coupled together at the evaporator end as well as at the condenser end and operating "in parallel". A transient model of the LHP system was developed on the Thermal Desktop (TradeMark) platform to understand some fundamental details of such parallel operation of the two LHPs. Extensive thermal-vacuum testing was conducted with two thermally coupled LHPs operating simultaneously as well as with only one LHP operating at a time. This paper outlines the temperature control procedures for two LHPs operating simultaneously with widely varying sink temperatures. The test data obtained during the thermal-vacuum testing, with both LHPs running simultaneously in comparison with only one LHP operating at a time, are presented with detailed explanations.
Parallel multigrid smoothing: polynomial versus Gauss-Seidel

NASA Astrophysics Data System (ADS)

Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray

2003-07-01

Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines.
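To make the contrast with Gauss-Seidel concrete, the following is a minimal sketch of a Chebyshev polynomial smoother, written as the standard three-term Chebyshev iteration for a symmetric positive definite system. It is not the authors' code: the 1D Poisson operator, the function names and the eigenvalue bounds `lam_min` and `lam_max` are assumptions made for illustration. The point to notice is that each sweep needs only a matrix-vector product and vector updates, with no ordering dependence between unknowns.

```python
from math import sqrt


def poisson_matvec(v):
    """Apply the 1D Poisson stencil [-1, 2, -1] with zero boundary values."""
    n = len(v)
    return [
        2.0 * v[i]
        - (v[i - 1] if i > 0 else 0.0)
        - (v[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


def chebyshev_smooth(matvec, b, x, lam_min, lam_max, sweeps):
    """Run `sweeps` Chebyshev iterations on A x = b for SPD A.

    lam_min and lam_max bound the part of the spectrum to be damped.
    """
    theta = 0.5 * (lam_max + lam_min)  # center of the target interval
    delta = 0.5 * (lam_max - lam_min)  # half-width of the target interval
    sigma = theta / delta
    rho = 1.0 / sigma
    r = [bi - ai for bi, ai in zip(b, matvec(x))]
    d = [ri / theta for ri in r]
    x = list(x)
    for _ in range(sweeps):
        x = [xi + di for xi, di in zip(x, d)]
        r = [ri - ai for ri, ai in zip(r, matvec(d))]
        rho_next = 1.0 / (2.0 * sigma - rho)
        d = [
            rho_next * rho * di + (2.0 * rho_next / delta) * ri
            for di, ri in zip(d, r)
        ]
        rho = rho_next
    return x


def residual_norm(matvec, b, x):
    """Euclidean norm of b - A x."""
    return sqrt(sum((bi - ai) ** 2 for bi, ai in zip(b, matvec(x))))
```

In this sketch the bounds cover the whole spectrum of the small test operator, so the iteration acts as a solver. When used as a multigrid smoother, `lam_min` is usually set to some fraction of `lam_max` so that only the high-frequency error is targeted; the appropriate fraction is problem dependent and is not specified by the abstract.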

Parallel Activation in Bilingual Phonological Processing

ERIC Educational Resources Information Center

Lee, Su-Yeon

2011-01-01

In bilingual language processing, the parallel activation hypothesis suggests that bilinguals activate their two languages simultaneously during language processing. Support for the parallel activation mainly comes from studies of lexical (word-form) processing, with relatively less attention to phonological (sound) processing. According to…
Introducing PROFESS 2.0: A parallelized, fully linear scaling program for orbital-free density functional theory calculations

NASA Astrophysics Data System (ADS)

Hung, Linda; Huang, Chen; Shin, Ilgyou; Ho, Gregory S.; Lignères, Vincent L.; Carter, Emily A.

2010-12-01

Orbital-free density functional theory (OFDFT) is a first principles quantum mechanics method to find the ground-state energy of a system by variationally minimizing with respect to the electron density. No orbitals are used in the evaluation of the kinetic energy (unlike Kohn-Sham DFT), and the method scales nearly linearly with the size of the system. The PRinceton Orbital-Free Electronic Structure Software (PROFESS) uses OFDFT to model materials from the atomic scale to the mesoscale. This new version of PROFESS allows the study of larger systems with two significant changes: PROFESS is now parallelized, and the ion-electron and ion-ion terms scale quasilinearly, instead of quadratically as in PROFESS v1 (L. Hung and E.A. Carter, Chem. Phys. Lett. 475 (2009) 163). At the start of a run, PROFESS reads the various input files that describe the geometry of the system (ion positions and cell dimensions), the type of elements (defined by electron-ion pseudopotentials), the actions you want it to perform (minimize with respect to electron density and/or ion positions and/or cell lattice vectors), and the various options for the computation (such as which functionals you want it to use). Based on these inputs, PROFESS sets up a computation and performs the appropriate optimizations. Energies, forces, stresses, material geometries, and electron density configurations are some of the values that can be output throughout the optimization. New version program summary. Program Title: PROFESS; Catalogue identifier: AEBN_v2_0; Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEBN_v2_0.html; Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland; Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html; No. of lines in distributed program, including test data, etc.: 68 721; No. of bytes in distributed program, including test data, etc.: 1 708 547; Distribution format: tar.gz; Programming language: Fortran 90; Computer
Mn-silicide nanostructures aligned on massively parallel silicon nano-ribbons

NASA Astrophysics Data System (ADS)

De Padova, Paola; Ottaviani, Carlo; Ronci, Fabio; Colonna, Stefano; Olivieri, Bruno; Quaresima, Claudio; Cricenti, Antonio; Dávila, Maria E.; Hennies, Franz; Pietzsch, Annette; Shariati, Nina; Le Lay, Guy

2013-01-01

The growth of Mn nanostructures on a 1D grating of silicon nano-ribbons is investigated at atomic scale by means of scanning tunneling microscopy, low energy electron diffraction and core level photoelectron spectroscopy. The grating of silicon nano-ribbons represents an atomic scale template that can be used in a surface-driven route to control the combination of Si with Mn in the development of novel materials for spintronics devices. The Mn atoms show a preferential adsorption site on silicon atoms, forming one-dimensional nanostructures. They are parallel oriented with respect to the surface Si array, which probably predetermines the diffusion pathways of the Mn atoms during the process of nanostructure formation.
Mn-silicide nanostructures aligned on massively parallel silicon nano-ribbons.

PubMed

De Padova, Paola; Ottaviani, Carlo; Ronci, Fabio; Colonna, Stefano; Olivieri, Bruno; Quaresima, Claudio; Cricenti, Antonio; Dávila, Maria E; Hennies, Franz; Pietzsch, Annette; Shariati, Nina; Le Lay, Guy

2013-01-09

The growth of Mn nanostructures on a 1D grating of silicon nano-ribbons is investigated at atomic scale by means of scanning tunneling microscopy, low energy electron diffraction and core level photoelectron spectroscopy. The grating of silicon nano-ribbons represents an atomic scale template that can be used in a surface-driven route to control the combination of Si with Mn in the development of novel materials for spintronics devices. The Mn atoms show a preferential adsorption site on silicon atoms, forming one-dimensional nanostructures. They are parallel oriented with respect to the surface Si array, which probably predetermines the diffusion pathways of the Mn atoms during the process of nanostructure formation.
PARAVT: Parallel Voronoi tessellation code

NASA Astrophysics Data System (ADS)

González, R. E.

2016-10-01

In this study, we present a new open source code for massively parallel computation of Voronoi tessellations (VT hereafter) in large data sets. The code is aimed at astrophysical applications, where VT densities and neighbors are widely used. There are several serial Voronoi tessellation codes; however, no open source parallel implementations are available to handle the large number of particles/galaxies in current N-body simulations and sky surveys. Parallelization is implemented with MPI, and the VT is computed using the Qhull library. Domain decomposition takes into account consistent boundary computation between tasks, and includes periodic conditions. In addition, the code computes the neighbor list, Voronoi density, Voronoi cell volume, and density gradient for each particle, as well as densities on a regular grid. Code implementation and user guide are publicly available at https://github.com/regonzar/paravt.
Theory and simulations of current drive via injection of an electron beam in the ACT-1 device

DOE Office of Scientific and Technical Information (OSTI.GOV)

Okuda, H.; Horton, R.; Ono, M.

1985-02-01

One- and two-dimensional particle simulations of beam-plasma interaction have been carried out in order to understand current drive experiments that use an electron beam injected into the ACT-1 device. Typically, the beam velocity along the magnetic field is $V = 10^{9}$ cm/sec while the thermal velocity of the background electrons is $v_t = 10^{8}$ cm/sec. The ratio of the beam density to the background density is about 10% so that a strong beam-plasma instability develops causing rapid diffusion of beam particles. For both one- and two-dimensional simulations, it is found that a significant amount of beam and background electrons is accelerated considerably beyond the initial beam velocity when the beam density is more than a few percent of the background plasma density. In addition, the electron distribution along the magnetic field has a smooth negative slope, $f'(v_\parallel) < 0$, for $v_\parallel > 0$ extending to $v_\parallel = 1.5V \sim 2V$, which is in sharp contrast to the predictions from quasilinear theory. An estimate of the mean-free path for beam electrons due to Coulomb collisions reveals that the beam electrons can propagate a much longer distance than is predicted from a quasilinear theory, due to the presence of a high energy tail. These simulation results agree well with the experimental observations from the ACT-1 device.
Parallelization and automatic data distribution for nuclear reactor simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liebrock, L.M.

1997-07-01

Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine cannot run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.
Synchronization Of Parallel Discrete Event Simulations

NASA Technical Reports Server (NTRS)

Steinman, Jeffrey S.

1992-01-01

Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Algorithm processes events optimistically in time cycles adapting while simulation in progress. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
Distributed parallel messaging for multiprocessor systems

DOEpatents

Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka

2013-06-04

A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit includes a switch interface for reading from the memory system when injecting packets into the network.
Knowledge representation into Ada parallel processing

NASA Technical Reports Server (NTRS)

Masotto, Tom; Babikyan, Carol; Harper, Richard

1990-01-01

The Knowledge Representation into Ada Parallel Processing project is a joint NASA and Air Force funded project to demonstrate the execution of intelligent systems in Ada on the Charles Stark Draper Laboratory fault-tolerant parallel processor (FTPP). Two applications were demonstrated - a portion of the adaptive tactical navigator and a real time controller. Both systems are implemented as Activation Framework Objects on the Activation Framework intelligent scheduling mechanism developed by Worcester Polytechnic Institute. The implementations, results of performance analyses showing speedup due to parallelism and initial efficiency improvements are detailed and further areas for performance improvements are suggested.
Implementations of BLAST for parallel computers.

PubMed

Jülich, A

1995-02-01

The BLAST sequence comparison programs have been ported to a variety of parallel computers: the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799 residue protein query sequence and the protein database PIR were used.
Parallel Algorithms for Image Analysis.

DTIC Science & Technology

1982-06-01

Title: Parallel Algorithms for Image Analysis. Type of report: Technical. Report number: TR-1180. Author: Azriel Rosenfeld. Contract or grant number: AFOSR-77-3271. Key words: image processing; image analysis; parallel processing; cellular computers.
Speeding up parallel processing

NASA Technical Reports Server (NTRS)

Denning, Peter J.

1988-01-01

In 1967 Amdahl expressed doubts about the ultimate utility of multiprocessors. The formulation, now called Amdahl's law, became part of the computing folklore and has inspired much skepticism about the ability of the current generation of massively parallel processors to efficiently deliver all their computing power to programs. The widely publicized recent results of a group at Sandia National Laboratories, which showed speedup on a 1024 node hypercube of over 500 for three fixed size problems and over 1000 for three scalable problems, have convincingly challenged this bit of folklore and have given new impetus to parallel scientific computing.
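The figures quoted in this abstract can be put in context with a short numerical sketch. The functions below are illustrative helpers, not taken from the report: `amdahl_speedup` is the usual fixed-size form of Amdahl's law, `serial_fraction_for` inverts it, and `gustafson_speedup` is the scaled-speedup form commonly called Gustafson's law, which the abstract does not name but which is the usual way to reason about "scalable" problems.

```python
def amdahl_speedup(serial_fraction, processors):
    """Fixed-size speedup predicted by Amdahl's law."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def serial_fraction_for(speedup, processors):
    """Invert Amdahl's law: serial fraction implied by an observed speedup."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


def gustafson_speedup(serial_fraction, processors):
    """Scaled speedup; serial_fraction is measured on the parallel run."""
    return processors - serial_fraction * (processors - 1)


if __name__ == "__main__":
    # Serial fraction consistent with a speedup of 500 on 1024 nodes.
    s = serial_fraction_for(500.0, 1024)
    print(f"implied serial fraction: {s:.5f}")
    print(f"check: {amdahl_speedup(s, 1024):.1f}")
```

Under Amdahl's law, a fixed-size speedup of 500 on 1024 nodes corresponds to a serial fraction of roughly 0.1%, which shows how small the non-parallel part of those applications had to be.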
MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems

NASA Technical Reports Server (NTRS)

Taft, James R.

1999-01-01

Recent developments at the NASA Ames Research Center's NAS Division have demonstrated that the new generation of NUMA-based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector-oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine and coarse grained level, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.
National Combustion Code: Parallel Implementation and Performance

NASA Technical Reports Server (NTRS)

Quealy, A.; Ryder, R.; Norris, A.; Liu, N.-S.

2000-01-01

The National Combustion Code (NCC) is being developed by an industry-government team for the design and analysis of combustion systems. CORSAIR-CCD is the current baseline reacting flow solver for NCC. This is a parallel, unstructured grid code which uses a distributed memory, message passing model for its parallel implementation. The focus of the present effort has been to improve the performance of the NCC flow solver to meet combustor designer requirements for model accuracy and analysis turnaround time. Improving the performance of this code contributes significantly to the overall reduction in time and cost of the combustor design cycle. This paper describes the parallel implementation of the NCC flow solver and summarizes its current parallel performance on an SGI Origin 2000. Earlier parallel performance results on an IBM SP-2 are also included. The performance improvements which have enabled a turnaround of less than 15 hours for a 1.3 million element fully reacting combustion simulation are described.
Automatic Management of Parallel and Distributed System Resources

NASA Technical Reports Server (NTRS)

Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.

1990-01-01

Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.
Weibel instability for a streaming electron, counterstreaming e-e, and e-p plasmas with intrinsic temperature anisotropy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ghorbanalilu, M.; Physics Department, Azarbaijan Shahid Madani University, Tabriz; Sadegzadeh, S.

2014-05-15

The existence of Weibel instability for a streaming electron, counterstreaming electron-electron (e-e), and electron-positron (e-p) plasmas with intrinsic temperature anisotropy is investigated. The temperature anisotropy is included in the directions perpendicular and parallel to the streaming direction. It is shown that the beam mean speed changes the instability mode, for a streaming electron beam, from the classic Weibel to the Weibel-like mode. The analytical and numerical solutions confirmed that Weibel-like modes are excited for both counterstreaming e-e and e-p plasmas. The growth rates of the instabilities in e-e and e-p plasmas are compared. The growth rate is larger for e-p plasmas if the thermal anisotropy is small and the opposite is true for large thermal anisotropies. The analytical and numerical solutions are in good agreement only in the small parallel temperature and wave number limits, when the instability growth rate increases linearly with normalized wave number $kc/\omega_p$.
A Domain Decomposition Parallelization of the Fast Marching Method

NASA Technical Reports Server (NTRS)

Herrmann, M.

2003-01-01

In this paper, the first domain decomposition parallelization of the Fast Marching Method for level sets has been presented. Parallel speedup has been demonstrated in both the optimal and non-optimal domain decomposition case. The parallel performance of the proposed method is strongly dependent on separately load balancing the number of nodes on each side of the interface. A load imbalance of nodes on either side of the domain leads to an increase in communication and rollback operations. Furthermore, the amount of inter-domain communication can be reduced by aligning the inter-domain boundaries with the interface normal vectors. In the case of optimal load balancing and aligned inter-domain boundaries, the proposed parallel FMM algorithm is highly efficient, reaching efficiency factors of up to 0.98. Future work will focus on the extension of the proposed parallel algorithm to higher order accuracy. Also, to further enhance parallel performance, the coupling of the domain decomposition parallelization to the G_0-based parallelization will be investigated.
Porous electronic current collector bodies for electrochemical cell configurations

DOEpatents

Pollack, William; Reichner, Philip

1989-01-01

A high-temperature, solid electrolyte electrochemical cell configuration is made comprising a plurality of elongated electrochemical cells 1, having inner electrodes 3, outer electrodes 6 and solid electrolyte 4 therebetween, the cells being electronically connected in series and parallel by flexible, porous, fibrous strips 7, where the strips contain flexible, electronically conductive fibers bonded together and coated with a refractory oxide, and where the oxide coating is effective to prevent additional bonding of fibers during electrochemical cell operation at high temperatures.
17 CFR 12.24 - Parallel proceedings.

Code of Federal Regulations, 2010 CFR

2010-04-01

...) Definition. For purposes of this section, a parallel proceeding shall include: (1) An arbitration proceeding... the receivership includes the resolution of claims made by customers; or (3) A petition filed under... any of the foregoing with knowledge of a parallel proceeding shall promptly notify the Commission, by...

Electron Bulk Acceleration and Thermalization at Earth's Quasiperpendicular Bow Shock.

PubMed

Chen, L-J; Wang, S; Wilson, L B; Schwartz, S; Bessho, N; Moore, T; Gershman, D; Giles, B; Malaspina, D; Wilder, F D; Ergun, R E; Hesse, M; Lai, H; Russell, C; Strangeway, R; Torbert, R B; F-Vinas, A; Burch, J; Lee, S; Pollock, C; Dorelli, J; Paterson, W; Ahmadi, N; Goodrich, K; Lavraud, B; Le Contel, O; Khotyaintsev, Yu V; Lindqvist, P-A; Boardsen, S; Wei, H; Le, A; Avanov, L

2018-06-01

Electron heating at Earth's quasiperpendicular bow shock has been surmised to be due to the combined effects of a quasistatic electric potential and scattering through wave-particle interaction. Here we report the observation of electron distribution functions indicating a new electron heating process occurring at the leading edge of the shock front. Incident solar wind electrons are accelerated parallel to the magnetic field toward downstream, reaching an electron-ion relative drift speed exceeding the electron thermal speed. The bulk acceleration is associated with an electric field pulse embedded in a whistler-mode wave. The high electron-ion relative drift is relaxed primarily through a nonlinear current-driven instability. The relaxed distributions contain a beam traveling toward the shock as a remnant of the accelerated electrons. Similar distribution functions prevail throughout the shock transition layer, suggesting that the observed acceleration and thermalization is essential to the cross-shock electron heating.
Electron Bulk Acceleration and Thermalization at Earth's Quasiperpendicular Bow Shock

NASA Astrophysics Data System (ADS)

Chen, L.-J.; Wang, S.; Wilson, L. B.; Schwartz, S.; Bessho, N.; Moore, T.; Gershman, D.; Giles, B.; Malaspina, D.; Wilder, F. D.; Ergun, R. E.; Hesse, M.; Lai, H.; Russell, C.; Strangeway, R.; Torbert, R. B.; F.-Vinas, A.; Burch, J.; Lee, S.; Pollock, C.; Dorelli, J.; Paterson, W.; Ahmadi, N.; Goodrich, K.; Lavraud, B.; Le Contel, O.; Khotyaintsev, Yu. V.; Lindqvist, P.-A.; Boardsen, S.; Wei, H.; Le, A.; Avanov, L.

2018-06-01

Electron heating at Earth's quasiperpendicular bow shock has been surmised to be due to the combined effects of a quasistatic electric potential and scattering through wave-particle interaction. Here we report the observation of electron distribution functions indicating a new electron heating process occurring at the leading edge of the shock front. Incident solar wind electrons are accelerated parallel to the magnetic field toward downstream, reaching an electron-ion relative drift speed exceeding the electron thermal speed. The bulk acceleration is associated with an electric field pulse embedded in a whistler-mode wave. The high electron-ion relative drift is relaxed primarily through a nonlinear current-driven instability. The relaxed distributions contain a beam traveling toward the shock as a remnant of the accelerated electrons. Similar distribution functions prevail throughout the shock transition layer, suggesting that the observed acceleration and thermalization is essential to the cross-shock electron heating.
Parallel Implementation of the Discontinuous Galerkin Method

NASA Technical Reports Server (NTRS)

Baggag, Abdalkader; Atkins, Harold; Keyes, David

1999-01-01

This paper describes a parallel implementation of the discontinuous Galerkin method. Discontinuous Galerkin is a spatially compact method that retains its accuracy and robustness on non-smooth unstructured grids and is well suited for time dependent simulations. Several parallelization approaches are studied and evaluated. The most natural and symmetric of the approaches has been implemented in an object-oriented code used to simulate aeroacoustic scattering. The parallel implementation is MPI-based and has been tested on various parallel platforms such as the SGI Origin, IBM SP2, and clusters of SGI and Sun workstations. The scalability results presented for the SGI Origin show slightly superlinear speedup on a fixed-size problem due to cache effects.
Parallel Reconstruction Using Null Operations (PRUNO)

PubMed Central

Zhang, Jian; Liu, Chunlei; Moseley, Michael E.

2011-01-01

A novel iterative k-space data-driven technique, namely Parallel Reconstruction Using Null Operations (PRUNO), is presented for parallel imaging reconstruction. In PRUNO, both data calibration and image reconstruction are formulated into linear algebra problems based on a generalized system model. An optimal data calibration strategy is demonstrated by using Singular Value Decomposition (SVD), and an iterative conjugate-gradient approach is proposed to efficiently solve for missing k-space samples during reconstruction. With its generalized formulation and precise mathematical model, PRUNO reconstruction yields good accuracy, flexibility, and stability. Both computer simulation and in vivo studies have shown that PRUNO produces much better reconstruction quality than autocalibrating partially parallel acquisition (GRAPPA), especially under high accelerating rates. With the aid of PRUNO reconstruction, ultra high accelerating parallel imaging can be performed with decent image quality. For example, we have done successful PRUNO reconstruction at a reduction factor of 6 (effective factor of 4.44) with 8 coils and only a few autocalibration signal (ACS) lines. PMID:21604290
Data parallel sorting for particle simulation

NASA Technical Reports Server (NTRS)

Dagum, Leonardo

1992-01-01

Sorting on a parallel architecture is a communications intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O(N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimum performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.
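The O(N) sequential bound mentioned in the abstract above can be illustrated with a counting sort on integer cell indices. This is only a minimal sequential sketch, not the paper's data parallel merging algorithm; the function name, the `cell_of` list, and the `num_cells` parameter are assumptions introduced for illustration.

```python
def sort_particles_by_cell(cell_of, num_cells):
    """Order particle indices by integer cell index with a counting sort.

    cell_of[p] is the (hypothetical) cell index of particle p.
    Runs in O(N + num_cells) time for N particles.
    """
    # Count how many particles fall in each cell.
    counts = [0] * num_cells
    for c in cell_of:
        counts[c] += 1

    # Exclusive prefix sum: first output slot belonging to each cell.
    starts = [0] * num_cells
    total = 0
    for c in range(num_cells):
        starts[c] = total
        total += counts[c]

    # Scatter particle indices into their cell's slots (stable order).
    order = [0] * len(cell_of)
    next_slot = starts[:]
    for p, c in enumerate(cell_of):
        order[next_slot[c]] = p
        next_slot[c] += 1
    return order, starts


order, starts = sort_particles_by_cell([2, 0, 1, 0, 2, 1], num_cells=3)
print(order)   # particle indices grouped by cell
print(starts)  # offset of each cell's group within `order`
```

Because the keys are small integers, no comparisons are needed, which is why the sequential case reaches the linear bound so easily.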
Parallel/distributed direct method for solving linear systems

NASA Technical Reports Server (NTRS)

Lin, Avi

1990-01-01

A new family of parallel schemes for directly solving linear systems is presented and analyzed. It is shown that these schemes exhibit a near optimal performance and enjoy several important features: (1) For large enough linear systems, the design of the appropriate parallel algorithm is insensitive to the number of processors as its performance grows monotonically with them; (2) It is especially good for large matrices, with dimensions large relative to the number of processors in the system; (3) It can be used in both distributed parallel computing environments and tightly coupled parallel computing systems; and (4) This set of algorithms can be mapped onto any parallel architecture without any major programming difficulties or algorithmic changes.
Hot Electrons from Two-Plasmon Decay

NASA Astrophysics Data System (ADS)

Russell, D. A.; Dubois, D. F.

2000-10-01

We solve, self-consistently, the relativistic quasilinear diffusion equation and Zakharov's model equations of Langmuir wave (LW) and ion acoustic wave (IAW) turbulence, in two dimensions, for saturated states of the Two-Plasmon Decay instability. Parameters are those of the shorter gradient scale-length (50 microns) high temperature (4 keV) inhomogeneous plasmas anticipated at LLE’s Omega laser facility. We calculate the fraction of incident laser power absorbed in hot electron production as a function of laser intensity for a plane-wave laser field propagating parallel to the background density gradient. Two distinct regimes are identified: In the strong-turbulent regime, hot electron bursts occur intermittently in time, well correlated with collapse in the LW and IAW fields. A significant fraction of the incident laser power (~10%) is absorbed by hot electrons during a single burst. In the weak or convective regime, relatively constant rates of hot electron production are observed at much reduced intensities.
S-HARP: A parallel dynamic spectral partitioner

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sohn, A.; Simon, H.

1998-01-01

Computational science problems with adaptive meshes involve dynamic load balancing when implemented on parallel machines. This dynamic load balancing requires fast partitioning of computational meshes at run time. The authors present in this report a fast parallel dynamic partitioner, called S-HARP. The underlying principles of S-HARP are the fast feature of inertial partitioning and the quality feature of spectral partitioning. S-HARP partitions a graph from scratch, requiring no partition information from previous iterations. Two types of parallelism have been exploited in S-HARP, fine grain loop level parallelism and coarse grain recursive parallelism. The parallel partitioner has been implemented in Message Passing Interface on Cray T3E and IBM SP2 for portability. Experimental results indicate that S-HARP can partition a mesh of over 100,000 vertices into 256 partitions in 0.2 seconds on a 64 processor Cray T3E. S-HARP is much more scalable than other dynamic partitioners, giving over 15 fold speedup on 64 processors while ParaMeTiS 1.0 gives a few fold speedup. Experimental results demonstrate that S-HARP is three to 10 times faster than the dynamic partitioners ParaMeTiS and Jostle on six computational meshes of size over 100,000 vertices.
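The abstract above names inertial partitioning as the source of S-HARP's speed. As a rough illustration of that single ingredient only, and not of S-HARP itself, the sketch below bisects a set of 2-D points along their principal axis of inertia. The function name and the use of plain 2-D vertex coordinates are assumptions made for this example.

```python
import math


def inertial_bisect(points):
    """Split 2-D points into two halves along their principal axis.

    points is a list of (x, y) tuples; returns two lists of point indices.
    """
    n = len(points)
    # Centroid of the point set.
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n

    # Second moments about the centroid.
    sxx = sum((p[0] - cx) ** 2 for p in points)
    syy = sum((p[1] - cy) ** 2 for p in points)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points)

    # Angle of the direction of largest spread (principal axis).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ax, ay = math.cos(theta), math.sin(theta)

    # Project onto the axis and cut at the median projection.
    order = sorted(
        range(n),
        key=lambda i: (points[i][0] - cx) * ax + (points[i][1] - cy) * ay,
    )
    half = n // 2
    return order[:half], order[half:]


pts = [(0, 0), (1, 0.1), (2, 0), (3, 0.1), (4, 0), (5, 0.1)]
left, right = inertial_bisect(pts)
print(sorted(left), sorted(right))
```

Applying the same split recursively to each half yields a power-of-two number of partitions, which is the kind of coarse grain recursive structure the abstract mentions.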
Electronic compensation technique to deliver a total body dose

NASA Astrophysics Data System (ADS)

Lakeman, Tara E.

Purpose: Total body irradiation (TBI) uses large parallel-opposed radiation fields to suppress the patient's immune system and eradicate the residual cancer cells in preparation for bone marrow transplant. The manual placement of lead compensators has been conventionally used to compensate for the varying thickness throughout the body in large-field TBI. The goal of this study is to pursue utilizing the modern electronic compensation technique to more accurately and efficiently deliver dose to patients in need of TBI. Method: Treatment plans utilizing the electronic compensation to deliver a total body dose were created retrospectively for patients for whom CT data had been previously acquired. Each treatment plan includes two pairs of parallel-opposed fields. One pair of large fields is used to encompass the majority of the patient's anatomy. The other pair consists of very small open fields focused only on the thin bottom portion of the patient's anatomy, which requires much less radiation than the rest of the body to reach 100% of the prescribed dose. A desirable fluence pattern was manually painted within each of the larger fields for each patient to provide a more uniform distribution. Results: Dose-volume histograms (DVH) were calculated for evaluating the electronic compensation technique. In the electronically compensated plans, the maximum body doses calculated from the DVH were reduced from the conventionally-compensated plans by an average of 15%, indicating a more uniform dose. The mean body doses calculated from the electronically compensated DVH remained comparable to that of the conventionally-compensated plans, indicating an accurate delivery of the prescription dose using electronic compensation. All calculated monitor units were within clinically acceptable limits. Conclusion: Electronic compensation technique for TBI will not increase the beam on time beyond clinically acceptable limits while it can substantially reduce the compensator setup
Measuring the orbital angular momentum spectrum of an electron beam

PubMed Central

Grillo, Vincenzo; Tavabi, Amir H.; Venturi, Federico; Larocque, Hugo; Balboni, Roberto; Gazzadi, Gian Carlo; Frabboni, Stefano; Lu, Peng-Han; Mafakheri, Erfan; Bouchard, Frédéric; Dunin-Borkowski, Rafal E.; Boyd, Robert W.; Lavery, Martin P. J.; Padgett, Miles J.; Karimi, Ebrahim

2017-01-01

Electron waves that carry orbital angular momentum (OAM) are characterized by a quantized and unbounded magnetic dipole moment parallel to their propagation direction. When interacting with magnetic materials, the wavefunctions of such electrons are inherently modified. Such variations therefore motivate the need to analyse electron wavefunctions, especially their wavefronts, to obtain information regarding the material's structure. Here, we propose, design and demonstrate the performance of a device based on nanoscale holograms for measuring an electron's OAM components by spatially separating them. We sort pure and superposed OAM states of electrons with OAM values of between −10 and 10. We employ the device to analyse the OAM spectrum of electrons that have been affected by a micron-scale magnetic dipole, thus establishing that our sorter can be an instrument for nanoscale magnetic spectroscopy. PMID:28537248
Cross-sectional aspect ratio modulated electronic properties in Si/Ge core/shell nanowires

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liu, Nuo; Lu, Ning; Yao, Yong-Xin

2013-02-28

Electronic structures of (4, n) and (m, 4) (the NW has m layers parallel to the {1 1 1} facet and n layers parallel to {1 1 0}) Si/Ge core/shell nanowires (NWs) along the [1 1 2] direction with cross-sectional aspect ratio (m/n) from 0.36 to 2.25 are studied by first-principles calculations. An indirect to direct band gap transition is observed as m/n decreases, and the critical values of m/n and diameter for the transition are also estimated. The size of the band gap also depends on the aspect ratio. These results suggest that m/n plays an important role in modulating the electronic properties of the NWs.
Electron Heating at Kinetic Scales in Magnetosheath Turbulence

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chasapis, Alexandros; Matthaeus, W. H.; Parashar, T. N.

2017-02-20

We present a statistical study of coherent structures at kinetic scales, using data from the Magnetospheric Multiscale mission in the Earth’s magnetosheath. We implemented the multi-spacecraft partial variance of increments (PVI) technique to detect these structures, which are associated with intermittency at kinetic scales. We examine the properties of the electron heating occurring within such structures. We find that, statistically, structures with a high PVI index are regions of significant electron heating. We also focus on one such structure, a current sheet, which shows some signatures consistent with magnetic reconnection. Strong parallel electron heating coincides with whistler emissions at the edges of the current sheet.
Computational efficiency of parallel combinatorial OR-tree searches

NASA Technical Reports Server (NTRS)

Li, Guo-Jie; Wah, Benjamin W.

1990-01-01

The performance of parallel combinatorial OR-tree searches is analytically evaluated. This performance depends on the complexity of the problem to be solved, the error allowance function, the dominance relation, and the search strategies. The exact performance may be difficult to predict due to the nondeterminism and anomalies of parallelism. The authors derive the performance bounds of parallel OR-tree searches with respect to the best-first, depth-first, and breadth-first strategies, and verify these bounds by simulation. They show that a near-linear speedup can be achieved with respect to a large number of processors for parallel OR-tree searches. Using the bounds developed, the authors derive sufficient conditions for assuring that parallelism will not degrade performance and necessary conditions for allowing parallelism to have a speedup greater than the ratio of the numbers of processors. These bounds and conditions provide the theoretical foundation for determining the number of processors required to assure a near-linear speedup.
High-resolution, high-throughput imaging with a multibeam scanning electron microscope

PubMed Central

EBERLE, AL; MIKULA, S; SCHALEK, R; LICHTMAN, J; TATE, ML KNOTHE; ZEIDLER, D

2015-01-01

Electron–electron interactions and detector bandwidth limit the maximal imaging speed of single-beam scanning electron microscopes. We use multiple electron beams in a single column and detect secondary electrons in parallel to increase the imaging speed by close to two orders of magnitude and demonstrate imaging for a variety of samples ranging from biological brain tissue to semiconductor wafers. Lay description: The composition of our world and our bodies on the very small scale has always fascinated people, making them search for ways to make this visible to the human eye. Where light microscopes reach their resolution limit at a certain magnification, electron microscopes can go beyond. But their capability of visualizing extremely small features comes at the cost of a very small field of view. Some of the questions researchers seek to answer today deal with the ultrafine structure of brains, bones or computer chips. Capturing these objects with electron microscopes takes a lot of time – maybe even exceeding the time span of a human being – or new tools that do the job much faster. A new type of scanning electron microscope scans with 61 electron beams in parallel, acquiring 61 adjacent images of the sample at the same time a conventional scanning electron microscope captures one of these images. In principle, the multibeam scanning electron microscope’s field of view is 61 times larger and therefore coverage of the sample surface can be accomplished in less time. This enables researchers to think about large-scale projects, for example in the rather new field of connectomics. A very good introduction to imaging a brain at nanometre resolution can be found within course material from Harvard University on http://www.mcb80x.org/# as featured media entitled ‘connectomics’. PMID:25627873
The Parallel Axiom

ERIC Educational Resources Information Center

Rogers, Pat

1972-01-01

Criteria for a reasonable axiomatic system are discussed. A discussion of the historical attempts to prove the independence of Euclid's parallel postulate introduces non-Euclidean geometries. Poincare's model for a non-Euclidean geometry is defined and analyzed. (LS)
Hybrid MPI-OpenMP Parallelism in the ONETEP Linear-Scaling Electronic Structure Code: Application to the Delamination of Cellulose Nanofibrils.

PubMed

Wilkinson, Karl A; Hine, Nicholas D M; Skylaris, Chris-Kriton

2014-11-11

We present a hybrid MPI-OpenMP implementation of Linear-Scaling Density Functional Theory within the ONETEP code. We illustrate its performance on a range of high performance computing (HPC) platforms comprising shared-memory nodes with fast interconnect. Our work has focused on applying OpenMP parallelism to the routines which dominate the computational load, attempting where possible to parallelize different loops from those already parallelized within MPI. This includes 3D FFT box operations, sparse matrix algebra operations, calculation of integrals, and Ewald summation. While the underlying numerical methods are unchanged, these developments represent significant changes to the algorithms used within ONETEP to distribute the workload across CPU cores. The new hybrid code exhibits much-improved strong scaling relative to the MPI-only code and permits calculations with a much higher ratio of cores to atoms. These developments result in a significantly shorter time to solution than was possible using MPI alone and facilitate the application of the ONETEP code to systems larger than previously feasible. We illustrate this with benchmark calculations from an amyloid fibril trimer containing 41,907 atoms. We use the code to study the mechanism of delamination of cellulose nanofibrils when undergoing sonification, a process which is controlled by a large number of interactions that collectively determine the structural properties of the fibrils. Many energy evaluations were needed for these simulations, and as these systems comprise up to 21,276 atoms this would not have been feasible without the developments described here.
Parallel Transport with Sheath and Collisional Effects in Global Electrostatic Turbulent Transport in FRCs

NASA Astrophysics Data System (ADS)

Bao, Jian; Lau, Calvin; Kuley, Animesh; Lin, Zhihong; Fulton, Daniel; Tajima, Toshiki; Tri Alpha Energy, Inc. Team

2017-10-01

Collisional and turbulent transport in a field reversed configuration (FRC) is studied in global particle simulation by using GTC (gyrokinetic toroidal code). The global FRC geometry is incorporated in GTC by using a field-aligned mesh in cylindrical coordinates, which enables global simulation coupling core and scrape-off layer (SOL) across the separatrix. Furthermore, fully kinetic ions are implemented in GTC to treat the magnetic-null point in the FRC core. Both global simulation coupling core and SOL regions and independent SOL region simulation have been carried out to study turbulence. In this work, the "logical sheath boundary condition" is implemented to study parallel transport in the SOL. This method helps to relax time and spatial steps without resolving electron plasma frequency and Debye length, which enables turbulent transport simulation with sheath effects. We will study collisional and turbulent SOL parallel transport with mirror geometry and sheath boundary condition in C2-W divertor.
Recurrence spectra of a helium atom in parallel electric and magnetic fields

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Dehua; Department of Mathematics and Physics, Shandong Architecture and Engineering Institute, Jinan 250014, People's Republic of China; Ding, Shiliang

2003-08-01

A model potential for the general Rydberg atom is put forward, which includes not only the Coulomb interaction potential and the core-attractive potential, but also the exchange potential between the excited electron and other electrons. Using the region-splitting consistent and iterative method, we calculated the scaled recurrence spectra of the helium atom in parallel electric and magnetic fields and the closed orbits in the corresponding classical system have also been obtained. In order to remove the Coulomb singularity of the classical motion of Hamiltonian, we implement the Kustaanheimo-Stiefel transformation, which transforms the system from a three-dimensional to a four-dimensional one. The Fourier-transformed spectra of the helium atom have allowed direct comparison between peaks in such a plot and the scaled action values of closed orbits. Considering the exchange potential, the number of the closed orbits increased, which led to more peaks in the recurrence spectra. The results are compared with those of the hydrogen case, which shows that the core-scattered effects and the electron exchange potential play an important role in the multielectron Rydberg atom.
Set-up and demonstration of a Low Energy Electron Magnetometer (LEEM)

NASA Technical Reports Server (NTRS)

Rayborn, G. H.

1986-01-01

Described are the design, construction and test results of a Low Energy Electron Magnetometer (LEEM). The electron source is a commercial electron gun capable of providing several microamperes of electron beam. These electrons, after acceleration through a selected potential difference of 100-300 volts, are sent through two 30 degree second-order focussing parallel plate electrostatic analyzers. The first analyzer acts as a monochromator located in the field-free space. It is capable of providing energy resolution of better than 10 to the -3 power. The second analyzer, located in the test field region, acts as the detector for electrons deflected by the test field. The entire magnetometer system is expected to have a resolution of 1 part in 1000 or better.
Parallel processing in finite element structural analysis

NASA Technical Reports Server (NTRS)

Noor, Ahmed K.

1987-01-01

A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).

Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications

NASA Technical Reports Server (NTRS)

Sun, Xian-He

1997-01-01

Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as Intel Paragon, IBM SP2, and Cray Origin2000, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is to 1) develop highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, and 3) incorporate newly developed algorithms into actual simulation packages. The work plan has been well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) Adopting a mathematical geometry which has a better capacity to describe the fluid, (2) Using a compact scheme to gain high order accuracy in numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm
Parallel-Processing Test Bed For Simulation Software

NASA Technical Reports Server (NTRS)

Blech, Richard; Cole, Gary; Townsend, Scott

1996-01-01

Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).
[Three-dimensional parallel collagen scaffold promotes tendon extracellular matrix formation].

PubMed

Zheng, Zefeng; Shen, Weiliang; Le, Huihui; Dai, Xuesong; Ouyang, Hongwei; Chen, Weishan

2016-03-01

To investigate the effects of three-dimensional parallel collagen scaffold on the cell shape, arrangement and extracellular matrix formation of tendon stem cells. Parallel collagen scaffold was fabricated by unidirectional freezing technique, while random collagen scaffold was fabricated by freeze-drying technique. The effects of two scaffolds on cell shape and extracellular matrix formation were investigated in vitro by seeding tendon stem/progenitor cells and in vivo by ectopic implantation. Parallel and random collagen scaffolds were produced successfully. Parallel collagen scaffold was more akin to tendon than random collagen scaffold. Tendon stem/progenitor cells were spindle-shaped and uniformly oriented in parallel collagen scaffold, while cells on random collagen scaffold had disordered orientation. Two weeks after ectopic implantation, cells had nearly the same orientation as the collagen substance. In parallel collagen scaffold, cells had parallel arrangement, and more spindle-shaped cells were observed. By contrast, cells in random collagen scaffold were disordered. Parallel collagen scaffold can induce cells to adopt a spindle-shaped, parallel arrangement and promote parallel extracellular matrix formation, while random collagen scaffold induces a random cell arrangement. The results indicate that parallel collagen scaffold is an ideal structure to promote tendon repair.
Graphics applications utilizing parallel processing

NASA Technical Reports Server (NTRS)

Rice, John R.

1990-01-01

The results are presented of research conducted to develop a parallel graphic application algorithm to depict the numerical solution of the 1-D wave equation, the vibrating string. The research was conducted on a Flexible Flex/32 multiprocessor and a Sequent Balance 21000 multiprocessor. The wave equation is implemented using the finite difference method. The synchronization issues that arose from the parallel implementation and the strategies used to alleviate the effects of the synchronization overhead are discussed.
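As a rough illustration of the finite difference method mentioned in the abstract above, the following serial Python sketch advances the 1-D wave equation u_tt = c^2 u_xx for a string fixed at both ends, using the standard explicit central-difference scheme. The function name, grid size, Courant number, and initial shape are illustrative assumptions and are not taken from the report. The loop over interior points is the part a parallel version would divide among processors, with a synchronization at every time step.

```python
import math


def simulate_string(n_points=51, n_steps=100, courant=0.5):
    """Explicit finite-difference solution of u_tt = c^2 u_xx with fixed ends.

    courant = c*dt/dx (assumed value); it must be <= 1 for stability.
    Returns the displacement at every grid point after n_steps time steps.
    """
    # Assumed initial condition: half sine wave, zero initial velocity.
    u_prev = [math.sin(math.pi * i / (n_points - 1)) for i in range(n_points)]
    u_prev[0] = u_prev[-1] = 0.0
    if n_steps == 0:
        return u_prev
    r2 = courant ** 2

    # First step: zero initial velocity gives a half-weight update.
    u_curr = u_prev[:]
    for i in range(1, n_points - 1):
        u_curr[i] = u_prev[i] + 0.5 * r2 * (
            u_prev[i + 1] - 2.0 * u_prev[i] + u_prev[i - 1])

    # Remaining steps: standard three-level central-difference update.
    for _ in range(n_steps - 1):
        u_next = [0.0] * n_points  # end points stay fixed at zero
        for i in range(1, n_points - 1):
            u_next[i] = (2.0 * u_curr[i] - u_prev[i]
                         + r2 * (u_curr[i + 1] - 2.0 * u_curr[i] + u_curr[i - 1]))
        u_prev, u_curr = u_curr, u_next
    return u_curr
```

Each new value depends only on the two previous time levels, so the points within one time step can be updated independently; the dependence between time steps is where the synchronization overhead discussed in the abstract would arise.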
Visualizing Parallel Computer System Performance

NASA Technical Reports Server (NTRS)

Malony, Allen D.; Reed, Daniel A.

1988-01-01

Parallel computer systems are among the most complex of man's creations, making satisfactory performance characterization difficult. Despite this complexity, there are strong, indeed, almost irresistible, incentives to quantify parallel system performance using a single metric. The fallacy lies in succumbing to such temptations. A complete performance characterization requires not only an analysis of the system's constituent levels, it also requires both static and dynamic characterizations. Static or average behavior analysis may mask transients that dramatically alter system performance. Although the human visual system is remarkably adept at interpreting and identifying anomalies in false color data, the importance of dynamic, visual scientific data presentation has only recently been recognized. Large, complex parallel systems pose equally vexing performance interpretation problems. Data from hardware and software performance monitors must be presented in ways that emphasize important events while eliding irrelevant details. Design approaches and tools for performance visualization are the subject of this paper.
Fast parallel algorithm for slicing STL based on pipeline

NASA Astrophysics Data System (ADS)

Ma, Xulong; Lin, Feng; Yao, Bo

2016-05-01

In the field of additive manufacturing, current research on data processing focuses mainly on slicing large STL files or complicated CAD models. A parallel algorithm has great advantages for improving efficiency and reducing slicing time. However, traditional algorithms cannot make full use of multi-core CPU hardware resources. In this paper, a fast parallel algorithm is presented to speed up data processing. A pipeline mode is adopted to design the parallel algorithm, and the complexity of the pipeline algorithm is analyzed theoretically. To evaluate the performance of the new algorithm, the effects of the number of threads and the number of layers are investigated in a series of experiments. The experimental results show that the number of threads and the number of layers are two significant factors in the speedup ratio. Speedup increases with the number of threads, in good agreement with Amdahl's law, and also increases with the number of layers, in agreement with Gustafson's law. The new algorithm uses topological information to compute contours in parallel. Another parallel algorithm, based on data parallelism, is used in the experiments to show that the pipeline parallel mode is more efficient. Finally, a case study demonstrates the performance of the new parallel algorithm. Compared with the serial slicing algorithm, the new pipeline parallel algorithm makes full use of multi-core CPU hardware and accelerates the slicing process; compared with the data parallel slicing algorithm, it achieves a much higher speedup ratio and efficiency.
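For reference, the two scaling laws named in the abstract above can be written as S = 1 / ((1 - p) + p / N) for Amdahl's law (fixed problem size) and S = (1 - p) + p * N for Gustafson's law (problem size grows with N), where N is the number of threads and p is the parallelizable fraction of the work. The short Python sketch below evaluates both; the function names and the example value of p are assumptions for illustration only and do not come from the paper.

```python
def amdahl_speedup(parallel_fraction, n_threads):
    """Speedup for a fixed-size problem (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_threads)


def gustafson_speedup(parallel_fraction, n_threads):
    """Scaled speedup when the workload grows with the thread count
    (Gustafson's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * n_threads


if __name__ == "__main__":
    p = 0.9  # assumed parallel fraction, chosen only for this example
    for n in (1, 2, 4, 8):
        print(n, round(amdahl_speedup(p, n), 3), round(gustafson_speedup(p, n), 3))
```

Under Amdahl's law the speedup saturates at 1 / (1 - p) however many threads are added, whereas Gustafson's law grows linearly in N, which is consistent with the abstract's observation that speedup improves as the number of layers (the workload) increases.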
Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing

NASA Technical Reports Server (NTRS)

Fricker, David M.

1997-01-01

The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. Also, the tests explored the effects of scaling the simulation with the number of processors in addition to distributing a constant size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with ethernet and ATM networking plus a shared memory SGI Power Challenge L workstation. The results indicate that the network capabilities significantly influence the parallel efficiency, i.e., a shared memory machine is fastest and ATM networking provides acceptable performance. The limitations of ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.
A possibility of parallel and anti-parallel diffraction measurements on neutron diffractometer employing bent perfect crystal monochromator at the monochromatic focusing condition

NASA Astrophysics Data System (ADS)

Choi, Yong Nam; Kim, Shin Ae; Kim, Sung Kyu; Kim, Sung Baek; Lee, Chang-Hee; Mikula, Pavel

2004-07-01

In a conventional diffractometer with a single monochromator, only one position, the parallel position, is used for the diffraction experiment (i.e. detection) because the resolution of the other, the anti-parallel position, is very poor. However, a bent perfect crystal (BPC) monochromator at the monochromatic focusing condition can provide quite flat and equal resolution at both parallel and anti-parallel positions, so both sides can be used for the diffraction experiment. From the FWHM and Δd/d data measured in three diffraction geometries (symmetric, asymmetric compression and asymmetric expansion), we conclude that simultaneous diffraction measurement in both parallel and anti-parallel positions can be achieved.
Coherence of a spin-polarized electron beam emitted from a semiconductor photocathode in a transmission electron microscope

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kuwahara, Makoto; Saitoh, Koh; Tanaka, Nobuo

2014-11-10

The brightness and interference fringes of a spin-polarized electron beam extracted from a semiconductor photocathode excited by laser irradiation are directly measured via its use in a transmission electron microscope. The brightness was 3.8 × 10⁷ A cm⁻² sr⁻¹ for a 30-keV beam energy with the polarization of 82%, which corresponds to 3.1 × 10⁸ A cm⁻² sr⁻¹ for a 200-keV beam energy. The resulting electron beam exhibited a long coherence length at the specimen position due to the high parallelism of (1.7 ± 0.3) × 10⁻⁵ rad, which generated interference fringes representative of a first-order correlation using an electron biprism. The beam also had a high degeneracy of electron wavepacket of 4 × 10⁻⁶. Due to the high polarization, the high degeneracy and the long coherence length, the spin-polarized electron beam can enhance the antibunching effect.
Broadcasting collective operation contributions throughout a parallel computer

DOEpatents

Faraj, Ahmad [Rochester, MN]

2012-02-21

Methods, systems, and products are disclosed for broadcasting collective operation contributions throughout a parallel computer. The parallel computer includes a plurality of compute nodes connected together through a data communications network. Each compute node has a plurality of processors for use in collective parallel operations on the parallel computer. Broadcasting collective operation contributions throughout a parallel computer according to embodiments of the present invention includes: transmitting, by each processor on each compute node, that processor's collective operation contribution to the other processors on that compute node using intra-node communications; and transmitting on a designated network link, by each processor on each compute node according to a serial processor transmission sequence, that processor's collective operation contribution to the other processors on the other compute nodes using inter-node communications.
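The two-phase scheme described in the abstract above (intra-node exchange first, then inter-node transmission in a serial processor order) can be sketched as a small single-process simulation. Everything concrete here is an assumption made for illustration: the nested-list layout, the use of processor index as the transmission sequence, and the modelling of a message as a dictionary write. It is not the patented implementation.

```python
def broadcast_contributions(contributions):
    """Simulate the two-phase broadcast.

    contributions[node][proc] is the value contributed by that processor.
    Returns received[node][proc], a dict mapping (src_node, src_proc) to the
    value that processor holds once both phases have finished.
    """
    n_nodes = len(contributions)
    n_procs = len(contributions[0])
    received = [[{} for _ in range(n_procs)] for _ in range(n_nodes)]

    # Phase 1: intra-node communication. Every processor shares its
    # contribution with the processors on its own compute node.
    for node in range(n_nodes):
        for src in range(n_procs):
            for dst in range(n_procs):
                received[node][dst][(node, src)] = contributions[node][src]

    # Phase 2: inter-node communication. Processors take turns on the
    # designated link; the turn order (0, 1, 2, ...) is an assumption.
    for src in range(n_procs):
        for node in range(n_nodes):
            for other in range(n_nodes):
                if other == node:
                    continue
                for dst in range(n_procs):
                    received[other][dst][(node, src)] = contributions[node][src]
    return received
```

After both phases every processor holds the contribution of every other processor, which is the end state a broadcast of collective operation contributions is meant to reach.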
Parallel, Distributed Scripting with Python

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miller, P J

2002-05-24

Parallel computers used to be, for the most part, one-of-a-kind systems which were extremely difficult to program portably. With SMP architectures, the advent of the POSIX thread API and OpenMP gave developers ways to portably exploit on-the-box shared memory parallelism. Since these architectures didn't scale cost-effectively, distributed memory clusters were developed. The associated MPI message passing libraries gave these systems a portable paradigm too. Having programmers effectively use this paradigm is a somewhat different question. Distributed data has to be explicitly transported via the messaging system in order for it to be useful. In high level languages, the MPI library gives access to data distribution routines in C, C++, and FORTRAN. But we need more than that. Many reasonable and common tasks are best done in (or as extensions to) scripting languages. Consider sysadm tools such as password crackers, file purgers, etc. These are simple to write in a scripting language such as Python (an open source, portable, and freely available interpreter). But these tasks beg to be done in parallel. Consider a password checker that checks an encrypted password against a 25,000 word dictionary. This can take around 10 seconds in Python (6 seconds in C). It is trivial to parallelize if you can distribute the information and co-ordinate the work.
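The password-checker example in the abstract above amounts to splitting the dictionary among workers and collecting the first match. The sketch below shows that pattern with the Python standard library only. It rests on several assumptions: SHA-256 stands in for whatever password encryption the report used, a thread pool stands in for the distributed workers, and all function names are hypothetical. The report's own distribution mechanism is not described in the abstract and is not reproduced here.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_word(word):
    """Stand-in for the password encryption step (assumed: SHA-256)."""
    return hashlib.sha256(word.encode("utf-8")).hexdigest()


def search_chunk(target_hash, words):
    """Check one worker's share of the dictionary; return the match or None."""
    for word in words:
        if hash_word(word) == target_hash:
            return word
    return None


def parallel_check(target_hash, dictionary, n_workers=4):
    """Distribute the dictionary over n_workers and coordinate the results."""
    # Distribute: worker i gets every n_workers-th word starting at index i.
    chunks = [dictionary[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(search_chunk, [target_hash] * n_workers, chunks)
        # Coordinate: report the first chunk that found the password.
        for match in results:
            if match is not None:
                return match
    return None
```

For CPU-bound hashing on one machine, a process pool is the usual substitute for the thread pool used here, and on a cluster the same split-and-collect structure maps naturally onto message passing.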
A new hybrid code (CHIEF) implementing the inertial electron fluid equation without approximation

NASA Astrophysics Data System (ADS)

Muñoz, P. A.; Jain, N.; Kilian, P.; Büchner, J.

2018-03-01

We present a new hybrid algorithm implemented in the code CHIEF (Code Hybrid with Inertial Electron Fluid) for simulations of electron-ion plasmas. The algorithm treats the ions kinetically, modeled by the Particle-in-Cell (PiC) method, and electrons as an inertial fluid, modeled by electron fluid equations without any of the approximations used in most of the other hybrid codes with an inertial electron fluid. This kind of code is appropriate to model a large variety of quasineutral plasma phenomena where the electron inertia and/or ion kinetic effects are relevant. We present here the governing equations of the model, how these are discretized and implemented numerically, as well as six test problems to validate our numerical approach. Our chosen test problems, where the electron inertia and ion kinetic effects play the essential role, are: 0) Excitation of parallel eigenmodes to check numerical convergence and stability, 1) parallel (to a background magnetic field) propagating electromagnetic waves, 2) perpendicular propagating electrostatic waves (ion Bernstein modes), 3) ion beam right-hand instability (resonant and non-resonant), 4) ion Landau damping, 5) ion firehose instability, and 6) 2D oblique ion firehose instability. Our results reproduce successfully the predictions of linear and non-linear theory for all these problems, validating our code. All properties of this hybrid code make it ideal to study multi-scale phenomena between electron and ion scales such as collisionless shocks, magnetic reconnection and kinetic plasma turbulence in the dissipation range above the electron scales.
Finite Gyroradius Effects in the Electron Outflow of Asymmetric Magnetic Reconnection

NASA Technical Reports Server (NTRS)

Norgren, C.; Graham, D. B.; Khotyaintsev, Yu. V.; Andre, M.; Vaivads, A.; Chen, Li-Jen; Lindqvist, P.-A.; Marklund, G. T.; Ergun, R. E.; Magnes, W.;

2016-01-01

We present observations of asymmetric magnetic reconnection showing evidence of electron demagnetization in the electron outflow. The observations were made at the magnetopause by the four Magnetospheric Multiscale (MMS) spacecraft, separated by approximately 15 km. The reconnecting current sheet has negligible guide field, and all four spacecraft likely pass close to the electron diffusion region just south of the X line. In the electron outflow near the X line, all four spacecraft observe highly structured electron distributions in a region comparable to a few electron gyroradii. The distributions consist of a core with T∥ > T⊥ and a nongyrotropic crescent perpendicular to the magnetic field. The crescents are associated with finite gyroradius effects of partly demagnetized electrons. These observations clearly demonstrate the manifestation of finite gyroradius effects in an electron-scale reconnection current sheet.

Parallel optoelectronic trinary signed-digit division

NASA Astrophysics Data System (ADS)

Alam, Mohammad S.

1999-03-01

The trinary signed-digit (TSD) number system has been found to be very useful for parallel addition and subtraction of any arbitrary length operands in constant time. Using the TSD addition and multiplication modules as the basic building blocks, we develop an efficient algorithm for performing parallel TSD division in constant time. The proposed division technique uses one TSD subtraction and two TSD multiplication steps. An optoelectronic correlator based architecture is suggested for implementation of the proposed TSD division algorithm, which fully exploits the parallelism and high processing speed of optics. An efficient spatial encoding scheme is used to ensure better utilization of space bandwidth product of the spatial light modulators used in the optoelectronic implementation.
Parallel grid population

DOEpatents

Wald, Ingo; Ize, Santiago

2015-07-28

Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
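The two acts described in the abstract above can be sketched in Python as follows. This is only an illustration under stated assumptions: the grid is 2-D and split into contiguous bands of rows, objects are represented by inclusive cell-range bounding boxes that lie inside the grid, and a thread pool plays the role of the n processors. None of these details, nor the function names, come from the patent.

```python
from concurrent.futures import ThreadPoolExecutor


def populate_grid(rows, boxes, n_procs):
    """boxes: list of (r0, c0, r1, c1) inclusive cell ranges (assumed format).

    Returns a dict mapping (row, col) to the sorted indices of the objects
    whose bounding box covers that cell.
    """
    # Divide the grid into n_procs distinct portions (bands of rows).
    bounds = [(k * rows // n_procs, (k + 1) * rows // n_procs)
              for k in range(n_procs)]

    def bin_objects(k):
        # Processor k takes its own set of objects and records which grid
        # portion(s) at least partially bound each of them.
        hits = [[] for _ in range(n_procs)]
        for idx in range(k, len(boxes), n_procs):
            r0, _, r1, _ = boxes[idx]
            for portion, (lo, hi) in enumerate(bounds):
                if r0 < hi and r1 >= lo:
                    hits[portion].append(idx)
        return hits

    with ThreadPoolExecutor(max_workers=n_procs) as pool:
        binned = list(pool.map(bin_objects, range(n_procs)))

    def fill_portion(k):
        # Processor k populates only its own portion, using the objects
        # that any processor flagged for portion k in the first phase.
        lo, hi = bounds[k]
        cells = {}
        for hits in binned:
            for idx in hits[k]:
                r0, c0, r1, c1 = boxes[idx]
                for r in range(max(r0, lo), min(r1, hi - 1) + 1):
                    for c in range(c0, c1 + 1):
                        cells.setdefault((r, c), []).append(idx)
        return cells

    with ThreadPoolExecutor(max_workers=n_procs) as pool:
        portions = list(pool.map(fill_portion, range(n_procs)))

    grid = {}
    for cells in portions:
        grid.update(cells)  # portions are disjoint, so keys never collide
    for ids in grid.values():
        ids.sort()
    return grid
```

Because each processor writes only to cells in its own portion during the second phase, no locking of grid cells is needed; the only coordination point is the hand-off of the per-portion object lists between the two phases.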
A high-speed linear algebra library with automatic parallelism

NASA Technical Reports Server (NTRS)

Boucher, Michael L.

1994-01-01

Parallel or distributed processing is key to getting the highest performance from workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely limited even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.
Parallel Digital Phase-Locked Loops

NASA Technical Reports Server (NTRS)

Sadr, Ramin; Shah, Biren N.; Hinedi, Sami M.

1995-01-01

Wide-band microwave receivers of proposed type include digital phase-locked loops in which band-pass filtering and down-conversion of input signals are implemented by banks of multirate digital filters operating in parallel. Called "parallel digital phase-locked loops" to distinguish them from other digital phase-locked loops. Systems conceived as cost-effective solution to problem of filtering signals at high sampling rates needed to accommodate wide input frequency bands. Each of M filters processes 1/M of spectrum of signal.
Fringe Capacitance of a Parallel-Plate Capacitor.

ERIC Educational Resources Information Center

Hale, D. P.

1978-01-01

Describes an experiment designed to measure the forces between charged parallel plates, and determines the relationship among the effective electrode area, the measured capacitance values, and the electrode spacing of a parallel plate capacitor. (GA)
A radiation-tolerant electronic readout system for portal imaging

NASA Astrophysics Data System (ADS)

Östling, J.; Brahme, A.; Danielsson, M.; Iacobaeus, C.; Peskov, V.

2004-06-01

A new electronic portal imaging device, EPID, is under development at the Karolinska Institutet and the Royal Institute of Technology. Due to considerable demands on radiation tolerance in the radiotherapy environment, a dedicated electronic readout system has been designed. The most interesting aspect of the readout system is that it allows ~1000 pixels to be read out in parallel, with all electronics placed outside the radiation beam, which makes the detector more radiation resistant. In this work we present the function of a small prototype (6×100 pixels) of the electronic readout board that has been tested. Tests were made with continuous X-rays (10-60 keV) and with α particles. The results show that, without using an optimised gas mixture and with an early prototype only, the electronic readout system still works very well.
Support for Debugging Automatically Parallelized Programs

NASA Technical Reports Server (NTRS)

Hood, Robert; Jost, Gabriele

2001-01-01

This viewgraph presentation provides information on support sources available for the automatic parallelization of computer programs. CAPTools, a support tool developed at the University of Greenwich, transforms, with user guidance, existing sequential Fortran code into parallel message passing code. Comparison routines are then run for debugging purposes, in essence, ensuring that the code transformation was accurate.
