Multiple Embedded Processors for Fault-Tolerant Computing
NASA Technical Reports Server (NTRS)
Bolotin, Gary; Watson, Robert; Katanyoutanant, Sunant; Burke, Gary; Wang, Mandy
2005-01-01
A fault-tolerant computer architecture has been conceived in an effort to reduce vulnerability to single-event upsets (spurious bit flips caused by impingement of energetic ionizing particles or photons). As in some prior fault-tolerant architectures, the redundancy needed for fault tolerance is obtained by use of multiple processors in one computer. Unlike prior architectures, the multiple processors are embedded in a single field-programmable gate array (FPGA). What makes this new approach practical is the recent commercial availability of FPGAs that are capable of having multiple embedded processors. A working prototype (see figure) consists of two embedded IBM PowerPC 405 processor cores and a comparator built on a Xilinx Virtex-II Pro FPGA. This relatively simple instantiation of the architecture implements an error-detection scheme. A planned future version, incorporating four processors and two comparators, would correct some errors in addition to detecting them.
Scalable Domain Decomposed Monte Carlo Particle Transport
NASA Astrophysics Data System (ADS)
O'Brien, Matthew Joseph
In this dissertation, we present the parallel algorithms necessary to run domain decomposed Monte Carlo particle transport on large numbers of processors (millions of processors). Previous algorithms were not scalable, and the parallel overhead became more computationally costly than the numerical simulation. The main algorithms we consider are: • Domain decomposition of constructive solid geometry: enables extremely large calculations in which the background geometry is too large to fit in the memory of a single computational node. • Load Balancing: keeps the workload per processor as even as possible so the calculation runs efficiently. • Global Particle Find: if particles are on the wrong processor, globally resolve their locations to the correct processor based on particle coordinate and background domain. • Visualizing constructive solid geometry, sourcing particles, deciding that particle streaming communication is completed and spatial redecomposition. These algorithms are some of the most important parallel algorithms required for domain decomposed Monte Carlo particle transport. We demonstrate that our previous algorithms were not scalable, prove that our new algorithms are scalable, and run some of the algorithms up to 2 million MPI processes on the Sequoia supercomputer.
DOE Office of Scientific and Technical Information (OSTI.GOV)
O'Brien, M. J.; Brantley, P. S.
2015-01-20
In order to run Monte Carlo particle transport calculations on new supercomputers with hundreds of thousands or millions of processors, care must be taken to implement scalable algorithms. This means that the algorithms must continue to perform well as the processor count increases. In this paper, we examine the scalability of:(1) globally resolving the particle locations on the correct processor, (2) deciding that particle streaming communication has finished, and (3) efficiently coupling neighbor domains together with different replication levels. We have run domain decomposed Monte Carlo particle transport on up to 2 21 = 2,097,152 MPI processes on the IBMmore » BG/Q Sequoia supercomputer and observed scalable results that agree with our theoretical predictions. These calculations were carefully constructed to have the same amount of work on every processor, i.e. the calculation is already load balanced. We also examine load imbalanced calculations where each domain’s replication level is proportional to its particle workload. In this case we show how to efficiently couple together adjacent domains to maintain within workgroup load balance and minimize memory usage.« less
Dynamic load balance scheme for the DSMC algorithm
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Jin; Geng, Xiangren; Jiang, Dingwu
The direct simulation Monte Carlo (DSMC) algorithm, devised by Bird, has been used over a wide range of various rarified flow problems in the past 40 years. While the DSMC is suitable for the parallel implementation on powerful multi-processor architecture, it also introduces a large load imbalance across the processor array, even for small examples. The load imposed on a processor by a DSMC calculation is determined to a large extent by the total of simulator particles upon it. Since most flows are impulsively started with initial distribution of particles which is surely quite different from the steady state, themore » total of simulator particles will change dramatically. The load balance based upon an initial distribution of particles will break down as the steady state of flow is reached. The load imbalance and huge computational cost of DSMC has limited its application to rarefied or simple transitional flows. In this paper, by taking advantage of METIS, a software for partitioning unstructured graphs, and taking the total of simulator particles in each cell as a weight information, the repartitioning based upon the principle that each processor handles approximately the equal total of simulator particles has been achieved. The computation must pause several times to renew the total of simulator particles in each processor and repartition the whole domain again. Thus the load balance across the processors array holds in the duration of computation. The parallel efficiency can be improved effectively. The benchmark solution of a cylinder submerged in hypersonic flow has been simulated numerically. Besides, hypersonic flow past around a complex wing-body configuration has also been simulated. The results have displayed that, for both of cases, the computational time can be reduced by about 50%.« less
Fuzzy logic particle tracking velocimetry
NASA Technical Reports Server (NTRS)
Wernet, Mark P.
1993-01-01
Fuzzy logic has proven to be a simple and robust method for process control. Instead of requiring a complex model of the system, a user defined rule base is used to control the process. In this paper the principles of fuzzy logic control are applied to Particle Tracking Velocimetry (PTV). Two frames of digitally recorded, single exposure particle imagery are used as input. The fuzzy processor uses the local particle displacement information to determine the correct particle tracks. Fuzzy PTV is an improvement over traditional PTV techniques which typically require a sequence (greater than 2) of image frames for accurately tracking particles. The fuzzy processor executes in software on a PC without the use of specialized array or fuzzy logic processors. A pair of sample input images with roughly 300 particle images each, results in more than 200 velocity vectors in under 8 seconds of processing time.
Scalable Domain Decomposed Monte Carlo Particle Transport
DOE Office of Scientific and Technical Information (OSTI.GOV)
O'Brien, Matthew Joseph
2013-12-05
In this dissertation, we present the parallel algorithms necessary to run domain decomposed Monte Carlo particle transport on large numbers of processors (millions of processors). Previous algorithms were not scalable, and the parallel overhead became more computationally costly than the numerical simulation.
PIV/HPIV Film Analysis Software Package
NASA Technical Reports Server (NTRS)
Blackshire, James L.
1997-01-01
A PIV/HPIV film analysis software system was developed that calculates the 2-dimensional spatial autocorrelations of subregions of Particle Image Velocimetry (PIV) or Holographic Particle Image Velocimetry (HPIV) film recordings. The software controls three hardware subsystems including (1) a Kodak Megaplus 1.4 camera and EPIX 4MEG framegrabber subsystem, (2) an IEEE/Unidex 11 precision motion control subsystem, and (3) an Alacron I860 array processor subsystem. The software runs on an IBM PC/AT host computer running either the Microsoft Windows 3.1 or Windows 95 operating system. It is capable of processing five PIV or HPIV displacement vectors per second, and is completely automated with the exception of user input to a configuration file prior to analysis execution for update of various system parameters.
Optimization of Particle-in-Cell Codes on RISC Processors
NASA Technical Reports Server (NTRS)
Decyk, Viktor K.; Karmesin, Steve Roy; Boer, Aeint de; Liewer, Paulette C.
1996-01-01
General strategies are developed to optimize particle-cell-codes written in Fortran for RISC processors which are commonly used on massively parallel computers. These strategies include data reorganization to improve cache utilization and code reorganization to improve efficiency of arithmetic pipelines.
Particle simulation of plasmas on the massively parallel processor
NASA Technical Reports Server (NTRS)
Gledhill, I. M. A.; Storey, L. R. O.
1987-01-01
Particle simulations, in which collective phenomena in plasmas are studied by following the self consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two dimensional simulation space is mapped directly onto the processor network; a Fast Fourier Transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.
NASA Technical Reports Server (NTRS)
Lund, D.
1998-01-01
This report presents a description of the tests performed, and the test data, for the AI METSAT Signal Processor Assembly P/N 1331670-2, S/N F05. The assembly was tested in accordance with AE-26754, "METSAT Signal Processor Scan Drive and Integration Procedure." The objective is to demonstrate functionality of the signal processor prior to instrument integration.
NASA Technical Reports Server (NTRS)
Lund, D.
1998-01-01
This report presents a description of tests performed, and the test data, for the A1 METSAT Signal Processor Assembly PN: 1331679-2, S/N F03. This assembly was tested in accordance with AE-26754, "METSAT Signal Processor Scan Drive Test and Integration Procedure." The objective is to demonstrate functionality of the signal processor prior to instrument integration.
NASA Technical Reports Server (NTRS)
Lund, D.
1998-01-01
This report presents a description of the tests performed, and the test data, for the A1 METSAT Signal Processor Assembly PN: 1331679-2, S/N F04. The assembly was tested in accordance with AE-26754, "METSAT Signal Processor Scan Drive Test and Integration Procedure." The objective is to demonstrate functionality of the signal processor prior to instrument integration.
Treecode with a Special-Purpose Processor
NASA Astrophysics Data System (ADS)
Makino, Junichiro
1991-08-01
We describe an implementation of the modified Barnes-Hut tree algorithm for a gravitational N-body calculation on a GRAPE (GRAvity PipE) backend processor. GRAPE is a special-purpose computer for N-body calculations. It receives the positions and masses of particles from a host computer and then calculates the gravitational force at each coordinate specified by the host. To use this GRAPE processor with the hierarchical tree algorithm, the host computer must maintain a list of all nodes that exert force on a particle. If we create this list for each particle of the system at each timestep, the number of floating-point operations on the host and that on GRAPE would become comparable, and the increased speed obtained by using GRAPE would be small. In our modified algorithm, we create a list of nodes for many particles. Thus, the amount of the work required of the host is significantly reduced. This algorithm was originally developed by Barnes in order to vectorize the force calculation on a Cyber 205. With this algorithm, the computing time of the force calculation becomes comparable to that of the tree construction, if the GRAPE backend processor is sufficiently fast. The obtained speed-up factor is 30 to 50 for a RISC-based host computer and GRAPE-1A with a peak speed of 240 Mflops.
Scalable load balancing for massively parallel distributed Monte Carlo particle transport
DOE Office of Scientific and Technical Information (OSTI.GOV)
O'Brien, M. J.; Brantley, P. S.; Joy, K. I.
2013-07-01
In order to run computer simulations efficiently on massively parallel computers with hundreds of thousands or millions of processors, care must be taken that the calculation is load balanced across the processors. Examining the workload of every processor leads to an unscalable algorithm, with run time at least as large as O(N), where N is the number of processors. We present a scalable load balancing algorithm, with run time 0(log(N)), that involves iterated processor-pair-wise balancing steps, ultimately leading to a globally balanced workload. We demonstrate scalability of the algorithm up to 2 million processors on the Sequoia supercomputer at Lawrencemore » Livermore National Laboratory. (authors)« less
Coiled Brine Recovery Assembly (CoBRA): A New Approach to Recovering Water from Wastewater Brines
NASA Technical Reports Server (NTRS)
Pensinger, Stuart J.
2015-01-01
Brine water recovery represents a current technology gap in water recycling for human spaceflight. The role of a brine processor is to take the concentrated discharge from a primary wastewater processor, called brine, and recover most of the remaining water from it. The current state-of-the-art primary processor is the ISS Urine Processor Assembly (UPA) that currently achieves 70% water recovery. Recent advancements in chemical pretreatments are expected to increase this to 85% in the near future. This is a welcome improvement, yet is still not high enough for deep space transit. Mission architecture studies indicate that at least 95% is necessary for a Mars mission, as an example. Brine water recovery is the technology that bridges the gap between 85% and 95%, and moves life support systems one step closer to full closure of the water loop. Several brine water recovery systems have been proposed for human spaceflight, most of them focused on solving two major problems: operation in a weightless environment, and management and containment of brine residual. Brine residual is the leftover byproduct of the brine recovery process, and is often a viscous, sticky paste, laden with crystallized solid particles. Due to the chemical pretreatments added to wastewater prior to distillation in a primary processor, these residuals are typically toxic, which further complicates matters. Isolation of crewmembers from these hazardous materials is paramount. The Coiled Brine Recovery Assembly (CoBRA) is a recently developed concept from the Johnson Space Center that offers solutions to these challenges. CoBRA is centered on a softgoods evaporator that enables a passive fill with brine, and regeneration by discharging liquid brine residual to a collection bag. This evaporator is meant to be lightweight, which allows it to be discarded along with the accumulated brine solids contained within it. This paper discusses design and development of a first CoBRA prototype, and reports initial test results.
Exact diagonalization of quantum lattice models on coprocessors
NASA Astrophysics Data System (ADS)
Siro, T.; Harju, A.
2016-10-01
We implement the Lanczos algorithm on an Intel Xeon Phi coprocessor and compare its performance to a multi-core Intel Xeon CPU and an NVIDIA graphics processor. The Xeon and the Xeon Phi are parallelized with OpenMP and the graphics processor is programmed with CUDA. The performance is evaluated by measuring the execution time of a single step in the Lanczos algorithm. We study two quantum lattice models with different particle numbers, and conclude that for small systems, the multi-core CPU is the fastest platform, while for large systems, the graphics processor is the clear winner, reaching speedups of up to 7.6 compared to the CPU. The Xeon Phi outperforms the CPU with sufficiently large particle number, reaching a speedup of 2.5.
Kondo, Keita; Kato, Shinsuke; Niwa, Toshiyuki
2017-10-30
We aimed to understand the factors controlling mechanical particle coating using polymethacrylate. The relationship between coating performance and the characteristics of polymethacrylate powders was investigated. First, theophylline crystals were treated using a mechanical powder processor to obtain theophylline spheres (<100μm). Second, five polymethacrylate latexes were powdered by spray freeze drying to produce colloidal agglomerates. Finally, mechanical particle coating was performed by mixing theophylline spheres and polymethacrylate agglomerates using the processor. The agglomerates were broken under mechanical stress to coat the spheres effectively. The coating performance of polymethacrylate agglomerates tended to increase as their pulverization progressed. Differences in the grindability of the agglomerates were attributed to differences in particle structure, resulting from consolidation between colloidal particles. High-grindability agglomerates exhibited higher pulverization as their glass transition temperature (T g ) increased and the further pulverization promoted coating. We therefore conclude that the minimization of polymethacrylate powder by pulverization is an important factor in mechanical particle coating using polymethacrylate with low deformability. Meanwhile, when product temperature during coating approaches T g of polymer, polymethacrylate was soften to show high coating performance by plastic deformation. The effective coating by this mechanism may be accomplished by adjusting the temperature in the processor to the T g . Copyright © 2017 Elsevier B.V. All rights reserved.
Amisaki, Takashi; Toyoda, Shinjiro; Miyagawa, Hiroh; Kitamura, Kunihiro
2003-04-15
Evaluation of long-range Coulombic interactions still represents a bottleneck in the molecular dynamics (MD) simulations of biological macromolecules. Despite the advent of sophisticated fast algorithms, such as the fast multipole method (FMM), accurate simulations still demand a great amount of computation time due to the accuracy/speed trade-off inherently involved in these algorithms. Unless higher order multipole expansions, which are extremely expensive to evaluate, are employed, a large amount of the execution time is still spent in directly calculating particle-particle interactions within the nearby region of each particle. To reduce this execution time for pair interactions, we developed a computation unit (board), called MD-Engine II, that calculates nonbonded pairwise interactions using a specially designed hardware. Four custom arithmetic-processors and a processor for memory manipulation ("particle processor") are mounted on the computation board. The arithmetic processors are responsible for calculation of the pair interactions. The particle processor plays a central role in realizing efficient cooperation with the FMM. The results of a series of 50-ps MD simulations of a protein-water system (50,764 atoms) indicated that a more stringent setting of accuracy in FMM computation, compared with those previously reported, was required for accurate simulations over long time periods. Such a level of accuracy was efficiently achieved using the cooperative calculations of the FMM and MD-Engine II. On an Alpha 21264 PC, the FMM computation at a moderate but tolerable level of accuracy was accelerated by a factor of 16.0 using three boards. At a high level of accuracy, the cooperative calculation achieved a 22.7-fold acceleration over the corresponding conventional FMM calculation. In the cooperative calculations of the FMM and MD-Engine II, it was possible to achieve more accurate computation at a comparable execution time by incorporating larger nearby regions. Copyright 2003 Wiley Periodicals, Inc. J Comput Chem 24: 582-592, 2003
Clouser, C S; Doores, S; Mast, M G; Knabel, S J
1995-04-01
This study was undertaken to determine whether the incidence of either Salmonella spp. or Listeria monocytogenes on turkeys at three commercial processors could be related to the type of defeathering system: 1) conventional, 58 C common bath scald; 2) kosher, 7 C common bath scald; or 3) steam-spray, 62 C nonimmersion scald. Flocks were sampled before defeathering, after defeathering, and after chill at each facility. The incidence of Salmonella-positive turkeys significantly increased subsequent to conventional defeathering (10 positive out of 14) as compared with before defeathering (3/14). The number of Salmonella-positive carcasses following kosher (0/14) and steam-spray (2/14) defeathering were similar to the number of Salmonella-positive carcasses found prior to defeathering (1/14 and 3/14, respectively). The incidence of Salmonella-positive carcasses following chill was slightly lower, but not significantly different than the number of Salmonella-positive carcasses found immediately following defeathering at all processors (8/14, 0/14, 1/14 for conventional, kosher, and steam-spray processors, respectively). Although L. monocytogenes was detected on turkeys sampled before chilling (2/10, kosher) and after chilling (8/14, kosher; 1/14, conventional), no L. monocytogenes was detected on turkeys at any of the processors prior to the evisceration process. Flocks with high aerobic plate counts prior to processing were more likely to contain Salmonella-positive birds throughout processing. Aerobic plate counts of all flocks were similar after chill whether or not Salmonella spp. and L. monocytogenes were detected.
The Forest Method as a New Parallel Tree Method with the Sectional Voronoi Tessellation
NASA Astrophysics Data System (ADS)
Yahagi, Hideki; Mori, Masao; Yoshii, Yuzuru
1999-09-01
We have developed a new parallel tree method which will be called the forest method hereafter. This new method uses the sectional Voronoi tessellation (SVT) for the domain decomposition. The SVT decomposes a whole space into polyhedra and allows their flat borders to move by assigning different weights. The forest method determines these weights based on the load balancing among processors by means of the overload diffusion (OLD). Moreover, since all the borders are flat, before receiving the data from other processors, each processor can collect enough data to calculate the gravity force with precision. Both the SVT and the OLD are coded in a highly vectorizable manner to accommodate on vector parallel processors. The parallel code based on the forest method with the Message Passing Interface is run on various platforms so that a wide portability is guaranteed. Extensive calculations with 15 processors of Fujitsu VPP300/16R indicate that the code can calculate the gravity force exerted on 105 particles in each second for some ideal dark halo. This code is found to enable an N-body simulation with 107 or more particles for a wide dynamic range and is therefore a very powerful tool for the study of galaxy formation and large-scale structure in the universe.
An implementation of a tree code on a SIMD, parallel computer
NASA Technical Reports Server (NTRS)
Olson, Kevin M.; Dorband, John E.
1994-01-01
We describe a fast tree algorithm for gravitational N-body simulation on SIMD parallel computers. The tree construction uses fast, parallel sorts. The sorted lists are recursively divided along their x, y and z coordinates. This data structure is a completely balanced tree (i.e., each particle is paired with exactly one other particle) and maintains good spatial locality. An implementation of this tree-building algorithm on a 16k processor Maspar MP-1 performs well and constitutes only a small fraction (approximately 15%) of the entire cycle of finding the accelerations. Each node in the tree is treated as a monopole. The tree search and the summation of accelerations also perform well. During the tree search, node data that is needed from another processor is simply fetched. Roughly 55% of the tree search time is spent in communications between processors. We apply the code to two problems of astrophysical interest. The first is a simulation of the close passage of two gravitationally, interacting, disk galaxies using 65,636 particles. We also simulate the formation of structure in an expanding, model universe using 1,048,576 particles. Our code attains speeds comparable to one head of a Cray Y-MP, so single instruction, multiple data (SIMD) type computers can be used for these simulations. The cost/performance ratio for SIMD machines like the Maspar MP-1 make them an extremely attractive alternative to either vector processors or large multiple instruction, multiple data (MIMD) type parallel computers. With further optimizations (e.g., more careful load balancing), speeds in excess of today's vector processing computers should be possible.
Zonal methods for the parallel execution of range-limited N-body simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bowers, Kevin J.; Dror, Ron O.; Shaw, David E.
2007-01-20
Particle simulations in fields ranging from biochemistry to astrophysics require the evaluation of interactions between all pairs of particles separated by less than some fixed interaction radius. The applicability of such simulations is often limited by the time required for calculation, but the use of massive parallelism to accelerate these computations is typically limited by inter-processor communication requirements. Recently, Snir [M. Snir, A note on N-body computations with cutoffs, Theor. Comput. Syst. 37 (2004) 295-318] and Shaw [D.E. Shaw, A fast, scalable method for the parallel evaluation of distance-limited pairwise particle interactions, J. Comput. Chem. 26 (2005) 1318-1328] independently introducedmore » two distinct methods that offer asymptotic reductions in the amount of data transferred between processors. In the present paper, we show that these schemes represent special cases of a more general class of methods, and introduce several new algorithms in this class that offer practical advantages over all previously described methods for a wide range of problem parameters. We also show that several of these algorithms approach an approximate lower bound on inter-processor data transfer.« less
Output statistics of laser anemometers in sparsely seeded flows
NASA Technical Reports Server (NTRS)
Edwards, R. V.; Jensen, A. S.
1982-01-01
It is noted that until very recently, research on this topic concentrated on the particle arrival statistics and the influence of the optical parameters on them. Little attention has been paid to the influence of subsequent processing on the measurement statistics. There is also controversy over whether the effects of the particle statistics can be measured. It is shown here that some of the confusion derives from a lack of understanding of the experimental parameters that are to be controlled or known. A rigorous framework is presented for examining the measurement statistics of such systems. To provide examples, two problems are then addressed. The first has to do with a sample and hold processor, the second with what is called a saturable processor. The sample and hold processor converts the output to a continuous signal by holding the last reading until a new one is obtained. The saturable system is one where the maximum processable rate is arrived at by the dead time of some unit in the system. At high particle rates, the processed rate is determined through the dead time.
First Results of an “Artificial Retina” Processor Prototype
Cenci, Riccardo; Bedeschi, Franco; Marino, Pietro; ...
2016-11-15
We report on the performance of a specialized processor capable of reconstructing charged particle tracks in a realistic LHC silicon tracker detector, at the same speed of the readout and with sub-microsecond latency. The processor is based on an innovative pattern-recognition algorithm, called “artificial retina algorithm”, inspired from the vision system of mammals. A prototype of the processor has been designed, simulated, and implemented on Tel62 boards equipped with high-bandwidth Altera Stratix III FPGA devices. Also, the prototype is the first step towards a real-time track reconstruction device aimed at processing complex events of high-luminosity LHC experiments at 40 MHzmore » crossing rate.« less
First Results of an “Artificial Retina” Processor Prototype
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cenci, Riccardo; Bedeschi, Franco; Marino, Pietro
We report on the performance of a specialized processor capable of reconstructing charged particle tracks in a realistic LHC silicon tracker detector, at the same speed of the readout and with sub-microsecond latency. The processor is based on an innovative pattern-recognition algorithm, called “artificial retina algorithm”, inspired from the vision system of mammals. A prototype of the processor has been designed, simulated, and implemented on Tel62 boards equipped with high-bandwidth Altera Stratix III FPGA devices. Also, the prototype is the first step towards a real-time track reconstruction device aimed at processing complex events of high-luminosity LHC experiments at 40 MHzmore » crossing rate.« less
Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs
NASA Astrophysics Data System (ADS)
Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; Masciovecchio, Mario; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi
2017-08-01
For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offine. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progresses toward the understanding of these processors and the new developments to port the Kalman filter to NVIDIA GPUs.
2015-06-01
5110P and 16 dx360M4 nodes each with one NVIDIA Kepler K20M/K40M GPU. Each node contained dual Intel Xeon E5-2670 (Sandy Bridge) central processing...kernel and as such does not employ multiple processors. This work makes use of a single processing core and a single NVIDIA Kepler K40 GK110...bandwidth (2 × 16 slot), 7.877 GFloat/s; Kepler K40 peak, 4,290 × 1 billion floating-point operations (GFLOPs), and 288 GB/s Kepler K40 memory
Parallelization of a Monte Carlo particle transport simulation code
NASA Astrophysics Data System (ADS)
Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.
2010-05-01
We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have been also integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for studying of higher particle energies with the use of more accurate physical models, and improve statistics as more particles tracks can be simulated in low response time.
Next Generation Security for the 10,240 Processor Columbia System
NASA Technical Reports Server (NTRS)
Hinke, Thomas; Kolano, Paul; Shaw, Derek; Keller, Chris; Tweton, Dave; Welch, Todd; Liu, Wen (Betty)
2005-01-01
This presentation includes a discussion of the Columbia 10,240-processor system located at the NASA Advanced Supercomputing (NAS) division at the NASA Ames Research Center which supports each of NASA's four missions: science, exploration systems, aeronautics, and space operations. It is comprised of 20 Silicon Graphics nodes, each consisting of 512 Itanium II processors. A 64 processor Columbia front-end system supports users as they prepare their jobs and then submits them to the PBS system. Columbia nodes and front-end systems use the Linux OS. Prior to SC04, the Columbia system was used to attain a processing speed of 51.87 TeraFlops, which made it number two on the Top 500 list of the world's supercomputers and the world's fastest "operational" supercomputer since it was fully engaged in supporting NASA users.
Pentium Pro inside. 1; A treecode at 430 Gigaflops on ASCI Red
NASA Technical Reports Server (NTRS)
Warren, M. S.; Becker, D. J.; Sterling, T.; Salmon, J. K.; Goda, M. P.
1997-01-01
As an entry for the 1997 Gordon Bell performance prize, we present results from two methods of solving the gravitational N-body problem on the Intel Teraflops system at Sandia National Laboratory (ASCI Red). The first method, an O(N2) algorithm, obtained 635 Gigaflops for a 1 million particle problem on 6800 Pentium Pro processors. The second solution method, a tree-code which scales as O(N log N), sustained 170 Gigaflops over a continuous 9.4 hour period on 4096 processors, integrating the motion of 322 million mutually interacting particles in a cosmology simulation, while saving over 100 Gigabytes of raw data. Additionally, the tree-code sustained 430 Gigaflops on 6800 processors for the first 5 time-steps of that simulation. This tree-code solution is approximately 105 times more efficient than the O(N2) algorithm for this problem. As an entry for the 1997 Gordon Bell price/performance prize, we present two calculations from the disciplines of astrophysics and fluid dynamics. The simulations were performed on two 16 Pentium Pro processor Beowulf-class computers (Loki and Hyglac) constructed entirely from commodity personal computer technology, at a cost of roughly $50k each in September, 1996. The price of an equivalent system in August 1997 is less than $30. At Los Alamos, Loki performed a gravitational tree-code N-body simulation of galaxy formation using 9.75 million particles, which sustained an average of 879 Mflops over a ten day period, and produced roughly 10 Gbytes of raw data.
High-performance multiprocessor architecture for a 3-D lattice gas model
NASA Technical Reports Server (NTRS)
Lee, F.; Flynn, M.; Morf, M.
1991-01-01
The lattice gas method has recently emerged as a promising discrete particle simulation method in areas such as fluid dynamics. We present a very high-performance scalable multiprocessor architecture, called ALGE, proposed for the simulation of a realistic 3-D lattice gas model, Henon's 24-bit FCHC isometric model. Each of these VLSI processors is as powerful as a CRAY-2 for this application. ALGE is scalable in the sense that it achieves linear speedup for both fixed and increasing problem sizes with more processors. The core computation of a lattice gas model consists of many repetitions of two alternating phases: particle collision and propagation. Functional decomposition by symmetry group and virtual move are the respective keys to efficient implementation of collision and propagation.
Code of Federal Regulations, 2012 CFR
2012-07-01
... complete corporate fiscal year prior to June 25, 1986. For purposes of this provision, processors of P-TBBA... the respondent's latest complete corporate fiscal year prior to June 25, 1986. Respondents to this... narrative need not include customer identity. (xi) A narrative description of the methods used at each site...
Code of Federal Regulations, 2011 CFR
2011-07-01
... complete corporate fiscal year prior to June 25, 1986. For purposes of this provision, processors of P-TBBA... the respondent's latest complete corporate fiscal year prior to June 25, 1986. Respondents to this... narrative need not include customer identity. (xi) A narrative description of the methods used at each site...
Code of Federal Regulations, 2014 CFR
2014-07-01
... complete corporate fiscal year prior to June 25, 1986. For purposes of this provision, processors of P-TBBA... the respondent's latest complete corporate fiscal year prior to June 25, 1986. Respondents to this... narrative need not include customer identity. (xi) A narrative description of the methods used at each site...
Code of Federal Regulations, 2013 CFR
2013-07-01
... complete corporate fiscal year prior to June 25, 1986. For purposes of this provision, processors of P-TBBA... the respondent's latest complete corporate fiscal year prior to June 25, 1986. Respondents to this... narrative need not include customer identity. (xi) A narrative description of the methods used at each site...
Extraction of Volatiles from Regolith or Soil on Mars, the Moon, and Asteroids
NASA Technical Reports Server (NTRS)
Linne, Diane; Kleinhenz, Julie; Trunek, Andrew; Hoffman, Stephen; Collins, Jacob
2017-01-01
NASA's Advanced Exploration Systems ISRU Technology Project is evaluating concepts to extract water from all resource types Near-term objectives: Produce high-fidelity mass, power, and volume estimates for mining and processing systems Identify critical challenges for development focus Begin demonstration of component and subsystem technologies in relevant environment Several processor types: Closed processors either partially or completely sealed during processing Open air processors operates at Mars ambient conditions In-situ processors Extract product directly without excavation of raw resource Design features Elimination of sweep gas reduces dust particles in water condensate Pressure maintained by height of soil in hopper Model developed to evaluate key design parameters Geometry: conveyor diameter, screw diameter, shaft diameter, flight spacing and pitch Operational: screw speed vs. screw length (residence time) Thermal: Heat flux, heat transfer to soil Testing to demonstrate feasibility and performance Agglomeration, clogging Pressure rise forced flow to condenser.
NASA Astrophysics Data System (ADS)
Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide
2015-09-01
The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
Crespo, Alejandro C.; Dominguez, Jose M.; Barreiro, Anxo; Gómez-Gesteira, Moncho; Rogers, Benedict D.
2011-01-01
Smoothed Particle Hydrodynamics (SPH) is a numerical method commonly used in Computational Fluid Dynamics (CFD) to simulate complex free-surface flows. Simulations with this mesh-free particle method far exceed the capacity of a single processor. In this paper, as part of a dual-functioning code for either central processing units (CPUs) or Graphics Processor Units (GPUs), a parallelisation using GPUs is presented. The GPU parallelisation technique uses the Compute Unified Device Architecture (CUDA) of nVidia devices. Simulations with more than one million particles on a single GPU card exhibit speedups of up to two orders of magnitude over using a single-core CPU. It is demonstrated that the code achieves different speedups with different CUDA-enabled GPUs. The numerical behaviour of the SPH code is validated with a standard benchmark test case of dam break flow impacting on an obstacle where good agreement with the experimental results is observed. Both the achieved speed-ups and the quantitative agreement with experiments suggest that CUDA-based GPU programming can be used in SPH methods with efficiency and reliability. PMID:21695185
NASA Astrophysics Data System (ADS)
Lippert, Ross A.; Predescu, Cristian; Ierardi, Douglas J.; Mackenzie, Kenneth M.; Eastwood, Michael P.; Dror, Ron O.; Shaw, David E.
2013-10-01
In molecular dynamics simulations, control over temperature and pressure is typically achieved by augmenting the original system with additional dynamical variables to create a thermostat and a barostat, respectively. These variables generally evolve on timescales much longer than those of particle motion, but typical integrator implementations update the additional variables along with the particle positions and momenta at each time step. We present a framework that replaces the traditional integration procedure with separate barostat, thermostat, and Newtonian particle motion updates, allowing thermostat and barostat updates to be applied infrequently. Such infrequent updates provide a particularly substantial performance advantage for simulations parallelized across many computer processors, because thermostat and barostat updates typically require communication among all processors. Infrequent updates can also improve accuracy by alleviating certain sources of error associated with limited-precision arithmetic. In addition, separating the barostat, thermostat, and particle motion update steps reduces certain truncation errors, bringing the time-average pressure closer to its target value. Finally, this framework, which we have implemented on both general-purpose and special-purpose hardware, reduces software complexity and improves software modularity.
Optimistic barrier synchronization
NASA Technical Reports Server (NTRS)
Nicol, David M.
1992-01-01
Barrier synchronization is fundamental operation in parallel computation. In many contexts, at the point a processor enters a barrier it knows that it has already processed all the work required of it prior to synchronization. The alternative case, when a processor cannot enter a barrier with the assurance that it has already performed all the necessary pre-synchronization computation, is treated. The problem arises when the number of pre-sychronization messages to be received by a processor is unkown, for example, in a parallel discrete simulation or any other computation that is largely driven by an unpredictable exchange of messages. We describe an optimistic O(log sup 2 P) barrier algorithm for such problems, study its performance on a large-scale parallel system, and consider extensions to general associative reductions as well as associative parallel prefix computations.
NASA Astrophysics Data System (ADS)
Feng, Bing
Electron cloud instabilities have been observed in many circular accelerators around the world and raised concerns of future accelerators and possible upgrades. In this thesis, the electron cloud instabilities are studied with the quasi-static particle-in-cell (PIC) code QuickPIC. Modeling in three-dimensions the long timescale propagation of beam in electron clouds in circular accelerators requires faster and more efficient simulation codes. Thousands of processors are easily available for parallel computations. However, it is not straightforward to increase the effective speed of the simulation by running the same problem size on an increasingly number of processors because there is a limit to domain size in the decomposition of the two-dimensional part of the code. A pipelining algorithm applied on the fully parallelized particle-in-cell code QuickPIC is implemented to overcome this limit. The pipelining algorithm uses multiple groups of processors and optimizes the job allocation on the processors in parallel computing. With this novel algorithm, it is possible to use on the order of 102 processors, and to expand the scale and the speed of the simulation with QuickPIC by a similar factor. In addition to the efficiency improvement with the pipelining algorithm, the fidelity of QuickPIC is enhanced by adding two physics models, the beam space charge effect and the dispersion effect. Simulation of two specific circular machines is performed with the enhanced QuickPIC. First, the proposed upgrade to the Fermilab Main Injector is studied with an eye upon guiding the design of the upgrade and code validation. Moderate emittance growth is observed for the upgrade of increasing the bunch population by 5 times. But the simulation also shows that increasing the beam energy from 8GeV to 20GeV or above can effectively limit the emittance growth. Then the enhanced QuickPIC is used to simulate the electron cloud effect on electron beam in the Cornell Energy Recovery Linac (ERL) due to extremely small emittance and high peak currents anticipated in the machine. A tune shift is discovered from the simulation; however, emittance growth of the electron beam in electron cloud is not observed for ERL parameters.
Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava
2017-01-01
For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particlemore » tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offine. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progresses toward the understanding of these processors and the new developments to port the Kalman filter to NVIDIA GPUs.« less
NASA Astrophysics Data System (ADS)
Tanikawa, Ataru; Yoshikawa, Kohji; Nitadori, Keigo; Okamoto, Takashi
2013-02-01
We have developed a numerical software library for collisionless N-body simulations named "Phantom-GRAPE" which highly accelerates force calculations among particles by use of a new SIMD instruction set extension to the x86 architecture, Advanced Vector eXtensions (AVX), an enhanced version of the Streaming SIMD Extensions (SSE). In our library, not only the Newton's forces, but also central forces with an arbitrary shape f(r), which has a finite cutoff radius rcut (i.e. f(r)=0 at r>rcut), can be quickly computed. In computing such central forces with an arbitrary force shape f(r), we refer to a pre-calculated look-up table. We also present a new scheme to create the look-up table whose binning is optimal to keep good accuracy in computing forces and whose size is small enough to avoid cache misses. Using an Intel Core i7-2600 processor, we measure the performance of our library for both of the Newton's forces and the arbitrarily shaped central forces. In the case of Newton's forces, we achieve 2×109 interactions per second with one processor core (or 75 GFLOPS if we count 38 operations per interaction), which is 20 times higher than the performance of an implementation without any explicit use of SIMD instructions, and 2 times than that with the SSE instructions. With four processor cores, we obtain the performance of 8×109 interactions per second (or 300 GFLOPS). In the case of the arbitrarily shaped central forces, we can calculate 1×109 and 4×109 interactions per second with one and four processor cores, respectively. The performance with one processor core is 6 times and 2 times higher than those of the implementations without any use of SIMD instructions and with the SSE instructions. These performances depend only weakly on the number of particles, irrespective of the force shape. It is good contrast with the fact that the performance of force calculations accelerated by graphics processing units (GPUs) depends strongly on the number of particles. Substantially weak dependence of the performance on the number of particles is suitable to collisionless N-body simulations, since these simulations are usually performed with sophisticated N-body solvers such as Tree- and TreePM-methods combined with an individual timestep scheme. We conclude that collisionless N-body simulations accelerated with our library have significant advantage over those accelerated by GPUs, especially on massively parallel environments.
Environmentally adaptive processing for shallow ocean applications: A sequential Bayesian approach.
Candy, J V
2015-09-01
The shallow ocean is a changing environment primarily due to temperature variations in its upper layers directly affecting sound propagation throughout. The need to develop processors capable of tracking these changes implies a stochastic as well as an environmentally adaptive design. Bayesian techniques have evolved to enable a class of processors capable of performing in such an uncertain, nonstationary (varying statistics), non-Gaussian, variable shallow ocean environment. A solution to this problem is addressed by developing a sequential Bayesian processor capable of providing a joint solution to the modal function tracking and environmental adaptivity problem. Here, the focus is on the development of both a particle filter and an unscented Kalman filter capable of providing reasonable performance for this problem. These processors are applied to hydrophone measurements obtained from a vertical array. The adaptivity problem is attacked by allowing the modal coefficients and/or wavenumbers to be jointly estimated from the noisy measurement data along with tracking of the modal functions while simultaneously enhancing the noisy pressure-field measurements.
Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J
2004-09-01
We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster consisting of 800 MHz Intel Pentium III processor shows an almost linear speedup up to 32 processors for simulating 1 x 10(8) or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability) which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors on increasing the problem size up to 8 x 10(8) histories. For a smaller number of histories (1 x 10(8)) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 x 10(8) histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.
Laser velocimetry: A state-of-the-art overview
NASA Technical Reports Server (NTRS)
Stevenson, W. H.
1982-01-01
General systems design and optical and signal processing requirements for laser velocimetric measurement of flows are reviewed. Bias errors which occur in measurements using burst (counter) processors are discussed and particle seeding requirements are suggested.
NextGen Weather Processor Architecture Study
2010-03-09
achieved in a staged fashion , ideally with new components coming on-line in time to replace existing capabilities prior to their end-of-life dates... fashion , ideally with new components coming on-line in time to replace existing capabilities prior to their end-of-life dates. As part of NWP Segment...weather capabilities. These objectives are to be achieved in a staged fashion , ideally with new components coming on-line in time to replace existing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Procassini, R.J.
1997-12-31
The fine-scale, multi-space resolution that is envisioned for accurate simulations of complex weapons systems in three spatial dimensions implies flop-rate and memory-storage requirements that will only be obtained in the near future through the use of parallel computational techniques. Since the Monte Carlo transport models in these simulations usually stress both of these computational resources, they are prime candidates for parallelization. The MONACO Monte Carlo transport package, which is currently under development at LLNL, will utilize two types of parallelism within the context of a multi-physics design code: decomposition of the spatial domain across processors (spatial parallelism) and distribution ofmore » particles in a given spatial subdomain across additional processors (particle parallelism). This implementation of the package will utilize explicit data communication between domains (message passing). Such a parallel implementation of a Monte Carlo transport model will result in non-deterministic communication patterns. The communication of particles between subdomains during a Monte Carlo time step may require a significant level of effort to achieve a high parallel efficiency.« less
NASA Technical Reports Server (NTRS)
Pordes, Ruth (Editor)
1989-01-01
Papers on real-time computer applications in nuclear, particle, and plasma physics are presented, covering topics such as expert systems tactics in testing FASTBUS segment interconnect modules, trigger control in a high energy physcis experiment, the FASTBUS read-out system for the Aleph time projection chamber, a multiprocessor data acquisition systems, DAQ software architecture for Aleph, a VME multiprocessor system for plasma control at the JT-60 upgrade, and a multiasking, multisinked, multiprocessor data acquisition front end. Other topics include real-time data reduction using a microVAX processor, a transputer based coprocessor for VEDAS, simulation of a macropipelined multi-CPU event processor for use in FASTBUS, a distributed VME control system for the LISA superconducting Linac, a distributed system for laboratory process automation, and a distributed system for laboratory process automation. Additional topics include a structure macro assembler for the event handler, a data acquisition and control system for Thomson scattering on ATF, remote procedure execution software for distributed systems, and a PC-based graphic display real-time particle beam uniformity.
NASA Astrophysics Data System (ADS)
Qiang, Ji
2017-10-01
A three-dimensional (3D) Poisson solver with longitudinal periodic and transverse open boundary conditions can have important applications in beam physics of particle accelerators. In this paper, we present a fast efficient method to solve the Poisson equation using a spectral finite-difference method. This method uses a computational domain that contains the charged particle beam only and has a computational complexity of O(Nu(logNmode)) , where Nu is the total number of unknowns and Nmode is the maximum number of longitudinal or azimuthal modes. This saves both the computational time and the memory usage of using an artificial boundary condition in a large extended computational domain. The new 3D Poisson solver is parallelized using a message passing interface (MPI) on multi-processor computers and shows a reasonable parallel performance up to hundreds of processor cores.
NASA Astrophysics Data System (ADS)
Li, Jiaqiang; Choutko, Vitaly; Xiao, Liyi
2018-03-01
Based on the collection of error data from the Alpha Magnetic Spectrometer (AMS) Digital Signal Processors (DSP), on-orbit Single Event Upsets (SEUs) of the DSP program memory are analyzed. The daily error distribution and time intervals between errors are calculated to evaluate the reliability of the system. The particle density distribution of International Space Station (ISS) orbit is presented and the effects from the South Atlantic Anomaly (SAA) and the geomagnetic poles are analyzed. The impact of solar events on the DSP program memory is carried out combining data analysis and Monte Carlo simulation (MC). From the analysis and simulation results, it is concluded that the area corresponding to the SAA is the main source of errors on the ISS orbit. Solar events can also cause errors on DSP program memory, but the effect depends on the on-orbit particle density.
Design and performance of a high resolution, low latency stripline beam position monitor system
NASA Astrophysics Data System (ADS)
Apsimon, R. J.; Bett, D. R.; Blaskovic Kraljevic, N.; Burrows, P. N.; Christian, G. B.; Clarke, C. I.; Constance, B. D.; Dabiri Khah, H.; Davis, M. R.; Perry, C.; Resta López, J.; Swinson, C. J.
2015-03-01
A high-resolution, low-latency beam position monitor (BPM) system has been developed for use in particle accelerators and beam lines that operate with trains of particle bunches with bunch separations as low as several tens of nanoseconds, such as future linear electron-positron colliders and free-electron lasers. The system was tested with electron beams in the extraction line of the Accelerator Test Facility at the High Energy Accelerator Research Organization (KEK) in Japan. It consists of three stripline BPMs instrumented with analogue signal-processing electronics and a custom digitizer for logging the data. The design of the analogue processor units is presented in detail, along with measurements of the system performance. The processor latency is 15.6 ±0.1 ns . A single-pass beam position resolution of 291 ±10 nm has been achieved, using a beam with a bunch charge of approximately 1 nC.
NASA Technical Reports Server (NTRS)
Sastry, V. S.
1980-01-01
The nature of Brownian motion and historical theoretical investigations of the phenomemon are reviewed. The feasibility of using a laser anemometer to perform small particle experiments in an orbiting space laboratory was investigated using latex particles suspended in water in a plastic container. The optical equipment and the particle Doppler analysis processor are described. The values of the standard deviation obtained for the latex particle motion experiment were significantly large compared to corresponding velocity, therefore, their accuracy was suspect and no attempt was made to draw meaningful conclusions from the results.
SoAx: A generic C++ Structure of Arrays for handling particles in HPC codes
NASA Astrophysics Data System (ADS)
Homann, Holger; Laenen, Francois
2018-03-01
The numerical study of physical problems often require integrating the dynamics of a large number of particles evolving according to a given set of equations. Particles are characterized by the information they are carrying such as an identity, a position other. There are generally speaking two different possibilities for handling particles in high performance computing (HPC) codes. The concept of an Array of Structures (AoS) is in the spirit of the object-oriented programming (OOP) paradigm in that the particle information is implemented as a structure. Here, an object (realization of the structure) represents one particle and a set of many particles is stored in an array. In contrast, using the concept of a Structure of Arrays (SoA), a single structure holds several arrays each representing one property (such as the identity) of the whole set of particles. The AoS approach is often implemented in HPC codes due to its handiness and flexibility. For a class of problems, however, it is known that the performance of SoA is much better than that of AoS. We confirm this observation for our particle problem. Using a benchmark we show that on modern Intel Xeon processors the SoA implementation is typically several times faster than the AoS one. On Intel's MIC co-processors the performance gap even attains a factor of ten. The same is true for GPU computing, using both computational and multi-purpose GPUs. Combining performance and handiness, we present the library SoAx that has optimal performance (on CPUs, MICs, and GPUs) while providing the same handiness as AoS. For this, SoAx uses modern C++ design techniques such template meta programming that allows to automatically generate code for user defined heterogeneous data structures.
7 CFR 274.3 - Retailer management.
Code of Federal Regulations, 2012 CFR
2012-01-01
... retailer, and it must include acceptable privacy and security features. Such systems shall only be... terminals that are capable of relaying electronic transactions to a central database computer for... specifications prior to implementation of the EBT system to enable third party processors to access the database...
The 2nd Symposium on the Frontiers of Massively Parallel Computations
NASA Technical Reports Server (NTRS)
Mills, Ronnie (Editor)
1988-01-01
Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.
An FPGA computing demo core for space charge simulation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Jinyuan; Huang, Yifei; /Fermilab
2009-01-01
In accelerator physics, space charge simulation requires large amount of computing power. In a particle system, each calculation requires time/resource consuming operations such as multiplications, divisions, and square roots. Because of the flexibility of field programmable gate arrays (FPGAs), we implemented this task with efficient use of the available computing resources and completely eliminated non-calculating operations that are indispensable in regular micro-processors (e.g. instruction fetch, instruction decoding, etc.). We designed and tested a 16-bit demo core for computing Coulomb's force in an Altera Cyclone II FPGA device. To save resources, the inverse square-root cube operation in our design is computedmore » using a memory look-up table addressed with nine to ten most significant non-zero bits. At 200 MHz internal clock, our demo core reaches a throughput of 200 M pairs/s/core, faster than a typical 2 GHz micro-processor by about a factor of 10. Temperature and power consumption of FPGAs were also lower than those of micro-processors. Fast and convenient, FPGAs can serve as alternatives to time-consuming micro-processors for space charge simulation.« less
Electro-optic voltage sensor with beam splitting
Woods, Gregory K.; Renak, Todd W.; Davidson, James R.; Crawford, Thomas M.
2002-01-01
The invention is a miniature electro-optic voltage sensor system capable of accurate operation at high voltages without use of the dedicated voltage dividing hardware typically found in the prior art. The invention achieves voltage measurement without significant error contributions from neighboring conductors or environmental perturbations. The invention employs a transmitter, a sensor, a detector, and a signal processor. The transmitter produces a beam of electromagnetic radiation which is routed into the sensor. Within the sensor the beam undergoes the Pockels electro-optic effect. The electro-optic effect produces a modulation of the beam's polarization, which is in turn converted to a pair of independent conversely-amplitude-modulated signals, from which the voltage of the E-field is determined by the signal processor. The use of converse AM signals enables the signal processor to better distinguish signal from noise. The sensor converts the beam by splitting the beam in accordance with the axes of the beam's polarization state (an ellipse) into at least two AM signals. These AM signals are fed into a signal processor and processed to determine the voltage between a ground conductor and the conductor on which voltage is being measured.
7 CFR 457.168 - Mustard crop insurance provisions.
Code of Federal Regulations, 2012 CFR
2012-01-01
.... Harvest. Combining or threshing for seed. A crop that is swathed prior to combining is not considered... contained in the Basic Provisions, mustard seed must be planted in rows. Acreage planted in any other manner... written agreement. Processor. Any business enterprise regularly engaged in buying and processing mustard...
7 CFR 457.168 - Mustard crop insurance provisions.
Code of Federal Regulations, 2014 CFR
2014-01-01
.... Harvest. Combining or threshing for seed. A crop that is swathed prior to combining is not considered... contained in the Basic Provisions, mustard seed must be planted in rows. Acreage planted in any other manner... written agreement. Processor. Any business enterprise regularly engaged in buying and processing mustard...
Code of Federal Regulations, 2014 CFR
2014-10-01
... fishery allocations. Prior to the setting of fishery allocations, the TAC, ACL, or ACT when specified, is... in the non-groundfish fishery that is deducted from the ACL or ACT when specified. (2) The commercial... from the TAC, OY, ACL, or ACT when specified. For the catcher/processor and mothership sectors of the...
Code of Federal Regulations, 2013 CFR
2013-10-01
... fishery allocations. Prior to the setting of fishery allocations, the TAC, ACL, or ACT when specified, is... in the non-groundfish fishery that is deducted from the ACL or ACT when specified. (2) The commercial... from the TAC, OY, ACL, or ACT when specified. For the catcher/processor and mothership sectors of the...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wetzstein, M.; Nelson, Andrew F.; Naab, T.
2009-10-01
We present a numerical code for simulating the evolution of astrophysical systems using particles to represent the underlying fluid flow. The code is written in Fortran 95 and is designed to be versatile, flexible, and extensible, with modular options that can be selected either at the time the code is compiled or at run time through a text input file. We include a number of general purpose modules describing a variety of physical processes commonly required in the astrophysical community and we expect that the effort required to integrate additional or alternate modules into the code will be small. Inmore » its simplest form the code can evolve the dynamical trajectories of a set of particles in two or three dimensions using a module which implements either a Leapfrog or Runge-Kutta-Fehlberg integrator, selected by the user at compile time. The user may choose to allow the integrator to evolve the system using individual time steps for each particle or with a single, global time step for all. Particles may interact gravitationally as N-body particles, and all or any subset may also interact hydrodynamically, using the smoothed particle hydrodynamic (SPH) method by selecting the SPH module. A third particle species can be included with a module to model massive point particles which may accrete nearby SPH or N-body particles. Such particles may be used to model, e.g., stars in a molecular cloud. Free boundary conditions are implemented by default, and a module may be selected to include periodic boundary conditions. We use a binary 'Press' tree to organize particles for rapid access in gravity and SPH calculations. Modules implementing an interface with special purpose 'GRAPE' hardware may also be selected to accelerate the gravity calculations. If available, forces obtained from the GRAPE coprocessors may be transparently substituted for those obtained from the tree, or both tree and GRAPE may be used as a combination GRAPE/tree code. The code may be run without modification on single processors or in parallel using OpenMP compiler directives on large-scale, shared memory parallel machines. We present simulations of several test problems, including a merger simulation of two elliptical galaxies with 800,000 particles. In comparison to the Gadget-2 code of Springel, the gravitational force calculation, which is the most costly part of any simulation including self-gravity, is {approx}4.6-4.9 times faster with VINE when tested on different snapshots of the elliptical galaxy merger simulation when run on an Itanium 2 processor in an SGI Altix. A full simulation of the same setup with eight processors is a factor of 2.91 faster with VINE. The code is available to the public under the terms of the Gnu General Public License.« less
NASA Astrophysics Data System (ADS)
Wetzstein, M.; Nelson, Andrew F.; Naab, T.; Burkert, A.
2009-10-01
We present a numerical code for simulating the evolution of astrophysical systems using particles to represent the underlying fluid flow. The code is written in Fortran 95 and is designed to be versatile, flexible, and extensible, with modular options that can be selected either at the time the code is compiled or at run time through a text input file. We include a number of general purpose modules describing a variety of physical processes commonly required in the astrophysical community and we expect that the effort required to integrate additional or alternate modules into the code will be small. In its simplest form the code can evolve the dynamical trajectories of a set of particles in two or three dimensions using a module which implements either a Leapfrog or Runge-Kutta-Fehlberg integrator, selected by the user at compile time. The user may choose to allow the integrator to evolve the system using individual time steps for each particle or with a single, global time step for all. Particles may interact gravitationally as N-body particles, and all or any subset may also interact hydrodynamically, using the smoothed particle hydrodynamic (SPH) method by selecting the SPH module. A third particle species can be included with a module to model massive point particles which may accrete nearby SPH or N-body particles. Such particles may be used to model, e.g., stars in a molecular cloud. Free boundary conditions are implemented by default, and a module may be selected to include periodic boundary conditions. We use a binary "Press" tree to organize particles for rapid access in gravity and SPH calculations. Modules implementing an interface with special purpose "GRAPE" hardware may also be selected to accelerate the gravity calculations. If available, forces obtained from the GRAPE coprocessors may be transparently substituted for those obtained from the tree, or both tree and GRAPE may be used as a combination GRAPE/tree code. The code may be run without modification on single processors or in parallel using OpenMP compiler directives on large-scale, shared memory parallel machines. We present simulations of several test problems, including a merger simulation of two elliptical galaxies with 800,000 particles. In comparison to the Gadget-2 code of Springel, the gravitational force calculation, which is the most costly part of any simulation including self-gravity, is ~4.6-4.9 times faster with VINE when tested on different snapshots of the elliptical galaxy merger simulation when run on an Itanium 2 processor in an SGI Altix. A full simulation of the same setup with eight processors is a factor of 2.91 faster with VINE. The code is available to the public under the terms of the Gnu General Public License.
Superposing pure quantum states with partial prior information
NASA Astrophysics Data System (ADS)
Dogra, Shruti; Thomas, George; Ghosh, Sibasish; Suter, Dieter
2018-05-01
The principle of superposition is an intriguing feature of quantum mechanics, which is regularly exploited in many different circumstances. A recent work [M. Oszmaniec et al., Phys. Rev. Lett. 116, 110403 (2016), 10.1103/PhysRevLett.116.110403] shows that the fundamentals of quantum mechanics restrict the process of superimposing two unknown pure states, even though it is possible to superimpose two quantum states with partial prior knowledge. The prior knowledge imposes geometrical constraints on the choice of input states. We discuss an experimentally feasible protocol to superimpose multiple pure states of a d -dimensional quantum system and carry out an explicit experimental realization for two single-qubit pure states with partial prior information on a two-qubit NMR quantum information processor.
Reconfigurable Drive Current System
NASA Technical Reports Server (NTRS)
Alhorn, Dean C. (Inventor); Dutton, Kenneth R. (Inventor); Howard, David E. (Inventor); Smith, Dennis A. (Inventor)
2017-01-01
A reconfigurable drive current system includes drive stages, each of which includes a high-side transistor and a low-side transistor in a totem pole configuration. A current monitor is coupled to an output of each drive stage. Input channels are provided to receive input signals. A processor is coupled to the input channels and to each current monitor for generating at least one drive signal using at least one of the input signals and current measured by at least one of the current monitors. A pulse width modulation generator is coupled to the processor and each drive stage for varying the drive signals as a function of time prior to being supplied to at least one of the drive stages.
Experience with custom processors in space flight applications
NASA Technical Reports Server (NTRS)
Fraeman, M. E.; Hayes, J. R.; Lohr, D. A.; Ballard, B. W.; Williams, R. L.; Henshaw, R. M.
1991-01-01
The Applied Physics Laboratory (APL) has developed a magnetometer instrument for a swedish satellite named Freja with launch scheduled for August 1992 on a Chinese Long March rocket. The magnetometer controller utilized a custom microprocessor designed at APL with the Genesil silicon compiler. The processor evolved from our experience with an older bit-slice design and two prior single chip efforts. The architecture of our microprocessor greatly lowered software development costs because it was optimized to provide an interactive and extensible programming environment hosted by the target hardware. Radiation tolerance of the microprocessor was also tested and was adequate for Freja's mission -- 20 kRad(Si) total dose and very infrequent latch-up and single event upset events.
Using an ARM Processor to boost data acquisition rates
NASA Astrophysics Data System (ADS)
Brown, Anthony; Seaquest Collaboration
2015-10-01
It has been proposed, Fermilab E-1067, to use the SeaQuest (E906/E1039/1037) dimuon spectrometer to do a search for the dark photon and dark Higgs. The concept is that it would run in a parasitic mode with only minor upgrades to the spectrometer. There are various requirements for the upgrades but one of them is to increase the DAQ rates and one minimal cost approach to do this will be discussed. The currently running SeaQuest (E906) experiment has modest rate requirements of around 1 kHz. Since the dark particle search would involve recording particles originating in the first magnet used as a beam dump, the data rate will be higher than recording events just from the target. Thus the DAQ rate capability will need to be increased to around 10 kHz. There exists a possible very low cost solution as the Academica Sinica designed TDCs contains an ARM processor that was not needed to meet the original SeaQuest (E906 needs). Since the 120 GeV beam from the Main Injector is delivered in a 4 second spill, once per minute and the ARM processor on the TDC has two dual-ported memory chips, these could be used to store data during each spill and then read the data out in the time between spills.
The artificial retina processor for track reconstruction at the LHC crossing rate
Abba, A.; Bedeschi, F.; Citterio, M.; ...
2015-03-16
We present results of an R&D study for a specialized processor capable of precisely reconstructing, in pixel detectors, hundreds of charged-particle tracks from high-energy collisions at 40 MHz rate. We apply a highly parallel pattern-recognition algorithm, inspired by studies of the processing of visual images by the brain as it happens in nature, and describe in detail an efficient hardware implementation in high-speed, high-bandwidth FPGA devices. This is the first detailed demonstration of reconstruction of offline-quality tracks at 40 MHz and makes the device suitable for processing Large Hadron Collider events at the full crossing frequency.
Computational Particle Dynamic Simulations on Multicore Processors (CPDMu) Final Report Phase I
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schmalz, Mark S
2011-07-24
Statement of Problem - Department of Energy has many legacy codes for simulation of computational particle dynamics and computational fluid dynamics applications that are designed to run on sequential processors and are not easily parallelized. Emerging high-performance computing architectures employ massively parallel multicore architectures (e.g., graphics processing units) to increase throughput. Parallelization of legacy simulation codes is a high priority, to achieve compatibility, efficiency, accuracy, and extensibility. General Statement of Solution - A legacy simulation application designed for implementation on mainly-sequential processors has been represented as a graph G. Mathematical transformations, applied to G, produce a graph representation {und G}more » for a high-performance architecture. Key computational and data movement kernels of the application were analyzed/optimized for parallel execution using the mapping G {yields} {und G}, which can be performed semi-automatically. This approach is widely applicable to many types of high-performance computing systems, such as graphics processing units or clusters comprised of nodes that contain one or more such units. Phase I Accomplishments - Phase I research decomposed/profiled computational particle dynamics simulation code for rocket fuel combustion into low and high computational cost regions (respectively, mainly sequential and mainly parallel kernels), with analysis of space and time complexity. Using the research team's expertise in algorithm-to-architecture mappings, the high-cost kernels were transformed, parallelized, and implemented on Nvidia Fermi GPUs. Measured speedups (GPU with respect to single-core CPU) were approximately 20-32X for realistic model parameters, without final optimization. Error analysis showed no loss of computational accuracy. Commercial Applications and Other Benefits - The proposed research will constitute a breakthrough in solution of problems related to efficient parallel computation of particle and fluid dynamics simulations. These problems occur throughout DOE, military and commercial sectors: the potential payoff is high. We plan to license or sell the solution to contractors for military and domestic applications such as disaster simulation (aerodynamic and hydrodynamic), Government agencies (hydrological and environmental simulations), and medical applications (e.g., in tomographic image reconstruction). Keywords - High-performance Computing, Graphic Processing Unit, Fluid/Particle Simulation. Summary for Members of Congress - Department of Energy has many simulation codes that must compute faster, to be effective. The Phase I research parallelized particle/fluid simulations for rocket combustion, for high-performance computing systems.« less
An artificial retina processor for track reconstruction at the LHC crossing rate
Bedeschi, F.; Cenci, R.; Marino, P.; ...
2017-11-23
The goal of the INFN-RETINA R&D project is to develop and implement a computational methodology that allows to reconstruct events with a large number (> 100) of charged-particle tracks in pixel and silicon strip detectors at 40 MHz, thus matching the requirements for processing LHC events at the full bunch-crossing frequency. Our approach relies on a parallel pattern-recognition algorithm, dubbed artificial retina, inspired by the early stages of image processing by the brain. In order to demonstrate that a track-processing system based on this algorithm is feasible, we built a sizable prototype of a tracking processor tuned to 3 000more » patterns, based on already existing readout boards equipped with Altera Stratix III FPGAs. The detailed geometry and charged-particle activity of a large tracking detector currently in operation are used to assess its performances. Here, we report on the test results with such a prototype.« less
An artificial retina processor for track reconstruction at the LHC crossing rate
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bedeschi, F.; Cenci, R.; Marino, P.
The goal of the INFN-RETINA R&D project is to develop and implement a computational methodology that allows to reconstruct events with a large number (> 100) of charged-particle tracks in pixel and silicon strip detectors at 40 MHz, thus matching the requirements for processing LHC events at the full bunch-crossing frequency. Our approach relies on a parallel pattern-recognition algorithm, dubbed artificial retina, inspired by the early stages of image processing by the brain. In order to demonstrate that a track-processing system based on this algorithm is feasible, we built a sizable prototype of a tracking processor tuned to 3 000more » patterns, based on already existing readout boards equipped with Altera Stratix III FPGAs. The detailed geometry and charged-particle activity of a large tracking detector currently in operation are used to assess its performances. Here, we report on the test results with such a prototype.« less
CoNNeCT Baseband Processor Module
NASA Technical Reports Server (NTRS)
Yamamoto, Clifford K; Jedrey, Thomas C.; Gutrich, Daniel G.; Goodpasture, Richard L.
2011-01-01
A document describes the CoNNeCT Baseband Processor Module (BPM) based on an updated processor, memory technology, and field-programmable gate arrays (FPGAs). The BPM was developed from a requirement to provide sufficient computing power and memory storage to conduct experiments for a Software Defined Radio (SDR) to be implemented. The flight SDR uses the AT697 SPARC processor with on-chip data and instruction cache. The non-volatile memory has been increased from a 20-Mbit EEPROM (electrically erasable programmable read only memory) to a 4-Gbit Flash, managed by the RTAX2000 Housekeeper, allowing more programs and FPGA bit-files to be stored. The volatile memory has been increased from a 20-Mbit SRAM (static random access memory) to a 1.25-Gbit SDRAM (synchronous dynamic random access memory), providing additional memory space for more complex operating systems and programs to be executed on the SPARC. All memory is EDAC (error detection and correction) protected, while the SPARC processor implements fault protection via TMR (triple modular redundancy) architecture. Further capability over prior BPM designs includes the addition of a second FPGA to implement features beyond the resources of a single FPGA. Both FPGAs are implemented with Xilinx Virtex-II and are interconnected by a 96-bit bus to facilitate data exchange. Dedicated 1.25- Gbit SDRAMs are wired to each Xilinx FPGA to accommodate high rate data buffering for SDR applications as well as independent SpaceWire interfaces. The RTAX2000 manages scrub and configuration of each Xilinx.
CT Imaging of Hardwood Logs for Lumber Production
Daniel L. Schmoldt; Pei Li; A. Lynn Abbott
1996-01-01
Hardwood sawmill operators need to improve the conversion of raw material (logs) into lumber. Internal log scanning provides detailed information that can aid log processors in improving lumber recovery. However, scanner data (i.e. tomographic images) need to be analyzed prior to presentation to saw operators. Automatic labeling of computer tomography (CT) images is...
Perceptions of Ability to Program or to Use a Word Processor.
ERIC Educational Resources Information Center
Colley, Ann; And Others
1996-01-01
This study examined 117 undergraduates' perceptions of ability at computer programming and word processing. In particular, it rated the importance of prior experience factors, keyboarding skills, and personal attributes such as enjoyment of problem solving. Those were discovered, in general, to be more important than formal training or aptitude in…
Speech production in experienced cochlear implant users undergoing short-term auditory deprivation
NASA Astrophysics Data System (ADS)
Greenman, Geoffrey; Tjaden, Kris; Kozak, Alexa T.
2005-09-01
This study examined the effect of short-term auditory deprivation on the speech production of five postlingually deafened women, all of whom were experienced cochlear implant users. Each cochlear implant user, as well as age and gender matched control speakers, produced CVC target words embedded in a reading passage. Speech samples for the deafened adults were collected on two separate occasions. First, the speakers were recorded after wearing their speech processor consistently for at least two to three hours prior to recording (implant ``ON''). The second recording occurred when the speakers had their speech processors turned off for approximately ten to twelve hours prior to recording (implant ``OFF''). Acoustic measures, including fundamental frequency (F0), the first (F1) and second (F2) formants of the vowels, vowel space area, vowel duration, spectral moments of the consonants, as well as utterance duration and sound pressure level (SPL) across the entire utterance were analyzed in both speaking conditions. For each implant speaker, acoustic measures will be compared across implant ``ON'' and implant ``OFF'' speaking conditions, and will also be compared to data obtained from normal hearing speakers.
Energetic Particle Loss Estimates in W7-X
NASA Astrophysics Data System (ADS)
Lazerson, Samuel; Akaslompolo, Simppa; Drevlak, Micheal; Wolf, Robert; Darrow, Douglass; Gates, David; W7-X Team
2017-10-01
The collisionless loss of high energy H+ and D+ ions in the W7-X device are examined using the BEAMS3D code. Simulations of collisionless losses are performed for a large ensemble of particles distributed over various flux surfaces. A clear loss cone of particles is present in the distribution for all particles. These simulations are compared against slowing down simulations in which electron impact, ion impact, and pitch angle scattering are considered. Full device simulations allow tracing of particle trajectories to the first wall components. These simulations provide estimates for placement of a novel set of energetic particle detectors. Recent performance upgrades to the code are allowing simulations with > 1000 processors providing high fidelity simulations. Speedup and future works are discussed. DE-AC02-09CH11466.
TiCl4 as a source of TiO2 particles for laser anemometry measurements in hot gas
NASA Technical Reports Server (NTRS)
Weikle, Donald H.; Seasholtz, Richard G.; Oberle, Lawrence G.
1990-01-01
A method of reacting TiCl4 with water saturated gaseous nitrogen (GN2) at the entrance into a high temperature gas flow is described. The TiO2 particles formed are then entrained in the gas flow and used as seed particles for making laser anemometry (LA) measurements of the flow velocity distribution in the hot gas. Scanning electron microscope photographs of the TiO2 particles are shown. Data rate of the LA processor was measured to determine the amount of TiO2 formed. The TiCl4 and mixing gas flow diagram is shown. This work was performed in an open jet burner.
IMPETUS - Interactive MultiPhysics Environment for Unified Simulations.
Ha, Vi Q; Lykotrafitis, George
2016-12-08
We introduce IMPETUS - Interactive MultiPhysics Environment for Unified Simulations, an object oriented, easy-to-use, high performance, C++ program for three-dimensional simulations of complex physical systems that can benefit a large variety of research areas, especially in cell mechanics. The program implements cross-communication between locally interacting particles and continuum models residing in the same physical space while a network facilitates long-range particle interactions. Message Passing Interface is used for inter-processor communication for all simulations. Copyright © 2016 Elsevier Ltd. All rights reserved.
Dynamic load balancing algorithm for molecular dynamics based on Voronoi cells domain decompositions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fattebert, J.-L.; Richards, D.F.; Glosli, J.N.
2012-12-01
We present a new algorithm for automatic parallel load balancing in classical molecular dynamics. It assumes a spatial domain decomposition of particles into Voronoi cells. It is a gradient method which attempts to minimize a cost function by displacing Voronoi sites associated with each processor/sub-domain along steepest descent directions. Excellent load balance has been obtained for quasi-2D and 3D practical applications, with up to 440·10 6 particles on 65,536 MPI tasks.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fragoso, Margarida; Kawrakow, Iwan; Faddegon, Bruce A.
In this work, an investigation of efficiency enhancing methods and cross-section data in the BEAMnrc Monte Carlo (MC) code system is presented. Additionally, BEAMnrc was compared with VMC++, another special-purpose MC code system that has recently been enhanced for the simulation of the entire treatment head. BEAMnrc and VMC++ were used to simulate a 6 MV photon beam from a Siemens Primus linear accelerator (linac) and phase space (PHSP) files were generated at 100 cm source-to-surface distance for the 10x10 and 40x40 cm{sup 2} field sizes. The BEAMnrc parameters/techniques under investigation were grouped by (i) photon and bremsstrahlung cross sections,more » (ii) approximate efficiency improving techniques (AEITs), (iii) variance reduction techniques (VRTs), and (iv) a VRT (bremsstrahlung photon splitting) in combination with an AEIT (charged particle range rejection). The BEAMnrc PHSP file obtained without the efficiency enhancing techniques under study or, when not possible, with their default values (e.g., EXACT algorithm for the boundary crossing algorithm) and with the default cross-section data (PEGS4 and Bethe-Heitler) was used as the ''base line'' for accuracy verification of the PHSP files generated from the different groups described previously. Subsequently, a selection of the PHSP files was used as input for DOSXYZnrc-based water phantom dose calculations, which were verified against measurements. The performance of the different VRTs and AEITs available in BEAMnrc and of VMC++ was specified by the relative efficiency, i.e., by the efficiency of the MC simulation relative to that of the BEAMnrc base-line calculation. The highest relative efficiencies were {approx}935 ({approx}111 min on a single 2.6 GHz processor) and {approx}200 ({approx}45 min on a single processor) for the 10x10 field size with 50 million histories and 40x40 cm{sup 2} field size with 100 million histories, respectively, using the VRT directional bremsstrahlung splitting (DBS) with no electron splitting. When DBS was used with electron splitting and combined with augmented charged particle range rejection, a technique recently introduced in BEAMnrc, relative efficiencies were {approx}420 ({approx}253 min on a single processor) and {approx}175 ({approx}58 min on a single processor) for the 10x10 and 40x40 cm{sup 2} field sizes, respectively. Calculations of the Siemens Primus treatment head with VMC++ produced relative efficiencies of {approx}1400 ({approx}6 min on a single processor) and {approx}60 ({approx}4 min on a single processor) for the 10x10 and 40x40 cm{sup 2} field sizes, respectively. BEAMnrc PHSP calculations with DBS alone or DBS in combination with charged particle range rejection were more efficient than the other efficiency enhancing techniques used. Using VMC++, accurate simulations of the entire linac treatment head were performed within minutes on a single processor. Noteworthy differences ({+-}1%-3%) in the mean energy, planar fluence, and angular and spectral distributions were observed with the NIST bremsstrahlung cross sections compared with those of Bethe-Heitler (BEAMnrc default bremsstrahlung cross section). However, MC calculated dose distributions in water phantoms (using combinations of VRTs/AEITs and cross-section data) agreed within 2% of measurements. Furthermore, MC calculated dose distributions in a simulated water/air/water phantom, using NIST cross sections, were within 2% agreement with the BEAMnrc Bethe-Heitler default case.« less
NASA Astrophysics Data System (ADS)
Couvidat, F.; Sartelet, K.
2014-01-01
The Secondary Organic Aerosol Processor (SOAP v1.0) model is presented. This model is designed to be modular with different user options depending on the computing time and the complexity required by the user. This model is based on the molecular surrogate approach, in which each surrogate compound is associated with a molecular structure to estimate some properties and parameters (hygroscopicity, absorption on the aqueous phase of particles, activity coefficients, phase separation). Each surrogate can be hydrophilic (condenses only on the aqueous phase of particles), hydrophobic (condenses only on the organic phase of particles) or both (condenses on both the aqueous and the organic phases of particles). Activity coefficients are computed with the UNIFAC thermodynamic model for short-range interactions and with the AIOMFAC parameterization for medium and long-range interactions between electrolytes and organic compounds. Phase separation is determined by Gibbs energy minimization. The user can choose between an equilibrium and a dynamic representation of the organic aerosol. In the equilibrium representation, compounds in the particle phase are assumed to be at equilibrium with the gas phase. However, recent studies show that the organic aerosol (OA) is not at equilibrium with the gas phase because the organic phase could be semi-solid (very viscous liquid phase). The condensation or evaporation of organic compounds could then be limited by the diffusion in the organic phase due to the high viscosity. A dynamic representation of secondary organic aerosols (SOA) is used with OA divided into layers, the first layer at the center of the particle (slowly reaches equilibrium) and the final layer near the interface with the gas phase (quickly reaches equilibrium).
SU-F-SPS-09: Parallel MC Kernel Calculations for VMAT Plan Improvement
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chamberlain, S; Roswell Park Cancer Institute, Buffalo, NY; French, S
Purpose: Adding kernels (small perturbations in leaf positions) to the existing apertures of VMAT control points may improve plan quality. We investigate the calculation of kernel doses using a parallelized Monte Carlo (MC) method. Methods: A clinical prostate VMAT DICOM plan was exported from Eclipse. An arbitrary control point and leaf were chosen, and a modified MLC file was created, corresponding to the leaf position offset by 0.5cm. The additional dose produced by this 0.5 cm × 0.5 cm kernel was calculated using the DOSXYZnrc component module of BEAMnrc. A range of particle history counts were run (varying from 3more » × 10{sup 6} to 3 × 10{sup 7}); each job was split among 1, 10, or 100 parallel processes. A particle count of 3 × 10{sup 6} was established as the lower range because it provided the minimal accuracy level. Results: As expected, an increase in particle counts linearly increases run time. For the lowest particle count, the time varied from 30 hours for the single-processor run, to 0.30 hours for the 100-processor run. Conclusion: Parallel processing of MC calculations in the EGS framework significantly decreases time necessary for each kernel dose calculation. Particle counts lower than 1 × 10{sup 6} have too large of an error to output accurate dose for a Monte Carlo kernel calculation. Future work will investigate increasing the number of parallel processes and optimizing run times for multiple kernel calculations.« less
USDA-ARS?s Scientific Manuscript database
Introduction: Current regulations require that juice processors effect a 5 log CFU/ml reduction of a target pathogen prior to distributing products. Whereas thermal pasteurization reduces the sensory characteristics of juice by altering flavor components, pulsed electric field (PEF) treatment may ...
Code of Federal Regulations, 2011 CFR
2011-07-01
... 40 Protection of Environment 16 2011-07-01 2011-07-01 false What alternative sulfur standards and... ADDITIVES Gasoline Sulfur Gasoline Sulfur Standards § 80.213 What alternative sulfur standards and... TGP meets the applicable sulfur standards under § 80.210 or § 80.220, prior to the TGP leaving the...
Code of Federal Regulations, 2010 CFR
2010-07-01
... 40 Protection of Environment 16 2010-07-01 2010-07-01 false What alternative sulfur standards and... ADDITIVES Gasoline Sulfur Gasoline Sulfur Standards § 80.213 What alternative sulfur standards and... TGP meets the applicable sulfur standards under § 80.210 or § 80.220, prior to the TGP leaving the...
Code of Federal Regulations, 2014 CFR
2014-07-01
... 40 Protection of Environment 17 2014-07-01 2014-07-01 false What alternative sulfur standards and... ADDITIVES Gasoline Sulfur Gasoline Sulfur Standards § 80.213 What alternative sulfur standards and... TGP meets the applicable sulfur standards under § 80.210 or § 80.220, prior to the TGP leaving the...
Code of Federal Regulations, 2013 CFR
2013-07-01
... 40 Protection of Environment 17 2013-07-01 2013-07-01 false What alternative sulfur standards and... ADDITIVES Gasoline Sulfur Gasoline Sulfur Standards § 80.213 What alternative sulfur standards and... TGP meets the applicable sulfur standards under § 80.210 or § 80.220, prior to the TGP leaving the...
Fast particles identification in programmable form at level-0 trigger by means of the 3D-Flow system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Crosetto, Dario B.
1998-10-30
The 3D-Flow Processor system is a new, technology-independent concept in very fast, real-time system architectures. Based on either an FPGA or an ASIC implementation, it can address, in a fully programmable manner, applications where commercially available processors would fail because of throughput requirements. Possible applications include filtering-algorithms (pattern recognition) from the input of multiple sensors, as well as moving any input validated by these filtering-algorithms to a single output channel. Both operations can easily be implemented on a 3D-Flow system to achieve a real-time processing system with a very short lag time. This system can be built either with off-the-shelfmore » FPGAs or, for higher data rates, with CMOS chips containing 4 to 16 processors each. The basic building block of the system, a 3D-Flow processor, has been successfully designed in VHDL code written in ''Generic HDL'' (mostly made of reusable blocks that are synthesizable in different technologies, or FPGAs), to produce a netlist for a four-processor ASIC featuring 0.35 micron CBA (Ceil Base Array) technology at 3.3 Volts, 884 mW power dissipation at 60 MHz and 63.75 mm sq. die size. The same VHDL code has been targeted to three FPGA manufacturers (Altera EPF10K250A, ORCA-Lucent Technologies 0R3T165 and Xilinx XCV1000). A complete set of software tools, the 3D-Flow System Manager, equally applicable to ASIC or FPGA implementations, has been produced to provide full system simulation, application development, real-time monitoring, and run-time fault recovery. Today's technology can accommodate 16 processors per chip in a medium size die, at a cost per processor of less than $5 based on the current silicon die/size technology cost.« less
NASA Technical Reports Server (NTRS)
Venter, Gerhard; Sobieszczanski-Sobieski Jaroslaw
2002-01-01
The purpose of this paper is to show how the search algorithm known as particle swarm optimization performs. Here, particle swarm optimization is applied to structural design problems, but the method has a much wider range of possible applications. The paper's new contributions are improvements to the particle swarm optimization algorithm and conclusions and recommendations as to the utility of the algorithm, Results of numerical experiments for both continuous and discrete applications are presented in the paper. The results indicate that the particle swarm optimization algorithm does locate the constrained minimum design in continuous applications with very good precision, albeit at a much higher computational cost than that of a typical gradient based optimizer. However, the true potential of particle swarm optimization is primarily in applications with discrete and/or discontinuous functions and variables. Additionally, particle swarm optimization has the potential of efficient computation with very large numbers of concurrently operating processors.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nelson, Andrew F.; Wetzstein, M.; Naab, T.
2009-10-01
We continue our presentation of VINE. In this paper, we begin with a description of relevant architectural properties of the serial and shared memory parallel computers on which VINE is intended to run, and describe their influences on the design of the code itself. We continue with a detailed description of a number of optimizations made to the layout of the particle data in memory and to our implementation of a binary tree used to access that data for use in gravitational force calculations and searches for smoothed particle hydrodynamics (SPH) neighbor particles. We describe the modifications to the codemore » necessary to obtain forces efficiently from special purpose 'GRAPE' hardware, the interfaces required to allow transparent substitution of those forces in the code instead of those obtained from the tree, and the modifications necessary to use both tree and GRAPE together as a fused GRAPE/tree combination. We conclude with an extensive series of performance tests, which demonstrate that the code can be run efficiently and without modification in serial on small workstations or in parallel using the OpenMP compiler directives on large-scale, shared memory parallel machines. We analyze the effects of the code optimizations and estimate that they improve its overall performance by more than an order of magnitude over that obtained by many other tree codes. Scaled parallel performance of the gravity and SPH calculations, together the most costly components of most simulations, is nearly linear up to at least 120 processors on moderate sized test problems using the Origin 3000 architecture, and to the maximum machine sizes available to us on several other architectures. At similar accuracy, performance of VINE, used in GRAPE-tree mode, is approximately a factor 2 slower than that of VINE, used in host-only mode. Further optimizations of the GRAPE/host communications could improve the speed by as much as a factor of 3, but have not yet been implemented in VINE. Finally, we find that although parallel performance on small problems may reach a plateau beyond which more processors bring no additional speedup, performance never decreases, a factor important for running large simulations on many processors with individual time steps, where only a small fraction of the total particles require updates at any given moment.« less
Multi-Parameter Scattering Sensor and Methods
NASA Technical Reports Server (NTRS)
Greenberg, Paul S. (Inventor); Fischer, David G. (Inventor)
2016-01-01
Methods, detectors and systems detect particles and/or measure particle properties. According to one embodiment, a detector for detecting particles comprises: a sensor for receiving radiation scattered by an ensemble of particles; and a processor for determining a physical parameter for the detector, or an optimal detection angle or a bound for an optimal detection angle, for measuring at least one moment or integrated moment of the ensemble of particles, the physical parameter, or detection angle, or detection angle bound being determined based on one or more of properties (a) and/or (b) and/or (c) and/or (d) or ranges for one or more of properties (a) and/or (b) and/or (c) and/or (d), wherein (a)-(d) are the following: (a) is a wavelength of light incident on the particles, (b) is a count median diameter or other characteristic size parameter of the particle size distribution, (c) is a standard deviation or other characteristic width parameter of the particle size distribution, and (d) is a refractive index of particles.
USDA-ARS?s Scientific Manuscript database
BACKGROUND: Texture is a major quality parameter for the acceptability of canned whole beans. Prior knowledge of this quality trait before processing would be useful to guide variety development by bean breeders and optimize handling protocols by processors. The objective of this study was to evalua...
Particle Number Dependence of the N-body Simulations of Moon Formation
NASA Astrophysics Data System (ADS)
Sasaki, Takanori; Hosono, Natsuki
2018-04-01
The formation of the Moon from the circumterrestrial disk has been investigated by using N-body simulations with the number N of particles limited from 104 to 105. We develop an N-body simulation code on multiple Pezy-SC processors and deploy Framework for Developing Particle Simulators to deal with large number of particles. We execute several high- and extra-high-resolution N-body simulations of lunar accretion from a circumterrestrial disk of debris generated by a giant impact on Earth. The number of particles is up to 107, in which 1 particle corresponds to a 10 km sized satellitesimal. We find that the spiral structures inside the Roche limit radius differ between low-resolution simulations (N ≤ 105) and high-resolution simulations (N ≥ 106). According to this difference, angular momentum fluxes, which determine the accretion timescale of the Moon also depend on the numerical resolution.
Implementation of unsteady sampling procedures for the parallel direct simulation Monte Carlo method
NASA Astrophysics Data System (ADS)
Cave, H. M.; Tseng, K.-C.; Wu, J.-S.; Jermy, M. C.; Huang, J.-C.; Krumdieck, S. P.
2008-06-01
An unsteady sampling routine for a general parallel direct simulation Monte Carlo method called PDSC is introduced, allowing the simulation of time-dependent flow problems in the near continuum range. A post-processing procedure called DSMC rapid ensemble averaging method (DREAM) is developed to improve the statistical scatter in the results while minimising both memory and simulation time. This method builds an ensemble average of repeated runs over small number of sampling intervals prior to the sampling point of interest by restarting the flow using either a Maxwellian distribution based on macroscopic properties for near equilibrium flows (DREAM-I) or output instantaneous particle data obtained by the original unsteady sampling of PDSC for strongly non-equilibrium flows (DREAM-II). The method is validated by simulating shock tube flow and the development of simple Couette flow. Unsteady PDSC is found to accurately predict the flow field in both cases with significantly reduced run-times over single processor code and DREAM greatly reduces the statistical scatter in the results while maintaining accurate particle velocity distributions. Simulations are then conducted of two applications involving the interaction of shocks over wedges. The results of these simulations are compared to experimental data and simulations from the literature where there these are available. In general, it was found that 10 ensembled runs of DREAM processing could reduce the statistical uncertainty in the raw PDSC data by 2.5-3.3 times, based on the limited number of cases in the present study.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Satake, Shin-ichi; Kanamori, Hiroyuki; Kunugi, Tomoaki
2007-02-01
We have developed a parallel algorithm for microdigital-holographic particle-tracking velocimetry. The algorithm is used in (1) numerical reconstruction of a particle image computer using a digital hologram, and (2) searching for particles. The numerical reconstruction from the digital hologram makes use of the Fresnel diffraction equation and the FFT (fast Fourier transform),whereas the particle search algorithm looks for local maximum graduation in a reconstruction field represented by a 3D matrix. To achieve high performance computing for both calculations (reconstruction and particle search), two memory partitions are allocated to the 3D matrix. In this matrix, the reconstruction part consists of horizontallymore » placed 2D memory partitions on the x-y plane for the FFT, whereas, the particle search part consists of vertically placed 2D memory partitions set along the z axes.Consequently, the scalability can be obtained for the proportion of processor elements,where the benchmarks are carried out for parallel computation by a SGI Altix machine.« less
Particle-sampling statistics in laser anemometers Sample-and-hold systems and saturable systems
NASA Technical Reports Server (NTRS)
Edwards, R. V.; Jensen, A. S.
1983-01-01
The effect of the data-processing system on the particle statistics obtained with laser anemometry of flows containing suspended particles is examined. Attention is given to the sample and hold processor, a pseudo-analog device which retains the last measurement until a new measurement is made, followed by time-averaging of the data. The second system considered features a dead time, i.e., a saturable system with a significant reset time with storage in a data buffer. It is noted that the saturable system operates independent of the particle arrival rate. The probabilities of a particle arrival in a given time period are calculated for both processing systems. It is shown that the system outputs are dependent on the mean particle flow rate, the flow correlation time, and the flow statistics, indicating that the particle density affects both systems. The results are significant for instances of good correlation between the particle density and velocity, such as occurs near the edge of a jet.
Dewaraja, Yuni K; Ljungberg, Michael; Majumdar, Amitava; Bose, Abhijit; Koral, Kenneth F
2002-02-01
This paper reports the implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed memory parallel computer. Basic aspects of running Monte Carlo particle transport calculations on parallel architectures are described. Our parallelization is based on equally partitioning photons among the processors and uses the Message Passing Interface (MPI) library for interprocessor communication and the Scalable Parallel Random Number Generator (SPRNG) to generate uncorrelated random number streams. These parallelization techniques are also applicable to other distributed memory architectures. A linear increase in computing speed with the number of processors is demonstrated for up to 32 processors. This speed-up is especially significant in Single Photon Emission Computed Tomography (SPECT) simulations involving higher energy photon emitters, where explicit modeling of the phantom and collimator is required. For (131)I, the accuracy of the parallel code is demonstrated by comparing simulated and experimental SPECT images from a heart/thorax phantom. Clinically realistic SPECT simulations using the voxel-man phantom are carried out to assess scatter and attenuation correction.
Particle Based Simulations of Complex Systems with MP2C : Hydrodynamics and Electrostatics
NASA Astrophysics Data System (ADS)
Sutmann, Godehard; Westphal, Lidia; Bolten, Matthias
2010-09-01
Particle based simulation methods are well established paths to explore system behavior on microscopic to mesoscopic time and length scales. With the development of new computer architectures it becomes more and more important to concentrate on local algorithms which do not need global data transfer or reorganisation of large arrays of data across processors. This requirement strongly addresses long-range interactions in particle systems, i.e. mainly hydrodynamic and electrostatic contributions. In this article, emphasis is given to the implementation and parallelization of the Multi-Particle Collision Dynamics method for hydrodynamic contributions and a splitting scheme based on Multigrid for electrostatic contributions. Implementations are done for massively parallel architectures and are demonstrated for the IBM Blue Gene/P architecture Jugene in Jülich.
Proceedings of the 14th International Conference on the Numerical Simulation of Plasmas
NASA Astrophysics Data System (ADS)
Partial Contents are as follows: Numerical Simulations of the Vlasov-Maxwell Equations by Coupled Particle-Finite Element Methods on Unstructured Meshes; Electromagnetic PIC Simulations Using Finite Elements on Unstructured Grids; Modelling Travelling Wave Output Structures with the Particle-in-Cell Code CONDOR; SST--A Single-Slice Particle Simulation Code; Graphical Display and Animation of Data Produced by Electromagnetic, Particle-in-Cell Codes; A Post-Processor for the PEST Code; Gray Scale Rendering of Beam Profile Data; A 2D Electromagnetic PIC Code for Distributed Memory Parallel Computers; 3-D Electromagnetic PIC Simulation on the NRL Connection Machine; Plasma PIC Simulations on MIMD Computers; Vlasov-Maxwell Algorithm for Electromagnetic Plasma Simulation on Distributed Architectures; MHD Boundary Layer Calculation Using the Vortex Method; and Eulerian Codes for Plasma Simulations.
USDA-ARS?s Scientific Manuscript database
Current FDA regulations require that juice processors effect a 5 log CFU/ml reduction of a target pathogen prior to distributing products. Whereas thermal pasteurization reduces the sensory characteristics of juice by altering flavor components, pulsed electric field (PEF) treatment can be conducte...
An Intrinsically Digital Amplification Scheme for Hearing Aids
NASA Astrophysics Data System (ADS)
Blamey, Peter J.; Macfarlane, David S.; Steele, Brenton R.
2005-12-01
Results for linear and wide-dynamic range compression were compared with a new 64-channel digital amplification strategy in three separate studies. The new strategy addresses the requirements of the hearing aid user with efficient computations on an open-platform digital signal processor (DSP). The new amplification strategy is not modeled on prior analog strategies like compression and linear amplification, but uses statistical analysis of the signal to optimize the output dynamic range in each frequency band independently. Using the open-platform DSP processor also provided the opportunity for blind trial comparisons of the different processing schemes in BTE and ITE devices of a high commercial standard. The speech perception scores and questionnaire results show that it is possible to provide improved audibility for sound in many narrow frequency bands while simultaneously improving comfort, speech intelligibility in noise, and sound quality.
NASA Technical Reports Server (NTRS)
Jacob, Joseph; Katz, Daniel; Prince, Thomas; Berriman, Graham; Good, John; Laity, Anastasia
2006-01-01
The final version (3.0) of the Montage software has been released. To recapitulate from previous NASA Tech Briefs articles about Montage: This software generates custom, science-grade mosaics of astronomical images on demand from input files that comply with the Flexible Image Transport System (FITS) standard and contain image data registered on projections that comply with the World Coordinate System (WCS) standards. This software can be executed on single-processor computers, multi-processor computers, and such networks of geographically dispersed computers as the National Science Foundation s TeraGrid or NASA s Information Power Grid. The primary advantage of running Montage in a grid environment is that computations can be done on a remote supercomputer for efficiency. Multiple computers at different sites can be used for different parts of a computation a significant advantage in cases of computations for large mosaics that demand more processor time than is available at any one site. Version 3.0 incorporates several improvements over prior versions. The most significant improvement is that this version is accessible to scientists located anywhere, through operational Web services that provide access to data from several large astronomical surveys and construct mosaics on either local workstations or remote computational grids as needed.
Yang, Haw; Welsher, Kevin
2016-11-15
A system and method for non-invasively tracking a particle in a sample is disclosed. The system includes a 2-photon or confocal laser scanning microscope (LSM) and a particle-holding device coupled to a stage with X-Y and Z position control. The system also includes a tracking module having a tracking excitation laser, X-Y and Z radiation-gathering components configured to detect deviations of the particle in an X-Y and Z directions. The system also includes a processor coupled to the X-Y and Z radiation gathering components, generate control signals configured to drive the stage X-Y and Z position controls to track the movement of the particle. The system may also include a synchronization module configured to generate LSM pixels stamped with stage position and a processing module configured to generate a 3D image showing the 3D trajectory of a particle using the LSM pixels stamped with stage position.
Compact ion chamber based neutron detector
Derzon, Mark S.; Galambos, Paul C.; Renzi, Ronald F.
2015-10-27
A directional neutron detector has an ion chamber formed in a dielectric material; a signal electrode and a ground electrode formed in the ion chamber; a neutron absorbing material filling the ion chamber; readout circuitry which is electrically coupled to the signal and ground electrodes; and a signal processor electrically coupled to the readout circuitry. The ion chamber has a pair of substantially planar electrode surfaces. The chamber pressure of the neutron absorbing material is selected such that the reaction particle ion trail length for neutrons absorbed by the neutron absorbing material is equal to or less than the distance between the electrode surfaces. The signal processor is adapted to determine a path angle for each absorbed neutron based on the rise time of the corresponding pulse in a time-varying detector signal.
Detection of neutrinos, antineutrinos, and neutrino-like particles
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fischbach, Ephraim
An apparatus for detecting the presence of a nuclear reactor by the detection of antineutrinos from the reactor can include a radioactive sample having a measurable nuclear activity level and a decay rate capable of changing in response to the presence of antineutrinos, and a detector associated with the radioactive sample. The detector is responsive to at least one of a particle or radiation formed by decay of the radioactive sample. A processor associated with the detector can correlate rate of decay of the radioactive sample to a flux of the antineutrinos to detect the reactor.
Accelerating Pseudo-Random Number Generator for MCNP on GPU
NASA Astrophysics Data System (ADS)
Gong, Chunye; Liu, Jie; Chi, Lihua; Hu, Qingfeng; Deng, Li; Gong, Zhenghu
2010-09-01
Pseudo-random number generators (PRNG) are intensively used in many stochastic algorithms in particle simulations, artificial neural networks and other scientific computation. The PRNG in Monte Carlo N-Particle Transport Code (MCNP) requires long period, high quality, flexible jump and fast enough. In this paper, we implement such a PRNG for MCNP on NVIDIA's GTX200 Graphics Processor Units (GPU) using CUDA programming model. Results shows that 3.80 to 8.10 times speedup are achieved compared with 4 to 6 cores CPUs and more than 679.18 million double precision random numbers can be generated per second on GPU.
Error-rate prediction for programmable circuits: methodology, tools and studied cases
NASA Astrophysics Data System (ADS)
Velazco, Raoul
2013-05-01
This work presents an approach to predict the error rates due to Single Event Upsets (SEU) occurring in programmable circuits as a consequence of the impact or energetic particles present in the environment the circuits operate. For a chosen application, the error-rate is predicted by combining the results obtained from radiation ground testing and the results of fault injection campaigns performed off-beam during which huge numbers of SEUs are injected during the execution of the studied application. The goal of this strategy is to obtain accurate results about different applications' error rates, without using particle accelerator facilities, thus significantly reducing the cost of the sensitivity evaluation. As a case study, this methodology was applied a complex processor, the Power PC 7448 executing a program issued from a real space application and a crypto-processor application implemented in an SRAM-based FPGA and accepted to be embedded in the payload of a scientific satellite of NASA. The accuracy of predicted error rates was confirmed by comparing, for the same circuit and application, predictions with measures issued from radiation ground testing performed at the cyclotron Cyclone cyclotron of HIF (Heavy Ion Facility) of Louvain-la-Neuve (Belgium).
Monte Carlo charged-particle tracking and energy deposition on a Lagrangian mesh.
Yuan, J; Moses, G A; McKenty, P W
2005-10-01
A Monte Carlo algorithm for alpha particle tracking and energy deposition on a cylindrical computational mesh in a Lagrangian hydrodynamics code used for inertial confinement fusion (ICF) simulations is presented. The straight line approximation is used to follow propagation of "Monte Carlo particles" which represent collections of alpha particles generated from thermonuclear deuterium-tritium (DT) reactions. Energy deposition in the plasma is modeled by the continuous slowing down approximation. The scheme addresses various aspects arising in the coupling of Monte Carlo tracking with Lagrangian hydrodynamics; such as non-orthogonal severely distorted mesh cells, particle relocation on the moving mesh and particle relocation after rezoning. A comparison with the flux-limited multi-group diffusion transport method is presented for a polar direct drive target design for the National Ignition Facility. Simulations show the Monte Carlo transport method predicts about earlier ignition than predicted by the diffusion method, and generates higher hot spot temperature. Nearly linear speed-up is achieved for multi-processor parallel simulations.
A computer controlled signal preprocessor for laser fringe anemometer applications
NASA Technical Reports Server (NTRS)
Oberle, Lawrence G.
1987-01-01
The operation of most commercially available laser fringe anemometer (LFA) counter-processors assumes that adjustments are made to the signal processing independent of the computer used for reducing the data acquired. Not only does the researcher desire a record of these parameters attached to the data acquired, but changes in flow conditions generally require that these settings be changed to improve data quality. Because of this limitation, on-line modification of the data acquisition parameters can be difficult and time consuming. A computer-controlled signal preprocessor has been developed which makes possible this optimization of the photomultiplier signal as a normal part of the data acquisition process. It allows computer control of the filter selection, signal gain, and photo-multiplier voltage. The raw signal from the photomultiplier tube is input to the preprocessor which, under the control of a digital computer, filters the signal and amplifies it to an acceptable level. The counter-processor used at Lewis Research Center generates the particle interarrival times, as well as the time-of-flight of the particle through the probe volume. The signal preprocessor allows computer control of the acquisition of these data.Through the preprocessor, the computer also can control the hand shaking signals for the interface between itself and the counter-processor. Finally, the signal preprocessor splits the pedestal from the signal before filtering, and monitors the photo-multiplier dc current, sends a signal proportional to this current to the computer through an analog to digital converter, and provides an alarm if the current exceeds a predefined maximum. Complete drawings and explanations are provided in the text as well as a sample interface program for use with the data acquisition software.
Smoothed particle hydrodynamics with GRAPE-1A
NASA Technical Reports Server (NTRS)
Umemura, Masayuki; Fukushige, Toshiyuki; Makino, Junichiro; Ebisuzaki, Toshikazu; Sugimoto, Daiichiro; Turner, Edwin L.; Loeb, Abraham
1993-01-01
We describe the implementation of a smoothed particle hydrodynamics (SPH) scheme using GRAPE-1A, a special-purpose processor used for gravitational N-body simulations. The GRAPE-1A calculates the gravitational force exerted on a particle from all other particles in a system, while simultaneously making a list of the nearest neighbors of the particle. It is found that GRAPE-1A accelerates SPH calculations by direct summation by about two orders of magnitudes for a ten thousand-particle simulation. The effective speed is 80 Mflops, which is about 30 percent of the peak speed of GRAPE-1A. Also, in order to investigate the accuracy of GRAPE-SPH, some test simulations were executed. We found that the force and position errors are smaller than those due to representing a fluid by a finite number of particles. The total energy and momentum were conserved within 0.2-0.4 percent and 2-5 x 10 exp -5, respectively, in simulations with several thousand particles. We conclude that GRAPE-SPH is quite effective and sufficiently accurate for self-gravitating hydrodynamics.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cleveland, Mathew A., E-mail: cleveland7@llnl.gov; Brunner, Thomas A.; Gentile, Nicholas A.
2013-10-15
We describe and compare different approaches for achieving numerical reproducibility in photon Monte Carlo simulations. Reproducibility is desirable for code verification, testing, and debugging. Parallelism creates a unique problem for achieving reproducibility in Monte Carlo simulations because it changes the order in which values are summed. This is a numerical problem because double precision arithmetic is not associative. Parallel Monte Carlo, both domain replicated and decomposed simulations, will run their particles in a different order during different runs of the same simulation because the non-reproducibility of communication between processors. In addition, runs of the same simulation using different domain decompositionsmore » will also result in particles being simulated in a different order. In [1], a way of eliminating non-associative accumulations using integer tallies was described. This approach successfully achieves reproducibility at the cost of lost accuracy by rounding double precision numbers to fewer significant digits. This integer approach, and other extended and reduced precision reproducibility techniques, are described and compared in this work. Increased precision alone is not enough to ensure reproducibility of photon Monte Carlo simulations. Non-arbitrary precision approaches require a varying degree of rounding to achieve reproducibility. For the problems investigated in this work double precision global accuracy was achievable by using 100 bits of precision or greater on all unordered sums which where subsequently rounded to double precision at the end of every time-step.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
NONE
1996-01-16
The Petro-Processors of Louisiana, Inc. (PPI) site, located in East Baton Rouge Parish, Louisiana, operated two waste disposal facilities: the Brooklawn area and the Scenic Highway area. Both areas contain chlorinated aromatic hydrocarbons and chlorinated hydrocarbons. Contaminants have been detected in samples from soil, groundwater, surface water, and air at the Brooklawn area and in soil, groundwater, and air at the Scenic Highway area. The site is considered a public health hazard because of risks to human health from past, present, and future exposure to hazardous substances. Exposure pathways of public health concern are: ingestion of contaminated fish, potential ingestionmore » of contaminated groundwater and wildlife, dermal contact with contaminated sediments, inhalation of airborne volatile contaminants prior to and during remedial activities, and dermal contact and incidental ingestion of contaminated soils.« less
THOR Field and Wave Processor - FWP
NASA Astrophysics Data System (ADS)
Soucek, Jan; Rothkaehl, Hanna; Balikhin, Michael; Zaslavsky, Arnaud; Nakamura, Rumi; Khotyaintsev, Yuri; Uhlir, Ludek; Lan, Radek; Yearby, Keith; Morawski, Marek; Winkler, Marek
2016-04-01
If selected, Turbulence Heating ObserveR (THOR) will become the first mission ever flown in space dedicated to plasma turbulence. The Fields and Waves Processor (FWP) is an integrated electronics unit for all electromagnetic field measurements performed by THOR. FWP will interface with all fields sensors: electric field antennas of the EFI instrument, the MAG fluxgate magnetometer and search-coil magnetometer (SCM) and perform data digitization and on-board processing. FWP box will house multiple data acquisition sub-units and signal analyzers all sharing a common power supply and data processing unit and thus a single data and power interface to the spacecraft. Integrating all the electromagnetic field measurements in a single unit will improve the consistency of field measurement and accuracy of time synchronization. The feasibility of making highly sensitive electric and magnetic field measurements in space has been demonstrated by Cluster (among other spacecraft) and THOR instrumentation complemented by a thorough electromagnetic cleanliness program will further improve on this heritage. Taking advantage of the capabilities of modern electronics, FWP will provide simultaneous synchronized waveform and spectral data products at high time resolution from the numerous THOR sensors, taking advantage of the large telemetry bandwidth of THOR. FWP will also implement a plasma a resonance sounder and a digital plasma quasi-thermal noise analyzer designed to provide high cadence measurements of plasma density and temperature complementary to data from particle instruments. FWP will be interfaced with the particle instrument data processing unit (PPU) via a dedicated digital link which will enable performing on board correlation between waves and particles, quantifying the transfer of energy between waves and particles. The FWP instrument shall be designed and built by an international consortium of scientific institutes from Czech Republic, Poland, France, UK, Sweden and Austria.
Software for embedded processors: Problems and solutions
NASA Astrophysics Data System (ADS)
Bogaerts, J. A. C.
1990-08-01
Data Acquistion systems in HEP experiments use a wide spectrum of computers to cope with two major problems: high event rates and a large data volume. They do this by using special fast trigger processors at the source to reduce the event rate by several orders of magnitude. The next stage of a data acquisition system consists of a network of fast but conventional microprocessors which are embedded in high speed bus systems where data is still further reduced, filtered and merged. In the final stage complete events are farmed out to a another collection of processors, which reconstruct the events and perhaps achieve a further event rejection by a small factor, prior to recording onto magnetic tape. Detectors are monitored by analyzing a fraction of the data. This may be done for individual detectors at an early state of the data acquisition or it may be delayed till the complete events are available. A network of workstations is used for monitoring, displays and run control. Software for trigger processors must have a simple structure. Rejection algorithms are carefully optimized, and overheads introduced by system software cannot be tolerated. The embedded microprocessors have to co-operate, and need to be synchronized with the preceding and following stages. Real time kernels are typically used to solve synchronization and communication problems. Applications are usually coded in C, which is reasonably efficient and allows direct control over low level hardware functions. Event reconstruction software is very similar or even identical to offline software, predominantly written in FORTRAN. With the advent of powerful RISC processors, and with manufacturers tending to adopt open bus architectures, there is a move towards commercial processors and hence the introduction of the UNIX operating system. Building and controlling such a heterogeneous data acquisition system puts a heavy strain on the software. Communications is now as important as CPU capacity and I/O bandwidth, the traditional key parameters of a HEP data acquisition system. Software engineering and real time system simulation tools are becoming indispensible for the design of future data acquisition systems.
Emulating Many-Body Localization with a Superconducting Quantum Processor
NASA Astrophysics Data System (ADS)
Xu, Kai; Chen, Jin-Jun; Zeng, Yu; Zhang, Yu-Ran; Song, Chao; Liu, Wuxin; Guo, Qiujiang; Zhang, Pengfei; Xu, Da; Deng, Hui; Huang, Keqiang; Wang, H.; Zhu, Xiaobo; Zheng, Dongning; Fan, Heng
2018-02-01
The law of statistical physics dictates that generic closed quantum many-body systems initialized in nonequilibrium will thermalize under their own dynamics. However, the emergence of many-body localization (MBL) owing to the interplay between interaction and disorder, which is in stark contrast to Anderson localization, which only addresses noninteracting particles in the presence of disorder, greatly challenges this concept, because it prevents the systems from evolving to the ergodic thermalized state. One critical evidence of MBL is the long-time logarithmic growth of entanglement entropy, and a direct observation of it is still elusive due to the experimental challenges in multiqubit single-shot measurement and quantum state tomography. Here we present an experiment fully emulating the MBL dynamics with a 10-qubit superconducting quantum processor, which represents a spin-1 /2 X Y model featuring programmable disorder and long-range spin-spin interactions. We provide essential signatures of MBL, such as the imbalance due to the initial nonequilibrium, the violation of eigenstate thermalization hypothesis, and, more importantly, the direct evidence of the long-time logarithmic growth of entanglement entropy. Our results lay solid foundations for precisely simulating the intriguing physics of quantum many-body systems on the platform of large-scale multiqubit superconducting quantum processors.
Cross-Layer Resilience Exploration
2015-03-31
complex 563 server-class systems) and any arbitrary fault model (permanent, transient, multi-bit, etc.) System Design Analysis Using flip- flop ...level fault injection, we rank the vulnerability of each flip- flop in the processor in terms of its likelihood to propagate faults [3]. This allows the...hardened flip- flops , which are flip- flops designed to uphold the bit representation of their output circuit even under particle strikes [1, 6, 10
Research and Development Services: Methods Development. Appendices
1982-07-23
0.010-inch stainless-steel tubing. The order of the detectors is as follows: 1. Perkin-Elmer LC-75 variable-wavelength spectrometer with autocontrol . 2...reset it prior to the next injection. At 60 minutes, the micro- processor signals the Perkin-Elmer LC-75 with autocontrol to switch detection...variable-wavelength spectrometer with autocontrol set at 230 am for first 60 minutes of chromatogram then switched to 280 m. b. Altex Model 153 fixed
Knowledge-based processing for aircraft flight control
NASA Technical Reports Server (NTRS)
Painter, John H.
1991-01-01
The purpose is to develop algorithms and architectures for embedding artificial intelligence in aircraft guidance and control systems. With the approach adopted, AI-computing is used to create an outer guidance loop for driving the usual aircraft autopilot. That is, a symbolic processor monitors the operation and performance of the aircraft. Then, based on rules and other stored knowledge, commands are automatically formulated for driving the autopilot so as to accomplish desired flight operations. The focus is on developing a software system which can respond to linguistic instructions, input in a standard format, so as to formulate a sequence of simple commands to the autopilot. The instructions might be a fairly complex flight clearance, input either manually or by data-link. Emphasis is on a software system which responds much like a pilot would, employing not only precise computations, but, also, knowledge which is less precise, but more like common-sense. The approach is based on prior work to develop a generic 'shell' architecture for an AI-processor, which may be tailored to many applications by describing the application in appropriate processor data bases (libraries). Such descriptions include numerical models of the aircraft and flight control system, as well as symbolic (linguistic) descriptions of flight operations, rules, and tactics.
NASA Astrophysics Data System (ADS)
Abramov, G. V.; Gavrilov, A. N.
2018-03-01
The article deals with the numerical solution of the mathematical model of the particles motion and interaction in multicomponent plasma by the example of electric arc synthesis of carbon nanostructures. The high order of the particles and the number of their interactions requires a significant input of machine resources and time for calculations. Application of the large particles method makes it possible to reduce the amount of computation and the requirements for hardware resources without affecting the accuracy of numerical calculations. The use of technology of GPGPU parallel computing using the Nvidia CUDA technology allows organizing all General purpose computation on the basis of the graphical processor graphics card. The comparative analysis of different approaches to parallelization of computations to speed up calculations with the choice of the algorithm in which to calculate the accuracy of the solution shared memory is used. Numerical study of the influence of particles density in the macro particle on the motion parameters and the total number of particle collisions in the plasma for different modes of synthesis has been carried out. The rational range of the coherence coefficient of particle in the macro particle is computed.
European Science Notes Information Bulletin Reports on Current European/ Middle Eastern Science
1989-03-01
Palo-Oceanography, Marine Geophysics, Marine Environmental Geology, and Petrology of the Oceanic Crust. The spe- cific concerns of each of these...integration To compute numerically the expected value of an over the fermion fields, leaving an integral over the gauge operator, the configuration space...ethrough the machine (one space point per processor).In the gauge field theories of elementary particles, This is appropriate for generating gauge field
Multiprocessing MCNP on an IBM RS/6000 cluster
DOE Office of Scientific and Technical Information (OSTI.GOV)
McKinney, G.W.; West, J.T.
1993-01-01
The advent of high-performance computer systems has brought to maturity programming concepts like vectorization, multiprocessing, and multitasking. While there are many schools of thought as to the most significant factor in obtaining order-of-magnitude increases in performance, such speedup can only be achieved by integrating the computer system and application code. Vectorization leads to faster manipulation of arrays by overlapping instruction CPU cycles. Discrete ordinates codes, which require the solving of large matrices, have proved to be major benefactors of vectorization. Monte Carlo transport, on the other hand, typically contains numerous logic statements and requires extensive redevelopment to benefit from vectorization.more » Multiprocessing and multitasking provide additional CPU cycles via multiple processors. Such systems are generally designed with either common memory access (multitasking) or distributed memory access. In both cases, theoretical speedup, as a function of the number of processors (P) and the fraction of task time that multiprocesses (f), can be formulated using Amdahl's Law S ((f,P) = 1 f + f/P). However, for most applications this theoretical limit cannot be achieved, due to additional terms not included in Amdahl's Law. Monte Carlo transport is a natural candidate for multiprocessing, since the particle tracks are generally independent and the precision of the result increases as the square root of the number of particles tracked.« less
Texas Instruments-Digital Signal Processor(TI-DSP)SMJ320F20 SEL Testing
NASA Technical Reports Server (NTRS)
Sanders, Anthony B.; Poivey, C.; Kim, H. S.; Gee, George B.
2006-01-01
This viewgraph presentation reviews the testing of the Texas Instrument Digital Signal Processor(TI-DSP)SMJ320F20. Tests were performed to screen for susceptibility to Single Event Latchup (SEL) and measure sensitivity as a function of Linear Energy Transfer (LET) for an application specific test setup. The Heavy Ion Testing of two TI-DSP SMJ320F240 devices experienced Single Event Latchup (SEL) conditions at an LET of 1.8 MeV/(mg/square cm) The devices were exposed from a fluence of 1.76 x l0(exp 3) to 5.00 x 10(exp 6) particles/square cm of the Neon, Argon and Krypton ion beams. For DI(sub DD) an average latchup current occurred at about 700mA, which is a magnitude of 10 over the nominal current of 700mA.
NASA Astrophysics Data System (ADS)
Romano, Paul Kollath
Monte Carlo particle transport methods are being considered as a viable option for high-fidelity simulation of nuclear reactors. While Monte Carlo methods offer several potential advantages over deterministic methods, there are a number of algorithmic shortcomings that would prevent their immediate adoption for full-core analyses. In this thesis, algorithms are proposed both to ameliorate the degradation in parallel efficiency typically observed for large numbers of processors and to offer a means of decomposing large tally data that will be needed for reactor analysis. A nearest-neighbor fission bank algorithm was proposed and subsequently implemented in the OpenMC Monte Carlo code. A theoretical analysis of the communication pattern shows that the expected cost is O( N ) whereas traditional fission bank algorithms are O(N) at best. The algorithm was tested on two supercomputers, the Intrepid Blue Gene/P and the Titan Cray XK7, and demonstrated nearly linear parallel scaling up to 163,840 processor cores on a full-core benchmark problem. An algorithm for reducing network communication arising from tally reduction was analyzed and implemented in OpenMC. The proposed algorithm groups only particle histories on a single processor into batches for tally purposes---in doing so it prevents all network communication for tallies until the very end of the simulation. The algorithm was tested, again on a full-core benchmark, and shown to reduce network communication substantially. A model was developed to predict the impact of load imbalances on the performance of domain decomposed simulations. The analysis demonstrated that load imbalances in domain decomposed simulations arise from two distinct phenomena: non-uniform particle densities and non-uniform spatial leakage. The dominant performance penalty for domain decomposition was shown to come from these physical effects rather than insufficient network bandwidth or high latency. The model predictions were verified with measured data from simulations in OpenMC on a full-core benchmark problem. Finally, a novel algorithm for decomposing large tally data was proposed, analyzed, and implemented/tested in OpenMC. The algorithm relies on disjoint sets of compute processes and tally servers. The analysis showed that for a range of parameters relevant to LWR analysis, the tally server algorithm should perform with minimal overhead. Tests were performed on Intrepid and Titan and demonstrated that the algorithm did indeed perform well over a wide range of parameters. (Copies available exclusively from MIT Libraries, libraries.mit.edu/docs - docs mit.edu)
FPGA-Based, Self-Checking, Fault-Tolerant Computers
NASA Technical Reports Server (NTRS)
Some, Raphael; Rennels, David
2004-01-01
A proposed computer architecture would exploit the capabilities of commercially available field-programmable gate arrays (FPGAs) to enable computers to detect and recover from bit errors. The main purpose of the proposed architecture is to enable fault-tolerant computing in the presence of single-event upsets (SEUs). [An SEU is a spurious bit flip (also called a soft error) caused by a single impact of ionizing radiation.] The architecture would also enable recovery from some soft errors caused by electrical transients and, to some extent, from intermittent and permanent (hard) errors caused by aging of electronic components. A typical FPGA of the current generation contains one or more complete processor cores, memories, and highspeed serial input/output (I/O) channels, making it possible to shrink a board-level processor node to a single integrated-circuit chip. Custom, highly efficient microcontrollers, general-purpose computers, custom I/O processors, and signal processors can be rapidly and efficiently implemented by use of FPGAs. Unfortunately, FPGAs are susceptible to SEUs. Prior efforts to mitigate the effects of SEUs have yielded solutions that degrade performance of the system and require support from external hardware and software. In comparison with other fault-tolerant- computing architectures (e.g., triple modular redundancy), the proposed architecture could be implemented with less circuitry and lower power demand. Moreover, the fault-tolerant computing functions would require only minimal support from circuitry outside the central processing units (CPUs) of computers, would not require any software support, and would be largely transparent to software and to other computer hardware. There would be two types of modules: a self-checking processor module and a memory system (see figure). The self-checking processor module would be implemented on a single FPGA and would be capable of detecting its own internal errors. It would contain two CPUs executing identical programs in lock step, with comparison of their outputs to detect errors. It would also contain various cache local memory circuits, communication circuits, and configurable special-purpose processors that would use self-checking checkers. (The basic principle of the self-checking checker method is to utilize logic circuitry that generates error signals whenever there is an error in either the checker or the circuit being checked.) The memory system would comprise a main memory and a hardware-controlled check-pointing system (CPS) based on a buffer memory denoted the recovery cache. The main memory would contain random-access memory (RAM) chips and FPGAs that would, in addition to everything else, implement double-error-detecting and single-error-correcting memory functions to enable recovery from single-bit errors.
ORNL Cray X1 evaluation status report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Agarwal, P.K.; Alexander, R.A.; Apra, E.
2004-05-01
On August 15, 2002 the Department of Energy (DOE) selected the Center for Computational Sciences (CCS) at Oak Ridge National Laboratory (ORNL) to deploy a new scalable vector supercomputer architecture for solving important scientific problems in climate, fusion, biology, nanoscale materials and astrophysics. ''This program is one of the first steps in an initiative designed to provide U.S. scientists with the computational power that is essential to 21st century scientific leadership,'' said Dr. Raymond L. Orbach, director of the department's Office of Science. In FY03, CCS procured a 256-processor Cray X1 to evaluate the processors, memory subsystem, scalability of themore » architecture, software environment and to predict the expected sustained performance on key DOE applications codes. The results of the micro-benchmarks and kernel bench marks show the architecture of the Cray X1 to be exceptionally fast for most operations. The best results are shown on large problems, where it is not possible to fit the entire problem into the cache of the processors. These large problems are exactly the types of problems that are important for the DOE and ultra-scale simulation. Application performance is found to be markedly improved by this architecture: - Large-scale simulations of high-temperature superconductors run 25 times faster than on an IBM Power4 cluster using the same number of processors. - Best performance of the parallel ocean program (POP v1.4.3) is 50 percent higher than on Japan s Earth Simulator and 5 times higher than on an IBM Power4 cluster. - A fusion application, global GYRO transport, was found to be 16 times faster on the X1 than on an IBM Power3. The increased performance allowed simulations to fully resolve questions raised by a prior study. - The transport kernel in the AGILE-BOLTZTRAN astrophysics code runs 15 times faster than on an IBM Power4 cluster using the same number of processors. - Molecular dynamics simulations related to the phenomenon of photon echo run 8 times faster than previously achieved. Even at 256 processors, the Cray X1 system is already outperforming other supercomputers with thousands of processors for a certain class of applications such as climate modeling and some fusion applications. This evaluation is the outcome of a number of meetings with both high-performance computing (HPC) system vendors and application experts over the past 9 months and has received broad-based support from the scientific community and other agencies.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sargent, S.A.
Apple pomace or presscake, was evaluated for suitability as a boiler feedstock for Michigan firms processing apple juice. Based upon the physical and chemical characteristics of pomace, handling/direct combustion systems were selected to conform with operating parameters typical of the industry. Fresh pomace flow rates of 29,030 and 88,998 kg/day (64,000 and 194,000 lb/day) were considered as representative of small and large processors, respectively, and the material was assumed to be dried to 15% moisture content (wet basis) prior to storage and combustion. Boilers utilizing pile-burning, fluidized-bed-combustion, and suspension-firing technologies were sized for each flow rate, resulting in energy productionmore » of 2930 and 8790 kW (10 and 30 million Btu/h), respectively. A life-cycle cost analysis was performed giving Average Annual Costs for the three handling/combustion system combinations (based on the Uniform Capital Recovery factor). An investment loan at 16% interest with a 5-year payback period was assumed. The break-even period for annual costs was calculated by anticipated savings incurred through reduction of fossil-fuel costs during a 5-month processing season. Large processors, producing more than 88,998 kg pomace/day, could economically convert to a suspension-fired system substituting for fuel oil, with break-even occurring after 4 months of operation of pomace per year. Small processors, producing less than 29,030 kg/day, could not currently convert to pomace combustion systems given these economic circumstances. A doubling of electrical-utility costs and changes in interest rates from 10 to 20% per year had only slight effects on the recovery of Average Annual Costs. Increases in fossil-fuel prices and the necessity to pay for pomace disposal reduced the cost-recovery period for all systems, making some systems feasible for small processors. 39 references, 13 figures, 10 tables.« less
Managing seafood processing wastewater on the Oregon coast: A time of transition
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anderson, M.D.; Miner, J.R.
1997-12-01
Seafood processors along the Oregon coast practice a wastewater management plan that is unique within the state. Most of these operations discharge wastewater under a General Permit issued by the Oregon Department of Environmental Quality (DEQ) that requires only that they screen the wastewater to remove particles that will not pass through a 40 mesh screen. The General Permit was issued in February of 1992 and was scheduled to expire at the end of December, 1996. It has been extended until a replacement is adopted. Alternatives are currently under consideration by the DEQ. A second issue is the increasing competitionmore » for water within the coastal communities that are experiencing a growing tourist industry and a static water supply. Tourism and seafood processing both have their peak water demands during the summer months when fresh water supplies are most limited. Disposal of solid wastes has been simplified for many of the processors along the Lower Columbia River by a Fisheries Enhancement Program which allows processors to grind the solid waste then to discharge it into the stream under appropriate tidal conditions. There is no data which indicates water quality damage from this practice nor is there clear evidence of enhanced fishery productivity.« less
Free-electron laser simulations on the MPP
NASA Technical Reports Server (NTRS)
Vonlaven, Scott A.; Liebrock, Lorie M.
1987-01-01
Free electron lasers (FELs) are of interest because they provide high power, high efficiency, and broad tunability. FEL simulations can make efficient use of computers of the Massively Parallel Processor (MPP) class because most of the processing consists of applying a simple equation to a set of identical particles. A test version of the KMS Fusion FEL simulation, which resides mainly in the MPPs host computer and only partially in the MPP, has run successfully.
Numerical studies of electron dynamics in oblique quasi-perpendicular collisionless shock waves
NASA Technical Reports Server (NTRS)
Liewer, P. C.; Decyk, V. K.; Dawson, J. M.; Lembege, B.
1991-01-01
Linear and nonlinear electron damping of the whistler precursor wave train to low Mach number quasi-perpendicular oblique shocks is studied using a one-dimensional electromagnetic plasma simulation code with particle electrons and ions. In some parameter regimes, electrons are observed to trap along the magnetic field lines in the potential of the whistler precursor wave train. This trapping can lead to significant electron heating in front of the shock for low beta(e). Use of a 64-processor hypercube concurrent computer has enabled long runs using realistic mass ratios in the full particle in-cell code and thus simulate shock parameter regimes and phenomena not previously studied numerically.
NASA Astrophysics Data System (ADS)
Hur, Min Young; Verboncoeur, John; Lee, Hae June
2014-10-01
Particle-in-cell (PIC) simulations have high fidelity in the plasma device requiring transient kinetic modeling compared with fluid simulations. It uses less approximation on the plasma kinetics but requires many particles and grids to observe the semantic results. It means that the simulation spends lots of simulation time in proportion to the number of particles. Therefore, PIC simulation needs high performance computing. In this research, a graphic processing unit (GPU) is adopted for high performance computing of PIC simulation for low temperature discharge plasmas. GPUs have many-core processors and high memory bandwidth compared with a central processing unit (CPU). NVIDIA GeForce GPUs were used for the test with hundreds of cores which show cost-effective performance. PIC code algorithm is divided into two modules which are a field solver and a particle mover. The particle mover module is divided into four routines which are named move, boundary, Monte Carlo collision (MCC), and deposit. Overall, the GPU code solves particle motions as well as electrostatic potential in two-dimensional geometry almost 30 times faster than a single CPU code. This work was supported by the Korea Institute of Science Technology Information.
NASA Astrophysics Data System (ADS)
Couvidat, F.; Sartelet, K.
2015-04-01
In this paper the Secondary Organic Aerosol Processor (SOAP v1.0) model is presented. This model determines the partitioning of organic compounds between the gas and particle phases. It is designed to be modular with different user options depending on the computation time and the complexity required by the user. This model is based on the molecular surrogate approach, in which each surrogate compound is associated with a molecular structure to estimate some properties and parameters (hygroscopicity, absorption into the aqueous phase of particles, activity coefficients and phase separation). Each surrogate can be hydrophilic (condenses only into the aqueous phase of particles), hydrophobic (condenses only into the organic phases of particles) or both (condenses into both the aqueous and the organic phases of particles). Activity coefficients are computed with the UNIFAC (UNIversal Functional group Activity Coefficient; Fredenslund et al., 1975) thermodynamic model for short-range interactions and with the Aerosol Inorganic-Organic Mixtures Functional groups Activity Coefficients (AIOMFAC) parameterization for medium- and long-range interactions between electrolytes and organic compounds. Phase separation is determined by Gibbs energy minimization. The user can choose between an equilibrium representation and a dynamic representation of organic aerosols (OAs). In the equilibrium representation, compounds in the particle phase are assumed to be at equilibrium with the gas phase. However, recent studies show that the organic aerosol is not at equilibrium with the gas phase because the organic phases could be semi-solid (very viscous liquid phase). The condensation-evaporation of organic compounds could then be limited by the diffusion in the organic phases due to the high viscosity. An implicit dynamic representation of secondary organic aerosols (SOAs) is available in SOAP with OAs divided into layers, the first layer being at the center of the particle (slowly reaches equilibrium) and the final layer being near the interface with the gas phase (quickly reaches equilibrium). Although this dynamic implicit representation is a simplified approach to model condensation-evaporation with a low number of layers and short CPU (central processing unit) time, it shows good agreements with an explicit representation of condensation-evaporation (no significant differences after a few hours of condensation).
NASA Astrophysics Data System (ADS)
Ardanuy, Antoni; Comerón, Adolfo
2018-04-01
We analyze the practical limits of a lidar system based on the use of a laser diode, random binary continuous wave power modulation, and an avalanche photodiode (APD)-based photereceiver, combined with the control and computing power of the digital signal processors (DSP) currently available. The target is to design a compact portable lidar system made all in semiconductor technology, with a low-power demand and an easy configuration of the system, allowing change in some of its features through software. Unlike many prior works, we emphasize the use of APDs instead of photomultiplier tubes to detect the return signal and the application of the system to measure not only hard targets, but also medium-range aerosols and clouds. We have developed an experimental prototype to evaluate the behavior of the system under different environmental conditions. Experimental results provided by the prototype are presented and discussed.
Load Balancing Strategies for Multiphase Flows on Structured Grids
NASA Astrophysics Data System (ADS)
Olshefski, Kristopher; Owkes, Mark
2017-11-01
The computation time required to perform large simulations of complex systems is currently one of the leading bottlenecks of computational research. Parallelization allows multiple processing cores to perform calculations simultaneously and reduces computational times. However, load imbalances between processors waste computing resources as processors wait for others to complete imbalanced tasks. In multiphase flows, these imbalances arise due to the additional computational effort required at the gas-liquid interface. However, many current load balancing schemes are only designed for unstructured grid applications. The purpose of this research is to develop a load balancing strategy while maintaining the simplicity of a structured grid. Several approaches are investigated including brute force oversubscription, node oversubscription through Message Passing Interface (MPI) commands, and shared memory load balancing using OpenMP. Each of these strategies are tested with a simple one-dimensional model prior to implementation into the three-dimensional NGA code. Current results show load balancing will reduce computational time by at least 30%.
Network Signal Processor No. 2 after removal from Columbia
NASA Technical Reports Server (NTRS)
1998-01-01
Two USA employees, Tim Seymour (at left) and Danny Brown (at right), look at the network signal processor (NSP) that was responsible for postponement of the launch of STS-90 on Apr. 16. The Space Shuttle Columbia's liftoff from Launch Pad 39B was postponed 24 hours due to difficulty with NSP No. 2 on the orbiter. This device formats data and voice communications between the ground and the Space Shuttle. The unit, which is located in the orbiter's mid-deck, was removed and replaced on Apr. 16. Mission managers first noticed the problem at about 3 a.m. during normal communications systems activation prior to tanking operations. As a result, work to load the external tank with the cryogenic propellants did not begin and launch postponement was made official at about 8:15 a.m. STS-90 is slated to be the launch of Neurolab, a nearly 17-day mission to examine the effects of spaceflight on the brain, spinal cord, peripheral nerves and sensory organs in the human body.
Pedretti, Kevin
2008-11-18
A compute processor allocator architecture for allocating compute processors to run applications in a multiple processor computing apparatus is distributed among a subset of processors within the computing apparatus. Each processor of the subset includes a compute processor allocator. The compute processor allocators can share a common database of information pertinent to compute processor allocation. A communication path permits retrieval of information from the database independently of the compute processor allocators.
GPU acceleration of particle-in-cell methods
NASA Astrophysics Data System (ADS)
Cowan, Benjamin; Cary, John; Meiser, Dominic
2015-11-01
Graphics processing units (GPUs) have become key components in many supercomputing systems, as they can provide more computations relative to their cost and power consumption than conventional processors. However, to take full advantage of this capability, they require a strict programming model which involves single-instruction multiple-data execution as well as significant constraints on memory accesses. To bring the full power of GPUs to bear on plasma physics problems, we must adapt the computational methods to this new programming model. We have developed a GPU implementation of the particle-in-cell (PIC) method, one of the mainstays of plasma physics simulation. This framework is highly general and enables advanced PIC features such as high order particles and absorbing boundary conditions. The main elements of the PIC loop, including field interpolation and particle deposition, are designed to optimize memory access. We describe the performance of these algorithms and discuss some of the methods used. Work supported by DARPA contract W31P4Q-15-C-0061 (SBIR).
A Lagrangian Approach for Calculating Microsphere Deposition in a One-Dimensional Lung-Airway Model.
Vaish, Mayank; Kleinstreuer, Clement
2015-09-01
Using the open-source software openfoam as the solver, a novel approach to calculate microsphere transport and deposition in a 1D human lung-equivalent trumpet model (TM) is presented. Specifically, for particle deposition in a nonlinear trumpetlike configuration a new radial force has been developed which, along with the regular drag force, generates particle trajectories toward the wall. The new semi-empirical force is a function of any given inlet volumetric flow rate, micron-particle diameter, and lung volume. Particle-deposition fractions (DFs) in the size range from 2 μm to 10 μm are in agreement with experimental datasets for different laminar and turbulent inhalation flow rates as well as total volumes. Typical run times on a single processor workstation to obtain actual total deposition results at comparable accuracy are 200 times less than that for an idealized whole-lung geometry (i.e., a 3D-1D model with airways up to 23rd generation in single-path only).
Vectorization of a particle code used in the simulation of rarefied hypersonic flow
NASA Technical Reports Server (NTRS)
Baganoff, D.
1990-01-01
A limitation of the direct simulation Monte Carlo (DSMC) method is that it does not allow efficient use of vector architectures that predominate in current supercomputers. Consequently, the problems that can be handled are limited to those of one- and two-dimensional flows. This work focuses on a reformulation of the DSMC method with the objective of designing a procedure that is optimized to the vector architectures found on machines such as the Cray-2. In addition, it focuses on finding a better balance between algorithmic complexity and the total number of particles employed in a simulation so that the overall performance of a particle simulation scheme can be greatly improved. Simulations of the flow about a 3D blunt body are performed with 10 to the 7th particles and 4 x 10 to the 5th mesh cells. Good statistics are obtained with time averaging over 800 time steps using 4.5 h of Cray-2 single-processor CPU time.
The accurate particle tracer code
NASA Astrophysics Data System (ADS)
Wang, Yulei; Liu, Jian; Qin, Hong; Yu, Zhi; Yao, Yicun
2017-11-01
The Accurate Particle Tracer (APT) code is designed for systematic large-scale applications of geometric algorithms for particle dynamical simulations. Based on a large variety of advanced geometric algorithms, APT possesses long-term numerical accuracy and stability, which are critical for solving multi-scale and nonlinear problems. To provide a flexible and convenient I/O interface, the libraries of Lua and Hdf5 are used. Following a three-step procedure, users can efficiently extend the libraries of electromagnetic configurations, external non-electromagnetic forces, particle pushers, and initialization approaches by use of the extendible module. APT has been used in simulations of key physical problems, such as runaway electrons in tokamaks and energetic particles in Van Allen belt. As an important realization, the APT-SW version has been successfully distributed on the world's fastest computer, the Sunway TaihuLight supercomputer, by supporting master-slave architecture of Sunway many-core processors. Based on large-scale simulations of a runaway beam under parameters of the ITER tokamak, it is revealed that the magnetic ripple field can disperse the pitch-angle distribution significantly and improve the confinement of energetic runaway beam on the same time.
Optical sample-position sensing for electrostatic levitation
NASA Technical Reports Server (NTRS)
Sridharan, G.; Chung, S.; Elleman, D.; Whim, W. K.
1989-01-01
A comparative study is conducted for optical position-sensing techniques applicable to micro-G conditions sample-levitation systems. CCD sensors are compared with one- and two-dimensional position detectors used in electrostatic particle levitation. In principle, the CCD camera method can be improved from current resolution levels of 200 microns through the incorporation of a higher-pixel device and more complex digital signal processor interface. Nevertheless, the one-dimensional position detectors exhibited superior, better-than-one-micron resolution.
Multiprocessing MCNP on an IBN RS/6000 cluster
DOE Office of Scientific and Technical Information (OSTI.GOV)
McKinney, G.W.; West, J.T.
1993-01-01
The advent of high-performance computer systems has brought to maturity programming concepts like vectorization, multiprocessing, and multitasking. While there are many schools of thought as to the most significant factor in obtaining order-of-magnitude increases in performance, such speedup can only be achieved by integrating the computer system and application code. Vectorization leads to faster manipulation of arrays by overlapping instruction CPU cycles. Discrete ordinates codes, which require the solving of large matrices, have proved to be major benefactors of vectorization. Monte Carlo transport, on the other hand, typically contains numerous logic statements and requires extensive redevelopment to benefit from vectorization.more » Multiprocessing and multitasking provide additional CPU cycles via multiple processors. Such systems are generally designed with either common memory access (multitasking) or distributed memory access. In both cases, theoretical speedup, as a function of the number of processors P and the fraction f of task time that multiprocesses, can be formulated using Amdahl's law: S(f, P) =1/(1-f+f/P). However, for most applications, this theoretical limit cannot be achieved because of additional terms (e.g., multitasking overhead, memory overlap, etc.) that are not included in Amdahl's law. Monte Carlo transport is a natural candidate for multiprocessing because the particle tracks are generally independent, and the precision of the result increases as the square Foot of the number of particles tracked.« less
Multiprocessing MCNP on an IBM RS/6000 cluster
DOE Office of Scientific and Technical Information (OSTI.GOV)
McKinney, G.W.; West, J.T.
1993-03-01
The advent of high-performance computer systems has brought to maturity programming concepts like vectorization, multiprocessing, and multitasking. While there are many schools of thought as to the most significant factor in obtaining order-of-magnitude increases in performance, such speedup can only be achieved by integrating the computer system and application code. Vectorization leads to faster manipulation of arrays by overlapping instruction CPU cycles. Discrete ordinates codes, which require the solving of large matrices, have proved to be major benefactors of vectorization. Monte Carlo transport, on the other hand, typically contains numerous logic statements and requires extensive redevelopment to benefit from vectorization.more » Multiprocessing and multitasking provide additional CPU cycles via multiple processors. Such systems are generally designed with either common memory access (multitasking) or distributed memory access. In both cases, theoretical speedup, as a function of the number of processors (P) and the fraction of task time that multiprocesses (f), can be formulated using Amdahl`s Law S ((f,P) = 1 f + f/P). However, for most applications this theoretical limit cannot be achieved, due to additional terms not included in Amdahl`s Law. Monte Carlo transport is a natural candidate for multiprocessing, since the particle tracks are generally independent and the precision of the result increases as the square root of the number of particles tracked.« less
Iwao, Yasunori; Kimura, Shin-Ichiro; Ishida, Masayuki; Mise, Ryohei; Yamada, Masaki; Namiki, Noriyuki; Noguchi, Shuji; Itai, Shigeru
2015-01-01
The manufacture of highly drug-loaded fine globular granules eventually applied for orally disintegrating tablets has been investigated using a unique multi-functional rotor processor with acetaminophen, which was used as a model drug substance. Experimental design and statistical analysis were used to evaluate potential relationships between three key operating parameters (i.e., the binder flow rate, atomization pressure and rotating speed) and a series of associated micromeritics (i.e., granule mean size, proportion of fine particles (106-212 µm), flowability, roundness and water content). The results of multiple linear regression analysis revealed several trends, including (1) the binder flow rate and atomization pressure had significant positive and negative effects on the granule mean size value, Carr's flowability index, granular roundness and water content, respectively; (2) the proportion of fine particles was positively affected by the product of interaction between the binder flow rate and atomization pressure; and (3) the granular roundness was negatively and positively affected by the product of interactions between the binder flow rate and the atomization pressure, and the binder flow rate and rotating speed, respectively. The results of this study led to the identification of optimal operating conditions for the preparation of granules, and could therefore be used to provide important information for the development of processes for the manufacture of highly drug-loaded fine globular granules.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yamamoto, K.; Mizuno, Y.; Hibino, S.
2006-01-15
Simulations of dusty plasmas were performed using GRAPE-6, a special-purpose computer designed for gravitational N-body problems. The collective behavior of dust particles, which are injected into the plasma, was studied by means of three-dimensional computer simulations. As an example of a dusty plasma simulation, experiments on Coulomb crystals in plasmas are simulated. Formation of a quasi-two-dimensional Coulomb crystal has been observed under typical laboratory conditions. Another example was to simulate movement of dust particles in plasmas under microgravity conditions. Fully three-dimensional spherical structures of dust clouds have been observed. For the simulation of a dusty plasma in microgravity with 3x10{supmore » 4} particles, GRAPE-6 can perform the whole operation 1000 times faster than by using a Pentium 4 1.6 GHz processor.« less
Remote creation of hybrid entanglement between particle-like and wave-like optical qubits
NASA Astrophysics Data System (ADS)
Morin, Olivier; Huang, Kun; Liu, Jianli; Le Jeannic, Hanna; Fabre, Claude; Laurat, Julien
2014-07-01
The wave-particle duality of light has led to two different encodings for optical quantum information processing. Several approaches have emerged based either on particle-like discrete-variable states (that is, finite-dimensional quantum systems) or on wave-like continuous-variable states (that is, infinite-dimensional systems). Here, we demonstrate the generation of entanglement between optical qubits of these different types, located at distant places and connected by a lossy channel. Such hybrid entanglement, which is a key resource for a variety of recently proposed schemes, including quantum cryptography and computing, enables information to be converted from one Hilbert space to the other via teleportation and therefore the connection of remote quantum processors based upon different encodings. Beyond its fundamental significance for the exploration of entanglement and its possible instantiations, our optical circuit holds promise for implementations of heterogeneous network, where discrete- and continuous-variable operations and techniques can be efficiently combined.
Online Event Reconstruction in the CBM Experiment at FAIR
NASA Astrophysics Data System (ADS)
Akishina, Valentina; Kisel, Ivan
2018-02-01
Targeting for rare observables, the CBM experiment will operate at high interaction rates of up to 10 MHz, which is unprecedented in heavy-ion experiments so far. It requires a novel free-streaming readout system and a new concept of data processing. The huge data rates of the CBM experiment will be reduced online to the recordable rate before saving the data to the mass storage. Full collision reconstruction and selection will be performed online in a dedicated processor farm. In order to make an efficient event selection online a clean sample of particles has to be provided by the reconstruction package called First Level Event Selection (FLES). The FLES reconstruction and selection package consists of several modules: track finding, track fitting, event building, short-lived particles finding, and event selection. Since detector measurements contain also time information, the event building is done at all stages of the reconstruction process. The input data are distributed within the FLES farm in a form of time-slices. A time-slice is reconstructed in parallel between processor cores. After all tracks of the whole time-slice are found and fitted, they are collected into clusters of tracks originated from common primary vertices, which then are fitted, thus identifying the interaction points. Secondary tracks are associated with primary vertices according to their estimated production time. After that short-lived particles are found and the full event building process is finished. The last stage of the FLES package is a selection of events according to the requested trigger signatures. The event reconstruction procedure and the results of its application to simulated collisions in the CBM detector setup are presented and discussed in detail.
An efficient and portable SIMD algorithm for charge/current deposition in Particle-In-Cell codes
NASA Astrophysics Data System (ADS)
Vincenti, H.; Lobet, M.; Lehe, R.; Sasanka, R.; Vay, J.-L.
2017-01-01
In current computer architectures, data movement (from die to network) is by far the most energy consuming part of an algorithm (≈ 20 pJ/word on-die to ≈10,000 pJ/word on the network). To increase memory locality at the hardware level and reduce energy consumption related to data movement, future exascale computers tend to use many-core processors on each compute nodes that will have a reduced clock speed to allow for efficient cooling. To compensate for frequency decrease, machine vendors are making use of long SIMD instruction registers that are able to process multiple data with one arithmetic operator in one clock cycle. SIMD register length is expected to double every four years. As a consequence, Particle-In-Cell (PIC) codes will have to achieve good vectorization to fully take advantage of these upcoming architectures. In this paper, we present a new algorithm that allows for efficient and portable SIMD vectorization of current/charge deposition routines that are, along with the field gathering routines, among the most time consuming parts of the PIC algorithm. Our new algorithm uses a particular data structure that takes into account memory alignment constraints and avoids gather/scatter instructions that can significantly affect vectorization performances on current CPUs. The new algorithm was successfully implemented in the 3D skeleton PIC code PICSAR and tested on Haswell Xeon processors (AVX2-256 bits wide data registers). Results show a factor of × 2 to × 2.5 speed-up in double precision for particle shape factor of orders 1- 3. The new algorithm can be applied as is on future KNL (Knights Landing) architectures that will include AVX-512 instruction sets with 512 bits register lengths (8 doubles/16 singles).
Jaramillo-Botero, Andres; Su, Julius; Qi, An; Goddard, William A
2011-02-01
We describe the first principle-based electron force field (eFF) methodology for modeling the simultaneous dynamics of electrons and nuclei (eMD) evolving nonadiabatically under transient extreme conditions. We introduce the parallel implementation of eFF (pEFF) that makes it practical to perform simulations of the nonadiabatic dynamics of materials in extreme environments involving millions of nuclei and electrons, over multi-picoseconds time scales, and demonstrate its application to: (i) accurately determine density and predict percent ionization of hydrogen at high pressure (∼61 GPa) and temperatures up to 15,300 K and (ii) determine, the single shock Hugoniot for lithium metal directly from the shock wave kinematics, i.e., mass velocities (U(p) ) and shock wave velocities (U(s) ), and shock density data. For (i), the density and ionization fractions of hydrogen atoms were calculated using the isobaric-isothermal ensemble at an isotropic pressure of 61.4 GPa and for temperatures between 300 K and 15,300 K. The results at 15,300 K describe a molecular fluid with density ρ = 0.36 g/cm(3) , in close agreement with existing experiments and theory, and ∼0.5% ionization. This result provides no indication of the existence of a critical plasma phase-transition point at this particular temperature and pressure, as previously predicted by others. For (ii), the relationship between U(p) and U(s) was characterized to be linear and plastic in the range 1-20 km/s, and the single shock Hugoniot was determined in close agreement with published results for experimentally reported U(p) s. In addition to this, we provide a description of the materials' behavior for large U(p) s in terms of the appearance of a weak metallic plasma phase by U(p) = 10 km/s, with ≃ 8% ionization, gradually transitioning to a denser plasma with an estimated ≃ 35% ionization by U(p) = 15 km/s. Last but not least, we confirm the computational efficiency and scalability of pEFF by comparing its single processor performance against the fastest existing serial code, which results in a linear speedup ∼10× faster for every 16,000 particles in favor of pEFF, and by evaluating its parallel performance in terms of its strong and weak scaling capabilities. Our results, on Los Alamos's Lobo supercomputer (a 38TFLOPSs Linux HPC with Quad-core AMD Opteron nodes interconnected with an Infiniband), show strong scaling with near ideal speedups for loads >62 particles per processor. Weak scaling is shown to be close to linear under the same per-processor load range. As an absolute reference, an NVT run with 2 million particle lithium bulk system (0.5 M nuclei and 1.5 M electrons) on Lobo takes ∼0.44 s/timestep on 1024 processors (∼1 day/ps using an integration timestep of 0.005 fs). Copyright © 2010 Wiley Periodicals, Inc.
Parallelization of MRCI based on hole-particle symmetry.
Suo, Bing; Zhai, Gaohong; Wang, Yubin; Wen, Zhenyi; Hu, Xiangqian; Li, Lemin
2005-01-15
The parallel implementation of multireference configuration interaction program based on the hole-particle symmetry is described. The platform to implement the parallelization is an Intel-Architectural cluster consisting of 12 nodes, each of which is equipped with two 2.4-G XEON processors, 3-GB memory, and 36-GB disk, and are connected by a Gigabit Ethernet Switch. The dependence of speedup on molecular symmetries and task granularities is discussed. Test calculations show that the scaling with the number of nodes is about 1.9 (for C1 and Cs), 1.65 (for C2v), and 1.55 (for D2h) when the number of nodes is doubled. The largest calculation performed on this cluster involves 5.6 x 10(8) CSFs.
NASA Astrophysics Data System (ADS)
Cary, John R.; Abell, D.; Amundson, J.; Bruhwiler, D. L.; Busby, R.; Carlsson, J. A.; Dimitrov, D. A.; Kashdan, E.; Messmer, P.; Nieter, C.; Smithe, D. N.; Spentzouris, P.; Stoltz, P.; Trines, R. M.; Wang, H.; Werner, G. R.
2006-09-01
As the size and cost of particle accelerators escalate, high-performance computing plays an increasingly important role; optimization through accurate, detailed computermodeling increases performance and reduces costs. But consequently, computer simulations face enormous challenges. Early approximation methods, such as expansions in distance from the design orbit, were unable to supply detailed accurate results, such as in the computation of wake fields in complex cavities. Since the advent of message-passing supercomputers with thousands of processors, earlier approximations are no longer necessary, and it is now possible to compute wake fields, the effects of dampers, and self-consistent dynamics in cavities accurately. In this environment, the focus has shifted towards the development and implementation of algorithms that scale to large numbers of processors. So-called charge-conserving algorithms evolve the electromagnetic fields without the need for any global solves (which are difficult to scale up to many processors). Using cut-cell (or embedded) boundaries, these algorithms can simulate the fields in complex accelerator cavities with curved walls. New implicit algorithms, which are stable for any time-step, conserve charge as well, allowing faster simulation of structures with details small compared to the characteristic wavelength. These algorithmic and computational advances have been implemented in the VORPAL7 Framework, a flexible, object-oriented, massively parallel computational application that allows run-time assembly of algorithms and objects, thus composing an application on the fly.
NASA Astrophysics Data System (ADS)
Nitadori, Keigo; Makino, Junichiro; Hut, Piet
2006-12-01
The main performance bottleneck of gravitational N-body codes is the force calculation between two particles. We have succeeded in speeding up this pair-wise force calculation by factors between 2 and 10, depending on the code and the processor on which the code is run. These speed-ups were obtained by writing highly fine-tuned code for x86_64 microprocessors. Any existing N-body code, running on these chips, can easily incorporate our assembly code programs. In the current paper, we present an outline of our overall approach, which we illustrate with one specific example: the use of a Hermite scheme for a direct N2 type integration on a single 2.0 GHz Athlon 64 processor, for which we obtain an effective performance of 4.05 Gflops, for double-precision accuracy. In subsequent papers, we will discuss other variations, including the combinations of N log N codes, single-precision implementations, and performance on other microprocessors.
EPA AirNow Satellite Data Processor (ASDP) for Improving Air Quality Information
NASA Astrophysics Data System (ADS)
White, J. E.; Dickerson, P.; Szykman, J.; Chu, D.; Kondragunta, S.; Zhang, H.; Martin, R. V.; van Donkelaar, A.; Pasch, A. N.; Dye, T. S.; Zahn, P. H.; Haderman, M. D.; DeWinter, J. L.
2012-12-01
The US Environmental Protection Agency (EPA) AirNow program provides Air Quality Index (AQI) information to the public, decision-makers, researchers and the media (data and forecasts) mainly for ozone and PM2.5 (particles smaller than 2.5 μm in median diameter). EPA wants to provide the best information available to the public and integrating NASA satellite-derived surface PM2.5 concentrations with ground-level PM2.5 observations has proved promising. The AirNow Satellite Data Processor (ASDP) uses daily PM2.5 estimates and uncertainties derived from average Aqua and Terra MODerate resolution Imaging Spectrometer (MODIS) AOD in near-real-time over the United States and fuses the results with observed PM2.5 measurements to create several air quality products for evaluation. In addition to the description of the AirNow program and the AirNow ASDP, several case studies will be presented to show the value that NASA satellite information adds to maps of air quality.
Thai University Students' Prior Knowledge about P-Waves Generated during Particle Motion
ERIC Educational Resources Information Center
Rakkapao, Suttida; Arayathanikul, Kwan; Pananont, Passakorn
2009-01-01
The goal of this study is to identify Thai students' prior knowledge about particle motion when P-waves arrive. This existing idea significantly influences what and how students learn in the classroom. The data were collected via conceptual open-ended questions designed by the researchers and through explanatory follow-up interviews. Participants…
Design and Development of a Configurable Fault-Tolerant Processor (CFTP) for Space Applications
2003-06-01
Slot Region BIRA/ IASB Figure 5. Van Allen Belts (After Ref. [20].) 17 The trapped particles in the Van Allen Belts include electrons trapped in...the most sig- nificant role in the failure of electronic equipment in orbit. There exists a wide range of these circuit-crippling events, including...such, the goals of the CFTP project are the most rigorous constraints applied on the design process, and are what will ensure its future role as an
Phase-I investigation of high-efficiency power amplifiers for 325 and 650 MHz
DOE Office of Scientific and Technical Information (OSTI.GOV)
Raab, Frederick
2018-01-27
This Phase-I SBIR grant investigated techniques for high-efficiency power amplification for DoE particle accelerators such as Project X that operate at 325 and 650 MHz. The recommended system achieves high efficiency, high reliability, and hot-swap capability by integrating class-F power amplifiers, class-S modulators, power combiners, and a digital signal processor. Experimental evaluations demonstrate the production of 120 W per transistor with overall efficiencies from 86 percent at 325 MHz and 80 percent at 650 MHz.
The high energy astronomy observatories
NASA Technical Reports Server (NTRS)
Neighbors, A. K.; Doolittle, R. F.; Halpers, R. E.
1977-01-01
The forthcoming NASA project of orbiting High Energy Astronomy Observatories (HEAO's) designed to probe the universe by tracing celestial radiations and particles is outlined. Solutions to engineering problems concerning HEAO's which are integrated, yet built to function independently are discussed, including the onboard digital processor, mirror assembly and the thermal shield. The principle of maximal efficiency with minimal cost and the potential capability of the project to provide explanations to black holes, pulsars and gamma-ray bursts are also stressed. The first satellite is scheduled for launch in April 1977.
Catalytic oxidation for treatment of ECLSS and PMMS waste streams
NASA Technical Reports Server (NTRS)
Akse, James R.; Jolly, Clifford D.
1991-01-01
It is shown that catalytic oxidation is an effective technique for the removal of trace organic contaminants in a multifiltration potable processor's effluent. Essential elements of this technology are devices that deliver oxygen to the influent, and remove gaseous reaction byproducts from the effluent, via hollow-tube, gas-permeable membranes. Iodine, which poisons existing catalysis, is removed by a small deiodination bed prior to catalytic reactor entrance. The catalyst used is a mixture of Pt and Ru deposited on carbon, operating at 125-160 C and 39-90 psi pressures.
Summary of Documentation for DYNA3D-ParaDyn's Software Quality Assurance Regression Test Problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zywicz, Edward
The Software Quality Assurance (SQA) regression test suite for DYNA3D (Zywicz and Lin, 2015) and ParaDyn (DeGroot, et al., 2015) currently contains approximately 600 problems divided into 21 suites, and is a required component of ParaDyn’s SQA plan (Ferencz and Oliver, 2013). The regression suite allows developers to ensure that software modifications do not unintentionally alter the code response. The entire regression suite is run prior to permanently incorporating any software modification or addition. When code modifications alter test problem results, the specific cause must be determined and fully understood before the software changes and revised test answers can bemore » incorporated. The regression suite is executed on LLNL platforms using a Python script and an associated data file. The user specifies the DYNA3D or ParaDyn executable, number of processors to use, test problems to run, and other options to the script. The data file details how each problem and its answer extraction scripts are executed. For each problem in the regression suite there exists an input deck, an eight-processor partition file, an answer file, and various extraction scripts. These scripts assemble a temporary answer file in a specific format from the simulation results. The temporary and stored answer files are compared to a specific level of numerical precision, and when differences are detected the test problem is flagged as failed. Presently, numerical results are stored and compared to 16 digits. At this accuracy level different processor types, compilers, number of partitions, etc. impact the results to various degrees. Thus, for consistency purposes the regression suite is run with ParaDyn using 8 processors on machines with a specific processor type (currently the Intel Xeon E5530 processor). For non-parallel regression problems, i.e., the two XFEM problems, DYNA3D is used instead. When environments or platforms change, executables using the current source code and the new resource are created and the regression suite is run. If differences in answers arise, the new answers are retained provided that the differences are inconsequential. This bootstrap approach allows the test suite answers to evolve in a controlled manner with a high level of confidence. Developers also run the entire regression suite with (serial) DYNA3D. While these results normally differ from the stored (parallel) answers, abnormal termination or wildly different values are strong indicators of potential issues.« less
Challenging prior evidence for a shared syntactic processor for language and music.
Perruchet, Pierre; Poulin-Charronnat, Bénédicte
2013-04-01
A theoretical landmark in the growing literature comparing language and music is the shared syntactic integration resource hypothesis (SSIRH; e.g., Patel, 2008), which posits that the successful processing of linguistic and musical materials relies, at least partially, on the mastery of a common syntactic processor. Supporting the SSIRH, Slevc, Rosenberg, and Patel (Psychonomic Bulletin & Review 16(2):374-381, 2009) recently reported data showing enhanced syntactic garden path effects when the sentences were paired with syntactically unexpected chords, whereas the musical manipulation had no reliable effect on the processing of semantic violations. The present experiment replicated Slevc et al.'s (2009) procedure, except that syntactic garden paths were replaced with semantic garden paths. We observed the very same interactive pattern of results. These findings suggest that the element underpinning interactions is the garden path configuration, rather than the implication of an alleged syntactic module. We suggest that a different amount of attentional resources is recruited to process each type of linguistic manipulations, hence modulating the resources left available for the processing of music and, consequently, the effects of musical violations.
Quality changes in macadamia kernel between harvest and farm-gate.
Walton, David A; Wallace, Helen M
2011-02-01
Macadamia integrifolia, Macadamia tetraphylla and their hybrids are cultivated for their edible kernels. After harvest, nuts-in-shell are partially dried on-farm and sorted to eliminate poor-quality kernels before consignment to a processor. During these operations, kernel quality may be lost. In this study, macadamia nuts-in-shell were sampled at five points of an on-farm postharvest handling chain from dehusking to the final storage silo to assess quality loss prior to consignment. Shoulder damage, weight of pieces and unsound kernel were assessed for raw kernels, and colour, mottled colour and surface damage for roasted kernels. Shoulder damage, weight of pieces and unsound kernel for raw kernels increased significantly between the dehusker and the final silo. Roasted kernels displayed a significant increase in dark colour, mottled colour and surface damage during on-farm handling. Significant loss of macadamia kernel quality occurred on a commercial farm during sorting and storage of nuts-in-shell before nuts were consigned to a processor. Nuts-in-shell should be dried as quickly as possible and on-farm handling minimised to maintain optimum kernel quality. 2010 Society of Chemical Industry.
Efficiency of static core turn-off in a system-on-a-chip with variation
Cher, Chen-Yong; Coteus, Paul W; Gara, Alan; Kursun, Eren; Paulsen, David P; Schuelke, Brian A; Sheets, II, John E; Tian, Shurong
2013-10-29
A processor-implemented method for improving efficiency of a static core turn-off in a multi-core processor with variation, the method comprising: conducting via a simulation a turn-off analysis of the multi-core processor at the multi-core processor's design stage, wherein the turn-off analysis of the multi-core processor at the multi-core processor's design stage includes a first output corresponding to a first multi-core processor core to turn off; conducting a turn-off analysis of the multi-core processor at the multi-core processor's testing stage, wherein the turn-off analysis of the multi-core processor at the multi-core processor's testing stage includes a second output corresponding to a second multi-core processor core to turn off; comparing the first output and the second output to determine if the first output is referring to the same core to turn off as the second output; outputting a third output corresponding to the first multi-core processor core if the first output and the second output are both referring to the same core to turn off.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sreepathi, Sarat; Sripathi, Vamsi; Mills, Richard T
2013-01-01
Inefficient parallel I/O is known to be a major bottleneck among scientific applications employed on supercomputers as the number of processor cores grows into the thousands. Our prior experience indicated that parallel I/O libraries such as HDF5 that rely on MPI-IO do not scale well beyond 10K processor cores, especially on parallel file systems (like Lustre) with single point of resource contention. Our previous optimization efforts for a massively parallel multi-phase and multi-component subsurface simulator (PFLOTRAN) led to a two-phase I/O approach at the application level where a set of designated processes participate in the I/O process by splitting themore » I/O operation into a communication phase and a disk I/O phase. The designated I/O processes are created by splitting the MPI global communicator into multiple sub-communicators. The root process in each sub-communicator is responsible for performing the I/O operations for the entire group and then distributing the data to rest of the group. This approach resulted in over 25X speedup in HDF I/O read performance and 3X speedup in write performance for PFLOTRAN at over 100K processor cores on the ORNL Jaguar supercomputer. This research describes the design and development of a general purpose parallel I/O library, SCORPIO (SCalable block-ORiented Parallel I/O) that incorporates our optimized two-phase I/O approach. The library provides a simplified higher level abstraction to the user, sitting atop existing parallel I/O libraries (such as HDF5) and implements optimized I/O access patterns that can scale on larger number of processors. Performance results with standard benchmark problems and PFLOTRAN indicate that our library is able to maintain the same speedups as before with the added flexibility of being applicable to a wider range of I/O intensive applications.« less
Calibrating thermal behavior of electronics
Chainer, Timothy J.; Parida, Pritish R.; Schultz, Mark D.
2017-07-11
A method includes determining a relationship between indirect thermal data for a processor and a measured temperature associated with the processor, during a calibration process, obtaining the indirect thermal data for the processor during actual operation of the processor, and determining an actual significant temperature associated with the processor during the actual operation using the indirect thermal data for the processor during actual operation of the processor and the relationship.
Calibrating thermal behavior of electronics
Chainer, Timothy J.; Parida, Pritish R.; Schultz, Mark D.
2016-05-31
A method includes determining a relationship between indirect thermal data for a processor and a measured temperature associated with the processor, during a calibration process, obtaining the indirect thermal data for the processor during actual operation of the processor, and determining an actual significant temperature associated with the processor during the actual operation using the indirect thermal data for the processor during actual operation of the processor and the relationship.
Calibrating thermal behavior of electronics
Chainer, Timothy J.; Parida, Pritish R.; Schultz, Mark D.
2017-01-03
A method includes determining a relationship between indirect thermal data for a processor and a measured temperature associated with the processor, during a calibration process, obtaining the indirect thermal data for the processor during actual operation of the processor, and determining an actual significant temperature associated with the processor during the actual operation using the indirect thermal data for the processor during actual operation of the processor and the relationship.
Numerical study of the vortex tube reconnection using vortex particle method on many graphics cards
NASA Astrophysics Data System (ADS)
Kudela, Henryk; Kosior, Andrzej
2014-08-01
Vortex Particle Methods are one of the most convenient ways of tracking the vorticity evolution. In the article we presented numerical recreation of the real life experiment concerning head-on collision of two vortex rings. In the experiment the evolution and reconnection of the vortex structures is tracked with passive markers (paint particles) which in viscous fluid does not follow the evolution of vorticity field. In numerical computations we showed the difference between vorticity evolution and movement of passive markers. The agreement with the experiment was very good. Due to problems with very long time of computations on a single processor the Vortex-in-Cell method was implemented on the multicore architecture of the graphics cards (GPUs). Vortex Particle Methods are very well suited for parallel computations. As there are myriads of particles in the flow and for each of them the same equations of motion have to be solved the SIMD architecture used in GPUs seems to be perfect. The main disadvantage in this case is the small amount of the RAM memory. To overcome this problem we created a multiGPU implementation of the VIC method. Some remarks on parallel computing are given in the article.
Particle displacement tracking applied to air flows
NASA Technical Reports Server (NTRS)
Wernet, Mark P.
1991-01-01
Electronic Particle Image Velocimeter (PIV) techniques offer many advantages over conventional photographic PIV methods such as fast turn around times and simplified data reduction. A new all electronic PIV technique was developed which can measure high speed gas velocities. The Particle Displacement Tracking (PDT) technique employs a single cw laser, small seed particles (1 micron), and a single intensified, gated CCD array frame camera to provide a simple and fast method of obtaining two-dimensional velocity vector maps with unambiguous direction determination. Use of a single CCD camera eliminates registration difficulties encountered when multiple cameras are used to obtain velocity magnitude and direction information. An 80386 PC equipped with a large memory buffer frame-grabber board provides all of the data acquisition and data reduction operations. No array processors of other numerical processing hardware are required. Full video resolution (640x480 pixel) is maintained in the acquired images, providing high resolution video frames of the recorded particle images. The time between data acquisition to display of the velocity vector map is less than 40 sec. The new electronic PDT technique is demonstrated on an air nozzle flow with velocities less than 150 m/s.
The accurate particle tracer code
Wang, Yulei; Liu, Jian; Qin, Hong; ...
2017-07-20
The Accurate Particle Tracer (APT) code is designed for systematic large-scale applications of geometric algorithms for particle dynamical simulations. Based on a large variety of advanced geometric algorithms, APT possesses long-term numerical accuracy and stability, which are critical for solving multi-scale and nonlinear problems. To provide a flexible and convenient I/O interface, the libraries of Lua and Hdf5 are used. Following a three-step procedure, users can efficiently extend the libraries of electromagnetic configurations, external non-electromagnetic forces, particle pushers, and initialization approaches by use of the extendible module. APT has been used in simulations of key physical problems, such as runawaymore » electrons in tokamaks and energetic particles in Van Allen belt. As an important realization, the APT-SW version has been successfully distributed on the world’s fastest computer, the Sunway TaihuLight supercomputer, by supporting master–slave architecture of Sunway many-core processors. Here, based on large-scale simulations of a runaway beam under parameters of the ITER tokamak, it is revealed that the magnetic ripple field can disperse the pitch-angle distribution significantly and improve the confinement of energetic runaway beam on the same time.« less
The accurate particle tracer code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Yulei; Liu, Jian; Qin, Hong
The Accurate Particle Tracer (APT) code is designed for systematic large-scale applications of geometric algorithms for particle dynamical simulations. Based on a large variety of advanced geometric algorithms, APT possesses long-term numerical accuracy and stability, which are critical for solving multi-scale and nonlinear problems. To provide a flexible and convenient I/O interface, the libraries of Lua and Hdf5 are used. Following a three-step procedure, users can efficiently extend the libraries of electromagnetic configurations, external non-electromagnetic forces, particle pushers, and initialization approaches by use of the extendible module. APT has been used in simulations of key physical problems, such as runawaymore » electrons in tokamaks and energetic particles in Van Allen belt. As an important realization, the APT-SW version has been successfully distributed on the world’s fastest computer, the Sunway TaihuLight supercomputer, by supporting master–slave architecture of Sunway many-core processors. Here, based on large-scale simulations of a runaway beam under parameters of the ITER tokamak, it is revealed that the magnetic ripple field can disperse the pitch-angle distribution significantly and improve the confinement of energetic runaway beam on the same time.« less
Influence of hydrothermal synthesis parameters on the properties of hydroxyapatite nanoparticles.
Kuśnieruk, Sylwia; Wojnarowicz, Jacek; Chodara, Agnieszka; Chudoba, Tadeusz; Gierlotka, Stanislaw; Lojkowski, Witold
2016-01-01
Hydroxyapatite (HAp) nanoparticles of tunable diameter were obtained by the precipitation method at room temperature and by microwave hydrothermal synthesis (MHS). The following parameters of the obtained nanostructured HAp were determined: pycnometric density, specific surface area, phase purity, lattice parameters, particle size, particle size distribution, water content, and structure. HAp nanoparticle morphology and structure were determined using scanning electron microscopy (SEM) and transmission electron microscopy (TEM). X-ray diffraction measurements confirmed crystalline HAp was synthesized, which was pure in terms of phase. It was shown that by changing the synthesis parameters, the diameter of HAp nanoparticles could be controlled. The average diameter of the HAp nanoparticles was determined by Scherrer's equation via the Nanopowder XRD Processor Demo web application, which interprets the results of specific surface area and TEM measurements using the dark-field technique. The obtained nanoparticles with average particle diameter ranging from 8-39 nm were characterized by having homogeneous morphology with a needle shape and a narrow particle size distribution. Strong similarities were found when comparing the properties of some types of nanostructured hydroxyapatite with natural occurring apatite found in animal bones and teeth.
Development of a multifunctional particle spectrometer for space radiation imaging
NASA Astrophysics Data System (ADS)
Maddox, Erik; Palacios, Alex; Lampridis, Dimitris; Kraft, Stefan; Owens, Alan; Tomuta, Dana; Ostendorf, Reint
2008-06-01
For future exploration of the solar system, the European Space Agency (ESA) is planning missions to Mercury (BepiColombo), the Sun (SolarOrbiter) and to the moons of Jupiter and Saturn. The expected intensity of radiation during such missions is hazardous for the scientific instruments and the satellite. To extend the lifetime of the satellite and its payload a multifunctional particle spectrometer (MPS) is being developed. The basic function of the MPS is to send an alarm signal to the satellite control system during periods of high radiation. In addition the MPS is a scientific instrument that will unfold the composition of the different contributing particles on-line by the dE/dx versus E method. The energy spectrum and angular distribution of the particles will be recorded as well. This article describes the main requirements and the base line design for the MPS. A readout scheme consisting of a 32 channel ASIC from IDEAS is proposed and the signal filtering algorithm will run on a digital signal processor based on FPGA technology. Results are shown from prototype calibration studies with a proton beam.
A Distributed Prognostic Health Management Architecture
NASA Technical Reports Server (NTRS)
Bhaskar, Saha; Saha, Sankalita; Goebel, Kai
2009-01-01
This paper introduces a generic distributed prognostic health management (PHM) architecture with specific application to the electrical power systems domain. Current state-of-the-art PHM systems are mostly centralized in nature, where all the processing is reliant on a single processor. This can lead to loss of functionality in case of a crash of the central processor or monitor. Furthermore, with increases in the volume of sensor data as well as the complexity of algorithms, traditional centralized systems become unsuitable for successful deployment, and efficient distributed architectures are required. A distributed architecture though, is not effective unless there is an algorithmic framework to take advantage of its unique abilities. The health management paradigm envisaged here incorporates a heterogeneous set of system components monitored by a varied suite of sensors and a particle filtering (PF) framework that has the power and the flexibility to adapt to the different diagnostic and prognostic needs. Both the diagnostic and prognostic tasks are formulated as a particle filtering problem in order to explicitly represent and manage uncertainties; however, typically the complexity of the prognostic routine is higher than the computational power of one computational element ( CE). Individual CEs run diagnostic routines until the system variable being monitored crosses beyond a nominal threshold, upon which it coordinates with other networked CEs to run the prognostic routine in a distributed fashion. Implementation results from a network of distributed embedded devices monitoring a prototypical aircraft electrical power system are presented, where the CEs are Sun Microsystems Small Programmable Object Technology (SPOT) devices.
A Radiation Dosimeter Concept for the Lunar Surface Environment
NASA Technical Reports Server (NTRS)
Adams, James H.; Christl, Mark J.; Watts, John; Kuznetsov, Eugeny N.; Parnell, Thomas A.; Pendleton, Geoff N.
2007-01-01
A novel silicon detector configuration for radiation dose measurements in an environment where solar energetic particles are of most concern is described. The dosimeter would also measure the dose from galactic cosmic rays. In the lunar environment a large range in particle flux and ionization density must be measured and converted to dose equivalent. This could be accomplished with a thick (e.g. 2mm) silicon detector segmented into cubic volume elements "voxels" followed by a second, thin monolithic silicon detector. The electronics needed to implement this detector concept include analog signal processors (ASIC) and a field programmable gate array (FPGA) for data accumulation and conversion to linear energy transfer (LET) spectra and to dose-equivalent (Sievert). Currently available commercial ASIC's and FPGA's are suitable for implementing the analog and digital systems.
Graphics Processing Unit Acceleration of Gyrokinetic Turbulence Simulations
NASA Astrophysics Data System (ADS)
Hause, Benjamin; Parker, Scott
2012-10-01
We find a substantial increase in on-node performance using Graphics Processing Unit (GPU) acceleration in gyrokinetic delta-f particle-in-cell simulation. Optimization is performed on a two-dimensional slab gyrokinetic particle simulation using the Portland Group Fortran compiler with the GPU accelerator compiler directives. We have implemented the GPU acceleration on a Core I7 gaming PC with a NVIDIA GTX 580 GPU. We find comparable, or better, acceleration relative to the NERSC DIRAC cluster with the NVIDIA Tesla C2050 computing processor. The Tesla C 2050 is about 2.6 times more expensive than the GTX 580 gaming GPU. Optimization strategies and comparisons between DIRAC and the gaming PC will be presented. We will also discuss progress on optimizing the comprehensive three dimensional general geometry GEM code.
Adaptation in pronoun resolution: Evidence from Brazilian and European Portuguese.
Fernandes, Eunice G; Luegi, Paula; Correa Soares, Eduardo; de la Fuente, Israel; Hemforth, Barbara
2018-04-26
Previous research accounting for pronoun resolution as a problem of probabilistic inference has not explored the phenomenon of adaptation, whereby the processor constantly tracks and adapts, rationally, to changes in a statistical environment. We investigate whether Brazilian (BP) and European Portuguese (EP) speakers adapt to variations in the probability of occurrence of ambiguous overt and null pronouns, in two experiments assessing resolution toward subject and object referents. For each variety (BP, EP), participants were faced with either the same number of null and overt pronouns (equal distribution), or with an environment with fewer overt (than null) pronouns (unequal distribution). We find that the preference for interpreting overt pronouns as referring back to an object referent (object-biased interpretation) is higher when there are fewer overt pronouns (i.e., in the unequal, relative to the equal distribution condition). This is especially the case for BP, a variety with higher prior frequency and smaller object-biased interpretation of overt pronouns, suggesting that participants adapted incrementally and integrated prior statistical knowledge with the knowledge obtained in the experiment. We hypothesize that comprehenders adapted rationally, with the goal of maintaining, across variations in pronoun probability, the likelihood of subject and object referents. Our findings unify insights from research in pronoun resolution and in adaptation, and add to previous studies in both topics: They provide evidence for the influence of pronoun probability in pronoun resolution, and for an adaptation process whereby the language processor not only tracks statistical information, but uses it to make interpretational inferences. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, W Michael; Kohlmeyer, Axel; Plimpton, Steven J
The use of accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, machines with nodes containing more than one type of floating-point processor (e.g. CPU and GPU), are now becoming more prevalent due to these advantages. In this paper, we present a continuation of previous work implementing algorithms for using accelerators into the LAMMPS molecular dynamics software for distributed memory parallel hybrid machines. In our previous work, we focused on acceleration for short-range models with anmore » approach intended to harness the processing power of both the accelerator and (multi-core) CPUs. To augment the existing implementations, we present an efficient implementation of long-range electrostatic force calculation for molecular dynamics. Specifically, we present an implementation of the particle-particle particle-mesh method based on the work by Harvey and De Fabritiis. We present benchmark results on the Keeneland InfiniBand GPU cluster. We provide a performance comparison of the same kernels compiled with both CUDA and OpenCL. We discuss limitations to parallel efficiency and future directions for improving performance on hybrid or heterogeneous computers.« less
Electrostatic removal of airborne particulates employing fiber beds
Postma, Arlin Keith; Winegardner, W. Kevin
1977-01-01
A method and apparatus for collecting aerosol particles. The particles are subjected to an electrostatic charge prior to collection in an electrically resistive fiber bed. The method is applicable to particles in a broad size range, including the difficult-to-remove particles having diameters between 0.01 and 2 microns.
Symplectic multi-particle tracking on GPUs
NASA Astrophysics Data System (ADS)
Liu, Zhicong; Qiang, Ji
2018-05-01
A symplectic multi-particle tracking model is implemented on the Graphic Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) language. The symplectic tracking model can preserve phase space structure and reduce non-physical effects in long term simulation, which is important for beam property evaluation in particle accelerators. Though this model is computationally expensive, it is very suitable for parallelization and can be accelerated significantly by using GPUs. In this paper, we optimized the implementation of the symplectic tracking model on both single GPU and multiple GPUs. Using a single GPU processor, the code achieves a factor of 2-10 speedup for a range of problem sizes compared with the time on a single state-of-the-art Central Processing Unit (CPU) node with similar power consumption and semiconductor technology. It also shows good scalability on a multi-GPU cluster at Oak Ridge Leadership Computing Facility. In an application to beam dynamics simulation, the GPU implementation helps save more than a factor of two total computing time in comparison to the CPU implementation.
Data decomposition of Monte Carlo particle transport simulations via tally servers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Romano, Paul K.; Siegel, Andrew R.; Forget, Benoit
An algorithm for decomposing large tally data in Monte Carlo particle transport simulations is developed, analyzed, and implemented in a continuous-energy Monte Carlo code, OpenMC. The algorithm is based on a non-overlapping decomposition of compute nodes into tracking processors and tally servers. The former are used to simulate the movement of particles through the domain while the latter continuously receive and update tally data. A performance model for this approach is developed, suggesting that, for a range of parameters relevant to LWR analysis, the tally server algorithm should perform with minimal overhead on contemporary supercomputers. An implementation of the algorithmmore » in OpenMC is then tested on the Intrepid and Titan supercomputers, supporting the key predictions of the model over a wide range of parameters. We thus conclude that the tally server algorithm is a successful approach to circumventing classical on-node memory constraints en route to unprecedentedly detailed Monte Carlo reactor simulations.« less
Methods and systems for providing reconfigurable and recoverable computing resources
NASA Technical Reports Server (NTRS)
Stange, Kent (Inventor); Hess, Richard (Inventor); Kelley, Gerald B (Inventor); Rogers, Randy (Inventor)
2010-01-01
A method for optimizing the use of digital computing resources to achieve reliability and availability of the computing resources is disclosed. The method comprises providing one or more processors with a recovery mechanism, the one or more processors executing one or more applications. A determination is made whether the one or more processors needs to be reconfigured. A rapid recovery is employed to reconfigure the one or more processors when needed. A computing system that provides reconfigurable and recoverable computing resources is also disclosed. The system comprises one or more processors with a recovery mechanism, with the one or more processors configured to execute a first application, and an additional processor configured to execute a second application different than the first application. The additional processor is reconfigurable with rapid recovery such that the additional processor can execute the first application when one of the one more processors fails.
Chlorine disinfection of grey water for reuse: effect of organics and particles.
Winward, Gideon P; Avery, Lisa M; Stephenson, Tom; Jefferson, Bruce
2008-01-01
Adequate disinfection of grey water prior to reuse is important to prevent the potential transmission of disease-causing microorganisms. Chlorine is a widely utilised disinfectant and as such is a leading contender for disinfection of grey water intended for reuse. This study examined the impact of organics and particles on chlorine disinfection of grey water, measured by total coliform inactivation. The efficacy of disinfection was most closely linked with particle size. Larger particles shielded total coliforms from inactivation and disinfection efficacy decreased with increasing particle size. Blending to extract particle-associated coliforms (PACs) following chlorine disinfection revealed that up to 91% of total coliforms in chlorinated grey water were particle associated. The organic concentration of grey water affected chlorine demand but did not influence the disinfection resistance of total coliforms when a free chlorine residual was maintained. Implications for urban water reuse are discussed and it is recommended that grey water treatment systems target suspended solids removal to ensure removal of PACs prior to disinfection.
THOR Fields and Wave Processor - FWP
NASA Astrophysics Data System (ADS)
Soucek, Jan; Rothkaehl, Hanna; Ahlen, Lennart; Balikhin, Michael; Carr, Christopher; Dekkali, Moustapha; Khotyaintsev, Yuri; Lan, Radek; Magnes, Werner; Morawski, Marek; Nakamura, Rumi; Uhlir, Ludek; Yearby, Keith; Winkler, Marek; Zaslavsky, Arnaud
2017-04-01
If selected, Turbulence Heating ObserveR (THOR) will become the first spacecraft mission dedicated to the study of plasma turbulence. The Fields and Waves Processor (FWP) is an integrated electronics unit for all electromagnetic field measurements performed by THOR. FWP will interface with all THOR fields sensors: electric field antennas of the EFI instrument, the MAG fluxgate magnetometer, and search-coil magnetometer (SCM), and perform signal digitization and on-board data processing. FWP box will house multiple data acquisition sub-units and signal analyzers all sharing a common power supply and data processing unit and thus a single data and power interface to the spacecraft. Integrating all the electromagnetic field measurements in a single unit will improve the consistency of field measurement and accuracy of time synchronization. The scientific value of highly sensitive electric and magnetic field measurements in space has been demonstrated by Cluster (among other spacecraft) and THOR instrumentation will further improve on this heritage. Large dynamic range of the instruments will be complemented by a thorough electromagnetic cleanliness program, which will prevent perturbation of field measurements by interference from payload and platform subsystems. Taking advantage of the capabilities of modern electronics and the large telemetry bandwidth of THOR, FWP will provide multi-component electromagnetic field waveforms and spectral data products at a high time resolution. Fully synchronized sampling of many signals will allow to resolve wave phase information and estimate wavelength via interferometric correlations between EFI probes. FWP will also implement a plasma resonance sounder and a digital plasma quasi-thermal noise analyzer designed to provide high cadence measurements of plasma density and temperature complementary to data from particle instruments. FWP will rapidly transmit information about magnetic field vector and spacecraft potential to the particle instrument data processing unit (PPU) via a dedicated digital link. This information will help particle instruments to optimize energy and angular sweeps and calculate on-board moments. FWP will also coordinate the acquisition of high resolution waveform snapshots with very high time resolution electron data from the TEA instrument. This combined wave/particle measurement will provide the ultimate dataset for investigation of wave-particle interactions on electron scales. The FWP instrument shall be designed and built by an international consortium of scientific institutes from Czech Republic, Poland, France, UK, Sweden and Austria.
Rectangular Array Of Digital Processors For Planning Paths
NASA Technical Reports Server (NTRS)
Kemeny, Sabrina E.; Fossum, Eric R.; Nixon, Robert H.
1993-01-01
Prototype 24 x 25 rectangular array of asynchronous parallel digital processors rapidly finds best path across two-dimensional field, which could be patch of terrain traversed by robotic or military vehicle. Implemented as single-chip very-large-scale integrated circuit. Excepting processors on edges, each processor communicates with four nearest neighbors along paths representing travel to north, south, east, and west. Each processor contains delay generator in form of 8-bit ripple counter, preset to 1 of 256 possible values. Operation begins with choice of processor representing starting point. Transmits signals to nearest neighbor processors, which retransmits to other neighboring processors, and process repeats until signals propagated across entire field.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, William Michael; Plimpton, Steven James; Wang, Peng
2010-03-01
LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. LAMMPS has potentials for soft materials (biomolecules, polymers) and solid-state materials (metals, semiconductors) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale. LAMMPS runs on single processors or in parallel using message-passing techniques and a spatial-decomposition of the simulation domain. The code is designed to be easy to modify or extend with new functionality.
Buffered coscheduling for parallel programming and enhanced fault tolerance
Petrini, Fabrizio [Los Alamos, NM; Feng, Wu-chun [Los Alamos, NM
2006-01-31
A computer implemented method schedules processor jobs on a network of parallel machine processors or distributed system processors. Control information communications generated by each process performed by each processor during a defined time interval is accumulated in buffers, where adjacent time intervals are separated by strobe intervals for a global exchange of control information. A global exchange of the control information communications at the end of each defined time interval is performed during an intervening strobe interval so that each processor is informed by all of the other processors of the number of incoming jobs to be received by each processor in a subsequent time interval. The buffered coscheduling method of this invention also enhances the fault tolerance of a network of parallel machine processors or distributed system processors
NASA Technical Reports Server (NTRS)
Seale, R. H.
1979-01-01
The prediction of the SRB and ET impact areas requires six separate processors. The SRB impact prediction processor computes the impact areas and related trajectory data for each SRB element. Output from this processor is stored on a secure file accessible by the SRB impact plot processor which generates the required plots. Similarly the ET RTLS impact prediction processor and the ET RTLS impact plot processor generates the ET impact footprints for return-to-launch-site (RTLS) profiles. The ET nominal/AOA/ATO impact prediction processor and the ET nominal/AOA/ATO impact plot processor generate the ET impact footprints for non-RTLS profiles. The SRB and ET impact processors compute the size and shape of the impact footprints by tabular lookup in a stored footprint dispersion data base. The location of each footprint is determined by simulating a reference trajectory and computing the reference impact point location. To insure consistency among all flight design system (FDS) users, much input required by these processors will be obtained from the FDS master data base.
Broadband Processing in a Noisy Shallow Ocean Environment: A Particle Filtering Approach
Candy, J. V.
2016-04-14
Here we report that when a broadband source propagates sound in a shallow ocean the received data can become quite complicated due to temperature-related sound-speed variations and therefore a highly dispersive environment. Noise and uncertainties disrupt this already chaotic environment even further because disturbances propagate through the same inherent acoustic channel. The broadband (signal) estimation/detection problem can be decomposed into a set of narrowband solutions that are processed separately and then combined to achieve more enhancement of signal levels than that available from a single frequency, thereby allowing more information to be extracted leading to a more reliable source detection.more » A Bayesian solution to the broadband modal function tracking, pressure-field enhancement, and source detection problem is developed that leads to nonparametric estimates of desired posterior distributions enabling the estimation of useful statistics and an improved processor/detector. In conclusion, to investigate the processor capabilities, we synthesize an ensemble of noisy, broadband, shallow-ocean measurements to evaluate its overall performance using an information theoretical metric for the preprocessor and the receiver operating characteristic curve for the detector.« less
2014-04-17
measured with an infrared pyrometer (550-3200°C). The substrates were coated with diamond nanoparticles (ITC Inc.) which serve as nucleation sites...wafers were seeded with nano-diamond particles prior to film growth to provide nucleation sites for diamond growth. To study the effect of surface...wafers are appropriate to generate uniform seeding. AFM tips were seeded with nano-diamond particles prior to coating with NCD to provide nucleation
TOPICAL REVIEW: Advances and challenges in computational plasma science
NASA Astrophysics Data System (ADS)
Tang, W. M.; Chan, V. S.
2005-02-01
Scientific simulation, which provides a natural bridge between theory and experiment, is an essential tool for understanding complex plasma behaviour. Recent advances in simulations of magnetically confined plasmas are reviewed in this paper, with illustrative examples, chosen from associated research areas such as microturbulence, magnetohydrodynamics and other topics. Progress has been stimulated, in particular, by the exponential growth of computer speed along with significant improvements in computer technology. The advances in both particle and fluid simulations of fine-scale turbulence and large-scale dynamics have produced increasingly good agreement between experimental observations and computational modelling. This was enabled by two key factors: (a) innovative advances in analytic and computational methods for developing reduced descriptions of physics phenomena spanning widely disparate temporal and spatial scales and (b) access to powerful new computational resources. Excellent progress has been made in developing codes for which computer run-time and problem-size scale well with the number of processors on massively parallel processors (MPPs). Examples include the effective usage of the full power of multi-teraflop (multi-trillion floating point computations per second) MPPs to produce three-dimensional, general geometry, nonlinear particle simulations that have accelerated advances in understanding the nature of turbulence self-regulation by zonal flows. These calculations, which typically utilized billions of particles for thousands of time-steps, would not have been possible without access to powerful present generation MPP computers and the associated diagnostic and visualization capabilities. In looking towards the future, the current results from advanced simulations provide great encouragement for being able to include increasingly realistic dynamics to enable deeper physics insights into plasmas in both natural and laboratory environments. This should produce the scientific excitement which will help to (a) stimulate enhanced cross-cutting collaborations with other fields and (b) attract the bright young talent needed for the future health of the field of plasma science.
Advances and challenges in computational plasma science
NASA Astrophysics Data System (ADS)
Tang, W. M.
2005-02-01
Scientific simulation, which provides a natural bridge between theory and experiment, is an essential tool for understanding complex plasma behaviour. Recent advances in simulations of magnetically confined plasmas are reviewed in this paper, with illustrative examples, chosen from associated research areas such as microturbulence, magnetohydrodynamics and other topics. Progress has been stimulated, in particular, by the exponential growth of computer speed along with significant improvements in computer technology. The advances in both particle and fluid simulations of fine-scale turbulence and large-scale dynamics have produced increasingly good agreement between experimental observations and computational modelling. This was enabled by two key factors: (a) innovative advances in analytic and computational methods for developing reduced descriptions of physics phenomena spanning widely disparate temporal and spatial scales and (b) access to powerful new computational resources. Excellent progress has been made in developing codes for which computer run-time and problem-size scale well with the number of processors on massively parallel processors (MPPs). Examples include the effective usage of the full power of multi-teraflop (multi-trillion floating point computations per second) MPPs to produce three-dimensional, general geometry, nonlinear particle simulations that have accelerated advances in understanding the nature of turbulence self-regulation by zonal flows. These calculations, which typically utilized billions of particles for thousands of time-steps, would not have been possible without access to powerful present generation MPP computers and the associated diagnostic and visualization capabilities. In looking towards the future, the current results from advanced simulations provide great encouragement for being able to include increasingly realistic dynamics to enable deeper physics insights into plasmas in both natural and laboratory environments. This should produce the scientific excitement which will help to (a) stimulate enhanced cross-cutting collaborations with other fields and (b) attract the bright young talent needed for the future health of the field of plasma science.
Video Guidance Sensor and Time-of-Flight Rangefinder
NASA Technical Reports Server (NTRS)
Bryan, Thomas; Howard, Richard; Bell, Joseph L.; Roe, Fred D.; Book, Michael L.
2007-01-01
A proposed video guidance sensor (VGS) would be based mostly on the hardware and software of a prior Advanced VGS (AVGS), with some additions to enable it to function as a time-of-flight rangefinder (in contradistinction to a triangulation or image-processing rangefinder). It would typically be used at distances of the order of 2 or 3 kilometers, where a typical target would appear in a video image as a single blob, making it possible to extract the direction to the target (but not the orientation of the target or the distance to the target) from a video image of light reflected from the target. As described in several previous NASA Tech Briefs articles, an AVGS system is an optoelectronic system that provides guidance for automated docking of two vehicles. In the original application, the two vehicles are spacecraft, but the basic principles of design and operation of the system are applicable to aircraft, robots, objects maneuvered by cranes, or other objects that may be required to be aligned and brought together automatically or under remote control. In a prior AVGS system of the type upon which the now-proposed VGS is largely based, the tracked vehicle is equipped with one or more passive targets that reflect light from one or more continuous-wave laser diode(s) on the tracking vehicle, a video camera on the tracking vehicle acquires images of the targets in the reflected laser light, the video images are digitized, and the image data are processed to obtain the direction to the target. The design concept of the proposed VGS does not call for any memory or processor hardware beyond that already present in the prior AVGS, but does call for some additional hardware and some additional software. It also calls for assignment of some additional tasks to two subsystems that are parts of the prior VGS: a field-programmable gate array (FPGA) that generates timing and control signals, and a digital signal processor (DSP) that processes the digitized video images. The additional timing and control signals generated by the FPGA would cause the VGS to alternate between an imaging (direction-finding) mode and a time-of-flight (range-finding mode) and would govern operation in the range-finding mode.
The artificial retina for track reconstruction at the LHC crossing rate
NASA Astrophysics Data System (ADS)
Abba, A.; Bedeschi, F.; Citterio, M.; Caponio, F.; Cusimano, A.; Geraci, A.; Marino, P.; Morello, M. J.; Neri, N.; Punzi, G.; Piucci, A.; Ristori, L.; Spinella, F.; Stracka, S.; Tonelli, D.
2016-04-01
We present the results of an R&D study for a specialized processor capable of precisely reconstructing events with hundreds of charged-particle tracks in pixel and silicon strip detectors at 40 MHz, thus suitable for processing LHC events at the full crossing frequency. For this purpose we design and test a massively parallel pattern-recognition algorithm, inspired to the current understanding of the mechanisms adopted by the primary visual cortex of mammals in the early stages of visual-information processing. The detailed geometry and charged-particle's activity of a large tracking detector are simulated and used to assess the performance of the artificial retina algorithm. We find that high-quality tracking in large detectors is possible with sub-microsecond latencies when the algorithm is implemented in modern, high-speed, high-bandwidth FPGA devices.
End-to-end test of the electron-proton spectrometer
NASA Technical Reports Server (NTRS)
Cash, B. L.
1972-01-01
A series of end-to-end tests were performed to demonstrate the proper functioning of the complete Electron-Proton Spectrometer (EPS). The purpose of the tests was to provide experimental verification of the design and to provide a complete functional performance check of the instrument from the excitation of the sensors to and including the data processor and equipment test set. Each of the channels of the EPS was exposed to a calibrated beam of energetic particles, and counts were accumulated for a predetermined period of time for each of several energies. The counts were related to the known flux of particles to give a monodirectional response function for each channel. The measured response function of the test unit was compared to the response function determined for the calibration sensors from the data taken from the calibration program.
Coding, testing and documentation of processors for the flight design system
NASA Technical Reports Server (NTRS)
1980-01-01
The general functional design and implementation of processors for a space flight design system are briefly described. Discussions of a basetime initialization processor; conic, analytical, and precision coasting flight processors; and an orbit lifetime processor are included. The functions of several utility routines are also discussed.
The computational structural mechanics testbed generic structural-element processor manual
NASA Technical Reports Server (NTRS)
Stanley, Gary M.; Nour-Omid, Shahram
1990-01-01
The usage and development of structural finite element processors based on the CSM Testbed's Generic Element Processor (GEP) template is documented. By convention, such processors have names of the form ESi, where i is an integer. This manual is therefore intended for both Testbed users who wish to invoke ES processors during the course of a structural analysis, and Testbed developers who wish to construct new element processors (or modify existing ones).
NASA Technical Reports Server (NTRS)
Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)
1994-01-01
In a computer having a large number of single-instruction multiple data (SIMD) processors, each of the SIMD processors has two sets of three individual processor elements controlled by a master control unit and interconnected among a plurality of register file units where data is stored. The register files input and output data in synchronism with a minor cycle clock under control of two slave control units controlling the register file units connected to respective ones of the two sets of processor elements. Depending upon which ones of the register file units are enabled to store or transmit data during a particular minor clock cycle, the processor elements within an SIMD processor are connected in rings or in pipeline arrays, and may exchange data with the internal bus or with neighboring SIMD processors through interface units controlled by respective ones of the two slave control units.
Karasick, Michael S.; Strip, David R.
1996-01-01
A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modelling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modelling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modelling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication.
Switch for serial or parallel communication networks
Crosette, D.B.
1994-07-19
A communication switch apparatus and a method for use in a geographically extensive serial, parallel or hybrid communication network linking a multi-processor or parallel processing system has a very low software processing overhead in order to accommodate random burst of high density data. Associated with each processor is a communication switch. A data source and a data destination, a sensor suite or robot for example, may also be associated with a switch. The configuration of the switches in the network are coordinated through a master processor node and depends on the operational phase of the multi-processor network: data acquisition, data processing, and data exchange. The master processor node passes information on the state to be assumed by each switch to the processor node associated with the switch. The processor node then operates a series of multi-state switches internal to each communication switch. The communication switch does not parse and interpret communication protocol and message routing information. During a data acquisition phase, the communication switch couples sensors producing data to the processor node associated with the switch, to a downlink destination on the communications network, or to both. It also may couple an uplink data source to its processor node. During the data exchange phase, the switch couples its processor node or an uplink data source to a downlink destination (which may include a processor node or a robot), or couples an uplink source to its processor node and its processor node to a downlink destination. 9 figs.
Switch for serial or parallel communication networks
Crosette, Dario B.
1994-01-01
A communication switch apparatus and a method for use in a geographically extensive serial, parallel or hybrid communication network linking a multi-processor or parallel processing system has a very low software processing overhead in order to accommodate random burst of high density data. Associated with each processor is a communication switch. A data source and a data destination, a sensor suite or robot for example, may also be associated with a switch. The configuration of the switches in the network are coordinated through a master processor node and depends on the operational phase of the multi-processor network: data acquisition, data processing, and data exchange. The master processor node passes information on the state to be assumed by each switch to the processor node associated with the switch. The processor node then operates a series of multi-state switches internal to each communication switch. The communication switch does not parse and interpret communication protocol and message routing information. During a data acquisition phase, the communication switch couples sensors producing data to the processor node associated with the switch, to a downlink destination on the communications network, or to both. It also may couple an uplink data source to its processor node. During the data exchange phase, the switch couples its processor node or an uplink data source to a downlink destination (which may include a processor node or a robot), or couples an uplink source to its processor node and its processor node to a downlink destination.
Abrasion resistant composition
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fischer, Keith D; Barnes, Christopher A; Henderson, Stephen L
A surface covering composition of abrasion resistant character adapted for disposition in overlying bonded relation to a metal substrate. The surface covering composition includes metal carbide particles within a metal matrix at a packing factor of not less than about 0.6. Not less than about 40 percent by weight of the metal carbide particles are characterized by an effective diameter in the range of +14-32 mesh prior to introduction to the metal matrix. Not less than about 3 percent by weight of the metal carbide particles are characterized by an effective diameter of +60 mesh prior to introduction to themore » metal matrix.« less
Single Event Transients in Linear Integrated Circuits
NASA Technical Reports Server (NTRS)
Buchner, Stephen; McMorrow, Dale
2005-01-01
On November 5, 2001, a processor reset occurred on board the Microwave Anisotropy Probe (MAP), a NASA mission to measure the anisotropy of the microwave radiation left over from the Big Bang. The reset caused the spacecraft to enter a safehold mode from which it took several days to recover. Were that to happen regularly, the entire mission would be compromised, so it was important to find the cause of the reset and, if possible, to mitigate it. NASA assembled a team of engineers that included experts in radiation effects to tackle the problem. The first clue was the observation that the processor reset occurred during a solar event characterized by large increases in the proton and heavy ion fluxes emitted by the sun. To the radiation effects engineers on the team, this strongly suggested that particle radiation might be the culprit, particularly when it was discovered that the reset circuit contained three voltage comparators (LM139). Previous testing revealed that large voltage transients, or glitches appeared at the output of the LM139 when it was exposed to a beam of heavy ions [NI96]. The function of the reset circuit was to monitor the supply voltage and to issue a reset command to the processor should the voltage fall below a reference of 2.5 V [PO02]. Eventually, the team of engineers concluded that ionizing particle radiation from the solar event produced a negative voltage transient on the output of one of the LM139s sufficiently large to reset the processor on MAP. Fortunately, as of the end of 2004, only two such resets have occurred. The reset on MAP was not the first malfunction on a spacecraft attributed to a transient. That occurred shortly after the launch of NASA s TOPEX/Poseidon satellite in 1992. It was suspected, and later confirmed, that an anomaly in the Earth Sensor was caused by a transient in an operational amplifier (OP-15) [KO93]. Over the next few years, problems on TDRS, CASSINI, [PR02] SOHO [HA99,HA01] and TERRA were also attributed to transients. In some cases, such events produced resets by falsely triggering circuits designed to protect against over- voltage or over-current. On at least three occasions, transients caused satellites to switch into "safe mode" in which most of the systems on board the satellites were powered down for an extended period. By the time the satellites were reconfigured and returned to full operational state, much scientific data had been lost. Fortunately, no permanent damage occurred in any of the systems and they were all successfully re-activated.
Conditions for space invariance in optical data processors used with coherent or noncoherent light.
Arsenault, H R
1972-10-01
The conditions for space invariance in coherent and noncoherent optical processors are considered. All linear optical processors are shown to belong to one of two types. The conditions for space invariance are more stringent for noncoherent processors than for coherent processors, so that a system that is linear in coherent light may be nonlinear in noncoherent light. However, any processor that is linear in noncoherent light is also linear in the coherent limit.
Distributed multitasking ITS with PVM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fan, W.C.; Halbleib, J.A. Sr.
1995-12-31
Advances in computer hardware and communication software have made it possible to perform parallel-processing computing on a collection of desktop workstations. For many applications, multitasking on a cluster of high-performance workstations has achieved performance comparable to or better than that on a traditional supercomputer. From the point of view of cost-effectiveness, it also allows users to exploit available but unused computational resources and thus achieve a higher performance-to-cost ratio. Monte Carlo calculations are inherently parallelizable because the individual particle trajectories can be generated independently with minimum need for interprocessor communication. Furthermore, the number of particle histories that can be generatedmore » in a given amount of wall-clock time is nearly proportional to the number of processors in the cluster. This is an important fact because the inherent statistical uncertainty in any Monte Carlo result decreases as the number of histories increases. For these reasons, researchers have expended considerable effort to take advantage of different parallel architectures for a variety of Monte Carlo radiation transport codes, often with excellent results. The initial interest in this work was sparked by the multitasking capability of the MCNP code on a cluster of workstations using the Parallel Virtual Machine (PVM) software. On a 16-machine IBM RS/6000 cluster, it has been demonstrated that MCNP runs ten times as fast as on a single-processor CRAY YMP. In this paper, we summarize the implementation of a similar multitasking capability for the coupled electronphoton transport code system, the Integrated TIGER Series (ITS), and the evaluation of two load-balancing schemes for homogeneous and heterogeneous networks.« less
Distributed multitasking ITS with PVM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fan, W.C.; Halbleib, J.A. Sr.
1995-02-01
Advances of computer hardware and communication software have made it possible to perform parallel-processing computing on a collection of desktop workstations. For many applications, multitasking on a cluster of high-performance workstations has achieved performance comparable or better than that on a traditional supercomputer. From the point of view of cost-effectiveness, it also allows users to exploit available but unused computational resources, and thus achieve a higher performance-to-cost ratio. Monte Carlo calculations are inherently parallelizable because the individual particle trajectories can be generated independently with minimum need for interprocessor communication. Furthermore, the number of particle histories that can be generated inmore » a given amount of wall-clock time is nearly proportional to the number of processors in the cluster. This is an important fact because the inherent statistical uncertainty in any Monte Carlo result decreases as the number of histories increases. For these reasons, researchers have expended considerable effort to take advantage of different parallel architectures for a variety of Monte Carlo radiation transport codes, often with excellent results. The initial interest in this work was sparked by the multitasking capability of MCNP on a cluster of workstations using the Parallel Virtual Machine (PVM) software. On a 16-machine IBM RS/6000 cluster, it has been demonstrated that MCNP runs ten times as fast as on a single-processor CRAY YMP. In this paper, we summarize the implementation of a similar multitasking capability for the coupled electron/photon transport code system, the Integrated TIGER Series (ITS), and the evaluation of two load balancing schemes for homogeneous and heterogeneous networks.« less
Limits on the Efficiency of Event-Based Algorithms for Monte Carlo Neutron Transport
DOE Office of Scientific and Technical Information (OSTI.GOV)
Romano, Paul K.; Siegel, Andrew R.
The traditional form of parallelism in Monte Carlo particle transport simulations, wherein each individual particle history is considered a unit of work, does not lend itself well to data-level parallelism. Event-based algorithms, which were originally used for simulations on vector processors, may offer a path toward better utilizing data-level parallelism in modern computer architectures. In this study, a simple model is developed for estimating the efficiency of the event-based particle transport algorithm under two sets of assumptions. Data collected from simulations of four reactor problems using OpenMC was then used in conjunction with the models to calculate the speedup duemore » to vectorization as a function of the size of the particle bank and the vector width. When each event type is assumed to have constant execution time, the achievable speedup is directly related to the particle bank size. We observed that the bank size generally needs to be at least 20 times greater than vector size to achieve vector efficiency greater than 90%. Lastly, when the execution times for events are allowed to vary, the vector speedup is also limited by differences in execution time for events being carried out in a single event-iteration.« less
Limits on the Efficiency of Event-Based Algorithms for Monte Carlo Neutron Transport
Romano, Paul K.; Siegel, Andrew R.
2017-07-01
The traditional form of parallelism in Monte Carlo particle transport simulations, wherein each individual particle history is considered a unit of work, does not lend itself well to data-level parallelism. Event-based algorithms, which were originally used for simulations on vector processors, may offer a path toward better utilizing data-level parallelism in modern computer architectures. In this study, a simple model is developed for estimating the efficiency of the event-based particle transport algorithm under two sets of assumptions. Data collected from simulations of four reactor problems using OpenMC was then used in conjunction with the models to calculate the speedup duemore » to vectorization as a function of the size of the particle bank and the vector width. When each event type is assumed to have constant execution time, the achievable speedup is directly related to the particle bank size. We observed that the bank size generally needs to be at least 20 times greater than vector size to achieve vector efficiency greater than 90%. Lastly, when the execution times for events are allowed to vary, the vector speedup is also limited by differences in execution time for events being carried out in a single event-iteration.« less
Particle size distributions from laboratory-scale biomass fires using fast response instruments
S Hosseini; L. Qi; D. Cocker; D. Weise; A. Miller; M. Shrivastava; J.W. Miller; S. Mahalingam; M. Princevac; H. Jung
2010-01-01
Particle size distribution from biomass combustion is an important parameter as it affects air quality, climate modelling and health effects. To date, particle size distributions reported from prior studies vary not only due to difference in fuels but also difference in experimental conditions. This study aims to report characteristics of particle size distributions in...
An efficient and portable SIMD algorithm for charge/current deposition in Particle-In-Cell codes
Vincenti, H.; Lobet, M.; Lehe, R.; ...
2016-09-19
In current computer architectures, data movement (from die to network) is by far the most energy consuming part of an algorithm (≈20pJ/word on-die to ≈10,000 pJ/word on the network). To increase memory locality at the hardware level and reduce energy consumption related to data movement, future exascale computers tend to use many-core processors on each compute nodes that will have a reduced clock speed to allow for efficient cooling. To compensate for frequency decrease, machine vendors are making use of long SIMD instruction registers that are able to process multiple data with one arithmetic operator in one clock cycle. SIMD registermore » length is expected to double every four years. As a consequence, Particle-In-Cell (PIC) codes will have to achieve good vectorization to fully take advantage of these upcoming architectures. In this paper, we present a new algorithm that allows for efficient and portable SIMD vectorization of current/charge deposition routines that are, along with the field gathering routines, among the most time consuming parts of the PIC algorithm. Our new algorithm uses a particular data structure that takes into account memory alignment constraints and avoids gather/scat;ter instructions that can significantly affect vectorization performances on current CPUs. The new algorithm was successfully implemented in the 3D skeleton PIC code PICSAR and tested on Haswell Xeon processors (AVX2-256 bits wide data registers). Results show a factor of ×2 to ×2.5 speed-up in double precision for particle shape factor of orders 1–3. The new algorithm can be applied as is on future KNL (Knights Landing) architectures that will include AVX-512 instruction sets with 512 bits register lengths (8 doubles/16 singles). Program summary Program Title: vec_deposition Program Files doi:http://dx.doi.org/10.17632/nh77fv9k8c.1 Licensing provisions: BSD 3-Clause Programming language: Fortran 90 External routines/libraries: OpenMP > 4.0 Nature of problem: Exascale architectures will have many-core processors per node with long vector data registers capable of performing one single instruction on multiple data during one clock cycle. Data register lengths are expected to double every four years and this pushes for new portable solutions for efficiently vectorizing Particle-In-Cell codes on these future many-core architectures. One of the main hotspot routines of the PIC algorithm is the current/charge deposition for which there is no efficient and portable vector algorithm. Solution method: Here we provide an efficient and portable vector algorithm of current/charge deposition routines that uses a new data structure, which significantly reduces gather/scatter operations. Vectorization is controlled using OpenMP 4.0 compiler directives for vectorization which ensures portability across different architectures. Restrictions: Here we do not provide the full PIC algorithm with an executable but only vector routines for current/charge deposition. These scalar/vector routines can be used as library routines in your 3D Particle-In-Cell code. However, to get the best performances out of vector routines you have to satisfy the two following requirements: (1) Your code should implement particle tiling (as explained in the manuscript) to allow for maximized cache reuse and reduce memory accesses that can hinder vector performances. The routines can be used directly on each particle tile. (2) You should compile your code with a Fortran 90 compiler (e.g Intel, gnu or cray) and provide proper alignment flags and compiler alignment directives (more details in README file).« less
An efficient and portable SIMD algorithm for charge/current deposition in Particle-In-Cell codes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vincenti, H.; Lobet, M.; Lehe, R.
In current computer architectures, data movement (from die to network) is by far the most energy consuming part of an algorithm (≈20pJ/word on-die to ≈10,000 pJ/word on the network). To increase memory locality at the hardware level and reduce energy consumption related to data movement, future exascale computers tend to use many-core processors on each compute nodes that will have a reduced clock speed to allow for efficient cooling. To compensate for frequency decrease, machine vendors are making use of long SIMD instruction registers that are able to process multiple data with one arithmetic operator in one clock cycle. SIMD registermore » length is expected to double every four years. As a consequence, Particle-In-Cell (PIC) codes will have to achieve good vectorization to fully take advantage of these upcoming architectures. In this paper, we present a new algorithm that allows for efficient and portable SIMD vectorization of current/charge deposition routines that are, along with the field gathering routines, among the most time consuming parts of the PIC algorithm. Our new algorithm uses a particular data structure that takes into account memory alignment constraints and avoids gather/scat;ter instructions that can significantly affect vectorization performances on current CPUs. The new algorithm was successfully implemented in the 3D skeleton PIC code PICSAR and tested on Haswell Xeon processors (AVX2-256 bits wide data registers). Results show a factor of ×2 to ×2.5 speed-up in double precision for particle shape factor of orders 1–3. The new algorithm can be applied as is on future KNL (Knights Landing) architectures that will include AVX-512 instruction sets with 512 bits register lengths (8 doubles/16 singles). Program summary Program Title: vec_deposition Program Files doi:http://dx.doi.org/10.17632/nh77fv9k8c.1 Licensing provisions: BSD 3-Clause Programming language: Fortran 90 External routines/libraries: OpenMP > 4.0 Nature of problem: Exascale architectures will have many-core processors per node with long vector data registers capable of performing one single instruction on multiple data during one clock cycle. Data register lengths are expected to double every four years and this pushes for new portable solutions for efficiently vectorizing Particle-In-Cell codes on these future many-core architectures. One of the main hotspot routines of the PIC algorithm is the current/charge deposition for which there is no efficient and portable vector algorithm. Solution method: Here we provide an efficient and portable vector algorithm of current/charge deposition routines that uses a new data structure, which significantly reduces gather/scatter operations. Vectorization is controlled using OpenMP 4.0 compiler directives for vectorization which ensures portability across different architectures. Restrictions: Here we do not provide the full PIC algorithm with an executable but only vector routines for current/charge deposition. These scalar/vector routines can be used as library routines in your 3D Particle-In-Cell code. However, to get the best performances out of vector routines you have to satisfy the two following requirements: (1) Your code should implement particle tiling (as explained in the manuscript) to allow for maximized cache reuse and reduce memory accesses that can hinder vector performances. The routines can be used directly on each particle tile. (2) You should compile your code with a Fortran 90 compiler (e.g Intel, gnu or cray) and provide proper alignment flags and compiler alignment directives (more details in README file).« less
Woody biomass size reduction with selective material orientation
Dooley, James H.; Lanning, David N.; Lanning, Christopher J.
2013-01-01
Roundwood logs from forests and energy plantations must be chipped, ground, or otherwise comminuted into small particles prior to conversion to solid or liquid biofuels. Rotary veneer followed by cross-grain shearing is demonstrated to be a novel and low energy consuming method for primary breakdown of logs into a raw material having high transport and storage density. Processing of high moisture raw logs into 2.5 – 4.2 mm particles prior to drying or conversion consumes less than 20% of the energy required for achieving similar particle size with hammer mills while producing a more uniform particle shape and size. Asmore » a result, energy savings from the proposed method may reduce the comminution cost of woody feedstocks by more than half.« less
Broadcasting collective operation contributions throughout a parallel computer
Faraj, Ahmad [Rochester, MN
2012-02-21
Methods, systems, and products are disclosed for broadcasting collective operation contributions throughout a parallel computer. The parallel computer includes a plurality of compute nodes connected together through a data communications network. Each compute node has a plurality of processors for use in collective parallel operations on the parallel computer. Broadcasting collective operation contributions throughout a parallel computer according to embodiments of the present invention includes: transmitting, by each processor on each compute node, that processor's collective operation contribution to the other processors on that compute node using intra-node communications; and transmitting on a designated network link, by each processor on each compute node according to a serial processor transmission sequence, that processor's collective operation contribution to the other processors on the other compute nodes using inter-node communications.
LANDSAT-D flight segment operations manual. Appendix B: OBC software operations
NASA Technical Reports Server (NTRS)
Talipsky, R.
1981-01-01
The LANDSAT 4 satellite contains two NASA standard spacecraft computers and 65,536 words of memory. Onboard computer software is divided into flight executive and applications processors. Both applications processors and the flight executive use one or more of 67 system tables to obtain variables, constants, and software flags. Output from the software for monitoring operation is via 49 OBC telemetry reports subcommutated in the spacecraft telemetry. Information is provided about the flight software as it is used to control the various spacecraft operations and interpret operational OBC telemetry. Processor function descriptions, processor operation, software constraints, processor system tables, processor telemetry, and processor flow charts are presented.
NASA Astrophysics Data System (ADS)
Pruhs, Kirk
A particularly important emergent technology is heterogeneous processors (or cores), which many computer architects believe will be the dominant architectural design in the future. The main advantage of a heterogeneous architecture, relative to an architecture of identical processors, is that it allows for the inclusion of processors whose design is specialized for particular types of jobs, and for jobs to be assigned to a processor best suited for that job. Most notably, it is envisioned that these heterogeneous architectures will consist of a small number of high-power high-performance processors for critical jobs, and a larger number of lower-power lower-performance processors for less critical jobs. Naturally, the lower-power processors would be more energy efficient in terms of the computation performed per unit of energy expended, and would generate less heat per unit of computation. For a given area and power budget, heterogeneous designs can give significantly better performance for standard workloads. Moreover, even processors that were designed to be homogeneous, are increasingly likely to be heterogeneous at run time: the dominant underlying cause is the increasing variability in the fabrication process as the feature size is scaled down (although run time faults will also play a role). Since manufacturing yields would be unacceptably low if every processor/core was required to be perfect, and since there would be significant performance loss from derating the entire chip to the functioning of the least functional processor (which is what would be required in order to attain processor homogeneity), some processor heterogeneity seems inevitable in chips with many processors/cores.
Simulation of solid-liquid flows in a stirred bead mill based on computational fluid dynamics (CFD)
NASA Astrophysics Data System (ADS)
Winardi, S.; Widiyastuti, W.; Septiani, E. L.; Nurtono, T.
2018-05-01
The selection of simulation model is an important step in computational fluid dynamics (CFD) to obtain an agreement with experimental work. In addition, computational time and processor speed also influence the performance of the simulation results. Here, we report the simulation of solid-liquid flow in a bead mill using Eulerian model. Multiple Reference Frame (MRF) was also used to model the interaction between moving (shaft and disk) and stationary (chamber exclude shaft and disk) zones. Bead mill dimension was based on the experimental work of Yamada and Sakai (2013). The effect of shaft rotation speed of 1200 and 1800 rpm on the particle distribution and the flow field was discussed. For rotation speed of 1200 rpm, the particles spread evenly throughout the bead mill chamber. On the other hand, for the rotation speed of 1800 rpm, the particles tend to be thrown to the near wall region resulting in the dead zone and found no particle in the center region. The selected model agreed well to the experimental data with average discrepancies less than 10%. Furthermore, the simulation was run without excessive computational cost.
Multi-Core Processor Memory Contention Benchmark Analysis Case Study
NASA Technical Reports Server (NTRS)
Simon, Tyler; McGalliard, James
2009-01-01
Multi-core processors dominate current mainframe, server, and high performance computing (HPC) systems. This paper provides synthetic kernel and natural benchmark results from an HPC system at the NASA Goddard Space Flight Center that illustrate the performance impacts of multi-core (dual- and quad-core) vs. single core processor systems. Analysis of processor design, application source code, and synthetic and natural test results all indicate that multi-core processors can suffer from significant memory subsystem contention compared to similar single-core processors.
Hydrogen Generation Via Fuel Reforming
NASA Astrophysics Data System (ADS)
Krebs, John F.
2003-07-01
Reforming is the conversion of a hydrocarbon based fuel to a gas mixture that contains hydrogen. The H2 that is produced by reforming can then be used to produce electricity via fuel cells. The realization of H2-based power generation, via reforming, is facilitated by the existence of the liquid fuel and natural gas distribution infrastructures. Coupling these same infrastructures with more portable reforming technology facilitates the realization of fuel cell powered vehicles. The reformer is the first component in a fuel processor. Contaminants in the H2-enriched product stream, such as carbon monoxide (CO) and hydrogen sulfide (H2S), can significantly degrade the performance of current polymer electrolyte membrane fuel cells (PEMFC's). Removal of such contaminants requires extensive processing of the H2-rich product stream prior to utilization by the fuel cell to generate electricity. The remaining components of the fuel processor remove the contaminants in the H2 product stream. For transportation applications the entire fuel processing system must be as small and lightweight as possible to achieve desirable performance requirements. Current efforts at Argonne National Laboratory are focused on catalyst development and reactor engineering of the autothermal processing train for transportation applications.
An ultra-compact processor module based on the R3000
NASA Astrophysics Data System (ADS)
Mullenhoff, D. J.; Kaschmitter, J. L.; Lyke, J. C.; Forman, G. A.
1992-08-01
Viable high density packaging is of critical importance for future military systems, particularly space borne systems which require minimum weight and size and high mechanical integrity. A leading, emerging technology for high density packaging is multi-chip modules (MCM). During the 1980's, a number of different MCM technologies have emerged. In support of Strategic Defense Initiative Organization (SDIO) programs, Lawrence Livermore National Laboratory (LLNL) has developed, utilized, and evaluated several different MCM technologies. Prior LLNL efforts include modules developed in 1986, using hybrid wafer scale packaging, which are still operational in an Air Force satellite mission. More recent efforts have included very high density cache memory modules, developed using laser pantography. As part of the demonstration effort, LLNL and Phillips Laboratory began collaborating in 1990 in the Phase 3 Multi-Chip Module (MCM) technology demonstration project. The goal of this program was to demonstrate the feasibility of General Electric's (GE) High Density Interconnect (HDI) MCM technology. The design chosen for this demonstration was the processor core for a MIPS R3000 based reduced instruction set computer (RISC), which has been described previously. It consists of the R3000 microprocessor, R3010 floating point coprocessor and 128 Kbytes of cache memory.
Research in the design of high-performance reconfigurable systems
NASA Technical Reports Server (NTRS)
Slotnick, D. L.; Mcewan, S. D.; Spry, A. J.
1984-01-01
An initial design for the Bit Processor (BP) referred to in prior reports as the Processing Element or PE has been completed. Eight BP's, together with their supporting random-access memory, a 64 k x 9 ROM to perform addition, routing logic, and some additional logic, constitute the components of a single stage. An initial stage design is given. Stages may be combined to perform high-speed fixed or floating point arithmetic. Stages can be configured into a range of arithmetic modules that includes bit-serial one or two-dimensional arrays; one or two dimensional arrays fixed or floating point processors; and specialized uniprocessors, such as long-word arithmetic units. One to eight BP's represent a likely initial chip level. The Stage would then correspond to a first-level pluggable module. As both this project and VLSI CAD/CAM progress, however, it is expected that the chip level would migrate upward to the stage and, perhaps, ultimately the box level. The BP RAM, consisting of two banks, holds only operands and indices. Programs are at the box (high-level function) and system level. At the system level initial effort has been concentrated on specifying the tools needed to evaluate design alternatives.
Stereo and IMU-Assisted Visual Odometry for Small Robots
NASA Technical Reports Server (NTRS)
2012-01-01
This software performs two functions: (1) taking stereo image pairs as input, it computes stereo disparity maps from them by cross-correlation to achieve 3D (three-dimensional) perception; (2) taking a sequence of stereo image pairs as input, it tracks features in the image sequence to estimate the motion of the cameras between successive image pairs. A real-time stereo vision system with IMU (inertial measurement unit)-assisted visual odometry was implemented on a single 750 MHz/520 MHz OMAP3530 SoC (system on chip) from TI (Texas Instruments). Frame rates of 46 fps (frames per second) were achieved at QVGA (Quarter Video Graphics Array i.e. 320 240), or 8 fps at VGA (Video Graphics Array 640 480) resolutions, while simultaneously tracking up to 200 features, taking full advantage of the OMAP3530's integer DSP (digital signal processor) and floating point ARM processors. This is a substantial advancement over previous work as the stereo implementation produces 146 Mde/s (millions of disparities evaluated per second) in 2.5W, yielding a stereo energy efficiency of 58.8 Mde/J, which is 3.75 better than prior DSP stereo while providing more functionality.
Simulink/PARS Integration Support
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vacaliuc, B.; Nakhaee, N.
2013-12-18
The state of the art for signal processor hardware has far out-paced the development tools for placing applications on that hardware. In addition, signal processors are available in a variety of architectures, each uniquely capable of handling specific types of signal processing efficiently. With these processors becoming smaller and demanding less power, it has become possible to group multiple processors, a heterogeneous set of processors, into single systems. Different portions of the desired problem set can be assigned to different processor types as appropriate. As software development tools do not keep pace with these processors, especially when multiple processors ofmore » different types are used, a method is needed to enable software code portability among multiple processors and multiple types of processors along with their respective software environments. Sundance DSP, Inc. has developed a software toolkit called “PARS”, whose objective is to provide a framework that uses suites of tools provided by different vendors, along with modeling tools and a real time operating system, to build an application that spans different processor types. The software language used to express the behavior of the system is a very high level modeling language, “Simulink”, a MathWorks product. ORNL has used this toolkit to effectively implement several deliverables. This CRADA describes this collaboration between ORNL and Sundance DSP, Inc.« less
NASA Astrophysics Data System (ADS)
Golonka, P.; Pierzchała, T.; Waş, Z.
2004-02-01
Theoretical predictions in high energy physics are routinely provided in the form of Monte Carlo generators. Comparisons of predictions from different programs and/or different initialization set-ups are often necessary. MC-TESTER can be used for such tests of decays of intermediate states (particles or resonances) in a semi-automated way. Our test consists of two steps. Different Monte Carlo programs are run; events with decays of a chosen particle are searched, decay trees are analyzed and appropriate information is stored. Then, at the analysis step, a list of all found decay modes is defined and branching ratios are calculated for both runs. Histograms of all scalar Lorentz-invariant masses constructed from the decay products are plotted and compared for each decay mode found in both runs. For each plot a measure of the difference of the distributions is calculated and its maximal value over all histograms for each decay channel is printed in a summary table. As an example of MC-TESTER application, we include a test with the τ lepton decay Monte Carlo generators, TAUOLA and PYTHIA. The HEPEVT (or LUJETS) common block is used as exclusive source of information on the generated events. Program summaryTitle of the program:MC-TESTER, version 1.1 Catalogue identifier: ADSM Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADSM Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Computer: PC, two Intel Xeon 2.0 GHz processors, 512MB RAM Operating system: Linux Red Hat 6.1, 7.2, and also 8.0 Programming language used:C++, FORTRAN77: gcc 2.96 or 2.95.2 (also 3.2) compiler suite with g++ and g77 Size of the package: 7.3 MB directory including example programs (2 MB compressed distribution archive), without ROOT libraries (additional 43 MB). No. of bytes in distributed program, including test data, etc.: 2 024 425 Distribution format: tar gzip file Additional disk space required: Depends on the analyzed particle: 40 MB in the case of τ lepton decays (30 decay channels, 594 histograms, 82-pages booklet). Keywords: particle physics, decay simulation, Monte Carlo methods, invariant mass distributions, programs comparison Nature of the physical problem: The decays of individual particles are well defined modules of a typical Monte Carlo program chain in high energy physics. A fast, semi-automatic way of comparing results from different programs is often desirable, for the development of new programs, to check correctness of the installations or for discussion of uncertainties. Method of solution: A typical HEP Monte Carlo program stores the generated events in the event records such as HEPEVT or PYJETS. MC-TESTER scans, event by event, the contents of the record and searches for the decays of the particle under study. The list of the found decay modes is successively incremented and histograms of all invariant masses which can be calculated from the momenta of the particle decay products are defined and filled. The outputs from the two runs of distinct programs can be later compared. A booklet of comparisons is created: for every decay channel, all histograms present in the two outputs are plotted and parameter quantifying shape difference is calculated. Its maximum over every decay channel is printed in the summary table. Restrictions on the complexity of the problem: For a list of limitations see Section 6. Typical running time: Varies substantially with the analyzed decay particle. On a PC/Linux with 2.0 GHz processors MC-TESTER increases the run time of the τ-lepton Monte Carlo program TAUOLA by 4.0 seconds for every 100 000 analyzed events (generation itself takes 26 seconds). The analysis step takes 13 seconds; ? processing takes additionally 10 seconds. Generation step runs may be executed simultaneously on multi-processor machines. Accessibility: web page: http://cern.ch/Piotr.Golonka/MC/MC-TESTER e-mails: Piotr.Golonka@CERN.CH, T.Pierzchala@friend.phys.us.edu.pl, Zbigniew.Was@CERN.CH.
NASA Astrophysics Data System (ADS)
Esepkina, N. A.; Lavrov, A. P.; Anan'ev, M. N.; Blagodarnyi, V. S.; Ivanov, S. I.; Mansyrev, M. I.; Molodyakov, S. A.
1995-10-01
Two new types of optoelectronic radio-signal processors were investigated. Charge-coupled device (CCD) photodetectors are used in these processors under continuous scanning conditions, i.e. in a time delay and storage mode. One of these processors is based on a CCD photodetector array with a reference-signal amplitude transparency and the other is an adaptive acousto-optical signal processor with linear frequency modulation. The processor with the transparency performs multichannel discrete—analogue convolution of an input signal with a corresponding kernel of the transformation determined by the transparency. If a light source is an array of light-emitting diodes of special (stripe) geometry, the optical stages of the processor can be made from optical fibre components and the whole processor then becomes a rigid 'sandwich' (a compact hybrid optoelectronic microcircuit). A report is given also of a study of a prototype processor with optical fibre components for the reception of signals from a system with antenna aperture synthesis, which forms a radio image of the Earth.
Karasick, M.S.; Strip, D.R.
1996-01-30
A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modeling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modeling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modeling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication. 8 figs.
Shared performance monitor in a multiprocessor system
Chiu, George; Gara, Alan G.; Salapura, Valentina
2012-07-24
A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor devices units, each processor device for generating signals representing occurrences of events in the processor device, and, a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU comprises: a plurality of performance counters each for counting signals representing occurrences of events from one or more the plurality of processor units in the multiprocessor system; and, a plurality of input devices for receiving the event signals from one or more processor devices of the plurality of processor units, the plurality of input devices programmable to select event signals for receipt by one or more of the plurality of performance counters for counting, wherein the PMU is shared between multiple processing units, or within a group of processors in the multiprocessing system. The PMU is further programmed to monitor event signals issued from non-processor devices.
Implementation of kernels on the Maestro processor
NASA Astrophysics Data System (ADS)
Suh, Jinwoo; Kang, D. I. D.; Crago, S. P.
Currently, most microprocessors use multiple cores to increase performance while limiting power usage. Some processors use not just a few cores, but tens of cores or even 100 cores. One such many-core microprocessor is the Maestro processor, which is based on Tilera's TILE64 processor. The Maestro chip is a 49-core, general-purpose, radiation-hardened processor designed for space applications. The Maestro processor, unlike the TILE64, has a floating point unit (FPU) in each core for improved floating point performance. The Maestro processor runs at 342 MHz clock frequency. On the Maestro processor, we implemented several widely used kernels: matrix multiplication, vector add, FIR filter, and FFT. We measured and analyzed the performance of these kernels. The achieved performance was up to 5.7 GFLOPS, and the speedup compared to single tile was up to 49 using 49 tiles.
Ordering of guarded and unguarded stores for no-sync I/O
Gara, Alan; Ohmacht, Martin
2013-06-25
A parallel computing system processes at least one store instruction. A first processor core issues a store instruction. A first queue, associated with the first processor core, stores the store instruction. A second queue, associated with a first local cache memory device of the first processor core, stores the store instruction. The first processor core updates first data in the first local cache memory device according to the store instruction. The third queue, associated with at least one shared cache memory device, stores the store instruction. The first processor core invalidates second data, associated with the store instruction, in the at least one shared cache memory. The first processor core invalidates third data, associated with the store instruction, in other local cache memory devices of other processor cores. The first processor core flushing only the first queue.
Effect of high pressure processing on dispersive and aggregative properties of almond milk.
Dhakal, Santosh; Giusti, M Monica; Balasubramaniam, V M
2016-08-01
A study was conducted to investigate the impact of high pressure (450 and 600 MPa at 30 °C) and thermal (72, 85 and 99 °C at 0.1 MPa) treatments on dispersive and aggregative characteristics of almond milk. Experiments were conducted using a kinetic pressure testing unit and water bath. Particle size distribution, microstructure, UV absorption spectra, pH and color changes of processed and unprocessed samples were analyzed. Raw almond milk represented the mono model particle size distribution with average particle diameters of 2 to 3 µm. Thermal or pressure treatment of almond milk shifted the particle size distribution towards right and increased particle size by five- to six-fold. Micrographs confirmed that both the treatments increased particle size due to aggregation of macromolecules. Pressure treatment produced relatively more and larger aggregates than those produced by heat treated samples. The apparent aggregation rate constant for 450 MPa and 600 MPa processed samples were k450MPa,30°C = 0.0058 s(-1) and k600MPa,30°C = 0.0095 s(-1) respectively. This study showed that dispersive and aggregative properties of high pressure and heat-treated almond milk were different due to differences in protein denaturation, particles coagulation and aggregates morphological characteristics. Knowledge gained from the study will help food processors to formulate novel plant-based beverages treated with high pressure. © 2015 Society of Chemical Industry. © 2015 Society of Chemical Industry.
Electrochemical sensing using voltage-current time differential
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woo, Leta Yar-Li; Glass, Robert Scott; Fitzpatrick, Joseph Jay
2017-02-28
A device for signal processing. The device includes a signal generator, a signal detector, and a processor. The signal generator generates an original waveform. The signal detector detects an affected waveform. The processor is coupled to the signal detector. The processor receives the affected waveform from the signal detector. The processor also compares at least one portion of the affected waveform with the original waveform. The processor also determines a difference between the affected waveform and the original waveform. The processor also determines a value corresponding to a unique portion of the determined difference between the original and affected waveforms.more » The processor also outputs the determined value.« less
Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems
NASA Technical Reports Server (NTRS)
Downie, John D.; Goodman, Joseph W.
1989-01-01
The accuracy requirements of optical processors in adaptive optics systems are determined by estimating the required accuracy in a general optical linear algebra processor (OLAP) that results in a smaller average residual aberration than that achieved with a conventional electronic digital processor with some specific computation speed. Special attention is given to an error analysis of a general OLAP with regard to the residual aberration that is created in an adaptive mirror system by the inaccuracies of the processor, and to the effect of computational speed of an electronic processor on the correction. Results are presented on the ability of an OLAP to compete with a digital processor in various situations.
Modeling heterogeneous processor scheduling for real time systems
NASA Technical Reports Server (NTRS)
Leathrum, J. F.; Mielke, R. R.; Stoughton, J. W.
1994-01-01
A new model is presented to describe dataflow algorithms implemented in a multiprocessing system. Called the resource/data flow graph (RDFG), the model explicitly represents cyclo-static processor schedules as circuits of processor arcs which reflect the order that processors execute graph nodes. The model also allows the guarantee of meeting hard real-time deadlines. When unfolded, the model identifies statically the processor schedule. The model therefore is useful for determining the throughput and latency of systems with heterogeneous processors. The applicability of the model is demonstrated using a space surveillance algorithm.
Temperature histories in liquid and solid polar stratospheric cloud formation
NASA Astrophysics Data System (ADS)
Larsen, Niels; Knudsen, Bjørn M.; Rosen, James M.; Kjome, Norman T.; Neuber, Roland; Kyrö, Esko
1997-10-01
Polar stratospheric clouds (PSCs) have been observed by balloonborne backscatter sondes from Alert, Thule, Heiss Island, Scoresbysund, Sodankylä, Søndre Strømfjord, and Ny Ȧlesund during winters 1989, 1990, 1995, and 1996 in 30 flights. The observations can be categorized into two main groups: type 1a and type 1b PSC particles. Type 1b PSCs show the characteristics expected from liquid ternary solution (HNO3/H2SO4/H2O) particles, consistent with model simulations. Type 1a PSCs are observed at all temperatures below the condensation temperature TNAT of nitric acid trihydrate (NAT), consistent with solid NAT composition. Air parcel trajectories have been calculated for all observations to provide synoptic temperature histories of the observed particles. A number of cases have been identified, where the particles have experienced temperatures close to or above the sulfuric acid tetrahydrate melting temperatures within 20 days prior to observation. This assures a knowledge of the physical phase (liquid) of the particles at this time, prior to observation. The subsequent synoptic temperature histories, between melting and the time of observation, show pronounced differences for type 1a and type 1b PSC particles, indicating the qualitative temperature conditions, necessary to generate solid type 1a PSCs. The temperature histories of type 1b particles show smoothly, in most cases monotonic, decreasing temperatures. The temperature can apparently decrease to the frost point without causing the particles to freeze. The type 1b PSC particles are mostly observed shortly after entering a cold region. The observed type 1a particles have spent several days at temperatures close to or below TNAT prior to observation, often associated with several synoptic temperature oscillations around TNAT, and the particles are observed in aged clouds. It appears that the PSC particles may freeze, if they experience synoptic temperatures below TNAT with a duration of at least 1 day, possibly accompanied by several temperature oscillations. However, liquid particles that experience a smooth cooling, even to very low temperatures, or single smooth cooling/heating below TNAT without synoptic temperature fluctuations do not seem to freeze.
Parallel processor for real-time structural control
NASA Astrophysics Data System (ADS)
Tise, Bert L.
1993-07-01
A parallel processor that is optimized for real-time linear control has been developed. This modular system consists of A/D modules, D/A modules, and floating-point processor modules. The scalable processor uses up to 1,000 Motorola DSP96002 floating-point processors for a peak computational rate of 60 GFLOPS. Sampling rates up to 625 kHz are supported by this analog-in to analog-out controller. The high processing rate and parallel architecture make this processor suitable for computing state-space equations and other multiply/accumulate-intensive digital filters. Processor features include 14-bit conversion devices, low input-to-output latency, 240 Mbyte/s synchronous backplane bus, low-skew clock distribution circuit, VME connection to host computer, parallelizing code generator, and look- up-tables for actuator linearization. This processor was designed primarily for experiments in structural control. The A/D modules sample sensors mounted on the structure and the floating- point processor modules compute the outputs using the programmed control equations. The outputs are sent through the D/A module to the power amps used to drive the structure's actuators. The host computer is a Sun workstation. An OpenWindows-based control panel is provided to facilitate data transfer to and from the processor, as well as to control the operating mode of the processor. A diagnostic mode is provided to allow stimulation of the structure and acquisition of the structural response via sensor inputs.
Testing and operating a multiprocessor chip with processor redundancy
Bellofatto, Ralph E; Douskey, Steven M; Haring, Rudolf A; McManus, Moyra K; Ohmacht, Martin; Schmunkamp, Dietmar; Sugavanam, Krishnan; Weatherford, Bryan J
2014-10-21
A system and method for improving the yield rate of a multiprocessor semiconductor chip that includes primary processor cores and one or more redundant processor cores. A first tester conducts a first test on one or more processor cores, and encodes results of the first test in an on-chip non-volatile memory. A second tester conducts a second test on the processor cores, and encodes results of the second test in an external non-volatile storage device. An override bit of a multiplexer is set if a processor core fails the second test. In response to the override bit, the multiplexer selects a physical-to-logical mapping of processor IDs according to one of: the encoded results in the memory device or the encoded results in the external storage device. On-chip logic configures the processor cores according to the selected physical-to-logical mapping.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reed, D.A.; Grunwald, D.C.
The spectrum of parallel processor designs can be divided into three sections according to the number and complexity of the processors. At one end there are simple, bit-serial processors. Any one of thee processors is of little value, but when it is coupled with many others, the aggregate computing power can be large. This approach to parallel processing can be likened to a colony of termites devouring a log. The most notable examples of this approach are the NASA/Goodyear Massively Parallel Processor, which has 16K one-bit processors, and the Thinking Machines Connection Machine, which has 64K one-bit processors. At themore » other end of the spectrum, a small number of processors, each built using the fastest available technology and the most sophisticated architecture, are combined. An example of this approach is the Cray X-MP. This type of parallel processing is akin to four woodmen attacking the log with chainsaws.« less
FLY MPI-2: a parallel tree code for LSS
NASA Astrophysics Data System (ADS)
Becciani, U.; Comparato, M.; Antonuccio-Delogu, V.
2006-04-01
New version program summaryProgram title: FLY 3.1 Catalogue identifier: ADSC_v2_0 Licensing provisions: yes Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADSC_v2_0 Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland No. of lines in distributed program, including test data, etc.: 158 172 No. of bytes in distributed program, including test data, etc.: 4 719 953 Distribution format: tar.gz Programming language: Fortran 90, C Computer: Beowulf cluster, PC, MPP systems Operating system: Linux, Aix RAM: 100M words Catalogue identifier of previous version: ADSC_v1_0 Journal reference of previous version: Comput. Phys. Comm. 155 (2003) 159 Does the new version supersede the previous version?: yes Nature of problem: FLY is a parallel collisionless N-body code for the calculation of the gravitational force Solution method: FLY is based on the hierarchical oct-tree domain decomposition introduced by Barnes and Hut (1986) Reasons for the new version: The new version of FLY is implemented by using the MPI-2 standard: the distributed version 3.1 was developed by using the MPICH2 library on a PC Linux cluster. Today the FLY performance allows us to consider the FLY code among the most powerful parallel codes for tree N-body simulations. Another important new feature regards the availability of an interface with hydrodynamical Paramesh based codes. Simulations must follow a box large enough to accurately represent the power spectrum of fluctuations on very large scales so that we may hope to compare them meaningfully with real data. The number of particles then sets the mass resolution of the simulation, which we would like to make as fine as possible. The idea to build an interface between two codes, that have different and complementary cosmological tasks, allows us to execute complex cosmological simulations with FLY, specialized for DM evolution, and a code specialized for hydrodynamical components that uses a Paramesh block structure. Summary of revisions: The parallel communication schema was totally changed. The new version adopts the MPICH2 library. Now FLY can be executed on all Unix systems having an MPI-2 standard library. The main data structure, is declared in a module procedure of FLY (fly_h.F90 routine). FLY creates the MPI Window object for one-sided communication for all the shared arrays, with a call like the following: CALL MPI_WIN_CREATE(POS, SIZE, REAL8, MPI_INFO_NULL, MPI_COMM_WORLD, WIN_POS, IERR) the following main window objects are created: win_pos, win_vel, win_acc: particles positions velocities and accelerations, win_pos_cell, win_mass_cell, win_quad, win_subp, win_grouping: cells positions, masses, quadrupole momenta, tree structure and grouping cells. Other windows are created for dynamic load balance and global counters. Restrictions: The program uses the leapfrog integrator schema, but could be changed by the user. Unusual features: FLY uses the MPI-2 standard: the MPICH2 library on Linux systems was adopted. To run this version of FLY the working directory must be shared among all the processors that execute FLY. Additional comments: Full documentation for the program is included in the distribution in the form of a README file, a User Guide and a Reference manuscript. Running time: IBM Linux Cluster 1350, 512 nodes with 2 processors for each node and 2 GB RAM for each processor, at Cineca, was adopted to make performance tests. Processor type: Intel Xeon Pentium IV 3.0 GHz and 512 KB cache (128 nodes have Nocona processors). Internal Network: Myricom LAN Card "C" Version and "D" Version. Operating System: Linux SuSE SLES 8. The code was compiled using the mpif90 compiler version 8.1 and with basic optimization options in order to have performances that could be useful compared with other generic clusters Processors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woo, Leta Yar-Li; Glass, Robert Scott; Fitzpatrick, Joseph Jay
2018-01-02
A device for signal processing. The device includes a signal generator, a signal detector, and a processor. The signal generator generates an original waveform. The signal detector detects an affected waveform. The processor is coupled to the signal detector. The processor receives the affected waveform from the signal detector. The processor also compares at least one portion of the affected waveform with the original waveform. The processor also determines a difference between the affected waveform and the original waveform. The processor also determines a value corresponding to a unique portion of the determined difference between the original and affected waveforms.more » The processor also outputs the determined value.« less
Particle-In-Cell simulations of high pressure plasmas using graphics processing units
NASA Astrophysics Data System (ADS)
Gebhardt, Markus; Atteln, Frank; Brinkmann, Ralf Peter; Mussenbrock, Thomas; Mertmann, Philipp; Awakowicz, Peter
2009-10-01
Particle-In-Cell (PIC) simulations are widely used to understand the fundamental phenomena in low-temperature plasmas. Particularly plasmas at very low gas pressures are studied using PIC methods. The inherent drawback of these methods is that they are very time consuming -- certain stability conditions has to be satisfied. This holds even more for the PIC simulation of high pressure plasmas due to the very high collision rates. The simulations take up to very much time to run on standard computers and require the help of computer clusters or super computers. Recent advances in the field of graphics processing units (GPUs) provides every personal computer with a highly parallel multi processor architecture for very little money. This architecture is freely programmable and can be used to implement a wide class of problems. In this paper we present the concepts of a fully parallel PIC simulation of high pressure plasmas using the benefits of GPU programming.
Perera, Ajith; Gauss, Jürgen; Verma, Prakash; Morales, Jorge A
2017-04-28
We present a parallel implementation to compute electron spin resonance g-tensors at the coupled-cluster singles and doubles (CCSD) level which employs the ACES III domain-specific software tools for scalable parallel programming, i.e., the super instruction architecture language and processor (SIAL and SIP), respectively. A unique feature of the present implementation is the exact (not approximated) inclusion of the five one- and two-particle contributions to the g-tensor [i.e., the mass correction, one- and two-particle paramagnetic spin-orbit, and one- and two-particle diamagnetic spin-orbit terms]. Like in a previous implementation with effective one-electron operators [J. Gauss et al., J. Phys. Chem. A 113, 11541-11549 (2009)], our implementation utilizes analytic CC second derivatives and, therefore, classifies as a true CC linear-response treatment. Therefore, our implementation can unambiguously appraise the accuracy of less costly effective one-particle schemes and provide a rationale for their widespread use. We have considered a large selection of radicals used previously for benchmarking purposes including those studied in earlier work and conclude that at the CCSD level, the effective one-particle scheme satisfactorily captures the two-particle effects less costly than the rigorous two-particle scheme. With respect to the performance of density functional theory (DFT), we note that results obtained with the B3LYP functional exhibit the best agreement with our CCSD results. However, in general, the CCSD results agree better with the experimental data than the best DFT/B3LYP results, although in most cases within the rather large experimental error bars.
NASA Astrophysics Data System (ADS)
Yilbas, B. S.; Ibrahim, A.; Ali, H.; Khaled, M.; Laoui, T.
2018-06-01
Hydrophobic and optical transmittance characteristics of the functionalized silica particles on the glass surface prior and after transfer of graphene and graphene oxide films on the surface are examined. Nano-size silica particles are synthesized and functionalized via chemical grafting and deposited onto a glass surface. Graphene film, grown on copper substrate, was transferred onto the functionalized silica particles surface through direct fishing method. Graphene oxide layer was deposited onto the functionalized silica particles surface via spin coating technique. Morphological, hydrophobic, and optical characteristics of the functionalized silica particles deposited surface prior and after graphene and graphene oxide films transfer are examined using the analytical tools. It is found that the functionalized silica particles are agglomerated at the surface forming packed structures with few micro/nano size pores. This arrangement gives rise to water droplet contact angle and contact angle hysteresis in the order of 163° and 2°, respectively, and remains almost uniform over the entire surface. Transferring graphene and depositing graphene oxide films over the functionalized silica particles surface lowers the water droplet contact angle slightly (157-160°) and increases the contact angle hysteresis (4°). The addition of the graphene and graphene oxide films onto the surface of the deposited functionalized silica particles improves the optical transmittance.
Hybrid Electro-Optic Processor
1991-07-01
This report describes the design of a hybrid electro - optic processor to perform adaptive interference cancellation in radar systems. The processor is...modulator is reported. Included is this report is a discussion of the design, partial fabrication in the laboratory, and partial testing of the hybrid electro ... optic processor. A follow on effort is planned to complete the construction and testing of the processor. The work described in this report is the
JPRS Report, Science & Technology, Europe.
1991-04-30
processor in collaboration with Intel . The processor , christened Touchstone, will be used as the core of a parallel computer with 2,000 processors . One of...ELECTRONIQUE HEBDO in French 24 Jan 91 pp 14-15 [Article by Claire Remy: "Everything Set for Neural Signal Processors " first paragraph is ELECTRONIQUE...paving the way for neural signal processors in so doing. The principal advantage of this specific circuit over a neuromimetic software program is
Processor register error correction management
Bose, Pradip; Cher, Chen-Yong; Gupta, Meeta S.
2016-12-27
Processor register protection management is disclosed. In embodiments, a method of processor register protection management can include determining a sensitive logical register for executable code generated by a compiler, generating an error-correction table identifying the sensitive logical register, and storing the error-correction table in a memory accessible by a processor. The processor can be configured to generate a duplicate register of the sensitive logical register identified by the error-correction table.
The CSM testbed matrix processors internal logic and dataflow descriptions
NASA Technical Reports Server (NTRS)
Regelbrugge, Marc E.; Wright, Mary A.
1988-01-01
This report constitutes the final report for subtask 1 of Task 5 of NASA Contract NAS1-18444, Computational Structural Mechanics (CSM) Research. This report contains a detailed description of the coded workings of selected CSM Testbed matrix processors (i.e., TOPO, K, INV, SSOL) and of the arithmetic utility processor AUS. These processors and the current sparse matrix data structures are studied and documented. Items examined include: details of the data structures, interdependence of data structures, data-blocking logic in the data structures, processor data flow and architecture, and processor algorithmic logic flow.
Parallel processor for real-time structural control
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tise, B.L.
1992-01-01
A parallel processor that is optimized for real-time linear control has been developed. This modular system consists of A/D modules, D/A modules, and floating-point processor modules. The scalable processor uses up to 1,000 Motorola DSP96002 floating-point processors for a peak computational rate of 60 GFLOPS. Sampling rates up to 625 kHz are supported by this analog-in to analog-out controller. The high processing rate and parallel architecture make this processor suitable for computing state-space equations and other multiply/accumulate-intensive digital filters. Processor features include 14-bit conversion devices, low input-output latency, 240 Mbyte/s synchronous backplane bus, low-skew clock distribution circuit, VME connection tomore » host computer, parallelizing code generator, and look-up-tables for actuator linearization. This processor was designed primarily for experiments in structural control. The A/D modules sample sensors mounted on the structure and the floating-point processor modules compute the outputs using the programmed control equations. The outputs are sent through the D/A module to the power amps used to drive the structure's actuators. The host computer is a Sun workstation. An Open Windows-based control panel is provided to facilitate data transfer to and from the processor, as well as to control the operating mode of the processor. A diagnostic mode is provided to allow stimulation of the structure and acquisition of the structural response via sensor inputs.« less
7 CFR 1435.310 - Sharing processors' allocations with producers.
Code of Federal Regulations, 2011 CFR
2011-01-01
... CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS SUGAR PROGRAM Flexible Marketing Allotments For Sugar § 1435.310 Sharing processors' allocations with producers. (a) Every sugar beet and sugarcane processor must provide CCC a certification that: (1) The processor...
7 CFR 1435.310 - Sharing processors' allocations with producers.
Code of Federal Regulations, 2010 CFR
2010-01-01
... CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS SUGAR PROGRAM Flexible Marketing Allotments For Sugar § 1435.310 Sharing processors' allocations with producers. (a) Every sugar beet and sugarcane processor must provide CCC a certification that: (1) The processor...
7 CFR 1435.310 - Sharing processors' allocations with producers.
Code of Federal Regulations, 2012 CFR
2012-01-01
... CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS SUGAR PROGRAM Flexible Marketing Allotments For Sugar § 1435.310 Sharing processors' allocations with producers. (a) Every sugar beet and sugarcane processor must provide CCC a certification that: (1) The processor...
7 CFR 1435.310 - Sharing processors' allocations with producers.
Code of Federal Regulations, 2014 CFR
2014-01-01
... CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS SUGAR PROGRAM Flexible Marketing Allotments For Sugar § 1435.310 Sharing processors' allocations with producers. (a) Every sugar beet and sugarcane processor must provide CCC a certification that: (1) The processor...
7 CFR 1435.310 - Sharing processors' allocations with producers.
Code of Federal Regulations, 2013 CFR
2013-01-01
... CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS SUGAR PROGRAM Flexible Marketing Allotments For Sugar § 1435.310 Sharing processors' allocations with producers. (a) Every sugar beet and sugarcane processor must provide CCC a certification that: (1) The processor...
Code of Federal Regulations, 2010 CFR
2010-07-01
...) When a test rule or subsequent Federal Register notice pertaining to a test rule expressly obligates processors as well as manufacturers to assume direct testing and data reimbursement responsibilities. (2... processors voluntarily agree to reimburse manufacturers for a portion of test costs. Only those processors...
Atac, R.; Fischler, M.S.; Husby, D.E.
1991-01-15
A bus switching apparatus and method for multiple processor computer systems comprises a plurality of bus switches interconnected by branch buses. Each processor or other module of the system is connected to a spigot of a bus switch. Each bus switch also serves as part of a backplane of a modular crate hardware package. A processor initiates communication with another processor by identifying that other processor. The bus switch to which the initiating processor is connected identifies and secures, if possible, a path to that other processor, either directly or via one or more other bus switches which operate similarly. If a particular desired path through a given bus switch is not available to be used, an alternate path is considered, identified and secured. 11 figures.
Chatterjee, Siddhartha [Yorktown Heights, NY; Gunnels, John A [Brewster, NY
2011-11-08
A method and structure of distributing elements of an array of data in a computer memory to a specific processor of a multi-dimensional mesh of parallel processors includes designating a distribution of elements of at least a portion of the array to be executed by specific processors in the multi-dimensional mesh of parallel processors. The pattern of the designating includes a cyclical repetitive pattern of the parallel processor mesh, as modified to have a skew in at least one dimension so that both a row of data in the array and a column of data in the array map to respective contiguous groupings of the processors such that a dimension of the contiguous groupings is greater than one.
Atac, Robert; Fischler, Mark S.; Husby, Donald E.
1991-01-01
A bus switching apparatus and method for multiple processor computer systems comprises a plurality of bus switches interconnected by branch buses. Each processor or other module of the system is connected to a spigot of a bus switch. Each bus switch also serves as part of a backplane of a modular crate hardware package. A processor initiates communication with another processor by identifying that other processor. The bus switch to which the initiating processor is connected identifies and secures, if possible, a path to that other processor, either directly or via one or more other bus switches which operate similarly. If a particular desired path through a given bus switch is not available to be used, an alternate path is considered, identified and secured.
Web-Enabled Optoelectronic Particle-Fallout Monitor
NASA Technical Reports Server (NTRS)
Lineberger, Lewis P.
2008-01-01
A Web-enabled optoelectronic particle- fallout monitor has been developed as a prototype of future such instruments that (l) would be installed in multiple locations for which assurance of cleanliness is required and (2) could be interrogated and controlled in nearly real time by multiple remote users. Like prior particle-fallout monitors, this instrument provides a measure of particles that accumulate on a surface as an indication of the quantity of airborne particulate contaminants. The design of this instrument reflects requirements to: Reduce the cost and complexity of its optoelectronic sensory subsystem relative to those of prior optoelectronic particle fallout monitors while maintaining or improving capabilities; Use existing network and office computers for distributed display and control; Derive electric power for the instrument from a computer network, a wall outlet, or a battery; Provide for Web-based retrieval and analysis of measurement data and of a file containing such ancillary data as a log of command attempts at remote units; and Use the User Datagram Protocol (UDP) for maximum performance and minimal network overhead.
Variable word length encoder reduces TV bandwith requirements
NASA Technical Reports Server (NTRS)
Sivertson, W. E., Jr.
1965-01-01
Adaptive variable resolution encoding technique provides an adaptive compression pseudo-random noise signal processor for reducing television bandwidth requirements. Complementary processors are required in both the transmitting and receiving systems. The pretransmission processor is analog-to-digital, while the postreception processor is digital-to-analog.
Competitive Heterogeneous Nucleation Between Zr and MgO Particles in Commercial Purity Magnesium
NASA Astrophysics Data System (ADS)
Peng, G. S.; Wang, Y.; Fan, Z.
2018-04-01
Grain refining of commercial purity (CP) Mg by Zr addition with intensive melt shearing prior to solidification has been investigated. Experimental results showed that, when intensive melt shearing is imposed prior to solidification, the grain structure of CP Mg exhibits a complex changing pattern with increasing Zr addition. This complex behavior can be attributed to the change of nucleating particles in terms of their crystal structure, size, and number density with varied Zr additions. Naturally occurring MgO particles are found to be {100} faceted with a cubic morphology and 50 to 300 nm in size. Such MgO particles are usually populated densely in a liquid film (usually referred as oxide film) and can be effectively dispersed by intensive melt shearing. It has been confirmed that the dispersed MgO particles can act as nucleating substrates resulting in a significant grain refinement of CP Mg when no other more potent particles are present in the melt. However, Zr particles in the Mg-Zr alloys are more potent than MgO particles for nucleation of Mg due to their same crystal structure and similar lattice parameters with Mg. With the addition of Zr, Zr and the MgO particles co-exist in the melt. Grain refining efficiency is closely related to the competition for heterogeneous nucleation between Zr and the MgO particles. The final solidified microstructure is mainly determined by the interplay of three factors: nucleation potency (measured by lattice misfit), particle size, and particle number density.
Competitive Heterogeneous Nucleation Between Zr and MgO Particles in Commercial Purity Magnesium
NASA Astrophysics Data System (ADS)
Peng, G. S.; Wang, Y.; Fan, Z.
2018-06-01
Grain refining of commercial purity (CP) Mg by Zr addition with intensive melt shearing prior to solidification has been investigated. Experimental results showed that, when intensive melt shearing is imposed prior to solidification, the grain structure of CP Mg exhibits a complex changing pattern with increasing Zr addition. This complex behavior can be attributed to the change of nucleating particles in terms of their crystal structure, size, and number density with varied Zr additions. Naturally occurring MgO particles are found to be {100} faceted with a cubic morphology and 50 to 300 nm in size. Such MgO particles are usually populated densely in a liquid film (usually referred as oxide film) and can be effectively dispersed by intensive melt shearing. It has been confirmed that the dispersed MgO particles can act as nucleating substrates resulting in a significant grain refinement of CP Mg when no other more potent particles are present in the melt. However, Zr particles in the Mg-Zr alloys are more potent than MgO particles for nucleation of Mg due to their same crystal structure and similar lattice parameters with Mg. With the addition of Zr, Zr and the MgO particles co-exist in the melt. Grain refining efficiency is closely related to the competition for heterogeneous nucleation between Zr and the MgO particles. The final solidified microstructure is mainly determined by the interplay of three factors: nucleation potency (measured by lattice misfit), particle size, and particle number density.
NASA Technical Reports Server (NTRS)
Lyster, P. M.; Liewer, P. C.; Decyk, V. K.; Ferraro, R. D.
1995-01-01
A three-dimensional electrostatic particle-in-cell (PIC) plasma simulation code has been developed on coarse-grain distributed-memory massively parallel computers with message passing communications. Our implementation is the generalization to three-dimensions of the general concurrent particle-in-cell (GCPIC) algorithm. In the GCPIC algorithm, the particle computation is divided among the processors using a domain decomposition of the simulation domain. In a three-dimensional simulation, the domain can be partitioned into one-, two-, or three-dimensional subdomains ("slabs," "rods," or "cubes") and we investigate the efficiency of the parallel implementation of the push for all three choices. The present implementation runs on the Intel Touchstone Delta machine at Caltech; a multiple-instruction-multiple-data (MIMD) parallel computer with 512 nodes. We find that the parallel efficiency of the push is very high, with the ratio of communication to computation time in the range 0.3%-10.0%. The highest efficiency (> 99%) occurs for a large, scaled problem with 64(sup 3) particles per processing node (approximately 134 million particles of 512 nodes) which has a push time of about 250 ns per particle per time step. We have also developed expressions for the timing of the code which are a function of both code parameters (number of grid points, particles, etc.) and machine-dependent parameters (effective FLOP rate, and the effective interprocessor bandwidths for the communication of particles and grid points). These expressions can be used to estimate the performance of scaled problems--including those with inhomogeneous plasmas--to other parallel machines once the machine-dependent parameters are known.
A strongly interacting polaritonic quantum dot
NASA Astrophysics Data System (ADS)
Jia, Ningyuan; Schine, Nathan; Georgakopoulos, Alexandros; Ryou, Albert; Clark, Logan W.; Sommer, Ariel; Simon, Jonathan
2018-06-01
Polaritons are promising constituents of both synthetic quantum matter1 and quantum information processors2, whose properties emerge from their components: from light, polaritons draw fast dynamics and ease of transport; from matter, they inherit the ability to collide with one another. Cavity polaritons are particularly promising as they may be confined and subjected to synthetic magnetic fields controlled by cavity geometry3, and furthermore they benefit from increased robustness due to the cavity enhancement in light-matter coupling. Nonetheless, until now, cavity polaritons have operated only in a weakly interacting mean-field regime4,5. Here we demonstrate strong interactions between individual cavity polaritons enabled by employing highly excited Rydberg atoms as the matter component of the polaritons. We assemble a quantum dot composed of approximately 150 strongly interacting Rydberg-dressed 87Rb atoms in a cavity, and observe blockaded transport of photons through it. We further observe coherent photon tunnelling oscillations, demonstrating that the dot is zero-dimensional. This work establishes the cavity Rydberg polariton as a candidate qubit in a photonic information processor and, by employing multiple resonator modes as the spatial degrees of freedom of a photonic particle, the primary ingredient to form photonic quantum matter6.
NASA Astrophysics Data System (ADS)
Needham, Perri J.; Bhuiyan, Ashraf; Walker, Ross C.
2016-04-01
We present an implementation of explicit solvent particle mesh Ewald (PME) classical molecular dynamics (MD) within the PMEMD molecular dynamics engine, that forms part of the AMBER v14 MD software package, that makes use of Intel Xeon Phi coprocessors by offloading portions of the PME direct summation and neighbor list build to the coprocessor. We refer to this implementation as pmemd MIC offload and in this paper present the technical details of the algorithm, including basic models for MPI and OpenMP configuration, and analyze the resultant performance. The algorithm provides the best performance improvement for large systems (>400,000 atoms), achieving a ∼35% performance improvement for satellite tobacco mosaic virus (1,067,095 atoms) when 2 Intel E5-2697 v2 processors (2 ×12 cores, 30M cache, 2.7 GHz) are coupled to an Intel Xeon Phi coprocessor (Model 7120P-1.238/1.333 GHz, 61 cores). The implementation utilizes a two-fold decomposition strategy: spatial decomposition using an MPI library and thread-based decomposition using OpenMP. We also present compiler optimization settings that improve the performance on Intel Xeon processors, while retaining simulation accuracy.
NASA Technical Reports Server (NTRS)
Vincent, R. K.
1974-01-01
Four independent investigations are reported; in general these are concerned with improving and utilizing the correlation between the physical properties of natural materials as evidenced in laboratory spectra and spectral data collected by multispectral scanners. In one investigation, two theoretical models were devised that permit the calculation of spectral emittance spectra for rock and mineral surfaces of various particle sizes. The simpler of the two models can be used to qualitatively predict the effect of texture on the spectral emittance of rocks and minerals; it is also potentially useful as an aid in predicting the identification of natural atmospheric aerosol constituents. The second investigation determined, via an infrared ratio imaging technique, the best pair of infrared filters for silicate rock-type discrimination. In a third investigation, laboratory spectra of natural materials were compressed into 11-digit ratio codes for use in feature selection, in searches for false alarm candidates, and eventually for use as training sets in completely automatic data processors. In the fourth investigation, general outlines of a ratio preprocessor and an automatic recognition map processor are developed for on-board data processing in the space shuttle era.
Accelerating molecular dynamic simulation on the cell processor and Playstation 3.
Luttmann, Edgar; Ensign, Daniel L; Vaidyanathan, Vishal; Houston, Mike; Rimon, Noam; Øland, Jeppe; Jayachandran, Guha; Friedrichs, Mark; Pande, Vijay S
2009-01-30
Implementation of molecular dynamics (MD) calculations on novel architectures will vastly increase its power to calculate the physical properties of complex systems. Herein, we detail algorithmic advances developed to accelerate MD simulations on the Cell processor, a commodity processor found in PlayStation 3 (PS3). In particular, we discuss issues regarding memory access versus computation and the types of calculations which are best suited for streaming processors such as the Cell, focusing on implicit solvation models. We conclude with a comparison of improved performance on the PS3's Cell processor over more traditional processors. (c) 2008 Wiley Periodicals, Inc.
Leung, Vitus J [Albuquerque, NM; Phillips, Cynthia A [Albuquerque, NM; Bender, Michael A [East Northport, NY; Bunde, David P [Urbana, IL
2009-07-21
In a multiple processor computing apparatus, directional routing restrictions and a logical channel construct permit fault tolerant, deadlock-free routing. Processor allocation can be performed by creating a linear ordering of the processors based on routing rules used for routing communications between the processors. The linear ordering can assume a loop configuration, and bin-packing is applied to this loop configuration. The interconnection of the processors can be conceptualized as a generally rectangular 3-dimensional grid, and the MC allocation algorithm is applied with respect to the 3-dimensional grid.
Communications systems and methods for subsea processors
Gutierrez, Jose; Pereira, Luis
2016-04-26
A subsea processor may be located near the seabed of a drilling site and used to coordinate operations of underwater drilling components. The subsea processor may be enclosed in a single interchangeable unit that fits a receptor on an underwater drilling component, such as a blow-out preventer (BOP). The subsea processor may issue commands to control the BOP and receive measurements from sensors located throughout the BOP. A shared communications bus may interconnect the subsea processor and underwater components and the subsea processor and a surface or onshore network. The shared communications bus may be operated according to a time division multiple access (TDMA) scheme.
An Efficient Functional Test Generation Method For Processors Using Genetic Algorithms
NASA Astrophysics Data System (ADS)
Hudec, Ján; Gramatová, Elena
2015-07-01
The paper presents a new functional test generation method for processors testing based on genetic algorithms and evolutionary strategies. The tests are generated over an instruction set architecture and a processor description. Such functional tests belong to the software-oriented testing. Quality of the tests is evaluated by code coverage of the processor description using simulation. The presented test generation method uses VHDL models of processors and the professional simulator ModelSim. The rules, parameters and fitness functions were defined for various genetic algorithms used in automatic test generation. Functionality and effectiveness were evaluated using the RISC type processor DP32.
Experimental testing of the noise-canceling processor.
Collins, Michael D; Baer, Ralph N; Simpson, Harry J
2011-09-01
Signal-processing techniques for localizing an acoustic source buried in noise are tested in a tank experiment. Noise is generated using a discrete source, a bubble generator, and a sprinkler. The experiment has essential elements of a realistic scenario in matched-field processing, including complex source and noise time series in a waveguide with water, sediment, and multipath propagation. The noise-canceling processor is found to outperform the Bartlett processor and provide the correct source range for signal-to-noise ratios below -10 dB. The multivalued Bartlett processor is found to outperform the Bartlett processor but not the noise-canceling processor. © 2011 Acoustical Society of America
A High Performance VLSI Computer Architecture For Computer Graphics
NASA Astrophysics Data System (ADS)
Chin, Chi-Yuan; Lin, Wen-Tai
1988-10-01
A VLSI computer architecture, consisting of multiple processors, is presented in this paper to satisfy the modern computer graphics demands, e.g. high resolution, realistic animation, real-time display etc.. All processors share a global memory which are partitioned into multiple banks. Through a crossbar network, data from one memory bank can be broadcasted to many processors. Processors are physically interconnected through a hyper-crossbar network (a crossbar-like network). By programming the network, the topology of communication links among processors can be reconfigurated to satisfy specific dataflows of different applications. Each processor consists of a controller, arithmetic operators, local memory, a local crossbar network, and I/O ports to communicate with other processors, memory banks, and a system controller. Operations in each processor are characterized into two modes, i.e. object domain and space domain, to fully utilize the data-independency characteristics of graphics processing. Special graphics features such as 3D-to-2D conversion, shadow generation, texturing, and reflection, can be easily handled. With the current high density interconnection (MI) technology, it is feasible to implement a 64-processor system to achieve 2.5 billion operations per second, a performance needed in most advanced graphics applications.
Rapid prototyping and evaluation of programmable SIMD SDR processors in LISA
NASA Astrophysics Data System (ADS)
Chen, Ting; Liu, Hengzhu; Zhang, Botao; Liu, Dongpei
2013-03-01
With the development of international wireless communication standards, there is an increase in computational requirement for baseband signal processors. Time-to-market pressure makes it impossible to completely redesign new processors for the evolving standards. Due to its high flexibility and low power, software defined radio (SDR) digital signal processors have been proposed as promising technology to replace traditional ASIC and FPGA fashions. In addition, there are large numbers of parallel data processed in computation-intensive functions, which fosters the development of single instruction multiple data (SIMD) architecture in SDR platform. So a new way must be found to prototype the SDR processors efficiently. In this paper we present a bit-and-cycle accurate model of programmable SIMD SDR processors in a machine description language LISA. LISA is a language for instruction set architecture which can gain rapid model at architectural level. In order to evaluate the availability of our proposed processor, three common baseband functions, FFT, FIR digital filter and matrix multiplication have been mapped on the SDR platform. Analytical results showed that the SDR processor achieved the maximum of 47.1% performance boost relative to the opponent processor.
NASA Astrophysics Data System (ADS)
Weber, Walter H.; Mair, H. Douglas; Jansen, Dion
2003-03-01
A suite of basic signal processors has been developed. These basic building blocks can be cascaded together to form more complex processors without the need for programming. The data structures between each of the processors are handled automatically. This allows a processor built for one purpose to be applied to any type of data such as images, waveform arrays and single values. The processors are part of Winspect Data Acquisition software. The new processors are fast enough to work on A-scan signals live while scanning. Their primary use is to extract features, reduce noise or to calculate material properties. The cascaded processors work equally well on live A-scan displays, live gated data or as a post-processing engine on saved data. Researchers are able to call their own MATLAB or C-code from anywhere within the processor structure. A built-in formula node processor that uses a simple algebraic editor may make external user programs unnecessary. This paper also discusses the problems associated with ad hoc software development and how graphical programming languages can tie up researchers writing software rather than designing experiments.
Array processor architecture connection network
NASA Technical Reports Server (NTRS)
Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)
1982-01-01
A connection network is disclosed for use between a parallel array of processors and a parallel array of memory modules for establishing non-conflicting data communications paths between requested memory modules and requesting processors. The connection network includes a plurality of switching elements interposed between the processor array and the memory modules array in an Omega networking architecture. Each switching element includes a first and a second processor side port, a first and a second memory module side port, and control logic circuitry for providing data connections between the first and second processor ports and the first and second memory module ports. The control logic circuitry includes strobe logic for examining data arriving at the first and the second processor ports to indicate when the data arriving is requesting data from a requesting processor to a requested memory module. Further, connection circuitry is associated with the strobe logic for examining requesting data arriving at the first and the second processor ports for providing a data connection therefrom to the first and the second memory module ports in response thereto when the data connection so provided does not conflict with a pre-established data connection currently in use.
21 CFR 892.1900 - Automatic radiographic film processor.
Code of Federal Regulations, 2011 CFR
2011-04-01
... 21 Food and Drugs 8 2011-04-01 2011-04-01 false Automatic radiographic film processor. 892.1900... (CONTINUED) MEDICAL DEVICES RADIOLOGY DEVICES Diagnostic Devices § 892.1900 Automatic radiographic film processor. (a) Identification. An automatic radiographic film processor is a device intended to be used to...
21 CFR 892.1900 - Automatic radiographic film processor.
Code of Federal Regulations, 2013 CFR
2013-04-01
... 21 Food and Drugs 8 2013-04-01 2013-04-01 false Automatic radiographic film processor. 892.1900... (CONTINUED) MEDICAL DEVICES RADIOLOGY DEVICES Diagnostic Devices § 892.1900 Automatic radiographic film processor. (a) Identification. An automatic radiographic film processor is a device intended to be used to...
21 CFR 892.1900 - Automatic radiographic film processor.
Code of Federal Regulations, 2014 CFR
2014-04-01
... 21 Food and Drugs 8 2014-04-01 2014-04-01 false Automatic radiographic film processor. 892.1900... (CONTINUED) MEDICAL DEVICES RADIOLOGY DEVICES Diagnostic Devices § 892.1900 Automatic radiographic film processor. (a) Identification. An automatic radiographic film processor is a device intended to be used to...
21 CFR 892.1900 - Automatic radiographic film processor.
Code of Federal Regulations, 2012 CFR
2012-04-01
... 21 Food and Drugs 8 2012-04-01 2012-04-01 false Automatic radiographic film processor. 892.1900... (CONTINUED) MEDICAL DEVICES RADIOLOGY DEVICES Diagnostic Devices § 892.1900 Automatic radiographic film processor. (a) Identification. An automatic radiographic film processor is a device intended to be used to...
7 CFR 1160.108 - Fluid milk processor.
Code of Federal Regulations, 2013 CFR
2013-01-01
... 7 Agriculture 9 2013-01-01 2013-01-01 false Fluid milk processor. 1160.108 Section 1160.108... AGREEMENTS AND ORDERS; MILK), DEPARTMENT OF AGRICULTURE FLUID MILK PROMOTION PROGRAM Fluid Milk Promotion Order Definitions § 1160.108 Fluid milk processor. (a) Fluid milk processor means any person who...
7 CFR 1160.108 - Fluid milk processor.
Code of Federal Regulations, 2012 CFR
2012-01-01
... 7 Agriculture 9 2012-01-01 2012-01-01 false Fluid milk processor. 1160.108 Section 1160.108... Agreements and Orders; Milk), DEPARTMENT OF AGRICULTURE FLUID MILK PROMOTION PROGRAM Fluid Milk Promotion Order Definitions § 1160.108 Fluid milk processor. (a) Fluid milk processor means any person who...
7 CFR 1160.108 - Fluid milk processor.
Code of Federal Regulations, 2014 CFR
2014-01-01
... 7 Agriculture 9 2014-01-01 2013-01-01 true Fluid milk processor. 1160.108 Section 1160.108... AGREEMENTS AND ORDERS; MILK), DEPARTMENT OF AGRICULTURE FLUID MILK PROMOTION PROGRAM Fluid Milk Promotion Order Definitions § 1160.108 Fluid milk processor. (a) Fluid milk processor means any person who...
21 CFR 892.1900 - Automatic radiographic film processor.
Code of Federal Regulations, 2010 CFR
2010-04-01
... 21 Food and Drugs 8 2010-04-01 2010-04-01 false Automatic radiographic film processor. 892.1900... (CONTINUED) MEDICAL DEVICES RADIOLOGY DEVICES Diagnostic Devices § 892.1900 Automatic radiographic film processor. (a) Identification. An automatic radiographic film processor is a device intended to be used to...
7 CFR 1160.108 - Fluid milk processor.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 9 2010-01-01 2009-01-01 true Fluid milk processor. 1160.108 Section 1160.108... Agreements and Orders; Milk), DEPARTMENT OF AGRICULTURE FLUID MILK PROMOTION PROGRAM Fluid Milk Promotion Order Definitions § 1160.108 Fluid milk processor. (a) Fluid milk processor means any person who...
7 CFR 1160.108 - Fluid milk processor.
Code of Federal Regulations, 2011 CFR
2011-01-01
... 7 Agriculture 9 2011-01-01 2011-01-01 false Fluid milk processor. 1160.108 Section 1160.108... Agreements and Orders; Milk), DEPARTMENT OF AGRICULTURE FLUID MILK PROMOTION PROGRAM Fluid Milk Promotion Order Definitions § 1160.108 Fluid milk processor. (a) Fluid milk processor means any person who...
Shared performance monitor in a multiprocessor system
Chiu, George; Gara, Alan G; Salapura, Valentina
2014-12-02
A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor devices units, each processor device for generating signals representing occurrences of events in the processor device, and, a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU is further programmed to monitor event signals issued from non-processor devices.
Noncoherent parallel optical processor for discrete two-dimensional linear transformations.
Glaser, I
1980-10-01
We describe a parallel optical processor, based on a lenslet array, that provides general linear two-dimensional transformations using noncoherent light. Such a processor could become useful in image- and signal-processing applications in which the throughput requirements cannot be adequately satisfied by state-of-the-art digital processors. Experimental results that illustrate the feasibility of the processor by demonstrating its use in parallel optical computation of the two-dimensional Walsh-Hadamard transformation are presented.
Processors for wavelet analysis and synthesis: NIFS and TI-C80 MVP
NASA Astrophysics Data System (ADS)
Brooks, Geoffrey W.
1996-03-01
Two processors are considered for image quadrature mirror filtering (QMF). The neuromorphic infrared focal-plane sensor (NIFS) is an existing prototype analog processor offering high speed spatio-temporal Gaussian filtering, which could be used for the QMF low- pass function, and difference of Gaussian filtering, which could be used for the QMF high- pass function. Although not designed specifically for wavelet analysis, the biologically- inspired system accomplishes the most computationally intensive part of QMF processing. The Texas Instruments (TI) TMS320C80 Multimedia Video Processor (MVP) is a 32-bit RISC master processor with four advanced digital signal processors (DSPs) on a single chip. Algorithm partitioning, memory management and other issues are considered for optimal performance. This paper presents these considerations with simulated results leading to processor implementation of high-speed QMF analysis and synthesis.
77 FR 124 - Biological Processors of Alabama; Decatur, Morgan County, AL; Notice of Settlement
Federal Register 2010, 2011, 2012, 2013, 2014
2012-01-03
... ENVIRONMENTAL PROTECTION AGENCY [FRL-9612-9] Biological Processors of Alabama; Decatur, Morgan... reimbursement of past response costs concerning the Biological Processors of Alabama Superfund Site located in... Ms. Paula V. Painter. Submit your comments by Site name Biological Processors of Alabama Superfund...
Federal Register 2010, 2011, 2012, 2013, 2014
2012-07-30
... not be fuel resistant, which could lead to detachment of particles from the fuel hose and cause..., if not corrected, could lead to detachment of particles from the fuel hose and irregularities in the... waiving notice and comment prior to adoption of this rule because detachment of particles from the fuel...
Multiple core computer processor with globally-accessible local memories
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shalf, John; Donofrio, David; Oliker, Leonid
A multi-core computer processor including a plurality of processor cores interconnected in a Network-on-Chip (NoC) architecture, a plurality of caches, each of the plurality of caches being associated with one and only one of the plurality of processor cores, and a plurality of memories, each of the plurality of memories being associated with a different set of at least one of the plurality of processor cores and each of the plurality of memories being configured to be visible in a global memory address space such that the plurality of memories are visible to two or more of the plurality ofmore » processor cores.« less
Parallel processor-based raster graphics system architecture
Littlefield, Richard J.
1990-01-01
An apparatus for generating raster graphics images from the graphics command stream includes a plurality of graphics processors connected in parallel, each adapted to receive any part of the graphics command stream for processing the command stream part into pixel data. The apparatus also includes a frame buffer for mapping the pixel data to pixel locations and an interconnection network for interconnecting the graphics processors to the frame buffer. Through the interconnection network, each graphics processor may access any part of the frame buffer concurrently with another graphics processor accessing any other part of the frame buffer. The plurality of graphics processors can thereby transmit concurrently pixel data to pixel locations in the frame buffer.
NASA Astrophysics Data System (ADS)
Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.
2017-11-01
Current trend in processor manufacturing focuses on multi-core architectures rather than increasing the clock speed for performance improvement. Graphic processors have become as commodity hardware for providing fast co-processing in computer systems. Developments in IoT, social networking web applications, big data created huge demand for data processing activities and such kind of throughput intensive applications inherently contains data level parallelism which is more suited for SIMD architecture based GPU. This paper reviews the architectural aspects of multi/many core processors and graphics processors. Different case studies are taken to compare performance of throughput computing applications using shared memory programming in OpenMP and CUDA API based programming.
NASA Astrophysics Data System (ADS)
Tanikawa, Ataru; Yoshikawa, Kohji; Okamoto, Takashi; Nitadori, Keigo
2012-02-01
We present a high-performance N-body code for self-gravitating collisional systems accelerated with the aid of a new SIMD instruction set extension of the x86 architecture: Advanced Vector eXtensions (AVX), an enhanced version of the Streaming SIMD Extensions (SSE). With one processor core of Intel Core i7-2600 processor (8 MB cache and 3.40 GHz) based on Sandy Bridge micro-architecture, we implemented a fourth-order Hermite scheme with individual timestep scheme ( Makino and Aarseth, 1992), and achieved the performance of ˜20 giga floating point number operations per second (GFLOPS) for double-precision accuracy, which is two times and five times higher than that of the previously developed code implemented with the SSE instructions ( Nitadori et al., 2006b), and that of a code implemented without any explicit use of SIMD instructions with the same processor core, respectively. We have parallelized the code by using so-called NINJA scheme ( Nitadori et al., 2006a), and achieved ˜90 GFLOPS for a system containing more than N = 8192 particles with 8 MPI processes on four cores. We expect to achieve about 10 tera FLOPS (TFLOPS) for a self-gravitating collisional system with N ˜ 10 5 on massively parallel systems with at most 800 cores with Sandy Bridge micro-architecture. This performance will be comparable to that of Graphic Processing Unit (GPU) cluster systems, such as the one with about 200 Tesla C1070 GPUs ( Spurzem et al., 2010). This paper offers an alternative to collisional N-body simulations with GRAPEs and GPUs.
Impacts of Goethite Particles on UV Disinfection of Drinking Water
Wu, Youxian; Clevenger, Thomas; Deng, Baolin
2005-01-01
A unique association between bacterial cells and small goethite particles (∼0.2 by 2 μm) protected Escherichia coli and Pseudomonas putida from UV inactivation. The protection increased with the particle concentration in the turbidity range of 1 to 50 nephelometric turbidity units and with the bacterium-particle attachment time prior to UV irradiation. The lower degree of bacterial inactivation at longer attachment time was mostly attributed to the particle aggregation surrounding bacteria that provided shielding from UV radiation. PMID:16000835
Eigensolution of finite element problems in a completely connected parallel architecture
NASA Technical Reports Server (NTRS)
Akl, F.; Morel, M.
1989-01-01
A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis. The algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm is successfully implemented on a tightly coupled MIMD parallel processor. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts, and the dimension of the subspace on the performance of the algorithm is investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18, and 3.61 are achieved on two, four, six, and eight processors, respectively.
Extended performance electric propulsion power processor design study. Volume 2: Technical summary
NASA Technical Reports Server (NTRS)
Biess, J. J.; Inouye, L. Y.; Schoenfeld, A. D.
1977-01-01
Electric propulsion power processor technology has processed during the past decade to the point that it is considered ready for application. Several power processor design concepts were evaluated and compared. Emphasis was placed on a 30 cm ion thruster power processor with a beam power rating supply of 2.2KW to 10KW for the main propulsion power stage. Extension in power processor performance were defined and were designed in sufficient detail to determine efficiency, component weight, part count, reliability and thermal control. A detail design was performed on a microprocessor as the thyristor power processor controller. A reliability analysis was performed to evaluate the effect of the control electronics redesign. Preliminary electrical design, mechanical design and thermal analysis were performed on a 6KW power transformer for the beam supply. Bi-Mod mechanical, structural and thermal control configurations were evaluated for the power processor and preliminary estimates of mechanical weight were determined.
Wald, Ingo; Ize, Santiago
2015-07-28
Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
Sequence information signal processor
Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.
1999-01-01
An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.
Conditional load and store in a shared memory
Blumrich, Matthias A; Ohmacht, Martin
2015-02-03
A method, system and computer program product for implementing load-reserve and store-conditional instructions in a multi-processor computing system. The computing system includes a multitude of processor units and a shared memory cache, and each of the processor units has access to the memory cache. In one embodiment, the method comprises providing the memory cache with a series of reservation registers, and storing in these registers addresses reserved in the memory cache for the processor units as a result of issuing load-reserve requests. In this embodiment, when one of the processor units makes a request to store data in the memory cache using a store-conditional request, the reservation registers are checked to determine if an address in the memory cache is reserved for that processor unit. If an address in the memory cache is reserved for that processor, the data are stored at this address.
Code of Federal Regulations, 2011 CFR
2011-04-01
... information processors: form of application and amendments. 242.609 Section 242.609 Commodity and Securities....609 Registration of securities information processors: form of application and amendments. (a) An application for the registration of a securities information processor shall be filed on Form SIP (§ 249.1001...
Code of Federal Regulations, 2010 CFR
2010-04-01
... information processors: form of application and amendments. 242.609 Section 242.609 Commodity and Securities....609 Registration of securities information processors: form of application and amendments. (a) An application for the registration of a securities information processor shall be filed on Form SIP (§ 249.1001...
Optical Associative Processors For Visual Perception"
NASA Astrophysics Data System (ADS)
Casasent, David; Telfer, Brian
1988-05-01
We consider various associative processor modifications required to allow these systems to be used for visual perception, scene analysis, and object recognition. For these applications, decisions on the class of the objects present in the input image are required and thus heteroassociative memories are necessary (rather than the autoassociative memories that have been given most attention). We analyze the performance of both associative processors and note that there is considerable difference between heteroassociative and autoassociative memories. We describe associative processors suitable for realizing functions such as: distortion invariance (using linear discriminant function memory synthesis techniques), noise and image processing performance (using autoassociative memories in cascade with with a heteroassociative processor and with a finite number of autoassociative memory iterations employed), shift invariance (achieved through the use of associative processors operating on feature space data), and the analysis of multiple objects in high noise (which is achieved using associative processing of the output from symbolic correlators). We detail and provide initial demonstrations of the use of associative processors operating on iconic, feature space and symbolic data, as well as adaptive associative processors.
Enabling Future Robotic Missions with Multicore Processors
NASA Technical Reports Server (NTRS)
Powell, Wesley A.; Johnson, Michael A.; Wilmot, Jonathan; Some, Raphael; Gostelow, Kim P.; Reeves, Glenn; Doyle, Richard J.
2011-01-01
Recent commercial developments in multicore processors (e.g. Tilera, Clearspeed, HyperX) have provided an option for high performance embedded computing that rivals the performance attainable with FPGA-based reconfigurable computing architectures. Furthermore, these processors offer more straightforward and streamlined application development by allowing the use of conventional programming languages and software tools in lieu of hardware design languages such as VHDL and Verilog. With these advantages, multicore processors can significantly enhance the capabilities of future robotic space missions. This paper will discuss these benefits, along with onboard processing applications where multicore processing can offer advantages over existing or competing approaches. This paper will also discuss the key artchitecural features of current commercial multicore processors. In comparison to the current art, the features and advancements necessary for spaceflight multicore processors will be identified. These include power reduction, radiation hardening, inherent fault tolerance, and support for common spacecraft bus interfaces. Lastly, this paper will explore how multicore processors might evolve with advances in electronics technology and how avionics architectures might evolve once multicore processors are inserted into NASA robotic spacecraft.
Hot Chips and Hot Interconnects for High End Computing Systems
NASA Technical Reports Server (NTRS)
Saini, Subhash
2005-01-01
I will discuss several processors: 1. The Cray proprietary processor used in the Cray X1; 2. The IBM Power 3 and Power 4 used in an IBM SP 3 and IBM SP 4 systems; 3. The Intel Itanium and Xeon, used in the SGI Altix systems and clusters respectively; 4. IBM System-on-a-Chip used in IBM BlueGene/L; 5. HP Alpha EV68 processor used in DOE ASCI Q cluster; 6. SPARC64 V processor, which is used in the Fujitsu PRIMEPOWER HPC2500; 7. An NEC proprietary processor, which is used in NEC SX-6/7; 8. Power 4+ processor, which is used in Hitachi SR11000; 9. NEC proprietary processor, which is used in Earth Simulator. The IBM POWER5 and Red Storm Computing Systems will also be discussed. The architectures of these processors will first be presented, followed by interconnection networks and a description of high-end computer systems based on these processors and networks. The performance of various hardware/programming model combinations will then be compared, based on latest NAS Parallel Benchmark results (MPI, OpenMP/HPF and hybrid (MPI + OpenMP). The tutorial will conclude with a discussion of general trends in the field of high performance computing, (quantum computing, DNA computing, cellular engineering, and neural networks).
High-performance ultra-low power VLSI analog processor for data compression
NASA Technical Reports Server (NTRS)
Tawel, Raoul (Inventor)
1996-01-01
An apparatus for data compression employing a parallel analog processor. The apparatus includes an array of processor cells with N columns and M rows wherein the processor cells have an input device, memory device, and processor device. The input device is used for inputting a series of input vectors. Each input vector is simultaneously input into each column of the array of processor cells in a pre-determined sequential order. An input vector is made up of M components, ones of which are input into ones of M processor cells making up a column of the array. The memory device is used for providing ones of M components of a codebook vector to ones of the processor cells making up a column of the array. A different codebook vector is provided to each of the N columns of the array. The processor device is used for simultaneously comparing the components of each input vector to corresponding components of each codebook vector, and for outputting a signal representative of the closeness between the compared vector components. A combination device is used to combine the signal output from each processor cell in each column of the array and to output a combined signal. A closeness determination device is then used for determining which codebook vector is closest to an input vector from the combined signals, and for outputting a codebook vector index indicating which of the N codebook vectors was the closest to each input vector input into the array.
On the relationship between parallel computation and graph embedding
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gupta, A.K.
1989-01-01
The problem of efficiently simulating an algorithm designed for an n-processor parallel machine G on an m-processor parallel machine H with n > m arises when parallel algorithms designed for an ideal size machine are simulated on existing machines which are of a fixed size. The author studies this problem when every processor of H takes over the function of a number of processors in G, and he phrases the simulation problem as a graph embedding problem. New embeddings presented address relevant issues arising from the parallel computation environment. The main focus centers around embedding complete binary trees into smaller-sizedmore » binary trees, butterflies, and hypercubes. He also considers simultaneous embeddings of r source machines into a single hypercube. Constant factors play a crucial role in his embeddings since they are not only important in practice but also lead to interesting theoretical problems. All of his embeddings minimize dilation and load, which are the conventional cost measures in graph embeddings and determine the maximum amount of time required to simulate one step of G on H. His embeddings also optimize a new cost measure called ({alpha},{beta})-utilization which characterizes how evenly the processors of H are used by the processors of G. Ideally, the utilization should be balanced (i.e., every processor of H simulates at most (n/m) processors of G) and the ({alpha},{beta})-utilization measures how far off from a balanced utilization the embedding is. He presents embeddings for the situation when some processors of G have different capabilities (e.g. memory or I/O) than others and the processors with different capabilities are to be distributed uniformly among the processors of H. Placing such conditions on an embedding results in an increase in some of the cost measures.« less
Transient Solid Dynamics Simulations on the Sandia/Intel Teraflop Computer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Attaway, S.; Brown, K.; Gardner, D.
1997-12-31
Transient solid dynamics simulations are among the most widely used engineering calculations. Industrial applications include vehicle crashworthiness studies, metal forging, and powder compaction prior to sintering. These calculations are also critical to defense applications including safety studies and weapons simulations. The practical importance of these calculations and their computational intensiveness make them natural candidates for parallelization. This has proved to be difficult, and existing implementations fail to scale to more than a few dozen processors. In this paper we describe our parallelization of PRONTO, Sandia`s transient solid dynamics code, via a novel algorithmic approach that utilizes multiple decompositions for differentmore » key segments of the computations, including the material contact calculation. This latter calculation is notoriously difficult to perform well in parallel, because it involves dynamically changing geometry, global searches for elements in contact, and unstructured communications among the compute nodes. Our approach scales to at least 3600 compute nodes of the Sandia/Intel Teraflop computer (the largest set of nodes to which we have had access to date) on problems involving millions of finite elements. On this machine we can simulate models using more than ten- million elements in a few tenths of a second per timestep, and solve problems more than 3000 times faster than a single processor Cray Jedi.« less
NASA Technical Reports Server (NTRS)
Bradley, William; Bird, Ross; Eldred, Dennis; Zook, Jon; Knowles, Gareth
2013-01-01
This work involved developing spacequalifiable switch mode DC/DC power supplies that improve performance with fewer components, and result in elimination of digital components and reduction in magnetics. This design is for missions where systems may be operating under extreme conditions, especially at elevated temperature levels from 200 to 300 degC. Prior art for radiation-tolerant DC/DC converters has been accomplished utilizing classical magnetic-based switch mode converter topologies; however, this requires specific shielding and component de-rating to meet the high-reliability specifications. It requires complex measurement and feedback components, and will not enable automatic re-optimization for larger changes in voltage supply or electrical loading condition. The innovation is a switch mode DC/DC power supply that eliminates the need for processors and most magnetics. It can provide a well-regulated voltage supply with a gain of 1:100 step-up to 8:1 step down, tolerating an up to 30% fluctuation of the voltage supply parameters. The circuit incorporates a ceramic core transformer in a manner that enables it to provide a well-regulated voltage output without use of any processor components or magnetic transformers. The circuit adjusts its internal parameters to re-optimize its performance for changes in supply voltage, environmental conditions, or electrical loading at the output
Feasibility study of parallel optical correlation-decoding analysis of lightning
DOE Office of Scientific and Technical Information (OSTI.GOV)
Descour, M.R.; Sweatt, W.C.; Elliott, G.R.
The optical correlator described in this report is intended to serve as an attention-focusing processor. The objective is to narrowly bracket the range of a parameter value that characterizes the correlator input. The input is a waveform collected by a satellite-borne receiver. In the correlator, this waveform is simultaneously correlated with an ensemble of ionosphere impulse-response functions, each corresponding to a different total-electron-count (TEC) value. We have found that correlation is an effective method of bracketing the range of TEC values likely to be represented by the input waveform. High accuracy in a computational sense is not required of themore » correlator. Binarization of the impulse-response functions and the input waveforms prior to correlation results in a lower correlation-peak-to-background-fluctuation (signal-to-noise) ratio than the peak that is obtained when all waveforms retain their grayscale values. The results presented in this report were obtained by means of an acousto-optic correlator previously developed at SNL as well as by simulation. An optical-processor architecture optimized for 1D correlation of long waveforms characteristic of this application is described. Discussions of correlator components, such as optics, acousto-optic cells, digital micromirror devices, laser diodes, and VCSELs are included.« less
Health benefits of particle filtration
This product was developed under an interagency agreement between the U.S. EPA and the U.S. Department of Energy - Lawrence Berkeley National Laboratory (LBNL). The evidence of health benefits of particle filtration in homes and commercial buildings is reviewed. Prior reviews o...
Code of Federal Regulations, 2011 CFR
2011-04-01
... registration as a securities information processor or to amend such an application or registration. 249.1001..., SECURITIES EXCHANGE ACT OF 1934 Form for Registration of, and Reporting by Securities Information Processors § 249.1001 Form SIP, for application for registration as a securities information processor or to amend...
Federal Register 2010, 2011, 2012, 2013, 2014
2010-07-13
... Fisheries Act (AFA) trawl catcher/processor sector (otherwise known as the Amendment 80 sector... catcher/processors. Hook-and-line catcher/processors are allocated 48.7 percent of the annual BSAI Pacific... harvest of Pacific cod by hook-and-line catcher/processors, although this is one of the major groundfish...
Federal Register 2010, 2011, 2012, 2013, 2014
2013-04-10
... the Securities Information Processors (``SIPs'' or ``Processors'') responsible for consolidation of... Plan. \\9\\ 17 CFR 242.603(b). The Plan refers to this entity as the Processor. \\10\\ See Section I(T) of... Euronext, to Elizabeth M. Murphy, Secretary, Commission, dated May 24, 2012. The Processors would also...
Simulating Synchronous Processors
1988-06-01
34f Fvtvru m LABORATORY FOR INMASSACHUSETTSFCOMPUTER SCIENCE TECHNOLOGY MIT/LCS/TM-359 SIMULATING SYNCHRONOUS PROCESSORS Jennifer Lundelius Welch...PROJECT TASK WORK UNIT Arlington, VA 22217 ELEMENT NO. NO. NO ACCESSION NO. 11. TITLE Include Security Classification) Simulating Synchronous Processors...necessary and identify by block number) In this paper we show how a distributed system with synchronous processors and asynchro- nous message delays can
Middle School Pupil Writing and the Word Processor.
ERIC Educational Resources Information Center
Ediger, Marlow
Pupils in middle schools should have ample opportunities to write with the use of word processors. Legible writing in longhand will always be necessary in selected situations but, nevertheless, much drudgery is taken care of when using a word processor. Word processors tend to be very user friendly in that few mechanical skills are needed by the…
Code of Federal Regulations, 2010 CFR
2010-04-01
... registration as a securities information processor or to amend such an application or registration. 249.1001..., SECURITIES EXCHANGE ACT OF 1934 Form for Registration of, and Reporting by Securities Information Processors § 249.1001 Form SIP, for application for registration as a securities information processor or to amend...
Analog Processor To Solve Optimization Problems
NASA Technical Reports Server (NTRS)
Duong, Tuan A.; Eberhardt, Silvio P.; Thakoor, Anil P.
1993-01-01
Proposed analog processor solves "traveling-salesman" problem, considered paradigm of global-optimization problems involving routing or allocation of resources. Includes electronic neural network and auxiliary circuitry based partly on concepts described in "Neural-Network Processor Would Allocate Resources" (NPO-17781) and "Neural Network Solves 'Traveling-Salesman' Problem" (NPO-17807). Processor based on highly parallel computing solves problem in significantly less time.
Finite elements and the method of conjugate gradients on a concurrent processor
NASA Technical Reports Server (NTRS)
Lyzenga, G. A.; Raefsky, A.; Hager, G. H.
1985-01-01
An algorithm for the iterative solution of finite element problems on a concurrent processor is presented. The method of conjugate gradients is used to solve the system of matrix equations, which is distributed among the processors of a MIMD computer according to an element-based spatial decomposition. This algorithm is implemented in a two-dimensional elastostatics program on the Caltech Hypercube concurrent processor. The results of tests on up to 32 processors show nearly linear concurrent speedup, with efficiencies over 90 percent for sufficiently large problems.
Sobol, Wlad T
2002-01-01
A simple kinetic model that describes the time evolution of the chemical concentration of an arbitrary compound within the tank of an automatic film processor is presented. It provides insights into the kinetics of chemistry concentration inside the processor's tank; the results facilitate the tasks of processor tuning and quality control (QC). The model has successfully been used in several troubleshooting sessions of low-volume mammography processors for which maintaining consistent QC tracking was difficult due to fluctuations of bromide levels in the developer tank.
Multithreading in vector processors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Evangelinos, Constantinos; Kim, Changhoan; Nair, Ravi
In one embodiment, a system includes a processor having a vector processing mode and a multithreading mode. The processor is configured to operate on one thread per cycle in the multithreading mode. The processor includes a program counter register having a plurality of program counters, and the program counter register is vectorized. Each program counter in the program counter register represents a distinct corresponding thread of a plurality of threads. The processor is configured to execute the plurality of threads by activating the plurality of program counters in a round robin cycle.
Finite elements and the method of conjugate gradients on a concurrent processor
NASA Technical Reports Server (NTRS)
Lyzenga, G. A.; Raefsky, A.; Hager, B. H.
1984-01-01
An algorithm for the iterative solution of finite element problems on a concurrent processor is presented. The method of conjugate gradients is used to solve the system of matrix equations, which is distributed among the processors of a MIMD computer according to an element-based spatial decomposition. This algorithm is implemented in a two-dimensional elastostatics program on the Caltech Hypercube concurrent processor. The results of tests on up to 32 processors show nearly linear concurrent speedup, with efficiencies over 90% for sufficiently large problems.
A fully reconfigurable photonic integrated signal processor
NASA Astrophysics Data System (ADS)
Liu, Weilin; Li, Ming; Guzzon, Robert S.; Norberg, Erik J.; Parker, John S.; Lu, Mingzhi; Coldren, Larry A.; Yao, Jianping
2016-03-01
Photonic signal processing has been considered a solution to overcome the inherent electronic speed limitations. Over the past few years, an impressive range of photonic integrated signal processors have been proposed, but they usually offer limited reconfigurability, a feature highly needed for the implementation of large-scale general-purpose photonic signal processors. Here, we report and experimentally demonstrate a fully reconfigurable photonic integrated signal processor based on an InP-InGaAsP material system. The proposed photonic signal processor is capable of performing reconfigurable signal processing functions including temporal integration, temporal differentiation and Hilbert transformation. The reconfigurability is achieved by controlling the injection currents to the active components of the signal processor. Our demonstration suggests great potential for chip-scale fully programmable all-optical signal processing.
Neurovision processor for designing intelligent sensors
NASA Astrophysics Data System (ADS)
Gupta, Madan M.; Knopf, George K.
1992-03-01
A programmable multi-task neuro-vision processor, called the Positive-Negative (PN) neural processor, is proposed as a plausible hardware mechanism for constructing robust multi-task vision sensors. The computational operations performed by the PN neural processor are loosely based on the neural activity fields exhibited by certain nervous tissue layers situated in the brain. The neuro-vision processor can be programmed to generate diverse dynamic behavior that may be used for spatio-temporal stabilization (STS), short-term visual memory (STVM), spatio-temporal filtering (STF) and pulse frequency modulation (PFM). A multi- functional vision sensor that performs a variety of information processing operations on time- varying two-dimensional sensory images can be constructed from a parallel and hierarchical structure of numerous individually programmed PN neural processors.
How to Build an AppleSeed: A Parallel Macintosh Cluster for Numerically Intensive Computing
NASA Astrophysics Data System (ADS)
Decyk, V. K.; Dauger, D. E.
We have constructed a parallel cluster consisting of a mixture of Apple Macintosh G3 and G4 computers running the Mac OS, and have achieved very good performance on numerically intensive, parallel plasma particle-incell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the main stream of computing.
Tracking Debris Shed by a Space-Shuttle Launch Vehicle
NASA Technical Reports Server (NTRS)
Stuart, Phillip C.; Rogers, Stuart E.
2009-01-01
The DEBRIS software predicts the trajectories of debris particles shed by a space-shuttle launch vehicle during ascent, to aid in assessing potential harm to the space-shuttle orbiter and crew. The user specifies the location of release and other initial conditions for a debris particle. DEBRIS tracks the particle within an overset grid system by means of a computational fluid dynamics (CFD) simulation of the local flow field and a ballistic simulation that takes account of the mass of the particle and its aerodynamic properties in the flow field. The computed particle trajectory is stored in a file to be post-processed by other software for viewing and analyzing the trajectory. DEBRIS supplants a prior debris tracking code that took .15 minutes to calculate a single particle trajectory: DEBRIS can calculate 1,000 trajectories in .20 seconds on a desktop computer. Other improvements over the prior code include adaptive time-stepping to ensure accuracy, forcing at least one step per grid cell to ensure resolution of all CFD-resolved flow features, ability to simulate rebound of debris from surfaces, extensive error checking, a builtin suite of test cases, and dynamic allocation of memory.
Effects of heavy particle irradiation and diet on object recognition memory in rats
NASA Astrophysics Data System (ADS)
Rabin, Bernard M.; Carrihill-Knoll, Kirsty; Hinchman, Marie; Shukitt-Hale, Barbara; Joseph, James A.; Foster, Brian C.
2009-04-01
On long-duration missions to other planets astronauts will be exposed to types and doses of radiation that are not experienced in low earth orbit. Previous research using a ground-based model for exposure to cosmic rays has shown that exposure to heavy particles, such as 56Fe, disrupts spatial learning and memory measured using the Morris water maze. Maintaining rats on diets containing antioxidant phytochemicals for 2 weeks prior to irradiation ameliorated this deficit. The present experiments were designed to determine: (1) the generality of the particle-induced disruption of memory by examining the effects of exposure to 56Fe particles on object recognition memory; and (2) whether maintaining rats on these antioxidant diets for 2 weeks prior to irradiation would also ameliorate any potential deficit. The results showed that exposure to low doses of 56Fe particles does disrupt recognition memory and that maintaining rats on antioxidant diets containing blueberry and strawberry extract for only 2 weeks was effective in ameliorating the disruptive effects of irradiation. The results are discussed in terms of the mechanisms by which exposure to these particles may produce effects on neurocognitive performance.
When emotionality trumps reason: a study of individual processing style and juror bias.
Gunnell, Justin J; Ceci, Stephen J
2010-01-01
"Cognitive Experiential Self Theory" (CEST) postulates that information-processing proceeds through two pathways, a rational one and an experiential one. The former is characterized by an emphasis on analysis, fact, and logical argument, whereas the latter is characterized by emotional and personal experience. We examined whether individuals influenced by the experiential system (E-processors) are more susceptible to extralegal biases (e.g. defendant attractiveness) than those influenced by the rational system (R-processors). Participants reviewed a criminal trial transcript and defendant profile and determined verdict, sentencing, and extralegal susceptibility. Although E-processors and R-processors convicted attractive defendants at similar rates, E-processors were more likely to convict less attractive defendants. Whereas R-processors did not sentence attractive and less attractive defendants differently, E-processors gave more lenient sentences to attractive defendants and harsher sentences to less attractive defendants. E-processors were also more likely to report that extralegal factors would change their verdicts. Further, the degree to which emotionality trumped rationality within an individual, as measured by a novel scoring method, linearly correlated with harsher sentences and extralegal influence. In sum, the results support an "unattractive harshness" effect during guilt determination, an attraction leniency effect during sentencing and increased susceptibility to extralegal factors within E-processors. Copyright © 2010 John Wiley & Sons, Ltd. Copyright © 2010 John Wiley & Sons, Ltd.
Soft-core processor study for node-based architectures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Van Houten, Jonathan Roger; Jarosz, Jason P.; Welch, Benjamin James
2008-09-01
Node-based architecture (NBA) designs for future satellite projects hold the promise of decreasing system development time and costs, size, weight, and power and positioning the laboratory to address other emerging mission opportunities quickly. Reconfigurable Field Programmable Gate Array (FPGA) based modules will comprise the core of several of the NBA nodes. Microprocessing capabilities will be necessary with varying degrees of mission-specific performance requirements on these nodes. To enable the flexibility of these reconfigurable nodes, it is advantageous to incorporate the microprocessor into the FPGA itself, either as a hardcore processor built into the FPGA or as a soft-core processor builtmore » out of FPGA elements. This document describes the evaluation of three reconfigurable FPGA based processors for use in future NBA systems--two soft cores (MicroBlaze and non-fault-tolerant LEON) and one hard core (PowerPC 405). Two standard performance benchmark applications were developed for each processor. The first, Dhrystone, is a fixed-point operation metric. The second, Whetstone, is a floating-point operation metric. Several trials were run at varying code locations, loop counts, processor speeds, and cache configurations. FPGA resource utilization was recorded for each configuration. Cache configurations impacted the results greatly; for optimal processor efficiency it is necessary to enable caches on the processors. Processor caches carry a penalty; cache error mitigation is necessary when operating in a radiation environment.« less
Development of small scale cluster computer for numerical analysis
NASA Astrophysics Data System (ADS)
Zulkifli, N. H. N.; Sapit, A.; Mohammed, A. N.
2017-09-01
In this study, two units of personal computer were successfully networked together to form a small scale cluster. Each of the processor involved are multicore processor which has four cores in it, thus made this cluster to have eight processors. Here, the cluster incorporate Ubuntu 14.04 LINUX environment with MPI implementation (MPICH2). Two main tests were conducted in order to test the cluster, which is communication test and performance test. The communication test was done to make sure that the computers are able to pass the required information without any problem and were done by using simple MPI Hello Program where the program written in C language. Additional, performance test was also done to prove that this cluster calculation performance is much better than single CPU computer. In this performance test, four tests were done by running the same code by using single node, 2 processors, 4 processors, and 8 processors. The result shows that with additional processors, the time required to solve the problem decrease. Time required for the calculation shorten to half when we double the processors. To conclude, we successfully develop a small scale cluster computer using common hardware which capable of higher computing power when compare to single CPU processor, and this can be beneficial for research that require high computing power especially numerical analysis such as finite element analysis, computational fluid dynamics, and computational physics analysis.
Federal Register 2010, 2011, 2012, 2013, 2014
2013-12-10
...; catcher/ processor--40 percent; and motherships--10 percent. Under Sec. 679.20(a)(5)(iii)(B)(2)(i) and (ii... sector, 40 percent to the catcher/processor sector, and 10 percent to the mothership sector. In the.../processor sector will be available for harvest by AFA catcher vessels with catcher/ processor sector...
Processor architecture for airborne SAR systems
NASA Technical Reports Server (NTRS)
Glass, C. M.
1983-01-01
Digital processors for spaceborne imaging radars and application of the technology developed for airborne SAR systems are considered. Transferring algorithms and implementation techniques from airborne to spaceborne SAR processors offers obvious advantages. The following topics are discussed: (1) a quantification of the differences in processing algorithms for airborne and spaceborne SARs; and (2) an overview of three processors for airborne SAR systems.
Yes! An object-oriented compiler compiler (YOOCC)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Avotins, J.; Mingins, C.; Schmidt, H.
1995-12-31
Grammar-based processor generation is one of the most widely studied areas in language processor construction. However, there have been very few approaches to date that reconcile object-oriented principles, processor generation, and an object-oriented language. Pertinent here also. is that currently to develop a processor using the Eiffel Parse libraries requires far too much time to be expended on tasks that can be automated. For these reasons, we have developed YOOCC (Yes! an Object-Oriented Compiler Compiler), which produces a processor framework from a grammar using an enhanced version of the Eiffel Parse libraries, incorporating the ideas hypothesized by Meyer, and Grapemore » and Walden, as well as many others. Various essential changes have been made to the Eiffel Parse libraries. Examples are presented to illustrate the development of a processor using YOOCC, and it is concluded that the Eiffel Parse libraries are now not only an intelligent, but also a productive option for processor construction.« less
Effect of poor control of film processors on mammographic image quality.
Kimme-Smith, C; Sun, H; Bassett, L W; Gold, R H
1992-11-01
With the increasingly stringent standards of image quality in mammography, film processor quality control is especially important. Current methods are not sufficient for ensuring good processing. The authors used a sensitometer and densitometer system to evaluate the performance of 22 processors at 16 mammographic facilities. Standard sensitometric values of two films were established, and processor performance was assessed for variations from these standards. Developer chemistry of each processor was analyzed and correlated with its sensitometric values. Ten processors were retested, and nine were found to be out of calibration. The developer components of hydroquinone, sulfites, bromide, and alkalinity varied the most, and low concentrations of hydroquinone were associated with lower average gradients at two facilities. Use of the sensitometer and densitometer system helps identify out-of-calibration processors, but further study is needed to correlate sensitometric values with developer component values. The authors believe that present quality control would be improved if sensitometric or other tests could be used to identify developer components that are out of calibration.
Automatic film processors' quality control test in Greek military hospitals.
Lymberis, C; Efstathopoulos, E P; Manetou, A; Poudridis, G
1993-04-01
The two major military radiology installations (Athens, Greece) using a total of 15 automatic film processors were assessed using the 21-step-wedge method. The results of quality control in all these processors are presented. The parameters measured under actual working conditions were base and fog, contrast and speed. Base and fog as well as speed displayed large variations with average values generally higher than acceptable, whilst contrast displayed greater stability. Developer temperature was measured daily during the test and was found to be outside the film manufacturers' recommended limits in nine of the 15 processors. In only one processor did film passing time vary on an every day basis and this was due to maloperation. Developer pH test was not part of the daily monitoring service being performed every 5 days for each film processor and found to be in the range 9-12; 10 of the 15 processors presented pH values outside the limits specified by the film manufacturers.
A high-accuracy optical linear algebra processor for finite element applications
NASA Technical Reports Server (NTRS)
Casasent, D.; Taylor, B. K.
1984-01-01
Optical linear processors are computationally efficient computers for solving matrix-matrix and matrix-vector oriented problems. Optical system errors limit their dynamic range to 30-40 dB, which limits their accuray to 9-12 bits. Large problems, such as the finite element problem in structural mechanics (with tens or hundreds of thousands of variables) which can exploit the speed of optical processors, require the 32 bit accuracy obtainable from digital machines. To obtain this required 32 bit accuracy with an optical processor, the data can be digitally encoded, thereby reducing the dynamic range requirements of the optical system (i.e., decreasing the effect of optical errors on the data) while providing increased accuracy. This report describes a new digitally encoded optical linear algebra processor architecture for solving finite element and banded matrix-vector problems. A linear static plate bending case study is described which quantities the processor requirements. Multiplication by digital convolution is explained, and the digitally encoded optical processor architecture is advanced.
Optimal processor assignment for pipeline computations
NASA Technical Reports Server (NTRS)
Nicol, David M.; Simha, Rahul; Choudhury, Alok N.; Narahari, Bhagirath
1991-01-01
The availability of large scale multitasked parallel architectures introduces the following processor assignment problem for pipelined computations. Given a set of tasks and their precedence constraints, along with their experimentally determined individual responses times for different processor sizes, find an assignment of processor to tasks. Two objectives are of interest: minimal response given a throughput requirement, and maximal throughput given a response time requirement. These assignment problems differ considerably from the classical mapping problem in which several tasks share a processor; instead, it is assumed that a large number of processors are to be assigned to a relatively small number of tasks. Efficient assignment algorithms were developed for different classes of task structures. For a p processor system and a series parallel precedence graph with n constituent tasks, an O(np2) algorithm is provided that finds the optimal assignment for the response time optimization problem; it was found that the assignment optimizing the constrained throughput in O(np2log p) time. Special cases of linear, independent, and tree graphs are also considered.
NASA Technical Reports Server (NTRS)
Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)
1983-01-01
A high speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega gender. Programs and data are fed from the data base memory to the plurality of memory modules and from hence the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occur with the processors operating normally quite independently of each other in a multiprocessing fashion. For data dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode however, the processors are not locked-step but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.
Extended performance electric propulsion power processor design study. Volume 1: Executive summary
NASA Technical Reports Server (NTRS)
Biess, J. J.; Inouye, L. Y.; Schoenfeld, A. D.
1977-01-01
Several power processor design concepts were evaluated and compared. Emphasis was placed on a 30cm ion thruster power processor with a beam supply rating of 2.2kW to 10kW. Extensions in power processor performance were defined and were designed in sufficient detail to determine efficiency, component weight, part count, reliability and thermal control. Preliminary electrical design, mechanical design, and thermal analysis were performed on a 6kW power transformer for the beam supply. Bi-Mod mechanical, structural, and thermal control configurations were evaluated for the power processor, and preliminary estimates of mechanical weight were determined. A program development plan was formulated that outlines the work breakdown structure for the development, qualification and fabrication of the power processor flight hardware.
APRON: A Cellular Processor Array Simulation and Hardware Design Tool
NASA Astrophysics Data System (ADS)
Barr, David R. W.; Dudek, Piotr
2009-12-01
We present a software environment for the efficient simulation of cellular processor arrays (CPAs). This software (APRON) is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.
Efficient Interconnection Schemes for VLSI and Parallel Computation
1989-08-01
Definition: Let R be a routing network. A set S of wires in R is a (directed) cut if it partitions the network into two sets of processors A and B ...such that every path from a processor in A to a processor in B contains a wire in S. The capacity cap(S) is the number of wires in the cut. For a set of...messages M, define the load load(M, S) of M on a cut S to be the number of messages in M from a processor in A to a processor in B . The load factor
Hypercluster - Parallel processing for computational mechanics
NASA Technical Reports Server (NTRS)
Blech, Richard A.
1988-01-01
An account is given of the development status, performance capabilities and implications for further development of NASA-Lewis' testbed 'hypercluster' parallel computer network, in which multiple processors communicate through a shared memory. Processors have local as well as shared memory; the hypercluster is expanded in the same manner as the hypercube, with processor clusters replacing the normal single processor node. The NASA-Lewis machine has three nodes with a vector personality and one node with a scalar personality. Each of the vector nodes uses four board-level vector processors, while the scalar node uses four general-purpose microcomputer boards.
2015-06-13
The Berkeley Out-of-Order Machine (BOOM): An Industry- Competitive, Synthesizable, Parameterized RISC-V Processor Christopher Celio David A...Synthesizable, Parameterized RISC-V Processor Christopher Celio, David Patterson, and Krste Asanović University of California, Berkeley, California 94720...Order Machine BOOM is a synthesizable, parameterized, superscalar out- of-order RISC-V core designed to serve as the prototypical baseline processor
A Medical Language Processor for Two Indo-European Languages
Nhan, Ngo Thanh; Sager, Naomi; Lyman, Margaret; Tick, Leo J.; Borst, François; Su, Yun
1989-01-01
The syntax and semantics of clinical narrative across Indo-European languages are quite similar, making it possible to envison a single medical language processor that can be adapted for different European languages. The Linguistic String Project of New York University is continuing the development of its Medical Language Processor in this direction. The paper describes how the processor operates on English and French.
Performance Modeling of the ADA Rendezvous
1991-10-01
queueing network of figure 2, SERVERTASK can complete only one rendezvous at a time. Thus, the rate that the rendezvous requests are processed at the... Network 1, SERVERTASK competes with the traffic tasks of Server Processor. Each time SERVERTASK gains access to the processor, SERVERTASK completes...Client Processor Server Processor Software Server Nek Netork2 Figure 10. A conceptualization of the algorithm. The SERVERTASK software server of Network 2
A Parallel Algorithm for Contact in a Finite Element Hydrocode
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pierce, Timothy G.
A parallel algorithm is developed for contact/impact of multiple three dimensional bodies undergoing large deformation. As time progresses the relative positions of contact between the multiple bodies changes as collision and sliding occurs. The parallel algorithm is capable of tracking these changes and enforcing an impenetrability constraint and momentum transfer across the surfaces in contact. Portions of the various surfaces of the bodies are assigned to the processors of a distributed-memory parallel machine in an arbitrary fashion, known as the primary decomposition. A secondary, dynamic decomposition is utilized to bring opposing sections of the contacting surfaces together on the samemore » processors, so that opposing forces may be balanced and the resultant deformation of the bodies calculated. The secondary decomposition is accomplished and updated using only local communication with a limited subset of neighbor processors. Each processor represents both a domain of the primary decomposition and a domain of the secondary, or contact, decomposition. Thus each processor has four sets of neighbor processors: (a) those processors which represent regions adjacent to it in the primary decomposition, (b) those processors which represent regions adjacent to it in the contact decomposition, (c) those processors which send it the data from which it constructs its contact domain, and (d) those processors to which it sends its primary domain data, from which they construct their contact domains. The latter three of these neighbor sets change dynamically as the simulation progresses. By constraining all communication to these sets of neighbors, all global communication, with its attendant nonscalable performance, is avoided. A set of tests are provided to measure the degree of scalability achieved by this algorithm on up to 1024 processors. Issues related to the operating system of the test platform which lead to some degradation of the results are analyzed. This algorithm has been implemented as the contact capability of the ALE3D multiphysics code, and is currently in production use.« less
NASA Technical Reports Server (NTRS)
1994-01-01
A Small Business Innovation Research (SBIR) contract resulted in a series of commercially available lasers, which have application in fiber optic communications, difference frequency generation, fiber optic sensing and general laboratory use. Developed under a Small Business Innovation Research (SBIR) contract, the Phase Doppler Particles Analyzer is a non-disruptive, highly accurate laser-based method of determining particle size, number density, trajectory, turbulence and other information about particles passing through a measurement probe volume. The system consists of an optical transmitter and receiver, signal processor and computer with data acquisition and analysis software. A variety of systems are offered for applications including spray characterization for paint, and agricultural and other sprays. The Microsizer, a related product, is used in medical equipment manufacturing and analysis of contained flows. High frequency components and subsystems produced by Millitech Corporation are marketed for both research and commercial use. These systems, which operate in the upper portion of the millimeter wave, resulted from a number of Small Business Innovation Research (SBIR) projects. By developing very high performance mixers and multipliers, the company has advanced the state of the art in sensitive receiver technology. Components are used in receivers and transceivers for monitoring chlorine monoxides, ozone, in plasma characterization and in material properties characterization.
Graphics Processing Unit Acceleration of Gyrokinetic Turbulence Simulations
NASA Astrophysics Data System (ADS)
Hause, Benjamin; Parker, Scott; Chen, Yang
2013-10-01
We find a substantial increase in on-node performance using Graphics Processing Unit (GPU) acceleration in gyrokinetic delta-f particle-in-cell simulation. Optimization is performed on a two-dimensional slab gyrokinetic particle simulation using the Portland Group Fortran compiler with the OpenACC compiler directives and Fortran CUDA. Mixed implementation of both Open-ACC and CUDA is demonstrated. CUDA is required for optimizing the particle deposition algorithm. We have implemented the GPU acceleration on a third generation Core I7 gaming PC with two NVIDIA GTX 680 GPUs. We find comparable, or better, acceleration relative to the NERSC DIRAC cluster with the NVIDIA Tesla C2050 computing processor. The Tesla C 2050 is about 2.6 times more expensive than the GTX 580 gaming GPU. We also see enormous speedups (10 or more) on the Titan supercomputer at Oak Ridge with Kepler K20 GPUs. Results show speed-ups comparable or better than that of OpenMP models utilizing multiple cores. The use of hybrid OpenACC, CUDA Fortran, and MPI models across many nodes will also be discussed. Optimization strategies will be presented. We will discuss progress on optimizing the comprehensive three dimensional general geometry GEM code.
FPGA wavelet processor design using language for instruction-set architectures (LISA)
NASA Astrophysics Data System (ADS)
Meyer-Bäse, Uwe; Vera, Alonzo; Rao, Suhasini; Lenk, Karl; Pattichis, Marios
2007-04-01
The design of an microprocessor is a long, tedious, and error-prone task consisting of typically three design phases: architecture exploration, software design (assembler, linker, loader, profiler), architecture implementation (RTL generation for FPGA or cell-based ASIC) and verification. The Language for instruction-set architectures (LISA) allows to model a microprocessor not only from instruction-set but also from architecture description including pipelining behavior that allows a design and development tool consistency over all levels of the design. To explore the capability of the LISA processor design platform a.k.a. CoWare Processor Designer we present in this paper three microprocessor designs that implement a 8/8 wavelet transform processor that is typically used in today's FBI fingerprint compression scheme. We have designed a 3 stage pipelined 16 bit RISC processor (NanoBlaze). Although RISC μPs are usually considered "fast" processors due to design concept like constant instruction word size, deep pipelines and many general purpose registers, it turns out that DSP operations consume essential processing time in a RISC processor. In a second step we have used design principles from programmable digital signal processor (PDSP) to improve the throughput of the DWT processor. A multiply-accumulate operation along with indirect addressing operation were the key to achieve higher throughput. A further improvement is possible with today's FPGA technology. Today's FPGAs offer a large number of embedded array multipliers and it is now feasible to design a "true" vector processor (TVP). A multiplication of two vectors can be done in just one clock cycle with our TVP, a complete scalar product in two clock cycles. Code profiling and Xilinx FPGA ISE synthesis results are provided that demonstrate the essential improvement that a TVP has compared with traditional RISC or PDSP designs.
Air cooled turbine component having an internal filtration system
Beeck, Alexander R [Orlando, FL
2012-05-15
A centrifugal particle separator is provided for removing particles such as microscopic dirt or dust particles from the compressed cooling air prior to reaching and cooling the turbine blades or turbine vanes of a turbine engine. The centrifugal particle separator structure has a substantially cylindrical body with an inlet arranged on a periphery of the substantially cylindrical body. Cooling air enters centrifugal particle separator through the separator inlet port having a linear velocity. When the cooling air impinges the substantially cylindrical body, the linear velocity is transformed into a rotational velocity, separating microscopic particles from the cooling air. Microscopic dust particles exit the centrifugal particle separator through a conical outlet and returned to a working medium.
Automobile Crash Sensor Signal Processor
DOT National Transportation Integrated Search
1973-11-01
The crash sensor signal processor described interfaces between an automobile-installed doppler radar and an air bag activating solenoid or equivalent electromechanical device. The processor utilizes both digital and analog techniques to produce an ou...
NASA Technical Reports Server (NTRS)
Srinivasan, J.; Farrington, A.; Gray, A.
2001-01-01
They present an overview of long-life reconfigurable processor technologies and of a specific architecture for implementing a software reconfigurable (software-defined) network processor for space applications.
Evaluating local indirect addressing in SIMD proc essors
NASA Technical Reports Server (NTRS)
Middleton, David; Tomboulian, Sherryl
1989-01-01
In the design of parallel computers, there exists a tradeoff between the number and power of individual processors. The single instruction stream, multiple data stream (SIMD) model of parallel computers lies at one extreme of the resulting spectrum. The available hardware resources are devoted to creating the largest possible number of processors, and consequently each individual processor must use the fewest possible resources. Disagreement exists as to whether SIMD processors should be able to generate addresses individually into their local data memory, or all processors should access the same address. The tradeoff is examined between the increased capability and the reduced number of processors that occurs in this single instruction stream, multiple, locally addressed, data (SIMLAD) model. Factors are assembled that affect this design choice, and the SIMLAD model is compared with the bare SIMD and the MIMD models.
WATERLOPP V2/64: A highly parallel machine for numerical computation
NASA Astrophysics Data System (ADS)
Ostlund, Neil S.
1985-07-01
Current technological trends suggest that the high performance scientific machines of the future are very likely to consist of a large number (greater than 1024) of processors connected and communicating with each other in some as yet undetermined manner. Such an assembly of processors should behave as a single machine in obtaining numerical solutions to scientific problems. However, the appropriate way of organizing both the hardware and software of such an assembly of processors is an unsolved and active area of research. It is particularly important to minimize the organizational overhead of interprocessor comunication, global synchronization, and contention for shared resources if the performance of a large number ( n) of processors is to be anything like the desirable n times the performance of a single processor. In many situations, adding a processor actually decreases the performance of the overall system since the extra organizational overhead is larger than the extra processing power added. The systolic loop architecture is a new multiple processor architecture which attemps at a solution to the problem of how to organize a large number of asynchronous processors into an effective computational system while minimizing the organizational overhead. This paper gives a brief overview of the basic systolic loop architecture, systolic loop algorithms for numerical computation, and a 64-processor implementation of the architecture, WATERLOOP V2/64, that is being used as a testbed for exploring the hardware, software, and algorithmic aspects of the architecture.
Multiprocessing on supercomputers for computational aerodynamics
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; Mehta, Unmeel B.
1990-01-01
Very little use is made of multiple processors available on current supercomputers (computers with a theoretical peak performance capability equal to 100 MFLOPs or more) in computational aerodynamics to significantly improve turnaround time. The productivity of a computer user is directly related to this turnaround time. In a time-sharing environment, the improvement in this speed is achieved when multiple processors are used efficiently to execute an algorithm. The concept of multiple instructions and multiple data (MIMD) through multi-tasking is applied via a strategy which requires relatively minor modifications to an existing code for a single processor. Essentially, this approach maps the available memory to multiple processors, exploiting the C-FORTRAN-Unix interface. The existing single processor code is mapped without the need for developing a new algorithm. The procedure for building a code utilizing this approach is automated with the Unix stream editor. As a demonstration of this approach, a Multiple Processor Multiple Grid (MPMG) code is developed. It is capable of using nine processors, and can be easily extended to a larger number of processors. This code solves the three-dimensional, Reynolds averaged, thin-layer and slender-layer Navier-Stokes equations with an implicit, approximately factored and diagonalized method. The solver is applied to generic oblique-wing aircraft problem on a four processor Cray-2 computer. A tricubic interpolation scheme is developed to increase the accuracy of coupling of overlapped grids. For the oblique-wing aircraft problem, a speedup of two in elapsed (turnaround) time is observed in a saturated time-sharing environment.
Health Monitor for Multitasking, Safety-Critical, Real-Time Software
NASA Technical Reports Server (NTRS)
Zoerner, Roger
2011-01-01
Health Manager can detect Bad Health prior to a failure occurring by periodically monitoring the application software by looking for code corruption errors, and sanity-checking each critical data value prior to use. A processor s memory can fail and corrupt the software, or the software can accidentally write to the wrong address and overwrite the executing software. This innovation will continuously calculate a checksum of the software load to detect corrupted code. This will allow a system to detect a failure before it happens. This innovation monitors each software task (thread) so that if any task reports "bad health," or does not report to the Health Manager, the system is declared bad. The Health Manager reports overall system health to the outside world by outputting a square wave signal. If the square wave stops, this indicates that system health is bad or hung and cannot report. Either way, "bad health" can be detected, whether caused by an error, corrupted data, or a hung processor. A separate Health Monitor Task is started and run periodically in a loop that starts and stops pending on a semaphore. Each monitored task registers with the Health Manager, which maintains a count for the task. The registering task must indicate if it will run more or less often than the Health Manager. If the task runs more often than the Health Manager, the monitored task calls a health function that increments the count and verifies it did not go over max-count. When the periodic Health Manager runs, it verifies that the count did not go over the max-count and zeroes it. If the task runs less often than the Health Manager, the periodic Health Manager will increment the count. The monitored task zeroes the count, and both the Health Manager and monitored task verify that the count did not go over the max-count.
Database for LDV Signal Processor Performance Analysis
NASA Technical Reports Server (NTRS)
Baker, Glenn D.; Murphy, R. Jay; Meyers, James F.
1989-01-01
A comparative and quantitative analysis of various laser velocimeter signal processors is difficult because standards for characterizing signal bursts have not been established. This leaves the researcher to select a signal processor based only on manufacturers' claims without the benefit of direct comparison. The present paper proposes the use of a database of digitized signal bursts obtained from a laser velocimeter under various configurations as a method for directly comparing signal processors.
The Use of a Microcomputer Based Array Processor for Real Time Laser Velocimeter Data Processing
NASA Technical Reports Server (NTRS)
Meyers, James F.
1990-01-01
The application of an array processor to laser velocimeter data processing is presented. The hardware is described along with the method of parallel programming required by the array processor. A portion of the data processing program is described in detail. The increase in computational speed of a microcomputer equipped with an array processor is illustrated by comparative testing with a minicomputer.
Contextual classification on a CDC Flexible Processor system. [for photomapped remote sensing data
NASA Technical Reports Server (NTRS)
Smith, B. W.; Siegel, H. J.; Swain, P. H.
1981-01-01
A potential hardware organization for the Flexible Processor Array is presented. An algorithm that implements a contextual classifier for remote sensing data analysis is given, along with uniprocessor classification algorithms. The Flexible Processor algorithm is provided, as are simulated timings for contextual classifiers run on the Flexible Processor Array and another system. The timings are analyzed for context neighborhoods of sizes three and nine.
Morales, Rocío; Martínez, Karina D; Pizones Ruiz-Henestrosa, Víctor M; Pilosof, Ana M R
2015-09-01
The effect of high intensity ultrasound (HIUS) may produce structural modifications on proteins through a friendly environmental process. Thus, it can be possible to obtain aggregates with a determined particle size, and altering a defined functional property at the same time. The objective of this work was to explore the impact of HIUS on the functionality of a denatured soy protein isolate (SPI) on foaming and interfacial properties. SPI solutions at pH 6.9 were treated with HIUS for 20 min, in an ultrasonic processor at room temperature, at 75, 80 and 85°C. The operating conditions were: 20 kHz, 4.27 ± 0.71 W and 20% of amplitude. It was determined the size of the protein particles, before and after the HIUS treatment, by dynamic light scattering. It was also analyzed the interfacial behavior of the different systems as well as their foaming properties, by applying the whipping method. The HIUS treatment and HIUS with temperature improved the foaming capacity by alteration of particle size whereas stability was not modified significantly. The temperature of HIUS treatment (80 and 85°C) showed a synergistic effect on foaming capacity. It was found that the reduction of particle size was related to the increase of foaming capacity of SPI. On the other hand, the invariable elasticity of the interfacial films could explain the stability of foams over time. Copyright © 2015 Elsevier B.V. All rights reserved.
Effect of processor temperature on film dosimetry
DOE Office of Scientific and Technical Information (OSTI.GOV)
Srivastava, Shiv P.; Das, Indra J., E-mail: idas@iupui.edu
2012-07-01
Optical density (OD) of a radiographic film plays an important role in radiation dosimetry, which depends on various parameters, including beam energy, depth, field size, film batch, dose, dose rate, air film interface, postexposure processing time, and temperature of the processor. Most of these parameters have been studied for Kodak XV and extended dose range (EDR) films used in radiation oncology. There is very limited information on processor temperature, which is investigated in this study. Multiple XV and EDR films were exposed in the reference condition (d{sub max.}, 10 Multiplication-Sign 10 cm{sup 2}, 100 cm) to a given dose. Anmore » automatic film processor (X-Omat 5000) was used for processing films. The temperature of the processor was adjusted manually with increasing temperature. At each temperature, a set of films was processed to evaluate OD at a given dose. For both films, OD is a linear function of processor temperature in the range of 29.4-40.6 Degree-Sign C (85-105 Degree-Sign F) for various dose ranges. The changes in processor temperature are directly related to the dose by a quadratic function. A simple linear equation is provided for the changes in OD vs. processor temperature, which could be used for correcting dose in radiation dosimetry when film is used.« less
Cargo Movement Operations System (CMOS). Requirements Traceability Matrix Increment II
1990-05-17
NO [ ] COMMENT DISPOSITION: ACCEPT [ ] REJECT [ ] COMMENT STATUS: OPEN [ ] CLOSED [ ] Cmnt Page Paragraph No. No. Number Comment 1. C-i SS0-3 Change "workstation" to "processor". 2. C-2 SS0009 Change "workstation" to "processor". SS0016 3. C-6 SS0032 Change "workstation" to "processor". SS0035 4. C-9 SS0063 Add comma after "e.g." 5. C-i SS0082 Change "workstation" to "processor". 6. C-17 SS0131 Change "workstation" to "processor". SS0132 7. C-28 SS0242 Change "workstation"
A high performance linear equation solver on the VPP500 parallel supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nakanishi, Makoto; Ina, Hiroshi; Miura, Kenichi
1994-12-31
This paper describes the implementation of two high performance linear equation solvers developed for the Fujitsu VPP500, a distributed memory parallel supercomputer system. The solvers take advantage of the key architectural features of VPP500--(1) scalability for an arbitrary number of processors up to 222 processors, (2) flexible data transfer among processors provided by a crossbar interconnection network, (3) vector processing capability on each processor, and (4) overlapped computation and transfer. The general linear equation solver based on the blocked LU decomposition method achieves 120.0 GFLOPS performance with 100 processors in the LIN-PACK Highly Parallel Computing benchmark.
Baseband processor development for the Advanced Communications Satellite Program
NASA Technical Reports Server (NTRS)
Moat, D.; Sabourin, D.; Stilwell, J.; Mccallister, R.; Borota, M.
1982-01-01
An onboard-baseband-processor concept for a satellite-switched time-division-multiple-access (SS-TDMA) communication system was developed for NASA Lewis Research Center. The baseband processor routes and controls traffic on an individual message basis while providing significant advantages in improved link margins and system flexibility. Key technology developments required to prove the flight readiness of the baseband-processor design are being verified in a baseband-processor proof-of-concept model. These technology developments include serial MSK modems, Clos-type baseband routing switch, a single-chip CMOS maximum-likelihood convolutional decoder, and custom LSL implementation of high-speed, low-power ECL building blocks.
The software system development for the TAMU real-time fan beam scatterometer data processors
NASA Technical Reports Server (NTRS)
Clark, B. V.; Jean, B. R.
1980-01-01
A software package was designed and written to process in real-time any one quadrature channel pair of radar scatterometer signals form the NASA L- or C-Band radar scatterometer systems. The software was successfully tested in the C-Band processor breadboard hardware using recorded radar and NERDAS (NASA Earth Resources Data Annotation System) signals as the input data sources. The processor development program and the overall processor theory of operation and design are described. The real-time processor software system is documented and the results of the laboratory software tests, and recommendations for the efficient application of the data processing capabilities are presented.
GRAPE-4: A special-purpose computer for gravitational N-body problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Makino, Junichiro; Taiji, Makoto; Ebisuzaki, Toshikazu
1995-12-01
We describe GRAPE-4, a special-purpose computer for gravitational N-body simulations. In gravitational N-body simulations, almost all computing time is spent for the calculation of interaction between particles. GRAPE-4 is a specialized hardware to calculate the interaction between particles. It is used with a general-purpose host computer that performs all calculations other than the force calculation. With this architecture, it is relatively easy to realize a massively parallel system. In 1991, we developed the GRAPE-3 system with the peak speed equivalent to 14.4 Gflops. It consists of 48 custom pipelined processors. In 1992 we started the development of GRAPE-4. The GRAPE-4more » system will consist of 1920 custom pipeline chips. Each chip has the speed of 600 Mflops, when operated on 30 MHz clock. A prototype system with two custom LSIs has been completed July 1994, and the full system is now under manufacturing.« less
Neutral particle beam intensity controller
Dagenhart, W.K.
1984-05-29
The neutral beam intensity controller is based on selected magnetic defocusing of the ion beam prior to neutralization. The defocused portion of the beam is dumped onto a beam dump disposed perpendicular to the beam axis. Selective defocusing is accomplished by means of a magnetic field generator disposed about the neutralizer so that the field is transverse to the beam axis. The magnetic field intensity is varied to provide the selected partial beam defocusing of the ions prior to neutralization. The desired focused neutral beam portion passes along the beam path through a defining aperture in the beam dump, thereby controlling the desired fraction of neutral particles transmitted to a utilization device without altering the kinetic energy level of the desired neutral particle fraction. By proper selection of the magnetic field intensity, virtually zero through 100% intensity control of the neutral beam is achieved.
Composite Solid Electrolyte Containing Li+- Conducting Fibers
NASA Technical Reports Server (NTRS)
Appleby, A. John; Wang, Chunsheng; Zhang, Xiangwu
2006-01-01
Improved composite solid polymer electrolytes (CSPEs) are being developed for use in lithium-ion power cells. The matrix components of these composites, like those of some prior CSPEs, are high-molecular-weight dielectric polymers [generally based on polyethylene oxide (PEO)]. The filler components of these composites are continuous, highly-Li(+)-conductive, inorganic fibers. PEO-based polymers alone would be suitable for use as solid electrolytes, were it not for the fact that their room-temperature Li(+)-ion conductivities lie in the range between 10(exp -6) and 10(exp -8) S/cm, too low for practical applications. In a prior approach to formulating a CSPE, one utilizes nonconductive nanoscale inorganic filler particles to increase the interfacial stability of the conductive phase. The filler particles also trap some electrolyte impurities. The achievable increase in conductivity is limited by the nonconductive nature of the filler particles.
A digital retina-like low-level vision processor.
Mertoguno, S; Bourbakis, N G
2003-01-01
This correspondence presents the basic design and the simulation of a low level multilayer vision processor that emulates to some degree the functional behavior of a human retina. This retina-like multilayer processor is the lower part of an autonomous self-organized vision system, called Kydon, that could be used on visually impaired people with a damaged visual cerebral cortex. The Kydon vision system, however, is not presented in this paper. The retina-like processor consists of four major layers, where each of them is an array processor based on hexagonal, autonomous processing elements that perform a certain set of low level vision tasks, such as smoothing and light adaptation, edge detection, segmentation, line recognition and region-graph generation. At each layer, the array processor is a 2D array of k/spl times/m hexagonal identical autonomous cells that simultaneously execute certain low level vision tasks. Thus, the hardware design and the simulation at the transistor level of the processing elements (PEs) of the retina-like processor and its simulated functionality with illustrative examples are provided in this paper.
Simulation of a master-slave event set processor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Comfort, J.C.
1984-03-01
Event set manipulation may consume a considerable amount of the computation time spent in performing a discrete-event simulation. One way of minimizing this time is to allow event set processing to proceed in parallel with the remainder of the simulation computation. The paper describes a multiprocessor simulation computer, in which all non-event set processing is performed by the principal processor (called the host). Event set processing is coordinated by a front end processor (the master) and actually performed by several other functionally identical processors (the slaves). A trace-driven simulation program modeling this system was constructed, and was run with tracemore » output taken from two different simulation programs. Output from this simulation suggests that a significant reduction in run time may be realized by this approach. Sensitivity analysis was performed on the significant parameters to the system (number of slave processors, relative processor speeds, and interprocessor communication times). A comparison between actual and simulation run times for a one-processor system was used to assist in the validation of the simulation. 7 references.« less
DFT algorithms for bit-serial GaAs array processor architectures
NASA Technical Reports Server (NTRS)
Mcmillan, Gary B.
1988-01-01
Systems and Processes Engineering Corporation (SPEC) has developed an innovative array processor architecture for computing Fourier transforms and other commonly used signal processing algorithms. This architecture is designed to extract the highest possible array performance from state-of-the-art GaAs technology. SPEC's architectural design includes a high performance RISC processor implemented in GaAs, along with a Floating Point Coprocessor and a unique Array Communications Coprocessor, also implemented in GaAs technology. Together, these data processors represent the latest in technology, both from an architectural and implementation viewpoint. SPEC has examined numerous algorithms and parallel processing architectures to determine the optimum array processor architecture. SPEC has developed an array processor architecture with integral communications ability to provide maximum node connectivity. The Array Communications Coprocessor embeds communications operations directly in the core of the processor architecture. A Floating Point Coprocessor architecture has been defined that utilizes Bit-Serial arithmetic units, operating at very high frequency, to perform floating point operations. These Bit-Serial devices reduce the device integration level and complexity to a level compatible with state-of-the-art GaAs device technology.
Mechanically verified hardware implementing an 8-bit parallel IO Byzantine agreement processor
NASA Technical Reports Server (NTRS)
Moore, J. Strother
1992-01-01
Consider a network of four processors that use the Oral Messages (Byzantine Generals) Algorithm of Pease, Shostak, and Lamport to achieve agreement in the presence of faults. Bevier and Young have published a functional description of a single processor that, when interconnected appropriately with three identical others, implements this network under the assumption that the four processors step in synchrony. By formalizing the original Pease, et al work, Bevier and Young mechanically proved that such a network achieves fault tolerance. We develop, formalize, and discuss a hardware design that has been mechanically proven to implement their processor. In particular, we formally define mapping functions from the abstract state space of the Bevier-Young processor to a concrete state space of a hardware module and state a theorem that expresses the claim that the hardware correctly implements the processor. We briefly discuss the Brock-Hunt Formal Hardware Description Language which permits designs both to be proved correct with the Boyer-Moore theorem prover and to be expressed in a commercially supported hardware description language for additional electrical analysis and layout. We briefly describe our implementation.
Implementing direct, spatially isolated problems on transputer networks
NASA Technical Reports Server (NTRS)
Ellis, Graham K.
1988-01-01
Parametric studies were performed on transputer networks of up to 40 processors to determine how to implement and maximize the performance of the solution of problems where no processor-to-processor data transfer is required for the problem solution (spatially isolated). Two types of problems are investigated a computationally intensive problem where the solution required the transmission of 160 bytes of data through the parallel network, and a communication intensive example that required the transmission of 3 Mbytes of data through the network. This data consists of solutions being sent back to the host processor and not intermediate results for another processor to work on. Studies were performed on both integer and floating-point transputers. The latter features an on-chip floating-point math unit and offers approximately an order of magnitude performance increase over the integer transputer on real valued computations. The results indicate that a minimum amount of work is required on each node per communication to achieve high network speedups (efficiencies). The floating-point processor requires approximately an order of magnitude more work per communication than the integer processor because of the floating-point unit's increased computing capacity.
Support for Diagnosis of Custom Computer Hardware
NASA Technical Reports Server (NTRS)
Molock, Dwaine S.
2008-01-01
The Coldfire SDN Diagnostics software is a flexible means of exercising, testing, and debugging custom computer hardware. The software is a set of routines that, collectively, serve as a common software interface through which one can gain access to various parts of the hardware under test and/or cause the hardware to perform various functions. The routines can be used to construct tests to exercise, and verify the operation of, various processors and hardware interfaces. More specifically, the software can be used to gain access to memory, to execute timer delays, to configure interrupts, and configure processor cache, floating-point, and direct-memory-access units. The software is designed to be used on diverse NASA projects, and can be customized for use with different processors and interfaces. The routines are supported, regardless of the architecture of a processor that one seeks to diagnose. The present version of the software is configured for Coldfire processors on the Subsystem Data Node processor boards of the Solar Dynamics Observatory. There is also support for the software with respect to Mongoose V, RAD750, and PPC405 processors or their equivalents.
A rapid bioassy for monitoring acute toxicity of wastewater, ground water, and soil and sediment extracts using submitochondrial particles (SMP) has been developed. The assay utilizes the mitochondrial electron transfer enzyme complex present in all eukaryotic cells. Prior develo...
Progress Towards a Rad-Hydro Code for Modern Computing Architectures LA-UR-10-02825
NASA Astrophysics Data System (ADS)
Wohlbier, J. G.; Lowrie, R. B.; Bergen, B.; Calef, M.
2010-11-01
We are entering an era of high performance computing where data movement is the overwhelming bottleneck to scalable performance, as opposed to the speed of floating-point operations per processor. All multi-core hardware paradigms, whether heterogeneous or homogeneous, be it the Cell processor, GPGPU, or multi-core x86, share this common trait. In multi-physics applications such as inertial confinement fusion or astrophysics, one may be solving multi-material hydrodynamics with tabular equation of state data lookups, radiation transport, nuclear reactions, and charged particle transport in a single time cycle. The algorithms are intensely data dependent, e.g., EOS, opacity, nuclear data, and multi-core hardware memory restrictions are forcing code developers to rethink code and algorithm design. For the past two years LANL has been funding a small effort referred to as Multi-Physics on Multi-Core to explore ideas for code design as pertaining to inertial confinement fusion and astrophysics applications. The near term goals of this project are to have a multi-material radiation hydrodynamics capability, with tabular equation of state lookups, on cartesian and curvilinear block structured meshes. In the longer term we plan to add fully implicit multi-group radiation diffusion and material heat conduction, and block structured AMR. We will report on our progress to date.
NASA Astrophysics Data System (ADS)
Bellerby, Tim
2015-04-01
PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks < number of processors) or tasks are divided out among the available processors (number of tasks > number of processors). Nested parallel statements may further subdivide the processor set owned by a given task. Tasks or processors are distributed evenly by default, but uneven distributions are possible under programmer control. It is also possible to explicitly enable child tasks to migrate within the processor set owned by their parent task, reducing load unbalancing at the potential cost of increased inter-processor message traffic. PM incorporates some programming structures from the earlier MIST language presented at a previous EGU General Assembly, while adopting a significantly different underlying parallelisation model and type system. PM code is available at www.pm-lang.org under an unrestrictive MIT license. Reference Ruymán Reyes, Antonio J. Dorta, Francisco Almeida, Francisco de Sande, 2009. Automatic Hybrid MPI+OpenMP Code Generation with llc, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science Volume 5759, 185-195
Nair, Erika L; Sousa, Rhonda; Wannagot, Shannon
Guidelines established by the AAA currently recommend behavioral testing when fitting frequency modulated (FM) systems to individuals with cochlear implants (CIs). A protocol for completing electroacoustic measures has not yet been validated for personal FM systems or digital modulation (DM) systems coupled to CI sound processors. In response, some professionals have used or altered the AAA electroacoustic verification steps for fitting FM systems to hearing aids when fitting FM systems to CI sound processors. More recently steps were outlined in a proposed protocol. The purpose of this research is to review and compare the electroacoustic test measures outlined in a 2013 article by Schafer and colleagues in the Journal of the American Academy of Audiology titled "A Proposed Electroacoustic Test Protocol for Personal FM Receivers Coupled to Cochlear Implant Sound Processors" to the AAA electroacoustic verification steps for fitting FM systems to hearing aids when fitting DM systems to CI users. Electroacoustic measures were conducted on 71 CI sound processors and Phonak Roger DM systems using a proposed protocol and an adapted AAA protocol. Phonak's recommended default receiver gain setting was used for each CI sound processor manufacturer and adjusted if necessary to achieve transparency. Electroacoustic measures were conducted on Cochlear and Advanced Bionics (AB) sound processors. In this study, 28 Cochlear Nucleus 5/CP810 sound processors, 26 Cochlear Nucleus 6/CP910 sound processors, and 17 AB Naida CI Q70 sound processors were coupled in various combinations to Phonak Roger DM dedicated receivers (25 Phonak Roger 14 receivers-Cochlear dedicated receiver-and 9 Phonak Roger 17 receivers-AB dedicated receiver) and 20 Phonak Roger Inspiro transmitters. Employing both the AAA and the Schafer et al protocols, electroacoustic measurements were conducted with the Audioscan Verifit in a clinical setting on 71 CI sound processors and Phonak Roger DM systems to determine transparency and verify FM advantage, comparing speech inputs (65 dB SPL) in an effort to achieve equal outputs. If transparency was not achieved at Phonak's recommended default receiver gain, adjustments were made to the receiver gain. The integrity of the signal was monitored with the appropriate manufacturer's monitor earphones. Using the AAA hearing aid protocol, 50 of the 71 CI sound processors achieved transparency, and 59 of the 71 CI sound processors achieved transparency when using the proposed protocol at Phonak's recommended default receiver gain. After the receiver gain was adjusted, 3 of 21 CI sound processors still did not meet transparency using the AAA protocol, and 2 of 12 CI sound processors still did not meet transparency using the Schafer et al proposed protocol. Both protocols were shown to be effective in taking reliable electroacoustic measurements and demonstrate transparency. Both protocols are felt to be clinically feasible and to address the needs of populations that are unable to reliably report regarding the integrity of their personal DM systems. American Academy of Audiology
RUSHMAPS: Real-Time Uploadable Spherical Harmonic Moment Analysis for Particle Spectrometers
NASA Technical Reports Server (NTRS)
Figueroa-Vinas, Adolfo
2013-01-01
RUSHMAPS is a new onboard data reduction scheme that gives real-time access to key science parameters (e.g. moments) of a class of heliophysics science and/or solar system exploration investigation that includes plasma particle spectrometers (PPS), but requires moments reporting (density, bulk-velocity, temperature, pressure, etc.) of higher-level quality, and tolerates a lowpass (variable quality) spectral representation of the corresponding particle velocity distributions, such that telemetry use is minimized. The proposed methodology trades access to the full-resolution velocity distribution data, saving on telemetry, for real-time access to both the moments and an adjustable-quality (increasing quality increases volume) spectral representation of distribution functions. Traditional onboard data storage and downlink bandwidth constraints severely limit PPS system functionality and drive cost, which, as a consequence, drives a limited data collection and lower angular energy and time resolution. This prototypical system exploit, using high-performance processing technology at GSFC (Goddard Space Flight Center), uses a SpaceCube and/or Maestro-type platform for processing. These processing platforms are currently being used on the International Space Station as a technology demonstration, and work is currently ongoing in a new onboard computation system for the Earth Science missions, but they have never been implemented in heliospheric science or solar system exploration missions. Preliminary analysis confirms that the targeted processor platforms possess the processing resources required for realtime application of these algorithms to the spectrometer data. SpaceCube platforms demonstrate that the target architecture possesses the sort of compact, low-mass/power, radiation-tolerant characteristics needed for flight. These high-performing hybrid systems embed unprecedented amounts of onboard processing power in the CPU (central processing unit), FPGAs (field programmable gate arrays), and DSP (digital signal processing) elements. The fundamental computational algorithm de constructs 3D velocity distributions in terms of spherical harmonic spectral coefficients (which are analogous to a Fourier sine-cosine decomposition), but uses instead spherical harmonics Legendre polynomial orthogonal functions as a basis for the expansion, portraying each 2D angular distribution at every energy or, geometrically, spherical speed-shell swept by the particle spectrometer. Optionally, these spherical harmonic spectral coefficients may be telemetered to the ground. These will provide a smoothed description of the velocity distribution function whose quality will depend on the number of coefficients determined. Successfully implemented on the GSFC-developed processor, the capability to integrate the proposed methodology with both heritage and anticipated future plasma particle spectrometer designs is demonstrated (with sufficiently detailed design analysis to advance TRL) to show specific science relevancy with future HSD (Heliophysics Science Division) solar-interplanetary, planetary missions, sounding rockets and/or CubeSat missions.
Development of a Novel, Two-Processor Architecture for a Small UAV Autopilot System,
2006-07-26
is, and the control laws the user implements to control it. The flight control system board will contain the processor selected for this system...Unit (IMU). The IMU contains solid-state gyros and accelerometers and uses these to determine the attitude of the UAV within the three dimensions of...multiple-UAV swarming for combat support operations. The mission processor board will contain the processor selected to execute the mission
1990-08-01
LCTVs) ..................... 17 2.14 JOINT FOURIER TRANSFORM PROCESSOR .................. 18 2.15 HOLOGRAPHIC ASSOCIATIVE MEMORY USING A MICRO ...RADC-TR-90-256 Final Technical Report August1990 AD-A227 163 HYBRID OPTICAL PROCESSOR Dove Electronics, Inc. J.F. Dove, F.T .S. Yu, C. Eldering...ANM SUSUE & FUNDING NUMBERS C - F19628-87-C-0086 HYBRID OPTICAL PROCESSOR PE - 61102F PR - 2305 &AUThNOA TA - J7 J.F. Dove, F.T.S. Yu, C. Eldering WU
Communications Processor Operating System Study. Executive Summary,
1980-11-01
AD-A095 b36 ROME AIR DEVELOPMENT CENTER GRIFFISS AFB NY F/e 17/2 COMMUNICATIONS PROCESSOR OPERATING SYSTEM STUDY. EXECUTIVE SUMM—ETC(U) NOV 80 J...COMMUNICATIONS PROCESSOR OPERATING SYSTEM STUDY Julian Gitlih SPTIC ELECTE«^ FEfi 2 6 1981^ - E APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED "a O...Subtitle) EXECUTIVE^SUMMARY 0F> COMMUNICATIONS PROCESSOR OPERATING SYSTEM $t - • >X W tdLl - ’•• • 7 AUTHORf«! ! , Julian
Tomkins, James L [Albuquerque, NM; Camp, William J [Albuquerque, NM
2009-03-17
A multiple processor computing apparatus includes a physical interconnect structure that is flexibly configurable to support selective segregation of classified and unclassified users. The physical interconnect structure also permits easy physical scalability of the computing apparatus. The computing apparatus can include an emulator which permits applications from the same job to be launched on processors that use different operating systems.
NASA Technical Reports Server (NTRS)
Chang, Chen J. (Inventor); Liaghati, Jr., Amir L. (Inventor); Liaghati, Mahsa L. (Inventor)
2018-01-01
Methods and apparatus are provided for telemetry processing using a telemetry processor. The telemetry processor can include a plurality of communications interfaces, a computer processor, and data storage. The telemetry processor can buffer sensor data by: receiving a frame of sensor data using a first communications interface and clock data using a second communications interface, receiving an end of frame signal using a third communications interface, and storing the received frame of sensor data in the data storage. After buffering the sensor data, the telemetry processor can generate an encapsulated data packet including a single encapsulated data packet header, the buffered sensor data, and identifiers identifying telemetry devices that provided the sensor data. A format of the encapsulated data packet can comply with a Consultative Committee for Space Data Systems (CCSDS) standard. The telemetry processor can send the encapsulated data packet using a fourth and a fifth communications interfaces.
Image processing for a tactile/vision substitution system using digital CNN.
Lin, Chien-Nan; Yu, Sung-Nien; Hu, Jin-Cheng
2006-01-01
In view of the parallel processing and easy implementation properties of CNN, we propose to use digital CNN as the image processor of a tactile/vision substitution system (TVSS). The digital CNN processor is used to execute the wavelet down-sampling filtering and the half-toning operations, aiming to extract important features from the images. A template combination method is used to embed the two image processing functions into a single CNN processor. The digital CNN processor is implemented on an intellectual property (IP) and is implemented on a XILINX VIRTEX II 2000 FPGA board. Experiments are designated to test the capability of the CNN processor in the recognition of characters and human subjects in different environments. The experiments demonstrates impressive results, which proves the proposed digital CNN processor a powerful component in the design of efficient tactile/vision substitution systems for the visually impaired people.
Life sciences flight experiments microcomputer
NASA Technical Reports Server (NTRS)
Bartram, Peter N.
1987-01-01
A promising microcomputer configuration for the Spacelab Life Sciences Lab. Equipment inventory consists of multiple processors. One processor's use is reserved, with additional processors dedicated to real time input and output operations. A simple form of such a configuration, with a processor board for analog to digital conversion and another processor board for digital to analog conversion, was studied. The system used digital parallel data lines between the boards, operating independently of the system bus. Good performance of individual components was demonstrated: the analog to digital converter was at over 10,000 samples per second. The combination of the data transfer between boards with the input or output functions on each board slowed performance, with a maximum throughput of 2800 to 2900 analog samples per second. Any of several techniques, such as use of the system bus for data transfer or the addition of direct memory access hardware to the processor boards, should give significantly improved performance.
Chen, Dong; Giampapa, Mark; Heidelberger, Philip; Ohmacht, Martin; Satterfield, David L; Steinmacher-Burow, Burkhard; Sugavanam, Krishnan
2013-05-21
A system and method for enhancing performance of a computer which includes a computer system including a data storage device. The computer system includes a program stored in the data storage device and steps of the program are executed by a processer. The processor processes instructions from the program. A wait state in the processor waits for receiving specified data. A thread in the processor has a pause state wherein the processor waits for specified data. A pin in the processor initiates a return to an active state from the pause state for the thread. A logic circuit is external to the processor, and the logic circuit is configured to detect a specified condition. The pin initiates a return to the active state of the thread when the specified condition is detected using the logic circuit.
Crosetto, D.B.
1996-12-31
The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor to a plurality of slave processors to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor`s status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer, a digital signal processor, a parallel transfer controller, and two three-port memory devices. A communication switch within each node connects it to a fast parallel hardware channel through which all high density data arrives or leaves the node. 6 figs.
A word processor optimized for preparing journal articles and student papers.
Wolach, A H; McHale, M A
2001-11-01
A new Windows-based word processor for preparing journal articles and student papers is described. In addition to standard features found in word processors, the present word processor provides specific help in preparing manuscripts. Clicking on "Reference Help (APA Form)" in the "File" menu provides a detailed help system for entering the references in a journal article. Clicking on "Examples and Explanations of APA Form" provides a help system with examples of the various sections of a review article, journal article that has one experiment, or journal article that has two or more experiments. The word processor can automatically place the manuscript page header and page number at the top of each page using the form required by APA and Psychonomic Society journals. The "APA Form" submenu of the "Help" menu provides detailed information about how the word processor is optimized for preparing articles and papers.
On nonlinear finite element analysis in single-, multi- and parallel-processors
NASA Technical Reports Server (NTRS)
Utku, S.; Melosh, R.; Islam, M.; Salama, M.
1982-01-01
Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem, therefore the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without sub-structuring (i.e. partitioning) are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable size substructures to parallel processors is exploited. Under Cholesky type factorization schemes, the efficiency of parallel processing is shown to decrease due to the occasional shared data, just as that due to the shared facilities.
Development and analysis of the Software Implemented Fault-Tolerance (SIFT) computer
NASA Technical Reports Server (NTRS)
Goldberg, J.; Kautz, W. H.; Melliar-Smith, P. M.; Green, M. W.; Levitt, K. N.; Schwartz, R. L.; Weinstock, C. B.
1984-01-01
SIFT (Software Implemented Fault Tolerance) is an experimental, fault-tolerant computer system designed to meet the extreme reliability requirements for safety-critical functions in advanced aircraft. Errors are masked by performing a majority voting operation over the results of identical computations, and faulty processors are removed from service by reassigning computations to the nonfaulty processors. This scheme has been implemented in a special architecture using a set of standard Bendix BDX930 processors, augmented by a special asynchronous-broadcast communication interface that provides direct, processor to processor communication among all processors. Fault isolation is accomplished in hardware; all other fault-tolerance functions, together with scheduling and synchronization are implemented exclusively by executive system software. The system reliability is predicted by a Markov model. Mathematical consistency of the system software with respect to the reliability model has been partially verified, using recently developed tools for machine-aided proof of program correctness.
Acoustooptic linear algebra processors - Architectures, algorithms, and applications
NASA Technical Reports Server (NTRS)
Casasent, D.
1984-01-01
Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products and such architectures are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.
Health benefits of particle filtration.
Fisk, W J
2013-10-01
The evidence of health benefits of particle filtration in homes and commercial buildings is reviewed. Prior reviews of papers published before 2000 are summarized. The results of 16 more recent intervention studies are compiled and analyzed. Also, reviewed are four studies that modeled health benefits of using filtration to reduce indoor exposures to particles from outdoors. Prior reviews generally concluded that particle filtration is, at best, a source of small improvements in allergy and asthma health effects; however, many early studies had weak designs. A majority of recent intervention studies employed strong designs and more of these studies report statistically significant improvements in health symptoms or objective health outcomes, particularly for subjects with allergies or asthma. The percentage improvement in health outcomes is typically modest, for example, 7% to 25%. Delivery of filtered air to the breathing zone of sleeping allergic or asthmatic persons may be more consistently effective in improving health than room air filtration. Notable are two studies that report statistically significant improvements, with filtration, in markers that predict future adverse coronary events. From modeling, the largest potential benefits of indoor particle filtration may be reductions in morbidity and mortality from reducing indoor exposures to particles from outdoor air. Published 2013. This article is a US Government work and is in the public domain in the USA.
Faber, Vance; Moore, James W.
1992-01-01
A network of interconnected processors is formed from a vertex symmetric graph selected from graphs .GAMMA..sub.d (k) with degree d, diameter k, and (d+1)!/(d-k+1)! processors for each d.gtoreq.k and .GAMMA..sub.d (k,-1) with degree 3-1, diameter k+1, and (d+1)!/(d-k+1)! processors for each d.gtoreq.k.gtoreq.4. Each processor has an address formed by one of the permutations from a predetermined sequence of letters chosen a selected number of letters at a time, and an extended address formed by appending to the address the remaining ones of the predetermined sequence of letters. A plurality of transmission channels is provided from each of the processors, where each processor has one less channel than the selected number of letters forming the sequence. Where a network .GAMMA..sub.d (k,-1) is provided, no processor has a channel connected to form an edge in a direction .delta..sub.1. Each of the channels has an identification number selected from the sequence of letters and connected from a first processor having a first extended address to a second processor having a second address formed from a second extended address defined by moving to the front of the first extended address the letter found in the position within the first extended address defined by the channel identification number. The second address is then formed by selecting the first elements of the second extended address corresponding to the selected number used to form the address permutations.
Detection of alpha radiation in a beta radiation field
Mohagheghi, Amir H.; Reese, Robert P.
2001-01-01
An apparatus and method for detecting alpha particles in the presence of high activities of beta particles utilizing an alpha spectrometer. The apparatus of the present invention utilizes a magnetic field applied around the sample in an alpha spectrometer to deflect the beta particles from the sample prior to reaching the detector, thus permitting detection of low concentrations of alpha particles. In the method of the invention, the strength of magnetic field required to adequately deflect the beta particles and permit alpha particle detection is given by an algorithm that controls the field strength as a function of sample beta energy and the distance of the sample to the detector.
Code of Federal Regulations, 2011 CFR
2011-01-01
... AND ORDERS; MISCELLANEOUS COMMODITIES), DEPARTMENT OF AGRICULTURE POPCORN PROMOTION, RESEARCH, AND CONSUMER INFORMATION Popcorn Promotion, Research, and Consumer Information Order Definitions § 1215.14 Processor. Processor means a person engaged in the preparation of unpopped popcorn for the market who owns...
Code of Federal Regulations, 2010 CFR
2010-01-01
... AND ORDERS; MISCELLANEOUS COMMODITIES), DEPARTMENT OF AGRICULTURE POPCORN PROMOTION, RESEARCH, AND CONSUMER INFORMATION Popcorn Promotion, Research, and Consumer Information Order Definitions § 1215.14 Processor. Processor means a person engaged in the preparation of unpopped popcorn for the market who owns...
Code of Federal Regulations, 2014 CFR
2014-01-01
... AND ORDERS; MISCELLANEOUS COMMODITIES), DEPARTMENT OF AGRICULTURE POPCORN PROMOTION, RESEARCH, AND CONSUMER INFORMATION Popcorn Promotion, Research, and Consumer Information Order Definitions § 1215.14 Processor. Processor means a person engaged in the preparation of unpopped popcorn for the market who owns...
Code of Federal Regulations, 2013 CFR
2013-01-01
... AND ORDERS; MISCELLANEOUS COMMODITIES), DEPARTMENT OF AGRICULTURE POPCORN PROMOTION, RESEARCH, AND CONSUMER INFORMATION Popcorn Promotion, Research, and Consumer Information Order Definitions § 1215.14 Processor. Processor means a person engaged in the preparation of unpopped popcorn for the market who owns...
Code of Federal Regulations, 2012 CFR
2012-01-01
... AND ORDERS; MISCELLANEOUS COMMODITIES), DEPARTMENT OF AGRICULTURE POPCORN PROMOTION, RESEARCH, AND CONSUMER INFORMATION Popcorn Promotion, Research, and Consumer Information Order Definitions § 1215.14 Processor. Processor means a person engaged in the preparation of unpopped popcorn for the market who owns...
Shuttle orbiter S-band payload communications equipment design evaluation
NASA Technical Reports Server (NTRS)
Springett, J. C.; Maronde, R. G.
1979-01-01
The analysis of the design, and the performance assessment of the Orbiter S-band communication equipment are reported. The equipment considered include: network transponder, network signal processor, FM transmitter, FM signal processor, payload interrogator, and payload signal processor.
A radiative transfer scheme that considers absorption, scattering, and distribution of light-absorbing elemental carbon (EC) particles collected on a quartz-fiber filter was developed to explain simultaneous filter reflectance and transmittance observations prior to and during...
Concept of a programmable maintenance processor applicable to multiprocessing systems
NASA Technical Reports Server (NTRS)
Glover, Richard D.
1988-01-01
A programmable maintenance processor concept applicable to multiprocessing systems has been developed at the NASA Ames Research Center's Dryden Flight Research Facility. This stand-alone-processor is intended to provide support for system and application software testing as well as hardware diagnostics. An initial machanization has been incorporated into the extended aircraft interrogation and display system (XAIDS) which is multiprocessing general-purpose ground support equipment. The XAIDS maintenance processor has independent terminal and printer interfaces and a dedicated magnetic bubble memory that stores system test sequences entered from the terminal. This report describes the hardware and software embodied in this processor and shows a typical application in the check-out of a new XAIDS.
Watchdog activity monitor (WAM) for use wth high coverage processor self-test
NASA Technical Reports Server (NTRS)
Tulpule, Bhalchandra R. (Inventor); Crosset, III, Richard W. (Inventor); Versailles, Richard E. (Inventor)
1988-01-01
A high fault coverage, instruction modeled self-test for a signal processor in a user environment is disclosed. The self-test executes a sequence of sub-tests and issues a state transition signal upon the execution of each sub-test. The self-test may be combined with a watchdog activity monitor (WAM) which provides a test-failure signal in the presence of a counted number of state transitions not agreeing with an expected number. An independent measure of time may be provided in the WAM to increase fault coverage by checking the processor's clock. Additionally, redundant processor systems are protected from inadvertent unsevering of a severed processor using a unique unsever arming technique and apparatus.
Reduced power processor requirements for the 30-cm diameter HG ion thruster
NASA Technical Reports Server (NTRS)
Rawlin, V. K.
1979-01-01
The characteristics of power processors strongly impact the overall performance and cost of electric propulsion systems. A program was initiated to evaluate simplifications of the thruster-power processor interface requirements. The power processor requirements are mission dependent with major differences arising for those missions which require a nearly constant thruster operating point (typical of geocentric and some inbound planetary missions) and those requiring operation over a large range of input power (such as outbound planetary missions). This paper describes the results of tests which have indicated that as many as seven of the twelve power supplies may be eliminated from the present Functional Model Power Processor used with 30-cm diameter Hg ion thrusters.
Optical backplane interconnect switch for data processors and computers
NASA Technical Reports Server (NTRS)
Hendricks, Herbert D.; Benz, Harry F.; Hammer, Jacob M.
1989-01-01
An optoelectronic integrated device design is reported which can be used to implement an all-optical backplane interconnect switch. The switch is sized to accommodate an array of processors and memories suitable for direct replacement into the basic avionic multiprocessor backplane. The optical backplane interconnect switch is also suitable for direct replacement of the PI bus traffic switch and at the same time, suitable for supporting pipelining of the processor and memory. The 32 bidirectional switchable interconnects are configured with broadcast capability for controls, reconfiguration, and messages. The approach described here can handle a serial interconnection of data processors or a line-to-link interconnection of data processors. An optical fiber demonstration of this approach is presented.
NASA Astrophysics Data System (ADS)
Blok, A. S.; Bukhenskii, A. F.; Krupitskii, É. I.; Morozov, S. V.; Pelevin, V. Yu; Sergeenko, T. N.; Yakovlev, V. I.
1995-10-01
An investigation is reported of acousto-optical and fibre-optic Fourier processors of electric signals, based on semiconductor lasers. A description is given of practical acousto-optical processors with an analysis band 120 MHz wide, a resolution of 200 kHz, and 7 cm × 8 cm × 18 cm dimensions. Fibre-optic Fourier processors are considered: they represent a new class of devices which are promising for the processing of gigahertz signals.
Software-type Wave-Particle Interaction Analyzer on board the ARASE satellite
NASA Astrophysics Data System (ADS)
Katoh, Y.; Kojima, H.; Hikishima, M.; Takashima, T.; Asamura, K.; Miyoshi, Y.; Kasahara, Y.; Kasahara, S.; Mitani, T.; Higashio, N.; Matsuoka, A.; Ozaki, M.; Yagitani, S.; Yokota, S.; Matsuda, S.; Kitahara, M.; Shinohara, I.
2017-12-01
Wave-Particle Interaction Analyzer (WPIA) is a new type of instrumentation recently proposed by Fukuhara et al. (2009) for direct and quantitative measurements of wave-particle interactions. WPIA computes an inner product W(ti) = qE(ti)·vi, where ti is the detection timing of the i-th particle, E(ti) is the wave electric field vector at ti, and q and vi is the charge and the velocity vector of the i-th particle, respectively. Since W(ti) is the gain or the loss of the kinetic energy of the i-th particle, by accumulating W for detected particles, we obtain the net amount of the energy exchange in the region of interest. Software-type WPIA (S-WPIA) is installed in the ARASE satellite as a software function running on the mission data processor. S-WPIA on board the ARASE satellite uses electromagnetic field waveform measured by Waveform Capture (WFC) of Plasma Wave Experiment (PWE) and velocity vectors detected by Medium-Energy Particle Experiments - Electron Analyzer (MEP-e), High-Energy Electron Experiments (HEP), and Extremely High-Energy Electron Experiment (XEP). The prime target of S-WPIA is the measurement of the energy exchange between whistler-mode chorus emissions and energetic electrons in the inner magnetosphere. It is essential for S-WPIA to synchronize instruments in the time resolution better than the time scale of wave-particle interactions. Since the typical frequency of chorus emissions is a few kHz in the inner magnetosphere, the time resolution better than 10 micro-sec should be realized so as to measure the relative phase angle between wave and velocity vectors with the accuracy enough to detect the sign of W correctly. In the ARASE satellite, a dedicated system has been developed in order to realize the required time resolution for the inter-instruments communications. In this presentation, we show the principle of the WPIA and its significance as well as the implementation of S-WPIA on the ARASE satellite.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fisk, William J.
The evidence of health benefits of particle filtration in homes and commercial buildings is reviewed. Prior reviews of papers published before 2000 are summarized. The results of 16 more recent intervention studies are compiled and analyzed. Also, reviewed are four studies that modeled health benefits of using filtration to reduce indoor exposures to particles from outdoors. Prior reviews generally concluded that particle filtration is, at best, a source of small improvements in allergy and asthma health effects; however, many early studies had weak designs. A majority of recent intervention studies employed strong designs and more of these studies report statisticallymore » significant improvements in health symptoms or objective health outcomes, particularly for subjects with allergies or asthma. The percent age improvement in health outcomes is typically modest, for example, 7percent to 25percent. Delivery of filtered air to the breathing zone of sleeping allergic or asthmatic persons may be more consistently effective in improving health than room air filtration. Notable are two studies that report statistically significant improvements, with filtration, in markers that predict future adverse coronary events. From modeling, the largest potential benefits of indoor particle filtration may be reductions in morbidity and mortality from reducing indoor exposures to particles from outdoor air.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fisk, William J.
The evidence of health benefits of particle filtration in homes and commercial buildings is reviewed. Prior reviews of papers published before 2000 are summarized. The results of 16 more recent intervention studies are compiled and analyzed. Also reviewed are four studies that modeled health benefits of using filtration to reduce indoor exposures to particles from outdoors. Prior reviews generally concluded that particle filtration is, at best, a source of small improvements in allergy and asthma health effects; however, many early studies had weak designs. A majority of recent intervention studies employed strong designs and more of these studies report statisticallymore » significant improvements in health symptoms or objective health outcomes, particularly for subjects with allergies or asthma. The percentage improvement in health outcomes is typically modest, e.g., 7percent to 25percent. Delivery of filtered air to the breathing zone of sleeping allergic or asthmatic persons may be more consistently effective in improving health than room air filtration. Notable are two studies that report statistically significant improvements, with filtration, in markers that predict future adverse coronary events. From modeling, the largest potential benefits of indoor particle filtration may be reductions in morbidity and mortality from reducing indoor exposures to particles from outdoor air.« less
Accelerated Application Development: The ORNL Titan Experience
Joubert, Wayne; Archibald, Richard K.; Berrill, Mark A.; ...
2015-05-09
The use of computational accelerators such as NVIDIA GPUs and Intel Xeon Phi processors is now widespread in the high performance computing community, with many applications delivering impressive performance gains. However, programming these systems for high performance, performance portability and software maintainability has been a challenge. In this paper we discuss experiences porting applications to the Titan system. Titan, which began planning in 2009 and was deployed for general use in 2013, was the first multi-petaflop system based on accelerator hardware. To ready applications for accelerated computing, a preparedness effort was undertaken prior to delivery of Titan. In this papermore » we report experiences and lessons learned from this process and describe how users are currently making use of computational accelerators on Titan.« less
Accelerated application development: The ORNL Titan experience
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joubert, Wayne; Archibald, Rick; Berrill, Mark
2015-08-01
The use of computational accelerators such as NVIDIA GPUs and Intel Xeon Phi processors is now widespread in the high performance computing community, with many applications delivering impressive performance gains. However, programming these systems for high performance, performance portability and software maintainability has been a challenge. In this paper we discuss experiences porting applications to the Titan system. Titan, which began planning in 2009 and was deployed for general use in 2013, was the first multi-petaflop system based on accelerator hardware. To ready applications for accelerated computing, a preparedness effort was undertaken prior to delivery of Titan. In this papermore » we report experiences and lessons learned from this process and describe how users are currently making use of computational accelerators on Titan.« less
Real-Time Visualization of Tissue Ischemia
NASA Technical Reports Server (NTRS)
Bearman, Gregory H. (Inventor); Chrien, Thomas D. (Inventor); Eastwood, Michael L. (Inventor)
2000-01-01
A real-time display of tissue ischemia which comprises three CCD video cameras, each with a narrow bandwidth filter at the correct wavelength is discussed. The cameras simultaneously view an area of tissue suspected of having ischemic areas through beamsplitters. The output from each camera is adjusted to give the correct signal intensity for combining with, the others into an image for display. If necessary a digital signal processor (DSP) can implement algorithms for image enhancement prior to display. Current DSP engines are fast enough to give real-time display. Measurement at three, wavelengths, combined into a real-time Red-Green-Blue (RGB) video display with a digital signal processing (DSP) board to implement image algorithms, provides direct visualization of ischemic areas.
Data processing for the DMSP microwave radiometer system
NASA Technical Reports Server (NTRS)
Rigone, J. L.; Stogryn, A. P.
1977-01-01
A software program was developed and tested to process microwave radiometry data to be acquired by the microwave sensor (SSM/T) on the Defense Meteorological Satellite Program spacecraft. The SSM/T 7-channel microwave radiometer and systems data will be data-linked to Air Force Global Weather Central (AFGWC) where they will be merged with ephemeris data prior to product processing for use in the AFGWC upper air data base (UADB). The overall system utilizes an integrated design to provide atmospheric temperature soundings for global applications. The fully automated processing at AFGWC was accomplished by four related computer processor programs to produce compatible UADB soundings, evaluate system performance, and update the a priori developed inversion matrices. Tests with simulated data produced results significantly better than climatology.
Post-prior equivalence for transfer reactions with complex potentials
NASA Astrophysics Data System (ADS)
Lei, Jin; Moro, Antonio M.
2018-01-01
In this paper, we address the problem of the post-prior equivalence in the calculation of inclusive breakup and transfer cross sections. For that, we employ the model proposed by Ichimura et al. [Phys. Rev. C 32, 431 (1985), 10.1103/PhysRevC.32.431], conveniently generalized to include the part of the cross section corresponding the transfer to bound states. We pay particular attention to the case in which the unobserved particle is left in a bound state of the residual nucleus, in which case the theory prescribes the use of a complex potential, responsible for the spreading width of the populated single-particle states. We see that the introduction of this complex potential gives rise to an additional term in the prior cross-section formula, not present in the usual case of real binding potentials. The equivalence is numerically tested for the 58Ni(d ,p X ) reaction.
A Survey of Plasmas and Their Applications
NASA Technical Reports Server (NTRS)
Eastman, Timothy E.; Grabbe, C. (Editor)
2006-01-01
Plasmas are everywhere and relevant to everyone. We bath in a sea of photons, quanta of electromagnetic radiation, whose sources (natural and artificial) are dominantly plasma-based (stars, fluorescent lights, arc lamps.. .). Plasma surface modification and materials processing contribute increasingly to a wide array of modern artifacts; e.g., tiny plasma discharge elements constitute the pixel arrays of plasma televisions and plasma processing provides roughly one-third of the steps to produce semiconductors, essential elements of our networking and computing infrastructure. Finally, plasmas are central to many cutting edge technologies with high potential (compact high-energy particle accelerators; plasma-enhanced waste processors; high tolerance surface preparation and multifuel preprocessors for transportation systems; fusion for energy production).
Implementing wavelet inverse-transform processor with surface acoustic wave device.
Lu, Wenke; Zhu, Changchun; Liu, Qinghong; Zhang, Jingduan
2013-02-01
The objective of this research was to investigate the implementation schemes of the wavelet inverse-transform processor using surface acoustic wave (SAW) device, the length function of defining the electrodes, and the possibility of solving the load resistance and the internal resistance for the wavelet inverse-transform processor using SAW device. In this paper, we investigate the implementation schemes of the wavelet inverse-transform processor using SAW device. In the implementation scheme that the input interdigital transducer (IDT) and output IDT stand in a line, because the electrode-overlap envelope of the input IDT is identical with the one of the output IDT (i.e. the two transducers are identical), the product of the input IDT's frequency response and the output IDT's frequency response can be implemented, so that the wavelet inverse-transform processor can be fabricated. X-112(0)Y LiTaO(3) is used as a substrate material to fabricate the wavelet inverse-transform processor. The size of the wavelet inverse-transform processor using this implementation scheme is small, so its cost is low. First, according to the envelope function of the wavelet function, the length function of the electrodes is defined, then, the lengths of the electrodes can be calculated from the length function of the electrodes, finally, the input IDT and output IDT can be designed according to the lengths and widths for the electrodes. In this paper, we also present the load resistance and the internal resistance as the two problems of the wavelet inverse-transform processor using SAW devices. The solutions to these problems are achieved in this study. When the amplifiers are subjected to the input end and output end for the wavelet inverse-transform processor, they can eliminate the influence of the load resistance and the internal resistance on the output voltage of the wavelet inverse-transform processor using SAW device. Copyright © 2012 Elsevier B.V. All rights reserved.
Instrumentation System Diagnoses a Thermocouple
NASA Technical Reports Server (NTRS)
Perotti, Jose; Santiago, Josephine; Mata, Carlos; Vokrot, Peter; Zavala, Carlos; Burns, Bradley
2008-01-01
An improved self-validating thermocouple (SVT) instrumentation system not only acquires readings from a thermocouple but is also capable of detecting deterioration and a variety of discrete faults in the thermocouple and its lead wires. Prime examples of detectable discrete faults and deterioration include open- and short-circuit conditions and debonding of the thermocouple junction from the object, the temperature of which one seeks to measure. Debonding is the most common cause of errors in thermocouple measurements, but most prior SVT instrumentation systems have not been capable of detecting debonding. The improved SVT instrumentation system includes power circuitry, a cold-junction compensator, signal-conditioning circuitry, pulse-width-modulation (PWM) thermocouple-excitation circuitry, an analog-to-digital converter (ADC), a digital data processor, and a universal serial bus (USB) interface. The system can operate in any of the following three modes: temperature measurement, thermocouple validation, and bonding/debonding detection. The software running in the processor includes components that implement statistical algorithms to evaluate the state of the thermocouple and the instrumentation system. When the power is first turned on, the user can elect to start a diagnosis/ monitoring sequence, in which the PWM is used to estimate the characteristic times corresponding to the correct configuration. The user also has the option of using previous diagnostic values, which are stored in an electrically erasable, programmable read-only memory so that they are available every time the power is turned on.
An 81.6 μW FastICA processor for epileptic seizure detection.
Yang, Chia-Hsiang; Shih, Yi-Hsin; Chiueh, Herming
2015-02-01
To improve the performance of epileptic seizure detection, independent component analysis (ICA) is applied to multi-channel signals to separate artifacts and signals of interest. FastICA is an efficient algorithm to compute ICA. To reduce the energy dissipation, eigenvalue decomposition (EVD) is utilized in the preprocessing stage to reduce the convergence time of iterative calculation of ICA components. EVD is computed efficiently through an array structure of processing elements running in parallel. Area-efficient EVD architecture is realized by leveraging the approximate Jacobi algorithm, leading to a 77.2% area reduction. By choosing proper memory element and reduced wordlength, the power and area of storage memory are reduced by 95.6% and 51.7%, respectively. The chip area is minimized through fixed-point implementation and architectural transformations. Given a latency constraint of 0.1 s, an 86.5% area reduction is achieved compared to the direct-mapped architecture. Fabricated in 90 nm CMOS, the core area of the chip is 0.40 mm(2). The FastICA processor, part of an integrated epileptic control SoC, dissipates 81.6 μW at 0.32 V. The computation delay of a frame of 256 samples for 8 channels is 84.2 ms. Compared to prior work, 0.5% power dissipation, 26.7% silicon area, and 3.4 × computation speedup are achieved. The performance of the chip was verified by human dataset.
Rapid Damage Assessment. Volume II. Development and Testing of Rapid Damage Assessment System.
1981-02-01
pixels/s Camera Line Rate 732.4 lines/s Pixels per Line 1728 video 314 blank 4 line number (binary) 2 run number (BCD) 2048 total Pixel Resolution 8 bits...sists of an LSI-ll microprocessor, a VDI -200 video display processor, an FD-2 dual floppy diskette subsystem, an FT-I function key-trackball module...COMPONENT LIST FOR IMAGE PROCESSOR SYSTEM IMAGE PROCESSOR SYSTEM VIEWS I VDI -200 Display Processor Racks, Table FD-2 Dual Floppy Diskette Subsystem FT-l
Master/Programmable-Slave Computer
NASA Technical Reports Server (NTRS)
Smaistrla, David; Hall, William A.
1990-01-01
Unique modular computer features compactness, low power, mass storage of data, multiprocessing, and choice of various input/output modes. Master processor communicates with user via usual keyboard and video display terminal. Coordinates operations of as many as 24 slave processors, each dedicated to different experiment. Each slave circuit card includes slave microprocessor and assortment of input/output circuits for communication with external equipment, with master processor, and with other slave processors. Adaptable to industrial process control with selectable degrees of automatic control, automatic and/or manual monitoring, and manual intervention.
System Level RBDO for Military Ground Vehicles using High Performance Computing
2008-01-01
platform. Only the analyses that required more than 24 processors were conducted on the Onyx 350 due to the limited number of processors on the...optimization constraints varied. The queues set the number of processors and number of finite element code licenses available to the analyses. sgi ONYX ...3900: unix 24 MIPS R16000 PROCESSORS 4 IR2 GRAPHICS PIPES 4 IR3 GRAPHICS PIPES 24 GBYTES MEMORY 36 GBYTES LOCAL DISK SPACE sgi ONYX 350: unix 32 MIPS
A data base processor semantics specification package
NASA Technical Reports Server (NTRS)
Fishwick, P. A.
1983-01-01
A Semantics Specification Package (DBPSSP) for the Intel Data Base Processor (DBP) is defined. DBPSSP serves as a collection of cross assembly tools that allow the analyst to assemble request blocks on the host computer for passage to the DBP. The assembly tools discussed in this report may be effectively used in conjunction with a DBP compatible data communications protocol to form a query processor, precompiler, or file management system for the database processor. The source modules representing the components of DBPSSP are fully commented and included.
Experience in highly parallel processing using DAP
NASA Technical Reports Server (NTRS)
Parkinson, D.
1987-01-01
Distributed Array Processors (DAP) have been in day to day use for ten years and a large amount of user experience has been gained. The profile of user applications is similar to that of the Massively Parallel Processor (MPP) working group. Experience has shown that contrary to expectations, highly parallel systems provide excellent performance on so-called dirty problems such as the physics part of meteorological codes. The reasons for this observation are discussed. The arguments against replacing bit processors with floating point processors are also discussed.
NASA Astrophysics Data System (ADS)
Hackl, Jason F.
The relative dispersion of one uid particle with respect to another is fundamentally related to the transport and mixing of contaminant species in turbulent flows. The most basic consequence of Kolmogorov's 1941 similarity hypotheses for relative dispersion, the Richardson-Obukhov law that mean-square pair separation distance
Faber, V.; Moore, J.W.
1988-06-20
A network of interconnected processors is formed from a vertex symmetric graph selected from graphs GAMMA/sub d/(k) with degree d, diameter k, and (d + 1)exclamation/ (d /minus/ k + 1)exclamation processors for each d greater than or equal to k and GAMMA/sub d/(k, /minus/1) with degree d /minus/ 1, diameter k + 1, and (d + 1)exclamation/(d /minus/ k + 1)exclamation processors for each d greater than or equal to k greater than or equal to 4. Each processor has an address formed by one of the permutations from a predetermined sequence of letters chosen a selected number of letters at a time, and an extended address formed by appending to the address the remaining ones of the predetermined sequence of letters. A plurality of transmission channels is provided from each of the processors, where each processor has one less channel than the selected number of letters forming the sequence. Where a network GAMMA/sub d/(k, /minus/1) is provided, no processor has a channel connected to form an edge in a direction delta/sub 1/. Each of the channels has an identification number selected from the sequence of letters and connected from a first processor having a first extended address to a second processor having a second address formed from a second extended address defined by moving to the front of the first extended address the letter found in the position within the first extended address defined by the channel identification number. The second address is then formed by selecting the first elements of the second extended address corresponding to the selected number used to form the address permutations. 9 figs.
Ultra-Reliable Digital Avionics (URDA) processor
NASA Astrophysics Data System (ADS)
Branstetter, Reagan; Ruszczyk, William; Miville, Frank
1994-10-01
Texas Instruments Incorporated (TI) developed the URDA processor design under contract with the U.S. Air Force Wright Laboratory and the U.S. Army Night Vision and Electro-Sensors Directorate. TI's approach couples advanced packaging solutions with advanced integrated circuit (IC) technology to provide a high-performance (200 MIPS/800 MFLOPS) modular avionics processor module for a wide range of avionics applications. TI's processor design integrates two Ada-programmable, URDA basic processor modules (BPM's) with a JIAWG-compatible PiBus and TMBus on a single F-22 common integrated processor-compatible form-factor SEM-E avionics card. A separate, high-speed (25-MWord/second 32-bit word) input/output bus is provided for sensor data. Each BPM provides a peak throughput of 100 MIPS scalar concurrent with 400-MFLOPS vector processing in a removable multichip module (MCM) mounted to a liquid-flowthrough (LFT) core and interfacing to a processor interface module printed wiring board (PWB). Commercial RISC technology coupled with TI's advanced bipolar complementary metal oxide semiconductor (BiCMOS) application specific integrated circuit (ASIC) and silicon-on-silicon packaging technologies are used to achieve the high performance in a miniaturized package. A Mips R4000-family reduced instruction set computer (RISC) processor and a TI 100-MHz BiCMOS vector coprocessor (VCP) ASIC provide, respectively, the 100 MIPS of a scalar processor throughput and 400 MFLOPS of vector processing throughput for each BPM. The TI Aladdim ASIC chipset was developed on the TI Aladdin Program under contract with the U.S. Army Communications and Electronics Command and was sponsored by the Advanced Research Projects Agency with technical direction from the U.S. Army Night Vision and Electro-Sensors Directorate.
System support software for the Space Ultrareliable Modular Computer (SUMC)
NASA Technical Reports Server (NTRS)
Hill, T. E.; Hintze, G. C.; Hodges, B. C.; Austin, F. A.; Buckles, B. P.; Curran, R. T.; Lackey, J. D.; Payne, R. E.
1974-01-01
The highly transportable programming system designed and implemented to support the development of software for the Space Ultrareliable Modular Computer (SUMC) is described. The SUMC system support software consists of program modules called processors. The initial set of processors consists of the supervisor, the general purpose assembler for SUMC instruction and microcode input, linkage editors, an instruction level simulator, a microcode grid print processor, and user oriented utility programs. A FORTRAN 4 compiler is undergoing development. The design facilitates the addition of new processors with a minimum effort and provides the user quasi host independence on the ground based operational software development computer. Additional capability is provided to accommodate variations in the SUMC architecture without consequent major modifications in the initial processors.
Electrical Prototype Power Processor for the 30-cm Mercury electric propulsion engine
NASA Technical Reports Server (NTRS)
Biess, J. J.; Frye, R. J.
1978-01-01
An Electrical Prototpye Power Processor has been designed to the latest electrical and performance requirements for a flight-type 30-cm ion engine and includes all the necessary power, command, telemetry and control interfaces for a typical electric propulsion subsystem. The power processor was configured into seven separate mechanical modules that would allow subassembly fabrication, test and integration into a complete power processor unit assembly. The conceptual mechanical packaging of the electrical prototype power processor unit demonstrated the relative location of power, high voltage and control electronic components to minimize electrical interactions and to provide adequate thermal control in a vacuum environment. Thermal control was accomplished with a heat pipe simulator attached to the base of the modules.
Method and system for selecting data sampling phase for self timed interface logic
Hoke, Joseph Michael; Ferraiolo, Frank D.; Lo, Tin-Chee; Yarolin, John Michael
2005-01-04
An exemplary embodiment of the present invention is a method for transmitting data among processors over a plurality of parallel data lines and a clock signal line. A receiver processor receives both data and a clock signal from a sender processor. At the receiver processor a bit of the data is phased aligned with the transmitted clock signal. The phase aligning includes selecting a data phase from a plurality of data phases in a delay chain and then adjusting the selected data phase to compensate for a round-off error. Additional embodiments include a system and storage medium for transmitting data among processors over a plurality of parallel data lines and a clock signal line.
The implementation and use of Ada on distributed systems with reliability requirements
NASA Technical Reports Server (NTRS)
Reynolds, P. F.; Knight, J. C.; Urquhart, J. I. A.
1983-01-01
The issues involved in the use of the programming language Ada on distributed systems are discussed. The effects of Ada programs on hardware failures such as loss of a processor are emphasized. It is shown that many Ada language elements are not well suited to this environment. Processor failure can easily lead to difficulties on those processors which remain. As an example, the calling task in a rendezvous may be suspended forever if the processor executing the serving task fails. A mechanism for detecting failure is proposed and changes to the Ada run time support system are suggested which avoid most of the difficulties. Ada program structures are defined which allow programs to reconfigure and continue to provide service following processor failure.
Lu, Wenke; Zhu, Changchun
2011-11-01
The objective of this research was to investigate the possibility of compensating for the insertion losses of the wavelet inverse-transform processors using SAW devices. The motivation for this work was prompted by the processors which are of large insertion losses. In this paper, the insertion losses are the key problem of the wavelet inverse-transform processors using SAW devices. A novel compensation method of the insertion losses is achieved in this study. When the output ends of the wavelet inverse-transform processors are respectively connected to the amplifiers, their insertion losses can be compensated for. The bandwidths of the amplifiers and their adjustment method are also given in this paper. © 2011 American Institute of Physics
An optical/digital processor - Hardware and applications
NASA Technical Reports Server (NTRS)
Casasent, D.; Sterling, W. M.
1975-01-01
A real-time two-dimensional hybrid processor consisting of a coherent optical system, an optical/digital interface, and a PDP-11/15 control minicomputer is described. The input electrical-to-optical transducer is an electron-beam addressed potassium dideuterium phosphate (KD2PO4) light valve. The requirements and hardware for the output optical-to-digital interface, which is constructed from modular computer building blocks, are presented. Initial experimental results demonstrating the operation of this hybrid processor in phased-array radar data processing, synthetic-aperture image correlation, and text correlation are included. The applications chosen emphasize the role of the interface in the analysis of data from an optical processor and possible extensions to the digital feedback control of an optical processor.
Curvature-Mediated Assembly of Janus Nanoparticles on Membrane Vesicles.
Bahrami, Amir Houshang; Weikl, Thomas R
2018-02-14
Besides direct particle-particle interactions, nanoparticles adsorbed to biomembranes experience indirect interactions that are mediated by the membrane curvature arising from particle adsorption. In this Letter, we show that the curvature-mediated interactions of adsorbed Janus particles depend on the initial curvature of the membrane prior to adsorption, that is, on whether the membrane initially bulges toward or away from the particles in our simulations. The curvature-mediated interaction can be strongly attractive for Janus particles adsorbed to the outside of a membrane vesicle, which initially bulges away from the particles. For Janus particles adsorbed to the vesicle inside, in contrast, the curvature-mediated interactions are repulsive. We find that the area fraction of the adhesive Janus particle surface is an important control parameter for the curvature-mediated interaction and assembly of the particles, besides the initial membrane curvature.
The Compact Environmental Anomaly Sensor (CEASE) III
NASA Astrophysics Data System (ADS)
Roddy, P.; Hilmer, R. V.; Ballenthin, J.; Lindstrom, C. D.; Barton, D. A.; Ignazio, J. M.; Coombs, J. M.; Johnston, W. R.; Wheelock, A. T.; Quigley, S.
2016-12-01
The Air Force Research Laboratory's Energetic Charged Particle (ECP) sensor project is a comprehensive effort to measure the charged particle environment that causes satellite anomalies. The project includes the Compact Environmental Anomaly Sensor (CEASE) III, building on the flight heritage of prior CEASE designs. CEASE III consists of multiple sensor modules. High energy particles are observed using independent unique silicon detector stacks. In addition CEASE III includes an electrostatic analyzer (ESA) assembly which uses charge multiplication for particle detection. The sensors cover a wide range of proton and electron energies that contribute to satellite anomalies.
Computer program documentation for the pasture/range condition assessment processor
NASA Technical Reports Server (NTRS)
Mcintyre, K. S.; Miller, T. G. (Principal Investigator)
1982-01-01
The processor which drives for the RANGE software allows the user to analyze LANDSAT data containing pasture and rangeland. Analysis includes mapping, generating statistics, calculating vegetative indexes, and plotting vegetative indexes. Routines for using the processor are given. A flow diagram is included.
Code of Federal Regulations, 2014 CFR
2014-01-01
... RECORDKEEPING REQUIREMENTS APPLICABLE TO CRANBERRIES NOT SUBJECT TO THE CRANBERRY MARKETING ORDER § 926.13 Processor. Processor means any person who receives or acquires fresh or frozen cranberries or cranberries in... uses such cranberries or concentrate, with or without other ingredients, in the production of a product...
Code of Federal Regulations, 2013 CFR
2013-01-01
... RECORDKEEPING REQUIREMENTS APPLICABLE TO CRANBERRIES NOT SUBJECT TO THE CRANBERRY MARKETING ORDER § 926.13 Processor. Processor means any person who receives or acquires fresh or frozen cranberries or cranberries in... uses such cranberries or concentrate, with or without other ingredients, in the production of a product...
Code of Federal Regulations, 2011 CFR
2011-01-01
... RECORDKEEPING REQUIREMENTS APPLICABLE TO CRANBERRIES NOT SUBJECT TO THE CRANBERRY MARKETING ORDER § 926.13 Processor. Processor means any person who receives or acquires fresh or frozen cranberries or cranberries in... uses such cranberries or concentrate, with or without other ingredients, in the production of a product...
Code of Federal Regulations, 2012 CFR
2012-01-01
... RECORDKEEPING REQUIREMENTS APPLICABLE TO CRANBERRIES NOT SUBJECT TO THE CRANBERRY MARKETING ORDER § 926.13 Processor. Processor means any person who receives or acquires fresh or frozen cranberries or cranberries in... uses such cranberries or concentrate, with or without other ingredients, in the production of a product...
A hierarchical, automated target recognition algorithm for a parallel analog processor
NASA Technical Reports Server (NTRS)
Woodward, Gail; Padgett, Curtis
1997-01-01
A hierarchical approach is described for an automated target recognition (ATR) system, VIGILANTE, that uses a massively parallel, analog processor (3DANN). The 3DANN processor is capable of performing 64 concurrent inner products of size 1x4096 every 250 nanoseconds.
Potential of minicomputer/array-processor system for nonlinear finite-element analysis
NASA Technical Reports Server (NTRS)
Strohkorb, G. A.; Noor, A. K.
1983-01-01
The potential of using a minicomputer/array-processor system for the efficient solution of large-scale, nonlinear, finite-element problems is studied. A Prime 750 is used as the host computer, and a software simulator residing on the Prime is employed to assess the performance of the Floating Point Systems AP-120B array processor. Major hardware characteristics of the system such as virtual memory and parallel and pipeline processing are reviewed, and the interplay between various hardware components is examined. Effective use of the minicomputer/array-processor system for nonlinear analysis requires the following: (1) proper selection of the computational procedure and the capability to vectorize the numerical algorithms; (2) reduction of input-output operations; and (3) overlapping host and array-processor operations. A detailed discussion is given of techniques to accomplish each of these tasks. Two benchmark problems with 1715 and 3230 degrees of freedom, respectively, are selected to measure the anticipated gain in speed obtained by using the proposed algorithms on the array processor.
Design of RISC Processor Using VHDL and Cadence
NASA Astrophysics Data System (ADS)
Moslehpour, Saeid; Puliroju, Chandrasekhar; Abu-Aisheh, Akram
The project deals about development of a basic RISC processor. The processor is designed with basic architecture consisting of internal modules like clock generator, memory, program counter, instruction register, accumulator, arithmetic and logic unit and decoder. This processor is mainly used for simple general purpose like arithmetic operations and which can be further developed for general purpose processor by increasing the size of the instruction register. The processor is designed in VHDL by using Xilinx 8.1i version. The present project also serves as an application of the knowledge gained from past studies of the PSPICE program. The study will show how PSPICE can be used to simplify massive complex circuits designed in VHDL Synthesis. The purpose of the project is to explore the designed RISC model piece by piece, examine and understand the Input/ Output pins, and to show how the VHDL synthesis code can be converted to a simplified PSPICE model. The project will also serve as a collection of various research materials about the pieces of the circuit.
Fault tolerant, radiation hard, high performance digital signal processor
NASA Technical Reports Server (NTRS)
Holmann, Edgar; Linscott, Ivan R.; Maurer, Michael J.; Tyler, G. L.; Libby, Vibeke
1990-01-01
An architecture has been developed for a high-performance VLSI digital signal processor that is highly reliable, fault-tolerant, and radiation-hard. The signal processor, part of a spacecraft receiver designed to support uplink radio science experiments at the outer planets, organizes the connections between redundant arithmetic resources, register files, and memory through a shuffle exchange communication network. The configuration of the network and the state of the processor resources are all under microprogram control, which both maps the resources according to algorithmic needs and reconfigures the processing should a failure occur. In addition, the microprogram is reloadable through the uplink to accommodate changes in the science objectives throughout the course of the mission. The processor will be implemented with silicon compiler tools, and its design will be verified through silicon compilation simulation at all levels from the resources to full functionality. By blending reconfiguration with redundancy the processor implementation is fault-tolerant and reliable, and possesses the long expected lifetime needed for a spacecraft mission to the outer planets.
Digital system for structural dynamics simulation
NASA Technical Reports Server (NTRS)
Krauter, A. I.; Lagace, L. J.; Wojnar, M. K.; Glor, C.
1982-01-01
State-of-the-art digital hardware and software for the simulation of complex structural dynamic interactions, such as those which occur in rotating structures (engine systems). System were incorporated in a designed to use an array of processors in which the computation for each physical subelement or functional subsystem would be assigned to a single specific processor in the simulator. These node processors are microprogrammed bit-slice microcomputers which function autonomously and can communicate with each other and a central control minicomputer over parallel digital lines. Inter-processor nearest neighbor communications busses pass the constants which represent physical constraints and boundary conditions. The node processors are connected to the six nearest neighbor node processors to simulate the actual physical interface of real substructures. Computer generated finite element mesh and force models can be developed with the aid of the central control minicomputer. The control computer also oversees the animation of a graphics display system, disk-based mass storage along with the individual processing elements.
Next Generation Space Telescope Integrated Science Module Data System
NASA Technical Reports Server (NTRS)
Schnurr, Richard G.; Greenhouse, Matthew A.; Jurotich, Matthew M.; Whitley, Raymond; Kalinowski, Keith J.; Love, Bruce W.; Travis, Jeffrey W.; Long, Knox S.
1999-01-01
The Data system for the Next Generation Space Telescope (NGST) Integrated Science Module (ISIM) is the primary data interface between the spacecraft, telescope, and science instrument systems. This poster includes block diagrams of the ISIM data system and its components derived during the pre-phase A Yardstick feasibility study. The poster details the hardware and software components used to acquire and process science data for the Yardstick instrument compliment, and depicts the baseline external interfaces to science instruments and other systems. This baseline data system is a fully redundant, high performance computing system. Each redundant computer contains three 150 MHz power PC processors. All processors execute a commercially available real time multi-tasking operating system supporting, preemptive multi-tasking, file management and network interfaces. These six processors in the system are networked together. The spacecraft interface baseline is an extension of the network, which links the six processors. The final selection for Processor busses, processor chips, network interfaces, and high-speed data interfaces will be made during mid 2002.
A universal computer control system for motors
NASA Technical Reports Server (NTRS)
Szakaly, Zoltan F. (Inventor)
1991-01-01
A control system for a multi-motor system such as a space telerobot, having a remote computational node and a local computational node interconnected with one another by a high speed data link is described. A Universal Computer Control System (UCCS) for the telerobot is located at each node. Each node is provided with a multibus computer system which is characterized by a plurality of processors with all processors being connected to a common bus, and including at least one command processor. The command processor communicates over the bus with a plurality of joint controller cards. A plurality of direct current torque motors, of the type used in telerobot joints and telerobot hand-held controllers, are connected to the controller cards and responds to digital control signals from the command processor. Essential motor operating parameters are sensed by analog sensing circuits and the sensed analog signals are converted to digital signals for storage at the controller cards where such signals can be read during an address read/write cycle of the command processing processor.
Green Secure Processors: Towards Power-Efficient Secure Processor Design
NASA Astrophysics Data System (ADS)
Chhabra, Siddhartha; Solihin, Yan
With the increasing wealth of digital information stored on computer systems today, security issues have become increasingly important. In addition to attacks targeting the software stack of a system, hardware attacks have become equally likely. Researchers have proposed Secure Processor Architectures which utilize hardware mechanisms for memory encryption and integrity verification to protect the confidentiality and integrity of data and computation, even from sophisticated hardware attacks. While there have been many works addressing performance and other system level issues in secure processor design, power issues have largely been ignored. In this paper, we first analyze the sources of power (energy) increase in different secure processor architectures. We then present a power analysis of various secure processor architectures in terms of their increase in power consumption over a base system with no protection and then provide recommendations for designs that offer the best balance between performance and power without compromising security. We extend our study to the embedded domain as well. We also outline the design of a novel hybrid cryptographic engine that can be used to minimize the power consumption for a secure processor. We believe that if secure processors are to be adopted in future systems (general purpose or embedded), it is critically important that power issues are considered in addition to performance and other system level issues. To the best of our knowledge, this is the first work to examine the power implications of providing hardware mechanisms for security.
NASA Technical Reports Server (NTRS)
Berggren, Mark
2010-01-01
The Lunar Soil Particle Separator (LSPS) beneficiates soil prior to in situ resource utilization (ISRU). It can improve ISRU oxygen yield by boosting the concentration of ilmenite, or other iron-oxide-bearing materials found in lunar soils, which can substantially reduce hydrogen reduction reactor size, as well as drastically decreasing the power input required for soil heating
Assimilator Ensemble Post-processor (EnsPost) Hydrologic Model Output Statistics (HMOS) Ensemble Verification capabilities (see diagram below): the Ensemble Pre-processor, the Ensemble Post-processor, the Hydrologic Model (OpenDA, http://www.openda.org/joomla/index.php) to be used within the CHPS environment. Ensemble Post
40 CFR 747.195 - Triethanolamine salt of a substituted organic acid.
Code of Federal Regulations, 2010 CFR
2010-07-01
..., commerce, importer, impurity, Inventory, manufacturer, person, process, processor, and small quantities... control of the processor. (ii) Distribution in commerce is limited to purposes of export. (iii) The processor or distributor may not use the substance except in small quantities solely for research and...
7 CFR 1435.306 - Allocation of marketing allotments to processors.
Code of Federal Regulations, 2010 CFR
2010-01-01
...) COMMODITY CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS SUGAR PROGRAM Flexible Marketing Allotments For Sugar § 1435.306 Allocation of marketing allotments to processors. (a) Each sugar beet processor's allocation, other than a new entrant's, of the beet allotment will be...
7 CFR 1435.306 - Allocation of marketing allotments to processors.
Code of Federal Regulations, 2011 CFR
2011-01-01
...) COMMODITY CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS SUGAR PROGRAM Flexible Marketing Allotments For Sugar § 1435.306 Allocation of marketing allotments to processors. (a) Each sugar beet processor's allocation, other than a new entrant's, of the beet allotment will be...
7 CFR 1435.306 - Allocation of marketing allotments to processors.
Code of Federal Regulations, 2013 CFR
2013-01-01
...) COMMODITY CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS SUGAR PROGRAM Flexible Marketing Allotments For Sugar § 1435.306 Allocation of marketing allotments to processors. (a) Each sugar beet processor's allocation, other than a new entrant's, of the beet allotment will be...
7 CFR 1435.306 - Allocation of marketing allotments to processors.
Code of Federal Regulations, 2014 CFR
2014-01-01
...) COMMODITY CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS SUGAR PROGRAM Flexible Marketing Allotments For Sugar § 1435.306 Allocation of marketing allotments to processors. (a) Each sugar beet processor's allocation, other than a new entrant's, of the beet allotment will be...
7 CFR 1435.306 - Allocation of marketing allotments to processors.
Code of Federal Regulations, 2012 CFR
2012-01-01
...) COMMODITY CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS SUGAR PROGRAM Flexible Marketing Allotments For Sugar § 1435.306 Allocation of marketing allotments to processors. (a) Each sugar beet processor's allocation, other than a new entrant's, of the beet allotment will be...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hong, X; Gao, H; Schuemann, J
2015-06-15
Purpose: The Monte Carlo (MC) method is a gold standard for dose calculation in radiotherapy. However, it is not a priori clear how many particles need to be simulated to achieve a given dose accuracy. Prior error estimate and stopping criterion are not well established for MC. This work aims to fill this gap. Methods: Due to the statistical nature of MC, our approach is based on one-sample t-test. We design the prior error estimate method based on the t-test, and then use this t-test based error estimate for developing a simulation stopping criterion. The three major components are asmore » follows.First, the source particles are randomized in energy, space and angle, so that the dose deposition from a particle to the voxel is independent and identically distributed (i.i.d.).Second, a sample under consideration in the t-test is the mean value of dose deposition to the voxel by sufficiently large number of source particles. Then according to central limit theorem, the sample as the mean value of i.i.d. variables is normally distributed with the expectation equal to the true deposited dose.Third, the t-test is performed with the null hypothesis that the difference between sample expectation (the same as true deposited dose) and on-the-fly calculated mean sample dose from MC is larger than a given error threshold, in addition to which users have the freedom to specify confidence probability and region of interest in the t-test based stopping criterion. Results: The method is validated for proton dose calculation. The difference between the MC Result based on the t-test prior error estimate and the statistical Result by repeating numerous MC simulations is within 1%. Conclusion: The t-test based prior error estimate and stopping criterion are developed for MC and validated for proton dose calculation. Xiang Hong and Hao Gao were partially supported by the NSFC (#11405105), the 973 Program (#2015CB856000) and the Shanghai Pujiang Talent Program (#14PJ1404500)« less
NASA Astrophysics Data System (ADS)
Ishimoto, Jun; Oh, U.; Tan, Daisuke
2012-10-01
A new type of ultra-high heat flux cooling system using the atomized spray of cryogenic micro-solid nitrogen (SN2) particles produced by a superadiabatic two-fluid nozzle was developed and numerically investigated for application to next generation super computer processor thermal management. The fundamental characteristics of heat transfer and cooling performance of micro-solid nitrogen particulate spray impinging on a heated substrate were numerically investigated and experimentally measured by a new type of integrated computational-experimental technique. The employed Computational Fluid Dynamics (CFD) analysis based on the Euler-Lagrange model is focused on the cryogenic spray behavior of atomized particulate micro-solid nitrogen and also on its ultra-high heat flux cooling characteristics. Based on the numerically predicted performance, a new type of cryogenic spray cooling technique for application to a ultra-high heat power density device was developed. In the present integrated computation, it is clarified that the cryogenic micro-solid spray cooling characteristics are affected by several factors of the heat transfer process of micro-solid spray which impinges on heated surface as well as by atomization behavior of micro-solid particles. When micro-SN2 spraying cooling was used, an ultra-high cooling heat flux level was achieved during operation, a better cooling performance than that with liquid nitrogen (LN2) spray cooling. As micro-SN2 cooling has the advantage of direct latent heat transport which avoids the film boiling state, the ultra-short time scale heat transfer in a thin boundary layer is more possible than in LN2 spray. The present numerical prediction of the micro-SN2 spray cooling heat flux profile can reasonably reproduce the measurement results of cooling wall heat flux profiles. The application of micro-solid spray as a refrigerant for next generation computer processors is anticipated, and its ultra-high heat flux technology is expected to result in an extensive improvement in the effective cooling performance of large scale supercomputer systems.
Advanced Multiple Processor Configuration Study. Final Report.
ERIC Educational Resources Information Center
Clymer, S. J.
This summary of a study on multiple processor configurations includes the objectives, background, approach, and results of research undertaken to provide the Air Force with a generalized model of computer processor combinations for use in the evaluation of proposed flight training simulator computational designs. An analysis of a real-time flight…
75 FR 52507 - Submission for OMB Review; Comment Request
Federal Register 2010, 2011, 2012, 2013, 2014
2010-08-26
... standards designed to ensure that all catch delivered to the processor is accurately weighed and accounted... NMFS for catcher/processors and motherships is based on the vessel meeting a series of design criteria. Because of the wide variations in factory layout for inshore processors, NMFS requires a performance-based...
PREMAQ: A NEW PRE-PROCESSOR TO CMAQ FOR AIR-QUALITY FORECASTING
A new pre-processor to CMAQ (PREMAQ) has been developed as part of the national air-quality forecasting system. PREMAQ combines the functionality of MCIP and parts of SMOKE in a single real-time processor. PREMAQ was specifically designed to link NCEP's Eta model with CMAQ, and...
50 CFR 679.30 - General CDQ regulations.
Code of Federal Regulations, 2010 CFR
2010-10-01
... description of the target fisheries, the types of vessels and processors that will be used, the locations and... vessels or processors fishing under contract with any CDQ group. Any vessel or processor harvesting or... nature of the work and the career advancement potential for each type of work. (iv) Community eligibility...
A Survey of Parallel Sorting Algorithms.
1981-12-01
see that, in this algorithm, each Processor i, for 1 itp -2, interacts directly only with Processors i+l and i-l. Processor j 0 only interacts with...Chan76] Chandra, A.K., "Maximal Parallelism in Matrix Multiplication," IBM Report RC. 6193, Watson Research Center, Yorktown Heights, N.Y., October 1976
7 CFR 1435.503 - In-kind payments.
Code of Federal Regulations, 2013 CFR
2013-01-01
... OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS SUGAR PROGRAM Processor Sugar Payment-In-Kind..., make payments in the form of sugar held in CCC inventory. (b) To the maximum extent practicable, CCC... sugar held in storage by the processor; (2) CCC-owned sugar held in storage by any other processor in...
Federal Register 2010, 2011, 2012, 2013, 2014
2013-06-04
... floating processor landing reporting requirements; and to consolidate CQE Program eligibility by community... determine their annual reporting requirements. CQE Floating Processor Landing Report Requirements This action revises the recordkeeping and reporting regulations at Sec. 679.5(e) for CQE floating processors...
Federal Register 2010, 2011, 2012, 2013, 2014
2013-03-06
... clarify the CQE floating processor landing reporting requirements; and to consolidate CQE Program... their annual reporting requirements. CQE Floating Processor Landing Report Requirements This action would revise the recordkeeping and reporting regulations at Sec. 679.5(e) for CQE floating processors...
50 CFR 679.94 - Economic data report (EDR) for the Amendment 80 sector.
Code of Federal Regulations, 2010 CFR
2010-10-01
...: NMFS, Alaska Fisheries Science Center, Economic Data Reports, 7600 Sand Point Way NE, F/AKC2, Seattle... Operation Description of code Code NMFS Alaska region ADF&G FCP Catcher/processor Floating catcher processor. FLD Mothership Floating domestic mothership. IFP Stationary Floating Processor Inshore floating...
ERIC Educational Resources Information Center
Ortony, Andrew; Radin, Dean I.
The product of researchers' efforts to develop a computer processor which distinguishes between relevant and irrelevant information in the database, Spreading Activation Processor for Information Encoded in Network Structures (SAPIENS) exhibits (1) context sensitivity, (2) efficiency, (3) decreasing activation over time, (4) summation of…
Space and frequency-multiplexed optical linear algebra processor - Fabrication and initial tests
NASA Technical Reports Server (NTRS)
Casasent, D.; Jackson, J.
1986-01-01
A new optical linear algebra processor architecture is described. Space and frequency-multiplexing are used to accommodate bipolar and complex-valued data. A fabricated laboratory version of this processor is described, the electronic support system used is discussed, and initial test data obtained on it are presented.
Detailed specifications are given for a network of data processors and submodels that can generate the parameter fields required by the regional oxidant model formulated in Part 1 of this report. Operations performed by the processor network include simulation of the motion and d...
A fault-tolerant information processing concept for space vehicles.
NASA Technical Reports Server (NTRS)
Hopkins, A. L., Jr.
1971-01-01
A distributed fault-tolerant information processing system is proposed, comprising a central multiprocessor, dedicated local processors, and multiplexed input-output buses connecting them together. The processors in the multiprocessor are duplicated for error detection, which is felt to be less expensive than using coded redundancy of comparable effectiveness. Error recovery is made possible by a triplicated scratchpad memory in each processor. The main multiprocessor memory uses replicated memory for error detection and correction. Local processors use any of three conventional redundancy techniques: voting, duplex pairs with backup, and duplex pairs in independent subsystems.
Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor
NASA Astrophysics Data System (ADS)
Hristov, Ivan; Goranov, Goran; Hristova, Radoslava
2018-02-01
We consider an interesting from computational point of view standing wave simulation by solving coupled 2D perturbed Sine-Gordon equations. We make an OpenMP realization which explores both thread and SIMD levels of parallelism. We test the OpenMP program on two different energy equivalent Intel architectures: 2× Xeon E5-2695 v2 processors, (code-named "Ivy Bridge-EP") in the Hybrilit cluster, and Xeon Phi 7250 processor (code-named "Knights Landing" (KNL). The results show 2 times better performance on KNL processor.
Preliminary study on the potential usefulness of array processor techniques for structural synthesis
NASA Technical Reports Server (NTRS)
Feeser, L. J.
1980-01-01
The effects of the use of array processor techniques within the structural analyzer program, SPAR, are simulated in order to evaluate the potential analysis speedups which may result. In particular the connection of a Floating Point System AP120 processor to the PRIME computer is discussed. Measurements of execution, input/output, and data transfer times are given. Using these data estimates are made as to the relative speedups that can be executed in a more complete implementation on an array processor maxi-mini computer system.
Sentinel-2 Level 2A Prototype Processor: Architecture, Algorithms And First Results
NASA Astrophysics Data System (ADS)
Muller-Wilm, Uwe; Louis, Jerome; Richter, Rudolf; Gascon, Ferran; Niezette, Marc
2013-12-01
Sen2Core is a prototype processor for Sentinel-2 Level 2A product processing and formatting. The processor is developed for and with ESA and performs the tasks of Atmospheric Correction and Scene Classification of Level 1C input data. Level 2A outputs are: Bottom-Of- Atmosphere (BOA) corrected reflectance images, Aerosol Optical Thickness-, Water Vapour-, Scene Classification maps and Quality indicators, including cloud and snow probabilities. The Level 2A Product Formatting performed by the processor follows the specification of the Level 1C User Product.
The computational structural mechanics testbed architecture. Volume 2: The interface
NASA Technical Reports Server (NTRS)
Felippa, Carlos A.
1988-01-01
This is the third set of five volumes which describe the software architecture for the Computational Structural Mechanics Testbed. Derived from NICE, an integrated software system developed at Lockheed Palo Alto Research Laboratory, the architecture is composed of the command language CLAMP, the command language interpreter CLIP, and the data manager GAL. Volumes 1, 2, and 3 (NASA CR's 178384, 178385, and 178386, respectively) describe CLAMP and CLIP and the CLIP-processor interface. Volumes 4 and 5 (NASA CR's 178387 and 178388, respectively) describe GAL and its low-level I/O. CLAMP, an acronym for Command Language for Applied Mechanics Processors, is designed to control the flow of execution of processors written for NICE. Volume 3 describes the CLIP-Processor interface and related topics. It is intended only for processor developers.
Fault detection and bypass in a sequence information signal processor
NASA Technical Reports Server (NTRS)
Peterson, John C. (Inventor); Chow, Edward T. (Inventor)
1992-01-01
The invention comprises a plurality of scan registers, each such register respectively associated with a processor element; an on-chip comparator, encoder and fault bypass register. Each scan register generates a unitary signal the logic state of which depends on the correctness of the input from the previous processor in the systolic array. These unitary signals are input to a common comparator which generates an output indicating whether or not an error has occurred. These unitary signals are also input to an encoder which identifies the location of any fault detected so that an appropriate multiplexer can be switched to bypass the faulty processor element. Input scan data can be readily programmed to fully exercise all of the processor elements so that no fault can remain undetected.
Accelerated convergence for synchronous approximate agreement
NASA Technical Reports Server (NTRS)
Kearns, J. P.; Park, S. K.; Sjogren, J. A.
1988-01-01
The protocol for synchronous approximate agreement presented by Dolev et. al. exhibits the undesirable property that a faulty processor, by the dissemination of a value arbitrarily far removed from the values held by good processors, may delay the termination of the protocol by an arbitrary amount of time. Such behavior is clearly undesirable in a fault tolerant dynamic system subject to hard real-time constraints. A mechanism is presented by which editing data suspected of being from Byzantine-failed processors can lead to quicker, predictable, convergence to an agreement value. Under specific assumptions about the nature of values transmitted by failed processors relative to those transmitted by good processors, a Monte Carlo simulation is presented whose qualitative results illustrate the trade-off between accelerated convergence and the accuracy of the value agreed upon.
The Engineer Topographic Laboratories /ETL/ hybrid optical/digital image processor
NASA Astrophysics Data System (ADS)
Benton, J. R.; Corbett, F.; Tuft, R.
1980-01-01
An optical-digital processor for generalized image enhancement and filtering is described. The optical subsystem is a two-PROM Fourier filter processor. Input imagery is isolated, scaled, and imaged onto the first PROM; this input plane acts like a liquid gate and serves as an incoherent-to-coherent converter. The image is transformed onto a second PROM which also serves as a filter medium; filters are written onto the second PROM with a laser scanner in real time. A solid state CCTV camera records the filtered image, which is then digitized and stored in a digital image processor. The operator can then manipulate the filtered image using the gray scale and color remapping capabilities of the video processor as well as the digital processing capabilities of the minicomputer.
The ATLAS Level-1 Calorimeter Trigger: PreProcessor implementation and performance
NASA Astrophysics Data System (ADS)
Åsman, B.; Achenbach, R.; Allbrooke, B. M. M.; Anders, G.; Andrei, V.; Büscher, V.; Bansil, H. S.; Barnett, B. M.; Bauss, B.; Bendtz, K.; Bohm, C.; Bracinik, J.; Brawn, I. P.; Brock, R.; Buttinger, W.; Caputo, R.; Caughron, S.; Cerrito, L.; Charlton, D. G.; Childers, J. T.; Curtis, C. J.; Daniells, A. C.; Davis, A. O.; Davygora, Y.; Dorn, M.; Eckweiler, S.; Edmunds, D.; Edwards, J. P.; Eisenhandler, E.; Ellis, K.; Ermoline, Y.; Föhlisch, F.; Faulkner, P. J. W.; Fedorko, W.; Fleckner, J.; French, S. T.; Gee, C. N. P.; Gillman, A. R.; Goeringer, C.; Hülsing, T.; Hadley, D. R.; Hanke, P.; Hauser, R.; Heim, S.; Hellman, S.; Hickling, R. S.; Hidvégi, A.; Hillier, S. J.; Hofmann, J. I.; Hristova, I.; Ji, W.; Johansen, M.; Keller, M.; Khomich, A.; Kluge, E.-E.; Koll, J.; Laier, H.; Landon, M. P. J.; Lang, V. S.; Laurens, P.; Lepold, F.; Lilley, J. N.; Linnemann, J. T.; Müller, F.; Müller, T.; Mahboubi, K.; Martin, T. A.; Mass, A.; Meier, K.; Meyer, C.; Middleton, R. P.; Moa, T.; Moritz, S.; Morris, J. D.; Mudd, R. D.; Narayan, R.; zur Nedden, M.; Neusiedl, A.; Newman, P. R.; Nikiforov, A.; Ohm, C. C.; Perera, V. J. O.; Pfeiffer, U.; Plucinski, P.; Poddar, S.; Prieur, D. P. F.; Qian, W.; Rieck, P.; Rizvi, E.; Sankey, D. P. C.; Schäfer, U.; Scharf, V.; Schmitt, K.; Schröder, C.; Schultz-Coulon, H.-C.; Schumacher, C.; Schwienhorst, R.; Silverstein, S. B.; Simioni, E.; Snidero, G.; Staley, R. J.; Stamen, R.; Stock, P.; Stockton, M. C.; Tan, C. L. A.; Tapprogge, S.; Thomas, J. P.; Thompson, P. D.; Thomson, M.; True, P.; Watkins, P. M.; Watson, A. T.; Watson, M. F.; Weber, P.; Wessels, M.; Wiglesworth, C.; Williams, S. L.
2012-12-01
The PreProcessor system of the ATLAS Level-1 Calorimeter Trigger (L1Calo) receives about 7200 analogue signals from the electromagnetic and hadronic components of the calorimetric detector system. Lateral division results in cells which are pre-summed to so-called Trigger Towers of size 0.1 × 0.1 along azimuth (phi) and pseudorapidity (η). The received calorimeter signals represent deposits of transverse energy. The system consists of 124 individual PreProcessor modules that digitise the input signals for each LHC collision, and provide energy and timing information to the digital processors of the L1Calo system, which identify physics objects forming much of the basis for the full ATLAS first level trigger decision. This paper describes the architecture of the PreProcessor, its hardware realisation, functionality, and performance.
NASA Technical Reports Server (NTRS)
Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin J.
2013-01-01
The Mobile Thread Task Manager (MTTM) is being applied to parallelizing existing flight software to understand the benefits and to develop new techniques and architectural concepts for adapting software to multicore architectures. It allocates and load-balances tasks for a group of threads that migrate across processors to improve cache performance. In order to balance-load across threads, the MTTM augments a basic map-reduce strategy to draw jobs from a global queue. In a multicore processor, memory may be "homed" to the cache of a specific processor and must be accessed from that processor. The MTTB architecture wraps access to data with thread management to move threads to the home processor for that data so that the computation follows the data in an attempt to avoid L2 cache misses. Cache homing is also handled by a memory manager that translates identifiers to processor IDs where the data will be homed (according to rules defined by the user). The user can also specify the number of threads and processors separately, which is important for tuning performance for different patterns of computation and memory access. MTTM efficiently processes tasks in parallel on a multiprocessor computer. It also provides an interface to make it easier to adapt existing software to a multiprocessor environment.
Reconfigurable signal processor designs for advanced digital array radar systems
NASA Astrophysics Data System (ADS)
Suarez, Hernan; Zhang, Yan (Rockee); Yu, Xining
2017-05-01
The new challenges originated from Digital Array Radar (DAR) demands a new generation of reconfigurable backend processor in the system. The new FPGA devices can support much higher speed, more bandwidth and processing capabilities for the need of digital Line Replaceable Unit (LRU). This study focuses on using the latest Altera and Xilinx devices in an adaptive beamforming processor. The field reprogrammable RF devices from Analog Devices are used as analog front end transceivers. Different from other existing Software-Defined Radio transceivers on the market, this processor is designed for distributed adaptive beamforming in a networked environment. The following aspects of the novel radar processor will be presented: (1) A new system-on-chip architecture based on Altera's devices and adaptive processing module, especially for the adaptive beamforming and pulse compression, will be introduced, (2) Successful implementation of generation 2 serial RapidIO data links on FPGA, which supports VITA-49 radio packet format for large distributed DAR processing. (3) Demonstration of the feasibility and capabilities of the processor in a Micro-TCA based, SRIO switching backplane to support multichannel beamforming in real-time. (4) Application of this processor in ongoing radar system development projects, including OU's dual-polarized digital array radar, the planned new cylindrical array radars, and future airborne radars.
A novel VLSI processor architecture for supercomputing arrays
NASA Technical Reports Server (NTRS)
Venkateswaran, N.; Pattabiraman, S.; Devanathan, R.; Ahmed, Ashaf; Venkataraman, S.; Ganesh, N.
1993-01-01
Design of the processor element for general purpose massively parallel supercomputing arrays is highly complex and cost ineffective. To overcome this, the architecture and organization of the functional units of the processor element should be such as to suit the diverse computational structures and simplify mapping of complex communication structures of different classes of algorithms. This demands that the computation and communication structures of different class of algorithms be unified. While unifying the different communication structures is a difficult process, analysis of a wide class of algorithms reveals that their computation structures can be expressed in terms of basic IP,IP,OP,CM,R,SM, and MAA operations. The execution of these operations is unified on the PAcube macro-cell array. Based on this PAcube macro-cell array, we present a novel processor element called the GIPOP processor, which has dedicated functional units to perform the above operations. The architecture and organization of these functional units are such to satisfy the two important criteria mentioned above. The structure of the macro-cell and the unification process has led to a very regular and simpler design of the GIPOP processor. The production cost of the GIPOP processor is drastically reduced as it is designed on high performance mask programmable PAcube arrays.
PixonVision real-time video processor
NASA Astrophysics Data System (ADS)
Puetter, R. C.; Hier, R. G.
2007-09-01
PixonImaging LLC and DigiVision, Inc. have developed a real-time video processor, the PixonVision PV-200, based on the patented Pixon method for image deblurring and denoising, and DigiVision's spatially adaptive contrast enhancement processor, the DV1000. The PV-200 can process NTSC and PAL video in real time with a latency of 1 field (1/60 th of a second), remove the effects of aerosol scattering from haze, mist, smoke, and dust, improve spatial resolution by up to 2x, decrease noise by up to 6x, and increase local contrast by up to 8x. A newer version of the processor, the PV-300, is now in prototype form and can handle high definition video. Both the PV-200 and PV-300 are FPGA-based processors, which could be spun into ASICs if desired. Obvious applications of these processors include applications in the DOD (tanks, aircraft, and ships), homeland security, intelligence, surveillance, and law enforcement. If developed into an ASIC, these processors will be suitable for a variety of portable applications, including gun sights, night vision goggles, binoculars, and guided munitions. This paper presents a variety of examples of PV-200 processing, including examples appropriate to border security, battlefield applications, port security, and surveillance from unmanned aerial vehicles.
PDT - PARTICLE DISPLACEMENT TRACKING SOFTWARE
NASA Technical Reports Server (NTRS)
Wernet, M. P.
1994-01-01
Particle Imaging Velocimetry (PIV) is a quantitative velocity measurement technique for measuring instantaneous planar cross sections of a flow field. The technique offers very high precision (1%) directionally resolved velocity vector estimates, but its use has been limited by high equipment costs and complexity of operation. Particle Displacement Tracking (PDT) is an all-electronic PIV data acquisition and reduction procedure which is simple, fast, and easily implemented. The procedure uses a low power, continuous wave laser and a Charged Coupled Device (CCD) camera to electronically record the particle images. A frame grabber board in a PC is used for data acquisition and reduction processing. PDT eliminates the need for photographic processing, system costs are moderately low, and reduced data are available within seconds of acquisition. The technique results in velocity estimate accuracies on the order of 5%. The software is fully menu-driven from the acquisition to the reduction and analysis of the data. Options are available to acquire a single image or 5- or 25-field series of images separated in time by multiples of 1/60 second. The user may process each image, specifying its boundaries to remove unwanted glare from the periphery and adjusting its background level to clearly resolve the particle images. Data reduction routines determine the particle image centroids and create time history files. PDT then identifies the velocity vectors which describe the particle movement in the flow field. Graphical data analysis routines are included which allow the user to graph the time history files and display the velocity vector maps, interpolated velocity vector grids, iso-velocity vector contours, and flow streamlines. The PDT data processing software is written in FORTRAN 77 and the data acquisition routine is written in C-Language for 80386-based IBM PC compatibles running MS-DOS v3.0 or higher. Machine requirements include 4 MB RAM (3 MB Extended), a single or multiple frequency RGB monitor (EGA or better), a math co-processor, and a pointing device. The printers supported by the graphical analysis routines are the HP Laserjet+, Series II, and Series III with at least 1.5 MB memory. The data acquisition routines require the EPIX 4-MEG video board and optional 12.5MHz oscillator, and associated EPIX software. Data can be acquired from any CCD or RS-170 compatible video camera with pixel resolution of 600hX400v or better. PDT is distributed on one 5.25 inch 360K MS-DOS format diskette. Due to the use of required proprietary software, executable code is not provided on the distribution media. Compiling the source code requires the Microsoft C v5.1 compiler, Microsoft QuickC v2.0, the Microsoft Mouse Library, EPIX Image Processing Libraries, the Microway NDP-Fortran-386 v2.1 compiler, and the Media Cybernetics HALO Professional Graphics Kernal System. Due to the complexities of the machine requirements, COSMIC strongly recommends the purchase and review of the documentation prior to the purchase of the program. The source code, and sample input and output files are provided in PKZIP format; the PKUNZIP utility is included. PDT was developed in 1990. All trade names used are the property of their respective corporate owners.
Limits on the Efficiency of Event-Based Algorithms for Monte Carlo Neutron Transport
DOE Office of Scientific and Technical Information (OSTI.GOV)
Romano, Paul K.; Siegel, Andrew R.
The traditional form of parallelism in Monte Carlo particle transport simulations, wherein each individual particle history is considered a unit of work, does not lend itself well to data-level parallelism. Event-based algorithms, which were originally used for simulations on vector processors, may offer a path toward better utilizing data-level parallelism in modern computer architectures. In this study, a simple model is developed for estimating the efficiency of the event-based particle transport algorithm under two sets of assumptions. Data collected from simulations of four reactor problems using OpenMC was then used in conjunction with the models to calculate the speedup duemore » to vectorization as a function of two parameters: the size of the particle bank and the vector width. When each event type is assumed to have constant execution time, the achievable speedup is directly related to the particle bank size. We observed that the bank size generally needs to be at least 20 times greater than vector size in order to achieve vector efficiency greater than 90%. When the execution times for events are allowed to vary, however, the vector speedup is also limited by differences in execution time for events being carried out in a single event-iteration. For some problems, this implies that vector effciencies over 50% may not be attainable. While there are many factors impacting performance of an event-based algorithm that are not captured by our model, it nevertheless provides insights into factors that may be limiting in a real implementation.« less
Diet as a factor in behavioral radiation protection following exposure to heavy particles
NASA Technical Reports Server (NTRS)
Rabin, Bernard M.; Shukitt-Hale, Barbara; Joseph, James; Todd, Paul
2005-01-01
Major risks associated with radiation exposures on deep space missions include carcinogenesis due to heavy-particle exposure of cancer-prone tissues and performance decrements due to neurological damage produced by heavy particles. Because exposure to heavy particles can cause oxidative stress, it is possible that antioxidants can be used to mitigate these risks (and possibly some health risks of microgravity). To assess the capacity of antioxidant diets to mitigate the effects of exposure to heavy particles, rats were maintained on antioxidant diets containing 2% blueberry or strawberry extract or a control diet for 8 weeks prior to exposure to 1.5 or 2.0 Gy of accelerated iron particles at Brookhaven National Laboratory. Following irradiation rats were tested on a series of behavioral tasks: amphetamine-induced taste aversion learning, operant responding and spatial learning and memory. The results indicated that the performance of the irradiated rats maintained on the antioxidant diets was, in general, significantly better than that of the control animals, although the effectiveness of the diets ameliorating the radiation-induced deterioration in performance varied as a function of both the specific diet and the specific endpoint. In addition, animals fed antioxidant diets prior to exposure showed reduced heavy particle-induced tumorigenesis one year after exposure compared to the animals fed the control diet. These results suggest that antioxidant diets have the potential to serve as part of a system designed to provide protection to astronauts against the effects of heavy particles on exploratory missions outside the magnetic field of the earth.
NASA Astrophysics Data System (ADS)
Sewell, Stephen
This thesis introduces a software framework that effectively utilizes low-cost commercially available Graphic Processing Units (GPUs) to simulate complex scientific plasma phenomena that are modeled using the Particle-In-Cell (PIC) paradigm. The software framework that was developed conforms to the Compute Unified Device Architecture (CUDA), a standard for general purpose graphic processing that was introduced by NVIDIA Corporation. This framework has been verified for correctness and applied to advance the state of understanding of the electromagnetic aspects of the development of the Aurora Borealis and Aurora Australis. For each phase of the PIC methodology, this research has identified one or more methods to exploit the problem's natural parallelism and effectively map it for execution on the graphic processing unit and its host processor. The sources of overhead that can reduce the effectiveness of parallelization for each of these methods have also been identified. One of the novel aspects of this research was the utilization of particle sorting during the grid interpolation phase. The final representation resulted in simulations that executed about 38 times faster than simulations that were run on a single-core general-purpose processing system. The scalability of this framework to larger problem sizes and future generation systems has also been investigated.
Photonic Low Cost Micro-Sensor for in-Line Wear Particle Detection in Flowing Lube Oils.
Mabe, Jon; Zubia, Joseba; Gorritxategi, Eneko
2017-03-14
The presence of microscopic particles in suspension in industrial fluids is often an early warning of latent or imminent failures in the equipment or processes where they are being used. This manuscript describes work undertaken to integrate different photonic principles with a micro- mechanical fluidic structure and an embedded processor to develop a fully autonomous wear debris sensor for in-line monitoring of industrial fluids. Lens-less microscopy, stroboscopic illumination, a CMOS imager and embedded machine vision technologies have been merged to develop a sensor solution that is able to detect and quantify the number and size of micrometric particles suspended in a continuous flow of a fluid. A laboratory test-bench has been arranged for setting up the configuration of the optical components targeting a static oil sample and then a sensor prototype has been developed for migrating the measurement principles to real conditions in terms of operating pressure and flow rate of the oil. Imaging performance is quantified using micro calibrated samples, as well as by measuring real used lubricated oils. Sampling a large fluid volume with a decent 2D spatial resolution, this photonic micro sensor offers a powerful tool at very low cost and compacted size for in-line wear debris monitoring.
Photonic Low Cost Micro-Sensor for in-Line Wear Particle Detection in Flowing Lube Oils
Mabe, Jon; Zubia, Joseba; Gorritxategi, Eneko
2017-01-01
The presence of microscopic particles in suspension in industrial fluids is often an early warning of latent or imminent failures in the equipment or processes where they are being used. This manuscript describes work undertaken to integrate different photonic principles with a micro- mechanical fluidic structure and an embedded processor to develop a fully autonomous wear debris sensor for in-line monitoring of industrial fluids. Lens-less microscopy, stroboscopic illumination, a CMOS imager and embedded machine vision technologies have been merged to develop a sensor solution that is able to detect and quantify the number and size of micrometric particles suspended in a continuous flow of a fluid. A laboratory test-bench has been arranged for setting up the configuration of the optical components targeting a static oil sample and then a sensor prototype has been developed for migrating the measurement principles to real conditions in terms of operating pressure and flow rate of the oil. Imaging performance is quantified using micro calibrated samples, as well as by measuring real used lubricated oils. Sampling a large fluid volume with a decent 2D spatial resolution, this photonic micro sensor offers a powerful tool at very low cost and compacted size for in-line wear debris monitoring. PMID:28335436
Thermal performance of ethylene glycol based nanofluids in an electronic heat sink.
Selvakumar, P; Suresh, S
2014-03-01
Heat transfer in electronic devices such as micro processors and power converters is much essential to keep these devices cool for the better functioning of the systems. Air cooled heat sinks are not able to remove the high heat flux produced by the today's electronic components. Liquids work better than air in removing heat. Thermal conductivity which is the most essential property of any heat transfer fluid can be enhanced by adding nano scale solid particles which possess higher thermal conductivity than the liquids. In this work the convective heat transfer and pressure drop characteristics of the water/ethylene glycol mixture based nanofluids consisting of Al2O3, CuO nanoparticles with a volume concentration of 0.1% are studied experimentally in a rectangular channel heat sink. The nano particles are characterized using Scanning Electron Microscope and the nannofluids are prepared by using an ultrasonic vibrator and Sodium Lauryl Salt surfactant. The experimental results showed that nanofluids of 0.1% volume concentration give higher convective heat transfer coefficient values than the plain water/ethylene glycol mixture which is prepared in the volume ratio of 70:30. There is no much penalty in the pressure drop values due to the inclusion of nano particles in the water/ethylene glycol mixture.
7 CFR 201.73 - Processors and processing of all classes of certified seed.
Code of Federal Regulations, 2010 CFR
2010-01-01
... (CONTINUED) FEDERAL SEED ACT FEDERAL SEED ACT REGULATIONS Certified Seed § 201.73 Processors and processing... of certified seed: (a) Facilities shall be available to perform processing without introducing... 7 Agriculture 3 2010-01-01 2010-01-01 false Processors and processing of all classes of certified...
7 CFR 201.73 - Processors and processing of all classes of certified seed.
Code of Federal Regulations, 2011 CFR
2011-01-01
... (CONTINUED) FEDERAL SEED ACT FEDERAL SEED ACT REGULATIONS Certified Seed § 201.73 Processors and processing... of certified seed: (a) Facilities shall be available to perform processing without introducing... 7 Agriculture 3 2011-01-01 2011-01-01 false Processors and processing of all classes of certified...
7 CFR 201.73 - Processors and processing of all classes of certified seed.
Code of Federal Regulations, 2013 CFR
2013-01-01
... (CONTINUED) FEDERAL SEED ACT FEDERAL SEED ACT REGULATIONS Certified Seed § 201.73 Processors and processing... of certified seed: (a) Facilities shall be available to perform processing without introducing... 7 Agriculture 3 2013-01-01 2013-01-01 false Processors and processing of all classes of certified...
7 CFR 201.73 - Processors and processing of all classes of certified seed.
Code of Federal Regulations, 2012 CFR
2012-01-01
... (CONTINUED) FEDERAL SEED ACT FEDERAL SEED ACT REGULATIONS Certified Seed § 201.73 Processors and processing... of certified seed: (a) Facilities shall be available to perform processing without introducing... 7 Agriculture 3 2012-01-01 2012-01-01 false Processors and processing of all classes of certified...
7 CFR 201.73 - Processors and processing of all classes of certified seed.
Code of Federal Regulations, 2014 CFR
2014-01-01
... (CONTINUED) FEDERAL SEED ACT FEDERAL SEED ACT REGULATIONS Certified Seed § 201.73 Processors and processing... of certified seed: (a) Facilities shall be available to perform processing without introducing... 7 Agriculture 3 2014-01-01 2014-01-01 false Processors and processing of all classes of certified...
Code of Federal Regulations, 2014 CFR
2014-07-01
... 40 Protection of Environment 17 2014-07-01 2014-07-01 false Gasoline sulfur standards and... Gasoline Sulfur § 80.1607 Gasoline sulfur standards and requirements for transmix processors and transmix... to a refiner under this subpart O. (a) Any transmix processor who recovers transmix gasoline product...
Federal Register 2010, 2011, 2012, 2013, 2014
2013-11-15
... (coop) programs for the at-sea mothership and catcher/processor trawl fleets (whiting only). Since that... permit holder (vessel owner) to change their vessel ownership, 9. Clarify that the processor obligation..., Mothership Coop (MS) Program--Whiting At-sea Trawl Fishery, and Catcher-Processor (C/P) Coop Program--Whiting...
Federal Register 2010, 2011, 2012, 2013, 2014
2010-11-12
... processor. In addition, the Amendment modifies the proposals so that a market maker's quoting obligations... reported by the responsible single plan processor. Finally, so that the markets may coordinate..., as reported by the responsible single plan processor. The Amendment also modifies that the market...
Federal Register 2010, 2011, 2012, 2013, 2014
2010-03-23
... Pollock Fishery This proposed rule applies to owners and operators of catcher vessels, catcher/processors, motherships, inshore processors, and the six Western Alaska Community Development Quota (CDQ) Program groups... fishery by identifying the vessels and processors eligible to participate in the fishery and allocating...
Federal Register 2010, 2011, 2012, 2013, 2014
2011-12-14
... the program to allow participation by all types of near shore, stationary processors for halibut... This proposed rule would apply to owners and operators of catcher vessels, catcher/processors, and inshore processors participating in the pollock (Theragra chalcogramma) trawl fisheries in the Central and...
75 FR 44709 - Common Crop Insurance Regulations; Stonefruit Crop Insurance Provisions
Federal Register 2010, 2011, 2012, 2013, 2014
2010-07-29
... specified in the Special Provisions or is accepted by a packer, processor or other handler.'' According to... not make grade, it is not considered marketable unless a packer, handler or processor accepts the... meeting the standards or being accepted by a processor, etc., without any indication that the grade...
Robust, High-Speed Network Design for Large-Scale Multiprocessing
1993-09-01
3.17 Left: Non-expansive Wiring of Processors to First Stage Routing Elements . ... 38 3.18 Right: Expansive Wiring of Processors to First Stage...162 8.2 RNI Micro -architecture ........ .............................. 163 8.3 Packaged RN I IC...169 11.1 MLUNK Message Formats ........ .............................. 173 12.1 Routing Board Arrangement for 64- processor Machine
Digital Hardware Architecture Implementation
1993-02-15
of micro - MOTOROLA 63.7 50MHZ 64 BIT 2092 N/A processors during quarterly re- INTEL 42 50MHz 64 BIT 1092 N/A views and monthly reports. The 186o XP...27 3.2.1 Signal Processor (SP) Analysis...31 3.2.1.11 MasPar Software Statements ........................................................ 32 3.2.2 Data Processor
A debugger-interpreter with setup facilities for assembly programs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dolinskii, I.S.; Zisel`man, I.M.; Belotskii, S.L.
1995-11-01
In this paper a software program allowing one to introduce and debug the descriptions of the von Nuemann architecture processors and their assemblers, efficiently debug assembly programs, and investigate the instruction sets of the described processors is considered. For a description of the processor sematics and assembler syntax, a metassembly language is suggested.