Algorithms versus architectures for computational chemistry
NASA Technical Reports Server (NTRS)
Partridge, H.; Bauschlicher, C. W., Jr.
1986-01-01
The algorithms employed are computationally intensive and, as a result, increased performance (both algorithmic and architectural) is required to improve accuracy and to treat larger molecular systems. Several benchmark quantum chemistry codes are examined on a variety of architectures. While these codes are only a small portion of a typical quantum chemistry library, they illustrate many of the computationally intensive kernels and data manipulation requirements of some applications. Furthermore, understanding the performance of the existing algorithms on present and proposed supercomputers serves as a guide for future program and algorithm development. The algorithms investigated are: (1) a sparse symmetric matrix vector product; (2) a four index integral transformation; and (3) the calculation of diatomic two electron Slater integrals. The vectorization strategies are examined for these algorithms for both the Cyber 205 and Cray XMP. In addition, multiprocessor implementations of the algorithms are examined on the Cray XMP and on the MIT static data flow machine proposed by Dennis.
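As an illustration of the first kernel named above, here is a minimal sketch of a sparse symmetric matrix-vector product. The storage layout (a list of upper-triangle `(i, j, value)` triples) and the function name `sym_sparse_matvec` are assumptions made for this example; the abstract does not say how the benchmark codes store the matrix.

```python
def sym_sparse_matvec(n, upper, x):
    """Compute y = A @ x for a symmetric A given by its upper triangle.

    upper: iterable of (i, j, value) with i <= j. Each off-diagonal
    entry stands for both A[i][j] and A[j][i], so it is used twice.
    """
    y = [0.0] * n
    for i, j, a in upper:
        y[i] += a * x[j]
        if i != j:
            # Mirror contribution from the implicit lower triangle.
            y[j] += a * x[i]
    return y


# Hypothetical 3x3 example: A = [[2, 1, 0], [1, 3, 4], [0, 4, 5]]
upper = [(0, 0, 2.0), (0, 1, 1.0), (1, 1, 3.0), (1, 2, 4.0), (2, 2, 5.0)]
y = sym_sparse_matvec(3, upper, [1.0, 2.0, 3.0])
```

The indexed updates `y[i] += ...` and `y[j] += ...` are scattered memory accesses, which is one reason a kernel of this shape can be awkward to vectorize.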
Parallel algorithms and architecture for computation of manipulator forward dynamics
NASA Technical Reports Server (NTRS)
Fijany, Amir; Bejczy, Antal K.
1989-01-01
Parallel computation of manipulator forward dynamics is investigated. Considering three classes of algorithms for the solution of the problem, that is, the O(n), the O(n exp 2), and the O(n exp 3) algorithms, parallelism in the problem is analyzed. It is shown that the problem belongs to the class of NC and that the time and processor bounds are of O(log exp 2 n) and O(n exp 4), respectively. However, the fastest stable parallel algorithms achieve the computation time of O(n) and can be derived by parallelization of the O(n exp 3) serial algorithms. Parallel computation of the O(n exp 3) algorithms requires the development of parallel algorithms for a set of fundamentally different problems, that is, the Newton-Euler formulation, the computation of the inertia matrix, decomposition of the symmetric, positive definite matrix, and the solution of triangular systems. Parallel algorithms for this set of problems are developed which can be efficiently implemented on a unique architecture, a triangular array of n(n+2)/2 processors with a simple nearest-neighbor interconnection. This architecture is particularly suitable for VLSI and WSI implementations. The developed parallel algorithm, compared to the best serial O(n) algorithm, achieves an asymptotic speedup of more than two orders-of-magnitude in the computation of the forward dynamics.
A Simple Physical Optics Algorithm Perfect for Parallel Computing Architecture
NASA Technical Reports Server (NTRS)
Imbriale, W. A.; Cwik, T.
1994-01-01
A reflector antenna computer program based upon a simple discrete approximation of the radiation integral has proven to be extremely easy to adapt to the parallel computing architecture of the modest number of large-grain computing elements such as are used in the Intel iPSC and Touchstone Delta parallel machines.
1989-01-20
SA/TR-2/89 A003: FINAL REPORT. COMPUTER ALGORITHMS AND ARCHITECTURES FOR THREE-DIMENSIONAL EDDY-CURRENT NONDESTRUCTIVE EVALUATION...
Irregular Applications: Architectures & Algorithms
Feo, John T.; Villa, Oreste; Tumeo, Antonino; Secchi, Simone
2012-02-06
Irregular applications are characterized by irregular data structures, control and communication patterns. Novel irregular high performance applications which deal with large data sets and require have recently appeared. Unfortunately, current high performance systems and software infrastructures execute irregular algorithms poorly. Only coordinated efforts by end users, area specialists and computer scientists that consider both the architecture and the software stack may be able to provide solutions to the challenges of modern irregular applications.
1989-01-20
SA/TR-2/89 A003: FINAL REPORT. COMPUTER ALGORITHMS AND ARCHITECTURES FOR THREE-DIMENSIONAL EDDY-CURRENT NONDESTRUCTIVE EVALUATION...J., Ullman, J., The Design and Analysis of Computer Algorithms, Addison-Wesley Publishing Company, 1974. [A2] Anderson, B., Moore, J., Optimal...actual data...[A1] Aho, A., Hopcroft, J., Ullman, J., The Design and Analysis of Computer Algorithms, Addison-Wesley Publishing Company
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Youngblood, John N.; Saha, Aindam
1987-01-01
Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel systems to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, task definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.
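The abstract states that task allocation is based on critical path analysis. The sketch below shows only that analysis step on a small task graph: it finds the longest duration-weighted chain of dependent tasks. The task names, durations, and the `critical_path` helper are hypothetical, and the report's actual allocation algorithm is not reproduced here.

```python
from functools import lru_cache

# Hypothetical task graph: name -> (duration, list of prerequisite tasks).
TASKS = {
    "read_sensors": (2, []),
    "estimate_target": (4, ["read_sensors"]),
    "time_to_go": (1, ["estimate_target"]),
    "integrate_state": (3, ["read_sensors"]),
    "guidance_command": (2, ["time_to_go", "integrate_state"]),
}


def critical_path(tasks):
    """Return (length, path) of the longest duration-weighted chain."""

    @lru_cache(maxsize=None)
    def finish(name):
        # Earliest finish time of a task and the chain that determines it.
        duration, preds = tasks[name]
        best = max((finish(p) for p in preds), key=lambda r: r[0], default=(0, ()))
        return (best[0] + duration, best[1] + (name,))

    return max((finish(name) for name in tasks), key=lambda r: r[0])


length, path = critical_path(TASKS)
```

Tasks on the returned path bound the total execution time no matter how many processing elements are available, which is why a scheduler might give them priority.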
Parallel Architecture For Robotics Computation
NASA Technical Reports Server (NTRS)
Fijany, Amir; Bejczy, Antal K.
1990-01-01
Universal Real-Time Robotic Controller and Simulator (URRCS) is highly parallel computing architecture for control and simulation of robot motion. Result of extensive algorithmic study of different kinematic and dynamic computational problems arising in control and simulation of robot motion. Study led to development of class of efficient parallel algorithms for these problems. Represents algorithmically specialized architecture, in sense capable of exploiting common properties of this class of parallel algorithms. System with both MIMD and SIMD capabilities. Regarded as processor attached to bus of external host processor, as part of bus memory.
Feng, Lu; Fedrigo, Enrico; Béchet, Clémentine; Brunner, Elisabeth; Pirani, Werther
2012-06-01
The European Southern Observatory (ESO) is studying the next generation giant telescope, called the European Extremely Large Telescope (E-ELT). With a 42 m diameter primary mirror, it is a significant step from currently existing telescopes. Therefore, the E-ELT with its instruments poses new challenges in terms of cost and computational complexity for the control system, including its adaptive optics (AO). Since the conventional matrix-vector multiplication (MVM) method successfully used so far for AO wavefront reconstruction cannot be efficiently scaled to the size of the AO systems on the E-ELT, faster algorithms are needed. Among those recently developed wavefront reconstruction algorithms, three are studied in this paper from the point of view of design, implementation, and absolute speed on three multicore multi-CPU platforms. We focus on a single-conjugate AO system for the E-ELT. The algorithms are the MVM, the Fourier transform reconstructor (FTR), and the fractal iterative method (FRiM). This study highlights the scaling of these algorithms with an increasing number of CPUs involved in the computation. We discuss implementation strategies, depending on various CPU architecture constraints, and we present the first quantitative execution times so far at the E-ELT scale. MVM suffers from a large computational burden, making the current computing platform undersized to reach timings short enough for AO wavefront reconstruction. In our study, the FTR currently provides the fastest reconstruction. FRiM is a recently developed algorithm, and several strategies are investigated and presented here in order to implement it for real-time AO wavefront reconstruction, and to optimize its execution time. The difficulty of parallelizing the algorithm on such architectures is highlighted. We also show that FRiM can provide interesting scalability using a sparse matrix approach.
NASA Astrophysics Data System (ADS)
Romano, Paul Kollath
measured data from simulations in OpenMC on a full-core benchmark problem. Finally, a novel algorithm for decomposing large tally data was proposed, analyzed, and implemented/tested in OpenMC. The algorithm relies on disjoint sets of compute processes and tally servers. The analysis showed that for a range of parameters relevant to LWR analysis, the tally server algorithm should perform with minimal overhead. Tests were performed on Intrepid and Titan and demonstrated that the algorithm did indeed perform well over a wide range of parameters. (Copies available exclusively from MIT Libraries, libraries.mit.edu/docs - docs mit.edu)
James, Conrad D.; Aimone, James B.; Miner, Nadine E.; ...
2017-01-04
In this study, biological neural networks continue to inspire new developments in algorithms and microelectronic hardware to solve challenging data processing and classification problems. Here in this research, we survey the history of neural-inspired and neuromorphic computing in order to examine the complex and intertwined trajectories of the mathematical theory and hardware developed in this field. Early research focused on adapting existing hardware to emulate the pattern recognition capabilities of living organisms. Contributions from psychologists, mathematicians, engineers, neuroscientists, and other professions were crucial to maturing the field from narrowly-tailored demonstrations to more generalizable systems capable of addressing difficult problem classes such as object detection and speech recognition. Algorithms that leverage fundamental principles found in neuroscience such as hierarchical structure, temporal integration, and robustness to error have been developed, and some of these approaches are achieving world-leading performance on particular data classification tasks. Additionally, novel microelectronic hardware is being developed to perform logic and to serve as memory in neuromorphic computing systems with optimized system integration and improved energy efficiency. Key to such advancements was the incorporation of new discoveries in neuroscience research, the transition away from strict structural replication and towards the functional replication of neural systems, and the use of mathematical theory frameworks to guide algorithm and hardware developments.
James, Conrad D.; Aimone, James B.; Miner, Nadine E.; Vineyard, Craig M.; Rothganger, Fredrick H.; Carlson, Kristofor D.; Mulder, Samuel A.; Draelos, Timothy J.; Faust, Aleksandra; Marinella, Matthew J.; Naegle, John H.; Plimpton, Steven J.
2017-01-01
Biological neural networks continue to inspire new developments in algorithms and microelectronic hardware to solve challenging data processing and classification problems. Here in this research, we survey the history of neural-inspired and neuromorphic computing in order to examine the complex and intertwined trajectories of the mathematical theory and hardware developed in this field. Early research focused on adapting existing hardware to emulate the pattern recognition capabilities of living organisms. Contributions from psychologists, mathematicians, engineers, neuroscientists, and other professions were crucial to maturing the field from narrowly-tailored demonstrations to more generalizable systems capable of addressing difficult problem classes such as object detection and speech recognition. Algorithms that leverage fundamental principles found in neuroscience such as hierarchical structure, temporal integration, and robustness to error have been developed, and some of these approaches are achieving world-leading performance on particular data classification tasks. Additionally, novel microelectronic hardware is being developed to perform logic and to serve as memory in neuromorphic computing systems with optimized system integration and improved energy efficiency. Key to such advancements was the incorporation of new discoveries in neuroscience research, the transition away from strict structural replication and towards the functional replication of neural systems, and the use of mathematical theory frameworks to guide algorithm and hardware developments.
Layered Architecture for Quantum Computing
NASA Astrophysics Data System (ADS)
Jones, N. Cody; Van Meter, Rodney; Fowler, Austin G.; McMahon, Peter L.; Kim, Jungsang; Ladd, Thaddeus D.; Yamamoto, Yoshihisa
2012-07-01
We develop a layered quantum-computer architecture, which is a systematic framework for tackling the individual challenges of developing a quantum computer while constructing a cohesive device design. We discuss many of the prominent techniques for implementing circuit-model quantum computing and introduce several new methods, with an emphasis on employing surface-code quantum error correction. In doing so, we propose a new quantum-computer architecture based on optical control of quantum dots. The time scales of physical-hardware operations and logical, error-corrected quantum gates differ by several orders of magnitude. By dividing functionality into layers, we can design and analyze subsystems independently, demonstrating the value of our layered architectural approach. Using this concrete hardware platform, we provide resource analysis for executing fault-tolerant quantum algorithms for integer factoring and quantum simulation, finding that the quantum-dot architecture we study could solve such problems on the time scale of days.
Architecture Adaptive Computing Environment
NASA Technical Reports Server (NTRS)
Dorband, John E.
2006-01-01
Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple-instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.
NASA Astrophysics Data System (ADS)
Massanes, Francesc; Cadennes, Marie; Brankov, Jovan G.
2011-07-01
We describe and evaluate a fast implementation of a classical block-matching motion estimation algorithm for multiple graphical processing units (GPUs) using the compute unified device architecture computing engine. The implemented block-matching algorithm uses summed absolute difference error criterion and full grid search (FS) for finding optimal block displacement. In this evaluation, we compared the execution time of a GPU and CPU implementation for images of various sizes, using integer and noninteger search grids. The results show that use of a GPU card can shorten computation time by a factor of 200 times for integer and 1000 times for a noninteger search grid. The additional speedup for a noninteger search grid comes from the fact that GPU has built-in hardware for image interpolation. Further, when using multiple GPU cards, the presented evaluation shows the importance of the data splitting method across multiple cards, but an almost linear speedup with a number of cards is achievable. In addition, we compared the execution time of the proposed FS GPU implementation with two existing, highly optimized nonfull grid search CPU-based motion estimations methods, namely implementation of the Pyramidal Lucas Kanade Optical flow algorithm in OpenCV and simplified unsymmetrical multi-hexagon search in H.264/AVC standard. In these comparisons, FS GPU implementation still showed modest improvement even though the computational complexity of FS GPU implementation is substantially higher than non-FS CPU implementation. We also demonstrated that for an image sequence of 720 × 480 pixels in resolution commonly used in video surveillance, the proposed GPU implementation is sufficiently fast for real-time motion estimation at 30 frames-per-second using two NVIDIA C1060 Tesla GPU cards.
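The block-matching scheme described above (summed absolute difference criterion with a full grid search) can be sketched on the CPU as follows. This is a plain-Python illustration for integer displacements only, not the authors' CUDA implementation; the function names and the small synthetic frames are assumptions made for the example.

```python
def sad(ref, cur, bx, by, dx, dy, block):
    """Summed absolute difference between a block of cur and a shifted block of ref."""
    total = 0
    for y in range(block):
        for x in range(block):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def full_search(ref, cur, bx, by, block, radius):
    """Test every integer displacement within +/-radius; return (dx, dy, sad)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip displacements that would read outside the reference frame.
            if not (0 <= by + dy and by + dy + block <= height):
                continue
            if not (0 <= bx + dx and bx + dx + block <= width):
                continue
            cost = sad(ref, cur, bx, by, dx, dy, block)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Synthetic frames: cur is ref shifted by (dx, dy) = (2, 1), clamped at the border.
ref = [[x * x + 10 * y for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
best = full_search(ref, cur, bx=2, by=2, block=3, radius=2)
```

Every (block, displacement) pair is evaluated independently of the others, which is what makes the full search a natural fit for a massively parallel device.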
Massanes, Francesc; Cadennes, Marie; Brankov, Jovan G
2011-07-01
In this paper we describe and evaluate a fast implementation of a classical block matching motion estimation algorithm for multiple Graphical Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) computing engine. The implemented block matching algorithm (BMA) uses summed absolute difference (SAD) error criterion and full grid search (FS) for finding optimal block displacement. In this evaluation we compared the execution time of a GPU and CPU implementation for images of various sizes, using integer and non-integer search grids. The results show that use of a GPU card can shorten computation time by a factor of 200 times for integer and 1000 times for a non-integer search grid. The additional speedup for non-integer search grid comes from the fact that GPU has built-in hardware for image interpolation. Further, when using multiple GPU cards, the presented evaluation shows the importance of the data splitting method across multiple cards, but an almost linear speedup with a number of cards is achievable. In addition we compared execution time of the proposed FS GPU implementation with two existing, highly optimized non-full grid search CPU based motion estimations methods, namely implementation of the Pyramidal Lucas Kanade Optical flow algorithm in OpenCV and Simplified Unsymmetrical multi-Hexagon search in H.264/AVC standard. In these comparisons, FS GPU implementation still showed modest improvement even though the computational complexity of FS GPU implementation is substantially higher than non-FS CPU implementation. We also demonstrated that for an image sequence of 720×480 pixels in resolution, commonly used in video surveillance, the proposed GPU implementation is sufficiently fast for real-time motion estimation at 30 frames-per-second using two NVIDIA C1060 Tesla GPU cards.
Syntactic Algorithms for Image Segmentation and a Special Computer Architecture for Image Processing
1977-12-01
Experimental Results of Image Segmentation from FLIR (Forward Looking Infrared) Images...4.3.1 Data Acquisition System of...of a picture. Concerning the computer processing time involved in image segmentation, the grey level histogram thresholding approach is quite fast...computer storage and the CPU time for each matching operation. The syntax-controlled method has the advantage of fast computer processing time for
Spectral element methods: Algorithms and architectures
NASA Technical Reports Server (NTRS)
Fischer, Paul; Ronquist, Einar M.; Dewey, Daniel; Patera, Anthony T.
1988-01-01
Spectral element methods are high-order weighted residual techniques for partial differential equations that combine the geometric flexibility of finite element methods with the rapid convergence of spectral techniques. Spectral element methods are described for the simulation of incompressible fluid flows, with special emphasis on implementation of spectral element techniques on medium-grained parallel processors. Two parallel architectures are considered: the first, a commercially available message-passing hypercube system; the second, a developmental reconfigurable architecture based on Geometry-Defining Processors. High parallel efficiency is obtained in hypercube spectral element computations, indicating that load balancing and communication issues can be successfully addressed by a high-order technique/medium-grained processor algorithm-architecture coupling.
Monte Carlo simulations on SIMD computer architectures
Burmester, C.P.; Gronsky, R.; Wille, L.T.
1992-03-01
Algorithmic considerations regarding the implementation of various materials science applications of the Monte Carlo technique to single instruction multiple data (SIMD) computer architectures are presented. In particular, implementation of the Ising model with nearest, next nearest, and long range screened Coulomb interactions on the SIMD architecture MasPar MP-1 (DEC mpp-12000) series of massively parallel computers is demonstrated. Methods of code development which optimize processor array use and minimize inter-processor communication are presented including lattice partitioning and the use of processor array spanning tree structures for data reduction. Both geometric and algorithmic parallel approaches are utilized. Benchmarks in terms of Monte Carlo updates per second for the MasPar architecture are presented and compared to values reported in the literature from comparable studies on other architectures.
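One common way to expose data parallelism in a nearest-neighbour Ising simulation is a checkerboard (two-colour) partition of the lattice. The sketch below assumes that scheme; the abstract mentions lattice partitioning but does not say which partition the authors used, and the next-nearest and long-range interactions are not covered. The function name and parameters are hypothetical, and the lattice size is assumed to be even so that the colouring stays consistent across the periodic boundary.

```python
import math
import random


def checkerboard_sweep(spins, beta, rng, coupling=1.0):
    """One Metropolis sweep of a periodic 2D nearest-neighbour Ising lattice.

    Sites are visited in two passes by colour ((row + col) % 2). Sites of
    one colour have only neighbours of the other colour, so within a pass
    every update is independent and could run on a separate processor.
    """
    size = len(spins)
    for colour in (0, 1):
        for row in range(size):
            for col in range(size):
                if (row + col) % 2 != colour:
                    continue
                neighbours = (
                    spins[(row - 1) % size][col] + spins[(row + 1) % size][col]
                    + spins[row][(col - 1) % size] + spins[row][(col + 1) % size]
                )
                # Energy change if this spin is flipped.
                delta_e = 2.0 * coupling * spins[row][col] * neighbours
                if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
                    spins[row][col] = -spins[row][col]
    return spins


rng = random.Random(0)
cold = checkerboard_sweep([[1] * 4 for _ in range(4)], beta=50.0, rng=rng)
hot = checkerboard_sweep([[1] * 4 for _ in range(4)], beta=0.0, rng=rng)
```

On a SIMD machine the inner two loops would be replaced by one lockstep update across all processors holding sites of the current colour; here they run sequentially for clarity.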
Efficient algorithm and systolic architecture for modular division
NASA Astrophysics Data System (ADS)
Chen, Chuanpeng; Qin, Zhongping
2011-06-01
A new efficient modular division algorithm suitable for systolic implementation and its systolic architecture is proposed in this article. With a new exit condition of while loop and a new updating method of a control variable, the new algorithm reduces the average of iteration numbers by more than 14.3% compared to the algorithm proposed by Chen, Bai and Chen. Based on the new algorithm, we design a fast systolic architecture with an optimised core computing cell. Compared to the architecture proposed by Chen, Bai and Chen, our systolic architecture has reduced the critical path delay by about 18% and the total computational time for one modular division by almost 30%, with the cost of about 1% more cells. Moreover, by the addition of a flag signal and three logic gates, the proposed systolic architecture can also perform Montgomery modular multiplication and a fast unified modular divider/multiplier is realised.
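For readers unfamiliar with the operation being accelerated, the following sketch shows a generic binary (shift-and-subtract) modular division. It is not the algorithm proposed in this article, nor the earlier one by Chen, Bai and Chen; it only illustrates the kind of while loop whose exit condition and iteration count such work aims to improve. The function name and its preconditions are assumptions stated in the docstring.

```python
def modular_divide(x, y, modulus):
    """Return z with y * z == x (mod modulus), using only shifts, adds, subtracts.

    Assumes modulus is odd, 0 < y < modulus, and gcd(y, modulus) == 1.
    """

    def halve(value):
        # Division by 2 modulo an odd modulus: make the value even first.
        return value // 2 if value % 2 == 0 else (value + modulus) // 2

    a, b = y, modulus
    u, v = x % modulus, 0
    # Loop invariants: a * x == u * y and b * x == v * y (mod modulus).
    while a != b:
        if a % 2 == 0:
            a, u = a // 2, halve(u)
        elif b % 2 == 0:
            b, v = b // 2, halve(v)
        elif a > b:
            a, u = (a - b) // 2, halve((u - v) % modulus)
        else:
            b, v = (b - a) // 2, halve((v - u) % modulus)
    # Here a == b == gcd(y, modulus) == 1, so x == u * y (mod modulus).
    return u


quotient = modular_divide(3, 5, 7)
```

Each iteration needs only a parity test, a comparison, a subtraction, and a shift, which is why loops of this form map well onto arrays of simple identical cells.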
1989-01-20
appropriate sensing coil. This computation should be done in (11), because it is very easy to integrate the exponential in (x,y). The result is to introduce...introduces another Fourier transform, so that the result is [equation (38): not recoverable from the extracted text]...have gotten good results with simulated data at lower frequencies. 5. Ring Source Inverse Model The inverse algorithm developed in
A hierarchical clustering algorithm for MIMD architecture.
Du, Zhihua; Lin, Feng
2004-12-01
Hierarchical clustering is the most often used method for grouping similar patterns of gene expression data. A fundamental problem with existing implementations of this clustering method is the inability to handle large data sets within reasonable time and memory resources. We propose a parallelized algorithm of hierarchical clustering to solve this problem. Our implementation on a multiple instruction multiple data (MIMD) architecture shows considerable reduction in computational time and inter-node communication overhead, especially for large data sets. We use the standard message passing library, Message Passing Interface (MPI), for any MIMD system.
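A rough sketch of how the closest-pair search in agglomerative clustering might be divided among workers is given below. Everything here is an assumption for illustration: the abstract does not describe the authors' data partitioning, the workers are simulated sequentially instead of running as MPI processes, single linkage is chosen arbitrarily, and the data points are one-dimensional.

```python
def local_closest_pair(dist, active, rows):
    """Closest pair (d, i, j) with i in this worker's rows and j > i."""
    best = None
    for i in rows:
        for j in active:
            if j > i and (best is None or dist[i][j] < best[0]):
                best = (dist[i][j], i, j)
    return best


def cluster(points, workers=2):
    """Single-linkage agglomerative clustering; returns the merge list."""
    n = len(points)
    dist = [[abs(p - q) for q in points] for p in points]
    active = list(range(n))
    merges = []
    while len(active) > 1:
        # Each simulated worker scans a slice of the active rows ...
        slices = [active[w::workers] for w in range(workers)]
        candidates = [local_closest_pair(dist, active, rows) for rows in slices]
        # ... and a global reduction picks the overall minimum.
        d, i, j = min(c for c in candidates if c is not None)
        merges.append((i, j, d))
        # Single linkage: distance to the merged cluster is the smaller one.
        for k in active:
            dist[i][k] = dist[k][i] = min(dist[i][k], dist[j][k])
        active.remove(j)
    return merges


merges = cluster([0.0, 1.0, 5.0, 6.5])
```

In a message-passing version, each worker would hold only its own rows of the distance matrix, and the final `min` would become a reduction across processes.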
Stereoscopic depth perception for robot vision: algorithms and architectures
Safranek, R.J.; Kak, A.C.
1983-01-01
The implementation of depth perception algorithms for computer vision is considered. In automated manufacturing, depth information is vital for tasks such as path planning and 3-d scene analysis. The presentation begins with a survey of computer algorithms for stereoscopic depth perception. The emphasis is on the Marr-Poggio paradigm of human stereo vision and its computer implementation. In addition, a stereo matching algorithm based on the relaxation labelling technique is examined. A computer architecture designed to efficiently implement stereo matching algorithms, an MIMD array interfaced to a global memory, is presented. 9 references.
Computing architecture for autonomous microgrids
Goldsmith, Steven Y.
2015-09-29
A computing architecture that facilitates autonomously controlling operations of a microgrid is described herein. A microgrid network includes numerous computing devices that execute intelligent agents, each of which is assigned to a particular entity (load, source, storage device, or switch) in the microgrid. The intelligent agents can execute in accordance with predefined protocols to collectively perform computations that facilitate uninterrupted control of the .
Parallel algorithms and architectures for the manipulator inertia matrix
Amin-Javaheri, M.
1989-01-01
Several parallel algorithms and architectures to compute the manipulator inertia matrix in real time are proposed. An O(N) and an O(log_2 N) parallel algorithm based upon recursive computation of the inertial parameters of sets of composite rigid bodies are formulated. One- and two-dimensional systolic architectures are presented to implement the O(N) parallel algorithm. A cube architecture is employed to implement the diagonal element of the inertia matrix in O(log_2 N) time and the upper off-diagonal elements in O(N) time. The resulting K_1 O(N) + K_2 O(log_2 N) parallel algorithm is more efficient for a cube network implementation. All the architectural configurations are based upon a VLSI Robotics Processor exploiting fine-grain parallelism. In evaluating all the architectural configurations, significant performance parameters such as I/O time and idle time due to processor synchronization as well as CPU utilization and on-chip memory size are fully included. The O(N) and O(log_2 N) parallel algorithms adhere to the precedence relationships among the processors. To achieve a higher speedup factor, however, parallel algorithms in conjunction with Non-Strict Computational Models are devised to relax interprocess precedence, and as a result, to decrease the effective computational delays. The effectiveness of the Non-strict Computational Algorithms is verified by computer simulations, based on a PUMA 560 robot manipulator. It is demonstrated that a combination of parallel algorithms and architectures results in a very effective approach to achieve real-time response for computing the manipulator inertia matrix.
Savannah River Site computing architecture
Not Available
1991-03-29
A computing architecture is a framework for making decisions about the implementation of computer technology and the supporting infrastructure. Because of the size, diversity, and amount of resources dedicated to computing at the Savannah River Site (SRS), there must be an overall strategic plan that can be followed by the thousands of site personnel who make decisions daily that directly affect the SRS computing environment and impact the site's production and business systems. This plan must address the following requirements: There must be SRS-wide standards for procurement or development of computing systems (hardware and software). The site computing organizations must develop systems that end users find easy to use. Systems must be put in place to support the primary function of site information workers. The developers of computer systems must be given tools that automate and speed up the development of information systems and applications based on computer technology. This document describes a proposal for a site-wide computing architecture that addresses the above requirements. In summary, this architecture is standards-based, data-driven, and workstation-oriented with larger systems being utilized for the delivery of needed information to users in a client-server relationship.
Nonlinear hierarchical substructural parallelism and computer architecture
NASA Technical Reports Server (NTRS)
Padovan, Joe
1989-01-01
Computer architecture is investigated in conjunction with the algorithmic structures of nonlinear finite-element analysis. To help set the stage for this goal, the development is undertaken by considering the wide-ranging needs associated with the analysis of rolling tires which possess the full range of kinematic, material and boundary condition induced nonlinearity in addition to gross and local cord-matrix material properties.
NASA Astrophysics Data System (ADS)
Basu, S.; Ganguly, S.; Nemani, R. R.; Mukhopadhyay, S.; Milesi, C.; Votava, P.; Michaelis, A.; Zhang, G.; Cook, B. D.; Saatchi, S. S.; Boyda, E.
2014-12-01
Accurate tree cover delineation is a useful instrument in the derivation of Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) satellite imagery data. Numerous algorithms have been designed to perform tree cover delineation in high to coarse resolution satellite imagery, but most of them do not scale to terabytes of data, typical in these VHR datasets. In this paper, we present an automated probabilistic framework for the segmentation and classification of 1-m VHR data as obtained from the National Agriculture Imagery Program (NAIP) for deriving tree cover estimates for the whole of Continental United States, using a High Performance Computing Architecture. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field (CRF), which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by incorporating expert knowledge through the relabeling of misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the state of California, which covers a total of 11,095 NAIP tiles and spans a total geographical area of 163,696 sq. miles. Our framework produced correct detection rates of around 85% for fragmented forests and 70% for urban tree cover areas, with false positive rates lower than 3% for both regions. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR high-resolution canopy height model show the effectiveness of our algorithm in generating accurate high-resolution tree cover maps.
Specialized computer architectures for computational aerodynamics
NASA Technical Reports Server (NTRS)
Stevenson, D. K.
1978-01-01
In recent years, computational fluid dynamics has made significant progress in modelling aerodynamic phenomena. Currently, one of the major barriers to future development lies in the compute-intensive nature of the numerical formulations and the relatively high cost of performing these computations on commercially available general purpose computers, a cost high with respect to dollar expenditure and/or elapsed time. Today's computing technology will support a program designed to create specialized computing facilities to be dedicated to the important problems of computational aerodynamics. One of the still unresolved questions is the organization of the computing components in such a facility. The characteristics of fluid dynamic problems which will have significant impact on the choice of computer architecture for a specialized facility are reviewed.
Wireless Computing Architecture
2009-07-01
mechanisms are relevant to a broad spectrum of applications, but are particularly important to data broadcast in wireless distributed computing...significantly improve applications where reliable data broadcast is required. For example, unmanned aerial vehicles (UAVs) may use Rainbow to distribute ...68-74. 8. Dean, J., Ghemawat, S., “MapReduce: simplified data processing on large clusters”, Communications of the ACM, 51, 1, 2008, pp. 107-113
Wireless Computing Architecture II
2010-11-01
responsible for running computation tasks as well as storing HDFS data blocks. This arrangement is consistent with that of Amazon Elastic MapReduce clusters ...unpredictable application demands and large data sets. For example, application demands may change in response to sudden weather shifts or “surprise...comparing TCP throughput distributions for model-generated traces against those for actual traces randomly sampled from field data. Our modeling
A Parallel Rendering Algorithm for MIMD Architectures
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.; Orloff, Tobias
1991-01-01
Applications such as animation and scientific visualization demand high performance rendering of complex three dimensional scenes. To deliver the necessary rendering rates, highly parallel hardware architectures are required. The challenge is then to design algorithms and software which effectively use the hardware parallelism. A rendering algorithm targeted to distributed memory MIMD architectures is described. For maximum performance, the algorithm exploits both object-level and pixel-level parallelism. The behavior of the algorithm is examined both analytically and experimentally. Its performance for large numbers of processors is found to be limited primarily by communication overheads. An experimental implementation for the Intel iPSC/860 shows increasing performance from 1 to 128 processors across a wide range of scene complexities. It is shown that minimal modifications to the algorithm will adapt it for use on shared memory architectures as well.
VLSI Architectures for Computing DFT's
NASA Technical Reports Server (NTRS)
Truong, T. K.; Chang, J. J.; Hsu, I. S.; Reed, I. S.; Pei, D. Y.
1986-01-01
Simplifications result from use of residue Fermat number systems. System of finite arithmetic over residue Fermat number systems enables calculation of discrete Fourier transform (DFT) of series of complex numbers with reduced number of multiplications. Computer architectures based on approach suitable for design of very-large-scale integrated (VLSI) circuits for computing DFT's. General approach not limited to DFT's; applicable to decoding of error-correcting codes and other transform calculations. System readily implemented in VLSI.
Optical linear algebra processors - Architectures and algorithms
NASA Technical Reports Server (NTRS)
Casasent, David
1986-01-01
Attention is given to the component design and optical configuration features of a generic optical linear algebra processor (OLAP) architecture, as well as the large number of OLAP architectures, number representations, algorithms and applications encountered in current literature. Number-representation issues associated with bipolar and complex-valued data representations, high-accuracy (including floating point) performance, and the base or radix to be employed, are discussed, together with case studies on a space-integrating frequency-multiplexed architecture and a hybrid space-integrating and time-integrating multichannel architecture.
Efficient tree codes on SIMD computer architectures
NASA Astrophysics Data System (ADS)
Olson, Kevin M.
1996-11-01
This paper describes changes made to a previous implementation of an N-body tree code developed for a fine-grained, SIMD computer architecture. These changes include (1) switching from a balanced binary tree to a balanced oct tree, (2) addition of quadrupole corrections, and (3) having the particles search the tree in groups rather than individually. An algorithm for limiting errors is also discussed. In aggregate, these changes have led to a performance increase of over a factor of 10 compared to the previous code. For problems several times larger than the processor array, the code now achieves performance levels of ~1 Gflop on the Maspar MP-2 or roughly 20% of the quoted peak performance of this machine. This percentage is competitive with other parallel implementations of tree codes on MIMD architectures. This is significant, considering the low relative cost of SIMD architectures.
Fast semivariogram computation using FPGA architectures
NASA Astrophysics Data System (ADS)
Lagadapati, Yamuna; Shirvaikar, Mukul; Dong, Xuanliang
2015-02-01
The semivariogram is a statistical measure of the spatial distribution of data and is based on Markov Random Fields (MRFs). Semivariogram analysis is a computationally intensive algorithm that has typically seen applications in the geosciences and remote sensing areas. Recently, applications in the area of medical imaging have been investigated, resulting in the need for efficient real time implementation of the algorithm. The semivariogram is a plot of semivariances for different lag distances between pixels. A semivariance, γ(h), is defined as half of the expected squared difference of pixel values between any two data locations with a lag distance of h. Due to the need to examine each pair of pixels in the image or sub-image being processed, the base algorithm complexity for an image window with n pixels is O(n^2). Field Programmable Gate Arrays (FPGAs) are an attractive solution for such demanding applications due to their parallel processing capability. FPGAs also tend to operate at relatively modest clock rates measured in a few hundreds of megahertz, but they can perform tens of thousands of calculations per clock cycle while operating in the low range of power. This paper presents a technique for the fast computation of the semivariogram using two custom FPGA architectures. The design consists of several modules dedicated to the constituent computational tasks. A modular architecture approach is chosen to allow for replication of processing units. This allows for high throughput due to concurrent processing of pixel pairs. The current implementation is focused on isotropic semivariogram computations only. Anisotropic semivariogram implementation is anticipated to be an extension of the current architecture, ostensibly based on refinements to the current modules. The algorithm is benchmarked using VHDL on a Xilinx XUPV5-LX110T development kit, which utilizes the Virtex5 FPGA. Medical image data from MRI scans are utilized for the experiments.
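The definition of γ(h) and the O(n^2) pairwise cost described in this abstract can be illustrated with a minimal software sketch. This is not the FPGA design from the paper; the function name `semivariogram`, the choice to bin pixel pairs by rounded Euclidean distance, and the list-of-lists window format are all assumptions made for illustration.

```python
from collections import defaultdict
from math import hypot


def semivariogram(window):
    """Isotropic empirical semivariogram of a 2-D window (list of rows).

    Every unordered pixel pair is visited once, so the cost is O(n^2) in
    the number of pixels n. Pairs are binned by Euclidean lag distance
    rounded to the nearest integer (an illustrative assumption).
    """
    pixels = [(r, c, v) for r, row in enumerate(window) for c, v in enumerate(row)]
    squared_diff_sum = defaultdict(float)
    pair_count = defaultdict(int)
    for i in range(len(pixels)):
        r1, c1, v1 = pixels[i]
        for j in range(i + 1, len(pixels)):
            r2, c2, v2 = pixels[j]
            lag = round(hypot(r1 - r2, c1 - c2))
            squared_diff_sum[lag] += (v1 - v2) ** 2
            pair_count[lag] += 1
    # gamma(h) = half of the mean squared difference at lag h
    return {h: squared_diff_sum[h] / (2 * pair_count[h]) for h in sorted(pair_count)}


if __name__ == "__main__":
    print(semivariogram([[0, 1, 2, 3]]))
```

Each pixel pair is independent of every other pair, which is what makes the computation a natural fit for replicated processing units working concurrently.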
An S_{N} Algorithm for Modern Architectures
Baker, Randal Scott
2016-08-29
LANL discrete ordinates transport packages are required to perform large, computationally intensive time-dependent calculations on massively parallel architectures, where even a single such calculation may need many months to complete. While KBA methods scale out well to very large numbers of compute nodes, we are limited by practical constraints on the number of such nodes we can actually apply to any given calculation. Instead, we describe a modified KBA algorithm that allows realization of the reductions in solution time offered by both the current, and future, architectural changes within a compute node.
Innovative architectures for dense multi-microprocessor computers
NASA Technical Reports Server (NTRS)
Donaldson, Thomas; Doty, Karl; Engle, Steven W.; Larson, Robert E.; O'Reilly, John G.
1988-01-01
The results of a Phase I Small Business Innovative Research (SBIR) project performed for the NASA Langley Computational Structural Mechanics Group are described. The project resulted in the identification of a family of chordal-ring interconnection architectures with excellent potential to serve as the basis for new multimicroprocessor (MMP) computers. The paper presents examples of how computational algorithms from structural mechanics can be efficiently implemented on the chordal-ring architecture.
Gropp, William D.
2014-06-23
With the coming end of Moore's law, it has become essential to develop new algorithms and techniques that can provide the performance needed by demanding computational science applications, especially those that are part of the DOE science mission. This work was part of a multi-institution, multi-investigator project that explored several approaches to develop algorithms that would be effective at the extreme scales and with the complex processor architectures that are expected at the end of this decade. The work by this group developed new performance models that have already helped guide the development of highly scalable versions of an algebraic multigrid solver, new programming approaches designed to support numerical algorithms on heterogeneous architectures, and a new, more scalable version of conjugate gradient, an important algorithm in the solution of very large linear systems of equations.
Parallel Architectures and Parallel Algorithms for Integrated Vision Systems. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Choudhary, Alok Nidhi
1989-01-01
Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform a high level application (e.g., object recognition). An IVS normally involves algorithms from low level, intermediate level, and high level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.
A heterogeneous hierarchical architecture for real-time computing
Skroch, D.A.; Fornaro, R.J.
1988-12-01
The need for high-speed data acquisition and control algorithms has prompted continued research in the area of multiprocessor systems and related programming techniques. The result presented here is a unique hardware and software architecture for high-speed real-time computer systems. The implementation of a prototype of this architecture has required the integration of architecture, operating systems and programming languages into a cohesive unit. This report describes a Heterogeneous Hierarchical Architecture for Real-Time (H² ART) and system software for program loading and interprocessor communication.
Architectural Implications for Spatial Object Association Algorithms
Kumar, V S; Kurc, T; Saltz, J; Abdulla, G; Kohn, S R; Matarazzo, C
2009-01-29
Spatial object association, also referred to as cross-match of spatial datasets, is the problem of identifying and comparing objects in two or more datasets based on their positions in a common spatial coordinate system. In this work, we evaluate two crossmatch algorithms that are used for astronomical sky surveys, on the following database system architecture configurations: (1) Netezza Performance Server R, a parallel database system with active disk style processing capabilities, (2) MySQL Cluster, a high-throughput network database system, and (3) a hybrid configuration consisting of a collection of independent database system instances with data replication support. Our evaluation provides insights about how architectural characteristics of these systems affect the performance of the spatial crossmatch algorithms. We conducted our study using real use-case scenarios borrowed from a large-scale astronomy application known as the Large Synoptic Survey Telescope (LSST).
NASA Technical Reports Server (NTRS)
Metcalfe, A. G.; Bodenheimer, R. E.
1976-01-01
A parallel algorithm for counting the number of logic-1 elements in a binary array or image, developed during preliminary investigation of the Tse concept, is described. The counting algorithm is implemented using a basic combinational structure. Modifications which improve the efficiency of the basic structure are also presented. A programmable Tse computer structure is proposed, along with a hardware control unit, Tse instruction set, and software program for execution of the counting algorithm. Finally, a comparison is made between the different structures in terms of their more important characteristics.
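The abstract does not specify the combinational structure used. As a rough software analogue, counting the 1-elements of a binary array can be written as a pairwise reduction tree in which every addition at a given level is independent and could run in parallel. The function name `count_ones_tree` and the adder-tree layout are assumptions for illustration, not the Tse design.

```python
def count_ones_tree(bits):
    """Count the 1-elements of a binary sequence with a pairwise reduction tree.

    Each pass of the loop models one level of an adder tree: all sums at
    that level are independent, so the depth is about log2(len(bits)).
    """
    level = list(bits)
    if not level:
        return 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad odd-length levels with a zero input
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


if __name__ == "__main__":
    image = [[1, 0, 1], [0, 0, 1], [1, 1, 0]]
    flat = [bit for row in image for bit in row]
    print(count_ones_tree(flat))
```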
Savannah River Site computing architecture migration guide
Not Available
1991-07-30
The SRS Computing Architecture is a vision statement for site computing which enumerates the strategies which will guide SRS computing efforts for the 1990s. Each strategy is supported by a number of feature statements which clarify the strategy by providing additional detail. Since it is a strategic planning document, the Architecture has sitewide applicability and endorsement but does not attempt to specify implementation details. It does, however, specify that a document will be developed to guide the migration from the current site environment to that envisioned by the new architecture. The goal of this document, the SRS Computing Architecture Migration Guide, is to identify specific strategic and tactical tasks which would have to be completed to fully implement the architectural vision for site computing as well as a recommended sequence and timeframe for addressing these tasks. It takes into account the expected availability of technology, the existing installed base, and interdependencies among architectural components and objectives.
Algorithmes et architectures pour ordinateurs quantiques supraconducteurs
NASA Astrophysics Data System (ADS)
Blais, A.
2003-09-01
Algorithms and architectures for superconducting quantum computers. Since its formulation, information theory has been based, implicitly, on the laws of classical physics. Such a formulation is, however, incomplete because it does not take quantum reality into account. Over the last twenty years, the extension of information theory to include quantum effects has attracted growing interest. The practical realization of a quantum information processing system, a quantum computer, nevertheless presents many challenges. In this book, we address various aspects of these challenges. We start by presenting algorithmic concepts such as the optimization of quantum computations and geometric quantum computation. We then consider various designs and aspects of qubits based on Josephson junctions. In particular, an original approach to the interaction between superconducting qubits is presented. This approach is very general since it can be applied to various qubit designs. Finally, we consider the read-out of superconducting flux qubits. The detector suggested here has the advantage that it can be decoupled from the qubit when no measurement is in progress.
Algorithms on ensemble quantum computers.
Boykin, P Oscar; Mor, Tal; Roychowdhury, Vwani; Vatan, Farrokh
2010-06-01
In ensemble (or bulk) quantum computation, all computations are performed on an ensemble of computers rather than on a single computer. Measurements of qubits in an individual computer cannot be performed; instead, only expectation values (over the complete ensemble of computers) can be measured. As a result of this limitation on the model of computation, many algorithms cannot be processed directly on such computers, and must be modified, as the common strategy of delaying the measurements usually does not resolve this ensemble-measurement problem. Here we present several new strategies for resolving this problem. Based on these strategies we provide new versions of some of the most important quantum algorithms, versions that are suitable for implementing on ensemble quantum computers, e.g., on liquid NMR quantum computers. These algorithms are Shor's factorization algorithm, Grover's search algorithm (with several marked items), and an algorithm for quantum fault-tolerant computation. The first two algorithms are simply modified using randomizing and sorting strategies. For the last algorithm, we develop a classical-quantum hybrid strategy for removing measurements. We use it to present a novel quantum fault-tolerant scheme. More explicitly, we present schemes for fault-tolerant measurement-free implementation of Toffoli and σ_z^{1/4}, as these operations cannot be implemented "bitwise", and their standard fault-tolerant implementations require measurement.
A biconjugate gradient type algorithm on massively parallel architectures
NASA Technical Reports Server (NTRS)
Freund, Roland W.; Hochbruck, Marlis
1991-01-01
The biconjugate gradient (BCG) method is the natural generalization of the classical conjugate gradient algorithm for Hermitian positive definite matrices to general non-Hermitian linear systems. Unfortunately, the original BCG algorithm is susceptible to possible breakdowns and numerical instabilities. Recently, Freund and Nachtigal have proposed a novel BCG type approach, the quasi-minimal residual method (QMR), which overcomes the problems of BCG. Here, an implementation is presented of QMR based on an s-step version of the nonsymmetric look-ahead Lanczos algorithm. The main feature of the s-step Lanczos algorithm is that, in general, all inner products, except for one, can be computed in parallel at the end of each block; this is unlike the standard Lanczos process, where inner products are generated sequentially. The resulting implementation of QMR is particularly attractive on massively parallel SIMD architectures, such as the Connection Machine.
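For orientation, the plain BCG iteration that QMR improves upon can be sketched as follows. This is the textbook unpreconditioned method for real matrices, not the s-step look-ahead QMR implementation of the paper; the helper names (`matvec`, `dot`, `bicg`), the tolerance, and the explicit breakdown check are assumptions for illustration. Note that each iteration needs two inner products that depend on the previous update, which is the sequential bottleneck the abstract refers to.

```python
def matvec(A, v):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bicg(A, b, tol=1e-10, max_iter=100):
    """Unpreconditioned biconjugate gradient for a real square system A x = b."""
    At = [list(col) for col in zip(*A)]  # transpose of A
    x = [0.0] * len(b)
    r = [float(bi) for bi in b]          # residual for the start vector x = 0
    rt = r[:]                            # shadow residual
    p, pt = r[:], rt[:]
    rho = dot(rt, r)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            return x
        q, qt = matvec(A, p), matvec(At, pt)
        denom = dot(pt, q)
        if rho == 0.0 or denom == 0.0:
            # The breakdown that motivates look-ahead variants such as QMR.
            raise ArithmeticError("BiCG breakdown")
        alpha = rho / denom
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rt = [ri - alpha * qi for ri, qi in zip(rt, qt)]
        rho_new = dot(rt, r)
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        pt = [ri + beta * pi for ri, pi in zip(rt, pt)]
        rho = rho_new
    raise ArithmeticError("BiCG did not converge within max_iter")


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]]
    print(bicg(A, [1.0, 2.0, 3.0]))
```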
Computational Controls Workstation: Algorithms and hardware
NASA Technical Reports Server (NTRS)
Venugopal, R.; Kumar, M.
1993-01-01
The Computational Controls Workstation provides an integrated environment for the modeling, simulation, and analysis of Space Station dynamics and control. Using highly efficient computational algorithms combined with a fast parallel processing architecture, the workstation makes real-time simulation of flexible body models of the Space Station possible. A consistent, user-friendly interface and state-of-the-art post-processing options are combined with powerful analysis tools and model databases to provide users with a complete environment for Space Station dynamics and control analysis. The software tools available include a solid modeler, graphical data entry tool, O(n) algorithm-based multi-flexible body simulation, and 2D/3D post-processors. This paper describes the architecture of the workstation while a companion paper describes performance and user perspectives.
Grammar Rules as Computer Algorithms.
ERIC Educational Resources Information Center
Rieber, Lloyd
1992-01-01
One college writing teacher engaged his class in the revision of a computer program to check grammar, focusing on improvement of the algorithms for identifying inappropriate uses of the passive voice. Process and problems of constructing new algorithms, effects on student writing, and other algorithm applications are discussed. (MSE)
Parallel Computing Strategies for Irregular Algorithms
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)
2002-01-01
Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
Electro-Optic Computing Architectures. Volume I
1998-02-01
The objective of the Electro-Optic Computing Architecture (EOCA) program was to develop multi-function electro-optic interfaces and optical...interconnect units to enhance the performance of parallel processor systems and form the building blocks for future electro-optic computing architectures...Specifically, three multi-function interface modules were targeted for development - an Electro-Optic Interface (EOI), an Optical Interconnection Unit (OW
Fibonacci Numbers and Computer Algorithms.
ERIC Educational Resources Information Center
Atkins, John; Geist, Robert
1987-01-01
The Fibonacci Sequence describes a vast array of phenomena from nature. Computer scientists have discovered and used many algorithms which can be classified as applications of Fibonacci's sequence. In this article, several of these applications are considered. (PK)
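The abstract does not list the applications the article covers. One well-known algorithm built on the sequence is Fibonacci search, which locates a value in a sorted array using only additions and subtractions of Fibonacci numbers to pick probe positions. The sketch below is offered as an example of the genre, not as content taken from the article; the function name `fibonacci_search` is an assumption.

```python
def fibonacci_search(arr, x):
    """Return the index of x in the sorted list arr, or -1 if x is absent."""
    n = len(arr)
    fib2, fib1 = 0, 1          # F(k-2), F(k-1)
    fib = fib2 + fib1          # F(k): smallest Fibonacci number >= n
    while fib < n:
        fib2, fib1 = fib1, fib
        fib = fib2 + fib1
    offset = -1                # everything at or before offset is ruled out
    while fib > 1:
        i = min(offset + fib2, n - 1)
        if arr[i] < x:
            # Discard the left part: step the Fibonacci triple down by one.
            fib, fib1, fib2 = fib1, fib2, fib1 - fib2
            offset = i
        elif arr[i] > x:
            # Discard the right part: step the triple down by two.
            fib, fib1, fib2 = fib2, fib1 - fib2, 2 * fib2 - fib1
        else:
            return i
    if fib1 and offset + 1 < n and arr[offset + 1] == x:
        return offset + 1
    return -1


if __name__ == "__main__":
    data = [10, 22, 35, 40, 45, 50, 80, 82, 85, 90, 100]
    print(fibonacci_search(data, 85))
```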
A computer architecture for intelligent machines
NASA Technical Reports Server (NTRS)
Lefebvre, D. R.; Saridis, G. N.
1991-01-01
The Theory of Intelligent Machines proposes a hierarchical organization for the functions of an autonomous robot based on the Principle of Increasing Precision With Decreasing Intelligence. An analytic formulation of this theory using information-theoretic measures of uncertainty for each level of the intelligent machine has been developed in recent years. A computer architecture that implements the lower two levels of the intelligent machine is presented. The architecture supports an event-driven programming paradigm that is independent of the underlying computer architecture and operating system. Details of Execution Level controllers for motion and vision systems are addressed, as well as the Petri net transducer software used to implement Coordination Level functions. Extensions to UNIX and VxWorks operating systems which enable the development of a heterogeneous, distributed application are described. A case study illustrates how this computer architecture integrates real-time and higher-level control of manipulator and vision systems.
Switching from Computer to Microcomputer Architecture Education
ERIC Educational Resources Information Center
Bolanakis, Dimosthenis E.; Kotsis, Konstantinos T.; Laopoulos, Theodore
2010-01-01
In the last decades, the technological and scientific evolution of the computing discipline has been widely affecting research in software engineering education, which nowadays advocates more enlightened and liberal ideas. This article reviews cross-disciplinary research on a computer architecture class in consideration of its switching to…
THE COMPUTER AND THE ARCHITECTURAL PROFESSION.
ERIC Educational Resources Information Center
HAVILAND, DAVID S.
THE ROLE OF ADVANCING TECHNOLOGY IN THE FIELD OF ARCHITECTURE IS DISCUSSED IN THIS REPORT. PROBLEMS IN COMMUNICATION AND THE DESIGN PROCESS ARE IDENTIFIED. ADVANTAGES AND DISADVANTAGES OF COMPUTERS ARE MENTIONED IN RELATION TO MAN AND MACHINE INTERACTION. PRESENT AND FUTURE IMPLICATIONS OF COMPUTER USAGE ARE IDENTIFIED AND DISCUSSED WITH RESPECT…
DSP algorithms in FPGA: proposition of a new architecture
NASA Astrophysics Data System (ADS)
Kolasinski, Piotr; Zabolotny, Wojciech
2008-01-01
This paper presents a new reconfigurable architecture created in FPGA which is optimized for DSP algorithms like digital filters or digital transforms. The architecture tries to combine advantages of typical architectures like DSP processors and datapath architecture, while avoiding their drawbacks. The architecture is built from blocks called Operational Units (OU). Each Operational Unit contains the Control Unit (CU), which controls its operation. The Operational Units may operate in parallel, which shortens the processing time. This structure is also highly flexible, because all OUs may operate independently, executing their own programs. User may customize connections between units and modify architecture by adding new modules.
Algorithm and Architecture Independent Benchmarking with SEAK
Tallent, Nathan R.; Manzano Franco, Joseph B.; Gawande, Nitin A.; Kang, Seung-Hwa; Kerbyson, Darren J.; Hoisie, Adolfy; Cross, Joseph
2016-05-23
Many applications of high performance embedded computing are limited by performance or power bottlenecks. We have designed the Suite for Embedded Applications & Kernels (SEAK), a new benchmark suite, (a) to capture these bottlenecks in a way that encourages creative solutions; and (b) to facilitate rigorous, objective, end-user evaluation for their solutions. To avoid biasing solutions toward existing algorithms, SEAK benchmarks use a mission-centric (abstracted from a particular algorithm) and goal-oriented (functional) specification. To encourage solutions that are any combination of software or hardware, we use an end-user black-box evaluation that can capture tradeoffs between performance, power, accuracy, size, and weight. The tradeoffs are especially informative for procurement decisions. We call our benchmarks future proof because each mission-centric interface and evaluation remains useful despite shifting algorithmic preferences. It is challenging to create both concise and precise goal-oriented specifications for mission-centric problems. This paper describes the SEAK benchmark suite and presents an evaluation of sample solutions that highlights power and performance tradeoffs.
FFT Computation with Systolic Arrays, A New Architecture
NASA Technical Reports Server (NTRS)
Boriakoff, Valentin
1994-01-01
The use of the Cooley-Tukey algorithm for computing the 1-D FFT lends itself to a particular matrix factorization which suggests direct implementation by linearly-connected systolic arrays. Here we present a new systolic architecture that embodies this algorithm. This implementation requires a smaller number of processors and a smaller number of memory cells than other recent implementations, as well as having all the advantages of systolic arrays. For the implementation of the decimation-in-frequency case, word-serial data input allows continuous real-time operation without the need of a serial-to-parallel conversion device. No control or data stream switching is necessary. Computer simulation of this architecture was done in the context of a 1024-point DFT with a fixed point processor, and CMOS processor implementation has started.
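The decimation-in-frequency form of the Cooley-Tukey algorithm mentioned in this abstract can be sketched in software as below. This is a plain recursive radix-2 version in floating point, meant only to show the butterfly structure; it does not model the systolic array or the fixed-point processor of the paper, and the function name `fft_dif` is an assumption.

```python
import cmath


def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    half = n // 2
    # Butterfly stage: sums feed the even-indexed outputs, twiddled
    # differences feed the odd-indexed outputs.
    sums = [x[k] + x[k + half] for k in range(half)]
    diffs = [(x[k] - x[k + half]) * cmath.exp(-2j * cmath.pi * k / n)
             for k in range(half)]
    out = [0j] * n
    out[0::2] = fft_dif(sums)
    out[1::2] = fft_dif(diffs)
    return out


if __name__ == "__main__":
    print(fft_dif([1, 2, 3, 4, 0, 0, 0, 0]))
```

In the decimation-in-frequency ordering the input is consumed in natural order and the reordering happens on the output side, which is consistent with the abstract's remark that word-serial input needs no serial-to-parallel conversion.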
A fully programmable computing architecture for medical ultrasound machines.
Schneider, Fabio Kurt; Agarwal, Anup; Yoo, Yang Mo; Fukuoka, Tetsuya; Kim, Yongmin
2010-03-01
Application-specific ICs have been traditionally used to support the high computational and data rate requirements in medical ultrasound systems, particularly in receive beamforming. Utilizing the previously developed efficient front-end algorithms, in this paper, we present a simple programmable computing architecture, consisting of a field-programmable gate array (FPGA) and a digital signal processor (DSP), to support core ultrasound signal processing. It was found that 97.3% and 51.8% of the FPGA and DSP resources are, respectively, needed to support all the front-end and back-end processing for B-mode imaging with 64 channels and 120 scanlines per frame at 30 frames/s. These results indicate that this programmable architecture can meet the requirements of low- and medium-level ultrasound machines while providing a flexible platform for supporting the development and deployment of new algorithms and emerging clinical applications.
Strategies for concurrent processing of complex algorithms in data driven architectures
NASA Technical Reports Server (NTRS)
Stoughton, John W.; Mielke, Roland R.
1987-01-01
The results of ongoing research directed at developing a graph theoretical model for describing data and control flow associated with the execution of large grained algorithms in a spatially distributed computer environment are presented. This model is identified by the acronym ATAMM (Algorithm/Architecture Mapping Model). The purpose of such a model is to provide a basis for establishing rules for relating an algorithm to its execution in a multiprocessor environment. Specifications derived from the model lead directly to the description of a data flow architecture which is a consequence of the inherent behavior of the data and control flow described by the model. The purpose of the ATAMM based architecture is to optimize computational concurrency in the multiprocessor environment and to provide an analytical basis for performance evaluation. The ATAMM model and architecture specifications are demonstrated on a prototype system for concept validation.
A VLSI architecture for simplified arithmetic Fourier transform algorithm
NASA Technical Reports Server (NTRS)
Reed, Irving S.; Shih, Ming-Tang; Truong, T. K.; Hendon, E.; Tufts, D. W.
1992-01-01
The arithmetic Fourier transform (AFT) is a number-theoretic approach to Fourier analysis which has been shown to perform competitively with the classical FFT in terms of accuracy, complexity, and speed. Theorems developed in a previous paper for the AFT algorithm are used here to derive the original AFT algorithm which Bruns found in 1903. This is shown to yield an algorithm of less complexity and of improved performance over certain recent AFT algorithms. A VLSI architecture is suggested for this simplified AFT algorithm. This architecture uses a butterfly structure which reduces the number of additions by 25 percent of that used in the direct method.
Architectural Implications of Cloud Computing
2011-10-24
Mellon University Final Thoughts 1 Cloud Computing is in essence an economic model • It is a different way to acquire and manage IT resources...Cloud (EC2): http://aws.amazon.com/ec2/ • Amazon Simple Storage Solution (S3): http://aws.amazon.com/s3/ • Eucalyptus Systems: http
Electromagnetic physics models for parallel computing architectures
Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.
2016-11-21
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Finally, the results of preliminary performance evaluation and physics validation are presented as well.
Electromagnetic Physics Models for Parallel Computing Architectures
NASA Astrophysics Data System (ADS)
Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.
2016-10-01
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.
Advanced high-performance computer system architectures
NASA Astrophysics Data System (ADS)
Vinogradov, V. I.
2007-02-01
The convergence of computer systems and communication technologies is driving a move to switched high-performance modular system architectures based on high-speed switched interconnections. Multi-core processors are becoming a more promising route to high-performance systems, and traditional parallel bus system architectures (VME/VXI, cPCI/PXI) are moving to new higher-speed serial switched interconnections. The fundamentals of system architecture development are a compact modular component strategy, low-power processors, new high-speed serial interface chips on the board, and high-speed switched fabric for SAN architectures. An overview is given of advanced modular concepts and new international standards for developing high-performance embedded and compact modular systems for real-time applications.
Associative Algorithms for Computational Creativity
ERIC Educational Resources Information Center
Varshney, Lav R.; Wang, Jun; Varshney, Kush R.
2016-01-01
Computational creativity, the generation of new, unimagined ideas or artifacts by a machine that are deemed creative by people, can be applied in the culinary domain to create novel and flavorful dishes. In fact, we have done so successfully using a combinatorial algorithm for recipe generation combined with statistical models for recipe ranking…
Algorithms Bridging Quantum Computation and Chemistry
NASA Astrophysics Data System (ADS)
McClean, Jarrod Ryan
The design of new materials and chemicals derived entirely from computation has long been a goal of computational chemistry, and the governing equation whose solution would permit this dream is known. Unfortunately, the exact solution to this equation has been far too expensive and clever approximations fail in critical situations. Quantum computers offer a novel solution to this problem. In this work, we develop not only new algorithms to use quantum computers to study hard problems in chemistry, but also explore how such algorithms can help us to better understand and improve our traditional approaches. In particular, we first introduce a new method, the variational quantum eigensolver, which is designed to maximally utilize the quantum resources available in a device to solve chemical problems. We apply this method in a real quantum photonic device in the lab to study the dissociation of the helium hydride (HeH+) molecule. We also enhance this methodology with architecture specific optimizations on ion trap computers and show how linear-scaling techniques from traditional quantum chemistry can be used to improve the outlook of similar algorithms on quantum computers. We then show how studying quantum algorithms such as these can be used to understand and enhance the development of classical algorithms. In particular we use a tool from adiabatic quantum computation, Feynman's Clock, to develop a new discrete time variational principle and further establish a connection between real-time quantum dynamics and ground state eigenvalue problems. We use these tools to develop two novel parallel-in-time quantum algorithms that outperform competitive algorithms as well as offer new insights into the connection between the fermion sign problem of ground states and the dynamical sign problem of quantum dynamics. Finally we use insights gained in the study of quantum circuits to explore a general notion of sparsity in many-body quantum systems. In particular we use
Parallel architectures for computing cyclic convolutions
NASA Technical Reports Server (NTRS)
Yeh, C.-S.; Reed, I. S.; Truong, T. K.
1983-01-01
In the paper two parallel architectural structures are developed to compute one-dimensional cyclic convolutions. The first structure is based on the Chinese remainder theorem and Kung's pipelined array. The second structure is a direct mapping from the mathematical definition of a cyclic convolution to a computational architecture. To compute a d-point cyclic convolution the first structure needs d/2 inner product cells, while the second structure and Kung's linear array require d cells. However, to compute a cyclic convolution, the second structure requires less time than both the first structure and Kung's linear array. Another application of the second structure is to multiply a Toeplitz matrix by a vector. A table is listed to compare these two structures and Kung's linear array. Both structures are simple and regular and are therefore suitable for VLSI implementation.
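As a hedged illustration of where the second structure starts, the mathematical definition of a d-point cyclic convolution can be written out directly. The sketch below is sequential Python, not the parallel architecture of the paper; the function names are hypothetical, and the circulant matrix it builds is only one special case of a Toeplitz matrix.

```python
def cyclic_convolution(x, h):
    """Return y with y[k] = sum over n of x[n] * h[(k - n) mod d]."""
    d = len(x)
    if len(h) != d:
        raise ValueError("both sequences must have the same length d")
    return [sum(x[n] * h[(k - n) % d] for n in range(d)) for k in range(d)]


def circulant_matvec(h, x):
    """Multiply the circulant matrix built from h by the vector x.

    Entry (row, col) of the matrix is h[(row - col) mod d], so the matrix is
    constant along each diagonal (Toeplitz) and the product equals the
    cyclic convolution of x and h.
    """
    d = len(h)
    matrix = [[h[(row - col) % d] for col in range(d)] for row in range(d)]
    return [sum(matrix[row][col] * x[col] for col in range(d)) for row in range(d)]


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    h = [1, 0, 2, 0]
    print(cyclic_convolution(x, h))
    print(circulant_matvec(h, x))
```

Each output sample is an independent inner product of length d, which is the property a cell-per-output hardware mapping would rely on.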
Strategies for concurrent processing of complex algorithms in data driven architectures
NASA Technical Reports Server (NTRS)
Stoughton, John W.; Mielke, Roland R.
1988-01-01
The purpose is to document research to develop strategies for concurrent processing of complex algorithms in data driven architectures. The problem domain consists of decision-free algorithms having large-grained, computationally complex primitive operations. Such algorithms are often found in signal processing and control applications. The anticipated multiprocessor environment is a data flow architecture containing between two and twenty computing elements. Each computing element is a processor that has local program memory and communicates with a common global data memory. A new graph theoretic model called ATAMM, which establishes rules for relating a decomposed algorithm to its execution in a data flow architecture, is presented. The ATAMM model is used to determine strategies to achieve optimum time performance and to develop a system diagnostic software tool. In addition, preliminary work on a new multiprocessor operating system based on the ATAMM specifications is described.
Quantum perceptron over a field and neural network architecture selection in a quantum computer.
da Silva, Adenilton José; Ludermir, Teresa Bernarda; de Oliveira, Wilson Rosa
2016-04-01
In this work, we propose a quantum neural network named quantum perceptron over a field (QPF). Quantum computers are not yet a reality and the models and algorithms proposed in this work cannot be simulated in actual (or classical) computers. QPF is a direct generalization of a classical perceptron and solves some drawbacks found in previous models of quantum perceptrons. We also present a learning algorithm named Superposition based Architecture Learning algorithm (SAL) that optimizes the neural network weights and architectures. SAL searches for the best architecture in a finite set of neural network architectures with linear time over the number of patterns in the training set. SAL is the first learning algorithm to determine neural network architectures in polynomial time. This speedup is obtained by the use of quantum parallelism and a non-linear quantum operator.
Strategies for concurrent processing of complex algorithms in data driven architectures
NASA Technical Reports Server (NTRS)
Som, Sukhamoy; Stoughton, John W.; Mielke, Roland R.
1990-01-01
Performance modeling and performance enhancement for periodic execution of large-grain, decision-free algorithms in data flow architectures are discussed. Applications include real-time implementation of control and signal processing algorithms where performance is required to be highly predictable. The mapping of algorithms onto the specified class of data flow architectures is realized by a marked graph model called algorithm to architecture mapping model (ATAMM). Performance measures and bounds are established. Algorithm transformation techniques are identified for performance enhancement and reduction of resource (computing element) requirements. A systematic design procedure is described for generating operating conditions for predictable performance both with and without resource constraints. An ATAMM simulator is used to test and validate the performance prediction by the design procedure. Experiments on a three resource testbed provide verification of the ATAMM model and the design procedure.
Strategies for concurrent processing of complex algorithms in data driven architectures
NASA Technical Reports Server (NTRS)
Stoughton, John W.; Mielke, Roland R.; Som, Sukhamoy
1990-01-01
The performance modeling and enhancement for periodic execution of large-grain, decision-free algorithms in data flow architectures is examined. Applications include real-time implementation of control and signal processing algorithms where performance is required to be highly predictable. The mapping of algorithms onto the specified class of data flow architectures is realized by a marked graph model called ATAMM (Algorithm To Architecture Mapping Model). Performance measures and bounds are established. Algorithm transformation techniques are identified for performance enhancement and reduction of resource (computing element) requirements. A systematic design procedure is described for generating operating conditions for predictable performance both with and without resource constraints. An ATAMM simulator is used to test and validate the performance prediction by the design procedure. Experiments on a three resource testbed provide verification of the ATAMM model and the design procedure.
Evaluation of Visual Computer Simulator for Computer Architecture Education
ERIC Educational Resources Information Center
Imai, Yoshiro; Imai, Masatoshi; Moritoh, Yoshio
2013-01-01
This paper presents a trial evaluation, carried out in 2009-2011, of a visual computer simulator developed to serve as both an instruction facility and a learning tool. It also illustrates an example of Computer Architecture education for university students and the use of an e-Learning tool for Assembly Programming in order to…
Highly parallel computer architecture for robotic computation
NASA Technical Reports Server (NTRS)
Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)
1991-01-01
In a computer having a large number of single instruction multiple data (SIMD) processors, each of the SIMD processors has two sets of three individual processor elements controlled by a master control unit and interconnected among a plurality of register file units where data is stored. The register files input and output data in synchronism with a minor cycle clock under control of two slave control units controlling the register file units connected to respective ones of the two sets of processor elements. Depending upon which ones of the register file units are enabled to store or transmit data during a particular minor clock cycle, the processor elements within an SIMD processor are connected in rings or in pipeline arrays, and may exchange data with the internal bus or with neighboring SIMD processors through interface units controlled by respective ones of the two slave control units.
ATCA for Machines-- Advanced Telecommunications Computing Architecture
Larsen, R.S.; /SLAC
2008-04-22
The Advanced Telecommunications Computing Architecture is a new industry open standard for electronics instrument modules and shelves being evaluated for the International Linear Collider (ILC). It is the first industrial standard designed for High Availability (HA). ILC availability simulations have shown clearly that the capabilities of ATCA are needed in order to achieve acceptable integrated luminosity. The ATCA architecture looks attractive for beam instruments and detector applications as well. This paper provides an overview of ongoing R&D including application of HA principles to power electronics systems.
Implementing a computing architecture with WISDOM
Zebrowski, J.R.
1991-01-01
Over the past two years, the Savannah River Site (SRS) work force has expanded by more than 6000 employees. This large influx of personnel, in conjunction with the limited office space, has resulted in an overcrowding problem on site. To alleviate some of the overcrowding, Westinghouse Savannah River Company (WSRC) has been in the process of leasing space from several office buildings within Aiken, SC. Brookhaven, the latest off-site office building to be leased, is the starting point for a new direction in office automation which will eventually spread throughout SRS. The computing architecture in place at Brookhaven was designed to adhere to the SRS computer architecture guidelines as published by the WSRC Computer Architecture Standards Team (CAST). At the heart of the Brookhaven implementation is a Workstation Integration System for DOS, OS/2 and Macintosh (WISDOM). The key features of the WISDOM system include its utilization of a Local Area Network (LAN), its Graphical User Interface (GUI), its cross-platform capability, its portable user interface, and the installation program. To begin, I will give an overview of the network architecture, then discuss WISDOM in detail, mention some platform integration problems that need to be addressed and conclude with a summary of the user benefits that WISDOM provides.
Implementing a computing architecture with WISDOM
Zebrowski, J.R.
1991-12-31
Over the past two years, the Savannah River Site (SRS) work force has expanded by more than 6000 employees. This large influx of personnel, in conjunction with the limited office space, has resulted in an overcrowding problem on site. To alleviate some of the overcrowding, Westinghouse Savannah River Company (WSRC) has been in the process of leasing space from several office buildings within Aiken, SC. Brookhaven, the latest off-site office building to be leased, is the starting point for a new direction in office automation which will eventually spread throughout SRS. The computing architecture in place at Brookhaven was designed to adhere to the SRS computer architecture guidelines as published by the WSRC Computer Architecture Standards Team (CAST). At the heart of the Brookhaven implementation is a Workstation Integration System for DOS, OS/2 and Macintosh (WISDOM). The key features of the WISDOM system include its utilization of a Local Area Network (LAN), its Graphical User Interface (GUI), its cross-platform capability, its portable user interface, and the installation program. To begin, I will give an overview of the network architecture, then discuss WISDOM in detail, mention some platform integration problems that need to be addressed and conclude with a summary of the user benefits that WISDOM provides.
Computer graphics in architecture and engineering
NASA Technical Reports Server (NTRS)
Greenberg, D. P.
1975-01-01
The present status of the application of computer graphics to the building profession or architecture and its relationship to other scientific and technical areas were discussed. It was explained that, due to the fragmented nature of architecture and building activities (in contrast to the aerospace industry), a comprehensive, economic utilization of computer graphics in this area is not practical and its true potential cannot now be realized due to the present inability of architects and structural, mechanical, and site engineers to rely on a common data base. Future emphasis will therefore have to be placed on a vertical integration of the construction process and effective use of a three-dimensional data base, rather than on waiting for any technological breakthrough in interactive computing.
Strategies for concurrent processing of complex algorithms in data driven architectures
NASA Technical Reports Server (NTRS)
Stoughton, John W.; Mielke, Roland R.
1988-01-01
Research directed at developing a graph theoretical model for describing data and control flow associated with the execution of large grained algorithms in a special distributed computer environment is presented. This model is identified by the acronym ATAMM which represents Algorithms To Architecture Mapping Model. The purpose of such a model is to provide a basis for establishing rules for relating an algorithm to its execution in a multiprocessor environment. Specifications derived from the model lead directly to the description of a data flow architecture which is a consequence of the inherent behavior of the data and control flow described by the model. The purpose of the ATAMM based architecture is to provide an analytical basis for performance evaluation. The ATAMM model and architecture specifications are demonstrated on a prototype system for concept validation.
Computing architecture for telerobots in earth orbit
NASA Technical Reports Server (NTRS)
Bejczy, A. K.; Dotson, R. S.; Szakaly, Z.
1987-01-01
Based on generic operational and computational requirements associated with the control of telerobots in earth orbit, a multibus-based distributed but integrated computing architecture is proposed. An experimental system of that kind under development at the Jet Propulsion Laboratory (JPL) is briefly described. It uses Intel Multibus I at both control station and remote robot (telerobot) computing nodes. An essential element within each multibus is a Unified (or Universal) Computer Control Subsystem (UCCS) for telerobot and control station motor components. The two multibus-based computing nodes can be linked by parallel or high speed serial links for real-time data transmission and for closing the real-time bilateral (force-reflecting) control loop between telerobot and control station. Brief comments on the experimental system are followed by a short discussion of future development plans and possibilities.
Efficient Universal Computing Architectures for Decoding Neural Activity
Rapoport, Benjamin I.; Turicchia, Lorenzo; Wattanapanitch, Woradorn; Davidson, Thomas J.; Sarpeshkar, Rahul
2012-01-01
The ability to decode neural activity into meaningful control signals for prosthetic devices is critical to the development of clinically useful brain–machine interfaces (BMIs). Such systems require input from tens to hundreds of brain-implanted recording electrodes in order to deliver robust and accurate performance; in serving that primary function they should also minimize power dissipation in order to avoid damaging neural tissue; and they should transmit data wirelessly in order to minimize the risk of infection associated with chronic, transcutaneous implants. Electronic architectures for brain–machine interfaces must therefore minimize size and power consumption, while maximizing the ability to compress data to be transmitted over limited-bandwidth wireless channels. Here we present a system of extremely low computational complexity, designed for real-time decoding of neural signals, and suited for highly scalable implantable systems. Our programmable architecture is an explicit implementation of a universal computing machine emulating the dynamics of a network of integrate-and-fire neurons; it requires no arithmetic operations except for counting, and decodes neural signals using only computationally inexpensive logic operations. The simplicity of this architecture does not compromise its ability to compress raw neural data by factors greater than . We describe a set of decoding algorithms based on this computational architecture, one designed to operate within an implanted system, minimizing its power consumption and data transmission bandwidth; and a complementary set of algorithms for learning, programming the decoder, and postprocessing the decoded output, designed to operate in an external, nonimplanted unit. The implementation of the implantable portion is estimated to require fewer than 5000 operations per second. A proof-of-concept, 32-channel field-programmable gate array (FPGA) implementation of this portion is consequently energy efficient
NASA Technical Reports Server (NTRS)
Hsia, T. C.; Lu, G. Z.; Han, W. H.
1987-01-01
In advanced robot control problems, on-line computation of the inverse Jacobian solution is frequently required. A parallel processing architecture is an effective way to reduce computation time. A parallel processing architecture is developed for the inverse Jacobian (inverse differential kinematic equation) of the PUMA arm. The proposed pipeline/parallel algorithm can be implemented on an IC chip using systolic linear arrays. This implementation requires 27 processing cells and 25 time units. Computation time is thus significantly reduced.
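To make the inverse differential kinematic equation concrete, here is a minimal sequential sketch for a two-link planar arm. This is an assumed stand-in chosen for brevity: the PUMA arm in the abstract has six joints, and nothing below reproduces the pipeline/parallel algorithm or its systolic arrays. The function names and the unit link lengths are hypothetical.

```python
import math


def jacobian(q1, q2, l1=1.0, l2=1.0):
    """2x2 Jacobian of a planar two-link arm (joint angles in radians)."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def joint_rates(q1, q2, vx, vy, l1=1.0, l2=1.0):
    """Solve J * qdot = v for the joint rates using the closed-form 2x2 inverse."""
    (a, b), (c, d) = jacobian(q1, q2, l1, l2)
    det = a * d - b * c  # equals l1 * l2 * sin(q2) for this arm
    if abs(det) < 1e-12:
        raise ValueError("singular configuration: the Jacobian has no inverse")
    return ((d * vx - b * vy) / det, (-c * vx + a * vy) / det)


if __name__ == "__main__":
    q1, q2 = 0.3, 0.9
    rates = joint_rates(q1, q2, 0.2, -0.1)
    print(rates)
```

The determinant check matters in practice: at a fully stretched arm (q2 = 0) the inverse does not exist, so the sketch raises an error instead of dividing by zero.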
Roadmap to the SRS computing architecture
Johnson, A.
1994-07-05
This document outlines the major steps that must be taken by the Savannah River Site (SRS) to migrate the SRS information technology (IT) environment to the new architecture described in the Savannah River Site Computing Architecture. This document proposes an IT environment that is "...standards-based, data-driven, and workstation-oriented, with larger systems being utilized for the delivery of needed information to users in a client-server relationship." Achieving this vision will require many substantial changes in the computing applications, systems, and supporting infrastructure at the site. This document consists of a set of roadmaps which provide explanations of the necessary changes for IT at the site and describe the milestones that must be completed to finish the migration.
DFT algorithms for bit-serial GaAs array processor architectures
NASA Technical Reports Server (NTRS)
Mcmillan, Gary B.
1988-01-01
Systems and Processes Engineering Corporation (SPEC) has developed an innovative array processor architecture for computing Fourier transforms and other commonly used signal processing algorithms. This architecture is designed to extract the highest possible array performance from state-of-the-art GaAs technology. SPEC's architectural design includes a high performance RISC processor implemented in GaAs, along with a Floating Point Coprocessor and a unique Array Communications Coprocessor, also implemented in GaAs technology. Together, these data processors represent the latest in technology, both from an architectural and implementation viewpoint. SPEC has examined numerous algorithms and parallel processing architectures to determine the optimum array processor architecture. SPEC has developed an array processor architecture with integral communications ability to provide maximum node connectivity. The Array Communications Coprocessor embeds communications operations directly in the core of the processor architecture. A Floating Point Coprocessor architecture has been defined that utilizes Bit-Serial arithmetic units, operating at very high frequency, to perform floating point operations. These Bit-Serial devices reduce the device integration level and complexity to a level compatible with state-of-the-art GaAs device technology.
Reconfigurable materials: Algorithm for architectural origami
NASA Astrophysics Data System (ADS)
Paik, Jamie
2017-01-01
An algorithm has been developed allowing the rational design of origami-inspired materials that can be rearranged to change their properties. This might open the way to strategies for making reconfigurable robots. See Article p.347
NASA Technical Reports Server (NTRS)
Mielke, R.; Stoughton, J.; Som, S.; Obando, R.; Malekpour, M.; Mandala, B.
1990-01-01
A functional description of the ATAMM Multicomputer Operating System is presented. ATAMM (Algorithm to Architecture Mapping Model) is a marked graph model which describes the implementation of large grained, decomposed algorithms on data flow architectures. AMOS, the ATAMM Multicomputer Operating System, is an operating system which implements the ATAMM rules. A first generation version of AMOS which was developed for the Advanced Development Module (ADM) is described. A second generation version of AMOS being developed for the Generic VHSIC Spaceborne Computer (GVSC) is also presented.
Experimental comparison of two quantum computing architectures
Linke, Norbert M.; Maslov, Dmitri; Roetteler, Martin; Debnath, Shantanu; Figgatt, Caroline; Landsman, Kevin A.; Wright, Kenneth; Monroe, Christopher
2017-01-01
We run a selection of algorithms on two state-of-the-art 5-qubit quantum computers that are based on different technology platforms. One is a publicly accessible superconducting transmon device (www.research.ibm.com/ibm-q) with limited connectivity, and the other is a fully connected trapped-ion system. Even though the two systems have different native quantum interactions, both can be programmed in a way that is blind to the underlying hardware, thus allowing a comparison of identical quantum algorithms between different physical systems. We show that quantum algorithms and circuits that use more connectivity clearly benefit from a better-connected system of qubits. Although the quantum systems here are not yet large enough to eclipse classical computers, this experiment exposes critical factors of scaling quantum computers, such as qubit connectivity and gate expressivity. In addition, the results suggest that codesigning particular quantum applications with the hardware itself will be paramount in successfully using quantum computers in the future. PMID:28325879
Performance evaluation of the SX-6 vector architecture for scientific computations
Oliker, Leonid; Canning, Andrew; Carter, Jonathan; Shalf, John; Skinner, David; Ethier, Stephane; Biswas, Rupak; Djomehri, Jahed; Van der Wijngaart, Rob
2005-01-01
The growing gap between sustained and peak performance for scientific applications is a well-known problem in high performance computing. The recent development of parallel vector systems offers the potential to reduce this gap for many computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX-6 vector processor, and compares it against the cache-based IBM Power3 and Power4 superscalar architectures, across a number of key scientific computing areas. First, we present the performance of a microbenchmark suite that examines many low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks. Finally, we evaluate the performance of several scientific computing codes. Overall results demonstrate that the SX-6 achieves high performance on a large fraction of our application suite and often significantly outperforms the cache-based architectures. However, certain classes of applications are not easily amenable to vectorization and would require extensive algorithm and implementation reengineering to utilize the SX-6 effectively.
Job Superscheduler Architecture and Performance in Computational Grid Environments
NASA Technical Reports Server (NTRS)
Shan, Hongzhang; Oliker, Leonid; Biswas, Rupak
2003-01-01
Computational grids hold great promise in utilizing geographically separated heterogeneous resources to solve large-scale complex scientific problems. However, a number of major technical hurdles, including distributed resource management and effective job scheduling, stand in the way of realizing these gains. In this paper, we propose a novel grid superscheduler architecture and three distributed job migration algorithms. We also model the critical interaction between the superscheduler and autonomous local schedulers. Extensive performance comparisons with ideal, central, and local schemes using real workloads from leading computational centers are conducted in a simulation environment. Additionally, synthetic workloads are used to perform a detailed sensitivity analysis of our superscheduler. Several key metrics demonstrate that substantial performance gains can be achieved via smart superscheduling in distributed computational grids.
Developing a Distributed Computing Architecture at Arizona State University.
ERIC Educational Resources Information Center
Armann, Neil; And Others
1994-01-01
Development of Arizona State University's computing architecture, designed to ensure that all new distributed computing pieces will work together, is described. Aspects discussed include the business rationale, the general architectural approach, characteristics and objectives of the architecture, specific services, and impact on the university…
Frances: A Tool for Understanding Computer Architecture and Assembly Language
ERIC Educational Resources Information Center
Sondag, Tyler; Pokorny, Kian L.; Rajan, Hridesh
2012-01-01
Students in all areas of computing require knowledge of the computing device including software implementation at the machine level. Several courses in computer science curricula address these low-level details such as computer architecture and assembly languages. For such courses, there are advantages to studying real architectures instead of…
Acoustooptic linear algebra processors - Architectures, algorithms, and applications
NASA Technical Reports Server (NTRS)
Casasent, D.
1984-01-01
Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products and such architectures are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.
NASA Technical Reports Server (NTRS)
Blech, Richard A.
1987-01-01
The development of numerical methods and software tools for parallel processors can be aided through the use of a hardware test-bed. The test-bed architecture must be flexible enough to support investigations into architecture-algorithm interactions. One way to implement a test-bed is to use a commercial parallel processor. Unfortunately, most commercial parallel processors are fixed in their interconnection and/or processor architecture. In this paper, we describe a modified n cube architecture, called the hypercluster, which is a superset of many other processor and interconnection architectures. The hypercluster is intended to support research into parallel processing of computational fluid and structural mechanics problems which may require a number of different architectural configurations. An example of how a typical partial differential equation solution algorithm maps on to the hypercluster is given.
Computer architecture evaluation for structural dynamics computations: Project summary
NASA Technical Reports Server (NTRS)
Standley, Hilda M.
1989-01-01
The intent of the proposed effort is the examination of the impact of the elements of parallel architectures on the performance realized in a parallel computation. To this end, three major projects are developed: a language for the expression of high level parallelism, a statistical technique for the synthesis of multicomputer interconnection networks based upon performance prediction, and a queueing model for the analysis of shared memory hierarchies.
Computing Architecture for the ngVLA
NASA Astrophysics Data System (ADS)
Kern, Jeffrey S.; Glendenning, Brian; Hiriart, R.
2017-01-01
Computing challenges for the Next Generation Very Large Array (ngVLA) are not always the ones that first come to mind. Current design concepts have visibility data rates which allow the permanent storage of the raw visibility data, and although challenging, the calibration and imaging processing for the ngVLA is not beyond the capabilities of existing systems (let alone those that will exist when ngVLA construction is completed). Design goals include a system that supports a wide range of PI-driven projects, end to end data management, and the production of science ready data products. This should be accomplished while minimizing the operating costs of an array consisting of hundreds of elements distributed over an area of nearly 100,000 km². We discuss a proposed architecture of the computing system, design constraints for a detailed design, and some possible design choices and their implications.
Quantum computation architecture using optical tweezers
Weitenberg, Christof; Kuhr, Stefan; Moelmer, Klaus; Sherson, Jacob F.
2011-09-15
We present a complete architecture for scalable quantum computation with ultracold atoms in optical lattices using optical tweezers focused to the size of a lattice spacing. We discuss three different two-qubit gates based on local collisional interactions. The gates between arbitrary qubits require the transport of atoms to neighboring sites. We numerically optimize the nonadiabatic transport of the atoms through the lattice and the intensity ramps of the optical tweezer in order to maximize the gate fidelities. We find overall gate times of a few 100 μs, while keeping the error probability due to vibrational excitations and spontaneous scattering below 10^-3. The requirements on the positioning error and intensity noise of the optical tweezer and the magnetic field stability are analyzed and we show that atoms in optical lattices could meet the requirements for fault-tolerant scalable quantum computing.
NASA Astrophysics Data System (ADS)
Hou, Zhen-Long; Wei, Xiao-Hui; Huang, Da-Nian; Sun, Xu
2015-09-01
We apply reweighted inversion focusing to full tensor gravity gradiometry data using message-passing interface (MPI) and compute unified device architecture (CUDA) parallel computing algorithms, and then combine MPI with CUDA to formulate a hybrid algorithm. Parallel computing performance metrics are introduced to analyze and compare the performance of the algorithms. We summarize the rules for the performance evaluation of parallel algorithms. We use model and real data from the Vinton salt dome to test the algorithms. We find good match between model and real density data, and verify the high efficiency and feasibility of parallel computing algorithms in the inversion of full tensor gravity gradiometry data.
A High Performance COTS Based Computer Architecture
NASA Astrophysics Data System (ADS)
Patte, Mathieu; Grimoldi, Raoul; Trautner, Roland
2014-08-01
Using Commercial Off The Shelf (COTS) electronic components for space applications is a long standing idea. Indeed the difference in processing performance and energy efficiency between radiation hardened components and COTS components is so large that COTS components are very attractive for use in mass and power constrained systems. However, using COTS components in space is not straightforward, as one must account for the effects of the space environment on the behavior of the COTS components. In the frame of the ESA funded activity called High Performance COTS Based Computer, Airbus Defense and Space and its subcontractor OHB CGS have developed and prototyped a versatile COTS based architecture for high performance processing. The rest of the paper is organized as follows: in a first section we start by recapitulating the interests and constraints of using COTS components for space applications; then we briefly describe existing fault mitigation architectures and present our solution for fault mitigation based on a component called the SmartIO; in the last part of the paper we describe the prototyping activities executed during the HiP CBC project.
Hybrid Architectures for Evolutionary Computing Algorithms
2008-01-01
STATEMENT APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED. PA# WPAFB 08-0064 13. SUPPLEMENTARY NOTES 14 . ABSTRACT This report documents the...operator (in Xilinx StateCadtool) …….. 14 Figure 6. Partial code fragment of the declone operator, hand coded from state Diagram...Histogram of number of words added by exhaustive search for the runs of Figure 12 ………………………………………………………... 27 Figure 14 . Performance comparison of
Hybrid Architectures for Evolutionary Computing Algorithms
2006-01-01
SIMBIOSYS program we managed. We developed prototype optimization software tools in three programming environments, Labview, Matlab, and compiled C, and...Optimization Toolbox from North Carolina State University, integrated Matlab bio-models from Purdue University SIMBIOSYS PI and also with C version...That work was actually done as a part of the in-house component of our involvement in the DARPA SIMBIOSYS and BIOCOMP programs. Under those
Algorithms and architectures for high performance analysis of semantic graphs.
Hendrickson, Bruce Alan
2005-09-01
analysis. Since intelligence datasets can be extremely large, the focus of this work is on the use of parallel computers. We have been working to develop scalable parallel algorithms that will be at the core of a semantic graph analysis infrastructure. Our work has involved two different thrusts, corresponding to two different computer architectures. The first architecture of interest is distributed memory, message passing computers. These machines are ubiquitous and affordable, but they are challenging targets for graph algorithms. Much of our distributed-memory work to date has been collaborative with researchers at Lawrence Livermore National Laboratory and has focused on finding short paths on distributed memory parallel machines. Our implementation on 32K processors of BlueGene/Light finds shortest paths between two specified vertices in just over a second for random graphs with 4 billion vertices.
Parallel language constructs for tensor product computations on loosely coupled architectures
NASA Technical Reports Server (NTRS)
Mehrotra, Piyush; Vanrosendale, John
1989-01-01
Distributed memory architectures offer high levels of performance and flexibility, but have proven awkward to program. Current languages for nonshared memory architectures provide a relatively low level programming environment, and are poorly suited to modular programming and to the construction of libraries. A set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level is described. The focus is on tensor product array computations, a simple but important class of numerical algorithms. The problem of programming 1-D kernel routines, such as parallel tridiagonal solvers, is addressed first, and then how such parallel kernels can be combined to form parallel tensor product algorithms is examined.
Resource utilization model for the algorithm to architecture mapping model
NASA Technical Reports Server (NTRS)
Stoughton, John W.; Patel, Rakesh R.
1993-01-01
The analytical model for resource utilization and the variable node time and conditional node model for the enhanced ATAMM model for a real-time data flow architecture are presented in this research. The Algorithm To Architecture Mapping Model, ATAMM, is a Petri net based graph theoretic model developed at Old Dominion University, and is capable of modeling the execution of large-grained algorithms on a real-time data flow architecture. Using the resource utilization model, the resource envelope may be obtained directly from a given graph and, consequently, the maximum number of required resources may be evaluated. The node timing diagram for one iteration period may be obtained using the analytical resource envelope. The variable node time model, which describes the change in resource requirement for the execution of an algorithm under node time variation, is useful to expand the applicability of the ATAMM model to heterogeneous architectures. The model also describes a method of detecting the presence of resource limited mode and its subsequent prevention. Graphs with conditional nodes are shown to be reduced to equivalent graphs with time varying nodes and, subsequently, may be analyzed using the variable node time model to determine resource requirements. Case studies are performed on three graphs for the illustration of applicability of the analytical theories.
Algorithms and architectures for robot vision
NASA Technical Reports Server (NTRS)
Schenker, Paul S.
1990-01-01
The scope of the current work is to develop practical sensing implementations for robots operating in complex, partially unstructured environments. A focus in this work is to develop object models and estimation techniques which are specific to requirements of robot locomotion, approach and avoidance, and grasp and manipulation. Such problems have to date received limited attention in either computer or human vision - in essence, asking not only how perception is in general modeled, but also what is the functional purpose of its underlying representations. As in the past, researchers are drawing on ideas from both the psychological and machine vision literature. Of particular interest is the development of 3-D shape and motion estimates for complex objects when given only partial and uncertain information and when such information is incrementally accrued over time. Current studies consider the use of surface motion, contour, and texture information, with the longer range goal of developing a fused sensing strategy based on these sources and others.
QPSO-based adaptive DNA computing algorithm.
Karakose, Mehmet; Cigdem, Ugur
2013-01-01
DNA (deoxyribonucleic acid) computing, a new computation model that uses DNA molecules for information storage, has been increasingly used for optimization and data analysis in recent years. However, the DNA computing algorithm has some limitations in terms of convergence speed, adaptability, and effectiveness. In this paper, a new approach for improving DNA computing is proposed. This approach aims to run the DNA computing algorithm with parameters that adapt towards the desired goal using quantum-behaved particle swarm optimization (QPSO). The contributions of the proposed QPSO-based adaptive DNA computing algorithm are as follows: (1) the population size, crossover rate, maximum number of operations, enzyme and virus mutation rates, and fitness function of the DNA computing algorithm are tuned simultaneously in the adaptive process, (2) the adaptive algorithm is driven by QPSO for goal-driven progress, faster operation, and flexibility in data, and (3) a numerical realization of the DNA computing algorithm with the proposed approach is implemented in system identification. Two experiments with different systems were carried out to evaluate the performance of the proposed approach with comparative results. Experimental results obtained with Matlab and FPGA demonstrate the ability to provide effective optimization, considerable convergence speed, and high accuracy relative to the DNA computing algorithm.
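The abstract above does not spell out the QPSO update itself. As a hedged illustration only, the following Python sketch applies the commonly cited QPSO position update (a local attractor between personal and global best, plus a step scaled by the distance to the mean best position) to a plain test function. The function name `qpso_minimize`, the parameter defaults, and the sphere objective are assumptions made for this sketch; it does not tune any DNA computing parameters and is not the authors' implementation.

```python
import math
import random

def qpso_minimize(fitness, dim, n_particles=20, iters=100, beta=0.75,
                  lo=-5.0, hi=5.0, seed=0):
    """Minimise `fitness` with a basic QPSO loop (illustrative sketch)."""
    rng = random.Random(seed)
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [x[:] for x in xs]                     # personal best positions
    pbest_val = [fitness(x) for x in xs]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global best so far
    history = [gbest_val]
    for _ in range(iters):
        # Mean of all personal bests ("mbest") for each dimension.
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        for i, x in enumerate(xs):
            for d in range(dim):
                phi = rng.random()
                u = 1.0 - rng.random()             # in (0, 1], avoids log(1/0)
                attractor = phi * pbest[i][d] + (1.0 - phi) * gbest[d]
                step = beta * abs(mbest[d] - x[d]) * math.log(1.0 / u)
                x[d] = attractor + step if rng.random() < 0.5 else attractor - step
            val = fitness(x)
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = x[:], val
                if val < gbest_val:
                    gbest, gbest_val = x[:], val
        history.append(gbest_val)
    return gbest, gbest_val, history

def sphere(x):
    """Simple convex test objective, assumed here for illustration."""
    return sum(v * v for v in x)

best, best_val, history = qpso_minimize(sphere, dim=3)
```

The `history` list records the best value after each iteration, which makes it easy to check that the search never gets worse; in an adaptive scheme like the one described, the fitness would instead score a run of the algorithm being tuned.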
Verifying a Computer Algorithm Mathematically.
ERIC Educational Resources Information Center
Olson, Alton T.
1986-01-01
Presents an example of mathematics from an algorithmic point of view, with emphasis on the design and verification of this algorithm. The program involves finding roots for algebraic equations using the half-interval search algorithm. The program listing is included. (JN)
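The program listing from the article is not reproduced in this record. As a minimal sketch of the half-interval (bisection) search it describes, assuming a continuous function whose values at the two endpoints have opposite signs, the idea can be written in Python as follows; the name `bisect_root`, the tolerance, and the example cubic are illustrative choices, not taken from the article.

```python
def bisect_root(f, lo, hi, tol=1e-10, max_iter=200):
    """Half-interval search for a root of f in [lo, hi]."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        f_mid = f(mid)
        if f_mid == 0.0 or (hi - lo) / 2.0 < tol:
            return mid
        # Keep the half-interval whose endpoints still bracket the root.
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2.0

# Example: the real root of x^3 - 2x - 5 lies between 2 and 3.
root = bisect_root(lambda x: x**3 - 2*x - 5, 2.0, 3.0)
```

Verification in the sense of the article rests on the loop invariant: the sign change is preserved at every step, so the bracketing interval always contains a root while its width halves.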
NASA Technical Reports Server (NTRS)
Weeks, Cindy Lou
1986-01-01
Experiments were conducted at NASA Ames Research Center to define multi-tasking software requirements for multiple-instruction, multiple-data stream (MIMD) computer architectures. The focus was on specifying solutions for algorithms in the field of computational fluid dynamics (CFD). The program objectives were to allow researchers to produce usable parallel application software as soon as possible after acquiring MIMD computer equipment, to provide researchers with an easy-to-learn and easy-to-use parallel software language which could be implemented on several different MIMD machines, and to enable researchers to list preferred design specifications for future MIMD computer architectures. Analysis of CFD algorithms indicated that extensions of an existing programming language, adaptable to new computer architectures, provided the best solution to meeting program objectives. The CoFORTRAN Language was written in response to these objectives and to provide researchers a means to experiment with parallel software solutions to CFD algorithms on machines with parallel architectures.
Algorithmic Mechanism Design of Evolutionary Computation
Pei, Yan
2015-01-01
We consider algorithmic design, enhancement, and improvement of evolutionary computation as a mechanism design problem. All individuals or several groups of individuals can be considered as self-interested agents. The individuals in evolutionary computation can manipulate parameter settings and operations by satisfying their own preferences, which are defined by an evolutionary computation algorithm designer, rather than by following a fixed algorithm rule. Evolutionary computation algorithm designers or self-adaptive methods should construct proper rules and mechanisms for all agents (individuals) to conduct their evolution behaviour correctly in order to definitely achieve the desired and preset objective(s). As a case study, we propose a formal framework on parameter setting, strategy selection, and algorithmic design of evolutionary computation by considering the Nash strategy equilibrium of a mechanism design in the search process. The evaluation results present the efficiency of the framework. This primary principle can be implemented in any evolutionary computation algorithm that needs to consider strategy selection issues in its optimization process. The final objective of our work is to solve evolutionary computation design as an algorithmic mechanism design problem and establish its fundamental aspect by taking this perspective. This paper is the first step towards achieving this objective by implementing a strategy equilibrium solution (such as Nash equilibrium) in evolutionary computation algorithm. PMID:26257777
Monte Carlo simulations on SIMD computer architectures. [Single instruction multiple data (SIMD)]
Burmester, C.P.; Gronsky, R.; Wille, L.T. (Dept. of Physics)
1992-03-01
Algorithmic considerations regarding the implementation of various materials science applications of the Monte Carlo technique to single instruction multiple data (SIMD) computer architectures are presented. In particular, implementation of the Ising model with nearest, next nearest, and long range screened Coulomb interactions on the SIMD architecture MasPar MP-1 (DEC mpp-12000) series of massively parallel computers is demonstrated. Methods of code development which optimize processor array use and minimize inter-processor communication are presented including lattice partitioning and the use of processor array spanning tree structures for data reduction. Both geometric and algorithmic parallel approaches are utilized. Benchmarks in terms of Monte Carlo updates per second for the MasPar architecture are presented and compared to values reported in the literature from comparable studies on other architectures.
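One common form of lattice partitioning for the nearest-neighbour Ising model is the checkerboard (two-colour) decomposition: sites of one colour have only neighbours of the other colour, so all sites of a colour can be updated in the same SIMD step. The Python sketch below illustrates that idea serially, assuming coupling J = 1, periodic boundaries, an even lattice size, and Metropolis acceptance; the function name and these choices are assumptions for illustration, and the abstract does not state that this exact scheme was used on the MasPar.

```python
import math
import random

def checkerboard_sweep(spins, beta, rng):
    """One Metropolis sweep of a 2D Ising lattice (J = 1, periodic boundaries).

    `spins` is an n x n list of +1/-1 values with n even. Each colour class
    is visited in turn; within a class the updates are independent of each
    other, which is what a SIMD machine would exploit.
    """
    n = len(spins)
    for colour in (0, 1):
        for i in range(n):
            for j in range(n):
                if (i + j) % 2 != colour:
                    continue
                # Sum of the four nearest neighbours (all of the other colour).
                nb = (spins[(i - 1) % n][j] + spins[(i + 1) % n][j]
                      + spins[i][(j - 1) % n] + spins[i][(j + 1) % n])
                d_e = 2.0 * spins[i][j] * nb   # energy change if this spin flips
                if d_e <= 0.0 or rng.random() < math.exp(-beta * d_e):
                    spins[i][j] = -spins[i][j]
    return spins

lattice = [[1] * 8 for _ in range(8)]
checkerboard_sweep(lattice, beta=0.4, rng=random.Random(0))
```

The longer-range interactions mentioned in the abstract would need more than two colours, since a site must never share a colour with any site it interacts with.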
A high throughput architecture for a low complexity soft-output demapping algorithm
NASA Astrophysics Data System (ADS)
Ali, I.; Wasenmüller, U.; Wehn, N.
2015-11-01
Iterative channel decoders such as Turbo-Code and LDPC decoders show exceptional performance and therefore they are a part of many wireless communication receivers nowadays. These decoders require a soft input, i.e., the logarithmic likelihood ratio (LLR) of the received bits with a typical quantization of 4 to 6 bits. For computing the LLR values from a received complex symbol, a soft demapper is employed in the receiver. The implementation cost of traditional soft-output demapping methods is relatively large in high order modulation systems, and therefore low complexity demapping algorithms are indispensable in low power receivers. In the presence of multiple wireless communication standards where each standard defines multiple modulation schemes, there is a need to have an efficient demapper architecture covering all the flexibility requirements of these standards. Another challenge associated with hardware implementation of the demapper is to achieve a very high throughput in double iterative systems, for instance, MIMO and Code-Aided Synchronization. In this paper, we present a comprehensive communication and hardware performance evaluation of low complexity soft-output demapping algorithms to select the best algorithm for implementation. The main goal of this work is to design a high throughput, flexible, and area efficient architecture. We describe architectures to execute the investigated algorithms. We implement these architectures on an FPGA device to evaluate their hardware performance. The work has resulted in a hardware architecture, based on the best low complexity algorithm identified, that delivers a high throughput of 166 Msymbols/second for Gray mapped 16-QAM modulation on Virtex-5. This efficient architecture occupies only 127 slice registers, 248 slice LUTs and 2 DSP48Es.
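To make the role of the soft demapper concrete, here is a hedged Python sketch of the generic max-log LLR approximation for a Gray-mapped 16-QAM symbol. The bit-to-level mapping, the unnormalized constellation, the sign convention (positive LLR favours bit 0), and all names are assumptions for illustration; the abstract does not say which low complexity algorithm was finally selected, and this brute-force search over all 16 points is the reference computation that such algorithms simplify.

```python
from itertools import product

# Assumed Gray mapping of two bits to one amplitude level per axis.
GRAY_2BIT = {(0, 0): -3.0, (0, 1): -1.0, (1, 1): 1.0, (1, 0): 3.0}

# 16-QAM: bits (b0, b1) select the in-phase level, (b2, b3) the quadrature level.
CONSTELLATION = {
    bi + bq: complex(GRAY_2BIT[bi], GRAY_2BIT[bq])
    for bi, bq in product(GRAY_2BIT, repeat=2)
}

def max_log_llrs(y, noise_var):
    """Max-log LLRs of the four bits carried by received symbol y.

    LLR_k is approximated by (d1 - d0) / noise_var, where d0 and d1 are the
    squared distances from y to the nearest constellation point whose k-th
    bit is 0 or 1 respectively.
    """
    llrs = []
    for k in range(4):
        d0 = min(abs(y - s) ** 2 for bits, s in CONSTELLATION.items() if bits[k] == 0)
        d1 = min(abs(y - s) ** 2 for bits, s in CONSTELLATION.items() if bits[k] == 1)
        llrs.append((d1 - d0) / noise_var)
    return llrs

# A noisy observation near the point 1 - 3j, which carries bits (1, 1, 0, 0).
llrs = max_log_llrs(complex(1.1, -2.9), noise_var=0.5)
hard_bits = [0 if value > 0 else 1 for value in llrs]
```

In hardware the LLRs would then be quantized to the 4 to 6 bits mentioned above before being passed to the channel decoder.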
Malleable architecture generator for FPGA computing
NASA Astrophysics Data System (ADS)
Gokhale, Maya; Kaba, James; Marks, Aaron; Kim, Jang
1996-10-01
The malleable architecture generator (MARGE) is a tool set that translates high-level parallel C to configuration bit streams for field-programmable logic based computing systems. MARGE creates an application-specific instruction set and generates the custom hardware components required to perform exactly those computations specified by the C program. In contrast to traditional fixed-instruction processors, MARGE's dynamic instruction set creation provides for efficient use of hardware resources. MARGE processes intermediate code in which each operation is annotated by the bit lengths of the operands. Each basic block (sequence of straight line code) is mapped into a single custom instruction which contains all the operations and logic inherent in the block. A synthesis phase maps the operations comprising the instructions into register transfer level structural components and control logic which have been optimized to exploit functional parallelism and function unit reuse. As a final stage, commercial technology-specific tools are used to generate configuration bit streams for the desired target hardware. Technology-specific pre-placed, pre-routed macro blocks are utilized to implement as much of the hardware as possible. MARGE currently supports the Xilinx-based Splash-2 reconfigurable accelerator and National Semiconductor's CLAy-based parallel accelerator, MAPA. The MARGE approach has been demonstrated on systolic applications such as DNA sequence comparison.
A biomimetic adaptive algorithm and low-power architecture for implantable neural decoders.
Rapoport, Benjamin I; Wattanapanitch, Woradorn; Penagos, Hector L; Musallam, Sam; Andersen, Richard A; Sarpeshkar, Rahul
2009-01-01
Algorithmically and energetically efficient computational architectures that operate in real time are essential for clinically useful neural prosthetic devices. Such devices decode raw neural data to obtain direct control signals for external devices. They can also perform data compression and vastly reduce the bandwidth and consequently power expended in wireless transmission of raw data from implantable brain-machine interfaces. We describe a biomimetic algorithm and micropower analog circuit architecture for decoding neural cell ensemble signals. The decoding algorithm implements a continuous-time artificial neural network, using a bank of adaptive linear filters with kernels that emulate synaptic dynamics. The filters transform neural signal inputs into control-parameter outputs, and can be tuned automatically in an on-line learning process. We provide experimental validation of our system using neural data from thalamic head-direction cells in an awake behaving rat.
A modular architecture for transparent computation in recurrent neural networks.
Carmantini, Giovanni S; Beim Graben, Peter; Desroches, Mathieu; Rodrigues, Serafim
2017-01-01
Computation is classically studied in terms of automata, formal languages and algorithms; yet, the relation between neural dynamics and symbolic representations and operations is still unclear in traditional eliminative connectionism. Therefore, we suggest a unique perspective on this central issue, to which we would like to refer as transparent connectionism, by proposing accounts of how symbolic computation can be implemented in neural substrates. In this study we first introduce a new model of dynamics on a symbolic space, the versatile shift, showing that it supports the real-time simulation of a range of automata. We then show that the Gödelization of versatile shifts defines nonlinear dynamical automata, dynamical systems evolving on a vectorial space. Finally, we present a mapping between nonlinear dynamical automata and recurrent artificial neural networks. The mapping defines an architecture characterized by its granular modularity, where data, symbolic operations and their control are not only distinguishable in activation space, but also spatially localizable in the network itself, while maintaining a distributed encoding of symbolic representations. The resulting networks simulate automata in real-time and are programmed directly, in the absence of network training. To discuss the unique characteristics of the architecture and their consequences, we present two examples: (i) the design of a Central Pattern Generator from a finite-state locomotive controller, and (ii) the creation of a network simulating a system of interactive automata that supports the parsing of garden-path sentences as investigated in psycholinguistics experiments.
NASA Astrophysics Data System (ADS)
Ortiz, Fernando E.; Kelmelis, Eric J.; Arce, Gonzalo R.
2007-04-01
According to the Shannon-Nyquist theory, the number of samples required to reconstruct a signal is proportional to its bandwidth. Recently, it has been shown that acceptable reconstructions are possible from a reduced number of random samples, a process known as compressive sampling. Taking advantage of this realization has radical impact on power consumption and communication bandwidth, crucial in applications based on small/mobile/unattended platforms such as UAVs and distributed sensor networks. Although the benefits of these compression techniques are self-evident, the reconstruction process requires the solution of nonlinear signal processing algorithms, which limit applicability in portable and real-time systems. In particular, (1) the power consumption associated with the difficult computations offsets the power savings afforded by compressive sampling, and (2) limited computational power prevents these algorithms from keeping pace with the data-capturing sensors, resulting in undesirable data loss. FPGA based computers offer low power consumption and high computational capacity, providing a solution to both problems simultaneously. In this paper, we present an architecture that implements the algorithms central to compressive sampling in an FPGA environment. We start by studying the computational profile of the convex optimization algorithms used in compressive sampling. Then we present the design of a pixel pipeline suitable for FPGA implementation, able to compute these algorithms.
Optimal Multistage Algorithm for Adjoint Computation
Aupy, Guillaume; Herrmann, Julien; Hovland, Paul; Robert, Yves
2016-01-01
We reexamine the work of Stumm and Walther on multistage algorithms for adjoint computation. We provide an optimal algorithm for this problem when there are two levels of checkpoints, in memory and on disk. Previously, optimal algorithms for adjoint computations were known only for a single level of checkpoints with no writing and reading costs; a well-known example is the binomial checkpointing algorithm of Griewank and Walther. Stumm and Walther extended that binomial checkpointing algorithm to the case of two levels of checkpoints, but they did not provide any optimality results. We bridge the gap by designing the first optimal algorithm in this context. We experimentally compare our optimal algorithm with that of Stumm and Walther to assess the difference in performance.
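For the single-level case mentioned above, the binomial checkpointing result of Griewank and Walther is usually stated as follows: with c checkpoints and at most r repeated forward evaluations of any step, a chain of up to C(c + r, c) steps can be reversed. The Python sketch below only evaluates that bound; the function names are illustrative, it ignores the read and write costs that the two-level (memory plus disk) algorithm has to account for, and it does not produce a checkpoint schedule.

```python
from math import comb

def max_reversible_steps(checkpoints, repetitions):
    """Binomial bound beta(c, r) = C(c + r, c) on the reversible chain length."""
    return comb(checkpoints + repetitions, checkpoints)

def min_repetitions(steps, checkpoints):
    """Smallest r such that `steps` fits within beta(checkpoints, r)."""
    if checkpoints < 1:
        raise ValueError("at least one checkpoint is required")
    r = 0
    while max_reversible_steps(checkpoints, r) < steps:
        r += 1
    return r

# With 3 checkpoints, chains of up to 10 steps need at most 2 repetitions.
limit = max_reversible_steps(3, 2)
```

The rapid growth of the binomial coefficient is why a handful of checkpoints suffices for very long chains at a modest recomputation cost.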
A Computational Fluid Dynamics Algorithm on a Massively Parallel Computer
NASA Technical Reports Server (NTRS)
Jespersen, Dennis C.; Levit, Creon
1989-01-01
The discipline of computational fluid dynamics is demanding ever-increasing computational power to deal with complex fluid flow problems. We investigate the performance of a finite-difference computational fluid dynamics algorithm on a massively parallel computer, the Connection Machine. Of special interest is an implicit time-stepping algorithm; to obtain maximum performance from the Connection Machine, it is necessary to use a nonstandard algorithm to solve the linear systems that arise in the implicit algorithm. We find that the Connection Machine can achieve very high computation rates on both explicit and implicit algorithms. The performance of the Connection Machine puts it in the same class as today's most powerful conventional supercomputers.
Algorithms in Modern Mathematics and Computer Science.
1980-01-01
A069 912 STANFORD UNIV CA DEPT OF COMPUTER SCIENCE F/6 12/1 ALGORITHMS IN MODERN MATHEMATICS AND COMPUTER SCIENCE (U) JAN 80 D E KNUTH N00014-76-C... Stanford Department of Computer Science, 1980, Report No. STAN-CS-80-788: ALGORITHMS IN MODERN MATHEMATICS AND COMPUTER SCIENCE by Donald E...Knuth. Research sponsored by National Science Foundation and Office of Naval Research. COMPUTER SCIENCE DEPARTMENT, Stanford University
Performance Analysis of Cloud Computing Architectures Using Discrete Event Simulation
NASA Technical Reports Server (NTRS)
Stocker, John C.; Golomb, Andrew M.
2011-01-01
Cloud computing offers the economic benefit of on-demand resource allocation to meet changing enterprise computing needs. However, this flexibility comes at a cost: compared with traditional hosting, cloud computing is at a disadvantage in providing predictable application and service performance. Cloud computing relies on resource scheduling in a virtualized network-centric server environment, which makes static performance analysis infeasible. We developed a discrete event simulation model to evaluate the overall effectiveness of organizations in executing their workflow in traditional and cloud computing architectures. The two part model framework characterizes both the demand, using a probability distribution for each type of service request, and the enterprise computing resource constraints. Our simulations provide quantitative analysis to design and provision computing architectures that maximize overall mission effectiveness. We share our analysis of key resource constraints in cloud computing architectures and findings on the appropriateness of cloud computing in various applications.
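The core of a discrete event simulation is a time-ordered event list that is processed one event at a time. The Python sketch below shows that mechanism for a single FIFO server, a deliberately small stand-in for the enterprise model in the abstract. The function name is hypothetical, and fixed arrival and service times are used here instead of the per-request-type probability distributions the authors describe, so that the behaviour is easy to follow.

```python
import heapq

def simulate_single_server(arrivals, service_times):
    """Return the completion time of each request on one FIFO server."""
    # Event list ordered by time; ties are broken by request id.
    events = [(t, i, "arrival") for i, t in enumerate(arrivals)]
    heapq.heapify(events)
    waiting = []      # ids of requests queued for the server, in arrival order
    busy = False
    done = {}
    while events:
        now, req, kind = heapq.heappop(events)
        if kind == "departure":
            done[req] = now
            busy = False
        else:
            waiting.append(req)
        # Start the next queued request whenever the server is free.
        if not busy and waiting:
            nxt = waiting.pop(0)
            busy = True
            heapq.heappush(events, (now + service_times[nxt], nxt, "departure"))
    return [done[i] for i in range(len(arrivals))]

# Three requests arriving one time unit apart, each needing two units of service.
completions = simulate_single_server([0.0, 1.0, 2.0], [2.0, 2.0, 2.0])
```

Replacing the fixed lists with draws from per-service distributions, and the single server with a pool subject to resource constraints, moves this toy toward the kind of two part model the abstract outlines.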
Modern hardware architectures accelerate porous media flow computations
NASA Astrophysics Data System (ADS)
Kulczewski, Michal; Kurowski, Krzysztof; Kierzynka, Michal; Dohnalik, Marek; Kaczmarczyk, Jan; Borujeni, Ali Takbiri
2012-05-01
Investigation of rock properties, particularly porosity and permeability, which determine the characteristics of the transport medium, is crucial to reservoir engineering. Nowadays, micro-tomography (micro-CT) methods make it possible to obtain a vast range of petrophysical properties. The micro-CT method facilitates visualization of pore structures and acquisition of the total porosity factor, determined by stacking 2D slices of scanned rock and applying a proper absorption cut-off point. Proper segmentation of the pore representation in 3D is important for solving the permeability of porous media. This factor has recently been determined by means of Computational Fluid Dynamics (CFD), a popular method to analyze problems related to fluid flows, taking advantage of numerical methods and constantly growing computing power. The recent advent of novel multi-core, many-core and graphics processing unit (GPU) hardware architectures allows scientists to benefit even more from parallel processing and new built-in features. The high level of parallel scalability offers both a decrease in time-to-solution and greater accuracy - top factors in reservoir engineering. This paper aims to present research results related to fluid flow simulations, particularly solving the total porosity and permeability of porous media, taking advantage of modern hardware architectures. In our approach total porosity is calculated by means of general-purpose computing on multiple GPUs. This application stacks 2D slices of scanned rock and, by means of a marching tetrahedra algorithm, creates a 3D representation of pores and calculates the total porosity. Experimental results are compared with data obtained via other popular methods, including Nuclear Magnetic Resonance (NMR), helium porosity and nitrogen permeability tests. Then CFD simulations are performed on a large-scale high performance hardware architecture to solve the flow and permeability of porous media. In our experiments we used Lattice Boltzmann
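The total porosity step described above (stack the 2D slices, apply an absorption cut-off, measure the pore fraction) can be illustrated with a much simpler voxel-counting sketch in Python. This is not the marching tetrahedra GPU method of the paper: it assumes grey-level slices given as nested lists, treats every voxel below a hypothetical threshold as pore space, and simply reports the pore fraction of the stacked volume.

```python
def total_porosity(slices, threshold):
    """Fraction of voxels classified as pore in a stack of 2D grey-level slices.

    Assumption: pores absorb less than rock, so a voxel counts as pore when
    its grey value is below `threshold` (the absorption cut-off point).
    """
    pore = 0
    total = 0
    for image in slices:          # each slice is a list of rows
        for row in image:
            for value in row:
                total += 1
                if value < threshold:
                    pore += 1
    if total == 0:
        raise ValueError("empty volume")
    return pore / total

# Two tiny 2 x 2 slices; three of the eight voxels fall below the cut-off.
volume = [
    [[10, 200], [220, 30]],
    [[240, 250], [20, 230]],
]
porosity = total_porosity(volume, threshold=100)
```

Voxel counting is sensitive to the choice of cut-off, which is one reason the abstract stresses proper segmentation and compares against NMR and helium porosity measurements.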
Computing Algorithms for Nuffield Advanced Physics.
ERIC Educational Resources Information Center
Summers, M. K.
1978-01-01
Defines all recurrence relations used in the Nuffield course, to solve first- and second-order differential equations, and describes a typical algorithm for computer generation of solutions. (Author/GA)
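The abstract does not reproduce the course's recurrence relations, so the following is only a minimal sketch of the general idea, assuming simple Euler-style updates: a first-order relation for dN/dt = -kN and a pair of coupled first-order updates for the second-order equation d²x/dt² = -ω²x. The function names, equations and step sizes are illustrative assumptions, not taken from the Nuffield material.

```python
def decay(n0, k, dt, steps):
    """First-order recurrence N[i+1] = N[i] - k*N[i]*dt for dN/dt = -k*N."""
    values = [n0]
    for _ in range(steps):
        values.append(values[-1] - k * values[-1] * dt)
    return values


def oscillator(x0, v0, omega, dt, steps):
    """Second-order equation d2x/dt2 = -omega**2 * x as two coupled recurrences."""
    x, v = x0, v0
    xs = [x]
    for _ in range(steps):
        v = v - omega ** 2 * x * dt  # velocity update from the current acceleration
        x = x + v * dt               # position update from the new velocity
        xs.append(x)
    return xs
```

With a small step (for example `dt = 0.001`) the first function tracks the exponential decay curve and the second tracks a cosine; larger steps make the approximation error visible, which is part of what such classroom algorithms are meant to show.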
NASA Technical Reports Server (NTRS)
Choudhary, Alok N.; Patel, Janak H.; Ahuja, Narendra
1989-01-01
In part 1, the architecture of NETRA is presented. A performance evaluation of NETRA using several common vision algorithms is also presented. Performance of algorithms when they are mapped on one cluster is described. It is shown that SIMD, MIMD, and systolic algorithms can be easily mapped onto processor clusters, and almost linear speedups are possible. For some algorithms, analytical performance results are compared with implementation performance results. It is observed that the analysis is very accurate. Performance analysis of parallel algorithms when mapped across clusters is presented. Mappings across clusters illustrate the importance and use of shared as well as distributed memory in achieving high performance. The parameters for evaluation are derived from the characteristics of the parallel algorithms, and these parameters are used to evaluate the alternative communication strategies in NETRA. Furthermore, the effect of communication interference from other processors in the system on the execution of an algorithm is studied. Using the analysis, performance of many algorithms with different characteristics is presented. It is observed that if communication speeds are matched with the computation speeds, good speedups are possible when algorithms are mapped across clusters.
Biomimetic design processes in architecture: morphogenetic and evolutionary computational design.
Menges, Achim
2012-03-01
Design computation has profound impact on architectural design methods. This paper explains how computational design enables the development of biomimetic design processes specific to architecture, and how they need to be significantly different from established biomimetic processes in engineering disciplines. The paper first explains the fundamental difference between computer-aided and computational design in architecture, as the understanding of this distinction is of critical importance for the research presented. Thereafter, the conceptual relation and possible transfer of principles from natural morphogenesis to design computation are introduced and the related developments of generative, feature-based, constraint-based, process-based and feedback-based computational design methods are presented. This morphogenetic design research is then related to exploratory evolutionary computation, followed by the presentation of two case studies focusing on the exemplary development of spatial envelope morphologies and urban block morphologies.
Analysis of dissection algorithms for vector computers
NASA Technical Reports Server (NTRS)
George, A.; Poole, W. G., Jr.; Voigt, R. G.
1978-01-01
Recently two dissection algorithms (one-way and incomplete nested dissection) have been developed for solving the sparse positive definite linear systems arising from n by n grid problems. Concurrently, vector computers (such as the CDC STAR-100 and TI ASC) have been developed for large scientific applications. An analysis of the use of dissection algorithms on vector computers dictates that vectors of maximum length be utilized, thereby implying little or no dissection; on the other hand, minimizing operation counts suggests that considerable dissection be performed. In this paper we discuss the resolution of this conflict by minimizing the total time required by vectorized versions of the two algorithms.
Fault Tolerant Statistical Signal Processing Algorithms for Parallel Architectures.
2014-09-26
AD-fi57 393. Fault Tolerant Statistical Signal Processing Algorithms for Parallel Architectures. Johns Hopkins Univ., Baltimore, MD, Dept. of Electrical... Technical report. Keywords: Fault Tolerance, Signal Processing, Parallel Architecture.
A Parallel Saturation Algorithm on Shared Memory Architectures
NASA Technical Reports Server (NTRS)
Ezekiel, Jonathan; Siminiceanu
2007-01-01
Symbolic state-space generators are notoriously hard to parallelize. However, the Saturation algorithm implemented in the SMART verification tool differs from other sequential symbolic state-space generators in that it exploits the locality of firing events in asynchronous system models. This paper explores whether event locality can be utilized to efficiently parallelize Saturation on shared-memory architectures. Conceptually, we propose to parallelize the firing of events within a decision diagram node, which is technically realized via a thread pool. We discuss the challenges involved in our parallel design and conduct experimental studies on its prototypical implementation. On a dual-processor dual core PC, our studies show speed-ups for several example models, e.g., of up to 50% for a Kanban model, when compared to running our algorithm only on a single core.
Resource Efficient Hardware Architecture for Fast Computation of Running Max/Min Filters
Torres-Huitzil, Cesar
2013-01-01
Running max/min filters on rectangular kernels are widely used in many digital signal and image processing applications. Filtering with a k × k kernel requires k² − 1 comparisons per sample for a direct implementation; thus, the cost grows quadratically with the kernel size k. Faster computations can be achieved by kernel decomposition and using constant time one-dimensional algorithms on custom hardware. This paper presents a hardware architecture for real-time computation of running max/min filters based on the van Herk/Gil-Werman (HGW) algorithm. The proposed architecture design uses fewer computation and memory resources than previously reported architectures when targeted to Field Programmable Gate Array (FPGA) devices. Implementation results show that the architecture is able to compute max/min filters, on 1024 × 1024 images with up to 255 × 255 kernels, in around 8.4 milliseconds, 120 frames per second, at a clock frequency of 250 MHz. The implementation is highly scalable for the kernel size with good performance/area tradeoff suitable for embedded applications. The applicability of the architecture is shown for local adaptive image thresholding. PMID:24288456
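As a software illustration of the one-dimensional HGW idea (not the paper's FPGA design), the sketch below computes a running maximum with per-block prefix and suffix maxima, so the work per sample does not grow with k. The function name and the choice to return only fully covered windows are assumptions made for this example.

```python
def running_max_hgw(x, k):
    """Max of every window x[i:i+k] using van Herk/Gil-Werman block maxima."""
    n = len(x)
    if k < 1 or k > n:
        raise ValueError("window size must satisfy 1 <= k <= len(x)")
    neg_inf = float("-inf")
    # Pad to a multiple of k so that every block is complete.
    padded = list(x) + [neg_inf] * ((-n) % k)
    m = len(padded)
    g = padded[:]  # g[j]: max from the start of j's block up to j
    h = padded[:]  # h[j]: max from j up to the end of j's block
    for j in range(m):
        if j % k != 0:
            g[j] = max(g[j - 1], padded[j])
    for j in range(m - 1, -1, -1):
        if j % k != k - 1:
            h[j] = max(h[j + 1], padded[j])
    # A window of length k spans at most two blocks: a suffix of one, a prefix of the next.
    return [max(h[i], g[i + k - 1]) for i in range(n - k + 1)]
```

A running minimum follows by swapping `max` for `min` and padding with positive infinity. A k × k filter can then be obtained by applying the 1D pass along rows and afterwards along columns, which is the kernel decomposition the abstract refers to.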
Computer algorithms to detect bloodstream infections.
Trick, William E; Zagorski, Brandon M; Tokars, Jerome I; Vernon, Michael O; Welbel, Sharon F; Wisniewski, Mary F; Richards, Chesley; Weinstein, Robert A
2004-09-01
We compared manual and computer-assisted bloodstream infection surveillance for adult inpatients at two hospitals. We identified hospital-acquired, primary, central-venous catheter (CVC)-associated bloodstream infections by using five methods: retrospective, manual record review by investigators; prospective, manual review by infection control professionals; positive blood culture plus manual CVC determination; computer algorithms; and computer algorithms and manual CVC determination. We calculated sensitivity, specificity, predictive values, plus the kappa statistic (kappa) between investigator review and other methods, and we correlated infection rates for seven units. The kappa value was 0.37 for infection control review, 0.48 for positive blood culture plus manual CVC determination, 0.49 for computer algorithm, and 0.73 for computer algorithm plus manual CVC determination. Unit-specific infection rates, per 1,000 patient days, were 1.0-12.5 by investigator review and 1.4-10.2 by computer algorithm (correlation r = 0.91, p = 0.004). Automated bloodstream infection surveillance with electronic data is an accurate alternative to surveillance with manually collected data.
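The kappa statistic quoted above measures agreement between two classification methods beyond what chance would produce. A minimal sketch of the standard Cohen's kappa for a 2 × 2 agreement table follows; the argument names are illustrative, and any counts passed to it here are hypothetical rather than data from the study.

```python
def cohen_kappa(both_pos, a_only, b_only, both_neg):
    """Cohen's kappa for two raters (A and B) making yes/no calls on the same cases."""
    total = both_pos + a_only + b_only + both_neg
    observed = (both_pos + both_neg) / total          # fraction of cases where A and B agree
    a_pos = (both_pos + a_only) / total               # how often A says "infection"
    b_pos = (both_pos + b_only) / total               # how often B says "infection"
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)  # agreement expected by chance
    return (observed - expected) / (1 - expected)
```

The function is undefined when chance agreement equals 1 (for example, when both methods label every case the same way), so callers would need to handle that case separately.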
Implementation of an efficient labeling algorithm on a pipelined architecture
NASA Astrophysics Data System (ADS)
Olsson, Olof J.; Penman, David W.
1992-11-01
This paper describes an efficient approach, developed by the authors, for labelling images using a combination of pipeline (Datacube) and host (general purpose computer) processing. The output of the algorithm is a coordinate list of labelled object pixels that facilitates further high level operations.
Parallel and Distributed Computing Combinatorial Algorithms
1993-10-01
PARALLEL AND DISTRIBUTED COMPUTING COMBINATORIAL ALGORITHMS. Author: Dr. Leighton. 2304/DS F49620-92-J-0125...on several problems involving parallel and distributed computing and combinatorial optimization. This research is reported in the numerous papers that...network decomposition. In Proceedings of the Eleventh Annual ACM Symposium on Principles of Distributed Computing, August 1992. [15] B. Awerbuch, B
An improved spectral graph partitioning algorithm for mapping parallel computations
Hendrickson, B.; Leland, R.
1992-09-01
Efficient use of a distributed memory parallel computer requires that the computational load be balanced across processors in a way that minimizes interprocessor communication. We present a new domain mapping algorithm that extends recent work in which ideas from spectral graph theory have been applied to this problem. Our generalization of spectral graph bisection involves a novel use of multiple eigenvectors to allow for division of a computation into four or eight parts at each stage of a recursive decomposition. The resulting method is suitable for scientific computations like irregular finite elements or differences performed on hypercube or mesh architecture machines. Experimental results confirm that the new method provides better decompositions arrived at more economically and robustly than with previous spectral methods. We have also improved upon the known spectral lower bound for graph bisection.
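For orientation, here is a minimal sketch of plain spectral bisection, the single-eigenvector method that the paper generalizes; it does not implement the authors' multi-eigenvector quadrisection or octasection. It assumes a connected, undirected graph given as adjacency lists, and it approximates the Fiedler vector with a shifted power iteration chosen only to keep the example free of external libraries.

```python
def fiedler_vector(adj, iterations=500):
    """Approximate the Laplacian eigenvector of the second-smallest eigenvalue."""
    n = len(adj)
    degree = [len(neigh) for neigh in adj]
    shift = 2 * max(degree)  # upper bound on the largest Laplacian eigenvalue
    v = [float(i) for i in range(n)]  # arbitrary deterministic start vector
    for _ in range(iterations):
        mean = sum(v) / n
        v = [vi - mean for vi in v]  # project out the constant (eigenvalue 0) vector
        # w = (shift*I - L) v, where (L v)[i] = degree[i]*v[i] - sum of neighbour values
        w = [(shift - degree[i]) * v[i] + sum(v[j] for j in adj[i]) for i in range(n)]
        norm = sum(wi * wi for wi in w) ** 0.5
        v = [wi / norm for wi in w]
    return v


def spectral_bisection(adj):
    """Split the vertices into two halves at the median Fiedler-vector entry."""
    v = fiedler_vector(adj)
    order = sorted(range(len(adj)), key=lambda i: v[i])
    half = len(adj) // 2
    return sorted(order[:half]), sorted(order[half:])
```

On two triangles joined by a single edge, `spectral_bisection([[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]])` separates the two triangles, cutting only the connecting edge. A production code would use a sparse eigensolver instead of power iteration.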
Integrated computer control system architectural overview
Van Arsdall, P.
1997-06-18
This overview introduces the NIF Integrated Control System (ICCS) architecture. The design is abstract to allow the construction of many similar applications from a common framework. This summary lays the essential foundation for understanding the model-based engineering approach used to execute the design.
Investigating Architectural Issues in Neuromorphic Computing
2009-06-01
approaching other difficult to scale applications like Parallel Discrete Event Simulation (PDES). PDES applications are models of physical processes...architectures with the need to communicate events to all affected elements within the simulation. PDES applications typically do not scale well...dendrites with axons at junctures called synapses. Neurons produce electrical signals along these pathways. The signals may either excite or inhibit
Distributed computing architecture for image-based wavefront sensing and 2D FFTs
NASA Astrophysics Data System (ADS)
Smith, Jeffrey S.; Dean, Bruce H.; Haghani, Shadan
2006-06-01
Image-based wavefront sensing provides significant advantages over interferometric-based wavefront sensors such as optical design simplicity and stability. However, the image-based approach is computationally intensive, and therefore, applications utilizing the image-based approach gain substantial benefits using specialized high-performance computing architectures. The development and testing of these computing architectures are essential to missions such as James Webb Space Telescope (JWST), Terrestrial Planet Finder-Coronagraph (TPF-C and CorSpec), and the Spherical Primary Optical Telescope (SPOT). The algorithms implemented on these specialized computing architectures make use of numerous two-dimensional Fast Fourier Transforms (FFTs) which necessitate an all-to-all communication when applied on a distributed computational architecture. Several solutions for distributed computing are presented with an emphasis on a 64 Node cluster of digital signal processors (DSPs) and multiple DSP field programmable gate arrays (FPGAs), offering a novel application of low-diameter graph theory. Timing results and performance analysis are presented. The solutions offered could be applied to other computationally complex all-to-all communication problems.
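The all-to-all requirement comes from the usual row-column decomposition of a 2D transform: after every row has been transformed, each column needs one element from every row. The sketch below shows that structure on a single machine, assuming a naive O(n²) DFT in place of a real FFT and a plain transpose standing in for the inter-node exchange; none of the names come from the paper.

```python
import cmath


def dft(row):
    """Naive 1D discrete Fourier transform (stand-in for an FFT)."""
    n = len(row)
    return [
        sum(row[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def transpose(matrix):
    """Swap rows and columns; on a cluster this is the all-to-all exchange."""
    return [list(col) for col in zip(*matrix)]


def fft2_row_column(image):
    rows_done = [dft(r) for r in image]      # each node transforms the rows it owns
    exchanged = transpose(rows_done)         # every node needs data from every other node
    cols_done = [dft(c) for c in exchanged]  # rows of the transpose are the original columns
    return transpose(cols_done)              # restore the original orientation
```

In a distributed setting each node would hold a band of rows, so the `transpose` step is where the interconnect topology matters most.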
Computer Architecture. (Latest Citations from the Aerospace Database)
NASA Technical Reports Server (NTRS)
1996-01-01
The bibliography contains citations concerning research and development in the field of computer architecture. Design of computer systems, microcomputer components, and digital networks are among the topics discussed. Multimicroprocessor system performance, software development, and aerospace avionics applications are also included. (Contains 50-250 citations and includes a subject term index and title list.)
The Contribution of Visualization to Learning Computer Architecture
ERIC Educational Resources Information Center
Yehezkel, Cecile; Ben-Ari, Mordechai; Dreyfus, Tommy
2007-01-01
This paper describes a visualization environment and associated learning activities designed to improve learning of computer architecture. The environment, EasyCPU, displays a model of the components of a computer and the dynamic processes involved in program execution. We present the results of a research program that analysed the contribution of…
Distributed Computing Environment: An Architecture For Supporting Change?
1995-11-01
Distributed Computing Environment (DCE) has been in development for about five years but has only been widely used in the last two years. It consists...these services form an architecture for distributed computing that enables users to carry out the new, cheaper operations they require with the
Middleware in Modern High Performance Computing System Architectures
Engelmann, Christian; Ong, Hong Hoe; Scott, Stephen L
2007-01-01
A recent trend in modern high performance computing (HPC) system architectures employs "lean" compute nodes running a lightweight operating system (OS). Certain parts of the OS as well as other system software services are moved to service nodes in order to increase performance and scalability. This paper examines the impact of this HPC system architecture trend on HPC "middleware" software solutions, which traditionally equip HPC systems with advanced features, such as parallel and distributed programming models, appropriate system resource management mechanisms, remote application steering and user interaction techniques. Since the approach of keeping the compute node software stack small and simple is orthogonal to the middleware concept of adding missing OS features between OS and application, the role and architecture of middleware in modern HPC systems needs to be revisited. The result is a paradigm shift in HPC middleware design, where single middleware services are moved to service nodes, while runtime environments (RTEs) continue to reside on compute nodes.
A Simple Physical Optics Algorithm Perfect for Parallel Computing
NASA Technical Reports Server (NTRS)
Imbriale, W. A.; Cwik, T.
1993-01-01
One of the simplest reflector antenna computer programs is based upon a discrete approximation of the radiation integral. This calculation replaces the actual reflector surface with a triangular facet representation so that the reflector resembles a geodesic dome. The Physical Optics (PO) current is assumed to be constant in magnitude and phase over each facet so the radiation integral is reduced to a simple summation. This program has proven to be surprisingly robust and useful for the analysis of arbitrary reflectors, particularly when the near-field is desired and surface derivatives are not known. Because of its simplicity, the algorithm has proven to be extremely easy to adapt to the parallel computing architecture of a modest number of large-grain computing elements such as are used in the Intel iPSC and Touchstone Delta parallel machines.
JPRS Report. Science & Technology, Japan: Computer Architecture
2007-11-02
No 3, 1987, pp 650-651. [HIBI86] Information provided by Y. Hibino of NTT. [KNUT73] D.E. Knuth, "The Art of Computer Programming," Vol 3: Sorting...computation model, and have been engaged in the experimental generation of a neural network description language, a compiler and simulators and in...functions by simulation. For the simulations, we used simulators implemented by software on conventional types of computers (LISP machine, VAX
Clinical Decision Support Systems for Comorbidity: Architecture, Algorithms, and Applications
Fan, Aihua; Tang, Yu
2017-01-01
In this paper, we present the design of a clinical decision support system (CDSS) for monitoring comorbid conditions. Specifically, we address the architecture of a CDSS by characterizing it from three layers and discuss the algorithms in each layer. Also we address the applications of CDSSs in a few real scenarios and analyze the accuracy of a CDSS in consideration of the potential conflicts when using multiple clinical practice guidelines concurrently. Finally, we compare the system performance in our design with that in the other design schemes. Our study shows that our proposed design can achieve a clinical decision in a shorter time than the other designs, while ensuring a high level of system accuracy. PMID:28373881
Fault tolerant hypercube computer system architecture
NASA Technical Reports Server (NTRS)
Madan, Herb S. (Inventor); Chow, Edward (Inventor)
1989-01-01
A fault-tolerant multiprocessor computer system of the hypercube type comprising a hierarchy of computers of like kind which can be functionally substituted for one another as necessary is disclosed. Communication between the working nodes is via one communications network while communications between the working nodes and watch dog nodes and load balancing nodes higher in the structure is via another communications network separate from the first. A typical branch of the hierarchy reporting to a master node or host computer comprises, a plurality of first computing nodes; a first network of message conducting paths for interconnecting the first computing nodes as a hypercube. The first network provides a path for message transfer between the first computing nodes; a first watch dog node; and a second network of message connecting paths for connecting the first computing nodes to the first watch dog node independent from the first network, the second network provides an independent path for test message and reconfiguration affecting transfers between the first computing nodes and the first switch watch dog node. There is additionally, a plurality of second computing nodes; a third network of message conducting paths for interconnecting the second computing nodes as a hypercube. The third network provides a path for message transfer between the second computing nodes; a fourth network of message conducting paths for connecting the second computing nodes to the first watch dog node independent from the third network. The fourth network provides an independent path for test message and reconfiguration affecting transfers between the second computing nodes and the first watch dog node; and a first multiplexer disposed between the first watch dog node and the second and fourth networks for allowing the first watch dog node to selectively communicate with individual ones of the computing nodes through the second and fourth networks; as well as, a second watch dog node
CUDA optimization strategies for compute- and memory-bound neuroimaging algorithms.
Lee, Daren; Dinov, Ivo; Dong, Bin; Gutman, Boris; Yanovsky, Igor; Toga, Arthur W
2012-06-01
As neuroimaging algorithms and technology continue to grow faster than CPU performance in complexity and image resolution, data-parallel computing methods will be increasingly important. The high performance, data-parallel architecture of modern graphical processing units (GPUs) can reduce computational times by orders of magnitude. However, its massively threaded architecture introduces challenges when GPU resources are exceeded. This paper presents optimization strategies for compute- and memory-bound algorithms for the CUDA architecture. For compute-bound algorithms, the registers are reduced through variable reuse via shared memory and the data throughput is increased through heavier thread workloads and maximizing the thread configuration for a single thread block per multiprocessor. For memory-bound algorithms, fitting the data into the fast but limited GPU resources is achieved through reorganizing the data into self-contained structures and employing a multi-pass approach. Memory latencies are reduced by selecting memory resources whose cache performance is optimized for the algorithm's access patterns. We demonstrate the strategies on two computationally expensive algorithms and achieve optimized GPU implementations that perform up to 6× faster than unoptimized ones. Compared to CPU implementations, we achieve peak GPU speedups of 129× for the 3D unbiased nonlinear image registration technique and 93× for the non-local means surface denoising algorithm.
An architecture for a wafer-scale-implemented MIMD parallel computer
Wang, Chiajiu.
1988-01-01
In this dissertation, a general-purpose parallel computer architecture is proposed and studied. The proposed architecture, called the modified mesh-connected parallel computer (MMCPC), is obtained by enhancing a mesh-connected parallel computer with row buses and column buses. The MMCPC is a multiple instruction multiple data parallel machine. Because of the regular structure and distributed control mechanisms, the MMCPC is suitable for VLSI or WSI implementation. The bus structure of the MMCPC lends itself to configurability and fault tolerance. The MMCPC can be logically configured as a number of different parallel computer topologies. The MMCPC can tolerate as many faulty PE's, located randomly, as there are available spares, resulting in 100% redundancy utilization. The performance of the MMCPC was analyzed by applying a generalized stochastic Petri net (GSPN) graph to the MMCPC. The GSPN performance modeling results show a need for a new processing element (PE). A new PE architecture, able to handle data processing and message passing concurrently, is proposed and the silicon overhead is estimated in comparison with transputer-like PE's. Based upon the proposed PE, optimum sizes of the MMCPC for different program structures are derived. Two routing algorithms for the MMCPC were proposed and studied. Routing analysis was carried out through simulation. The simulation results show that the dynamic routing algorithm outperforms the deterministic routing algorithm.
Architecture independent environment for developing engineering software on MIMD computers
NASA Technical Reports Server (NTRS)
Valimohamed, Karim A.; Lopez, L. A.
1990-01-01
Engineers are constantly faced with solving problems of increasing complexity and detail. Multiple Instruction stream Multiple Data stream (MIMD) computers have been developed to overcome the performance limitations of serial computers. The hardware architectures of MIMD computers vary considerably and are much more sophisticated than serial computers. Developing large scale software for a variety of MIMD computers is difficult and expensive. There is a need to provide tools that facilitate programming these machines. First, the issues that must be considered to develop those tools are examined. The two main areas of concern were architecture independence and data management. Architecture independent software facilitates software portability and improves the longevity and utility of the software product. It provides some form of insurance for the investment of time and effort that goes into developing the software. The management of data is a crucial aspect of solving large engineering problems. It must be considered in light of the new hardware organizations that are available. Second, the functional design and implementation of a software environment that facilitates developing architecture independent software for large engineering applications are described. The topics of discussion include: a description of the model that supports the development of architecture independent software; identifying and exploiting concurrency within the application program; data coherence; engineering data base and memory management.
Problem Solving with Generic Algorithms and Computers.
ERIC Educational Resources Information Center
Larson, Jay
Success in using a computer in education as a problem-solving tool requires a change in the way of thinking or of approaching a problem. An algorithm, i.e., a finite step-by-step solution to a problem, can be designed around the data processing concepts of input, processing, and output to provide a basis for classifying problems. If educators…
Heavy Lift Vehicle (HLV) Avionics Flight Computing Architecture Study
NASA Technical Reports Server (NTRS)
Hodson, Robert F.; Chen, Yuan; Morgan, Dwayne R.; Butler, A. Marc; Sdhuh, Joseph M.; Petelle, Jennifer K.; Gwaltney, David A.; Coe, Lisa D.; Koelbl, Terry G.; Nguyen, Hai D.
2011-01-01
A NASA multi-Center study team was assembled from LaRC, MSFC, KSC, JSC and WFF to examine potential flight computing architectures for a Heavy Lift Vehicle (HLV) to better understand avionics drivers. The study examined Design Reference Missions (DRMs) and vehicle requirements that could impact the vehicle's avionics. The study considered multiple self-checking and voting architectural variants and examined reliability, fault-tolerance, mass, power, and redundancy management impacts. Furthermore, a goal of the study was to develop the skills and tools needed to rapidly assess additional architectures should requirements or assumptions change.
Implementation of the FDK algorithm for cone-beam CT on the cell broadband engine architecture
NASA Astrophysics Data System (ADS)
Scherl, Holger; Koerner, Mario; Hofmann, Hannes; Eckert, Wieland; Kowarschik, Markus; Hornegger, Joachim
2007-03-01
In most of today's commercially available cone-beam CT scanners, the well known FDK method is used for solving the 3D reconstruction task. The computational complexity of this algorithm prohibits its use for many medical applications without hardware acceleration. The brand-new Cell Broadband Engine Architecture (CBEA) with its high level of parallelism is a cost-efficient processor for performing the FDK reconstruction according to the medical requirements. The programming scheme, however, is quite different from that of standard personal computer hardware. In this paper, we present an innovative implementation of the most time-consuming parts of the FDK algorithm: filtering and back-projection. We also explain the required transformations to parallelize the algorithm for the CBEA. Our software framework allows the filtering and back-projection to be computed in parallel, making it possible to do an on-the-fly reconstruction. The achieved results demonstrate that a complete FDK reconstruction is computed with the CBEA in less than seven seconds for a standard clinical scenario. Given the fact that scan times are usually much higher, we conclude that reconstruction is finished right after the end of data acquisition. This enables us to present the reconstructed volume to the physician in real-time, immediately after the last projection image has been acquired by the scanning device.
Problems Related to Parallelization of CFD Algorithms on GPU, Multi-GPU and Hybrid Architectures
NASA Astrophysics Data System (ADS)
Błażewicz, Marek; Kurowski, Krzysztof; Ludwiczak, Bogdan; Napierała, Krystyna
2010-09-01
Computational Fluid Dynamics (CFD) is one of the branches of fluid mechanics, which uses numerical methods and algorithms to solve and analyze fluid flows. CFD is used in various domains, such as oil and gas reservoir uncertainty analysis, aerodynamic body shape optimization (e.g. planes, cars, ships, sport helmets, skis), natural phenomena analysis, numerical simulation for weather forecasting or realistic visualizations. CFD problems are very complex and need a lot of computational power to produce results in a reasonable time. We have implemented a parallel application for two-dimensional CFD simulation with a free surface approximation (MAC method) using new hardware architectures, in particular multi-GPU and hybrid computing environments. For this purpose we decided to use NVIDIA graphics cards with the CUDA environment due to its simplicity of programming and good computational performance. We used finite difference discretization of Navier-Stokes equations, where fluid is propagated over an Eulerian grid. In this model, the behavior of the fluid inside a cell depends only on the properties of local, surrounding cells; therefore it is well suited for the GPU-based architecture. In this paper we demonstrate how to efficiently use the computing power of GPUs for CFD. Additionally, we present some best practices to help users analyze and improve the performance of CFD applications executed on GPU. Finally, we discuss various challenges around the multi-GPU implementation on the example of matrix multiplication.
Stencil Computation Optimization and Auto-tuning on State-of-the-Art Multicore Architectures
Datta, Kaushik; Murphy, Mark; Volkov, Vasily; Williams, Samuel; Carter, Jonathan; Oliker, Leonid; Patterson, David; Shalf, John; Yelick, Katherine
2008-08-22
Understanding the most efficient design and utilization of emerging multicore systems is one of the most challenging questions faced by the mainstream and scientific computing industries in several decades. Our work explores multicore stencil (nearest-neighbor) computations -- a class of algorithms at the heart of many structured grid codes, including PDE solvers. We develop a number of effective optimization strategies, and build an auto-tuning environment that searches over our optimizations and their parameters to minimize runtime, while maximizing performance portability. To evaluate the effectiveness of these strategies we explore the broadest set of multicore architectures in the current HPC literature, including the Intel Clovertown, AMD Barcelona, Sun Victoria Falls, IBM QS22 PowerXCell 8i, and NVIDIA GTX280. Overall, our auto-tuning optimization methodology results in the fastest multicore stencil performance to date. Finally, we present several key insights into the architectural trade-offs of emerging multicore designs and their implications on scientific algorithm development.
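To make the terms concrete, the sketch below shows a 5-point Jacobi stencil sweep with a tunable block size, plus a tiny search loop that times each candidate and keeps the fastest. It is an illustrative assumption about what a stencil kernel and an auto-tuner look like in miniature, not the authors' framework; in pure Python the blocking will not reproduce the cache effects that motivate it on real multicore hardware.

```python
import time


def jacobi_sweep_blocked(grid, block):
    """One 5-point Jacobi sweep over the interior, traversed in block x block tiles."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # boundary values are carried over unchanged
    for i0 in range(1, rows - 1, block):
        for j0 in range(1, cols - 1, block):
            for i in range(i0, min(i0 + block, rows - 1)):
                for j in range(j0, min(j0 + block, cols - 1)):
                    # Each cell depends only on its four nearest neighbours.
                    new[i][j] = 0.25 * (
                        grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1] + grid[i][j + 1]
                    )
    return new


def autotune(grid, candidates, repeats=3):
    """Time each candidate block size and return the fastest one."""
    best_block, best_time = None, float("inf")
    for block in candidates:
        start = time.perf_counter()
        for _ in range(repeats):
            jacobi_sweep_blocked(grid, block)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_block, best_time = block, elapsed
    return best_block
```

Because the tuner measures rather than models performance, the same search can be rerun unchanged on a different machine, which is the sense in which auto-tuning supports performance portability.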
Aerodynamic optimization studies on advanced architecture computers
NASA Technical Reports Server (NTRS)
Chawla, Kalpana
1995-01-01
The approach to carrying out multi-discipline aerospace design studies in the future, especially in massively parallel computing environments, consists of choosing (1) suitable solvers to compute solutions to equations characterizing a discipline, and (2) efficient optimization methods. In addition, for aerodynamic optimization problems, (3) smart methodologies must be selected to modify the surface shape. In this research effort, a 'direct' optimization method is implemented on the Cray C-90 to improve aerodynamic design. It is coupled with an existing implicit Navier-Stokes solver, OVERFLOW, to compute flow solutions. The optimization method is chosen such that it can accommodate multi-discipline optimization in future computations. In this work, however, only single discipline aerodynamic optimization will be included.
Computational algorithms for simulations in atmospheric optics.
Konyaev, P A; Lukin, V P
2016-04-20
A computer simulation technique for atmospheric and adaptive optics based on parallel programming is discussed. A parallel propagation algorithm is designed and a modified spectral-phase method for computer generation of 2D time-variant random fields is developed. Temporal power spectra of Laguerre-Gaussian beam fluctuations are considered as an example to illustrate the applications discussed. Implementation of the proposed algorithms using Intel MKL and IPP libraries and NVIDIA CUDA technology is shown to be very fast and accurate. The hardware system for the computer simulation is an off-the-shelf desktop with an Intel Core i7-4790K CPU operating at a turbo-speed frequency up to 5 GHz and an NVIDIA GeForce GTX-960 graphics accelerator with 1024 1.5 GHz processors.
Algorithm-dependent fault tolerance for distributed computing
P. D. Hough; M. E. Goldsby; E. J. Walsh
2000-02-01
Large-scale distributed systems assembled from commodity parts, like CPlant, have become common tools in the distributed computing world. Because of their size and diversity of parts, these systems are prone to failures. Applications that are being run on these systems have not been equipped to efficiently deal with failures, nor is there vendor support for fault tolerance. Thus, when a failure occurs, the application crashes. While most programmers make use of checkpoints to allow for restarting of their applications, this is cumbersome and incurs substantial overhead. In many cases, there are more efficient and more elegant ways in which to address failures. The goal of this project is to develop a software architecture for the detection of and recovery from faults in a cluster computing environment. The detection phase relies on the latest techniques developed in the fault tolerance community. Recovery is being addressed in an application-dependent manner, thus allowing the programmer to take advantage of algorithmic characteristics to reduce the overhead of fault tolerance. This architecture will allow large-scale applications to be more robust in high-performance computing environments that are comprised of clusters of commodity computers such as CPlant and SMP clusters.
LINCS: Livermore's network architecture. [Octopus computing network]
Fletcher, J.G.
1982-01-01
Octopus, a local computing network that has been evolving at the Lawrence Livermore National Laboratory for over fifteen years, is currently undergoing a major revision. The primary purpose of the revision is to consolidate and redefine the variety of conventions and formats, which have grown up over the years, into a single standard family of protocols, the Livermore Interactive Network Communication Standard (LINCS). This standard treats the entire network as a single distributed operating system such that access to a computing resource is obtained in a single way, whether that resource is local (on the same computer as the accessing process) or remote (on another computer). LINCS encompasses not only communication but also such issues as the relationship of customer to server processes and the structure, naming, and protection of resources. The discussion includes: an overview of the Livermore user community and computing hardware, the functions and structure of each of the seven layers of LINCS protocol, the reasons why we have designed our own protocols and why we are dissatisfied by the directions that current protocol standards are taking.
Panel on future directions in parallel computer architecture
VanTilborg, A.M.
1989-06-01
One of the program highlights of the 15th Annual International Symposium on Computer Architecture, held May 30 - June 2, 1988 in Honolulu, was a panel session on future directions in parallel computer architecture. The panel was organized and chaired by the author, and was comprised of Prof. Jack Dennis (NASA Ames Research Institute for Advanced Computer Science), Prof. H.T. Kung (Carnegie Mellon), and Dr. Burton Smith (Tera Computer Company). The objective of the panel was to identify the likely trajectory of future parallel computer system progress, particularly from the standpoint of marketplace acceptance. Approximately 250 attendees participated in the session, in which each panelist began with a ten minute viewgraph explanation of his views, followed by an open and sometimes lively exchange with the audience and fellow panelists. The session ran for ninety minutes.
Algorithms for the Computation of Debris Risks
NASA Technical Reports Server (NTRS)
Matney, Mark
2017-01-01
Determining the risks from space debris involves a number of statistical calculations. These calculations inevitably involve assumptions about geometry, including the physical geometry of orbits and the geometry of non-spherical satellites. A number of tools have been developed in NASA's Orbital Debris Program Office to handle these calculations, many of which have never been published before. These include algorithms that are used in NASA's Orbital Debris Engineering Model ORDEM 3.0, as well as other tools useful for computing orbital collision rates and ground casualty risks. This paper will present an introduction to these algorithms and the assumptions upon which they are based.
Neuromorphic Computing – From Materials Research to Systems Architecture Roundtable
Schuller, Ivan K.; Stevens, Rick; Pino, Robinson; Pechan, Michael
2015-10-29
Computation in its many forms is the engine that fuels our modern civilization. Modern computation, based on the von Neumann architecture, has allowed, until now, the development of continuous improvements, as predicted by Moore’s law. However, computation using current architectures and materials will inevitably, within the next 10 years, reach a limit because of fundamental scientific reasons. DOE convened a roundtable of experts in neuromorphic computing systems, materials science, and computer science in Washington on October 29-30, 2015 to address the following basic questions: Can brain-like (“neuromorphic”) computing devices based on new material concepts and systems be developed to dramatically outperform conventional CMOS based technology? If so, what are the basic research challenges for materials science and computing? The overarching answer that emerged was: The development of novel functional materials and devices incorporated into unique architectures will allow a revolutionary technological leap toward the implementation of a fully “neuromorphic” computer. To address this challenge, the following issues were considered: (1) the main differences between neuromorphic and conventional computing as related to signaling models, timing/clock, non-volatile memory, architecture, fault tolerance, integrated memory and compute, noise tolerance, analog vs. digital, and in situ learning; (2) new neuromorphic architectures needed to produce lower energy consumption, potential novel nanostructured materials, and enhanced computation; (3) device and materials properties needed to implement functions such as hysteresis, stability, and fault tolerance; and (4) comparisons of different implementations (spin torque, memristors, resistive switching, phase change, and optical schemes) for enhanced breakthroughs in performance, cost, fault tolerance, and/or manufacturability.
Novel Architectures and Devices for Computing
NASA Astrophysics Data System (ADS)
Waugh, Frederick Rogers
1995-01-01
This thesis explores some of the more unusual architectures and devices being considered today as the basis for information processing, emphasizing architectures that are highly parallel and devices that are extremely small compared to current standards. The first part of this thesis theoretically and numerically analyzes analog electronic neural networks in which competition within neuron clusters leads to pattern classification and feature extraction abilities. Global stability theorems, derived using a Liapunov approach, provide general guidelines for network design and operation. The theorems state that with continuous-time updating, competitive networks converge only to fixed points, while with discrete-time, parallel updating, they converge to either fixed points or period-two limit cycles. A stability criterion guarantees that discrete-time networks converge only to fixed points when a quantity related to the neuron gain, or transfer function slope, is sufficiently small. A set of analytical phase diagrams for competitive associative memories is derived using a combination of statistical mechanics and nonlinear dynamics. The diagrams classify attractor types as a function of pattern storage fraction and neuron gain. Numerical tests agree well with the diagrams. Analog annealing, a technique for improving network performance by reducing neuron gain, is shown to improve performance in an analog associative memory by dramatically reducing the number of fixed points. The number of fixed points decreases exponentially with network size with a scaling exponent that decreases with neuron gain. Numerical data based on fixed-point counts in small networks support the results. The second part of this thesis discusses low-temperature tunneling measurements at zero magnetic field through double and triple quantum dots with adjustable inter-dot coupling, fabricated in a GaAs/AlGaAs heterostructure. The devices have capacitances so small that the charging energy of
LIBRA: A high-performance balanced computer architecture for Prolog
Mills, J.W.
1988-01-01
Four reduced-instruction-set computer (RISC) architectures for Prolog are presented: the Simple Abstract Machine (SAM), the Logic Programming Windowed RISC I (LOW RISC I), the LOW RISC II, and the Logical Inference Balanced RISC Architecture (LIBRA). An informal methodology for the semantic-based design of computer architectures relates the design of each architecture to its predecessor. The suitability of each architecture for Prolog is evaluated using macro expansions for each WAM instruction, from which execution speed, code density, memory usage, branch frequency, standard logical inferences per second, benchmark logical inferences per second and the semantic gap of each architecture relative to Prolog are calculated. The final design, the LIBRA, is 2.3 times as fast as the Berkeley PLM without interleaved memory, and 15 times as fast with eight-way instruction and data memory interleaving, reaching an estimated execution speed of 7.5 million standard logical inferences per second. The LIBRA's performance is due to parallelized tag and data operations, pipelining, reduced branch frequency, and complex single-cycle instructions.
Advanced Computing Architectures for Cognitive Processing
2009-07-01
AND IS APPROVED FOR PUBLICATION IN ACCORDANCE WITH ASSIGNED DISTRIBUTION STATEMENT. FOR THE DIRECTOR: / s ... s / LOK YAN EDWARD J. JONES, Deputy Chief Work Unit Manager Advanced Computing Division...ELEMENT NUMBER 62702F 6. AUTHOR( S ) Gregory D. Peterson 5d. PROJECT NUMBER 459T 5e. TASK NUMBER AC 5f. WORK UNIT NUMBER CP 7. PERFORMING
On the impact of approximate computation in an analog DeSTIN architecture.
Young, Steven; Lu, Junjie; Holleman, Jeremy; Arel, Itamar
2014-05-01
Deep machine learning (DML) holds the potential to revolutionize machine learning by automating rich feature extraction, which has become the primary bottleneck of human engineering in pattern recognition systems. However, the heavy computational burden renders DML systems implemented on conventional digital processors impractical for large-scale problems. The highly parallel computations required to implement large-scale deep learning systems are well suited to custom hardware. Analog computation has demonstrated power efficiency advantages of multiple orders of magnitude relative to digital systems while performing nonideal computations. In this paper, we investigate typical error sources introduced by analog computational elements and their impact on system-level performance in DeSTIN--a compositional deep learning architecture. These inaccuracies are evaluated on a pattern classification benchmark, clearly demonstrating the robustness of the underlying algorithm to the errors introduced by analog computational elements. A clear understanding of the impacts of nonideal computations is necessary to fully exploit the efficiency of analog circuits.
Gálvez, Sergio; Ferusic, Adis; Esteban, Francisco J; Hernández, Pilar; Caballero, Juan A; Dorado, Gabriel
2016-10-01
The Smith-Waterman algorithm has great sensitivity when used for biological sequence-database searches, but at the expense of high computing-power requirements. To overcome this problem, there are implementations in the literature that exploit the different hardware architectures available in a standard PC, such as GPU, CPU, and coprocessors. We introduce an application that splits the original database-search problem into smaller parts, resolves each of them by executing the most efficient implementations of the Smith-Waterman algorithm on different hardware architectures, and finally unifies the generated results. Using non-overlapping hardware allows simultaneous execution, and up to 2.58-fold performance gain, when compared with any other algorithm to search sequence databases. Even the performance of the popular BLAST heuristic is exceeded in 78% of the tests. The application has been tested with standard hardware: Intel i7-4820K CPU, Intel Xeon Phi 31S1P coprocessors, and nVidia GeForce GTX 960 graphics cards. An important increase in performance has been obtained in a wide range of situations, effectively exploiting the available hardware.
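For readers unfamiliar with the kernel being distributed across devices, here is a minimal serial sketch of the Smith-Waterman local-alignment score. The scoring values (`match`, `mismatch`) and the linear `gap` penalty are assumptions chosen for illustration; the implementations referred to above may use substitution matrices and affine gaps.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score of strings a and b (linear gap penalty).

    Keeps only two rows of the dynamic-programming table, since the
    score of a cell depends on its left, upper, and upper-left neighbors.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

Each query/database-sequence pair is scored independently, which is what makes it natural to split a database search into parts and run the parts on different devices.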
NASA Astrophysics Data System (ADS)
Nemes, Csaba; Barcza, Gergely; Nagy, Zoltán; Legeza, Örs; Szolgay, Péter
2014-06-01
In the numerical analysis of strongly correlated quantum lattice models one of the leading algorithms developed to balance the size of the effective Hilbert space and the accuracy of the simulation is the density matrix renormalization group (DMRG) algorithm, in which the run-time is dominated by the iterative diagonalization of the Hamilton operator. As the most time-dominant step of the diagonalization can be expressed as a list of dense matrix operations, the DMRG is an appealing candidate to fully utilize the computing power residing in novel kilo-processor architectures. In the paper a smart hybrid CPU-GPU implementation is presented, which exploits the power of both CPU and GPU and tolerates problems exceeding the GPU memory size. Furthermore, a new CUDA kernel has been designed for asymmetric matrix-vector multiplication to accelerate the rest of the diagonalization. Besides the evaluation of the GPU implementation, the practical limits of an FPGA implementation are also discussed.
Iterative algorithms for large sparse linear systems on parallel computers
NASA Technical Reports Server (NTRS)
Adams, L. M.
1982-01-01
Algorithms for assembling in parallel the sparse system of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering, are developed. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed and results of this model for the algorithms are given.
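A minimal serial sketch of a preconditioned conjugate gradient iteration follows, assuming a symmetric positive definite matrix stored densely and a simple diagonal (Jacobi) preconditioner. Both choices are assumptions made for brevity and are not taken from the report; the matrix-vector product and the dot products are the steps a parallel version would distribute.

```python
def matvec(A, x):
    """Dense matrix-vector product (the main parallelizable kernel)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A with Jacobi-preconditioned CG."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

A small test case is the tridiagonal matrix with 2 on the diagonal and -1 off the diagonal, which is the standard finite difference discretization of a 1D elliptic model problem.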
Computational plasticity algorithm for particle dynamics simulations
NASA Astrophysics Data System (ADS)
Krabbenhoft, K.; Lyamin, A. V.; Vignes, C.
2017-03-01
The problem of particle dynamics simulation is interpreted in the framework of computational plasticity leading to an algorithm which is mathematically indistinguishable from the common implicit scheme widely used in the finite element analysis of elastoplastic boundary value problems. This algorithm provides somewhat of a unification of two particle methods, the discrete element method and the contact dynamics method, which usually are thought of as being quite disparate. In particular, it is shown that the former appears as the special case where the time stepping is explicit while the use of implicit time stepping leads to the kind of schemes usually labelled contact dynamics methods. The framing of particle dynamics simulation within computational plasticity paves the way for new approaches similar (or identical) to those frequently employed in nonlinear finite element analysis. These include mixed implicit-explicit time stepping, dynamic relaxation and domain decomposition schemes.
Fast computation algorithms for speckle pattern simulation
Nascov, Victor; Samoilă, Cornel; Ursuţiu, Doru
2013-11-13
We present our development of a series of efficient computation algorithms, generally usable to calculate light diffraction and particularly for speckle pattern simulation. We use mainly the scalar diffraction theory in the form of the Rayleigh-Sommerfeld diffraction formula and its Fresnel approximation. Our algorithms are based on a special form of the convolution theorem and the Fast Fourier Transform. They are able to evaluate the diffraction formula much faster than by direct computation, and we have circumvented the restrictions regarding the relative sizes of the input and output domains that are encountered in commonly used procedures. Moreover, the input and output planes can be tilted with respect to each other and the output domain can be shifted off-axis.
Heterogeneous computer architecture for embedded real-time image interpretation
NASA Astrophysics Data System (ADS)
Salinger, Jeremy A.
1993-10-01
A heterogeneous parallel-processing computer architecture is being developed for embedded real-time interpretation of images and other data collected from sensors on mobile platforms. The Advanced Target Cueing and Recognition Engine (ATCURE) architecture includes specialized subsystems for input/output, image processing, numeric processing, and symbolic processing. Different specialization is provided for each subsystem to exploit distinctive demands for data storage, data representation, mixes of operations, and program control structures. The characteristics of each subsystem are described, with the Image Processing Subsystem (IPS) used to illustrate how the design is driven by careful analysis of current and projected computational requirements from many applications. These considerations led to a programming model for the Image Processing Subsystem in which images and their subsets are the fundamental unit of data. The processor implementation incorporates a scalable synchronous pipeline of processing elements that eliminates many of the bottlenecks found in MIMD and SIMD architectures.
New computer architectures as tools for ecological thought.
Villa, F
1992-06-01
Recent achievements of computer science provide unrivaled power for the advancement of ecology. This power is not merely computational: parallel computers, having hierarchical organization as their architectural principle, also provide metaphors for understanding complex systems. In this sense they might play for a science of ecological complexity a role like equilibrium-based metaphors had in the development of dynamic systems ecology. Parallel computers provide this opportunity through an informational view of ecological reality and multilevel modelling paradigms. Spatial and individual-oriented models allow application and full understanding of the new metaphors in the ecological context.
Pipelined CPU Design with FPGA in Teaching Computer Architecture
ERIC Educational Resources Information Center
Lee, Jong Hyuk; Lee, Seung Eun; Yu, Heon Chang; Suh, Taeweon
2012-01-01
This paper presents a pipelined CPU design project with a field programmable gate array (FPGA) system in a computer architecture course. The class project is a five-stage pipelined 32-bit MIPS design with experiments on the Altera DE2 board. For proper scheduling, milestones were set every one or two weeks to help students complete the project on…
VLSI architectures for computing multiplications and inverses in GF(2^m)
NASA Technical Reports Server (NTRS)
Wang, C. C.; Truong, T. K.; Shao, H. M.; Deutsch, L. J.; Omura, J. K.; Reed, I. S.
1983-01-01
Finite field arithmetic logic is central in the implementation of Reed-Solomon coders and in some cryptographic algorithms. There is a need for good multiplication and inversion algorithms that are easily realized on VLSI chips. Massey and Omura recently developed a new multiplication algorithm for Galois fields based on a normal basis representation. A pipeline structure is developed to realize the Massey-Omura multiplier in the finite field GF(2^m). With the simple squaring property of the normal-basis representation used together with this multiplier, a pipeline architecture is also developed for computing inverse elements in GF(2^m). The designs developed for the Massey-Omura multiplier and the computation of inverse elements are regular, simple, expandable and, therefore, naturally suitable for VLSI implementation.
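The inversion idea can be sketched in software as follows. Since every nonzero element a of GF(2^m) satisfies a^(2^m - 1) = 1, the inverse is a^(2^m - 2), which is the product of a^(2^i) for i = 1, ..., m-1, so inversion needs only repeated squarings and multiplications. For simplicity this sketch uses a polynomial basis for GF(2^4) with the irreducible polynomial x^4 + x + 1 rather than the normal basis of the paper (an assumption for illustration; in a normal basis the squaring step reduces to a cyclic shift).

```python
M = 4
POLY = 0b10011  # x^4 + x + 1, irreducible over GF(2); illustrative choice


def gf_mul(a, b):
    """Multiply two elements of GF(2^M), represented as M-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^M) is XOR
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= POLY            # reduce modulo the field polynomial
    return result


def gf_inv(a):
    """Inverse of a nonzero element as a^(2^M - 2), via repeated squaring."""
    if a == 0:
        raise ZeroDivisionError("zero has no multiplicative inverse")
    result = 1
    power = a
    for _ in range(M - 1):
        power = gf_mul(power, power)      # power = a^(2^i)
        result = gf_mul(result, power)    # accumulate a^(2 + 4 + ... + 2^i)
    return result
```

The loop performs M - 1 squarings and M - 1 multiplications, which matches the regular, repetitive structure that makes the computation attractive for a pipeline.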
Algorithms for parallel flow solvers on message passing architectures
NASA Technical Reports Server (NTRS)
Vanderwijngaart, Rob F.
1995-01-01
The purpose of this project has been to identify and test suitable technologies for implementation of fluid flow solvers -- possibly coupled with structures and heat equation solvers -- on MIMD parallel computers. In the course of this investigation much attention has been paid to efficient domain decomposition strategies for ADI-type algorithms. Multi-partitioning derives its efficiency from the assignment of several blocks of grid points to each processor in the parallel computer. A coarse-grain parallelism is obtained, and a near-perfect load balance results. In uni-partitioning every processor receives responsibility for exactly one block of grid points instead of several. This necessitates fine-grain pipelined program execution in order to obtain a reasonable load balance. Although fine-grain parallelism is less desirable on many systems, especially high-latency networks of workstations, uni-partition methods are still in wide use in production codes for flow problems. Consequently, it remains important to achieve good efficiency with this technique that has essentially been superseded by multi-partitioning for parallel ADI-type algorithms. Another reason for the concentration on improving the performance of pipeline methods is their applicability in other types of flow solver kernels with stronger implied data dependence. Analytical expressions can be derived for the size of the dynamic load imbalance incurred in traditional pipelines. From these it can be determined what is the optimal first-processor retardation that leads to the shortest total completion time for the pipeline process. Theoretical predictions of pipeline performance with and without optimization match experimental observations on the iPSC/860 very well. Analysis of pipeline performance also highlights the effect of careless grid partitioning in flow solvers that employ pipeline algorithms. If grid blocks at boundaries are not at least as large in the wall-normal direction as those
NASA Astrophysics Data System (ADS)
Shi, X.
2015-12-01
As the NSF has indicated, "Theory and experimentation have for centuries been regarded as two fundamental pillars of science. It is now widely recognized that computational and data-enabled science forms a critical third pillar." Geocomputation is the third pillar of GIScience and geosciences. With the exponential growth of geodata, the challenge of scalable and high performance computing for big data analytics becomes urgent because many research activities are constrained by software or tools that cannot even complete the computation process. Heterogeneous geodata integration and analytics obviously magnify the complexity and operational time frame. Many large-scale geospatial problems may not be processable at all if the computer system does not have sufficient memory or computational power. Emerging computer architectures, such as Intel's Many Integrated Core (MIC) Architecture and Graphics Processing Unit (GPU), and advanced computing technologies provide promising solutions to employ massive parallelism and hardware resources to achieve scalability and high performance for data intensive computing over large spatiotemporal and social media data. Exploring novel algorithms and deploying the solutions in massively parallel computing environments to achieve the capability for scalable data processing and analytics over large-scale, complex, and heterogeneous geodata with consistent quality and high performance has been the central theme of our research team in the Department of Geosciences at the University of Arkansas (UARK). New multi-core architectures combined with application accelerators hold the promise to achieve scalability and high performance by exploiting task and data levels of parallelism that are not supported by the conventional computing systems. Such a parallel or distributed computing environment is particularly suitable for large-scale geocomputation over big data as shown by our prior work, while the potential of such advanced
Newman, Aaron M; Cooper, James B
2007-01-01
Background: Biological sequence repeats arranged in tandem patterns are widespread in DNA and proteins. While many software tools have been designed to detect DNA tandem repeats (TRs), useful algorithms for identifying protein TRs with varied levels of degeneracy are still needed. Results: To address limitations of current repeat identification methods, and to provide an efficient and flexible algorithm for the detection and analysis of TRs in protein sequences, we designed and implemented a new computational method called XSTREAM. Running time tests confirm the practicality of XSTREAM for analyses of multi-genome datasets. Each of the key capabilities of XSTREAM (e.g., merging, nesting, long-period detection, and TR architecture modeling) is demonstrated using anecdotal examples, and the utility of XSTREAM for identifying TR proteins was validated using data from a recently published paper. Conclusion: We show that XSTREAM is a practical and valuable tool for TR detection in protein and nucleotide sequences at the multi-genome scale, and an effective tool for modeling TR domains with diverse architectures and varied levels of degeneracy. Because of these useful features, XSTREAM has significant potential for the discovery of naturally-evolved modular proteins with applications for engineering novel biostructural and biomimetic materials, and identifying new vaccine and diagnostic targets. PMID:17931424
Computational algorithms to predict Gene Ontology annotations
2015-01-01
Background: Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as the ones provided by the Gene Ontology Consortium, are used to design novel biological experiments and interpret their results. Despite their importance, these sources of information have some known issues. They are incomplete, since biological knowledge is far from being definitive and it rapidly evolves, and some erroneous annotations may be present. Since the curation process of novel annotations is a costly procedure, in both economic and time terms, computational tools that can reliably predict likely annotations, and thus quicken the discovery of new gene annotations, are very useful. Methods: We used a set of computational algorithms and weighting schemes to infer novel gene annotations from a set of known ones. We used the latent semantic analysis approach, implementing two popular algorithms (Latent Semantic Indexing and Probabilistic Latent Semantic Analysis), and propose a novel method, the Semantic IMproved Latent Semantic Analysis, which adds a clustering step on the set of considered genes. Furthermore, we propose the improvement of these algorithms by weighting the annotations in the input set. Results: We tested our methods and their weighted variants on the Gene Ontology annotation sets of the genes of three model organisms (Bos taurus, Danio rerio and Drosophila melanogaster). The methods showed their ability to predict novel gene annotations, and the weighting procedures led to a valuable improvement, although the obtained results vary according to the dimension of the input annotation set and the considered algorithm. Conclusions: Out of the three considered methods, the Semantic IMproved Latent Semantic Analysis is the one that provides the best results. In particular, when coupled with a proper
Woodruff, S.B.
1992-01-01
The Transient Reactor Analysis Code (TRAC), which features a two-fluid treatment of thermal-hydraulics, is designed to model transients in water reactors and related facilities. One of the major computational costs associated with TRAC and similar codes is calculating constitutive coefficients. Although the formulations for these coefficients are local, the costs are flow-regime- or data-dependent; i.e., the computations needed for a given spatial node often vary widely as a function of time. Consequently, poor load balancing will degrade efficiency on either vector or data parallel architectures when the data are organized according to spatial location. Unfortunately, a general automatic solution to the load-balancing problem associated with data-dependent computations is not yet available for massively parallel architectures. This document discusses why developers should consider algorithms, such as a neural net representation, that do not exhibit load-balancing problems.
Cluster-based architecture for fault-tolerant quantum computation
Fujii, Keisuke; Yamamoto, Katsuji
2010-04-15
We present a detailed description of an architecture for fault-tolerant quantum computation, which is based on the cluster model of encoded qubits. In this cluster-based architecture, concatenated computation is implemented in a quite different way from the usual circuit-based architecture where physical gates are recursively replaced by logical gates with error-correction gadgets. Instead, some relevant cluster states, say fundamental clusters, are recursively constructed through verification and postselection in advance for the higher-level one-way computation, which namely provides error-precorrection of gate operations. A suitable code such as the Steane seven-qubit code is adopted for transversal operations. This concatenated construction of verified fundamental clusters has a simple transversal structure of logical errors, and achieves a high noise threshold ~3% for computation by using appropriate verification procedures. Since the postselection is localized within each fundamental cluster with the help of deterministic bare controlled-Z gates without verification, divergence of resources is restrained, which reconciles postselection with scalability.
Cluster-based architecture for fault-tolerant quantum computation
NASA Astrophysics Data System (ADS)
Fujii, Keisuke; Yamamoto, Katsuji
2010-04-01
We present a detailed description of an architecture for fault-tolerant quantum computation, which is based on the cluster model of encoded qubits. In this cluster-based architecture, concatenated computation is implemented in a quite different way from the usual circuit-based architecture where physical gates are recursively replaced by logical gates with error-correction gadgets. Instead, some relevant cluster states, say fundamental clusters, are recursively constructed through verification and postselection in advance for the higher-level one-way computation, which namely provides error-precorrection of gate operations. A suitable code such as the Steane seven-qubit code is adopted for transversal operations. This concatenated construction of verified fundamental clusters has a simple transversal structure of logical errors, and achieves a high noise threshold ~3% for computation by using appropriate verification procedures. Since the postselection is localized within each fundamental cluster with the help of deterministic bare controlled-Z gates without verification, divergence of resources is restrained, which reconciles postselection with scalability.
OS friendly microprocessor architecture: Hardware level computer security
NASA Astrophysics Data System (ADS)
Jungwirth, Patrick; La Fratta, Patrick
2016-05-01
We present an introduction to the patented OS Friendly Microprocessor Architecture (OSFA) and hardware level computer security. Conventional microprocessors have not tried to balance hardware performance and OS performance at the same time. Conventional microprocessors have depended on the Operating System for computer security and information assurance. The goal of the OS Friendly Architecture is to provide a high performance and secure microprocessor and OS system. We are interested in cyber security, information technology (IT), and SCADA control professionals reviewing the hardware level security features. The OS Friendly Architecture is a switched set of cache memory banks in a pipeline configuration. For light-weight threads, the memory pipeline configuration provides near instantaneous context switching times. The pipelining and parallelism provided by the cache memory pipeline allow background cache read and write operations while the microprocessor's execution pipeline is running instructions. The cache bank selection controllers provide arbitration to prevent the memory pipeline and microprocessor's execution pipeline from accessing the same cache bank at the same time. This separation allows the cache memory pages to transfer to and from level 1 (L1) caching while the microprocessor pipeline is executing instructions. Computer security operations are implemented in hardware. By extending Unix file permission bits to each cache memory bank and memory address, the OSFA provides hardware level computer security.
Domain decomposition algorithms and computational fluid dynamics
NASA Technical Reports Server (NTRS)
Chan, Tony F.
1988-01-01
Some of the new domain decomposition algorithms are applied to two model problems in computational fluid dynamics: the two-dimensional convection-diffusion problem and the incompressible driven cavity flow problem. First, a brief introduction to the various approaches of domain decomposition is given, and a survey of domain decomposition preconditioners for the operator on the interface separating the subdomains is then presented. For the convection-diffusion problem, the effect of the convection term and its discretization on the performance of some of the preconditioners is discussed. For the driven cavity problem, the effectiveness of a class of boundary probe preconditioners is examined.
Algorithms for Computing the Lag Function.
1981-03-27
and S. J. Giner. Subject: Algorithms for Computing the Lag Function. References: See p. 27. Abstract: This memorandum provides a scheme for the numerical...highly oscillatory, and with singularities at the end points.
Implementation and analysis of a Navier-Stokes algorithm on parallel computers
NASA Technical Reports Server (NTRS)
Fatoohi, Raad A.; Grosch, Chester E.
1988-01-01
The results of the implementation of a Navier-Stokes algorithm on three parallel/vector computers are presented. The object of this research is to determine how well, or poorly, a single numerical algorithm would map onto three different architectures. The algorithm is a compact difference scheme for the solution of the incompressible, two-dimensional, time-dependent Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are the following: the MPP, an SIMD machine with 16K bit serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
Block sparse Cholesky algorithms on advanced uniprocessor computers
Ng, E.G.; Peyton, B.W.
1991-12-01
As with many other linear algebra algorithms, devising a portable implementation of sparse Cholesky factorization that performs well on the broad range of computer architectures currently available is a formidable challenge. Even after limiting our attention to machines with only one processor, as we have done in this report, there are still several interesting issues to consider. For dense matrices, it is well known that block factorization algorithms are the best means of achieving this goal. We take this approach for sparse factorization as well. This paper has two primary goals. First, we examine two sparse Cholesky factorization algorithms, the multifrontal method and a blocked left-looking sparse Cholesky method, in a systematic and consistent fashion, both to illustrate the strengths of the blocking techniques in general and to obtain a fair evaluation of the two approaches. Second, we assess the impact of various implementation techniques on time and storage efficiency, paying particularly close attention to the work-storage requirement of the two methods and their variants.
Integration of nanoscale memristor synapses in neuromorphic computing architectures.
Indiveri, Giacomo; Linares-Barranco, Bernabé; Legenstein, Robert; Deligeorgis, George; Prodromakis, Themistoklis
2013-09-27
Conventional neuro-computing architectures and artificial neural networks have often been developed with no or loose connections to neuroscience. As a consequence, they have largely ignored key features of biological neural processing systems, such as their extremely low-power consumption features or their ability to carry out robust and efficient computation using massively parallel arrays of limited precision, highly variable, and unreliable components. Recent developments in nano-technologies are making available extremely compact and low power, but also variable and unreliable solid-state devices that can potentially extend the offerings of availing CMOS technologies. In particular, memristors are regarded as a promising solution for modeling key features of biological synapses due to their nanoscale dimensions, their capacity to store multiple bits of information per element and the low energy required to write distinct states. In this paper, we first review the neuro- and neuromorphic computing approaches that can best exploit the properties of memristor and scale devices, and then propose a novel hybrid memristor-CMOS neuromorphic circuit which represents a radical departure from conventional neuro-computing approaches, as it uses memristors to directly emulate the biophysics and temporal dynamics of real synapses. We point out the differences between the use of memristors in conventional neuro-computing architectures and the hybrid memristor-CMOS circuit proposed, and argue how this circuit represents an ideal building block for implementing brain-inspired probabilistic computing paradigms that are robust to variability and fault tolerant by design.
Integration of nanoscale memristor synapses in neuromorphic computing architectures
NASA Astrophysics Data System (ADS)
Indiveri, Giacomo; Linares-Barranco, Bernabé; Legenstein, Robert; Deligeorgis, George; Prodromakis, Themistoklis
2013-09-01
Conventional neuro-computing architectures and artificial neural networks have often been developed with no or loose connections to neuroscience. As a consequence, they have largely ignored key features of biological neural processing systems, such as their extremely low-power consumption features or their ability to carry out robust and efficient computation using massively parallel arrays of limited precision, highly variable, and unreliable components. Recent developments in nano-technologies are making available extremely compact and low power, but also variable and unreliable solid-state devices that can potentially extend the offerings of availing CMOS technologies. In particular, memristors are regarded as a promising solution for modeling key features of biological synapses due to their nanoscale dimensions, their capacity to store multiple bits of information per element and the low energy required to write distinct states. In this paper, we first review the neuro- and neuromorphic computing approaches that can best exploit the properties of memristor and scale devices, and then propose a novel hybrid memristor-CMOS neuromorphic circuit which represents a radical departure from conventional neuro-computing approaches, as it uses memristors to directly emulate the biophysics and temporal dynamics of real synapses. We point out the differences between the use of memristors in conventional neuro-computing architectures and the hybrid memristor-CMOS circuit proposed, and argue how this circuit represents an ideal building block for implementing brain-inspired probabilistic computing paradigms that are robust to variability and fault tolerant by design.
Scaling to Nanotechnology Limits with the PIMS Computer Architecture and a new Scaling Rule
Debenedictis, Erik P.
2015-02-01
We describe a new approach to computing that moves towards the limits of nanotechnology using a newly formulated scaling rule. This is in contrast to the current computer industry scaling away from von Neumann's original computer at the rate of Moore's Law. We extend Moore's Law to 3D, which leads generally to architectures that integrate logic and memory. To keep power dissipation constant through a 2D surface of the 3D structure requires using adiabatic principles. We call our newly proposed architecture Processor In Memory and Storage (PIMS). We propose a new computational model that integrates processing and memory into "tiles" that comprise logic, memory/storage, and communications functions. Since the programming model will be relatively stable as a system scales, programs represented by tiles could be executed in a PIMS system built with today's technology or could become the "schematic diagram" for implementation in an ultimate 3D nanotechnology of the future. We build a systems software approach that offers advantages over and above the technological and architectural advantages. First, the algorithms may be more efficient in the conventional sense of having fewer steps. Second, the algorithms may run with higher power efficiency per operation by being a better match for the adiabatic scaling rule. The performance analysis based on demonstrated ideas in physical science suggests 80,000x improvement in cost per operation for the (arguably) general purpose function of emulating neurons in Deep Learning.
Architectural requirements for the Red Storm computing system.
Camp, William J.; Tomkins, James Lee
2003-10-01
This report is based on the Statement of Work (SOW) describing the various requirements for delivering a new supercomputer system to Sandia National Laboratories (Sandia) as part of the Department of Energy's (DOE) Accelerated Strategic Computing Initiative (ASCI) program. This system is named Red Storm and will be a distributed memory, massively parallel processor (MPP) machine built primarily out of commodity parts. The requirements presented here distill extensive architectural and design experience accumulated over a decade and a half of research, development and production operation of similar machines at Sandia. Red Storm will have an unusually high bandwidth, low latency interconnect, specially designed hardware and software reliability features, a light weight kernel compute node operating system and the ability to rapidly switch major sections of the machine between classified and unclassified computing environments. Particular attention has been paid to architectural balance in the design of Red Storm, and it is therefore expected to achieve an atypically high fraction of its peak speed of 41 TeraOPS on real scientific computing applications. In addition, Red Storm is designed to be upgradeable to many times this initial peak capability while still retaining appropriate balance in key design dimensions. Installation of the Red Storm computer system at Sandia's New Mexico site is planned for 2004, and it is expected that the system will be operated for a minimum of five years following installation.
A Component Architecture for High-Performance Scientific Computing
Bernholdt, D E; Allan, B A; Armstrong, R; Bertrand, F; Chiu, K; Dahlgren, T L; Damevski, K; Elwasif, W R; Epperly, T W; Govindaraju, M; Katz, D S; Kohl, J A; Krishnan, M; Kumfert, G; Larson, J W; Lefantzi, S; Lewis, M J; Malony, A D; McInnes, L C; Nieplocha, J; Norris, B; Parker, S G; Ray, J; Shende, S; Windus, T L; Zhou, S
2004-12-14
The Common Component Architecture (CCA) provides a means for software developers to manage the complexity of large-scale scientific simulations and to move toward a plug-and-play environment for high-performance computing. In the scientific computing context, component models also promote collaboration using independently developed software, thereby allowing particular individuals or groups to focus on the aspects of greatest interest to them. The CCA supports parallel and distributed computing as well as local high-performance connections between components in a language-independent manner. The design places minimal requirements on components and thus facilitates the integration of existing code into the CCA environment. The CCA model imposes minimal overhead to minimize the impact on application performance. The focus on high performance distinguishes the CCA from most other component models. The CCA is being applied within an increasing range of disciplines, including combustion research, global climate simulation, and computational chemistry.
A Component Architecture for High-Performance Scientific Computing
Bernholdt, David E; Allan, Benjamin A; Armstrong, Robert C; Bertrand, Felipe; Chiu, Kenneth; Dahlgren, Tamara L; Damevski, Kostadin; Elwasif, Wael R; Epperly, Thomas G; Govindaraju, Madhusudhan; Katz, Daniel S; Kohl, James A; Krishnan, Manoj Kumar; Kumfert, Gary K; Larson, J Walter; Lefantzi, Sophia; Lewis, Michael J; Malony, Allen D; McInnes, Lois C; Nieplocha, Jarek; Norris, Boyana; Parker, Steven G; Ray, Jaideep; Shende, Sameer; Windus, Theresa L; Zhou, Shujia
2006-07-03
The Common Component Architecture (CCA) provides a means for software developers to manage the complexity of large-scale scientific simulations and to move toward a plug-and-play environment for high-performance computing. In the scientific computing context, component models also promote collaboration using independently developed software, thereby allowing particular individuals or groups to focus on the aspects of greatest interest to them. The CCA supports parallel and distributed computing as well as local high-performance connections between components in a language-independent manner. The design places minimal requirements on components and thus facilitates the integration of existing code into the CCA environment. The CCA model imposes minimal overhead to minimize the impact on application performance. The focus on high performance distinguishes the CCA from most other component models. The CCA is being applied within an increasing range of disciplines, including combustion research, global climate simulation, and computational chemistry.
Methodology of modeling and measuring computer architectures for plasma simulations
NASA Technical Reports Server (NTRS)
Wang, L. P. T.
1977-01-01
A brief introduction to plasma simulation using computers and the difficulties on currently available computers is given. Through the use of an analyzing and measuring methodology, SARA, the control flow and data flow of a particle simulation model REM2-1/2D are exemplified. After recursive refinements the total execution time may be greatly shortened and a fully parallel data flow can be obtained. From this data flow, a matched computer architecture or organization could be configured to achieve the computation bound of an application problem. A sequential type simulation model, an array/pipeline type simulation model, and a fully parallel simulation model of a code REM2-1/2D are proposed and analyzed. This methodology can be applied to other application problems that have an implicitly parallel nature.
Parallel algorithm for computing points on a computation front hyperplane
NASA Astrophysics Data System (ADS)
Krasnov, M. M.
2015-01-01
A parallel algorithm for computing points on a computation front hyperplane is described. This task arises in the computation of a quantity defined on a multidimensional rectangular domain. Three-dimensional domains are usually discussed, but the material is given in the general form when the number of dimensions is at least two. When the values of a quantity at different points are internally independent (which is frequently the case), the corresponding computations are independent as well and can be performed in parallel. However, if there are internal dependences (as, for example, in the Gauss-Seidel method for systems of linear equations), then the order of scanning points of the domain is an important issue. A conventional approach in this case is to form a computation front hyperplane (a usual plane in the three-dimensional case and a line in the two-dimensional case) that moves linearly across the domain at a certain angle. At every step in the course of motion of this hyperplane, its intersection points with the domain can be treated independently and, hence, in parallel, but the steps themselves are executed sequentially. At different steps, the intersection of the hyperplane with the entire domain can have a rather complex geometry and the search for all points of the domain lying on the hyperplane at a given step is a nontrivial problem. This problem (i.e., the computation of the coordinates of points lying in the intersection of the domain with the hyperplane at a given step in the course of hyperplane motion) is addressed below. The computations over the points of the hyperplane can be executed in parallel.
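The front enumeration described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes the simplest front orientation, where the hyperplane is x1 + x2 + ... + xd = s on an integer grid, and the names `front_points` and `wavefront_order` are hypothetical. Points within one front could be processed in parallel, while the fronts themselves are visited in sequence.

```python
def front_points(shape, step):
    """Return all integer grid points x inside the box `shape`
    (0 <= x[i] < shape[i]) whose coordinates sum to `step`.

    Assumption: the front is the hyperplane sum(x) == step, i.e. the
    simplest orientation; other angles are not handled here.
    """
    points = []

    def recurse(prefix, remaining, dims):
        if len(dims) == 1:
            # Last coordinate is forced; keep it only if it lies in the box.
            if 0 <= remaining < dims[0]:
                points.append(prefix + (remaining,))
            return
        # Largest total the remaining coordinates can still contribute.
        max_rest = sum(d - 1 for d in dims[1:])
        lo = max(0, remaining - max_rest)
        hi = min(dims[0] - 1, remaining)
        for v in range(lo, hi + 1):
            recurse(prefix + (v,), remaining - v, dims[1:])

    recurse((), step, tuple(shape))
    return points


def wavefront_order(shape):
    """List the fronts in the order they must be executed sequentially."""
    last_step = sum(d - 1 for d in shape)
    return [front_points(shape, s) for s in range(last_step + 1)]


if __name__ == "__main__":
    for s, front in enumerate(wavefront_order((2, 3))):
        print(s, front)
```

The bounds `lo` and `hi` prune coordinates that could never complete to a point on the front, so the recursion only visits points that are actually emitted.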
Thermodynamic cost of computation, algorithmic complexity and the information metric
NASA Technical Reports Server (NTRS)
Zurek, W. H.
1989-01-01
Algorithmic complexity is discussed as a computational counterpart to the second law of thermodynamics. It is shown that algorithmic complexity, which is a measure of randomness, sets limits on the thermodynamic cost of computations and casts a new light on the limitations of Maxwell's demon. Algorithmic complexity can also be used to define distance between binary strings.
Distributed sequence alignment applications for the public computing architecture.
Pellicer, S; Chen, G; Chan, K C C; Pan, Y
2008-03-01
The public computer architecture shows promise as a platform for solving fundamental problems in bioinformatics such as global gene sequence alignment and data mining with tools such as the basic local alignment search tool (BLAST). Our implementation of these two problems on the Berkeley open infrastructure for network computing (BOINC) platform demonstrates a runtime reduction factor of 1.15 for sequence alignment and 16.76 for BLAST. While the runtime reduction factor of the global gene sequence alignment application is modest, this value is based on a theoretical sequential runtime extrapolated from the calculation of a smaller problem. Because this runtime is extrapolated from running the calculation in memory, the theoretical sequential runtime would require 37.3 GB of memory on a single system. With this in mind, the BOINC implementation not only offers the reduced runtime, but also the aggregation of the available memory of all participant nodes. If an actual sequential run of the problem were compared, a more drastic reduction in the runtime would be seen due to an additional secondary storage I/O overhead for a practical system. Despite the limitations of the public computer architecture, most notably in communication overhead, it represents a practical platform for grid- and cluster-scale bioinformatics computations today and shows great potential for future implementations.
Domain decomposition algorithms and computation fluid dynamics
NASA Technical Reports Server (NTRS)
Chan, Tony F.
1988-01-01
In the past several years, domain decomposition has been a very popular topic, partly motivated by the potential for parallelization. While a large body of theory and algorithms has been developed for model elliptic problems, these methods are only recently starting to be tested on realistic applications. The application of some of these methods to two model problems in computational fluid dynamics is investigated: two-dimensional convection-diffusion problems and the incompressible driven cavity flow problem. The construction and analysis of efficient preconditioners for the interface operator, to be used in the iterative solution of the interface problem, are described. For the convection-diffusion problems, the effect of the convection term and its discretization on the performance of some of the preconditioners is discussed. For the driven cavity problem, the effectiveness of a class of boundary probe preconditioners is discussed.
Algorithmic support for commodity-based parallel computing systems.
Leung, Vitus Joseph; Bender, Michael A.; Bunde, David P.; Phillips, Cynthia Ann
2003-10-01
The Computational Plant or Cplant is a commodity-based distributed-memory supercomputer under development at Sandia National Laboratories. Distributed-memory supercomputers run many parallel programs simultaneously. Users submit their programs to a job queue. When a job is scheduled to run, it is assigned to a set of available processors. Job runtime depends not only on the number of processors but also on the particular set of processors assigned to it. Jobs should be allocated to localized clusters of processors to minimize communication costs and to avoid bandwidth contention caused by overlapping jobs. This report introduces new allocation strategies and performance metrics based on space-filling curves and one-dimensional allocation strategies. These algorithms are general and simple. Preliminary simulations and Cplant experiments indicate that both space-filling curves and one-dimensional packing improve processor locality compared to the sorted free list strategy previously used on Cplant. These new allocation strategies are implemented in Release 2.0 of the Cplant System Software that was phased into the Cplant systems at Sandia by May 2002. Experimental results then demonstrated that the average number of communication hops between the processors allocated to a job strongly correlates with the job's completion time. This report also gives processor-allocation algorithms for minimizing the average number of communication hops between the assigned processors for grid architectures. The associated clustering problem is as follows: Given n points in R^d, find k points that minimize their average pairwise L1 distance. Exact and approximate algorithms are given for these optimization problems. One of these algorithms has been implemented on Cplant and will be included in Cplant System Software, Version 2.1, to be released. In more preliminary work, we suggest improvements to the scheduler separate from the allocator.
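The clustering problem stated in this abstract (choose k of n points so that their average pairwise L1 distance is minimal) can be made concrete with a small Python sketch. This is an exhaustive baseline for tiny inputs only, not one of the report's exact or approximate algorithms; the function names are hypothetical and free processors are assumed to be given as integer grid coordinates.

```python
from itertools import combinations


def l1(p, q):
    """L1 (Manhattan) distance, i.e. hop count on a mesh without wraparound."""
    return sum(abs(a - b) for a, b in zip(p, q))


def avg_pairwise_l1(points):
    """Average L1 distance over all unordered pairs of the given points."""
    pairs = list(combinations(points, 2))
    return sum(l1(p, q) for p, q in pairs) / len(pairs)


def best_cluster(free, k):
    """Brute force: try every k-subset of the free processors.

    Cost grows as C(n, k), so this is only a reference for small cases.
    """
    if k < 2 or k > len(free):
        raise ValueError("need 2 <= k <= number of free processors")
    return min(combinations(free, k), key=avg_pairwise_l1)


if __name__ == "__main__":
    free = [(0, 0), (0, 1), (1, 0), (5, 5), (1, 1), (9, 0)]
    chosen = best_cluster(free, 4)
    print(chosen, avg_pairwise_l1(chosen))
```

A baseline like this is mainly useful for checking faster heuristics on small grids, since it returns a subset with the true minimum objective value.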
Detecting Neonatal Seizures With Computer Algorithms.
Temko, Andriy; Lightbody, Gordon
2016-10-01
It is now generally accepted that EEG is the only reliable way to accurately detect newborn seizures and, as such, prolonged EEG monitoring is increasingly being adopted in neonatal intensive care units. Long EEG recordings may last from several hours to a few days. With neurophysiologists not always available to review the EEG during unsociable hours, there is a pressing need to develop a reliable and robust automatic seizure detection method-a computer algorithm that can take the EEG signal, process it, and output information that supports clinical decision making. In this study, we review existing algorithms based on how the relevant seizure information is exploited. We start with commonly used methods to extract signatures from seizure signals that range from those that mimic the clinical neurophysiologist to those that exploit mathematical models of neonatal EEG generation. Commonly used classification methods are reviewed that are based on a set of rules and thresholds that are either heuristically tuned or automatically derived from the data. These are followed by techniques to use information about spatiotemporal seizure context. The usual errors in system design and validation are discussed. Current clinical decision support tools that have met regulatory requirements and are available to detect neonatal seizures are reviewed with progress and the outstanding challenges are outlined. This review discusses the current state of the art regarding automatic detection of neonatal seizures.
The computational structural mechanics testbed architecture. Volume 3: The interface
NASA Technical Reports Server (NTRS)
Felippa, Carlos A.
1988-01-01
This is the third of a set of five volumes which describe the software architecture for the Computational Structural Mechanics Testbed. Derived from NICE, an integrated software system developed at Lockheed Palo Alto Research Laboratory, the architecture is composed of the command language CLAMP, the command language interpreter CLIP, and the data manager GAL. Volumes 1, 2, and 3 (NASA CR's 178384, 178385, and 178386, respectively) describe CLAMP and CLIP and the CLIP-processor interface. Volumes 4 and 5 (NASA CR's 178387 and 178388, respectively) describe GAL and its low-level I/O. CLAMP, an acronym for Command Language for Applied Mechanics Processors, is designed to control the flow of execution of processors written for NICE. Volume 3 describes the CLIP-Processor interface and related topics. It is intended only for processor developers.
The computational structural mechanics testbed architecture. Volume 2: Directives
NASA Technical Reports Server (NTRS)
Felippa, Carlos A.
1989-01-01
This is the second of a set of five volumes which describe the software architecture for the Computational Structural Mechanics Testbed. Derived from NICE, an integrated software system developed at Lockheed Palo Alto Research Laboratory, the architecture is composed of the command language (CLAMP), the command language interpreter (CLIP), and the data manager (GAL). Volumes 1, 2, and 3 (NASA CR's 178384, 178385, and 178386, respectively) describe CLAMP and CLIP and the CLIP-processor interface. Volumes 4 and 5 (NASA CR's 178387 and 178388, respectively) describe GAL and its low-level I/O. CLAMP, an acronym for Command Language for Applied Mechanics Processors, is designed to control the flow of execution of processors written for NICE. Volume 2 describes the CLIP directives in detail. It is intended for intermediate and advanced users.
Thrifty: An Exascale Architecture for Energy Proportional Computing
Torrellas, Josep
2014-12-23
The objective of this project is to design different aspects of a novel exascale architecture called Thrifty. Our goal is to focus on the challenges of power/energy efficiency, performance, and resiliency in exascale systems. The project includes work on computer architecture (Josep Torrellas from University of Illinois), compilation (Daniel Quinlan from Lawrence Livermore National Laboratory), runtime and applications (Laura Carrington from University of California San Diego), and circuits (Wilfred Pinfold from Intel Corporation). In this report, we focus on the progress at the University of Illinois during the last year of the grant (September 1, 2013 to August 31, 2014). We also point to the progress in the other collaborating institutions when needed.
The computational structural mechanics testbed architecture. Volume 1: The language
NASA Technical Reports Server (NTRS)
Felippa, Carlos A.
1988-01-01
This is the first of a set of five volumes which describe the software architecture for the Computational Structural Mechanics Testbed. Derived from NICE, an integrated software system developed at Lockheed Palo Alto Research Laboratory, the architecture is composed of the command language CLAMP, the command language interpreter CLIP, and the data manager GAL. Volumes 1, 2, and 3 (NASA CR's 178384, 178385, and 178386, respectively) describe CLAMP and CLIP, and the CLIP-processor interface. Volumes 4 and 5 (NASA CR's 178387 and 178388, respectively) describe GAL and its low-level I/O. CLAMP, an acronym for Command Language for Applied Mechanics Processors, is designed to control the flow of execution of processors written for NICE. Volume 1 presents the basic elements of the CLAMP language and is intended for all users.
NASA Technical Reports Server (NTRS)
Rutishauser, David
2006-01-01
The motivation for this work comes from an observation that amidst the push for Massively Parallel (MP) solutions to high-end computing problems such as numerical physical simulations, large amounts of legacy code exist that are highly optimized for vector supercomputers. Because re-hosting legacy code often requires a complete re-write of the original code, which can be a very long and expensive effort, this work examines the potential to exploit reconfigurable computing machines in place of a vector supercomputer to implement an essentially unmodified legacy source code. Custom and reconfigurable computing resources could be used to emulate an original application's target platform to the extent required to achieve high performance. To arrive at an architecture that delivers the desired performance subject to limited resources involves solving a multi-variable optimization problem with constraints. Prior research in the area of reconfigurable computing has demonstrated that designing an optimum hardware implementation of a given application under hardware resource constraints is an NP-complete problem. The premise of the approach is that the general issue of applying reconfigurable computing resources to the implementation of an application, maximizing the performance of the computation subject to physical resource constraints, can be made a tractable problem by assuming a computational paradigm, such as vector processing. This research contributes a formulation of the problem and a methodology to design a reconfigurable vector processing implementation of a given application that satisfies a performance metric. A generic, parametric, architectural framework for vector processing implemented in reconfigurable logic is developed as a target for a scheduling/mapping algorithm that maps an input computation to a given instance of the architecture. This algorithm is integrated with an optimization framework to arrive at a specification of the architecture parameters
Fast algorithm for computing complex number-theoretic transforms
NASA Technical Reports Server (NTRS)
Reed, I. S.; Liu, K. Y.; Truong, T. K.
1977-01-01
A high-radix FFT algorithm for computing transforms over GF(q^2), where q is a Mersenne prime, is developed to implement fast circular convolutions. This new algorithm requires substantially fewer multiplications than the conventional FFT.
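To illustrate what a complex number-theoretic transform computes, here is a minimal Python sketch of circular convolution using arithmetic on pairs a + bi with components taken modulo a Mersenne prime. It is a naive O(n^2) transform, not the high-radix fast algorithm of the paper. The prime q = 31, the brute-force root search, and all function names are assumptions chosen for illustration. Results are exact only modulo q.

```python
Q = 31  # Mersenne prime 2**5 - 1; Q % 4 == 3, so -1 has no square root mod Q


def cmul(a, b):
    """Multiply a0 + a1*i by b0 + b1*i with i*i = -1, components mod Q."""
    return ((a[0] * b[0] - a[1] * b[1]) % Q, (a[0] * b[1] + a[1] * b[0]) % Q)


def cadd(a, b):
    return ((a[0] + b[0]) % Q, (a[1] + b[1]) % Q)


def cpow(a, e):
    """Exponentiation by repeated squaring."""
    result = (1, 0)
    while e:
        if e & 1:
            result = cmul(result, a)
        a = cmul(a, a)
        e >>= 1
    return result


def find_root(n):
    """Brute-force search for an element of order exactly n.

    Assumes n is a power of two with 2 <= n and n dividing Q*Q - 1.
    """
    for re in range(Q):
        for im in range(Q):
            w = (re, im)
            if cpow(w, n) == (1, 0) and cpow(w, n // 2) != (1, 0):
                return w
    raise ValueError("no root of unity of that order")


def transform(x, w):
    """Naive O(n^2) transform: X[k] = sum_j x[j] * w**(j*k)."""
    n = len(x)
    out = []
    for k in range(n):
        acc = (0, 0)
        for j, xj in enumerate(x):
            acc = cadd(acc, cmul(xj, cpow(w, (j * k) % n)))
        out.append(acc)
    return out


def circular_convolve(a, b):
    """Circular convolution of two equal-length integer sequences, mod Q."""
    n = len(a)
    w = find_root(n)
    w_inv = cpow(w, n - 1)          # inverse of w, since w**n == 1
    n_inv = pow(n, Q - 2, Q)        # inverse of n mod Q (Fermat)
    fa = transform([(v % Q, 0) for v in a], w)
    fb = transform([(v % Q, 0) for v in b], w)
    prod = [cmul(x, y) for x, y in zip(fa, fb)]
    back = transform(prod, w_inv)
    return [(re * n_inv) % Q for re, _ in back]
```

The pointwise product in the transform domain replaces the n^2 multiplications of a direct convolution once a fast transform is substituted for the naive `transform` above.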
Earth Tide Algorithms for the OMNIS Computer Program System.
1986-04-01
This report presents five computer algorithms that jointly specify the gravitational action by which the tidal redistributions of the Earth’s masses...routine is a simplified version of the fourth and is provided for use during computer program verification. All computer algorithms express the tidal
Evaluation of leading scalar and vector architectures for scientific computations
Simon, Horst D.; Oliker, Leonid; Canning, Andrew; Carter, Jonathan; Ethier, Stephane; Shalf, John
2004-04-20
The growing gap between sustained and peak performance for scientific applications is a well-known problem in high performance computing. The recent development of parallel vector systems offers the potential to reduce this gap for many computational science codes and deliver a substantial increase in computing capabilities. This project examines the performance of the cacheless vector Earth Simulator (ES) and compares it to the superscalar cache-based IBM Power3 system. Results demonstrate that the ES is significantly faster than the Power3 architecture, highlighting the tremendous potential advantage of the ES for numerical simulation. However, vectorization of a particle-in-cell application (GTC) greatly increased the memory footprint, preventing loop-level parallelism and limiting scalability potential.
Biomorphic Multi-Agent Architecture for Persistent Computing
NASA Technical Reports Server (NTRS)
Lodding, Kenneth N.; Brewster, Paul
2009-01-01
A multi-agent software/hardware architecture, inspired by the multicellular nature of living organisms, has been proposed as the basis of design of a robust, reliable, persistent computing system. Just as a multicellular organism can adapt to changing environmental conditions and can survive despite the failure of individual cells, a multi-agent computing system, as envisioned, could adapt to changing hardware, software, and environmental conditions. In particular, the computing system could continue to function (perhaps at a reduced but still reasonable level of performance) if one or more component(s) of the system were to fail. One of the defining characteristics of a multicellular organism is unity of purpose. In biology, the purpose is survival of the organism. The purpose of the proposed multi-agent architecture is to provide a persistent computing environment in harsh conditions in which repair is difficult or impossible. A multi-agent, organism-like computing system would be a single entity built from agents or cells. Each agent or cell would be a discrete hardware processing unit that would include a data processor with local memory, an internal clock, and a suite of communication equipment capable of both local line-of-sight communications and global broadcast communications. Some cells, denoted specialist cells, could contain such additional hardware as sensors and emitters. Each cell would be independent in the sense that there would be no global clock, no global (shared) memory, no pre-assigned cell identifiers, no pre-defined network topology, and no centralized brain or control structure. Like each cell in a living organism, each agent or cell of the computing system would contain a full description of the system encoded as genes, but in this case, the genes would be components of a software genome.
Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Carter, Jonathan; Shalf, John; Skinner, David; Ethier, Stephane; Biswas, Rupak; Djomehri, Jahed; VanderWijngaart, Rob
2003-01-01
The growing gap between sustained and peak performance for scientific applications has become a well-known problem in high performance computing. The recent development of parallel vector systems offers the potential to bridge this gap for a significant number of computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX6 vector processor and the cache-based IBM Power3/4 superscalar architectures across a number of key scientific computing areas. First, we present the performance of a microbenchmark suite that examines a full spectrum of low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks using some simple optimizations. Finally, we evaluate the performance of several numerical codes from key scientific computing domains. Overall results demonstrate that the SX6 achieves high performance on a large fraction of our application suite and in many cases significantly outperforms the RISC-based architectures. However, certain classes of applications are not easily amenable to vectorization and would likely require extensive reengineering of both algorithm and implementation to utilize the SX6 effectively.
Evaluation of cache-based superscalar and cacheless vector architectures for scientific computations
Oliker, Leonid; Canning, Andrew; Carter, Jonathan; Shalf, John; Skinner, David; Ethier, Stephane; Biswas, Rupak; Djomehri, Jahed; Van der Wijngaart, Rob
2003-05-01
The growing gap between sustained and peak performance for scientific applications is a well-known problem in high end computing. The recent development of parallel vector systems offers the potential to bridge this gap for many computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX-6 vector processor and the cache-based IBM Power3/4 superscalar architectures across a number of scientific computing areas. First, we present the performance of a microbenchmark suite that examines low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks. Finally, we evaluate the performance of several scientific computing codes. Results demonstrate that the SX-6 achieves high performance on a large fraction of our applications and often significantly outperforms the cache-based architectures. However, certain applications are not easily amenable to vectorization and would require extensive algorithm and implementation reengineering to utilize the SX-6 effectively.
A Component Architecture for High-Performance Computing
Bernholdt, D E; Elwasif, W R; Kohl, J A; Epperly, T G W
2003-01-21
The Common Component Architecture (CCA) provides a means for developers to manage the complexity of large-scale scientific software systems and to move toward a "plug and play" environment for high-performance computing. The CCA model allows for a direct connection between components within the same process to maintain performance on inter-component calls. It is neutral with respect to parallelism, allowing components to use whatever means they desire to communicate within their parallel "cohort." We will discuss in detail the importance of performance in the design of the CCA and will analyze the performance costs associated with features of the CCA.
A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)]
NASA Technical Reports Server (NTRS)
Straeter, T. A.; Markos, A. T.
1975-01-01
A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions insuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.
Supporting Undergraduate Computer Architecture Students Using a Visual MIPS64 CPU Simulator
ERIC Educational Resources Information Center
Patti, D.; Spadaccini, A.; Palesi, M.; Fazzino, F.; Catania, V.
2012-01-01
The topics of computer architecture are always taught using an Assembly dialect as an example. The most commonly used textbooks in this field use the MIPS64 Instruction Set Architecture (ISA) to help students in learning the fundamentals of computer architecture because of its orthogonality and its suitability for real-world applications. This…
PIC codes for plasma accelerators on emerging computer architectures (GPUS, Multicore/Manycore CPUS)
NASA Astrophysics Data System (ADS)
Vincenti, Henri
2016-03-01
The advent of exascale computers will enable 3D simulations of new laser-plasma interaction regimes that were previously out of reach of current petascale computers. However, the paradigm used to write current PIC codes will have to change in order to fully exploit the potential of these new computing architectures. Indeed, achieving exascale computing facilities in the next decade will be a great challenge in terms of energy consumption and will imply hardware developments directly impacting our way of implementing PIC codes. As data movement (from die to network) is by far the most energy consuming part of an algorithm, future computers will tend to increase memory locality at the hardware level and reduce energy consumption related to data movement by using more and more cores on each compute node ("fat nodes") that will have a reduced clock speed to allow for efficient cooling. To compensate for the frequency decrease, CPU machine vendors are making use of long SIMD instruction registers that are able to process multiple data with one arithmetic operator in one clock cycle. SIMD register length is expected to double every four years. GPUs also have a reduced clock speed per core and can process Multiple Instructions on Multiple Data (MIMD). At the software level, Particle-In-Cell (PIC) codes will thus have to achieve both good memory locality and vectorization (for Multicore/Manycore CPU) to fully take advantage of these upcoming architectures. In this talk, we present the portable solutions we implemented in our high performance skeleton PIC code PICSAR to achieve good memory locality and cache reuse as well as good vectorization on SIMD architectures. We also present the portable solutions used to parallelize the pseudo-spectral quasi-cylindrical code FBPIC on GPUs using the Numba Python compiler.
NASA Technical Reports Server (NTRS)
Rickard, D. A.; Bodenheimer, R. E.
1976-01-01
Digital computer components which perform two dimensional array logic operations (Tse logic) on binary data arrays are described. The properties of Golay transforms which make them useful in image processing are reviewed, and several architectures for Golay transform processors are presented with emphasis on the skeletonizing algorithm. Conventional logic control units developed for the Golay transform processors are described. One is a unique microprogrammable control unit that uses a microprocessor to control the Tse computer. The remaining control units are based on programmable logic arrays. Performance criteria are established and utilized to compare the various Golay transform machines developed. A critique of Tse logic is presented, and recommendations for additional research are included.
The Snowcloud System: Architecture and Algorithms for Snow Hydrology Studies
NASA Astrophysics Data System (ADS)
Skalka, C.; Brown, I.; Frolik, J.
2013-12-01
Snowcloud is an embedded data collection system for snow hydrology field research campaigns conducted in harsh climates and remote areas. The system combines distributed wireless sensor network technology and computational techniques to provide data at lower cost and higher spatio-temporal resolution than ground-based systems using traditional methods. Snowcloud has seen multiple Winter deployments in settings ranging from high desert to arctic, resulting in over a dozen node-years of practical experience. The Snowcloud system architecture consists of multiple TinyOS mesh-networked sensor stations collecting environmental data above and, in some deployments, below the snowpack. Monitored data modalities include snow depth, ground and air temperature, PAR and leaf-area index (LAI), and soil moisture. To enable power cycling and control of multiple sensors a custom power and sensor conditioning board was developed. The electronics and structural systems for individual stations have been designed and tested (in the lab and in situ) for ease of assembly and robustness to harsh winter conditions. Battery systems and solar chargers enable seasonal operation even under low/no light arctic conditions. Station costs range between 500 and 1000 depending on the instrumentation suite. For remote field locations, a custom designed hand-held device and data retrieval protocol serves as the primary data collection method. We are also developing and testing a Gateway device that will report data in near-real-time (NRT) over a cellular connection. Data is made available to users via web interfaces that also provide basic data analysis and visualization tools. For applications to snow hydrology studies, the better spatiotemporal resolution of snowpack data provided by Snowcloud is beneficial in several aspects. It provides insight into snowpack evolution, and allows us to investigate differences across different spatial and temporal scales in deployment areas. It enables the
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase 1 is complete for the development of a computational fluid dynamics (CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Reveal, A General Reverse Engineering Algorithm for Inference of Genetic Network Architectures
NASA Technical Reports Server (NTRS)
Liang, Shoudan; Fuhrman, Stefanie; Somogyi, Roland
1998-01-01
Given the imminent gene expression mapping covering whole genomes during development, health and disease, we seek computational methods to maximize functional inference from such large data sets. Is it possible, in principle, to completely infer a complex regulatory network architecture from input/output patterns of its variables? We investigated this possibility using binary models of genetic networks. Trajectories, or state transition tables of Boolean nets, resemble time series of gene expression. By systematically analyzing the mutual information between input states and output states, one is able to infer the sets of input elements controlling each element or gene in the network. This process is unequivocal and exact for complete state transition tables. We implemented this REVerse Engineering ALgorithm (REVEAL) in a C program, and found the problem to be tractable within the conditions tested so far. For n = 50 (elements) and k = 3 (inputs per element), the analysis of incomplete state transition tables (100 state transition pairs out of a possible 10(exp 15)) reliably produced the original rule and wiring sets. While this study is limited to synchronous Boolean networks, the algorithm is generalizable to include multi-state models, essentially allowing direct application to realistic biological data sets. The ability to adequately solve the inverse problem may enable in-depth analysis of complex dynamic systems in biology and other fields.
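The mutual-information test described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' C program: the function names, the three-element toy network, and the `max_k` cutoff are assumptions made for illustration. For each element, the sketch looks for the smallest input set whose mutual information with the element's next state equals the entropy of that next state, which means the inputs fully determine the output.

```python
from collections import Counter
from itertools import combinations, product
from math import log2


def entropy(samples):
    """Shannon entropy (in bits) of a list of hashable observations."""
    counts = Counter(samples)
    total = len(samples)
    return -sum(c / total * log2(c / total) for c in counts.values())


def infer_inputs(states, next_states, target, max_k=3):
    """Return the smallest input set that fully determines element `target`.

    An input set X determines output Y when M(X, Y) = H(X) + H(Y) - H(X, Y)
    equals H(Y). Returns None if no set of size <= max_k qualifies.
    """
    outputs = [nxt[target] for nxt in next_states]
    h_out = entropy(outputs)
    n = len(states[0])
    for k in range(1, max_k + 1):
        for candidate in combinations(range(n), k):
            inputs = [tuple(s[i] for i in candidate) for s in states]
            joint = list(zip(inputs, outputs))
            mutual_info = entropy(inputs) + h_out - entropy(joint)
            if abs(mutual_info - h_out) < 1e-9:
                return candidate
    return None


def step(state):
    """Hypothetical toy network: A' = B, B' = A or C, C' = A and B."""
    a, b, c = state
    return (b, a | c, a & b)


# Complete state transition table for the 3-element toy network.
states = list(product((0, 1), repeat=3))
next_states = [step(s) for s in states]
wiring = [infer_inputs(states, next_states, t) for t in range(3)]
print(wiring)
```

With the complete transition table, the sketch recovers the wiring of the toy network (element 0 is driven by element 1, element 1 by elements 0 and 2, element 2 by elements 0 and 1). Recovering the Boolean rule itself, and handling incomplete tables as in the abstract, are left out.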
Sorting on STAR. [CDC computer algorithm timing comparison]
NASA Technical Reports Server (NTRS)
Stone, H. S.
1978-01-01
Timing comparisons are given for three sorting algorithms written for the CDC STAR computer. One algorithm is Hoare's (1962) Quicksort, which is the fastest or nearly the fastest sorting algorithm for most computers. A second algorithm is a vector version of Quicksort that takes advantage of the STAR's vector operations. The third algorithm is an adaptation of Batcher's (1968) sorting algorithm, which makes especially good use of vector operations but has a complexity of N(log N)-squared as compared with a complexity of N log N for the Quicksort algorithms. In spite of its worse complexity, Batcher's sorting algorithm is competitive with the serial version of Quicksort for vectors up to the largest that can be treated by STAR. Vector Quicksort outperforms the other two algorithms and is generally preferred. These results indicate that unusual instruction sets can introduce biases in program execution time that counter results predicted by worst-case asymptotic complexity analysis.
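To make the complexity trade-off concrete, here is a minimal Python sketch of a Batcher sorting network. Batcher's 1968 paper describes both the odd-even merge and the bitonic network; the sketch assumes the bitonic variant, which may not be the one timed on the STAR, and the function name and comparison counter are illustrative. Every compare-exchange within one pass touches a disjoint pair of elements, which is why the network maps well onto vector operations even though it performs on the order of N(log N)-squared comparisons.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network.

    Returns the sorted list and the number of compare-exchange operations.
    """
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    comparisons = 0
    k = 2
    while k <= n:            # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:        # distance between compared elements
            # All compare-exchanges in this pass are independent of each
            # other, so a vector machine can issue them as one operation.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    comparisons += 1
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a, comparisons


result, count = bitonic_sort([7, 3, 6, 1, 8, 2, 5, 4])
print(result, count)
```

For N = 8 the network runs 6 passes of 4 compare-exchanges each, that is (N/2) * log2(N) * (log2(N) + 1) / 2 = 24 comparisons, regardless of the input order.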
A scalable architecture for quantum computation with molecular nanomagnets.
Jenkins, M D; Zueco, D; Roubeau, O; Aromí, G; Majer, J; Luis, F
2016-11-14
A proposal for a magnetic quantum processor that consists of individual molecular spins coupled to superconducting coplanar resonators and transmission lines is carefully examined. We derive a simple magnetic quantum electrodynamics Hamiltonian to describe the underlying physics. It is shown that these hybrid devices can perform arbitrary operations on each spin qubit and induce tunable interactions between any pair of them. The combination of these two operations ensures that the processor can perform universal quantum computations. The feasibility of this proposal is critically discussed using the results of realistic calculations, based on parameters of existing devices and molecular qubits. These results show that the proposal is feasible, provided that molecules with sufficiently long coherence times can be developed and accurately integrated into specific areas of the device. This architecture has an enormous potential for scaling up quantum computation thanks to the microscopic nature of the individual constituents, the molecules, and the possibility of using their internal spin degrees of freedom.
Low-cost space-varying FIR filter architecture for computational imaging systems
NASA Astrophysics Data System (ADS)
Feng, Guotong; Shoaib, Mohammed; Schwartz, Edward L.; Dirk Robinson, M.
2010-01-01
Recent research demonstrates the advantage of designing electro-optical imaging systems by jointly optimizing the optical and digital subsystems. The optical systems designed using this joint approach intentionally introduce large and often space-varying optical aberrations that produce blurry optical images. Digital sharpening restores reduced contrast due to these intentional optical aberrations. Computational imaging systems designed in this fashion have several advantages including extended depth-of-field, lower system costs, and improved low-light performance. Currently, most consumer imaging systems lack the necessary computational resources to compensate for these optical systems with large aberrations in the digital processor. Hence, the exploitation of the advantages of the jointly designed computational imaging system requires low-complexity algorithms enabling space-varying sharpening. In this paper, we describe a low-cost algorithmic framework and associated hardware enabling the space-varying finite impulse response (FIR) sharpening required to restore largely aberrated optical images. Our framework leverages the space-varying properties of optical images formed using rotationally-symmetric optical lens elements. First, we describe an approach to leverage the rotational symmetry of the point spread function (PSF) about the optical axis allowing computational savings. Second, we employ a specially designed bank of sharpening filters tuned to the specific radial variation common to optical aberrations. We evaluate the computational efficiency and image quality achieved by using this low-cost space-varying FIR filter architecture.
Norton, Kerri-Ann; Namazi, Sameera; Barnard, Nicola; Fujibayashi, Mariko; Bhanot, Gyan; Ganesan, Shridar; Iyatomi, Hitoshi; Ogawa, Koichi; Shinbrot, Troy
2012-01-01
Ductal carcinoma in situ (DCIS) is a pre-invasive carcinoma of the breast that exhibits several distinct morphologies but the link between morphology and patient outcome is not clear. We hypothesize that different mechanisms of growth may still result in similar 2D morphologies, which may look different in 3D. To elucidate the connection between growth and 3D morphology, we reconstruct the 3D architecture of cribriform DCIS from resected patient material. We produce a fully automated algorithm that aligns, segments, and reconstructs 3D architectures from microscopy images of 2D serial sections from human specimens. The alignment algorithm is based on normalized cross correlation; the segmentation algorithm uses histogram equalization, Otsu's thresholding, and morphology techniques to segment the duct and cribra. The reconstruction method combines these images in 3D. We show that two distinct 3D architectures are indeed found in samples whose 2D histological sections are similarly identified as cribriform DCIS. These differences in architecture support the hypothesis that luminal spaces may form due to different mechanisms, either isolated cell death or merging fronds, leading to the different architectures. We find that out of 15 samples, 6 were found to have ‘bubble-like’ cribra, 6 were found to have ‘tube-like’ cribra and 3 were ‘unknown.’ We propose that the 3D architectures found, ‘bubbles’ and ‘tubes’, account for some of the heterogeneity of the disease and may be prognostic indicators of different patient outcomes. PMID:22970156
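Of the segmentation steps named in this abstract, Otsu's thresholding is the easiest to show in isolation. The following Python sketch is a generic textbook version, not the authors' pipeline; the function name, the 256-level assumption, and the synthetic pixel values are illustrative. It picks the gray level that maximizes the between-class variance of the two resulting pixel classes.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    `pixels` is a flat list of integers in the range [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    weight0 = 0          # number of pixels in the lower class
    sum0 = 0             # sum of gray levels in the lower class
    best_t, best_var = 0, -1.0
    for t in range(levels):
        weight0 += hist[t]
        sum0 += t * hist[t]
        if weight0 == 0:
            continue     # lower class still empty
        weight1 = total - weight0
        if weight1 == 0:
            break        # upper class empty from here on
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        between_var = weight0 * weight1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


# Synthetic bimodal "image": dark background (10) and bright duct (200).
pixels = [10] * 50 + [200] * 50
threshold = otsu_threshold(pixels)
foreground = [p for p in pixels if p > threshold]
print(threshold, len(foreground))
```

On real micrographs the histogram is not this cleanly separated, which is presumably why the abstract pairs thresholding with histogram equalization and morphological cleanup.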
Computational architecture for image processing on a small unmanned ground vehicle
NASA Astrophysics Data System (ADS)
Ho, Sean; Nguyen, Hung
2010-08-01
Man-portable Unmanned Ground Vehicles (UGVs) have been fielded on the battlefield with limited computing power. This limitation constrains their use primarily to teleoperation control mode for clearing areas and bomb defusing. In order to extend their capability to include the reconnaissance and surveillance missions of dismounted soldiers, a separate processing payload is desired. This paper presents a processing architecture and the design details of the payload module that enables the PackBot to perform sophisticated, real-time image processing algorithms using data collected from its onboard imaging sensors including LADAR, IMU, visible, IR, stereo, and the Ladybug spherical cameras. The entire payload is constructed from currently available commercial off-the-shelf (COTS) components including an Intel multi-core CPU and an Nvidia GPU. The result of this work enables a small UGV to perform computationally expensive image processing tasks that once were only feasible on a large workstation.
A smart sensor architecture based on emergent computation in an array of outer-totalistic cells
NASA Astrophysics Data System (ADS)
Dogaru, Radu; Dogaru, Ioana; Glesner, Manfred
2005-06-01
A novel smart-sensor architecture is proposed, capable of segmenting and recognizing characters in a monochrome image. It can provide a list of ASCII codes representing the recognized characters from the monochrome visual field. It can operate as an aid for the blind or in industrial applications. A bio-inspired cellular model with simple linear neurons was found to be the best at performing the nontrivial task of cropping isolated compact objects such as handwritten digits or characters. By attaching a simple outer-totalistic cell to each pixel sensor, emergent computation in the resulting cellular automata lattice provides a straightforward and compact solution to the otherwise computationally intensive problem of character segmentation. A simple and robust recognition algorithm is built into a compact sequential controller accessing the array of cells so that the integrated device can directly provide a list of codes of the recognized characters. Preliminary simulation tests indicate good performance and robustness to various distortions of the visual field.
Examining the architecture of cellular computing through a comparative study with a computer.
Wang, Degeng; Gribskov, Michael
2005-06-22
The computer and the cell both use information embedded in simple coding, the binary software code and the quadruple genomic code, respectively, to support system operations. A comparative examination of their system architecture as well as their information storage and utilization schemes is performed. On top of the code, both systems display a modular, multi-layered architecture, which, in the case of a computer, arises from human engineering efforts through a combination of hardware implementation and software abstraction. Using the computer as a reference system, a simplistic mapping of the architectural components between the two is easily detected. This comparison also reveals that a cell abolishes the software-hardware barrier through genomic encoding for the constituents of the biochemical network, a cell's "hardware" equivalent to the computer central processing unit (CPU). The information loading (gene expression) process acts as a major determinant of the encoded constituent's abundance, which, in turn, often determines the "bandwidth" of a biochemical pathway. Cellular processes are implemented in biochemical pathways in parallel manners. In a computer, on the other hand, the software provides only instructions and data for the CPU. A process represents just sequentially ordered actions by the CPU and only virtual parallelism can be implemented through CPU time-sharing. Whereas process management in a computer may simply mean job scheduling, coordinating pathway bandwidth through the gene expression machinery represents a major process management scheme in a cell. In summary, a cell can be viewed as a super-parallel computer, which computes through controlled hardware composition. While we have, at best, a very fragmented understanding of cellular operation, we have a thorough understanding of the computer throughout the engineering process. The potential utilization of this knowledge to the benefit of systems biology is discussed.
An architecture for quantum computation with magnetically trapped Holmium atoms
NASA Astrophysics Data System (ADS)
Saffman, Mark; Hostetter, James; Booth, Donald; Collett, Jeffrey
2016-05-01
Outstanding challenges for scalable neutral atom quantum computation include correction of atom loss due to collisions with untrapped background gas, reduction of crosstalk during state preparation and measurement due to scattering of near resonant light, and the need to improve quantum gate fidelity. We present a scalable architecture based on loading single Holmium atoms into an array of Ioffe-Pritchard traps. The traps are formed by grids of superconducting wires giving a trap array with 40 μm period, suitable for entanglement via long range Rydberg gates. The states |F = 5, M = 5⟩ and |F = 7, M = 7⟩ provide a magic trapping condition at a low field of 3.5 G for long coherence time qubit encoding. The F = 11 level will be used for state preparation and measurement. The availability of different states for encoding, gate operations, and measurement spectroscopically isolates the different operations and will prevent crosstalk to neighboring qubits. Operation in a cryogenic environment with ultra low pressure will increase atom lifetime and Rydberg gate fidelity by reduction of blackbody induced Rydberg decay. We will present a complete description of the architecture including estimates of achievable performance metrics. Work supported by NSF award PHY-1404357.
Parallel algorithm for computation of second-order sequential best rotations
NASA Astrophysics Data System (ADS)
Redif, Soydan; Kasap, Server
2013-12-01
Algorithms for computing an approximate polynomial matrix eigenvalue decomposition of para-Hermitian systems have emerged as a powerful, generic signal processing tool. A technique that has shown much success in this regard is the sequential best rotation (SBR2) algorithm. Proposed is a scheme for parallelising SBR2 with a view to exploiting the modern architectural features and inherent parallelism of field-programmable gate array (FPGA) technology. Experiments show that the proposed scheme can achieve low execution times while requiring minimal FPGA resources.
Hardware Architectures for Data-Intensive Computing Problems: A Case Study for String Matching
Tumeo, Antonino; Villa, Oreste; Chavarría-Miranda, Daniel
2012-12-28
DNA analysis is an emerging application of high performance bioinformatics. Modern sequencing machinery is able to provide, in a few hours, large input streams of data, which need to be matched against exponentially growing databases of known fragments. The ability to recognize these patterns effectively and quickly may allow extending the scale and the reach of the investigations performed by biology scientists. Aho-Corasick is an exact, multiple pattern matching algorithm often at the base of this application. High performance systems are a promising platform to accelerate this algorithm, which is computationally intensive but also inherently parallel. Nowadays, high performance systems also include heterogeneous processing elements, such as Graphic Processing Units (GPUs), to further accelerate parallel algorithms. Unfortunately, the Aho-Corasick algorithm exhibits large performance variability, depending on the size of the input streams, on the number of patterns to search and on the number of matches, and poses significant challenges for current high performance software and hardware implementations. An adequate mapping of the algorithm on the target architecture, coping with the limits of the underlying hardware, is required to reach the desired high throughputs. In this paper, we discuss the implementation of the Aho-Corasick algorithm for GPU-accelerated high performance systems. We present an optimized implementation of Aho-Corasick for GPUs and discuss its tradeoffs on the Tesla T10 and the new Tesla T20 (codename Fermi) GPUs. We then integrate the optimized GPU code, respectively, in an MPI-based and in a pthreads-based load balancer to enable execution of the algorithm on clusters and large shared-memory multiprocessors (SMPs) accelerated with multiple GPUs.
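For readers unfamiliar with Aho-Corasick, the following is a minimal sequential Python sketch of the algorithm itself. It is not the GPU implementation discussed in the abstract; the function names, the list-of-dicts automaton layout, and the short DNA patterns are assumptions for illustration. The automaton is a trie of the patterns plus failure links, so the text is scanned once regardless of how many patterns are searched.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure, and output tables for a set of patterns."""
    goto = [{}]      # goto[state][char] -> next state
    fail = [0]       # fail[state] -> longest proper suffix state
    out = [[]]       # out[state] -> patterns ending at this state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # Breadth-first pass: failure links of shallower states are final
    # before they are used for deeper states.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Return (start_index, pattern) for every match, in scan order."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((pos - len(pat) + 1, pat))
    return hits


automaton = build_automaton(["ACG", "CG", "GT"])
hits = search("ACGT", automaton)
print(hits)
```

The data-dependent `while` loop over failure links and the variable number of hits per position are the kind of irregularity that the abstract identifies as a source of performance variability on parallel hardware.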
An Integrated Architecture and Feature Selection Algorithm for Radial Basis Neural Networks
2002-03-01
The research contribution of this thesis is the first known integrated architecture and feature selection algorithm for Radial Basis Neural Networks (RBNNs...Additionally, this thesis compares three different classification techniques, Discriminant Analysis (DA), Feed-Forward Neural Networks (FFN) and RBNNs against
Image-restoration algorithms for a fully connected architecture.
Abbiss, J B; Brames, B J; Byrne, C L; Fiddy, M A
1990-06-15
We describe the implementation of a technique for achieving image superresolution using a fully connected network of simple processors operating in an iterative mode. We show that an updating scheme can be specified that ensures convergence for the serial (asynchronous) updating case. With the appropriate hardware, parallel (synchronous) updating becomes of particular interest because of the potential for accelerated convergence; it is this approach that we envisage implementing in optical hardware. For this case also, we present a convergent scheme that can be related to a regularized form of the Gerchberg-Papoulis algorithm.
New SIMD Algorithms for Cluster Labeling on Parallel Computers
NASA Astrophysics Data System (ADS)
Apostolakis, John; Coddington, Paul; Marinari, Enzo
Cluster algorithms are non-local Monte Carlo update schemes which can greatly increase the efficiency of computer simulations of spin models of magnets. The major computational task in these algorithms is connected component labeling, to identify clusters of connected sites on a lattice. We have devised some new SIMD component labeling algorithms, and implemented them on the Connection Machine. We investigate their performance when applied to the cluster update of the two-dimensional Ising spin model. These algorithms could also be applied to other problems which use connected component labeling, such as percolation and image analysis.
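Connected component labeling in a data-parallel style can be sketched as follows. This Python example shows only the simplest scheme, local label propagation, and is not one of the new algorithms the abstract refers to; the function name and the occupancy grid (sites rather than Ising bonds) are simplifying assumptions. Every occupied site starts with a unique label, and in each synchronous sweep all sites take the minimum label among themselves and their occupied nearest neighbors, until nothing changes.

```python
def label_components(grid):
    """Label 4-connected clusters of occupied sites by min-label propagation.

    Returns the label lattice (None for empty sites) and the sweep count.
    """
    rows, cols = len(grid), len(grid[0])
    # Every occupied site starts with a unique label.
    labels = [[r * cols + c if grid[r][c] else None for c in range(cols)]
              for r in range(rows)]
    sweeps = 0
    changed = True
    while changed:
        changed = False
        sweeps += 1
        # Synchronous update: all sites read the old lattice, as they
        # would when every processor executes the same instruction.
        new = [row[:] for row in labels]
        for r in range(rows):
            for c in range(cols):
                if labels[r][c] is None:
                    continue
                best = labels[r][c]
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and labels[rr][cc] is not None):
                        best = min(best, labels[rr][cc])
                if best != new[r][c]:
                    new[r][c] = best
                    changed = True
        labels = new
    return labels, sweeps


grid = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
]
labels, sweeps = label_components(grid)
clusters = {lab for row in labels for lab in row if lab is not None}
print(labels, sweeps, len(clusters))
```

The number of sweeps grows with the longest path inside a cluster, which becomes expensive for the large clusters that appear near a critical point; that cost is one motivation for devising better SIMD labeling algorithms.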
NASA Astrophysics Data System (ADS)
Tramm, John R.; Gunow, Geoffrey; He, Tim; Smith, Kord S.; Forget, Benoit; Siegel, Andrew R.
2016-05-01
In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.
Computationally efficient algorithms for real-time attitude estimation
NASA Technical Reports Server (NTRS)
Pringle, Steven R.
1993-01-01
For many practical spacecraft applications, algorithms for determining spacecraft attitude must combine inputs from diverse sensors and provide redundancy in the event of sensor failure. A Kalman filter is suitable for this task; however, it may impose a computational burden that suboptimal methods can avoid. A suboptimal estimator is presented which was implemented successfully on the Delta Star spacecraft, which performed a 9-month SDI flight experiment in 1989. This design sought to minimize algorithm complexity to accommodate the limitations of an 8K guidance computer. The algorithm used is interpreted in the framework of Kalman filtering and a derivation is given for the computation.
A novel bit-quad-based Euler number computing algorithm.
Yao, Bin; He, Lifeng; Kang, Shiying; Chao, Yuyan; Zhao, Xiao
2015-01-01
The Euler number of a binary image is an important topological property in computer vision and pattern recognition. This paper proposes a novel bit-quad-based Euler number computing algorithm. Based on graph theory and analysis of bit-quad patterns, our algorithm only needs to count two bit-quad patterns. Moreover, by use of the information obtained while processing the previous bit-quad, the average number of pixels to be checked for processing a bit-quad is only 1.75. Experimental results demonstrated that our method significantly outperforms conventional Euler number computing algorithms.
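For orientation, the conventional bit-quad approach that such algorithms improve on can be sketched as follows. This is the classical three-pattern formula usually attributed to Gray, not the two-pattern method proposed in the paper; the function name `euler_number` and the list-of-lists image format are assumptions.

```python
def euler_number(image, connectivity=8):
    """Euler number (components minus holes) from classical bit-quad counts.

    Assumption: `image` is a rectangular list of lists of 0/1 values.
    """
    cols = len(image[0])
    # Zero-pad by one pixel so every 2x2 window touching the image is visited.
    padded = [[0] * (cols + 2)]
    padded += [[0] + [1 if v else 0 for v in row] + [0] for row in image]
    padded += [[0] * (cols + 2)]

    q1 = q3 = qd = 0
    for r in range(len(padded) - 1):
        for c in range(cols + 1):
            a, b = padded[r][c], padded[r][c + 1]
            d, e = padded[r + 1][c], padded[r + 1][c + 1]
            total = a + b + d + e
            if total == 1:
                q1 += 1              # exactly one foreground pixel
            elif total == 3:
                q3 += 1              # exactly three foreground pixels
            elif total == 2 and a == e:
                qd += 1              # two foreground pixels on a diagonal

    # The diagonal term changes sign with the connectivity convention.
    sign = 1 if connectivity == 4 else -1
    return (q1 - q3 + sign * 2 * qd) // 4


ring = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
print(euler_number(ring))  # one component, one hole
```

Each window here inspects all four pixels; the figure of 1.75 pixels per bit-quad quoted in the abstract comes from reusing information from the previous window, which this sketch does not attempt.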
Architecture-Aware Algorithms for Scalable Performance and Resilience on Heterogeneous Architectures
Dongarra, Jack
2013-03-14
There is a widening gap between the peak performance of high performance computers and the performance realized by full applications. Over the next decade, extreme-scale systems will present major new challenges to software development that could widen the gap so much that it prevents the productive use of future DOE Leadership computers.
NASA Technical Reports Server (NTRS)
Gentzsch, W.
1982-01-01
Problems which can arise with vector and parallel computers are discussed in a user oriented context. Emphasis is placed on the algorithms used and the programming techniques adopted. Three recently developed supercomputers are examined and typical application examples are given in CRAY FORTRAN, CYBER 205 FORTRAN and DAP (distributed array processor) FORTRAN. The systems' performance is compared. The addition of parts of two N x N arrays is considered. The influence of the architecture on the algorithms and programming language is demonstrated. Numerical analysis of magnetohydrodynamic differential equations by an explicit difference method is illustrated, showing very good results for all three systems. The prognosis for supercomputer development is assessed.
The RISC (Reduced Instruction Set Computer) Architecture and Computer Performance Evaluation.
1986-03-01
...began by making an identification and characterization of a new and controversial type of computer architecture called RISC for Reduced Instruction...
E-Governance and Service Oriented Computing Architecture Model
NASA Astrophysics Data System (ADS)
Tejasvee, Sanjay; Sarangdevot, S. S.
2010-11-01
E-Governance is the effective application of information and communication technology (ICT) in government processes to accomplish safe and reliable information lifecycle management. The information lifecycle involves processes such as capturing, preserving, manipulating, and delivering information. E-Governance is meant to transform governance so that it serves citizens in a transparent, reliable, participatory, and accountable manner. The purpose of this paper is to propose an e-governance model based on the Service Oriented Computing Architecture (SOCA), which combines the information and services provided by the government with innovation, in order to find the optimal way to deliver services to citizens and to implement them in a transparent and accountable practice. The paper also focuses on the E-government Service Manager as a key factor of the service-oriented computing model, providing a dynamically extensible structural design into which every area or branch can bring innovative services. The heart of the paper is an abstract model that enables e-government communication among trade and business, citizens, government, and autonomous bodies.
HTMT-class Latency Tolerant Parallel Architecture for Petaflops Scale Computation
NASA Technical Reports Server (NTRS)
Sterling, Thomas; Bergman, Larry
2000-01-01
Computational Aero Sciences and other numeric intensive computation disciplines demand computing throughputs substantially greater than the Teraflops scale systems only now becoming available. The related fields of fluids, structures, thermal, combustion, and dynamic controls are among the interdisciplinary areas that in combination with sufficient resolution and advanced adaptive techniques may force performance requirements towards Petaflops. This will be especially true for compute intensive models such as Navier-Stokes or when such system models are only part of a larger design optimization computation involving many design points. Yet recent experience with conventional MPP configurations comprising commodity processing and memory components has shown that larger scale frequently results in higher programming difficulty and lower system efficiency. While important advances in system software and algorithmic techniques have had some impact on efficiency and programmability for certain classes of problems, in general it is unlikely that software alone will resolve the challenges to higher scalability. As in the past, future generations of high-end computers may require a combination of hardware architecture and system software advances to enable efficient operation at a Petaflops level. The NASA-led HTMT project has engaged the talents of a broad interdisciplinary team to develop a new strategy in high-end system architecture to deliver petaflops scale computing in the 2004/5 timeframe. The Hybrid-Technology, MultiThreaded parallel computer architecture incorporates several advanced technologies in combination with an innovative dynamic adaptive scheduling mechanism to provide unprecedented performance and efficiency within practical constraints of cost, complexity, and power consumption. The emerging superconductor Rapid Single Flux Quantum electronics can operate at 100 GHz (the record is 770 GHz) and one percent of the power required by convention
High-order hydrodynamic algorithms for exascale computing
Morgan, Nathaniel Ray
2016-02-05
Hydrodynamic algorithms are at the core of many laboratory missions ranging from simulating ICF implosions to climate modeling. The hydrodynamic algorithms commonly employed at the laboratory and in industry (1) typically lack requisite accuracy for complex multi-material vortical flows and (2) are not well suited for exascale computing due to poor data locality and poor FLOP/memory ratios. Exascale computing requires advances in both computer science and numerical algorithms. We propose to research the second requirement and create a new high-order hydrodynamic algorithm that has superior accuracy, excellent data locality, and excellent FLOP/memory ratios. This proposal will impact a broad range of research areas including numerical theory, discrete mathematics, vorticity evolution, gas dynamics, interface instability evolution, turbulent flows, fluid dynamics and shock driven flows. If successful, the proposed research has the potential to radically transform simulation capabilities and help position the laboratory for computing at the exascale.
Parallel algorithms for computation of the manipulator inertia matrix
NASA Technical Reports Server (NTRS)
Amin-Javaheri, Masoud; Orin, David E.
1989-01-01
The development of an O(log2N) parallel algorithm for the manipulator inertia matrix is presented. It is based on the most efficient serial algorithm which uses the composite rigid body method. Recursive doubling is used to reformulate the linear recurrence equations which are required to compute the diagonal elements of the matrix. It results in O(log2N) levels of computation. Computation of the off-diagonal elements involves N linear recurrences of varying-size and a new method, which avoids redundant computation of position and orientation transforms for the manipulator, is developed. The O(log2N) algorithm is presented in both equation and graphic forms which clearly show the parallelism inherent in the algorithm.
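The recursive doubling idea mentioned above can be illustrated on a scalar first-order linear recurrence x_i = a_i * x_(i-1) + b_i. This is only a sketch under simplifying assumptions: the manipulator problem involves spatial transforms and inertias rather than scalars, and the function names are hypothetical. Each step is treated as an affine map, and maps are combined pairwise with a stride that doubles at every level, so that ceil(log2 N) levels suffice when each level runs in parallel.

```python
def compose(later, earlier):
    """Compose two affine maps x -> a*x + b, applying `earlier` first."""
    a2, b2 = later
    a1, b1 = earlier
    return (a2 * a1, a2 * b1 + b2)


def recurrence_serial(a, b, x0):
    """Reference O(N) evaluation of x_i = a_i * x_(i-1) + b_i."""
    out, x = [], x0
    for ai, bi in zip(a, b):
        x = ai * x + bi
        out.append(x)
    return out


def recurrence_by_doubling(a, b, x0):
    """Evaluate the same recurrence in ceil(log2 N) data-parallel levels."""
    n = len(a)
    maps = list(zip(a, b))
    levels, stride = 0, 1
    while stride < n:
        # All entries of one level are independent of each other, so a
        # parallel machine could compute them simultaneously.
        maps = [compose(maps[i], maps[i - stride]) if i >= stride else maps[i]
                for i in range(n)]
        stride *= 2
        levels += 1
    return [A * x0 + B for A, B in maps], levels


a = [2, 3, 1, -1, 2]
b = [1, 0, 4, 2, -3]
values, levels = recurrence_by_doubling(a, b, x0=1)
print(values, levels)
```

The doubling form performs more total arithmetic than the serial loop; the gain is in depth, which is what an O(log2N) bound refers to.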
Efficient Homotopy Continuation Algorithms with Application to Computational Fluid Dynamics
NASA Astrophysics Data System (ADS)
Brown, David A.
New homotopy continuation algorithms are developed and applied to a parallel implicit finite-difference Newton-Krylov-Schur external aerodynamic flow solver for the compressible Euler, Navier-Stokes, and Reynolds-averaged Navier-Stokes equations with the Spalart-Allmaras one-equation turbulence model. Many new analysis tools, calculations, and numerical algorithms are presented for the study and design of efficient and robust homotopy continuation algorithms applicable to solving very large and sparse nonlinear systems of equations. Several specific homotopies are presented and studied and a methodology is presented for assessing the suitability of specific homotopies for homotopy continuation. A new class of homotopy continuation algorithms, referred to as monolithic homotopy continuation algorithms, is developed. These algorithms differ from classical predictor-corrector algorithms by combining the predictor and corrector stages into a single update, significantly reducing the amount of computation and avoiding wasted computational effort resulting from over-solving in the corrector phase. The new algorithms are also simpler from a user perspective, with fewer input parameters, which also improves the user's ability to choose effective parameters on the first flow solve attempt. Conditional convergence is proved analytically and studied numerically for the new algorithms. The performance of a fully-implicit monolithic homotopy continuation algorithm is evaluated for several inviscid, laminar, and turbulent flows over NACA 0012 airfoils and ONERA M6 wings. The monolithic algorithm is demonstrated to be more efficient than the predictor-corrector algorithm for all applications investigated. It is also demonstrated to be more efficient than the widely-used pseudo-transient continuation algorithm for all inviscid and laminar cases investigated, and good performance scaling with grid refinement is demonstrated for the inviscid cases. Performance is also demonstrated
Electro-Optic Computing Architectures: Volume II. Components and System Design and Analysis
1998-02-01
The objective of the Electro-Optic Computing Architecture (EOCA) program was to develop multi-function electro-optic interfaces and optical...interconnect units to enhance the performance of parallel processor systems and form the building blocks for future electro-optic computing architectures...Specifically, three multi-function interface modules were targeted for development - an Electro-Optic Interface (EOI), an Optical Interconnection Unit
ERIC Educational Resources Information Center
Guidon, Jacques; Pierre, Samuel
1996-01-01
Discusses the use of computers in education and training and proposes a client-server architecture for an experimental computer environment as an approach to a virtual classroom. Highlights include the World Wide Web and client software, document delivery, hardware architecture, and Internet resources and services. (Author/LRW)
A computational fluid dynamics algorithm on a massively parallel computer
NASA Technical Reports Server (NTRS)
Jespersen, Dennis C.; Levit, Creon
1989-01-01
The implementation and performance of a finite-difference algorithm for the compressible Navier-Stokes equations in two or three dimensions on the Connection Machine are described. This machine is a single-instruction multiple-data machine with up to 65536 physical processors. The implicit portion of the algorithm is of particular interest. Running times and megaflop rates are given for two- and three-dimensional problems. Included are comparisons with the standard codes on a Cray X-MP/48.
Adiabatic Quantum Computing and Quantum Walks: Algorithms and Architectures
2011-02-15
0807.0929 Title: Environment-Assisted Quantum Transport Authors: Patrick Rebentrost, Masoud Mohseni, Ivan Kassal, Seth Lloyd, Alán Aspuru-Guzik...this effect, Environment Assisted Quantum Transport (ENAQT). The use of environmental effects to enhance transport rates appears to be ubiquitous in
Parallel-Computing Architecture for JWST Wavefront-Sensing Algorithms
2011-09-01
Hubble Space Telescope and will be NASA’s premier observatory of the next decade. Image-based wavefront sensing (phase retrieval) is the primary...INTRODUCTION The James Webb Space Telescope (JWST) is the next-generation successor to the Hubble Space Telescope. It is a large, space-based infrared...ABSTRACT The James Webb Space Telescope (JWST) is the successor to the Hubble Space Telescope and will be NASA’s premier
NASA Technical Reports Server (NTRS)
Ioup, G. E.
1985-01-01
The development of vector processing computers with streaming array architectures has made possible a dramatic decrease in the time required for the solution of problems, i.e., for those algorithms which readily lend themselves to sequential operations on long vectors. There has concurrently been rapid growth in the applications of the techniques generally known as mathematical digital filtering, also called signal analysis, digital signal processing, time series analysis, and digital image processing. The best known applications of these techniques are to seismic data and two-dimensional images, data types consisting of very large collections of numbers. A major limitation for these cases is the size and speed of the computer available. Therefore, a move to the streaming array architecture can result in marked improvement in the data analysis techniques which may be employed.
Algorithm implementation on the Navier-Stokes computer
NASA Technical Reports Server (NTRS)
Krist, Steven E.; Zang, Thomas A.
1987-01-01
The Navier-Stokes Computer is a multi-purpose parallel-processing supercomputer which is currently under development at Princeton University. It consists of multiple local memory parallel processors, called Nodes, which are interconnected in a hypercube network. Details of the procedures involved in implementing an algorithm on the Navier-Stokes computer are presented. The particular finite difference algorithm considered in this analysis was developed for simulation of laminar-turbulent transition in wall bounded shear flows. Projected timing results for implementing this algorithm indicate that operation rates in excess of 42 GFLOPS are feasible on a 128 Node machine.
Hybrid VLSI/QCA Architecture for Computing FFTs
NASA Technical Reports Server (NTRS)
Fijany, Amir; Toomarian, Nikzad; Modarres, Katayoon; Spotnitz, Matthew
2003-01-01
A data-processor architecture that would incorporate elements of both conventional very-large-scale integrated (VLSI) circuitry and quantum-dot cellular automata (QCA) has been proposed to enable the highly parallel and systolic computation of fast Fourier transforms (FFTs). The proposed circuit would complement the QCA-based circuits described in several prior NASA Tech Briefs articles, namely Implementing Permutation Matrices by Use of Quantum Dots (NPO-20801), Vol. 25, No. 10 (October 2001), page 42; Compact Interconnection Networks Based on Quantum Dots (NPO-20855) Vol. 27, No. 1 (January 2003), page 32; and Bit-Serial Adder Based on Quantum Dots (NPO-20869), Vol. 27, No. 1 (January 2003), page 35. The cited prior articles described the limitations of very-large-scale integrated (VLSI) circuitry and the major potential advantage afforded by QCA. To recapitulate: In a VLSI circuit, signal paths that are required not to interact with each other must not cross in the same plane. In contrast, for reasons too complex to describe in the limited space available for this article, suitably designed and operated QCA-based signal paths that are required not to interact with each other can nevertheless be allowed to cross each other in the same plane without adverse effect. In principle, this characteristic could be exploited to design compact, coplanar, simple (relative to VLSI) QCA-based networks to implement complex, advanced interconnection schemes.
Algorithm for Computing Particle/Surface Interactions
NASA Technical Reports Server (NTRS)
Hughes, David W.
2009-01-01
An algorithm has been devised for predicting the behaviors of sparsely spatially distributed particles impinging on a solid surface in a rarefied atmosphere. Under the stated conditions, prior particle-transport models in which (1) dense distributions of particles are treated as continuum fluids; or (2) sparse distributions of particles are considered to be suspended in and to diffuse through fluid streams are not valid.
Genetic algorithms in a distributed computing environment using PVM
Cronje, G.A.; Steeb, W.H.
1997-04-01
The Parallel Virtual Machine (PVM) is a software system that enables a collection of heterogeneous computer systems to be used as a coherent and flexible concurrent computation resource. We show that genetic algorithms can be implemented using a Parallel Virtual Machine and C++. Problems with constraints are also discussed.
Limited-data computed tomography algorithms for the physical sciences.
Verhoeven, D
1993-07-10
Five limited-data computed tomography algorithms are compared. The algorithms used are adapted versions of the algebraic reconstruction technique, the multiplicative algebraic reconstruction technique, the Gerchberg-Papoulis algorithm, a spectral extrapolation algorithm descended from that of Harris [J. Opt. Soc. Am. 54, 931-936 (1964)], and an algorithm based on the singular value decomposition technique. These algorithms were used to reconstruct phantom data with realistic levels of noise from a number of different imaging geometries. The phantoms, the imaging geometries, and the noise were chosen to simulate the conditions encountered in typical computed tomography applications in the physical sciences, and the implementations of the algorithms were optimized for these applications. The multiplicative algebraic reconstruction technique algorithm gave the best results overall; the algebraic reconstruction technique gave the best results for very smooth objects or very noisy (20-dB signal-to-noise ratio) data. My implementations of both of these algorithms incorporate a priori knowledge of the sign of the object, its extent, and its smoothness. The smoothness of the reconstruction is enforced through the use of an appropriate object model (by use of cubic B-spline basis functions and a number of object coefficients appropriate to the object being reconstructed). The average reconstruction error was 1.7% of the maximum phantom value with the multiplicative algebraic reconstruction technique of a phantom with moderate-to-steep gradients by use of data from five viewing angles with a 30-dB signal-to-noise ratio.
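The two best-performing methods named above can be illustrated in their plain textbook forms. This sketch does not include the author's adaptations (a priori sign, extent, and smoothness constraints or the B-spline object model). It assumes a small dense system A x = b with weights in [0, 1] and positive measurements; the function names `art` and `mart` and the test system are hypothetical.

```python
def art(A, b, sweeps=200, relax=1.0):
    """Additive ART (Kaczmarz): project onto one ray equation at a time."""
    x = [0.0] * len(A[0])
    for _ in range(sweeps):
        for row, bi in zip(A, b):
            norm2 = sum(v * v for v in row)
            if norm2 == 0.0:
                continue  # a ray that crosses no cell carries no information
            residual = bi - sum(v * xj for v, xj in zip(row, x))
            step = relax * residual / norm2
            x = [xj + step * v for v, xj in zip(row, x)]
    return x


def mart(A, b, sweeps=200, relax=1.0):
    """Multiplicative ART: rescale cells by the measured/predicted ratio."""
    x = [1.0] * len(A[0])  # a positive start keeps every iterate positive
    for _ in range(sweeps):
        for row, bi in zip(A, b):
            predicted = sum(v * xj for v, xj in zip(row, x))
            if predicted <= 0.0:
                continue
            ratio = bi / predicted
            # Cells with zero weight on this ray are left unchanged.
            x = [xj * ratio ** (relax * v) for v, xj in zip(row, x)]
    return x


A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0],
     [1.0, 0.0, 1.0]]
b = [3.0, 5.0, 4.0]   # consistent with the object [1, 2, 3]
print(art(A, b))
print(mart(A, b))
```

Because the multiplicative update can never change the sign of a cell, positivity is built in, whereas the additive form needs an explicit constraint to enforce it.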
Jiang, Yuning; Kang, Jinfeng; Wang, Xinan
2017-03-24
Resistive switching memory (RRAM) is considered one of the most promising devices for parallel computing solutions that may overcome the von Neumann bottleneck of today's electronic systems. However, the existing RRAM-based parallel computing architectures suffer from practical problems such as device variations and extra computing circuits. In this work, we propose a novel parallel computing architecture for pattern recognition by implementing k-nearest neighbor classification on metal-oxide RRAM crossbar arrays. Metal-oxide RRAM with gradual RESET behaviors is chosen as both the storage and computing components. The proposed architecture is tested on the MNIST database. High speed (~100 ns per example) and high recognition accuracy (97.05%) are obtained. The influence of several non-ideal device properties is also discussed, and it turns out that the proposed architecture shows great tolerance to device variations. This work paves a new way to achieve RRAM-based parallel computing hardware systems with high performance.
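For readers unfamiliar with the classifier being mapped to hardware, a software sketch of k-nearest neighbor classification follows. It says nothing about how the crossbar computes distances in the analog domain; the squared Euclidean distance, the function name `knn_classify`, and the toy data are assumptions made for illustration.

```python
from collections import Counter


def knn_classify(train, labels, query, k=3):
    """Return the majority label among the k training vectors nearest to query."""
    # Squared Euclidean distance gives the same ranking as the true distance.
    scored = [(sum((a - q) ** 2 for a, q in zip(vec, query)), lab)
              for vec, lab in zip(train, labels)]
    scored.sort(key=lambda pair: pair[0])
    votes = Counter(lab for _, lab in scored[:k])
    return votes.most_common(1)[0][0]


train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_classify(train, labels, (0.2, 0.1)))
print(knn_classify(train, labels, (5.5, 5.5)))
```

The distance evaluations for different training vectors are independent of each other, which is the property a parallel architecture can exploit.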
Iterative restoration algorithms for nonlinear constraint computing
NASA Astrophysics Data System (ADS)
Szu, Harold
A general iterative-restoration principle is introduced to facilitate the implementation of nonlinear optical processors. The von Neumann convergence theorem is generalized to include nonorthogonal subspaces which can be reduced to a special orthogonal projection operator by applying an orthogonality condition. This principle is shown to permit derivation of the Jacobi algorithm, the recursive principle, the van Cittert (1931) deconvolution method, the iteration schemes of Gerchberg (1974) and Papoulis (1975), and iteration schemes using two Fourier conjugate domains (e.g., Fienup, 1981). Applications to restoring the image of a double star and division by hard and soft zeros are discussed, and sample results are presented graphically.
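One of the schemes listed above, the van Cittert deconvolution method, has a particularly compact form: the estimate is repeatedly corrected by the difference between the observed data and the re-blurred estimate. The sketch below is the plain unconstrained iteration on a 1-D signal with circular boundaries, not the generalized principle of the paper. The kernel representation (a dict of offset to weight), the function names, and the test signal are assumptions, and the iteration converges only when the blur's transfer function lies in (0, 2).

```python
def circular_convolve(signal, kernel):
    """Circular convolution; `kernel` maps integer offsets to weights."""
    n = len(signal)
    return [sum(w * signal[(i - off) % n] for off, w in kernel.items())
            for i in range(n)]


def van_cittert(blurred, kernel, iterations=200):
    """Iterate f <- f + (g - h*f), starting from the blurred data g."""
    estimate = list(blurred)
    for _ in range(iterations):
        reblurred = circular_convolve(estimate, kernel)
        estimate = [f + (g - r) for f, g, r in zip(estimate, blurred, reblurred)]
    return estimate


# Symmetric kernel whose transfer function 0.6 + 0.4*cos(theta) stays in [0.2, 1].
kernel = {-1: 0.2, 0: 0.6, 1: 0.2}
truth = [0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
blurred = circular_convolve(truth, kernel)
restored = van_cittert(blurred, kernel)
print([round(v, 6) for v in restored])
```

With noisy data the same iteration amplifies noise in the weakly transmitted frequencies, which is one reason constrained or regularized variants are used in practice.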
Decomposition algorithms for stochastic programming on a computational grid.
Linderoth, J.; Wright, S.; Mathematics and Computer Science; Axioma Inc.
2003-01-01
We describe algorithms for two-stage stochastic linear programming with recourse and their implementation on a grid computing platform. In particular, we examine serial and asynchronous versions of the L-shaped method and a trust-region method. The parallel platform of choice is the dynamic, heterogeneous, opportunistic platform provided by the Condor system. The algorithms are of master-worker type (with the workers being used to solve second-stage problems), and the MW runtime support library (which supports master-worker computations) is key to the implementation. Computational results are presented on large sample-average approximations of problems from the literature.
Some Computer Algorithms to Implement a Reliability Shorthand.
1982-10-01
AD-A123 781 Some Computer Algorithms to Implement a Reliability Shorthand (U). Naval Postgraduate School, Monterey, California. Unclassified...Thesis: Some Computer Algorithms to Implement a Reliability Shorthand, Sadan Gursel, October 1982. Thesis Advisor: J. D. Esary...
A class of least-squares filtering and identification algorithms with systolic array architectures
NASA Technical Reports Server (NTRS)
Kalson, Seth Z.; Yao, Kung
1991-01-01
A unified approach is presented for deriving a large class of new and previously known time- and order-recursive least-squares algorithms with systolic array architectures, suitable for high-throughput-rate and VLSI implementations of space-time filtering and system identification problems. The geometrical derivation given is unique in that no assumption is made concerning the rank of the sample data correlation matrix. This method utilizes and extends the concept of oblique projections, as used previously in the derivations of the least-squares lattice algorithms. Exponentially weighted least-squares criteria are considered for both sliding and growing memory.
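The exponentially weighted, growing-memory criterion mentioned above can be illustrated with the standard time-recursive update for a single parameter. This is the ordinary textbook recursion, not the geometric derivation or the systolic array mapping of the paper; the scalar model y = phi * theta, the function name `ewrls_scalar`, and the default values are assumptions.

```python
def ewrls_scalar(phis, ys, forgetting=0.98, p0=1000.0):
    """Recursively minimise sum_k forgetting**(n-k) * (y_k - phi_k*theta)**2.

    Assumptions: a single unknown theta, zero initial estimate, and a large
    initial inverse-correlation value p0 expressing little prior confidence.
    """
    theta, p = 0.0, p0
    for phi, y in zip(phis, ys):
        gain = p * phi / (forgetting + phi * p * phi)
        theta += gain * (y - phi * theta)       # correct by the a priori error
        p = (p - gain * phi * p) / forgetting   # discount older samples
    return theta


phis = [1.0, 2.0, -1.0, 0.5, 3.0] * 10
ys = [3.0 * phi for phi in phis]   # noise-free data generated with theta = 3
print(ewrls_scalar(phis, ys))
```

A forgetting factor of 1 recovers ordinary growing-memory least squares; values below 1 let the estimate track slowly varying parameters.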
NASA Technical Reports Server (NTRS)
Liu, Kuojuey Ray
1990-01-01
Least-squares (LS) estimations and spectral decomposition algorithms constitute the heart of modern signal processing and communication problems. Implementations of recursive LS and spectral decomposition algorithms onto parallel processing architectures such as systolic arrays with efficient fault-tolerant schemes are the major concerns of this dissertation. There are four major results in this dissertation. First, we propose the systolic block Householder transformation with application to the recursive least-squares minimization. It is successfully implemented on a systolic array with a two-level pipelined implementation at the vector level as well as at the word level. Second, a real-time algorithm-based concurrent error detection scheme based on the residual method is proposed for the QRD RLS systolic array. The fault diagnosis, order degraded reconfiguration, and performance analysis are also considered. Third, the dynamic range, stability, error detection capability under finite-precision implementation, order degraded performance, and residual estimation under faulty situations for the QRD RLS systolic array are studied in detail. Finally, we propose the use of multi-phase systolic algorithms for spectral decomposition based on the QR algorithm. Two systolic architectures, one based on a triangular array and another based on a rectangular array, are presented for the multi-phase operations with fault-tolerant considerations. Eigenvectors and singular vectors can be easily obtained by using the multi-phase operations. Performance issues are also considered.
Architecture-Based Refinements for Secure Computer Systems Design
2006-01-01
Government. REFERENCES [1] N. S. Rosa, G. R. R. Justo, and P. R. F. Cunha, “A framework for building non-functional software architectures,” in Proc. 2001 ACM...IWSSD’98), 1998, p. 60. [7] N. S. Rosa, G. R. R. Justo, and P. R. F. Cunha, “Incorporating non-functional requirements into software architectures,” in
Limited-data computed tomography algorithms for the physical sciences
NASA Astrophysics Data System (ADS)
Verhoeven, Dean
1993-07-01
Results are presented from a comparison of implementations of five computed tomography algorithms which were either designed expressly to work with, or have been shown to work with, limited data and which may be applied to a wide variety of objects. These include the adapted versions of the algebraic reconstruction technique, the multiplicative algebraic reconstruction technique (MART), the Gerchberg-Papoulis algorithm, a spectral extrapolation algorithm derived from that of Harris (1964), and an algorithm based on the singular value decomposition technique. The algorithms were used to reconstruct phantom data with realistic levels of noise from a number of different imaging geometries. It was found that the MART algorithm has a combination of advantages that makes it superior to other algorithms tested.
On the performances of computer vision algorithms on mobile platforms
NASA Astrophysics Data System (ADS)
Battiato, S.; Farinella, G. M.; Messina, E.; Puglisi, G.; Ravì, D.; Capra, A.; Tomaselli, V.
2012-01-01
Computer Vision enables mobile devices to extract the meaning of the observed scene from the information acquired with the onboard sensor cameras. Nowadays, there is a growing interest in Computer Vision algorithms able to work on mobile platforms (e.g., phone cameras, point-and-shoot cameras, etc.). Indeed, bringing Computer Vision capabilities to mobile devices opens new opportunities in different application contexts. The implementation of vision algorithms on mobile devices is still a challenging task since these devices have poor image sensors and optics as well as limited processing power. In this paper we have considered different algorithms covering classic Computer Vision tasks: keypoint extraction, face detection, image segmentation. Several tests have been done to compare the performance of the involved mobile platforms: Nokia N900, LG Optimus One, Samsung Galaxy SII.
Algorithms and software for solving finite element equations on serial and parallel architectures
NASA Technical Reports Server (NTRS)
George, Alan
1989-01-01
Over the past 15 years numerous new techniques have been developed for solving systems of equations and eigenvalue problems arising in finite element computations. A package called SPARSPAK has been developed by the author and his co-workers which exploits these new methods. The broad objective of this research project is to incorporate some of this software in the Computational Structural Mechanics (CSM) testbed, and to extend the techniques for use on multiprocessor architectures.
NASA Astrophysics Data System (ADS)
Azimi, Ehsan; Behrad, Alireza; Ghaznavi-Ghoushchi, Mohammad Bagher; Shanbehzadeh, Jamshid
2016-11-01
The projective model is an important mapping function for the calculation of global transformation between two images. However, its hardware implementation is challenging because of a large number of coefficients with different required precisions for fixed point representation. A VLSI hardware architecture is proposed for the calculation of a global projective model between input and reference images and refining false matches using random sample consensus (RANSAC) algorithm. To make the hardware implementation feasible, it is proved that the calculation of the projective model can be divided into four submodels comprising two translations, an affine model and a simpler projective mapping. This approach makes the hardware implementation feasible and considerably reduces the required number of bits for fixed point representation of model coefficients and intermediate variables. The proposed hardware architecture for the calculation of a global projective model using the RANSAC algorithm was implemented using Verilog hardware description language and the functionality of the design was validated through several experiments. The proposed architecture was synthesized by using an application-specific integrated circuit digital design flow utilizing 180-nm CMOS technology as well as a Virtex-6 field programmable gate array. Experimental results confirm the efficiency of the proposed hardware architecture in comparison with software implementation.
Visual pattern recognition network: its training algorithm and its optoelectronic architecture
NASA Astrophysics Data System (ADS)
Wang, Ning; Liu, Liren
1996-07-01
A visual pattern recognition network and its training algorithm are proposed. The network is constructed of a one-layer morphology network and a two-layer modified Hamming net. This visual network can implement invariant pattern recognition with respect to image translation and size projection. After supervised learning takes place, the visual network extracts image features and classifies patterns much the same as living beings do. Moreover, we set up its optoelectronic architecture for real-time pattern recognition.
Computational Aspects of Realization & Design Algorithms in Linear Systems Theory.
NASA Astrophysics Data System (ADS)
Tsui, Chia-Chi
Realization and design problems are two major problems in linear time-invariant systems control theory and have been solved theoretically. However, little is understood about their numerical properties. Due to the large scale of the problem and the finite precision of computer computation, it is very important and is the purpose of this study to investigate the computational reliability and efficiency of the algorithms for these two problems. In this dissertation, a reliable algorithm to achieve canonical form realization via Hankel matrix is developed. A comparative study of three general realization algorithms, for both numerical reliability and efficiency, shows that the proposed algorithm (via Hankel matrix) is the most preferable one among the three. The design problems, such as the state feedback design for pole placement, the state observer design, and the low order single and multi-functional observer design, have been solved by using canonical form systems matrices. In this dissertation, a set of algorithms for solving these three design problems is developed and analysed. These algorithms are based on Hessenberg form systems matrices which are numerically more reliable to compute than the canonical form systems matrices.
Durand, Melissa A; Wang, Steven; Hooley, Regina J; Raghu, Madhavi; Philpotts, Liane E
2016-01-01
As use of digital breast tomosynthesis becomes increasingly widespread, new management challenges are inevitable because tomosynthesis may reveal suspicious lesions not visible at conventional two-dimensional (2D) full-field digital mammography. Architectural distortion is a mammographic finding associated with a high positive predictive value for malignancy. It is detected more frequently at tomosynthesis than at 2D digital mammography and may even be occult at conventional 2D imaging. Few studies have focused on tomosynthesis-detected architectural distortions to date, and optimal management of these distortions has yet to be well defined. Since implementing tomosynthesis at our institution in 2011, we have learned some practical ways to assess architectural distortion. Because distortions may be subtle, tomosynthesis localization tools plus improved visualization of adjacent landmarks are crucial elements in guiding mammographic identification of elusive distortions. These same tools can guide more focused ultrasonography (US) of the breast, which facilitates detection and permits US-guided tissue sampling. Some distortions may be sonographically occult, in which case magnetic resonance imaging may be a reasonable option, both to increase diagnostic confidence and to provide a means for image-guided biopsy. As an alternative, tomosynthesis-guided biopsy, conventional stereotactic biopsy (when possible), or tomosynthesis-guided needle localization may be used to achieve tissue diagnosis. Practical uses for tomosynthesis in evaluation of architectural distortion are highlighted, potential complications are identified, and a working algorithm for management of tomosynthesis-detected architectural distortion is proposed.
Computational Algorithms for Device-Circuit Coupling
KEITER, ERIC R.; HUTCHINSON, SCOTT A.; HOEKSTRA, ROBERT J.; RANKIN, ERIC LAMONT; RUSSO, THOMAS V.; WATERS, LON J.
2003-01-01
Circuit simulation tools (e.g., SPICE) have become invaluable in the development and design of electronic circuits. Similarly, device-scale simulation tools (e.g., DaVinci) are commonly used in the design of individual semiconductor components. Some problems, such as single-event upset (SEU), require the fidelity of a mesh-based device simulator but are only meaningful when dynamically coupled with an external circuit. For such problems a mixed-level simulator is desirable, but the two types of simulation generally have different (sometimes conflicting) numerical requirements. To address these considerations, we have investigated variations of the two-level Newton algorithm, which preserves tight coupling between the circuit and the partial differential equations (PDE) device, while optimizing the numerics for both.
NASA Technical Reports Server (NTRS)
1985-01-01
Slides are reproduced that describe the importance of having high performance number crunching and graphics capability. They also indicate the types of research and development underway at Ames Research Center to ensure that, in the near term, Ames is a smart buyer and user, and in the long-term that Ames knows the best possible solutions for number crunching and graphics needs. The drivers for this research are real computational physics applications of interest to Ames and NASA. They are concerned with how to map the applications, and how to maximize the physics learned from the results of the calculations. The computer graphics activities are aimed at getting maximum information from the three-dimensional calculations by using the real time manipulation of three-dimensional data on the Silicon Graphics workstation. Work is underway on new algorithms that will permit the display of experimental results that are sparse and random, the same way that the dense and regular computed results are displayed.
General purpose architecture for intelligent computer-aided training
NASA Technical Reports Server (NTRS)
Loftin, R. Bowen (Inventor); Wang, Lui (Inventor); Baffes, Paul T. (Inventor); Hua, Grace C. (Inventor)
1994-01-01
An intelligent computer-aided training system having a general modular architecture is provided for use in a wide variety of training tasks and environments. It is comprised of a user interface which permits the trainee to access the same information available in the task environment and serves as a means for the trainee to assert actions to the system; a domain expert which is sufficiently intelligent to use the same information available to the trainee and carry out the task assigned to the trainee; a training session manager for examining the assertions made by the domain expert and by the trainee for evaluating such trainee assertions and providing guidance to the trainee which are appropriate to his acquired skill level; a trainee model which contains a history of the trainee interactions with the system together with summary evaluative data; an intelligent training scenario generator for designing increasingly complex training exercises based on the current skill level contained in the trainee model and on any weaknesses or deficiencies that the trainee has exhibited in previous interactions; and a blackboard that provides a common fact base for communication between the other components of the system. Preferably, the domain expert contains a list of 'mal-rules' which typifies errors that are usually made by novice trainees. Also preferably, the training session manager comprises an intelligent error detection means and an intelligent error handling means. The present invention utilizes a rule-based language having a control structure whereby a specific message passing protocol is utilized with respect to tasks which are procedural or step-by-step in structure. The rules can be activated by the trainee in any order to reach the solution by any valid or correct path.
Optical pattern recognition architecture implementing the mean-square error correlation algorithm
Molley, P.A.
1991-10-22
This patent describes an optical architecture implementing the mean-square error correlation algorithm, $\mathrm{MSE} = \sum (I - R)^2$, for discriminating the presence of a reference image $R$ in an input image scene $I$ by computing the mean-square error between a time-varying reference image signal $s_1(t)$ and a time-varying input image signal $s_2(t)$. The architecture includes a laser diode light source that is temporally modulated by a double-sideband suppressed-carrier source modulation signal $I_1(t)$ having the form $I_1(t) = A_1[1 + \sqrt{2}\, m_1 s_1(t) \cos(2\pi f_0 t)]$; the modulated light output from the laser diode source is diffracted by an acousto-optic deflector. The resultant intensity of the +1 diffracted order from the acousto-optic device is given by $I_2(t) = A_2[1 + 2 m_2^2 s_2^2(t) - 2\sqrt{2}\, m_2 s_2(t) \cos(2\pi f_0 t)]$. The time integration of the two signals $I_1(t)$ and $I_2(t)$ on the CCD deflector plane produces the result $R(\tau)$ of the mean-square error having the form $R(\tau) = A_1 A_2 \{ [T] + [2 m_2^2 \int s_2^2(t - \tau)\, dt] - [2 m_1 m_2 \cos(2\pi f_0 \tau) \int s_1(t)\, s_2(t - \tau)\, dt] \}$.
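The mean-square-error criterion named in the abstract above can be illustrated numerically. The following Python sketch is a digital analogue only, not the patented optical system: it slides a reference R across a one-dimensional scene I and evaluates the sum of (I - R)^2 at each offset. The function name `mse_correlation` and the sample data are assumptions made for illustration.

```python
def mse_correlation(scene, reference):
    """Return sum((I - R)^2) for every offset at which the reference fits in the scene."""
    n, m = len(scene), len(reference)
    scores = []
    for offset in range(n - m + 1):
        window = scene[offset:offset + m]
        scores.append(sum((i - r) ** 2 for i, r in zip(window, reference)))
    return scores


# Hypothetical data: the reference pattern occurs in the scene at offset 3.
scene = [0, 1, 0, 2, 5, 2, 0, 1]
reference = [2, 5, 2]
scores = mse_correlation(scene, reference)
best_offset = min(range(len(scores)), key=scores.__getitem__)
print(scores, best_offset)
```

The smallest score marks the most likely location of the reference. Expanding the square gives a scene-energy term, a reference-energy term, and a cross-correlation term, which is the same three-part structure as the expression for the integrated result in the abstract.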
Generic architecture for real-time multisensor fusion tracking algorithm development and evaluation
NASA Astrophysics Data System (ADS)
Queeney, Tom; Woods, Edward
1994-10-01
Westinghouse has developed and demonstrated a system for the rapid prototyping of Sensor Fusion Tracking (SFT) algorithms. The system provides an object-oriented envelope with three sets of generic software objects to aid in the development and evaluation of SFT algorithms. The first is a generic tracker model that encapsulates the idea of a tracker being a series of SFT algorithms along with the data manipulated by those algorithms and is capable of simultaneously supporting multiple, independent trackers. The second is a set of flexible, easily extensible sensor and target models which allows many types of sensors and targets to be used. Live, recorded and simulated sensors and combinations thereof can be utilized as sources for the trackers. The sensor models also provide an easily extensible interface to the generic tracker model so that all sensors provide input to the SFT algorithms in the same fashion. The third is a highly versatile display and user interface that allows easy access to many of the performance measures for sensors and trackers for easy evaluation and debugging of the SFT algorithms. The system is an object-oriented design programmed in C++. This system with several of the SFT algorithms developed for it has been used with live sensors as a real-time tracking system. This paper outlines the salient features of the sensor fusion architecture and programming environment.
Llanes, Antonio; Muñoz, Andrés; Bueno-Crespo, Andrés; García-Valverde, Teresa; Sánchez, Antonia; Arcas-Túnez, Francisco; Pérez-Sánchez, Horacio; Cecilia, José M
2016-01-01
The protein-folding problem has been extensively studied during the last fifty years. Understanding the dynamics of the global shape of a protein and its influence on biological function can help us discover new and more effective drugs to deal with diseases of pharmacological relevance. Different computational approaches have been developed by different researchers in order to predict the three-dimensional arrangement of the atoms of proteins from their sequences. However, the computational complexity of this problem makes it mandatory to search for new models, novel algorithmic strategies and hardware platforms that provide solutions in a reasonable time frame. In this review, we present past and recent trends in protein folding simulations from both perspectives: hardware and software. Of particular interest to us are the use of inexact solutions to this computationally hard problem and the hardware platforms that have been used for running this kind of Soft Computing technique.
JPL control/structure interaction test bed real-time control computer architecture
NASA Technical Reports Server (NTRS)
Briggs, Hugh C.
1989-01-01
The Control/Structure Interaction Program is a technology development program for spacecraft that exhibit interactions between the control system and structural dynamics. The program objectives include development and verification of new design concepts - such as active structure - and new tools - such as combined structure and control optimization algorithm - and their verification in ground and possibly flight test. A focus mission spacecraft was designed based upon a space interferometer and is the basis for design of the ground test article. The ground test bed objectives include verification of the spacecraft design concepts, the active structure elements and certain design tools such as the new combined structures and controls optimization tool. In anticipation of CSI technology flight experiments, the test bed control electronics must emulate the computation capacity and control architectures of space qualifiable systems as well as the command and control networks that will be used to connect investigators with the flight experiment hardware. The Test Bed facility electronics were functionally partitioned into three units: a laboratory data acquisition system for structural parameter identification and performance verification; an experiment supervisory computer to oversee the experiment, monitor the environmental parameters and perform data logging; and a multilevel real-time control computing system. The design of the Test Bed electronics is presented along with hardware and software component descriptions. The system should break new ground in experimental control electronics and is of interest to anyone working in the verification of control concepts for large structures.
Data bank homology search algorithm with linear computation complexity.
Strelets, V B; Ptitsyn, A A; Milanesi, L; Lim, H A
1994-06-01
A new algorithm for data bank homology search is proposed. The principal advantages of the new algorithm are: (i) linear computation complexity; (ii) low memory requirements; and (iii) high sensitivity to the presence of local region homology. The algorithm first calculates indicative matrices of k-tuple 'realization' in the query sequence and then searches for an appropriate number of matching k-tuples within a narrow range in database sequences. It does not require k-tuple coordinates tabulation and in-memory placement for database sequences. The algorithm is implemented in a program for execution on PC-compatible computers and tested on PIR and GenBank databases with good results. A few modifications designed to improve the selectivity are also discussed. As an application example, the search for homology of the mouse homeotic protein HOX 3.1 is given.
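As a rough illustration of the k-tuple idea described above, the Python sketch below records which k-tuples occur in the query and then scans a database sequence once, counting matches inside a sliding window. It is a simplified stand-in, not the authors' indicative-matrix implementation; the function name `ktuple_hits` and the parameters `window` and `min_matches` are hypothetical.

```python
from collections import deque


def ktuple_hits(query, target, k=3, window=10, min_matches=4):
    """Return start positions of target windows that contain at least
    min_matches k-tuples also present in the query."""
    # Presence indicator for every k-tuple of the query.
    present = {query[i:i + k] for i in range(len(query) - k + 1)}
    recent = deque()  # 1/0 flags for the k-tuples inside the current window
    count = 0
    hits = []
    for pos in range(len(target) - k + 1):
        flag = 1 if target[pos:pos + k] in present else 0
        recent.append(flag)
        count += flag
        if len(recent) > window:
            count -= recent.popleft()
        if len(recent) == window and count >= min_matches:
            hits.append(pos - window + 1)
    return hits


# Hypothetical sequences: the query is embedded in the middle of the target.
hits = ktuple_hits("ACDEFGHIK", "LLLLLLLLACDEFGHIKLLLLLLLL", k=3, window=5, min_matches=4)
print(hits)
```

Each database position is visited once and no k-tuple coordinates of the database sequence are stored, which is in the spirit of the linear complexity and low memory use claimed in the abstract.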
A computer algorithm for automatic beam steering
Drennan, E.
1992-06-01
Beam steering is done by modifying the current in a trim or bending magnet. If the current change is the right amount the beam can be made to bend in such a manner that it will hit a swic or BPM downstream from the magnet at a predetermined set point. Although both bending magnets and trim magnets can be used to modify beam angle, beam steering is usually done with trim magnets. This is so because, during beam steering the beam angle is usually modified only by a small amount which can be easily achieved with a trim magnet. Thus in this note, all steering magnets will be assumed to be trim magnets. There are two ways of monitoring beam position. One way is done using a BPM and the other is done using a swic. For simplicity, beam position monitoring in this paper will be referred to being done with a swic. Beam steering can be done manually by changing the current through a trim magnet and monitoring the position of the beam downstream from the magnet with a swic. Alternatively the beam can be positioned automatically using a computer which periodically updates the current through a specific number of trim magnets. The purpose of this note is to describe the steps involved in coming up with such a computer program. There are two main aspects to automatic beam steering. First a relationship between the beam position and the bending magnet is needed. Secondly a beamline setup of swics and trim magnets has to be chosen that will position the beam according to the desired specifications. A simple example will be looked at that will show that once a mathematical relationship between the needed change of the beam position on a swic and the change in trim currents is established, a computer could be programmed to calculate and update the trim currents.
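The last step described above, turning a desired change in beam position into trim-current updates, can be sketched under the assumption that the relationship is linear. In the Python example below, the response matrix (millimetres of beam motion at each swic per ampere in each trim), the function name, and all numbers are hypothetical and not taken from the note.

```python
def trim_current_changes(response, desired_shift):
    """Solve response * dI = desired_shift for two trims and two swics (Cramer's rule)."""
    (a, b), (c, d) = response
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("response matrix is singular; choose different trims or swics")
    dx1, dx2 = desired_shift
    return ((dx1 * d - b * dx2) / det, (a * dx2 - c * dx1) / det)


# Hypothetical response matrix: the upstream swic sees only the first trim.
response = ((2.0, 0.0), (3.0, 1.5))
currents = [10.0, -4.0]          # present trim currents in amperes
measured = (1.0, -0.5)           # beam positions on the two swics in mm
set_point = (0.0, 0.0)           # desired positions in mm

shift = tuple(s - m for s, m in zip(set_point, measured))
delta = trim_current_changes(response, shift)
currents = [i + di for i, di in zip(currents, delta)]
print(delta, currents)
```

A control program would repeat this measurement and update cycle periodically, since the real response is only approximately linear.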
Gradient Learning Algorithms for Ontology Computing
Gao, Wei; Zhu, Linli
2014-01-01
The gradient learning model has attracted great attention in view of its promising prospects for applications in statistics, data dimensionality reduction, and other specific fields. In this paper, we propose a new gradient learning model for ontology similarity measuring and ontology mapping in the multidividing setting. The sample error in this setting is given by virtue of the hypothesis space and the trick of the ontology dividing operator. Finally, two experiments in the plant and humanoid robotics fields verify the efficiency of the new computation model for ontology similarity measure and ontology mapping applications in the multidividing setting. PMID:25530752
Architecture-Adaptive Computing Environment: A Tool for Teaching Parallel Programming
NASA Technical Reports Server (NTRS)
Dorband, John E.; Aburdene, Maurice F.
2002-01-01
Recently, networked and cluster computation have become very popular. This paper is an introduction to a new C based parallel language for architecture-adaptive programming, aCe C. The primary purpose of aCe (Architecture-adaptive Computing Environment) is to encourage programmers to implement applications on parallel architectures by providing them the assurance that future architectures will be able to run their applications with a minimum of modification. A secondary purpose is to encourage computer architects to develop new types of architectures by providing an easily implemented software development environment and a library of test applications. This new language should be an ideal tool to teach parallel programming. In this paper, we will focus on some fundamental features of aCe C.
NASA Astrophysics Data System (ADS)
Schempp, Walter
Metaplectic harmonic analysis is well matched with high resolution image processing. The metaplectic representation of the symplectic group and its twofold cover arises when the symplectic group is considered as a group of outer automorphisms of the irreducible linear representations of the Heisenberg two-step nilpotent Lie group. Starting with the Paley-Wiener theorem which forms the classical result for information-preserving sequential bandwidth compression, and its Stone-von Neumann-Segal analogue for the Heisenberg group which is at the basis of holographic reciprocity and coupling, the paper points out a unified metaplectic approach to signal geometry such as holographic image processing, coherent optical computing, and neural computer architecture for pattern recognition. Brief descriptions of hardware implementations are also included.
Adaptive kinetic-fluid solvers for heterogeneous computing architectures
NASA Astrophysics Data System (ADS)
Zabelok, Sergey; Arslanbekov, Robert; Kolobov, Vladimir
2015-12-01
We show the feasibility and benefits of porting an adaptive multi-scale kinetic-fluid code to CPU-GPU systems. Challenges are due to the irregular data access for the adaptive Cartesian mesh, the vast difference in computational cost between kinetic and fluid cells, and the desire to evenly load all CPUs and GPUs during grid adaptation and algorithm refinement. Our Unified Flow Solver (UFS) combines Adaptive Mesh Refinement (AMR) with automatic cell-by-cell selection of kinetic or fluid solvers based on continuum breakdown criteria. Using GPUs enables hybrid simulations of mixed rarefied-continuum flows with a million Boltzmann cells, each having a 24 × 24 × 24 velocity mesh. We describe the implementation of CUDA kernels for three modules in UFS: the direct Boltzmann solver using the discrete velocity method (DVM), the Direct Simulation Monte Carlo (DSMC) solver, and a mesoscopic solver based on the Lattice Boltzmann Method (LBM), all using an adaptive Cartesian mesh. Double-digit speedups on a single GPU and good scaling for multiple GPUs have been demonstrated.
Data Compression Algorithm Architecture for Large Depth-of-Field Particle Image Velocimeters
NASA Technical Reports Server (NTRS)
Bos, Brent; Memarsadeghi, Nargess; Kizhner, Semion; Antonille, Scott
2013-01-01
A large depth-of-field particle image velocimeter (PIV) is designed to characterize dynamic dust environments on planetary surfaces. This instrument detects lofted dust particles, and senses the number of particles per unit volume, measuring their sizes, velocities (both speed and direction), and shape factors when the particles are large. To measure these particle characteristics in-flight, the instrument gathers two-dimensional image data at a high frame rate, typically >4,000 Hz, generating large amounts of data for every second of operation, approximately 6 GB/s. To characterize a planetary dust environment that is dynamic, the instrument would have to operate for at least several minutes during an observation period, easily producing more than a terabyte of data per observation. Given current technology, this amount of data would be very difficult to store onboard a spacecraft and downlink to Earth. Since 2007, innovators have been developing an autonomous image analysis algorithm architecture for the PIV instrument to greatly reduce the amount of data that it has to store and downlink. The algorithm analyzes PIV images and automatically reduces the image information down to only the particle measurement data that is of interest, reducing the amount of data that is handled by a factor of more than $10^3$. The state of development for this innovation is now fairly mature, with a functional algorithm architecture, along with several key pieces of algorithm logic, that has been proven through field test data acquired with a proof-of-concept PIV instrument.
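The data-volume figures quoted above can be checked with a short calculation. In the Python sketch below, the 6 GB/s rate and the thousand-fold reduction come from the abstract, while the three-minute observation length is an assumption standing in for "several minutes".

```python
data_rate_gb_per_s = 6.0   # from the abstract: approximately 6 GB/s
observation_s = 3 * 60     # assumption: "several minutes" taken as 3 minutes
reduction_factor = 1e3     # from the abstract: reduction by more than 10^3

raw_gb = data_rate_gb_per_s * observation_s
reduced_gb = raw_gb / reduction_factor
print(f"raw: {raw_gb:.0f} GB (about {raw_gb / 1000:.2f} TB); "
      f"after reduction: at most about {reduced_gb:.2f} GB")
```

Under these assumptions a single observation already exceeds a terabyte of raw images, while the reduced particle data are on the order of a gigabyte.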
LAWS simulation: Sampling strategies and wind computation algorithms
NASA Technical Reports Server (NTRS)
Emmitt, G. D. A.; Wood, S. A.; Houston, S. H.
1989-01-01
In general, work has continued on developing and evaluating algorithms designed to manage the Laser Atmospheric Wind Sounder (LAWS) lidar pulses and to compute the horizontal wind vectors from the line-of-sight (LOS) measurements. These efforts fall into three categories: Improvements to the shot management and multi-pair algorithms (SMA/MPA); observing system simulation experiments; and ground-based simulations of LAWS.
Parallel grid generation algorithm for distributed memory computers
NASA Technical Reports Server (NTRS)
Moitra, Stuti; Moitra, Anutosh
1994-01-01
A parallel grid-generation algorithm and its implementation on the Intel iPSC/860 computer are described. The grid-generation scheme is based on an algebraic formulation of homotopic relations. Methods for utilizing the inherent parallelism of the grid-generation scheme are described, and implementation of multiple levels of parallelism on multiple instruction multiple data machines is indicated. The algorithm is capable of providing near orthogonality and spacing control at solid boundaries while requiring minimal interprocessor communications. Results obtained on the Intel hypercube for a blended wing-body configuration are used to demonstrate the effectiveness of the algorithm. Fortran implementations based on the native programming model of the iPSC/860 computer and the Express system of software tools are reported. Computational gains in execution time speed-up ratios are given.
Multidomain solution algorithm for potential flow computations around complex configurations
NASA Astrophysics Data System (ADS)
Jacquotte, Olivier-Pierre; Godard, Jean-Luc
1994-04-01
A method is presented for the computation of irrotational transonic flows of perfect gas around a wide class of geometries. It is based on the construction of a multidomain structured grid and then on the solution of the full potential equation discretized with finite elements. The novelty of the paper is the combination of three embedded algorithms: a mixed fixed-point/Newton algorithm to treat the non-linearity, a multidomain conjugate gradient algorithm to handle the grid topology and another conjugate gradient algorithm in each of the structured domains. This method has made possible the calculations of flows around geometries that cannot be treated in a structured approach without the multidomain algorithm; an application of this method to the study of the wing-pylon-nacelle interactions is presented.
An Agent Inspired Reconfigurable Computing Implementation of a Genetic Algorithm
NASA Technical Reports Server (NTRS)
Weir, John M.; Wells, B. Earl
2003-01-01
Many software systems have been successfully implemented using an agent paradigm which employs a number of independent entities that communicate with one another to achieve a common goal. The distributed nature of such a paradigm makes it an excellent candidate for use in high speed reconfigurable computing hardware environments such as those present in modern FPGAs. In this paper, a distributed genetic algorithm that can be applied to the agent based reconfigurable hardware model is introduced. The effectiveness of this new algorithm is evaluated by comparing the quality of the solutions found by the new algorithm with those found by traditional genetic algorithms. The performance of a reconfigurable hardware implementation of the new algorithm on an FPGA is compared to traditional single processor implementations.
NASA Technical Reports Server (NTRS)
Eberhardt, D. S.; Baganoff, D.; Stevens, K.
1984-01-01
Implicit approximate-factored algorithms have certain properties that are suitable for parallel processing. A particular computational fluid dynamics (CFD) code, using this algorithm, is mapped onto a multiple-instruction/multiple-data-stream (MIMD) computer architecture. An explanation of this mapping procedure is presented, as well as some of the difficulties encountered when trying to run the code concurrently. Timing results are given for runs on the Ames Research Center's MIMD test facility which consists of two VAX 11/780's with a common MA780 multi-ported memory. Speedups exceeding 1.9 for characteristic CFD runs were indicated by the timing results.
A computational study of routing algorithms for realistic transportation networks
Jacob, R.; Marathe, M.V.; Nagel, K.
1998-12-01
The authors carry out an experimental analysis of a number of shortest path (routing) algorithms investigated in the context of the TRANSIMS (Transportation Analysis and Simulation System) project. The main focus of the paper is to study how various heuristic and exact solutions and their associated data structures affect the computational performance of software developed especially for realistic transportation networks. For this purpose the authors have used the Dallas-Fort Worth road network with a very high degree of resolution. The following general results are obtained: (1) they discuss and experimentally analyze various one-one shortest path algorithms, which include classical exact algorithms studied in the literature as well as heuristic solutions that are designed to take into account the geometric structure of the input instances; (2) they describe a number of extensions to the basic shortest path algorithm. These extensions were primarily motivated by practical problems arising in TRANSIMS and ITS (Intelligent Transportation Systems) related technologies. Extensions discussed include (i) time dependent networks, (ii) multi-modal networks, and (iii) networks with public transportation and associated schedules. Computational results are provided to empirically compare the efficiency of various algorithms. The studies indicate that a modified Dijkstra's algorithm is computationally fast and an excellent candidate for use in various transportation planning applications as well as ITS related technologies.
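For readers unfamiliar with one-one shortest path queries, the following Python sketch shows a textbook Dijkstra search that stops as soon as the destination is settled. Early termination is a common modification for one-to-one queries; the abstract does not say which modifications the authors used, so this should be read as an illustration rather than their method. The graph format, function name, and sample network are hypothetical.

```python
import heapq


def shortest_path(graph, source, target):
    """One-to-one Dijkstra; graph maps node -> list of (neighbor, nonnegative_cost)."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue          # stale heap entry
        settled.add(u)
        if u == target:
            break             # early exit: the target's distance is final
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in settled:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical four-node road network with travel costs on the links.
roads = {"A": [("B", 4), ("C", 1)], "B": [("D", 3)], "C": [("B", 2), ("D", 7)], "D": []}
cost, route = shortest_path(roads, "A", "D")
print(cost, route)
```

On large road networks the early exit avoids settling nodes farther away than the destination, which matters when many origin-destination pairs are routed.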
A fast algorithm for sparse matrix computations related to inversion
Li, S.; Wu, W.; Darve, E.
2013-06-01
We have developed a fast algorithm for computing certain entries of the inverse of a sparse matrix. Such computations are critical to many applications, such as the calculation of non-equilibrium Green’s functions $G^r$ and $G^<$ for nano-devices. The FIND (Fast Inverse using Nested Dissection) algorithm is optimal in the big-O sense. However, in practice, FIND suffers from two problems due to the width-2 separators used by its partitioning scheme. One problem is the presence of a large constant factor in the computational cost of FIND. The other problem is that the partitioning scheme used by FIND is incompatible with most existing partitioning methods and libraries for nested dissection, which all use width-1 separators. Our new algorithm resolves these problems by thoroughly decomposing the computation process such that width-1 separators can be used, resulting in a significant speedup over FIND for realistic devices, up to twelve-fold in simulation. The new algorithm also has the added advantage that desired off-diagonal entries can be computed for free. Consequently, our algorithm is faster than the current state-of-the-art recursive methods for meshes of any size. Furthermore, the framework used in the analysis of our algorithm is the first attempt to explicitly apply the widely-used relationship between mesh nodes and matrix computations to the problem of multiple eliminations with reuse of intermediate results. This framework makes our algorithm easier to generalize, and also easier to compare against other methods related to elimination trees. Finally, our accuracy analysis shows that the algorithms that require back-substitution are subject to significant extra round-off errors, which become extremely large even for some well-conditioned matrices or matrices with only moderately large condition numbers. When compared to these back-substitution algorithms, our algorithm is generally a few orders of magnitude more accurate, and our produced round
Multipole Algorithms for Molecular Dynamics Simulation on High Performance Computers.
NASA Astrophysics Data System (ADS)
Elliott, William Dewey
1995-01-01
A fundamental problem in modeling large molecular systems with molecular dynamics (MD) simulations is the underlying N-body problem of computing the interactions between all pairs of N atoms. The simplest algorithm to compute pair-wise atomic interactions scales in runtime as $\mathcal{O}(N^2)$, making it impractical for interesting biomolecular systems, which can contain millions of atoms. Recently, several algorithms have become available that solve the N-body problem by computing the effects of all pair-wise interactions while scaling in runtime less than $\mathcal{O}(N^2)$. One algorithm, which scales as $\mathcal{O}(N)$ for a uniform distribution of particles, is called the Greengard-Rokhlin Fast Multipole Algorithm (FMA). This work describes an FMA-like algorithm called the Molecular Dynamics Multipole Algorithm (MDMA). The algorithm contains several features that are new to N-body algorithms. MDMA uses new, efficient series expansion equations to compute general $1/r^n$ potentials to arbitrary accuracy. In particular, the $1/r$ Coulomb potential and the $1/r^6$ portion of the Lennard-Jones potential are implemented. The new equations are based on multivariate Taylor series expansions. In addition, MDMA uses a cell-to-cell interaction region of cells that is closely tied to worst case error bounds. The worst case error bounds for MDMA are derived in this work also. These bounds apply to other multipole algorithms as well. Several implementation enhancements are described which apply to MDMA as well as other N-body algorithms such as FMA and tree codes. The mathematics of the cell-to-cell interactions are converted to the Fourier domain for reduced operation count and faster computation. A relative indexing scheme was devised to locate cells in the interaction region which allows efficient pre-computation of redundant information and prestorage of much of the cell-to-cell interaction. Also, MDMA was integrated into the MD program SIgMA to demonstrate the performance of the program over
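The quadratic baseline that multipole methods aim to beat is the direct pairwise sum. The Python sketch below evaluates the 1/r Coulomb energy over all N(N-1)/2 pairs; it shows only this simplest algorithm, not MDMA or FMA, and the function name, units (prefactor set to 1), and sample charges are assumptions.

```python
import math


def coulomb_energy(positions, charges):
    """Direct pairwise sum of q_i * q_j / r_ij over all pairs i < j."""
    energy = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):   # N(N-1)/2 pair evaluations in total
            r = math.dist(positions[i], positions[j])
            energy += charges[i] * charges[j] / r
    return energy


# Hypothetical three-particle system.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
charges = [1.0, -1.0, 2.0]
energy = coulomb_energy(positions, charges)
print(energy)
```

Doubling the number of atoms roughly quadruples the work in the nested loop, which is why this approach becomes impractical for systems with millions of atoms.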
Integrating Computing Resources: A Shared Distributed Architecture for Academics and Administrators.
ERIC Educational Resources Information Center
Beltrametti, Monica; English, Will
1994-01-01
Development and implementation of a shared distributed computing architecture at the University of Alberta (Canada) are described. Aspects discussed include design of the architecture, users' views of the electronic environment, technical and managerial challenges, and the campuswide human infrastructures needed to manage such an integrated…
Toward a Fault Tolerant Architecture for Vital Medical-Based Wearable Computing.
Abdali-Mohammadi, Fardin; Bajalan, Vahid; Fathi, Abdolhossein
2015-12-01
Advancements in computers and electronic technologies have led to the emergence of a new generation of efficient small intelligent systems. The products of such technologies include smartphones and wearable devices, which have attracted attention for medical applications. These products are used less in critical medical applications because of their resource constraints and failure sensitivity: without safety considerations, small integrated hardware can endanger patients' lives. Therefore, principles are required for constructing wearable systems in healthcare so that the existing concerns are dealt with. Accordingly, this paper proposes an architecture for constructing wearable systems in critical medical applications. The proposed architecture is a three-tier one, supporting data flow from body sensors to the cloud. The tiers of this architecture include wearable computers, mobile computing, and mobile cloud computing. One of the features of this architecture is its high possible fault tolerance due to the nature of its components. Moreover, the required protocols are presented to coordinate the components of this architecture. Finally, the reliability of this architecture is assessed by simulating the architecture and its components, and other aspects of the proposed architecture are discussed.
Parallel matrix transpose algorithms on distributed memory concurrent computers
Choi, J.; Walker, D.W.; Dongarra, J.J.
1993-10-01
This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. It is assumed that the matrix is distributed over a P x Q processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide applicability. The communication schemes of the algorithms are determined by the greatest common divisor (GCD) of P and Q. If P and Q are relatively prime, the matrix transpose algorithm involves complete exchange communication. If P and Q are not relatively prime, processors are divided into GCD groups and the communication operations are overlapped for different groups of processors. Processors transpose GCD wrapped diagonal blocks simultaneously, and the matrix can be transposed with LCM/GCD steps, where LCM is the least common multiple of P and Q. The algorithms make use of non-blocking, point-to-point communication between processors. The use of non-blocking communication allows a processor to overlap the messages that it sends to different processors, thereby avoiding unnecessary synchronization. Combined with the matrix multiplication routine, C = A·B, the algorithms are used to compute parallel multiplications of transposed matrices, C = A^T·B^T, in the PUMMA package. Details of the parallel implementation of the algorithms are given, and results are presented for runs on the Intel Touchstone Delta computer.
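The arithmetic that the abstract ties to the processor template can be sketched in a few lines of Python. This is only an illustration of the stated GCD/LCM relations; the function name `transpose_schedule` and the returned field names are hypothetical and are not part of PUMMA.

```python
from math import gcd


def transpose_schedule(p: int, q: int) -> dict:
    """Summarize the quantities the abstract derives from a P x Q template.

    Hypothetical helper: it only evaluates GCD, LCM and LCM/GCD; it does
    not perform any communication or move any matrix blocks.
    """
    if p < 1 or q < 1:
        raise ValueError("p and q must be positive")
    g = gcd(p, q)
    lcm = p * q // g
    return {
        "gcd": g,                      # number of processor groups
        "lcm": lcm,
        "complete_exchange": g == 1,   # relatively prime P and Q
        "steps": lcm // g,             # LCM/GCD steps quoted in the abstract
    }


if __name__ == "__main__":
    for p, q in [(4, 6), (3, 4), (8, 8)]:
        print((p, q), transpose_schedule(p, q))
```

For a square template (P = Q) the step count collapses to 1, while relatively prime P and Q fall into the complete-exchange case.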
Schraad, Mark William; Luscher, Darby Jon
2016-09-06
Additive Manufacturing techniques are presenting the Department of Energy and the NNSA Laboratories with new opportunities to consider novel component production and repair processes, and to manufacture materials with tailored response and optimized performance characteristics. Additive Manufacturing technologies already are being applied to primary NNSA mission areas, including Nuclear Weapons. These mission areas are adapting to these new manufacturing methods because of potential advantages, such as smaller manufacturing footprints, reduced needs for specialized tooling, an ability to embed sensing, novel part repair options, an ability to accommodate complex geometries, and lighter weight materials. To realize the full potential of Additive Manufacturing as a game-changing technology for the NNSA’s national security missions, however, significant progress must be made in several key technical areas. In addition to advances in engineering design, process optimization and automation, and accelerated feedstock design and manufacture, significant progress must be made in modeling and simulation. First and foremost, a more mature understanding of the process-structure-property-performance relationships must be developed. Because Additive Manufacturing processes change the nature of a material’s structure below the engineering scale, new models are required to predict materials response across the spectrum of relevant length scales, from the atomistic to the continuum. New diagnostics will be required to characterize materials response across these scales. And not just models, but advanced algorithms, next-generation codes, and advanced computer architectures will be required to complement the associated modeling activities. Based on preliminary work in each of these areas, a strong argument for the need for Exascale computing architectures can be made, if a legitimate predictive capability is to be developed.
An Implementation of MIL-STD-1750 Airborne Computer Instruction Set Architecture.
1981-05-01
1 May 1981. An Implementation of MIL-STD-1750 Airborne Computer Instruction Set Architecture, by S. J. Shrimpton. Summary: This Memorandum...describes the design of a processor implementing the MIL-STD-1750 Airborne Computer Instruction Set Architecture, using Advanced Micro Devices 2901 bit-slice...microprocessor devices. The aspects of the hardware design and microcode specific to MIL-STD-1750 are discussed and reviewed in the light of the
Computational Fluid Dynamics. [numerical methods and algorithm development
NASA Technical Reports Server (NTRS)
1992-01-01
This collection of papers was presented at the Computational Fluid Dynamics (CFD) Conference held at Ames Research Center in California on March 12 through 14, 1991. It is an overview of CFD activities at NASA Lewis Research Center. The main thrust of computational work at Lewis is aimed at propulsion systems. Specific issues related to propulsion CFD and associated modeling will also be presented. Examples of results obtained with the most recent algorithm development will also be presented.
Using advanced computer vision algorithms on small mobile robots
NASA Astrophysics Data System (ADS)
Kogut, G.; Birchmore, F.; Biagtan Pacis, E.; Everett, H. R.
2006-05-01
The Technology Transfer project employs a spiral development process to enhance the functionality and autonomy of mobile robot systems in the Joint Robotics Program (JRP) Robotic Systems Pool by converging existing component technologies onto a transition platform for optimization. An example of this approach is the implementation of advanced computer vision algorithms on small mobile robots. We demonstrate the implementation and testing of the following two algorithms useful on mobile robots: 1) object classification using a boosted Cascade of classifiers trained with the Adaboost training algorithm, and 2) human presence detection from a moving platform. Object classification is performed with an Adaboost training system developed at the University of California, San Diego (UCSD) Computer Vision Lab. This classification algorithm has been used to successfully detect the license plates of automobiles in motion in real-time. While working towards a solution to increase the robustness of this system to perform generic object recognition, this paper demonstrates an extension to this application by detecting soda cans in a cluttered indoor environment. The human presence detection from a moving platform system uses a data fusion algorithm which combines results from a scanning laser and a thermal imager. The system is able to detect the presence of humans while both the humans and the robot are moving simultaneously. In both systems, the two aforementioned algorithms were implemented on embedded hardware and optimized for use in real-time. Test results are shown for a variety of environments.
Plagiarism Detection Algorithm for Source Code in Computer Science Education
ERIC Educational Resources Information Center
Liu, Xin; Xu, Chan; Ouyang, Boyu
2015-01-01
Computer programming is becoming increasingly necessary in college program design courses. However, some students plagiarize source code and modify it slightly in their homework. It is not easy for teachers to judge whether source code has been plagiarized. Traditional detection algorithms cannot fit this…
Splign: algorithms for computing spliced alignments with identification of paralogs
Kapustin, Yuri; Souvorov, Alexander; Tatusova, Tatiana; Lipman, David
2008-01-01
Background The computation of accurate alignments of cDNA sequences against a genome is at the foundation of modern genome annotation pipelines. Several factors such as presence of paralogs, small exons, non-consensus splice signals, sequencing errors and polymorphic sites pose recognized difficulties to existing spliced alignment algorithms. Results We describe a set of algorithms behind a tool called Splign for computing cDNA-to-Genome alignments. The algorithms include a high-performance preliminary alignment, a compartment identification based on a formally defined model of adjacent duplicated regions, and a refined sequence alignment. In a series of tests, Splign has produced more accurate results than other tools commonly used to compute spliced alignments, in a reasonable amount of time. Conclusion Splign's ability to deal with various issues complicating the spliced alignment problem makes it a helpful tool in eukaryotic genome annotation processes and alternative splicing studies. Its performance is enough to align the largest currently available pools of cDNA data such as the human EST set on a moderate-sized computing cluster in a matter of hours. The duplications identification (compartmentization) algorithm can be used independently in other areas such as the study of pseudogenes. Reviewers This article was reviewed by: Steven Salzberg, Arcady Mushegian and Andrey Mironov (nominated by Mikhail Gelfand). PMID:18495041
Computations and algorithms in physical and biological problems
NASA Astrophysics Data System (ADS)
Qin, Yu
This dissertation presents the applications of state-of-the-art computation techniques and data analysis algorithms in three physical and biological problems: assembling DNA pieces, optimizing self-assembly yield, and identifying correlations from large multivariate datasets. In the first topic, in-depth analysis of using Sequencing by Hybridization (SBH) to reconstruct target DNA sequences shows that a modified reconstruction algorithm can overcome the theoretical boundary without the need for different types of biochemical assays and is robust to error. In the second topic, consistent with theoretical predictions, simulations using Graphics Processing Unit (GPU) demonstrate how controlling the short-ranged interactions between particles and controlling the concentrations optimize the self-assembly yield of a desired structure, and nonequilibrium behavior when optimizing concentrations is also unveiled by leveraging the computation capacity of GPUs. In the last topic, a methodology to incorporate existing categorization information into the search process to efficiently reconstruct the optimal true correlation matrix for multivariate datasets is introduced. Simulations on both synthetic and real financial datasets show that the algorithm is able to detect signals below the Random Matrix Theory (RMT) threshold. These three problems are representatives of using massive computation techniques and data analysis algorithms to tackle optimization problems, and outperform theoretical boundary when incorporating prior information into the computation.
Parallelization of Nullspace Algorithm for the computation of metabolic pathways.
Jevremović, Dimitrije; Trinh, Cong T; Srienc, Friedrich; Sosa, Carlos P; Boley, Daniel
2011-06-01
Elementary mode analysis is a useful metabolic pathway analysis tool in understanding and analyzing cellular metabolism, since elementary modes can represent metabolic pathways with unique and minimal sets of enzyme-catalyzed reactions of a metabolic network under steady state conditions. However, computation of the elementary modes of a genome-scale metabolic network with 100-1000 reactions is very expensive and sometimes not feasible with the commonly used serial Nullspace Algorithm. In this work, we develop a distributed memory parallelization of the Nullspace Algorithm to handle efficiently the computation of the elementary modes of a large metabolic network. We give an implementation in the C++ language with the support of MPI library functions for the parallel communication. Our proposed algorithm is accompanied by an analysis of the complexity and identification of major bottlenecks during computation of all possible pathways of a large metabolic network. The algorithm includes methods to achieve load balancing among the compute nodes and specific communication patterns to reduce the communication overhead and improve efficiency.
ERIC Educational Resources Information Center
Amenyo, John-Thones
2012-01-01
Carefully engineered playable games can serve as vehicles for students and practitioners to learn and explore the programming of advanced computer architectures to execute applications, such as high performance computing (HPC) and complex, inter-networked, distributed systems. The article presents families of playable games that are grounded in…
Optimization of computer-generated binary holograms using genetic algorithms
NASA Astrophysics Data System (ADS)
Cojoc, Dan; Alexandrescu, Adrian
1999-11-01
The aim of this paper is to compare genetic algorithms against direct point oriented coding in the design of computer-generated binary phase Fourier holograms. These are used as fan-out elements for free space optical interconnection. Genetic algorithms are optimization methods which model the natural process of genetic evolution. The configuration of the hologram is encoded to form a chromosome. To start the optimization, a population of different, randomly generated chromosomes is considered. The chromosomes compete, mate and mutate until the best chromosome is obtained according to a cost function. After explaining the operators that are used by genetic algorithms, this paper presents two examples with 32 x 32 genes in a chromosome. The crossover type and the number of mutations are shown to be important factors which influence the convergence of the algorithm. The genetic algorithm is demonstrated to be a useful tool for designing binary phase holograms of complicated structure.
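The compete, mate and mutate loop described in this abstract can be sketched in Python as follows. This is a generic binary genetic algorithm under stated assumptions, not the authors' implementation: the one-point crossover, the keep-the-better-half selection, the parameter defaults, and the toy Hamming-distance cost (standing in for a real hologram cost function, which would compare the diffraction pattern with the desired fan-out) are all illustrative choices.

```python
import random


def evolve(cost, n_genes, pop_size=30, generations=200, n_mutations=1, seed=0):
    """Minimize `cost` over binary chromosomes (lists of 0/1 genes)."""
    rng = random.Random(seed)
    # Start from a population of randomly generated chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)                  # compete: lowest cost first
        parents = pop[: pop_size // 2]      # the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)   # mate two distinct parents
            cut = rng.randrange(1, n_genes) # one-point crossover position
            child = a[:cut] + b[cut:]
            for _ in range(n_mutations):    # mutate: flip randomly chosen genes
                i = rng.randrange(n_genes)
                child[i] ^= 1
            children.append(child)
        pop = parents + children
    return min(pop, key=cost)


if __name__ == "__main__":
    # Hypothetical stand-in cost: distance to a fixed 16-gene target pattern.
    target = [1, 0] * 8

    def hamming(chromosome):
        return sum(g != t for g, t in zip(chromosome, target))

    best = evolve(hamming, n_genes=len(target))
    print(best, hamming(best))
```

The `n_mutations` parameter and the crossover step are the two knobs the abstract singles out as influencing convergence, so they are exposed or isolated here for experimentation.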
Survivable algorithms and redundancy management in NASA's distributed computing systems
NASA Technical Reports Server (NTRS)
Malek, Miroslaw
1992-01-01
The design of survivable algorithms requires a solid foundation for executing them. While hardware techniques for fault-tolerant computing are relatively well understood, fault-tolerant operating systems, as well as fault-tolerant applications (survivable algorithms), are, by contrast, little understood, and much more work in this field is required. We outline some of our work that contributes to the foundation of ultrareliable operating systems and fault-tolerant algorithm design. We introduce our consensus-based framework for fault-tolerant system design. This is followed by a description of a hierarchical partitioning method for efficient consensus. A scheduler for redundancy management is introduced, and application-specific fault tolerance is described. We give an overview of our hybrid algorithm technique, which is an alternative to the formal approach given.
An efficient FPGA architecture for integer ƞth root computation
NASA Astrophysics Data System (ADS)
Rangel-Valdez, Nelson; Barron-Zambrano, Jose Hugo; Torres-Huitzil, Cesar; Torres-Jimenez, Jose
2015-10-01
In embedded computing, it is common to find applications such as signal processing, image processing, computer graphics or data compression that might benefit from hardware implementation for the computation of integer roots of order N. However, the scientific literature lacks architectural designs that implement such operations for different values of N, using a low amount of resources. This article presents a parameterisable field programmable gate array (FPGA) architecture for an efficient Nth root calculator that uses only adders/subtractors and N location memory elements. The architecture was tested for different values of N, using 64-bit number representation. The results show a consumption of up to 10% of the logical resources of a Xilinx XC6SLX45-CSG324C device, depending on the value of N. The hardware implementation improved on the performance of its corresponding software implementations by one order of magnitude. The architecture performance varies from several thousand to seven million root operations per second.
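For reference, the operation being accelerated, the integer Nth root floor(x^(1/N)), can be sketched in software as a bit-by-bit search. This Python version is an assumption-laden illustration: it uses ordinary integer exponentiation, whereas the article's architecture is stated to use only adders/subtractors, so it should not be read as the published design.

```python
def integer_nth_root(x: int, n: int) -> int:
    """Return the largest integer r with r**n <= x, for x >= 0 and n >= 1."""
    if x < 0 or n < 1:
        raise ValueError("requires x >= 0 and n >= 1")
    root = 0
    # The root has at most x.bit_length() // n + 1 bits; start at the top bit.
    bit = 1 << (x.bit_length() // n)
    while bit:
        candidate = root | bit
        if candidate ** n <= x:   # keep the bit if the candidate still fits
            root = candidate
        bit >>= 1
    return root


if __name__ == "__main__":
    for x, n in [(27, 3), (26, 3), (2**64 - 1, 2), (10**18, 6)]:
        print(x, n, integer_nth_root(x, n))
```

Each loop iteration decides one bit of the result, which is the kind of regular, fixed-latency structure that maps naturally onto hardware.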
Ehsan, Shoaib; Clark, Adrian F.; ur Rehman, Naveed; McDonald-Maier, Klaus D.
2015-01-01
The integral image, an intermediate image representation, has found extensive use in multi-scale local feature detection algorithms, such as Speeded-Up Robust Features (SURF), allowing fast computation of rectangular features at constant speed, independent of filter size. For resource-constrained real-time embedded vision systems, computation and storage of integral image presents several design challenges due to strict timing and hardware limitations. Although calculation of the integral image only consists of simple addition operations, the total number of operations is large owing to the generally large size of image data. Recursive equations allow substantial decrease in the number of operations but require calculation in a serial fashion. This paper presents two new hardware algorithms that are based on the decomposition of these recursive equations, allowing calculation of up to four integral image values in a row-parallel way without significantly increasing the number of operations. An efficient design strategy is also proposed for a parallel integral image computation unit to reduce the size of the required internal memory (nearly 35% for common HD video). Addressing the storage problem of integral image in embedded vision systems, the paper presents two algorithms which allow substantial decrease (at least 44.44%) in the memory requirements. Finally, the paper provides a case study that highlights the utility of the proposed architectures in embedded vision systems. PMID:26184211
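The serial recursive computation that the abstract refers to can be sketched as below. This is the standard textbook recursion (a running row sum plus the integral value from the row above), written in plain Python for illustration; it is not the paper's row-parallel hardware algorithm, and the function names are hypothetical.

```python
def integral_image(img):
    """Integral image of a 2-D list: ii[y][x] = sum of img[0..y][0..x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0                          # cumulative sum along row y
        for x in range(w):
            row_sum += img[y][x]
            above = ii[y - 1][x] if y > 0 else 0
            ii[y][x] = row_sum + above       # one addition per recursion step
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of the inclusive rectangle using at most four lookups."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    ii = integral_image(image)
    print(ii)
    print(box_sum(ii, 1, 1, 2, 2))
```

`box_sum` shows why rectangular features cost the same regardless of filter size: the lookup count is fixed, whatever the rectangle dimensions.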
Reliable ISR algorithms for a very-low-power approximate computer
NASA Astrophysics Data System (ADS)
Eaton, Ross S.; McBride, Jonah C.; Bates, Joseph
2013-05-01
The Office of Naval Research (ONR) is looking for methods to perform higher levels of sensor processing onboard UAVs to alleviate the need to transmit full motion video to ground stations over constrained data links. Charles River Analytics is particularly interested in performing intelligence, surveillance, and reconnaissance (ISR) tasks using UAV sensor feeds. Computing with approximate arithmetic can provide 10,000x improvement in size, weight, and power (SWAP) over desktop CPUs, thereby enabling ISR processing onboard small UAVs. Charles River and Singular Computing are teaming on an ONR program to develop these low-SWAP ISR capabilities using a small, low power, single chip machine, developed by Singular Computing, with many thousands of cores. Producing reliable results efficiently on massively parallel approximate machines requires adapting the core kernels of algorithms. We describe a feature-aided tracking algorithm adapted for the novel hardware architecture, which will be suitable for use onboard a UAV. Tests have shown the algorithm produces results equivalent to state-of-the-art traditional approaches while achieving a 6400x improvement in speed/power ratio.
Optical pattern recognition architecture implementing the mean-square error correlation algorithm
Molley, Perry A.
1991-01-01
An optical architecture implementing the mean-square error correlation algorithm, $\mathrm{MSE}=\sum [I-R]^{2}$, for discriminating the presence of a reference image $R$ in an input image scene $I$ by computing the mean-square error between a time-varying reference image signal $s_1(t)$ and a time-varying input image signal $s_2(t)$ includes a laser diode light source which is temporally modulated by a double-sideband suppressed-carrier source modulation signal $I_1(t)$ having the form $I_1(t)=A_1[1+\sqrt{2}\,m_1 s_1(t)\cos(2\pi f_o t)]$, and the modulated light output from the laser diode source is diffracted by an acousto-optic deflector (AOD). The resultant intensity of the +1 diffracted order from the acousto-optic device is given by $I_2(t)=A_2[1+2m_2^{2}s_2^{2}(t)-2\sqrt{2}\,m_2 s_2(t)\cos(2\pi f_o t)]$. The time integration of the two signals $I_1(t)$ and $I_2(t)$ on the CCD deflector plane produces the result $R(\tau)$ of the mean-square error having the form $R(\tau)=A_1A_2\{[T]+[2m_2^{2}\int s_2^{2}(t-\tau)\,dt]-[2m_1m_2\cos(2\pi f_o\tau)\int s_1(t)s_2(t-\tau)\,dt]\}$, where: $s_1(t)$ is the signal input to the diode modulation source; $s_2(t)$ is the signal input to the AOD modulation source; $A_1$ is the light intensity; $A_2$ is the diffraction efficiency; $m_1$ and $m_2$ are constants that determine the signal-to-bias ratio; $f_o$ is the frequency offset between the oscillator at $f_c$ and the modulation at $f_c+f_o$; and $a_o$ and $a_1$ are constants chosen to bias the diode source and the acousto-optic deflector into their respective linear operating regions so that the diode source exhibits a linear intensity characteristic and the AOD exhibits a linear amplitude characteristic.
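A discrete, purely numerical analogue of this correlation may help in reading the formulas. The Python sketch below evaluates the sum of squared differences at each shift through the expansion sum(I^2) + sum(R^2) - 2 sum(I R), which mirrors the energy term and the cross-correlation term in the integrated result. It is an illustration on one-dimensional sample lists only; it does not model the optical carrier, bias, or modulation constants, and the function name is hypothetical.

```python
def mse_correlation(scene, ref):
    """Sum of squared differences between `ref` and each window of `scene`.

    Uses sum((I - R)^2) = sum(I^2) + sum(R^2) - 2 * sum(I * R), so the
    reference energy is computed once and reused for every shift.
    """
    n = len(ref)
    if n == 0 or n > len(scene):
        raise ValueError("ref must be non-empty and no longer than scene")
    ref_energy = sum(r * r for r in ref)
    result = []
    for tau in range(len(scene) - n + 1):
        window = scene[tau:tau + n]
        scene_energy = sum(v * v for v in window)            # input energy term
        cross = sum(v * r for v, r in zip(window, ref))      # cross-correlation term
        result.append(scene_energy + ref_energy - 2 * cross)
    return result


if __name__ == "__main__":
    scene = [0, 1, 2, 3, 2, 1, 0]
    ref = [2, 3, 2]
    errors = mse_correlation(scene, ref)
    print(errors, "best shift:", errors.index(min(errors)))
```

The shift with the smallest value marks where the reference best matches the scene; a value of zero means an exact match.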
Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation
Sun, Xiao; Zhang, Tongda; Chai, Yueting; Liu, Yi
2015-01-01
Most popular clustering methods make strong assumptions about the dataset. For example, k-means implicitly assumes that all clusters come from spherical Gaussian distributions which have different means but the same covariance. However, when dealing with datasets that have diverse distribution shapes or high dimensionality, these assumptions might not be valid anymore. In order to overcome this weakness, we propose a new clustering algorithm named the localized ambient solidity separation (LASS) algorithm, using a new isolation criterion called centroid distance. Compared with other density based isolation criteria, our proposed centroid distance isolation criterion addresses the problem caused by high dimensionality and varying density. The experiment on a designed two-dimensional benchmark dataset shows that our proposed LASS algorithm not only inherits the advantage of the original dissimilarity increments clustering method in separating naturally isolated clusters but also can identify clusters which are adjacent, overlapping, or under background noise. Finally, we compare our LASS algorithm with the dissimilarity increments clustering method on a massive computer user dataset with over two million records that contains demographic and behavioral information. The results show that the LASS algorithm works extremely well on this computer user dataset and can gain more knowledge from it. PMID:26221133
The algorithmic level is the bridge between computation and brain.
Love, Bradley C
2015-04-01
Every scientist chooses a preferred level of analysis and this choice shapes the research program, even determining what counts as evidence. This contribution revisits Marr's (1982) three levels of analysis (implementation, algorithmic, and computational) and evaluates the prospect of making progress at each individual level. After reviewing limitations of theorizing within a level, two strategies for integration across levels are considered. One is top-down in that it attempts to build a bridge from the computational to algorithmic level. Limitations of this approach include insufficient theoretical constraint at the computation level to provide a foundation for integration, and that people are suboptimal for reasons other than capacity limitations. Instead, an inside-out approach is forwarded in which all three levels of analysis are integrated via the algorithmic level. This approach maximally leverages mutual data constraints at all levels. For example, algorithmic models can be used to interpret brain imaging data, and brain imaging data can be used to select among competing models. Examples of this approach to integration are provided. This merging of levels raises questions about the relevance of Marr's tripartite view.
Reference Architecture for High Dependability On-Board Computers
NASA Astrophysics Data System (ADS)
Silva, Nuno; Esper, Alexandre; Zandin, Johan; Barbosa, Ricardo; Monteleone, Claudio
2014-08-01
The industrial process in the area of on-board computers is characterized by small production series of on-board computer (hardware and software) configuration items with little recurrence at unit or set level (e.g. computer equipment unit, set of interconnected redundant units). These small production series result in a reduced amount of statistical data related to dependability, which influences the way on-board computers are specified, designed and verified. In the context of ESA harmonization policy for the deployment of enhanced and homogeneous industrial processes in the area of avionics embedded systems and on-board computers for the space industry, this study aimed at rationalizing the initiation phase of the development or procurement of on-board computers and at improving dependability assurance. This aim was achieved by establishing generic requirements for the procurement or development of on-board computers with a focus on well-defined reliability, availability, and maintainability requirements, as well as a generic methodology for planning, predicting and assessing the dependability of on-board computer hardware and software throughout their life cycle. It also provides guidelines for producing evidence material and arguments to support dependability assurance of on-board computer hardware and software throughout the complete life cycle, including an assessment of feasibility aspects of the dependability assurance process and how the use of a computer-aided environment can contribute to on-board computer dependability assurance.
NASA Technical Reports Server (NTRS)
Matthews, Bryan L.; Srivastava, Ashok N.
2010-01-01
Prior to the launch of STS-119 NASA had completed a study of an issue in the flow control valve (FCV) in the Main Propulsion System of the Space Shuttle using an adaptive learning method known as Virtual Sensors. Virtual Sensors are a class of algorithms that estimate the value of a time series given other potentially nonlinearly correlated sensor readings. In the case presented here, the Virtual Sensors algorithm is based on an ensemble learning approach and takes sensor readings and control signals as input to estimate the pressure in a subsystem of the Main Propulsion System. Our results indicate that this method can detect faults in the FCV at the time when they occur. We use the standard deviation of the predictions of the ensemble as a measure of uncertainty in the estimate. This uncertainty estimate was crucial to understanding the nature and magnitude of transient characteristics during startup of the engine. This paper overviews the Virtual Sensors algorithm and discusses results on a comprehensive set of Shuttle missions and also discusses the architecture necessary for deploying such algorithms in a real-time, closed-loop system or a human-in-the-loop monitoring system. These results were presented at a Flight Readiness Review of the Space Shuttle in early 2009.
An Efficient Circulant MIMO Equalizer for CDMA Downlink: Algorithm and VLSI Architecture
NASA Astrophysics Data System (ADS)
Guo, Yuanbin; Zhang, Jianzhong(Charlie); McCain, Dennis; Cavallaro, Joseph R.
2006-12-01
We present an efficient circulant approximation-based MIMO equalizer architecture for the CDMA downlink. This reduces the direct matrix inverse (DMI) of size [InlineEquation not available: see fulltext.] with [InlineEquation not available: see fulltext.] complexity to some FFT operations with [InlineEquation not available: see fulltext.] complexity and the inverse of some [InlineEquation not available: see fulltext.] submatrices. We then propose parallel and pipelined VLSI architectures with Hermitian optimization and reduced-state FFT for further complexity optimization. Generic VLSI architectures are derived for the [InlineEquation not available: see fulltext.] high-order receiver from partitioned [InlineEquation not available: see fulltext.] submatrices. This leads to more parallel VLSI design with [InlineEquation not available: see fulltext.] further complexity reduction. Comparative study with both the conjugate-gradient and DMI algorithms shows very promising performance/complexity tradeoff. VLSI design space in terms of area/time efficiency is explored extensively for layered parallelism and pipelining with a Catapult C high-level-synthesis methodology.
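The reason a circulant approximation turns a matrix inverse into Fourier-transform operations can be shown on a scalar circulant system. The Python sketch below assumes a plain circulant matrix defined by its first column; it is not the block-circulant MIMO structure of the paper, and it uses a naive O(n^2) DFT for self-containment where a real design would use an FFT. The function names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]
    return [v / n for v in out] if inverse else out


def solve_circulant(first_col, b):
    """Solve C x = b, where C[i][j] = first_col[(i - j) % n].

    A circulant matrix is diagonalized by the DFT, so the solve reduces to
    an element-wise division in the frequency domain.
    """
    eigenvalues = dft(first_col)
    if any(abs(e) < 1e-12 for e in eigenvalues):
        raise ValueError("circulant matrix is singular or nearly singular")
    rhs = dft(b)
    return dft([r / e for r, e in zip(rhs, eigenvalues)], inverse=True)


if __name__ == "__main__":
    c = [4, 1, 0, 1]
    x_true = [1, 2, 3, 4]
    n = len(c)
    b = [sum(c[(i - j) % n] * x_true[j] for j in range(n)) for i in range(n)]
    print([round(v.real, 6) for v in solve_circulant(c, b)])
```

The result is returned as complex numbers; for real inputs the imaginary parts are rounding noise and can be discarded.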
A novel algorithm and its VLSI architecture for connected component labeling
NASA Astrophysics Data System (ADS)
Zhao, Hualong; Sang, Hongshi; Zhang, Tianxu
2011-11-01
A novel line-based streaming labeling algorithm with its VLSI architecture is proposed in this paper. A line-based neighborhood examination scheme is used for efficient extraction of local connected components. A novel reversed rooted tree hook-up strategy, which is very suitable for hardware implementation, is applied in the merging stage of equivalent connected components. The reversed rooted tree hook-up strategy significantly reduces the requirement for on-chip memory, which makes the chip area smaller. Clock-domain-crossing FIFOs are also applied to connect the label core and the external memory interface, which lets the label engine work at a higher frequency and raises its throughput. Several performance tests have been performed on our proposed hardware implementation. The processing bandwidth of our hardware architecture can reach the I/O transfer boundary set by the external interface clock in all the real image tests. Besides reducing the processing time, our hardware implementation can support image sizes as large as 4096*4096, which will be very appealing in remote sensing or any other high-resolution image applications. The implementation of the proposed architecture is synthesized with the SMIC 180nm standard cell library. The working frequency of the label engine reaches 200MHz.
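As background for the labeling task itself, a conventional two-pass connected component labeling with union-find can be sketched in Python. This is the classic software baseline with 4-connectivity, chosen here as an assumption for illustration; it is not the paper's line-based streaming algorithm or its reversed rooted tree hook-up strategy.

```python
def label_components(image):
    """Label 4-connected foreground regions of a binary 2-D list.

    Returns (labels, count), where labels uses 0 for background and
    1..count for the components.
    """
    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]                       # parent[0] is a background sentinel

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    # First pass: assign provisional labels and record equivalences.
    for y in range(h):
        for x in range(w):
            if not image[y][x]:
                continue
            up = labels[y - 1][x] if y > 0 else 0
            left = labels[y][x - 1] if x > 0 else 0
            if up and left:
                root_up, root_left = find(up), find(left)
                root = min(root_up, root_left)
                parent[max(root_up, root_left)] = root   # merge equivalent labels
                labels[y][x] = root
            elif up or left:
                labels[y][x] = up or left
            else:
                parent.append(len(parent))               # new provisional label
                labels[y][x] = len(parent) - 1

    # Second pass: replace provisional labels by compact final labels.
    remap = {}
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = remap.setdefault(root, len(remap) + 1)
    return labels, len(remap)


if __name__ == "__main__":
    u_shape = [[1, 0, 1], [1, 0, 1], [1, 1, 1]]
    print(label_components(u_shape))
```

The U-shaped example needs the merge step: its two arms receive different provisional labels that are only found equivalent on the last row, which is the situation a hardware merging stage also has to resolve.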
Computational Discovery of Materials Using the Firefly Algorithm
NASA Astrophysics Data System (ADS)
Avendaño-Franco, Guillermo; Romero, Aldo
Our current ability to model physical phenomena accurately, increased computational power, and better algorithms are the driving forces behind the computational discovery and design of novel materials, allowing for virtual characterization before their realization in the laboratory. We present the implementation of a novel firefly algorithm, a population-based algorithm for global optimization, for searching the structure/composition space. This novel computation-intensive approach naturally takes advantage of concurrency and targeted exploration while still keeping enough diversity. We apply the new method to both periodic and non-periodic structures, and we present the implementation challenges and the solutions that improve efficiency. The implementation makes use of computational materials databases and network analysis to optimize the search and to gain insight into the geometric structure of local minima on the energy landscape. The method has been implemented in our software PyChemia, an open-source package for materials discovery. We acknowledge the support of DMREF-NSF 1434897 and the Donors of the American Chemical Society Petroleum Research Fund for partial support of this research under Contract 54075-ND10.
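As a rough illustration of the population-based search described above, the following is a minimal sketch of the textbook firefly algorithm on a continuous cost function. It is not PyChemia's implementation, which searches structure/composition space; the function name `firefly_minimize` and the parameters `beta0`, `gamma`, and `alpha` (attractiveness, light absorption, and random step size) are illustrative assumptions.

```python
import math
import random

def firefly_minimize(cost, dim, n_fireflies=15, n_iter=50,
                     beta0=1.0, gamma=1.0, alpha=0.2,
                     bounds=(-5.0, 5.0), seed=0):
    """Textbook firefly search: dimmer fireflies move toward brighter ones."""
    rng = random.Random(seed)
    lo, hi = bounds
    swarm = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fireflies)]
    costs = [cost(x) for x in swarm]
    best_idx = min(range(n_fireflies), key=costs.__getitem__)
    best_x, best_cost = list(swarm[best_idx]), costs[best_idx]
    history = [best_cost]  # best cost seen so far, one entry per iteration

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if costs[j] < costs[i]:  # firefly j is "brighter" (lower cost)
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attraction decays with distance
                    swarm[i] = [
                        min(hi, max(lo, a + beta * (b - a) + alpha * (rng.random() - 0.5)))
                        for a, b in zip(swarm[i], swarm[j])
                    ]
                    costs[i] = cost(swarm[i])
                    if costs[i] < best_cost:
                        best_x, best_cost = list(swarm[i]), costs[i]
        history.append(best_cost)
    return best_x, best_cost, history
```

Because each firefly's move depends only on the current swarm, the inner cost evaluations are the natural place to exploit concurrency; the random term keeps some diversity in the population.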
NASA Astrophysics Data System (ADS)
Venkateswaran, Vijay; Pivit, Florian; Guan, Lei
2016-07-01
Modern wireless communication networks, particularly cellular networks, utilize multiple antennas to improve capacity and signal coverage. In these systems, an active transceiver is typically connected to each antenna. However, this one-to-one mapping between transceivers and antennas dramatically increases the cost and complexity of a large phased antenna array system. In this paper, we first propose a partially adaptive beamformer architecture in which a reduced number of transceivers with a digital beamformer (DBF) is connected to an increased number of antennas through an RF beamforming network (RFBN). Then, based on the proposed architecture, we present a methodology to derive the minimum number of transceivers required for macro-cell and small-cell base stations, respectively. Subsequently, in order to achieve optimal beampatterns under given cellular standard requirements and RF operational constraints, we propose efficient algorithms to jointly design the DBF and RFBN. Starting from the proposed algorithms, we specify generic microwave RFBNs for optimal macro-cell and small-cell networks. To verify the proposed approaches, we compare the performance of the RFBN using simulations and anechoic chamber measurements. Experimental measurement results confirm the robustness and performance of the proposed hybrid DBF-RFBN concept, ensuring that theoretical multi-antenna capacity and coverage are achieved at little incremental cost.
ERIC Educational Resources Information Center
Stanley, Timothy D.; Wong, Lap Kei; Prigmore, Daniel; Benson, Justin; Fishler, Nathan; Fife, Leslie; Colton, Don
2007-01-01
Students learn better when they both hear and do. In computer architecture courses "doing" can be difficult in small schools without hardware laboratories hosted by computer engineering, electrical engineering, or similar departments. Software solutions exist. Our success with George Mills' Multimedia Logic (MML) is the focus of this paper. MML…
ERIC Educational Resources Information Center
Nikolic, B.; Radivojevic, Z.; Djordjevic, J.; Milutinovic, V.
2009-01-01
Courses in Computer Architecture and Organization are regularly included in Computer Engineering curricula. These courses are usually organized in such a way that students obtain not only a purely theoretical experience, but also a practical understanding of the topics lectured. This practical work is usually done in a laboratory using simulators…
A Project-Based Learning Approach to Programmable Logic Design and Computer Architecture
ERIC Educational Resources Information Center
Kellett, C. M.
2012-01-01
This paper describes a course in programmable logic design and computer architecture as it is taught at the University of Newcastle, Australia. The course is designed around a major design project and has two supplemental assessment tasks that are also described. The context of the Computer Engineering degree program within which the course is…
An Efficient VLSI Architecture of the Enhanced Three Step Search Algorithm
NASA Astrophysics Data System (ADS)
Biswas, Baishik; Mukherjee, Rohan; Saha, Priyabrata; Chakrabarti, Indrajit
2016-09-01
The intense computational complexity of any video codec is largely due to the motion estimation unit. The Enhanced Three Step Search is a popular technique that can be adopted for fast motion estimation. This paper proposes a novel VLSI architecture for the implementation of the Enhanced Three Step Search Technique. A new addressing mechanism has been introduced which enhances the speed of operation and reduces the area requirements. The proposed architecture when implemented in Verilog HDL on Virtex-5 Technology and synthesized using Xilinx ISE Design Suite 14.1 achieves a critical path delay of 4.8 ns while the area comes out to be 2.9K gate equivalent. It can be incorporated in commercial devices like smart-phones, camcorders, video conferencing systems etc.
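For readers unfamiliar with the search pattern, here is a minimal software sketch of the classic three step search for block motion estimation. It is the basic algorithm, not the Enhanced variant or the addressing mechanism of the paper; the function names, the sum-of-absolute-differences cost, and the list-of-rows frame format are assumptions made for illustration.

```python
def sad(cur, ref, bx, by, u, v, n):
    """Sum of absolute differences between the current block and a displaced reference block."""
    return sum(abs(cur[by + j][bx + i] - ref[by + v + j][bx + u + i])
               for j in range(n) for i in range(n))

def three_step_search(cur, ref, bx, by, n=8, first_step=4):
    """Return ((u, v), cost) for the n x n block whose top-left corner is (bx, by)."""
    height, width = len(ref), len(ref[0])
    best_u, best_v = 0, 0
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    step = first_step
    while step >= 1:
        centre_u, centre_v = best_u, best_v
        # Evaluate the 3 x 3 grid of candidates spaced `step` pixels apart.
        for dv in (-step, 0, step):
            for du in (-step, 0, step):
                u, v = centre_u + du, centre_v + dv
                inside = (0 <= bx + u and bx + u + n <= width and
                          0 <= by + v and by + v + n <= height)
                if not inside:
                    continue  # candidate block would fall outside the reference frame
                cost = sad(cur, ref, bx, by, u, v, n)
                if cost < best_cost:
                    best_u, best_v, best_cost = u, v, cost
        step //= 2  # halve the step: 4, 2, 1 gives a +/-7 pixel search range
    return (best_u, best_v), best_cost
```

With a first step of 4, the search evaluates at most 25 distinct candidate positions instead of the 225 of a full +/-7 search, which is why it is attractive for hardware, at the price of possibly settling on a local minimum of the cost surface.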
Architectural and Algorithmic Requirements for a Next-Generation System Analysis Code
V.A. Mousseau
2010-05-01
This document presents high-level architectural and system requirements for a next-generation system analysis code (NGSAC) to support reactor safety decision-making by plant operators and others, especially in the context of light water reactor plant life extension. The capabilities of NGSAC will be different from those of current-generation codes, not only because computers have evolved significantly in the generations since the current paradigm was first implemented, but because the decision-making processes that need the support of next-generation codes are very different from the decision-making processes that drove the licensing and design of the current fleet of commercial nuclear power reactors. The implications of these newer decision-making processes for NGSAC requirements are discussed, and resulting top-level goals for the NGSAC are formulated. From these goals, the general architectural and system requirements for the NGSAC are derived.
Hybrid architecture for encoded measurement-based quantum computation
Zwerger, M.; Briegel, H. J.; Dür, W.
2014-01-01
We present a hybrid scheme for quantum computation that combines the modular structure of elementary building blocks used in the circuit model with the advantages of a measurement-based approach to quantum computation. We show how to construct optimal resource states of minimal size to implement elementary building blocks for encoded quantum computation in a measurement-based way, including states for error correction and encoded gates. The performance of the scheme is determined by the quality of the resource states; within the considered error model, a threshold of the order of 10% local noise per particle is found for fault-tolerant quantum computation and quantum communication. PMID:24946906
Opportunities for X-ray Science in Future Computing Architectures
Foster, Ian
2011-02-09
The world of computing continues to evolve rapidly. In just the past 10 years, we have seen the emergence of petascale supercomputing, cloud computing that provides on-demand computing and storage with considerable economies of scale, software-as-a-service methods that permit outsourcing of complex processes, and grid computing that enables federation of resources across institutional boundaries. These trends show no sign of slowing down. The next 10 years will surely see exascale, new cloud offerings, and other terabit networks. This talk reviews various of these developments and discusses their potential implications for x-ray science and x-ray facilities.
An efficient algorithm for computing the crossovers in satellite altimetry
NASA Technical Reports Server (NTRS)
Tai, Chang-Kou
1988-01-01
An efficient algorithm has been devised to compute the crossovers in satellite altimetry. The significance of the crossovers is twofold. First, they are needed to perform the crossover adjustment to remove the orbit error. Secondly, they yield important insight into oceanic variability. Nevertheless, there is no published algorithm to make this very time consuming task easier, which is the goal of this report. The success of the algorithm is predicated on the ability to predict (by analytical means) the crossover coordinates to within 6 km and 1 sec of the true values. Hence, only one interpolation/extrapolation step on the data is needed to derive the crossover coordinates in contrast to the many interpolation/extrapolation operations usually needed to arrive at the same accuracy level if deprived of this information.
State-Estimation Algorithm Based on Computer Vision
NASA Technical Reports Server (NTRS)
Bayard, David; Brugarolas, Paul
2007-01-01
An algorithm and software to implement the algorithm are being developed as means to estimate the state (that is, the position and velocity) of an autonomous vehicle, relative to a visible nearby target object, to provide guidance for maneuvering the vehicle. In the original intended application, the autonomous vehicle would be a spacecraft and the nearby object would be a small astronomical body (typically, a comet or asteroid) to be explored by the spacecraft. The algorithm could also be used on Earth in analogous applications -- for example, for guiding underwater robots near such objects of interest as sunken ships, mineral deposits, or submerged mines. It is assumed that the robot would be equipped with a vision system that would include one or more electronic cameras, image-digitizing circuitry, and an image-data-processing computer that would generate feature-recognition data products.
Computer algorithms in the search for unrelated stem cell donors.
Steiner, David
2012-01-01
Hematopoietic stem cell transplantation (HSCT) is a medical procedure in the field of hematology and oncology, most often performed for patients with certain cancers of the blood or bone marrow. Many patients have no suitable HLA-matched donor within their family, so physicians must activate a "donor search process" by interacting with national and international donor registries, which search their databases for adult unrelated donors or cord blood units (CBU). Information and communication technologies play a key role in the donor search process in donor registries both nationally and internationally. One of the major challenges for donor registry computer systems is the development of a reliable search algorithm. This work discusses the top-down design of such algorithms and current practice. Based on our experience with systems used by several stem cell donor registries, we highlight typical pitfalls in the implementation of an algorithm and its underlying data structure.
A learnable parallel processing architecture towards unity of memory and computing.
Li, H; Gao, B; Chen, Z; Zhao, Y; Huang, P; Ye, H; Liu, L; Liu, X; Kang, J
2015-08-14
Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named "iMemComp", where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped "iMemComp" with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on "iMemComp" can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.
A learnable parallel processing architecture towards unity of memory and computing
Li, H.; Gao, B.; Chen, Z.; Zhao, Y.; Huang, P.; Ye, H.; Liu, L.; Liu, X.; Kang, J.
2015-01-01
Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named “iMemComp”, where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped “iMemComp” with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on “iMemComp” can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area. PMID:26271243
A learnable parallel processing architecture towards unity of memory and computing
NASA Astrophysics Data System (ADS)
Li, H.; Gao, B.; Chen, Z.; Zhao, Y.; Huang, P.; Ye, H.; Liu, L.; Liu, X.; Kang, J.
2015-08-01
Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named “iMemComp”, where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped “iMemComp” with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on “iMemComp” can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.
Neuromorphic Computing: A Post-Moore's Law Complementary Architecture
Schuman, Catherine D; Birdwell, John Douglas; Dean, Mark; Plank, James; Rose, Garrett
2016-01-01
We describe our approach to post-Moore's law computing with three neuromorphic computing models that share a RISC philosophy, featuring simple components combined with a flexible and programmable structure. We envision these to be leveraged as co-processors, or as data filters to provide in situ data analysis in supercomputing environments.
NASA Technical Reports Server (NTRS)
Tamma, Kumar K.; Namburu, Raju R.
1990-01-01
A robust self-starting explicit architecture for computational structural dynamics is described. The proposed methodology involves expressing the governing equations of motion in conservation form and temporal discretization is accomplished in the spirit of the Lax-Wendroff type formulations. The development of the basic methodology is shown. Discretization in space is accomplished by introducing stress-based representations and employing the classical Galerkin scheme. Numerical test model results are presented which validate the architecture.
A Moving Target Environment for Computer Configurations Using Genetic Algorithms
Crouse, Michael; Fulp, Errin W.
2011-10-31
Moving Target (MT) environments for computer systems provide security through diversity by changing various system properties that are explicitly defined in the computer configuration. Temporal diversity can be achieved by making periodic configuration changes; however, in an infrastructure of multiple similarly purposed computers, diversity must also be spatial, ensuring that multiple computers do not simultaneously share the same configuration and potential vulnerabilities. Given the number of possible changes and their potential interdependencies, discovering computer configurations that are secure, functional, and diverse is challenging. This paper describes how a Genetic Algorithm (GA) can be employed to find temporally and spatially diverse secure computer configurations. In the proposed approach a computer configuration is modeled as a chromosome, where an individual configuration setting is a trait or allele. The GA operates by combining multiple chromosomes (configurations), which are tested for feasibility and ranked based on performance, measured as resistance to attack. The results of successive iterations of the GA are secure configurations that are diverse due to the crossover and mutation processes. Simulation results will demonstrate that this approach can provide an MT environment for a large infrastructure of similarly purposed computers by discovering temporally and spatially diverse secure configurations.
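The chromosome model described above can be sketched as a small genetic algorithm. This is an illustrative toy, not the authors' system: the function name `evolve`, the truncation selection, the single-point crossover, and the caller-supplied `fitness` function (a stand-in for measured resistance to attack) are all assumptions, and the feasibility testing and spatial diversity across machines discussed in the abstract are not modeled.

```python
import random

def evolve(fitness, n_genes, alleles, pop_size=20, generations=30,
           mutation_rate=0.05, seed=0):
    """Evolve configurations (lists of settings); higher fitness is better."""
    rng = random.Random(seed)
    population = [[rng.choice(alleles) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best_history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]      # keep the better half as parents
        children = [list(ranked[0])]           # elitism: carry the best forward
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)    # single-point crossover
            child = mum[:cut] + dad[cut:]
            # Mutation: occasionally replace a setting with a random allele.
            child = [rng.choice(alleles) if rng.random() < mutation_rate else gene
                     for gene in child]
            children.append(child)
        population = children
    return max(population, key=fitness), best_history
```

Crossover and mutation are what keep successive configurations different from one another, which is the source of temporal diversity in the approach; a realistic fitness function would have to come from vulnerability assessment of each candidate configuration.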
Distributed Computing Architecture for Image-Based Wavefront Sensing and 2-D FFTs
NASA Technical Reports Server (NTRS)
Smith, Jeffrey S.; Dean, Bruce H.; Haghani, Shadan
2006-01-01
Image-based wavefront sensing (WFS) provides significant advantages over interferometric-based wavefront sensors, such as optical design simplicity and stability. However, the image-based approach is computationally intensive, and therefore specialized high-performance computing architectures are required in applications utilizing the image-based approach. The development and testing of these high-performance computing architectures are essential to such missions as James Webb Space Telescope (JWST), Terrestrial Planet Finder-Coronagraph (TPF-C and CorSpec), and Spherical Primary Optical Telescope (SPOT). The development of these specialized computing architectures requires numerous two-dimensional Fourier Transforms, which necessitate an all-to-all communication when applied on a distributed computational architecture. Several solutions for distributed computing are presented with an emphasis on a 64 Node cluster of DSPs, multiple DSP FPGAs, and an application of low-diameter graph theory. Timing results and performance analysis will be presented. The solutions offered could be applied to other all-to-all communication and scientifically computationally complex problems.
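The reason a distributed 2-D Fourier transform needs all-to-all communication can be seen in the usual row-column decomposition, sketched below. This is a single-process illustration under stated assumptions, not the DSP or FPGA implementation of the report: the helper names are hypothetical, and a plain O(n^2) DFT stands in for a real FFT kernel to keep the example short.

```python
import cmath

def dft(x):
    """Direct 1-D discrete Fourier transform (stand-in for an FFT kernel)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]

def transpose(a):
    return [list(col) for col in zip(*a)]

def fft2_row_column(image):
    """2-D transform as: 1-D transforms of rows, transpose, 1-D transforms, transpose."""
    rows_done = [dft(row) for row in image]      # each node can transform its own rows
    cols_as_rows = transpose(rows_done)          # on a cluster this is the all-to-all exchange
    cols_done = [dft(col) for col in cols_as_rows]
    return transpose(cols_done)                  # restore the original orientation
```

If the rows of the image are spread across nodes, the two batches of 1-D transforms are embarrassingly parallel, but the transpose between them requires every node to send a piece of its data to every other node, which is where interconnect topology matters.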
Science-driven system architecture: A new process for leadership class computing
Simon, Horst; Kramer, William; Saphir, William; Shalf, John; Bailey, David; Oliker, Leonid; Banda, Michael; McCurdy, C. William; Hules, John; Canning, Andrew; Day, Marc; Colella, Philip; Serafini, David; Wehner, Michael; Nugent, Peter
2004-10-19
Over the past several years, computational scientists have observed a frustrating trend of stagnating application performance despite dramatic increases in peak performance of high performance computers. In 2002, researchers at Lawrence Berkeley National Laboratory, Argonne National Laboratory, and IBM proposed a new process to reverse this situation [1]. This strategy is based on new types of development partnerships with computer vendors based on the concept of science-driven computer system design. This strategy will engage applications scientists well before an architecture is available for commercialization. The process is already producing results, and has further potential for dramatically improving system efficiency. This paper documents the progress to date and the potential for future benefits. An example of this process is discussed, using IBM Power architecture with a computer architecture design that can lead to a sustained performance of 50 to 100 Tflop/s on a broad spectrum of applications in 2006 for a reasonable cost. This partnership will establish a collaborative approach to modifying computer architecture to enable heretofore unrealized achievements in computer capability-limited fields such as nanoscience, combustion modeling, fusion, climate modeling, and astrophysics.
Using Advanced Computer Vision Algorithms on Small Mobile Robots
2006-04-20
Lab. This classification algorithm has been used to successfully detect the license plates of automobiles in motion in real-time. While working...use in real-time. Test results are shown for a variety of environments. KEYWORDS: robotics, computer vision, car/license plate detection, SIFT...when detecting the make and model of automobiles, SIFT can be used to achieve very high detection rates at the expense of a hefty performance cost when
The development and evaluation of numerical algorithms for MIMD computers
NASA Technical Reports Server (NTRS)
Voigt, Robert G.
1990-01-01
Two activities were pursued under this grant. The first was a visitor program to conduct research on numerical algorithms for MIMD computers. The program is summarized in the following attachments: Attachment A - List of Researchers Supported; Attachment B - List of Reports Completed; and Attachment C - Reports. The second activity was a workshop on the Control of Fluid Dynamic Systems held on March 28 to 29, 1989. The workshop is summarized in the following attachments: Attachment D - Workshop Summary; and Attachment E - List of Workshop Participants.
Sort-Mid tasks scheduling algorithm in grid computing.
Reda, Naglaa M; Tawfik, A; Marzok, Mohamed A; Khamis, Soheir M
2015-11-01
Scheduling tasks on heterogeneous resources distributed over a grid computing system is an NP-complete problem. Many researchers have aimed to develop scheduling algorithm variants that approach optimality, and these have shown good performance for task scheduling with respect to resource selection. However, using the full power of the resources is still a challenge. In this paper, a new heuristic algorithm called Sort-Mid is proposed. It aims to maximize utilization and minimize the makespan. The new strategy of the Sort-Mid algorithm is to find appropriate resources. The base step is to obtain an average value for each task by sorting the list of its completion times. Then, the maximum average is obtained. Finally, the task that has the maximum average is allocated to the machine that has the minimum completion time. The allocated task is deleted, and these steps are repeated until all tasks are allocated. Experimental tests show that the proposed algorithm outperforms almost all other algorithms in terms of resource utilization and makespan.
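The steps above can be sketched as follows. The abstract does not say exactly how the "average value" is derived from the sorted completion times, so this sketch assumes it is the middle element of each task's sorted list, as the name Sort-Mid suggests; that reading, the function name `sort_mid_schedule`, and the `etc` matrix of expected execution times are assumptions, not the authors' definition.

```python
def sort_mid_schedule(etc):
    """etc[t][m] = expected execution time of task t on machine m.

    Returns (assignment, makespan), where assignment maps task -> machine.
    """
    n_machines = len(etc[0])
    ready = [0.0] * n_machines            # time at which each machine becomes free
    unassigned = list(range(len(etc)))
    assignment = {}

    def completion(task, machine):
        return ready[machine] + etc[task][machine]

    def mid_value(task):
        # Assumed reading of the base step: sort the task's completion
        # times over all machines and take the middle element.
        times = sorted(completion(task, m) for m in range(n_machines))
        return times[len(times) // 2]

    while unassigned:
        task = max(unassigned, key=mid_value)   # task with the largest mid value
        machine = min(range(n_machines), key=lambda m: completion(task, m))
        ready[machine] = completion(task, machine)
        assignment[task] = machine
        unassigned.remove(task)                 # delete the allocated task and repeat
    return assignment, max(ready)
```

Scheduling the task with the largest mid value first places the tasks that are costly on most machines early, while each is still sent to the machine that finishes it soonest.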
A Study of Alternative Computer Architectures for System Reliability and Software Simplification.
1981-04-22
proposal for a machine different from but similar to data flow machines which we have termed a program structured computer. * Number in brackets refer...programmed to run on a PDP-11/03 and is currently operational. 2.1.3 A Program Structured Computer [19] Another type of HHL architecture, a Data Flow...Program Structured Computer, M.S. thesis, The Ohio State University, August, 1980. 20. Chugh, R., The Design of Combinatorial Networks Testable by a Small
Arranging computer architectures to create higher-performance controllers
NASA Technical Reports Server (NTRS)
Jacklin, Stephen A.
1988-01-01
Techniques for integrating microprocessors, array processors, and other intelligent devices in control systems are reviewed, with an emphasis on the (re)arrangement of components to form distributed or parallel processing systems. Consideration is given to the selection of the host microprocessor, increasing the power and/or memory capacity of the host, multitasking software for the host, array processors to reduce computation time, the allocation of real-time and non-real-time events to different computer subsystems, intelligent devices to share the computational burden for real-time events, and intelligent interfaces to increase communication speeds. The case of a helicopter vibration-suppression and stabilization controller is analyzed as an example, and significant improvements in computation and throughput rates are demonstrated.
Computer-Aided Design of Organic Host Architectures for Selective Chemosensors
Hay, Benjamin; Bryantsev, Vyacheslav S.
2009-01-01
Selective organic hosts provide the foundation for the development of many types of sensors. The deliberate design of host molecules with predetermined selectivity, however, remains a challenge in supramolecular chemistry. To address this issue we have developed a de novo structure-based design approach for the unbiased construction of complementary host architectures. This chapter summarizes recent progress including improvements on a computer software program, HostDesigner, specifically tailored to discover host architectures for small guest molecules. HostDesigner is capable of generating and evaluating millions of candidate structures in minutes on a desktop personal computer, allowing a user to rapidly identify three-dimensional architectures that are structurally organized for binding a targeted guest species. The efficacy of this computational methodology is illustrated with a search for cation hosts containing aliphatic ether oxygen groups and anion hosts containing urea groups.
Jiang, Yuning; Kang, Jinfeng; Wang, Xinan
2017-01-01
Resistive switching memory (RRAM) is considered as one of the most promising devices for parallel computing solutions that may overcome the von Neumann bottleneck of today’s electronic systems. However, the existing RRAM-based parallel computing architectures suffer from practical problems such as device variations and extra computing circuits. In this work, we propose a novel parallel computing architecture for pattern recognition by implementing k-nearest neighbor classification on metal-oxide RRAM crossbar arrays. Metal-oxide RRAM with gradual RESET behaviors is chosen as both the storage and computing components. The proposed architecture is tested by the MNIST database. High speed (~100 ns per example) and high recognition accuracy (97.05%) are obtained. The influence of several non-ideal device properties is also discussed, and it turns out that the proposed architecture shows great tolerance to device variations. This work paves a new way to achieve RRAM-based parallel computing hardware systems with high performance. PMID:28338069
NASA Astrophysics Data System (ADS)
Jiang, Yuning; Kang, Jinfeng; Wang, Xinan
2017-03-01
Resistive switching memory (RRAM) is considered as one of the most promising devices for parallel computing solutions that may overcome the von Neumann bottleneck of today’s electronic systems. However, the existing RRAM-based parallel computing architectures suffer from practical problems such as device variations and extra computing circuits. In this work, we propose a novel parallel computing architecture for pattern recognition by implementing k-nearest neighbor classification on metal-oxide RRAM crossbar arrays. Metal-oxide RRAM with gradual RESET behaviors is chosen as both the storage and computing components. The proposed architecture is tested by the MNIST database. High speed (~100 ns per example) and high recognition accuracy (97.05%) are obtained. The influence of several non-ideal device properties is also discussed, and it turns out that the proposed architecture shows great tolerance to device variations. This work paves a new way to achieve RRAM-based parallel computing hardware systems with high performance.
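For reference, the classification rule that the architecture maps onto crossbar arrays is ordinary k-nearest neighbor voting. The following is a minimal software sketch of that rule only; it does not model RRAM devices, and the function name `knn_classify` and the squared Euclidean distance are illustrative assumptions (the abstract does not state which distance measure the hardware computes).

```python
from collections import Counter

def knn_classify(train, query, k=3):
    """train is a list of (feature_vector, label) pairs; returns the majority label."""
    def squared_distance(item):
        features, _ = item
        return sum((a - b) ** 2 for a, b in zip(features, query))

    nearest = sorted(train, key=squared_distance)[:k]   # the k closest stored examples
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

The distance to every stored example can be computed independently, which is the parallelism a crossbar array of stored patterns is meant to exploit.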
2014-05-01
giovanni.lapenta@wis.kuleuven.be Contents 1 Introduction 1 2 Electrostatic Explicit PIC Algorithm 2 3 Overview of the Test Architectures 3 3.1 Sandy Bridge...Acknowledgments 8 References 8 1. Introduction Simulations of physical plasma systems are quite challenging because they require extensive use of computing...fusion, space and astrophysical plasmas, but still the general picture can be presented quite well with the fluid approach [6, 7]. The microscopic
Direct Fourier Inversion Reconstruction Algorithm for Computed Laminography.
Voropaev, Alexey; Myagotin, Anton; Helfen, Lukas; Baumbach, Tilo
2016-05-01
Synchrotron radiation computed laminography (CL) was developed to complement conventional computed tomography as a non-destructive 3D imaging method for the inspection of flat thin objects. Recent progress in hardware at synchrotron sources allows one to record the internal evolution of specimens at the micrometer scale and in the sub-second range, but also requires increased reconstruction speed to follow structural changes online. A 3D image of the sample interior is usually reconstructed by the well-established filtered backprojection (FBP) approach. Despite great success in reducing reconstruction time via parallel computations, the FBP algorithm still remains a time-consuming procedure. A promising way to significantly shorten computation time is to perform backprojection directly in the frequency domain (a direct Fourier inversion approach). The corresponding algorithms are rarely considered in the literature because of poor performance or inferior reconstruction quality resulting from inaccurate interpolation in the Fourier domain. In this paper, we derive a Fourier-based reconstruction equation designed for the CL scanning geometry. Furthermore, we outline the translation of the continuous solution to a discrete version, which utilizes 3D sinc interpolation. A projection resampling technique that reduces the expensive interpolation to its 1D version is proposed. A series of numerical experiments confirms that the resulting image quality is comparable with that of the FBP approach, while reconstruction time is drastically reduced.
An Efficient Cloud Computing-Based Architecture for Freight System Application in China Railway
NASA Astrophysics Data System (ADS)
Zhang, Baopeng; Zhang, Ning; Li, Honghui; Liu, Feng; Miao, Kai
Cloud computing is a new network computing paradigm for distributed application environments. It utilizes computing and storage resources to dynamically provide on-demand service for users. The distributed and parallel characteristics of cloud computing can benefit the railway freight system. We implement a cloud computing-based architecture for the freight system application, which uses Tashi and Hadoop for virtual resource management and MapReduce-based search technology. We propose the semantic model, set the configuration parameters by experiment, and develop the prototype system for freight search and tracking.
An efficient parallel algorithm for accelerating computational protein design
Zhou, Yichao; Xu, Wei; Donald, Bruce R.; Zeng, Jianyang
2014-01-01
Motivation: Structure-based computational protein design (SCPR) is an important topic in protein engineering. Under the assumption of a rigid backbone and a finite set of discrete conformations of side-chains, various methods have been proposed to address this problem. A popular method is to combine the dead-end elimination (DEE) and A* tree search algorithms, which provably finds the global minimum energy conformation (GMEC) solution. Results: In this article, we improve the efficiency of computing A* heuristic functions for protein design and propose a variant of the A* algorithm in which the search process can be performed on a single GPU in a massively parallel fashion. In addition, we make some efforts to address the problem of exceeding available memory in A* search. As a result, our enhancements can achieve a significant speedup of the A*-based protein design algorithm by four orders of magnitude on large-scale test data through pre-computation and parallelization, while still maintaining an acceptable memory overhead. We also show that our parallel A* search algorithm could be successfully combined with iMinDEE, a state-of-the-art DEE criterion, for rotamer pruning to further improve SCPR with the consideration of continuous side-chain flexibility. Availability: Our software is available and distributed open-source under the GNU Lesser General Public License Version 2.1 (GNU, February 1999). The source code can be downloaded from http://www.cs.duke.edu/donaldlab/osprey.php or http://iiis.tsinghua.edu.cn/∼compbio/software.html. Contact: zengjy321@tsinghua.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24931991
Final Report: Super Instruction Architecture for Scalable Parallel Computations
Sanders, Beverly Ann; Bartlett, Rodney; Deumens, Erik
2013-12-23
The most advanced methods for reliable and accurate computation of the electronic structure of molecular and nano systems are the coupled-cluster techniques. These high-accuracy methods help us to understand, for example, how biological enzymes operate and contribute to the design of new organic explosives. The ACES III software provides a modern, high-performance implementation of these methods optimized for parallel computer systems, ranging from small clusters typical in individual research groups, through larger clusters available in campus and regional computer centers, all the way to high-end petascale systems at national labs, including exploiting GPUs if available. This project enhanced the ACES III software package and used it to study interesting scientific problems.
A Survey of Architectural Techniques for Near-Threshold Computing
Mittal, Sparsh
2015-12-28
Energy efficiency has now become the primary obstacle in scaling the performance of all classes of computing systems. Low-voltage computing, and specifically near-threshold voltage computing (NTC), which involves operating the transistor very close to and yet above its threshold voltage, holds the promise of providing many-fold improvement in energy efficiency. However, use of NTC also presents several challenges, such as increased parametric variation, failure rate, and performance loss. Our paper surveys several recent techniques which aim to offset these challenges for fully leveraging the potential of NTC. By classifying these techniques along several dimensions, we also highlight their similarities and differences. Ultimately, we hope that this paper will provide insights into state-of-art NTC techniques to researchers and system-designers and inspire further research in this field.
Suggested architecture for a specialized fluid dynamics computer
NASA Technical Reports Server (NTRS)
Fornberg, B.
1978-01-01
Future flow simulations in 3-D will require computers with extremely large main memories and an advantageous ratio between computer cost and arithmetic speed. Since random access memories are very expensive, a pipeline design is proposed which allows the use of much cheaper sequential devices without any sacrifice in speed for vector references (even with arbitrary spacing between successive elements). Also scalar arithmetic can be performed efficiently. The comparatively low speed of the proposed machine (about 10 to the 7th power operations per second) would be offset by a very low price per unit, making mass production possible.
A language comparison for scientific computing on MIMD architectures
NASA Technical Reports Server (NTRS)
Jones, Mark T.; Patrick, Merrell L.; Voigt, Robert G.
1989-01-01
Choleski's method for solving banded symmetric, positive definite systems is implemented on a multiprocessor computer using three FORTRAN based parallel programming languages: the Force, PISCES and Concurrent FORTRAN. The capabilities of the languages for expressing parallelism and their user friendliness are discussed, including readability of the code, debugging assistance offered, and expressiveness of the languages. The performance of the different implementations is compared. It is argued that PISCES, using the Force for medium-grained parallelism, is the appropriate choice for programming Choleski's method on the multiprocessor computer, Flex/32.
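For readers unfamiliar with the kernel being parallelized, the following is a minimal serial sketch of Choleski factorization for a banded symmetric, positive definite matrix. It is written in Python for illustration only; the abstract's implementations use the Force, PISCES and Concurrent FORTRAN, and the function name `banded_cholesky` and the dense list-of-lists storage are assumptions made here, not details from the report.

```python
import math

def banded_cholesky(a, bandwidth):
    """Return lower-triangular L with A = L * L^T.

    Assumes A is symmetric positive definite and a[i][j] == 0
    whenever |i - j| > bandwidth (hypothetical dense storage).
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Only columns inside the band contribute to the diagonal sum.
        lo = max(0, j - bandwidth)
        diag = a[j][j] - sum(l[j][k] ** 2 for k in range(lo, j))
        if diag <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(diag)
        # Entries below the band stay zero, so stop at j + bandwidth.
        for i in range(j + 1, min(n, j + bandwidth + 1)):
            lo_i = max(0, i - bandwidth)
            s = sum(l[i][k] * l[j][k] for k in range(lo_i, j))
            l[i][j] = (a[i][j] - s) / l[j][j]
    return l

# Tridiagonal example (bandwidth 1).
A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
L = banded_cholesky(A, 1)
```

The inner loops over `i` for a fixed column `j` are independent of one another, which is one natural place where a medium-grained parallel language could distribute work.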
Computer aided lung cancer diagnosis with deep learning algorithms
NASA Astrophysics Data System (ADS)
Sun, Wenqing; Zheng, Bin; Qian, Wei
2016-03-01
Deep learning is considered a popular and powerful method in pattern recognition and classification. However, there are not many deep structured applications used in the medical imaging diagnosis area, because large datasets are not always available for medical images. In this study we tested the feasibility of using deep learning algorithms for lung cancer diagnosis with cases from the Lung Image Database Consortium (LIDC) database. The nodules on each computed tomography (CT) slice were segmented according to marks provided by the radiologists. After down sampling and rotating, we acquired 174412 samples of 52 by 52 pixels each and the corresponding truth files. Three deep learning algorithms were designed and implemented: Convolutional Neural Network (CNN), Deep Belief Networks (DBNs), and Stacked Denoising Autoencoder (SDAE). To compare the performance of the deep learning algorithms with a traditional computer aided diagnosis (CADx) system, we designed a scheme with 28 image features and a support vector machine. The accuracies of CNN, DBNs, and SDAE are 0.7976, 0.8119, and 0.7929, respectively; the accuracy of our designed traditional CADx is 0.7940, which is slightly lower than that of CNN and DBNs. We also noticed that the mislabeled nodules using DBNs are 4% larger than those using traditional CADx; this might result from the down sampling process losing some size information of the nodules.
Impact of Cognitive Architectures on Human-Computer Interaction
2014-09-01
sciences of linear programming, engineering, and parsing have relegated the soft sciences into the background. I have seen this in software... engineering, where the hard functional requirements push the soft nonfunctional requirements into the background. Our terminology, functional versus...human-computer interaction (HCI), it must harden. Their vision is for psychology to provide engineering style theory that influences the design of
Computer architectures demand languages that deal with time
Basset, S.
1984-05-01
The author discusses the effect of time on computer operations and the need for programming languages that allow this to be taken into account. Some problems require languages structured to cope with time independent, static relationships using artificial intelligence; others require languages structured to cope with time dependencies.
An Architectural Design System Based on Computer Graphics.
ERIC Educational Resources Information Center
MacDonald, Stephen L.; Wehrli, Robert
The recent developments in computer hardware and software are presented to inform architects of this design tool. Technical advancements in equipment include--(1) cathode ray tube displays, (2) light pens, (3) print-out and photo copying attachments, (4) controls for comparison and selection of images, (5) chording keyboards, (6) plotters, and (7)…
A Low Cost Microcomputer Laboratory for Investigating Computer Architecture.
ERIC Educational Resources Information Center
Mitchell, Eugene E., Ed.
1980-01-01
Described is a microcomputer laboratory at the United States Military Academy at West Point, New York, which provides easy access to non-volatile memory and a single input/output file system for 16 microcomputer laboratory positions. A microcomputer network that has a centralized data base is implemented using the concepts of computer network…
CSP: A Multifaceted Hybrid Architecture for Space Computing
NASA Technical Reports Server (NTRS)
Rudolph, Dylan; Wilson, Christopher; Stewart, Jacob; Gauvin, Patrick; George, Alan; Lam, Herman; Crum, Gary Alex; Wirthlin, Mike; Wilson, Alex; Stoddard, Aaron
2014-01-01
Research on the CHREC Space Processor (CSP) takes a multifaceted hybrid approach to embedded space computing. Working closely with the NASA Goddard SpaceCube team, researchers at the National Science Foundation (NSF) Center for High-Performance Reconfigurable Computing (CHREC) at the University of Florida and Brigham Young University are developing hybrid space computers that feature an innovative combination of three technologies: commercial-off-the-shelf (COTS) devices, radiation-hardened (RadHard) devices, and fault-tolerant computing. Modern COTS processors provide the utmost in performance and energy-efficiency but are susceptible to ionizing radiation in space, whereas RadHard processors are virtually immune to this radiation but are more expensive, larger, less energy-efficient, and generations behind in speed and functionality. By featuring COTS devices to perform the critical data processing, supported by simpler RadHard devices that monitor and manage the COTS devices, and augmented with novel uses of fault-tolerant hardware, software, information, and networking within and between COTS devices, the resulting system can maximize performance and reliability while minimizing energy consumption and cost. NASA Goddard has adopted the CSP concept and technology with plans underway to feature flight-ready CSP boards on two upcoming space missions.
Region-Oriented Placement Algorithm for Coarse-Grained Power-Gating FPGA Architecture
NASA Astrophysics Data System (ADS)
Li, Ce; Dong, Yiping; Watanabe, Takahiro
An FPGA plays an essential role in industrial products due to its fast, stable and flexible features. However, the power consumption of FPGAs used in portable devices is one of the critical issues. The top-down hierarchical design method is commonly used in both ASIC and FPGA design. But in the case where multiple modules are integrated in an FPGA and some of them might be in sleep mode, the current FPGA architecture cannot be fully effective. In this paper, a coarse-grained power-gating FPGA architecture is proposed in which the whole area of an FPGA is partitioned into several regions and the power supply is controlled for each region, so that modules in sleep mode can be effectively powered off. We also propose a region-oriented FPGA placement algorithm, based on VPR[1], fitted to this user's hierarchical design. Simulation results show that the proposed method could reduce the power consumption of an FPGA by 38% on average by setting unused modules or regions in sleep mode.
Real-Time Cognitive Computing Architecture for Data Fusion in a Dynamic Environment
NASA Technical Reports Server (NTRS)
Duong, Tuan A.; Duong, Vu A.
2012-01-01
A novel cognitive computing architecture is conceptualized for processing multiple channels of multi-modal sensory data streams simultaneously, and fusing the information in real time to generate intelligent reaction sequences. This unique architecture is capable of assimilating parallel data streams that could be analog, digital, synchronous/asynchronous, and could be programmed to act as a knowledge synthesizer and/or an "intelligent perception" processor. In this architecture, the bio-inspired models of visual pathway and olfactory receptor processing are combined as processing components, to achieve the composite function of "searching for a source of food while avoiding the predator." The architecture is particularly suited for scene analysis from visual data and odorant.
Decoupled Computer Architectures for Very High-Speed Technologies
1993-10-01
Postgraduate School), and two have formed a private computer company. The other groups within our project have also generated their share of PhDs. A...signals applied at the test pins of a chip and generates the clocks and control signals for internal test circuits. " Instruction Register. A two (or more...all the chips for analysis and debugging. We have designed the JTAG control program to be as general as possible. It has a modular structure
NASA Technical Reports Server (NTRS)
Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)
1993-01-01
This is a real-time robotic controller and simulator which is a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.
How computer science can help in understanding the 3D genome architecture.
Shavit, Yoli; Merelli, Ivan; Milanesi, Luciano; Lio', Pietro
2016-09-01
Chromosome conformation capture techniques are producing a huge amount of data about the architecture of our genome. These data can provide us with a better understanding of the events that induce critical regulations of the cellular function from small changes in the three-dimensional genome architecture. Generating a unified view of spatial, temporal, genetic and epigenetic properties poses various challenges of data analysis, visualization, integration and mining, as well as of high performance computing and big data management. Here, we describe the critical issues of this new branch of bioinformatics, oriented at the comprehension of the three-dimensional genome architecture, which we call 'Nucleome Bioinformatics', looking beyond the currently available tools and methods, and highlight yet unaddressed challenges and the potential approaches that could be applied for tackling them. Our review provides a map for researchers interested in using computer science for studying 'Nucleome Bioinformatics', to achieve a better understanding of the biological processes that occur inside the nucleus.
Fixed-point image orthorectification algorithms for reduced computational cost
NASA Astrophysics Data System (ADS)
French, Joseph Clinton
Imaging systems have been applied to many new applications in recent years. With the advent of low-cost, low-power focal planes and more powerful, lower cost computers, remote sensing applications have become more widespread. Many of these applications require some form of geolocation, especially when relative distances are desired. However, when greater global positional accuracy is needed, orthorectification becomes necessary. Orthorectification is the process of projecting an image onto a Digital Elevation Map (DEM), which removes terrain distortions and corrects the perspective distortion by changing the viewing angle to be perpendicular to the projection plane. Orthorectification is used in disaster tracking, landscape management, wildlife monitoring and many other applications. However, orthorectification is a computationally expensive process due to floating point operations and divisions in the algorithm. To reduce the computational cost of on-board processing, two novel algorithm modifications are proposed. One modification is projection utilizing fixed-point arithmetic. Fixed-point arithmetic removes the floating point operations and reduces the processing time by operating only on integers. The second modification is replacement of the division inherent in projection with a multiplication by the inverse. The inverse must operate iteratively. Therefore, the inverse is replaced with a linear approximation. As a result of these modifications, the processing time of projection is reduced by a factor of 1.3x with an average pixel position error of 0.2% of a pixel size for 128-bit integer processing and over 4x with an average pixel position error of less than 13% of a pixel size for 64-bit integer processing. A secondary inverse function approximation is also developed that replaces the linear approximation with a quadratic. The quadratic approximation produces a more accurate approximation of the inverse, allowing for an integer multiplication calculation
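The two modifications described in this abstract (integer-only arithmetic, and division replaced by multiplication with a linearly approximated inverse) can be illustrated with a small sketch. Everything specific here is an assumption for illustration: the Q16.16 format, the tangent-line expansion point `z0`, and the helper names are hypothetical and are not taken from the author's implementation.

```python
FRAC_BITS = 16            # assumed Q16.16 fixed-point format (not from the source)
ONE = 1 << FRAC_BITS

def to_fixed(x: float) -> int:
    """Convert a real number to a Q16.16 integer."""
    return int(round(x * ONE))

def from_fixed(q: int) -> float:
    """Convert a Q16.16 integer back to a float (for inspection only)."""
    return q / ONE

def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 integers; non-negative inputs are assumed."""
    return (a * b) >> FRAC_BITS

def make_reciprocal(z0: float):
    """Precompute the tangent-line approximation 1/z ~ 2/z0 - z/z0**2.

    The coefficients are computed once, offline; the returned function
    uses only integer multiply, shift and subtract.
    """
    c0 = to_fixed(2.0 / z0)
    c1 = to_fixed(1.0 / (z0 * z0))

    def reciprocal(z_fixed: int) -> int:
        return c0 - fixed_mul(c1, z_fixed)

    return reciprocal

def project(x_fixed: int, z_fixed: int, reciprocal) -> int:
    """Approximate the perspective divide x / z without a division."""
    return fixed_mul(x_fixed, reciprocal(z_fixed))

recip = make_reciprocal(2.0)
approx = from_fixed(project(to_fixed(3.0), to_fixed(2.1), recip))
exact = 3.0 / 2.1
```

For the tangent-line form used in this sketch, the relative error of the inverse is ((z - z0) / z0)^2, so accuracy depends on how far the depth strays from the expansion point. That is consistent with the abstract's remark that a quadratic approximation gives a more accurate inverse.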
NASA Astrophysics Data System (ADS)
Huck, F. O.; Davis, R. E.; Fales, C. L.; Aherron, R. M.
1982-11-01
A computational model of the deterministic and stochastic processes involved in remote sensing is used to study spectral feature identification techniques for real-time onboard processing of data acquired with advanced Earth-resources sensors. Preliminary results indicate that: Narrow spectral responses are advantageous; signal normalization improves mean-square distance (MSD) classification accuracy but tends to degrade maximum-likelihood (MLH) classification accuracy; and MSD classification of normalized signals performs better than the computationally more complex MLH classification when imaging conditions change appreciably from those conditions during which reference data were acquired. The results also indicate that autonomous categorization of TM signals into vegetation, bare land, water, snow and clouds can be accomplished with adequate reliability for many applications over a reasonably wide range of imaging conditions. However, further analysis is required to develop computationally efficient boundary approximation algorithms for such categorization.
NASA Technical Reports Server (NTRS)
Huck, F. O.; Davis, R. E.; Fales, C. L.; Aherron, R. M.
1982-01-01
A computational model of the deterministic and stochastic processes involved in remote sensing is used to study spectral feature identification techniques for real-time onboard processing of data acquired with advanced earth-resources sensors. Preliminary results indicate that: Narrow spectral responses are advantageous; signal normalization improves mean-square distance (MSD) classification accuracy but tends to degrade maximum-likelihood (MLH) classification accuracy; and MSD classification of normalized signals performs better than the computationally more complex MLH classification when imaging conditions change appreciably from those conditions during which reference data were acquired. The results also indicate that autonomous categorization of TM signals into vegetation, bare land, water, snow and clouds can be accomplished with adequate reliability for many applications over a reasonably wide range of imaging conditions. However, further analysis is required to develop computationally efficient boundary approximation algorithms for such categorization.
Not Available
1994-09-01
This third volume of the Information Management Architecture for an Integrated Computing Environment for the Environmental Restoration Program--the Interim Technical Architecture (TA) (referred to throughout the remainder of this document as the ER TA)--represents a key milestone in establishing a coordinated information management environment in which information initiatives can be pursued with the confidence that redundancy and inconsistencies will be held to a minimum. This architecture is intended to be used as a reference by anyone whose responsibilities include the acquisition or development of information technology for use by the ER Program. The interim ER TA provides technical guidance at three levels. At the highest level, the technical architecture provides an overall computing philosophy or direction. At this level, the guidance does not address specific technologies or products but addresses more general concepts, such as the use of open systems, modular architectures, graphical user interfaces, and architecture-based development. At the next level, the technical architecture provides specific information technology recommendations regarding a wide variety of specific technologies. These technologies include computing hardware, operating systems, communications software, database management software, application development software, and personal productivity software, among others. These recommendations range from the adoption of specific industry or Martin Marietta Energy Systems, Inc. (Energy Systems) standards to the specification of individual products. At the third level, the architecture provides guidance regarding implementation strategies for the recommended technologies that can be applied to individual projects and to the ER Program as a whole.
Efficient quantum algorithm for computing n-time correlation functions.
Pedernales, J S; Di Candia, R; Egusquiza, I L; Casanova, J; Solano, E
2014-07-11
We propose a method for computing n-time correlation functions of arbitrary spinorial, fermionic, and bosonic operators, consisting of an efficient quantum algorithm that encodes these correlations in an initially added ancillary qubit for probe and control tasks. For spinorial and fermionic systems, the reconstruction of arbitrary n-time correlation functions requires the measurement of two ancilla observables, while for bosonic variables time derivatives of the same observables are needed. Finally, we provide examples applicable to different quantum platforms in the frame of the linear response theory.
Usage of Thin-Client/Server Architecture in Computer Aided Education
ERIC Educational Resources Information Center
Cimen, Caghan; Kavurucu, Yusuf; Aydin, Halit
2014-01-01
With advances in technology, thin-client/server architecture has become popular in multi-user/single network environments. A thin client is a user terminal from which the user can log in to a domain and run programs by connecting to a remote server. Recent developments in network and hardware technologies (cloud computing, virtualization, etc.)…
ERIC Educational Resources Information Center
Hung, Y.-C.
2012-01-01
This paper investigates the impact of combining self explaining (SE) with computer architecture diagrams to help novice students learn assembly language programming. Pre- and post-test scores for the experimental and control groups were compared and subjected to covariance (ANCOVA) statistical analysis. Results indicate that the SE-plus-diagram…
Gangadari, Bhoopal Rao; Ahamed, Shaik Rafi
2016-12-01
In this paper, we present a novel low-energy architecture for the S-Box used in the Advanced Encryption Standard (AES) algorithm, based on programmable second-order reversible cellular automata (RCA (2)). The architecture entails a low power implementation with minimal delay overhead. The security of the proposed RCA (2) based S-Box is evaluated using cryptographic properties such as nonlinearity, correlation immunity bias, strict avalanche criteria, and entropy, and the proposed architecture is found to be secure enough for cryptographic applications. Moreover, simulation studies of the proposed AES algorithm architecture show an energy consumption of 68.726 nJ and a power dissipation of 3.856 mW for 0.18-μm at 13.69 MHz, and an energy consumption of 29.408 nJ and a power dissipation of 1.65 mW for 0.13-μm at 13.69 MHz. The proposed AES algorithm with the RCA (2) based S-Box shows a reduction in power consumption of 50 % and in energy consumption of 5 % compared to the best classical S-Box and composite field arithmetic based AES algorithm. Apart from that, it is also shown that RCA (2) based S-Boxes are dynamic in nature, invertible, and have low power dissipation compared to that of an LUT based S-Box, and hence are suitable for Wireless Body Area Network (WBAN) applications.
Towards automatic Markov reliability modeling of computer architectures
NASA Technical Reports Server (NTRS)
Liceaga, C. A.; Siewiorek, D. P.
1986-01-01
The analysis and evaluation of reliability measures using time-varying Markov models is required for Processor-Memory-Switch (PMS) structures that have competing processes such as standby redundancy and repair, or renewal processes such as transient or intermittent faults. The task of generating these models is tedious and prone to human error due to the large number of states and transitions involved in any reasonable system. Therefore model formulation is a major analysis bottleneck, and model verification is a major validation problem. The general unfamiliarity of computer architects with Markov modeling techniques further increases the necessity of automating the model formulation. This paper presents an overview of the Automated Reliability Modeling (ARM) program, under development at NASA Langley Research Center. ARM will accept as input a description of the PMS interconnection graph, the behavior of the PMS components, the fault-tolerant strategies, and the operational requirements. The output of ARM will be the reliability or availability Markov model formulated for direct use by evaluation programs. The advantages of such an approach are (a) utility to a large class of users, not necessarily expert in reliability analysis, and (b) a lower probability of human error in the computation.
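To make concrete what a hand-built Markov reliability model looks like, here is a minimal sketch for a two-unit redundant system with repair. It is not ARM and does not reproduce its input format; the three-state model, the constant failure rate `lam` and repair rate `mu`, and the function names are all assumptions chosen for illustration (the abstract's models are time-varying, which this sketch does not attempt).

```python
import math

def reliability(lam, mu, t_end, steps=10_000):
    """Reliability R(t_end) of a duplex system, by RK4 integration.

    Assumed states: 0 = both units up, 1 = one unit up (other under
    repair), 2 = system failed (absorbing). Rates are constant.
    """
    def deriv(p):
        p0, p1, _ = p
        return (
            -2.0 * lam * p0 + mu * p1,          # leave 0 by a failure, return by repair
            2.0 * lam * p0 - (lam + mu) * p1,   # enter 1 from 0, leave by repair or failure
            lam * p1,                           # second failure is absorbing
        )

    p = (1.0, 0.0, 0.0)
    h = t_end / steps
    for _ in range(steps):
        k1 = deriv(p)
        k2 = deriv(tuple(x + 0.5 * h * k for x, k in zip(p, k1)))
        k3 = deriv(tuple(x + 0.5 * h * k for x, k in zip(p, k2)))
        k4 = deriv(tuple(x + h * k for x, k in zip(p, k3)))
        p = tuple(
            x + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for x, a, b, c, d in zip(p, k1, k2, k3, k4)
        )
    return p[0] + p[1]

def parallel_closed_form(lam, t):
    """Known result for two parallel units without repair."""
    return 2.0 * math.exp(-lam * t) - math.exp(-2.0 * lam * t)

r_no_repair = reliability(0.1, 0.0, 5.0)
r_with_repair = reliability(0.1, 1.0, 5.0)
```

Even this toy model needs three states and four transitions written out by hand; the state count grows quickly with the number of components, which is the bottleneck that motivates automating model formulation.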
Newmark local time stepping on high-performance computing architectures
NASA Astrophysics Data System (ADS)
Rietmann, Max; Grote, Marcus; Peter, Daniel; Schenk, Olaf
2017-04-01
In multi-scale complex media, finite element meshes often require areas of local refinement, creating small elements that can dramatically reduce the global time-step for wave-propagation problems due to the CFL condition. Local time stepping (LTS) algorithms allow an explicit time-stepping scheme to adapt the time-step to the element size, allowing near-optimal time-steps everywhere in the mesh. We develop an efficient multilevel LTS-Newmark scheme and implement it in a widely used continuous finite element seismic wave-propagation package. In particular, we extend the standard LTS formulation with adaptations to continuous finite element methods that can be implemented very efficiently with very strong element-size contrasts (more than 100x). We present both synthetic validation examples and large-scale, realistic application examples, run on large CPU and GPU clusters, to demonstrate the performance and applicability of the method and implementation on thousands of CPU cores and hundreds of GPUs.
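The reason a few small elements penalize the whole mesh, and how a multilevel scheme avoids that, can be shown with a short sketch. It assumes the usual CFL-type bound dt <= C * h / c and a power-of-two level hierarchy; the Courant number, the level rule, and the function names are illustrative assumptions, not details of the authors' LTS-Newmark implementation.

```python
def cfl_time_step(h, c, courant=0.5):
    """Largest stable step for an element of size h and wave speed c
    under the assumed bound dt <= courant * h / c."""
    return courant * h / c

def lts_levels(sizes, speeds, courant=0.5):
    """Assign each element a level p so that it steps with dt_coarse / 2**p.

    dt_coarse is the step allowed by the most permissive element; the
    power-of-two hierarchy is an assumption made for this sketch.
    """
    local = [cfl_time_step(h, c, courant) for h, c in zip(sizes, speeds)]
    dt_coarse = max(local)
    levels = []
    for dt in local:
        p = 0
        while dt_coarse / (2 ** p) > dt:
            p += 1
        levels.append(p)
    return dt_coarse, levels

def element_updates(levels):
    """Element updates per coarse step: (global stepping, local stepping)."""
    finest = 2 ** max(levels)
    return finest * len(levels), sum(2 ** p for p in levels)

# One element is 100x smaller than the largest, as in a strongly refined mesh.
dt_coarse, levels = lts_levels([1.0, 0.5, 0.01], [1.0, 1.0, 1.0])
global_cost, lts_cost = element_updates(levels)
```

With global stepping every element takes the step dictated by the smallest one, whereas with local stepping only the refined element pays for its own small step. The count above ignores the extra work at level interfaces, so it is an upper bound on the savings rather than a performance prediction.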
Tsai, Ming-Chi; Tsui, Fu-Chiang; Wagner, Michael M
2007-10-11
Performing fast data analysis to detect disease outbreaks plays a critical role in real-time biosurveillance. In this paper, we described and evaluated an Algorithm Distribution Manager Service (ADMS) based on grid technologies, which dynamically partitions and distributes detection algorithms across multiple computers. We compared the execution time to perform the analysis on a single computer and on a grid network (3 computing nodes) with and without dynamic algorithm distribution. We found that algorithms with long runtimes completed approximately three times earlier in the distributed environment than on a single computer, while short-runtime algorithms performed worse in the distributed environment. A dynamic algorithm distribution approach also performed better than a static algorithm distribution approach. This pilot study shows great potential to reduce lengthy analysis time through dynamic algorithm partitioning and parallel processing, and provides the opportunity of distributing algorithms from a client to remote computers in a grid network.
Novel photonic bandgap based architectures for quantum computers and networks
NASA Astrophysics Data System (ADS)
Guney, Durdu
All of the approaches for quantum information processing have their own advantages, but unfortunately also their own drawbacks. Ideally, one would merge the most attractive features of those different approaches in a single technology. We envision that large-scale photonic crystal (PC) integrated circuits and fibers could be the basis for robust and compact quantum circuits and processors of the next generation quantum computers and networking devices. Cavity QED, solid-state, and (non)linear optical models for computing, and optical fiber approach for communications are the most promising candidates to be improved through this novel technology. In our work, we consider both digital and analog quantum computing. In the digital domain, we first perform gate-level analysis. To achieve this task, we solve the Jaynes-Cummings Hamiltonian with time-dependent coupling parameters under the dipole and rotating-wave approximations for a 3D PC single-mode cavity with a sufficiently high Q-factor. We then exploit the results to show how to create a maximally entangled state of two atoms and how to implement several quantum logic gates: a dual-rail Hadamard gate, a dual-rail NOT gate, and a SWAP gate. In all of these operations, we synchronize atoms, as opposed to previous studies with PCs. The method has the potential for extension to N-atom entanglement, universal quantum logic operations, and the implementation of other useful, cavity QED-based quantum information processing tasks. In the next part of the digital domain, we study circuit-level implementations. We design and simulate an integrated teleportation and readout circuit on a single PC chip. The readout part of our device can not only be used on its own but can also be integrated with other compatible optical circuits to achieve atomic state detection. Further improvement of the device in terms of compactness and robustness is possible by integrating with sources and detectors in the optical regime. In the analog
Efficient computer algebra algorithms for polynomial matrices in control design
NASA Technical Reports Server (NTRS)
Baras, J. S.; Macenany, D. C.; Munach, R.
1989-01-01
The theory of polynomial matrices plays a key role in the design and analysis of multi-input multi-output control and communications systems using frequency domain methods. Examples include coprime factorizations of transfer functions, canonical realizations from matrix fraction descriptions, and the transfer function design of feedback compensators. Typically, such problems abstract in a natural way to the need to solve systems of Diophantine equations or systems of linear equations over polynomials. These and other problems involving polynomial matrices can in turn be reduced to polynomial matrix triangularization procedures, a result which is not surprising given the importance of matrix triangularization techniques in numerical linear algebra. Matrices with entries from a field and Gaussian elimination play a fundamental role in understanding the triangularization process. Polynomial matrices, however, have entries from a ring, for which Gaussian elimination is not defined, and triangularization is accomplished by what is quite properly called Euclidean elimination. Unfortunately, the numerical stability and sensitivity issues which accompany floating point approaches to Euclidean elimination are not very well understood. New algorithms are presented which circumvent entirely such numerical issues through the use of exact, symbolic methods in computer algebra. The use of such error-free algorithms guarantees that the results are accurate to within the precision of the model data--the best that can be hoped for. Care must be taken in the design of such algorithms due to the phenomenon of intermediate expression swell.
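The smallest instance of Euclidean elimination is reducing a two-entry polynomial column [a(s), b(s)]^T to [g(s), 0]^T, where g is the greatest common divisor. The sketch below does this with exact rational arithmetic from Python's `fractions` module, so no rounding occurs. It is only meant to illustrate the idea of exact, symbolic elimination; the coefficient-list representation (lowest degree first) and the function names are assumptions, and this is not the authors' algorithm.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (highest-degree terms)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(a, b):
    """Divide a by b exactly; coefficients are listed lowest degree first."""
    a, b = trim(a), trim(b)
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    r = [Fraction(x) for x in a]
    while len(r) >= len(b):
        shift = len(r) - len(b)
        factor = r[-1] / Fraction(b[-1])
        q[shift] = factor
        for i, coeff in enumerate(b):
            r[i + shift] -= factor * coeff
        r = trim(r)  # the leading term cancels exactly, so the degree drops
    return q, r

def euclidean_eliminate(a, b):
    """Reduce the column [a, b]^T to [g, 0]^T; return the monic g."""
    a = [Fraction(x) for x in trim(a)]
    b = [Fraction(x) for x in trim(b)]
    while b:
        _, r = poly_divmod(a, b)
        a, b = b, r
    if a:
        lead = a[-1]
        a = [c / lead for c in a]
    return a

# (s + 1)(s + 2) and (s + 1)(s + 3) share the factor s + 1.
g = euclidean_eliminate([2, 3, 1], [3, 4, 1])
```

Because every step is exact, the common factor is recovered without any tolerance parameter. The price is that numerators and denominators of intermediate coefficients can grow quickly on larger problems, which is the expression swell the abstract warns about.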
Using animation to help students learn computer algorithms.
Catrambone, Richard; Seay, A Fleming
2002-01-01
This paper compares the effects of graphical study aids and animation on the problem-solving performance of students learning computer algorithms. Prior research has found inconsistent effects of animation on learning, and we believe this is partly attributable to animations not being designed to convey key information to learners. We performed an instructional analysis of the to-be-learned algorithms and designed the teaching materials based on that analysis. Participants studied stronger or weaker text-based information about the algorithm, and then some participants additionally studied still frames or an animation. Across 2 studies, learners who studied materials based on the instructional analysis tended to outperform other participants on both near and far transfer tasks. Animation also aided performance, particularly for participants who initially read the weaker text. These results suggest that animation might be added to curricula as a way of improving learning without needing revisions of existing texts and materials. Actual or potential applications of this research include the development of animations for learning complex systems as well as guidelines for determining when animations can aid learning.
Adaptation of the anelastic solver EULAG to high performance computing architectures.
NASA Astrophysics Data System (ADS)
Wójcik, Damian; Ciżnicki, Miłosz; Kopta, Piotr; Kulczewski, Michał; Kurowski, Krzysztof; Piotrowski, Zbigniew; Rojek, Krzysztof; Rosa, Bogdan; Szustak, Łukasz; Wyrzykowski, Roman
2014-05-01
In recent years there has been widespread interest in employing heterogeneous and hybrid supercomputing architectures for geophysical research. An especially promising application for modern supercomputing architectures is numerical weather prediction (NWP). Adapting traditional NWP codes to new machines based on multi- and many-core processors, such as GPUs, makes it possible to increase computational efficiency and decrease energy consumption. This offers a unique opportunity to develop simulations with finer grid resolutions and computational domains larger than ever before. Further, it makes it possible to extend the range of scales represented in the model so that the accuracy of representation of the simulated atmospheric processes can be improved. Consequently, it can improve the quality of weather forecasts. A coalition of Polish scientific institutions launched a project aimed at adapting the EULAG fluid solver for future high-performance computing platforms. EULAG is currently being implemented as a new dynamical core of the COSMO Consortium weather prediction framework. The solver code combines features of stencil and pointwise computations. Its communication scheme consists of both halo exchange subroutines and global reduction functions. Within the project, two main modules of EULAG, namely the MPDATA advection and the iterative GCR elliptic solver, are analyzed and optimized. Relevant techniques have been chosen and applied to accelerate code execution on modern HPC architectures: stencil decomposition, block decomposition (with weighting analysis between computation and communication), reduction of inter-cache communication by partitioning of cores into independent teams, cache reuse, and vectorization. Experiments with matching computational domain topology to cluster topology are performed as well. The parallel formulation was extended from pure MPI to a hybrid MPI-OpenMP approach. Porting to GPU using CUDA directives is in progress. Preliminary results of performance of the
NASA Technical Reports Server (NTRS)
Forman, P.; Moses, K.
1979-01-01
A brief description of a SIFT (Software Implemented Fault Tolerance) Flight Control Computer with emphasis on implementation is presented. A multiprocessor system that relies on software-implemented fault detection and reconfiguration algorithms is described. A high level of reliability and fault tolerance is achieved by the replication of computing tasks among processing units.
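One way replicated tasks can yield fault detection in software is a majority vote over the replicas' results. The sketch below is an illustrative assumption, not the SIFT implementation described in the report: the function name `vote`, the unit labels, and the returned list of suspect units are all hypothetical.

```python
from collections import Counter


def vote(results):
    """Majority-vote over replicated results.

    `results` maps a processing-unit label to the value that unit computed.
    Returns (majority_value, suspects), where suspects are the units that
    disagreed with the majority and could be candidates for reconfiguration.
    """
    counts = Counter(results.values())
    value, n = counts.most_common(1)[0]
    if n <= len(results) // 2:
        raise RuntimeError("no majority among replicas")
    suspects = sorted(unit for unit, r in results.items() if r != value)
    return value, suspects


# Three hypothetical units run the same task; one returns a corrupted value.
print(vote({"p1": 42, "p2": 42, "p3": 41}))
```

With three replicas a single faulty unit is outvoted and identified; a reconfiguration step could then stop scheduling tasks on the suspect unit.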
Survey of Computational Algorithms for MicroRNA Target Prediction
Yue, Dong; Liu, Hui; Huang, Yufei
2009-01-01
MicroRNAs (miRNAs) are non-coding RNAs, 19 to 25 nucleotides long, known to possess important post-transcriptional regulatory functions. Identifying the target genes that miRNAs regulate is important for understanding their specific biological functions. Usually, miRNAs down-regulate target genes through binding to the complementary sites in the 3' untranslated region (UTR) of the targets. In part due to the large number of miRNAs and potential targets, an experiment-based prediction design would be extremely laborious and economically unfavorable. However, since the bindings of the animal miRNAs are not a perfect one-to-one match with the complementary sites of their targets, it is difficult to predict targets of animal miRNAs by assessing their alignment to the 3' UTRs of potential targets. Consequently, sophisticated computational approaches for miRNA target prediction are being considered as essential methods in miRNA research. We surveyed most of the current computational miRNA target prediction algorithms in this paper. In particular, we provided a mathematical definition and formulated the problem of target prediction under the framework of statistical classification. Moreover, we summarized the features of miRNA-target pairs in target prediction approaches and discussed these approaches according to two categories: the rule-based and the data-driven approaches. The rule-based approach derives the classifier mainly from biological prior knowledge and important observations from biological experiments, whereas the data-driven approach builds statistical models using the training data and makes predictions based on the models. Finally, we tested a few different algorithms on a set of experimentally validated true miRNA-target pairs [1] and a set of false miRNA-target pairs derived from miRNA overexpression experiments [2]. Receiver Operating Characteristic (ROC) curves were drawn to show the performances of these algorithms. PMID:20436875
Oryspayev, Dossay; Aktulga, Hasan Metin; Sosonkina, Masha; Maris, Pieter; Vary, James P.
2015-07-14
Sparse matrix vector multiply (SpMVM) is an important kernel that frequently arises in high performance computing applications. Due to its low arithmetic intensity, several approaches have been proposed in the literature to improve its scalability and efficiency in large scale computations. In this paper, our target systems are high end multi-core architectures and we use a hybrid Message Passing Interface (MPI) + Open Multi-Processing (OpenMP) programming model for parallelism. We analyze the performance of a recently proposed implementation of the distributed symmetric SpMVM, originally developed for large sparse symmetric matrices arising in ab initio nuclear structure calculations. We also study important features of this implementation and compare it with previously reported implementations that do not exploit the underlying symmetry. Our SpMVM implementations leverage the hybrid paradigm to efficiently overlap expensive communications with computations. Our main comparison criterion is the "CPU core hours" metric, which is the main measure of resource usage on supercomputers. We analyze the effects of a topology-aware mapping heuristic using a simplified network load model. Furthermore, we have tested the different SpMVM implementations on two large clusters with 3D Torus and Dragonfly topologies. Our results show that the distributed SpMVM implementation that exploits matrix symmetry and hides communication yields the best value for the "CPU core hours" metric and significantly reduces data movement overheads.
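To make the idea of exploiting symmetry concrete, here is a minimal serial sketch, not the distributed MPI + OpenMP implementation studied in the paper. It assumes the matrix is stored as a hypothetical list of `(i, j, value)` triples covering only the upper triangle (i <= j); each stored off-diagonal entry is applied twice, once for A[i][j] and once for its mirror A[j][i], so roughly half of the matrix data needs to be read.

```python
def symmetric_spmv(n, upper, x):
    """Compute y = A @ x for a symmetric n x n matrix A.

    `upper` holds only the entries with i <= j as (i, j, value) triples.
    """
    y = [0.0] * n
    for i, j, v in upper:
        y[i] += v * x[j]
        if i != j:
            # Mirror contribution from the implicit lower-triangle entry A[j][i].
            y[j] += v * x[i]
    return y


# Upper triangle of [[2, 1, 0], [1, 3, 4], [0, 4, 5]].
upper = [(0, 0, 2), (0, 1, 1), (1, 1, 3), (1, 2, 4), (2, 2, 5)]
print(symmetric_spmv(3, upper, [1, 2, 3]))
```

The mirrored update writes to `y[j]` as well as `y[i]`, so a threaded or distributed version has to coordinate those writes; that extra communication and synchronization is part of what the hybrid implementations described above must manage.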
NASA Astrophysics Data System (ADS)
Ullah, Muhammed Zafar
Neural networks and fuzzy logic are two key technologies that have recently received growing attention in solving real-world, nonlinear, time-variant problems. Because of their learning and/or reasoning capabilities, these techniques do not need a mathematical model of the system, which may be difficult, if not impossible, to obtain for complex systems. One of the major problems in the portable-equipment and electric-vehicle world is secondary cell charging, which shows non-linear characteristics. Portable electronic equipment, such as notebook computers, cordless and cellular telephones, and cordless electric lawn tools, uses batteries in increasing numbers. Consumers demand fast charging times, increased battery lifetime, and fuel gauge capabilities. All of these demands require that the state-of-charge within a battery be known. Fast charging of secondary cells is a problem that is difficult to solve using conventional techniques. Charge control is important in fast charging, preventing overcharging and improving battery life. This research work provides a quick and reliable approach to charger design using Neural-Fuzzy technology, which learns the exact battery charging characteristics. Neural-Fuzzy technology is an intelligent combination of neural networks with fuzzy logic that learns system behavior by using system input-output data rather than mathematical modeling. The primary objective of this research is to improve the secondary cell charging algorithm and to achieve faster charging times based on neural network and fuzzy logic techniques. A new controller architecture will also be developed for implementing the charging algorithm for the secondary battery.
Synchronized computational architecture for generalized bilateral control of robot arms
NASA Technical Reports Server (NTRS)
Szakaly, Zoltan F. (Inventor)
1991-01-01
A master six-degree-of-freedom Force Reflecting Hand Controller (FRHC) is available at a master site where a received image displays, in essentially real time, a remote robotic manipulator which is being controlled in the corresponding six degrees of freedom by command signals which are transmitted to the remote site in accordance with the movement of the FRHC at the master site. Software is user-initiated at the master site in order to establish the basic system conditions, and then a physical movement of the FRHC in Cartesian space is reflected at the master site by six absolute numbers that are sensed, translated and computed as a difference signal relative to the earlier position. The change in position is then transmitted in that differential signal form over a high speed synchronized bilateral communication channel which simultaneously returns robot-sensed response information to the master site as forces applied to the FRHC so that the FRHC reflects the feel of what is taking place at the remote site. A system-wide clock rate is selected at a sufficiently high rate that the operator at the master site experiences the Force Reflecting operation in real time.
A Framework to Simulate Semiconductor Devices Using Parallel Computer Architecture
NASA Astrophysics Data System (ADS)
Kumar, Gaurav; Singh, Mandeep; Bulusu, Anand; Trivedi, Gaurav
2016-10-01
Device simulations have become an integral part of semiconductor technology, addressing many issues (short channel effects, narrow width effects, hot-electron effects) as it moves into the nano regime and helping to continue Moore's Law. TCAD provides a simulation environment to design and develop novel devices, and thus a leap forward in studying their electrical behaviour in advance. In this paper, a parallel 2D simulator for semiconductor devices using the Discontinuous Galerkin Finite Element Method (DG-FEM) is presented. The Discontinuous Galerkin (DG) method is used to discretize the essential device equations, and these equations are then analyzed using a suitable methodology to find the solution. The DG method is known to provide a more accurate solution, as it efficiently conserves the flux and easily handles complex geometries. OpenMP is used to parallelize the solution of the device equations on manycore processors, and a speedup of 1.4x is achieved during the assembly process of the discretization. This study is important for more accurate analysis of novel devices (such as FinFET, GAAFET, etc.) on a parallel computing platform and will help us to develop a parallel device simulator which will be able to address this issue efficiently. A case study of a PN junction diode is presented to show the effectiveness of the proposed approach.
Design of a fault tolerant airborne digital computer. Volume 1: Architecture
NASA Technical Reports Server (NTRS)
Wensley, J. H.; Levitt, K. N.; Green, M. W.; Goldberg, J.; Neumann, P. G.
1973-01-01
This volume is concerned with the architecture of a fault tolerant digital computer for an advanced commercial aircraft. All of the computations of the aircraft, including those presently carried out by analogue techniques, are to be carried out in this digital computer. Among the important qualities of the computer are the following: (1) The capacity is to be matched to the aircraft environment. (2) The reliability is to be selectively matched to the criticality and deadline requirements of each of the computations. (3) The system is to be readily expandable and contractible. (4) The design is to be appropriate to post-1975 technology. Three candidate architectures are discussed and assessed in terms of the above qualities. Of the three candidates, a newly conceived architecture, Software Implemented Fault Tolerance (SIFT), provides the best match to the above qualities. In addition, SIFT is particularly simple and believable. The other candidates, the Bus Checker System (BUCS), also newly conceived in this project, and the Hopkins multiprocessor, are potentially more efficient than SIFT in the use of redundancy, but otherwise are not as attractive.
Scalable quantum computer architecture with coupled donor-quantum dot qubits
Schenkel, Thomas; Lo, Cheuk Chi; Weis, Christoph; Lyon, Stephen; Tyryshkin, Alexei; Bokor, Jeffrey
2014-08-26
A quantum bit computing architecture includes a plurality of single spin memory donor atoms embedded in a semiconductor layer, a plurality of quantum dots arranged with the semiconductor layer and aligned with the donor atoms, wherein a first voltage applied across at least one pair of the aligned quantum dot and donor atom controls a donor-quantum dot coupling. A method of performing quantum computing in a scalable architecture quantum computing apparatus includes arranging a pattern of single spin memory donor atoms in a semiconductor layer, forming a plurality of quantum dots arranged with the semiconductor layer and aligned with the donor atoms, applying a first voltage across at least one aligned pair of a quantum dot and donor atom to control a donor-quantum dot coupling, and applying a second voltage between one or more quantum dots to control a Heisenberg exchange J coupling between quantum dots and to cause transport of a single spin polarized electron between quantum dots.
Parallelizing Navier-Stokes Computations on a Variety of Architectural Platforms
NASA Technical Reports Server (NTRS)
Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.
1997-01-01
We study the computational, communication, and scalability characteristics of a Computational Fluid Dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architectural platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and distributed memory multiprocessors with different topologies - the IBM SP and the Cray T3D. We investigate the impact of various networks connecting the cluster of workstations on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.
An Evaluation of Architectural Platforms for Parallel Navier-Stokes Computations
NASA Technical Reports Server (NTRS)
Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.
1996-01-01
We study the computational, communication, and scalability characteristics of a computational fluid dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architecture platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and distributed memory multiprocessors with different topologies - the IBM SP and the Cray T3D. We investigate the impact of various networks connecting the cluster of workstations on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.
A Cerebellar Neuroprosthetic System: Computational Architecture and in vivo Test
Herreros, Ivan; Giovannucci, Andrea; Taub, Aryeh H.; Hogri, Roni; Magal, Ari; Bamford, Sim; Prueckl, Robert; Verschure, Paul F. M. J.
2014-01-01
Emulating the input–output functions performed by a brain structure opens the possibility for developing neuroprosthetic systems that replace damaged neuronal circuits. Here, we demonstrate the feasibility of this approach by replacing the cerebellar circuit responsible for the acquisition and extinction of motor memories. Specifically, we show that a rat can undergo acquisition, retention, and extinction of the eye-blink reflex even though the biological circuit responsible for this task has been chemically inactivated via anesthesia. This is achieved by first developing a computational model of the cerebellar microcircuit involved in the acquisition of conditioned reflexes and training it with synthetic data generated based on physiological recordings. Secondly, the cerebellar model is interfaced with the brain of an anesthetized rat, connecting the model’s inputs and outputs to afferent and efferent cerebellar structures. As a result, we show that the anesthetized rat, equipped with our neuroprosthetic system, can be classically conditioned to the acquisition of an eye-blink response. However, non-stationarities in the recorded biological signals limit the performance of the cerebellar model. Thus, we introduce an updated cerebellar model and validate it with physiological recordings showing that learning becomes stable and reliable. The resulting system represents an important step toward replacing lost functions of the central nervous system via neuroprosthetics, obtained by integrating a synthetic circuit with the afferent and efferent pathways of a damaged brain region. These results also embody an early example of science-based medicine, where on the one hand the neuroprosthetic system directly validates a theory of cerebellar learning that informed the design of the system, and on the other one it takes a step toward the development of neuro-prostheses that could recover lost learning functions in animals and, in the longer term, humans. PMID:25152887
Cloud identification using genetic algorithms and massively parallel computation
NASA Technical Reports Server (NTRS)
Buckles, Bill P.; Petry, Frederick E.
1996-01-01
As a Guest Computational Investigator under the NASA administered component of the High Performance Computing and Communication Program, we implemented a massively parallel genetic algorithm on the MasPar SIMD computer. Experiments were conducted using Earth Science data in the domains of meteorology and oceanography. Results obtained in these domains are competitive with, and in most cases better than, similar problems solved using other methods. In the meteorological domain, we chose to identify clouds using AVHRR spectral data. Four cloud speciations were used although most researchers settle for three. Results were remarkably consistent across all tests (91% accuracy). Refinements of this method may lead to more timely and complete information for Global Circulation Models (GCMs) that are prevalent in weather forecasting and global environment studies. In the oceanographic domain, we chose to identify ocean currents from a spectrometer having similar characteristics to AVHRR. Here the results were mixed (60% to 80% accuracy). Given that one is willing to run the experiment several times (say 10), then it is acceptable to claim the higher accuracy rating. This problem has never been successfully automated. Therefore, these results are encouraging even though less impressive than the cloud experiment. Successful conclusion of an automated ocean current detection system would impact coastal fishing, naval tactics, and the study of micro-climates. Finally we contributed to the basic knowledge of GA (genetic algorithm) behavior in parallel environments. We developed better knowledge of the use of subpopulations in the context of shared breeding pools and the migration of individuals. Rigorous experiments were conducted based on quantifiable performance criteria. While much of the work confirmed current wisdom, for the first time we were able to submit conclusive evidence. The software developed under this grant was placed in the public domain. An extensive user
ICASE Computer Science Program
NASA Technical Reports Server (NTRS)
1985-01-01
The Institute for Computer Applications in Science and Engineering computer science program is discussed in outline form. Information is given on such topics as problem decomposition, algorithm development, programming languages, and parallel architectures.
Integrating Computer Architectures into the Design of High-Performance Controllers
NASA Technical Reports Server (NTRS)
Jacklin, Stephen A.; Leyland, Jane A.; Warmbrodt, William
1986-01-01
Modern control systems must typically perform real-time identification and control, as well as coordinate a host of other activities related to user interaction, on-line graphics, and file management. This paper discusses five global design considerations that are useful to integrate array processor, multimicroprocessor, and host computer system architecture into versatile, high-speed controllers. Such controllers are capable of very high control throughput, and can maintain constant interaction with the non-real-time or user environment. As an application example, the architecture of a high-speed, closed-loop controller used to actively control helicopter vibration will be briefly discussed. Although this system has been designed for use as the controller for real-time rotorcraft dynamics and control studies in a wind-tunnel environment, the control architecture can generally be applied to a wide range of automatic control applications.
NASA Astrophysics Data System (ADS)
Liu, Lei; Hong, Xiaobin; Wu, Jian; Lin, Jintong
As Grid computing continues to gain popularity in industry and the research community, it also attracts more attention at the consumer level. The large number of users and the high frequency of job requests in the consumer market make it challenging. Clearly, current Client/Server (C/S)-based architectures will become infeasible for supporting large-scale Grid applications due to their poor scalability and poor fault tolerance. In this paper, based on our previous works [1, 2], a novel self-organized architecture to realize a highly scalable and flexible platform for Grids is proposed. Experimental results show that this architecture is suitable and efficient for consumer-oriented Grids.
Advanced entry guidance algorithm with landing footprint computation
NASA Astrophysics Data System (ADS)
Leavitt, James Aaron
The design and performance evaluation of an entry guidance algorithm for future space transportation vehicles is presented. The algorithm performs two functions: on-board trajectory planning and trajectory tracking. The planned longitudinal path is followed by tracking drag acceleration, as is done by the Space Shuttle entry guidance. Unlike the Shuttle entry guidance, lateral path curvature is also planned and followed. A new trajectory planning function for the guidance algorithm is developed that is suitable for suborbital entry and that significantly enhances the overall performance of the algorithm for both orbital and suborbital entry. In comparison with the previous trajectory planner, the new planner produces trajectories that are easier to track, especially near the upper and lower drag boundaries and for suborbital entry. The new planner accomplishes this by matching the vehicle's initial flight path angle and bank angle, and by enforcing the full three-degree-of-freedom equations of motion with control derivative limits. Insights gained from trajectory optimization results contribute to the design of the new planner, giving it near-optimal downrange and crossrange capabilities. Planned trajectories and guidance simulation results are presented that demonstrate the improved performance. Based on the new planner, a method is developed for approximating the landing footprint for entry vehicles in near real-time, as would be needed for an on-board flight management system. The boundary of the footprint is constructed from the endpoints of extreme downrange and crossrange trajectories generated by the new trajectory planner. The footprint algorithm inherently possesses many of the qualities of the new planner, including quick execution, the ability to accurately approximate the vehicle's glide capabilities, and applicability to a wide range of entry conditions. Footprints can be generated for orbital and suborbital entry conditions using a pre
Computer-aided detection of architectural distortion in prior mammograms of interval cancer.
Rangayyan, Rangaraj M; Banik, Shantanu; Desautels, J E Leo
2010-10-01
Architectural distortion is an important sign of breast cancer, but because of its subtlety, it is a common cause of false-negative findings on screening mammograms. This paper presents methods for the detection of architectural distortion in mammograms of interval cancer cases taken prior to the detection of breast cancer using Gabor filters, phase portrait analysis, fractal analysis, and texture analysis. The methods were used to detect initial candidates for sites of architectural distortion in prior mammograms of interval cancer and also normal control cases. A total of 4,224 regions of interest (ROIs) were automatically obtained from 106 prior mammograms of 56 interval cancer cases, including 301 ROIs related to architectural distortion, and from 52 prior mammograms of 13 normal cases. For each ROI, the fractal dimension and Haralick's texture features were computed. Feature selection was performed separately using stepwise logistic regression and stepwise regression. The best results achieved, in terms of the area under the receiver operating characteristics curve, with the features selected by stepwise logistic regression are 0.76 with the Bayesian classifier, 0.73 with Fisher linear discriminant analysis, 0.77 with an artificial neural network based on radial basis functions, and 0.77 with a support vector machine. Analysis of the performance of the methods with free-response receiver operating characteristics indicated a sensitivity of 0.80 at 7.6 false positives per image. The methods have good potential in detecting architectural distortion in mammograms of interval cancer cases.
Noise reduction in selective computational ghost imaging using genetic algorithm
NASA Astrophysics Data System (ADS)
Zafari, Mohammad; Ahmadi-Kandjani, Sohrab; Kheradmand, Reza
2017-03-01
Recently, we have presented a selective computational ghost imaging (SCGI) method as an advanced technique for enhancing the security level of encrypted ghost images. In this paper, we propose a modified method to improve the quality of ghost images reconstructed by the SCGI technique. The method is based on background subtraction using a genetic algorithm (GA), which eliminates background noise and gives background-free ghost images. Analyzing the universal image quality index using experimental data demonstrates the advantage of this modified method. In particular, the calculated value of the image quality index for modified SCGI over 4225 realizations shows an 11-fold improvement with respect to the SCGI technique. This improvement is 20-fold in comparison to the conventional CGI technique.
Development of computer algorithms for radiation treatment planning.
Cunningham, J R
1989-06-01
As a result of an analysis of data relating tissue response to radiation absorbed dose, the ICRU has recommended a target for accuracy of +/- 5% for dose delivery in radiation therapy. This is a difficult overall objective to achieve because of the many steps that make up a course of radiotherapy. The calculation of absorbed dose is only one of the steps, and so to achieve an overall accuracy of better than +/- 5% the accuracy in dose calculation must be better yet. The physics behind the problem is sufficiently complicated that no exact method of calculation has been found, and consequently approximate solutions must be used. The development of computer algorithms for this task involves the search for better and better approximate solutions. To achieve the desired target of accuracy a fairly sophisticated calculation procedure must be used. Only when this is done can we hope to further improve our knowledge of the way in which tissues respond to radiation treatments.
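The argument that the dose calculation must be more accurate than the overall target can be made concrete with a small error budget. The sketch below rests on assumptions that are not in the abstract: that the uncertainties of the individual steps are independent and combine in quadrature, and that the other steps contribute the purely hypothetical values shown. The function name `allowed_dose_calc_error` is likewise illustrative.

```python
import math


def allowed_dose_calc_error(overall, other_steps):
    """Largest dose-calculation uncertainty (in %) that keeps the total within `overall`.

    Assumes independent step uncertainties that add in quadrature:
        overall**2 >= dose_calc**2 + sum(e**2 for e in other_steps)
    """
    remaining = overall**2 - sum(e**2 for e in other_steps)
    if remaining < 0:
        raise ValueError("other steps alone already exceed the overall target")
    return math.sqrt(remaining)


# Hypothetical budget: 5% overall target, two other steps at 3% each.
print(round(allowed_dose_calc_error(5.0, [3.0, 3.0]), 2))
```

Under these assumed numbers the calculation is left with well under the full 5%, which illustrates why each additional step in a course of radiotherapy tightens the requirement on the dose algorithm.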
Design and Analysis of a Neuromemristive Reservoir Computing Architecture for Biosignal Processing
Kudithipudi, Dhireesha; Saleh, Qutaiba; Merkel, Cory; Thesing, James; Wysocki, Bryant
2016-01-01
Reservoir computing (RC) is gaining traction in several signal processing domains, owing to its non-linear stateful computation, spatiotemporal encoding, and reduced training complexity over recurrent neural networks (RNNs). Previous studies have shown the effectiveness of software-based RCs for a wide spectrum of applications. A parallel body of work indicates that realizing RNN architectures using custom integrated circuits and reconfigurable hardware platforms yields significant improvements in power and latency. In this research, we propose a neuromemristive RC architecture, with a doubly twisted toroidal structure, that is validated for biosignal processing applications. We exploit the device mismatch to implement the random weight distributions within the reservoir and propose mixed-signal subthreshold circuits for energy efficiency. A comprehensive analysis is performed to compare the efficiency of the neuromemristive RC architecture in both digital (reconfigurable) and subthreshold mixed-signal realizations. Both Electroencephalogram (EEG) and Electromyogram (EMG) biosignal benchmarks are used for validating the RC designs. The proposed RC architecture demonstrated an accuracy of 90% and 84% for epileptic seizure detection and EMG prosthetic finger control, respectively. PMID:26869876
Parallel multiphysics algorithms and software for computational nuclear engineering
NASA Astrophysics Data System (ADS)
Gaston, D.; Hansen, G.; Kadioglu, S.; Knoll, D. A.; Newman, C.; Park, H.; Permann, C.; Taitano, W.
2009-07-01
There is a growing trend in nuclear reactor simulation to consider multiphysics problems. This can be seen in reactor analysis where analysts are interested in coupled flow, heat transfer and neutronics, and in fuel performance simulation where analysts are interested in thermomechanics with contact coupled to species transport and chemistry. These more ambitious simulations usually motivate some level of parallel computing. Many of the coupling efforts to date utilize simple code coupling or first-order operator splitting, often referred to as loose coupling. While these approaches can produce answers, they usually leave questions of accuracy and stability unanswered. Additionally, the different physics often reside on separate grids which are coupled via simple interpolation, again leaving open questions of stability and accuracy. Utilizing state-of-the-art mathematics and software development techniques, we are deploying next-generation tools for nuclear engineering applications. The Jacobian-free Newton-Krylov (JFNK) method combined with physics-based preconditioning provides the underlying mathematical structure for our tools. JFNK is understood to be a modern multiphysics algorithm, but we are also utilizing its unique properties as a scale bridging algorithm. To facilitate rapid development of multiphysics applications we have developed the Multiphysics Object-Oriented Simulation Environment (MOOSE). Examples from two MOOSE-based applications will be presented: PRONGHORN, our multiphysics gas cooled reactor simulation tool, and BISON, our multiphysics, multiscale fuel performance simulation tool.
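The central idea behind JFNK is that the Krylov solver never needs the Jacobian matrix itself, only its action on a vector, and that action can be approximated by a finite difference of the nonlinear residual. The following is a minimal sketch of that single step, not of a full Newton-Krylov solver. The names `jfnk_matvec` and `residual`, the two-equation test system, and the perturbation-size heuristic are illustrative assumptions and are not taken from MOOSE.

```python
import math

def jfnk_matvec(residual, u, v):
    """Approximate the Jacobian-vector product J(u) v without forming J.

    Uses the forward difference (F(u + eps*v) - F(u)) / eps.
    """
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_v == 0.0:
        return [0.0] * len(u)
    norm_u = math.sqrt(sum(x * x for x in u))
    # Common heuristic (an assumption here): scale sqrt(machine epsilon) by the state size.
    eps = math.sqrt(2.2e-16) * (1.0 + norm_u) / norm_v
    f0 = residual(u)
    f1 = residual([ui + eps * vi for ui, vi in zip(u, v)])
    return [(b - a) / eps for a, b in zip(f0, f1)]

def residual(u):
    # Hypothetical nonlinear system, for illustration only.
    return [u[0] ** 2 + u[1] - 3.0, u[0] + u[1] ** 2 - 5.0]

# The analytic Jacobian at u = (1, 2) is [[2, 1], [1, 4]], so J v = (1, -3) for v = (1, -1).
jv = jfnk_matvec(residual, [1.0, 2.0], [1.0, -1.0])
```

In a complete JFNK solver this product would be handed to a Krylov method such as GMRES in place of an explicit matrix-vector multiply.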
Parallel Multiphysics Algorithms and Software for Computational Nuclear Engineering
D. Gaston; G. Hansen; S. Kadioglu; D. A. Knoll; C. Newman; H. Park; C. Permann; W. Taitano
2009-08-01
There is a growing trend in nuclear reactor simulation to consider multiphysics problems. This can be seen in reactor analysis where analysts are interested in coupled flow, heat transfer and neutronics, and in fuel performance simulation where analysts are interested in thermomechanics with contact coupled to species transport and chemistry. These more ambitious simulations usually motivate some level of parallel computing. Many of the coupling efforts to date utilize simple 'code coupling' or first-order operator splitting, often referred to as loose coupling. While these approaches can produce answers, they usually leave questions of accuracy and stability unanswered. Additionally, the different physics often reside on separate grids which are coupled via simple interpolation, again leaving open questions of stability and accuracy. Utilizing state-of-the-art mathematics and software development techniques, we are deploying next-generation tools for nuclear engineering applications. The Jacobian-free Newton-Krylov (JFNK) method combined with physics-based preconditioning provides the underlying mathematical structure for our tools. JFNK is understood to be a modern multiphysics algorithm, but we are also utilizing its unique properties as a scale bridging algorithm. To facilitate rapid development of multiphysics applications we have developed the Multiphysics Object-Oriented Simulation Environment (MOOSE). Examples from two MOOSE-based applications will be presented: PRONGHORN, our multiphysics gas cooled reactor simulation tool, and BISON, our multiphysics, multiscale fuel performance simulation tool.
NASA Astrophysics Data System (ADS)
Nakamura, Kazuhiro; Yamamoto, Masatoshi; Takagi, Kazuyoshi; Takagi, Naofumi
In this paper, a fast and memory-efficient VLSI architecture for output probability computations of continuous Hidden Markov Models (HMMs) is presented. These computations are the most time-consuming part of HMM-based recognition systems. High-speed VLSI architectures with small registers and low-power dissipation are required for the development of mobile embedded systems with capable human interfaces. We demonstrate store-based block parallel processing (StoreBPP) for output probability computations and present a VLSI architecture that supports it. When the number of HMM states is adequate for accurate recognition, compared with conventional stream-based block parallel processing (StreamBPP) architectures, the proposed architecture requires fewer registers and processing elements and less processing time. The processing elements used in the StreamBPP architecture are identical to those used in the StoreBPP architecture. From a VLSI architectural viewpoint, a comparison shows the efficiency of the proposed architecture through efficient use of registers for storing input feature vectors and intermediate results during computation.
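For orientation, the output probability of a continuous HMM state is commonly modelled as a Gaussian density evaluated on each input feature vector. The sketch below assumes a single diagonal-covariance Gaussian per state, which the abstract does not specify, and computes log-probabilities for a block of frames against every state. The function names are hypothetical, and the loop is a plain software illustration rather than the StoreBPP or StreamBPP schedule.

```python
import math

def log_output_prob(frame, mean, var):
    """Log density of one feature vector under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (x - m) ** 2 / v
        for x, m, v in zip(frame, mean, var)
    )

def block_log_output_probs(frames, states):
    """Return a frames-by-states table of log output probabilities.

    `states` is a list of (mean, var) pairs, one per HMM state.
    """
    return [[log_output_prob(f, mean, var) for mean, var in states] for f in frames]

# Two 2-dimensional frames scored against two hypothetical states.
states = [([0.0, 0.0], [1.0, 1.0]), ([1.0, -1.0], [0.5, 2.0])]
table = block_log_output_probs([[0.0, 0.0], [1.0, -1.0]], states)
```

Every frame-state pair requires one multiply-accumulate per feature dimension, which is why this step dominates recognition time and benefits from parallel hardware.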
NASA Technical Reports Server (NTRS)
Kao, M. H.; Bodenheimer, R. E.
1976-01-01
The tse computer's capability of achieving image congruence between temporal and multiple images with misregistration due to rotational differences is reported. The coordinate transformations are obtained and a general algorithm is devised to perform image rotation using tse operations very efficiently. The details of this algorithm as well as its theoretical implications are presented. Step-by-step procedures of image registration are described in detail. Numerous examples are also employed to demonstrate the correctness and the effectiveness of the algorithms, and conclusions and recommendations are made.
Service-Oriented Architecture for NVO and TeraGrid Computing
NASA Technical Reports Server (NTRS)
Jacob, Joseph; Miller, Craig; Williams, Roy; Steenberg, Conrad; Graham, Matthew
2008-01-01
The National Virtual Observatory (NVO) Extensible Secure Scalable Service Infrastructure (NESSSI) is a Web service architecture and software framework that enables Web-based astronomical data publishing and processing on grid computers such as the National Science Foundation's TeraGrid. Characteristics of this architecture include the following: (1) Services are created, managed, and upgraded by their developers, who are trusted users of computing platforms on which the services are deployed. (2) Service jobs can be initiated by means of Java or Python client programs run on a command line or with Web portals. (3) Access is granted within a graduated security scheme in which the size of a job that can be initiated depends on the level of authentication of the user.
Chappard, D; Legrand, E; Haettich, B; Chalès, G; Auvinet, B; Eschard, J P; Hamelin, J P; Baslé, M F; Audran, M
2001-11-01
Trabecular bone has been reported as having two-dimensional (2-D) fractal characteristics at the histological level, a finding correlated with biomechanical properties. However, several fractal dimensions (D) are known and computational ways to obtain them vary considerably. This study compared three algorithms on the same series of bone biopsies, to obtain the Kolmogorov, Minkowski-Bouligand, and mass-radius fractal dimensions. The relationships with histomorphometric descriptors of the 2-D trabecular architecture were investigated. Bone biopsies were obtained from 148 osteoporotic male patients. Bone volume (BV/TV), trabecular characteristics (Tb.N, Tb.Sp, Tb.Th), strut analysis, star volumes (marrow spaces and trabeculae), inter-connectivity index, and Euler-Poincaré number were computed. The box-counting method was used to obtain the Kolmogorov dimension (D(k)), the dilatation method for the Minkowski-Bouligand dimension (D(MB)), and the sandbox for the mass-radius dimension (D(MR)) and lacunarity (L). Logarithmic relationships were observed between BV/TV and the fractal dimensions. The best correlation was obtained with D(MR) and the lowest with D(MB). Lacunarity was correlated with descriptors of the marrow cavities (ICI, star volume, Tb.Sp). Linear relationships were observed among the three fractal techniques which appeared highly correlated. A cluster analysis of all histomorphometric parameters provided a tree with three groups of descriptors: for trabeculae (Tb.Th, strut); for marrow cavities (Euler, ICI, Tb.Sp, star volume, L); and for the complexity of the network (Tb.N and the three D's). A sole fractal dimension cannot be used instead of the classic 2-D descriptors of architecture; D rather reflects the complexity of branching trabeculae. Computation time is also an important determinant when choosing one of these methods.
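The box-counting estimate mentioned above can be sketched as follows: cover the image with square boxes of decreasing size, count the boxes that contain at least one foreground pixel, and take the slope of log(count) against log(1/size). This is a minimal illustration on synthetic pixel sets, not the authors' implementation; the function names and the choice of box sizes are assumptions.

```python
import math

def box_count(pixels, size):
    """Number of size-by-size boxes containing at least one foreground pixel."""
    return len({(x // size, y // size) for x, y in pixels})

def box_counting_dimension(pixels, sizes):
    """Least-squares slope of log N(size) versus log(1/size)."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(pixels, s)) for s in sizes]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

sizes = [1, 2, 4, 8, 16]
filled_square = [(x, y) for x in range(64) for y in range(64)]
straight_line = [(x, 0) for x in range(64)]
d_square = box_counting_dimension(filled_square, sizes)
d_line = box_counting_dimension(straight_line, sizes)
```

A filled region gives a slope of 2 and a straight line a slope of 1; a trabecular network would be expected to fall between these two limits.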
ASIC-based architecture for the real-time computation of 2D convolution with large kernel size
NASA Astrophysics Data System (ADS)
Shao, Rui; Zhong, Sheng; Yan, Luxin
2015-12-01
Bidimensional convolution is a low-level processing algorithm of interest in many areas, but its high computational cost constrains the size of the kernels, especially in real-time embedded systems. This paper presents a hardware architecture for the ASIC-based implementation of 2-D convolution with medium-large kernels. To improve the efficiency of on-chip storage resources and to reduce off-chip bandwidth, a data-reuse cache structure is proposed. Multi-block SPRAM is used to cross-cache image data, and an on-chip ping-pong operation takes full advantage of data reuse in the convolution calculation; on this basis a new ASIC data scheduling scheme and overall architecture are designed. Experimental results show that the structure can perform real-time convolution with templates as large as 40 × 32, improves the utilization of on-chip memory bandwidth and on-chip memory resources, and maximizes data throughput while reducing the need for off-chip memory bandwidth.
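To make the computational cost concrete, here is a minimal software sketch of direct 2-D convolution over the "valid" region of an image. It is a reference loop only and says nothing about the paper's caching or scheduling scheme; the function name is an assumption. Each output pixel costs one multiply-accumulate per kernel element, so a 40 × 32 kernel needs 1280 of them per pixel.

```python
def convolve2d_valid(image, kernel):
    """Direct 2-D convolution restricted to positions where the kernel fits fully."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    # The kernel is flipped in both axes; without the flip
                    # this would be cross-correlation instead of convolution.
                    acc += image[r + i][c + j] * kernel[kh - 1 - i][kw - 1 - j]
            row.append(acc)
        out.append(row)
    return out

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
shifted = convolve2d_valid(image, [[1, 0], [0, 0]])
```

Neighbouring output pixels read almost the same input window, which is the data reuse that an on-chip cache can exploit.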
Exploration and Evaluation of Nanometer Low-power Multi-core VLSI Computer Architectures
2015-03-01
reliable system that can be utilized for producing state-of-the-art computer architectures, especially for silicon implementations. The research...stitch elements together via placing each layout and routing wire between known pins. Early layout editors, such as the Magic Layout Editor, had...within the University of Berkeley mainly for a public domain VLSI tool called Magic [15]. The Tcl language is useful in that it has an easy-to-learn
Integrated Computer-Aided Manufacturing (ICAM) Architecture. Part 3. Volume 8. Technology Transfer.
1983-09-01
VALIDATION IS EXPERT REVIEW, MULTIPLE MODELS ALLOW FOR SIMPLIFICATION OF THE CONCEPTS AND SYNTAX FOR EACH REVIEW AND THEREBY ENHANCE COMMUNICATION...THE FACTORY OF THE FUTURE CAN BE DEFINED. THE VOUGHT CORPORATION HAS BEEN AWARDED A "TO-BE" ARCHITECTURE CONTRACT FOR A "CONCEPTUAL DESIGN FOR...COMPUTER INTEGRATED MANUFACTURING (CIM)" FOR THE AEROSPACE FACTORY OF THE FUTURE. (AS A POINT OF INTEREST, THE VOUGHT CORPORATION HAS INDEPENDENTLY DECIDED
A high-speed algorithm for computation of fractional differentiation and fractional integration.
Fukunaga, Masataka; Shimizu, Nobuyuki
2013-05-13
A high-speed algorithm for computing fractional differentiations and fractional integrations in fractional differential equations is proposed. In this algorithm, the stored data are not the function to be differentiated or integrated but the weighted integrals of the function. The intervals of integration for the memory can be increased without loss of accuracy as the computing time-step n increases. The computing cost varies as n log n, as opposed to n^2 for standard algorithms.
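For contrast, the quadratic cost of a standard method is easy to see in code. The sketch below uses the Grünwald-Letnikov approximation as a representative "standard algorithm", which is an assumption on our part since the abstract does not name one. At step i the sum runs over the full history of i + 1 stored samples, so n steps cost on the order of n^2 operations. This is the baseline, not the proposed n log n method.

```python
import math

def gl_weights(alpha, n):
    """Grünwald-Letnikov weights w_k = (-1)^k * binom(alpha, k), k = 0..n."""
    w = [1.0]
    for k in range(1, n + 1):
        w.append(w[-1] * (1.0 - (alpha + 1.0) / k))
    return w

def gl_fractional_derivative(f_values, alpha, h):
    """Order-alpha derivative on a uniform grid of step h (first-order accurate)."""
    n = len(f_values) - 1
    w = gl_weights(alpha, n)
    out = []
    for i in range(n + 1):
        # Full-memory sum: every earlier sample contributes at every step.
        s = sum(w[k] * f_values[i - k] for k in range(i + 1))
        out.append(s / h ** alpha)
    return out

h = 0.005
grid = [i * h for i in range(201)]            # t in [0, 1]
half_derivative = gl_fractional_derivative(grid, 0.5, h)   # f(t) = t
exact_at_one = 1.0 / math.gamma(1.5)          # D^0.5 t = t^0.5 / Gamma(1.5)
```

A negative `alpha` in the same formula gives a fractional integral, so the two operations share the same memory cost.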
Target Impact Detection Algorithm Using Computer-aided Design (CAD) Model Geometry
2014-09-01
UNCLASSIFIED AD-E403 558 Technical Report ARMET-TR-13024 TARGET IMPACT DETECTION ALGORITHM USING COMPUTER-AIDED DESIGN (CAD) MODEL GEOMETRY...This report documents a method and algorithm to export geometry from a three-dimensional, computer-aided design (CAD) model in a format that can be
1989-01-01
J. and Brown, Susan E. SPECT Single-Photon Emission Computed Tomography: A Primer. New York, New York: The Society of Nuclear Medicine, 1986. [Eri84...Rigid Objects," Computer Graphics, vol. 17, no. 3, pp. 65-69, July 1983. 457 [Fuc85] Fuchs, Henry; Goldfeather, Jack; Hultquist, Jeff P.; Spach, Susan ...Hay87] Hayes, John P.; Mudge, Trevor N.; Stout, Quentin F.; Colley Stephen; and Palmer, John. "Architecture of a Hypercube Supercomputer," Hypercube
An Architecture and Supporting Environment of Service-Oriented Computing Based-On Context Awareness
NASA Astrophysics Data System (ADS)
Ma, Tianxiao; Wu, Gang; Huang, Jun
Service-oriented computing (SOC) is emerging as an important computing paradigm of the near future. Based on context awareness, this paper proposes an architecture for SOC. An ontology-based definition of context in open environments such as the Internet is given. The paper also proposes a supporting environment for context-aware SOC, which focuses on on-demand service composition and the evolution of context awareness. Finally, a reference implementation of the supporting environment based on OSGi[11] is given.
NASA Astrophysics Data System (ADS)
Dushanov, E.; Kholmurodov, Kh.; Aru, G.; Korenkov, V.; Smith, W.; Ohno, Y.; Narumi, T.; Morimoto, G.; Taiji, M.; Yasuoka, K.
2009-05-01
This report compares the performance of the DL_POLY general-purpose molecular dynamics simulation package on the LIT JINR computing cluster CICC with various communication systems. The comparison involved two cluster architectures: Gigabit Ethernet and InfiniBand technologies, respectively. The code performance tests include some comparison of the CICC cluster with the special-purpose computer MDGRAPE-3 developed at RIKEN for a high-speed acceleration of the MD (molecular dynamics) without a fixed cutoff. The DL_POLY benchmark covers a set of typical MD system simulations detailed below.
Semi-analytic texturing algorithm for polygon computer-generated holograms.
Lee, Wooyoung; Im, Dajeong; Paek, Jeongyeup; Hahn, Joonku; Kim, Hwi
2014-12-15
A texturing method for the semi-analytic polygon computer-generated hologram synthesis algorithm is studied. Through this, the full potential and development direction of semi-analytic polygon computer-generated holograms are discussed and compared to those of the conventional numerical algorithm of polygon computer-generated hologram generation based on the fast Fourier transform and bilinear interpolation. The theoretical hurdle of the semi-analytic texturing algorithm is identified and an approach to resolve this problem is proposed. A key mathematical approximation in the angular spectrum computer-generated hologram computation, as well as the trade-offs between texturing effects and computational efficiencies, are analyzed through numerical simulation. In this fundamental study, the theoretical potential of the semi-analytic polygon computer-generated hologram algorithm is revealed and the ultimate goal of research into the algorithm is clarified.
Computer architectures: varied blueprints will lift system speeds to dizzying heights
Weiss, R.
1985-05-30
This paper examines the new developments in computer research. Predictions are made regarding computer design and use. These include the general continued use of reduced-instruction-set computers (RISCs); the tailoring of computer architectures to individual programming languages; and software as the key factor in hardware design. The authors believe that the less complex structure of RISC chips will shorten the time from design through production, and that parallel processing will become economically feasible for heavy numerical lookup, artificial intelligence, and other types of jobs. A special coprocessor, the programmed logic machine (PLM), is discussed. It was designed to pipeline Prolog execution. The Aquarius I is also examined. It consists of a host processor based on the NCR/32 chip set, a special interface that buffers and sets priorities for memory traffic, and the PLM itself.
An efficient three-dimensional Poisson solver for SIMD high-performance-computing architectures
NASA Technical Reports Server (NTRS)
Cohl, H.
1994-01-01
We present an algorithm that solves the three-dimensional Poisson equation on a cylindrical grid. The technique uses a finite-difference scheme with operator splitting. This splitting maps the banded structure of the operator matrix into a two-dimensional set of tridiagonal matrices, which are then solved in parallel. Our algorithm couples FFT techniques with the well-known ADI (Alternating Direction Implicit) method for solving elliptic PDEs, and the implementation is extremely well suited for a massively parallel environment like the SIMD architecture of the MasPar MP-1. Due to the highly recursive nature of our problem, we believe that our method is highly efficient, as it avoids excessive interprocessor communication.
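Each of the tridiagonal systems produced by such a splitting can be solved independently in linear time. As a minimal sketch, the Thomas algorithm below solves one system; it is a standard serial method chosen here for illustration and is not necessarily the solver used on the MasPar MP-1. The function name and the array layout (`lower` and `upper` each hold n - 1 entries) are assumptions.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve A x = rhs for tridiagonal A using the Thomas algorithm.

    lower[i] is A[i+1][i], diag[i] is A[i][i], upper[i] is A[i][i+1].
    No pivoting is done, so A should be diagonally dominant or
    symmetric positive definite.
    """
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = upper[0] / diag[0] if n > 1 else 0.0
    d_prime[0] = rhs[0] / diag[0]
    # Forward elimination.
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c_prime[i - 1]
        if i < n - 1:
            c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i - 1] * d_prime[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x

# 1-D discrete Laplacian (2 on the diagonal, -1 off it) with known solution 1, 2, 3, 4.
solution = solve_tridiagonal([-1.0] * 3, [2.0] * 4, [-1.0] * 3, [0.0, 0.0, 0.0, 5.0])
```

Because the systems do not depend on one another, a SIMD machine can assign one system, or one slice of the grid, to each processing element.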
ERIC Educational Resources Information Center
Arumi, Francisco N.
Computer programs capable of describing the thermal behavior of buildings are used to help architectural students understand environmental systems. The Numerical Simulation Laboratory at the Architectural School of the University of Texas at Austin was developed to provide the necessary software capable of simulating the energy transactions…
NASA Astrophysics Data System (ADS)
Taghibakhsh, F.; Karim, K. S.
2007-03-01
Cone beam computed tomography (CBCT) has been recently reported using flat panel imagers (FPI). Here, detector technology capable of high speed imaging, high spatial resolution, large volume coverage, better contrast resolution and, in particular, lowered patient dose is required. Employing active matrix flat panel imagers (AMFPIs) as cone beam CT detectors has been proposed as a solution for improving volume coverage, contrast and resolution; however, clinical evaluations have shown that they suffer from low speed read out. Unlike passive pixel architecture which is currently the state-of-the-art technology for AMFPIs, our preliminary studies have shown that novel amplified pixel sensor (APS) architectures can overcome the low readout speed, and moreover, they provide gain which can be traded for higher frame rate and lower X-ray doses. Although APS architectures can meet the high dynamic range and low noise requirements of CT imaging, linearity and variations between pixel characteristics are major issues. In this study we will investigate novel APS architectures to address these concerns.
CCARES: A computer algorithm for the reliability analysis of laminated CMC components
NASA Technical Reports Server (NTRS)
Duffy, Stephen F.; Gyekenyesi, John P.
1993-01-01
Structural components produced from laminated CMC (ceramic matrix composite) materials are being considered for a broad range of aerospace applications that include various structural components for the national aerospace plane, the space shuttle main engine, and advanced gas turbines. Specifically, these applications include segmented engine liners, small missile engine turbine rotors, and exhaust nozzles. Use of these materials allows for improvements in fuel efficiency due to increased engine temperatures and pressures, which in turn generate more power and thrust. Furthermore, this class of materials offers significant potential for raising the thrust-to-weight ratio of gas turbine engines by tailoring directions of high specific reliability. The emerging composite systems, particularly those with silicon nitride or silicon carbide matrix, can compete with metals in many demanding applications. Laminated CMC prototypes have already demonstrated functional capabilities at temperatures approaching 1400 C, which is well beyond the operational limits of most metallic materials. Laminated CMC material systems have several mechanical characteristics which must be carefully considered in the design process. Test bed software programs are needed that incorporate stochastic design concepts that are user friendly, computationally efficient, and have flexible architectures that readily incorporate changes in design philosophy. The CCARES (Composite Ceramics Analysis and Reliability Evaluation of Structures) program is representative of an effort to fill this need. CCARES is a public domain computer algorithm, coupled to a general purpose finite element program, which predicts the fast fracture reliability of a structural component under multiaxial loading conditions.
Numerical linear algebra algorithms and software
NASA Astrophysics Data System (ADS)
Dongarra, Jack J.; Eijkhout, Victor
2000-11-01
The increasing availability of advanced-architecture computers has a significant effect on all spheres of scientific computation, including algorithm research and software development in numerical linear algebra. Linear algebra - in particular, the solution of linear systems of equations - lies at the heart of most calculations in scientific computing. This paper discusses some of the recent developments in linear algebra designed to exploit these advanced-architecture computers. We discuss two broad classes of algorithms: those for dense, and those for sparse matrices.
New algorithms for the symmetric tridiagonal eigenvalue computation
Pan, V.
1994-12-31
The author presents new algorithms that accelerate the bisection method for the symmetric eigenvalue problem. The algorithms rely on some new techniques, which include acceleration of Newton's iteration and can also be further applied to acceleration of some other iterative processes, in particular, of iterative algorithms for approximating polynomial zeros.
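As background, the plain bisection method that these algorithms accelerate can be sketched in a few lines. It relies on a Sturm-sequence count: the number of negative pivots in the factorization of T - xI equals the number of eigenvalues below x. The code is the unaccelerated baseline only and does not reproduce the author's techniques; the function names and the tolerance are assumptions.

```python
def count_eigs_below(diag, off, x):
    """Number of eigenvalues of the symmetric tridiagonal matrix that are < x."""
    count = 0
    d = 1.0
    for i, a in enumerate(diag):
        b2 = off[i - 1] ** 2 if i > 0 else 0.0
        d = a - x - b2 / d
        if d == 0.0:
            d = 1e-300  # nudge an exact zero pivot to avoid division by zero
        if d < 0.0:
            count += 1
    return count

def kth_eigenvalue(diag, off, k, tol=1e-12):
    """k-th smallest eigenvalue (k = 0 is the smallest), found by bisection."""
    n = len(diag)
    radius = [
        (abs(off[i - 1]) if i > 0 else 0.0) + (abs(off[i]) if i < n - 1 else 0.0)
        for i in range(n)
    ]
    # Gershgorin discs bound the whole spectrum.
    lo = min(a - r for a, r in zip(diag, radius))
    hi = max(a + r for a, r in zip(diag, radius))
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if count_eigs_below(diag, off, mid) > k:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# 3-by-3 matrix with 2 on the diagonal and -1 off it; eigenvalues are 2 - sqrt(2), 2, 2 + sqrt(2).
eigs = [kth_eigenvalue([2.0, 2.0, 2.0], [-1.0, -1.0], k) for k in range(3)]
```

Bisection gains one bit of accuracy per step, which is the slow linear convergence that a Newton-type acceleration aims to improve.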
NASA Astrophysics Data System (ADS)
Malmir, Hessam; Sahimi, Muhammad; Tabar, M. Reza Rahimi
2016-12-01
Packing of cubic particles arises in a variety of problems, ranging from biological materials to colloids and the fabrication of new types of porous materials with controlled morphology. The properties of such packings may also be relevant to problems involving suspensions of cubic zeolites, precipitation of salt crystals during CO2 sequestration in rock, and intrusion of fresh water in aquifers by saline water. Not much is known, however, about the structure and statistical descriptors of such packings. We present a detailed simulation and microstructural characterization of packings of nonoverlapping monodisperse cubic particles, following up on our preliminary results [H. Malmir et al., Sci. Rep. 6, 35024 (2016), 10.1038/srep35024]. A modification of the random sequential addition (RSA) algorithm has been developed to generate such packings, and a variety of microstructural descriptors, including the radial distribution function, the face-normal correlation function, two-point probability and cluster functions, the lineal-path function, the pore-size distribution function, and surface-surface and surface-void correlation functions, have been computed, along with the specific surface and mean chord length of the packings. The results indicate the existence of both spatial and orientational long-range order as the packing density increases. The maximum packing fraction achievable with the RSA method is about 0.57, which represents the limit for a structure similar to liquid crystals.
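The basic RSA loop is simple to state: propose a random position, accept it if the new particle overlaps none of those already placed, and otherwise discard it. The sketch below is a deliberately simplified version with axis-aligned cubes in a box with hard walls, so the overlap test reduces to comparing centre coordinates. The study itself uses a modified RSA for cubes, whose details the abstract does not give, so this code does not reproduce it or its packing fractions. All names and parameter values are illustrative assumptions.

```python
import random

def rsa_aligned_cubes(box, side, max_attempts, seed=0):
    """Random sequential addition of axis-aligned cubes inside a cubic box.

    Returns the list of accepted cube centres.
    """
    rng = random.Random(seed)
    centers = []
    for _ in range(max_attempts):
        c = tuple(rng.uniform(side / 2, box - side / 2) for _ in range(3))
        # Two aligned cubes are disjoint if their centres are at least
        # one side length apart along some axis.
        if all(any(abs(c[i] - p[i]) >= side for i in range(3)) for p in centers):
            centers.append(c)
    return centers

centers = rsa_aligned_cubes(box=10.0, side=1.0, max_attempts=2000)
packing_fraction = len(centers) * 1.0 ** 3 / 10.0 ** 3
```

As the box fills, almost every proposal is rejected, which is why RSA saturates at a packing fraction well below that of an ordered arrangement.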
A constrained conjugate gradient algorithm for computed tomography
Azevedo, S.G.; Goodman, D.M.
1994-11-15
Image reconstruction from projections of x-ray, gamma-ray, protons and other penetrating radiation is a well-known problem in a variety of fields, and is commonly referred to as computed tomography (CT). Various analytical and series expansion methods of reconstruction have been used in the past to provide three-dimensional (3D) views of some interior quantity. The difficulties of these approaches lie in the cases where (a) the number of views attainable is limited, (b) the Poisson (or other) uncertainties are significant, (c) quantifiable knowledge of the object is available, but not implementable, or (d) other limitations of the data exist. We have adapted a novel nonlinear optimization procedure developed at LLNL to address limited-data image reconstruction problems. The technique, known as nonlinear least squares with general constraints or constrained conjugate gradients (CCG), has been successfully applied to a number of signal and image processing problems, and is now of great interest to the image reconstruction community. Previous applications of this algorithm to deconvolution problems and x-ray diffraction images for crystallography have shown great promise.
Multiprocessor architecture: Synthesis and evaluation
NASA Technical Reports Server (NTRS)
Standley, Hilda M.
1990-01-01
Multiprocessor computer architecture evaluation for structural computations is the focus of the research effort described. Results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application-specific architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty in analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization covering the spectrum from no shared memory (all local memory) to one global access memory. These performance determiners may be loosely classified as being software or hardware related. This distinction is not clear or even appropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is specified as given. Effort directed toward the removal of the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.
NASA Technical Reports Server (NTRS)
Felippa, Carlos A.
1989-01-01
This is the fifth of a set of five volumes which describe the software architecture for the Computational Structural Mechanics Testbed. Derived from NICE, an integrated software system developed at Lockheed Palo Alto Research Laboratory, the architecture is composed of the command language (CLAMP), the command language interpreter (CLIP), and the data manager (GAL). Volumes 1, 2, and 3 (NASA CR's 178384, 178385, and 178386, respectively) describe CLAMP and CLIP and the CLIP-processor interface. Volumes 4 and 5 (NASA CR's 178387 and 178388, respectively) describe GAL and its low-level I/O. CLAMP, an acronym for Command Language for Applied Mechanics Processors, is designed to control the flow of execution of processors written for NICE. Volume 5 describes the low-level data management component of the NICE software. It is intended only for advanced programmers involved in maintenance of the software.
Shen, Chia-Ping; Chen, Wei-Hsin; Chen, Jia-Ming; Hsu, Kai-Ping; Lin, Jeng-Wei; Chiu, Ming-Jang; Chen, Chi-Huang; Lai, Feipei
2010-01-01
Today, many bio-signals such as Electroencephalography (EEG) are recorded in digital format. Analyzing these digital bio-signals to extract useful health information is an emerging research area in biomedical engineering. In this paper, a bio-signal analyzing cloud computing architecture, called BACCA, is proposed. The system has been designed with the purpose of seamless integration into the National Taiwan University Health Information System. Based on the concept of .NET Service Oriented Architecture, the system integrates heterogeneous platforms, protocols, as well as applications. In this system, we add modern analytic functions such as approximated entropy and adaptive support vector machine (SVM). It is shown that the overall accuracy of EEG bio-signal analysis has increased to nearly 98% for different data sets, including open-source and clinical data sets.
Ultra-Fast Data-Mining Hardware Architecture Based on Stochastic Computing
Oliver, Antoni; Alomar, Miquel L.
2015-01-01
Minimal hardware implementations able to cope with the processing of large amounts of data in reasonable times are highly desired in our information-driven society. In this work we review the application of stochastic computing to probabilistic-based pattern-recognition analysis of huge database sets. The proposed technique consists in the hardware implementation of a parallel architecture implementing a similarity search of data with respect to different pre-stored categories. We design pulse-based stochastic-logic blocks to obtain an efficient pattern recognition system. The proposed architecture speeds up the screening process of huge databases by a factor of 7 when compared to a conventional digital implementation using the same hardware area. PMID:25955274
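The appeal of stochastic computing for low-area hardware can be shown with its best-known primitive. A value in [0, 1] is encoded as the probability of a 1 in a random bitstream, and a single AND gate then multiplies two independent streams. The sketch below simulates this in software; it illustrates the general principle only and is not the pulse-based logic blocks or the similarity-search architecture of the paper. The function names, stream length, and seed are assumptions.

```python
import random

def to_bitstream(p, length, rng):
    """Encode probability p as a random stream of 0/1 bits."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def stochastic_multiply(p, q, length=100_000, seed=0):
    """Estimate p * q by AND-ing two independent stochastic bitstreams."""
    rng = random.Random(seed)
    a = to_bitstream(p, length, rng)
    b = to_bitstream(q, length, rng)
    # The fraction of positions where both bits are 1 approximates p * q.
    return sum(x & y for x, y in zip(a, b)) / length

estimate = stochastic_multiply(0.5, 0.4)
```

The estimate is noisy, with an error that shrinks roughly as one over the square root of the stream length, so precision is traded for very small logic.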
NASA Technical Reports Server (NTRS)
Wright, Mary A.; Regelbrugge, Marc E.; Felippa, Carlos A.
1989-01-01
This is the fourth of a set of five volumes which describe the software architecture for the Computational Structural Mechanics Testbed. Derived from NICE, an integrated software system developed at Lockheed Palo Alto Research Laboratory, the architecture is composed of the command language CLAMP, the command language interpreter CLIP, and the data manager GAL. Volumes 1, 2, and 3 (NASA CR's 178384, 178385, and 178386, respectively) describe CLAMP and CLIP and the CLIP-processor interface. Volumes 4 and 5 (NASA CR's 178387 and 178388, respectively) describe GAL and its low-level I/O. CLAMP, an acronym for Command Language for Applied Mechanics Processors, is designed to control the flow of execution of processors written for NICE. Volume 4 describes the nominal-record data management component of the NICE software. It is intended for all users.
Ultra-fast data-mining hardware architecture based on stochastic computing.
Morro, Antoni; Canals, Vincent; Oliver, Antoni; Alomar, Miquel L; Rossello, Josep L
2015-01-01
Minimal hardware implementations able to cope with the processing of large amounts of data in reasonable times are highly desired in our information-driven society. In this work we review the application of stochastic computing to probabilistic-based pattern-recognition analysis of huge database sets. The proposed technique consists in the hardware implementation of a parallel architecture implementing a similarity search of data with respect to different pre-stored categories. We design pulse-based stochastic-logic blocks to obtain an efficient pattern recognition system. The proposed architecture speeds up the screening process of huge databases by a factor of 7 when compared to a conventional digital implementation using the same hardware area.
ERIC Educational Resources Information Center
Uwakonye, Obioha; Alagbe, Oluwole; Oluwatayo, Adedapo; Alagbe, Taiye; Alalade, Gbenga
2015-01-01
As a result of globalization of digital technology, intellectual discourse on what constitutes the basic body of architectural knowledge to be imparted to future professionals has been on the increase. This digital revolution has brought to the fore the need to review the already overloaded architectural education curriculum of Nigerian schools of…
NASA Astrophysics Data System (ADS)
Lei, Weiwei; Li, Kai
2016-12-01
There are four recursive algorithms used in the computation of the fully normalized associated Legendre functions (FNALFs): the standard forward column algorithm, the standard forward row algorithm, the recursive algorithm between every other degree, and the Belikov algorithm. These algorithms were evaluated in terms of their first relative numerical accuracy, second relative numerical accuracy, and computation speed and efficiency. The results show that when the degree n reaches 3000, both the recursive algorithm between every other degree and the Belikov algorithm are applicable for | cos θ | ∈ [0, 1], with the latter having better second relative numerical accuracy than the former but a slower computation speed. For | cos θ | ∈ [0, 1], the standard forward column algorithm, the recursive algorithm between every other degree, and the Belikov algorithm are applicable within degree n of 1900, and the standard forward column algorithm has the highest computation speed. The standard forward column algorithm is applicable for | cos θ | ∈ [0, 1] within degree n of 1900. This algorithm's range of applicability decreases as the degree increases beyond 1900; however, it remains applicable within a minute range when | cos θ | is approximately equal to 1. The standard forward row algorithm has the smallest range of applicability: it is only applicable within degree n of 100 for | cos θ | ∈ [0, 1], and its range of applicability decreases rapidly when the degree is greater than 100. The results of this research are expected to be useful to researchers in choosing the best algorithms for use in the computation of the FNALFs.
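As a point of reference, the standard forward column recursion can be written compactly. The sketch below assumes the usual geodetic (4π) full normalization and the textbook recurrence coefficients; it is not taken from the paper, and it omits the scaling tricks needed to avoid underflow at the high degrees discussed above. The function name is an assumption.

```python
import math

def fnalf_forward_column(nmax, theta):
    """Fully normalized associated Legendre functions P[n][m] at colatitude theta.

    Sectoral terms P[m][m] are seeded first; each column m is then
    advanced in degree n with a three-term recurrence.
    """
    t, u = math.cos(theta), math.sin(theta)
    P = [[0.0] * (n + 1) for n in range(nmax + 1)]
    P[0][0] = 1.0
    if nmax >= 1:
        P[1][1] = math.sqrt(3.0) * u
    for m in range(2, nmax + 1):
        P[m][m] = u * math.sqrt((2 * m + 1) / (2 * m)) * P[m - 1][m - 1]
    for m in range(nmax + 1):
        for n in range(m + 1, nmax + 1):
            a = math.sqrt((2 * n - 1) * (2 * n + 1) / ((n - m) * (n + m)))
            value = a * t * P[n - 1][m]
            if n >= m + 2:
                b = math.sqrt(
                    (2 * n + 1) * (n + m - 1) * (n - m - 1)
                    / ((n - m) * (n + m) * (2 * n - 3))
                )
                value -= b * P[n - 2][m]
            P[n][m] = value
    return P

theta = 1.0
P = fnalf_forward_column(10, theta)
```

With this normalization the squares of the functions of a given degree n sum to 2n + 1 over all orders m, which gives a quick sanity check of an implementation.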
Quantum computation: algorithms and implementation in quantum dot devices
NASA Astrophysics Data System (ADS)
Gamble, John King
In this thesis, we explore several aspects of both the software and hardware of quantum computation. First, we examine the computational power of multi-particle quantum random walks in terms of distinguishing mathematical graphs. We study both interacting and non-interacting multi-particle walks on strongly regular graphs, proving some limitations on distinguishing powers and presenting extensive numerical evidence indicative of interactions providing more distinguishing power. We then study the recently proposed adiabatic quantum algorithm for Google PageRank, and show that it exhibits power-law scaling for realistic WWW-like graphs. Turning to hardware, we next analyze the thermal physics of two nearby 2D electron gases (2DEGs), and show that an analogue of the Coulomb drag effect exists for heat transfer. In some distance and temperature regimes, this heat transfer is more significant than phonon dissipation channels. After that, we study the dephasing of two-electron states in a single silicon quantum dot. Specifically, we consider dephasing due to the electron-phonon coupling and charge noise, separately treating orbital and valley excitations. In an ideal system, dephasing due to charge noise is strongly suppressed due to a vanishing dipole moment. However, introduction of disorder or anharmonicity leads to large effective dipole moments, and hence possibly strong dephasing. Building on this work, we next consider more realistic systems, including structurally disordered systems. We present experiment and theory, which demonstrate energy levels that vary with quantum dot translation, implying a structurally disordered system. Finally, we turn to the issues of valley mixing and valley-orbit hybridization, which occur due to atomic-scale disorder at quantum well interfaces. We develop a new theoretical approach to study these effects, which we name the disorder-expansion technique. We demonstrate that this method successfully reproduces atomistic tight-binding techniques
In this paper we develop and computationally test three implicit enumeration algorithms for solving the asymmetric traveling salesman problem. All...three algorithms use the assignment problem relaxation of the traveling salesman problem with subtour elimination similar to the previous approaches by...previous subtour elimination algorithms and (2) the 1-arborescence approach of Held and Karp for the asymmetric traveling salesman problem.
Apparatuses and Methods for Producing Runtime Architectures of Computer Program Modules
NASA Technical Reports Server (NTRS)
Abi-Antoun, Marwan Elia (Inventor); Aldrich, Jonathan Erik (Inventor)
2013-01-01
Apparatuses and methods for producing run-time architectures of computer program modules. One embodiment includes creating an abstract graph from the computer program module and from containment information corresponding to the computer program module, wherein the abstract graph has nodes including types and objects, and wherein the abstract graph relates an object to a type, and wherein for a specific object the abstract graph relates the specific object to a type containing the specific object; and creating a runtime graph from the abstract graph, wherein the runtime graph is a representation of the true runtime object graph, wherein the runtime graph represents containment information such that, for a specific object, the runtime graph relates the specific object to another object that contains the specific object.
Scott, Gregory D; Fryer, Allison D; Jacoby, David B
2013-01-01
The quantitative histological analysis of airway innervation using tissue sections is challenging because of the sparse and patchy distribution of nerves. Here we demonstrate a method using a computational approach to measure airway nerve architecture that will allow for more complete nerve quantification and the measurement of structural peripheral neuroplasticity in lung development and disease. We demonstrate how our computer analysis outperforms manual scoring in quantifying three-dimensional nerve branchpoints and lengths. In murine lungs, we detected airway epithelial nerves that have not been previously identified because of their patchy distribution, and we quantified their three-dimensional morphology using our computer mapping approach. Furthermore, we show the utility of this approach in bronchoscopic forceps biopsies of human airways, as well as the esophagus, colon, and skin.
NASA Astrophysics Data System (ADS)
Chen, Yufeng; Wu, Zebin; Sun, Le; Wei, Zhihui; Li, Yonglong
2016-04-01
With the gradual increase in the spatial and spectral resolution of hyperspectral images, the size of image data becomes larger and larger, and the complexity of processing algorithms is growing, which poses a big challenge to efficient massive hyperspectral image processing. Cloud computing technologies distribute computing tasks to a large number of computing resources for handling large data sets without the limitation of memory and computing resource of a single machine. This paper proposes a parallel pixel purity index (PPI) algorithm for unmixing massive hyperspectral images based on a MapReduce programming model for the first time in the literature. According to the characteristics of hyperspectral images, we describe the design principle of the algorithm, illustrate the main cloud unmixing processes of PPI, and analyze the time complexity of serial and parallel algorithms. Experimental results demonstrate that the parallel implementation of the PPI algorithm on the cloud can effectively process big hyperspectral data and accelerate the algorithm.
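The pixel purity index idea can be sketched in a few lines of plain Python arranged as a map step and a reduce step. This is an illustration only: the abstract does not describe how the authors partition work on the cloud, so the choice of one map task per random "skewer", and the names `map_skewer`, `reduce_counts` and `ppi`, are assumptions.

```python
import random
from collections import Counter

def map_skewer(pixels, seed):
    """Map step (assumed granularity: one task per skewer). Project every
    pixel spectrum onto a random direction and emit (pixel_index, 1) for
    the two pixels with extreme projections."""
    rng = random.Random(seed)
    bands = len(pixels[0])
    skewer = [rng.gauss(0.0, 1.0) for _ in range(bands)]
    proj = [sum(p * s for p, s in zip(px, skewer)) for px in pixels]
    lo = min(range(len(pixels)), key=proj.__getitem__)
    hi = max(range(len(pixels)), key=proj.__getitem__)
    return [(lo, 1), (hi, 1)]

def reduce_counts(emitted):
    """Reduce step: sum the emitted ones per pixel index."""
    counts = Counter()
    for index, one in emitted:
        counts[index] += one
    return counts

def ppi(pixels, n_skewers=200):
    """Serial driver standing in for the MapReduce runtime."""
    emitted = []
    for seed in range(n_skewers):
        emitted.extend(map_skewer(pixels, seed))
    return reduce_counts(emitted)
```

Pixels with high counts are the purest candidates for endmembers; mixed pixels lie inside the convex hull of the pure ones and are essentially never extreme along a random direction.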
Mental Computation or Standard Algorithm? Children's Strategy Choices on Multi-Digit Subtractions
ERIC Educational Resources Information Center
Torbeyns, Joke; Verschaffel, Lieven
2016-01-01
This study analyzed children's use of mental computation strategies and the standard algorithm on multi-digit subtractions. Fifty-eight Flemish 4th graders of varying mathematical achievement level were individually offered subtractions that either stimulated the use of mental computation strategies or the standard algorithm in one choice and two…
A Unified Computational Architecture for Preprocessing Visual Information in Space and Time.
NASA Astrophysics Data System (ADS)
Skrzypek, Josef
1986-06-01
The success of autonomous mobile robots depends on the ability to understand continuously changing scenery. Present techniques for analysis of images are not always suitable because, in the sequential paradigm, computation of visual functions based on absolute values of stimuli is inefficient. Important aspects of visual information are encoded in discontinuities of intensity, hence a representation in terms of relative values seems advantageous. We present the computing architecture of a massively parallel vision module which optimizes the detection of relative intensity changes in space and time. Visual information must remain constant despite variation in ambient light level or velocity of target and robot. Constancy can be achieved by normalizing motion and lightness scales. In both cases, basic computation involves a comparison of the center pixels with the context of surrounding values. Therefore, a similar computing architecture, composed of three functionally different and hierarchically arranged layers of overlapping operators, can be used for two integrated parts of the module. The first part maintains high sensitivity to spatial changes by reducing noise and normalizing the lightness scale. The result is used by the second part to maintain high sensitivity to temporal discontinuities and to compute relative motion information. Simulation results show that response of the module is proportional to contrast of the stimulus and remains constant over the whole domain of intensity. It is also proportional to velocity of motion limited to any small portion of the visual field. Uniform motion throughout the visual field results in constant response, independent of velocity. Spatial and temporal intensity changes are enhanced because, computationally, the module resembles the behavior of a DOG function.
Reading, Writing and Algorithms: Computer Literacy in the Schools.
ERIC Educational Resources Information Center
Neufeld, Helen H.
Given the state of the art of computing in 1982, it is not necessary to know a computer language to use a computer. Three aspects of the current state of computing make it mandatory that educators from elementary through postsecondary levels rapidly incorporate this skill into the curriculum: (1) computers have permeated society--they are used in…
Azmy, Yousry
2014-06-10
We employ the Integral Transport Matrix Method (ITMM) as the kernel of new parallel solution methods for the discrete ordinates approximation of the within-group neutron transport equation. The ITMM abandons the repetitive mesh sweeps of the traditional source iterations (SI) scheme in favor of constructing stored operators that account for the direct coupling factors among all the cells' fluxes and between the cells' and boundary surfaces' fluxes. The main goals of this work are to develop the algorithms that construct these operators and employ them in the solution process, determine the most suitable way to parallelize the entire procedure, and evaluate the behavior and parallel performance of the developed methods with increasing number of processes, P. The fastest observed parallel solution method, Parallel Gauss-Seidel (PGS), was used in a weak scaling comparison with the PARTISN transport code, which uses the source iteration (SI) scheme parallelized with the Koch-Baker-Alcouffe (KBA) method. Compared to the state-of-the-art SI-KBA with diffusion synthetic acceleration (DSA), this new method, even without acceleration/preconditioning, is competitive for optically thick problems as P is increased to the tens of thousands range. For the most optically thick cells tested, PGS reduced execution time by an approximate factor of three for problems with more than 130 million computational cells on P = 32,768. Moreover, the trend of the SI-DSA execution time generally rises more steeply with increasing P than the PGS trend. Furthermore, the PGS method outperforms SI for the periodic heterogeneous layers (PHL) configuration problems. The PGS method outperforms SI and SI-DSA on as few as P = 16 for PHL problems and reduces execution time by a factor of ten or more for all problems considered with more than 2 million computational cells on P = 4,096.
NASA Astrophysics Data System (ADS)
Niknam, Mehdi; Thulasiraman, Parimala; Camorlinga, Sergio
2010-11-01
Connected component labelling is an essential step in image processing. We provide a parallel version of Suzuki's sequential connected component algorithm in order to speed up the labelling process. Also, we modify the algorithm to enable labelling of gray-scale images. Because of the data dependencies in the algorithm, we used a pipeline-like method to exploit parallelism. The parallel algorithm achieved a speedup of 2.5 for an image size of 256 × 256 pixels using 4 processing threads.
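For readers unfamiliar with the labelling task itself, the following Python sketch shows a plain sequential baseline. It is not Suzuki's algorithm and not the parallel version described above: it is the textbook two-pass method with a union-find table, restricted to binary images and 4-connectivity, and the function name `label_components` is hypothetical.

```python
def label_components(image):
    """Two-pass 4-connected labelling of nonzero pixels.
    Returns a grid of labels 1..k, with 0 for background."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # parent[i] is the union-find parent of provisional label i

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # First pass: assign provisional labels and record equivalences.
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))      # new label is its own root
                labels[r][c] = len(parent) - 1
            else:
                labels[r][c] = up or left
                if up and left:
                    root_up, root_left = find(up), find(left)
                    if root_up != root_left:
                        parent[max(root_up, root_left)] = min(root_up, root_left)

    # Second pass: replace each provisional label by a compact final label.
    final = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = final.setdefault(root, len(final) + 1)
    return labels
```

The dependence of each pixel on its upper and left neighbours in the first pass is the kind of data dependency that makes a naive parallel split difficult.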
Hung, Peter W; Paik, David S; Napel, Sandy; Yee, Judy; Jeffrey, R Brooke; Steinauer-Gebauer, Andreas; Min, Juno; Jathavedam, Ashwin; Beaulieu, Christopher F
2002-02-01
Three bowel distention-measuring algorithms for use at computed tomographic (CT) colonography were developed, validated in phantoms, and applied to a human CT colonographic data set. The three algorithms are the cross-sectional area method, the moving spheres method, and the segmental volume method. Each algorithm effectively quantified distention, but accuracy varied between methods. Clinical feasibility was demonstrated. Depending on the desired spatial resolution and accuracy, each algorithm can quantitatively depict colonic diameter in CT colonography.
Linnanto, Juha Matti; Freiberg, Arvi
2014-10-06
We have used different computational methods to study structural architecture, and light-harvesting and energy transfer properties of the photosynthetic unit of filamentous anoxygenic phototrophs. Due to the huge number of atoms in the photosynthetic unit, a combination of atomistic and coarse methods was used for electronic structure calculations. The calculations reveal that the light energy absorbed by the peripheral chlorosome antenna complex transfers efficiently via the baseplate and the core B808–866 antenna complexes to the reaction center complex, in general agreement with the present understanding of this complex system.
NASA Technical Reports Server (NTRS)
Fijany, Amir; Toomarian, Benny N.
2000-01-01
There has been significant improvement in the performance of VLSI devices, in terms of size, power consumption, and speed, in recent years, and this trend may continue for the near future. However, it is a well known fact that there are major obstacles, i.e., the physical limitation of feature size reduction and the ever increasing cost of foundry, that would prevent the long term continuation of this trend. This has motivated the exploration of some fundamentally new technologies that are not dependent on the conventional feature size approach. Such technologies are expected to enable scaling to continue to the ultimate level, i.e., molecular and atomistic size. Quantum computing, quantum dot-based computing, DNA based computing, biologically inspired computing, etc., are examples of such new technologies. In particular, quantum dot-based computing using Quantum-dot Cellular Automata (QCA) has recently been intensely investigated as a promising new technology capable of offering significant improvement over conventional VLSI in terms of reduction of feature size (and hence increase in integration level), reduction of power consumption, and increase of switching speed. Quantum dot-based computing and memory in general, and QCA specifically, are intriguing to NASA due to their high packing density (10(exp 11) - 10(exp 12) per square cm), low power consumption (no transfer of current), and potentially higher radiation tolerance. Under the Revolutionary Computing Technology (RTC) Program at the NASA/JPL Center for Integrated Space Microelectronics (CISM), we have been investigating the potential applications of QCA for the space program. To this end, exploiting the intrinsic features of QCA, we have designed novel QCA-based circuits for coplanar (i.e., single layer) and compact implementation of a class of data permutation matrices, a class of interconnection networks, and a bit-serial processor. Building upon these circuits, we have developed novel algorithms and QCA
2014-11-01
Integrated Cognitive-neuroscience Architectures for Understanding Sensemaking (ICArUS): A Computational Basis for ICArUS Challenge...4. TITLE AND SUBTITLE Integrated Cognitive-neuroscience Architectures for Understanding Sensemaking (ICArUS): A Computational Basis for ICArUS...Advanced Research Projects Activity) program ICArUS (Integrated Cognitive-neuroscience Architectures for Understanding Sensemaking) requires
NASA Astrophysics Data System (ADS)
Bitter, Ingmar; Brown, John E.; Brickman, Daniel; Summers, Ronald M.
2004-04-01
The presented method significantly reduces the time necessary to validate a computed tomographic colonography (CTC) computer aided detection (CAD) algorithm of colonic polyps applied to a large patient database. As the algorithm is being developed on Windows PCs and our target, a Beowulf cluster, is running on Linux PCs, we made the application dual platform compatible using a single source code tree. To maintain, share, and deploy source code, we used CVS (concurrent versions system) software. We built the libraries from their sources for each operating system. Next, we made the CTC CAD algorithm dual-platform compatible and validated that both Windows and Linux produced the same results. Eliminating system dependencies was mostly achieved using the Qt programming library, which encapsulates most of the system dependent functionality in order to present the same interface on either platform. Finally, we wrote scripts to execute the CTC CAD algorithm in parallel. Running hundreds of simultaneous copies of the CTC CAD algorithm on a Beowulf cluster computing network enables execution in less than four hours on our entire collection of over 2400 CT scans, as compared to a month on a single PC. As a consequence, our complete patient database can be processed daily, boosting research productivity. Large scale validation of a computer aided polyp detection algorithm for CT colonography using cluster computing significantly improves the round trip time of algorithm improvement and revalidation.
Jang, In Gwun; Kim, Il Yong
2008-08-07
In the field of bone adaptation, it is believed that the morphology of bone is affected by its mechanical loads, and bone has self-optimizing capability; this phenomenon is well known as Wolff's law of the transformation of bone. In this paper, we simulated trabecular bone adaptation in the human proximal femur using topology optimization and quantitatively investigated the validity of Wolff's law. Topology optimization iteratively distributes material in a design domain producing optimal layout or configuration, and it has been widely and successfully used in many engineering fields. We used a two-dimensional micro-FE model with 50 μm pixel resolution to represent the full trabecular architecture in the proximal femur, and performed topology optimization to study the trabecular morphological changes under three loading cases in daily activities. The simulation results were compared to the actual trabecular architecture in previous experimental studies. We discovered that there are strong similarities in trabecular patterns between the computational results and observed data in the literature. The results showed that the strain energy distribution of the trabecular architecture became more uniform during the optimization; from the viewpoint of structural topology optimization, this bone morphology may be considered as an optimal structure. We also showed that the non-orthogonal intersections were constructed to support daily activity loadings in the sense of optimization, as opposed to Wolff's drawing.
SSME structural computer program development: BOPACE theoretical manual, addendum. [algorithms
NASA Technical Reports Server (NTRS)
1975-01-01
An algorithm developed and incorporated into BOPACE for improving the convergence and accuracy of the inelastic stress-strain calculations is discussed. The implementation of separation of strains in the residual-force iterative procedure is defined. The elastic-plastic quantities used in the strain-space algorithm are defined and compared with previous quantities.
Teaching Computation in Primary School without Traditional Written Algorithms
ERIC Educational Resources Information Center
Hartnett, Judy
2015-01-01
Concerns regarding the dominance of the traditional written algorithms in schools have been raised by many mathematics educators, yet the teaching of these procedures remains a dominant focus in primary schools. This paper reports on a project in one school where the staff agreed to put the teaching of the traditional written algorithm aside,…
Grytz, Rafael; Meschke, Günther; Jonas, Jost B
2011-06-01
The biomechanics of the optic nerve head is assumed to play an important role in ganglion cell loss in glaucoma. Organized collagen fibrils form complex networks that introduce strong anisotropic and nonlinear attributes into the constitutive response of the peripapillary sclera (PPS) and lamina cribrosa (LC) dominating the biomechanics of the optic nerve head. The recently presented computational remodeling approach (Grytz and Meschke in Biomech Model Mechanobiol 9:225-235, 2010) was used to predict the micro-architecture in the LC and PPS, and to investigate its impact on intraocular pressure-related deformations. The mechanical properties of the LC and PPS were derived from a microstructure-oriented constitutive model that included the stretch-dependent stiffening and the statistically distributed orientations of the collagen fibrils. Biomechanically induced adaptation of the local micro-architecture was captured by allowing collagen fibrils to be reoriented in response to the intraocular pressure-related loading conditions. In agreement with experimental observations, the remodeling algorithm predicted the existence of an annulus of fibrils around the scleral canal in the PPS, and a predominant radial orientation of fibrils in the periphery of the LC. The peripapillary annulus significantly reduced the intraocular pressure-related expansion of the scleral canal and shielded the LC from high tensile stresses. The radially oriented fibrils in the LC periphery reinforced the LC against transversal shear stresses and reduced LC bending deformations. The numerical approach presents a novel and reasonable biomechanical explanation of the spatial orientation of fibrillar collagen in the optic nerve head.
Simulation of Si:P spin-based quantum computer architecture
Chang Yiachung; Fang Angbo
2008-11-07
We present a realistic simulation of single and double phosphorus donors in a silicon-based quantum computer design by solving a valley-orbit coupled effective-mass equation describing phosphorus donors in a strained silicon quantum well (QW). Using a generalized unrestricted Hartree-Fock method, we solve the two-electron effective-mass equation with quantum well confinement and realistic gate potentials. The effects of QW width, gate voltages, donor separation, and donor position shift on the lowest singlet and triplet energies and their charge distributions for a neighboring donor pair in the quantum computer (QC) architecture are analyzed. The gate tunability is defined and evaluated for a typical QC design. Estimates are obtained for the duration of the spin half-swap gate operation.
A decoupled graph/computation data-driven architecture with variable-resolution actors
Evripidou, P.; Gaudiot, J.L.
1990-12-31
This paper presents a hybrid multiprocessor architecture that combines the advantages of the dynamic data-flow principles of execution with those of the control-flow model of execution. Two major design ideas are utilized by the proposed model: asynchronous execution of graph and computation operations, and variable-resolution actors. The independence of the two main units of the machine allows an efficient implementation of functional/data-flow principles with conventional, mature technology. The compiler generates graphs with variable-sized actors in order to match the characteristics of the application to the target machine. For instance, vector actors are proposed for many aspects of scientific computing, while lower resolution (Compound Macro Actors) or conversely higher resolution (atomic instruction actors) is used for unvectorizable programs.
One high-accuracy camera calibration algorithm based on computer vision images
NASA Astrophysics Data System (ADS)
Wang, Ying; Huang, Jianming; Wei, Xiangquan
2015-12-01
Camera calibration is the first step of computer vision and one of the most active research fields today. To improve measurement precision, the internal parameters of the camera should be accurately calibrated. A high-accuracy camera calibration algorithm is therefore proposed, based on images of planar targets or three-dimensional targets. Using the algorithm, the internal parameters of the camera are calibrated with the existing planar target in the vision-based navigation experiment. The experimental results show that the accuracy of the proposed algorithm is clearly improved compared with the conventional linear algorithm, the Tsai general algorithm, and the Zhang Zhengyou calibration algorithm. The proposed algorithm can satisfy the needs of computer vision and provide a reference for precise measurement of relative position and attitude.
Amirfattahi, Rassoul
2013-10-01
Owing to its simplicity, radix-2 is a popular algorithm for implementing the fast Fourier transform. Radix-2^p algorithms have the same order of computational complexity as higher-radix algorithms, but still retain the simplicity of radix-2. By defining a new concept, the twiddle factor template, in this paper we propose a method for exact calculation of the multiplicative complexity of radix-2^p algorithms. The methodology is described for the radix-2, radix-2^2 and radix-2^3 algorithms. Results show that radix-2^2 and radix-2^3 have significantly less computational complexity than radix-2. Another interesting result is that while the number of complex multiplications in the radix-2^3 algorithm is slightly higher than in radix-2^2, the number of real multiplications for radix-2^3 is lower than for radix-2^2. This is because of twiddle factors of a form that needs fewer real multiplications and is more frequent in the radix-2^3 algorithm.
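As a small illustration of what counting multiplicative complexity means, the Python sketch below implements a recursive radix-2 decimation-in-time FFT and counts complex multiplications by twiddle factors. It is not the twiddle factor template method of the paper: as a simplifying assumption it treats only the twiddle factor W^0 = 1 as trivial, so its count is coarser than an exact one that also discounts factors such as -j. The name `fft_radix2` is hypothetical.

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 DIT FFT for len(x) a power of two.
    Returns (spectrum, number of complex multiplications by twiddle
    factors other than W^0 = 1)."""
    n = len(x)
    if n == 1:
        return list(x), 0
    even, mults_even = fft_radix2(x[0::2])
    odd, mults_odd = fft_radix2(x[1::2])
    out = [0j] * n
    mults = mults_even + mults_odd
    for k in range(n // 2):
        if k == 0:
            t = odd[k]  # twiddle factor is 1, no multiplication needed
        else:
            t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
            mults += 1
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out, mults
```

With this counting rule the recurrence is M(n) = 2 M(n/2) + n/2 - 1, which gives (n/2) log2(n) - n + 1 multiplications for a length-n transform.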
Peer-to-peer architectures for exascale computing : LDRD final report.
Vorobeychik, Yevgeniy; Mayo, Jackson R.; Minnich, Ronald G.; Armstrong, Robert C.; Rudish, Donald W.
2010-09-01
The goal of this research was to investigate the potential for employing dynamic, decentralized software architectures to achieve reliability in future high-performance computing platforms. These architectures, inspired by peer-to-peer networks such as botnets that already scale to millions of unreliable nodes, hold promise for enabling scientific applications to run usefully on next-generation exascale platforms (approximately 10^18 operations per second). Traditional parallel programming techniques suffer rapid deterioration of performance scaling with growing platform size, as the work of coping with increasingly frequent failures dominates over useful computation. Our studies suggest that new architectures, in which failures are treated as ubiquitous and their effects are considered as simply another controllable source of error in a scientific computation, can remove such obstacles to exascale computing for certain applications. We have developed a simulation framework, as well as a preliminary implementation in a large-scale emulation environment, for exploration of these 'fault-oblivious computing' approaches. High-performance computing (HPC) faces a fundamental problem of increasing total component failure rates due to increasing system sizes, which threaten to degrade system reliability to an unusable level by the time the exascale range is reached (approximately 10^18 operations per second, requiring of order millions of processors). As computer scientists seek a way to scale system software for next-generation exascale machines, it is worth considering peer-to-peer (P2P) architectures that are already capable of supporting 10^6 to 10^7 unreliable nodes. Exascale platforms will require a different way of looking at systems and software because the machine will likely not be available in its entirety for a meaningful execution time. Realistic estimates of failure rates range from a few times per day to more than once per hour for these platforms. P2
1983-01-01
COMPUTER ALGORITHM USED IN COMPUTING THE FINAL W 15/16 CONSTANT 0.7 ATA OXYGEN PARTIAL PRESSURE DECOMPRESSION TABLES 7...earlier Model Parameter Input Files had only one subfile, which could then be read and printed before an end of file is encountered and the program stops
Mixed-radix Algorithm for the Computation of Forward and Inverse MDCT
Wu, Jiasong; Shu, Huazhong; Senhadji, Lotfi; Luo, Limin
2008-01-01
The modified discrete cosine transform (MDCT) and inverse MDCT (IMDCT) are two of the most computationally intensive operations in MPEG audio coding standards. A new mixed-radix algorithm for efficiently computing the MDCT/IMDCT is presented. The proposed mixed-radix MDCT algorithm is composed of two recursive algorithms. The first algorithm, called the radix-2 decimation in frequency (DIF) algorithm, is obtained by decomposing an N-point MDCT into two MDCTs of length N/2. The second algorithm, called the radix-3 decimation in time (DIT) algorithm, is obtained by decomposing an N-point MDCT into three MDCTs of length N/3. Since the proposed MDCT algorithm is also expressed in the form of a simple sparse matrix factorization, the corresponding IMDCT algorithm can be easily derived by simply transposing the matrix factorization. Comparison of the proposed algorithm with some existing ones shows that our proposed algorithm is more suitable for parallel implementation and especially suitable for layer III of MPEG-1 and MPEG-2 audio encoding and decoding. Moreover, the proposed algorithm can be easily extended to the multidimensional case by using the vector-radix method. PMID:21258639
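As a reference point for fast algorithms of this kind, the transform pair can be written directly from its definition. The Python sketch below is a plain O(N^2) evaluation, not the mixed-radix algorithm of the paper. It assumes the common unwindowed convention in which a block of 2N samples yields N coefficients and the inverse carries a 1/N factor, and note that N here denotes the number of coefficients, which may differ from the paper's use of N. The names `mdct`, `imdct` and `overlap_add` are hypothetical.

```python
import math

def mdct(block):
    """Direct MDCT: 2N real samples -> N coefficients."""
    n = len(block) // 2
    return [sum(block[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i in range(2 * n))
            for k in range(n)]

def imdct(coeffs):
    """Direct IMDCT: N coefficients -> 2N time-aliased samples (1/N scaling)."""
    n = len(coeffs)
    return [sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for k in range(n)) / n
            for i in range(2 * n)]

def overlap_add(signal, n):
    """Transform blocks of 2N samples at hop N, invert, and overlap-add.
    Samples covered by two blocks are recovered because the aliasing
    terms of neighbouring blocks cancel."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        y = imdct(mdct(signal[start:start + 2 * n]))
        for i, v in enumerate(y):
            out[start + i] += v
    return out
```

A single inverse does not return the original block; only the overlap-added sequence does, which is why codecs process overlapping frames. Practical codecs also apply analysis and synthesis windows, which this sketch omits.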
NASA Astrophysics Data System (ADS)
Zhang, Leihong; Liang, Dong; Li, Bei; Kang, Yi; Pan, Zilan; Zhang, Dawei; Gao, Xiumin; Ma, Xiuhua
2016-07-01
Based on an analysis of the cosine light field with a determined analytic expression and of the pseudo-inverse method, the object is illuminated by a preset light field with a determined discrete Fourier transform measurement matrix, and the object image is reconstructed by the pseudo-inverse method. The analytic expression of the algorithm of computational ghost imaging based on the discrete Fourier transform measurement matrix is deduced theoretically and compared with the algorithm of compressive computational ghost imaging based on a random measurement matrix. The reconstruction process and the reconstruction error are analyzed. On this basis, a simulation is performed to verify the theoretical analysis. When the number of sampling measurements is close to the number of object pixels, the rank of the discrete Fourier transform matrix is the same as that of the random measurement matrix, the PSNRs of the images reconstructed by the FGI and PGI algorithms are similar, and the reconstruction error of the traditional CGI algorithm is lower than that of the images reconstructed by the FGI and PGI algorithms. As the number of sampling measurements decreases, the PSNR of the image reconstructed by the FGI algorithm decreases slowly, whereas the PSNRs of the images reconstructed by the PGI and CGI algorithms decrease sharply. The reconstruction time of the FGI algorithm is lower than that of the other algorithms and is not affected by the number of sampling measurements. The FGI algorithm can effectively filter out random white noise through a low-pass filter and thereby denoise during reconstruction, with a higher denoising capability than the CGI algorithm. The FGI algorithm can improve the reconstruction accuracy and the reconstruction speed of computational ghost imaging.
Algorithm development for Maxwell's equations for computational electromagnetism
NASA Technical Reports Server (NTRS)
Goorjian, Peter M.
1990-01-01
A new algorithm has been developed for solving Maxwell's equations for the electromagnetic field. It solves the equations in the time domain with central, finite differences. The time advancement is performed implicitly, using an alternating direction implicit procedure. The space discretization is performed with finite volumes, using curvilinear coordinates with electromagnetic components along those directions. Sample calculations are presented of scattering from a metal pin, a square and a circle to demonstrate the capabilities of the new algorithm.
Fast algorithm for automatically computing Strahler stream order
Lanfear, Kenneth J.
1990-01-01
An efficient algorithm was developed to determine Strahler stream order for segments of stream networks represented in a Geographic Information System (GIS). The algorithm correctly assigns Strahler stream order in topologically complex situations such as braided streams and multiple drainage outlets. Execution time varies nearly linearly with the number of stream segments in the network. This technique is expected to be particularly useful for studying the topology of dense stream networks derived from digital elevation model data.
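The ordering rule itself is short enough to sketch in Python. The version below assumes a simple tree-shaped network with a single outlet, so it does not cover the braided streams and multiple outlets that the abstract says the published algorithm handles. The dictionary layout and the name `strahler_orders` are assumptions made for illustration.

```python
def strahler_orders(upstream, outlet):
    """Strahler order for every segment of a tree-shaped stream network.
    `upstream` maps a segment to the list of segments flowing into it;
    headwater segments may be omitted or map to an empty list."""
    order = {}
    stack = [(outlet, False)]
    while stack:  # iterative post-order walk, one visit per segment
        segment, tributaries_done = stack.pop()
        tributaries = upstream.get(segment, [])
        if not tributaries_done:
            stack.append((segment, True))
            stack.extend((t, False) for t in tributaries)
            continue
        if not tributaries:
            order[segment] = 1  # headwater segment
        else:
            ranks = [order[t] for t in tributaries]
            top = max(ranks)
            # Order increases only where two or more tributaries of the
            # highest order meet; otherwise the highest order carries on.
            order[segment] = top + 1 if ranks.count(top) >= 2 else top
    return order
```

Each segment is pushed and resolved once, so the running time grows linearly with the number of segments, in line with the near-linear behaviour reported above.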
Efficient Algorithms for Computing Stackelberg Strategies in Security Games
2012-05-30
Korzhyk, Ondrej Vanek, Vincent Conitzer, Michal Pechoucek, Milind Tambe. A double oracle algorithm for zero-sum security games on graphs, Proceedings...average over many randomly drawn games, the benefits from commitment tend to be much less extreme. In another AAMAS paper (Jain, Korzhyk, Vanek...Korzhyk, Ondrej Vanek, Vincent Conitzer, Michal Pechoucek, and Milind Tambe. A double oracle algorithm for zero-sum security games on graphs. In
Topics in Computational Learning Theory and Graph Algorithms.
ERIC Educational Resources Information Center
Board, Raymond Acton
This thesis addresses problems from two areas of theoretical computer science. The first area is that of computational learning theory, which is the study of the phenomenon of concept learning using formal mathematical models. The goal of computational learning theory is to investigate learning in a rigorous manner through the use of techniques…
Multi-Rate Digital Control Systems with Simulation Applications. Volume II. Computer Algorithms
1980-09-01
AFWAL-TR-80-31 01 Volume II MULTI-RATE DIGITAL CONTROL SYSTEMS WITH SIMULATION APPLICATIONS Volume II: Computer Algorithms DENNIS G. J...Volume II. Computer Algorithms...additional options. The analytical basis for the computer algorithms is discussed in Ref. 12. However, to provide a complete description of the program, some
Nascov, Victor; Logofătu, Petre Cătălin
2009-08-01
We describe a fast computational algorithm able to evaluate the Rayleigh-Sommerfeld diffraction formula, based on a special formulation of the convolution theorem and the fast Fourier transform. What is new in our approach compared to other algorithms is the use of a more general type of convolution with a scale parameter, which allows for independent sampling intervals in the input and output computation windows. Comparison between calculations made using our algorithm and direct numerical integration shows very good agreement, while the computation speed is increased by orders of magnitude.
NASA Technical Reports Server (NTRS)
Torres-Pomales, Wilfredo
2014-01-01
This report presents an example of the application of multi-criteria decision analysis to the selection of an architecture for a safety-critical distributed computer system. The design problem includes constraints on minimum system availability and integrity, and the decision is based on the optimal balance of power, weight and cost. The analysis process includes the generation of alternative architectures, evaluation of individual decision criteria, and the selection of an alternative based on overall value. In the example presented here, iterative application of the quantitative evaluation process made it possible to deliberately generate an alternative architecture that is superior to all others regardless of the relative importance of cost.
NASA Astrophysics Data System (ADS)
Rueda, Antonio J.; Noguera, José M.; Luque, Adrián
2016-02-01
In recent years GPU computing has gained wide acceptance as a simple low-cost solution for speeding up computationally expensive processing in many scientific and engineering applications. However, in most cases accelerating a traditional CPU implementation for a GPU is a non-trivial task that requires a thorough refactoring of the code and specific optimizations that depend on the architecture of the device. OpenACC is a promising technology that aims at reducing the effort required to accelerate C/C++/Fortran code on an attached multicore device. With this technology, the CPU code only has to be augmented with a few compiler directives that identify the areas to be accelerated and the way in which data has to be moved between the CPU and GPU. Its potential benefits are multiple: better code readability, less development time, lower risk of errors and less dependency on the underlying architecture and future evolution of GPU technology. Our aim with this work is to evaluate the pros and cons of using OpenACC against native GPU implementations in computationally expensive hydrological applications, using the classic D8 algorithm of O'Callaghan and Mark for river network extraction as a case study. We implemented the flow accumulation step of this algorithm on the CPU, with OpenACC, and in two different CUDA versions, comparing the length and complexity of the code and its performance with different datasets. Our results indicate that although OpenACC cannot match the performance of an optimized CUDA implementation (×3.5 slower on average), it provides a significant performance improvement over a CPU implementation (×2-6) with far simpler code and less implementation effort.
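For readers unfamiliar with the flow accumulation step, the following is a minimal sequential sketch in plain Python. It is not one of the CPU, OpenACC, or CUDA implementations evaluated by the authors. It assumes a small rectangular elevation grid with unit cell size, routes each cell to its steepest strictly lower neighbor among the eight surrounding cells, and counts each cell as contributing one unit of flow; the function name and the sample grid are hypothetical.

```python
import math

# Offsets of the eight neighbors considered by D8.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_flow_accumulation(dem):
    """Return a grid with the number of cells draining through each cell."""
    rows, cols = len(dem), len(dem[0])
    receiver = {}
    for r in range(rows):
        for c in range(cols):
            best, best_slope = None, 0.0
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Drop divided by distance (1 or sqrt(2) for diagonals).
                    slope = (dem[r][c] - dem[nr][nc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best, best_slope = (nr, nc), slope
            receiver[(r, c)] = best  # None marks a pit or flat cell
    acc = [[1] * cols for _ in range(rows)]
    # Visit cells from highest to lowest so that a cell's total is final
    # before it is passed on to its (strictly lower) receiver.
    for r, c in sorted(receiver, key=lambda rc: dem[rc[0]][rc[1]], reverse=True):
        target = receiver[(r, c)]
        if target is not None:
            acc[target[0]][target[1]] += acc[r][c]
    return acc


dem = [[9, 8, 7],
       [8, 6, 5],
       [7, 5, 1]]
acc = d8_flow_accumulation(dem)
```

The elevation-ordered sweep is only one way to resolve the data dependency between a cell and its upstream contributors; a GPU version would need a different strategy to expose parallelism.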
NASA Astrophysics Data System (ADS)
Nakamura, Kazuhiro; Shimazaki, Ryo; Yamamoto, Masatoshi; Takagi, Kazuyoshi; Takagi, Naofumi
This paper presents a memory-efficient VLSI architecture for output probability computations (OPCs) of continuous hidden Markov models (HMMs) and likelihood score computations (LSCs). These computations are the most time consuming part of HMM-based isolated word recognition systems. We demonstrate multiple fast store-based block parallel processing (MultipleFastStoreBPP) for OPCs and LSCs and present a VLSI architecture that supports it. Compared with conventional fast store-based block parallel processing (FastStoreBPP) and stream-based block parallel processing (StreamBPP) architectures, the proposed architecture requires fewer registers and less processing time. The processing elements (PEs) used in the FastStoreBPP and StreamBPP architectures are identical to those used in the MultipleFastStoreBPP architecture. From a VLSI architectural viewpoint, a comparison shows that the proposed architecture is an improvement over the others, through efficient use of PEs and registers for storing input feature vectors.
NASA Astrophysics Data System (ADS)
Reif, John H.; Tyagi, Akhilesh
1997-10-01
Optical-computing technology offers new challenges to algorithm designers since it can perform an n-point discrete Fourier transform (DFT) computation in only unit time. Note that the DFT is a nontrivial computation in the parallel random-access machine model, a model of computing commonly used by parallel-algorithm designers. We develop two new models, the DFT VLSIO (very-large-scale integrated optics) and the DFT circuit, to capture this characteristic of optical computing. We also provide two paradigms for developing parallel algorithms in these models. Efficient parallel algorithms for many problems, including polynomial and matrix computations, sorting, and string matching, are presented. The sorting and string-matching algorithms are particularly noteworthy. Almost all these algorithms are within a polylog factor of the optical-computing (VLSIO) lower bounds derived by Barakat and Reif [Appl. Opt. 26, 1015 (1987)] and by Tyagi and Reif [Proceedings of the Second IEEE Symposium on Parallel and Distributed Processing (Institute of Electrical and Electronics Engineers, New York, 1990), p. 14].
NASA Astrophysics Data System (ADS)
Vecharynski, Eugene; Yang, Chao; Pask, John E.
2015-06-01
We present an iterative algorithm for computing an invariant subspace associated with the algebraically smallest eigenvalues of a large sparse or structured Hermitian matrix A. We are interested in the case in which the dimension of the invariant subspace is large (e.g., over several hundreds or thousands) even though it may still be small relative to the dimension of A. These problems arise from, for example, density functional theory (DFT) based electronic structure calculations for complex materials. The key feature of our algorithm is that it performs fewer Rayleigh-Ritz calculations compared to existing algorithms such as the locally optimal block preconditioned conjugate gradient or the Davidson algorithm. It is a block algorithm, and hence can take advantage of efficient BLAS3 operations and be implemented with multiple levels of concurrency. We discuss a number of practical issues that must be addressed in order to implement the algorithm efficiently on a high performance computer.
Reconciling fault-tolerant distributed algorithms and real-time computing.
Moser, Heinrich; Schmid, Ulrich
We present generic transformations that allow classic fault-tolerant distributed algorithms and their correctness proofs to be translated into a real-time distributed computing model (and vice versa). Owing to the non-zero-time, non-preemptible state transitions employed in our real-time model, scheduling and queuing effects (which are inherently abstracted away in classic zero step-time models, sometimes leading to overly optimistic time complexity results) can be accurately modeled. Our results thus make fault-tolerant distributed algorithms amenable to a sound real-time analysis, without sacrificing the wealth of algorithms and correctness proofs established in classic distributed computing research. By means of an example, we demonstrate that real-time algorithms generated by transforming classic algorithms can be competitive even w.r.t. optimal real-time algorithms, despite their comparatively simple real-time analysis.
Impact of Multiscale Retinex Computation on Performance of Segmentation Algorithms
NASA Technical Reports Server (NTRS)
Rahman, Zia-ur; Jobson, Daniel J.; Woodell, Glenn A.; Hines, Glenn D.
2004-01-01
Classical segmentation algorithms subdivide an image into its constituent components based upon some metric that defines commonality between pixels. Often, these metrics incorporate some measure of "activity" in the scene, e.g. the amount of detail that is in a region. The Multiscale Retinex with Color Restoration (MSRCR) is a general purpose, non-linear image enhancement algorithm that significantly affects the brightness, contrast and sharpness within an image. In this paper, we will analyze the impact the MSRCR has on segmentation results and performance.
Computer program for fast Karhunen Loeve transform algorithm
NASA Technical Reports Server (NTRS)
Jain, A. K.
1976-01-01
The fast KL transform algorithm was applied for data compression of a set of four ERTS multispectral images and its performance was compared with other techniques previously studied on the same image data. The performance criteria used here are mean square error and signal-to-noise ratio. The results obtained show a superior performance of the fast KL transform coding algorithm on the data set used with respect to the above-stated performance criteria. A summary of the results is given in Chapter I and details of comparisons and discussion on conclusions are given in Chapter IV.
1986-10-01
these theorems to find steady-state solutions of Markov chains are analysed. The results obtained in this way are then applied to quasi birth-death processes. Keywords: computations; algorithms; equilibrium equations.
Fast computing global structural balance in signed networks based on memetic algorithm
NASA Astrophysics Data System (ADS)
Sun, Yixiang; Du, Haifeng; Gong, Maoguo; Ma, Lijia; Wang, Shanfeng
2014-12-01
Structural balance is a large area of study in signed networks, and it is intrinsically a global property of the whole network. Computing global structural balance in signed networks, which has attracted some attention in recent years, means measuring how unbalanced a signed network is, and it is a nondeterministic polynomial-time hard (NP-hard) problem. Many approaches have been developed to compute global balance. However, the results obtained by them are partial and unsatisfactory. In this study, the computation of global structural balance is solved as an optimization problem by using a memetic algorithm. The optimization algorithm, named Meme-SB, is proposed to optimize an evaluation function, the energy function, which is used to compute a distance to exact balance. Our proposed algorithm combines a genetic algorithm with a greedy strategy as the local search procedure. Experiments on social and biological networks show the effectiveness and efficiency of the proposed method.
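To make the idea of an energy function concrete, here is a minimal Python sketch. It assumes the form commonly used in the structural balance literature, which counts the edges whose sign disagrees with a two-camp assignment of the nodes; whether Meme-SB uses exactly this form is an assumption. The greedy node-flip routine is a simplified stand-in for a local search step, not the authors' procedure, and all names are hypothetical.

```python
def balance_energy(edges, side):
    """Count frustrated edges: sum over edges of (1 - sign * s_u * s_v) / 2.

    edges: iterable of (u, v, sign) with sign in {+1, -1}
    side:  dict mapping each node to +1 or -1 (its camp)
    """
    return sum((1 - sign * side[u] * side[v]) // 2 for u, v, sign in edges)


def greedy_local_search(edges, side):
    """Flip single nodes while a flip strictly lowers the energy."""
    side = dict(side)
    improved = True
    while improved:
        improved = False
        for node in side:
            before = balance_energy(edges, side)
            side[node] = -side[node]
            if balance_energy(edges, side) < before:
                improved = True
            else:
                side[node] = -side[node]  # undo a flip that did not help
    return side


# A triangle with two negative edges is balanced: a and b against c.
edges = [("a", "b", 1), ("b", "c", -1), ("a", "c", -1)]
start = {"a": 1, "b": 1, "c": 1}
result = greedy_local_search(edges, start)
```

An energy of zero means the network can be split into two camps with no frustrated edge; the minimum energy over all assignments is the distance to exact balance. A greedy search of this kind can stall in local minima, which is one motivation for combining it with a population-based search.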
A New Computer Algorithm for Simultaneous Test Construction of Two-Stage and Multistage Testing.
ERIC Educational Resources Information Center
Wu, Ing-Long
2001-01-01
Presents two binary programming models with a special network structure that can be explored computationally for simultaneous test construction. Uses an efficient special purpose network algorithm to solve these models. An empirical study illustrates the approach. (SLD)
NASA Technical Reports Server (NTRS)
Neal, L.
1981-01-01
A simple numerical algorithm was developed for use in computer simulations of systems which are both stiff and stable. The method is implemented in subroutine form and applied to the simulation of physiological systems.
A comparison of computational methods and algorithms for the complex gamma function
NASA Technical Reports Server (NTRS)
Ng, E. W.
1974-01-01
A survey and comparison of some computational methods and algorithms for gamma and log-gamma functions of complex arguments are presented. Methods and algorithms reported include Chebyshev approximations, Pade expansion and Stirling's asymptotic series. The comparison leads to the conclusion that Algorithm 421 published in the Communications of ACM by H. Kuki is the best program either for individual application or for the inclusion in subroutine libraries.
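As an illustration of one of the surveyed approaches, the sketch below evaluates the log-gamma function of a complex argument with Stirling's asymptotic series. It is not Algorithm 421. It assumes an argument with positive real part, uses the recurrence Γ(z+1) = zΓ(z) to push the argument to a region where four series terms suffice, and the function name and the shift threshold are hypothetical choices.

```python
import cmath
import math

# Leading coefficients of the Stirling series: B_2k / (2k (2k - 1)).
_STIRLING = (1 / 12, -1 / 360, 1 / 1260, -1 / 1680)


def log_gamma(z, shift_to=10.0):
    """Approximate log Gamma(z) for complex z with Re(z) > 0."""
    z = complex(z)
    correction = 0j
    # Gamma(z) = Gamma(z + n) / (z (z + 1) ... (z + n - 1)), so the logs
    # of the skipped factors are subtracted at the end.
    while z.real < shift_to:
        correction += cmath.log(z)
        z += 1
    series = sum(c / z ** (2 * k + 1) for k, c in enumerate(_STIRLING))
    return ((z - 0.5) * cmath.log(z) - z
            + 0.5 * math.log(2 * math.pi) + series - correction)


value = log_gamma(1 + 1j)
```

For real arguments the result can be checked against `math.lgamma`, and for `1 + 1j` the modulus of Γ can be checked against the closed form sqrt(π / sinh π).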
Toward a scalable quantum computing architecture with mixed species ion chains
NASA Astrophysics Data System (ADS)
Wright, John; Auchter, Carolyn; Chou, Chen-Kuan; Graham, Richard D.; Noel, Thomas W.; Sakrejda, Tomasz; Zhou, Zichao; Blinov, Boris B.
2016-12-01
We report on progress toward implementing mixed ion species quantum information processing for a scalable ion-trap architecture. Mixed species chains may help solve several problems with scaling ion-trap quantum computation to large numbers of qubits. Initial temperature measurements of linear Coulomb crystals containing barium and ytterbium ions indicate that the mass difference does not significantly impede cooling at low ion numbers. Average motional occupation numbers are estimated to be n̄ ≈ 130 quanta per mode for chains with small numbers of ions, which is within a factor of three of the Doppler limit for barium ions in our trap. We also discuss generation of ion-photon entanglement with barium ions with a fidelity of F ≥ 0.84, which is an initial step towards remote ion-ion coupling in a more scalable quantum information architecture. Further, we are working to implement these techniques in surface traps in order to exercise greater control over ion chain ordering and positioning.
SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws
NASA Technical Reports Server (NTRS)
Cooke, Daniel; Rushton, Nelson
2013-01-01
With the introduction of new parallel architectures like the Cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for high-end computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language; that is, a programming language that is closer to a human's way of thinking than to a machine's. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequential/single-core code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify-Produce (CSP) and Normalize-Transpose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever. In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less
Embedded assessment algorithms within home-based cognitive computer game exercises for elders.
Jimison, Holly; Pavel, Misha
2006-01-01
With the recent consumer interest in computer-based activities designed to improve cognitive performance, there is a growing need for scientific assessment algorithms to validate the potential contributions of cognitive exercises. In this paper, we present a novel methodology for incorporating dynamic cognitive assessment algorithms within computer games designed to enhance cognitive performance. We describe how this approach works for a variety of computer applications and describe cognitive monitoring results for one of the computer game exercises. The real-time cognitive assessments also provide a control signal for adapting the difficulty of the game exercises and providing tailored help for elders of varying abilities.
Physics and computer architecture informed improvements to the Implicit Monte Carlo method
NASA Astrophysics Data System (ADS)
Long, Alex Roberts
The Implicit Monte Carlo (IMC) method has been a standard method for thermal radiative transfer for the past 40 years. In this time, the hydrodynamics methods that are coupled to IMC have evolved and improved, as have the supercomputers used to run large simulations with IMC. Several modern hydrodynamics methods use unstructured non-orthogonal meshes and high-order spatial discretizations. The IMC method has been used primarily with simple Cartesian meshes and always has a first-order spatial discretization. Supercomputers are now made up of compute nodes that have a large number of cores. Current IMC parallel methods have significant problems with load imbalance. To utilize many-core systems, algorithms must move beyond simple spatial decomposition parallel algorithms. To make IMC better suited for large-scale multiphysics simulations in high energy density physics, new spatial discretizations and parallel strategies are needed. Several modifications are made to the IMC method to facilitate running on node-centered, unstructured tetrahedral meshes. These modifications produce results that converge to the expected solution under mesh refinement. A new finite element IMC method is also explored on these meshes, which offers a simulation runtime benefit but does not perform correctly in the diffusion limit. A parallel algorithm that utilizes on-node parallelism and respects memory hierarchies is studied. This method scales almost linearly when using physical cores on a node and benefits from multiple threads per core. A multi-compute node algorithm for domain decomposed IMC that passes mesh data instead of particles is explored as a means to solve load balance issues. This method scales better than the particle passing method on highly scattering problems with short time steps.
A simple algorithm for computing positively weighted straight skeletons of monotone polygons.
Biedl, Therese; Held, Martin; Huber, Stefan; Kaaser, Dominik; Palfrader, Peter
2015-02-01
We study the characteristics of straight skeletons of monotone polygonal chains and use them to devise an algorithm for computing positively weighted straight skeletons of monotone polygons. Our algorithm runs in [Formula: see text] time and [Formula: see text] space, where n denotes the number of vertices of the polygon.
ERIC Educational Resources Information Center
Avancena, Aimee Theresa; Nishihara, Akinori; Vergara, John Paul
2012-01-01
This paper presents the online cognitive and algorithm tests, which were developed in order to determine if certain cognitive factors and fundamental algorithms correlate with the performance of students in their introductory computer science course. The tests were implemented among Management Information Systems majors from the Philippines and…
A new fast algorithm for computing a complex number: Theoretic transforms
NASA Technical Reports Server (NTRS)
Reed, I. S.; Liu, K. Y.; Truong, T. K.
1977-01-01
A high-radix fast Fourier transform (FFT) algorithm for computing transforms over GF(q^2), where q is a Mersenne prime, is developed to implement fast circular convolutions. This new algorithm requires substantially fewer multiplications than the conventional FFT.
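The following Python sketch shows how a transform over the field of q-squared elements can produce an exact circular convolution. It is a direct quadratic-time transform, not the high-radix FFT of the paper. It assumes the Mersenne prime q = 127, represents field elements as pairs (a, b) meaning a + bi with i*i = -1, and finds a root of unity by brute-force search; all function names are hypothetical.

```python
Q = 127  # Mersenne prime 2**7 - 1; Q % 4 == 3, so -1 has no square root mod Q


def cmul(a, b):
    """Multiply a[0] + a[1]*i by b[0] + b[1]*i with i*i = -1, modulo Q."""
    return ((a[0] * b[0] - a[1] * b[1]) % Q, (a[0] * b[1] + a[1] * b[0]) % Q)


def cpow(a, e):
    """Raise a field element to a non-negative integer power."""
    result = (1, 0)
    while e:
        if e & 1:
            result = cmul(result, a)
        a = cmul(a, a)
        e >>= 1
    return result


def root_of_unity(n):
    """Find an element of order exactly n (n a power of two, 2 <= n <= 256)."""
    for x in range(Q):
        for y in range(Q):
            if (x, y) == (0, 0):
                continue
            r = cpow((x, y), (Q * Q - 1) // n)  # r**n == 1 by construction
            if cpow(r, n // 2) != (1, 0):       # so the order is exactly n
                return r
    raise ValueError("no root of unity of that order")


def transform(seq, root):
    """Direct O(n^2) transform: X[k] = sum over j of seq[j] * root**(j*k)."""
    n = len(seq)
    out = []
    for k in range(n):
        acc = (0, 0)
        for j, x in enumerate(seq):
            term = cmul(x, cpow(root, (j * k) % n))
            acc = ((acc[0] + term[0]) % Q, (acc[1] + term[1]) % Q)
        out.append(acc)
    return out


def circular_convolve(a, b):
    """Circular convolution of integer sequences, exact modulo Q."""
    n = len(a)
    root = root_of_unity(n)
    fa = transform([(x % Q, 0) for x in a], root)
    fb = transform([(x % Q, 0) for x in b], root)
    prod = [cmul(u, v) for u, v in zip(fa, fb)]
    n_inv = pow(n, Q - 2, Q)                       # inverse of n modulo Q
    back = transform(prod, cpow(root, n - 1))      # root**(n-1) is 1/root
    return [(re * n_inv) % Q for re, _ in back]


conv = circular_convolve([1, 2, 3, 4, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0])
```

The result is exact only while every true convolution value stays below q; all arithmetic is on integers, so no rounding error arises. Power-of-two lengths up to 256 are available for q = 127 because q*q - 1 is divisible by 256.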
Rosa, Massimiliano; Warsa, James S; Perks, Michael
2010-12-14
We have implemented a cell-wise, block-Gauss-Seidel (bGS) iterative algorithm for the solution of the S_n transport equations on the Roadrunner hybrid, parallel computer architecture. A compute node of this massively parallel machine comprises AMD Opteron cores that are linked to a Cell Broadband Engine™ (Cell/B.E.). LAPACK routines have been ported to the Cell/B.E. in order to make use of its parallel Synergistic Processing Elements (SPEs). The bGS algorithm is based on the LU factorization and solution of a linear system that couples the fluxes for all S_n angles and energy groups on a mesh cell. For every cell of a mesh that has been parallel decomposed on the higher-level Opteron processors, a linear system is transferred to the Cell/B.E. and the parallel LAPACK routines are used to compute a solution, which is then transferred back to the Opteron, where the rest of the computations for the S_n transport problem take place. Compared to standard parallel machines, a hundred-fold speedup of the bGS was observed on the hybrid Roadrunner architecture. Numerical experiments with strong and weak parallel scaling demonstrate the bGS method is viable and compares favorably to full parallel sweeps (FPS) on two-dimensional, unstructured meshes when it is applied to optically thick, multi-material problems. As expected, however, it is not as efficient as FPS in optically thin problems.
1987-12-01
algorithms studied are a research network developed by K. Brayer of the Mitre Corporation, Digital Equipment Corporation’s Digital Network Architecture...developed by K. Brayer at Rome Air Development Center. The second network is Digital Equipment Corporation’s (DEC) Digital Network Architecture (DNA...Network. K. Brayer of the Mitre Corporation developed a research packet switch system that is loop-free and survivable. His algorithm is divided into a
1984-06-06
Modifications to Iterative Recursion Unfolding Algorithms and Computer Codes to Find More Appropriate Neutron Spectra. L A. LOWRY AND T. L. JOHNSON. June 6, 1984...INTRODUCTION The unfolding of neutron spectra using data from activation foils, Bonner spheres, or other
2014-01-01
Background: To ease the tedious task of reconstructing gene networks by experimentally testing the possible interactions between genes, automated reverse engineering procedures are increasingly adopted instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks with an evolutionary algorithm, two important issues must be addressed: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, cloud computing is advocated as a promising solution; the most popular approach is the MapReduce programming model, a fault-tolerant framework for implementing parallel algorithms for inferring large gene networks. Results: This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use the well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and that the computation time can be largely reduced. Conclusions: Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel
The Reliability of Diagnoses by Technician, Computer, and Algorithm.
ERIC Educational Resources Information Center
Johnson, James H.; And Others
1980-01-01
Describes a computer assisted system for intake assessment. Reports on two experiments that compared the reliability of a diagnostic procedure that involves technicians, a structured interview schedule, and a computerized diagnostic program with diagnoses made by clinicians. Results show the computer assisted technician approach is as reliable as…
Simple and Effective Algorithms: Computer-Adaptive Testing.
ERIC Educational Resources Information Center
Linacre, John Michael
Computer-adaptive testing (CAT) allows improved security, greater scoring accuracy, shorter testing periods, quicker availability of results, and reduced guessing and other undesirable test behavior. Simple approaches can be applied by the classroom teacher, or other content specialist, who possesses simple computer equipment and elementary…
Timing formulas for dissection algorithms on vector computers
NASA Technical Reports Server (NTRS)
Poole, W. G., Jr.
1977-01-01
The use of the finite element and finite difference methods often leads to the problem of solving large, sparse, positive definite systems of linear equations. MACSYMA plays a major role in the generation of formulas representing the time required for execution of the dissection algorithms. The use of MACSYMA in the generation of those formulas is described.
Integrated Computer-Aided Manufacturing (ICAM) Architecture. Part 3. Volume 7. MFG01 Glossary.
1983-09-01
INTEGRATED COMPUTER-AIDED MANUFACTURING (ICAM) ARCHITECTURE PART 3 VOLUME.. (U) SOFTECH INC WALTHAM...HEINE ET AL. SEP 83
Eichenberger, Alexandre E; Gschwind, Michael K; Gunnels, John A
2013-11-05
Mechanisms for performing matrix multiplication operations with data pre-conditioning in a high performance computing architecture are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A load and splat operation is performed to load an element of a second vector operand and to replicate the element to each of a plurality of elements of a second target vector register. A multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product of the matrix multiplication operation is accumulated with other partial products of the matrix multiplication operation.
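The data flow described above can be mimicked in a minimal Python sketch in which lists stand in for vector registers. This is an emulation under stated assumptions, not the patented mechanism or any real instruction set: the register width is taken to be the number of rows of the first matrix, and `splat`, `fma`, and `matmul_splat` are hypothetical names.

```python
def splat(value, width):
    """Replicate one scalar into every lane of a vector register."""
    return [value] * width


def fma(acc, a, b):
    """Lane-wise multiply-add: acc + a * b."""
    return [c + x * y for c, x, y in zip(acc, a, b)]


def matmul_splat(a_cols, b_rows):
    """Compute C = A * B and return the columns of C.

    a_cols: columns of A (each a list of m values, one vector load each)
    b_rows: rows of B (each a list of n scalars)
    """
    m = len(a_cols[0])
    n = len(b_rows[0])
    c_cols = [[0] * m for _ in range(n)]
    for k, a_vec in enumerate(a_cols):       # vector load of column k of A
        for j in range(n):
            b_vec = splat(b_rows[k][j], m)   # load and splat element B[k][j]
            # The multiply-add yields a partial product that is accumulated
            # into column j of C together with the earlier ones.
            c_cols[j] = fma(c_cols[j], a_vec, b_vec)
    return c_cols


# A = [[1, 2], [3, 4]] given by columns, B = [[5, 6], [7, 8]] given by rows.
c_cols = matmul_splat([[1, 3], [2, 4]], [[5, 6], [7, 8]])
```

Splatting the scalar lets a single lane-wise multiply-add update a whole column of the result, so no horizontal reduction across lanes is needed.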
NASA Astrophysics Data System (ADS)
Cary, John R.; Abell, D.; Amundson, J.; Bruhwiler, D. L.; Busby, R.; Carlsson, J. A.; Dimitrov, D. A.; Kashdan, E.; Messmer, P.; Nieter, C.; Smithe, D. N.; Spentzouris, P.; Stoltz, P.; Trines, R. M.; Wang, H.; Werner, G. R.
2006-09-01
As the size and cost of particle accelerators escalate, high-performance computing plays an increasingly important role; optimization through accurate, detailed computer modeling increases performance and reduces costs. But consequently, computer simulations face enormous challenges. Early approximation methods, such as expansions in distance from the design orbit, were unable to supply detailed accurate results, such as in the computation of wake fields in complex cavities. Since the advent of message-passing supercomputers with thousands of processors, earlier approximations are no longer necessary, and it is now possible to compute wake fields, the effects of dampers, and self-consistent dynamics in cavities accurately. In this environment, the focus has shifted towards the development and implementation of algorithms that scale to large numbers of processors. So-called charge-conserving algorithms evolve the electromagnetic fields without the need for any global solves (which are difficult to scale up to many processors). Using cut-cell (or embedded) boundaries, these algorithms can simulate the fields in complex accelerator cavities with curved walls. New implicit algorithms, which are stable for any time-step, conserve charge as well, allowing faster simulation of structures with details small compared to the characteristic wavelength. These algorithmic and computational advances have been implemented in the VORPAL7 Framework, a flexible, object-oriented, massively parallel computational application that allows run-time assembly of algorithms and objects, thus composing an application on the fly.
Arnold, Susan F; Ramachandran, Gurumurthy
2014-01-01
This study evaluated the influence of parameter values and variances and model architecture on modeled exposures, and identified important data gaps that influence lack-of-knowledge-related uncertainty, using ConsExpo 4.1 as an illustrative case study. Understanding the influential determinants in exposure estimates enables more informed and appropriate use of this model and the resulting exposure estimates. In exploring the influence of parameter placement in an algorithm and of the values and variances chosen to characterize the parameters within ConsExpo, "sensitive" and "important" parameters were identified: product amount, weight fraction, exposure duration, exposure time, and ventilation rate were deemed "important," or "always sensitive." With this awareness, exposure assessors can strategically focus on acquiring the most robust estimates for these parameters. ConsExpo relies predominantly on three algorithms to assess the default scenarios: inhalation vapors evaporation equation using the Langmuir mass transfer, the dermal instant application with diffusion through the skin, and the oral ingestion by direct uptake algorithm. These algorithms, which do not necessarily render health-conservative estimates, account for 87, 89 and 59% of the inhalation, dermal and oral default scenario assessments, respectively, according them greater influence relative to the less frequently used algorithms. Default data provided in ConsExpo may be useful to initiate assessments, but are insufficient for determining exposure acceptability or setting policy, as parameters defined by highly uncertain values produce biased estimates that may not be health-conservative. Furthermore, this lack-of-knowledge uncertainty makes the magnitude of this bias uncertain. Significant data gaps persist for product amount, exposure time, and exposure duration. These "important" parameters exert influence in requiring broad values and variances to account for their uncertainty. Prioritizing
Markov Algorithms for Computing the Reliability of Staged Networks.
1986-04-01
on an IBM Personal Computer AT, to calculate Pst for the dodecahedron network of Fig. 3 and the grid network of Fig. 4. The computation time was 4 1/2...network used in [1], which is in effect a dodecahedron reduced by 3 nodes and 5 arcs, Bailey and Kulkarni report timings of 54 minutes, 8 minutes and...staging reduced the computing time, from the 52 minutes quoted previously, to 1 minute 58 seconds. A similar use of overlapping stages for the dodecahedron
ERIC Educational Resources Information Center
Farid, Ayman A.; Zaghloul, Weaam M.; Dewidar, Khaled M.
2014-01-01
The major shift toward sustainability and computer-aided design in the field of architecture has caused a remarkable change in architectural philosophy. New aspects of beauty and aesthetic values are being introduced, and traditional definitions of beauty cannot fully cover these aspects, which causes a gap between new architecture works criticism and…
1986-11-29
Madison, Wisconsin, August 1982. [16] Fitzpatrick, D. T., Foderaro, J. K., Katevenis, M. G. H., Landman, H. A., Patterson, D. A., Peek, J. B., Peshkess...October 18-22, 1982. [33] Levitan, S. P., Parallel Algorithms and Architectures: A Programmer’s Per...
Williams, P.T.
1993-09-01
As the field of computational fluid dynamics (CFD) continues to mature, algorithms are required to exploit the most recent advances in approximation theory, numerical mathematics, computing architectures, and hardware. Meeting this requirement is particularly challenging in incompressible fluid mechanics, where primitive-variable CFD formulations that are robust, while also accurate and efficient in three dimensions, remain an elusive goal. This dissertation asserts that one key to accomplishing this goal is recognition of the dual role assumed by the pressure, i.e., a mechanism for instantaneously enforcing conservation of mass and a force in the mechanical balance law for conservation of momentum. Proving this assertion has motivated the development of a new, primitive-variable, incompressible, CFD algorithm called the Continuity Constraint Method (CCM). The theoretical basis for the CCM consists of a finite-element spatial semi-discretization of a Galerkin weak statement, equal-order interpolation for all state-variables, a θ-implicit time-integration scheme, and a quasi-Newton iterative procedure extended by a Taylor Weak Statement (TWS) formulation for dispersion error control. Original contributions to algorithmic theory include: (a) formulation of the unsteady evolution of the divergence error, (b) investigation of the role of non-smoothness in the discretized continuity-constraint function, (c) development of a uniformly H^1 Galerkin weak statement for the Reynolds-averaged Navier-Stokes pressure Poisson equation, (d) derivation of physically and numerically well-posed boundary conditions, and (e) investigation of sparse data structures and iterative methods for solving the matrix algebra statements generated by the algorithm.
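As a rough illustration of the implicit time-integration family mentioned in the abstract above, the sketch below applies the generic theta-weighted one-step rule q_new = q + dt * [theta * f(q_new) + (1 - theta) * f(q)] to the scalar test equation dq/dt = -lam * q. The test equation, the function names, and the parameter names are assumptions chosen for illustration; this is not the CCM algorithm itself, which couples the scheme to a finite-element discretization and a quasi-Newton iteration.

```python
import math


def theta_step(q, lam, dt, theta):
    """One theta-weighted step for the linear test equation dq/dt = -lam * q.

    theta = 0 gives explicit Euler, theta = 0.5 the trapezoidal rule,
    and theta = 1 backward (fully implicit) Euler.
    """
    # With f(q) = -lam * q the implicit relation
    #   q_new = q + dt * (theta * f(q_new) + (1 - theta) * f(q))
    # can be solved for q_new in closed form.
    return q * (1.0 - (1.0 - theta) * lam * dt) / (1.0 + theta * lam * dt)


def integrate(q0, lam, dt, steps, theta):
    """Advance q0 through `steps` theta-weighted steps of size dt."""
    q = q0
    for _ in range(steps):
        q = theta_step(q, lam, dt, theta)
    return q


if __name__ == "__main__":
    exact = math.exp(-1.0)
    for theta in (0.0, 0.5, 1.0):
        approx = integrate(1.0, 1.0, 0.1, 10, theta)
        print(f"theta={theta:.1f}  q(1)={approx:.6f}  error={abs(approx - exact):.2e}")
```

For a stiff case (large lam * dt) the explicit choice theta = 0 grows without bound while theta = 1 decays, which is the usual motivation for weighting the update toward the implicit end.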
NASA Technical Reports Server (NTRS)
Lee, C. S. G.; Chen, C. L.
1989-01-01
Two efficient mapping algorithms are presented for scheduling the robot inverse dynamics computation, which consists of m computational modules with precedence relationships, on a multiprocessor system of p identical processors with processor and communication costs, so as to achieve minimum computation time. An objective function is defined in terms of the sum of the processor finishing time and the interprocessor communication time. The minimax optimization is performed on the objective function to obtain the best mapping. This mapping problem can be formulated as a combination of the graph partitioning and the scheduling problems; both are known to be NP-complete. Thus, to speed up the search for a solution, two heuristic algorithms were proposed to obtain fast but suboptimal mapping solutions. The first algorithm utilizes the level and the communication intensity of the task modules to construct an ordered priority list of ready modules, and the module assignment is performed by a weighted bipartite matching algorithm. For a near-optimal mapping solution, the problem can be solved by the heuristic algorithm with simulated annealing. These proposed optimization algorithms can solve various large-scale problems within a reasonable time. Computer simulations were performed to evaluate and verify the performance and the validity of the proposed mapping algorithms. Finally, experiments for computing the inverse dynamics of a six-jointed PUMA-like manipulator based on the Newton-Euler dynamic equations were implemented on an NCUBE/ten hypercube computer to verify the proposed mapping algorithms. Computer simulation and experimental results are compared and discussed.
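The general idea of a level-ordered priority list can be sketched as follows. This is a simplified illustration only: it orders modules by level (longest remaining execution path) and assigns each one greedily to the processor that finishes it earliest, whereas the abstract describes an assignment step based on weighted bipartite matching and a second method based on simulated annealing. The function name, the data layout, and the small example graph are all hypothetical, and execution costs are assumed to be strictly positive.

```python
from functools import lru_cache


def list_schedule(cost, succ, comm, p):
    """Greedy level-based list scheduling on p identical processors.

    cost: {module: execution time (> 0)}
    succ: {module: [successor modules]}
    comm: {(u, v): delay paid only when u and v run on different processors}
    Returns (assignment, makespan).
    """
    @lru_cache(maxsize=None)
    def level(m):
        # Level = longest execution-time path from m to an exit module.
        return cost[m] + max((level(s) for s in succ.get(m, [])), default=0.0)

    preds = {m: [] for m in cost}
    for u, vs in succ.items():
        for v in vs:
            preds[v].append(u)

    # With positive costs a predecessor always has a strictly higher level,
    # so descending level order respects the precedence constraints.
    order = sorted(cost, key=level, reverse=True)

    proc_free = [0.0] * p
    finish, where = {}, {}
    for m in order:
        best = None
        for k in range(p):
            # Results from a predecessor on another processor arrive late.
            ready = max(
                (finish[u] + (0.0 if where[u] == k else comm.get((u, m), 0.0))
                 for u in preds[m]),
                default=0.0,
            )
            end = max(ready, proc_free[k]) + cost[m]
            if best is None or end < best[0]:
                best = (end, k)
        finish[m], where[m] = best
        proc_free[best[1]] = best[0]
    return where, max(finish.values())


if __name__ == "__main__":
    cost = {"a": 2, "b": 3, "c": 3, "d": 1}
    succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
    comm = {("a", "b"): 1, ("a", "c"): 1, ("b", "d"): 1, ("c", "d"): 1}
    print(list_schedule(cost, succ, comm, p=2))
```

The returned makespan plays the role of the quantity being minimized; a search method such as simulated annealing would perturb the assignment and re-evaluate it.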
NASA Astrophysics Data System (ADS)
Cheng, Jie-Zhi; Ni, Dong; Chou, Yi-Hong; Qin, Jing; Tiu, Chui-Mei; Chang, Yeun-Chung; Huang, Chiun-Sheng; Shen, Dinggang; Chen, Chung-Ming
2016-04-01
This paper presents a comprehensive study of deep-learning-based computer-aided diagnosis (CADx) for the differential diagnosis of benign and malignant nodules/lesions, avoiding the potential errors caused by inaccurate image processing results (e.g., boundary segmentation), as well as the classification bias resulting from a less robust feature set, as involved in most conventional CADx algorithms. Specifically, the stacked denoising auto-encoder (SDAE) is applied to two CADx applications: the differentiation of breast ultrasound lesions and of lung CT nodules. The SDAE architecture is well equipped with an automatic feature exploration mechanism and a noise tolerance advantage, and hence may be suitable for dealing with the intrinsically noisy property of medical image data from various imaging modalities. To show that SDAE-based CADx outperforms the conventional scheme, two recent conventional CADx algorithms are implemented for comparison. Ten runs of 10-fold cross-validation are conducted to illustrate the efficacy of the SDAE-based CADx algorithm. The experimental results show a significant performance boost by the SDAE-based CADx algorithm over the two conventional methods, suggesting that deep learning techniques can potentially change the design paradigm of CADx systems without the need for explicit design and selection of problem-oriented features.
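The evaluation protocol of repeated 10-fold cross-validation can be sketched in a few lines. The generator below is a minimal, assumed illustration of how such splits are typically produced; the function name, the seed handling, and the interleaved fold construction are hypothetical and are not taken from the paper, which does not describe its splitting code.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh random partition for every repeat
        folds = [idx[f::k] for f in range(k)]  # k nearly equal-sized folds
        for f, test_idx in enumerate(folds):
            # Training set = every sample outside the held-out fold.
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield r, f, train_idx, test_idx


if __name__ == "__main__":
    splits = list(repeated_kfold(n_samples=105))
    print(len(splits), "train/test splits")
```

Each of the resulting splits would train a classifier on `train_idx` and score it on `test_idx`; averaging over all splits reduces the dependence of the reported performance on any single random partition.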