Tactical Operations Analysis Support Facility.
1981-05-01
Punch/Reader 2 DMC-11AR DDCMP Micro Processor 2 DMC-11DA Network Link Line Unit 2 DL-11E Async Serial Line Interface 4 Intel IN-1670 448K Words MOS Memory...86 5.3 VIRTUAL PROCESSORS - VAX-11/750 ........................... 89 5.4 A RELATIONAL DATA MANAGEMENT SYSTEM - ORACLE...Central Processing Unit (CPU) is a 16 bit processor for high-speed, real time applications, and for large multi-user, multi- task, time shared
Next Generation Space Telescope Integrated Science Module Data System
NASA Technical Reports Server (NTRS)
Schnurr, Richard G.; Greenhouse, Matthew A.; Jurotich, Matthew M.; Whitley, Raymond; Kalinowski, Keith J.; Love, Bruce W.; Travis, Jeffrey W.; Long, Knox S.
1999-01-01
The Data system for the Next Generation Space Telescope (NGST) Integrated Science Module (ISIM) is the primary data interface between the spacecraft, telescope, and science instrument systems. This poster includes block diagrams of the ISIM data system and its components derived during the pre-phase A Yardstick feasibility study. The poster details the hardware and software components used to acquire and process science data for the Yardstick instrument compliment, and depicts the baseline external interfaces to science instruments and other systems. This baseline data system is a fully redundant, high performance computing system. Each redundant computer contains three 150 MHz power PC processors. All processors execute a commercially available real time multi-tasking operating system supporting, preemptive multi-tasking, file management and network interfaces. These six processors in the system are networked together. The spacecraft interface baseline is an extension of the network, which links the six processors. The final selection for Processor busses, processor chips, network interfaces, and high-speed data interfaces will be made during mid 2002.
Neurovision processor for designing intelligent sensors
NASA Astrophysics Data System (ADS)
Gupta, Madan M.; Knopf, George K.
1992-03-01
A programmable multi-task neuro-vision processor, called the Positive-Negative (PN) neural processor, is proposed as a plausible hardware mechanism for constructing robust multi-task vision sensors. The computational operations performed by the PN neural processor are loosely based on the neural activity fields exhibited by certain nervous tissue layers situated in the brain. The neuro-vision processor can be programmed to generate diverse dynamic behavior that may be used for spatio-temporal stabilization (STS), short-term visual memory (STVM), spatio-temporal filtering (STF) and pulse frequency modulation (PFM). A multi- functional vision sensor that performs a variety of information processing operations on time- varying two-dimensional sensory images can be constructed from a parallel and hierarchical structure of numerous individually programmed PN neural processors.
NASA Astrophysics Data System (ADS)
Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.
2017-11-01
Current trend in processor manufacturing focuses on multi-core architectures rather than increasing the clock speed for performance improvement. Graphic processors have become as commodity hardware for providing fast co-processing in computer systems. Developments in IoT, social networking web applications, big data created huge demand for data processing activities and such kind of throughput intensive applications inherently contains data level parallelism which is more suited for SIMD architecture based GPU. This paper reviews the architectural aspects of multi/many core processors and graphics processors. Different case studies are taken to compare performance of throughput computing applications using shared memory programming in OpenMP and CUDA API based programming.
Page, Andrew J.; Keane, Thomas M.; Naughton, Thomas J.
2010-01-01
We present a multi-heuristic evolutionary task allocation algorithm to dynamically map tasks to processors in a heterogeneous distributed system. It utilizes a genetic algorithm, combined with eight common heuristics, in an effort to minimize the total execution time. It operates on batches of unmapped tasks and can preemptively remap tasks to processors. The algorithm has been implemented on a Java distributed system and evaluated with a set of six problems from the areas of bioinformatics, biomedical engineering, computer science and cryptography. Experiments using up to 150 heterogeneous processors show that the algorithm achieves better efficiency than other state-of-the-art heuristic algorithms. PMID:20862190
Safe and Efficient Support for Embeded Multi-Processors in ADA
NASA Astrophysics Data System (ADS)
Ruiz, Jose F.
2010-08-01
New software demands increasing processing power, and multi-processor platforms are spreading as the answer to achieve the required performance. Embedded real-time systems are also subject to this trend, but in the case of real-time mission-critical systems, the properties of reliability, predictability and analyzability are also paramount. The Ada 2005 language defined a subset of its tasking model, the Ravenscar profile, that provides the basis for the implementation of deterministic and time analyzable applications on top of a streamlined run-time system. This Ravenscar tasking profile, originally designed for single processors, has proven remarkably useful for modelling verifiable real-time single-processor systems. This paper proposes a simple extension to the Ravenscar profile to support multi-processor systems using a fully partitioned approach. The implementation of this scheme is simple, and it can be used to develop applications amenable to schedulability analysis.
Parallel evolution of image processing tools for multispectral imagery
NASA Astrophysics Data System (ADS)
Harvey, Neal R.; Brumby, Steven P.; Perkins, Simon J.; Porter, Reid B.; Theiler, James P.; Young, Aaron C.; Szymanski, John J.; Bloch, Jeffrey J.
2000-11-01
We describe the implementation and performance of a parallel, hybrid evolutionary-algorithm-based system, which optimizes image processing tools for feature-finding tasks in multi-spectral imagery (MSI) data sets. Our system uses an integrated spatio-spectral approach and is capable of combining suitably-registered data from different sensors. We investigate the speed-up obtained by parallelization of the evolutionary process via multiple processors (a workstation cluster) and develop a model for prediction of run-times for different numbers of processors. We demonstrate our system on Landsat Thematic Mapper MSI , covering the recent Cerro Grande fire at Los Alamos, NM, USA.
Parallelising a molecular dynamics algorithm on a multi-processor workstation
NASA Astrophysics Data System (ADS)
Müller-Plathe, Florian
1990-12-01
The Verlet neighbour-list algorithm is parallelised for a multi-processor Hewlett-Packard/Apollo DN10000 workstation. The implementation makes use of memory shared between the processors. It is a genuine master-slave approach by which most of the computational tasks are kept in the master process and the slaves are only called to do part of the nonbonded forces calculation. The implementation features elements of both fine-grain and coarse-grain parallelism. Apart from three calls to library routines, two of which are standard UNIX calls, and two machine-specific language extensions, the whole code is written in standard Fortran 77. Hence, it may be expected that this parallelisation concept can be transfered in parts or as a whole to other multi-processor shared-memory computers. The parallel code is routinely used in production work.
Efficiency of static core turn-off in a system-on-a-chip with variation
Cher, Chen-Yong; Coteus, Paul W; Gara, Alan; Kursun, Eren; Paulsen, David P; Schuelke, Brian A; Sheets, II, John E; Tian, Shurong
2013-10-29
A processor-implemented method for improving efficiency of a static core turn-off in a multi-core processor with variation, the method comprising: conducting via a simulation a turn-off analysis of the multi-core processor at the multi-core processor's design stage, wherein the turn-off analysis of the multi-core processor at the multi-core processor's design stage includes a first output corresponding to a first multi-core processor core to turn off; conducting a turn-off analysis of the multi-core processor at the multi-core processor's testing stage, wherein the turn-off analysis of the multi-core processor at the multi-core processor's testing stage includes a second output corresponding to a second multi-core processor core to turn off; comparing the first output and the second output to determine if the first output is referring to the same core to turn off as the second output; outputting a third output corresponding to the first multi-core processor core if the first output and the second output are both referring to the same core to turn off.
CQPSO scheduling algorithm for heterogeneous multi-core DAG task model
NASA Astrophysics Data System (ADS)
Zhai, Wenzheng; Hu, Yue-Li; Ran, Feng
2017-07-01
Efficient task scheduling is critical to achieve high performance in a heterogeneous multi-core computing environment. The paper focuses on the heterogeneous multi-core directed acyclic graph (DAG) task model and proposes a novel task scheduling method based on an improved chaotic quantum-behaved particle swarm optimization (CQPSO) algorithm. A task priority scheduling list was built. A processor with minimum cumulative earliest finish time (EFT) was acted as the object of the first task assignment. The task precedence relationships were satisfied and the total execution time of all tasks was minimized. The experimental results show that the proposed algorithm has the advantage of optimization abilities, simple and feasible, fast convergence, and can be applied to the task scheduling optimization for other heterogeneous and distributed environment.
Energy Efficient Real-Time Scheduling Using DPM on Mobile Sensors with a Uniform Multi-Cores
Kim, Youngmin; Lee, Chan-Gun
2017-01-01
In wireless sensor networks (WSNs), sensor nodes are deployed for collecting and analyzing data. These nodes use limited energy batteries for easy deployment and low cost. The use of limited energy batteries is closely related to the lifetime of the sensor nodes when using wireless sensor networks. Efficient-energy management is important to extending the lifetime of the sensor nodes. Most effort for improving power efficiency in tiny sensor nodes has focused mainly on reducing the power consumed during data transmission. However, recent emergence of sensor nodes equipped with multi-cores strongly requires attention to be given to the problem of reducing power consumption in multi-cores. In this paper, we propose an energy efficient scheduling method for sensor nodes supporting a uniform multi-cores. We extend the proposed T-Ler plane based scheduling for global optimal scheduling of a uniform multi-cores and multi-processors to enable power management using dynamic power management. In the proposed approach, processor selection for a scheduling and mapping method between the tasks and processors is proposed to efficiently utilize dynamic power management. Experiments show the effectiveness of the proposed approach compared to other existing methods. PMID:29240695
Image matrix processor for fast multi-dimensional computations
Roberson, George P.; Skeate, Michael F.
1996-01-01
An apparatus for multi-dimensional computation which comprises a computation engine, including a plurality of processing modules. The processing modules are configured in parallel and compute respective contributions to a computed multi-dimensional image of respective two dimensional data sets. A high-speed, parallel access storage system is provided which stores the multi-dimensional data sets, and a switching circuit routes the data among the processing modules in the computation engine and the storage system. A data acquisition port receives the two dimensional data sets representing projections through an image, for reconstruction algorithms such as encountered in computerized tomography. The processing modules include a programmable local host, by which they may be configured to execute a plurality of different types of multi-dimensional algorithms. The processing modules thus include an image manipulation processor, which includes a source cache, a target cache, a coefficient table, and control software for executing image transformation routines using data in the source cache and the coefficient table and loading resulting data in the target cache. The local host processor operates to load the source cache with a two dimensional data set, loads the coefficient table, and transfers resulting data out of the target cache to the storage system, or to another destination.
Parallel Lattice Basis Reduction Using a Multi-threaded Schnorr-Euchner LLL Algorithm
NASA Astrophysics Data System (ADS)
Backes, Werner; Wetzel, Susanne
In this paper, we introduce a new parallel variant of the LLL lattice basis reduction algorithm. Our new, multi-threaded algorithm is the first to provide an efficient, parallel implementation of the Schorr-Euchner algorithm for today’s multi-processor, multi-core computer architectures. Experiments with sparse and dense lattice bases show a speed-up factor of about 1.8 for the 2-thread and about factor 3.2 for the 4-thread version of our new parallel lattice basis reduction algorithm in comparison to the traditional non-parallel algorithm.
NASA Astrophysics Data System (ADS)
Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide
2015-09-01
The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
Real-time SHVC software decoding with multi-threaded parallel processing
NASA Astrophysics Data System (ADS)
Gudumasu, Srinivas; He, Yuwen; Ye, Yan; He, Yong; Ryu, Eun-Seok; Dong, Jie; Xiu, Xiaoyu
2014-09-01
This paper proposes a parallel decoding framework for scalable HEVC (SHVC). Various optimization technologies are implemented on the basis of SHVC reference software SHM-2.0 to achieve real-time decoding speed for the two layer spatial scalability configuration. SHVC decoder complexity is analyzed with profiling information. The decoding process at each layer and the up-sampling process are designed in parallel and scheduled by a high level application task manager. Within each layer, multi-threaded decoding is applied to accelerate the layer decoding speed. Entropy decoding, reconstruction, and in-loop processing are pipeline designed with multiple threads based on groups of coding tree units (CTU). A group of CTUs is treated as a processing unit in each pipeline stage to achieve a better trade-off between parallelism and synchronization. Motion compensation, inverse quantization, and inverse transform modules are further optimized with SSE4 SIMD instructions. Simulations on a desktop with an Intel i7 processor 2600 running at 3.4 GHz show that the parallel SHVC software decoder is able to decode 1080p spatial 2x at up to 60 fps (frames per second) and 1080p spatial 1.5x at up to 50 fps for those bitstreams generated with SHVC common test conditions in the JCT-VC standardization group. The decoding performance at various bitrates with different optimization technologies and different numbers of threads are compared in terms of decoding speed and resource usage, including processor and memory.
Using all of your CPU's in HIPE
NASA Astrophysics Data System (ADS)
Jacobson, J. D.; Fadda, D.
2012-09-01
Modern computer architectures increasingly feature multi-core CPU's. For example, the MacbookPro features the Intel quad-core i7 processors. Through the use of hyper-threading, where each core can execute two threads simultaneously, the quad-core i7 can support eight simultaneous processing threads. All this on your laptop! This CPU power can now be put into service by scientists to perform data reduction tasks, but only if the software has been designed to take advantage of the multiple processor architectures. Up to now, software written for Herschel data reduction (HIPE), written in Jython and JAVA, is single-threaded and can only utilize a single processor. Users of HIPE do not get any advantage from the additional processors. Why not put all of the CPU resources to work reducing your data? We present a multi-threaded software application that corrects long-term transients in the signal from the PACS unchopped spectroscopy line scan mode. In this poster, we present a multi-threaded software framework to achieve performance improvements from parallel execution. We will show how a task to correct transients in the PACS Spectroscopy Pipeline for the un-chopped line scan mode, has been threaded. This computation-intensive task uses either a one-parameter or a three parameter exponential function, to characterize the transient. The task uses a JAVA implementation of Minpack, translated from the C (Moshier) and IDL (Markwardt) by the authors, to optimize the correction parameters. We also explain how to determine if a task can benefit from threading (Amdahl's Law), and if it is safe to thread. The design and implementation, using the JAVA concurrency package completions service is described. Pitfalls, timing bugs, thread safety, resource control, testing and performance improvements are described and plotted.
Progress Towards a Rad-Hydro Code for Modern Computing Architectures LA-UR-10-02825
NASA Astrophysics Data System (ADS)
Wohlbier, J. G.; Lowrie, R. B.; Bergen, B.; Calef, M.
2010-11-01
We are entering an era of high performance computing where data movement is the overwhelming bottleneck to scalable performance, as opposed to the speed of floating-point operations per processor. All multi-core hardware paradigms, whether heterogeneous or homogeneous, be it the Cell processor, GPGPU, or multi-core x86, share this common trait. In multi-physics applications such as inertial confinement fusion or astrophysics, one may be solving multi-material hydrodynamics with tabular equation of state data lookups, radiation transport, nuclear reactions, and charged particle transport in a single time cycle. The algorithms are intensely data dependent, e.g., EOS, opacity, nuclear data, and multi-core hardware memory restrictions are forcing code developers to rethink code and algorithm design. For the past two years LANL has been funding a small effort referred to as Multi-Physics on Multi-Core to explore ideas for code design as pertaining to inertial confinement fusion and astrophysics applications. The near term goals of this project are to have a multi-material radiation hydrodynamics capability, with tabular equation of state lookups, on cartesian and curvilinear block structured meshes. In the longer term we plan to add fully implicit multi-group radiation diffusion and material heat conduction, and block structured AMR. We will report on our progress to date.
Design of the ANTARES LCM-DAQ board test bench using a FPGA-based system-on-chip approach
NASA Astrophysics Data System (ADS)
Anvar, S.; Kestener, P.; Le Provost, H.
2006-11-01
The System-on-Chip (SoC) approach consists in using state-of-the-art FPGA devices with embedded RISC processor cores, high-speed differential LVDS links and ready-to-use multi-gigabit transceivers allowing development of compact systems with substantial number of IO channels. Required performances are obtained through a subtle separation of tasks between closely cooperating programmable hardware logic and user-friendly software environment. We report about our experience in using the SoC approach for designing the production test bench of the off-shore readout system for the ANTARES neutrino experiment.
Image matrix processor for fast multi-dimensional computations
Roberson, G.P.; Skeate, M.F.
1996-10-15
An apparatus for multi-dimensional computation is disclosed which comprises a computation engine, including a plurality of processing modules. The processing modules are configured in parallel and compute respective contributions to a computed multi-dimensional image of respective two dimensional data sets. A high-speed, parallel access storage system is provided which stores the multi-dimensional data sets, and a switching circuit routes the data among the processing modules in the computation engine and the storage system. A data acquisition port receives the two dimensional data sets representing projections through an image, for reconstruction algorithms such as encountered in computerized tomography. The processing modules include a programmable local host, by which they may be configured to execute a plurality of different types of multi-dimensional algorithms. The processing modules thus include an image manipulation processor, which includes a source cache, a target cache, a coefficient table, and control software for executing image transformation routines using data in the source cache and the coefficient table and loading resulting data in the target cache. The local host processor operates to load the source cache with a two dimensional data set, loads the coefficient table, and transfers resulting data out of the target cache to the storage system, or to another destination. 10 figs.
Robotic Attention Processing And Its Application To Visual Guidance
NASA Astrophysics Data System (ADS)
Barth, Matthew; Inoue, Hirochika
1988-03-01
This paper describes a method of real-time visual attention processing for robots performing visual guidance. This robot attention processing is based on a novel vision processor, the multi-window vision system that was developed at the University of Tokyo. The multi-window vision system is unique in that it only processes visual information inside local area windows. These local area windows are quite flexible in their ability to move anywhere on the visual screen, change their size and shape, and alter their pixel sampling rate. By using these windows for specific attention tasks, it is possible to perform high speed attention processing. The primary attention skills of detecting motion, tracking an object, and interpreting an image are all performed at high speed on the multi-window vision system. A basic robotic attention scheme using the attention skills was developed. The attention skills involved detection and tracking of salient visual features. The tracking and motion information thus obtained was utilized in producing the response to the visual stimulus. The response of the attention scheme was quick enough to be applicable to the real-time vision processing tasks of playing a video 'pong' game, and later using an automobile driving simulator. By detecting the motion of a 'ball' on a video screen and then tracking the movement, the attention scheme was able to control a 'paddle' in order to keep the ball in play. The response was faster than that of a human's, allowing the attention scheme to play the video game at higher speeds. Further, in the application to the driving simulator, the attention scheme was able to control both direction and velocity of a simulated vehicle following a lead car. These two applications show the potential of local visual processing in its use for robotic attention processing.
Foerster, Rebecca M.; Carbone, Elena; Schneider, Werner X.
2014-01-01
Evidence for long-term memory (LTM)-based control of attention has been found during the execution of highly practiced multi-step tasks. However, does LTM directly control for attention or are working memory (WM) processes involved? In the present study, this question was investigated with a dual-task paradigm. Participants executed either a highly practiced visuospatial sensorimotor task (speed stacking) or a verbal task (high-speed poem reciting), while maintaining visuospatial or verbal information in WM. Results revealed unidirectional and domain-specific interference. Neither speed stacking nor high-speed poem reciting was influenced by WM retention. Stacking disrupted the retention of visuospatial locations, but did not modify memory performance of verbal material (letters). Reciting reduced the retention of verbal material substantially whereas it affected the memory performance of visuospatial locations to a smaller degree. We suggest that the selection of task-relevant information from LTM for the execution of overlearned multi-step tasks recruits domain-specific WM. PMID:24847304
An embedded multi-core parallel model for real-time stereo imaging
NASA Astrophysics Data System (ADS)
He, Wenjing; Hu, Jian; Niu, Jingyu; Li, Chuanrong; Liu, Guangyu
2018-04-01
The real-time processing based on embedded system will enhance the application capability of stereo imaging for LiDAR and hyperspectral sensor. The task partitioning and scheduling strategies for embedded multiprocessor system starts relatively late, compared with that for PC computer. In this paper, aimed at embedded multi-core processing platform, a parallel model for stereo imaging is studied and verified. After analyzing the computing amount, throughout capacity and buffering requirements, a two-stage pipeline parallel model based on message transmission is established. This model can be applied to fast stereo imaging for airborne sensors with various characteristics. To demonstrate the feasibility and effectiveness of the parallel model, a parallel software was designed using test flight data, based on the 8-core DSP processor TMS320C6678. The results indicate that the design performed well in workload distribution and had a speed-up ratio up to 6.4.
Multi-Core Processor Memory Contention Benchmark Analysis Case Study
NASA Technical Reports Server (NTRS)
Simon, Tyler; McGalliard, James
2009-01-01
Multi-core processors dominate current mainframe, server, and high performance computing (HPC) systems. This paper provides synthetic kernel and natural benchmark results from an HPC system at the NASA Goddard Space Flight Center that illustrate the performance impacts of multi-core (dual- and quad-core) vs. single core processor systems. Analysis of processor design, application source code, and synthetic and natural test results all indicate that multi-core processors can suffer from significant memory subsystem contention compared to similar single-core processors.
DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.
Schmollinger, Martin; Nieselt, Kay; Kaufmann, Michael; Morgenstern, Burkhard
2004-09-09
Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristics by splitting up sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. By distributing sub-routines to multiple processors, the running time of DIALIGN can be crucially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
Fault-Tolerant, Real-Time, Multi-Core Computer System
NASA Technical Reports Server (NTRS)
Gostelow, Kim P.
2012-01-01
A document discusses a fault-tolerant, self-aware, low-power, multi-core computer for space missions with thousands of simple cores, achieving speed through concurrency. The proposed machine decides how to achieve concurrency in real time, rather than depending on programmers. The driving features of the system are simple hardware that is modular in the extreme, with no shared memory, and software with significant runtime reorganizing capability. The document describes a mechanism for moving ongoing computations and data that is based on a functional model of execution. Because there is no shared memory, the processor connects to its neighbors through a high-speed data link. Messages are sent to a neighbor switch, which in turn forwards that message on to its neighbor until reaching the intended destination. Except for the neighbor connections, processors are isolated and independent of each other. The processors on the periphery also connect chip-to-chip, thus building up a large processor net. There is no particular topology to the larger net, as a function at each processor allows it to forward a message in the correct direction. Some chip-to-chip connections are not necessarily nearest neighbors, providing short cuts for some of the longer physical distances. The peripheral processors also provide the connections to sensors, actuators, radios, science instruments, and other devices with which the computer system interacts.
A highly efficient multi-core algorithm for clustering extremely large datasets
2010-01-01
Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorial SNP data. Our new shared memory parallel algorithms show to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
NASA Astrophysics Data System (ADS)
Bellerby, Tim
2015-04-01
PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks < number of processors) or tasks are divided out among the available processors (number of tasks > number of processors). Nested parallel statements may further subdivide the processor set owned by a given task. Tasks or processors are distributed evenly by default, but uneven distributions are possible under programmer control. It is also possible to explicitly enable child tasks to migrate within the processor set owned by their parent task, reducing load unbalancing at the potential cost of increased inter-processor message traffic. PM incorporates some programming structures from the earlier MIST language presented at a previous EGU General Assembly, while adopting a significantly different underlying parallelisation model and type system. PM code is available at www.pm-lang.org under an unrestrictive MIT license. Reference Ruymán Reyes, Antonio J. Dorta, Francisco Almeida, Francisco de Sande, 2009. Automatic Hybrid MPI+OpenMP Code Generation with llc, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science Volume 5759, 185-195
NASA Astrophysics Data System (ADS)
Lapotre, Vianney; Gogniat, Guy; Baghdadi, Amer; Diguet, Jean-Philippe
2017-12-01
The multiplication of connected devices goes along with a large variety of applications and traffic types needing diverse requirements. Accompanying this connectivity evolution, the last years have seen considerable evolutions of wireless communication standards in the domain of mobile telephone networks, local/wide wireless area networks, and Digital Video Broadcasting (DVB). In this context, intensive research has been conducted to provide flexible turbo decoder targeting high throughput, multi-mode, multi-standard, and power consumption efficiency. However, flexible turbo decoder implementations have not often considered dynamic reconfiguration issues in this context that requires high speed configuration switching. Starting from this assessment, this paper proposes the first solution that allows frame-by-frame run-time configuration management of a multi-processor turbo decoder without compromising the decoding performances.
Full-Authority Fault-Tolerant Electronic Engine Control System for Variable Cycle Engines.
1982-04-01
single internally self-checked VLSI micro - processor . The selected configuration is an externally checked pair of com- mercially available...Electronic Engine Control FPMH Failures per Million Hours FTMP Fault Tolerant Multi- Processor FTSC Fault Tolerant Spaceborn Computer GRAMP Generalized...Removal * MTBR Mean Time Between Repair MTTF Mean Time to Failure xiii List of Abbreviations (continued) - NH High Pressure Rotor Speed O&S Operating
Crosetto, D.B.
1996-12-31
The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor to a plurality of slave processors to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor`s status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer, a digital signal processor, a parallel transfer controller, and two three-port memory devices. A communication switch within each node connects it to a fast parallel hardware channel through which all high density data arrives or leaves the node. 6 figs.
Interactive high-resolution isosurface ray casting on multicore processors.
Wang, Qin; JaJa, Joseph
2008-01-01
We present a new method for the interactive rendering of isosurfaces using ray casting on multi-core processors. This method consists of a combination of an object-order traversal that coarsely identifies possible candidate 3D data blocks for each small set of contiguous pixels, and an isosurface ray casting strategy tailored for the resulting limited-size lists of candidate 3D data blocks. While static screen partitioning is widely used in the literature, our scheme performs dynamic allocation of groups of ray casting tasks to ensure almost equal loads among the different threads running on multi-cores while maintaining spatial locality. We also make careful use of memory management environment commonly present in multi-core processors. We test our system on a two-processor Clovertown platform, each consisting of a Quad-Core 1.86-GHz Intel Xeon Processor, for a number of widely different benchmarks. The detailed experimental results show that our system is efficient and scalable, and achieves high cache performance and excellent load balancing, resulting in an overall performance that is superior to any of the previous algorithms. In fact, we achieve an interactive isosurface rendering on a 1024(2) screen for all the datasets tested up to the maximum size of the main memory of our platform.
CMS Readiness for Multi-Core Workload Scheduling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perez-Calero Yzquierdo, A.; Balcas, J.; Hernandez, J.
In the present run of the LHC, CMS data reconstruction and simulation algorithms benefit greatly from being executed as multiple threads running on several processor cores. The complexity of the Run 2 events requires parallelization of the code to reduce the memory-per- core footprint constraining serial execution programs, thus optimizing the exploitation of present multi-core processor architectures. The allocation of computing resources for multi-core tasks, however, becomes a complex problem in itself. The CMS workload submission infrastructure employs multi-slot partitionable pilots, built on HTCondor and GlideinWMS native features, to enable scheduling of single and multi-core jobs simultaneously. This provides amore » solution for the scheduling problem in a uniform way across grid sites running a diversity of gateways to compute resources and batch system technologies. This paper presents this strategy and the tools on which it has been implemented. The experience of managing multi-core resources at the Tier-0 and Tier-1 sites during 2015, along with the deployment phase to Tier-2 sites during early 2016 is reported. The process of performance monitoring and optimization to achieve efficient and flexible use of the resources is also described.« less
CMS readiness for multi-core workload scheduling
NASA Astrophysics Data System (ADS)
Perez-Calero Yzquierdo, A.; Balcas, J.; Hernandez, J.; Aftab Khan, F.; Letts, J.; Mason, D.; Verguilov, V.
2017-10-01
In the present run of the LHC, CMS data reconstruction and simulation algorithms benefit greatly from being executed as multiple threads running on several processor cores. The complexity of the Run 2 events requires parallelization of the code to reduce the memory-per- core footprint constraining serial execution programs, thus optimizing the exploitation of present multi-core processor architectures. The allocation of computing resources for multi-core tasks, however, becomes a complex problem in itself. The CMS workload submission infrastructure employs multi-slot partitionable pilots, built on HTCondor and GlideinWMS native features, to enable scheduling of single and multi-core jobs simultaneously. This provides a solution for the scheduling problem in a uniform way across grid sites running a diversity of gateways to compute resources and batch system technologies. This paper presents this strategy and the tools on which it has been implemented. The experience of managing multi-core resources at the Tier-0 and Tier-1 sites during 2015, along with the deployment phase to Tier-2 sites during early 2016 is reported. The process of performance monitoring and optimization to achieve efficient and flexible use of the resources is also described.
Using Multi-Core Systems for Rover Autonomy
NASA Technical Reports Server (NTRS)
Clement, Brad; Estlin, Tara; Bornstein, Benjamin; Springer, Paul; Anderson, Robert C.
2010-01-01
Task Objectives are: (1) Develop and demonstrate key capabilities for rover long-range science operations using multi-core computing, (a) Adapt three rover technologies to execute on SOA multi-core processor (b) Illustrate performance improvements achieved (c) Demonstrate adapted capabilities with rover hardware, (2) Targeting three high-level autonomy technologies (a) Two for onboard data analysis (b) One for onboard command sequencing/planning, (3) Technologies identified as enabling for future missions, (4)Benefits will be measured along several metrics: (a) Execution time / Power requirements (b) Number of data products processed per unit time (c) Solution quality
GPU accelerated dynamic functional connectivity analysis for functional MRI data.
Akgün, Devrim; Sakoğlu, Ünal; Esquivel, Johnny; Adinoff, Bryon; Mete, Mutlu
2015-07-01
Recent advances in multi-core processors and graphics card based computational technologies have paved the way for an improved and dynamic utilization of parallel computing techniques. Numerous applications have been implemented for the acceleration of computationally-intensive problems in various computational science fields including bioinformatics, in which big data problems are prevalent. In neuroimaging, dynamic functional connectivity (DFC) analysis is a computationally demanding method used to investigate dynamic functional interactions among different brain regions or networks identified with functional magnetic resonance imaging (fMRI) data. In this study, we implemented and analyzed a parallel DFC algorithm based on thread-based and block-based approaches. The thread-based approach was designed to parallelize DFC computations and was implemented in both Open Multi-Processing (OpenMP) and Compute Unified Device Architecture (CUDA) programming platforms. Another approach developed in this study to better utilize CUDA architecture is the block-based approach, where parallelization involves smaller parts of fMRI time-courses obtained by sliding-windows. Experimental results showed that the proposed parallel design solutions enabled by the GPUs significantly reduce the computation time for DFC analysis. Multicore implementation using OpenMP on 8-core processor provides up to 7.7× speed-up. GPU implementation using CUDA yielded substantial accelerations ranging from 18.5× to 157× speed-up once thread-based and block-based approaches were combined in the analysis. Proposed parallel programming solutions showed that multi-core processor and CUDA-supported GPU implementations accelerated the DFC analyses significantly. Developed algorithms make the DFC analyses more practical for multi-subject studies with more dynamic analyses. Copyright © 2015 Elsevier Ltd. All rights reserved.
Crosetto, Dario B.
1996-01-01
The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node (100) connects it to a fast parallel hardware channel (70) through which all high density data arrives or leaves the node.
Application of Advanced Multi-Core Processor Technologies to Oceanographic Research
2013-09-30
STM32 NXP LPC series No Proprietary Microchip PIC32/DSPIC No > 500 mW; < 5 W ARM Cortex TI OMAP TI Sitara Broadcom BCM2835 Varies FPGA...1 DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. Application of Advanced Multi-Core Processor Technologies...state-of-the-art information processing architectures. OBJECTIVES Next-generation processor architectures (multi-core, multi-threaded) hold the
System, methods and apparatus for program optimization for multi-threaded processor architectures
Bastoul, Cedric; Lethin, Richard A; Leung, Allen K; Meister, Benoit J; Szilagyi, Peter; Vasilache, Nicolas T; Wohlford, David E
2015-01-06
Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for parallelism, locality of operations and contiguity of memory accesses on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.
Multiprocessing on supercomputers for computational aerodynamics
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; Mehta, Unmeel B.
1990-01-01
Very little use is made of multiple processors available on current supercomputers (computers with a theoretical peak performance capability equal to 100 MFLOPs or more) in computational aerodynamics to significantly improve turnaround time. The productivity of a computer user is directly related to this turnaround time. In a time-sharing environment, the improvement in this speed is achieved when multiple processors are used efficiently to execute an algorithm. The concept of multiple instructions and multiple data (MIMD) through multi-tasking is applied via a strategy which requires relatively minor modifications to an existing code for a single processor. Essentially, this approach maps the available memory to multiple processors, exploiting the C-FORTRAN-Unix interface. The existing single processor code is mapped without the need for developing a new algorithm. The procedure for building a code utilizing this approach is automated with the Unix stream editor. As a demonstration of this approach, a Multiple Processor Multiple Grid (MPMG) code is developed. It is capable of using nine processors, and can be easily extended to a larger number of processors. This code solves the three-dimensional, Reynolds averaged, thin-layer and slender-layer Navier-Stokes equations with an implicit, approximately factored and diagonalized method. The solver is applied to generic oblique-wing aircraft problem on a four processor Cray-2 computer. A tricubic interpolation scheme is developed to increase the accuracy of coupling of overlapped grids. For the oblique-wing aircraft problem, a speedup of two in elapsed (turnaround) time is observed in a saturated time-sharing environment.
A parallel algorithm for generation and assembly of finite element stiffness and mass matrices
NASA Technical Reports Server (NTRS)
Storaasli, O. O.; Carmona, E. A.; Nguyen, D. T.; Baddourah, M. A.
1991-01-01
A new algorithm is proposed for parallel generation and assembly of the finite element stiffness and mass matrices. The proposed assembly algorithm is based on a node-by-node approach rather than the more conventional element-by-element approach. The new algorithm's generality and computation speed-up when using multiple processors are demonstrated for several practical applications on multi-processor Cray Y-MP and Cray 2 supercomputers.
High speed real-time wavefront processing system for a solid-state laser system
NASA Astrophysics Data System (ADS)
Liu, Yuan; Yang, Ping; Chen, Shanqiu; Ma, Lifang; Xu, Bing
2008-03-01
A high speed real-time wavefront processing system for a solid-state laser beam cleanup system has been built. This system consists of a core2 Industrial PC (IPC) using Linux and real-time Linux (RT-Linux) operation system (OS), a PCI image grabber, a D/A card. More often than not, the phase aberrations of the output beam from solid-state lasers vary fast with intracavity thermal effects and environmental influence. To compensate the phase aberrations of solid-state lasers successfully, a high speed real-time wavefront processing system is presented. Compared to former systems, this system can improve the speed efficiently. In the new system, the acquisition of image data, the output of control voltage data and the implementation of reconstructor control algorithm are treated as real-time tasks in kernel-space, the display of wavefront information and man-machine conversation are treated as non real-time tasks in user-space. The parallel processing of real-time tasks in Symmetric Multi Processors (SMP) mode is the main strategy of improving the speed. In this paper, the performance and efficiency of this wavefront processing system are analyzed. The opened-loop experimental results show that the sampling frequency of this system is up to 3300Hz, and this system can well deal with phase aberrations from solid-state lasers.
Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Villa, Oreste; Tumeo, Antonino; Secchi, Simone
Irregular applications, such as data mining and analysis or graph-based computations, show unpredictable memory/network access patterns and control structures. Highly multithreaded architectures with large processor counts, like the Cray MTA-1, MTA-2 and XMT, appear to address their requirements better than commodity clusters. However, the research on highly multithreaded systems is currently limited by the lack of adequate architectural simulation infrastructures due to issues such as size of the machines, memory footprint, simulation speed, accuracy and customization. At the same time, Shared-memory MultiProcessors (SMPs) with multi-core processors have become an attractive platform to simulate large scale machines. In this paper, wemore » introduce a cycle-level simulator of the highly multithreaded Cray XMT supercomputer. The simulator runs unmodified XMT applications. We discuss how we tackled the challenges posed by its development, detailing the techniques introduced to make the simulation as fast as possible while maintaining a high accuracy. By mapping XMT processors (ThreadStorm with 128 hardware threads) to host computing cores, the simulation speed remains constant as the number of simulated processors increases, up to the number of available host cores. The simulator supports zero-overhead switching among different accuracy levels at run-time and includes a network model that takes into account contention. On a modern 48-core SMP host, our infrastructure simulates a large set of irregular applications 500 to 2000 times slower than real time when compared to a 128-processor XMT, while remaining within 10\\% of accuracy. Emulation is only from 25 to 200 times slower than real time.« less
Parallel and fault-tolerant algorithms for hypercube multiprocessors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aykanat, C.
1988-01-01
Several techniques for increasing the performance of parallel algorithms on distributed-memory message-passing multi-processor systems are investigated. These techniques are effectively implemented for the parallelization of the Scaled Conjugate Gradient (SCG) algorithm on a hypercube connected message-passing multi-processor. Significant performance improvement is achieved by using these techniques. The SCG algorithm is used for the solution phase of an FE modeling system. Almost linear speed-up is achieved, and it is shown that hypercube topology is scalable for an FE class of problem. The SCG algorithm is also shown to be suitable for vectorization, and near supercomputer performance is achieved on a vectormore » hypercube multiprocessor by exploiting both parallelization and vectorization. Fault-tolerance issues for the parallel SCG algorithm and for the hypercube topology are also addressed.« less
System Architecture For High Speed Sorting Of Potatoes
NASA Astrophysics Data System (ADS)
Marchant, J. A.; Onyango, C. M.; Street, M. J.
1989-03-01
This paper illustrates an industrial application of vision processing in which potatoes are sorted according to their size and shape at speeds of up to 40 objects per second. The result is a multi-processing approach built around the VME bus. A hardware unit has been designed and constructed to encode the boundary of the potatoes, to reducing the amount of data to be processed. A master 68000 processor is used to control this unit and to handle data transfers along the bus. Boundary data is passed to one of three 68010 slave processors each responsible for a line of potatoes across a conveyor belt. The slave processors calculate attributes such as shape, size and estimated weight of each potato and the master processor uses this data to operate the sorting mechanism. The system has been interfaced with a commercial grading machine and performance trials are now in progress.
Simplifying and speeding the management of intra-node cache coherence
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Phillip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY
2012-04-17
A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
Gonzales, Joaquin U; James, C Roger; Yang, Hyung Suk; Jensen, Daniel; Atkins, Lee; Al-Khalil, Kareem; O'Boyle, Michael
2017-05-01
Central arterial hemodynamics is associated with cognitive impairment. Reductions in gait speed during walking while performing concurrent tasks known as dual-tasking (DT) or multi-tasking (MT) is thought to reflect the cognitive cost that exceeds neural capacity to share resources. We hypothesized that central vascular function would associate with decrements in gait speed during DT or MT. Gait speed was measured using a motion capture system in 56 women (30-80y) without mild-cognitive impairment. Dual-tasking was considered walking at a fast-pace while balancing a tray. Multi-tasking was the DT condition plus subtracting by serial 7's. Applanation tonometry was used for measurement of aortic stiffness and central pulse pressure. Doppler-ultrasound was used to measure blood flow velocity and β-stiffness index in the common carotid artery. The percent change in gait speed was larger for MT than DT (14.1±11.2 vs. 8.7±9.6%, p <0.01). Tertiles were formed based on the percent change in gait speed for each condition. No vascular parameters differed across tertiles for DT. In contrast, carotid flow pulsatility (1.85±0.43 vs. 1.47±0.42, p=0.02) and resistance (0.75±0.07 vs. 0.68±0.07, p=0.01) indices were higher in women with more decrement (third tertile) as compared to women with less decrement (first tertile) in gait speed during MT after adjusting for age, gait speed, and task error. Carotid pulse pressure and β-stiffness did not contribute to these tertile differences. Elevated carotid flow pulsatility and resistance are characteristics found in healthy women that show lower cognitive capacity to walk and perform multiple concurrent tasks. Copyright © 2017 Elsevier B.V. All rights reserved.
Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aaby, Brandon G; Perumalla, Kalyan S; Seal, Sudip K
2010-01-01
An effective latency-hiding mechanism is presented in the parallelization of agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs. communication trade-off continuum available with the deep computational and memory hierarchies of extant platforms and present a novel analytical model of the tradeoff. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple, networked graphical processing units (GPUs), and pthreads on multi-core processors. Messagemore » Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, delivering as much as over 100-fold improvement in runtime for certain benchmark ABMS application scenarios with several million agents. This speed improvement is obtained on our system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular simulator in Java. Thus, the overall execution of our current work is over four orders of magnitude faster when executed on multiple GPUs.« less
Anandakrishnan, Ramu; Scogland, Tom R W; Fenley, Andrew T; Gordon, John C; Feng, Wu-chun; Onufriev, Alexey V
2010-06-01
Tools that compute and visualize biomolecular electrostatic surface potential have been used extensively for studying biomolecular function. However, determining the surface potential for large biomolecules on a typical desktop computer can take days or longer using currently available tools and methods. Two commonly used techniques to speed-up these types of electrostatic computations are approximations based on multi-scale coarse-graining and parallelization across multiple processors. This paper demonstrates that for the computation of electrostatic surface potential, these two techniques can be combined to deliver significantly greater speed-up than either one separately, something that is in general not always possible. Specifically, the electrostatic potential computation, using an analytical linearized Poisson-Boltzmann (ALPB) method, is approximated using the hierarchical charge partitioning (HCP) multi-scale method, and parallelized on an ATI Radeon 4870 graphical processing unit (GPU). The implementation delivers a combined 934-fold speed-up for a 476,040 atom viral capsid, compared to an equivalent non-parallel implementation on an Intel E6550 CPU without the approximation. This speed-up is significantly greater than the 42-fold speed-up for the HCP approximation alone or the 182-fold speed-up for the GPU alone. Copyright (c) 2010 Elsevier Inc. All rights reserved.
Halvarsson, Alexandra; Franzén, Erika; Ståhle, Agneta
2015-04-01
To evaluate the effects of a balance training program including dual- and multi-task exercises on fall-related self-efficacy, fear of falling, gait and balance performance, and physical function in older adults with osteoporosis with an increased risk of falling and to evaluate whether additional physical activity would further improve the effects. Randomized controlled trial, including three groups: two intervention groups (Training, or Training+Physical activity) and one Control group, with a 12-week follow-up. Stockholm County, Sweden. Ninety-six older adults, aged 66-87, with verified osteoporosis. A specific and progressive balance training program including dual- and multi-task three times/week for 12 weeks, and physical activity for 30 minutes, three times/week. Fall-related self-efficacy (Falls Efficacy Scale-International), fear of falling (single-item question - 'In general, are you afraid of falling?'), gait speed with and without a cognitive dual-task at preferred pace and fast walking (GAITRite®), balance performance tests (one-leg stance, and modified figure of eight), and physical function (Late-Life Function and Disability Instrument). Both intervention groups significantly improved their fall-related self-efficacy as compared to the controls (p ≤ 0.034, 4 points) and improved their balance performance. Significant differences over time and between groups in favour of the intervention groups were found for walking speed with a dual-task (p=0.003), at fast walking speed (p=0.008), and for advanced lower extremity physical function (p=0.034). This balance training program, including dual- and multi-task, improves fall-related self-efficacy, gait speed, balance performance, and physical function in older adults with osteoporosis. © The Author(s) 2014.
Network Coding on Heterogeneous Multi-Core Processors for Wireless Sensor Networks
Kim, Deokho; Park, Karam; Ro, Won W.
2011-01-01
While network coding is well known for its efficiency and usefulness in wireless sensor networks, the excessive costs associated with decoding computation and complexity still hinder its adoption into practical use. On the other hand, high-performance microprocessors with heterogeneous multi-cores would be used as processing nodes of the wireless sensor networks in the near future. To this end, this paper introduces an efficient network coding algorithm developed for the heterogenous multi-core processors. The proposed idea is fully tested on one of the currently available heterogeneous multi-core processors referred to as the Cell Broadband Engine. PMID:22164053
Multi-Source Multi-Target Dictionary Learning for Prediction of Cognitive Decline.
Zhang, Jie; Li, Qingyang; Caselli, Richard J; Thompson, Paul M; Ye, Jieping; Wang, Yalin
2017-06-01
Alzheimer's Disease (AD) is the most common type of dementia. Identifying correct biomarkers may determine pre-symptomatic AD subjects and enable early intervention. Recently, Multi-task sparse feature learning has been successfully applied to many computer vision and biomedical informatics researches. It aims to improve the generalization performance by exploiting the shared features among different tasks. However, most of the existing algorithms are formulated as a supervised learning scheme. Its drawback is with either insufficient feature numbers or missing label information. To address these challenges, we formulate an unsupervised framework for multi-task sparse feature learning based on a novel dictionary learning algorithm. To solve the unsupervised learning problem, we propose a two-stage Multi-Source Multi-Target Dictionary Learning (MMDL) algorithm. In stage 1, we propose a multi-source dictionary learning method to utilize the common and individual sparse features in different time slots. In stage 2, supported by a rigorous theoretical analysis, we develop a multi-task learning method to solve the missing label problem. Empirical studies on an N = 3970 longitudinal brain image data set, which involves 2 sources and 5 targets, demonstrate the improved prediction accuracy and speed efficiency of MMDL in comparison with other state-of-the-art algorithms.
Development of land based radar polarimeter processor system
NASA Technical Reports Server (NTRS)
Kronke, C. W.; Blanchard, A. J.
1983-01-01
The processing subsystem of a land based radar polarimeter was designed and constructed. This subsystem is labeled the remote data acquisition and distribution system (RDADS). The radar polarimeter, an experimental remote sensor, incorporates the RDADS to control all operations of the sensor. The RDADS uses industrial standard components including an 8-bit microprocessor based single board computer, analog input/output boards, a dynamic random access memory board, and power supplis. A high-speed digital electronics board was specially designed and constructed to control range-gating for the radar. A complete system of software programs was developed to operate the RDADS. The software uses a powerful real time, multi-tasking, executive package as an operating system. The hardware and software used in the RDADS are detailed. Future system improvements are recommended.
Cheung, Kit; Schultz, Simon R; Luk, Wayne
2015-01-01
NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current or conductance based neuronal models such as integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and can achieve a real-time performance of 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times the speed of an 8-core processor, or 2.83 times the speed of GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation.
Cheung, Kit; Schultz, Simon R.; Luk, Wayne
2016-01-01
NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current or conductance based neuronal models such as integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and can achieve a real-time performance of 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times the speed of an 8-core processor, or 2.83 times the speed of GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation. PMID:26834542
Multi-Source Multi-Target Dictionary Learning for Prediction of Cognitive Decline
Zhang, Jie; Li, Qingyang; Caselli, Richard J.; Thompson, Paul M.; Ye, Jieping; Wang, Yalin
2017-01-01
Alzheimer’s Disease (AD) is the most common type of dementia. Identifying correct biomarkers may determine pre-symptomatic AD subjects and enable early intervention. Recently, Multi-task sparse feature learning has been successfully applied to many computer vision and biomedical informatics researches. It aims to improve the generalization performance by exploiting the shared features among different tasks. However, most of the existing algorithms are formulated as a supervised learning scheme. Its drawback is with either insufficient feature numbers or missing label information. To address these challenges, we formulate an unsupervised framework for multi-task sparse feature learning based on a novel dictionary learning algorithm. To solve the unsupervised learning problem, we propose a two-stage Multi-Source Multi-Target Dictionary Learning (MMDL) algorithm. In stage 1, we propose a multi-source dictionary learning method to utilize the common and individual sparse features in different time slots. In stage 2, supported by a rigorous theoretical analysis, we develop a multi-task learning method to solve the missing label problem. Empirical studies on an N = 3970 longitudinal brain image data set, which involves 2 sources and 5 targets, demonstrate the improved prediction accuracy and speed efficiency of MMDL in comparison with other state-of-the-art algorithms. PMID:28943731
Optimization of the coherence function estimation for multi-core central processing unit
NASA Astrophysics Data System (ADS)
Cheremnov, A. G.; Faerman, V. A.; Avramchuk, V. S.
2017-02-01
The paper considers use of parallel processing on multi-core central processing unit for optimization of the coherence function evaluation arising in digital signal processing. Coherence function along with other methods of spectral analysis is commonly used for vibration diagnosis of rotating machinery and its particular nodes. An algorithm is given for the function evaluation for signals represented with digital samples. The algorithm is analyzed for its software implementation and computational problems. Optimization measures are described, including algorithmic, architecture and compiler optimization, their results are assessed for multi-core processors from different manufacturers. Thus, speeding-up of the parallel execution with respect to sequential execution was studied and results are presented for Intel Core i7-4720HQ и AMD FX-9590 processors. The results show comparatively high efficiency of the optimization measures taken. In particular, acceleration indicators and average CPU utilization have been significantly improved, showing high degree of parallelism of the constructed calculating functions. The developed software underwent state registration and will be used as a part of a software and hardware solution for rotating machinery fault diagnosis and pipeline leak location with acoustic correlation method.
Concurrent Probabilistic Simulation of High Temperature Composite Structural Response
NASA Technical Reports Server (NTRS)
Abdi, Frank
1996-01-01
A computational structural/material analysis and design tool which would meet industry's future demand for expedience and reduced cost is presented. This unique software 'GENOA' is dedicated to parallel and high speed analysis to perform probabilistic evaluation of high temperature composite response of aerospace systems. The development is based on detailed integration and modification of diverse fields of specialized analysis techniques and mathematical models to combine their latest innovative capabilities into a commercially viable software package. The technique is specifically designed to exploit the availability of processors to perform computationally intense probabilistic analysis assessing uncertainties in structural reliability analysis and composite micromechanics. The primary objectives which were achieved in performing the development were: (1) Utilization of the power of parallel processing and static/dynamic load balancing optimization to make the complex simulation of structure, material and processing of high temperature composite affordable; (2) Computational integration and synchronization of probabilistic mathematics, structural/material mechanics and parallel computing; (3) Implementation of an innovative multi-level domain decomposition technique to identify the inherent parallelism, and increasing convergence rates through high- and low-level processor assignment; (4) Creating the framework for Portable Paralleled architecture for the machine independent Multi Instruction Multi Data, (MIMD), Single Instruction Multi Data (SIMD), hybrid and distributed workstation type of computers; and (5) Market evaluation. The results of Phase-2 effort provides a good basis for continuation and warrants Phase-3 government, and industry partnership.
Implementing TCP/IP and a socket interface as a server in a message-passing operating system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hipp, E.; Wiltzius, D.
1990-03-01
The UNICOS 4.3BSD network code and socket transport interface are the basis of an explicit network server for NLTSS, a message passing operating system on the Cray YMP. A BSD socket user library provides access to the network server using an RPC mechanism. The advantages of this server methodology are its modularity and extensibility to migrate to future protocol suites (e.g. OSI) and transport interfaces. In addition, the network server is implemented in an explicit multi-tasking environment to take advantage of the Cray YMP multi-processor platform. 19 refs., 5 figs.
Genetic Parallel Programming: design and implementation.
Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong
2006-01-01
This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.
Multi-stage learning aids applied to hands-on software training.
Rother, Kristian; Rother, Magdalena; Pleus, Alexandra; Upmeier zu Belzen, Annette
2010-11-01
Delivering hands-on tutorials on bioinformatics software and web applications is a challenging didactic scenario. The main reason is that trainees have heterogeneous backgrounds, different previous knowledge and vary in learning speed. In this article, we demonstrate how multi-stage learning aids can be used to allow all trainees to progress at a similar speed. In this technique, the trainees can utilize cards with hints and answers to guide themselves self-dependently through a complex task. We have successfully conducted a tutorial for the molecular viewer PyMOL using two sets of learning aid cards. The trainees responded positively, were able to complete the task, and the trainer had spare time to respond to individual questions. This encourages us to conclude that multi-stage learning aids overcome many disadvantages of established forms of hands-on software training.
T-L Plane Abstraction-Based Energy-Efficient Real-Time Scheduling for Multi-Core Wireless Sensors.
Kim, Youngmin; Lee, Ki-Seong; Pham, Ngoc-Son; Lee, Sun-Ro; Lee, Chan-Gun
2016-07-08
Energy efficiency is considered as a critical requirement for wireless sensor networks. As more wireless sensor nodes are equipped with multi-cores, there are emerging needs for energy-efficient real-time scheduling algorithms. The T-L plane-based scheme is known to be an optimal global scheduling technique for periodic real-time tasks on multi-cores. Unfortunately, there has been a scarcity of studies on extending T-L plane-based scheduling algorithms to exploit energy-saving techniques. In this paper, we propose a new T-L plane-based algorithm enabling energy-efficient real-time scheduling on multi-core sensor nodes with dynamic power management (DPM). Our approach addresses the overhead of processor mode transitions and reduces fragmentations of the idle time, which are inherent in T-L plane-based algorithms. Our experimental results show the effectiveness of the proposed algorithm compared to other energy-aware scheduling methods on T-L plane abstraction.
NASA Technical Reports Server (NTRS)
Jacklin, S. A.; Leyland, J. A.; Warmbrodt, W.
1985-01-01
Modern control systems must typically perform real-time identification and control, as well as coordinate a host of other activities related to user interaction, online graphics, and file management. This paper discusses five global design considerations which are useful to integrate array processor, multimicroprocessor, and host computer system architectures into versatile, high-speed controllers. Such controllers are capable of very high control throughput, and can maintain constant interaction with the nonreal-time or user environment. As an application example, the architecture of a high-speed, closed-loop controller used to actively control helicopter vibration is briefly discussed. Although this system has been designed for use as the controller for real-time rotorcraft dynamics and control studies in a wind tunnel environment, the controller architecture can generally be applied to a wide range of automatic control applications.
Multitask neurovision processor with extensive feedback and feedforward connections
NASA Astrophysics Data System (ADS)
Gupta, Madan M.; Knopf, George K.
1991-11-01
A multi-task neuro-vision parameter which performs a variety of information processing operations associated with the early stages of biological vision is presented. The network architecture of this neuro-vision processor, called the positive-negative (PN) neural processor, is loosely based on the neural activity fields exhibited by thalamic and cortical nervous tissue layers. The computational operation performed by the processor arises from the strength of the recurrent feedback among the numerous positive and negative neural computing units. By adjusting the feedback connections it is possible to generate diverse dynamic behavior that may be used for short-term visual memory (STVM), spatio-temporal filtering (STF), and pulse frequency modulation (PFM). The information attributes that are to be processes may be regulated by modifying the feedforward connections from the signal space to the neural processor.
Managing coherence via put/get windows
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY
2011-01-11
A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
Managing coherence via put/get windows
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY
2012-02-21
A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
Scheldrup, Melissa; Greenwood, Pamela M.; McKendrick, Ryan; Strohl, Jon; Bikson, Marom; Alam, Mahtab; McKinley, R. Andy; Parasuraman, Raja
2014-01-01
There is a need to facilitate acquisition of real world cognitive multi-tasks that require long periods of training (e.g., air traffic control, intelligence analysis, medicine). Non-invasive brain stimulation—specifically transcranial Direct Current Stimulation (tDCS)—has promise as a method to speed multi-task training. We hypothesized that during acquisition of the complex multi-task Space Fortress, subtasks that require focused attention on ship control would benefit from tDCS aimed at the dorsal attention network while subtasks that require redirection of attention would benefit from tDCS aimed at the right hemisphere ventral attention network. We compared effects of 30 min prefrontal and parietal stimulation to right and left hemispheres on subtask performance during the first 45 min of training. The strongest effects both overall and for ship flying (control and velocity subtasks) were seen with a right parietal (C4, reference to left shoulder) montage, shown by modeling to induce an electric field that includes nodes in both dorsal and ventral attention networks. This is consistent with the re-orienting hypothesis that the ventral attention network is activated along with the dorsal attention network if a new, task-relevant event occurs while visuospatial attention is focused (Corbetta et al., 2008). No effects were seen with anodes over sites that stimulated only dorsal (C3) or only ventral (F10) attention networks. The speed subtask (update memory for symbols) benefited from an F9 anode over left prefrontal cortex. These results argue for development of tDCS as a training aid in real world settings where multi-tasking is critical. PMID:25249958
Scheldrup, Melissa; Greenwood, Pamela M; McKendrick, Ryan; Strohl, Jon; Bikson, Marom; Alam, Mahtab; McKinley, R Andy; Parasuraman, Raja
2014-01-01
There is a need to facilitate acquisition of real world cognitive multi-tasks that require long periods of training (e.g., air traffic control, intelligence analysis, medicine). Non-invasive brain stimulation-specifically transcranial Direct Current Stimulation (tDCS)-has promise as a method to speed multi-task training. We hypothesized that during acquisition of the complex multi-task Space Fortress, subtasks that require focused attention on ship control would benefit from tDCS aimed at the dorsal attention network while subtasks that require redirection of attention would benefit from tDCS aimed at the right hemisphere ventral attention network. We compared effects of 30 min prefrontal and parietal stimulation to right and left hemispheres on subtask performance during the first 45 min of training. The strongest effects both overall and for ship flying (control and velocity subtasks) were seen with a right parietal (C4, reference to left shoulder) montage, shown by modeling to induce an electric field that includes nodes in both dorsal and ventral attention networks. This is consistent with the re-orienting hypothesis that the ventral attention network is activated along with the dorsal attention network if a new, task-relevant event occurs while visuospatial attention is focused (Corbetta et al., 2008). No effects were seen with anodes over sites that stimulated only dorsal (C3) or only ventral (F10) attention networks. The speed subtask (update memory for symbols) benefited from an F9 anode over left prefrontal cortex. These results argue for development of tDCS as a training aid in real world settings where multi-tasking is critical.
DOE Office of Scientific and Technical Information (OSTI.GOV)
C. Cuevas, B. Raydo, H. Dong, A. Gupta, F.J. Barbosa, J. Wilson, W.M. Taylor, E. Jastrzembski, D. Abbott
We will demonstrate a hardware and firmware solution for a complete fully pipelined multi-crate trigger system that takes advantage of the elegant high speed VXS serial extensions for VME. This trigger system includes three sections starting with the front end crate trigger processor (CTP), a global Sub-System Processor (SSP) and a Trigger Supervisor that manages the timing, synchronization and front end event readout. Within a front end crate, trigger information is gathered from each 16 Channel, 12 bit Flash ADC module at 4 nS intervals via the VXS backplane, to a Crate Trigger Processor (CTP). Each Crate Trigger Processor receivesmore » these 500 MB/S VXS links from the 16 FADC-250 modules, aligns skewed data inherent of Aurora protocol, and performs real time crate level trigger algorithms. The algorithm results are encoded using a Reed-Solomon technique and transmission of this Level 1 trigger data is sent to the SSP using a multi-fiber link. The multi-fiber link achieves an aggregate trigger data transfer rate to the global trigger at 8 Gb/s. The SSP receives and decodes Reed-Solomon error correcting transmission from each crate, aligns the data, and performs the global level trigger algorithms. The entire trigger system is synchronous and operates at 250 MHz with the Trigger Supervisor managing not only the front end event readout, but also the distribution of the critical timing clocks, synchronization signals, and the global trigger signals to each front end readout crate. These signals are distributed to the front end crates on a separate fiber link and each crate is synchronized using a unique encoding scheme to guarantee that each front end crate is synchronous with a fixed latency, independent of the distance between each crate. The overall trigger signal latency is <3 uS, and the proposed 12GeV experiments at Jefferson Lab require up to 200KHz Level 1 trigger rate.« less
Real-time correction of beamforming time delay errors in abdominal ultrasound imaging
NASA Astrophysics Data System (ADS)
Rigby, K. W.
2000-04-01
The speed of sound varies with tissue type, yet commercial ultrasound imagers assume a constant sound speed. Sound speed variation in abdominal fat and muscle layers is widely believed to be largely responsible for poor contrast and resolution in some patients. The simplest model of the abdominal wall assumes that it adds a spatially varying time delay to the ultrasound wavefront. The adequacy of this model is controversial. We describe an adaptive imaging system consisting of a GE LOGIQ 700 imager connected to a multi- processor computer. Arrival time errors for each beamforming channel, estimated by correlating each channel signal with the beamsummed signal, are used to correct the imager's beamforming time delays at the acoustic frame rate. A multi- row transducer provides two-dimensional sampling of arrival time errors. We observe significant improvement in abdominal images of healthy male volunteers: increased contrast of blood vessels, increased visibility of the renal capsule, and increased brightness of the liver.
Dynamic Task Allocation in Multi-Hop Multimedia Wireless Sensor Networks with Low Mobility
Jin, Yichao; Vural, Serdar; Gluhak, Alexander; Moessner, Klaus
2013-01-01
This paper presents a task allocation-oriented framework to enable efficient in-network processing and cost-effective multi-hop resource sharing for dynamic multi-hop multimedia wireless sensor networks with low node mobility, e.g., pedestrian speeds. The proposed system incorporates a fast task reallocation algorithm to quickly recover from possible network service disruptions, such as node or link failures. An evolutional self-learning mechanism based on a genetic algorithm continuously adapts the system parameters in order to meet the desired application delay requirements, while also achieving a sufficiently long network lifetime. Since the algorithm runtime incurs considerable time delay while updating task assignments, we introduce an adaptive window size to limit the delay periods and ensure an up-to-date solution based on node mobility patterns and device processing capabilities. To the best of our knowledge, this is the first study that yields multi-objective task allocation in a mobile multi-hop wireless environment under dynamic conditions. Simulations are performed in various settings, and the results show considerable performance improvement in extending network lifetime compared to heuristic mechanisms. Furthermore, the proposed framework provides noticeable reduction in the frequency of missing application deadlines. PMID:24135992
A universal computer control system for motors
NASA Technical Reports Server (NTRS)
Szakaly, Zoltan F. (Inventor)
1991-01-01
A control system for a multi-motor system such as a space telerobot, having a remote computational node and a local computational node interconnected with one another by a high speed data link is described. A Universal Computer Control System (UCCS) for the telerobot is located at each node. Each node is provided with a multibus computer system which is characterized by a plurality of processors with all processors being connected to a common bus, and including at least one command processor. The command processor communicates over the bus with a plurality of joint controller cards. A plurality of direct current torque motors, of the type used in telerobot joints and telerobot hand-held controllers, are connected to the controller cards and responds to digital control signals from the command processor. Essential motor operating parameters are sensed by analog sensing circuits and the sensed analog signals are converted to digital signals for storage at the controller cards where such signals can be read during an address read/write cycle of the command processing processor.
Stamatakis, Alexandros; Ott, Michael
2008-12-27
The continuous accumulation of sequence data, for example, due to novel wet-laboratory techniques such as pyrosequencing, coupled with the increasing popularity of multi-gene phylogenies and emerging multi-core processor architectures that face problems of cache congestion, poses new challenges with respect to the efficient computation of the phylogenetic maximum-likelihood (ML) function. Here, we propose two approaches that can significantly speed up likelihood computations that typically represent over 95 per cent of the computational effort conducted by current ML or Bayesian inference programs. Initially, we present a method and an appropriate data structure to efficiently compute the likelihood score on 'gappy' multi-gene alignments. By 'gappy' we denote sampling-induced gaps owing to missing sequences in individual genes (partitions), i.e. not real alignment gaps. A first proof-of-concept implementation in RAXML indicates that this approach can accelerate inferences on large and gappy alignments by approximately one order of magnitude. Moreover, we present insights and initial performance results on multi-core architectures obtained during the transition from an OpenMP-based to a Pthreads-based fine-grained parallelization of the ML function.
Multi-level Hierarchical Poly Tree computer architectures
NASA Technical Reports Server (NTRS)
Padovan, Joe; Gute, Doug
1990-01-01
Based on the concept of hierarchical substructuring, this paper develops an optimal multi-level Hierarchical Poly Tree (HPT) parallel computer architecture scheme which is applicable to the solution of finite element and difference simulations. Emphasis is given to minimizing computational effort, in-core/out-of-core memory requirements, and the data transfer between processors. In addition, a simplified communications network that reduces the number of I/O channels between processors is presented. HPT configurations that yield optimal superlinearities are also demonstrated. Moreover, to generalize the scope of applicability, special attention is given to developing: (1) multi-level reduction trees which provide an orderly/optimal procedure by which model densification/simplification can be achieved, as well as (2) methodologies enabling processor grading that yields architectures with varying types of multi-level granularity.
Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Secchi, Simone; Tumeo, Antonino; Villa, Oreste
Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems where a large virtually-shared address space is mapped on a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit performance scaling of such architectures. Modern high-performance computing DSM systems have evolved toward exploitation of massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoid network hot-spots and enable high scaling. In order to model the performance of such large-scale machines, parallel simulation has been proved to be a promising approach to achieve good accuracy inmore » reasonable times. One of the most critical factors in solving the simulation speed-accuracy trade-off is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class, since it implements a globally-shared address space abstraction on top of a physically distributed memory substrate. In this paper, we discuss the development of a contention-aware network model intended to be integrated in a full-system XMT simulator. We start by measuring the effects of network contention in a 128-processor XMT machine and then investigate the trade-off that exists between simulation accuracy and speed, by comparing three network models which operate at different levels of accuracy. The comparison and model validation is performed by executing a string-matching algorithm on the full-system simulator and on the XMT, using three datasets that generate noticeably different contention patterns.« less
Experiments with a Parallel Multi-Objective Evolutionary Algorithm for Scheduling
NASA Technical Reports Server (NTRS)
Brown, Matthew; Johnston, Mark D.
2013-01-01
Evolutionary multi-objective algorithms have great potential for scheduling in those situations where tradeoffs among competing objectives represent a key requirement. One challenge, however, is runtime performance, as a consequence of evolving not just a single schedule, but an entire population, while attempting to sample the Pareto frontier as accurately and uniformly as possible. The growing availability of multi-core processors in end user workstations, and even laptops, has raised the question of the extent to which such hardware can be used to speed up evolutionary algorithms. In this paper we report on early experiments in parallelizing a Generalized Differential Evolution (GDE) algorithm for scheduling long-range activities on NASA's Deep Space Network. Initial results show that significant speedups can be achieved, but that performance does not necessarily improve as more cores are utilized. We describe our preliminary results and some initial suggestions from parallelizing the GDE algorithm. Directions for future work are outlined.
An Application-Based Performance Characterization of the Columbia Supercluster
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Djomehri, Jahed M.; Hood, Robert; Jin, Hoaqiang; Kiris, Cetin; Saini, Subhash
2005-01-01
Columbia is a 10,240-processor supercluster consisting of 20 Altix nodes with 512 processors each, and currently ranked as the second-fastest computer in the world. In this paper, we present the performance characteristics of Columbia obtained on up to four computing nodes interconnected via the InfiniBand and/or NUMAlink4 communication fabrics. We evaluate floating-point performance, memory bandwidth, message passing communication speeds, and compilers using a subset of the HPC Challenge benchmarks, and some of the NAS Parallel Benchmarks including the multi-zone versions. We present detailed performance results for three scientific applications of interest to NASA, one from molecular dynamics, and two from computational fluid dynamics. Our results show that both the NUMAlink4 and the InfiniBand hold promise for application scaling to a large number of processors.
Managing coherence via put/get windows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Blumrich, Matthias A; Chen, Dong; Coteus, Paul W
A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an areamore » of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.« less
Real-time video compressing under DSP/BIOS
NASA Astrophysics Data System (ADS)
Chen, Qiu-ping; Li, Gui-ju
2009-10-01
This paper presents real-time MPEG-4 Simple Profile video compressing based on the DSP processor. The programming framework of video compressing is constructed using TMS320C6416 Microprocessor, TDS510 simulator and PC. It uses embedded real-time operating system DSP/BIOS and the API functions to build periodic function, tasks and interruptions etcs. Realize real-time video compressing. To the questions of data transferring among the system. Based on the architecture of the C64x DSP, utilized double buffer switched and EDMA data transfer controller to transit data from external memory to internal, and realize data transition and processing at the same time; the architecture level optimizations are used to improve software pipeline. The system used DSP/BIOS to realize multi-thread scheduling. The whole system realizes high speed transition of a great deal of data. Experimental results show the encoder can realize real-time encoding of 768*576, 25 frame/s video images.
Equalizer: a scalable parallel rendering framework.
Eilemann, Stefan; Makhinya, Maxim; Pajarola, Renato
2009-01-01
Continuing improvements in CPU and GPU performances as well as increasing multi-core processor and cluster-based parallelism demand for flexible and scalable parallel rendering solutions that can exploit multipipe hardware accelerated graphics. In fact, to achieve interactive visualization, scalable rendering systems are essential to cope with the rapid growth of data sets. However, parallel rendering systems are non-trivial to develop and often only application specific implementations have been proposed. The task of developing a scalable parallel rendering framework is even more difficult if it should be generic to support various types of data and visualization applications, and at the same time work efficiently on a cluster with distributed graphics cards. In this paper we introduce a novel system called Equalizer, a toolkit for scalable parallel rendering based on OpenGL which provides an application programming interface (API) to develop scalable graphics applications for a wide range of systems ranging from large distributed visualization clusters and multi-processor multipipe graphics systems to single-processor single-pipe desktop machines. We describe the system architecture, the basic API, discuss its advantages over previous approaches, present example configurations and usage scenarios as well as scalability results.
Li, Xiangyu; Xie, Nijie; Tian, Xinyue
2017-01-01
This paper proposes a scheduling and power management solution for energy harvesting heterogeneous multi-core WSN node SoC such that the system continues to operate perennially and uses the harvested energy efficiently. The solution consists of a heterogeneous multi-core system oriented task scheduling algorithm and a low-complexity dynamic workload scaling and configuration optimization algorithm suitable for light-weight platforms. Moreover, considering the power consumption of most WSN applications have the characteristic of data dependent behavior, we introduce branches handling mechanism into the solution as well. The experimental result shows that the proposed algorithm can operate in real-time on a lightweight embedded processor (MSP430), and that it can make a system do more valuable works and make more than 99.9% use of the power budget. PMID:28208730
Li, Xiangyu; Xie, Nijie; Tian, Xinyue
2017-02-08
This paper proposes a scheduling and power management solution for energy harvesting heterogeneous multi-core WSN node SoC such that the system continues to operate perennially and uses the harvested energy efficiently. The solution consists of a heterogeneous multi-core system oriented task scheduling algorithm and a low-complexity dynamic workload scaling and configuration optimization algorithm suitable for light-weight platforms. Moreover, considering the power consumption of most WSN applications have the characteristic of data dependent behavior, we introduce branches handling mechanism into the solution as well. The experimental result shows that the proposed algorithm can operate in real-time on a lightweight embedded processor (MSP430), and that it can make a system do more valuable works and make more than 99.9% use of the power budget.
Distributed micro-radar system for detection and tracking of low-profile, low-altitude targets
NASA Astrophysics Data System (ADS)
Gorwara, Ashok; Molchanov, Pavlo
2016-05-01
Proposed airborne surveillance radar system can detect, locate, track, and classify low-profile, low-altitude targets: from traditional fixed and rotary wing aircraft to non-traditional targets like unmanned aircraft systems (drones) and even small projectiles. Distributed micro-radar system is the next step in the development of passive monopulse direction finder proposed by Stephen E. Lipsky in the 80s. To extend high frequency limit and provide high sensitivity over the broadband of frequencies, multiple angularly spaced directional antennas are coupled with front end circuits and separately connected to a direction finder processor by a digital interface. Integration of antennas with front end circuits allows to exclude waveguide lines which limits system bandwidth and creates frequency dependent phase errors. Digitizing of received signals proximate to antennas allows loose distribution of antennas and dramatically decrease phase errors connected with waveguides. Accuracy of direction finding in proposed micro-radar in this case will be determined by time accuracy of digital processor and sampling frequency. Multi-band, multi-functional antennas can be distributed around the perimeter of a Unmanned Aircraft System (UAS) and connected to the processor by digital interface or can be distributed between swarm/formation of mini/micro UAS and connected wirelessly. Expendable micro-radars can be distributed by perimeter of defense object and create multi-static radar network. Low-profile, lowaltitude, high speed targets, like small projectiles, create a Doppler shift in a narrow frequency band. This signal can be effectively filtrated and detected with high probability. Proposed micro-radar can work in passive, monostatic or bistatic regime.
T-L Plane Abstraction-Based Energy-Efficient Real-Time Scheduling for Multi-Core Wireless Sensors
Kim, Youngmin; Lee, Ki-Seong; Pham, Ngoc-Son; Lee, Sun-Ro; Lee, Chan-Gun
2016-01-01
Energy efficiency is considered as a critical requirement for wireless sensor networks. As more wireless sensor nodes are equipped with multi-cores, there are emerging needs for energy-efficient real-time scheduling algorithms. The T-L plane-based scheme is known to be an optimal global scheduling technique for periodic real-time tasks on multi-cores. Unfortunately, there has been a scarcity of studies on extending T-L plane-based scheduling algorithms to exploit energy-saving techniques. In this paper, we propose a new T-L plane-based algorithm enabling energy-efficient real-time scheduling on multi-core sensor nodes with dynamic power management (DPM). Our approach addresses the overhead of processor mode transitions and reduces fragmentations of the idle time, which are inherent in T-L plane-based algorithms. Our experimental results show the effectiveness of the proposed algorithm compared to other energy-aware scheduling methods on T-L plane abstraction. PMID:27399722
Multi-Objective Optimization for Trustworthy Tactical Networks: A Survey and Insights
2013-06-01
existing data sources, gathering and maintaining the data needed , and completing and reviewing the collection of information. Send comments regarding...problems: using repeated cooperative games [12], hedonic games [25], and nontransferable utility cooperative games [27]. It should be noted that trust...examined an optimal task allocation problem in a distributed computing system where program modules need to be allocated to different processors to
Spacewire on Earth orbiting scatterometers
NASA Technical Reports Server (NTRS)
Bachmann, Alex; Lang, Minh; Lux, James; Steffke, Richard
2002-01-01
The need for a high speed, reliable and easy to implement communication link has led to the development of a space flight oriented version of IEEE 1355 called SpaceWire. SpaceWire is based on high-speed (200 Mbps) serial point-to-point links using Low Voltage Differential Signaling (LVDS). SpaceWIre has provisions for routing messages between a large network of processors, using wormhole routing for low overhead and latency. {additionally, there are available space qualified hybrids, which provide the Link layer to the user's bus}. A test bed of multiple digital signal processor breadboards, demonstrating the ability to meet signal processing requirements for an orbiting scatterometer has been implemented using three Astrium MCM-DSPs, each breadboard consists of a Multi Chip Module (MCM) that combines a space qualified Digital Signal Processor and peripherals, including IEEE-1355 links. With the addition of appropriate physical layer interfaces and software on the DSP, the SpaceWire link is used to communicate between processors on the test bed, e.g. sending timing references, commands, status, and science data among the processors. Results are presented on development issues surrounding the use of SpaceWire in this environment, from physical layer implementation (cables, connectors, LVDS drivers) to diagnostic tools, driver firmware, and development methodology. The tools, methods, and hardware, software challenges and preliminary performance are investigated and discussed.
Options for Parallelizing a Planning and Scheduling Algorithm
NASA Technical Reports Server (NTRS)
Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin D.
2011-01-01
Space missions have a growing interest in putting multi-core processors onboard spacecraft. For many missions processing power significantly slows operations. We investigate how continual planning and scheduling algorithms can exploit multi-core processing and outline different potential design decisions for a parallelized planning architecture. This organization of choices and challenges helps us with an initial design for parallelizing the CASPER planning system for a mesh multi-core processor. This work extends that presented at another workshop with some preliminary results.
Switch for serial or parallel communication networks
Crosette, D.B.
1994-07-19
A communication switch apparatus and a method for use in a geographically extensive serial, parallel or hybrid communication network linking a multi-processor or parallel processing system has a very low software processing overhead in order to accommodate random burst of high density data. Associated with each processor is a communication switch. A data source and a data destination, a sensor suite or robot for example, may also be associated with a switch. The configuration of the switches in the network are coordinated through a master processor node and depends on the operational phase of the multi-processor network: data acquisition, data processing, and data exchange. The master processor node passes information on the state to be assumed by each switch to the processor node associated with the switch. The processor node then operates a series of multi-state switches internal to each communication switch. The communication switch does not parse and interpret communication protocol and message routing information. During a data acquisition phase, the communication switch couples sensors producing data to the processor node associated with the switch, to a downlink destination on the communications network, or to both. It also may couple an uplink data source to its processor node. During the data exchange phase, the switch couples its processor node or an uplink data source to a downlink destination (which may include a processor node or a robot), or couples an uplink source to its processor node and its processor node to a downlink destination. 9 figs.
Switch for serial or parallel communication networks
Crosette, Dario B.
1994-01-01
A communication switch apparatus and a method for use in a geographically extensive serial, parallel or hybrid communication network linking a multi-processor or parallel processing system has a very low software processing overhead in order to accommodate random burst of high density data. Associated with each processor is a communication switch. A data source and a data destination, a sensor suite or robot for example, may also be associated with a switch. The configuration of the switches in the network are coordinated through a master processor node and depends on the operational phase of the multi-processor network: data acquisition, data processing, and data exchange. The master processor node passes information on the state to be assumed by each switch to the processor node associated with the switch. The processor node then operates a series of multi-state switches internal to each communication switch. The communication switch does not parse and interpret communication protocol and message routing information. During a data acquisition phase, the communication switch couples sensors producing data to the processor node associated with the switch, to a downlink destination on the communications network, or to both. It also may couple an uplink data source to its processor node. During the data exchange phase, the switch couples its processor node or an uplink data source to a downlink destination (which may include a processor node or a robot), or couples an uplink source to its processor node and its processor node to a downlink destination.
Development of Parallel Code for the Alaska Tsunami Forecast Model
NASA Astrophysics Data System (ADS)
Bahng, B.; Knight, W. R.; Whitmore, P.
2014-12-01
The Alaska Tsunami Forecast Model (ATFM) is a numerical model used to forecast propagation and inundation of tsunamis generated by earthquakes and other means in both the Pacific and Atlantic Oceans. At the U.S. National Tsunami Warning Center (NTWC), the model is mainly used in a pre-computed fashion. That is, results for hundreds of hypothetical events are computed before alerts, and are accessed and calibrated with observations during tsunamis to immediately produce forecasts. ATFM uses the non-linear, depth-averaged, shallow-water equations of motion with multiply nested grids in two-way communications between domains of each parent-child pair as waves get closer to coastal waters. Even with the pre-computation the task becomes non-trivial as sub-grid resolution gets finer. Currently, the finest resolution Digital Elevation Models (DEM) used by ATFM are 1/3 arc-seconds. With a serial code, large or multiple areas of very high resolution can produce run-times that are unrealistic even in a pre-computed approach. One way to increase the model performance is code parallelization used in conjunction with a multi-processor computing environment. NTWC developers have undertaken an ATFM code-parallelization effort to streamline the creation of the pre-computed database of results with the long term aim of tsunami forecasts from source to high resolution shoreline grids in real time. Parallelization will also permit timely regeneration of the forecast model database with new DEMs; and, will make possible future inclusion of new physics such as the non-hydrostatic treatment of tsunami propagation. The purpose of our presentation is to elaborate on the parallelization approach and to show the compute speed increase on various multi-processor systems.
Chatterjee, Siddhartha [Yorktown Heights, NY; Gunnels, John A [Brewster, NY
2011-11-08
A method and structure of distributing elements of an array of data in a computer memory to a specific processor of a multi-dimensional mesh of parallel processors includes designating a distribution of elements of at least a portion of the array to be executed by specific processors in the multi-dimensional mesh of parallel processors. The pattern of the designating includes a cyclical repetitive pattern of the parallel processor mesh, as modified to have a skew in at least one dimension so that both a row of data in the array and a column of data in the array map to respective contiguous groupings of the processors such that a dimension of the contiguous groupings is greater than one.
A Hybrid Task Graph Scheduler for High Performance Image Processing Workflows.
Blattner, Timothy; Keyrouz, Walid; Bhattacharyya, Shuvra S; Halem, Milton; Brady, Mary
2017-12-01
Designing applications for scalability is key to improving their performance in hybrid and cluster computing. Scheduling code to utilize parallelism is difficult, particularly when dealing with data dependencies, memory management, data motion, and processor occupancy. The Hybrid Task Graph Scheduler (HTGS) improves programmer productivity when implementing hybrid workflows for multi-core and multi-GPU systems. The Hybrid Task Graph Scheduler (HTGS) is an abstract execution model, framework, and API that increases programmer productivity when implementing hybrid workflows for such systems. HTGS manages dependencies between tasks, represents CPU and GPU memories independently, overlaps computations with disk I/O and memory transfers, keeps multiple GPUs occupied, and uses all available compute resources. Through these abstractions, data motion and memory are explicit; this makes data locality decisions more accessible. To demonstrate the HTGS application program interface (API), we present implementations of two example algorithms: (1) a matrix multiplication that shows how easily task graphs can be used; and (2) a hybrid implementation of microscopy image stitching that reduces code size by ≈ 43% compared to a manually coded hybrid workflow implementation and showcases the minimal overhead of task graphs in HTGS. Both of the HTGS-based implementations show good performance. In image stitching the HTGS implementation achieves similar performance to the hybrid workflow implementation. Matrix multiplication with HTGS achieves 1.3× and 1.8× speedup over the multi-threaded OpenBLAS library for 16k × 16k and 32k × 32k size matrices, respectively.
Backend Control Processor for a Multi-Processor Relational Database Computer System.
1984-12-01
SCHOOL OF ENGI. UNCRSIFID MPONTIFF DEC 84 AFXT/GCS/ENG/84D-22 F/O 9/2 L ommhhhhmhhml mhhhommhhhhhm i-2 8 -- U0. 11111= Q. 2 111.8IIII- 1111111..6...THESIS Presented to the Faculty of the School of Engineering of the Air Force Institute of Technology Air University In Partial Fulfillment of the...development of a Backend Multi-Processor Relational Database Computer System. This thesis addresses a single component of this system, the Backend Control
L2 Word Recognition: Influence of L1 Orthography on Multi-syllabic Word Recognition.
Hamada, Megumi
2017-10-01
L2 reading research suggests that L1 orthographic experience influences L2 word recognition. Nevertheless, the findings on multi-syllabic words in English are still limited despite the fact that a vast majority of words are multi-syllabic. The study investigated whether L1 orthography influences the recognition of multi-syllabic words, focusing on the position of an embedded word. The participants were Arabic ESL learners, Chinese ESL learners, and native speakers of English. The task was a word search task, in which the participants identified a target word embedded in a pseudoword at the initial, middle, or final position. The search accuracy and speed indicated that all groups showed a strong preference for the initial position. The accuracy data further indicated group differences. The Arabic group showed higher accuracy in the final than middle, while the Chinese group showed the opposite and the native speakers showed no difference between the two positions. The findings suggest that L2 multi-syllabic word recognition involves unique processes.
Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets.
Scharfe, Michael; Pielot, Rainer; Schreiber, Falk
2010-01-11
Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. The results demonstrate that the CBE processor in a PlayStation 3 accelerates computational intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low cost CBE-based platform offers an efficient option to conventional hardware to solve computational problems in image processing and bioinformatics.
Some Issues in Programming Multi-Mini-Processors
1975-01-01
Hardware ^nd software are to be combined optimally to perform that specialized task. This in essence is the stategy followed by the BBN group in...large memory is directly addressable. MIXED SOLUTIONS The most promising approach appears to involve mixing several of the previous solutions...mini- or micro-computers. Possibly the problem will be solved by avoiding it. Some new minis are appearing on the market now with large physical
Iwao, Yasunori; Kimura, Shin-Ichiro; Ishida, Masayuki; Mise, Ryohei; Yamada, Masaki; Namiki, Noriyuki; Noguchi, Shuji; Itai, Shigeru
2015-01-01
The manufacture of highly drug-loaded fine globular granules eventually applied for orally disintegrating tablets has been investigated using a unique multi-functional rotor processor with acetaminophen, which was used as a model drug substance. Experimental design and statistical analysis were used to evaluate potential relationships between three key operating parameters (i.e., the binder flow rate, atomization pressure and rotating speed) and a series of associated micromeritics (i.e., granule mean size, proportion of fine particles (106-212 µm), flowability, roundness and water content). The results of multiple linear regression analysis revealed several trends, including (1) the binder flow rate and atomization pressure had significant positive and negative effects on the granule mean size value, Carr's flowability index, granular roundness and water content, respectively; (2) the proportion of fine particles was positively affected by the product of interaction between the binder flow rate and atomization pressure; and (3) the granular roundness was negatively and positively affected by the product of interactions between the binder flow rate and the atomization pressure, and the binder flow rate and rotating speed, respectively. The results of this study led to the identification of optimal operating conditions for the preparation of granules, and could therefore be used to provide important information for the development of processes for the manufacture of highly drug-loaded fine globular granules.
Benchmarking NWP Kernels on Multi- and Many-core Processors
NASA Astrophysics Data System (ADS)
Michalakes, J.; Vachharajani, M.
2008-12-01
Increased computing power for weather, climate, and atmospheric science has provided direct benefits for defense, agriculture, the economy, the environment, and public welfare and convenience. Today, very large clusters with many thousands of processors are allowing scientists to move forward with simulations of unprecedented size. But time-critical applications such as real-time forecasting or climate prediction need strong scaling: faster nodes and processors, not more of them. Moreover, the need for good cost- performance has never been greater, both in terms of performance per watt and per dollar. For these reasons, the new generations of multi- and many-core processors being mass produced for commercial IT and "graphical computing" (video games) are being scrutinized for their ability to exploit the abundant fine- grain parallelism in atmospheric models. We present results of our work to date identifying key computational kernels within the dynamics and physics of a large community NWP model, the Weather Research and Forecast (WRF) model. We benchmark and optimize these kernels on several different multi- and many-core processors. The goals are to (1) characterize and model performance of the kernels in terms of computational intensity, data parallelism, memory bandwidth pressure, memory footprint, etc. (2) enumerate and classify effective strategies for coding and optimizing for these new processors, (3) assess difficulties and opportunities for tool or higher-level language support, and (4) establish a continuing set of kernel benchmarks that can be used to measure and compare effectiveness of current and future designs of multi- and many-core processors for weather and climate applications.
Energy-Efficient Scheduling for Hybrid Tasks in Control Devices for the Internet of Things
Gao, Zhigang; Wu, Yifan; Dai, Guojun; Xia, Haixia
2012-01-01
In control devices for the Internet of Things (IoT), energy is one of the critical restriction factors. Dynamic voltage scaling (DVS) has been proved to be an effective method for reducing the energy consumption of processors. This paper proposes an energy-efficient scheduling algorithm for IoT control devices with hard real-time control tasks (HRCTs) and soft real-time tasks (SRTs). The main contribution of this paper includes two parts. First, it builds the Hybrid tasks with multi-subtasks of different function Weight (HoW) task model for IoT control devices. HoW describes the structure of HRCTs and SRTs, and their properties, e.g., deadlines, execution time, preemption properties, and energy-saving goals, etc. Second, it presents the Hybrid Tasks' Dynamic Voltage Scaling (HTDVS) algorithm. HTDVS first sets the slowdown factors of subtasks while meeting the different real-time requirements of HRCTs and SRTs, and then dynamically reclaims, reserves, and reuses the slack time of the subtasks to meet their ideal energy-saving goals. Experimental results show HTDVS can reduce energy consumption about 10%–80% while meeting the real-time requirements of HRCTs, HRCTs help to reduce the deadline miss ratio (DMR) of systems, and HTDVS has comparable performance with the greedy algorithm and is more favorable to keep the subtasks' ideal speeds. PMID:23112659
Benchmarking GNU Radio Kernels and Multi-Processor Scheduling
2013-01-14
AMD E350 APU , comparable to Atom • ARM Cortex A8 running on a Gumstix Overo on an Ettus USRP E110 The general testing procedure consists of • Build...Intel Atom, and the AMD E350 APU . 3.2 Multi-Processor Scheduling Figure 1: GFLOPs per second through an FFT array on an Intel i7. Example output from
NASA Technical Reports Server (NTRS)
Rincon, Rafael F.
2008-01-01
The reconfigurable L-Band radar is an ongoing development at NASA/GSFC that exploits the capability inherently in phased array radar systems with a state-of-the-art data acquisition and real-time processor in order to enable multi-mode measurement techniques in a single radar architecture. The development leverages on the L-Band Imaging Scatterometer, a radar system designed for the development and testing of new radar techniques; and the custom-built DBSAR processor, a highly reconfigurable, high speed data acquisition and processing system. The radar modes currently implemented include scatterometer, synthetic aperture radar, and altimetry; and plans to add new modes such as radiometry and bi-static GNSS signals are being formulated. This development is aimed at enhancing the radar remote sensing capabilities for airborne and spaceborne applications in support of Earth Science and planetary exploration This paper describes the design of the radar and processor systems, explains the operational modes, and discusses preliminary measurements and future plans.
Huntley, Andrew H; Zettel, John L; Vallis, Lori Ann
2016-01-01
A "reach and transport object" task that represents common activities of daily living may provide improved insight into dynamic postural stability and movement variability deficits in older adults compared to previous lean to reach and functional reach tests. Healthy young and older, community dwelling adults performed three same elevation object transport tasks and two multiple elevation object transport tasks under two self-selected speeds, self-paced and fast-paced. Dynamic postural stability and movement variability was quantified by whole-body center of mass motion. Older adults demonstrated significant decrements in frontal plane stability during the multiple elevation tasks while exhibiting the same movement variability as their younger counterparts, regardless of task speed. Interestingly, older adults did not exhibit a tradeoff in maneuverability in favour of maintaining stability throughout the tasks, as has previously been reported. In conclusion, the multi-planar, ecologically relevant tasks employed in the current study were specific enough to elucidate decrements in dynamic stability, and thus may be useful for assessing fall risk in older adults with suspected postural instability. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Microdot - A Four-Bit Microcontroller Designed for Distributed Low-End Computing in Satellites
NASA Astrophysics Data System (ADS)
2002-03-01
Many satellites are an integrated collection of sensors and actuators that require dedicated real-time control. For single processor systems, additional sensors require an increase in computing power and speed to provide the multi-tasking capability needed to service each sensor. Faster processors cost more and consume more power, which taxes a satellite's power resources and may lead to shorter satellite lifetimes. An alternative design approach is a distributed network of small and low power microcontrollers designed for space that handle the computing requirements of each individual sensor and actuator. The design of microdot, a four-bit microcontroller for distributed low-end computing, is presented. The design is based on previous research completed at the Space Electronics Branch, Air Force Research Laboratory (AFRL/VSSE) at Kirtland AFB, NM, and the Air Force Institute of Technology at Wright-Patterson AFB, OH. The Microdot has 29 instructions and a 1K x 4 instruction memory. The distributed computing architecture is based on the Philips Semiconductor I2C Serial Bus Protocol. A prototype was implemented and tested using an Altera Field Programmable Gate Array (FPGA). The prototype was operable to 9.1 MHz. The design was targeted for fabrication in a radiation-hardened-by-design gate-array cell library for the TSMC 0.35 micrometer CMOS process.
NASA Astrophysics Data System (ADS)
Hofierka, Jaroslav; Lacko, Michal; Zubal, Stanislav
2017-10-01
In this paper, we describe the parallelization of three complex and computationally intensive modules of GRASS GIS using the OpenMP application programming interface for multi-core computers. These include the v.surf.rst module for spatial interpolation, the r.sun module for solar radiation modeling and the r.sim.water module for water flow simulation. We briefly describe the functionality of the modules and parallelization approaches used in the modules. Our approach includes the analysis of the module's functionality, identification of source code segments suitable for parallelization and proper application of OpenMP parallelization code to create efficient threads processing the subtasks. We document the efficiency of the solutions using the airborne laser scanning data representing land surface in the test area and derived high-resolution digital terrain model grids. We discuss the performance speed-up and parallelization efficiency depending on the number of processor threads. The study showed a substantial increase in computation speeds on a standard multi-core computer while maintaining the accuracy of results in comparison to the output from original modules. The presented parallelization approach showed the simplicity and efficiency of the parallelization of open-source GRASS GIS modules using OpenMP, leading to an increased performance of this geospatial software on standard multi-core computers.
Sanbonmatsu, David M; Strayer, David L; Medeiros-Ward, Nathan; Watson, Jason M
2013-01-01
The present study examined the relationship between personality and individual differences in multi-tasking ability. Participants enrolled at the University of Utah completed measures of multi-tasking activity, perceived multi-tasking ability, impulsivity, and sensation seeking. In addition, they performed the Operation Span in order to assess their executive control and actual multi-tasking ability. The findings indicate that the persons who are most capable of multi-tasking effectively are not the persons who are most likely to engage in multiple tasks simultaneously. To the contrary, multi-tasking activity as measured by the Media Multitasking Inventory and self-reported cell phone usage while driving were negatively correlated with actual multi-tasking ability. Multi-tasking was positively correlated with participants' perceived ability to multi-task ability which was found to be significantly inflated. Participants with a strong approach orientation and a weak avoidance orientation--high levels of impulsivity and sensation seeking--reported greater multi-tasking behavior. Finally, the findings suggest that people often engage in multi-tasking because they are less able to block out distractions and focus on a singular task. Participants with less executive control--low scorers on the Operation Span task and persons high in impulsivity--tended to report higher levels of multi-tasking activity.
Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets
2010-01-01
Background Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. Results We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. Conclusions The results demonstrate that the CBE processor in a PlayStation 3 accelerates computational intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low cost CBE-based platform offers an efficient option to conventional hardware to solve computational problems in image processing and bioinformatics. PMID:20064262
Early MIMD experience on the CRAY X-MP
NASA Astrophysics Data System (ADS)
Rhoades, Clifford E.; Stevens, K. G.
1985-07-01
This paper describes some early experience with converting four physics simulation programs to the CRAY X-MP, a current Multiple Instruction, Multiple Data (MIMD) computer consisting of two processors each with an architecture similar to that of the CRAY-1. As a multi-processor, the CRAY X-MP together with the high speed Solid-state Storage Device (SSD) in an ideal machine upon which to study MIMD algorithms for solving the equations of mathematical physics because it is fast enough to run real problems. The computer programs used in this study are all FORTRAN versions of original production codes. They range in sophistication from a one-dimensional numerical simulation of collisionless plasma to a two-dimensional hydrodynamics code with heat flow to a couple of three-dimensional fluid dynamics codes with varying degrees of viscous modeling. Early research with a dual processor configuration has shown speed-ups ranging from 1.55 to 1.98. It has been observed that a few simple extensions to FORTRAN allow a typical programmer to achieve a remarkable level of efficiency. These extensions involve the concept of memory local to a concurrent subprogram and memory common to all concurrent subprograms.
Computer Aided Battery Engineering Consortium
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pesaran, Ahmad
A multi-national lab collaborative team was assembled that includes experts from academia and industry to enhance recently developed Computer-Aided Battery Engineering for Electric Drive Vehicles (CAEBAT)-II battery crush modeling tools and to develop microstructure models for electrode design - both computationally efficient. Task 1. The new Multi-Scale Multi-Domain model framework (GH-MSMD) provides 100x to 1,000x computation speed-up in battery electrochemical/thermal simulation while retaining modularity of particles and electrode-, cell-, and pack-level domains. The increased speed enables direct use of the full model in parameter identification. Task 2. Mechanical-electrochemical-thermal (MECT) models for mechanical abuse simulation were simultaneously coupled, enabling simultaneous modelingmore » of electrochemical reactions during the short circuit, when necessary. The interactions between mechanical failure and battery cell performance were studied, and the flexibility of the model for various batteries structures and loading conditions was improved. Model validation is ongoing to compare with test data from Sandia National Laboratories. The ABDT tool was established in ANSYS. Task 3. Microstructural modeling was conducted to enhance next-generation electrode designs. This 3- year project will validate models for a variety of electrodes, complementing Advanced Battery Research programs. Prototype tools have been developed for electrochemical simulation and geometric reconstruction.« less
NASA Astrophysics Data System (ADS)
Giusi, Giovanni; Liu, Scige J.; Galli, Emanuele; Di Giorgio, Anna M.; Farina, Maria; Vertolli, Nello; Di Lellis, Andrea M.
2016-07-01
In this paper we present the results of a series of performance tests carried out on a prototype board mounting the Cobham Gaisler GR712RC Dual Core LEON3FT processor. The aim was the characterization of the performances of the dual core processor when used for executing a highly demanding lossless compression task, acting on data segments continuously copied from the static memory to the processor RAM. The selection of the compression activity to evaluate the performances was driven by the possibility of a comparison with previously executed tests on the Cobham/Aeroflex Gaisler UT699 LEON3FT SPARC™ V8. The results of the test activity have shown a factor 1.6 of improvement with respect to the previous tests, which can easily be improved by adopting a faster onboard board clock, and provided indications on the best size of the data chunks to be used in the compression activity.
MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barhen, Jacob; Kerekes, Ryan A; ST Charles, Jesse Lee
2008-01-01
High-speed parallelization of common tasks holds great promise as a low-risk approach to achieving the significant increases in signal processing and computational performance required for next generation innovations in reconfigurable radio systems. Researchers at the Oak Ridge National Laboratory have been working on exploiting the parallelization offered by this emerging technology and applying it to a variety of problems. This paper will highlight recent experience with four different parallel processors applied to signal processing tasks that are directly relevant to signal processing required for SDR/CR waveforms. The first is the EnLight Optical Core Processor applied to matched filter (MF) correlationmore » processing via fast Fourier transform (FFT) of broadband Dopplersensitive waveforms (DSW) using active sonar arrays for target tracking. The second is the IBM CELL Broadband Engine applied to 2-D discrete Fourier transform (DFT) kernel for image processing and frequency domain processing. And the third is the NVIDIA graphical processor applied to document feature clustering. EnLight Optical Core Processor. Optical processing is inherently capable of high-parallelism that can be translated to very high performance, low power dissipation computing. The EnLight 256 is a small form factor signal processing chip (5x5 cm2) with a digital optical core that is being developed by an Israeli startup company. As part of its evaluation of foreign technology, ORNL's Center for Engineering Science Advanced Research (CESAR) had access to a precursor EnLight 64 Alpha hardware for a preliminary assessment of capabilities in terms of large Fourier transforms for matched filter banks and on applications related to Doppler-sensitive waveforms. This processor is optimized for array operations, which it performs in fixed-point arithmetic at the rate of 16 TeraOPS at 8-bit precision. This is approximately 1000 times faster than the fastest DSP available today. The optical core performs the matrix-vector multiplications, where the nominal matrix size is 256x256. The system clock is 125MHz. At each clock cycle, 128K multiply-and-add operations per second (OPS) are carried out, which yields a peak performance of 16 TeraOPS. IBM Cell Broadband Engine. The Cell processor is the extraordinary resulting product of 5 years of sustained, intensive R&D collaboration (involving over $400M investment) between IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and 8 synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit wide single instruction multiple data streams (SIMD). An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at the ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units. The ASER group is also employing the latest NVIDIA graphical processing units (GPUs) to accelerate clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.« less
Scheduling algorithms for automatic control systems for technological processes
NASA Astrophysics Data System (ADS)
Chernigovskiy, A. S.; Tsarev, R. Yu; Kapulin, D. V.
2017-01-01
Wide use of automatic process control systems and the usage of high-performance systems containing a number of computers (processors) give opportunities for creation of high-quality and fast production that increases competitiveness of an enterprise. Exact and fast calculations, control computation, and processing of the big data arrays - all of this requires the high level of productivity and, at the same time, minimum time of data handling and result receiving. In order to reach the best time, it is necessary not only to use computing resources optimally, but also to design and develop the software so that time gain will be maximal. For this purpose task (jobs or operations), scheduling techniques for the multi-machine/multiprocessor systems are applied. Some of basic task scheduling methods for the multi-machine process control systems are considered in this paper, their advantages and disadvantages come to light, and also some usage considerations, in case of the software for automatic process control systems developing, are made.
2016-05-07
REPORT DOCUMENTATION PAGE I . ... ... .. . ,...,.., ............. OMB No. 0704-0188 The public reporting burden for this collection of...Student Support for Appl ication of Advanced Multi- Core Processor N00014-12-1-0298 Technologies to Oceanographic Research Sb. GRANT NUMBER Sc...communications protocols (i.e. UART, I2C, and SPI), through the , ’ . handing off of the data to the server APis. By providing a common set of tools
Optimization of the Multi-Spectral Euclidean Distance Calculation for FPGA-based Spaceborne Systems
NASA Technical Reports Server (NTRS)
Cristo, Alejandro; Fisher, Kevin; Perez, Rosa M.; Martinez, Pablo; Gualtieri, Anthony J.
2012-01-01
Due to the high quantity of operations that spaceborne processing systems must carry out in space, new methodologies and techniques are being presented as good alternatives in order to free the main processor from work and improve the overall performance. These include the development of ancillary dedicated hardware circuits that carry out the more redundant and computationally expensive operations in a faster way, leaving the main processor free to carry out other tasks while waiting for the result. One of these devices is SpaceCube, a FPGA-based system designed by NASA. The opportunity to use FPGA reconfigurable architectures in space allows not only the optimization of the mission operations with hardware-level solutions, but also the ability to create new and improved versions of the circuits, including error corrections, once the satellite is already in orbit. In this work, we propose the optimization of a common operation in remote sensing: the Multi-Spectral Euclidean Distance calculation. For that, two different hardware architectures have been designed and implemented in a Xilinx Virtex-5 FPGA, the same model of FPGAs used by SpaceCube. Previous results have shown that the communications between the embedded processor and the circuit create a bottleneck that affects the overall performance in a negative way. In order to avoid this, advanced methods including memory sharing, Native Port Interface (NPI) connections and Data Burst Transfers have been used.
Study on fault diagnosis and load feedback control system of combine harvester
NASA Astrophysics Data System (ADS)
Li, Ying; Wang, Kun
2017-01-01
In order to timely gain working status parameters of operating parts in combine harvester and improve its operating efficiency, fault diagnosis and load feedback control system is designed. In the system, rotation speed sensors were used to gather these signals of forward speed and rotation speeds of intermediate shaft, conveying trough, tangential and longitudinal flow threshing rotors, grain conveying auger. Using C8051 single chip microcomputer (SCM) as processor for main control unit, faults diagnosis and forward speed control were carried through by rotation speed ratio analysis of each channel rotation speed and intermediate shaft rotation speed by use of multi-sensor fused fuzzy control algorithm, and these processing results would be sent to touch screen and display work status of combine harvester. Field trials manifest that fault monitoring and load feedback control system has good man-machine interaction and the fault diagnosis method based on rotation speed ratios has low false alarm rate, and the system can realize automation control of forward speed for combine harvester.
Eigensolution of finite element problems in a completely connected parallel architecture
NASA Technical Reports Server (NTRS)
Akl, Fred A.; Morel, Michael R.
1989-01-01
A parallel algorithm for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi)=(M)(phi)(omega), where (K) and (M) are of order N, and (omega) is of order q is presented. The parallel algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm has been successfully implemented on a tightly coupled multiple-instruction-multiple-data (MIMD) parallel processing computer, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor, or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macro-tasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18 and 3.61 are achieved on two, four, six and eight processors, respectively.
NASA Technical Reports Server (NTRS)
1997-01-01
In 1990, Avtec Systems, Inc. developed its first telemetry boards for Goddard Space Flight Center. Avtec products now include PC/AT, PCI and VME-based high speed I/O boards and turn-key systems. The most recent and most successful technology transfer from NASA to Avtec is the Programmable Telemetry Processor (PTP), a personal computer- based, multi-channel telemetry front-end processing system originally developed to support the NASA communication (NASCOM) network. The PTP performs data acquisition, real-time network transfer, and store and forward operations. There are over 100 PTP systems located in NASA facilities and throughout the world.
Investigation into the Use of Texturing for Real-Time Computer Animation.
1987-12-01
produce a rough polygon surface [7]. Research in the area of real time texturing has also been conducted. Using a specially designed multi-processor system ...Oka, Tsutsui, Ohba, Kurauchi and Tago have introduced real-time manipulation of texture mapped surfaces [8]. Using multi- processors, systems will...a call to the system function defpattern(n,size,mask) short n,size; short *mask, which takes as input an index to a system table of patterns, a
NASA Astrophysics Data System (ADS)
Urnes, James M., Sr.; Cushing, John; Bond, William E.; Nunes, Steve
1996-10-01
Fly-by-Light control systems offer higher performance for fighter and transport aircraft, with efficient fiber optic data transmission, electric control surface actuation, and multi-channel high capacity centralized processing combining to provide maximum aircraft flight control system handling qualities and safety. The key to efficient support for these vehicles is timely and accurate fault diagnostics of all control system components. These diagnostic tests are best conducted during flight when all facts relating to the failure are present. The resulting data can be used by the ground crew for efficient repair and turnaround of the aircraft, saving time and money in support costs. These difficult to diagnose (Cannot Duplicate) fault indications average 40 - 50% of maintenance activities on today's fighter and transport aircraft, adding significantly to fleet support cost. Fiber optic data transmission can support a wealth of data for fault monitoring; the most efficient method of fault diagnostics is accurate modeling of the component response under normal and failed conditions for use in comparison with the actual component flight data. Neural Network hardware processors offer an efficient and cost-effective method to install fault diagnostics in flight systems, permitting on-board diagnostic modeling of very complex subsystems. Task 2C of the ARPA FLASH program is a design demonstration of this diagnostics approach, using the very high speed computation of the Adaptive Solutions Neural Network processor to monitor an advanced Electrohydrostatic control surface actuator linked through a AS-1773A fiber optic bus. This paper describes the design approach and projected performance of this on-line diagnostics system.
AltiVec performance increases for autonomous robotics for the MARSSCAPE architecture program
NASA Astrophysics Data System (ADS)
Gothard, Benny M.
2002-02-01
One of the main tall poles that must be overcome to develop a fully autonomous vehicle is the inability of the computer to understand its surrounding environment to a level that is required for the intended task. The military mission scenario requires a robot to interact in a complex, unstructured, dynamic environment. Reference A High Fidelity Multi-Sensor Scene Understanding System for Autonomous Navigation The Mobile Autonomous Robot Software Self Composing Adaptive Programming Environment (MarsScape) perception research addresses three aspects of the problem; sensor system design, processing architectures, and algorithm enhancements. A prototype perception system has been demonstrated on robotic High Mobility Multi-purpose Wheeled Vehicle and All Terrain Vehicle testbeds. This paper addresses the tall pole of processing requirements and the performance improvements based on the selected MarsScape Processing Architecture. The processor chosen is the Motorola Altivec-G4 Power PC(PPC) (1998 Motorola, Inc.), a highly parallized commercial Single Instruction Multiple Data processor. Both derived perception benchmarks and actual perception subsystems code will be benchmarked and compared against previous Demo II-Semi-autonomous Surrogate Vehicle processing architectures along with desktop Personal Computers(PC). Performance gains are highlighted with progress to date, and lessons learned and future directions are described.
Parallel Multi-Step/Multi-Rate Integration of Two-Time Scale Dynamic Systems
NASA Technical Reports Server (NTRS)
Chang, Johnny T.; Ploen, Scott R.; Sohl, Garett. A,; Martin, Bryan J.
2004-01-01
Increasing demands on the fidelity of simulations for real-time and high-fidelity simulations are stressing the capacity of modern processors. New integration techniques are required that provide maximum efficiency for systems that are parallelizable. However many current techniques make assumptions that are at odds with non-cascadable systems. A new serial multi-step/multi-rate integration algorithm for dual-timescale continuous state systems is presented which applies to these systems, and is extended to a parallel multi-step/multi-rate algorithm. The superior performance of both algorithms is demonstrated through a representative example.
System and method to determine thermophysical properties of a multi-component gas
Morrow, Thomas B.; Behring, II, Kendricks A.
2003-08-05
A system and method to characterize natural gas hydrocarbons using a single inferential property, such as standard sound speed, when the concentrations of the diluent gases (e.g., carbon dioxide and nitrogen) are known. The system to determine a thermophysical property of a gas having a first plurality of components comprises a sound velocity measurement device, a concentration measurement device, and a processor to determine a thermophysical property as a function of a correlation between the thermophysical property, the speed of sound, and the concentration measurements, wherein the number of concentration measurements is less than the number of components in the gas. The method includes the steps of determining the speed of sound in the gas, determining a plurality of gas component concentrations in the gas, and determining the thermophysical property as a function of a correlation between the thermophysical property, the speed of sound, and the plurality of concentrations.
Analysis of high speed flow, thermal and structural interactions
NASA Technical Reports Server (NTRS)
Thornton, Earl A.
1994-01-01
Research for this grant focused on the following tasks: (1) the prediction of severe, localized aerodynamic heating for complex, high speed flows; (2) finite element adaptive refinement methodology for multi-disciplinary analyses; (3) the prediction of thermoviscoplastic structural response with rate-dependent effects and large deformations; (4) thermoviscoplastic constitutive models for metals; and (5) coolant flow/structural heat transfer analyses.
Highway Traffic Simulations on Multi-Processor Computers
DOT National Transportation Integrated Search
1997-01-01
A computer model has been developed to simulate highway traffic for various degrees of automation with a high degree of fidelity in regard to driver control and vehicle characteristics. The model simulates vehicle maneuvering in a multi-lane highway ...
Multi-Objective Optimization for Speed and Stability of a Sony AIBO Gait
2007-09-01
MULTI-OBJECTIVE OPTIMIZATION FOR SPEED AND STABILITY OF A SONY AIBO GAIT THESIS Christopher A. Patterson, Second Lieutenant, USAF AFIT/GCS...07-17 MULTI-OBJECTIVE OPTIMIZATION FOR SPEED AND STABILITY OF A SONY AIBO GAIT THESIS Presented to the Faculty Department of...MULTI-OBJECTIVE OPTIMIZATION FOR SPEED AND STABILITY OF A SONY AIBO GAIT Christopher A. Patterson, BS Second Lieutenant, USAF
RTEMS SMP and MTAPI for Efficient Multi-Core Space Applications on LEON3/LEON4 Processors
NASA Astrophysics Data System (ADS)
Cederman, Daniel; Hellstrom, Daniel; Sherrill, Joel; Bloom, Gedare; Patte, Mathieu; Zulianello, Marco
2015-09-01
This paper presents the final result of an European Space Agency (ESA) activity aimed at improving the software support for LEON processors used in SMP configurations. One of the benefits of using a multicore system in a SMP configuration is that in many instances it is possible to better utilize the available processing resources by load balancing between cores. This however comes with the cost of having to synchronize operations between cores, leading to increased complexity. While in an AMP system one can use multiple instances of operating systems that are only uni-processor capable, a SMP system requires the operating system to be written to support multicore systems. In this activity we have improved and extended the SMP support of the RTEMS real-time operating system and ensured that it fully supports the multicore capable LEON processors. The targeted hardware in the activity has been the GR712RC, a dual-core core LEON3FT processor, and the functional prototype of ESA's Next Generation Multiprocessor (NGMP), a quad core LEON4 processor. The final version of the NGMP is now available as a product under the name GR740. An implementation of the Multicore Task Management API (MTAPI) has been developed as part of this activity to aid in the parallelization of applications for RTEMS SMP. It allows for simplified development of parallel applications using the task-based programming model. An existing space application, the Gaia Video Processing Unit, has been ported to RTEMS SMP using the MTAPI implementation to demonstrate the feasibility and usefulness of multicore processors for space payload software. The activity is funded by ESA under contract 4000108560/13/NL/JK. Gedare Bloom is supported in part by NSF CNS-0934725.
A multi-satellite orbit determination problem in a parallel processing environment
NASA Technical Reports Server (NTRS)
Deakyne, M. S.; Anderle, R. J.
1988-01-01
The Engineering Orbit Analysis Unit at GE Valley Forge used an Intel Hypercube Parallel Processor to investigate the performance and gain experience of parallel processors with a multi-satellite orbit determination problem. A general study was selected in which major blocks of computation for the multi-satellite orbit computations were used as units to be assigned to the various processors on the Hypercube. Problems encountered or successes achieved in addressing the orbit determination problem would be more likely to be transferable to other parallel processors. The prime objective was to study the algorithm to allow processing of observations later in time than those employed in the state update. Expertise in ephemeris determination was exploited in addressing these problems and the facility used to bring a realism to the study which would highlight the problems which may not otherwise be anticipated. Secondary objectives were to gain experience of a non-trivial problem in a parallel processor environment, to explore the necessary interplay of serial and parallel sections of the algorithm in terms of timing studies, to explore the granularity (coarse vs. fine grain) to discover the granularity limit above which there would be a risk of starvation where the majority of nodes would be idle or under the limit where the overhead associated with splitting the problem may require more work and communication time than is useful.
The Multi-energy High precision Data Processor Based on AD7606
NASA Astrophysics Data System (ADS)
Zhao, Chen; Zhang, Yanchi; Xie, Da
2017-11-01
This paper designs an information collector based on AD7606 to realize the high-precision simultaneous acquisition of multi-source information of multi-energy systems to form the information platform of the energy Internet at Laogang with electricty as its major energy source. Combined with information fusion technologies, this paper analyzes the data to improve the overall energy system scheduling capability and reliability.
DOE Office of Scientific and Technical Information (OSTI.GOV)
De Supinski, B.; Caliga, D.
2017-09-28
The primary objective of this project was to develop memory optimization technology to efficiently deliver data to, and distribute data within, the SRC-6's Field Programmable Gate Array- ("FPGA") based Multi-Adaptive Processors (MAPs). The hardware/software approach was to explore efficient MAP configurations and generate the compiler technology to exploit those configurations. This memory accessing technology represents an important step towards making reconfigurable symmetric multi-processor (SMP) architectures that will be a costeffective solution for large-scale scientific computing.
Systems-on-chip approach for real-time simulation of wheel-rail contact laws
NASA Astrophysics Data System (ADS)
Mei, T. X.; Zhou, Y. J.
2013-04-01
This paper presents the development of a systems-on-chip approach to speed up the simulation of wheel-rail contact laws, which can be used to reduce the requirement for high-performance computers and enable simulation in real time for the use of hardware-in-loop for experimental studies of the latest vehicle dynamic and control technologies. The wheel-rail contact laws are implemented using a field programmable gate array (FPGA) device with a design that substantially outperforms modern general-purpose PC platforms or fixed architecture digital signal processor devices in terms of processing time, configuration flexibility and cost. In order to utilise the FPGA's parallel-processing capability, the operations in the contact laws algorithms are arranged in a parallel manner and multi-contact patches are tackled simultaneously in the design. The interface between the FPGA device and the host PC is achieved by using a high-throughput and low-latency Ethernet link. The development is based on FASTSIM algorithms, although the design can be adapted and expanded for even more computationally demanding tasks.
NASA Astrophysics Data System (ADS)
Leggett, C.; Binet, S.; Jackson, K.; Levinthal, D.; Tatarkhanov, M.; Yao, Y.
2011-12-01
Thermal limitations have forced CPU manufacturers to shift from simply increasing clock speeds to improve processor performance, to producing chip designs with multi- and many-core architectures. Further the cores themselves can run multiple threads as a zero overhead context switch allowing low level resource sharing (Intel Hyperthreading). To maximize bandwidth and minimize memory latency, memory access has become non uniform (NUMA). As manufacturers add more cores to each chip, a careful understanding of the underlying architecture is required in order to fully utilize the available resources. We present AthenaMP and the Atlas event loop manager, the driver of the simulation and reconstruction engines, which have been rewritten to make use of multiple cores, by means of event based parallelism, and final stage I/O synchronization. However, initial studies on 8 andl6 core Intel architectures have shown marked non-linearities as parallel process counts increase, with as much as 30% reductions in event throughput in some scenarios. Since the Intel Nehalem architecture (both Gainestown and Westmere) will be the most common choice for the next round of hardware procurements, an understanding of these scaling issues is essential. Using hardware based event counters and Intel's Performance Tuning Utility, we have studied the performance bottlenecks at the hardware level, and discovered optimization schemes to maximize processor throughput. We have also produced optimization mechanisms, common to all large experiments, that address the extreme nature of today's HEP code, which due to it's size, places huge burdens on the memory infrastructure of today's processors.
Task Integration Facilitates Multitasking.
de Oliveira, Rita F; Raab, Markus; Hegele, Mathias; Schorer, Jörg
2017-01-01
The aim of this study was to investigate multi-task integration in a continuous tracking task. We were particularly interested in how manipulating task structure in a dual-task situation affects learning of a constant segment embedded in a pursuit-tracking task. Importantly, we examined if dual-task effects could be attributed to task integration by varying the structural similarity and difficulty of the primary and secondary tasks. In Experiment 1 participants performed a pursuit tracking task while counting high-pitched tones and ignoring low-pitched tones. The tones were either presented randomly or structurally 250 ms before each tracking turn. Experiment 2 increased the motor load of the secondary tasks by asking participants to tap their feet to the tones. Experiment 3 further increased motor load of the primary task by increasing its speed and having participants tracking with their non-dominant hand. The results show that dual-task interference can be moderated by secondary task conditions that match the structure of the primary task. Therefore our results support proposals of task integration in continuous tracking paradigms. We conclude that multi-tasking is not always detrimental for motor learning but can be facilitated through task-integration.
List-mode PET image reconstruction for motion correction using the Intel XEON PHI co-processor
NASA Astrophysics Data System (ADS)
Ryder, W. J.; Angelis, G. I.; Bashar, R.; Gillam, J. E.; Fulton, R.; Meikle, S.
2014-03-01
List-mode image reconstruction with motion correction is computationally expensive, as it requires projection of hundreds of millions of rays through a 3D array. To decrease reconstruction time it is possible to use symmetric multiprocessing computers or graphics processing units. The former can have high financial costs, while the latter can require refactoring of algorithms. The Xeon Phi is a new co-processor card with a Many Integrated Core architecture that can run 4 multiple-instruction, multiple data threads per core with each thread having a 512-bit single instruction, multiple data vector register. Thus, it is possible to run in the region of 220 threads simultaneously. The aim of this study was to investigate whether the Xeon Phi co-processor card is a viable alternative to an x86 Linux server for accelerating List-mode PET image reconstruction for motion correction. An existing list-mode image reconstruction algorithm with motion correction was ported to run on the Xeon Phi coprocessor with the multi-threading implemented using pthreads. There were no differences between images reconstructed using the Phi co-processor card and images reconstructed using the same algorithm run on a Linux server. However, it was found that the reconstruction runtimes were 3 times greater for the Phi than the server. A new version of the image reconstruction algorithm was developed in C++ using OpenMP for mutli-threading and the Phi runtimes decreased to 1.67 times that of the host Linux server. Data transfer from the host to co-processor card was found to be a rate-limiting step; this needs to be carefully considered in order to maximize runtime speeds. When considering the purchase price of a Linux workstation with Xeon Phi co-processor card and top of the range Linux server, the former is a cost-effective computation resource for list-mode image reconstruction. A multi-Phi workstation could be a viable alternative to cluster computers at a lower cost for medical imaging applications.
Exact diagonalization of quantum lattice models on coprocessors
NASA Astrophysics Data System (ADS)
Siro, T.; Harju, A.
2016-10-01
We implement the Lanczos algorithm on an Intel Xeon Phi coprocessor and compare its performance to a multi-core Intel Xeon CPU and an NVIDIA graphics processor. The Xeon and the Xeon Phi are parallelized with OpenMP and the graphics processor is programmed with CUDA. The performance is evaluated by measuring the execution time of a single step in the Lanczos algorithm. We study two quantum lattice models with different particle numbers, and conclude that for small systems, the multi-core CPU is the fastest platform, while for large systems, the graphics processor is the clear winner, reaching speedups of up to 7.6 compared to the CPU. The Xeon Phi outperforms the CPU with sufficiently large particle number, reaching a speedup of 2.5.
McBride, Dawn M; Abney, Drew H
2012-01-01
We examined multi-process (MP) and transfer-appropriate processing descriptions of prospective memory (PM). Three conditions were compared that varied the overlap in processing type (perceptual/conceptual) between the ongoing and PM tasks such that two conditions involved a match of perceptual processing and one condition involved a mismatch in processing (conceptual ongoing task/perceptual PM task). One of the matched processing conditions also created a focal PM task, whereas the other two conditions were considered non-focal (Einstein & McDaniel, 2005). PM task accuracy and ongoing task completion speed in baseline and PM task conditions were measured. Accuracy results indicated a higher PM task completion rate for the focal condition than the non-focal conditions, a finding that is consistent with predictions made by the MP view. However, reaction time (RT) analyses indicated that PM task cost did not differ across conditions when practice effects are considered. Thus, the PM accuracy results are consistent with a MP description of PM, but RT results did not support the MP view predictions regarding PM cost.
Douglas, Heather E; Raban, Magdalena Z; Walter, Scott R; Westbrook, Johanna I
2017-03-01
Multi-tasking is an important skill for clinical work which has received limited research attention. Its impacts on clinical work are poorly understood. In contrast, there is substantial multi-tasking research in cognitive psychology, driver distraction, and human-computer interaction. This review synthesises evidence of the extent and impacts of multi-tasking on efficiency and task performance from health and non-healthcare literature, to compare and contrast approaches, identify implications for clinical work, and to develop an evidence-informed framework for guiding the measurement of multi-tasking in future healthcare studies. The results showed healthcare studies using direct observation have focused on descriptive studies to quantify concurrent multi-tasking and its frequency in different contexts, with limited study of impact. In comparison, non-healthcare studies have applied predominantly experimental and simulation designs, focusing on interleaved and concurrent multi-tasking, and testing theories of the mechanisms by which multi-tasking impacts task efficiency and performance. We propose a framework to guide the measurement of multi-tasking in clinical settings that draws together lessons from these siloed research efforts. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.
NASA Technical Reports Server (NTRS)
Perkinson, J. A.
1974-01-01
The application of associative memory processor equipment to conventional host processors type systems is discussed. Efforts were made to demonstrate how such application relieves the task burden of conventional systems, and enhance system speed and efficiency. Data cover comparative theoretical performance analysis, demonstration of expanded growth capabilities, and demonstrations of actual hardware in simulated environment.
System architecture for asynchronous multi-processor robotic control system
NASA Technical Reports Server (NTRS)
Steele, Robert D.; Long, Mark; Backes, Paul
1993-01-01
The architecture for the Modular Telerobot Task Execution System (MOTES) as implemented in the Supervisory Telerobotics (STELER) Laboratory is described. MOTES is the software component of the remote site of a local-remote telerobotic system which is being developed for NASA for space applications, in particular Space Station Freedom applications. The system is being developed to provide control and supervised autonomous control to support both space based operation and ground-remote control with time delay. The local-remote architecture places task planning responsibilities at the local site and task execution responsibilities at the remote site. This separation allows the remote site to be designed to optimize task execution capability within a limited computational environment such as is expected in flight systems. The local site task planning system could be placed on the ground where few computational limitations are expected. MOTES is written in the Ada programming language for a multiprocessor environment.
NASA Technical Reports Server (NTRS)
Haimes, Robert; Follen, Gregory J.
1998-01-01
CAPRI is a CAD-vendor neutral application programming interface designed for the construction of analysis and design systems. By allowing access to the geometry from within all modules (grid generators, solvers and post-processors) such tasks as meshing on the actual surfaces, node enrichment by solvers and defining which mesh faces are boundaries (for the solver and visualization system) become simpler. The overall reliance on file 'standards' is minimized. This 'Geometry Centric' approach makes multi-physics (multi-disciplinary) analysis codes much easier to build. By using the shared (coupled) surface as the foundation, CAPRI provides a single call to interpolate grid-node based data from the surface discretization in one volume to another. Finally, design systems are possible where the results can be brought back into the CAD system (and therefore manufactured) because all geometry construction and modification are performed using the CAD system's geometry kernel.
Enhancing Image Processing Performance for PCID in a Heterogeneous Network of Multi-core Processors
2009-09-01
TFLOPS of Playstation 3 (PS3) nodes with IBM Cell Broadband Engine multi-cores and 15 dual-quad Xeon head nodes. The interconnect fabric includes... 4 3. INFORMATION MANAGEMENT FOR PARALLELIZATION AND...STREAMING............................................................. 7 4 . RESULTS
Tankam, Patrice; Santhanam, Anand P; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P
2014-07-01
Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing.
Geospace simulations on the Cell BE processor
NASA Astrophysics Data System (ADS)
Germaschewski, K.; Raeder, J.; Larson, D.
2008-12-01
OpenGGCM (Open Geospace General circulation Model) is an established numerical code that simulates the Earth's space environment. The most computing intensive part is the MHD (magnetohydrodynamics) solver that models the plasma surrounding Earth and its interaction with Earth's magnetic field and the solar wind flowing in from the sun. Like other global magnetosphere codes, OpenGGCM's realism is limited by computational constraints on grid resolution. We investigate porting of the MHD solver to the Cell BE architecture, a novel inhomogeneous multicore architecture capable of up to 230 GFlops per processor. Realizing this high performance on the Cell processor is a programming challenge, though. We implemented the MHD solver using a multi-level parallel approach: On the coarsest level, the problem is distributed to processors based upon the usual domain decomposition approach. Then, on each processor, the problem is divided into 3D columns, each of which is handled by the memory limited SPEs (synergistic processing elements) slice by slice. Finally, SIMD instructions are used to fully exploit the vector/SIMD FPUs in each SPE. Memory management needs to be handled explicitly by the code, using DMA to move data from main memory to the per-SPE local store and vice versa. We obtained excellent performance numbers, a speed-up of a factor of 25 compared to just using the main processor, while still keeping the numerical implementation details of the code maintainable.
Zhang, Zhen; Ma, Cheng; Zhu, Rong
2017-08-23
Artificial Neural Networks (ANNs), including Deep Neural Networks (DNNs), have become the state-of-the-art methods in machine learning and achieved amazing success in speech recognition, visual object recognition, and many other domains. There are several hardware platforms for developing accelerated implementation of ANN models. Since Field Programmable Gate Array (FPGA) architectures are flexible and can provide high performance per watt of power consumption, they have drawn a number of applications from scientists. In this paper, we propose a FPGA-based, granularity-variable neuromorphic processor (FBGVNP). The traits of FBGVNP can be summarized as granularity variability, scalability, integrated computing, and addressing ability: first, the number of neurons is variable rather than constant in one core; second, the multi-core network scale can be extended in various forms; third, the neuron addressing and computing processes are executed simultaneously. These make the processor more flexible and better suited for different applications. Moreover, a neural network-based controller is mapped to FBGVNP and applied in a multi-input, multi-output, (MIMO) real-time, temperature-sensing and control system. Experiments validate the effectiveness of the neuromorphic processor. The FBGVNP provides a new scheme for building ANNs, which is flexible, highly energy-efficient, and can be applied in many areas.
Zhang, Zhen; Zhu, Rong
2017-01-01
Artificial Neural Networks (ANNs), including Deep Neural Networks (DNNs), have become the state-of-the-art methods in machine learning and achieved amazing success in speech recognition, visual object recognition, and many other domains. There are several hardware platforms for developing accelerated implementation of ANN models. Since Field Programmable Gate Array (FPGA) architectures are flexible and can provide high performance per watt of power consumption, they have drawn a number of applications from scientists. In this paper, we propose a FPGA-based, granularity-variable neuromorphic processor (FBGVNP). The traits of FBGVNP can be summarized as granularity variability, scalability, integrated computing, and addressing ability: first, the number of neurons is variable rather than constant in one core; second, the multi-core network scale can be extended in various forms; third, the neuron addressing and computing processes are executed simultaneously. These make the processor more flexible and better suited for different applications. Moreover, a neural network-based controller is mapped to FBGVNP and applied in a multi-input, multi-output, (MIMO) real-time, temperature-sensing and control system. Experiments validate the effectiveness of the neuromorphic processor. The FBGVNP provides a new scheme for building ANNs, which is flexible, highly energy-efficient, and can be applied in many areas. PMID:28832522
Binaural unmasking of multi-channel stimuli in bilateral cochlear implant users.
Van Deun, Lieselot; van Wieringen, Astrid; Francart, Tom; Büchner, Andreas; Lenarz, Thomas; Wouters, Jan
2011-10-01
Previous work suggests that bilateral cochlear implant users are sensitive to interaural cues if experimental speech processors are used to preserve accurate interaural information in the electrical stimulation pattern. Binaural unmasking occurs in adults and children when an interaural delay is applied to the envelope of a high-rate pulse train. Nevertheless, for speech perception, binaural unmasking benefits have not been demonstrated consistently, even with coordinated stimulation at both ears. The present study aimed at bridging the gap between basic psychophysical performance on binaural signal detection tasks on the one hand and binaural perception of speech in noise on the other hand. Therefore, binaural signal detection was expanded to multi-channel stimulation and biologically relevant interaural delays. A harmonic complex, consisting of three sinusoids (125, 250, and 375 Hz), was added to three 125-Hz-wide noise bands centered on the sinusoids. When an interaural delay of 700 μs was introduced, an average BMLD of 3 dB was established. Outcomes are promising in view of real-life benefits. Future research should investigate the generalization of the observed benefits for signal detection to speech perception in everyday listening situations and determine the importance of coordination of bilateral speech processors and accentuation of envelope cues.
Multicore Challenges and Benefits for High Performance Scientific Computing
Nielsen, Ida M. B.; Janssen, Curtis L.
2008-01-01
Until recently, performance gains in processors were achieved largely by improvements in clock speeds and instruction level parallelism. Thus, applications could obtain performance increases with relatively minor changes by upgrading to the latest generation of computing hardware. Currently, however, processor performance improvements are realized by using multicore technology and hardware support for multiple threads within each core, and taking full advantage of this technology to improve the performance of applications requires exposure of extreme levels of software parallelism. We will here discuss the architecture of parallel computers constructed from many multicore chips as well as techniques for managing the complexitymore » of programming such computers, including the hybrid message-passing/multi-threading programming model. We will illustrate these ideas with a hybrid distributed memory matrix multiply and a quantum chemistry algorithm for energy computation using Møller–Plesset perturbation theory.« less
Li, Zhi; Wei, Henglu; Zhou, Wei; Duan, Zhemin
2018-01-01
Dynamic thermal management (DTM) mechanisms utilize embedded thermal sensors to collect fine-grained temperature information for monitoring the real-time thermal behavior of multi-core processors. However, embedded thermal sensors are very susceptible to a variety of sources of noise, including environmental uncertainty and process variation. This causes the discrepancies between actual temperatures and those observed by on-chip thermal sensors, which seriously affect the efficiency of DTM. In this paper, a smoothing filter-based Kalman prediction technique is proposed to accurately estimate the temperatures from noisy sensor readings. For the multi-sensor estimation scenario, the spatial correlations among different sensor locations are exploited. On this basis, a multi-sensor synergistic calibration algorithm (known as MSSCA) is proposed to improve the simultaneous prediction accuracy of multiple sensors. Moreover, an infrared imaging-based temperature measurement technique is also proposed to capture the thermal traces of an advanced micro devices (AMD) quad-core processor in real time. The acquired real temperature data are used to evaluate our prediction performance. Simulation shows that the proposed synergistic calibration scheme can reduce the root-mean-square error (RMSE) by 1.2 ∘C and increase the signal-to-noise ratio (SNR) by 15.8 dB (with a very small average runtime overhead) compared with assuming the thermal sensor readings to be ideal. Additionally, the average false alarm rate (FAR) of the corrected sensor temperature readings can be reduced by 28.6%. These results clearly demonstrate that if our approach is used to perform temperature estimation, the response mechanisms of DTM can be triggered to adjust the voltages, frequencies, and cooling fan speeds at more appropriate times. PMID:29393862
Li, Xin; Ou, Xingtao; Li, Zhi; Wei, Henglu; Zhou, Wei; Duan, Zhemin
2018-02-02
Dynamic thermal management (DTM) mechanisms utilize embedded thermal sensors to collect fine-grained temperature information for monitoring the real-time thermal behavior of multi-core processors. However, embedded thermal sensors are very susceptible to a variety of sources of noise, including environmental uncertainty and process variation. This causes the discrepancies between actual temperatures and those observed by on-chip thermal sensors, which seriously affect the efficiency of DTM. In this paper, a smoothing filter-based Kalman prediction technique is proposed to accurately estimate the temperatures from noisy sensor readings. For the multi-sensor estimation scenario, the spatial correlations among different sensor locations are exploited. On this basis, a multi-sensor synergistic calibration algorithm (known as MSSCA) is proposed to improve the simultaneous prediction accuracy of multiple sensors. Moreover, an infrared imaging-based temperature measurement technique is also proposed to capture the thermal traces of an advanced micro devices (AMD) quad-core processor in real time. The acquired real temperature data are used to evaluate our prediction performance. Simulation shows that the proposed synergistic calibration scheme can reduce the root-mean-square error (RMSE) by 1.2 ∘ C and increase the signal-to-noise ratio (SNR) by 15.8 dB (with a very small average runtime overhead) compared with assuming the thermal sensor readings to be ideal. Additionally, the average false alarm rate (FAR) of the corrected sensor temperature readings can be reduced by 28.6%. These results clearly demonstrate that if our approach is used to perform temperature estimation, the response mechanisms of DTM can be triggered to adjust the voltages, frequencies, and cooling fan speeds at more appropriate times.
NASA Astrophysics Data System (ADS)
Baregheh, Mandana; Mezentsev, Vladimir; Schmitz, Holger
2011-06-01
We describe a parallel multi-threaded approach for high performance modelling of wide class of phenomena in ultrafast nonlinear optics. Specific implementation has been performed using the highly parallel capabilities of a programmable graphics processor.
Speed and accuracy improvements in FLAASH atmospheric correction of hyperspectral imagery
NASA Astrophysics Data System (ADS)
Perkins, Timothy; Adler-Golden, Steven; Matthew, Michael W.; Berk, Alexander; Bernstein, Lawrence S.; Lee, Jamine; Fox, Marsha
2012-11-01
Remotely sensed spectral imagery of the earth's surface can be used to fullest advantage when the influence of the atmosphere has been removed and the measurements are reduced to units of reflectance. Here, we provide a comprehensive summary of the latest version of the Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes atmospheric correction algorithm. We also report some new code improvements for speed and accuracy. These include the re-working of the original algorithm in C-language code parallelized with message passing interface and containing a new radiative transfer look-up table option, which replaces executions of the MODTRAN model. With computation times now as low as ~10 s per image per computer processor, automated, real-time, on-board atmospheric correction of hyper- and multi-spectral imagery is within reach.
NASA Astrophysics Data System (ADS)
Yamamoto, H.; Nakajima, K.; Zhang, K.; Nanai, S.
2015-12-01
Powerful numerical codes that are capable of modeling complex coupled processes of physics and chemistry have been developed for predicting the fate of CO2 in reservoirs as well as its potential impacts on groundwater and subsurface environments. However, they are often computationally demanding for solving highly non-linear models in sufficient spatial and temporal resolutions. Geological heterogeneity and uncertainties further increase the challenges in modeling works. Two-phase flow simulations in heterogeneous media usually require much longer computational time than that in homogeneous media. Uncertainties in reservoir properties may necessitate stochastic simulations with multiple realizations. Recently, massively parallel supercomputers with more than thousands of processors become available in scientific and engineering communities. Such supercomputers may attract attentions from geoscientist and reservoir engineers for solving the large and non-linear models in higher resolutions within a reasonable time. However, for making it a useful tool, it is essential to tackle several practical obstacles to utilize large number of processors effectively for general-purpose reservoir simulators. We have implemented massively-parallel versions of two TOUGH2 family codes (a multi-phase flow simulator TOUGH2 and a chemically reactive transport simulator TOUGHREACT) on two different types (vector- and scalar-type) of supercomputers with a thousand to tens of thousands of processors. After completing implementation and extensive tune-up on the supercomputers, the computational performance was measured for three simulations with multi-million grid models, including a simulation of the dissolution-diffusion-convection process that requires high spatial and temporal resolutions to simulate the growth of small convective fingers of CO2-dissolved water to larger ones in a reservoir scale. The performance measurement confirmed that the both simulators exhibit excellent scalabilities showing almost linear speedup against number of processors up to over ten thousand cores. Generally this allows us to perform coupled multi-physics (THC) simulations on high resolution geologic models with multi-million grid in a practical time (e.g., less than a second per time step).
Jayashree, B; Rajgopal, S; Hoisington, D; Prasanth, V P; Chandra, S
2008-09-24
Structure, is a widely used software tool to investigate population genetic structure with multi-locus genotyping data. The software uses an iterative algorithm to group individuals into "K" clusters, representing possibly K genetically distinct subpopulations. The serial implementation of this programme is processor-intensive even with small datasets. We describe an implementation of the program within a parallel framework. Speedup was achieved by running different replicates and values of K on each node of the cluster. A web-based user-oriented GUI has been implemented in PHP, through which the user can specify input parameters for the programme. The number of processors to be used can be specified in the background command. A web-based visualization tool "Visualstruct", written in PHP (HTML and Java script embedded), allows for the graphical display of population clusters output from Structure, where each individual may be visualized as a line segment with K colors defining its possible genomic composition with respect to the K genetic sub-populations. The advantage over available programs is in the increased number of individuals that can be visualized. The analyses of real datasets indicate a speedup of up to four, when comparing the speed of execution on clusters of eight processors with the speed of execution on one desktop. The software package is freely available to interested users upon request.
Word Reading Fluency as a Serial Naming Task
ERIC Educational Resources Information Center
Protopapas, Athanassios; Katopodi, Katerina; Altani, Angeliki; Georgiou, George K.
2018-01-01
Word list reading fluency is theoretically expected to depend on single word reading speed. Yet the correlation between the two diminishes with increasing fluency, while fluency remains strongly correlated to serial digit naming. We hypothesized that multi-element sequence processing is an important component of fluency. We used confirmatory…
Web-based multi-channel analyzer
Gritzo, Russ E.
2003-12-23
The present invention provides an improved multi-channel analyzer designed to conveniently gather, process, and distribute spectrographic pulse data. The multi-channel analyzer may operate on a computer system having memory, a processor, and the capability to connect to a network and to receive digitized spectrographic pulses. The multi-channel analyzer may have a software module integrated with a general-purpose operating system that may receive digitized spectrographic pulses for at least 10,000 pulses per second. The multi-channel analyzer may further have a user-level software module that may receive user-specified controls dictating the operation of the multi-channel analyzer, making the multi-channel analyzer customizable by the end-user. The user-level software may further categorize and conveniently distribute spectrographic pulse data employing non-proprietary, standard communication protocols and formats.
Maximum-Likelihood Estimation With a Contracting-Grid Search Algorithm
Hesterman, Jacob Y.; Caucci, Luca; Kupinski, Matthew A.; Barrett, Harrison H.; Furenlid, Lars R.
2010-01-01
A fast search algorithm capable of operating in multi-dimensional spaces is introduced. As a sample application, we demonstrate its utility in the 2D and 3D maximum-likelihood position-estimation problem that arises in the processing of PMT signals to derive interaction locations in compact gamma cameras. We demonstrate that the algorithm can be parallelized in pipelines, and thereby efficiently implemented in specialized hardware, such as field-programmable gate arrays (FPGAs). A 2D implementation of the algorithm is achieved in Cell/BE processors, resulting in processing speeds above one million events per second, which is a 20× increase in speed over a conventional desktop machine. Graphics processing units (GPUs) are used for a 3D application of the algorithm, resulting in processing speeds of nearly 250,000 events per second which is a 250× increase in speed over a conventional desktop machine. These implementations indicate the viability of the algorithm for use in real-time imaging applications. PMID:20824155
Monte Carlo charged-particle tracking and energy deposition on a Lagrangian mesh.
Yuan, J; Moses, G A; McKenty, P W
2005-10-01
A Monte Carlo algorithm for alpha particle tracking and energy deposition on a cylindrical computational mesh in a Lagrangian hydrodynamics code used for inertial confinement fusion (ICF) simulations is presented. The straight line approximation is used to follow propagation of "Monte Carlo particles" which represent collections of alpha particles generated from thermonuclear deuterium-tritium (DT) reactions. Energy deposition in the plasma is modeled by the continuous slowing down approximation. The scheme addresses various aspects arising in the coupling of Monte Carlo tracking with Lagrangian hydrodynamics; such as non-orthogonal severely distorted mesh cells, particle relocation on the moving mesh and particle relocation after rezoning. A comparison with the flux-limited multi-group diffusion transport method is presented for a polar direct drive target design for the National Ignition Facility. Simulations show the Monte Carlo transport method predicts about earlier ignition than predicted by the diffusion method, and generates higher hot spot temperature. Nearly linear speed-up is achieved for multi-processor parallel simulations.
Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array
NASA Astrophysics Data System (ADS)
Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul
2008-04-01
This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.
NASA Technical Reports Server (NTRS)
OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)
1998-01-01
This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).
Multi-petascale highly efficient parallel supercomputer
Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen -Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng
2015-07-14
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.
Portable parallel stochastic optimization for the design of aeropropulsion components
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Rhodes, G. S.
1994-01-01
This report presents the results of Phase 1 research to develop a methodology for performing large-scale Multi-disciplinary Stochastic Optimization (MSO) for the design of aerospace systems ranging from aeropropulsion components to complete aircraft configurations. The current research recognizes that such design optimization problems are computationally expensive, and require the use of either massively parallel or multiple-processor computers. The methodology also recognizes that many operational and performance parameters are uncertain, and that uncertainty must be considered explicitly to achieve optimum performance and cost. The objective of this Phase 1 research was to initialize the development of an MSO methodology that is portable to a wide variety of hardware platforms, while achieving efficient, large-scale parallelism when multiple processors are available. The first effort in the project was a literature review of available computer hardware, as well as review of portable, parallel programming environments. The first effort was to implement the MSO methodology for a problem using the portable parallel programming language, Parallel Virtual Machine (PVM). The third and final effort was to demonstrate the example on a variety of computers, including a distributed-memory multiprocessor, a distributed-memory network of workstations, and a single-processor workstation. Results indicate the MSO methodology can be well-applied towards large-scale aerospace design problems. Nearly perfect linear speedup was demonstrated for computation of optimization sensitivity coefficients on both a 128-node distributed-memory multiprocessor (the Intel iPSC/860) and a network of workstations (speedups of almost 19 times achieved for 20 workstations). Very high parallel efficiencies (75 percent for 31 processors and 60 percent for 50 processors) were also achieved for computation of aerodynamic influence coefficients on the Intel. Finally, the multi-level parallelization strategy that will be needed for large-scale MSO problems was demonstrated to be highly efficient. The same parallel code instructions were used on both platforms, demonstrating portability. There are many applications for which MSO can be applied, including NASA's High-Speed-Civil Transport, and advanced propulsion systems. The use of MSO will reduce design and development time and testing costs dramatically.
The Models-3 Community Multi-scale Air Quality (CMAQ) model, first released by the USEPA in 1999 (Byun and Ching. 1999), continues to be developed and evaluated. The principal components of the CMAQ system include a comprehensive emission processor known as the Sparse Matrix O...
The research and application of multi-biometric acquisition embedded system
NASA Astrophysics Data System (ADS)
Deng, Shichao; Liu, Tiegen; Guo, Jingjing; Li, Xiuyan
2009-11-01
The identification technology based on multi-biometric can greatly improve the applicability, reliability and antifalsification. This paper presents a multi-biometric system bases on embedded system, which includes: three capture daughter boards are applied to obtain different biometric: one each for fingerprint, iris and vein of the back of hand; FPGA (Field Programmable Gate Array) is designed as coprocessor, which uses to configure three daughter boards on request and provides data path between DSP (digital signal processor) and daughter boards; DSP is the master processor and its functions include: control the biometric information acquisition, extracts feature as required and responsible for compare the results with the local database or data server through network communication. The advantages of this system were it can acquire three different biometric in real time, extracts complexity feature flexibly in different biometrics' raw data according to different purposes and arithmetic and network interface on the core-board will be the solution of big data scale. Because this embedded system has high stability, reliability, flexibility and fit for different data scale, it can satisfy the demand of multi-biometric recognition.
NASA Technical Reports Server (NTRS)
Kikuchi, Hideaki; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya; Shimojo, Fuyuki; Saini, Subhash
2003-01-01
Scalability of a low-cost, Intel Xeon-based, multi-Teraflop Linux cluster is tested for two high-end scientific applications: Classical atomistic simulation based on the molecular dynamics method and quantum mechanical calculation based on the density functional theory. These scalable parallel applications use space-time multiresolution algorithms and feature computational-space decomposition, wavelet-based adaptive load balancing, and spacefilling-curve-based data compression for scalable I/O. Comparative performance tests are performed on a 1,024-processor Linux cluster and a conventional higher-end parallel supercomputer, 1,184-processor IBM SP4. The results show that the performance of the Linux cluster is comparable to that of the SP4. We also study various effects, such as the sharing of memory and L2 cache among processors, on the performance.
Ai, Qi; Chen, Xiao; Tian, Miao; Yan, Bin-bin; Zhang, Ying; Song, Fei-jun; Chen, Gen-xiang; Sang, Xin-zhu; Wang, Yi-quan; Xiao, Feng; Alameh, Kamal
2015-02-01
Based on a digital micromirror device (DMD) processor as the multi-wavelength narrow-band tunable filter, we demonstrate a multi-port tunable fiber laser through experiments. The key property of this laser is that any lasing wavelength channel from any arbitrary output port can be switched independently over the whole C-band, which is only driven by single DMD chip flexibly. All outputs display an excellent tuning capacity and high consistency in the whole C-band with a 0.02 nm linewidth, 0.055 nm wavelength tuning step, and side-mode suppression ratio greater than 60 dB. Due to the automatic power control and polarization design, the power uniformity of output lasers is less than 0.008 dB and the wavelength fluctuation is below 0.02 nm within 2 h at room temperature.
Electronic Structure Calculations and Adaptation Scheme in Multi-core Computing Environments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seshagiri, Lakshminarasimhan; Sosonkina, Masha; Zhang, Zhao
2009-05-20
Multi-core processing environments have become the norm in the generic computing environment and are being considered for adding an extra dimension to the execution of any application. The T2 Niagara processor is a very unique environment where it consists of eight cores having a capability of running eight threads simultaneously in each of the cores. Applications like General Atomic and Molecular Electronic Structure (GAMESS), used for ab-initio molecular quantum chemistry calculations, can be good indicators of the performance of such machines and would be a guideline for both hardware designers and application programmers. In this paper we try to benchmarkmore » the GAMESS performance on a T2 Niagara processor for a couple of molecules. We also show the suitability of using a middleware based adaptation algorithm on GAMESS on such a multi-core environment.« less
Smart-Pixel Array Processors Based on Optimal Cellular Neural Networks for Space Sensor Applications
NASA Technical Reports Server (NTRS)
Fang, Wai-Chi; Sheu, Bing J.; Venus, Holger; Sandau, Rainer
1997-01-01
A smart-pixel cellular neural network (CNN) with hardware annealing capability, digitally programmable synaptic weights, and multisensor parallel interface has been under development for advanced space sensor applications. The smart-pixel CNN architecture is a programmable multi-dimensional array of optoelectronic neurons which are locally connected with their local neurons and associated active-pixel sensors. Integration of the neuroprocessor in each processor node of a scalable multiprocessor system offers orders-of-magnitude computing performance enhancements for on-board real-time intelligent multisensor processing and control tasks of advanced small satellites. The smart-pixel CNN operation theory, architecture, design and implementation, and system applications are investigated in detail. The VLSI (Very Large Scale Integration) implementation feasibility was illustrated by a prototype smart-pixel 5x5 neuroprocessor array chip of active dimensions 1380 micron x 746 micron in a 2-micron CMOS technology.
Knowledge-based vision for space station object motion detection, recognition, and tracking
NASA Technical Reports Server (NTRS)
Symosek, P.; Panda, D.; Yalamanchili, S.; Wehner, W., III
1987-01-01
Computer vision, especially color image analysis and understanding, has much to offer in the area of the automation of Space Station tasks such as construction, satellite servicing, rendezvous and proximity operations, inspection, experiment monitoring, data management and training. Knowledge-based techniques improve the performance of vision algorithms for unstructured environments because of their ability to deal with imprecise a priori information or inaccurately estimated feature data and still produce useful results. Conventional techniques using statistical and purely model-based approaches lack flexibility in dealing with the variabilities anticipated in the unstructured viewing environment of space. Algorithms developed under NASA sponsorship for Space Station applications to demonstrate the value of a hypothesized architecture for a Video Image Processor (VIP) are presented. Approaches to the enhancement of the performance of these algorithms with knowledge-based techniques and the potential for deployment of highly-parallel multi-processor systems for these algorithms are discussed.
Multi-frequency communication system and method
Carrender, Curtis Lee; Gilbert, Ronald W.
2004-06-01
A multi-frequency RFID remote communication system is provided that includes a plurality of RFID tags configured to receive a first signal and to return a second signal, the second signal having a first frequency component and a second frequency component, the second frequency component including data unique to each remote RFID tag. The system further includes a reader configured to transmit an interrogation signal and to receive remote signals from the tags. A first signal processor, preferably a mixer, removes an intermediate frequency component from the received signal, and a second processor, preferably a second mixer, analyzes the IF frequency component to output data that is unique to each remote tag.
Electro-Optic Computing Architectures. Volume I
1998-02-01
The objective of the Electro - Optic Computing Architecture (EOCA) program was to develop multi-function electro - optic interfaces and optical...interconnect units to enhance the performance of parallel processor systems and form the building blocks for future electro - optic computing architectures...Specifically, three multi-function interface modules were targeted for development - an Electro - Optic Interface (EOI), an Optical Interconnection Unit (OW
Three-dimensional Aerodynamic Instability in Multi-stage Axial Compressors
NASA Technical Reports Server (NTRS)
Suder, Kenneth (Technical Monitor); Tan, Choon-Sooi
2003-01-01
Four separate tasks are reported. The first task: A Computational Model for Short Wavelength Stall Inception and Development In Multi-Stage Compressors; the second task: Three-dimensional Rotating Stall Inception and Effects of Rotating Tip Clearance Asymmetry in Axial Compressors; the third task:Development of an Effective Computational Methodology for Body Force Representation of High-speed Rotor 37; and the fourth task:Development of Circumferential Inlet Distortion through a Representative Eleven Stage High-speed axial compressor. The common theme that threaded throughout these four tasks is the conceptual framework that consists of quantifying flow processes at the fadcompressor blade passage level to define the compressor performance characteristics needed for addressing physical phenomena such compressor aerodynamic instability and compressor response to flow distoriton with length scales larger than compressor blade-to-blade spacing at the system level. The results from these two levels can be synthesized to: (1) simulate compressor aerodynamic instability inception local to a blade rotor tip and its development from a local flow event into the nonlinear limit cycle instability that involves the entire compressor as was demonstrated in the first task; (2) determine the conditions under which compressor stability assessment based on two-dimensional model may not be adequate and the effects of self-induced flow distortion on compressor stability limit as in the second task; (3) quantify multistage compressor response to inlet distortion in stagnation pressure as illustrated in the fourth task; and (4) elucidate its potential applicability for compressor map generation under uniform as well as non-uniform inlet flow given three-dimensional Navier-Stokes solution for each individual blade row as was demonstrated in the third task.
Tunable multi-wavelength fiber lasers based on an Opto-VLSI processor and optical amplifiers.
Xiao, Feng; Alameh, Kamal; Lee, Yong Tak
2009-12-07
A multi-wavelength tunable fiber laser based on the use of an Opto-VLSI processor in conjunction with different optical amplifiers is proposed and experimentally demonstrated. The Opto-VLSI processor can simultaneously select any part of the gain spectrum from each optical amplifier into its associated fiber ring, leading to a multiport tunable fiber laser source. We experimentally demonstrate a 3-port tunable fiber laser source, where each output wavelength of each port can independently be tuned within the C-band with a wavelength step of about 0.05 nm. Experimental results demonstrate a laser linewidth as narrow as 0.05 nm and an optical side-mode-suppression-ratio (SMSR) of about 35 dB. The demonstrated three fiber lasers have excellent stability at room temperature and output power uniformity less than 0.5 dB over the whole C-band.
The parallel algorithm for the 2D discrete wavelet transform
NASA Astrophysics Data System (ADS)
Barina, David; Najman, Pavel; Kleparnik, Petr; Kula, Michal; Zemcik, Pavel
2018-04-01
The discrete wavelet transform can be found at the heart of many image-processing algorithms. Until now, the transform on general-purpose processors (CPUs) was mostly computed using a separable lifting scheme. As the lifting scheme consists of a small number of operations, it is preferred for processing using single-core CPUs. However, considering a parallel processing using multi-core processors, this scheme is inappropriate due to a large number of steps. On such architectures, the number of steps corresponds to the number of points that represent the exchange of data. Consequently, these points often form a performance bottleneck. Our approach appropriately rearranges calculations inside the transform, and thereby reduces the number of steps. In other words, we propose a new scheme that is friendly to parallel environments. When evaluating on multi-core CPUs, we consistently overcome the original lifting scheme. The evaluation was performed on 61-core Intel Xeon Phi and 8-core Intel Xeon processors.
Dynamically programmable cache
NASA Astrophysics Data System (ADS)
Nakkar, Mouna; Harding, John A.; Schwartz, David A.; Franzon, Paul D.; Conte, Thomas
1998-10-01
Reconfigurable machines have recently been used as co- processors to accelerate the execution of certain algorithms or program subroutines. The problems with the above approach include high reconfiguration time and limited partial reconfiguration. By far the most critical problems are: (1) the small on-chip memory which results in slower execution time, and (2) small FPGA areas that cannot implement large subroutines. Dynamically Programmable Cache (DPC) is a novel architecture for embedded processors which offers solutions to the above problems. To solve memory access problems, DPC processors merge reconfigurable arrays with the data cache at various cache levels to create a multi-level reconfigurable machines. As a result DPC machines have both higher data accessibility and FPGA memory bandwidth. To solve the limited FPGA resource problem, DPC processors implemented multi-context switching (Virtualization) concept. Virtualization allows implementation of large subroutines with fewer FPGA cells. Additionally, DPC processors can parallelize the execution of several operations resulting in faster execution time. In this paper, the speedup improvement for DPC machines are shown to be 5X faster than an Altera FLEX10K FPGA chip and 2X faster than a Sun Ultral SPARC station for two different algorithms (convolution and motion estimation).
Distributed Optimization of Multi-Agent Systems: Framework, Local Optimizer, and Applications
NASA Astrophysics Data System (ADS)
Zu, Yue
Convex optimization problem can be solved in a centralized or distributed manner. Compared with centralized methods based on single-agent system, distributed algorithms rely on multi-agent systems with information exchanging among connected neighbors, which leads to great improvement on the system fault tolerance. Thus, a task within multi-agent system can be completed with presence of partial agent failures. By problem decomposition, a large-scale problem can be divided into a set of small-scale sub-problems that can be solved in sequence/parallel. Hence, the computational complexity is greatly reduced by distributed algorithm in multi-agent system. Moreover, distributed algorithm allows data collected and stored in a distributed fashion, which successfully overcomes the drawbacks of using multicast due to the bandwidth limitation. Distributed algorithm has been applied in solving a variety of real-world problems. Our research focuses on the framework and local optimizer design in practical engineering applications. In the first one, we propose a multi-sensor and multi-agent scheme for spatial motion estimation of a rigid body. Estimation performance is improved in terms of accuracy and convergence speed. Second, we develop a cyber-physical system and implement distributed computation devices to optimize the in-building evacuation path when hazard occurs. The proposed Bellman-Ford Dual-Subgradient path planning method relieves the congestion in corridor and the exit areas. At last, highway traffic flow is managed by adjusting speed limits to minimize the fuel consumption and travel time in the third project. Optimal control strategy is designed through both centralized and distributed algorithm based on convex problem formulation. Moreover, a hybrid control scheme is presented for highway network travel time minimization. Compared with no controlled case or conventional highway traffic control strategy, the proposed hybrid control strategy greatly reduces total travel time on test highway network.
Multitasking OS manages a team of processors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ripps, D.L.
1983-07-21
MTOS-68k is a real-time multitasking operating system designed for the popular MC68000 microprocessors. It aproaches task coordination and synchronization in a fashion that matches uniquely the structural simplicity and regularity of the 68000 instruction set. Since in many 68000 applications the speed and power of one CPU are not enough, MTOS-68k has been designed to support multiple processors, as well as multiple tasks. Typically, the devices are tightly coupled single-board computers, that is they share a backplane and parts of global memory.
A Survey of Recent MARTe Based Systems
NASA Astrophysics Data System (ADS)
Neto, André C.; Alves, Diogo; Boncagni, Luca; Carvalho, Pedro J.; Valcarcel, Daniel F.; Barbalace, Antonio; De Tommasi, Gianmaria; Fernandes, Horácio; Sartori, Filippo; Vitale, Enzo; Vitelli, Riccardo; Zabeo, Luca
2011-08-01
The Multithreaded Application Real-Time executor (MARTe) is a data driven framework environment for the development and deployment of real-time control algorithms. The main ideas which led to the present version of the framework were to standardize the development of real-time control systems, while providing a set of strictly bounded standard interfaces to the outside world and also accommodating a collection of facilities which promote the speed and ease of development, commissioning and deployment of such systems. At the core of every MARTe based application, is a set of independent inter-communicating software blocks, named Generic Application Modules (GAM), orchestrated by a real-time scheduler. The platform independence of its core library provides MARTe the necessary robustness and flexibility for conveniently testing applications in different environments including non-real-time operating systems. MARTe is already being used in several machines, each with its own peculiarities regarding hardware interfacing, supervisory control configuration, operating system and target control application. This paper presents and compares the most recent results of systems using MARTe: the JET Vertical Stabilization system, which uses the Real Time Application Interface (RTAI) operating system on Intel multi-core processors; the COMPASS plasma control system, driven by Linux RT also on Intel multi-core processors; ISTTOK real-time tomography equilibrium reconstruction which shares the same support configuration of COMPASS; JET error field correction coils based on VME, PowerPC and VxWorks; FTU LH reflected power system running on VME, Intel with RTAI.
NASA Astrophysics Data System (ADS)
Laracuente, Nicholas; Grossman, Carl
2013-03-01
We developed an algorithm and software to calculate autocorrelation functions from real-time photon-counting data using the fast, parallel capabilities of graphical processor units (GPUs). Recent developments in hardware and software have allowed for general purpose computing with inexpensive GPU hardware. These devices are more suited for emulating hardware autocorrelators than traditional CPU-based software applications by emphasizing parallel throughput over sequential speed. Incoming data are binned in a standard multi-tau scheme with configurable points-per-bin size and are mapped into a GPU memory pattern to reduce time-expensive memory access. Applications include dynamic light scattering (DLS) and fluorescence correlation spectroscopy (FCS) experiments. We ran the software on a 64-core graphics pci card in a 3.2 GHz Intel i5 CPU based computer running Linux. FCS measurements were made on Alexa-546 and Texas Red dyes in a standard buffer (PBS). Software correlations were compared to hardware correlator measurements on the same signals. Supported by HHMI and Swarthmore College
Tolbert, Jeremy R; Kabali, Pratik; Brar, Simeranjit; Mukhopadhyay, Saibal
2009-01-01
We present a digital system for adaptive data compression for low power wireless transmission of Electroencephalography (EEG) data. The proposed system acts as a base-band processor between the EEG analog-to-digital front-end and RF transceiver. It performs a real-time accuracy energy trade-off for multi-channel EEG signal transmission by controlling the volume of transmitted data. We propose a multi-core digital signal processor for on-chip processing of EEG signals, to detect signal information of each channel and perform real-time adaptive compression. Our analysis shows that the proposed approach can provide significant savings in transmitter power with minimal impact on the overall signal accuracy.
MAT - MULTI-ATTRIBUTE TASK BATTERY FOR HUMAN OPERATOR WORKLOAD AND STRATEGIC BEHAVIOR RESEARCH
NASA Technical Reports Server (NTRS)
Comstock, J. R.
1994-01-01
MAT, a Multi-Attribute Task battery, gives the researcher the capability of performing multi-task workload and performance experiments. The battery provides a benchmark set of tasks for use in a wide range of laboratory studies of operator performance and workload. MAT incorporates tasks analogous to activities that aircraft crew members perform in flight, while providing a high degree of experiment control, performance data on each subtask, and freedom to use non-pilot test subjects. The MAT battery primary display is composed of four separate task windows which are as follows: a monitoring task window which includes gauges and warning lights, a tracking task window for the demands of manual control, a communication task window to simulate air traffic control communications, and a resource management task window which permits maintaining target levels on a fuel management task. In addition, a scheduling task window gives the researcher information about future task demands. The battery also provides the option of manual or automated control of tasks. The task generates performance data for each subtask. The task battery may be paused and onscreen workload rating scales presented to the subject. The MAT battery was designed to use a serially linked second computer to generate the voice messages for the Communications task. The MATREMX program and support files, which are included in the MAT package, were designed to work with the Heath Voice Card (Model HV-2000, available through the Heath Company, Benton Harbor, Michigan 49022); however, the MATREMX program and support files may easily be modified to work with other voice synthesizer or digitizer cards. The MAT battery task computer may also be used independent of the voice computer if no computer synthesized voice messages are desired or if some other method of presenting auditory messages is devised. MAT is written in QuickBasic and assembly language for IBM PC series and compatible computers running MS-DOS. The code in MAT is written for Microsoft QuickBasic 4.5 and Microsoft Macro Assembler 5.1. This package requires a joystick and EGA or VGA color graphics. An 80286, 386, or 486 processor machine is highly recommended. The standard distribution medium for MAT is a 5.25 inch 360K MS-DOS format diskette. The files are compressed using the PKZIP file compression utility. PKUNZIP is included on the distribution diskette. MAT was developed in 1992. IBM PC is a registered trademark of International Business Machines. MS-DOS, Microsoft QuickBasic, and Microsoft Macro Assembler are registered trademarks of Microsoft Corporation. PKZIP and PKUNZIP are registered trademarks of PKWare, Inc.
Multi-Scale Characterization of Orthotropic Microstructures
2008-04-01
D. Valiveti, S. J. Harris, J. Boileau, A domain partitioning based pre-processor for multi-scale modelling of cast aluminium alloys , Modelling and...SUPPLEMENTARY NOTES Journal article submitted to Modeling and Simulation in Materials Science and Engineering. PAO Case Number: WPAFB 08-3362...element for charac- terization or simulation to avoid misleading predictions of macroscopic defor- mation, fracture, or transport behavior. Likewise
Runtime Performance Monitoring Tool for RTEMS System Software
NASA Astrophysics Data System (ADS)
Cho, B.; Kim, S.; Park, H.; Kim, H.; Choi, J.; Chae, D.; Lee, J.
2007-08-01
RTEMS is a commercial-grade real-time operating system that supports multi-processor computers. However, there are not many development tools for RTEMS. In this paper, we report new RTEMS-based runtime performance monitoring tool. We have implemented a light weight runtime monitoring task with an extension to the RTEMS APIs. Using our tool, software developers can verify various performance- related parameters during runtime. Our tool can be used during software development phase and in-orbit operation as well. Our implemented target agent is light weight and has small overhead using SpaceWire interface. Efforts to reduce overhead and to add other monitoring parameters are currently under research.
Orthorectification by Using Gpgpu Method
NASA Astrophysics Data System (ADS)
Sahin, H.; Kulur, S.
2012-07-01
Thanks to the nature of the graphics processing, the newly released products offer highly parallel processing units with high-memory bandwidth and computational power of more than teraflops per second. The modern GPUs are not only powerful graphic engines but also they are high level parallel programmable processors with very fast computing capabilities and high-memory bandwidth speed compared to central processing units (CPU). Data-parallel computations can be shortly described as mapping data elements to parallel processing threads. The rapid development of GPUs programmability and capabilities attracted the attentions of researchers dealing with complex problems which need high level calculations. This interest has revealed the concepts of "General Purpose Computation on Graphics Processing Units (GPGPU)" and "stream processing". The graphic processors are powerful hardware which is really cheap and affordable. So the graphic processors became an alternative to computer processors. The graphic chips which were standard application hardware have been transformed into modern, powerful and programmable processors to meet the overall needs. Especially in recent years, the phenomenon of the usage of graphics processing units in general purpose computation has led the researchers and developers to this point. The biggest problem is that the graphics processing units use different programming models unlike current programming methods. Therefore, an efficient GPU programming requires re-coding of the current program algorithm by considering the limitations and the structure of the graphics hardware. Currently, multi-core processors can not be programmed by using traditional programming methods. Event procedure programming method can not be used for programming the multi-core processors. GPUs are especially effective in finding solution for repetition of the computing steps for many data elements when high accuracy is needed. Thus, it provides the computing process more quickly and accurately. Compared to the GPUs, CPUs which perform just one computing in a time according to the flow control are slower in performance. This structure can be evaluated for various applications of computer technology. In this study covers how general purpose parallel programming and computational power of the GPUs can be used in photogrammetric applications especially direct georeferencing. The direct georeferencing algorithm is coded by using GPGPU method and CUDA (Compute Unified Device Architecture) programming language. Results provided by this method were compared with the traditional CPU programming. In the other application the projective rectification is coded by using GPGPU method and CUDA programming language. Sample images of various sizes, as compared to the results of the program were evaluated. GPGPU method can be used especially in repetition of same computations on highly dense data, thus finding the solution quickly.
Multi-Threaded Algorithms for GPGPU in the ATLAS High Level Trigger
NASA Astrophysics Data System (ADS)
Conde Muíño, P.; ATLAS Collaboration
2017-10-01
General purpose Graphics Processor Units (GPGPU) are being evaluated for possible future inclusion in an upgraded ATLAS High Level Trigger farm. We have developed a demonstrator including GPGPU implementations of Inner Detector and Muon tracking and Calorimeter clustering within the ATLAS software framework. ATLAS is a general purpose particle physics experiment located on the LHC collider at CERN. The ATLAS Trigger system consists of two levels, with Level-1 implemented in hardware and the High Level Trigger implemented in software running on a farm of commodity CPU. The High Level Trigger reduces the trigger rate from the 100 kHz Level-1 acceptance rate to 1.5 kHz for recording, requiring an average per-event processing time of ∼ 250 ms for this task. The selection in the high level trigger is based on reconstructing tracks in the Inner Detector and Muon Spectrometer and clusters of energy deposited in the Calorimeter. Performing this reconstruction within the available farm resources presents a significant challenge that will increase significantly with future LHC upgrades. During the LHC data taking period starting in 2021, luminosity will reach up to three times the original design value. Luminosity will increase further to 7.5 times the design value in 2026 following LHC and ATLAS upgrades. Corresponding improvements in the speed of the reconstruction code will be needed to provide the required trigger selection power within affordable computing resources. Key factors determining the potential benefit of including GPGPU as part of the HLT processor farm are: the relative speed of the CPU and GPGPU algorithm implementations; the relative execution times of the GPGPU algorithms and serial code remaining on the CPU; the number of GPGPU required, and the relative financial cost of the selected GPGPU. We give a brief overview of the algorithms implemented and present new measurements that compare the performance of various configurations exploiting GPGPU cards.
Anandakrishnan, Ramu; Scogland, Tom R. W.; Fenley, Andrew T.; Gordon, John C.; Feng, Wu-chun; Onufriev, Alexey V.
2010-01-01
Tools that compute and visualize biomolecular electrostatic surface potential have been used extensively for studying biomolecular function. However, determining the surface potential for large biomolecules on a typical desktop computer can take days or longer using currently available tools and methods. Two commonly used techniques to speed up these types of electrostatic computations are approximations based on multi-scale coarse-graining and parallelization across multiple processors. This paper demonstrates that for the computation of electrostatic surface potential, these two techniques can be combined to deliver significantly greater speed-up than either one separately, something that is in general not always possible. Specifically, the electrostatic potential computation, using an analytical linearized Poisson Boltzmann (ALPB) method, is approximated using the hierarchical charge partitioning (HCP) multiscale method, and parallelized on an ATI Radeon 4870 graphical processing unit (GPU). The implementation delivers a combined 934-fold speed-up for a 476,040 atom viral capsid, compared to an equivalent non-parallel implementation on an Intel E6550 CPU without the approximation. This speed-up is significantly greater than the 42-fold speed-up for the HCP approximation alone or the 182-fold speed-up for the GPU alone. PMID:20452792
Energy Efficient Image/Video Data Transmission on Commercial Multi-Core Processors
Lee, Sungju; Kim, Heegon; Chung, Yongwha; Park, Daihee
2012-01-01
In transmitting image/video data over Video Sensor Networks (VSNs), energy consumption must be minimized while maintaining high image/video quality. Although image/video compression is well known for its efficiency and usefulness in VSNs, the excessive costs associated with encoding computation and complexity still hinder its adoption for practical use. However, it is anticipated that high-performance handheld multi-core devices will be used as VSN processing nodes in the near future. In this paper, we propose a way to improve the energy efficiency of image and video compression with multi-core processors while maintaining the image/video quality. We improve the compression efficiency at the algorithmic level or derive the optimal parameters for the combination of a machine and compression based on the tradeoff between the energy consumption and the image/video quality. Based on experimental results, we confirm that the proposed approach can improve the energy efficiency of the straightforward approach by a factor of 2∼5 without compromising image/video quality. PMID:23202181
Platform-Independence and Scheduling In a Multi-Threaded Real-Time Simulation
NASA Technical Reports Server (NTRS)
Sugden, Paul P.; Rau, Melissa A.; Kenney, P. Sean
2001-01-01
Aviation research often relies on real-time, pilot-in-the-loop flight simulation as a means to develop new flight software, flight hardware, or pilot procedures. Often these simulations become so complex that a single processor is incapable of performing the necessary computations within a fixed time-step. Threads are an elegant means to distribute the computational work-load when running on a symmetric multi-processor machine. However, programming with threads often requires operating system specific calls that reduce code portability and maintainability. While a multi-threaded simulation allows a significant increase in the simulation complexity, it also increases the workload of a simulation operator by requiring that the operator determine which models run on which thread. To address these concerns an object-oriented design was implemented in the NASA Langley Standard Real-Time Simulation in C++ (LaSRS++) application framework. The design provides a portable and maintainable means to use threads and also provides a mechanism to automatically load balance the simulation models.
An Energy-Aware Runtime Management of Multi-Core Sensory Swarms.
Kim, Sungchan; Yang, Hoeseok
2017-08-24
In sensory swarms, minimizing energy consumption under performance constraint is one of the key objectives. One possible approach to this problem is to monitor application workload that is subject to change at runtime, and to adjust system configuration adaptively to satisfy the performance goal. As today's sensory swarms are usually implemented using multi-core processors with adjustable clock frequency, we propose to monitor the CPU workload periodically and adjust the task-to-core allocation or clock frequency in an energy-efficient way in response to the workload variations. In doing so, we present an online heuristic that determines the most energy-efficient adjustment that satisfies the performance requirement. The proposed method is based on a simple yet effective energy model that is built upon performance prediction using IPC (instructions per cycle) measured online and power equation derived empirically. The use of IPC accounts for memory intensities of a given workload, enabling the accurate prediction of execution time. Hence, the model allows us to rapidly and accurately estimate the effect of the two control knobs, clock frequency adjustment and core allocation. The experiments show that the proposed technique delivers considerable energy saving of up to 45%compared to the state-of-the-art multi-core energy management technique.
An Energy-Aware Runtime Management of Multi-Core Sensory Swarms
Kim, Sungchan
2017-01-01
In sensory swarms, minimizing energy consumption under performance constraint is one of the key objectives. One possible approach to this problem is to monitor application workload that is subject to change at runtime, and to adjust system configuration adaptively to satisfy the performance goal. As today’s sensory swarms are usually implemented using multi-core processors with adjustable clock frequency, we propose to monitor the CPU workload periodically and adjust the task-to-core allocation or clock frequency in an energy-efficient way in response to the workload variations. In doing so, we present an online heuristic that determines the most energy-efficient adjustment that satisfies the performance requirement. The proposed method is based on a simple yet effective energy model that is built upon performance prediction using IPC (instructions per cycle) measured online and power equation derived empirically. The use of IPC accounts for memory intensities of a given workload, enabling the accurate prediction of execution time. Hence, the model allows us to rapidly and accurately estimate the effect of the two control knobs, clock frequency adjustment and core allocation. The experiments show that the proposed technique delivers considerable energy saving of up to 45%compared to the state-of-the-art multi-core energy management technique. PMID:28837094
Large calculation of the flow over a hypersonic vehicle using a GPU
NASA Astrophysics Data System (ADS)
Elsen, Erich; LeGresley, Patrick; Darve, Eric
2008-12-01
Graphics processing units are capable of impressive computing performance up to 518 Gflops peak performance. Various groups have been using these processors for general purpose computing; most efforts have focussed on demonstrating relatively basic calculations, e.g. numerical linear algebra, or physical simulations for visualization purposes with limited accuracy. This paper describes the simulation of a hypersonic vehicle configuration with detailed geometry and accurate boundary conditions using the compressible Euler equations. To the authors' knowledge, this is the most sophisticated calculation of this kind in terms of complexity of the geometry, the physical model, the numerical methods employed, and the accuracy of the solution. The Navier-Stokes Stanford University Solver (NSSUS) was used for this purpose. NSSUS is a multi-block structured code with a provably stable and accurate numerical discretization which uses a vertex-based finite-difference method. A multi-grid scheme is used to accelerate the solution of the system. Based on a comparison of the Intel Core 2 Duo and NVIDIA 8800GTX, speed-ups of over 40× were demonstrated for simple test geometries and 20× for complex geometries.
High-Speed Real-Time Resting-State fMRI Using Multi-Slab Echo-Volumar Imaging
Posse, Stefan; Ackley, Elena; Mutihac, Radu; Zhang, Tongsheng; Hummatov, Ruslan; Akhtari, Massoud; Chohan, Muhammad; Fisch, Bruce; Yonas, Howard
2013-01-01
We recently demonstrated that ultra-high-speed real-time fMRI using multi-slab echo-volumar imaging (MEVI) significantly increases sensitivity for mapping task-related activation and resting-state networks (RSNs) compared to echo-planar imaging (Posse et al., 2012). In the present study we characterize the sensitivity of MEVI for mapping RSN connectivity dynamics, comparing independent component analysis (ICA) and a novel seed-based connectivity analysis (SBCA) that combines sliding-window correlation analysis with meta-statistics. This SBCA approach is shown to minimize the effects of confounds, such as movement, and CSF and white matter signal changes, and enables real-time monitoring of RSN dynamics at time scales of tens of seconds. We demonstrate highly sensitive mapping of eloquent cortex in the vicinity of brain tumors and arterio-venous malformations, and detection of abnormal resting-state connectivity in epilepsy. In patients with motor impairment, resting-state fMRI provided focal localization of sensorimotor cortex compared with more diffuse activation in task-based fMRI. The fast acquisition speed of MEVI enabled segregation of cardiac-related signal pulsation using ICA, which revealed distinct regional differences in pulsation amplitude and waveform, elevated signal pulsation in patients with arterio-venous malformations and a trend toward reduced pulsatility in gray matter of patients compared with healthy controls. Mapping cardiac pulsation in cortical gray matter may carry important functional information that distinguishes healthy from diseased tissue vasculature. This novel fMRI methodology is particularly promising for mapping eloquent cortex in patients with neurological disease, having variable degree of cooperation in task-based fMRI. In conclusion, ultra-high-real-time speed fMRI enhances the sensitivity of mapping the dynamics of resting-state connectivity and cerebro-vascular pulsatility for clinical and neuroscience research applications. PMID:23986677
Clustered Multi-Task Learning for Automatic Radar Target Recognition
Li, Cong; Bao, Weimin; Xu, Luping; Zhang, Hua
2017-01-01
Model training is a key technique for radar target recognition. Traditional model training algorithms in the framework of single task leaning ignore the relationships among multiple tasks, which degrades the recognition performance. In this paper, we propose a clustered multi-task learning, which can reveal and share the multi-task relationships for radar target recognition. To further make full use of these relationships, the latent multi-task relationships in the projection space are taken into consideration. Specifically, a constraint term in the projection space is proposed, the main idea of which is that multiple tasks within a close cluster should be close to each other in the projection space. In the proposed method, the cluster structures and multi-task relationships can be autonomously learned and utilized in both of the original and projected space. In view of the nonlinear characteristics of radar targets, the proposed method is extended to a non-linear kernel version and the corresponding non-linear multi-task solving method is proposed. Comprehensive experimental studies on simulated high-resolution range profile dataset and MSTAR SAR public database verify the superiority of the proposed method to some related algorithms. PMID:28953267
Electro-Optic Computing Architectures: Volume II. Components and System Design and Analysis
1998-02-01
The objective of the Electro - Optic Computing Architecture (EOCA) program was to develop multi-function electro - optic interfaces and optical...interconnect units to enhance the performance of parallel processor systems and form the building blocks for future electro - optic computing architectures...Specifically, three multi-function interface modules were targeted for development - an Electro - Optic Interface (EOI), an Optical Interconnection Unit
2008-07-01
generation of process partitioning, a thread pipelining becomes possible. In this paper we briefly summarize the requirements and trends for FADEC based... FADEC environment, presenting a hypothetical realization of an example application. Finally we discuss the application of Time-Triggered...based control applications of the future. 15. SUBJECT TERMS Gas turbine, FADEC , Multi-core processing technology, disturbed based control
FPGA Acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods.
Zierke, Stephanie; Bakos, Jason D
2010-04-12
Likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such it contains a high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors. We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10x speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and through a natural log approximation that is chosen specifically to leverage a deeply pipelined custom architecture. Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference as shown by the growing body of literature in this field. FPGAs in particular are well-suited for this task because of their low power consumption as compared to many-core processors and Graphics Processor Units (GPUs).
Multi-fuel reformers for fuel cells used in transportation. Phase 1: Multi-fuel reformers
NASA Astrophysics Data System (ADS)
1994-05-01
DOE has established the goal, through the Fuel Cells in Transportation Program, of fostering the rapid development and commercialization of fuel cells as economic competitors for the internal combustion engine. Central to this goal is a safe feasible means of supplying hydrogen of the required purity to the vehicular fuel cell system. Two basic strategies are being considered: (1) on-board fuel processing whereby alternative fuels such as methanol, ethanol or natural gas stored on the vehicle undergo reformation and subsequent processing to produce hydrogen, and (2) on-board storage of pure hydrogen provided by stationary fuel processing plants. This report analyzes fuel processor technologies, types of fuel and fuel cell options for on-board reformation. As the Phase 1 of a multi-phased program to develop a prototype multi-fuel reformer system for a fuel cell powered vehicle, the objective of this program was to evaluate the feasibility of a multi-fuel reformer concept and to select a reforming technology for further development in the Phase 2 program, with the ultimate goal of integration with a DOE-designated fuel cell and vehicle configuration. The basic reformer processes examined in this study included catalytic steam reforming (SR), non-catalytic partial oxidation (POX) and catalytic partial oxidation (also known as Autothermal Reforming, or ATR). Fuels under consideration in this study included methanol, ethanol, and natural gas. A systematic evaluation of reforming technologies, fuels, and transportation fuel cell applications was conducted for the purpose of selecting a suitable multi-fuel processor for further development and demonstration in a transportation application.
The Minerva Multi-Microprocessor.
A multiprocessor system is described which is an experiment in low cost, extensible, multiprocessor architectures. Global issues such as inclusion of a central bus, design of the bus arbiter, and methods of interrupt handling are considered. The system initially includes two processor types, based on microprocessors, and these are discussed. Methods for reducing processor demand for the central bus are described.
Multiple core computer processor with globally-accessible local memories
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shalf, John; Donofrio, David; Oliker, Leonid
A multi-core computer processor including a plurality of processor cores interconnected in a Network-on-Chip (NoC) architecture, a plurality of caches, each of the plurality of caches being associated with one and only one of the plurality of processor cores, and a plurality of memories, each of the plurality of memories being associated with a different set of at least one of the plurality of processor cores and each of the plurality of memories being configured to be visible in a global memory address space such that the plurality of memories are visible to two or more of the plurality ofmore » processor cores.« less
Biphasic responses in multi-site phosphorylation systems.
Suwanmajo, Thapanar; Krishnan, J
2013-12-06
Multi-site phosphorylation systems are repeatedly encountered in cellular biology and multi-site modification is a basic building block of post-translational modification. In this paper, we demonstrate how distributive multi-site modification mechanisms by a single kinase/phosphatase pair can lead to biphasic/partial biphasic dose-response characteristics for the maximally phosphorylated substrate at steady state. We use simulations and analysis to uncover a hidden competing effect which is responsible for this and analyse how it may be accentuated. We build on this to analyse different variants of multi-site phosphorylation mechanisms showing that some mechanisms are intrinsically not capable of displaying this behaviour. This provides both a consolidated understanding of how and under what conditions biphasic responses are obtained in multi-site phosphorylation and a basis for discriminating between different mechanisms based on this. We also demonstrate how this behaviour may be combined with other behaviour such as threshold and bistable responses, demonstrating the capacity of multi-site phosphorylation systems to act as complex molecular signal processors.
Examining the Impact of Off-Task Multi-Tasking with Technology on Real-Time Classroom Learning
ERIC Educational Resources Information Center
Wood, Eileen; Zivcakova, Lucia; Gentile, Petrice; Archer, Karin; De Pasquale, Domenica; Nosko, Amanda
2012-01-01
The purpose of the present study was to examine the impact of multi-tasking with digital technologies while attempting to learn from real-time classroom lectures in a university setting. Four digitally-based multi-tasking activities (texting using a cell-phone, emailing, MSN messaging and Facebook[TM]) were compared to 3 control groups…
Improvement and speed optimization of numerical tsunami modelling program using OpenMP technology
NASA Astrophysics Data System (ADS)
Chernov, A.; Zaytsev, A.; Yalciner, A.; Kurkin, A.
2009-04-01
Currently, the basic problem of tsunami modeling is low speed of calculations which is unacceptable for services of the operative notification. Existing algorithms of numerical modeling of hydrodynamic processes of tsunami waves are developed without taking the opportunities of modern computer facilities. There is an opportunity to have considerable acceleration of process of calculations by using parallel algorithms. We discuss here new approach to parallelization tsunami modeling code using OpenMP Technology (for multiprocessing systems with the general memory). Nowadays, multiprocessing systems are easily accessible for everyone. The cost of the use of such systems becomes much lower comparing to the costs of clusters. This opportunity also benefits all programmers to apply multithreading algorithms on desktop computers of researchers. Other important advantage of the given approach is the mechanism of the general memory - there is no necessity to send data on slow networks (for example Ethernet). All memory is the common for all computing processes; it causes almost linear scalability of the program and processes. In the new version of NAMI DANCE using OpenMP technology and multi-threading algorithm provide 80% gain in speed in comparison with the one-thread version for dual-processor unit. The speed increased and 320% gain was attained for four core processor unit of PCs. Thus, it was possible to reduce considerably time of performance of calculations on the scientific workstations (desktops) without complete change of the program and user interfaces. The further modernization of algorithms of preparation of initial data and processing of results using OpenMP looks reasonable. The final version of NAMI DANCE with the increased computational speed can be used not only for research purposes but also in real time Tsunami Warning Systems.
Neural simulations on multi-core architectures.
Eichner, Hubert; Klug, Tobias; Borst, Alexander
2009-01-01
Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high performance as well as standard desktop computers. This work introduces strategies for the parallelization of biophysically realistic neural simulations based on the compartmental modeling technique and results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing.
Neural Simulations on Multi-Core Architectures
Eichner, Hubert; Klug, Tobias; Borst, Alexander
2009-01-01
Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high performance as well as standard desktop computers. This work introduces strategies for the parallelization of biophysically realistic neural simulations based on the compartmental modeling technique and results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing. PMID:19636393
USC orthogonal multiprocessor for image processing with neural networks
NASA Astrophysics Data System (ADS)
Hwang, Kai; Panda, Dhabaleswar K.; Haddadi, Navid
1990-07-01
This paper presents the architectural features and imaging applications of the Orthogonal MultiProcessor (OMP) system, which is under construction at the University of Southern California with research funding from NSF and assistance from several industrial partners. The prototype OMP is being built with 16 Intel i860 RISC microprocessors and 256 parallel memory modules using custom-designed spanning buses, which are 2-D interleaved and orthogonally accessed without conflicts. The 16-processor OMP prototype is targeted to achieve 430 MIPS and 600 Mflops, which have been verified by simulation experiments based on the design parameters used. The prototype OMP machine will be initially applied for image processing, computer vision, and neural network simulation applications. We summarize important vision and imaging algorithms that can be restructured with neural network models. These algorithms can efficiently run on the OMP hardware with linear speedup. The ultimate goal is to develop a high-performance Visual Computer (Viscom) for integrated low- and high-level image processing and vision tasks.
Programming methodology for a general purpose automation controller
NASA Technical Reports Server (NTRS)
Sturzenbecker, M. C.; Korein, J. U.; Taylor, R. H.
1987-01-01
The General Purpose Automation Controller is a multi-processor architecture for automation programming. A methodology has been developed whose aim is to simplify the task of programming distributed real-time systems for users in research or manufacturing. Programs are built by configuring function blocks (low-level computations) into processes using data flow principles. These processes are activated through the verb mechanism. Verbs are divided into two classes: those which support devices, such as robot joint servos, and those which perform actions on devices, such as motion control. This programming methodology was developed in order to achieve the following goals: (1) specifications for real-time programs which are to a high degree independent of hardware considerations such as processor, bus, and interconnect technology; (2) a component approach to software, so that software required to support new devices and technologies can be integrated by reconfiguring existing building blocks; (3) resistance to error and ease of debugging; and (4) a powerful command language interface.
Multi-Core Programming Design Patterns: Stream Processing Algorithms for Dynamic Scene Perceptions
2014-05-01
processor developed by IBM and other companies , incorpo- rates the verb—POWER5— processor as the Power Processor Element (PPE), one of the early general...deliver an power efficient single-precision peak performance of more than 256 GFlops. Substantially more raw power became available later, when nVIDIA ...algorithms, including IBM’s Cell/B.E., GPUs from NVidia and AMD and many-core CPUs from Intel.27 The vast growth of digital video content has been a
Real Time Target Tracking Using Dedicated Vision Hardware
NASA Astrophysics Data System (ADS)
Kambies, Keith; Walsh, Peter
1988-03-01
This paper describes a real-time vision target tracking system developed by Adaptive Automation, Inc. and delivered to NASA's Launch Equipment Test Facility, Kennedy Space Center, Florida. The target tracking system is part of the Robotic Application Development Laboratory (RADL) which was designed to provide NASA with a general purpose robotic research and development test bed for the integration of robot and sensor systems. One of the first RADL system applications is the closing of a position control loop around a six-axis articulated arm industrial robot using a camera and dedicated vision processor as the input sensor so that the robot can locate and track a moving target. The vision system is inside of the loop closure of the robot tracking system, therefore, tight throughput and latency constraints are imposed on the vision system that can only be met with specialized hardware and a concurrent approach to the processing algorithms. State of the art VME based vision boards capable of processing the image at frame rates were used with a real-time, multi-tasking operating system to achieve the performance required. This paper describes the high speed vision based tracking task, the system throughput requirements, the use of dedicated vision hardware architecture, and the implementation design details. Important to the overall philosophy of the complete system was the hierarchical and modular approach applied to all aspects of the system, hardware and software alike, so there is special emphasis placed on this topic in the paper.
Project Report: Automatic Sequence Processor Software Analysis
NASA Technical Reports Server (NTRS)
Benjamin, Brandon
2011-01-01
The Mission Planning and Sequencing (MPS) element of Multi-Mission Ground System and Services (MGSS) provides space missions with multi-purpose software to plan spacecraft activities, sequence spacecraft commands, and then integrate these products and execute them on spacecraft. Jet Propulsion Laboratory (JPL) is currently is flying many missions. The processes for building, integrating, and testing the multi-mission uplink software need to be improved to meet the needs of the missions and the operations teams that command the spacecraft. The Multi-Mission Sequencing Team is responsible for collecting and processing the observations, experiments and engineering activities that are to be performed on a selected spacecraft. The collection of these activities is called a sequence and ultimately a sequence becomes a sequence of spacecraft commands. The operations teams check the sequence to make sure that no constraints are violated. The workflow process involves sending a program start command, which activates the Automatic Sequence Processor (ASP). The ASP is currently a file-based system that is comprised of scripts written in perl, c-shell and awk. Once this start process is complete, the system checks for errors and aborts if there are any; otherwise the system converts the commands to binary, and then sends the resultant information to be radiated to the spacecraft.
A Comparison of Two Methods Used for Ranking Task Exposure Levels Using Simulated Multi-Task Data
1999-12-17
OF OKLAHOMA HEALTH SCIENCES CENTER GRADUATE COLLEGE A COMPARISON OF TWO METHODS USED FOR RANKING TASK EXPOSURE LEVELS USING SIMULATED MULTI-TASK...COSTANTINO Oklahoma City, Oklahoma 1999 ^ooo wx °^ A COMPARISON OF TWO METHODS USED FOR RANKING TASK EXPOSURE LEVELS USING SIMULATED MULTI-TASK DATA... METHODS AND MATERIALS 9 TV. RESULTS 14 V. DISCUSSION AND CONCLUSION 28 LIST OF REFERENCES 31 APPENDICES 33 Appendix A JJ -in Appendix B Dl IV
A Modular Pipelined Processor for High Resolution Gamma-Ray Spectroscopy
NASA Astrophysics Data System (ADS)
Veiga, Alejandro; Grunfeld, Christian
2016-02-01
The design of a digital signal processor for gamma-ray applications is presented in which a single ADC input can simultaneously provide temporal and energy characterization of gamma radiation for a wide range of applications. Applying pipelining techniques, the processor is able to manage and synchronize very large volumes of streamed real-time data. Its modular user interface provides a flexible environment for experimental design. The processor can fit in a medium-sized FPGA device operating at ADC sampling frequency, providing an efficient solution for multi-channel applications. Two experiments are presented in order to characterize its temporal and energy resolution.
Feeling smart: Effects of caffeine and glucose on cognition, mood and self-judgment.
Ullrich, Susann; de Vries, Yfke C; Kühn, Simone; Repantis, Dimitris; Dresler, Martin; Ohla, Kathrin
2015-11-01
During education and early career, young adults often face examinations and assessment centers. Coffee and energy drinks are convenient and commonly used to enhance or maintain performance in these situations. Whether these macronutrients improve performance in a demanding and drawn-out multi-task situation is not clear. Using double-blind, placebo-controlled studies, we set out to examine the effects of caffeine and glucose in an assessment center-like situation, under natural consumption conditions, in a group of young adults who were heterogeneous with respect to consumption patterns. We measured multi-task performance including logical thinking, processing speed, numeric and verbal memory, attention and the ability to concentrate, and mood over a two-hour period. Caffeine and glucose were administered in common beverages with appropriate placebo controls allowing the assessment of psychological effects of expectancy. Importantly, and in contrast to most previous studies, participants retained their habitual caffeine and sugar intake (studies 1 and 2) as this represents common behavior. Based on the bulk of literature, we hypothesized that (i) caffeine enhances attentional performance and mood, while performance in more complex tasks will remain unchanged, and that (ii) glucose enhances performance on memory tasks accompanied with negative mood. Our results provide evidence that neither caffeine nor glucose significantly influence cognitive performance when compared with placebo, water, or no treatment controls in a multi-task setting. Yet, caffeine and, by trend, placebo improve dispositions such that participants perceive preserved mental energy throughout the test procedure. These subjective effects were stronger after 24 h caffeine abstinence (study 3). Future studies will have to address whether these mood changes actually result in increased motivation during a challenging task.
Multi-task feature selection in microarray data by binary integer programming.
Lan, Liang; Vucetic, Slobodan
2013-12-20
A major challenge in microarray classification is that the number of features is typically orders of magnitude larger than the number of examples. In this paper, we propose a novel feature filter algorithm to select the feature subset with maximal discriminative power and minimal redundancy by solving a quadratic objective function with binary integer constraints. To improve the computational efficiency, the binary integer constraints are relaxed and a low-rank approximation to the quadratic term is applied. The proposed feature selection algorithm was extended to solve multi-task microarray classification problems. We compared the single-task version of the proposed feature selection algorithm with 9 existing feature selection methods on 4 benchmark microarray data sets. The empirical results show that the proposed method achieved the most accurate predictions overall. We also evaluated the multi-task version of the proposed algorithm on 8 multi-task microarray datasets. The multi-task feature selection algorithm resulted in significantly higher accuracy than when using the single-task feature selection methods.
NASA Astrophysics Data System (ADS)
Olson, Richard F.
2013-05-01
Rendering of point scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, highlevel parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.
Phase retrieval algorithm for JWST Flight and Testbed Telescope
NASA Astrophysics Data System (ADS)
Dean, Bruce H.; Aronstein, David L.; Smith, J. Scott; Shiri, Ron; Acton, D. Scott
2006-06-01
An image-based wavefront sensing and control algorithm for the James Webb Space Telescope (JWST) is presented. The algorithm heritage is discussed in addition to implications for algorithm performance dictated by NASA's Technology Readiness Level (TRL) 6. The algorithm uses feedback through an adaptive diversity function to avoid the need for phase-unwrapping post-processing steps. Algorithm results are demonstrated using JWST Testbed Telescope (TBT) commissioning data and the accuracy is assessed by comparison with interferometer results on a multi-wave phase aberration. Strategies for minimizing aliasing artifacts in the recovered phase are presented and orthogonal basis functions are implemented for representing wavefronts in irregular hexagonal apertures. Algorithm implementation on a parallel cluster of high-speed digital signal processors (DSPs) is also discussed.
First experience of vectorizing electromagnetic physics models for detector simulation
NASA Astrophysics Data System (ADS)
Amadio, G.; Apostolakis, J.; Bandieramonte, M.; Bianchini, C.; Bitzes, G.; Brun, R.; Canal, P.; Carminati, F.; de Fine Licht, J.; Duhem, L.; Elvira, D.; Gheata, A.; Jun, S. Y.; Lima, G.; Novak, M.; Presbyterian, M.; Shadura, O.; Seghal, R.; Wenzel, S.
2015-12-01
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. The GeantV vector prototype for detector simulations has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth, parallelization needed to achieve optimal performance or memory access latency and speed. An additional challenge is to avoid the code duplication often inherent to supporting heterogeneous platforms. In this paper we present the first experience of vectorizing electromagnetic physics models developed for the GeantV project.
Palttala, Iida; Heinämäki, Jyrki; Honkanen, Outi; Suominen, Risto; Antikainen, Osmo; Hirvonen, Jouni; Yliruusi, Jouko
2013-03-01
To date, little is known on applicability of different types of pharmaceutical dosage forms in an automated high-speed multi-dose dispensing process. The purpose of the present study was to identify and further investigate various process-induced and/or product-related limitations associated with multi-dose dispensing process. The rates of product defects and dose dispensing errors in automated multi-dose dispensing were retrospectively investigated during a 6-months follow-up period. The study was based on the analysis of process data of totally nine automated high-speed multi-dose dispensing systems. Special attention was paid to the dependence of multi-dose dispensing errors/product defects and pharmaceutical tablet properties (such as shape, dimensions, weight, scored lines, coatings, etc.) to profile the most suitable forms of tablets for automated dose dispensing systems. The relationship between the risk of errors in dose dispensing and tablet characteristics were visualized by creating a principal component analysis (PCA) model for the outcome of dispensed tablets. The two most common process-induced failures identified in the multi-dose dispensing are predisposal of tablet defects and unexpected product transitions in the medication cassette (dose dispensing error). The tablet defects are product-dependent failures, while the tablet transitions are dependent on automated multi-dose dispensing systems used. The occurrence of tablet defects is approximately twice as common as tablet transitions. Optimal tablet preparation for the high-speed multi-dose dispensing would be a round-shaped, relatively small/middle-sized, film-coated tablet without any scored line. Commercial tablet products can be profiled and classified based on their suitability to a high-speed multi-dose dispensing process.
Ultra-Compact Transputer-Based Controller for High-Level, Multi-Axis Coordination
NASA Technical Reports Server (NTRS)
Zenowich, Brian; Crowell, Adam; Townsend, William T.
2013-01-01
The design of machines that rely on arrays of servomotors such as robotic arms, orbital platforms, and combinations of both, imposes a heavy computational burden to coordinate their actions to perform coherent tasks. For example, the robotic equivalent of a person tracing a straight line in space requires enormously complex kinematics calculations, and complexity increases with the number of servo nodes. A new high-level architecture for coordinated servo-machine control enables a practical, distributed transputer alternative to conventional central processor electronics. The solution is inherently scalable, dramatically reduces bulkiness and number of conductor runs throughout the machine, requires only a fraction of the power, and is designed for cooling in a vacuum.
Vectorization for Molecular Dynamics on Intel Xeon Phi Corpocessors
NASA Astrophysics Data System (ADS)
Yi, Hongsuk
2014-03-01
Many modern processors are capable of exploiting data-level parallelism through the use of single instruction multiple data (SIMD) execution. The new Intel Xeon Phi coprocessor supports 512 bit vector registers for the high performance computing. In this paper, we have developed a hierarchical parallelization scheme for accelerated molecular dynamics simulations with the Terfoff potentials for covalent bond solid crystals on Intel Xeon Phi coprocessor systems. The scheme exploits multi-level parallelism computing. We combine thread-level parallelism using a tightly coupled thread-level and task-level parallelism with 512-bit vector register. The simulation results show that the parallel performance of SIMD implementations on Xeon Phi is apparently superior to their x86 CPU architecture.
Binaural sensitivity in children who use bilateral cochlear implants.
Ehlers, Erica; Goupell, Matthew J; Zheng, Yi; Godar, Shelly P; Litovsky, Ruth Y
2017-06-01
Children who are deaf and receive bilateral cochlear implants (BiCIs) perform better on spatial hearing tasks using bilateral rather than unilateral inputs; however, they underperform relative to normal-hearing (NH) peers. This gap in performance is multi-factorial, including the inability of speech processors to reliably deliver binaural cues. Although much is known regarding binaural sensitivity of adults with BiCIs, less is known about how the development of binaural sensitivity in children with BiCIs compared to NH children. Sixteen children (ages 9-17 years) were tested using synchronized research processors. Interaural time differences and interaural level differences (ITDs and ILDs, respectively) were presented to pairs of pitch-matched electrodes. Stimuli were 300-ms, 100-pulses-per-second, constant-amplitude pulse trains. In the first and second experiments, discrimination of interaural cues (either ITDs or ILDs) was measured using a two-interval left/right task. In the third experiment, subjects reported the perceived intracranial position of ITDs and ILDs in a lateralization task. All children demonstrated sensitivity to ILDs, possibly due to monaural level cues. Children who were born deaf had weak or absent sensitivity to ITDs; in contrast, ITD sensitivity was noted in children with previous exposure to acoustic hearing. Therefore, factors such as auditory deprivation, in particular, lack of early exposure to consistent timing differences between the ears, may delay the maturation of binaural circuits and cause insensitivity to binaural differences.
Design and multi-physics optimization of rotary MRF brakes
NASA Astrophysics Data System (ADS)
Topcu, Okan; Taşcıoğlu, Yiğit; Konukseven, Erhan İlhan
2018-03-01
Particle swarm optimization (PSO) is a popular method to solve the optimization problems. However, calculations for each particle will be excessive when the number of particles and complexity of the problem increases. As a result, the execution speed will be too slow to achieve the optimized solution. Thus, this paper proposes an automated design and optimization method for rotary MRF brakes and similar multi-physics problems. A modified PSO algorithm is developed for solving multi-physics engineering optimization problems. The difference between the proposed method and the conventional PSO is to split up the original single population into several subpopulations according to the division of labor. The distribution of tasks and the transfer of information to the next party have been inspired by behaviors of a hunting party. Simulation results show that the proposed modified PSO algorithm can overcome the problem of heavy computational burden of multi-physics problems while improving the accuracy. Wire type, MR fluid type, magnetic core material, and ideal current inputs have been determined by the optimization process. To the best of the authors' knowledge, this multi-physics approach is novel for optimizing rotary MRF brakes and the developed PSO algorithm is capable of solving other multi-physics engineering optimization problems. The proposed method has showed both better performance compared to the conventional PSO and also has provided small, lightweight, high impedance rotary MRF brake designs.
PANDA: A distributed multiprocessor operating system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chubb, P.
1989-01-01
PANDA is a design for a distributed multiprocessor and an operating system. PANDA is designed to allow easy expansion of both hardware and software. As such, the PANDA kernel provides only message passing and memory and process management. The other features needed for the system (device drivers, secondary storage management, etc.) are provided as replaceable user tasks. The thesis presents PANDA's design and implementation, both hardware and software. PANDA uses multiple 68010 processors sharing memory on a VME bus, each such node potentially connected to others via a high speed network. The machine is completely homogeneous: there are no differencesmore » between processors that are detectable by programs running on the machine. A single two-processor node has been constructed. Each processor contains memory management circuits designed to allow processors to share page tables safely. PANDA presents a programmers' model similar to the hardware model: a job is divided into multiple tasks, each having its own address space. Within each task, multiple processes share code and data. Tasks can send messages to each other, and set up virtual circuits between themselves. Peripheral devices such as disc drives are represented within PANDA by tasks. PANDA divides secondary storage into volumes, each volume being accessed by a volume access task, or VAT. All knowledge about the way that data is stored on a disc is kept in its volume's VAT. The design is such that PANDA should provide a useful testbed for file systems and device drivers, as these can be installed without recompiling PANDA itself, and without rebooting the machine.« less
Multi-Task Convolutional Neural Network for Pose-Invariant Face Recognition
NASA Astrophysics Data System (ADS)
Yin, Xi; Liu, Xiaoming
2018-02-01
This paper explores multi-task learning (MTL) for face recognition. We answer the questions of how and why MTL can improve the face recognition performance. First, we propose a multi-task Convolutional Neural Network (CNN) for face recognition where identity classification is the main task and pose, illumination, and expression estimations are the side tasks. Second, we develop a dynamic-weighting scheme to automatically assign the loss weight to each side task, which is a crucial problem in MTL. Third, we propose a pose-directed multi-task CNN by grouping different poses to learn pose-specific identity features, simultaneously across all poses. Last but not least, we propose an energy-based weight analysis method to explore how CNN-based MTL works. We observe that the side tasks serve as regularizations to disentangle the variations from the learnt identity features. Extensive experiments on the entire Multi-PIE dataset demonstrate the effectiveness of the proposed approach. To the best of our knowledge, this is the first work using all data in Multi-PIE for face recognition. Our approach is also applicable to in-the-wild datasets for pose-invariant face recognition and achieves comparable or better performance than state of the art on LFW, CFP, and IJB-A datasets.
Energy-aware Thread and Data Management in Heterogeneous Multi-core, Multi-memory Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Su, Chun-Yi
By 2004, microprocessor design focused on multicore scaling—increasing the number of cores per die in each generation—as the primary strategy for improving performance. These multicore processors typically equip multiple memory subsystems to improve data throughput. In addition, these systems employ heterogeneous processors such as GPUs and heterogeneous memories like non-volatile memory to improve performance, capacity, and energy efficiency. With the increasing volume of hardware resources and system complexity caused by heterogeneity, future systems will require intelligent ways to manage hardware resources. Early research to improve performance and energy efficiency on heterogeneous, multi-core, multi-memory systems focused on tuning a single primitivemore » or at best a few primitives in the systems. The key limitation of past efforts is their lack of a holistic approach to resource management that balances the tradeoff between performance and energy consumption. In addition, the shift from simple, homogeneous systems to these heterogeneous, multicore, multi-memory systems requires in-depth understanding of efficient resource management for scalable execution, including new models that capture the interchange between performance and energy, smarter resource management strategies, and novel low-level performance/energy tuning primitives and runtime systems. Tuning an application to control available resources efficiently has become a daunting challenge; managing resources in automation is still a dark art since the tradeoffs among programming, energy, and performance remain insufficiently understood. In this dissertation, I have developed theories, models, and resource management techniques to enable energy-efficient execution of parallel applications through thread and data management in these heterogeneous multi-core, multi-memory systems. I study the effect of dynamic concurrent throttling on the performance and energy of multi-core, non-uniform memory access (NUMA) systems. I use critical path analysis to quantify memory contention in the NUMA memory system and determine thread mappings. In addition, I implement a runtime system that combines concurrent throttling and a novel thread mapping algorithm to manage thread resources and improve energy efficient execution in multi-core, NUMA systems.« less
Geospace simulations using modern accelerator processor technology
NASA Astrophysics Data System (ADS)
Germaschewski, K.; Raeder, J.; Larson, D. J.
2009-12-01
OpenGGCM (Open Geospace General Circulation Model) is a well-established numerical code simulating the Earth's space environment. The most computing intensive part is the MHD (magnetohydrodynamics) solver that models the plasma surrounding Earth and its interaction with Earth's magnetic field and the solar wind flowing in from the sun. Like other global magnetosphere codes, OpenGGCM's realism is currently limited by computational constraints on grid resolution. OpenGGCM has been ported to make use of the added computational powerof modern accelerator based processor architectures, in particular the Cell processor. The Cell architecture is a novel inhomogeneous multicore architecture capable of achieving up to 230 GFLops on a single chip. The University of New Hampshire recently acquired a PowerXCell 8i based computing cluster, and here we will report initial performance results of OpenGGCM. Realizing the high theoretical performance of the Cell processor is a programming challenge, though. We implemented the MHD solver using a multi-level parallelization approach: On the coarsest level, the problem is distributed to processors based upon the usual domain decomposition approach. Then, on each processor, the problem is divided into 3D columns, each of which is handled by the memory limited SPEs (synergistic processing elements) slice by slice. Finally, SIMD instructions are used to fully exploit the SIMD FPUs in each SPE. Memory management needs to be handled explicitly by the code, using DMA to move data from main memory to the per-SPE local store and vice versa. We use a modern technique, automatic code generation, which shields the application programmer from having to deal with all of the implementation details just described, keeping the code much more easily maintainable. Our preliminary results indicate excellent performance, a speed-up of a factor of 30 compared to the unoptimized version.
A Parallel Compact Multi-Dimensional Numerical Algorithm with Aeroacoustics Applications
NASA Technical Reports Server (NTRS)
Povitsky, Alex; Morris, Philip J.
1999-01-01
In this study we propose a novel method to parallelize high-order compact numerical algorithms for the solution of three-dimensional PDEs (Partial Differential Equations) in a space-time domain. For this numerical integration most of the computer time is spent in computation of spatial derivatives at each stage of the Runge-Kutta temporal update. The most efficient direct method to compute spatial derivatives on a serial computer is a version of Gaussian elimination for narrow linear banded systems known as the Thomas algorithm. In a straightforward pipelined implementation of the Thomas algorithm processors are idle due to the forward and backward recurrences of the Thomas algorithm. To utilize processors during this time, we propose to use them for either non-local data independent computations, solving lines in the next spatial direction, or local data-dependent computations by the Runge-Kutta method. To achieve this goal, control of processor communication and computations by a static schedule is adopted. Thus, our parallel code is driven by a communication and computation schedule instead of the usual "creative, programming" approach. The obtained parallelization speed-up of the novel algorithm is about twice as much as that for the standard pipelined algorithm and close to that for the explicit DRP algorithm.
Eigensolution of finite element problems in a completely connected parallel architecture
NASA Technical Reports Server (NTRS)
Akl, F.; Morel, M.
1989-01-01
A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis. The algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm is successfully implemented on a tightly coupled MIMD parallel processor. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts, and the dimension of the subspace on the performance of the algorithm is investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18, and 3.61 are achieved on two, four, six, and eight processors, respectively.
Final Report for Project DE-FC02-06ER25755 [Pmodels2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Panda, Dhabaleswar; Sadayappan, P.
2014-03-12
In this report, we describe the research accomplished by the OSU team under the Pmodels2 project. The team has worked on various angles: designing high performance MPI implementations on modern networking technologies (Mellanox InfiniBand (including the new ConnectX2 architecture and Quad Data Rate), QLogic InfiniPath, the emerging 10GigE/iWARP and RDMA over Converged Enhanced Ethernet (RoCE) and Obsidian IB-WAN), studying MPI scalability issues for multi-thousand node clusters using XRC transport, scalable job start-up, dynamic process management support, efficient one-sided communication, protocol offloading and designing scalable collective communication libraries for emerging multi-core architectures. New designs conforming to the Argonne’s Nemesis interface havemore » also been carried out. All of these above solutions have been integrated into the open-source MVAPICH/MVAPICH2 software. This software is currently being used by more than 2,100 organizations worldwide (in 71 countries). As of January ’14, more than 200,000 downloads have taken place from the OSU Web site. In addition, many InfiniBand vendors, server vendors, system integrators and Linux distributors have been incorporating MVAPICH/MVAPICH2 into their software stacks and distributing it. Several InfiniBand systems using MVAPICH/MVAPICH2 have obtained positions in the TOP500 ranking of supercomputers in the world. The latest November ’13 ranking include the following systems: 7th ranked Stampede system at TACC with 462,462 cores; 11th ranked Tsubame 2.5 system at Tokyo Institute of Technology with 74,358 cores; 16th ranked Pleiades system at NASA with 81,920 cores; Work on PGAS models has proceeded on multiple directions. The Scioto framework, which supports task-parallelism in one-sided and global-view parallel programming, has been extended to allow multi-processor tasks that are executed by processor groups. A quantum Monte Carlo application is being ported onto the extended Scioto framework. A public release of Global Trees (GT) has been made, along with the Global Chunks (GC) framework on which GT is built. The Global Chunks (GC) layer is also being used as the basis for the development of a higher level Global Graphs (GG) layer. The Global Graphs (GG) system will provide a global address space view of distributed graph data structures on distributed memory systems.« less
Performance Analysis of a Hybrid Overset Multi-Block Application on Multiple Architectures
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Biswas, Rupak
2003-01-01
This paper presents a detailed performance analysis of a multi-block overset grid compu- tational fluid dynamics app!ication on multiple state-of-the-art computer architectures. The application is implemented using a hybrid MPI+OpenMP programming paradigm that exploits both coarse and fine-grain parallelism; the former via MPI message passing and the latter via OpenMP directives. The hybrid model also extends the applicability of multi-block programs to large clusters of SNIP nodes by overcoming the restriction that the number of processors be less than the number of grid blocks. A key kernel of the application, namely the LU-SGS linear solver, had to be modified to enhance the performance of the hybrid approach on the target machines. Investigations were conducted on cacheless Cray SX6 vector processors, cache-based IBM Power3 and Power4 architectures, and single system image SGI Origin3000 platforms. Overall results for complex vortex dynamics simulations demonstrate that the SX6 achieves the highest performance and outperforms the RISC-based architectures; however, the best scaling performance was achieved on the Power3.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Procassini, R.J.
1997-12-31
The fine-scale, multi-space resolution that is envisioned for accurate simulations of complex weapons systems in three spatial dimensions implies flop-rate and memory-storage requirements that will only be obtained in the near future through the use of parallel computational techniques. Since the Monte Carlo transport models in these simulations usually stress both of these computational resources, they are prime candidates for parallelization. The MONACO Monte Carlo transport package, which is currently under development at LLNL, will utilize two types of parallelism within the context of a multi-physics design code: decomposition of the spatial domain across processors (spatial parallelism) and distribution ofmore » particles in a given spatial subdomain across additional processors (particle parallelism). This implementation of the package will utilize explicit data communication between domains (message passing). Such a parallel implementation of a Monte Carlo transport model will result in non-deterministic communication patterns. The communication of particles between subdomains during a Monte Carlo time step may require a significant level of effort to achieve a high parallel efficiency.« less
Liu, Y; Wickens, C D
1994-11-01
The evaluation of mental workload is becoming increasingly important in system design and analysis. The present study examined the structure and assessment of mental workload in performing decision and monitoring tasks by focusing on two mental workload measurements: subjective assessment and time estimation. The task required the assignment of a series of incoming customers to the shortest of three parallel service lines displayed on a computer monitor. The subject was either in charge of the customer assignment (manual mode) or was monitoring an automated system performing the same task (automatic mode). In both cases, the subjects were required to detect the non-optimal assignments that they or the computer had made. Time pressure was manipulated by the experimenter to create fast and slow conditions. The results revealed a multi-dimensional structure of mental workload and a multi-step process of subjective workload assessment. The results also indicated that subjective workload was more influenced by the subject's participatory mode than by the factor of task speed. The time estimation intervals produced while performing the decision and monitoring tasks had significantly greater length and larger variability than those produced while either performing no other tasks or performing a well practised customer assignment task. This result seemed to indicate that time estimation was sensitive to the presence of perceptual/cognitive demands, but not to response related activities to which behavioural automaticity has developed.
The Use Of Videography For Three-Dimensional Motion Analysis
NASA Astrophysics Data System (ADS)
Hawkins, D. A.; Hawthorne, D. L.; DeLozier, G. S.; Campbell, K. R.; Grabiner, M. D.
1988-02-01
Special video path editing capabilities with custom hardware and software, have been developed for use in conjunction with existing video acquisition hardware and firmware. This system has simplified the task of quantifying the kinematics of human movement. A set of retro-reflective markers are secured to a subject performing a given task (i.e. walking, throwing, swinging a golf club, etc.). Multiple cameras, a video processor, and a computer work station collect video data while the task is performed. Software has been developed to edit video files, create centroid data, and identify marker paths. Multi-camera path files are combined to form a 3D path file using the DLT method of cinematography. A separate program converts the 3D path file into kinematic data by creating a set of local coordinate axes and performing a series of coordinate transformations from one local system to the next. The kinematic data is then displayed for appropriate review and/or comparison.
De Ceulaer, Geert; Bestel, Julie; Mülder, Hans E; Goldbeck, Felix; de Varebeke, Sebastien Pierre Janssens; Govaerts, Paul J
2016-05-01
Roger is a digital adaptive multi-channel remote microphone technology that wirelessly transmits a speaker's voice directly to a hearing instrument or cochlear implant sound processor. Frequency hopping between channels, in combination with repeated broadcast, avoids interference issues that have limited earlier generation FM systems. This study evaluated the benefit of the Roger Pen transmitter microphone in a multiple talker network (MTN) for cochlear implant users in a simulated noisy conversation setting. Twelve post-lingually deafened adult Advanced Bionics CII/HiRes 90K recipients were recruited. Subjects used a Naida CI Q70 processor with integrated Roger 17 receiver. The test environment simulated four people having a meal in a noisy restaurant, one the CI user (listener), and three companions (talkers) talking non-simultaneously in a diffuse field of multi-talker babble. Speech reception thresholds (SRTs) were determined without the Roger Pen, with one Roger Pen, and with three Roger Pens in an MTN. Using three Roger Pens in an MTN improved the SRT by 14.8 dB over using no Roger Pen, and by 13.1 dB over using a single Roger Pen (p < 0.0001). The Roger Pen in an MTN provided statistically and clinically significant improvement in speech perception in noise for Advanced Bionics cochlear implant recipients. The integrated Roger 17 receiver made it easy for users of the Naida CI Q70 processor to take advantage of the Roger system. The listening advantage and ease of use should encourage more clinicians to recommend and fit Roger in adult cochlear implant patients.
High frequency signal acquisition and control system based on DSP+FPGA
NASA Astrophysics Data System (ADS)
Liu, Xiao-qi; Zhang, Da-zhi; Yin, Ya-dong
2017-10-01
This paper introduces a design and implementation of high frequency signal acquisition and control system based on DSP + FPGA. The system supports internal/external clock and internal/external trigger sampling. It has a maximum sampling rate of 400MBPS and has a 1.4GHz input bandwidth for the ADC. Data can be collected continuously or periodically in systems and they are stored in DDR2. At the same time, the system also supports real-time acquisition, the collected data after digital frequency conversion and Cascaded Integrator-Comb (CIC) filtering, which then be sent to the CPCI bus through the high-speed DSP, can be assigned to the fiber board for subsequent processing. The system integrates signal acquisition and pre-processing functions, which uses high-speed A/D, high-speed DSP and FPGA mixed technology and has a wide range of uses in data acquisition and recording. In the signal processing, the system can be seamlessly connected to the dedicated processor board. The system has the advantages of multi-selectivity, good scalability and so on, which satisfies the different requirements of different signals in different projects.
Low Power, Low Mass, Modular, Multi-band Software-defined Radios
NASA Technical Reports Server (NTRS)
Haskins, Christopher B. (Inventor); Millard, Wesley P. (Inventor)
2013-01-01
Methods and systems to implement and operate software-defined radios (SDRs). An SDR may be configured to perform a combination of fractional and integer frequency synthesis and direct digital synthesis under control of a digital signal processor, which may provide a set of relatively agile, flexible, low-noise, and low spurious, timing and frequency conversion signals, and which may be used to maintain a transmit path coherent with a receive path. Frequency synthesis may include dithering to provide additional precision. The SDR may include task-specific software-configurable systems to perform tasks in accordance with software-defined parameters or personalities. The SDR may include a hardware interface system to control hardware components, and a host interface system to provide an interface to the SDR with respect to a host system. The SDR may be configured for one or more of communications, navigation, radio science, and sensors.
Designing minimal space telerobotics systems for maximum performance
NASA Technical Reports Server (NTRS)
Backes, Paul G.; Long, Mark K.; Steele, Robert D.
1992-01-01
The design of the remote site of a local-remote telerobot control system is described which addresses the constraints of limited computational power available at the remote site control system while providing a large range of control capabilities. The Modular Telerobot Task Execution System (MOTES) provides supervised autonomous control, shared control and teleoperation for a redundant manipulator. The system is capable of nominal task execution as well as monitoring and reflex motion. The MOTES system is minimized while providing a large capability by limiting its functionality to only that which is necessary at the remote site and by utilizing a unified multi-sensor based impedance control scheme. A command interpreter similar to one used on robotic spacecraft is used to interpret commands received from the local site. The system is written in Ada and runs in a VME environment on 68020 processors and initially controls a Robotics Research K1207 7 degree of freedom manipulator.
Multi-task feature learning by using trace norm regularization
NASA Astrophysics Data System (ADS)
Jiangmei, Zhang; Binfeng, Yu; Haibo, Ji; Wang, Kunpeng
2017-11-01
Multi-task learning can extract the correlation of multiple related machine learning problems to improve performance. This paper considers applying the multi-task learning method to learn a single task. We propose a new learning approach, which employs the mixture of expert model to divide a learning task into several related sub-tasks, and then uses the trace norm regularization to extract common feature representation of these sub-tasks. A nonlinear extension of this approach by using kernel is also provided. Experiments conducted on both simulated and real data sets demonstrate the advantage of the proposed approach.
Multi-element germanium detectors for synchrotron applications
NASA Astrophysics Data System (ADS)
Rumaiz, A. K.; Kuczewski, A. J.; Mead, J.; Vernon, E.; Pinelli, D.; Dooryhee, E.; Ghose, S.; Caswell, T.; Siddons, D. P.; Miceli, A.; Baldwin, J.; Almer, J.; Okasinski, J.; Quaranta, O.; Woods, R.; Krings, T.; Stock, S.
2018-04-01
We have developed a series of monolithic multi-element germanium detectors, based on sensor arrays produced by the Forschungzentrum Julich, and on Application-specific integrated circuits (ASICs) developed at Brookhaven. Devices have been made with element counts ranging from 64 to 384. These detectors are being used at NSLS-II and APS for a range of diffraction experiments, both monochromatic and energy-dispersive. Compact and powerful readout systems have been developed, based on the new generation of FPGA system-on-chip devices, which provide closely coupled multi-core processors embedded in large gate arrays. We will discuss the technical details of the systems, and present some of the results from them.
An enhanced high-speed multi-digit BCD adder using quantum-dot cellular automata
NASA Astrophysics Data System (ADS)
Ajitha, D.; Ramanaiah, K. V.; Sumalatha, V.
2017-02-01
The advent of development of high-performance, low-power digital circuits is achieved by a suitable emerging nanodevice called quantum-dot cellular automata (QCA). Even though many efficient arithmetic circuits were designed using QCA, there is still a challenge to implement high-speed circuits in an optimized manner. Among these circuits, one of the essential structures is a parallel multi-digit decimal adder unit with significant speed which is very attractive for future environments. To achieve high speed, a new correction logic formulation method is proposed for single and multi-digit BCD adder. The proposed enhanced single-digit BCD adder (ESDBA) is 26% faster than the carry flow adder (CFA)-based BCD adder. The multi-digit operations are also performed using the proposed ESDBA, which is cascaded innovatively. The enhanced multi-digit BCD adder (EMDBA) performs two 4-digit and two 8-digit BCD addition 50% faster than the CFA-based BCD adder with the nominal overhead of the area. The EMDBA performs two 4-digit BCD addition 24% faster with 23% decrease in the area, similarly for 8-digit operation the EMDBA achieves 36% increase in speed with 21% less area compared to the existing carry look ahead (CLA)-based BCD adder design. The proposed multi-digit adder produces significantly less delay of (N –1) + 3.5 clock cycles compared to the N* One digit BCD adder delay required by the conventional BCD adder method. It is observed that as per our knowledge this is the first innovative proposal for multi-digit BCD addition using QCA.
NASA Astrophysics Data System (ADS)
Blume, H.; Alexandru, R.; Applegate, R.; Giordano, T.; Kamiya, K.; Kresina, R.
1986-06-01
In a digital diagnostic imaging department, the majority of operations for handling and processing of images can be grouped into a small set of basic operations, such as image data buffering and storage, image processing and analysis, image display, image data transmission and image data compression. These operations occur in almost all nodes of the diagnostic imaging communications network of the department. An image processor architecture was developed in which each of these functions has been mapped into hardware and software modules. The modular approach has advantages in terms of economics, service, expandability and upgradeability. The architectural design is based on the principles of hierarchical functionality, distributed and parallel processing and aims at real time response. Parallel processing and real time response is facilitated in part by a dual bus system: a VME control bus and a high speed image data bus, consisting of 8 independent parallel 16-bit busses, capable of handling combined up to 144 MBytes/sec. The presented image processor is versatile enough to meet the video rate processing needs of digital subtraction angiography, the large pixel matrix processing requirements of static projection radiography, or the broad range of manipulation and display needs of a multi-modality diagnostic work station. Several hardware modules are described in detail. For illustrating the capabilities of the image processor, processed 2000 x 2000 pixel computed radiographs are shown and estimated computation times for executing the processing opera-tions are presented.
Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava
2017-01-01
For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particlemore » tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offine. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progresses toward the understanding of these processors and the new developments to port the Kalman filter to NVIDIA GPUs.« less
Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs
NASA Astrophysics Data System (ADS)
Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; Masciovecchio, Mario; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi
2017-08-01
For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offine. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progresses toward the understanding of these processors and the new developments to port the Kalman filter to NVIDIA GPUs.
Image Matrix Processor for Volumetric Computations Final Report CRADA No. TSB-1148-95
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roberson, G. Patrick; Browne, Jolyon
The development of an Image Matrix Processor (IMP) was proposed that would provide an economical means to perform rapid ray-tracing processes on volume "Giga Voxel" data sets. This was a multi-phased project. The objective of the first phase of the IMP project was to evaluate the practicality of implementing a workstation-based Image Matrix Processor for use in volumetric reconstruction and rendering using hardware simulation techniques. Additionally, ARACOR and LLNL worked together to identify and pursue further funding sources to complete a second phase of this project.
Gulde, Philipp; Hermsdörfer, Joachim
2017-05-01
The kinematic performance of basic motor tasks shows a clear decrease with advancing age. This study examined if the rules known from such tasks can be generalized to activities of daily living. We examined the end-effector kinematics of 13 young and 13 elderly participants in the multi-step activity of daily living of tea-making. Furthermore, we analyzed bimanual behavior and hand dominance in the task using different conditions of execution. The elderly sample took substantially longer to complete the activity (almost 50%) with longer trajectories compared with the young sample. Models of multiple linear regression revealed that the longer trajectories prolonged the trial duration in both groups, and while movement speed influenced the trial duration of young participants, phases of inactivity negatively affected how long the activity took the elderly subjects. No differences were found regarding bimanual performance or hand dominance. We assume that in self-paced activities of daily living, the age-dependent differences in the kinematics are more likely to be based on the higher cognitive demands of the task rather than on pure motor capability. Furthermore, it seems that not all of the rules known from basic motor tasks can be generalized to activities of daily living.
Real-time portable system for fabric defect detection using an ARM processor
NASA Astrophysics Data System (ADS)
Fernandez-Gallego, J. A.; Yañez-Puentes, J. P.; Ortiz-Jaramillo, B.; Alvarez, J.; Orjuela-Vargas, S. A.; Philips, W.
2012-06-01
Modern textile industry seeks to produce textiles as little defective as possible since the presence of defects can decrease the final price of products from 45% to 65%. Automated visual inspection (AVI) systems, based on image analysis, have become an important alternative for replacing traditional inspections methods that involve human tasks. An AVI system gives the advantage of repeatability when implemented within defined constrains, offering more objective and reliable results for particular tasks than human inspection. Costs of automated inspection systems development can be reduced using modular solutions with embedded systems, in which an important advantage is the low energy consumption. Among the possibilities for developing embedded systems, the ARM processor has been explored for acquisition, monitoring and simple signal processing tasks. In a recent approach we have explored the use of the ARM processor for defects detection by implementing the wavelet transform. However, the computation speed of the preprocessing was not yet sufficient for real time applications. In this approach we significantly improve the preprocessing speed of the algorithm, by optimizing matrix operations, such that it is adequate for a real time application. The system was tested for defect detection using different defect types. The paper is focused in giving a detailed description of the basis of the algorithm implementation, such that other algorithms may use of the ARM operations for fast implementations.
PROGRAPE-1: A Programmable, Multi-Purpose Computer for Many-Body Simulations
NASA Astrophysics Data System (ADS)
Hamada, Tsuyoshi; Fukushige, Toshiyuki; Kawai, Atsushi; Makino, Junichiro
2000-10-01
We have developed PROGRAPE-1 (PROgrammable GRAPE-1), a programmable multi-purpose computer for many-body simulations. The main difference between PROGRAPE-1 and ``traditional'' GRAPE systems is that the former uses FPGA (Field Programmable Gate Array) chips as the processing elements, while the latter relies on a hardwired pipeline processor specialized to gravitational interactions. Since the logic implemented in FPGA chips can be reconfigured, we can use PROGRAPE-1 to calculate not only gravitational interactions, but also other forms of interactions, such as the van der Waals force, hydro\\-dynamical interactions in the SPHr calculation, and so on. PROGRAPE-1 comprises two Altera EPF10K100 FPGA chips, each of which contains nominally 100000 gates. To evaluate the programmability and performance of PROGRAPE-1, we implemented a pipeline for gravitational interactions similar to that of GRAPE-3. One pipeline is fitted into a single FPGA chip, operated at 16 MHz clock. Thus, for gravitational interactions, PROGRAPE-1 provided a speed of 0.96 Gflops-equivalent. PROGRAPE will prove to be useful for a wide-range of particle-based simulations in which the calculation cost of interactions other than gravity is high, such as the evaluation of SPH interactions.
Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.
2014-01-01
Abstract. Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing. PMID:24695868
Performances of multiprocessor multidisk architectures for continuous media storage
NASA Astrophysics Data System (ADS)
Gennart, Benoit A.; Messerli, Vincent; Hersch, Roger D.
1996-03-01
Multimedia interfaces increase the need for large image databases, capable of storing and reading streams of data with strict synchronicity and isochronicity requirements. In order to fulfill these requirements, we consider a parallel image server architecture which relies on arrays of intelligent disk nodes, each disk node being composed of one processor and one or more disks. This contribution analyzes through bottleneck performance evaluation and simulation the behavior of two multi-processor multi-disk architectures: a point-to-point architecture and a shared-bus architecture similar to current multiprocessor workstation architectures. We compare the two architectures on the basis of two multimedia algorithms: the compute-bound frame resizing by resampling and the data-bound disk-to-client stream transfer. The results suggest that the shared bus is a potential bottleneck despite its very high hardware throughput (400Mbytes/s) and that an architecture with addressable local memories located closely to their respective processors could partially remove this bottleneck. The point- to-point architecture is scalable and able to sustain high throughputs for simultaneous compute- bound and data-bound operations.
A pluggable framework for parallel pairwise sequence search.
Archuleta, Jeremy; Feng, Wu-chun; Tilevich, Eli
2007-01-01
The current and near future of the computing industry is one of multi-core and multi-processor technology. Most existing sequence-search tools have been designed with a focus on single-core, single-processor systems. This discrepancy between software design and hardware architecture substantially hinders sequence-search performance by not allowing full utilization of the hardware. This paper presents a novel framework that will aid the conversion of serial sequence-search tools into a parallel version that can take full advantage of the available hardware. The framework, which is based on a software architecture called mixin layers with refined roles, enables modules to be plugged into the framework with minimal effort. The inherent modular design improves maintenance and extensibility, thus opening up a plethora of opportunities for advanced algorithmic features to be developed and incorporated while routine maintenance of the codebase persists.
Performance prediction: A case study using a multi-ring KSR-1 machine
NASA Technical Reports Server (NTRS)
Sun, Xian-He; Zhu, Jianping
1995-01-01
While computers with tens of thousands of processors have successfully delivered high performance power for solving some of the so-called 'grand-challenge' applications, the notion of scalability is becoming an important metric in the evaluation of parallel machine architectures and algorithms. In this study, the prediction of scalability and its application are carefully investigated. A simple formula is presented to show the relation between scalability, single processor computing power, and degradation of parallelism. A case study is conducted on a multi-ring KSR1 shared virtual memory machine. Experimental and theoretical results show that the influence of topology variation of an architecture is predictable. Therefore, the performance of an algorithm on a sophisticated, heirarchical architecture can be predicted and the best algorithm-machine combination can be selected for a given application.
Crack Damage Detection Method via Multiple Visual Features and Efficient Multi-Task Learning Model.
Wang, Baoxian; Zhao, Weigang; Gao, Po; Zhang, Yufeng; Wang, Zhe
2018-06-02
This paper proposes an effective and efficient model for concrete crack detection. The presented work consists of two modules: multi-view image feature extraction and multi-task crack region detection. Specifically, multiple visual features (such as texture, edge, etc.) of image regions are calculated, which can suppress various background noises (such as illumination, pockmark, stripe, blurring, etc.). With the computed multiple visual features, a novel crack region detector is advocated using a multi-task learning framework, which involves restraining the variability for different crack region features and emphasizing the separability between crack region features and complex background ones. Furthermore, the extreme learning machine is utilized to construct this multi-task learning model, thereby leading to high computing efficiency and good generalization. Experimental results of the practical concrete images demonstrate that the developed algorithm can achieve favorable crack detection performance compared with traditional crack detectors.
Numerical simulation of active track tensioning system for autonomous hybrid vehicle
NASA Astrophysics Data System (ADS)
Mȩżyk, Arkadiusz; Czapla, Tomasz; Klein, Wojciech; Mura, Gabriel
2017-05-01
One of the most important components of a high speed tracked vehicle is an efficient suspension system. The vehicle should be able to operate both in rough terrain for performance of engineering tasks as well as on the road with high speed. This is especially important for an autonomous platform that operates either with or without human supervision, so that the vibration level can rise compared to a manned vehicle. In this case critical electronic and electric parts must be protected to ensure the reliability of the vehicle. The paper presents a dynamic parameters determination methodology of suspension system for an autonomous high speed tracked platform with total weight of about 5 tonnes and hybrid propulsion system. Common among tracked vehicles suspension solutions and cost-efficient, the torsion-bar system was chosen. One of the most important issues was determining optimal track tensioning - in this case an active hydraulic system was applied. The selection of system parameters was performed with using numerical model based on multi-body dynamic approach. The results of numerical analysis were used to define parameters of active tensioning control system setup. LMS Virtual.Lab Motion was used for multi-body dynamics numerical calculation and Matlab/SIMULINK for control system simulation.
NASA Technical Reports Server (NTRS)
Schunk, Richard Gregory; Chung, T. J.
2001-01-01
A parallelized version of the Flowfield Dependent Variation (FDV) Method is developed to analyze a problem of current research interest, the flowfield resulting from a triple shock/boundary layer interaction. Such flowfields are often encountered in the inlets of high speed air-breathing vehicles including the NASA Hyper-X research vehicle. In order to resolve the complex shock structure and to provide adequate resolution for boundary layer computations of the convective heat transfer from surfaces inside the inlet, models containing over 500,000 nodes are needed. Efficient parallelization of the computation is essential to achieving results in a timely manner. Results from a parallelization scheme, based upon multi-threading, as implemented on multiple processor supercomputers and workstations is presented.
An Internal Data Non-hiding Type Real-time Kernel and its Application to the Mechatronics Controller
NASA Astrophysics Data System (ADS)
Yoshida, Toshio
For the mechatronics equipment controller that controls robots and machine tools, high-speed motion control processing is essential. The software system of the controller like other embedded systems is composed of three layers software such as real-time kernel layer, middleware layer, and application software layer on the dedicated hardware. The application layer in the top layer is composed of many numbers of tasks, and application function of the system is realized by the cooperation between these tasks. In this paper we propose an internal data non-hiding type real-time kernel in which customizing the task control is possible only by change in the program code of the task side without any changes in the program code of real-time kernel. It is necessary to reduce the overhead caused by the real-time kernel task control for the speed-up of the motion control of the mechatronics equipment. For this, customizing the task control function is needed. We developed internal data non-cryptic type real-time kernel ZRK to evaluate this method, and applied to the control of the multi system automatic lathe. The effect of the speed-up of the task cooperation processing was able to be confirmed by combined task control processing on the task side program code using an internal data non-hiding type real-time kernel ZRK.
NASA Technical Reports Server (NTRS)
Chow, Edward T.; Schatzel, Donald V.; Whitaker, William D.; Sterling, Thomas
2008-01-01
A Spaceborne Processor Array in Multifunctional Structure (SPAMS) can lower the total mass of the electronic and structural overhead of spacecraft, resulting in reduced launch costs, while increasing the science return through dynamic onboard computing. SPAMS integrates the multifunctional structure (MFS) and the Gilgamesh Memory, Intelligence, and Network Device (MIND) multi-core in-memory computer architecture into a single-system super-architecture. This transforms every inch of a spacecraft into a sharable, interconnected, smart computing element to increase computing performance while simultaneously reducing mass. The MIND in-memory architecture provides a foundation for high-performance, low-power, and fault-tolerant computing. The MIND chip has an internal structure that includes memory, processing, and communication functionality. The Gilgamesh is a scalable system comprising multiple MIND chips interconnected to operate as a single, tightly coupled, parallel computer. The array of MIND components shares a global, virtual name space for program variables and tasks that are allocated at run time to the distributed physical memory and processing resources. Individual processor- memory nodes can be activated or powered down at run time to provide active power management and to configure around faults. A SPAMS system is comprised of a distributed Gilgamesh array built into MFS, interfaces into instrument and communication subsystems, a mass storage interface, and a radiation-hardened flight computer.
NASA Astrophysics Data System (ADS)
Wright, Adam A.; Momin, Orko; Shin, Young Ho; Shakya, Rahul; Nepal, Kumud; Ahlgren, David J.
2010-01-01
This paper presents the application of a distributed systems architecture to an autonomous ground vehicle, Q, that participates in both the autonomous and navigation challenges of the Intelligent Ground Vehicle Competition. In the autonomous challenge the vehicle is required to follow a course, while avoiding obstacles and staying within the course boundaries, which are marked by white lines. For the navigation challenge, the vehicle is required to reach a set of target destinations, known as way points, with given GPS coordinates and avoid obstacles that it encounters in the process. Previously the vehicle utilized a single laptop to execute all processing activities including image processing, sensor interfacing and data processing, path planning and navigation algorithms and motor control. National Instruments' (NI) LabVIEW served as the programming language for software implementation. As an upgrade to last year's design, a NI compact Reconfigurable Input/Output system (cRIO) was incorporated to the system architecture. The cRIO is NI's solution for rapid prototyping that is equipped with a real time processor, an FPGA and modular input/output. Under the current system, the real time processor handles the path planning and navigation algorithms, the FPGA gathers and processes sensor data. This setup leaves the laptop to focus on running the image processing algorithm. Image processing as previously presented by Nepal et. al. is a multi-step line extraction algorithm and constitutes the largest processor load. This distributed approach results in a faster image processing algorithm which was previously Q's bottleneck. Additionally, the path planning and navigation algorithms are executed more reliably on the real time processor due to the deterministic nature of operation. The implementation of this architecture required exploration of various inter-system communication techniques. Data transfer between the laptop and the real time processor using UDP packets was established as the most reliable protocol after testing various options. Improvement can be made to the system by migrating more algorithms to the hardware based FPGA to further speed up the operations of the vehicle.
Kapellusch, Jay M; Silverstein, Barbara A; Bao, Stephen S; Thiese, Mathew S; Merryweather, Andrew S; Hegmann, Kurt T; Garg, Arun
2018-02-01
The Strain Index (SI) and the American Conference of Governmental Industrial Hygienists (ACGIH) threshold limit value for hand activity level (TLV for HAL) have been shown to be associated with prevalence of distal upper-limb musculoskeletal disorders such as carpal tunnel syndrome (CTS). The SI and TLV for HAL disagree on more than half of task exposure classifications. Similarly, time-weighted average (TWA), peak, and typical exposure techniques used to quantity physical exposure from multi-task jobs have shown between-technique agreement ranging from 61% to 93%, depending upon whether the SI or TLV for HAL model was used. This study compared exposure-response relationships between each model-technique combination and prevalence of CTS. Physical exposure data from 1,834 workers (710 with multi-task jobs) were analyzed using the SI and TLV for HAL and the TWA, typical, and peak multi-task job exposure techniques. Additionally, exposure classifications from the SI and TLV for HAL were combined into a single measure and evaluated. Prevalent CTS cases were identified using symptoms and nerve-conduction studies. Mixed effects logistic regression was used to quantify exposure-response relationships between categorized (i.e., low, medium, and high) physical exposure and CTS prevalence for all model-technique combinations, and for multi-task workers, mono-task workers, and all workers combined. Except for TWA TLV for HAL, all model-technique combinations showed monotonic increases in risk of CTS with increased physical exposure. The combined-models approach showed stronger association than the SI or TLV for HAL for multi-task workers. Despite differences in exposure classifications, nearly all model-technique combinations showed exposure-response relationships with prevalence of CTS for the combined sample of mono-task and multi-task workers. Both the TLV for HAL and the SI, with the TWA or typical techniques, appear useful for epidemiological studies and surveillance. However, the utility of TWA, typical, and peak techniques for job design and intervention is dubious.
The Multi-Feature Hypothesis: Connectionist Guidelines for L2 Task Design
ERIC Educational Resources Information Center
Moonen, Machteld; de Graaff, Rick; Westhoff, Gerard; Brekelmans, Mieke
2014-01-01
This study focuses on the effects of task type on the retention and ease of activation of second language (L2) vocabulary, based on the multi-feature hypothesis (Moonen, De Graaff, & Westhoff, 2006). Two tasks were compared: a writing task and a list-learning task. It was hypothesized that performing the writing task would yield higher…
Conditional load and store in a shared memory
Blumrich, Matthias A; Ohmacht, Martin
2015-02-03
A method, system and computer program product for implementing load-reserve and store-conditional instructions in a multi-processor computing system. The computing system includes a multitude of processor units and a shared memory cache, and each of the processor units has access to the memory cache. In one embodiment, the method comprises providing the memory cache with a series of reservation registers, and storing in these registers addresses reserved in the memory cache for the processor units as a result of issuing load-reserve requests. In this embodiment, when one of the processor units makes a request to store data in the memory cache using a store-conditional request, the reservation registers are checked to determine if an address in the memory cache is reserved for that processor unit. If an address in the memory cache is reserved for that processor, the data are stored at this address.
Research status of multi - robot systems task allocation and uncertainty treatment
NASA Astrophysics Data System (ADS)
Li, Dahui; Fan, Qi; Dai, Xuefeng
2017-08-01
The multi-robot coordination algorithm has become a hot research topic in the field of robotics in recent years. It has a wide range of applications and good application prospects. This paper analyzes and summarizes the current research status of multi-robot coordination algorithms at home and abroad. From task allocation and dealing with uncertainty, this paper discusses the multi-robot coordination algorithm and presents the advantages and disadvantages of each method commonly used.
Multi-channel time-reversal receivers for multi and 1-bit implementations
Candy, James V.; Chambers, David H.; Guidry, Brian L.; Poggio, Andrew J.; Robbins, Christopher L.
2008-12-09
A communication system for transmitting a signal through a channel medium comprising digitizing the signal, time-reversing the digitized signal, and transmitting the signal through the channel medium. In one embodiment a transmitter is adapted to transmit the signal, a multiplicity of receivers are adapted to receive the signal, a digitizer digitizes the signal, and a time-reversal signal processor is adapted to time-reverse the digitized signal. An embodiment of the present invention includes multi bit implementations. Another embodiment of the present invention includes 1-bit implementations. Another embodiment of the present invention includes a multiplicity of receivers used in the step of transmitting the signal through the channel medium.
IMPETUS - Interactive MultiPhysics Environment for Unified Simulations.
Ha, Vi Q; Lykotrafitis, George
2016-12-08
We introduce IMPETUS - Interactive MultiPhysics Environment for Unified Simulations, an object oriented, easy-to-use, high performance, C++ program for three-dimensional simulations of complex physical systems that can benefit a large variety of research areas, especially in cell mechanics. The program implements cross-communication between locally interacting particles and continuum models residing in the same physical space while a network facilitates long-range particle interactions. Message Passing Interface is used for inter-processor communication for all simulations. Copyright © 2016 Elsevier Ltd. All rights reserved.
Acoustical Techniques/Designs Investigated During the Southeast Asia Conflict 1966-1972
1981-01-21
the inbuoy processor above, the three sensor channels, two directional and one omni, are multi - plexed together and via the RF link transmitted back to...width at the 10 db dco;.n points and mnximtum side lobe level of -15 db) requires at least six’ elements in the line. At a geometric wean frequency of...presentation provides the data to the operator in the most accessible format. The vehicle descriptors are easily extracted with the aid of multi -point
NASA Technical Reports Server (NTRS)
Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)
1983-01-01
A high speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega gender. Programs and data are fed from the data base memory to the plurality of memory modules and from hence the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occur with the processors operating normally quite independently of each other in a multiprocessing fashion. For data dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode however, the processors are not locked-step but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.
Center for Technology for Advanced Scientific Componet Software (TASCS)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Govindaraju, Madhusudhan
Advanced Scientific Computing Research Computer Science FY 2010Report Center for Technology for Advanced Scientific Component Software: Distributed CCA State University of New York, Binghamton, NY, 13902 Summary The overall objective of Binghamton's involvement is to work on enhancements of the CCA environment, motivated by the applications and research initiatives discussed in the proposal. This year we are working on re-focusing our design and development efforts to develop proof-of-concept implementations that have the potential to significantly impact scientific components. We worked on developing parallel implementations for non-hydrostatic code and worked on a model coupling interface for biogeochemical computations coded in MATLAB.more » We also worked on the design and implementation modules that will be required for the emerging MapReduce model to be effective for scientific applications. Finally, we focused on optimizing the processing of scientific datasets on multi-core processors. Research Details We worked on the following research projects that we are working on applying to CCA-based scientific applications. 1. Non-Hydrostatic Hydrodynamics: Non-static hydrodynamics are significantly more accurate at modeling internal waves that may be important in lake ecosystems. Non-hydrostatic codes, however, are significantly more computationally expensive, often prohibitively so. We have worked with Chin Wu at the University of Wisconsin to parallelize non-hydrostatic code. We have obtained a speed up of about 26 times maximum. Although this is significant progress, we hope to improve the performance further, such that it becomes a practical alternative to hydrostatic codes. 2. Model-coupling for water-based ecosystems: To answer pressing questions about water resources requires that physical models (hydrodynamics) be coupled with biological and chemical models. Most hydrodynamics codes are written in Fortran, however, while most ecologists work in MATLAB. This disconnect creates a great barrier. To address this, we are working on a model coupling interface that will allow biogeochemical computations written in MATLAB to couple with Fortran codes. This will greatly improve the productivity of ecosystem scientists. 2. Low overhead and Elastic MapReduce Implementation Optimized for Memory and CPU-Intensive Applications: Since its inception, MapReduce has frequently been associated with Hadoop and large-scale datasets. Its deployment at Amazon in the cloud, and its applications at Yahoo! for large-scale distributed document indexing and database building, among other tasks, have thrust MapReduce to the forefront of the data processing application domain. The applicability of the paradigm however extends far beyond its use with data intensive applications and diskbased systems, and can also be brought to bear in processing small but CPU intensive distributed applications. MapReduce however carries its own burdens. Through experiments using Hadoop in the context of diverse applications, we uncovered latencies and delay conditions potentially inhibiting the expected performance of a parallel execution in CPU-intensive applications. Furthermore, as it currently stands, MapReduce is favored for data-centric applications, and as such tends to be solely applied to disk-based applications. The paradigm, falls short in bringing its novelty to diskless systems dedicated to in-memory applications, and compute intensive programs processing much smaller data, but requiring intensive computations. In this project, we focused both on the performance of processing large-scale hierarchical data in distributed scientific applications, as well as the processing of smaller but demanding input sizes primarily used in diskless, and memory resident I/O systems. We designed LEMO-MR [1], a Low overhead, elastic, configurable for in- memory applications, and on-demand fault tolerance, an optimized implementation of MapReduce, for both on disk and in memory applications. We conducted experiments to identify not only the necessary components of this model, but also trade offs and factors to be considered. We have initial results to show the efficacy of our implementation in terms of potential speedup that can be achieved for representative data sets used by cloud applications. We have quantified the performance gains exhibited by our MapReduce implementation over Apache Hadoop in a compute intensive environment. 3. Cache Performance Optimization for Processing XML and HDF-based Application Data on Multi-core Processors: It is important to design and develop scientific middleware libraries to harness the opportunities presented by emerging multi-core processors. Implementations of scientific middleware and applications that do not adapt to the programming paradigm when executing on emerging processors can severely impact the overall performance. In this project, we focused on the utilization of the L2 cache, which is a critical shared resource on chip multiprocessors (CMP). The access pattern of the shared L2 cache, which is dependent on how the application schedules and assigns processing work to each thread, can either enhance or hurt the ability to hide memory latency on a multi-core processor. Therefore, while processing scientific datasets such as HDF5, it is essential to conduct fine-grained analysis of cache utilization, to inform scheduling decisions in multi-threaded programming. In this project, using the TAU toolkit for performance feedback from dual- and quad-core machines, we conducted performance analysis and recommendations on how processing threads can be scheduled on multi-core nodes to enhance the performance of a class of scientific applications that requires processing of HDF5 data. In particular, we quantified the gains associated with the use of the adaptations we have made to the Cache-Affinity and Balanced-Set scheduling algorithms to improve L2 cache performance, and hence the overall application execution time [2]. References: 1. Zacharia Fadika, Madhusudhan Govindaraju, ``MapReduce Implementation for Memory-Based and Processing Intensive Applications'', accepted in 2nd IEEE International Conference on Cloud Computing Technology and Science, Indianapolis, USA, Nov 30 - Dec 3, 2010. 2. Rajdeep Bhowmik, Madhusudhan Govindaraju, ``Cache Performance Optimization for Processing XML-based Application Data on Multi-core Processors'', in proceedings of The 10th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, May 17-20, 2010, Melbourne, Victoria, Australia. Contact Information: Madhusudhan Govindaraju Binghamton University State University of New York (SUNY) mgovinda@cs.binghamton.edu Phone: 607-777-4904« less
Distributed digital signal processors for multi-body flexible structures
NASA Technical Reports Server (NTRS)
Lee, Gordon K. F.
1992-01-01
Multi-body flexible structures, such as those currently under investigation in spacecraft design, are large scale (high-order) dimensional systems. Controlling and filtering such structures is a computationally complex problem. This is particularly important when many sensors and actuators are located along the structure and need to be processed in real time. This report summarizes research activity focused on solving the signal processing (that is, information processing) issues of multi-body structures. A distributed architecture is developed in which single loop processors are employed for local filtering and control. By implementing such a philosophy with an embedded controller configuration, a supervising controller may be used to process global data and make global decisions as the local devices are processing local information. A hardware testbed, a position controller system for a servo motor, is employed to illustrate the capabilities of the embedded controller structure. Several filtering and control structures which can be modeled as rational functions can be implemented on the system developed in this research effort. Thus the results of the study provide a support tool for many Control/Structure Interaction (CSI) NASA testbeds such as the Evolutionary model and the nine-bay truss structure.
Algorithm Design of CPCI Backboard's Interrupts Management Based on VxWorks' Multi-Tasks
NASA Astrophysics Data System (ADS)
Cheng, Jingyuan; An, Qi; Yang, Junfeng
2006-09-01
This paper begins with a brief introduction of the embedded real-time operating system VxWorks and CompactPCI standard, then gives the programming interfaces of Peripheral Controller Interface (PCI) configuring, interrupts handling and multi-tasks programming interface under VxWorks, and then emphasis is placed on the software frameworks of CPCI interrupt management based on multi-tasks. This method is sound in design and easy to adapt, ensures that all possible interrupts are handled in time, which makes it suitable for data acquisition systems with multi-channels, a high data rate, and hard real-time high energy physics.
NASA Astrophysics Data System (ADS)
Vikram, K. Arun; Ratnam, Ch; Lakshmi, VVK; Kumar, A. Sunny; Ramakanth, RT
2018-02-01
Meta-heuristic multi-response optimization methods are widely in use to solve multi-objective problems to obtain Pareto optimal solutions during optimization. This work focuses on optimal multi-response evaluation of process parameters in generating responses like surface roughness (Ra), surface hardness (H) and tool vibration displacement amplitude (Vib) while performing operations like tangential and orthogonal turn-mill processes on A-axis Computer Numerical Control vertical milling center. Process parameters like tool speed, feed rate and depth of cut are considered as process parameters machined over brass material under dry condition with high speed steel end milling cutters using Taguchi design of experiments (DOE). Meta-heuristic like Dragonfly algorithm is used to optimize the multi-objectives like ‘Ra’, ‘H’ and ‘Vib’ to identify the optimal multi-response process parameters combination. Later, the results thus obtained from multi-objective dragonfly algorithm (MODA) are compared with another multi-response optimization technique Viz. Grey relational analysis (GRA).
High-performance computing for airborne applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Quinn, Heather M; Manuzzato, Andrea; Fairbanks, Tom
2010-06-28
Recently, there has been attempts to move common satellite tasks to unmanned aerial vehicles (UAVs). UAVs are significantly cheaper to buy than satellites and easier to deploy on an as-needed basis. The more benign radiation environment also allows for an aggressive adoption of state-of-the-art commercial computational devices, which increases the amount of data that can be collected. There are a number of commercial computing devices currently available that are well-suited to high-performance computing. These devices range from specialized computational devices, such as field-programmable gate arrays (FPGAs) and digital signal processors (DSPs), to traditional computing platforms, such as microprocessors. Even thoughmore » the radiation environment is relatively benign, these devices could be susceptible to single-event effects. In this paper, we will present radiation data for high-performance computing devices in a accelerated neutron environment. These devices include a multi-core digital signal processor, two field-programmable gate arrays, and a microprocessor. From these results, we found that all of these devices are suitable for many airplane environments without reliability problems.« less
Integration, Development and Performance of the 500 TFLOPS Heterogeneous Cluster (Condor)
2012-08-01
PlayStation 3 for High Performance Cluster Computing” LAPACK Working Note 185, 2007. [ 4 ] Feng, W., X. Feng, and R. Ge, “Green Supercomputing Comes of...CONFERENCE PAPER (Post Print) 3. DATES COVERED (From - To) JUN 2010 – MAY 2013 4 . TITLE AND SUBTITLE INTEGRATION, DEVELOPMENT AND PERFORMANCE OF...and streaming processing; the PlayStation 3 uses the IBM Cell BE processor, which adopts the multi-processor, single-instruction-multiple- data (SIMD
Analog hardware for learning neural networks
NASA Technical Reports Server (NTRS)
Eberhardt, Silvio P. (Inventor)
1991-01-01
This is a recurrent or feedforward analog neural network processor having a multi-level neuron array and a synaptic matrix for storing weighted analog values of synaptic connection strengths which is characterized by temporarily changing one connection strength at a time to determine its effect on system output relative to the desired target. That connection strength is then adjusted based on the effect, whereby the processor is taught the correct response to training examples connection by connection.
A Modified Distributed Bees Algorithm for Multi-Sensor Task Allocation.
Tkach, Itshak; Jevtić, Aleksandar; Nof, Shimon Y; Edan, Yael
2018-03-02
Multi-sensor systems can play an important role in monitoring tasks and detecting targets. However, real-time allocation of heterogeneous sensors to dynamic targets/tasks that are unknown a priori in their locations and priorities is a challenge. This paper presents a Modified Distributed Bees Algorithm (MDBA) that is developed to allocate stationary heterogeneous sensors to upcoming unknown tasks using a decentralized, swarm intelligence approach to minimize the task detection times. Sensors are allocated to tasks based on sensors' performance, tasks' priorities, and the distances of the sensors from the locations where the tasks are being executed. The algorithm was compared to a Distributed Bees Algorithm (DBA), a Bees System, and two common multi-sensor algorithms, market-based and greedy-based algorithms, which were fitted for the specific task. Simulation analyses revealed that MDBA achieved statistically significant improved performance by 7% with respect to DBA as the second-best algorithm, and by 19% with respect to Greedy algorithm, which was the worst, thus indicating its fitness to provide solutions for heterogeneous multi-sensor systems.
Defect-Repairable Latent Feature Extraction of Driving Behavior via a Deep Sparse Autoencoder
Taniguchi, Tadahiro; Takenaka, Kazuhito; Bando, Takashi
2018-01-01
Data representing driving behavior, as measured by various sensors installed in a vehicle, are collected as multi-dimensional sensor time-series data. These data often include redundant information, e.g., both the speed of wheels and the engine speed represent the velocity of the vehicle. Redundant information can be expected to complicate the data analysis, e.g., more factors need to be analyzed; even varying the levels of redundancy can influence the results of the analysis. We assume that the measured multi-dimensional sensor time-series data of driving behavior are generated from low-dimensional data shared by the many types of one-dimensional data of which multi-dimensional time-series data are composed. Meanwhile, sensor time-series data may be defective because of sensor failure. Therefore, another important function is to reduce the negative effect of defective data when extracting low-dimensional time-series data. This study proposes a defect-repairable feature extraction method based on a deep sparse autoencoder (DSAE) to extract low-dimensional time-series data. In the experiments, we show that DSAE provides high-performance latent feature extraction for driving behavior, even for defective sensor time-series data. In addition, we show that the negative effect of defects on the driving behavior segmentation task could be reduced using the latent features extracted by DSAE. PMID:29462931
Advantages of Task-Specific Multi-Objective Optimisation in Evolutionary Robotics.
Trianni, Vito; López-Ibáñez, Manuel
2015-01-01
The application of multi-objective optimisation to evolutionary robotics is receiving increasing attention. A survey of the literature reveals the different possibilities it offers to improve the automatic design of efficient and adaptive robotic systems, and points to the successful demonstrations available for both task-specific and task-agnostic approaches (i.e., with or without reference to the specific design problem to be tackled). However, the advantages of multi-objective approaches over single-objective ones have not been clearly spelled out and experimentally demonstrated. This paper fills this gap for task-specific approaches: starting from well-known results in multi-objective optimisation, we discuss how to tackle commonly recognised problems in evolutionary robotics. In particular, we show that multi-objective optimisation (i) allows evolving a more varied set of behaviours by exploring multiple trade-offs of the objectives to optimise, (ii) supports the evolution of the desired behaviour through the introduction of objectives as proxies, (iii) avoids the premature convergence to local optima possibly introduced by multi-component fitness functions, and (iv) solves the bootstrap problem exploiting ancillary objectives to guide evolution in the early phases. We present an experimental demonstration of these benefits in three different case studies: maze navigation in a single robot domain, flocking in a swarm robotics context, and a strictly collaborative task in collective robotics.
Stochastic airspace simulation tool development
DOT National Transportation Integrated Search
2009-10-01
Modeling and simulation is often used to study : the physical world when observation may not be : practical. The overall goal of a recent and ongoing : simulation tool project has been to provide a : documented, lifecycle-managed, multi-processor : c...
Blaettler, M; Bruegger, A; Forster, I C; Lehareinger, Y
1988-03-01
The design of an analog interface to a digital audio signal processor (DASP)-video cassette recorder (VCR) system is described. The complete system represents a low-cost alternative to both FM instrumentation tape recorders and multi-channel chart recorders. The interface or DASP input-output unit described in this paper enables the recording and playback of up to 12 analog channels with a maximum of 12 bit resolution and a bandwidth of 2 kHz per channel. Internal control and timing in the recording component of the interface is performed using ROMs which can be reprogrammed to suit different analog-to-digital converter hardware. Improvement in the bandwidth specifications is possible by connecting channels in parallel. A parallel 16 bit data output port is provided for direct transfer of the digitized data to a computer.
NASA Astrophysics Data System (ADS)
Lin, Hsien-I.; Nguyen, Xuan-Anh
2017-05-01
To operate a redundant manipulator to accomplish the end-effector trajectory planning and simultaneously control its gesture in online programming, incorporating the human motion is a useful and flexible option. This paper focuses on a manipulative instrument that can simultaneously control its arm gesture and end-effector trajectory via human teleoperation. The instrument can be classified by two parts; first, for the human motion capture and data processing, marker systems are proposed to capture human gesture. Second, the manipulator kinematics control is implemented by an augmented multi-tasking method, and forward and backward reaching inverse kinematics, respectively. Especially, the local-solution and divergence problems of a multi-tasking method are resolved by the proposed augmented multi-tasking method. Computer simulations and experiments with a 7-DOF (degree of freedom) redundant manipulator were used to validate the proposed method. Comparison among the single-tasking, original multi-tasking, and augmented multi-tasking algorithms were performed and the result showed that the proposed augmented method had a good end-effector position accuracy and the most similar gesture to the human gesture. Additionally, the experimental results showed that the proposed instrument was realized online.
Modeling and simulation of dynamic ant colony's labor division for task allocation of UAV swarm
NASA Astrophysics Data System (ADS)
Wu, Husheng; Li, Hao; Xiao, Renbin; Liu, Jie
2018-02-01
The problem of unmanned aerial vehicle (UAV) task allocation not only has the intrinsic attribute of complexity, such as highly nonlinear, dynamic, highly adversarial and multi-modal, but also has a better practicability in various multi-agent systems, which makes it more and more attractive recently. In this paper, based on the classic fixed response threshold model (FRTM), under the idea of "problem centered + evolutionary solution" and by a bottom-up way, the new dynamic environmental stimulus, response threshold and transition probability are designed, and a dynamic ant colony's labor division (DACLD) model is proposed. DACLD allows a swarm of agents with a relatively low-level of intelligence to perform complex tasks, and has the characteristic of distributed framework, multi-tasks with execution order, multi-state, adaptive response threshold and multi-individual response. With the proposed model, numerical simulations are performed to illustrate the effectiveness of the distributed task allocation scheme in two situations of UAV swarm combat (dynamic task allocation with a certain number of enemy targets and task re-allocation due to unexpected threats). Results show that our model can get both the heterogeneous UAVs' real-time positions and states at the same time, and has high degree of self-organization, flexibility and real-time response to dynamic environments.
Modular countermine payload for small robots
NASA Astrophysics Data System (ADS)
Herman, Herman; Few, Doug; Versteeg, Roelof; Valois, Jean-Sebastien; McMahill, Jeff; Licitra, Michael; Henciak, Edward
2010-04-01
Payloads for small robotic platforms have historically been designed and implemented as platform and task specific solutions. A consequence of this approach is that payloads cannot be deployed on different robotic platforms without substantial re-engineering efforts. To address this issue, we developed a modular countermine payload that is designed from the ground-up to be platform agnostic. The payload consists of the multi-mission payload controller unit (PCU) coupled with the configurable mission specific threat detection, navigation and marking payloads. The multi-mission PCU has all the common electronics to control and interface to all the payloads. It also contains the embedded processor that can be used to run the navigational and control software. The PCU has a very flexible robot interface which can be configured to interface to various robot platforms. The threat detection payload consists of a two axis sweeping arm and the detector. The navigation payload consists of several perception sensors that are used for terrain mapping, obstacle detection and navigation. Finally, the marking payload consists of a dual-color paint marking system. Through the multimission PCU, all these payloads are packaged in a platform agnostic way to allow deployment on multiple robotic platforms, including Talon and Packbot.
Segment Fixed Priority Scheduling for Self Suspending Real Time Tasks
2016-08-11
Segment-Fixed Priority Scheduling for Self-Suspending Real -Time Tasks Junsung Kim, Department of Electrical and Computer Engineering, Carnegie...4 2.1 Application of a Multi-Segment Self-Suspending Real -Time Task Model ............................. 5 3 Fixed Priority Scheduling...1 Figure 2: A multi-segment self-suspending real -time task model
A Modified Distributed Bees Algorithm for Multi-Sensor Task Allocation †
Nof, Shimon Y.; Edan, Yael
2018-01-01
Multi-sensor systems can play an important role in monitoring tasks and detecting targets. However, real-time allocation of heterogeneous sensors to dynamic targets/tasks that are unknown a priori in their locations and priorities is a challenge. This paper presents a Modified Distributed Bees Algorithm (MDBA) that is developed to allocate stationary heterogeneous sensors to upcoming unknown tasks using a decentralized, swarm intelligence approach to minimize the task detection times. Sensors are allocated to tasks based on sensors’ performance, tasks’ priorities, and the distances of the sensors from the locations where the tasks are being executed. The algorithm was compared to a Distributed Bees Algorithm (DBA), a Bees System, and two common multi-sensor algorithms, market-based and greedy-based algorithms, which were fitted for the specific task. Simulation analyses revealed that MDBA achieved statistically significant improved performance by 7% with respect to DBA as the second-best algorithm, and by 19% with respect to Greedy algorithm, which was the worst, thus indicating its fitness to provide solutions for heterogeneous multi-sensor systems. PMID:29498683
System properties, feedback control and effector coordination of human temperature regulation.
Werner, Jürgen
2010-05-01
The aim of human temperature regulation is to protect body processes by establishing a relative constancy of deep body temperature (regulated variable), in spite of external and internal influences on it. This is basically achieved by a distributed multi-sensor, multi-processor, multi-effector proportional feedback control system. The paper explains why proportional control implies inherent deviations of the regulated variable from the value in the thermoneutral zone. The concept of feedback of the thermal state of the body, conveniently represented by a high-weighted core temperature (T (c)) and low-weighted peripheral temperatures (T (s)) is equivalent to the control concept of "auxiliary feedback control", using a main (regulated) variable (T (c)), supported by an auxiliary variable (T (s)). This concept implies neither regulation of T (s) nor feedforward control. Steady-states result in the closed control-loop, when the open-loop properties of the (heat transfer) process are compatible with those of the thermoregulatory processors. They are called operating points or balance points and are achieved due to the inherent property of dynamical stability of the thermoregulatory feedback loop. No set-point and no comparison of signals (e.g. actual-set value) are necessary. Metabolic heat production and sweat production, though receiving the same information about the thermal state of the body, are independent effectors with different thresholds and gains. Coordination between one of these effectors and the vasomotor effector is achieved by the fact that changes in the (heat transfer) process evoked by vasomotor control are taken into account by the metabolic/sweat processor.
Large Margin Multi-Modal Multi-Task Feature Extraction for Image Classification.
Yong Luo; Yonggang Wen; Dacheng Tao; Jie Gui; Chao Xu
2016-01-01
The features used in many image analysis-based applications are frequently of very high dimension. Feature extraction offers several advantages in high-dimensional cases, and many recent studies have used multi-task feature extraction approaches, which often outperform single-task feature extraction approaches. However, most of these methods are limited in that they only consider data represented by a single type of feature, even though features usually represent images from multiple modalities. We, therefore, propose a novel large margin multi-modal multi-task feature extraction (LM3FE) framework for handling multi-modal features for image classification. In particular, LM3FE simultaneously learns the feature extraction matrix for each modality and the modality combination coefficients. In this way, LM3FE not only handles correlated and noisy features, but also utilizes the complementarity of different modalities to further help reduce feature redundancy in each modality. The large margin principle employed also helps to extract strongly predictive features, so that they are more suitable for prediction (e.g., classification). An alternating algorithm is developed for problem optimization, and each subproblem can be efficiently solved. Experiments on two challenging real-world image data sets demonstrate the effectiveness and superiority of the proposed method.
NASA Astrophysics Data System (ADS)
Samala, Ravi K.; Chan, Heang-Ping; Hadjiiski, Lubomir M.; Helvie, Mark A.; Cha, Kenny H.; Richter, Caleb D.
2017-12-01
Transfer learning in deep convolutional neural networks (DCNNs) is an important step in its application to medical imaging tasks. We propose a multi-task transfer learning DCNN with the aim of translating the ‘knowledge’ learned from non-medical images to medical diagnostic tasks through supervised training and increasing the generalization capabilities of DCNNs by simultaneously learning auxiliary tasks. We studied this approach in an important application: classification of malignant and benign breast masses. With Institutional Review Board (IRB) approval, digitized screen-film mammograms (SFMs) and digital mammograms (DMs) were collected from our patient files and additional SFMs were obtained from the Digital Database for Screening Mammography. The data set consisted of 2242 views with 2454 masses (1057 malignant, 1397 benign). In single-task transfer learning, the DCNN was trained and tested on SFMs. In multi-task transfer learning, SFMs and DMs were used to train the DCNN, which was then tested on SFMs. N-fold cross-validation with the training set was used for training and parameter optimization. On the independent test set, the multi-task transfer learning DCNN was found to have significantly (p = 0.007) higher performance compared to the single-task transfer learning DCNN. This study demonstrates that multi-task transfer learning may be an effective approach for training DCNN in medical imaging applications when training samples from a single modality are limited.
Improving multi-tasking ability through action videogames.
Chiappe, Dan; Conger, Mark; Liao, Janet; Caldwell, J Lynn; Vu, Kim-Phuong L
2013-03-01
The present study examined whether action videogames can improve multi-tasking in high workload environments. Two groups with no action videogame experience were pre-tested using the Multi-Attribute Task Battery (MATB). It consists of two primary tasks; tracking and fuel management, and two secondary tasks; systems monitoring and communication. One group served as a control group, while a second played action videogames a minimum of 5 h a week for 10 weeks. Both groups returned for a post-assessment on the MATB. We found the videogame treatment enhanced performance on secondary tasks, without interfering with the primary tasks. Our results demonstrate action videogames can increase people's ability to take on additional tasks by increasing attentional capacity. Copyright © 2012 Elsevier Ltd and The Ergonomics Society. All rights reserved.
Synthetic Synchronisation: From Attention and Multi-Tasking to Negative Capability and Judgment
ERIC Educational Resources Information Center
Stables, Andrew
2013-01-01
Educational literature has tended to focus, explicitly and implicitly, on two kinds of task orientation: the ability either to focus on a single task, or to multi-task. A third form of orientation characterises many highly successful people. This is the ability to combine several tasks into one: to "kill two (or more) birds with one…
On nonlinear finite element analysis in single-, multi- and parallel-processors
NASA Technical Reports Server (NTRS)
Utku, S.; Melosh, R.; Islam, M.; Salama, M.
1982-01-01
Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem, therefore the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without sub-structuring (i.e. partitioning) are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable size substructures to parallel processors is exploited. Under Cholesky type factorization schemes, the efficiency of parallel processing is shown to decrease due to the occasional shared data, just as that due to the shared facilities.
State University of New York Institute of Technology (SUNYIT) Summer Scholar Program
2009-10-01
COVERED (From - To) March 2007 – April 2009 4 . TITLE AND SUBTITLE STATE UNIVERSITY OF NEW YORK INSTITUTE OF TECHNOLOGY (SUNYIT) SUMMER SCHOLAR...Even with access to the Arctic Regional Supercomputer Center (ARSC), evolving a 9/7 wavelet with four multi-resolution levels (MRA 4 ) involves...evaluated over the multiple processing elements in the Cell processor. It was tested on Cell processors in a Sony Playstation 3 and on an IBM QS20 blade
Hardware based redundant multi-threading inside a GPU for improved reliability
Sridharan, Vilas; Gurumurthi, Sudhanva
2015-05-05
A system and method for verifying computation output using computer hardware are provided. Instances of computation are generated and processed on hardware-based processors. As instances of computation are processed, each instance of computation receives a load accessible to other instances of computation. Instances of output are generated by processing the instances of computation. The instances of output are verified against each other in a hardware based processor to ensure accuracy of the output.
Multi-element germanium detectors for synchrotron applications
Rumaiz, A. K.; Kuczewski, A. J.; Mead, J.; ...
2018-04-27
In this paper, we have developed a series of monolithic multi-element germanium detectors, based on sensor arrays produced by the Forschungzentrum Julich, and on Application-specific integrated circuits (ASICs) developed at Brookhaven. Devices have been made with element counts ranging from 64 to 384. These detectors are being used at NSLS-II and APS for a range of diffraction experiments, both monochromatic and energy-dispersive. Compact and powerful readout systems have been developed, based on the new generation of FPGA system-on-chip devices, which provide closely coupled multi-core processors embedded in large gate arrays. Finally, we will discuss the technical details of the systems,more » and present some of the results from them.« less
Multi-element germanium detectors for synchrotron applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rumaiz, A. K.; Kuczewski, A. J.; Mead, J.
In this paper, we have developed a series of monolithic multi-element germanium detectors, based on sensor arrays produced by the Forschungzentrum Julich, and on Application-specific integrated circuits (ASICs) developed at Brookhaven. Devices have been made with element counts ranging from 64 to 384. These detectors are being used at NSLS-II and APS for a range of diffraction experiments, both monochromatic and energy-dispersive. Compact and powerful readout systems have been developed, based on the new generation of FPGA system-on-chip devices, which provide closely coupled multi-core processors embedded in large gate arrays. Finally, we will discuss the technical details of the systems,more » and present some of the results from them.« less
Multiple-function multi-input/multi-output digital control and on-line analysis
NASA Technical Reports Server (NTRS)
Hoadley, Sherwood T.; Wieseman, Carol D.; Mcgraw, Sandra M.
1992-01-01
The design and capabilities of two digital controller systems for aeroelastic wind-tunnel models are described. The first allowed control of flutter while performing roll maneuvers with wing load control as well as coordinating the acquisition, storage, and transfer of data for on-line analysis. This system, which employs several digital signal multi-processor (DSP) boards programmed in high-level software languages, is housed in a SUN Workstation environment. A second DCS provides a measure of wind-tunnel safety by functioning as a trip system during testing in the case of high model dynamic response or in case the first DCS fails. The second DCS uses National Instruments LabVIEW Software and Hardware within a Macintosh environment.
Potential of minicomputer/array-processor system for nonlinear finite-element analysis
NASA Technical Reports Server (NTRS)
Strohkorb, G. A.; Noor, A. K.
1983-01-01
The potential of using a minicomputer/array-processor system for the efficient solution of large-scale, nonlinear, finite-element problems is studied. A Prime 750 is used as the host computer, and a software simulator residing on the Prime is employed to assess the performance of the Floating Point Systems AP-120B array processor. Major hardware characteristics of the system such as virtual memory and parallel and pipeline processing are reviewed, and the interplay between various hardware components is examined. Effective use of the minicomputer/array-processor system for nonlinear analysis requires the following: (1) proper selection of the computational procedure and the capability to vectorize the numerical algorithms; (2) reduction of input-output operations; and (3) overlapping host and array-processor operations. A detailed discussion is given of techniques to accomplish each of these tasks. Two benchmark problems with 1715 and 3230 degrees of freedom, respectively, are selected to measure the anticipated gain in speed obtained by using the proposed algorithms on the array processor.
Design and test of a regenerative satellite transmultiplexer
NASA Astrophysics Data System (ADS)
Hung, Kenny King-Ming
1993-05-01
In a multiple access scheme for regenerative satellite communications, the bulk frequency division multiple access (FDMA) uplink signal is demodulated on board the satellite and then remodulated for time division multiplexing (TDM) downlink transmission. Conversion from frequency to time division multiplex format requires that the uplink signal be frequency demultiplexed and each individual carrier be subsequently demodulated. For thin-route application which consists of a large number of channels with fixed data rate, multicarrier demodulation can be accomplished efficiently by a digital transmultiplexer (TMUX) using a fast Fourier transform processor followed by a bank of per-channel processors. A time domain description of the TMUX algorithm is derived which elucidates how the TMUX functions. The per-channel processor performs timing and carrier recovery for optimum and coherent data detection. Timing recovery is necessarily achieved asynchronously by a filter coefficient interpolation. Carrier recovery is performed using an all-digital phase-locked loop. The combination of both timing and carrier loops is investigated for a multi-user system. The performance of the overall system is assessed over a multi-user, additive white Gaussian noise channel for a bit energy to noise power spectral density ratio down to zero dB.
Na, Okpin; Cai, Xiao-Chuan; Xi, Yunping
2017-01-01
The prediction of the chloride-induced corrosion is very important because of the durable life of concrete structure. To simulate more realistic durability performance of concrete structures, complex scientific methods and more accurate material models are needed. In order to predict the robust results of corrosion initiation time and to describe the thin layer from concrete surface to reinforcement, a large number of fine meshes are also used. The purpose of this study is to suggest more realistic physical model regarding coupled hygro-chemo transport and to implement the model with parallel finite element algorithm. Furthermore, microclimate model with environmental humidity and seasonal temperature is adopted. As a result, the prediction model of chloride diffusion under unsaturated condition was developed with parallel algorithms and was applied to the existing bridge to validate the model with multi-boundary condition. As the number of processors increased, the computational time decreased until the number of processors became optimized. Then, the computational time increased because the communication time between the processors increased. The framework of present model can be extended to simulate the multi-species de-icing salts ingress into non-saturated concrete structures in future work. PMID:28772714
Digital Intermediate Frequency Receiver Module For Use In Airborne Sar Applications
Tise, Bertice L.; Dubbert, Dale F.
2005-03-08
A digital IF receiver (DRX) module directly compatible with advanced radar systems such as synthetic aperture radar (SAR) systems. The DRX can combine a 1 G-Sample/sec 8-bit ADC with high-speed digital signal processor, such as high gate-count FPGA technology or ASICs to realize a wideband IF receiver. DSP operations implemented in the DRX can include quadrature demodulation and multi-rate, variable-bandwidth IF filtering. Pulse-to-pulse (Doppler domain) filtering can also be implemented in the form of a presummer (accumulator) and an azimuth prefilter. An out of band noise source can be employed to provide a dither signal to the ADC, and later be removed by digital signal processing. Both the range and Doppler domain filtering operations can be implemented using a unique pane architecture which allows on-the-fly selection of the filter decimation factor, and hence, the filter bandwidth. The DRX module can include a standard VME-64 interface for control, status, and programming. An interface can provide phase history data to the real-time image formation processors. A third front-panel data port (FPDP) interface can send wide bandwidth, raw phase histories to a real-time phase history recorder for ground processing.
Scalable software architecture for on-line multi-camera video processing
NASA Astrophysics Data System (ADS)
Camplani, Massimo; Salgado, Luis
2011-03-01
In this paper we present a scalable software architecture for on-line multi-camera video processing, that guarantees a good trade off between computational power, scalability and flexibility. The software system is modular and its main blocks are the Processing Units (PUs), and the Central Unit. The Central Unit works as a supervisor of the running PUs and each PU manages the acquisition phase and the processing phase. Furthermore, an approach to easily parallelize the desired processing application has been presented. In this paper, as case study, we apply the proposed software architecture to a multi-camera system in order to efficiently manage multiple 2D object detection modules in a real-time scenario. System performance has been evaluated under different load conditions such as number of cameras and image sizes. The results show that the software architecture scales well with the number of camera and can easily works with different image formats respecting the real time constraints. Moreover, the parallelization approach can be used in order to speed up the processing tasks with a low level of overhead.
NASA Astrophysics Data System (ADS)
Koziel, Slawomir; Bekasiewicz, Adrian
2016-10-01
Multi-objective optimization of antenna structures is a challenging task owing to the high computational cost of evaluating the design objectives as well as the large number of adjustable parameters. Design speed-up can be achieved by means of surrogate-based optimization techniques. In particular, a combination of variable-fidelity electromagnetic (EM) simulations, design space reduction techniques, response surface approximation models and design refinement methods permits identification of the Pareto-optimal set of designs within a reasonable timeframe. Here, a study concerning the scalability of surrogate-assisted multi-objective antenna design is carried out based on a set of benchmark problems, with the dimensionality of the design space ranging from six to 24 and a CPU cost of the EM antenna model from 10 to 20 min per simulation. Numerical results indicate that the computational overhead of the design process increases more or less quadratically with the number of adjustable geometric parameters of the antenna structure at hand, which is a promising result from the point of view of handling even more complex problems.
Artificial Neuron Based on Integrated Semiconductor Quantum Dot Mode-Locked Lasers
NASA Astrophysics Data System (ADS)
Mesaritakis, Charis; Kapsalis, Alexandros; Bogris, Adonis; Syvridis, Dimitris
2016-12-01
Neuro-inspired implementations have attracted strong interest as a power efficient and robust alternative to the digital model of computation with a broad range of applications. Especially, neuro-mimetic systems able to produce and process spike-encoding schemes can offer merits like high noise-resiliency and increased computational efficiency. Towards this direction, integrated photonics can be an auspicious platform due to its multi-GHz bandwidth, its high wall-plug efficiency and the strong similarity of its dynamics under excitation with biological spiking neurons. Here, we propose an integrated all-optical neuron based on an InAs/InGaAs semiconductor quantum-dot passively mode-locked laser. The multi-band emission capabilities of these lasers allows, through waveband switching, the emulation of the excitation and inhibition modes of operation. Frequency-response effects, similar to biological neural circuits, are observed just as in a typical two-section excitable laser. The demonstrated optical building block can pave the way for high-speed photonic integrated systems able to address tasks ranging from pattern recognition to cognitive spectrum management and multi-sensory data processing.
Artificial Neuron Based on Integrated Semiconductor Quantum Dot Mode-Locked Lasers
Mesaritakis, Charis; Kapsalis, Alexandros; Bogris, Adonis; Syvridis, Dimitris
2016-01-01
Neuro-inspired implementations have attracted strong interest as a power efficient and robust alternative to the digital model of computation with a broad range of applications. Especially, neuro-mimetic systems able to produce and process spike-encoding schemes can offer merits like high noise-resiliency and increased computational efficiency. Towards this direction, integrated photonics can be an auspicious platform due to its multi-GHz bandwidth, its high wall-plug efficiency and the strong similarity of its dynamics under excitation with biological spiking neurons. Here, we propose an integrated all-optical neuron based on an InAs/InGaAs semiconductor quantum-dot passively mode-locked laser. The multi-band emission capabilities of these lasers allows, through waveband switching, the emulation of the excitation and inhibition modes of operation. Frequency-response effects, similar to biological neural circuits, are observed just as in a typical two-section excitable laser. The demonstrated optical building block can pave the way for high-speed photonic integrated systems able to address tasks ranging from pattern recognition to cognitive spectrum management and multi-sensory data processing. PMID:27991574
Concurrent Learning of Control in Multi agent Sequential Decision Tasks
2018-04-17
Concurrent Learning of Control in Multi-agent Sequential Decision Tasks The overall objective of this project was to develop multi-agent reinforcement...learning (MARL) approaches for intelligent agents to autonomously learn distributed control policies in decentral- ized partially observable...shall be subject to any oenalty for failing to comply with a collection of information if it does not display a currently valid OMB control number
Advantages of Task-Specific Multi-Objective Optimisation in Evolutionary Robotics
Trianni, Vito; López-Ibáñez, Manuel
2015-01-01
The application of multi-objective optimisation to evolutionary robotics is receiving increasing attention. A survey of the literature reveals the different possibilities it offers to improve the automatic design of efficient and adaptive robotic systems, and points to the successful demonstrations available for both task-specific and task-agnostic approaches (i.e., with or without reference to the specific design problem to be tackled). However, the advantages of multi-objective approaches over single-objective ones have not been clearly spelled out and experimentally demonstrated. This paper fills this gap for task-specific approaches: starting from well-known results in multi-objective optimisation, we discuss how to tackle commonly recognised problems in evolutionary robotics. In particular, we show that multi-objective optimisation (i) allows evolving a more varied set of behaviours by exploring multiple trade-offs of the objectives to optimise, (ii) supports the evolution of the desired behaviour through the introduction of objectives as proxies, (iii) avoids the premature convergence to local optima possibly introduced by multi-component fitness functions, and (iv) solves the bootstrap problem exploiting ancillary objectives to guide evolution in the early phases. We present an experimental demonstration of these benefits in three different case studies: maze navigation in a single robot domain, flocking in a swarm robotics context, and a strictly collaborative task in collective robotics. PMID:26295151
Design and evaluation of a fault-tolerant multiprocessor using hardware recovery blocks
NASA Technical Reports Server (NTRS)
Lee, Y. H.; Shin, K. G.
1982-01-01
A fault-tolerant multiprocessor with a rollback recovery mechanism is discussed. The rollback mechanism is based on the hardware recovery block which is a hardware equivalent to the software recovery block. The hardware recovery block is constructed by consecutive state-save operations and several state-save units in every processor and memory module. When a fault is detected, the multiprocessor reconfigures itself to replace the faulty component and then the process originally assigned to the faulty component retreats to one of the previously saved states in order to resume fault-free execution. A mathematical model is proposed to calculate both the coverage of multi-step rollback recovery and the risk of restart. A performance evaluation in terms of task execution time is also presented.
A multi-threaded version of MCFM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Campbell, John M.; Ellis, R. Keith; Giele, Walter T.
We report on our findings modifying MCFM using OpenMP to implement multi-threading. By using OpenMP, the modified MCFM will execute on any processor, automatically adjusting to the number of available threads. We then modified the integration routine VEGAS to distribute the event evaluation over the threads, while combining all events at the end of every iteration to optimize the numerical integration. Furthermore, we took special care so that the results of the Monte Carlo integration were independent of the number of threads used, to facilitate the validation of the OpenMP version of MCFM.
1987-07-01
transmission lines Low - noise mm wave detectors, mixers and amplifiers Multi-GHz chirp transform processors High performance small antenna arrays Multi-GHz A/D...attractive alternative. The overall advantages for HTS mm wave receivers are very- low quantum-limited noise , wide bandwidth, low electrical power...0 0 3 2 1 6 6.3A 0 0 0 2 -3 S Total 2 2 4 S 4 17 116 10, ELF Communication (far term). Extremely low frequency communication via magnetic wave has
Advanced graphical user interface for multi-physics simulations using AMST
NASA Astrophysics Data System (ADS)
Hoffmann, Florian; Vogel, Frank
2017-07-01
Numerical modelling of particulate matter has gained much popularity in recent decades. Advanced Multi-physics Simulation Technology (AMST) is a state-of-the-art three dimensional numerical modelling technique combining the eX-tended Discrete Element Method (XDEM) with Computational Fluid Dynamics (CFD) and Finite Element Analysis (FEA) [1]. One major limitation of this code is the lack of a graphical user interface (GUI) meaning that all pre-processing has to be made directly in a HDF5-file. This contribution presents the first graphical pre-processor developed for AMST.
Multi-processing on supercomputers for computational aerodynamics
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; Mehta, Unmeel B.
1990-01-01
The MIMD concept is applied, through multitasking, with relatively minor modifications to an existing code for a single processor. This approach maps the available memory to multiple processors, exploiting the C-FORTRAN-Unix interface. An existing single processor algorithm is mapped without the need for developing a new algorithm. The procedure of designing a code utilizing this approach is automated with the Unix stream editor. A Multiple Processor Multiple Grid (MPMG) code is developed as a demonstration of this approach. This code solves the three-dimensional, Reynolds-averaged, thin-layer and slender-layer Navier-Stokes equations with an implicit, approximately factored and diagonalized method. This solver is applied to a generic, oblique-wing aircraft problem on a four-processor computer using one process for data management and nonparallel computations and three processes for pseudotime advance on three different grid systems.
Design and Analysis of Self-Adapted Task Scheduling Strategies in Wireless Sensor Networks
Guo, Wenzhong; Xiong, Naixue; Chao, Han-Chieh; Hussain, Sajid; Chen, Guolong
2011-01-01
In a wireless sensor network (WSN), the usage of resources is usually highly related to the execution of tasks which consume a certain amount of computing and communication bandwidth. Parallel processing among sensors is a promising solution to provide the demanded computation capacity in WSNs. Task allocation and scheduling is a typical problem in the area of high performance computing. Although task allocation and scheduling in wired processor networks has been well studied in the past, their counterparts for WSNs remain largely unexplored. Existing traditional high performance computing solutions cannot be directly implemented in WSNs due to the limitations of WSNs such as limited resource availability and the shared communication medium. In this paper, a self-adapted task scheduling strategy for WSNs is presented. First, a multi-agent-based architecture for WSNs is proposed and a mathematical model of dynamic alliance is constructed for the task allocation problem. Then an effective discrete particle swarm optimization (PSO) algorithm for the dynamic alliance (DPSO-DA) with a well-designed particle position code and fitness function is proposed. A mutation operator which can effectively improve the algorithm’s ability of global search and population diversity is also introduced in this algorithm. Finally, the simulation results show that the proposed solution can achieve significant better performance than other algorithms. PMID:22163971
Multi-Attribute Sequential Search
ERIC Educational Resources Information Center
Bearden, J. Neil; Connolly, Terry
2007-01-01
This article describes empirical and theoretical results from two multi-attribute sequential search tasks. In both tasks, the DM sequentially encounters options described by two attributes and must pay to learn the values of the attributes. In the "continuous" version of the task the DM learns the precise numerical value of an attribute when she…
Hierarchical algorithms for modeling the ocean on hierarchical architectures
NASA Astrophysics Data System (ADS)
Hill, C. N.
2012-12-01
This presentation will describe an approach to using accelerator/co-processor technology that maps hierarchical, multi-scale modeling techniques to an underlying hierarchical hardware architecture. The focus of this work is on making effective use of both CPU and accelerator/co-processor parts of a system, for large scale ocean modeling. In the work, a lower resolution basin scale ocean model is locally coupled to multiple, "embedded", limited area higher resolution sub-models. The higher resolution models execute on co-processor/accelerator hardware and do not interact directly with other sub-models. The lower resolution basin scale model executes on the system CPU(s). The result is a multi-scale algorithm that aligns with hardware designs in the co-processor/accelerator space. We demonstrate this approach being used to substitute explicit process models for standard parameterizations. Code for our sub-models is implemented through a generic abstraction layer, so that we can target multiple accelerator architectures with different programming environments. We will present two application and implementation examples. One uses the CUDA programming environment and targets GPU hardware. This example employs a simple non-hydrostatic two dimensional sub-model to represent vertical motion more accurately. The second example uses a highly threaded three-dimensional model at high resolution. This targets a MIC/Xeon Phi like environment and uses sub-models as a way to explicitly compute sub-mesoscale terms. In both cases the accelerator/co-processor capability provides extra compute cycles that allow improved model fidelity for little or no extra wall-clock time cost.
Hernández, Moisés; Guerrero, Ginés D.; Cecilia, José M.; García, José M.; Inuggi, Alberto; Jbabdi, Saad; Behrens, Timothy E. J.; Sotiropoulos, Stamatios N.
2013-01-01
With the performance of central processing units (CPUs) having effectively reached a limit, parallel processing offers an alternative for applications with high computational demands. Modern graphics processing units (GPUs) are massively parallel processors that can execute simultaneously thousands of light-weight processes. In this study, we propose and implement a parallel GPU-based design of a popular method that is used for the analysis of brain magnetic resonance imaging (MRI). More specifically, we are concerned with a model-based approach for extracting tissue structural information from diffusion-weighted (DW) MRI data. DW-MRI offers, through tractography approaches, the only way to study brain structural connectivity, non-invasively and in-vivo. We parallelise the Bayesian inference framework for the ball & stick model, as it is implemented in the tractography toolbox of the popular FSL software package (University of Oxford). For our implementation, we utilise the Compute Unified Device Architecture (CUDA) programming model. We show that the parameter estimation, performed through Markov Chain Monte Carlo (MCMC), is accelerated by at least two orders of magnitude, when comparing a single GPU with the respective sequential single-core CPU version. We also illustrate similar speed-up factors (up to 120x) when comparing a multi-GPU with a multi-CPU implementation. PMID:23658616
Comparing the Performance of Two Dynamic Load Distribution Methods
NASA Technical Reports Server (NTRS)
Kale, L. V.
1987-01-01
Parallel processing of symbolic computations on a message-passing multi-processor presents one challenge: To effectively utilize the available processors, the load must be distributed uniformly to all the processors. However, the structure of these computations cannot be predicted in advance. go, static scheduling methods are not applicable. In this paper, we compare the performance of two dynamic, distributed load balancing methods with extensive simulation studies. The two schemes are: the Contracting Within a Neighborhood (CWN) scheme proposed by us, and the Gradient Model proposed by Lin and Keller. We conclude that although simpler, the CWN is significantly more effective at distributing the work than the Gradient model.
Performance comparison analysis library communication cluster system using merge sort
NASA Astrophysics Data System (ADS)
Wulandari, D. A. R.; Ramadhan, M. E.
2018-04-01
Begins by using a single processor, to increase the speed of computing time, the use of multi-processor was introduced. The second paradigm is known as parallel computing, example cluster. The cluster must have the communication potocol for processing, one of it is message passing Interface (MPI). MPI have many library, both of them OPENMPI and MPICH2. Performance of the cluster machine depend on suitable between performance characters of library communication and characters of the problem so this study aims to analyze the comparative performances libraries in handling parallel computing process. The case study in this research are MPICH2 and OpenMPI. This case research execute sorting’s problem to know the performance of cluster system. The sorting problem use mergesort method. The research method is by implementing OpenMPI and MPICH2 on a Linux-based cluster by using five computer virtual then analyze the performance of the system by different scenario tests and three parameters for to know the performance of MPICH2 and OpenMPI. These performances are execution time, speedup and efficiency. The results of this study showed that the addition of each data size makes OpenMPI and MPICH2 have an average speed-up and efficiency tend to increase but at a large data size decreases. increased data size doesn’t necessarily increased speed up and efficiency but only execution time example in 100000 data size. OpenMPI has a execution time greater than MPICH2 example in 1000 data size average execution time with MPICH2 is 0,009721 and OpenMPI is 0,003895 OpenMPI can customize communication needs.
RC64, a Rad-Hard Many-Core High- Performance DSP for Space Applications
NASA Astrophysics Data System (ADS)
Ginosar, Ran; Aviely, Peleg; Gellis, Hagay; Liran, Tuvia; Israeli, Tsvika; Nesher, Roy; Lange, Fredy; Dobkin, Reuven; Meirov, Henri; Reznik, Dror
2015-09-01
RC64, a novel rad-hard 64-core signal processing chip targets DSP performance of 75 GMACs (16bit), 150 GOPS and 38 single precision GFLOPS while dissipating less than 10 Watts. RC64 integrates advanced DSP cores with a multi-bank shared memory and a hardware scheduler, also supporting DDR2/3 memory and twelve 3.125 Gbps full duplex high speed serial links using SpaceFibre and other protocols. The programming model employs sequential fine-grain tasks and a separate task map to define task dependencies. RC64 is implemented as a 300 MHz integrated circuit on a 65nm CMOS technology, assembled in hermetically sealed ceramic CCGA624 package and qualified to the highest space standards.
RC64, a Rad-Hard Many-Core High-Performance DSP for Space Applications
NASA Astrophysics Data System (ADS)
Ginosar, Ran; Aviely, Peleg; Liran, Tuvia; Alon, Dov; Mandler, Alberto; Lange, Fredy; Dobkin, Reuven; Goldberg, Miki
2014-08-01
RC64, a novel rad-hard 64-core signal processing chip targets DSP performance of 75 GMACs (16bit), 150 GOPS and 20 single precision GFLOPS while dissipating less than 10 Watts. RC64 integrates advanced DSP cores with a multi-bank shared memory and a hardware scheduler, also supporting DDR2/3 memory and twelve 2.5 Gbps full duplex high speed serial links using SpaceFibre and other protocols. The programming model employs sequential fine-grain tasks and a separate task map to define task dependencies. RC64 is implemented as a 300 MHz integrated circuit on a 65nm CMOS technology, assembled in hermetically sealed ceramic CCGA624 package and qualified to the highest space standards.
Parallel eigenanalysis of finite element models in a completely connected architecture
NASA Technical Reports Server (NTRS)
Akl, F. A.; Morel, M. R.
1989-01-01
A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi) = (M)(phi)(omega), where (K) and (M) are of order N, and (omega) is order of q. The concurrent solution of the eigenproblem is based on the multifrontal/modified subspace method and is achieved in a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm was successfully implemented on a tightly coupled multiple-instruction multiple-data parallel processing machine, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macrotasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. A parallel finite element dynamic analysis program, p-feda, is documented and the performance of its subroutines in parallel environment is analyzed.
Custom instruction set NIOS-based OFDM processor for FPGAs
NASA Astrophysics Data System (ADS)
Meyer-Bäse, Uwe; Sunkara, Divya; Castillo, Encarnacion; Garcia, Antonio
2006-05-01
Orthogonal Frequency division multiplexing (OFDM) spread spectrum technique, sometimes also called multi-carrier or discrete multi-tone modulation, are used in bandwidth-efficient communication systems in the presence of channel distortion. The benefits of OFDM are high spectral efficiency, resiliency to RF interference, and lower multi-path distortion. OFDM is the basis for the European digital audio broadcasting (DAB) standard, the global asymmetric digital subscriber line (ADSL) standard, in the IEEE 802.11 5.8 GHz band standard, and ongoing development in wireless local area networks. The modulator and demodulator in an OFDM system can be implemented by use of a parallel bank of filters based on the discrete Fourier transform (DFT), in case the number of subchannels is large (e.g. K > 25), the OFDM system are efficiently implemented by use of the fast Fourier transform (FFT) to compute the DFT. We have developed a custom FPGA-based Altera NIOS system to increase the performance, programmability, and low power in mobil wireless systems. The overall gain observed for a 1024-point FFT ranges depending on the multiplier used by the NIOS processor between a factor of 3 and 16. A careful optimization described in the appendix yield a performance gain of up to 77% when compared with our preliminary results.
Multi-threaded ATLAS simulation on Intel Knights Landing processors
NASA Astrophysics Data System (ADS)
Farrell, Steven; Calafiura, Paolo; Leggett, Charles; Tsulaia, Vakhtang; Dotti, Andrea; ATLAS Collaboration
2017-10-01
The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), was delivered to its users in two phases with the first phase online at the end of 2015 and the second phase now online at the end of 2016. Cori Phase 2 is based on the KNL architecture and contains over 9000 compute nodes with 96GB DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a good potential use-case for the KNL architecture and supercomputers like Cori. ATLAS simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this paper we will give an overview of the ATLAS simulation application with details on its multi-threaded design. Then, we will present a performance analysis of the application on KNL devices and compare it to a traditional x86 platform to demonstrate the capabilities of the architecture and evaluate the benefits of utilizing KNL platforms like Cori for ATLAS production.
Multi-GPU and multi-CPU accelerated FDTD scheme for vibroacoustic applications
NASA Astrophysics Data System (ADS)
Francés, J.; Otero, B.; Bleda, S.; Gallego, S.; Neipp, C.; Márquez, A.; Beléndez, A.
2015-06-01
The Finite-Difference Time-Domain (FDTD) method is applied to the analysis of vibroacoustic problems and to study the propagation of longitudinal and transversal waves in a stratified media. The potential of the scheme and the relevance of each acceleration strategy for massively computations in FDTD are demonstrated in this work. In this paper, we propose two new specific implementations of the bi-dimensional scheme of the FDTD method using multi-CPU and multi-GPU, respectively. In the first implementation, an open source message passing interface (OMPI) has been included in order to massively exploit the resources of a biprocessor station with two Intel Xeon processors. Moreover, regarding CPU code version, the streaming SIMD extensions (SSE) and also the advanced vectorial extensions (AVX) have been included with shared memory approaches that take advantage of the multi-core platforms. On the other hand, the second implementation called the multi-GPU code version is based on Peer-to-Peer communications available in CUDA on two GPUs (NVIDIA GTX 670). Subsequently, this paper presents an accurate analysis of the influence of the different code versions including shared memory approaches, vector instructions and multi-processors (both CPU and GPU) and compares them in order to delimit the degree of improvement of using distributed solutions based on multi-CPU and multi-GPU. The performance of both approaches was analysed and it has been demonstrated that the addition of shared memory schemes to CPU computing improves substantially the performance of vector instructions enlarging the simulation sizes that use efficiently the cache memory of CPUs. In this case GPU computing is slightly twice times faster than the fine tuned CPU version in both cases one and two nodes. However, for massively computations explicit vector instructions do not worth it since the memory bandwidth is the limiting factor and the performance tends to be the same than the sequential version with auto-vectorisation and also shared memory approach. In this scenario GPU computing is the best option since it provides a homogeneous behaviour. More specifically, the speedup of GPU computing achieves an upper limit of 12 for both one and two GPUs, whereas the performance reaches peak values of 80 GFlops and 146 GFlops for the performance for one GPU and two GPUs respectively. Finally, the method is applied to an earth crust profile in order to demonstrate the potential of our approach and the necessity of applying acceleration strategies in these type of applications.
Performance of VPIC on Sequoia
NASA Astrophysics Data System (ADS)
Nystrom, William
2014-10-01
Sequoia is a major DOE computing resource which is characteristic of future resources in that it has many threads per compute node, 64, and the individual processor cores are simpler and less powerful than cores on previous processors like Intel's Sandy Bridge or AMD's Opteron. An effort is in progress to port VPIC to the Blue Gene Q architecture of Sequoia and evaluate its performance. Results of this work will be presented on single node performance of VPIC as well as multi-node scaling.
Balasubramonian, Rajeev [Sandy, UT; Dwarkadas, Sandhya [Rochester, NY; Albonesi, David [Ithaca, NY
2009-02-10
In a processor having multiple clusters which operate in parallel, the number of clusters in use can be varied dynamically. At the start of each program phase, the configuration option for an interval is run to determine the optimal configuration, which is used until the next phase change is detected. The optimum instruction interval is determined by starting with a minimum interval and doubling it until a low stability factor is reached.
The Use of a UNIX-Based Workstation in the Information Systems Laboratory
1989-03-01
system. The conclusions of the research and the resulting recommendations are presented in Chapter III. These recommendations include how to manage...required to run the program on a new system, these should not be significant changes. 2. Processing Environment The UNIX processing environment is...interactive with multi-tasking and multi-user capabilities. Multi-tasking refers to the fact that many programs can be run concurrently. This capability
The place-value of a digit in multi-digit numbers is processed automatically.
Kallai, Arava Y; Tzelgov, Joseph
2012-09-01
The automatic processing of the place-value of digits in a multi-digit number was investigated in 4 experiments. Experiment 1 and two control experiments employed a numerical comparison task in which the place-value of a non-zero digit was varied in a string composed of zeros. Experiment 2 employed a physical comparison task in which strings of digits varied in their physical sizes. In both types of tasks, the place-value of the non-zero digit in the string was irrelevant to the task performed. Interference of the place-value information was found in both tasks. When the non-zero digit occupied a lower place-value, it was recognized slower as a larger digit or as written in a larger font size. We concluded that place-value in a multi-digit number is processed automatically. These results support the notion of a decomposed representation of multi-digit numbers in memory. PsycINFO Database Record (c) 2012 APA, all rights reserved.
Task Prioritization in Dual-Tasking: Instructions versus Preferences
Jansen, Reinier J.; van Egmond, René; de Ridder, Huib
2016-01-01
The role of task prioritization in performance tradeoffs during multi-tasking has received widespread attention. However, little is known on whether people have preferences regarding tasks, and if so, whether these preferences conflict with priority instructions. Three experiments were conducted with a high-speed driving game and an auditory memory task. In Experiment 1, participants did not receive priority instructions. Participants performed different sequences of single-task and dual-task conditions. Task performance was evaluated according to participants’ retrospective accounts on preferences. These preferences were reformulated as priority instructions in Experiments 2 and 3. The results showed that people differ in their preferences regarding task prioritization in an experimental setting, which can be overruled by priority instructions, but only after increased dual-task exposure. Additional measures of mental effort showed that performance tradeoffs had an impact on mental effort. The interpretation of these findings was used to explore an extension of Threaded Cognition Theory with Hockey’s Compensatory Control Model. PMID:27391779
Parallel Task Management Library for MARTe
NASA Astrophysics Data System (ADS)
Valcarcel, Daniel F.; Alves, Diogo; Neto, Andre; Reux, Cedric; Carvalho, Bernardo B.; Felton, Robert; Lomas, Peter J.; Sousa, Jorge; Zabeo, Luca
2014-06-01
The Multithreaded Application Real-Time executor (MARTe) is a real-time framework with increasing popularity and support in the thermonuclear fusion community. It allows modular code to run in a multi-threaded environment leveraging on the current multi-core processor (CPU) technology. One application that relies on the MARTe framework is the Joint European Torus (JET) tokamak WAll Load Limiter System (WALLS). It calculates and monitors the temperature on metal tiles and plasma facing components (PFCs) that can melt or flake if their temperature gets too high when exposed to power loads. One of the main time consuming tasks in WALLS is the calculation of thermal diffusion models in real-time. These models tend to be described by very large state-space models thus making them perfect candidates for parallelisation. MARTe's traditional approach for task parallelisation is to split the problem into several Real-Time Threads, each responsible for a self-contained sequential execution of an input-to-output chain. This is usually possible, but it might not always be practical for algorithmic or technical reasons. Also, it might not be easily scalable with an increase in the number of available CPU cores. The WorkLibrary introduces a “GPU-like approach” of splitting work among the available cores of modern CPUs that is (i) straightforward to use in an application, (ii) scalable with the availability of cores and all of this (iii) without rewriting or recompiling the source code. The first part of this article explains the motivation behind the library, its architecture and implementation. The second part presents a real application for WALLS, a parallel version of a large state-space model describing the 2D thermal diffusion on a JET tile.
Liu, Chun; Kroll, Andreas
2016-01-01
Multi-robot task allocation determines the task sequence and distribution for a group of robots in multi-robot systems, which is one of constrained combinatorial optimization problems and more complex in case of cooperative tasks because they introduce additional spatial and temporal constraints. To solve multi-robot task allocation problems with cooperative tasks efficiently, a subpopulation-based genetic algorithm, a crossover-free genetic algorithm employing mutation operators and elitism selection in each subpopulation, is developed in this paper. Moreover, the impact of mutation operators (swap, insertion, inversion, displacement, and their various combinations) is analyzed when solving several industrial plant inspection problems. The experimental results show that: (1) the proposed genetic algorithm can obtain better solutions than the tested binary tournament genetic algorithm with partially mapped crossover; (2) inversion mutation performs better than other tested mutation operators when solving problems without cooperative tasks, and the swap-inversion combination performs better than other tested mutation operators/combinations when solving problems with cooperative tasks. As it is difficult to produce all desired effects with a single mutation operator, using multiple mutation operators (including both inversion and swap) is suggested when solving similar combinatorial optimization problems.
1990-06-01
reader is cautioned that computer programs developed in this research may not have been exercised for all cases of interest. While every effort has been...Source of Funding Numbers _. Program Element No Project No I Task No I Work Unit Accession No 11 Title (Include security classflcation) APPLICATION OF...formats. Previous applications of these encoding formats were on industry standard computers (PC) over a 16-20 klIz channel. This report discusses the
Automatic efficiency optimization of an axial compressor with adjustable inlet guide vanes
NASA Astrophysics Data System (ADS)
Li, Jichao; Lin, Feng; Nie, Chaoqun; Chen, Jingyi
2012-04-01
The inlet attack angle of rotor blade reasonably can be adjusted with the change of the stagger angle of inlet guide vane (IGV); so the efficiency of each condition will be affected. For the purpose to improve the efficiency, the DSP (Digital Signal Processor) controller is designed to adjust the stagger angle of IGV automatically in order to optimize the efficiency at any operating condition. The A/D signal collection includes inlet static pressure, outlet static pressure, outlet total pressure, rotor speed and torque signal, the efficiency can be calculated in the DSP, and the angle signal for the stepping motor which control the IGV will be sent out from the D/A. Experimental investigations are performed in a three-stage, low-speed axial compressor with variable inlet guide vanes. It is demonstrated that the DSP designed can well adjust the stagger angle of IGV online, the efficiency under different conditions can be optimized. This establishment of DSP online adjustment scheme may provide a practical solution for improving performance of multi-stage axial flow compressor when its operating condition is varied.
NASA Astrophysics Data System (ADS)
Balaji, V.; Benson, Rusty; Wyman, Bruce; Held, Isaac
2016-10-01
Climate models represent a large variety of processes on a variety of timescales and space scales, a canonical example of multi-physics multi-scale modeling. Current hardware trends, such as Graphical Processing Units (GPUs) and Many Integrated Core (MIC) chips, are based on, at best, marginal increases in clock speed, coupled with vast increases in concurrency, particularly at the fine grain. Multi-physics codes face particular challenges in achieving fine-grained concurrency, as different physics and dynamics components have different computational profiles, and universal solutions are hard to come by. We propose here one approach for multi-physics codes. These codes are typically structured as components interacting via software frameworks. The component structure of a typical Earth system model consists of a hierarchical and recursive tree of components, each representing a different climate process or dynamical system. This recursive structure generally encompasses a modest level of concurrency at the highest level (e.g., atmosphere and ocean on different processor sets) with serial organization underneath. We propose to extend concurrency much further by running more and more lower- and higher-level components in parallel with each other. Each component can further be parallelized on the fine grain, potentially offering a major increase in the scalability of Earth system models. We present here first results from this approach, called coarse-grained component concurrency, or CCC. Within the Geophysical Fluid Dynamics Laboratory (GFDL) Flexible Modeling System (FMS), the atmospheric radiative transfer component has been configured to run in parallel with a composite component consisting of every other atmospheric component, including the atmospheric dynamics and all other atmospheric physics components. We will explore the algorithmic challenges involved in such an approach, and present results from such simulations. Plans to achieve even greater levels of coarse-grained concurrency by extending this approach within other components, such as the ocean, will be discussed.
Multi-task learning with group information for human action recognition
NASA Astrophysics Data System (ADS)
Qian, Li; Wu, Song; Pu, Nan; Xu, Shulin; Xiao, Guoqiang
2018-04-01
Human action recognition is an important and challenging task in computer vision research, due to the variations in human motion performance, interpersonal differences and recording settings. In this paper, we propose a novel multi-task learning framework with group information (MTL-GI) for accurate and efficient human action recognition. Specifically, we firstly obtain group information through calculating the mutual information according to the latent relationship between Gaussian components and action categories, and clustering similar action categories into the same group by affinity propagation clustering. Additionally, in order to explore the relationships of related tasks, we incorporate group information into multi-task learning. Experimental results evaluated on two popular benchmarks (UCF50 and HMDB51 datasets) demonstrate the superiority of our proposed MTL-GI framework.
A novel soft biomimetic microrobot with two motion attitudes.
Shi, Liwei; Guo, Shuxiang; Li, Maoxun; Mao, Shilian; Xiao, Nan; Gao, Baofeng; Song, Zhibin; Asaka, Kinji
2012-12-06
A variety of microrobots have commonly been used in the fields of biomedical engineering and underwater operations during the last few years. Thanks to their compact structure, low driving power, and simple control systems, microrobots can complete a variety of underwater tasks, even in limited spaces. To accomplish our objectives, we previously designed several bio-inspired underwater microrobots with compact structure, flexibility, and multi-functionality, using ionic polymer metal composite (IPMC) actuators. To implement high-position precision for IPMC legs, in the present research, we proposed an electromechanical model of an IPMC actuator and analysed the deformation and actuating force of an equivalent IPMC cantilever beam, which could be used to design biomimetic legs, fingers, or fins for an underwater microrobot. We then evaluated the tip displacement of an IPMC actuator experimentally. The experimental deflections fit the theoretical values very well when the driving frequency was larger than 1 Hz. To realise the necessary multi-functionality for adapting to complex underwater environments, we introduced a walking biomimetic microrobot with two kinds of motion attitudes: a lying state and a standing state. The microrobot uses eleven IPMC actuators to move and two shape memory alloy (SMA) actuators to change its motion attitude. In the lying state, the microrobot implements stick-insect-inspired walking/rotating motion, fish-like swimming motion, horizontal grasping motion, and floating motion. In the standing state, it implements inchworm-inspired crawling motion in two horizontal directions and grasping motion in the vertical direction. We constructed a prototype of this biomimetic microrobot and evaluated its walking, rotating, and floating speeds experimentally. The experimental results indicated that the robot could attain a maximum walking speed of 3.6 mm/s, a maximum rotational speed of 9°/s, and a maximum floating speed of 7.14 mm/s. Obstacle-avoidance and swimming experiments were also carried out to demonstrate its multi-functionality.
ABC of multi-fractal spacetimes and fractional sea turtles
NASA Astrophysics Data System (ADS)
Calcagni, Gianluca
2016-04-01
We clarify what it means to have a spacetime fractal geometry in quantum gravity and show that its properties differ from those of usual fractals. A weak and a strong definition of multi-scale and multi-fractal spacetimes are given together with a sketch of the landscape of multi-scale theories of gravitation. Then, in the context of the fractional theory with q-derivatives, we explore the consequences of living in a multi-fractal spacetime. To illustrate the behavior of a non-relativistic body, we take the entertaining example of a sea turtle. We show that, when only the time direction is fractal, sea turtles swim at a faster speed than in an ordinary world, while they swim at a slower speed if only the spatial directions are fractal. The latter type of geometry is the one most commonly found in quantum gravity. For time-like fractals, relativistic objects can exceed the speed of light, but strongly so only if their size is smaller than the range of particle-physics interactions. We also find new results about log-oscillating measures, the measure presentation and their role in physical observations and in future extensions to nowhere-differentiable stochastic spacetimes.
Real-Time Visual Tracking through Fusion Features
Ruan, Yang; Wei, Zhenzhong
2016-01-01
Due to their high-speed, correlation filters for object tracking have begun to receive increasing attention. Traditional object trackers based on correlation filters typically use a single type of feature. In this paper, we attempt to integrate multiple feature types to improve the performance, and we propose a new DD-HOG fusion feature that consists of discriminative descriptors (DDs) and histograms of oriented gradients (HOG). However, fusion features as multi-vector descriptors cannot be directly used in prior correlation filters. To overcome this difficulty, we propose a multi-vector correlation filter (MVCF) that can directly convolve with a multi-vector descriptor to obtain a single-channel response that indicates the location of an object. Experiments on the CVPR2013 tracking benchmark with the evaluation of state-of-the-art trackers show the effectiveness and speed of the proposed method. Moreover, we show that our MVCF tracker, which uses the DD-HOG descriptor, outperforms the structure-preserving object tracker (SPOT) in multi-object tracking because of its high-speed and ability to address heavy occlusion. PMID:27347951
Metacognition of Multi-Tasking: How Well Do We Predict the Costs of Divided Attention?
Finley, Jason R.; Benjamin, Aaron S.; McCarley, Jason S.
2014-01-01
Risky multi-tasking, such as texting while driving, may occur because people misestimate the costs of divided attention. In two experiments, participants performed a computerized visual-manual tracking task in which they attempted to keep a mouse cursor within a small target that moved erratically around a circular track. They then separately performed an auditory n-back task. After practicing both tasks separately, participants received feedback on their single-task tracking performance and predicted their dual-task tracking performance before finally performing the two tasks simultaneously. Most participants correctly predicted reductions in tracking performance under dual-task conditions, with a majority overestimating the costs of dual-tasking. However, the between-subjects correlation between predicted and actual performance decrements was near zero. This combination of results suggests that people do anticipate costs of multi-tasking, but have little metacognitive insight on the extent to which they are personally vulnerable to the risks of divided attention, relative to other people. PMID:24490818
Scholey, Andrew; Savage, Karen; O'Neill, Barry V; Owen, Lauren; Stough, Con; Priestley, Caroline; Wetherell, Mark
2014-09-01
This study assessed the effects of two doses of glucose and a caffeine-glucose combination on mood and performance of an ecologically valid, computerised multi-tasking platform. Following a double-blind, placebo-controlled, randomised, parallel-groups design, 150 healthy adults (mean age 34.78 years) consumed drinks containing placebo, 25 g glucose, 60 g glucose or 60 g glucose with 40 mg caffeine. They completed a multi-tasking framework at baseline and then 30 min following drink consumption with mood assessments immediately before and after the multi-tasking framework. Blood glucose and salivary caffeine were co-monitored. The caffeine-glucose group had significantly better total multi-tasking scores than the placebo or 60 g glucose groups and were significantly faster at mental arithmetic tasks than either glucose drink group. There were no significant treatment effects on mood. Caffeine and glucose levels confirmed compliance with overnight abstinence/fasting, respectively, and followed the predicted post-drink patterns. These data suggest that co-administration of glucose and caffeine allows greater allocation of attentional resources than placebo or glucose alone. At present, we cannot rule out the possibility that the effects are due to caffeine alone Future studies should aim at disentangling caffeine and glucose effects. © 2014 The Authors. Human Psychopharmacology: Clinical and Experimental published by John Wiley & Sons, Ltd.
Scholey, Andrew; Savage, Karen; O'Neill, Barry V; Owen, Lauren; Stough, Con; Priestley, Caroline; Wetherell, Mark
2014-01-01
Background This study assessed the effects of two doses of glucose and a caffeine–glucose combination on mood and performance of an ecologically valid, computerised multi-tasking platform. Materials and methods Following a double-blind, placebo-controlled, randomised, parallel-groups design, 150 healthy adults (mean age 34.78 years) consumed drinks containing placebo, 25 g glucose, 60 g glucose or 60 g glucose with 40 mg caffeine. They completed a multi-tasking framework at baseline and then 30 min following drink consumption with mood assessments immediately before and after the multi-tasking framework. Blood glucose and salivary caffeine were co-monitored. Results The caffeine–glucose group had significantly better total multi-tasking scores than the placebo or 60 g glucose groups and were significantly faster at mental arithmetic tasks than either glucose drink group. There were no significant treatment effects on mood. Caffeine and glucose levels confirmed compliance with overnight abstinence/fasting, respectively, and followed the predicted post-drink patterns. Conclusion These data suggest that co-administration of glucose and caffeine allows greater allocation of attentional resources than placebo or glucose alone. At present, we cannot rule out the possibility that the effects are due to caffeine alone Future studies should aim at disentangling caffeine and glucose effects. PMID:25196040
Multi-gigabit optical interconnects for next-generation on-board digital equipment
NASA Astrophysics Data System (ADS)
Venet, Norbert; Favaro, Henri; Sotom, Michel; Maignan, Michel; Berthon, Jacques
2017-11-01
Parallel optical interconnects are experimentally assessed as a technology that may offer the high-throughput data communication capabilities required to the next-generation on-board digital processing units. An optical backplane interconnect was breadboarded, on the basis of a digital transparent processor that provides flexible connectivity and variable bandwidth in telecom missions with multi-beam antenna coverage. The unit selected for the demonstration required that more than tens of Gbit/s be supported by the backplane. The demonstration made use of commercial parallel optical link modules at 850 nm wavelength, with 12 channels running at up to 2.5 Gbit/s. A flexible optical fibre circuit was developed so as to route board-to-board connections. It was plugged to the optical transmitter and receiver modules through 12-fibre MPO connectors. BER below 10-14 and optical link budgets in excess of 12 dB were measured, which would enable to integrate broadcasting. Integration of the optical backplane interconnect was successfully demonstrated by validating the overall digital processor functionality.
Multi-gigabit optical interconnects for next-generation on-board digital equipment
NASA Astrophysics Data System (ADS)
Venet, Norbert; Favaro, Henri; Sotom, Michel; Maignan, Michel; Berthon, Jacques
2004-06-01
Parallel optical interconnects are experimentally assessed as a technology that may offer the high-throughput data communication capabilities required to the next-generation on-board digital processing units. An optical backplane interconnect was breadboarded, on the basis of a digital transparent processor that provides flexible connectivity and variable bandwidth in telecom missions with multi-beam antenna coverage. The unit selected for the demonstration required that more than tens of Gbit/s be supported by the backplane. The demonstration made use of commercial parallel optical link modules at 850 nm wavelength, with 12 channels running at up to 2.5 Gbit/s. A flexible optical fibre circuit was developed so as to route board-to-board connections. It was plugged to the optical transmitter and receiver modules through 12-fibre MPO connectors. BER below 10-14 and optical link budgets in excess of 12 dB were measured, which would enable to integrate broadcasting. Integration of the optical backplane interconnect was successfully demonstrated by validating the overall digital processor functionality.
Multi-task Gaussian process for imputing missing data in multi-trait and multi-environment trials.
Hori, Tomoaki; Montcho, David; Agbangla, Clement; Ebana, Kaworu; Futakuchi, Koichi; Iwata, Hiroyoshi
2016-11-01
A method based on a multi-task Gaussian process using self-measuring similarity gave increased accuracy for imputing missing phenotypic data in multi-trait and multi-environment trials. Multi-environmental trial (MET) data often encounter the problem of missing data. Accurate imputation of missing data makes subsequent analysis more effective and the results easier to understand. Moreover, accurate imputation may help to reduce the cost of phenotyping for thinned-out lines tested in METs. METs are generally performed for multiple traits that are correlated to each other. Correlation among traits can be useful information for imputation, but single-trait-based methods cannot utilize information shared by traits that are correlated. In this paper, we propose imputation methods based on a multi-task Gaussian process (MTGP) using self-measuring similarity kernels reflecting relationships among traits, genotypes, and environments. This framework allows us to use genetic correlation among multi-trait multi-environment data and also to combine MET data and marker genotype data. We compared the accuracy of three MTGP methods and iterative regularized PCA using rice MET data. Two scenarios for the generation of missing data at various missing rates were considered. The MTGP performed a better imputation accuracy than regularized PCA, especially at high missing rates. Under the 'uniform' scenario, in which missing data arise randomly, inclusion of marker genotype data in the imputation increased the imputation accuracy at high missing rates. Under the 'fiber' scenario, in which missing data arise in all traits for some combinations between genotypes and environments, the inclusion of marker genotype data decreased the imputation accuracy for most traits while increasing the accuracy in a few traits remarkably. The proposed methods will be useful for solving the missing data problem in MET data.
NASA Astrophysics Data System (ADS)
Hauth, T.; Innocente and, V.; Piparo, D.
2012-12-01
The processing of data acquired by the CMS detector at LHC is carried out with an object-oriented C++ software framework: CMSSW. With the increasing luminosity delivered by the LHC, the treatment of recorded data requires extraordinary large computing resources, also in terms of CPU usage. A possible solution to cope with this task is the exploitation of the features offered by the latest microprocessor architectures. Modern CPUs present several vector units, the capacity of which is growing steadily with the introduction of new processor generations. Moreover, an increasing number of cores per die is offered by the main vendors, even on consumer hardware. Most recent C++ compilers provide facilities to take advantage of such innovations, either by explicit statements in the programs sources or automatically adapting the generated machine instructions to the available hardware, without the need of modifying the existing code base. Programming techniques to implement reconstruction algorithms and optimised data structures are presented, that aim to scalable vectorization and parallelization of the calculations. One of their features is the usage of new language features of the C++11 standard. Portions of the CMSSW framework are illustrated which have been found to be especially profitable for the application of vectorization and multi-threading techniques. Specific utility components have been developed to help vectorization and parallelization. They can easily become part of a larger common library. To conclude, careful measurements are described, which show the execution speedups achieved via vectorised and multi-threaded code in the context of CMSSW.
MIMO signal progressing with RLSCMA algorithm for multi-mode multi-core optical transmission system
NASA Astrophysics Data System (ADS)
Bi, Yuan; Liu, Bo; Zhang, Li-jia; Xin, Xiang-jun; Zhang, Qi; Wang, Yong-jun; Tian, Qing-hua; Tian, Feng; Mao, Ya-ya
2018-01-01
In the process of transmitting signals of multi-mode multi-core fiber, there will be mode coupling between modes. The mode dispersion will also occur because each mode has different transmission speed in the link. Mode coupling and mode dispersion will cause damage to the useful signal in the transmission link, so the receiver needs to deal received signal with digital signal processing, and compensate the damage in the link. We first analyzes the influence of mode coupling and mode dispersion in the process of transmitting signals of multi-mode multi-core fiber, then presents the relationship between the coupling coefficient and dispersion coefficient. Then we carry out adaptive signal processing with MIMO equalizers based on recursive least squares constant modulus algorithm (RLSCMA). The MIMO equalization algorithm offers adaptive equalization taps according to the degree of crosstalk in cores or modes, which eliminates the interference among different modes and cores in space division multiplexing(SDM) transmission system. The simulation results show that the distorted signals are restored efficiently with fast convergence speed.
Two-dimensional systolic-array architecture for pixel-level vision tasks
NASA Astrophysics Data System (ADS)
Vijverberg, Julien A.; de With, Peter H. N.
2010-05-01
This paper presents ongoing work on the design of a two-dimensional (2D) systolic array for image processing. This component is designed to operate on a multi-processor system-on-chip. In contrast with other 2D systolic-array architectures and many other hardware accelerators, we investigate the applicability of executing multiple tasks in a time-interleaved fashion on the Systolic Array (SA). This leads to a lower external memory bandwidth and better load balancing of the tasks on the different processing tiles. To enable the interleaving of tasks, we add a shadow-state register for fast task switching. To reduce the number of accesses to the external memory, we propose to share the communication assist between consecutive tasks. A preliminary, non-functional version of the SA has been synthesized for an XV4S25 FPGA device and yields a maximum clock frequency of 150 MHz requiring 1,447 slices and 5 memory blocks. Mapping tasks from video content-analysis applications from literature on the SA yields reductions in the execution time of 1-2 orders of magnitude compared to the software implementation. We conclude that the choice for an SA architecture is useful, but a scaled version of the SA featuring less logic with fewer processing and pipeline stages yielding a lower clock frequency, would be sufficient for a video analysis system-on-chip.
Robust media processing on programmable power-constrained systems
NASA Astrophysics Data System (ADS)
McVeigh, Jeff
2005-03-01
To achieve consumer-level quality, media systems must process continuous streams of audio and video data while maintaining exacting tolerances on sampling rate, jitter, synchronization, and latency. While it is relatively straightforward to design fixed-function hardware implementations to satisfy worst-case conditions, there is a growing trend to utilize programmable multi-tasking solutions for media applications. The flexibility of these systems enables support for multiple current and future media formats, which can reduce design costs and time-to-market. This paper provides practical engineering solutions to achieve robust media processing on such systems, with specific attention given to power-constrained platforms. The techniques covered in this article utilize the fundamental concepts of algorithm and software optimization, software/hardware partitioning, stream buffering, hierarchical prioritization, and system resource and power management. A novel enhancement to dynamically adjust processor voltage and frequency based on buffer fullness to reduce system power consumption is examined in detail. The application of these techniques is provided in a case study of a portable video player implementation based on a general-purpose processor running a non real-time operating system that achieves robust playback of synchronized H.264 video and MP3 audio from local storage and streaming over 802.11.
Vascular system modeling in parallel environment - distributed and shared memory approaches
Jurczuk, Krzysztof; Kretowski, Marek; Bezy-Wendling, Johanne
2011-01-01
The paper presents two approaches in parallel modeling of vascular system development in internal organs. In the first approach, new parts of tissue are distributed among processors and each processor is responsible for perfusing its assigned parts of tissue to all vascular trees. Communication between processors is accomplished by passing messages and therefore this algorithm is perfectly suited for distributed memory architectures. The second approach is designed for shared memory machines. It parallelizes the perfusion process during which individual processing units perform calculations concerning different vascular trees. The experimental results, performed on a computing cluster and multi-core machines, show that both algorithms provide a significant speedup. PMID:21550891
A hybrid algorithm for parallel molecular dynamics simulations
NASA Astrophysics Data System (ADS)
Mangiardi, Chris M.; Meyer, R.
2017-10-01
This article describes algorithms for the hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-range forces. The parallelization method combines domain decomposition with a thread-based parallelization approach. The goal of the work is to enable efficient simulations of very large (tens of millions of atoms) and inhomogeneous systems on many-core processors with hundreds or thousands of cores and SIMD units with large vector sizes. In order to test the efficiency of the method, simulations of a variety of configurations with up to 74 million atoms have been performed. Results are shown that were obtained on multi-core systems with Sandy Bridge and Haswell processors as well as systems with Xeon Phi many-core processors.
MILC Code Performance on High End CPU and GPU Supercomputer Clusters
NASA Astrophysics Data System (ADS)
DeTar, Carleton; Gottlieb, Steven; Li, Ruizi; Toussaint, Doug
2018-03-01
With recent developments in parallel supercomputing architecture, many core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors starting with NVIDIA GPUs, and more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.
Research and development of a control system for multi axis cooperative motion based on PMAC
NASA Astrophysics Data System (ADS)
Guo, Xiao-xiao; Dong, Deng-feng; Zhou, Wei-hu
2017-10-01
Based on Programmable Multi-axes Controller (PMAC), a design of a multi axis motion control system for the simulator of spatial targets' dynamic optical properties is proposed. According to analysis the properties of spatial targets' simulator motion control system, using IPC as the main control layer, TurboPMAC2 as the control layer to meet coordinated motion control, data acquisition and analog output. A simulator using 5 servomotors which is connected with speed reducers to drive the output axis was implemented to simulate the motion of both the sun and the space target. Based on PMAC using PID and a notch filter algorithm, negative feedback, the speed and acceleration feed forward algorithm to satisfy the axis' requirements of the good stability and high precision at low speeds. In the actual system, it shows that the velocity precision is higher than 0.04 s ° and the precision of repetitive positioning is better than 0.006° when each axis is at a low-speed. Besides, the system achieves the control function of multi axis coordinated motion. The design provides an important technical support for detecting spatial targets, also promoting the theoretical research.
Mochizuki, Futa; Kagawa, Keiichiro; Okihara, Shin-ichiro; Seo, Min-Woong; Zhang, Bo; Takasawa, Taishi; Yasutomi, Keita; Kawahito, Shoji
2016-02-22
In the work described in this paper, an image reproduction scheme with an ultra-high-speed temporally compressive multi-aperture CMOS image sensor was demonstrated. The sensor captures an object by compressing a sequence of images with focal-plane temporally random-coded shutters, followed by reconstruction of time-resolved images. Because signals are modulated pixel-by-pixel during capturing, the maximum frame rate is defined only by the charge transfer speed and can thus be higher than those of conventional ultra-high-speed cameras. The frame rate and optical efficiency of the multi-aperture scheme are discussed. To demonstrate the proposed imaging method, a 5×3 multi-aperture image sensor was fabricated. The average rising and falling times of the shutters were 1.53 ns and 1.69 ns, respectively. The maximum skew among the shutters was 3 ns. The sensor observed plasma emission by compressing it to 15 frames, and a series of 32 images at 200 Mfps was reconstructed. In the experiment, by correcting disparities and considering temporal pixel responses, artifacts in the reconstructed images were reduced. An improvement in PSNR from 25.8 dB to 30.8 dB was confirmed in simulations.
Multi-kilowatt modularized spacecraft power processing system development
NASA Technical Reports Server (NTRS)
Andrews, R. E.; Hayden, J. H.; Hedges, R. T.; Rehmann, D. W.
1975-01-01
A review of existing information pertaining to spacecraft power processing systems and equipment was accomplished with a view towards applicability to the modularization of multi-kilowatt power processors. Power requirements for future spacecraft were determined from the NASA mission model-shuttle systems payload data study which provided the limits for modular power equipment capabilities. Three power processing systems were compared to evaluation criteria to select the system best suited for modularity. The shunt regulated direct energy transfer system was selected by this analysis for a conceptual design effort which produced equipment specifications, schematics, envelope drawings, and power module configurations.
Scalability and Portability of Two Parallel Implementations of ADI
NASA Technical Reports Server (NTRS)
Phung, Thanh; VanderWijngaart, Rob F.
1994-01-01
Two domain decompositions for the implementation of the NAS Scalar Penta-diagonal Parallel Benchmark on MIMD systems are investigated, namely transposition and multi-partitioning. Hardware platforms considered are the Intel iPSC/860 and Paragon XP/S-15, and clusters of SGI workstations on ethernet, communicating through PVM. It is found that the multi-partitioning strategy offers the kind of coarse granularity that allows scaling up to hundreds of processors on a massively parallel machine. Moreover, efficiency is retained when the code is ported verbatim (save message passing syntax) to a PVM environment on a modest size cluster of workstations.
Design and implementation of self-balancing coaxial two wheel robot based on HSIC
NASA Astrophysics Data System (ADS)
Hu, Tianlian; Zhang, Hua; Dai, Xin; Xia, Xianfeng; Liu, Ran; Qiu, Bo
2007-12-01
This thesis has studied the control problem concerning position and orientation control of self-balancing coaxial two wheel robot based on the human simulated intelligent control (HSIC) theory. Adopting Lagrange equation, the dynamic model of self-balancing coaxial two-wheel Robot is built up, and the Sensory-motor Intelligent Schemas (SMIS) of HSIC controller for the robot is designed by analyzing its movement and simulating the human controller. In robot's motion process, by perceiving position and orientation of the robot and using multi-mode control strategy based on characteristic identification, the HSIC controller enables the robot to control posture. Utilizing Matlab/Simulink, a simulation platform is established and a motion controller is designed and realized based on RT-Linux real-time operating system, employing high speed ARM9 processor S3C2440 as kernel of the motion controller. The effectiveness of the new design is testified by the experiment.
No scanning depth imaging system based on TOF
NASA Astrophysics Data System (ADS)
Sun, Rongchun; Piao, Yan; Wang, Yu; Liu, Shuo
2016-03-01
To quickly obtain a 3D model of real world objects, multi-point ranging is very important. However, the traditional measuring method usually adopts the principle of point by point or line by line measurement, which is too slow and of poor efficiency. In the paper, a no scanning depth imaging system based on TOF (time of flight) was proposed. The system is composed of light source circuit, special infrared image sensor module, processor and controller of image data, data cache circuit, communication circuit, and so on. According to the working principle of the TOF measurement, image sequence was collected by the high-speed CMOS sensor, and the distance information was obtained by identifying phase difference, and the amplitude image was also calculated. Experiments were conducted and the experimental results show that the depth imaging system can achieve no scanning depth imaging function with good performance.
NASA Technical Reports Server (NTRS)
Yan, Jerry C.
1987-01-01
In concurrent systems, a major responsibility of the resource management system is to decide how the application program is to be mapped onto the multi-processor. Instead of using abstract program and machine models, a generate-and-test framework known as 'post-game analysis' that is based on data gathered during program execution is proposed. Each iteration consists of (1) (a simulation of) an execution of the program; (2) analysis of the data gathered; and (3) the proposal of a new mapping that would have a smaller execution time. These heuristics are applied to predict execution time changes in response to small perturbations applied to the current mapping. An initial experiment was carried out using simple strategies on 'pipeline-like' applications. The results obtained from four simple strategies demonstrated that for this kind of application, even simple strategies can produce acceptable speed-up with a small number of iterations.
Mera, David; Cotos, José M; Varela-Pet, José; Garcia-Pineda, Oscar
2012-10-01
Satellite Synthetic Aperture Radar (SAR) has been established as a useful tool for detecting hydrocarbon spillage on the ocean's surface. Several surveillance applications have been developed based on this technology. Environmental variables such as wind speed should be taken into account for better SAR image segmentation. This paper presents an adaptive thresholding algorithm for detecting oil spills based on SAR data and a wind field estimation as well as its implementation as a part of a functional prototype. The algorithm was adapted to an important shipping route off the Galician coast (northwest Iberian Peninsula) and was developed on the basis of confirmed oil spills. Image testing revealed 99.93% pixel labelling accuracy. By taking advantage of multi-core processor architecture, the prototype was optimized to get a nearly 30% improvement in processing time. Copyright © 2012 Elsevier Ltd. All rights reserved.
A microprocessor based on a two-dimensional semiconductor.
Wachter, Stefan; Polyushkin, Dmitry K; Bethge, Ole; Mueller, Thomas
2017-04-11
The advent of microcomputers in the 1970s has dramatically changed our society. Since then, microprocessors have been made almost exclusively from silicon, but the ever-increasing demand for higher integration density and speed, lower power consumption and better integrability with everyday goods has prompted the search for alternatives. Germanium and III-V compound semiconductors are being considered promising candidates for future high-performance processor generations and chips based on thin-film plastic technology or carbon nanotubes could allow for embedding electronic intelligence into arbitrary objects for the Internet-of-Things. Here, we present a 1-bit implementation of a microprocessor using a two-dimensional semiconductor-molybdenum disulfide. The device can execute user-defined programs stored in an external memory, perform logical operations and communicate with its periphery. Our 1-bit design is readily scalable to multi-bit data. The device consists of 115 transistors and constitutes the most complex circuitry so far made from a two-dimensional material.
A microprocessor based on a two-dimensional semiconductor
NASA Astrophysics Data System (ADS)
Wachter, Stefan; Polyushkin, Dmitry K.; Bethge, Ole; Mueller, Thomas
2017-04-01
The advent of microcomputers in the 1970s has dramatically changed our society. Since then, microprocessors have been made almost exclusively from silicon, but the ever-increasing demand for higher integration density and speed, lower power consumption and better integrability with everyday goods has prompted the search for alternatives. Germanium and III-V compound semiconductors are being considered promising candidates for future high-performance processor generations and chips based on thin-film plastic technology or carbon nanotubes could allow for embedding electronic intelligence into arbitrary objects for the Internet-of-Things. Here, we present a 1-bit implementation of a microprocessor using a two-dimensional semiconductor--molybdenum disulfide. The device can execute user-defined programs stored in an external memory, perform logical operations and communicate with its periphery. Our 1-bit design is readily scalable to multi-bit data. The device consists of 115 transistors and constitutes the most complex circuitry so far made from a two-dimensional material.
Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB
NASA Technical Reports Server (NTRS)
Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.
2017-01-01
Demonstrating speedup for parallel code on a multicore shared memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit potential for improvement of serial code even for the so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated while only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.
Three Program Architecture for Design Optimization
NASA Technical Reports Server (NTRS)
Miura, Hirokazu; Olson, Lawrence E. (Technical Monitor)
1998-01-01
In this presentation, I would like to review historical perspective on the program architecture used to build design optimization capabilities based on mathematical programming and other numerical search techniques. It is rather straightforward to classify the program architecture in three categories as shown above. However, the relative importance of each of the three approaches has not been static, instead dynamically changing as the capabilities of available computational resource increases. For example, we considered that the direct coupling architecture would never be used for practical problems, but availability of such computer systems as multi-processor. In this presentation, I would like to review the roles of three architecture from historical as well as current and future perspective. There may also be some possibility for emergence of hybrid architecture. I hope to provide some seeds for active discussion where we are heading to in the very dynamic environment for high speed computing and communication.
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Shuangshuang; Chen, Yousu; Wu, Di
2015-12-09
Power system dynamic simulation computes the system response to a sequence of large disturbance, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operation. It consists of a large set of differential and algebraic equations, which is computational intensive and challenging to solve using single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on shared-memory platform, and Messagemore » Passing Interface (MPI) on distributed-memory clusters, respectively. The difference of the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.« less
Development of the SEASIS instrument for SEDSAT
NASA Technical Reports Server (NTRS)
Maier, Mark W.
1996-01-01
Two SEASIS experiment objectives are key: take images that allow three axis attitude determination and take multi-spectral images of the earth. During the tether mission it is also desirable to capture images for the recoiling tether from the endmass perspective (which has never been observed). SEASIS must store all its imagery taken during the tether mission until the earth downlink can be established. SEASIS determines attitude with a panoramic camera and performs earth observation with a telephoto lens camera. Camera video is digitized, compressed, and stored in solid state memory. These objectives are addressed through the following architectural choices: (1) A camera system using a Panoramic Annular Lens (PAL). This lens has a 360 deg. azimuthal field of view by a +45 degree vertical field measured from a plan normal to the lens boresight axis. It has been shown in Mr. Mark Steadham's UAH M.S. thesis that his camera can determine three axis attitude anytime the earth and one other recognizable celestial object (for example, the sun) is in the field of view. This will be essentially all the time during tether deployment. (2) A second camera system using telephoto lens and filter wheel. The camera is a black and white standard video camera. The filters are chosen to cover the visible spectral bands of remote sensing interest. (3) A processor and mass memory arrangement linked to the cameras. Video signals from the cameras are digitized, compressed in the processor, and stored in a large static RAM bank. The processor is a multi-chip module consisting of a T800 Transputer and three Zoran floating point Digital Signal Processors. This processor module was supplied under ARPA contract by the Space Computer Corporation to demonstrate its use in space.
An Updated Version of the U.S. Air Force Multi-Attribute Task Battery (AF-MATB)
2014-08-01
assessing human performance in a controlled multitask environment. The most recent release of AF-MATB contains numerous improvements and additions...Strategic Behavior, MATB, Multitasking , Task Battery, Simulator, Multi-Attribute Task Battery, Automation 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF...performance and multitasking strategy. As a result, a specific Information Throughput (IT) Mode was designed to customize the task to fit the Human
Technology transfer of military space microprocessor developments
NASA Astrophysics Data System (ADS)
Gorden, C.; King, D.; Byington, L.; Lanza, D.
1999-01-01
Over the past 13 years the Air Force Research Laboratory (AFRL) has led the development of microprocessors and computers for USAF space and strategic missile applications. As a result of these Air Force development programs, advanced computer technology is available for use by civil and commercial space customers as well. The Generic VHSIC Spaceborne Computer (GVSC) program began in 1985 at AFRL to fulfill a deficiency in the availability of space-qualified data and control processors. GVSC developed a radiation hardened multi-chip version of the 16-bit, Mil-Std 1750A microprocessor. The follow-on to GVSC, the Advanced Spaceborne Computer Module (ASCM) program, was initiated by AFRL to establish two industrial sources for complete, radiation-hardened 16-bit and 32-bit computers and microelectronic components. Development of the Control Processor Module (CPM), the first of two ASCM contract phases, concluded in 1994 with the availability of two sources for space-qualified, 16-bit Mil-Std-1750A computers, cards, multi-chip modules, and integrated circuits. The second phase of the program, the Advanced Technology Insertion Module (ATIM), was completed in December 1997. ATIM developed two single board computers based on 32-bit reduced instruction set computer (RISC) processors. GVSC, CPM, and ATIM technologies are flying or baselined into the majority of today's DoD, NASA, and commercial satellite systems.
Time series modeling of human operator dynamics in manual control tasks
NASA Technical Reports Server (NTRS)
Biezad, D. J.; Schmidt, D. K.
1984-01-01
A time-series technique is presented for identifying the dynamic characteristics of the human operator in manual control tasks from relatively short records of experimental data. Control of system excitation signals used in the identification is not required. The approach is a multi-channel identification technique for modeling multi-input/multi-output situations. The method presented includes statistical tests for validity, is designed for digital computation, and yields estimates for the frequency responses of the human operator. A comprehensive relative power analysis may also be performed for validated models. This method is applied to several sets of experimental data; the results are discussed and shown to compare favorably with previous research findings. New results are also presented for a multi-input task that has not been previously modeled to demonstrate the strengths of the method.
Time Series Modeling of Human Operator Dynamics in Manual Control Tasks
NASA Technical Reports Server (NTRS)
Biezad, D. J.; Schmidt, D. K.
1984-01-01
A time-series technique is presented for identifying the dynamic characteristics of the human operator in manual control tasks from relatively short records of experimental data. Control of system excitation signals used in the identification is not required. The approach is a multi-channel identification technique for modeling multi-input/multi-output situations. The method presented includes statistical tests for validity, is designed for digital computation, and yields estimates for the frequency response of the human operator. A comprehensive relative power analysis may also be performed for validated models. This method is applied to several sets of experimental data; the results are discussed and shown to compare favorably with previous research findings. New results are also presented for a multi-input task that was previously modeled to demonstrate the strengths of the method.
NASA Astrophysics Data System (ADS)
Panfil, Wawrzyniec; Moczulski, Wojciech
2017-10-01
In the paper presented is a control system of a mobile robots group intended for carrying out inspection missions. The main research problem was to define such a control system in order to facilitate a cooperation of the robots resulting in realization of the committed inspection tasks. Many of the well-known control systems use auctions for tasks allocation, where a subject of an auction is a task to be allocated. It seems that in the case of missions characterized by much larger number of tasks than number of robots it will be better if robots (instead of tasks) are subjects of auctions. The second identified problem concerns the one-sided robot-to-task fitness evaluation. Simultaneous assessment of the robot-to-task fitness and task attractiveness for robot should affect positively for the overall effectiveness of the multi-robot system performance. The elaborated system allows to assign tasks to robots using various methods for evaluation of fitness between robots and tasks, and using some tasks allocation methods. There is proposed the method for multi-criteria analysis, which is composed of two assessments, i.e. robot's concurrency position for task among other robots and task's attractiveness for robot among other tasks. Furthermore, there are proposed methods for tasks allocation applying the mentioned multi-criteria analysis method. The verification of both the elaborated system and the proposed tasks' allocation methods was carried out with the help of simulated experiments. The object under test was a group of inspection mobile robots being a virtual counterpart of the real mobile-robot group.
Li, Lian-Hui; Mo, Rong
2015-01-01
The production task queue has a great significance for manufacturing resource allocation and scheduling decision. Man-made qualitative queue optimization method has a poor effect and makes the application difficult. A production task queue optimization method is proposed based on multi-attribute evaluation. According to the task attributes, the hierarchical multi-attribute model is established and the indicator quantization methods are given. To calculate the objective indicator weight, criteria importance through intercriteria correlation (CRITIC) is selected from three usual methods. To calculate the subjective indicator weight, BP neural network is used to determine the judge importance degree, and then the trapezoid fuzzy scale-rough AHP considering the judge importance degree is put forward. The balanced weight, which integrates the objective weight and the subjective weight, is calculated base on multi-weight contribution balance model. The technique for order preference by similarity to an ideal solution (TOPSIS) improved by replacing Euclidean distance with relative entropy distance is used to sequence the tasks and optimize the queue by the weighted indicator value. A case study is given to illustrate its correctness and feasibility.
Li, Lian-hui; Mo, Rong
2015-01-01
The production task queue has a great significance for manufacturing resource allocation and scheduling decision. Man-made qualitative queue optimization method has a poor effect and makes the application difficult. A production task queue optimization method is proposed based on multi-attribute evaluation. According to the task attributes, the hierarchical multi-attribute model is established and the indicator quantization methods are given. To calculate the objective indicator weight, criteria importance through intercriteria correlation (CRITIC) is selected from three usual methods. To calculate the subjective indicator weight, BP neural network is used to determine the judge importance degree, and then the trapezoid fuzzy scale-rough AHP considering the judge importance degree is put forward. The balanced weight, which integrates the objective weight and the subjective weight, is calculated base on multi-weight contribution balance model. The technique for order preference by similarity to an ideal solution (TOPSIS) improved by replacing Euclidean distance with relative entropy distance is used to sequence the tasks and optimize the queue by the weighted indicator value. A case study is given to illustrate its correctness and feasibility. PMID:26414758
Parallel, stochastic measurement of molecular surface area.
Juba, Derek; Varshney, Amitabh
2008-08-01
Biochemists often wish to compute surface areas of proteins. A variety of algorithms have been developed for this task, but they are designed for traditional single-processor architectures. The current trend in computer hardware is towards increasingly parallel architectures for which these algorithms are not well suited. We describe a parallel, stochastic algorithm for molecular surface area computation that maps well to the emerging multi-core architectures. Our algorithm is also progressive, providing a rough estimate of surface area immediately and refining this estimate as time goes on. Furthermore, the algorithm generates points on the molecular surface which can be used for point-based rendering. We demonstrate a GPU implementation of our algorithm and show that it compares favorably with several existing molecular surface computation programs, giving fast estimates of the molecular surface area with good accuracy.
HyperForest: A high performance multi-processor architecture for real-time intelligent systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garcia, P. Jr.; Rebeil, J.P.; Pollard, H.
1997-04-01
Intelligent Systems are characterized by the intensive use of computer power. The computer revolution of the last few years is what has made possible the development of the first generation of Intelligent Systems. Software for second generation Intelligent Systems will be more complex and will require more powerful computing engines in order to meet real-time constraints imposed by new robots, sensors, and applications. A multiprocessor architecture was developed that merges the advantages of message-passing and shared-memory structures: expendability and real-time compliance. The HyperForest architecture will provide an expandable real-time computing platform for computationally intensive Intelligent Systems and open the doorsmore » for the application of these systems to more complex tasks in environmental restoration and cleanup projects, flexible manufacturing systems, and DOE`s own production and disassembly activities.« less
An SVM-based solution for fault detection in wind turbines.
Santos, Pedro; Villa, Luisa F; Reñones, Aníbal; Bustillo, Andres; Maudes, Jesús
2015-03-09
Research into fault diagnosis in machines with a wide range of variable loads and speeds, such as wind turbines, is of great industrial interest. Analysis of the power signals emitted by wind turbines for the diagnosis of mechanical faults in their mechanical transmission chain is insufficient. A successful diagnosis requires the inclusion of accelerometers to evaluate vibrations. This work presents a multi-sensory system for fault diagnosis in wind turbines, combined with a data-mining solution for the classification of the operational state of the turbine. The selected sensors are accelerometers, in which vibration signals are processed using angular resampling techniques and electrical, torque and speed measurements. Support vector machines (SVMs) are selected for the classification task, including two traditional and two promising new kernels. This multi-sensory system has been validated on a test-bed that simulates the real conditions of wind turbines with two fault typologies: misalignment and imbalance. Comparison of SVM performance with the results of artificial neural networks (ANNs) shows that linear kernel SVM outperforms other kernels and ANNs in terms of accuracy, training and tuning times. The suitability and superior performance of linear SVM is also experimentally analyzed, to conclude that this data acquisition technique generates linearly separable datasets.
NASA Astrophysics Data System (ADS)
Gance, Julien; Texier, Benoît; Leite, Orlando; Bernard, Jean; Truffert, Catherine; Lebert, François; Yamashita, Yoshihiro
2016-04-01
Electrical resistivity tomography (ERT) is an adapted tool for the monitoring of soil moisture variations in aquifers (Binley et al., 2015). Nevertheless, in some specific cases, like for highly permeable soils or fractured aquifers, the measurements from the device can be slower than the water flow through the entire investigated zone. Therefore, the monitoring of such phenomena cannot be performed with classical devices. In such cases, we require a high-speed measurement of soils resistivity. Since 20 years, the speed of acquisition of the resistivity meters has been improved by the development of multi-channel devices allowing to perform multi-electrode (> 4) measurements. The switching capabilities of the actual devices allow to measure over long profiles up to hundreds of electrodes only using one transmitter. Based on this multi-receiver technology and on previous work from Yamashita et al. (2013), authors have developed a 250 W multi-transmitter device for the high speed measurement of resistivity and induced polarization. Current is therefore injected simultaneously in the soil through six injection electrodes. The injected current is coded for each transmitter using Code Division Multiple Access (CDMA, Yamashita et al., 2014) so that the different voltages induced by each sources can be reconstructed from the total potential measurement signal at each receiver, allowing to save acquisition time. The first operational prototype features 3 transmitters and 6 receivers. Its performances are compared to a mono-transmitter device for different sequences of acquisition in 2D and 3D configurations both in theory and on real field data acquired on a shallow sedimentary aquifer in the Loire valley in France. This device is promising for the accurate monitoring of rapid water flows in heterogeneous aquifers.
Multi-Task Learning with Low Rank Attribute Embedding for Multi-Camera Person Re-Identification.
Su, Chi; Yang, Fan; Zhang, Shiliang; Tian, Qi; Davis, Larry Steven; Gao, Wen
2018-05-01
We propose Multi-Task Learning with Low Rank Attribute Embedding (MTL-LORAE) to address the problem of person re-identification on multi-cameras. Re-identifications on different cameras are considered as related tasks, which allows the shared information among different tasks to be explored to improve the re-identification accuracy. The MTL-LORAE framework integrates low-level features with mid-level attributes as the descriptions for persons. To improve the accuracy of such description, we introduce the low-rank attribute embedding, which maps original binary attributes into a continuous space utilizing the correlative relationship between each pair of attributes. In this way, inaccurate attributes are rectified and missing attributes are recovered. The resulting objective function is constructed with an attribute embedding error and a quadratic loss concerning class labels. It is solved by an alternating optimization strategy. The proposed MTL-LORAE is tested on four datasets and is validated to outperform the existing methods with significant margins.
Development of an extensible dual-core wireless sensing node for cyber-physical systems
NASA Astrophysics Data System (ADS)
Kane, Michael; Zhu, Dapeng; Hirose, Mitsuhito; Dong, Xinjun; Winter, Benjamin; Häckell, Mortiz; Lynch, Jerome P.; Wang, Yang; Swartz, A.
2014-04-01
The introduction of wireless telemetry into the design of monitoring and control systems has been shown to reduce system costs while simplifying installations. To date, wireless nodes proposed for sensing and actuation in cyberphysical systems have been designed using microcontrollers with one computational pipeline (i.e., single-core microcontrollers). While concurrent code execution can be implemented on single-core microcontrollers, concurrency is emulated by splitting the pipeline's resources to support multiple threads of code execution. For many applications, this approach to multi-threading is acceptable in terms of speed and function. However, some applications such as feedback controls demand deterministic timing of code execution and maximum computational throughput. For these applications, the adoption of multi-core processor architectures represents one effective solution. Multi-core microcontrollers have multiple computational pipelines that can execute embedded code in parallel and can be interrupted independent of one another. In this study, a new wireless platform named Martlet is introduced with a dual-core microcontroller adopted in its design. The dual-core microcontroller design allows Martlet to dedicate one core to standard wireless sensor operations while the other core is reserved for embedded data processing and real-time feedback control law execution. Another distinct feature of Martlet is a standardized hardware interface that allows specialized daughter boards (termed wing boards) to be interfaced to the Martlet baseboard. This extensibility opens opportunity to encapsulate specialized sensing and actuation functions in a wing board without altering the design of Martlet. In addition to describing the design of Martlet, a few example wings are detailed, along with experiments showing the Martlet's ability to monitor and control physical systems such as wind turbines and buildings.
NASA Astrophysics Data System (ADS)
Ning, Mengmeng; Che, Hang; Kong, Weizhong; Wang, Peng; Liu, Bingxiao; Xu, Zhengdong; Wang, Xiaochao; Long, Changjun; Zhang, Bin; Wu, Youmei
2017-12-01
The physical characteristics of Xiliu 10 Block reservoir is poor, it has strong reservoir inhomogeneity between layers and high kaolinite content of the reservoir, the scaling trend of fluid is serious, causing high block injection well pressure and difficulty in achieving injection requirements. In the past acidizing process, the reaction speed with mineral is fast, the effective distance is shorter and It is also easier to lead to secondary sedimentation in conventional mud acid system. On this point, we raised multi-hydrogen acid technology, multi-hydrogen acid release hydrogen ions by multistage ionization which could react with pore blockage, fillings and skeletal effects with less secondary pollution. Multi-hydrogen acid system has advantages as moderate speed, deep penetration, clay low corrosion rate, wet water and restrains precipitation, etc. It can reach the goal of plug removal in deep stratum. The field application result shows that multi-hydrogen acid plug removal method has good effects on application in low permeability reservoir in Block Xiliu 10.
Field Validation of the Stability Limit of a Multi MW Turbine
NASA Astrophysics Data System (ADS)
Kallesøe, Bjarne S.; Kragh, Knud A.
2016-09-01
Long slender blades of modern multi-megawatt turbines exhibit a flutter like instability at rotor speeds above a critical rotor speed. Knowing the critical rotor speed is crucial to a safe turbine design. The flutter like instability can only be estimated using geometrically non-linear aeroelastic codes. In this study, the estimated rotor speed stability limit of a 7 MW state of the art wind turbine is validated experimentally. The stability limit is estimated using Siemens Wind Powers in-house aeroelastic code, and the results show that the predicted stability limit is within 5% of the experimentally observed limit.
Concepts for Variable/Multi-Speed Rotorcraft Drive System
NASA Technical Reports Server (NTRS)
Stevens, Mark A.; Handschuh, Robert F.; Lewicki, David G.
2008-01-01
In several recent studies and on-going developments for advanced rotorcraft, the need for variable or multi-speed capable rotors has been raised. A speed change of up to 50 percent has been proposed for future rotorcraft to improve overall vehicle performance. Accomplishing rotor speed changes during operation requires both a rotor that can perform effectively over the operation speed/load range, and a propulsion system that can enable these speed changes. A study has been completed to investigate possible drive system arrangements that can accommodate up to the 50 percent speed change. Several concepts will be presented and evaluated. The most promising configurations will be identified and developed for future testing in a sub-scaled test facility to validate operational capability.
Upper Mantle Shear Wave Structure Beneath North America From Multi-mode Surface Wave Tomography
NASA Astrophysics Data System (ADS)
Yoshizawa, K.; Ekström, G.
2008-12-01
The upper mantle structure beneath the North American continent has been investigated from measurements of multi-mode phase speeds of Love and Rayleigh waves. To estimate fundamental-mode and higher-mode phase speeds of surface waves from a single seismogram at regional distances, we have employed a method of nonlinear waveform fitting based on a direct model-parameter search using the neighbourhood algorithm (Yoshizawa & Kennett, 2002). The method of the waveform analysis has been fully automated by employing empirical quantitative measures for evaluating the accuracy/reliability of estimated multi-mode phase dispersion curves, and thus it is helpful in processing the dramatically increasing numbers of seismic data from the latest regional networks such as USArray. As a first step toward modeling the regional anisotropic shear-wave velocity structure of the North American upper mantle with extended vertical resolution, we have applied the method to long-period three-component records of seismic stations in North America, which mostly comprise the GSN and US regional networks as well as the permanent and transportable USArray stations distributed by the IRIS DMC. Preliminary multi-mode phase-speed models show large-scale patterns of isotropic heterogeneity, such as a strong velocity contrast between the western and central/eastern United States, which are consistent with the recent global and regional models (e.g., Marone, et al. 2007; Nettles & Dziewonski, 2008). We will also discuss radial anisotropy of shear wave speed beneath North America from multi-mode dispersion measurements of Love and Rayleigh waves.
A microcomputer based frequency-domain processor for laser Doppler anemometry
NASA Technical Reports Server (NTRS)
Horne, W. Clifton; Adair, Desmond
1988-01-01
A prototype multi-channel laser Doppler anemometry (LDA) processor was assembled using a wideband transient recorder and a microcomputer with an array processor for fast Fourier transform (FFT) computations. The prototype instrument was used to acquire, process, and record signals from a three-component wind tunnel LDA system subject to various conditions of noise and flow turbulence. The recorded data was used to evaluate the effectiveness of burst acceptance criteria, processing algorithms, and selection of processing parameters such as record length. The recorded signals were also used to obtain comparative estimates of signal-to-noise ratio between time-domain and frequency-domain signal detection schemes. These comparisons show that the FFT processing scheme allows accurate processing of signals for which the signal-to-noise ratio is 10 to 15 dB less than is practical using counter processors.
Multi-Hop Link Capacity of Multi-Route Multi-Hop MRC Diversity for a Virtual Cellular Network
NASA Astrophysics Data System (ADS)
Daou, Imane; Kudoh, Eisuke; Adachi, Fumiyuki
In virtual cellular network (VCN), proposed for high-speed mobile communications, the signal transmitted from a mobile terminal is received by some wireless ports distributed in each virtual cell and relayed to the central port that acts as a gateway to the core network. In this paper, we apply the multi-route MHMRC diversity in order to decrease the transmit power and increase the multi-hop link capacity. The transmit power, the interference power and the link capacity are evaluated for DS-CDMA multi-hop VCN by computer simulation. The multi-route MHMRC diversity can be applied to not only DS-CDMA but also other access schemes (i. e. MC-CDMA, OFDM, etc.).
HD-MTL: Hierarchical Deep Multi-Task Learning for Large-Scale Visual Recognition.
Fan, Jianping; Zhao, Tianyi; Kuang, Zhenzhong; Zheng, Yu; Zhang, Ji; Yu, Jun; Peng, Jinye
2017-02-09
In this paper, a hierarchical deep multi-task learning (HD-MTL) algorithm is developed to support large-scale visual recognition (e.g., recognizing thousands or even tens of thousands of atomic object classes automatically). First, multiple sets of multi-level deep features are extracted from different layers of deep convolutional neural networks (deep CNNs), and they are used to achieve more effective accomplishment of the coarseto- fine tasks for hierarchical visual recognition. A visual tree is then learned by assigning the visually-similar atomic object classes with similar learning complexities into the same group, which can provide a good environment for determining the interrelated learning tasks automatically. By leveraging the inter-task relatedness (inter-class similarities) to learn more discriminative group-specific deep representations, our deep multi-task learning algorithm can train more discriminative node classifiers for distinguishing the visually-similar atomic object classes effectively. Our hierarchical deep multi-task learning (HD-MTL) algorithm can integrate two discriminative regularization terms to control the inter-level error propagation effectively, and it can provide an end-to-end approach for jointly learning more representative deep CNNs (for image representation) and more discriminative tree classifier (for large-scale visual recognition) and updating them simultaneously. Our incremental deep learning algorithms can effectively adapt both the deep CNNs and the tree classifier to the new training images and the new object classes. Our experimental results have demonstrated that our HD-MTL algorithm can achieve very competitive results on improving the accuracy rates for large-scale visual recognition.
Burke, Marcus G.; Fonck, Raymond J.; Bongard, Michael W.; ...
2016-07-18
This article corrects an error in M.G. Burke et al., 'Multi-point, high-speed passive ion velocity distribution diagnostic on the Pegasus Toroidal Experiment,' Rev. Sci. Instrum. 83, 10D516 (2012) pertaining to ion temperature. The conclusions of this paper are not altered by the revised ion temperature measurements.
Multi-robot task allocation based on two dimensional artificial fish swarm algorithm
NASA Astrophysics Data System (ADS)
Zheng, Taixiong; Li, Xueqin; Yang, Liangyi
2007-12-01
The problem of task allocation for multiple robots is to allocate more relative-tasks to less relative-robots so as to minimize the processing time of these tasks. In order to get optimal multi-robot task allocation scheme, a twodimensional artificial swarm algorithm based approach is proposed in this paper. In this approach, the normal artificial fish is extended to be two dimension artificial fish. In the two dimension artificial fish, each vector of primary artificial fish is extended to be an m-dimensional vector. Thus, each vector can express a group of tasks. By redefining the distance between artificial fish and the center of artificial fish, the behavior of two dimension fish is designed and the task allocation algorithm based on two dimension artificial swarm algorithm is put forward. At last, the proposed algorithm is applied to the problem of multi-robot task allocation and comparer with GA and SA based algorithm is done. Simulation and compare result shows the proposed algorithm is effective.
Multi-object detection and tracking technology based on hexagonal opto-electronic detector
NASA Astrophysics Data System (ADS)
Song, Yong; Hao, Qun; Li, Xiang
2008-02-01
A novel multi-object detection and tracking technology based on hexagonal opto-electronic detector is proposed, in which (1) a new hexagonal detector, which is composed of 6 linear CCDs, has been firstly developed to achieve the field of view of 360 degree, (2) to achieve the detection and tracking of multi-object with high speed, the object recognition criterions of Object Signal Width Criterion (OSWC) and Horizontal Scale Ratio Criterion (HSRC) are proposed. In this paper, Simulated Experiments have been carried out to verify the validity of the proposed technology, which show that the detection and tracking of multi-object can be achieved with high speed by using the proposed hexagonal detector and the criterions of OSWC and HSRC, indicating that the technology offers significant advantages in Photo-electric Detection, Computer Vision, Virtual Reality, Augment Reality, etc.
Status of Multi-beam Long Trace-profiler Development
NASA Technical Reports Server (NTRS)
Gubarev, Mikhail V.; Merthe, Daniel J.; Kilaru, Kiranmayee; Kester, Thomas; Ramsey, Brian; McKinney, Wayne R.; Takacs, Peter Z.; Dahir, A.; Yashchuk, Valeriy V.
2013-01-01
The multi-beam long trace profiler (MB-LTP) is under development at NASA's Marshall Space Flight Center. The traditional LTPs scans the surface under the test by a single laser beam directly measuring the surface figure slope errors. While capable of exceptional surface slope accuracy, the LTP single beam scanning has slow measuring speed. Metrology efficiency can be increased by replacing the single laser beam with multiple beams that can scan a section of the test surface at a single instance. The increase in speed with such a system would be almost proportional to the number of laser beams. The progress for a multi-beam long trace profiler development is presented.
Efficient system interrupt concept design at the microprogramming level
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fakharzadeh, M.M.
1989-01-01
Over the past decade the demand for high speed super microcomputers has been tremendously increased. To satisfy this demand many high speed 32-bit microcomputers have been designed. However, the currently available 32-bit systems do not provide an adequate solution to many highly demanding problems such as in multitasking, and in interrupt driven applications, which both require context switching. Systems for these purposes usually incorporate sophisticated software. In order to be efficient, a high end microprocessor based system must satisfy stringent software demands. Although these microprocessors use the latest technology in the fabrication design and run at a very high speed,more » they still suffer from insufficient hardware support for such applications. All too often, this lack also is the premier cause of execution inefficiency. In this dissertation a micro-programmable control unit and operation unit is considered in an advanced design. An automaton controller is designed for high speed micro-level interrupt handling. Different stack models are designed for the single task and multitasking environment. The stacks are used for storage of various components of the processor during the interrupt calls, procedure calls, and task switching. A universal (as an example seven port) register file is designed for high speed parameter passing, and intertask communication in the multitasking environment. In addition, the register file provides a direct path between ALU and the peripheral data which is important in real-time control applications. The overall system is a highly parallel architecture, with no pipeline and internal cache memory, which allows the designer to be able to predict the processor's behavior during the critical times.« less
The Experimental Mathematician: The Pleasure of Discovery and the Role of Proof
ERIC Educational Resources Information Center
Borwein, Jonathan M.
2005-01-01
The emergence of powerful mathematical computing environments, the growing availability of correspondingly powerful (multi-processor) computers and the pervasive presence of the Internet allow for mathematicians, students and teachers, to proceed heuristically and "quasi-inductively." We may increasingly use symbolic and numeric computation,…
Multi-task learning for cross-platform siRNA efficacy prediction: an in-silico study
2010-01-01
Background Gene silencing using exogenous small interfering RNAs (siRNAs) is now a widespread molecular tool for gene functional study and new-drug target identification. The key mechanism in this technique is to design efficient siRNAs that incorporated into the RNA-induced silencing complexes (RISC) to bind and interact with the mRNA targets to repress their translations to proteins. Although considerable progress has been made in the computational analysis of siRNA binding efficacy, few joint analysis of different RNAi experiments conducted under different experimental scenarios has been done in research so far, while the joint analysis is an important issue in cross-platform siRNA efficacy prediction. A collective analysis of RNAi mechanisms for different datasets and experimental conditions can often provide new clues on the design of potent siRNAs. Results An elegant multi-task learning paradigm for cross-platform siRNA efficacy prediction is proposed. Experimental studies were performed on a large dataset of siRNA sequences which encompass several RNAi experiments recently conducted by different research groups. By using our multi-task learning method, the synergy among different experiments is exploited and an efficient multi-task predictor for siRNA efficacy prediction is obtained. The 19 most popular biological features for siRNA according to their jointly importance in multi-task learning were ranked. Furthermore, the hypothesis is validated out that the siRNA binding efficacy on different messenger RNAs(mRNAs) have different conditional distribution, thus the multi-task learning can be conducted by viewing tasks at an "mRNA"-level rather than at the "experiment"-level. Such distribution diversity derived from siRNAs bound to different mRNAs help indicate that the properties of target mRNA have important implications on the siRNA binding efficacy. Conclusions The knowledge gained from our study provides useful insights on how to analyze various cross-platform RNAi data for uncovering of their complex mechanism. PMID:20380733
Multi-task learning for cross-platform siRNA efficacy prediction: an in-silico study.
Liu, Qi; Xu, Qian; Zheng, Vincent W; Xue, Hong; Cao, Zhiwei; Yang, Qiang
2010-04-10
Gene silencing using exogenous small interfering RNAs (siRNAs) is now a widespread molecular tool for gene functional study and new-drug target identification. The key mechanism in this technique is to design efficient siRNAs that incorporated into the RNA-induced silencing complexes (RISC) to bind and interact with the mRNA targets to repress their translations to proteins. Although considerable progress has been made in the computational analysis of siRNA binding efficacy, few joint analysis of different RNAi experiments conducted under different experimental scenarios has been done in research so far, while the joint analysis is an important issue in cross-platform siRNA efficacy prediction. A collective analysis of RNAi mechanisms for different datasets and experimental conditions can often provide new clues on the design of potent siRNAs. An elegant multi-task learning paradigm for cross-platform siRNA efficacy prediction is proposed. Experimental studies were performed on a large dataset of siRNA sequences which encompass several RNAi experiments recently conducted by different research groups. By using our multi-task learning method, the synergy among different experiments is exploited and an efficient multi-task predictor for siRNA efficacy prediction is obtained. The 19 most popular biological features for siRNA according to their jointly importance in multi-task learning were ranked. Furthermore, the hypothesis is validated out that the siRNA binding efficacy on different messenger RNAs(mRNAs) have different conditional distribution, thus the multi-task learning can be conducted by viewing tasks at an "mRNA"-level rather than at the "experiment"-level. Such distribution diversity derived from siRNAs bound to different mRNAs help indicate that the properties of target mRNA have important implications on the siRNA binding efficacy. The knowledge gained from our study provides useful insights on how to analyze various cross-platform RNAi data for uncovering of their complex mechanism.
Zhao, Heng; Song, Pengfei; Meixner, Duane D; Kinnick, Randall R; Callstrom, Matthew R; Sanchez, William; Urban, Matthew W; Manduca, Armando; Greenleaf, James F; Chen, Shigao
2014-11-01
Shear wave speed can be used to assess tissue elasticity, which is associated with tissue health. Ultrasound shear wave elastography techniques based on measuring the propagation speed of the shear waves induced by acoustic radiation force are becoming promising alternatives to biopsy in liver fibrosis staging. However, shear waves generated by such methods are typically very weak. Therefore, the penetration may become problematic, especially for overweight or obese patients. In this study, we developed a new method called external vibration multi-directional ultrasound shearwave elastography (EVMUSE), in which external vibration from a loudspeaker was used to generate a multi-directional shear wave field. A directional filter was then applied to separate the complex shear wave field into several shear wave fields propagating in different directions. A 2-D shear wave speed map was reconstructed from each individual shear wave field, and a final 2-D shear wave speed map was constructed by compounding these individual wave speed maps. The method was validated using two homogeneous phantoms and one multi-purpose tissue-mimicking phantom. Ten patients undergoing liver magnetic resonance elastography (MRE) were also studied with EVMUSE to compare results between the two methods. Phantom results showed EVMUSE was able to quantify tissue elasticity accurately with good penetration. In vivo EVMUSE results were well correlated with MRE results, indicating the promise of using EVMUSE for liver fibrosis staging.
Zhao, Heng; Song, Pengfei; Meixner, Duane D.; Kinnick, Randall R.; Callstrom, Matthew R.; Sanchez, William; Urban, Matthew W.; Manduca, Armando; Greenleaf, James F.
2014-01-01
Shear wave speed can be used to assess tissue elasticity, which is associated with tissue health. Ultrasound shear wave elastography techniques based on measuring the propagation speed of the shear waves induced by acoustic radiation force are becoming promising alternatives to biopsy in liver fibrosis staging. However, shear waves generated by such methods are typically very weak. Therefore, the penetration may become problematic, especially for overweight or obese patients. In this study, we developed a new method called External Vibration Multi-directional Ultrasound Shearwave Elastography (EVMUSE), in which external vibration from a loudspeaker was used to generate a multi-directional shear wave field. A directional filter was then applied to separate the complex shear wave field into several shear wave fields propagating in different directions. A two-dimensional (2D) shear wave speed map was reconstructed from each individual shear wave field, and a final 2D shear wave speed map was constructed by compounding these individual wave speed maps. The method was validated using two homogeneous phantoms and one multi-purpose tissue-mimicking phantom. Ten patients undergoing liver Magnetic Resonance Elastography (MRE) were also studied with EVMUSE to compare results between the two methods. Phantom results showed EVMUSE was able to quantify tissue elasticity accurately with good penetration. In vivo EVMUSE results were well correlated with MRE results, indicating the promise of using EVMUSE for liver fibrosis staging. PMID:25020066
TOPICAL REVIEW: Advances and challenges in computational plasma science
NASA Astrophysics Data System (ADS)
Tang, W. M.; Chan, V. S.
2005-02-01
Scientific simulation, which provides a natural bridge between theory and experiment, is an essential tool for understanding complex plasma behaviour. Recent advances in simulations of magnetically confined plasmas are reviewed in this paper, with illustrative examples, chosen from associated research areas such as microturbulence, magnetohydrodynamics and other topics. Progress has been stimulated, in particular, by the exponential growth of computer speed along with significant improvements in computer technology. The advances in both particle and fluid simulations of fine-scale turbulence and large-scale dynamics have produced increasingly good agreement between experimental observations and computational modelling. This was enabled by two key factors: (a) innovative advances in analytic and computational methods for developing reduced descriptions of physics phenomena spanning widely disparate temporal and spatial scales and (b) access to powerful new computational resources. Excellent progress has been made in developing codes for which computer run-time and problem-size scale well with the number of processors on massively parallel processors (MPPs). Examples include the effective usage of the full power of multi-teraflop (multi-trillion floating point computations per second) MPPs to produce three-dimensional, general geometry, nonlinear particle simulations that have accelerated advances in understanding the nature of turbulence self-regulation by zonal flows. These calculations, which typically utilized billions of particles for thousands of time-steps, would not have been possible without access to powerful present generation MPP computers and the associated diagnostic and visualization capabilities. In looking towards the future, the current results from advanced simulations provide great encouragement for being able to include increasingly realistic dynamics to enable deeper physics insights into plasmas in both natural and laboratory environments. This should produce the scientific excitement which will help to (a) stimulate enhanced cross-cutting collaborations with other fields and (b) attract the bright young talent needed for the future health of the field of plasma science.
Advances and challenges in computational plasma science
NASA Astrophysics Data System (ADS)
Tang, W. M.
2005-02-01
Scientific simulation, which provides a natural bridge between theory and experiment, is an essential tool for understanding complex plasma behaviour. Recent advances in simulations of magnetically confined plasmas are reviewed in this paper, with illustrative examples, chosen from associated research areas such as microturbulence, magnetohydrodynamics and other topics. Progress has been stimulated, in particular, by the exponential growth of computer speed along with significant improvements in computer technology. The advances in both particle and fluid simulations of fine-scale turbulence and large-scale dynamics have produced increasingly good agreement between experimental observations and computational modelling. This was enabled by two key factors: (a) innovative advances in analytic and computational methods for developing reduced descriptions of physics phenomena spanning widely disparate temporal and spatial scales and (b) access to powerful new computational resources. Excellent progress has been made in developing codes for which computer run-time and problem-size scale well with the number of processors on massively parallel processors (MPPs). Examples include the effective usage of the full power of multi-teraflop (multi-trillion floating point computations per second) MPPs to produce three-dimensional, general geometry, nonlinear particle simulations that have accelerated advances in understanding the nature of turbulence self-regulation by zonal flows. These calculations, which typically utilized billions of particles for thousands of time-steps, would not have been possible without access to powerful present generation MPP computers and the associated diagnostic and visualization capabilities. In looking towards the future, the current results from advanced simulations provide great encouragement for being able to include increasingly realistic dynamics to enable deeper physics insights into plasmas in both natural and laboratory environments. This should produce the scientific excitement which will help to (a) stimulate enhanced cross-cutting collaborations with other fields and (b) attract the bright young talent needed for the future health of the field of plasma science.
Multi-agent cooperation rescue algorithm based on influence degree and state prediction
NASA Astrophysics Data System (ADS)
Zheng, Yanbin; Ma, Guangfu; Wang, Linlin; Xi, Pengxue
2018-04-01
Aiming at the multi-agent cooperative rescue in disaster, a multi-agent cooperative rescue algorithm based on impact degree and state prediction is proposed. Firstly, based on the influence of the information in the scene on the collaborative task, the influence degree function is used to filter the information. Secondly, using the selected information to predict the state of the system and Agent behavior. Finally, according to the result of the forecast, the cooperative behavior of Agent is guided and improved the efficiency of individual collaboration. The simulation results show that this algorithm can effectively solve the cooperative rescue problem of multi-agent and ensure the efficient completion of the task.
A new multi-spectral feature level image fusion method for human interpretation
NASA Astrophysics Data System (ADS)
Leviner, Marom; Maltz, Masha
2009-03-01
Various different methods to perform multi-spectral image fusion have been suggested, mostly on the pixel level. However, the jury is still out on the benefits of a fused image compared to its source images. We present here a new multi-spectral image fusion method, multi-spectral segmentation fusion (MSSF), which uses a feature level processing paradigm. To test our method, we compared human observer performance in a three-task experiment using MSSF against two established methods: averaging and principle components analysis (PCA), and against its two source bands, visible and infrared. The three tasks that we studied were: (1) simple target detection, (2) spatial orientation, and (3) camouflaged target detection. MSSF proved superior to the other fusion methods in all three tests; MSSF also outperformed the source images in the spatial orientation and camouflaged target detection tasks. Based on these findings, current speculation about the circumstances in which multi-spectral image fusion in general and specific fusion methods in particular would be superior to using the original image sources can be further addressed.
A queueing model of pilot decision making in a multi-task flight management situation
NASA Technical Reports Server (NTRS)
Walden, R. S.; Rouse, W. B.
1977-01-01
Allocation of decision making responsibility between pilot and computer is considered and a flight management task, designed for the study of pilot-computer interaction, is discussed. A queueing theory model of pilot decision making in this multi-task, control and monitoring situation is presented. An experimental investigation of pilot decision making and the resulting model parameters are discussed.
Multi-phase SPH modelling of violent hydrodynamics on GPUs
NASA Astrophysics Data System (ADS)
Mokos, Athanasios; Rogers, Benedict D.; Stansby, Peter K.; Domínguez, José M.
2015-11-01
This paper presents the acceleration of multi-phase smoothed particle hydrodynamics (SPH) using a graphics processing unit (GPU) enabling large numbers of particles (10-20 million) to be simulated on just a single GPU card. With novel hardware architectures such as a GPU, the optimum approach to implement a multi-phase scheme presents some new challenges. Many more particles must be included in the calculation and there are very different speeds of sound in each phase with the largest speed of sound determining the time step. This requires efficient computation. To take full advantage of the hardware acceleration provided by a single GPU for a multi-phase simulation, four different algorithms are investigated: conditional statements, binary operators, separate particle lists and an intermediate global function. Runtime results show that the optimum approach needs to employ separate cell and neighbour lists for each phase. The profiler shows that this approach leads to a reduction in both memory transactions and arithmetic operations giving significant runtime gains. The four different algorithms are compared to the efficiency of the optimised single-phase GPU code, DualSPHysics, for 2-D and 3-D simulations which indicate that the multi-phase functionality has a significant computational overhead. A comparison with an optimised CPU code shows a speed up of an order of magnitude over an OpenMP simulation with 8 threads and two orders of magnitude over a single thread simulation. A demonstration of the multi-phase SPH GPU code is provided by a 3-D dam break case impacting an obstacle. This shows better agreement with experimental results than an equivalent single-phase code. The multi-phase GPU code enables a convergence study to be undertaken on a single GPU with a large number of particles that otherwise would have required large high performance computing resources.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase 1 is complete for the development of a computational fluid dynamics CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Posse, Stefan
2011-01-01
The rapid development of fMRI was paralleled early on by the adaptation of MR spectroscopic imaging (MRSI) methods to quantify water relaxation changes during brain activation. This review describes the evolution of multi-echo acquisition from high-speed MRSI to multi-echo EPI and beyond. It highlights milestones in the development of multi-echo acquisition methods, such as the discovery of considerable gains in fMRI sensitivity when combining echo images, advances in quantification of the BOLD effect using analytical biophysical modeling and interleaved multi-region shimming. The review conveys the insight gained from combining fMRI and MRSI methods and concludes with recent trends in ultra-fast fMRI, which will significantly increase temporal resolution of multi-echo acquisition. PMID:22056458
Multi-GPU accelerated three-dimensional FDTD method for electromagnetic simulation.
Nagaoka, Tomoaki; Watanabe, Soichi
2011-01-01
Numerical simulation with a numerical human model using the finite-difference time domain (FDTD) method has recently been performed in a number of fields in biomedical engineering. To improve the method's calculation speed and realize large-scale computing with the numerical human model, we adapt three-dimensional FDTD code to a multi-GPU environment using Compute Unified Device Architecture (CUDA). In this study, we used NVIDIA Tesla C2070 as GPGPU boards. The performance of multi-GPU is evaluated in comparison with that of a single GPU and vector supercomputer. The calculation speed with four GPUs was approximately 3.5 times faster than with a single GPU, and was slightly (approx. 1.3 times) slower than with the supercomputer. Calculation speed of the three-dimensional FDTD method using GPUs can significantly improve with an expanding number of GPUs.
Multitasking a three-dimensional Navier-Stokes algorithm on the Cray-2
NASA Technical Reports Server (NTRS)
Swisshelm, Julie M.
1989-01-01
A three-dimensional computational aerodynamics algorithm has been multitasked for efficient parallel execution on the Cray-2. It provides a means for examining the multitasking performance of a complete CFD application code. An embedded zonal multigrid scheme is used to solve the Reynolds-averaged Navier-Stokes equations for an internal flow model problem. The explicit nature of each component of the method allows a spatial partitioning of the computational domain to achieve a well-balanced task load for MIMD computers with vector-processing capability. Experiments have been conducted with both two- and three-dimensional multitasked cases. The best speedup attained by an individual task group was 3.54 on four processors of the Cray-2, while the entire solver yielded a speedup of 2.67 on four processors for the three-dimensional case. The multiprocessing efficiency of various types of computational tasks is examined, performance on two Cray-2s with different memory access speeds is compared, and extrapolation to larger problems is discussed.
Parallel Ada benchmarks for the SVMS
NASA Technical Reports Server (NTRS)
Collard, Philippe E.
1990-01-01
The use of parallel processing paradigm to design and develop faster and more reliable computers appear to clearly mark the future of information processing. NASA started the development of such an architecture: the Spaceborne VHSIC Multi-processor System (SVMS). Ada will be one of the languages used to program the SVMS. One of the unique characteristics of Ada is that it supports parallel processing at the language level through the tasking constructs. It is important for the SVMS project team to assess how efficiently the SVMS architecture will be implemented, as well as how efficiently Ada environment will be ported to the SVMS. AUTOCLASS II, a Bayesian classifier written in Common Lisp, was selected as one of the benchmarks for SVMS configurations. The purpose of the R and D effort was to provide the SVMS project team with the version of AUTOCLASS II, written in Ada, that would make use of Ada tasking constructs as much as possible so as to constitute a suitable benchmark. Additionally, a set of programs was developed that would measure Ada tasking efficiency on parallel architectures as well as determine the critical parameters influencing tasking efficiency. All this was designed to provide the SVMS project team with a set of suitable tools in the development of the SVMS architecture.
Large liquid rocket engine transient performance simulation system
NASA Technical Reports Server (NTRS)
Mason, J. R.; Southwick, R. D.
1989-01-01
Phase 1 of the Rocket Engine Transient Simulation (ROCETS) program consists of seven technical tasks: architecture; system requirements; component and submodel requirements; submodel implementation; component implementation; submodel testing and verification; and subsystem testing and verification. These tasks were completed. Phase 2 of ROCETS consists of two technical tasks: Technology Test Bed Engine (TTBE) model data generation; and system testing verification. During this period specific coding of the system processors was begun and the engineering representations of Phase 1 were expanded to produce a simple model of the TTBE. As the code was completed, some minor modifications to the system architecture centering on the global variable common, GLOBVAR, were necessary to increase processor efficiency. The engineering modules completed during Phase 2 are listed: INJTOO - main injector; MCHBOO - main chamber; NOZLOO - nozzle thrust calculations; PBRNOO - preburner; PIPE02 - compressible flow without inertia; PUMPOO - polytropic pump; ROTROO - rotor torque balance/speed derivative; and TURBOO - turbine. Detailed documentation of these modules is in the Appendix. In addition to the engineering modules, several submodules were also completed. These submodules include combustion properties, component performance characteristics (maps), and specific utilities. Specific coding was begun on the system configuration processor. All functions necessary for multiple module operation were completed but the SOLVER implementation is still under development. This system, the Verification Checkout Facility (VCF) allows interactive comparison of module results to store data as well as provides an intermediate checkout of the processor code. After validation using the VCF, the engineering modules and submodules were used to build a simple TTBE.
Multi-level manual and autonomous control superposition for intelligent telerobot
NASA Technical Reports Server (NTRS)
Hirai, Shigeoki; Sato, T.
1989-01-01
Space telerobots are recognized to require cooperation with human operators in various ways. Multi-level manual and autonomous control superposition in telerobot task execution is described. The object model, the structured master-slave manipulation system, and the motion understanding system are proposed to realize the concept. The object model offers interfaces for task level and object level human intervention. The structured master-slave manipulation system offers interfaces for motion level human intervention. The motion understanding system maintains the consistency of the knowledge through all the levels which supports the robot autonomy while accepting the human intervention. The superposing execution of the teleoperational task at multi-levels realizes intuitive and robust task execution for wide variety of objects and in changeful environment. The performance of several examples of operating chemical apparatuses is shown.
Zhang, Rubo; Yang, Yu
2017-01-01
Research on distributed task planning model for multi-autonomous underwater vehicle (MAUV). A scroll time domain quantum artificial bee colony (STDQABC) optimization algorithm is proposed to solve the multi-AUV optimal task planning scheme. In the uncertain marine environment, the rolling time domain control technique is used to realize a numerical optimization in a narrowed time range. Rolling time domain control is one of the better task planning techniques, which can greatly reduce the computational workload and realize the tradeoff between AUV dynamics, environment and cost. Finally, a simulation experiment was performed to evaluate the distributed task planning performance of the scroll time domain quantum bee colony optimization algorithm. The simulation results demonstrate that the STDQABC algorithm converges faster than the QABC and ABC algorithms in terms of both iterations and running time. The STDQABC algorithm can effectively improve MAUV distributed tasking planning performance, complete the task goal and get the approximate optimal solution. PMID:29186166
Li, Jianjun; Zhang, Rubo; Yang, Yu
2017-01-01
Research on distributed task planning model for multi-autonomous underwater vehicle (MAUV). A scroll time domain quantum artificial bee colony (STDQABC) optimization algorithm is proposed to solve the multi-AUV optimal task planning scheme. In the uncertain marine environment, the rolling time domain control technique is used to realize a numerical optimization in a narrowed time range. Rolling time domain control is one of the better task planning techniques, which can greatly reduce the computational workload and realize the tradeoff between AUV dynamics, environment and cost. Finally, a simulation experiment was performed to evaluate the distributed task planning performance of the scroll time domain quantum bee colony optimization algorithm. The simulation results demonstrate that the STDQABC algorithm converges faster than the QABC and ABC algorithms in terms of both iterations and running time. The STDQABC algorithm can effectively improve MAUV distributed tasking planning performance, complete the task goal and get the approximate optimal solution.
Burke, Marcus G. [University of Wisconsin-Madison] (ORCID:0000000176193724); Fonck, Raymond J. [University of Wisconsin-Madison] (ORCID:0000000294386762); Bongard, Michael W. [University of Wisconsin-Madison] (ORCID:0000000231609746); Schlossberg, David J. [University of Wisconsin-Madison] (ORCID:0000000287139448); Winz, Gregory R. [University of Wisconsin-Madison] (ORCID:0000000177627184)
2016-07-18
This data set contains openly-documented, machine readable digital research data corresponding to figures published in M.G. Burke et al., 'Erratum: "Multi-point, high-speed passive ion velocity distribution diagnostic on the Pegasus Toroidal Experiment" [Rev. Sci. Instrum. 83, 10D516 (2012)],' Rev. Sci. Instrum. 87, 079902 (2016).
NASA Astrophysics Data System (ADS)
Goedecker, Stefan; Boulet, Mireille; Deutsch, Thierry
2003-08-01
Three-dimensional Fast Fourier Transforms (FFTs) are the main computational task in plane wave electronic structure calculations. Obtaining a high performance on a large numbers of processors is non-trivial on the latest generation of parallel computers that consist of nodes made up of a shared memory multiprocessors. A non-dogmatic method for obtaining high performance for such 3-dim FFTs in a combined MPI/OpenMP programming paradigm will be presented. Exploiting the peculiarities of plane wave electronic structure calculations, speedups of up to 160 and speeds of up to 130 Gflops were obtained on 256 processors.
Design of multi-function sensor detection system in coal mine based on ARM
NASA Astrophysics Data System (ADS)
Ge, Yan-Xiang; Zhang, Quan-Zhu; Deng, Yong-Hong
2017-06-01
The traditional coal mine sensor in the specific measurement points, the number and type of channel will be greater than or less than the number of monitoring points, resulting in a waste of resources or cannot meet the application requirements, in order to enable the sensor to adapt to the needs of different occasions and reduce the cost, a kind of multi-functional intelligent sensor multiple sensors and ARM11 the S3C6410 processor is used to design and realize the dust, gas, temperature and humidity sensor functions together, and has storage, display, voice, pictures, data query, alarm and other new functions.
Techniques in processing multi-frequency multi-polarization spaceborne SAR data
NASA Technical Reports Server (NTRS)
Curlander, John C.; Chang, C. Y.
1991-01-01
This paper presents the algorithm design of the SIR-C ground data processor, with emphasis on the unique elements involved in the production of registered multifrequency polarimetric data products. A quick-look processing algorithm used for generation of low-resolution browse image products and estimation of echo signal parameters is also presented. Specifically the discussion covers: (1) azimuth reference function generation to produce registered polarimetric imagery; (2) geometric rectification to accommondate cross-track and along-track Doppler drifts; (3) multilook filtering designed to generate output imagery with a uniform resolution; and (4) efficient coding to compress the polarimetric image data for distribution.
ROOT 6 and beyond: TObject, C++14 and many cores
Bellenot, B.; Canal, Ph; Couet, O.; ...
2015-12-23
Following the release of version 6, ROOT has entered a new area of development. It will leverage the industrial strength compiler library shipping in ROOT 6 and its support of the C++11/14 standard, to significantly simplify and harden ROOT's interfaces and to clarify and substantially improve ROOT's support for multi-threaded environments. Furthermore, this talk will also recap the most important new features and enhancements in ROOT in general, focusing on those allowed by the improved interpreter and better compiler support, including I/O for smart pointers, easier type safe access to the content of TTrees and enhanced multi processor support.
Langevin, Christian D.
2009-01-01
SEAWAT is a MODFLOW-based computer program designed to simulate variable-density groundwater flow coupled with multi-species solute and heat transport. The program has been used for a wide variety of groundwater studies including saltwater intrusion in coastal aquifers, aquifer storage and recovery in brackish limestone aquifers, and brine migration within continental aquifers. SEAWAT is relatively easy to apply because it uses the familiar MODFLOW structure. Thus, most commonly used pre- and post-processors can be used to create datasets and visualize results. SEAWAT is a public domain computer program distributed free of charge by the U.S. Geological Survey.
DOE Office of Scientific and Technical Information (OSTI.GOV)
2016-09-12
Mcqueuer is a simple tool that allows anyone from researchers to experienced developers to create multi-node/multi-core jobs by simply creating a file with a list of commands. Users simply combine tasks, which would otherwise each be their own job on the cluster, into a single file that is given to Mcqueuer. Mcqueuer then does the heavy lifting required to process the tasks in parallel in a single multi-node job. In addition, Mcqueuer provides load-balancing, which frees the user from having to worry about complex memory and CPU considerations, and instead focus on the processing itself.
Goal-oriented robot navigation learning using a multi-scale space representation.
Llofriu, M; Tejera, G; Contreras, M; Pelc, T; Fellous, J M; Weitzenfeld, A
2015-12-01
There has been extensive research in recent years on the multi-scale nature of hippocampal place cells and entorhinal grid cells encoding which led to many speculations on their role in spatial cognition. In this paper we focus on the multi-scale nature of place cells and how they contribute to faster learning during goal-oriented navigation when compared to a spatial cognition system composed of single scale place cells. The task consists of a circular arena with a fixed goal location, in which a robot is trained to find the shortest path to the goal after a number of learning trials. Synaptic connections are modified using a reinforcement learning paradigm adapted to the place cells multi-scale architecture. The model is evaluated in both simulation and physical robots. We find that larger scale and combined multi-scale representations favor goal-oriented navigation task learning. Copyright © 2015 Elsevier Ltd. All rights reserved.
A theoretical framework for negotiating the path of emergency management multi-agency coordination.
Curnin, Steven; Owen, Christine; Paton, Douglas; Brooks, Benjamin
2015-03-01
Multi-agency coordination represents a significant challenge in emergency management. The need for liaison officers working in strategic level emergency operations centres to play organizational boundary spanning roles within multi-agency coordination arrangements that are enacted in complex and dynamic emergency response scenarios creates significant research and practical challenges. The aim of the paper is to address a gap in the literature regarding the concept of multi-agency coordination from a human-environment interaction perspective. We present a theoretical framework for facilitating multi-agency coordination in emergency management that is grounded in human factors and ergonomics using the methodology of core-task analysis. As a result we believe the framework will enable liaison officers to cope more efficiently within the work domain. In addition, we provide suggestions for extending the theory of core-task analysis to an alternate high reliability environment. Copyright © 2014 Elsevier Ltd and The Ergonomics Society. All rights reserved.
Application of convolve-multiply-convolve SAW processor for satellite communications
NASA Technical Reports Server (NTRS)
Lie, Y. S.; Ching, M.
1991-01-01
There is a need for a satellite communications receiver than can perform simultaneous multi-channel processing of single channel per carrier (SCPC) signals originating from various small (mobile or fixed) earth stations. The number of ground users can be as many as 1000. Conventional techniques of simultaneously processing these signals is by employing as many RF-bandpass filters as the number of channels. Consequently, such an approach would result in a bulky receiver, which becomes impractical for satellite applications. A unique approach utilizing a realtime surface acoustic wave (SAW) chirp transform processor is presented. The application of a Convolve-Multiply-Convolve (CMC) chirp transform processor is described. The CMC processor transforms each input channel into a unique timeslot, while preserving its modulation content (in this case QPSK). Subsequently, each channel is individually demodulated without the need of input channel filters. Circuit complexity is significantly reduced, because the output frequency of the CMC processor is common for all input channel frequencies. The results of theoretical analysis and experimental results are in good agreement.
Design and realization of the baseband processor in satellite navigation and positioning receiver
NASA Astrophysics Data System (ADS)
Zhang, Dawei; Hu, Xiulin; Li, Chen
2007-11-01
The content of this paper is focused on the Design and realization of the baseband processor in satellite navigation and positioning receiver. Baseband processor is the most important part of the satellite positioning receiver. The design covers baseband processor's main functions include multi-channel digital signal DDC, acquisition, code tracking, carrier tracking, demodulation, etc. The realization is based on an Altera's FPGA device, that makes the system can be improved and upgraded without modifying the hardware. It embodies the theory of software defined radio (SDR), and puts the theory of the spread spectrum into practice. This paper puts emphasis on the realization of baseband processor in FPGA. In the order of choosing chips, design entry, debugging and synthesis, the flow is presented detailedly. Additionally the paper detailed realization of Digital PLL in order to explain a method of reducing the consumption of FPGA. Finally, the paper presents the result of Synthesis. This design has been used in BD-1, BD-2 and GPS.
NASA Technical Reports Server (NTRS)
Chu, Y. Y.
1978-01-01
A unified formulation of computer-aided, multi-task, decision making is presented. Strategy for the allocation of decision making responsibility between human and computer is developed. The plans of a flight management systems are studied. A model based on the queueing theory was implemented.
ERIC Educational Resources Information Center
Mechling, Linda C.; Ayres, Kevin M.; Purrazzella, Kaitlin; Purrazzella, Kimberly
2014-01-01
This investigation examined the ability of four adults with moderate intellectual disability to complete multi-component tasks using continuous video modeling. Continuous video modeling, which is a newly researched application of video modeling, presents video in a "looping" format which automatically repeats playing of the video while…
Data Mining Algorithms for Classification of Complex Biomedical Data
ERIC Educational Resources Information Center
Lan, Liang
2012-01-01
In my dissertation, I will present my research which contributes to solve the following three open problems from biomedical informatics: (1) Multi-task approaches for microarray classification; (2) Multi-label classification of gene and protein prediction from multi-source biological data; (3) Spatial scan for movement data. In microarray…
RoboCup: Multi-disciplinary Senior Design Project.
ERIC Educational Resources Information Center
Elder, Kevin Lee
A cross-college team of educators has developed a collaborative, multi-disciplinary senior design course at Ohio University. This course offers an attractive opportunity for students from a variety of disciplines to work together in a learning community to accomplish a challenging task. It provides a novel multi-disciplinary learning environment…
Boyacioğlu, Rasim; Schulz, Jenni; Koopmans, Peter J; Barth, Markus; Norris, David G
2015-10-01
A multiband multi-echo (MBME) sequence is implemented and compared to a matched standard multi-echo (ME) protocol to investigate the potential improvement in sensitivity and spatial specificity at 7 T for resting state and task fMRI. ME acquisition is attractive because BOLD sensitivity is less affected by variation in T2*, and because of the potential for separating BOLD and non-BOLD signal components. MBME further reduces TR thus increasing the potential reduction in physiological noise. In this study we used FSL-FIX to clean ME and MBME resting state and task fMRI data (both 3.5mm isotropic). After noise correction, the detection of resting state networks improves with more non-artifactual independent components being observed. Additional activation clusters for task data are discovered for MBME data (increased sensitivity) whereas existing clusters become more localized for resting state (improved spatial specificity). The results obtained indicate that MBME is superior to ME at high field strengths. Copyright © 2015 Elsevier Inc. All rights reserved.
Problems With Deployment of Multi-Domained, Multi-Homed Mobile Networks
NASA Technical Reports Server (NTRS)
Ivancic, William D.
2008-01-01
This document describes numerous problems associated with deployment of multi-homed mobile platforms consisting of multiple networks and traversing large geographical areas. The purpose of this document is to provide insight to real-world deployment issues and provide information to groups that are addressing many issues related to multi-homing, policy-base routing, route optimization and mobile security - particularly those groups within the Internet Engineering Task Force.
MREG V1.1 : a multi-scale image registration algorithm for SAR applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eichel, Paul H.
2013-08-01
MREG V1.1 is the sixth generation SAR image registration algorithm developed by the Signal Processing&Technology Department for Synthetic Aperture Radar applications. Like its predecessor algorithm REGI, it employs a powerful iterative multi-scale paradigm to achieve the competing goals of sub-pixel registration accuracy and the ability to handle large initial offsets. Since it is not model based, it allows for high fidelity tracking of spatially varying terrain-induced misregistration. Since it does not rely on image domain phase, it is equally adept at coherent and noncoherent image registration. This document provides a brief history of the registration processors developed by Dept. 5962more » leading up to MREG V1.1, a full description of the signal processing steps involved in the algorithm, and a user's manual with application specific recommendations for CCD, TwoColor MultiView, and SAR stereoscopy.« less
NASA Astrophysics Data System (ADS)
Rahman, P. A.
2018-05-01
This scientific paper deals with the model of the knapsack optimization problem and method of its solving based on directed combinatorial search in the boolean space. The offered by the author specialized mathematical model of decomposition of the search-zone to the separate search-spheres and the algorithm of distribution of the search-spheres to the different cores of the multi-core processor are also discussed. The paper also provides an example of decomposition of the search-zone to the several search-spheres and distribution of the search-spheres to the different cores of the quad-core processor. Finally, an offered by the author formula for estimation of the theoretical maximum of the computational acceleration, which can be achieved due to the parallelization of the search-zone to the search-spheres on the unlimited number of the processor cores, is also given.
Processor farming in two-level analysis of historical bridge
NASA Astrophysics Data System (ADS)
Krejčí, T.; Kruis, J.; Koudelka, T.; Šejnoha, M.
2017-11-01
This contribution presents a processor farming method in connection with a multi-scale analysis. In this method, each macro-scopic integration point or each finite element is connected with a certain meso-scopic problem represented by an appropriate representative volume element (RVE). The solution of a meso-scale problem provides then effective parameters needed on the macro-scale. Such an analysis is suitable for parallel computing because the meso-scale problems can be distributed among many processors. The application of the processor farming method to a real world masonry structure is illustrated by an analysis of Charles bridge in Prague. The three-dimensional numerical model simulates the coupled heat and moisture transfer of one half of arch No. 3. and it is a part of a complex hygro-thermo-mechanical analysis which has been developed to determine the influence of climatic loading on the current state of the bridge.
Robust multi-model control of an autonomous wind power system
NASA Astrophysics Data System (ADS)
Cutululis, Nicolas Antonio; Ceanga, Emil; Hansen, Anca Daniela; Sørensen, Poul
2006-09-01
This article presents a robust multi-model control structure for a wind power system that uses a variable speed wind turbine (VSWT) driving a permanent magnet synchronous generator (PMSG) connected to a local grid. The control problem consists in maximizing the energy captured from the wind for varying wind speeds. The VSWT-PMSG linearized model analysis reveals the resonant nature of its dynamic at points on the optimal regimes characteristic (ORC). The natural frequency of the system and the damping factor are strongly dependent on the operating point on the ORC. Under these circumstances a robust multi-model control structure is designed. The simulation results prove the viability of the proposed control structure. Copyright
Dual motor drive vehicle speed synchronization and coordination control strategy
NASA Astrophysics Data System (ADS)
Huang, Hao; Tu, Qunzhang; Jiang, Chenming; Ma, Limin; Li, Pei; Zhang, Hongxing
2018-04-01
Multi-motor driven systems are more and more widely used in the field of electric engineering vehicles, as a result of the road conditions and the variable load of engineering vehicles, makes multi-motors synchronization coordinated control system as a key point of the development of the electric vehicle drive system. This paper based on electrical machinery transmission speed in the process of engineering vehicles headed for coordinated control problem, summarized control strategies at home and abroad in recent years, made analysis and comparison of the characteristics, finally discussed the trend of development of the multi-motor coordination control, provided a reference for synchronized control system research of electric drive engineering vehicles.
Passive perception system for day/night autonomous off-road navigation
NASA Astrophysics Data System (ADS)
Rankin, Arturo L.; Bergh, Charles F.; Goldberg, Steven B.; Bellutta, Paolo; Huertas, Andres; Matthies, Larry H.
2005-05-01
Passive perception of terrain features is a vital requirement for military related unmanned autonomous vehicle operations, especially under electromagnetic signature management conditions. As a member of Team Raptor, the Jet Propulsion Laboratory developed a self-contained passive perception system under the DARPA funded PerceptOR program. An environmentally protected forward-looking sensor head was designed and fabricated in-house to straddle an off-the-shelf pan-tilt unit. The sensor head contained three color cameras for multi-baseline daytime stereo ranging, a pair of cooled mid-wave infrared cameras for nighttime stereo ranging, and supporting electronics to synchronize captured imagery. Narrow-baseline stereo provided improved range data density in cluttered terrain, while wide-baseline stereo provided more accurate ranging for operation at higher speeds in relatively open areas. The passive perception system processed stereo images and outputted over a local area network terrain maps containing elevation, terrain type, and detected hazards. A novel software architecture was designed and implemented to distribute the data processing on a 533MHz quad 7410 PowerPC single board computer under the VxWorks real-time operating system. This architecture, which is general enough to operate on N processors, has been subsequently tested on Pentium-based processors under Windows and Linux, and a Sparc based-processor under Unix. The passive perception system was operated during FY04 PerceptOR program evaluations at Fort A. P. Hill, Virginia, and Yuma Proving Ground, Arizona. This paper discusses the Team Raptor passive perception system hardware and software design, implementation, and performance, and describes a road map to faster and improved passive perception.
Application of Multi-task Lasso Regression in the Stellar Parametrization
NASA Astrophysics Data System (ADS)
Chang, L. N.; Zhang, P. A.
2015-01-01
The multi-task learning approaches have attracted the increasing attention in the fields of machine learning, computer vision, and artificial intelligence. By utilizing the correlations in tasks, learning multiple related tasks simultaneously is better than learning each task independently. An efficient multi-task Lasso (Least Absolute Shrinkage Selection and Operator) regression algorithm is proposed in this paper to estimate the physical parameters of stellar spectra. It not only makes different physical parameters share the common features, but also can effectively preserve their own peculiar features. Experiments were done based on the ELODIE data simulated with the stellar atmospheric simulation model, and on the SDSS data released by the American large survey Sloan. The precision of the model is better than those of the methods in the related literature, especially for the acceleration of gravity (lg g) and the chemical abundance ([Fe/H]). In the experiments, we changed the resolution of the spectrum, and applied the noises with different signal-to-noise ratio (SNR) to the spectrum, so as to illustrate the stability of the model. The results show that the model is influenced by both the resolution and the noise. But the influence of the noise is larger than that of the resolution. In general, the multi-task Lasso regression algorithm is easy to operate, has a strong stability, and also can improve the overall accuracy of the model.
NASA Astrophysics Data System (ADS)
Erez, Mattan; Dally, William J.
Stream processors, like other multi core architectures partition their functional units and storage into multiple processing elements. In contrast to typical architectures, which contain symmetric general-purpose cores and a cache hierarchy, stream processors have a significantly leaner design. Stream processors are specifically designed for the stream execution model, in which applications have large amounts of explicit parallel computation, structured and predictable control, and memory accesses that can be performed at a coarse granularity. Applications in the streaming model are expressed in a gather-compute-scatter form, yielding programs with explicit control over transferring data to and from on-chip memory. Relying on these characteristics, which are common to many media processing and scientific computing applications, stream architectures redefine the boundary between software and hardware responsibilities with software bearing much of the complexity required to manage concurrency, locality, and latency tolerance. Thus, stream processors have minimal control consisting of fetching medium- and coarse-grained instructions and executing them directly on the many ALUs. Moreover, the on-chip storage hierarchy of stream processors is under explicit software control, as is all communication, eliminating the need for complex reactive hardware mechanisms.
Bulea, Thomas C.; Kim, Jonghyun; Damiano, Diane L.; Stanley, Christopher J.; Park, Hyung-Soon
2015-01-01
Accumulating evidence suggests cortical circuits may contribute to control of human locomotion. Here, noninvasive electroencephalography (EEG) recorded from able-bodied volunteers during a novel treadmill walking paradigm was used to assess neural correlates of walking. A systematic processing method, including a recently developed subspace reconstruction algorithm, reduced movement-related EEG artifact prior to independent component analysis and dipole source localization. We quantified cortical activity while participants tracked slow and fast target speeds across two treadmill conditions: an active mode that adjusted belt speed based on user movements and a passive mode reflecting a typical treadmill. Our results reveal frequency specific, multi-focal task related changes in cortical oscillations elicited by active walking. Low γ band power, localized to the prefrontal and posterior parietal cortices, was significantly increased during double support and early swing phases, critical points in the gait cycle since the active controller adjusted speed based on pelvis position and swing foot velocity. These phasic γ band synchronizations provide evidence that prefrontal and posterior parietal networks, previously implicated in visuo-spatial and somotosensory integration, are engaged to enhance lower limb control during gait. Sustained μ and β band desynchronization within sensorimotor cortex, a neural correlate for movement, was observed during walking thereby validating our methods for isolating cortical activity. Our results also demonstrate the utility of EEG recorded during locomotion for probing the multi-regional cortical networks which underpin its execution. For example, the cortical network engagement elicited by the active treadmill suggests that it may enhance neuroplasticity for more effective motor training. PMID:26029077
USDA-ARS?s Scientific Manuscript database
High volume instrumentation (HVITM) and advanced fiber information system (AFIS) measurements are increasingly being utilized as primary and routine means of acquiring fiber quality data by cotton breeders and fiber processors. There is amount of information regarding fiber and yarn qualities, but l...
Hierarchial parallel computer architecture defined by computational multidisciplinary mechanics
NASA Technical Reports Server (NTRS)
Padovan, Joe; Gute, Doug; Johnson, Keith
1989-01-01
The goal is to develop an architecture for parallel processors enabling optimal handling of multi-disciplinary computation of fluid-solid simulations employing finite element and difference schemes. The goals, philosphical and modeling directions, static and dynamic poly trees, example problems, interpolative reduction, the impact on solvers are shown in viewgraph form.
Multi-Rate Secure Processor Terminal Architecture Study. Volume 1. Terminal Architecture.
1981-06-01
together because of the intimate relationship that must be established between the KG devices and the control of those devices to satisy security...9.6 kilobit for ti’.:., pass filter funtion because it’s time span is larger. The resultdot loading is estimated at 260 microseconds out of 833
NASA Technical Reports Server (NTRS)
Johnson, Dennis A. (Inventor)
1996-01-01
A laser doppler velocimeter uses frequency shifting of a laser beam to provide signal information for each velocity component. A composite electrical signal generated by a light detector is digitized and a processor produces a discrete Fourier transform based on the digitized electrical signal. The transform includes two peak frequencies corresponding to the two velocity components.
Investigation of Large Scale Cortical Models on Clustered Multi-Core Processors
2013-02-01
with the bias node ( gray ) denoted as ww and the weights associated with the remaining first layer nodes (black) denoted as W. In forming the overall...Implementation of RBF network on GPU Platform 3.5.1 The Cholesky decomposition algorithm We need to invert the matrix multiplication GTG to
A study of the parallel algorithm for large-scale DC simulation of nonlinear systems
NASA Astrophysics Data System (ADS)
Cortés Udave, Diego Ernesto; Ogrodzki, Jan; Gutiérrez de Anda, Miguel Angel
Newton-Raphson DC analysis of large-scale nonlinear circuits may be an extremely time consuming process even if sparse matrix techniques and bypassing of nonlinear models calculation are used. A slight decrease in the time required for this task may be enabled on multi-core, multithread computers if the calculation of the mathematical models for the nonlinear elements as well as the stamp management of the sparse matrix entries are managed through concurrent processes. This numerical complexity can be further reduced via the circuit decomposition and parallel solution of blocks taking as a departure point the BBD matrix structure. This block-parallel approach may give a considerable profit though it is strongly dependent on the system topology and, of course, on the processor type. This contribution presents the easy-parallelizable decomposition-based algorithm for DC simulation and provides a detailed study of its effectiveness.
2nd Generation QUATARA Flight Computer Project
NASA Technical Reports Server (NTRS)
Falker, Jay; Keys, Andrew; Fraticelli, Jose Molina; Capo-Iugo, Pedro; Peeples, Steven
2015-01-01
Single core flight computer boards have been designed, developed, and tested (DD&T) to be flown in small satellites for the last few years. In this project, a prototype flight computer will be designed as a distributed multi-core system containing four microprocessors running code in parallel. This flight computer will be capable of performing multiple computationally intensive tasks such as processing digital and/or analog data, controlling actuator systems, managing cameras, operating robotic manipulators and transmitting/receiving from/to a ground station. In addition, this flight computer will be designed to be fault tolerant by creating both a robust physical hardware connection and by using a software voting scheme to determine the processor's performance. This voting scheme will leverage on the work done for the Space Launch System (SLS) flight software. The prototype flight computer will be constructed with Commercial Off-The-Shelf (COTS) components which are estimated to survive for two years in a low-Earth orbit.
A Framework for Parallel Unstructured Grid Generation for Complex Aerodynamic Simulations
NASA Technical Reports Server (NTRS)
Zagaris, George; Pirzadeh, Shahyar Z.; Chrisochoides, Nikos
2009-01-01
A framework for parallel unstructured grid generation targeting both shared memory multi-processors and distributed memory architectures is presented. The two fundamental building-blocks of the framework consist of: (1) the Advancing-Partition (AP) method used for domain decomposition and (2) the Advancing Front (AF) method used for mesh generation. Starting from the surface mesh of the computational domain, the AP method is applied recursively to generate a set of sub-domains. Next, the sub-domains are meshed in parallel using the AF method. The recursive nature of domain decomposition naturally maps to a divide-and-conquer algorithm which exhibits inherent parallelism. For the parallel implementation, the Master/Worker pattern is employed to dynamically balance the varying workloads of each task on the set of available CPUs. Performance results by this approach are presented and discussed in detail as well as future work and improvements.
NASA Astrophysics Data System (ADS)
Fang, Juan; Hao, Xiaoting; Fan, Qingwen; Chang, Zeqing; Song, Shuying
2017-05-01
In the Heterogeneous multi-core architecture, CPU and GPU processor are integrated on the same chip, which poses a new challenge to the last-level cache management. In this architecture, the CPU application and the GPU application execute concurrently, accessing the last-level cache. CPU and GPU have different memory access characteristics, so that they have differences in the sensitivity of last-level cache (LLC) capacity. For many CPU applications, a reduced share of the LLC could lead to significant performance degradation. On the contrary, GPU applications can tolerate increase in memory access latency when there is sufficient thread-level parallelism. Taking into account the GPU program memory latency tolerance characteristics, this paper presents a method that let GPU applications can access to memory directly, leaving lots of LLC space for CPU applications, in improving the performance of CPU applications and does not affect the performance of GPU applications. When the CPU application is cache sensitive, and the GPU application is insensitive to the cache, the overall performance of the system is improved significantly.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Trędak, Przemysław, E-mail: przemyslaw.tredak@fuw.edu.pl; Rudnicki, Witold R.; Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, ul. Pawińskiego 5a, 02-106 Warsaw
The second generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range hydrocarbon materials. It is also extensible to other atom types and interactions. REBO potential assumes complex multi-body interaction model, that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of molecular dynamics algorithm using REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPUmore » to solve difficult synchronizations issues that arise in computations of multi-body potential. Techniques developed for this problem may be also used to achieve efficient solutions of different problems. The performance of proposed algorithm is assessed using a range of model systems. It is compared to highly optimized CPU implementation (both single core and OpenMP) available in LAMMPS package. These experiments show up to 6x improvement in forces computation time using single processor of the NVIDIA Tesla K80 compared to high end 16-core Intel Xeon processor.« less
Image processing using Gallium Arsenide (GaAs) technology
NASA Technical Reports Server (NTRS)
Miller, Warner H.
1989-01-01
The need to increase the information return from space-borne imaging systems has increased in the past decade. The use of multi-spectral data has resulted in the need for finer spatial resolution and greater spectral coverage. Onboard signal processing will be necessary in order to utilize the available Tracking and Data Relay Satellite System (TDRSS) communication channel at high efficiency. A generally recognized approach to the increased efficiency of channel usage is through data compression techniques. The compression technique implemented is a differential pulse code modulation (DPCM) scheme with a non-uniform quantizer. The need to advance the state-of-the-art of onboard processing was recognized and a GaAs integrated circuit technology was chosen. An Adaptive Programmable Processor (APP) chip set was developed which is based on an 8-bit slice general processor. The reason for choosing the compression technique for the Multi-spectral Linear Array (MLA) instrument is described. Also a description is given of the GaAs integrated circuit chip set which will demonstrate that data compression can be performed onboard in real time at data rate in the order of 500 Mb/s.
A dual-processor multi-frequency implementation of the FINDS algorithm
NASA Technical Reports Server (NTRS)
Godiwala, Pankaj M.; Caglayan, Alper K.
1987-01-01
This report presents a parallel processing implementation of the FINDS (Fault Inferring Nonlinear Detection System) algorithm on a dual processor configured target flight computer. First, a filter initialization scheme is presented which allows the no-fail filter (NFF) states to be initialized using the first iteration of the flight data. A modified failure isolation strategy, compatible with the new failure detection strategy reported earlier, is discussed and the performance of the new FDI algorithm is analyzed using flight recorded data from the NASA ATOPS B-737 aircraft in a Microwave Landing System (MLS) environment. The results show that low level MLS, IMU, and IAS sensor failures are detected and isolated instantaneously, while accelerometer and rate gyro failures continue to take comparatively longer to detect and isolate. The parallel implementation is accomplished by partitioning the FINDS algorithm into two parts: one based on the translational dynamics and the other based on the rotational kinematics. Finally, a multi-rate implementation of the algorithm is presented yielding significantly low execution times with acceptable estimation and FDI performance.
Telerobot local-remote control architecture for space flight program applications
NASA Technical Reports Server (NTRS)
Zimmerman, Wayne; Backes, Paul; Steele, Robert; Long, Mark; Bon, Bruce; Beahan, John
1993-01-01
The JPL Supervisory Telerobotics (STELER) Laboratory has developed and demonstrated a unique local-remote robot control architecture which enables management of intermittent communication bus latencies and delays such as those expected for ground-remote operation of Space Station robotic systems via the Tracking and Data Relay Satellite System (TDRSS) communication platform. The current work at JPL in this area has focused on enhancing the technologies and transferring the control architecture to hardware and software environments which are more compatible with projected ground and space operational environments. At the local site, the operator updates the remote worksite model using stereo video and a model overlay/fitting algorithm which outputs the location and orientation of the object in free space. That information is relayed to the robot User Macro Interface (UMI) to enable programming of the robot control macros. This capability runs on a single Silicon Graphics Inc. machine. The operator can employ either manual teleoperation, shared control, or supervised autonomous control to manipulate the intended object. The remote site controller, called the Modular Telerobot Task Execution System (MOTES), runs in a multi-processor VME environment and performs the task sequencing, task execution, trajectory generation, closed loop force/torque control, task parameter monitoring, and reflex action. This paper describes the new STELER architecture implementation, and also documents the results of the recent autonomous docking task execution using the local site and MOTES.
Comprehensive Oculomotor Behavioral Response Assessment (COBRA)
NASA Technical Reports Server (NTRS)
Stone, Leland S. (Inventor); Liston, Dorion B. (Inventor)
2017-01-01
An eye movement-based methodology and assessment tool may be used to quantify many aspects of human dynamic visual processing using a relatively simple and short oculomotor task, noninvasive video-based eye tracking, and validated oculometric analysis techniques. By examining the eye movement responses to a task including a radially-organized appropriately randomized sequence of Rashbass-like step-ramp pursuit-tracking trials, distinct performance measurements may be generated that may be associated with, for example, pursuit initiation (e.g., latency and open-loop pursuit acceleration), steady-state tracking (e.g., gain, catch-up saccade amplitude, and the proportion of the steady-state response consisting of smooth movement), direction tuning (e.g., oblique effect amplitude, horizontal-vertical asymmetry, and direction noise), and speed tuning (e.g., speed responsiveness and noise). This quantitative approach may provide fast and results (e.g., a multi-dimensional set of oculometrics and a single scalar impairment index) that can be interpreted by one without a high degree of scientific sophistication or extensive training.
GPU Multi-Scale Particle Tracking and Multi-Fluid Simulations of the Radiation Belts
NASA Astrophysics Data System (ADS)
Ziemba, T.; Carscadden, J.; O'Donnell, D.; Winglee, R.; Harnett, E.; Cash, M.
2007-12-01
The properties of the radiation belts can vary dramatically under the influence of magnetic storms and storm-time substorms. The task of understanding and predicting radiation belt properties is made difficult because their properties determined by global processes as well as small-scale wave-particle interactions. A full solution to the problem will require major innovations in technique and computer hardware. The proposed work will demonstrates liked particle tracking codes with new multi-scale/multi-fluid global simulations that provide the first means to include small-scale processes within the global magnetospheric context. A large hurdle to the problem is having sufficient computer hardware that is able to handle the dissipate temporal and spatial scale sizes. A major innovation of the work is that the codes are designed to run of graphics processing units (GPUs). GPUs are intrinsically highly parallelized systems that provide more than an order of magnitude computing speed over a CPU based systems, for little more cost than a high end-workstation. Recent advancements in GPU technologies allow for full IEEE float specifications with performance up to several hundred GFLOPs per GPU and new software architectures have recently become available to ease the transition from graphics based to scientific applications. This allows for a cheap alternative to standard supercomputing methods and should increase the time to discovery. A demonstration of the code pushing more than 500,000 particles faster than real time is presented, and used to provide new insight into radiation belt dynamics.
A parallel and modular deformable cell Car-Parrinello code
NASA Astrophysics Data System (ADS)
Cavazzoni, Carlo; Chiarotti, Guido L.
1999-12-01
We have developed a modular parallel code implementing the Car-Parrinello [Phys. Rev. Lett. 55 (1985) 2471] algorithm including the variable cell dynamics [Europhys. Lett. 36 (1994) 345; J. Phys. Chem. Solids 56 (1995) 510]. Our code is written in Fortran 90, and makes use of some new programming concepts like encapsulation, data abstraction and data hiding. The code has a multi-layer hierarchical structure with tree like dependences among modules. The modules include not only the variables but also the methods acting on them, in an object oriented fashion. The modular structure allows easier code maintenance, develop and debugging procedures, and is suitable for a developer team. The layer structure permits high portability. The code displays an almost linear speed-up in a wide range of number of processors independently of the architecture. Super-linear speed up is obtained with a "smart" Fast Fourier Transform (FFT) that uses the available memory on the single node (increasing for a fixed problem with the number of processing elements) as temporary buffer to store wave function transforms. This code has been used to simulate water and ammonia at giant planet conditions for systems as large as 64 molecules for ˜50 ps.
Study of a hybrid multispectral processor
NASA Technical Reports Server (NTRS)
Marshall, R. E.; Kriegler, F. J.
1973-01-01
A hybrid processor is described offering enough handling capacity and speed to process efficiently the large quantities of multispectral data that can be gathered by scanner systems such as MSDS, SKYLAB, ERTS, and ERIM M-7. Combinations of general-purpose and special-purpose hybrid computers were examined to include both analog and digital types as well as all-digital configurations. The current trend toward lower costs for medium-scale digital circuitry suggests that the all-digital approach may offer the better solution within the time frame of the next few years. The study recommends and defines such a hybrid digital computing system in which both special-purpose and general-purpose digital computers would be employed. The tasks of recognizing surface objects would be performed in a parallel, pipeline digital system while the tasks of control and monitoring would be handled by a medium-scale minicomputer system. A program to design and construct a small, prototype, all-digital system has been started.
Design of Smart Multi-Functional Integrated Aviation Photoelectric Payload
NASA Astrophysics Data System (ADS)
Zhang, X.
2018-04-01
To coordinate with the small UAV at reconnaissance mission, we've developed a smart multi-functional integrated aviation photoelectric payload. The payload weighs only 1kg, and has a two-axis stabilized platform with visible task payload, infrared task payload, laser pointers and video tracker. The photoelectric payload could complete the reconnaissance tasks above the target area (including visible and infrared). Because of its light weight, small size, full-featured, high integrated, the constraints of the UAV platform carrying the payload will be reduced a lot, which helps the payload suit for more extensive using occasions. So all users of this type of smart multi-functional integrated aviation photoelectric payload will do better works on completion of the ground to better pinpoint targets, artillery calibration, assessment of observe strike damage, customs officials and other tasks.
Implementing An Image Understanding System Architecture Using Pipe
NASA Astrophysics Data System (ADS)
Luck, Randall L.
1988-03-01
This paper will describe PIPE and how it can be used to implement an image understanding system. Image understanding is the process of developing a description of an image in order to make decisions about its contents. The tasks of image understanding are generally split into low level vision and high level vision. Low level vision is performed by PIPE -a high performance parallel processor with an architecture specifically designed for processing video images at up to 60 fields per second. High level vision is performed by one of several types of serial or parallel computers - depending on the application. An additional processor called ISMAP performs the conversion from iconic image space to symbolic feature space. ISMAP plugs into one of PIPE's slots and is memory mapped into the high level processor. Thus it forms the high speed link between the low and high level vision processors. The mechanisms for bottom-up, data driven processing and top-down, model driven processing are discussed.
NASA Technical Reports Server (NTRS)
Tan, Choon-Sooi; Suder, Kenneth (Technical Monitor)
2003-01-01
A framework for an effective computational methodology for characterizing the stability and the impact of distortion in high-speed multi-stage compressor is being developed. The methodology consists of using a few isolated-blade row Navier-Stokes solutions for each blade row to construct a body force database. The purpose of the body force database is to replace each blade row in a multi-stage compressor by a body force distribution to produce same pressure rise and flow turning. To do this, each body force database is generated in such a way that it can respond to the changes in local flow conditions. Once the database is generated, no hrther Navier-Stokes computations are necessary. The process is repeated for every blade row in the multi-stage compressor. The body forces are then embedded as source terms in an Euler solver. The method is developed to have the capability to compute the performance in a flow that has radial as well as circumferential non-uniformity with a length scale larger than a blade pitch; thus it can potentially be used to characterize the stability of a compressor under design. It is these two latter features as well as the accompanying procedure to obtain the body force representation that distinguish the present methodology from the streamline curvature method. The overall computational procedures have been developed. A dimensional analysis was carried out to determine the local flow conditions for parameterizing the magnitudes of the local body force representation of blade rows. An Euler solver was modified to embed the body forces as source terms. The results from the dimensional analysis show that the body forces can be parameterized in terms of the two relative flow angles, the relative Mach number, and the Reynolds number. For flow in a high-speed transonic blade row, they can be parameterized in terms of the local relative Mach number alone.
Distributed digital signal processors for multi-body structures
NASA Technical Reports Server (NTRS)
Lee, Gordon K.
1990-01-01
Several digital filter designs were investigated which may be used to process sensor data from large space structures and to design digital hardware to implement the distributed signal processing architecture. Several experimental tests articles are available at NASA Langley Research Center to evaluate these designs. A summary of some of the digital filter designs is presented, an evaluation of their characteristics relative to control design is discussed, and candidate hardware microcontroller/microcomputer components are given. Future activities include software evaluation of the digital filter designs and actual hardware inplementation of some of the signal processor algorithms on an experimental testbed at NASA Langley.
A miniature on-chip multi-functional ECG signal processor with 30 µW ultra-low power consumption.
Liu, Xin; Zheng, Yuan Jin; Phyu, Myint Wai; Zhao, Bin; Je, Minkyu; Yuan, Xiao Jun
2010-01-01
In this paper, a miniature low-power Electrocardiogram (ECG) signal processing application specific integrated circuit (ASIC) chip is proposed. This chip provides multiple critical functions for ECG analysis using a systematic wavelet transform algorithm and a novel SRAM-based ASIC architecture, while achieves low cost and high performance. Using 0.18 µm CMOS technology and 1 V power supply, this ASIC chip consumes only 29 µW and occupies an area of 3 mm(2). This on-chip ECG processor is highly suitable for reliable real-time cardiac status monitoring applications.
Tay, Laura; Lim, Wee Shiong; Chan, Mark; Ali, Noorhazlina; Chong, Mei Sian
2016-01-01
Gait disorders are common in early dementia, with particularly pronounced dual-task deficits, contributing to the increased fall risk and mobility decline associated with cognitive impairment. This study examines the effects of a combined cognitive stimulation and physical exercise programme (MINDVital) on gait performance under single- and dual-task conditions in older adults with mild dementia. Thirty-nine patients with early dementia participated in a multi-disciplinary rehabilitation programme comprising both physical exercise and cognitive stimulation. The programme was conducted in 8-week cycles with participants attending once weekly, and all participants completed 2 successive cycles. Cognitive, functional performance and behavioural symptoms were assessed at baseline and at the end of each 8-week cycle. Gait speed was examined under both single- (Timed Up and Go and 6-metre walk tests) and dual-task (animal category and serial counting) conditions. A random effects model was performed for the independent effect of MINDVital on the primary outcome variable of gait speed under dual-task conditions. The mean age of patients enroled in the rehabilitation programme was 79 ± 6.2 years; 25 (64.1%) had a diagnosis of Alzheimer's dementia, and 26 (66.7%) were receiving a cognitive enhancer therapy. There was a significant improvement in cognitive performance [random effects coefficient (standard error) = 0.90 (0.31), p = 0.003] and gait speed under both dual-task situations [animal category: random effects coefficient = 0.04 (0.02), p = 0.039; serial counting: random effects coefficient = 0.05 (0.02), p = 0.013], with reduced dual-task cost for gait speed [serial counting: random effects coefficient = -4.05 (2.35), p = 0.086] following successive MINDVital cycles. No significant improvement in single-task gait speed was observed. Improved cognitive performance over time was a significant determinant of changes in dual-task gait speed [random effects coefficients = 0.01 (0.005), p = 0.048, and 0.02 (0.005), p = 0.003 for category fluency and counting backwards, respectively]. A combined physical and cognitive rehabilitation programme leads to significant improvements in dual-task walking in early dementia, which may be contributed by improvement in cognitive performance, as single-task gait performance remained stable. © 2016 S. Karger AG, Basel.
USDA-ARS?s Scientific Manuscript database
The term bovine viral diarrhea (BVD) has come to refer to a diverse collection of clinical presentations that include respiratory, enteric and reproductive symptoms accompanied by immunosuppression. While the majority of cases are subclinical in nature two forms exist, mucosal disease and hemorrhag...
Collective Machine Learning: Team Learning and Classification in Multi-Agent Systems
ERIC Educational Resources Information Center
Gifford, Christopher M.
2009-01-01
This dissertation focuses on the collaboration of multiple heterogeneous, intelligent agents (hardware or software) which collaborate to learn a task and are capable of sharing knowledge. The concept of collaborative learning in multi-agent and multi-robot systems is largely under studied, and represents an area where further research is needed to…
Developing a Multi-Dimensional Evaluation Framework for Faculty Teaching and Service Performance
ERIC Educational Resources Information Center
Baker, Diane F.; Neely, Walter P.; Prenshaw, Penelope J.; Taylor, Patrick A.
2015-01-01
A task force was created in a small, AACSB-accredited business school to develop a more comprehensive set of standards for faculty performance. The task force relied heavily on faculty input to identify and describe key dimensions that capture effective teaching and service performance. The result is a multi-dimensional framework that will be used…
ERIC Educational Resources Information Center
Iwanami, Akira; Okajima, Yuka; Ota, Haruhisa; Tani, Masayuki; Yamada, Takashi; Hashimoro, Ryuichiro; Kanai, Chieko; Watanabe, Hiromi; Yamasue, Hidenori; Kawakubo, Yuki; Kato, Nobumasa
2011-01-01
Dysfunction of the prefrontal cortex has been previously reported in individuals with Asperger's disorder. In the present study, we used multi-channel near-infrared spectroscopy (NIRS) to detect changes in the oxygenated hemoglobin concentration ([oxy-Hb]) during two verbal fluency tasks. The subjects were 20 individuals with Asperger's disorder…
High-performance sparse matrix-matrix products on Intel KNL and multicore architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nagasaka, Y; Matsuoka, S; Azad, A
Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive that is widely used in areas ranging from traditional numerical applications to recent big data analysis and machine learning. Although many SpGEMM algorithms have been proposed, hardware specific optimizations for multi- and many-core processors are lacking and a detailed analysis of their performance under various use cases and matrices is not available. We firstly identify and mitigate multiple bottlenecks with memory management and thread scheduling on Intel Xeon Phi (Knights Landing or KNL). Specifically targeting multi- and many-core processors, we develop a hash-table-based algorithm and optimize a heap-based shared-memory SpGEMM algorithm. Wemore » examine their performance together with other publicly available codes. Different from the literature, our evaluation also includes use cases that are representative of real graph algorithms, such as multi-source breadth-first search or triangle counting. Our hash-table and heap-based algorithms are showing significant speedups from libraries in the majority of the cases while different algorithms dominate the other scenarios with different matrix size, sparsity, compression factor and operation type. We wrap up in-depth evaluation results and make a recipe to give the best SpGEMM algorithm for target scenario. A critical finding is that hash-table-based SpGEMM gets a significant performance boost if the nonzeros are not required to be sorted within each row of the output matrix.« less
Improved Sensitivity Spontaneous Raman Scattering Multi-Gas Sensor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Buric, Michael P.; Chen, Kevin P.; Falk, Joel
2009-01-01
We report a backward-wave spontaneous-Raman multi-gas sensor employing a hollow-core photonic-bandgap-fiber to contain gasses and increase interaction length. Silica Raman noise and detection speed are reduced using a digital spatial filter and a cladding seal.
Adjustment of driver behavior to an urban multi-lane roundabout.
DOT National Transportation Integrated Search
2007-05-01
In the summer of 2006, the city of Springfield, Oregon installed the first urban multi-lane roundabout in the state. It was hypothesized that after installation, speed variability on approaches to the intersection would decrease from the values with ...
An SVM-Based Solution for Fault Detection in Wind Turbines
Santos, Pedro; Villa, Luisa F.; Reñones, Aníbal; Bustillo, Andres; Maudes, Jesús
2015-01-01
Research into fault diagnosis in machines with a wide range of variable loads and speeds, such as wind turbines, is of great industrial interest. Analysis of the power signals emitted by wind turbines for the diagnosis of mechanical faults in their mechanical transmission chain is insufficient. A successful diagnosis requires the inclusion of accelerometers to evaluate vibrations. This work presents a multi-sensory system for fault diagnosis in wind turbines, combined with a data-mining solution for the classification of the operational state of the turbine. The selected sensors are accelerometers, in which vibration signals are processed using angular resampling techniques and electrical, torque and speed measurements. Support vector machines (SVMs) are selected for the classification task, including two traditional and two promising new kernels. This multi-sensory system has been validated on a test-bed that simulates the real conditions of wind turbines with two fault typologies: misalignment and imbalance. Comparison of SVM performance with the results of artificial neural networks (ANNs) shows that linear kernel SVM outperforms other kernels and ANNs in terms of accuracy, training and tuning times. The suitability and superior performance of linear SVM is also experimentally analyzed, to conclude that this data acquisition technique generates linearly separable datasets. PMID:25760051
NASA Technical Reports Server (NTRS)
Bloomberg, J. J.; Peters, B. T.; Mulavara, A. P.; Brady, R. A.; Batson, C. D.; Miller, C. A.; Ploutz-Snyder, R. J.; Guined, J. R.; Buxton, R. E.; Cohen, H. S.
2011-01-01
During exploration-class missions, sensorimotor disturbances may lead to disruption in the ability to ambulate and perform functional tasks during the initial introduction to a novel gravitational environment following a landing on a planetary surface. The overall goal of our current project is to develop a sensorimotor adaptability training program to facilitate rapid adaptation to these environments. We have developed a unique training system comprised of a treadmill placed on a motion-base facing a virtual visual scene. It provides an unstable walking surface combined with incongruent visual flow designed to enhance sensorimotor adaptability. Greater metabolic cost incurred during balance instability means more physical work is required during adaptation to new environments possibly affecting crewmembers? ability to perform mission critical tasks during early surface operations on planetary expeditions. The goal of this study was to characterize adaptation to a discordant sensory challenge across a number of performance modalities including locomotor stability, multi-tasking ability and metabolic cost. METHODS: Subjects (n=15) walked (4.0 km/h) on a treadmill for an 8 -minute baseline walking period followed by 20-minutes of walking (4.0 km/h) with support surface motion (0.3 Hz, sinusoidal lateral motion, peak amplitude 25.4 cm) provided by the treadmill/motion-base system. Stride frequency and auditory reaction time were collected as measures of locomotor stability and multi-tasking ability, respectively. Metabolic data (VO2) were collected via a portable metabolic gas analysis system. RESULTS: At the onset of lateral support surface motion, subj ects walking on our treadmill showed an increase in stride frequency and auditory reaction time indicating initial balance and multi-tasking disturbances. During the 20-minute adaptation period, balance control and multi-tasking performance improved. Similarly, throughout the 20-minute adaptation period, VO2 gradually decreased following an initial increase after the onset of support surface motion. DISCUSSION: Resu lts confirmed that walking in discordant conditions not only compromises locomotor stability and the ability to multi-task, but comes at a quantifiable metabolic cost. Importantly, like locomotor stability and multi-tasking ability, metabolic expenditure while walking in discordant sensory conditions improved during adaptation. This confirms that sensorimotor adaptability training can benefit multiple performance parameters central to the successful completion of critical mission tasks.
Ultrasonic tomography for in-process measurements of temperature in a multi-phase medium
Beller, Laurence S.
1993-01-01
A method and apparatus for the in-process measurement of internal particulate temperature utilizing ultrasonic tomography techniques to determine the speed of sound through a specimen material. Ultrasonic pulses are transmitted through a material, which can be a multi-phase material, over known flight paths and the ultrasonic pulse transit times through all sectors of the specimen are measured to determine the speed of sound. The speed of sound being a function of temperature, it is possible to establish the correlation between speed of sound and temperature, throughout a cross-section of the material, which correlation is programmed into a computer to provide for a continuous in-process measurement of temperature throughout the specimen.
Shi, Qi; Abdel-Aty, Mohamed; Yu, Rongjie
2016-03-01
In traffic safety studies, crash frequency modeling of total crashes is the cornerstone before proceeding to more detailed safety evaluation. The relationship between crash occurrence and factors such as traffic flow and roadway geometric characteristics has been extensively explored for a better understanding of crash mechanisms. In this study, a multi-level Bayesian framework has been developed in an effort to identify the crash contributing factors on an urban expressway in the Central Florida area. Two types of traffic data from the Automatic Vehicle Identification system, which are the processed data capped at speed limit and the unprocessed data retaining the original speed were incorporated in the analysis along with road geometric information. The model framework was proposed to account for the hierarchical data structure and the heterogeneity among the traffic and roadway geometric data. Multi-level and random parameters models were constructed and compared with the Negative Binomial model under the Bayesian inference framework. Results showed that the unprocessed traffic data was superior. Both multi-level models and random parameters models outperformed the Negative Binomial model and the models with random parameters achieved the best model fitting. The contributing factors identified imply that on the urban expressway lower speed and higher speed variation could significantly increase the crash likelihood. Other geometric factors were significant including auxiliary lanes and horizontal curvature. Copyright © 2015 Elsevier Ltd. All rights reserved.
MHD code using multi graphical processing units: SMAUG+
NASA Astrophysics Data System (ADS)
Gyenge, N.; Griffiths, M. K.; Erdélyi, R.
2018-01-01
This paper introduces the Sheffield Magnetohydrodynamics Algorithm Using GPUs (SMAUG+), an advanced numerical code for solving magnetohydrodynamic (MHD) problems, using multi-GPU systems. Multi-GPU systems facilitate the development of accelerated codes and enable us to investigate larger model sizes and/or more detailed computational domain resolutions. This is a significant advancement over the parent single-GPU MHD code, SMAUG (Griffiths et al., 2015). Here, we demonstrate the validity of the SMAUG + code, describe the parallelisation techniques and investigate performance benchmarks. The initial configuration of the Orszag-Tang vortex simulations are distributed among 4, 16, 64 and 100 GPUs. Furthermore, different simulation box resolutions are applied: 1000 × 1000, 2044 × 2044, 4000 × 4000 and 8000 × 8000 . We also tested the code with the Brio-Wu shock tube simulations with model size of 800 employing up to 10 GPUs. Based on the test results, we observed speed ups and slow downs, depending on the granularity and the communication overhead of certain parallel tasks. The main aim of the code development is to provide massively parallel code without the memory limitation of a single GPU. By using our code, the applied model size could be significantly increased. We demonstrate that we are able to successfully compute numerically valid and large 2D MHD problems.
What’s the Story? The Tale of Reading Fluency Told at Speed
Benjamin, Christopher F. A.; Gaab, Nadine
2012-01-01
Fluent readers process written text rapidly and accurately, and comprehend what they read. Historically, reading fluency has been modeled as the product of discrete skills such as single word decoding. More recent conceptualizations emphasize that fluent reading is the product of competency in, and the coordination of, multiple cognitive sub-skills (a multi-componential view). In this study, we examined how the pattern of activation in core reading regions changes as the ability to read fluently is manipulated through reading speed. We evaluated 13 right-handed adults with a novel fMRI task assessing fluent sentence reading and lower-order letter reading at each participant’s normal fluent reading speed, as well as constrained (slowed) and accelerated reading speeds. Comparing fluent reading conditions with rest revealed regions including bilateral occipito-fusiform, left middle temporal, and inferior frontal gyral clusters across reading speeds. The selectivity of these regions’ responses to fluent sentence reading was shown by comparison with the letter reading task. Region of interest analyses showed that at constrained and accelerated speeds these regions responded significantly more to fluent sentence reading. Critically, as reading speed increased, activation increased in a single reading-related region: occipital/fusiform cortex (left > right). These results demonstrate that while brain regions engaged in reading respond selectively during fluent reading, these regions respond differently as the ability to read fluently is manipulated. Implications for our understanding of reading fluency, reading development, and reading disorders are discussed. PMID:21954000
Attention in a multi-task environment
NASA Technical Reports Server (NTRS)
Andre, Anthony D.; Heers, Susan T.
1993-01-01
Two experiments used a low fidelity multi-task simulation to investigate the effects of cue specificity on task preparation and performance. Subjects performed a continuous compensatory tracking task and were periodically prompted to perform one of several concurrent secondary tasks. The results provide strong evidence that subjects enacted a strategy to actively divert resources towards secondary task preparation only when they had specific information about an upcoming task to be performed. However, this strategy was not as much affected by the type of task cued (Experiment 1) or its difficulty level (Experiment 2). Overall, subjects seemed aware of both the costs (degraded primary task tracking) and benefits (improved secondary task performance) of cue information. Implications of the present results for computational human performance/workload models are discussed.
Exploring microwave resonant multi-point ignition using high-speed schlieren imaging
NASA Astrophysics Data System (ADS)
Liu, Cheng; Zhang, Guixin; Xie, Hong; Deng, Lei; Wang, Zhi
2018-03-01
Microwave plasma offers a potential method to achieve rapid combustion in a high-speed combustor. In this paper, microwave resonant multi-point ignition and its control method have been studied via high-speed schlieren imaging. The experiment was conducted with the microwave resonant ignition system and the schlieren optical system. The microwave pulse in 2.45 GHz with 2 ms width and 3 kW peak power was employed as an ignition energy source to produce initial flame kernels in the combustion chamber. A reflective schlieren method was designed to illustrate the flame development process with a high-speed camera. The bottom of the combustion chamber was made of a quartz glass coated with indium tin oxide, which ensures sufficient microwave reflection and light penetration. Ignition experiments were conducted at 2 bars of stoichiometric methane-air mixtures. Schlieren images show that flame kernels were generated at more than one location simultaneously and flame propagated with different speeds in different flame kernels. Ignition kernels were discussed in three types according to their appearances. Pressure curves and combustion duration also show that multi-point ignition plays a significant role in accelerating combustion.
Exploring microwave resonant multi-point ignition using high-speed schlieren imaging.
Liu, Cheng; Zhang, Guixin; Xie, Hong; Deng, Lei; Wang, Zhi
2018-03-01
Microwave plasma offers a potential method to achieve rapid combustion in a high-speed combustor. In this paper, microwave resonant multi-point ignition and its control method have been studied via high-speed schlieren imaging. The experiment was conducted with the microwave resonant ignition system and the schlieren optical system. The microwave pulse in 2.45 GHz with 2 ms width and 3 kW peak power was employed as an ignition energy source to produce initial flame kernels in the combustion chamber. A reflective schlieren method was designed to illustrate the flame development process with a high-speed camera. The bottom of the combustion chamber was made of a quartz glass coated with indium tin oxide, which ensures sufficient microwave reflection and light penetration. Ignition experiments were conducted at 2 bars of stoichiometric methane-air mixtures. Schlieren images show that flame kernels were generated at more than one location simultaneously and flame propagated with different speeds in different flame kernels. Ignition kernels were discussed in three types according to their appearances. Pressure curves and combustion duration also show that multi-point ignition plays a significant role in accelerating combustion.
Design and realization of the real-time spectrograph controller for LAMOST based on FPGA
NASA Astrophysics Data System (ADS)
Wang, Jianing; Wu, Liyan; Zeng, Yizhong; Dai, Songxin; Hu, Zhongwen; Zhu, Yongtian; Wang, Lei; Wu, Zhen; Chen, Yi
2008-08-01
A large Schmitt reflector telescope, Large Sky Area Multi-Object Fiber Spectroscopic Telescope(LAMOST), is being built in China, which has effective aperture of 4 meters and can observe the spectra of as many as 4000 objects simultaneously. To fit such a large amount of observational objects, the dispersion part is composed of a set of 16 multipurpose fiber-fed double-beam Schmidt spectrographs, of which each has about ten of moveable components realtimely accommodated and manipulated by a controller. An industrial Ethernet network connects those 16 spectrograph controllers. The light from stars is fed to the entrance slits of the spectrographs with optical fibers. In this paper, we mainly introduce the design and realization of our real-time controller for the spectrograph, our design using the technique of System On Programmable Chip (SOPC) based on Field Programmable Gate Array (FPGA) and then realizing the control of the spectrographs through NIOSII Soft Core Embedded Processor. We seal the stepper motor controller as intellectual property (IP) cores and reuse it, greatly simplifying the design process and then shortening the development time. Under the embedded operating system μC/OS-II, a multi-tasks control program has been well written to realize the real-time control of the moveable parts of the spectrographs. At present, a number of such controllers have been applied in the spectrograph of LAMOST.
NASA Astrophysics Data System (ADS)
Mamonov, V. N.; Nazarov, A. D.; Serov, A. F.; Terekhov, V. I.
2016-01-01
The effect of parameters of the multi-ring Couette system with counter rotating coaxial cylinders on the process of thermal energy release in a viscous liquid filling this system is considered with regard to the problem of determining the possibility of creating the high-performance wind heat generator. The multi-cylinder rotor design allows directly conversion of the mechanical power of a device consisting of two "rotor" wind turbines with a common axis normal to the air flow into the thermal energy in a wide range of rotational speed of the cylinders. Experimental results on the measurement of thermal power released in the pilot heat generator at different relative angular speeds of cylinder rotation are presented.
NASA Astrophysics Data System (ADS)
Liao, S.; Chen, L.; Li, J.; Xiong, W.; Wu, Q.
2015-07-01
Existing spatiotemporal database supports spatiotemporal aggregation query over massive moving objects datasets. Due to the large amounts of data and single-thread processing method, the query speed cannot meet the application requirements. On the other hand, the query efficiency is more sensitive to spatial variation then temporal variation. In this paper, we proposed a spatiotemporal aggregation query method using multi-thread parallel technique based on regional divison and implemented it on the server. Concretely, we divided the spatiotemporal domain into several spatiotemporal cubes, computed spatiotemporal aggregation on all cubes using the technique of multi-thread parallel processing, and then integrated the query results. By testing and analyzing on the real datasets, this method has improved the query speed significantly.
NASA Astrophysics Data System (ADS)
Gong, Hao; Yu, Lifeng; Leng, Shuai; Dilger, Samantha; Zhou, Wei; Ren, Liqiang; McCollough, Cynthia H.
2018-03-01
Channelized Hotelling observer (CHO) has demonstrated strong correlation with human observer (HO) in both single-slice viewing mode and multi-slice viewing mode in low-contrast detection tasks with uniform background. However, it remains unknown if the simplest single-slice CHO in uniform background can be used to predict human observer performance in more realistic tasks that involve patient anatomical background and multi-slice viewing mode. In this study, we aim to investigate the correlation between CHO in a uniform water background and human observer performance at a multi-slice viewing mode on patient liver background for a low-contrast lesion detection task. The human observer study was performed on CT images from 7 abdominal CT exams. A noise insertion tool was employed to synthesize CT scans at two additional dose levels. A validated lesion insertion tool was used to numerically insert metastatic liver lesions of various sizes and contrasts into both phantom and patient images. We selected 12 conditions out of 72 possible experimental conditions to evaluate the correlation at various radiation doses, lesion sizes, lesion contrasts and reconstruction algorithms. CHO with both single and multi-slice viewing modes were strongly correlated with HO. The corresponding Pearson's correlation coefficient was 0.982 (with 95% confidence interval (CI) [0.936, 0.995]) and 0.989 (with 95% CI of [0.960, 0.997]) in multi-slice and single-slice viewing modes, respectively. Therefore, this study demonstrated the potential to use the simplest single-slice CHO to assess image quality for more realistic clinically relevant CT detection tasks.
Efficient Parallel Engineering Computing on Linux Workstations
NASA Technical Reports Server (NTRS)
Lou, John Z.
2010-01-01
A C software module has been developed that creates lightweight processes (LWPs) dynamically to achieve parallel computing performance in a variety of engineering simulation and analysis applications to support NASA and DoD project tasks. The required interface between the module and the application it supports is simple, minimal and almost completely transparent to the user applications, and it can achieve nearly ideal computing speed-up on multi-CPU engineering workstations of all operating system platforms. The module can be integrated into an existing application (C, C++, Fortran and others) either as part of a compiled module or as a dynamically linked library (DLL).
Application of Multi-task Lasso Regression in the Parametrization of Stellar Spectra
NASA Astrophysics Data System (ADS)
Chang, Li-Na; Zhang, Pei-Ai
2015-07-01
The multi-task learning approaches have attracted the increasing attention in the fields of machine learning, computer vision, and artificial intelligence. By utilizing the correlations in tasks, learning multiple related tasks simultaneously is better than learning each task independently. An efficient multi-task Lasso (Least Absolute Shrinkage Selection and Operator) regression algorithm is proposed in this paper to estimate the physical parameters of stellar spectra. It not only can obtain the information about the common features of the different physical parameters, but also can preserve effectively their own peculiar features. Experiments were done based on the ELODIE synthetic spectral data simulated with the stellar atmospheric model, and on the SDSS data released by the American large-scale survey Sloan. The estimation precision of our model is better than those of the methods in the related literature, especially for the estimates of the gravitational acceleration (lg g) and the chemical abundance ([Fe/H]). In the experiments we changed the spectral resolution, and applied the noises with different signal-to-noise ratios (SNRs) to the spectral data, so as to illustrate the stability of the model. The results show that the model is influenced by both the resolution and the noise. But the influence of the noise is larger than that of the resolution. In general, the multi-task Lasso regression algorithm is easy to operate, it has a strong stability, and can also improve the overall prediction accuracy of the model.
ERIC Educational Resources Information Center
Babu, Rakesh; Singh, Rahul
2013-01-01
This paper presents a novel task-oriented, user-centered, multi-method evaluation (TUME) technique and shows how it is useful in providing a more complete, practical and solution-oriented assessment of the accessibility and usability of Learning Management Systems (LMS) for blind and visually impaired (BVI) students. Novel components of TUME…
Multi-Robot Coalitions Formation with Deadlines: Complexity Analysis and Solutions
2017-01-01
Multi-robot task allocation is one of the main problems to address in order to design a multi-robot system, very especially when robots form coalitions that must carry out tasks before a deadline. A lot of factors affect the performance of these systems and among them, this paper is focused on the physical interference effect, produced when two or more robots want to access the same point simultaneously. To our best knowledge, this paper presents the first formal description of multi-robot task allocation that includes a model of interference. Thanks to this description, the complexity of the allocation problem is analyzed. Moreover, the main contribution of this paper is to provide the conditions under which the optimal solution of the aforementioned allocation problem can be obtained solving an integer linear problem. The optimal results are compared to previous allocation algorithms already proposed by the first two authors of this paper and with a new method proposed in this paper. The results obtained show how the new task allocation algorithms reach up more than an 80% of the median of the optimal solution, outperforming previous auction algorithms with a huge reduction of the execution time. PMID:28118384
Multi-Robot Coalitions Formation with Deadlines: Complexity Analysis and Solutions.
Guerrero, Jose; Oliver, Gabriel; Valero, Oscar
2017-01-01
Multi-robot task allocation is one of the main problems to address in order to design a multi-robot system, very especially when robots form coalitions that must carry out tasks before a deadline. A lot of factors affect the performance of these systems and among them, this paper is focused on the physical interference effect, produced when two or more robots want to access the same point simultaneously. To our best knowledge, this paper presents the first formal description of multi-robot task allocation that includes a model of interference. Thanks to this description, the complexity of the allocation problem is analyzed. Moreover, the main contribution of this paper is to provide the conditions under which the optimal solution of the aforementioned allocation problem can be obtained solving an integer linear problem. The optimal results are compared to previous allocation algorithms already proposed by the first two authors of this paper and with a new method proposed in this paper. The results obtained show how the new task allocation algorithms reach up more than an 80% of the median of the optimal solution, outperforming previous auction algorithms with a huge reduction of the execution time.
Ritchie, David W; Kozakov, Dima; Vajda, Sandor
2008-09-01
Predicting how proteins interact at the molecular level is a computationally intensive task. Many protein docking algorithms begin by using fast Fourier transform (FFT) correlation techniques to find putative rigid body docking orientations. Most such approaches use 3D Cartesian grids and are therefore limited to computing three dimensional (3D) translational correlations. However, translational FFTs can speed up the calculation in only three of the six rigid body degrees of freedom, and they cannot easily incorporate prior knowledge about a complex to focus and hence further accelerate the calculation. Furthemore, several groups have developed multi-term interaction potentials and others use multi-copy approaches to simulate protein flexibility, which both add to the computational cost of FFT-based docking algorithms. Hence there is a need to develop more powerful and more versatile FFT docking techniques. This article presents a closed-form 6D spherical polar Fourier correlation expression from which arbitrary multi-dimensional multi-property multi-resolution FFT correlations may be generated. The approach is demonstrated by calculating 1D, 3D and 5D rotational correlations of 3D shape and electrostatic expansions up to polynomial order L=30 on a 2 GB personal computer. As expected, 3D correlations are found to be considerably faster than 1D correlations but, surprisingly, 5D correlations are often slower than 3D correlations. Nonetheless, we show that 5D correlations will be advantageous when calculating multi-term knowledge-based interaction potentials. When docking the 84 complexes of the Protein Docking Benchmark, blind 3D shape plus electrostatic correlations take around 30 minutes on a contemporary personal computer and find acceptable solutions within the top 20 in 16 cases. Applying a simple angular constraint to focus the calculation around the receptor binding site produces acceptable solutions within the top 20 in 28 cases. Further constraining the search to the ligand binding site gives up to 48 solutions within the top 20, with calculation times of just a few minutes per complex. Hence the approach described provides a practical and fast tool for rigid body protein-protein docking, especially when prior knowledge about one or both binding sites is available.
MetAlign 3.0: performance enhancement by efficient use of advances in computer hardware.
Lommen, Arjen; Kools, Harrie J
2012-08-01
A new, multi-threaded version of the GC-MS and LC-MS data processing software, metAlign, has been developed which is able to utilize multiple cores on one PC. This new version was tested using three different multi-core PCs with different operating systems. The performance of noise reduction, baseline correction and peak-picking was 8-19 fold faster compared to the previous version on a single core machine from 2008. The alignment was 5-10 fold faster. Factors influencing the performance enhancement are discussed. Our observations show that performance scales with the increase in processor core numbers we currently see in consumer PC hardware development.
A fast ultrasonic simulation tool based on massively parallel implementations
NASA Astrophysics Data System (ADS)
Lambert, Jason; Rougeron, Gilles; Lacassagne, Lionel; Chatillon, Sylvain
2014-02-01
This paper presents a CIVA optimized ultrasonic inspection simulation tool, which takes benefit of the power of massively parallel architectures: graphical processing units (GPU) and multi-core general purpose processors (GPP). This tool is based on the classical approach used in CIVA: the interaction model is based on Kirchoff, and the ultrasonic field around the defect is computed by the pencil method. The model has been adapted and parallelized for both architectures. At this stage, the configurations addressed by the tool are : multi and mono-element probes, planar specimens made of simple isotropic materials, planar rectangular defects or side drilled holes of small diameter. Validations on the model accuracy and performances measurements are presented.
From Wheatstone to Cameron and beyond: overview in 3-D and 4-D imaging technology
NASA Astrophysics Data System (ADS)
Gilbreath, G. Charmaine
2012-02-01
This paper reviews three-dimensional (3-D) and four-dimensional (4-D) imaging technology, from Wheatstone through today, with some prognostications for near future applications. This field is rich in variety, subject specialty, and applications. A major trend, multi-view stereoscopy, is moving the field forward to real-time wide-angle 3-D reconstruction as breakthroughs in parallel processing and multi-processor computers enable very fast processing. Real-time holography meets 4-D imaging reconstruction at the goal of achieving real-time, interactive, 3-D imaging. Applications to telesurgery and telemedicine as well as to the needs of the defense and intelligence communities are also discussed.
1987-12-01
Synchronization and Data Passing Mechanism ........ 50 4. System Shut Down .................................................................. 51 5...high performance, fault tolerance, and extensibility. These features are attained by synchronizing and coordinating the dis- tributed multicomputer... synchronizing all processors in the network. In a multitransputer network, processes that communicate with each other do so synchronously . This makes
2016-03-01
in-vitro decision to incubate a startup, Lexumo [7], which is developing a commercial Software as a Service ( SaaS ) vulnerability assessment...LTS Label Transition System MUSE Mining and Understanding Software Enclaves RTEMS Real-Time Executive for Multi-processor Systems SaaS Software ...as a Service SSA Static Single Assignment SWE Software Epistemology UD/DU Def-Use/Use-Def Chains (Dataflow Graph)
ERIC Educational Resources Information Center
Fazlioglu, Muge
2017-01-01
This dissertation examines the risk-based approach to privacy and data protection and the role of information sensitivity within risk management. Determining what information carries the greatest risk is a multi-layered challenge that involves balancing the rights and interests of multiple actors, including data controllers, data processors, and…
Zu, Chen; Jie, Biao; Liu, Mingxia; Chen, Songcan
2015-01-01
Multimodal classification methods using different modalities of imaging and non-imaging data have recently shown great advantages over traditional single-modality-based ones for diagnosis and prognosis of Alzheimer’s disease (AD), as well as its prodromal stage, i.e., mild cognitive impairment (MCI). However, to the best of our knowledge, most existing methods focus on mining the relationship across multiple modalities of the same subjects, while ignoring the potentially useful relationship across different subjects. Accordingly, in this paper, we propose a novel learning method for multimodal classification of AD/MCI, by fully exploring the relationships across both modalities and subjects. Specifically, our proposed method includes two subsequent components, i.e., label-aligned multi-task feature selection and multimodal classification. In the first step, the feature selection learning from multiple modalities are treated as different learning tasks and a group sparsity regularizer is imposed to jointly select a subset of relevant features. Furthermore, to utilize the discriminative information among labeled subjects, a new label-aligned regularization term is added into the objective function of standard multi-task feature selection, where label-alignment means that all multi-modality subjects with the same class labels should be closer in the new feature-reduced space. In the second step, a multi-kernel support vector machine (SVM) is adopted to fuse the selected features from multi-modality data for final classification. To validate our method, we perform experiments on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database using baseline MRI and FDG-PET imaging data. The experimental results demonstrate that our proposed method achieves better classification performance compared with several state-of-the-art methods for multimodal classification of AD/MCI. PMID:26572145
Han, Min Cheol; Yeom, Yeon Soo; Lee, Hyun Su; Shin, Bangho; Kim, Chan Hyeong; Furuta, Takuya
2018-05-04
In this study, the multi-threading performance of the Geant4, MCNP6, and PHITS codes was evaluated as a function of the number of threads (N) and the complexity of the tetrahedral-mesh phantom. For this, three tetrahedral-mesh phantoms of varying complexity (simple, moderately complex, and highly complex) were prepared and implemented in the three different Monte Carlo codes, in photon and neutron transport simulations. Subsequently, for each case, the initialization time, calculation time, and memory usage were measured as a function of the number of threads used in the simulation. It was found that for all codes, the initialization time significantly increased with the complexity of the phantom, but not with the number of threads. Geant4 exhibited much longer initialization time than the other codes, especially for the complex phantom (MRCP). The improvement of computation speed due to the use of a multi-threaded code was calculated as the speed-up factor, the ratio of the computation speed on a multi-threaded code to the computation speed on a single-threaded code. Geant4 showed the best multi-threading performance among the codes considered in this study, with the speed-up factor almost linearly increasing with the number of threads, reaching ~30 when N = 40. PHITS and MCNP6 showed a much smaller increase of the speed-up factor with the number of threads. For PHITS, the speed-up factors were low when N = 40. For MCNP6, the increase of the speed-up factors was better, but they were still less than ~10 when N = 40. As for memory usage, Geant4 was found to use more memory than the other codes. In addition, compared to that of the other codes, the memory usage of Geant4 more rapidly increased with the number of threads, reaching as high as ~74 GB when N = 40 for the complex phantom (MRCP). It is notable that compared to that of the other codes, the memory usage of PHITS was much lower, regardless of both the complexity of the phantom and the number of threads, hardly increasing with the number of threads for the MRCP.
Customized binary and multi-level HfO2-x-based memristors tuned by oxidation conditions.
He, Weifan; Sun, Huajun; Zhou, Yaxiong; Lu, Ke; Xue, Kanhao; Miao, Xiangshui
2017-08-30
The memristor is a promising candidate for the next generation non-volatile memory, especially based on HfO 2-x , given its compatibility with advanced CMOS technologies. Although various resistive transitions were reported independently, customized binary and multi-level memristors in unified HfO 2-x material have not been studied. Here we report Pt/HfO 2-x /Ti memristors with double memristive modes, forming-free and low operation voltage, which were tuned by oxidation conditions of HfO 2-x films. As O/Hf ratios of HfO 2-x films increase, the forming voltages, SET voltages, and R off /R on windows increase regularly while their resistive transitions undergo from gradually to sharply in I/V sweep. Two memristors with typical resistive transitions were studied to customize binary and multi-level memristive modes, respectively. For binary mode, high-speed switching with 10 3 pulses (10 ns) and retention test at 85 °C (>10 4 s) were achieved. For multi-level mode, the 12-levels stable resistance states were confirmed by ongoing multi-window switching (ranging from 10 ns to 1 μs and completing 10 cycles of each pulse). Our customized binary and multi-level HfO 2-x -based memristors show high-speed switching, multi-level storage and excellent stability, which can be separately applied to logic computing and neuromorphic computing, further suitable for in-memory computing chip when deposition atmosphere may be fine-tuned.
Optimal processor assignment for pipeline computations
NASA Technical Reports Server (NTRS)
Nicol, David M.; Simha, Rahul; Choudhury, Alok N.; Narahari, Bhagirath
1991-01-01
The availability of large scale multitasked parallel architectures introduces the following processor assignment problem for pipelined computations. Given a set of tasks and their precedence constraints, along with their experimentally determined individual responses times for different processor sizes, find an assignment of processor to tasks. Two objectives are of interest: minimal response given a throughput requirement, and maximal throughput given a response time requirement. These assignment problems differ considerably from the classical mapping problem in which several tasks share a processor; instead, it is assumed that a large number of processors are to be assigned to a relatively small number of tasks. Efficient assignment algorithms were developed for different classes of task structures. For a p processor system and a series parallel precedence graph with n constituent tasks, an O(np2) algorithm is provided that finds the optimal assignment for the response time optimization problem; it was found that the assignment optimizing the constrained throughput in O(np2log p) time. Special cases of linear, independent, and tree graphs are also considered.
Conesa-Muñoz, Jesús; Gonzalez-de-Soto, Mariano; Gonzalez-de-Santos, Pablo; Ribeiro, Angela
2015-03-05
This paper describes a supervisor system for monitoring the operation of automated agricultural vehicles. The system analyses all of the information provided by the sensors and subsystems on the vehicles in real time and notifies the user when a failure or potentially dangerous situation is detected. In some situations, it is even able to execute a neutralising protocol to remedy the failure. The system is based on a distributed and multi-level architecture that divides the supervision into different subsystems, allowing for better management of the detection and repair of failures. The proposed supervision system was developed to perform well in several scenarios, such as spraying canopy treatments against insects and diseases and selective weed treatments, by either spraying herbicide or burning pests with a mechanical-thermal actuator. Results are presented for selective weed treatment by the spraying of herbicide. The system successfully supervised the task; it detected failures such as service disruptions, incorrect working speeds, incorrect implement states, and potential collisions. Moreover, the system was able to prevent collisions between vehicles by taking action to avoid intersecting trajectories. The results show that the proposed system is a highly useful tool for managing fleets of autonomous vehicles. In particular, it can be used to manage agricultural vehicles during treatment operations.
Conesa-Muñoz, Jesús; Gonzalez-de-Soto, Mariano; Gonzalez-de-Santos, Pablo; Ribeiro, Angela
2015-01-01
This paper describes a supervisor system for monitoring the operation of automated agricultural vehicles. The system analyses all of the information provided by the sensors and subsystems on the vehicles in real time and notifies the user when a failure or potentially dangerous situation is detected. In some situations, it is even able to execute a neutralising protocol to remedy the failure. The system is based on a distributed and multi-level architecture that divides the supervision into different subsystems, allowing for better management of the detection and repair of failures. The proposed supervision system was developed to perform well in several scenarios, such as spraying canopy treatments against insects and diseases and selective weed treatments, by either spraying herbicide or burning pests with a mechanical-thermal actuator. Results are presented for selective weed treatment by the spraying of herbicide. The system successfully supervised the task; it detected failures such as service disruptions, incorrect working speeds, incorrect implement states, and potential collisions. Moreover, the system was able to prevent collisions between vehicles by taking action to avoid intersecting trajectories. The results show that the proposed system is a highly useful tool for managing fleets of autonomous vehicles. In particular, it can be used to manage agricultural vehicles during treatment operations. PMID:25751079
Qin, Yuhong; Zhang, Jingru; Zhang, Yuan; Li, Fangbing; Han, Yongtao; Zou, Nan; Xu, Haowei; Qian, Meiyuan; Pan, Canping
2016-09-02
An automated multi-plug filtration cleanup (m-PFC) method on modified QuEChERS (quick, easy, cheap, effective, rugged, and safe) extracts was developed. The automatic device was aimed to reduce labor-consuming manual operation workload in the cleanup steps. It could control the volume and the speed of pulling and pushing cycles accurately. In this work, m-PFC was based on multi-walled carbon nanotubes (MWCNTs) mixed with other sorbents and anhydrous magnesium sulfate (MgSO4) in a packed tip for analysis of pesticide multi-residues in crop commodities followed by liquid chromatography with tandem mass spectrometric (LC-MS/MS) detection. It was validated by analyzing 25 pesticides in six representative matrices spiked at two concentration levels of 10 and 100μg/kg. Salts, sorbents, m-PFC procedure, automated pulling and pushing volume, automated pulling speed, and pushing speed for each matrix were optimized. After optimization, two general automated m-PFC methods were introduced to relatively simple (apple, citrus fruit, peanut) and relatively complex (spinach, leek, green tea) matrices. Spike recoveries were within 83 and 108% and 1-14% RSD for most analytes in the tested matrices. Matrix-matched calibrations were performed with the coefficients of determination >0.997 between concentration levels of 10 and 1000μg/kg. The developed method was successfully applied to the determination of pesticide residues in market samples. Copyright © 2016 Elsevier B.V. All rights reserved.
Advancement and Application of Multi-Phase CFD Modeling to High Speed Supercavitating Flows
2013-08-13
5a. CONTRACT NUMBER 5b. GRANT NUMBER N00014-09-1-0042 5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) Jules W. Lindau and Michael P. Kinzel 5d. PROJECT...REPORT U b. ABSTRACT U c. THIS PAGE U 17. LIMITATION OF ABSTRACT U 18. NUMBER OF PAGES 29 19a. NAME OF RESPONSIBLE PERSON Jules W. Lindau...Application of Multi-Phase CFD Modeling to High Speed Supercavitating Flows Michael P. Kinzel Jules W. Lindau Penn State University Applied Research
Behavior-based multi-robot collaboration for autonomous construction tasks
NASA Technical Reports Server (NTRS)
Stroupe, Ashley; Huntsberger, Terry; Okon, Avi; Aghazarian, Hrand; Robinson, Matthew
2005-01-01
The Robot Construction Crew (RCC) is a heterogeneous multi-robot system for autonomous construction of a structure through assembly of Long components. The two robot team demonstrates component placement into an existing structure in a realistic environment. The task requires component acquisition, cooperative transport, and cooperative precision manipulation. A behavior-based architecture provides adaptability. The RCC approach minimizes computation, power, communication, and sensing for applicability to space-related construction efforts, but the techniques are applicable to terrestrial construction tasks.
Behavior-Based Multi-Robot Collaboration for Autonomous Construction Tasks
NASA Technical Reports Server (NTRS)
Stroupe, Ashley; Huntsberger, Terry; Okon, Avi; Aghazarian, Hrand; Robinson, Matthew
2005-01-01
We present a heterogeneous multi-robot system for autonomous construction of a structure through assembly of long components. Placement of a component within an existing structure in a realistic environment is demonstrated on a two-robot team. The task requires component acquisition, cooperative transport, and cooperative precision manipulation. Far adaptability, the system is designed as a behavior-based architecture. Far applicability to space-related construction efforts, computation, power, communication, and sensing are minimized, though the techniques developed are also applicable to terrestrial construction tasks.
Collaborative simulation method with spatiotemporal synchronization process control
NASA Astrophysics Data System (ADS)
Zou, Yisheng; Ding, Guofu; Zhang, Weihua; Zhang, Jian; Qin, Shengfeng; Tan, John Kian
2016-10-01
When designing a complex mechatronics system, such as high speed trains, it is relatively difficult to effectively simulate the entire system's dynamic behaviors because it involves multi-disciplinary subsystems. Currently,a most practical approach for multi-disciplinary simulation is interface based coupling simulation method, but it faces a twofold challenge: spatial and time unsynchronizations among multi-directional coupling simulation of subsystems. A new collaborative simulation method with spatiotemporal synchronization process control is proposed for coupling simulating a given complex mechatronics system across multiple subsystems on different platforms. The method consists of 1) a coupler-based coupling mechanisms to define the interfacing and interaction mechanisms among subsystems, and 2) a simulation process control algorithm to realize the coupling simulation in a spatiotemporal synchronized manner. The test results from a case study show that the proposed method 1) can certainly be used to simulate the sub-systems interactions under different simulation conditions in an engineering system, and 2) effectively supports multi-directional coupling simulation among multi-disciplinary subsystems. This method has been successfully applied in China high speed train design and development processes, demonstrating that it can be applied in a wide range of engineering systems design and simulation with improved efficiency and effectiveness.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, Allan Ray
1987-05-01
Increases in high speed hardware have mandated studies in software techniques to exploit the parallel capabilities. This thesis examines the effects a run-time scheduler has on a multiprocessor. The model consists of directed, acyclic graphs, generated from serial FORTRAN benchmark programs by the parallel compiler Parafrase. A multitasked, multiprogrammed environment is created. Dependencies are generated by the compiler. Tasks are bidimensional, i.e., they may specify both time and processor requests. Processor requests may be folded into execution time by the scheduler. The graphs may arrive at arbitrary time intervals. The general case is NP-hard, thus, a variety of heuristics aremore » examined by a simulator. Multiprogramming demonstrates a greater need for a run-time scheduler than does monoprogramming for a variety of reasons, e.g., greater stress on the processors, a larger number of independent control paths, more variety in the task parameters, etc. The dynamic critical path series of algorithms perform well. Dynamic critical volume did not add much. Unfortunately, dynamic critical path maximizes turnaround time as well as throughput. Two schedulers are presented which balance throughput and turnaround time. The first requires classification of jobs by type; the second requires selection of a ratio value which is dependent upon system parameters. 45 refs., 19 figs., 20 tabs.« less
Pryor, Alan; Ophus, Colin; Miao, Jianwei
2017-10-25
Simulation of atomic-resolution image formation in scanning transmission electron microscopy can require significant computation times using traditional methods. A recently developed method, termed plane-wave reciprocal-space interpolated scattering matrix (PRISM), demonstrates potential for significant acceleration of such simulations with negligible loss of accuracy. In this paper, we present a software package called Prismatic for parallelized simulation of image formation in scanning transmission electron microscopy (STEM) using both the PRISM and multislice methods. By distributing the workload between multiple CUDA-enabled GPUs and multicore processors, accelerations as high as 1000 × for PRISM and 15 × for multislice are achieved relative to traditionalmore » multislice implementations using a single 4-GPU machine. We demonstrate a potentially important application of Prismatic, using it to compute images for atomic electron tomography at sufficient speeds to include in the reconstruction pipeline. Prismatic is freely available both as an open-source CUDA/C++ package with a graphical user interface and as a Python package, PyPrismatic.« less
Plant, Richard R; Turner, Garry
2009-08-01
Since the publication of Plant, Hammond, and Turner (2004), which highlighted a pressing need for researchers to pay more attention to sources of error in computer-based experiments, the landscape has undoubtedly changed, but not necessarily for the better. Readily available hardware has improved in terms of raw speed; multi core processors abound; graphics cards now have hundreds of megabytes of RAM; main memory is measured in gigabytes; drive space is measured in terabytes; ever larger thin film transistor displays capable of single-digit response times, together with newer Digital Light Processing multimedia projectors, enable much greater graphic complexity; and new 64-bit operating systems, such as Microsoft Vista, are now commonplace. However, have millisecond-accurate presentation and response timing improved, and will they ever be available in commodity computers and peripherals? In the present article, we used a Black Box ToolKit to measure the variability in timing characteristics of hardware used commonly in psychological research.
A microprocessor based on a two-dimensional semiconductor
Wachter, Stefan; Polyushkin, Dmitry K.; Bethge, Ole; Mueller, Thomas
2017-01-01
The advent of microcomputers in the 1970s has dramatically changed our society. Since then, microprocessors have been made almost exclusively from silicon, but the ever-increasing demand for higher integration density and speed, lower power consumption and better integrability with everyday goods has prompted the search for alternatives. Germanium and III–V compound semiconductors are being considered promising candidates for future high-performance processor generations and chips based on thin-film plastic technology or carbon nanotubes could allow for embedding electronic intelligence into arbitrary objects for the Internet-of-Things. Here, we present a 1-bit implementation of a microprocessor using a two-dimensional semiconductor—molybdenum disulfide. The device can execute user-defined programs stored in an external memory, perform logical operations and communicate with its periphery. Our 1-bit design is readily scalable to multi-bit data. The device consists of 115 transistors and constitutes the most complex circuitry so far made from a two-dimensional material. PMID:28398336
Method and system for data clustering for very large databases
NASA Technical Reports Server (NTRS)
Livny, Miron (Inventor); Zhang, Tian (Inventor); Ramakrishnan, Raghu (Inventor)
1998-01-01
Multi-dimensional data contained in very large databases is efficiently and accurately clustered to determine patterns therein and extract useful information from such patterns. Conventional computer processors may be used which have limited memory capacity and conventional operating speed, allowing massive data sets to be processed in a reasonable time and with reasonable computer resources. The clustering process is organized using a clustering feature tree structure wherein each clustering feature comprises the number of data points in the cluster, the linear sum of the data points in the cluster, and the square sum of the data points in the cluster. A dense region of data points is treated collectively as a single cluster, and points in sparsely occupied regions can be treated as outliers and removed from the clustering feature tree. The clustering can be carried out continuously with new data points being received and processed, and with the clustering feature tree being restructured as necessary to accommodate the information from the newly received data points.
Pryor, Alan; Ophus, Colin; Miao, Jianwei
2017-01-01
Simulation of atomic-resolution image formation in scanning transmission electron microscopy can require significant computation times using traditional methods. A recently developed method, termed plane-wave reciprocal-space interpolated scattering matrix (PRISM), demonstrates potential for significant acceleration of such simulations with negligible loss of accuracy. Here, we present a software package called Prismatic for parallelized simulation of image formation in scanning transmission electron microscopy (STEM) using both the PRISM and multislice methods. By distributing the workload between multiple CUDA-enabled GPUs and multicore processors, accelerations as high as 1000 × for PRISM and 15 × for multislice are achieved relative to traditional multislice implementations using a single 4-GPU machine. We demonstrate a potentially important application of Prismatic , using it to compute images for atomic electron tomography at sufficient speeds to include in the reconstruction pipeline. Prismatic is freely available both as an open-source CUDA/C++ package with a graphical user interface and as a Python package, PyPrismatic .