Science.gov

Sample records for parallel processing strategies

  1. Parallel strategies for SAR processing

    NASA Astrophysics Data System (ADS)

    Segoviano, Jesus A.

    2004-12-01

    This article proposes a series of strategies for speeding up the computer processing of the Synthetic Aperture Radar (SAR) signal, following the three usual lines of action for accelerating the execution of any computer program. On the one hand, it studies the optimization of both the data structures and the application architecture built on them: the data structures usually employed in SAR processing are examined, parallel alternatives are proposed, and the parallelization of the algorithms employed in the process is described. In addition, the parallel application architecture classifies processes as fine- or coarse-grained; these are assigned to individual processors or divided among several processors, each in its corresponding architecture. On the other hand, it considers a hardware improvement: the several kinds of platforms on which parallel SAR processing is implemented, namely shared-memory multiprocessors and distributed-memory multicomputers. A comparison between them yields guidelines for achieving maximum throughput with minimum latency and maximum effectiveness with minimum cost, all within limited complexity. The article concludes that processing the algorithms in a GNU/Linux environment on a Beowulf cluster platform offers, under certain conditions, the best compromise between performance and cost, and promises the greatest potential in the coming years for computationally demanding SAR applications.
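
    As a minimal illustration of the coarse-grained decomposition the abstract describes (a sketch, not code from the article; the per-block FFT work, block count and data sizes are assumptions):

        import numpy as np
        from multiprocessing import Pool

        def process_block(block):
            # Stand-in per-block work: FFT-based filtering along each range line.
            return np.fft.ifft(np.fft.fft(block, axis=1), axis=1).real

        if __name__ == '__main__':
            raw = np.random.rand(1024, 2048)           # toy raw SAR data matrix
            blocks = np.array_split(raw, 8, axis=0)    # coarse-grained split by azimuth block
            with Pool(processes=8) as pool:            # one block per processor
                image = np.vstack(pool.map(process_block, blocks))
            print(image.shape)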

  2. Special parallel processing workshop

    SciTech Connect

    1994-12-01

    This report contains viewgraphs from the Special Parallel Processing Workshop. These viewgraphs deal with topics such as parallel processing performance, message passing, queue structure, and other basic concepts of parallel processing.

  3. Modified parallel cascade control strategy for stable, unstable and integrating processes.

    PubMed

    Raja, G Lloyds; Ali, Ahmad

    2016-11-01

    This manuscript presents a modified parallel cascade control structure (PCCS) for a class of stable, unstable and integrating process models with time delay. The proposed PCCS consists of three controllers. An Internal Model Control (IMC) approach is used to design the disturbance rejection controller in the secondary loop. Parameters of the proportional-integral (PI) controller used for setpoint tracking are obtained by equating the first and second derivatives of the desired and actual closed-loop transfer functions at the origin of the s-plane. The Routh-Hurwitz stability criterion is used to design the proportional-derivative (PD) controller which stabilizes the unstable/integrating primary process model. An analytical expression is proposed for computing the desired closed-loop time constant of the primary loop in terms of plant model parameters so as to achieve a user-defined maximum sensitivity. Based on extensive simulation studies, a suitable value for the secondary closed-loop time constant is also recommended. This is an advantage of the present work over the reported parallel cascade control schemes, where the authors provide only a suitable range of values for the closed-loop time constants. The proposed tuning strategy requires tuning of four/six controller parameters for stable/unstable and integrating process models, which is fewer than in the reported strategies. Simulation results illustrate that the proposed method yields significant improvement in closed-loop performance compared to some of the recently reported tuning strategies for both nominal and perturbed process models.
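
    The derivative-matching step can be sketched symbolically (a hedged illustration only; the first-order-plus-dead-time plant, filter constant and sympy workflow below are assumptions, not the paper's code):

        import sympy as sp

        s, Kc, tau_i = sp.symbols('s K_c tau_i', positive=True)
        K, tau, theta, lam = 1, 2, sp.Rational(1, 2), 1   # assumed FOPDT plant and filter

        Gp = K * sp.exp(-theta * s) / (tau * s + 1)       # plant with dead time
        Gc = Kc * (1 + 1 / (tau_i * s))                   # PI controller
        T_actual = sp.together(Gc * Gp / (1 + Gc * Gp))   # actual closed loop
        T_desired = sp.exp(-theta * s) / (lam * s + 1)    # desired closed loop

        # Matching the first and second derivatives at s = 0 is equivalent to
        # matching the first two Taylor coefficients of the difference.
        ser = sp.series(T_actual - T_desired, s, 0, 3).removeO()
        eqs = [sp.Eq(ser.coeff(s, n), 0) for n in (1, 2)]
        print(sp.solve(eqs, [Kc, tau_i], dict=True))      # PI settings Kc, tau_i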

  4. Parallel-processing with surface plasmons, a new strategy for converting the broad solar spectrum

    NASA Technical Reports Server (NTRS)

    Anderson, L. M.

    1982-01-01

    A new strategy for efficient solar-energy conversion is based on parallel processing with surface plasmons: guided electromagnetic waves supported on thin films of common metals like aluminum or silver. The approach is unique in identifying a broadband carrier with suitable range for energy transport and an inelastic tunneling process which can be used to extract more energy from the more energetic carriers without requiring different materials for each frequency band. The aim is to overcome the fundamental 56-percent loss associated with mismatch between the broad solar spectrum and the monoenergetic conduction electrons used to transport energy in conventional silicon solar cells. This paper presents a qualitative discussion of the unknowns and barrier problems, including ideas for coupling surface plasmons into the tunnels, a step which has been the weak link in the efficiency chain.

  5. Parallel processing optimization strategy based on MapReduce model in cloud storage environment

    NASA Astrophysics Data System (ADS)

    Cui, Jianming; Liu, Jiayi; Li, Qiuyan

    2017-05-01

    Currently, a large number of documents in cloud storage are packaged only after all packets have been received. In this stored procedure from the local transmitter to the server, packing and unpacking consume a lot of time, and transmission efficiency is low as well. A new parallel processing algorithm is proposed to optimize the transmission mode: following the MapReduce model, MPI technology is used to execute the Mapper and Reducer mechanisms in parallel. In simulation experiments on a Hadoop cloud computing platform, this algorithm not only accelerates the file transfer rate but also shortens the waiting time of the Reducer mechanism. It breaks through the traditional sequential transmission constraint and reduces storage coupling to improve transmission efficiency.
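
    The Mapper/Reducer split over MPI can be sketched as follows (a generic word-count stand-in assuming the mpi4py package; this is not the paper's implementation):

        from mpi4py import MPI   # assumes mpi4py; launch with e.g. mpirun -n 4

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        # Root scatters document chunks; every rank runs its Mapper in parallel.
        chunks = None
        if rank == 0:
            docs = ["a b a", "b c", "a c c", "b b"]
            chunks = [docs[i::size] for i in range(size)]
        chunk = comm.scatter(chunks, root=0)

        counts = {}
        for doc in chunk:                        # Mapper: local word counts
            for word in doc.split():
                counts[word] = counts.get(word, 0) + 1

        partials = comm.gather(counts, root=0)   # Reducer input gathered on root
        if rank == 0:
            total = {}
            for part in partials:                # Reducer: merge partial counts
                for w, n in part.items():
                    total[w] = total.get(w, 0) + n
            print(total)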

  6. Parallel processing ITS

    SciTech Connect

    Fan, W.C.; Halbleib, J.A. Sr.

    1996-09-01

    This report provides a users' guide for parallel processing ITS on a UNIX workstation network, a shared-memory multiprocessor or a massively-parallel processor. The parallelized version of ITS is based on a master/slave model with message passing. Parallel issues such as random number generation, load balancing, and communication software are briefly discussed. Timing results for example problems are presented for demonstration purposes.
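
    ITS itself is a Fortran Monte Carlo code; purely to illustrate the master/slave pattern and the independent random streams the report mentions, here is a hedged Python sketch (worker count, batch size and the toy tally are assumptions):

        import numpy as np
        from multiprocessing import Pool

        BATCH = 100_000

        def run_batch(seed):
            # Each slave draws from its own stream so histories stay uncorrelated.
            rng = np.random.default_rng(seed)
            x, y = rng.random(BATCH), rng.random(BATCH)
            return int(np.count_nonzero(x * x + y * y < 1.0))   # toy tally

        if __name__ == '__main__':
            workers = 4
            seeds = np.random.SeedSequence(2024).spawn(workers)  # distinct sub-streams
            with Pool(workers) as pool:                          # master farms out batches
                hits = pool.map(run_batch, seeds)
            print('pi estimate:', 4 * sum(hits) / (workers * BATCH))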

  7. A new strategy for efficient solar energy conversion: Parallel-processing with surface plasmons

    NASA Technical Reports Server (NTRS)

    Anderson, L. M.

    1982-01-01

    This paper introduces an advanced concept for direct conversion of sunlight to electricity, which aims at high efficiency by tailoring the conversion process to separate energy bands within the broad solar spectrum. The objective is to obtain a high level of spectrum-splitting without sequential losses or unique materials for each frequency band. In this concept, sunlight excites a spectrum of surface plasma waves which are processed in parallel on the same metal film. The surface plasmons transport energy to an array of metal-barrier-semiconductor diodes, where energy is extracted by inelastic tunneling. Diodes are tuned to different frequency bands by selecting the operating voltage and geometry, but all diodes share the same materials.

  8. Speeding up parallel processing

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.

    1988-01-01

    In 1967 Amdahl expressed doubts about the ultimate utility of multiprocessors. The formulation, now called Amdahl's law, became part of the computing folklore and has inspired much skepticism about the ability of the current generation of massively parallel processors to efficiently deliver all their computing power to programs. The widely publicized recent results of a group at Sandia National Laboratory, which showed speedup on a 1024 node hypercube of over 500 for three fixed size problems and over 1000 for three scalable problems, have convincingly challenged this bit of folklore and have given new impetus to parallel scientific computing.
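
    As a worked aside (standard formulas, not text from the article): Amdahl's law bounds fixed-size speedup by S(P) = 1 / ((1 - f) + f/P) for parallel fraction f, while the Sandia results are usually read through Gustafson's scaled speedup S(P) = P - (1 - f)(P - 1), where the problem grows with the machine:

        def amdahl(f, p):
            # Fixed-size speedup with parallel fraction f on p processors.
            return 1.0 / ((1.0 - f) + f / p)

        def gustafson(f, p):
            # Scaled speedup when the parallel part grows with the machine.
            return p - (1.0 - f) * (p - 1)

        p = 1024
        print(round(amdahl(0.999, p)))     # ~506: 0.1% serial work caps fixed-size speedup
        print(round(gustafson(0.999, p)))  # ~1023: scaled problems stay near-linear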

  9. Parallel processing of natural language

    SciTech Connect

    Chang, H.O.

    1986-01-01

    Two types of parallel natural language processing are studied in this work: (1) the parallelism between syntactic and nonsyntactic processing and (2) the parallelism within syntactic processing. It is recognized that a syntactic category can potentially be attached to more than one node in the syntactic tree of a sentence. Even if all the attachments are syntactically well-formed, nonsyntactic factors such as semantic and pragmatic considerations may require one particular attachment. Syntactic processing must synchronize and communicate with nonsyntactic processing. Two syntactic processing algorithms are proposed for use in a parallel environment: Earley's algorithm and the LR(k) algorithm. Conditions are identified to detect syntactic ambiguity and the algorithms are augmented accordingly. It is shown that by using nonsyntactic information during syntactic processing, backtracking can be reduced and the performance of the syntactic processor improved. For the second type of parallelism, it is recognized that one portion of a grammar can be isolated from the rest of the grammar and processed by a separate processor. A partial grammar of a larger grammar is defined. Parallel syntactic processing is achieved by using two processors concurrently: the main processor (mp) and the auxiliary processor (ap).

  10. Parallel processing for control applications

    SciTech Connect

    Telford, J. W.

    2001-01-01

    Parallel processing has been a topic of discussion in computer science circles for decades. Using more than one computer to control a process has many advantages that compensate for the additional cost. Initially, multiple computers were used to attain higher speeds. A single CPU could not perform all of the operations necessary for real-time operation. As technology progressed and CPUs became faster, the speed issue became less significant. The additional processing capabilities, however, continue to make high speeds an attractive element of parallel processing. Another reason for multiple processors is reliability. For the purpose of this discussion, reliability and robustness will be the focal point. Most contemporary conceptions of parallel processing include visions of hundreds of single computers networked to provide 'computing power'. Indeed, our own teraflop machines are built from large numbers of computers configured in a network (and thus limited by the network). There are many approaches to parallel configurations, and this presentation offers something slightly different from the contemporary networked model. In the world of embedded computers, which is a pervasive force in contemporary computer controls, there are many single-chip computers available. If one backs away from the PC-based parallel computing model and considers the possibilities of a parallel control device based on multiple single-chip computers, a new area of possibilities becomes apparent. This study will look at the use of multiple single-chip computers in a parallel configuration with emphasis placed on maximum reliability.

  11. Parallel processing and expert systems

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Lau, Sonie

    1991-01-01

    Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism of expert systems. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited.

  12. Tightly integrated single- and multi-crystal data collection strategy calculation and parallelized data processing in JBluIce beamline control system

    SciTech Connect

    Pothineni, Sudhir Babu; Venugopalan, Nagarajan; Ogata, Craig M.; Hilgart, Mark C.; Stepanov, Sergey; Sanishvili, Ruslan; Becker, Michael; Winter, Graeme; Sauter, Nicholas K.; Smith, Janet L.; Fischetti, Robert F.

    2014-11-18

    The calculation of single- and multi-crystal data collection strategies and a data processing pipeline have been tightly integrated into the macromolecular crystallographic data acquisition and beamline control software JBluIce. Both tasks employ wrapper scripts around existing crystallographic software. JBluIce executes scripts through a distributed resource management system to make efficient use of all available computing resources through parallel processing. The JBluIce single-crystal data collection strategy feature uses a choice of strategy programs to help users rank sample crystals and collect data. The strategy results can be conveniently exported to a data collection run. The JBluIce multi-crystal strategy feature calculates a collection strategy to optimize coverage of reciprocal space in cases where incomplete data are available from previous samples. The JBluIce data processing runs simultaneously with data collection using a choice of data reduction wrappers for integration and scaling of newly collected data, with an option for merging with pre-existing data. Data are processed separately if collected from multiple sites on a crystal or from multiple crystals, then scaled and merged. Results from all strategy and processing calculations are displayed in relevant tabs of JBluIce.

  13. Tightly integrated single- and multi-crystal data collection strategy calculation and parallelized data processing in JBluIce beamline control system

    DOE PAGES

    Pothineni, Sudhir Babu; Venugopalan, Nagarajan; Ogata, Craig M.; ...

    2014-11-18

    The calculation of single- and multi-crystal data collection strategies and a data processing pipeline have been tightly integrated into the macromolecular crystallographic data acquisition and beamline control software JBluIce. Both tasks employ wrapper scripts around existing crystallographic software. JBluIce executes scripts through a distributed resource management system to make efficient use of all available computing resources through parallel processing. The JBluIce single-crystal data collection strategy feature uses a choice of strategy programs to help users rank sample crystals and collect data. The strategy results can be conveniently exported to a data collection run. The JBluIce multi-crystal strategy feature calculates a collection strategy to optimize coverage of reciprocal space in cases where incomplete data are available from previous samples. The JBluIce data processing runs simultaneously with data collection using a choice of data reduction wrappers for integration and scaling of newly collected data, with an option for merging with pre-existing data. Data are processed separately if collected from multiple sites on a crystal or from multiple crystals, then scaled and merged. Results from all strategy and processing calculations are displayed in relevant tabs of JBluIce.

  14. Tightly integrated single- and multi-crystal data collection strategy calculation and parallelized data processing in JBluIce beamline control system.

    PubMed

    Pothineni, Sudhir Babu; Venugopalan, Nagarajan; Ogata, Craig M; Hilgart, Mark C; Stepanov, Sergey; Sanishvili, Ruslan; Becker, Michael; Winter, Graeme; Sauter, Nicholas K; Smith, Janet L; Fischetti, Robert F

    2014-12-01

    The calculation of single- and multi-crystal data collection strategies and a data processing pipeline have been tightly integrated into the macromolecular crystallographic data acquisition and beamline control software JBluIce. Both tasks employ wrapper scripts around existing crystallographic software. JBluIce executes scripts through a distributed resource management system to make efficient use of all available computing resources through parallel processing. The JBluIce single-crystal data collection strategy feature uses a choice of strategy programs to help users rank sample crystals and collect data. The strategy results can be conveniently exported to a data collection run. The JBluIce multi-crystal strategy feature calculates a collection strategy to optimize coverage of reciprocal space in cases where incomplete data are available from previous samples. The JBluIce data processing runs simultaneously with data collection using a choice of data reduction wrappers for integration and scaling of newly collected data, with an option for merging with pre-existing data. Data are processed separately if collected from multiple sites on a crystal or from multiple crystals, then scaled and merged. Results from all strategy and processing calculations are displayed in relevant tabs of JBluIce.

  15. Tightly integrated single- and multi-crystal data collection strategy calculation and parallelized data processing in JBluIce beamline control system

    PubMed Central

    Pothineni, Sudhir Babu; Venugopalan, Nagarajan; Ogata, Craig M.; Hilgart, Mark C.; Stepanov, Sergey; Sanishvili, Ruslan; Becker, Michael; Winter, Graeme; Sauter, Nicholas K.; Smith, Janet L.; Fischetti, Robert F.

    2014-01-01

    The calculation of single- and multi-crystal data collection strategies and a data processing pipeline have been tightly integrated into the macromolecular crystallographic data acquisition and beamline control software JBluIce. Both tasks employ wrapper scripts around existing crystallographic software. JBluIce executes scripts through a distributed resource management system to make efficient use of all available computing resources through parallel processing. The JBluIce single-crystal data collection strategy feature uses a choice of strategy programs to help users rank sample crystals and collect data. The strategy results can be conveniently exported to a data collection run. The JBluIce multi-crystal strategy feature calculates a collection strategy to optimize coverage of reciprocal space in cases where incomplete data are available from previous samples. The JBluIce data processing runs simultaneously with data collection using a choice of data reduction wrappers for integration and scaling of newly collected data, with an option for merging with pre-existing data. Data are processed separately if collected from multiple sites on a crystal or from multiple crystals, then scaled and merged. Results from all strategy and processing calculations are displayed in relevant tabs of JBluIce. PMID:25484844

  16. Parallel processing and expert systems

    NASA Technical Reports Server (NTRS)

    Lau, Sonie; Yan, Jerry C.

    1991-01-01

    Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.

  17. Parallel processing spacecraft communication system

    NASA Technical Reports Server (NTRS)

    Bolotin, Gary S. (Inventor); Donaldson, James A. (Inventor); Luong, Huy H. (Inventor); Wood, Steven H. (Inventor)

    1998-01-01

    An uplink controlling assembly speeds data processing using a special parallel codeblock technique. A correct start sequence initiates processing of a frame. Two possible start sequences can be used, and the one which is used determines whether data polarity is inverted or non-inverted. Processing continues until uncorrectable errors are found. The frame ends by intentionally sending a block with an uncorrectable error. Each of the codeblocks in the frame has a channel ID, and each channel ID can be separately processed in parallel. This obviates the problem of waiting for error correction processing. If that channel number is zero, however, it indicates that the frame of data represents a critical command only. That data is handled in a special way, independent of the software. Otherwise, the processed data is further handled using special double-buffering techniques to avoid problems from overrun. When overrun does occur, the system takes action to lose only the oldest data.

  18. Parallel Computing Strategies for Irregular Algorithms

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.

  19. Scheduling Tasks In Parallel Processing

    NASA Technical Reports Server (NTRS)

    Price, Camille C.; Salama, Moktar A.

    1989-01-01

    Algorithms sought to minimize time and cost of computation. Report describes research on scheduling of computational tasks in system of multiple identical data processors operating in parallel. Computational intractability requires use of suboptimal heuristic algorithms. First algorithm, called "list heuristic", is variation of classical list scheduling. Second algorithm, called "cluster heuristic", applied to tightly coupled tasks and consists of four phases. Third algorithm, called "exchange heuristic", is iterative-improvement algorithm beginning with initial feasible assignment of tasks to processors and periods of time. Fourth algorithm is iterative one for optimal assignment of tasks and based on concept called "simulated annealing" because of mathematical resemblance to aspects of physical annealing processes.
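
    For flavor, classical list scheduling (the family the first heuristic belongs to) fits in a few lines; the sketch below is illustrative only, with longest-task-first priorities assumed rather than taken from the report:

        import heapq

        def list_schedule(durations, n_procs):
            # Greedy list scheduling: highest-priority (longest) task first,
            # always placed on the currently least-loaded processor.
            loads = [(0.0, p) for p in range(n_procs)]   # (finish time, processor)
            heapq.heapify(loads)
            assignment = {}
            for task in sorted(range(len(durations)), key=lambda t: -durations[t]):
                load, p = heapq.heappop(loads)
                assignment[task] = p
                heapq.heappush(loads, (load + durations[task], p))
            return assignment, max(load for load, _ in loads)

        print(list_schedule([4, 2, 7, 3, 5, 1], 2))   # two identical processors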

  1. Processes parallel execution using Grid Wizard Enterprise.

    PubMed

    Ruiz, Marco

    2009-01-01

    The field of high-performance computing (HPC) has provided a wide array of strategies for supplying additional computing power toward the goal of reducing the total "clock time" required to complete various computational processes. These strategies range from the development of higher-performance hardware to the assembly of large networks of commodity computers, with each strategy designed to address a particular aspect and/or manifestation of a given computational problem. GWE (Grid Wizard Enterprise), in that regard, is an HPC distributed enterprise system aimed at solving the particular problem of running mutually independent computational processes faster, by parallelizing their execution across a virtual grid of computational resources with a minimum of user intervention.

  2. Hierarchical, parallel computing strategies using component object model for process modelling responses of forest plantations to interacting multiple stresses

    Treesearch

    J. G. Isebrands; G. E. Host; K. Lenz; G. Wu; H. W. Stech

    2000-01-01

    Process models are powerful research tools for assessing the effects of multiple environmental stresses on forest plantations. These models are driven by interacting environmental variables and often include genetic factors necessary for assessing forest plantation growth over a range of different site, climate, and silvicultural conditions. However, process models are...

  3. Parallel Processing at the High School Level.

    ERIC Educational Resources Information Center

    Sheary, Kathryn Anne

    This study investigated the ability of high school students to cognitively understand and implement parallel processing. Data indicates that most parallel processing is being taught at the university level. Instructional modules on C, Linux, and the parallel processing language, P4, were designed to show that high school students are highly…

  4. Coordination in serial-parallel image processing

    NASA Astrophysics Data System (ADS)

    Wójcik, Waldemar; Dubovoi, Vladymyr M.; Duda, Marina E.; Romaniuk, Ryszard S.; Yesmakhanova, Laura; Kozbakova, Ainur

    2015-12-01

    Serial-parallel systems are used to convert images, and controlling their operation gives rise to a coordination problem. The paper summarizes a model for coordinating resource allocation in relation to the task of synchronizing parallel processes; a genetic coordination algorithm is developed and its adequacy verified on the process of parallel image processing.

  5. Parallel processing for scientific computations

    NASA Technical Reports Server (NTRS)

    Alkhatib, Hasan S.

    1991-01-01

    The main contribution of the effort in the last two years is the introduction of the MOPPS system. After an extensive literature search, we introduced the system, which is described next. MOPPS employs a new solution to the problem of managing programs which solve scientific and engineering applications in a distributed processing environment. Autonomous computers cooperate efficiently in solving large scientific problems with this solution. MOPPS has the advantage of not assuming the presence of any particular network topology or configuration, computer architecture, or operating system. It imposes little overhead on network and processor resources while efficiently managing programs concurrently. The core of MOPPS is an intelligent program manager that builds a knowledge base of the execution performance of the parallel programs it is managing under various conditions. The manager applies this knowledge to improve the performance of future runs. The program manager learns from experience.

  6. Parallel computation of Gaussian processes

    NASA Astrophysics Data System (ADS)

    Preuss, R.; von Toussaint, U.

    2017-06-01

    Within the Bayesian framework we utilize Gaussian processes for parametric studies of long-running computer codes. Since the simulations are expensive, it is necessary to exploit the computational budget in the best possible manner. Employing the sum over variances (an indicator for the quality of the fit) as the utility function, we established an optimized and automated sequential parameter selection procedure. However, often it is also desirable to utilize the parallel running capabilities of present computer technology and abandon the sequential parameter selection for a faster overall turn-around time (wall-clock time). The paper proposes to achieve this by marginalizing over the expected outcomes at optimized test points in order to set up a pool of starting values for batch execution.
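
    One hedged way to realize such a batch loop (an illustrative "believer"-style sketch, with scikit-learn's GaussianProcessRegressor standing in for the authors' GP machinery; the toy objective and batch size are assumptions):

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor

        rng = np.random.default_rng(0)
        X = rng.uniform(0, 10, (8, 1))                 # parameters already simulated
        y = np.sin(X).ravel()                          # toy code output
        cand = np.linspace(0, 10, 200).reshape(-1, 1)  # candidate parameter grid

        gp = GaussianProcessRegressor().fit(X, y)
        batch = []
        for _ in range(4):                   # assemble a batch of 4 parallel runs
            mu, sd = gp.predict(cand, return_std=True)
            i = int(np.argmax(sd))           # most uncertain candidate point
            batch.append(float(cand[i]))
            # Believe the GP mean for the pending run, i.e. marginalize over its
            # expected outcome, so the next pick accounts for it.
            X = np.vstack([X, cand[i]])
            y = np.append(y, mu[i])
            gp = GaussianProcessRegressor().fit(X, y)
        print(batch)                         # starting values for batch execution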

  7. Parallel Strategies for Crash and Impact Simulations

    SciTech Connect

    Attaway, S.; Brown, K.; Hendrickson, B.; Plimpton, S.

    1998-12-07

    We describe a general strategy we have found effective for parallelizing solid mechanics simulations. Such simulations often have several computationally intensive parts, including finite element integration, detection of material contacts, and particle interaction if smoothed particle hydrodynamics is used to model highly deforming materials. The need to balance all of these computations simultaneously is a difficult challenge that has kept many commercial and government codes from being used effectively on parallel supercomputers with hundreds or thousands of processors. Our strategy is to load-balance each of the significant computations independently with whatever balancing technique is most appropriate. The chief benefit is that each computation can be scalably parallelized. The drawback is the data exchange between processors and extra coding that must be written to maintain multiple decompositions in a single code. We discuss these trade-offs and give performance results showing this strategy has led to a parallel implementation of a widely-used solid mechanics code that can now be run efficiently on thousands of processors of the Pentium-based Sandia/Intel TFLOPS machine. We illustrate with several examples the kinds of high-resolution, million-element models that can now be simulated routinely. We also look to the future and discuss what possibilities this new capability promises, as well as the new set of challenges it poses in material models, computational techniques, and computing infrastructure.

  8. Design strategies for irregularly adapting parallel applications

    SciTech Connect

    Oliker, Leonid; Biswas, Rupak; Shan, Hongzhang; Singh, Jaswinder Pal

    2000-11-01

    Achieving scalable performance for dynamic irregular applications is eminently challenging. Traditional message-passing approaches have been making steady progress towards this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade the performance of dynamically adapting computations. In this work, we examine two major classes of adaptive applications, under five competing programming methodologies and four leading parallel architectures. Results indicate that it is possible to achieve message-passing performance using shared-memory programming techniques by carefully following the same high level strategies. Adaptive applications have computational work loads and communication patterns which change unpredictably at runtime, requiring dynamic load balancing to achieve scalable performance on parallel machines. Efficient parallel implementations of such adaptive applications are therefore a challenging task. This work examines the implementation of two typical adaptive applications, Dynamic Remeshing and N-Body, across various programming paradigms and architectural platforms. We compare several critical factors of the parallel code development, including performance, programmability, scalability, algorithmic development, and portability.

  9. Parallel Activation in Bilingual Phonological Processing

    ERIC Educational Resources Information Center

    Lee, Su-Yeon

    2011-01-01

    In bilingual language processing, the parallel activation hypothesis suggests that bilinguals activate their two languages simultaneously during language processing. Support for the parallel activation mainly comes from studies of lexical (word-form) processing, with relatively less attention to phonological (sound) processing. According to…

  10. Dual compile strategy for parallel heterogeneous execution.

    SciTech Connect

    Smith, Tyler Barratt; Perry, James Thomas

    2012-06-01

    The purpose of the Dual Compile Strategy is to increase our trust in the Compute Engine during its execution of instructions. This is accomplished by introducing a heterogeneous Monitor Engine that checks the execution of the Compute Engine. This leads to the production of a second and custom set of instructions designed for monitoring the execution of the Compute Engine at runtime. This use of multiple engines differs from redundancy in that one engine is working on the application while the other engine is monitoring and checking in parallel instead of both applications (and engines) performing the same work at the same time.

  11. Partitioning in parallel processing of production systems

    SciTech Connect

    Oflazer, K.

    1987-01-01

    This thesis presents research on certain issues related to parallel processing of production systems. It first presents a parallel production system interpreter that has been implemented on a four-processor multiprocessor. This parallel interpreter is based on Forgy's OPS5 interpreter and exploits production-level parallelism in production systems. Runs on the multiprocessor system indicate that it is possible to obtain speed-up of around 1.7 in the match computation for certain production systems when productions are split into three sets that are processed in parallel. The next issue addressed is that of partitioning a set of rules to processors in a parallel interpreter with production-level parallelism, and the extent of additional improvement in performance. The partitioning problem is formulated and an algorithm for approximate solutions is presented. The thesis next presents a parallel processing scheme for OPS5 production systems that allows some redundancy in the match computation. This redundancy enables the processing of a production to be divided into units of medium granularity each of which can be processed in parallel. Subsequently, a parallel processor architecture for implementing the parallel processing algorithm is presented.

  12. Parallel processing considerations for image recognition tasks

    NASA Astrophysics Data System (ADS)

    Simske, Steven J.

    2011-01-01

    Many image recognition tasks are well-suited to parallel processing. The most obvious example is that many imaging tasks require the analysis of multiple images. From this standpoint, then, parallel processing need be no more complicated than assigning individual images to individual processors. However, there are three less trivial categories of parallel processing that will be considered in this paper: parallel processing (1) by task; (2) by image region; and (3) by meta-algorithm. Parallel processing by task allows the assignment of multiple workflows, as diverse as optical character recognition (OCR), document classification and barcode reading, to parallel pipelines. This can substantially decrease time to completion for the document tasks. For this approach, each parallel pipeline is generally performing a different task. Parallel processing by image region allows a larger imaging task to be sub-divided into a set of parallel pipelines, each performing the same task but on a different data set. This type of image analysis is readily addressed by a map-reduce approach. Examples include document skew detection and multiple face detection and tracking. Finally, parallel processing by meta-algorithm allows different algorithms to be deployed on the same image simultaneously. This approach may result in improved accuracy.
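
    A small sketch of category (2), parallelism by image region (illustrative only; the tile size and the per-tile measurement are assumptions):

        import numpy as np
        from multiprocessing import Pool

        def tile_feature(tile):
            # Map step on one region: a local measurement (mean edge energy here).
            gy, gx = np.gradient(tile.astype(float))
            return float(np.hypot(gx, gy).mean())

        if __name__ == '__main__':
            image = np.random.randint(0, 256, (1024, 1024), dtype=np.uint8)
            tiles = [image[r:r + 256, c:c + 256]       # regular 256x256 regions
                     for r in range(0, 1024, 256)
                     for c in range(0, 1024, 256)]
            with Pool() as pool:
                features = pool.map(tile_feature, tiles)  # parallel map over regions
            print(max(features))                          # simple reduce step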

  13. Parallel processing of genomics data

    NASA Astrophysics Data System (ADS)

    Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario

    2016-10-01

    The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, has made it possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per experiment, and the analysis of this enormous flow of data poses several challenges in terms of data storage, preprocessing, and analysis. To face those issues, efficient, possibly parallel, bioinformatics software needs to be used to preprocess and analyze the data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the preprocessing and statistical analysis of genomics data, able to cope with high-dimensional data while delivering good response times. The proposed system is able to find statistically significant biological markers that discriminate classes of patients who respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.

  14. Computer architecture and parallel processing

    SciTech Connect

    Hwang, K.; Briggs, F. A.

    1984-01-01

    The book is intended as a text to support two semesters of courses in computer architecture at the college senior and graduate levels. There are excellent problems for students at the end of each chapter. The authors have divided the use of computers into the following four levels of sophistication: data processing, information processing, knowledge processing, and intelligence processing.

  15. Parallel processing for scientific computations

    NASA Technical Reports Server (NTRS)

    Alkhatib, Hasan S.

    1995-01-01

    The scope of this project dealt with the investigation of the requirements to support distributed computing of scientific computations over a cluster of cooperative workstations. Various experiments on computations for the solution of simultaneous linear equations were performed in the early phase of the project to gain experience in the general nature and requirements of scientific applications. A specification of a distributed integrated computing environment, DICE, based on a distributed shared memory communication paradigm has been developed and evaluated. The distributed shared memory model facilitates porting existing parallel algorithms that have been designed for shared memory multiprocessor systems to the new environment. The potential of this new environment is to provide supercomputing capability through the utilization of the aggregate power of workstations cooperating in a cluster interconnected via a local area network. Workstations, generally, do not have the computing power to tackle complex scientific applications, making them primarily useful for visualization, data reduction, and filtering as far as complex scientific applications are concerned. There is a tremendous amount of computing power that is left unused in a network of workstations. Very often a workstation is simply sitting idle on a desk. A set of tools can be developed to take advantage of this potential computing power to create a platform suitable for large scientific computations. The integration of several workstations into a logical cluster of distributed, cooperative, computing stations presents an alternative to shared memory multiprocessor systems. In this project we designed and evaluated such a system.

  16. Parallelization Strategies for Large Particle Simulations in Astrophysics

    NASA Astrophysics Data System (ADS)

    Pattabiraman, Bharath

    The modeling of collisional N-body stellar systems is a topic of great current interest in several branches of astrophysics and cosmology. These systems are dominated by the physics of relaxation, the collective effect of many weak, random gravitational encounters between stars. They connect directly to our understanding of star clusters, and to the formation of exotic objects such as X-ray binaries, pulsars, and massive black holes. As a prototypical multi-physics, multi-scale problem, the numerical simulation of such systems is computationally intensive, and can only be achieved through high-performance computing. The goal of this thesis is to present parallelization and optimization strategies that can be used to develop efficient computational tools for simulating collisional N-body systems. This leads to major advances: 1) From an astrophysics perspective, these tools enable the study of new physical regimes out of reach by previous simulations. They also lead to much more complete parameter space exploration, allowing direct comparison of numerical results to observational data. 2) On the high-performance computing front, efficient parallelization of a multi-component application requires the meticulous redesign of the various components, as well as innovative parallelization techniques. Many of the challenges faced in this process lie at the very heart of high-performance computing research, including achieving optimal load balancing, maximizing utilization of computational resources, and making effective use of different parallel platforms. For modeling collisional N-body systems, a Monte Carlo approach provides ideal balance between speed and accuracy, as opposed to the more accurate but less scalable direct N-body method. We describe the development of a new version of the Cluster Monte Carlo (CMC) code capable of simulating systems with a realistic number of stars, while accounting for all important physical processes. This efficient and scalable

  17. Parallel processing of a rotating shaft simulation

    NASA Technical Reports Server (NTRS)

    Arpasi, Dale J.

    1989-01-01

    A FORTRAN program describing the vibration modes of a rotor-bearing system is analyzed for parallelism in this simulation using a Pascal-like structured language. Potential vector operations are also identified. A critical path through the simulation is identified and used in conjunction with somewhat fictitious processor characteristics to determine the time to calculate the problem on a parallel processing system having those characteristics. A parallel processing overhead time is included as a parameter for proper evaluation of the gain over serial calculation. The serial calculation time is determined for the same fictitious system. An improvement of up to 640 percent is possible depending on the value of the overhead time. Based on the analysis, certain conclusions are drawn pertaining to the development needs of parallel processing technology, and to the specification of parallel processing systems to meet computational needs.

  18. Applications of Parallel Processing in Configuration Analyses

    NASA Technical Reports Server (NTRS)

    Sundaram, Pichuraman; Hager, James O.; Biedron, Robert T.

    1999-01-01

    The paper presents the recent progress made towards developing an efficient and user-friendly parallel environment for routine analysis of large CFD problems. The coarse-grain parallel version of the CFL3D Euler/Navier-Stokes analysis code, CFL3Dhp, has been ported onto most available parallel platforms. The CFL3Dhp solution accuracy on these parallel platforms has been verified against the CFL3D sequential analyses. User-friendly pre- and post-processing tools that enable a seamless transfer from sequential to parallel processing have been written. A static load-balancing tool for CFL3Dhp analysis has also been implemented for achieving good parallel efficiency. For large problems, load-balancing efficiency as high as 95% can be achieved even when a large number of processors is used. Linear scalability of the CFL3Dhp code with increasing number of processors has also been shown using a large installed transonic nozzle boattail analysis. To highlight the fast turn-around time of parallel processing, the TCA full-configuration-in-sideslip Navier-Stokes drag polar at supersonic cruise has been obtained in a day. CFL3Dhp is currently being used as a production analysis tool.

  19. Highly Parallel Modern Signal Processing.

    DTIC Science & Technology

    1982-02-28

    Looked at the application of these techniques to systems with coherent speckle noise, such as synthetic aperture (SAR) imagery, coherent sonar and... partitioned matrix inversion, computation of cross-ambiguity functions, formation of outer products and skewed outer products, and multiplication of... operations are multiplication, inversion, and L-U decomposition. In signal processing such operations can be found in adaptive filtering, data

  20. Knowledge representation into Ada parallel processing

    NASA Technical Reports Server (NTRS)

    Masotto, Tom; Babikyan, Carol; Harper, Richard

    1990-01-01

    The Knowledge Representation into Ada Parallel Processing project is a joint NASA and Air Force funded project to demonstrate the execution of intelligent systems in Ada on the Charles Stark Draper Laboratory fault-tolerant parallel processor (FTPP). Two applications were demonstrated - a portion of the adaptive tactical navigator and a real time controller. Both systems are implemented as Activation Framework Objects on the Activation Framework intelligent scheduling mechanism developed by Worcester Polytechnic Institute. The implementations, results of performance analyses showing speedup due to parallelism and initial efficiency improvements are detailed and further areas for performance improvements are suggested.

  1. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E

    2014-02-11

    Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  2. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-08-12

    Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  3. Parallel processing: The Cm* experience

    SciTech Connect

    Siewiorek, D.; Gehringer, E.; Segall, Z.

    1986-01-01

    This book describes the parallel-processing research with Cm* at Carnegie-Mellon University. Cm* is a tightly coupled 50-processor multiprocessing system that has been in operation since 1977. Two complete operating systems, StarOS and Medusa, are part of its development, along with a number of applications.

  4. Parallel and Serial Processes in Visual Search

    ERIC Educational Resources Information Center

    Thornton, Thomas L.; Gilden, David L.

    2007-01-01

    A long-standing issue in the study of how people acquire visual information centers around the scheduling and deployment of attentional resources: Is the process serial, or is it parallel? A substantial empirical effort has been dedicated to resolving this issue. However, the results remain largely inconclusive because the methodologies that have…

  5. Efficient multitasking: parallel versus serial processing of multiple tasks

    PubMed Central

    Fischer, Rico; Plessow, Franziska

    2015-01-01

    In the context of performance optimizations in multitasking, a central debate has unfolded in multitasking research around whether cognitive processes related to different tasks proceed only sequentially (one at a time), or can operate in parallel (simultaneously). This review features a discussion of theoretical considerations and empirical evidence regarding parallel versus serial task processing in multitasking. In addition, we highlight how methodological differences and theoretical conceptions determine the extent to which parallel processing in multitasking can be detected, to guide their employment in future research. Parallel and serial processing of multiple tasks are not mutually exclusive. Therefore, questions focusing exclusively on either task-processing mode are too simplified. We review empirical evidence and demonstrate that shifting between more parallel and more serial task processing critically depends on the conditions under which multiple tasks are performed. We conclude that efficient multitasking is reflected by the ability of individuals to adjust multitasking performance to environmental demands by flexibly shifting between different processing strategies of multiple task-component scheduling. PMID:26441742

  6. Efficient multitasking: parallel versus serial processing of multiple tasks.

    PubMed

    Fischer, Rico; Plessow, Franziska

    2015-01-01

    In the context of performance optimizations in multitasking, a central debate has unfolded in multitasking research around whether cognitive processes related to different tasks proceed only sequentially (one at a time), or can operate in parallel (simultaneously). This review features a discussion of theoretical considerations and empirical evidence regarding parallel versus serial task processing in multitasking. In addition, we highlight how methodological differences and theoretical conceptions determine the extent to which parallel processing in multitasking can be detected, to guide their employment in future research. Parallel and serial processing of multiple tasks are not mutually exclusive. Therefore, questions focusing exclusively on either task-processing mode are too simplified. We review empirical evidence and demonstrate that shifting between more parallel and more serial task processing critically depends on the conditions under which multiple tasks are performed. We conclude that efficient multitasking is reflected by the ability of individuals to adjust multitasking performance to environmental demands by flexibly shifting between different processing strategies of multiple task-component scheduling.

  7. Hypercluster parallel processing library user's manual

    NASA Technical Reports Server (NTRS)

    Quealy, Angela

    1990-01-01

    This User's Manual describes the Hypercluster Parallel Processing Library, composed of FORTRAN-callable subroutines which enable a FORTRAN programmer to manipulate and transfer information throughout the Hypercluster at NASA Lewis Research Center. Each subroutine and its parameters are described in detail. A simple heat flow application using Laplace's equation is included to demonstrate the use of some of the library's subroutines. The manual can be used initially as an introduction to the parallel features provided by the library. Thereafter it can be used as a reference when programming an application.
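
    For orientation, the heat-flow demonstration the manual mentions boils down to Jacobi relaxation of Laplace's equation; the serial Python sketch below conveys the numerics only (grid size, boundary values and tolerance are assumptions; the manual's version is FORTRAN and spreads the grid across Hypercluster nodes):

        import numpy as np

        # Jacobi relaxation of Laplace's equation on a 64x64 plate: each interior
        # point moves toward the average of its four neighbours every sweep.
        T = np.zeros((64, 64))
        T[0, :] = 100.0                      # hot top edge; other edges held at 0
        for sweep in range(10_000):
            Tn = T.copy()
            Tn[1:-1, 1:-1] = 0.25 * (T[:-2, 1:-1] + T[2:, 1:-1] +
                                     T[1:-1, :-2] + T[1:-1, 2:])
            if np.max(np.abs(Tn - T)) < 1e-4:
                break
            T = Tn
        print(sweep, round(T[32, 32], 3))    # centre temperature at convergence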

  8. Software development strategies for parallel computer architectures

    NASA Astrophysics Data System (ADS)

    Gruber, Ralf; Cooper, W. Anthony; Beniston, Martin; Gengler, Marc; Merazzi, Silvio

    1991-09-01

    As pragmatic users of high-performance supercomputers, we believe that parallel computer architectures with distributed memories are not yet mature enough to be used by a wide range of application engineers. A big effort should be made to bring these very promising computers closer to the users. One major flaw of massively parallel machines is that the programmer has to take care of the data flow himself, which is often different on different parallel computers. To overcome this problem, we propose that data structures be standardized. The data base can then become an integrated part of the system, and the data flow for a given algorithm can be easily prescribed. Fixing data structures forces the computer manufacturer to adapt his machine to users' demands rather than, as happens now, forcing the user to adapt to the innovative computer science approach of the computer manufacturer. In this paper, we present the data standards chosen for our ASTRID programming platform for research scientists and engineers, as well as a plasma physics application which won the Cray Gigaflop Performance Awards in 1989 and 1990 and which was successfully ported to an INTEL iPSC/2 hypercube.

  9. Parallel algorithm strategies for circuit simulation.

    SciTech Connect

    Thornquist, Heidi K.; Schiek, Richard Louis; Keiter, Eric Richard

    2010-01-01

    Circuit simulation tools (e.g., SPICE) have become invaluable in the development and design of electronic circuits. However, they have been pushed to their performance limits in addressing circuit design challenges that come from the technology drivers of smaller feature scales and higher integration. Improving the performance of circuit simulation tools by exploiting new opportunities in widely-available multi-processor architectures is a logical next step. Unfortunately, not all traditional simulation applications are inherently parallel, and quickly adapting mature application codes (even codes designed to be parallel) to new parallel paradigms can be prohibitively difficult. In general, performance is influenced by many choices: hardware platform, runtime environment, languages and compilers used, algorithm choice and implementation, and more. In this complicated environment, the use of mini-applications, small self-contained proxies for real applications, is an excellent approach for rapidly exploring the parameter space of all these choices. In this report we present a multi-core performance study of Xyce, a transistor-level circuit simulation tool, and describe the future development of a mini-application for circuit simulation.

  10. Partitioning strategies for parallel KIVA-4 engine simulations

    SciTech Connect

    Torres, D J; Kong, S C

    2008-01-01

    Parallel KIVA-4 is described and demonstrated in four different engine geometries. The Message Passing Interface (MPI) was used to parallelize KIVA-4. Partitioning strategies are assessed in light of the fact that cells can become deactivated and activated during the course of an engine simulation, which affects the load balance between processors.

  11. Parallel processing for computer vision and display

    SciTech Connect

    Dew, P.M. (Dept. of Computer Studies); Earnshaw, R.A.; Heywood, T.R.

    1989-01-01

    The widespread availability of high performance computers has led to an increased awareness of the importance of visualization techniques, particularly in engineering and science. However, many visualization tasks involve processing large amounts of data or manipulating complex computer models of 3D objects. For example, in the field of computer aided engineering it is often necessary to display and edit a solid object (see Plate 1), which can take many minutes even on the fastest serial processors. Another example of a computationally intensive problem, this time from computer vision, is the recognition of objects in a 3D scene from a stereo image pair. To perform visualization tasks of this type in real or reasonable time it is necessary to exploit the advances in parallel processing that have taken place over the last decade. This book uniquely provides a collection of papers from leading visualization researchers with a common interest in the application and exploitation of parallel processing techniques.

  12. Parallel Processing of Affective Visual Stimuli

    PubMed Central

    Peyk, Peter; Schupp, Harald T.; Keil, Andreas; Elbert, Thomas; Junghöfer, Markus

    2009-01-01

    Event-related potential (ERP) studies of affective picture processing have demonstrated an early posterior negativity (EPN) for emotionally arousing pictures that are embedded in a rapid visual stream. The present study examined the selective processing of emotional pictures while systematically varying picture presentation rates between 1 and 16 Hz. Previous results with presentation rates up to 5 Hz were replicated in that emotional compared to neutral pictures were associated with a greater EPN. Discrimination among emotional and neutral contents was maintained up to 12 Hz. To explore the notion of parallel processing, convolution analysis was used: EPNs generated by linear superposition of slow rate ERPs explained 70-93% of the variance of measured EPNs, giving evidence for an impressive capacity of parallel affective discrimination in rapid serial picture presentation. PMID:19055507
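
    The linear-superposition logic of that convolution analysis can be sketched in a few lines: treat the slow-rate ERP as an impulse response and convolve it with the train of picture onsets to predict the waveform measured at a fast rate. The toy waveform, rates, and window below are hypothetical, chosen only to illustrate the idea.

        # Superposition sketch: predicted fast-rate response = slow-rate
        # ERP convolved with the stimulus onset train (all values toy).
        import numpy as np

        fs = 1000.0                                  # sampling rate, Hz
        t = np.arange(0, 0.5, 1 / fs)                # 500 ms single-trial window
        erp = np.sin(2 * np.pi * 4 * t) * np.exp(-t / 0.1)  # toy slow-rate ERP

        rate, dur = 12.0, 5.0                        # 12 Hz stream for 5 s
        impulses = np.zeros(int(dur * fs))
        onsets = (np.arange(0, dur, 1 / rate) * fs).astype(int)
        impulses[onsets] = 1.0                       # one impulse per picture

        predicted = np.convolve(impulses, erp)[: impulses.size]
        # compare 'predicted' against the measured steady-state EPN, e.g.
        # via explained variance, as in the convolution analysis described.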

  13. Parallel Programming Strategies for Irregular Adaptive Applications

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2001-01-01

    Achieving scalable performance for dynamic irregular applications is eminently challenging. Traditional message-passing approaches have been making steady progress towards this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade the performance for such computations. In this work, we examine two typical irregular adaptive applications, Dynamic Remeshing and N-Body, under competing programming methodologies and across various parallel architectures. The Dynamic Remeshing application simulates flow over an airfoil, and refines localized regions of the underlying unstructured mesh. The N-Body experiment models two neighboring Plummer galaxies that are about to undergo a merger. Both problems demonstrate dramatic changes in processor workloads and interprocessor communication with time; thus, dynamic load balancing is a required component.

  14. Parallel Processing for Computational Continuum Dynamics,

    DTIC Science & Technology

    1985-01-01

    Multiple Instruction stream, Multiple Data stream (MIMD). An example of a machine of this type is the HEP H1000 computer manufactured by Denelcor... parallel architecture in general and for the HEP H1000 computer in particular. The approach is a step-by-step procedure based on a progression from the... The HEP (Heterogeneous Element Processor) by Denelcor has a MIMD architecture. The HEP computer is designed to combine from one up to 16 Process Execution Modules (PEMs)...

  15. Competitive Parallel Processing For Compression Of Data

    NASA Technical Reports Server (NTRS)

    Diner, Daniel B.; Fender, Antony R. H.

    1990-01-01

    Momentarily-best compression algorithm selected. Proposed competitive-parallel-processing system compresses data for transmission in channel of limited band-width. Likely application for compression lies in high-resolution, stereoscopic color-television broadcasting. Data from information-rich source like color-television camera compressed by several processors, each operating with different algorithm. Referee processor selects momentarily-best compressed output.

  16. Parallel Function Strategy in Pronoun Assignment

    ERIC Educational Resources Information Center

    Grober, Ellen H.; And Others

    1978-01-01

    Subjects completed sentences of the form NP1 aux V NP2 because (but) Pro...(e.g., John may scold Bill because he...) with a reason or motive for the action described. A basic perceptual strategy was hypothesized to underlie the comprehension of these sentences which have a potentially ambiguous pronoun in the subject position of the subordinate…

  17. A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)

    NASA Technical Reports Server (NTRS)

    Straeter, T. A.; Markos, A. T.

    1975-01-01

    A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions insuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.

  18. Parallelization of heterogeneous reactor calculations on a graphics processing unit

    NASA Astrophysics Data System (ADS)

    Malofeev, V. M.; Pal'shin, V. A.

    2016-12-01

    Parallelization is applied to the neutron calculations performed by the heterogeneous method on a graphics processing unit. The parallel algorithm of the modified TREC code is described. The efficiency of the parallel algorithm is evaluated.

  19. Parallelization of heterogeneous reactor calculations on a graphics processing unit

    SciTech Connect

    Malofeev, V. M.; Pal'shin, V. A.

    2016-12-15

    Parallelization is applied to the neutron calculations performed by the heterogeneous method on a graphics processing unit. The parallel algorithm of the modified TREC code is described. The efficiency of the parallel algorithm is evaluated.

  20. A multiarchitecture parallel-processing development environment

    NASA Technical Reports Server (NTRS)

    Townsend, Scott; Blech, Richard; Cole, Gary

    1993-01-01

    A description is given of the hardware and software of a multiprocessor test bed - the second generation Hypercluster system. The Hypercluster architecture consists of a standard hypercube distributed-memory topology, with multiprocessor shared-memory nodes. By using standard, off-the-shelf hardware, the system can be upgraded to use rapidly improving computer technology. The Hypercluster's multiarchitecture nature makes it suitable for researching parallel algorithms in computational field simulation applications (e.g., computational fluid dynamics). The dedicated test-bed environment of the Hypercluster and its custom-built software allows experiments with various parallel-processing concepts such as message passing algorithms, debugging tools, and computational 'steering'. Such research would be difficult, if not impossible, to achieve on shared, commercial systems.

  1. Oxytocin: parallel processing in the social brain?

    PubMed

    Dölen, Gül

    2015-06-01

    Early studies attempting to disentangle the network complexity of the brain exploited the accessibility of sensory receptive fields to reveal circuits made up of synapses connected both in series and in parallel. More recently, extension of this organisational principle beyond the sensory systems has been made possible by the advent of modern molecular, viral and optogenetic approaches. Here, evidence supporting parallel processing of social behaviours mediated by oxytocin is reviewed. Understanding oxytocinergic signalling from this perspective has significant implications for the design of oxytocin-based therapeutic interventions aimed at disorders such as autism, where disrupted social function is a core clinical feature. Moreover, identification of opportunities for novel technology development will require a better appreciation of the complexity of the circuit-level organisation of the social brain. © 2015 The Authors. Journal of Neuroendocrinology published by John Wiley & Sons Ltd on behalf of British Society for Neuroendocrinology.

  2. Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

    NASA Technical Reports Server (NTRS)

    Hsieh, Shang-Hsien

    1993-01-01

    The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.

  3. Parallel Processing for Computational Continuum Dynamics.

    DTIC Science & Technology

    1985-05-10

    Final report for contract F49620-84-C-0111, "Parallel Processing for Computational Continuum Dynamics," by Joseph F. McGrath, KMS Fusion, Inc., P.O. Box 1567, Ann Arbor, MI 48106, 10 May 1985, 42 pages.

  4. Partitioning sparse rectangular matrices for parallel processing

    SciTech Connect

    Kolda, T.G.

    1998-05-01

    The authors are interested in partitioning sparse rectangular matrices for parallel processing. The partitioning problem has been well-studied in the square symmetric case, but the rectangular problem has received very little attention. They will formalize the rectangular matrix partitioning problem and discuss several methods for solving it. They will extend the spectral partitioning method for symmetric matrices to the rectangular case and compare this method to three new methods -- the alternating partitioning method and two hybrid methods. The hybrid methods will be shown to be best.
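
    A hedged sketch of the spectral flavor of such methods, under simplifying assumptions (no scaling of the matrix, a single bisection step): split rows and columns by the sign of the second left and right singular vectors. This illustrates the general idea rather than reproducing the report's exact algorithms.

        # Spectral-style bisection of a sparse rectangular matrix (sketch).
        import numpy as np
        import scipy.sparse as sp
        from scipy.sparse.linalg import svds

        A = sp.random(200, 120, density=0.05, format="csr", random_state=0)
        u, s, vt = svds(A, k=2)          # two largest triplets, ascending order
        row_part = (u[:, 0] >= 0).astype(int)   # sign of 2nd left vector
        col_part = (vt[0] >= 0).astype(int)     # sign of 2nd right vector
        # row_part/col_part assign each row/column to one of two processors.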

  5. Parallel Guessing: A Strategy for High-Speed Computation

    DTIC Science & Technology

    1984-09-19

    ...for using additional hardware to obtain higher processing speed). In this paper we argue that "parallel guessing" for image analysis is a useful... "distance" from a true solution, or the correctness of a guess, can be readily checked. We review image-analysis algorithms having a parallel guessing or...

  6. Parallel processing for digital picture comparison

    NASA Technical Reports Server (NTRS)

    Cheng, H. D.; Kou, L. T.

    1987-01-01

    In picture processing an important problem is to identify two digital pictures of the same scene taken under different lighting conditions. This kind of problem can be found in remote sensing, satellite signal processing and related areas. The identification can be done by transforming the gray levels so that the gray level histograms of the two pictures are closely matched. The transformation problem can be solved by using the packing method. Researchers propose a VLSI architecture consisting of m x n processing elements with extensive parallel and pipelining computation capabilities to speed up the transformation with time complexity O(max(m,n)), where m and n are the numbers of gray levels of the input picture and the reference picture respectively. Using a uniprocessor and a dynamic programming algorithm, the time complexity would be O(m^3 x n). The algorithm partition problem, as an important issue in VLSI design, is discussed. Verification of the proposed architecture is also given.
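
    The gray-level transformation amounts to histogram matching; a compact serial version is sketched below as a stand-in for the computation the proposed m x n processing-element array accelerates. Eight-bit images are assumed.

        # Histogram matching sketch: remap input gray levels so the input
        # picture's cumulative histogram tracks the reference picture's.
        import numpy as np

        def match_histograms(img, ref, levels=256):
            src_cdf = np.cumsum(np.bincount(img.ravel(), minlength=levels))
            ref_cdf = np.cumsum(np.bincount(ref.ravel(), minlength=levels))
            src_cdf = src_cdf / src_cdf[-1]
            ref_cdf = ref_cdf / ref_cdf[-1]
            # for each input level, pick the reference level of closest CDF
            mapping = np.searchsorted(ref_cdf, src_cdf).clip(0, levels - 1)
            return mapping[img]

        img = np.random.randint(0, 256, (64, 64))
        ref = np.random.randint(0, 256, (64, 64))
        out = match_histograms(img, ref)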

  7. Cloud parallel processing of tandem mass spectrometry based proteomics data.

    PubMed

    Mohammed, Yassene; Mostovenko, Ekaterina; Henneman, Alex A; Marissen, Rob J; Deelder, André M; Palmblad, Magnus

    2012-10-05

    Data analysis in mass spectrometry based proteomics struggles to keep pace with the advances in instrumentation and the increasing rate of data acquisition. Analyzing this data involves multiple steps requiring diverse software, using different algorithms and data formats. Speed and performance of the mass spectral search engines are continuously improving, although not necessarily as needed to face the challenges of acquired big data. Improving and parallelizing the search algorithms is one possibility; data decomposition presents another, simpler strategy for introducing parallelism. We describe a general method for parallelizing identification of tandem mass spectra using data decomposition that keeps the search engine intact and wraps the parallelization around it. We introduce two algorithms for decomposing mzXML files and recomposing resulting pepXML files. This makes the approach applicable to different search engines, including those relying on sequence databases and those searching spectral libraries. We use cloud computing to deliver the computational power and scientific workflow engines to interface and automate the different processing steps. We show how to leverage these technologies to achieve faster data analysis in proteomics and present three scientific workflows for parallel database as well as spectral library search using our data decomposition programs, X!Tandem and SpectraST.
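
    The decomposition pattern is simple to sketch: split the spectra into chunks, run an unmodified search engine on each chunk in parallel, and collect the per-chunk results for recomposition. The search_engine command and file names below are placeholders; the published workflows wrap X!Tandem or SpectraST together with the authors' mzXML/pepXML decomposition and recomposition tools.

        # Data-decomposition sketch: one unmodified engine run per chunk.
        import subprocess
        from multiprocessing import Pool

        def run_search(chunk_path):
            out = chunk_path.replace(".mzXML", ".pep.xml")
            # placeholder command line; real engines have their own CLIs
            subprocess.run(["search_engine", chunk_path, "-o", out], check=True)
            return out

        if __name__ == "__main__":
            chunks = [f"spectra_part{i:02d}.mzXML" for i in range(8)]
            with Pool(processes=8) as pool:
                results = pool.map(run_search, chunks)
            # 'results' lists per-chunk pepXML files to recompose into one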

  8. Parallelization strategy for large-scale vibronic coupling calculations.

    PubMed

    Rabidoux, Scott M; Eijkhout, Victor; Stanton, John F

    2014-12-26

    The vibronic coupling model of Köppel, Domcke, and Cederbaum is a powerful means to understand, predict, and analyze electronic spectra of molecules, especially those that exhibit phenomena that involve breakdown of the Born-Oppenheimer approximation. In this work, we describe a new parallel algorithm for carrying out such calculations. The algorithm is conceptually founded upon a "stencil" representation of the required computational steps, which motivates an efficient strategy for coarse-grained parallelization. The equations involved in the direct-CI type diagonalization of the model Hamiltonian are presented, the parallelization strategy is discussed in detail, and the method is illustrated by calculations involving direct-product basis sets with as many as 17 vibrational modes and 130 billion basis functions.

  9. Airbreathing Propulsion System Analysis Using Multithreaded Parallel Processing

    NASA Technical Reports Server (NTRS)

    Schunk, Richard Gregory; Chung, T. J.; Rodriguez, Pete (Technical Monitor)

    2000-01-01

    In this paper, parallel processing is used to analyze the mixing, and combustion behavior of hypersonic flow. Preliminary work for a sonic transverse hydrogen jet injected from a slot into a Mach 4 airstream in a two-dimensional duct combustor has been completed [Moon and Chung, 1996]. Our aim is to extend this work to three-dimensional domain using multithreaded domain decomposition parallel processing based on the flowfield-dependent variation theory. Numerical simulations of chemically reacting flows are difficult because of the strong interactions between the turbulent hydrodynamic and chemical processes. The algorithm must provide an accurate representation of the flowfield, since unphysical flowfield calculations will lead to the faulty loss or creation of species mass fraction, or even premature ignition, which in turn alters the flowfield information. Another difficulty arises from the disparity in time scales between the flowfield and chemical reactions, which may require the use of finite rate chemistry. The situations are more complex when there is a disparity in length scales involved in turbulence. In order to cope with these complicated physical phenomena, it is our plan to utilize the flowfield-dependent variation theory mentioned above, facilitated by large eddy simulation. Undoubtedly, the proposed computation requires the most sophisticated computational strategies. The multithreaded domain decomposition parallel processing will be necessary in order to reduce both computational time and storage. Without special treatments involved in computer engineering, our attempt to analyze the airbreathing combustion appears to be difficult, if not impossible.

  10. Enjoying Sad Music: Paradox or Parallel Processes?

    PubMed

    Schubert, Emery

    2016-01-01

    Enjoyment of negative emotions in music is seen by many as a paradox. This article argues that the paradox exists because it is difficult to view the process that generates enjoyment as being part of the same system that also generates the subjective negative feeling. Compensation theories explain the paradox as the compensation of a negative emotion by the concomitant presence of one or more positive emotions. But compensation brings us no closer to explaining the paradox because it does not explain how experiencing sadness itself is enjoyed. The solution proposed is that an emotion is determined by three critical processes, labeled motivational action tendency (MAT), subjective feeling (SF) and Appraisal. For many emotions the MAT and SF processes are coupled in valence. For example, happiness has positive MAT and positive SF, annoyance has negative MAT and negative SF. However, it is argued that in an aesthetic context, such as listening to music, emotion processes can become decoupled. The decoupling is controlled by the Appraisal process, which can assess if the context of the sadness is real-life (where coupling occurs) or aesthetic (where decoupling can occur). In an aesthetic context sadness retains its negative SF but the aversive, negative MAT is inhibited, leaving sadness to still be experienced as a negatively valenced emotion, while contributing to the overall positive MAT. Individual differences, mood and previous experiences mediate the degree to which the aversive aspects of MAT are inhibited according to this Parallel Processing Hypothesis (PPH). The reason for hesitancy in considering or testing PPH, as well as the preponderance of research on sadness at the exclusion of other negative emotions, are discussed.

  11. Enjoying Sad Music: Paradox or Parallel Processes?

    PubMed Central

    Schubert, Emery

    2016-01-01

    Enjoyment of negative emotions in music is seen by many as a paradox. This article argues that the paradox exists because it is difficult to view the process that generates enjoyment as being part of the same system that also generates the subjective negative feeling. Compensation theories explain the paradox as the compensation of a negative emotion by the concomitant presence of one or more positive emotions. But compensation brings us no closer to explaining the paradox because it does not explain how experiencing sadness itself is enjoyed. The solution proposed is that an emotion is determined by three critical processes, labeled motivational action tendency (MAT), subjective feeling (SF) and Appraisal. For many emotions the MAT and SF processes are coupled in valence. For example, happiness has positive MAT and positive SF, annoyance has negative MAT and negative SF. However, it is argued that in an aesthetic context, such as listening to music, emotion processes can become decoupled. The decoupling is controlled by the Appraisal process, which can assess if the context of the sadness is real-life (where coupling occurs) or aesthetic (where decoupling can occur). In an aesthetic context sadness retains its negative SF but the aversive, negative MAT is inhibited, leaving sadness to still be experienced as a negatively valenced emotion, while contributing to the overall positive MAT. Individual differences, mood and previous experiences mediate the degree to which the aversive aspects of MAT are inhibited according to this Parallel Processing Hypothesis (PPH). The reason for hesitancy in considering or testing PPH, as well as the preponderance of research on sadness at the exclusion of other negative emotions, are discussed. PMID:27445752

  12. Parallel approach in RDF query processing

    NASA Astrophysics Data System (ADS)

    Vajgl, Marek; Parenica, Jan

    2017-07-01

    Parallel approaches are nowadays a very cheap way to increase computational power, owing to the availability of multithreaded computational units. Such hardware has become a typical part of today's personal computers and notebooks and is widely available. This contribution presents experiments on how the evaluation of a computationally complex inference algorithm over RDF data can be parallelized on graphics cards to decrease computation time.

  13. Serial Order: A Parallel Distributed Processing Approach.

    ERIC Educational Resources Information Center

    Jordan, Michael I.

    Human behavior shows a variety of serially ordered action sequences. This paper presents a theory of serial order which describes how sequences of actions might be learned and performed. In this theory, parallel interactions across time (coarticulation) and parallel interactions across space (dual-task interference) are viewed as two aspects of a…

  14. Partitioning And Packing Equations For Parallel Processing

    NASA Technical Reports Server (NTRS)

    Arpasi, Dale J.; Milner, Edward J.

    1989-01-01

    Algorithm developed to identify parallelism in set of coupled ordinary differential equations that describe physical system and to divide set into parallel computational paths, along with parts of solution proceeds independently of others during at least part of time. Path-identifying algorithm creates number of paths consisting of equations that must be computed serially and table that gives dependent and independent arguments and "can start," "can end," and "must end" times of each equation. "Must end" time used subsequently by packing algorithm.
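
    The heart of the path-identifying step can be illustrated with a small topological leveling: given each equation's prerequisites, repeatedly collect every equation whose prerequisites are already computed; all equations in a collected group can run in parallel. The dependency table below is a made-up toy.

        # Level-by-level grouping of equations for parallel evaluation.
        deps = {                    # equation -> equations it needs first
            "e1": [], "e2": [], "e3": ["e1"],
            "e4": ["e1", "e2"], "e5": ["e3", "e4"],
        }

        levels, done = [], set()
        while len(done) < len(deps):
            ready = [e for e, d in deps.items()
                     if e not in done and set(d) <= done]
            levels.append(ready)    # everything in 'ready' can start together
            done.update(ready)

        print(levels)               # [['e1', 'e2'], ['e3', 'e4'], ['e5']]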

  15. Parallel Processing with Digital Signal Processing Hardware and Software

    NASA Technical Reports Server (NTRS)

    Swenson, Cory V.

    1995-01-01

    The assembling and testing of a parallel processing system is described which will allow a user to move a Digital Signal Processing (DSP) application from the design stage to the execution/analysis stage through the use of several software tools and hardware devices. The system will be used to demonstrate the feasibility of the Algorithm To Architecture Mapping Model (ATAMM) dataflow paradigm for static multiprocessor solutions of DSP applications. The individual components comprising the system are described followed by the installation procedure, research topics, and initial program development.

  16. A data parallel strategy for aligning multiple biological sequences on multi-core computers.

    PubMed

    Zhu, Xiangyuan; Li, Kenli; Salah, Ahmad

    2013-05-01

    In this paper, we address the large-scale biological sequence alignment problem, which has an increasing demand in computational biology. We employ data parallelism paradigm that is suitable for handling large-scale processing on multi-core computers to achieve a high degree of parallelism. Using the data parallelism paradigm, we propose a general strategy which can be used to speed up any multiple sequence alignment method. We applied five different clustering algorithms in our strategy and implemented rigorous tests on an 8-core computer using four traditional benchmarks and artificially generated sequences. The results show that our multi-core-based implementations can achieve up to 151-fold improvements in execution time while losing 2.19% accuracy on average. The source code of the proposed strategy, together with the test sets used in our analysis, is available on request. Copyright © 2013 Elsevier Ltd. All rights reserved.

  17. Direct stereo radargrammetric processing using massively parallel processing

    NASA Astrophysics Data System (ADS)

    Balz, Timo; Zhang, Lu; Liao, Mingsheng

    2013-05-01

    Synthetic Aperture Radar (SAR) offers many ways to reconstruct digital surface models (DSMs). The two most commonly used methods are SAR interferometry (InSAR) and stereo radargrammetry. Stereo radargrammetry is a very stable and reliable process and is far less affected by temporal decorrelation compared with InSAR. It is therefore often used for DSM generation in heavily vegetated areas. However, stereo radargrammetry often produces rather noisy DSMs, sometimes containing large outliers. In this manuscript, we present a new approach for stereo radargrammetric processing, where the homologous points between the images are found by geocoding a large number of points. This offers a very flexible approach, allowing the simultaneous processing of multiple images and of cross-heading image pairs. Our approach relies on a good initial geocoding accuracy of the data and on very fast processing using a massively parallel implementation. The approach is demonstrated using TerraSAR-X images from Mount Song, China, and from Trento, Italy.

  18. Dynamic Load Balancing Strategies for Parallel Reacting Flow Simulations

    NASA Astrophysics Data System (ADS)

    Pisciuneri, Patrick; Meneses, Esteban; Givi, Peyman

    2014-11-01

    Load balancing in parallel computing aims at distributing the work as evenly as possible among the processors. This is a critical issue in the performance of parallel, time accurate, flow simulators. The constraint of time accuracy requires that all processes must be finished with their calculation for a given time step before any process can begin calculation of the next time step. Thus, an irregularly balanced compute load will result in idle time for many processes for each iteration and thus increased walltimes for calculations. Two existing, dynamic load balancing approaches are applied to the simplified case of a partially stirred reactor for methane combustion. The first is Zoltan, a parallel partitioning, load balancing, and data management library developed at the Sandia National Laboratories. The second is Charm++, which is its own machine independent parallel programming system developed at the University of Illinois at Urbana-Champaign. The performance of these two approaches is compared, and the prospects for their application to full 3D, reacting flow solvers is assessed.

  19. Experience in highly parallel processing using DAP

    NASA Technical Reports Server (NTRS)

    Parkinson, D.

    1987-01-01

    Distributed Array Processors (DAP) have been in day to day use for ten years and a large amount of user experience has been gained. The profile of user applications is similar to that of the Massively Parallel Processor (MPP) working group. Experience has shown that contrary to expectations, highly parallel systems provide excellent performance on so-called dirty problems such as the physics part of meteorological codes. The reasons for this observation are discussed. The arguments against replacing bit processors with floating point processors are also discussed.

  1. Wavelet Transforms in Parallel Image Processing

    DTIC Science & Technology

    1994-01-27

    Subject terms: object segmentation, texture segmentation, image compression, image halftoning, neural networks, parallel algorithms, 2D and 3D... Vector quantization of wavelet transform coefficients... Adaptive image halftoning based on wavelets... application has been directed to adaptive image halftoning. The gray information at a pixel, including its gray value and gradient, is represented by...

  2. Efficient parallel implementation of polarimetric synthetic aperture radar data processing

    NASA Astrophysics Data System (ADS)

    Martinez, Sergio S.; Marpu, Prashanth R.; Plaza, Antonio J.

    2014-10-01

    This work investigates the parallel implementation of a polarimetric synthetic aperture radar (POLSAR) data processing chain. Such processing can be computationally expensive when large data sets are processed. However, the processing steps can be largely implemented in a high performance computing (HPC) environment. In this work, we studied different aspects of the computations involved in processing POLSAR data and developed an efficient parallel scheme to achieve near-real-time performance. The algorithm is implemented using the message passing interface (MPI) framework in this work, but it can be easily adapted for other parallel architectures such as general purpose graphics processing units (GPGPUs).

  3. Holographic Routing Network For Parallel Processing Machines

    NASA Astrophysics Data System (ADS)

    Maniloff, Eric S.; Johnson, Kristina M.; Reif, John H.

    1989-10-01

    Dynamic holographic architectures for connecting processors in parallel computers have been generally limited by the response time of the holographic recording media. In this paper we present a different approach to dynamic optical interconnects involving spatial light modulators (SLMs) and volume holograms. Multiple-exposure holograms are stored in a volume recording media, which associate the address of a destination processor encoded on a spatial light modulator with a distinct reference beam. A destination address programmed on the spatial light modulator is then holographically steered to the correct destination processor. We present the design and experimental results of a holographic router for connecting four originator processors to four destination processors.

  4. Hypercluster - Parallel processing for computational mechanics

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.

    1988-01-01

    An account is given of the development status, performance capabilities and implications for further development of NASA-Lewis' testbed 'hypercluster' parallel computer network, in which multiple processors communicate through a shared memory. Processors have local as well as shared memory; the hypercluster is expanded in the same manner as the hypercube, with processor clusters replacing the normal single processor node. The NASA-Lewis machine has three nodes with a vector personality and one node with a scalar personality. Each of the vector nodes uses four board-level vector processors, while the scalar node uses four general-purpose microcomputer boards.

  5. Bipartite memory network architectures for parallel processing

    SciTech Connect

    Smith, W.; Kale, L.V. (Dept. of Computer Science)

    1990-01-01

    Parallel architectures are broadly classified as either shared memory or distributed memory architectures. In this paper, the authors propose a third family of architectures, called bipartite memory network architectures. In this architecture, processors and memory modules constitute a bipartite graph, where each processor is allowed to access a small subset of the memory modules, and each memory module allows access from a small set of processors. The architecture is particularly suitable for computations requiring dynamic load balancing. The authors explore the properties of this architecture by examining the Perfect Difference set based topology for the graph. Extensions of this topology are also suggested.

  6. Parallelization strategies for continuum-generalized method of moments on the multi-thread systems

    NASA Astrophysics Data System (ADS)

    Bustamam, A.; Handhika, T.; Ernastuti; Kerami, D.

    2017-07-01

    The Continuum-Generalized Method of Moments (C-GMM) addresses a shortfall of the Generalized Method of Moments (GMM), which is not as efficient as the Maximum Likelihood estimator, by using a continuum set of moment conditions in a GMM framework. However, this computation takes a very long time because of the optimization of the regularization parameter. These calculations are usually processed sequentially, even though all modern computers are now supported by hierarchical memory systems and hyperthreading technology, which allow for parallel computing. This paper aims to speed up the calculation of C-GMM by designing a parallel algorithm for C-GMM on multi-thread systems. First, parallel regions are detected in the original C-GMM algorithm. There are two parallel regions in the original C-GMM algorithm that contribute significantly to the reduction of computational time: the outer loop and the inner loop. This parallel algorithm is implemented with a standard shared-memory application programming interface, Open Multi-Processing (OpenMP). The experiment shows that outer-loop parallelization is the best strategy for any number of observations.
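
    A sketch of that winning outer-loop strategy, with Python's multiprocessing standing in for the OpenMP threads used in the paper: evaluate the criterion at each candidate regularization parameter in a separate worker and keep the best. The objective below is a placeholder, not the actual C-GMM criterion.

        # Outer-loop parallelization sketch over regularization parameters.
        import numpy as np
        from multiprocessing import Pool

        def objective(alpha):
            # placeholder for the C-GMM criterion at parameter alpha
            return (np.log(alpha) + 1.0 / alpha, alpha)

        if __name__ == "__main__":
            grid = np.logspace(-6, 0, 64)     # candidate alphas (outer loop)
            with Pool() as pool:
                scores = pool.map(objective, grid)
            best_value, best_alpha = min(scores)
            print(best_alpha)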

  7. CRBLASTER: A Parallel-Processing Computational Framework for Embarrassingly-Parallel Image-Analysis Algorithms

    NASA Astrophysics Data System (ADS)

    Mighell, Kenneth John

    2011-11-01

    The development of parallel-processing image-analysis codes is generally a challenging task that requires complicated choreography of interprocessor communications. If, however, the image-analysis algorithm is embarrassingly parallel, then the development of a parallel-processing implementation of that algorithm can be a much easier task to accomplish because, by definition, there is little need for communication between the compute processes. I describe the design, implementation, and performance of a parallel-processing image-analysis application, called CRBLASTER, which does cosmic-ray rejection of CCD (charge-coupled device) images using the embarrassingly-parallel L.A.COSMIC algorithm. CRBLASTER is written in C using the high-performance computing industry standard Message Passing Interface (MPI) library. The code has been designed to be used by research scientists who are familiar with C as a parallel-processing computational framework that enables the easy development of parallel-processing image-analysis programs based on embarrassingly-parallel algorithms. The CRBLASTER source code is freely available at the official application website at the National Optical Astronomy Observatory. Removing cosmic rays from a single 800x800 pixel Hubble Space Telescope WFPC2 image takes 44 seconds with the IRAF script lacos_im.cl running on a single core of an Apple Mac Pro computer with two 2.8-GHz quad-core Intel Xeon processors. CRBLASTER is 7.4 times faster processing the same image on a single core on the same machine. Processing the same image with CRBLASTER simultaneously on all 8 cores of the same machine takes 0.875 seconds -- which is a speedup factor of 50.3 times faster than the IRAF script. A detailed analysis is presented of the performance of CRBLASTER using between 1 and 57 processors on a low-power Tilera 700-MHz 64-core TILE64 processor.
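
    The overlap trick that keeps the problem embarrassingly parallel is easy to sketch: pad each strip of the image with a few rows of context so workers never need to communicate, then keep only each strip's interior. The filter below is a toy stand-in for the L.A.COSMIC algorithm, and Python's multiprocessing stands in for CRBLASTER's C/MPI implementation.

        # Embarrassingly parallel image cleaning with overlapping strips.
        import numpy as np
        from multiprocessing import Pool

        PAD = 4                          # rows of overlap (algorithm-dependent)

        def clean(args):
            strip, lo, hi = args         # strip carries PAD rows of context
            toy = np.clip(strip, 0, np.median(strip) + 5 * strip.std())
            return toy[lo:hi]            # return interior rows only

        def parallel_clean(img, n_workers=8):
            bounds = np.linspace(0, img.shape[0], n_workers + 1).astype(int)
            jobs = []
            for a, b in zip(bounds[:-1], bounds[1:]):
                a0 = max(a - PAD, 0)
                b0 = min(b + PAD, img.shape[0])
                jobs.append((img[a0:b0], a - a0, a - a0 + (b - a)))
            with Pool(n_workers) as pool:
                return np.vstack(pool.map(clean, jobs))

        if __name__ == "__main__":
            print(parallel_clean(np.random.rand(800, 800)).shape)  # (800, 800)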

  8. "Let's Move" campaign: applying the extended parallel process model.

    PubMed

    Batchelder, Alicia; Matusitz, Jonathan

    2014-01-01

    This article examines Michelle Obama's health campaign, "Let's Move," through the lens of the extended parallel process model (EPPM). "Let's Move" aims to reduce the childhood obesity epidemic in the United States. Developed by Kim Witte, EPPM rests on the premise that people's attitudes can be changed when fear is exploited as a factor of persuasion. Fear appeals work best (a) when a person feels a concern about the issue or situation, and (b) when he or she believes to have the capability of dealing with that issue or situation. Overall, the analysis found that "Let's Move" is based on past health campaigns that have been successful. An important element of the campaign is the use of fear appeals (as it is postulated by EPPM). For example, part of the campaign's strategies is to explain the severity of the diseases associated with obesity. By looking at the steps of EPPM, readers can also understand the strengths and weaknesses of "Let's Move."

  9. Parafrase restructuring of FORTRAN code for parallel processing

    NASA Technical Reports Server (NTRS)

    Wadhwa, Atul

    1988-01-01

    Parafrase transforms a FORTRAN code, subroutine by subroutine, into a parallel code for a vector and/or shared-memory multiprocessor system. Parafrase is not a compiler; it transforms a code and provides information for a vector or concurrent process. Parafrase uses a data dependency to reveal parallelism among instructions. The data dependency test distinguishes between recurrences and statements that can be directly vectorized or parallelized. A number of transformations are required to build a data dependency graph.

  10. Parallel-Processing Test Bed For Simulation Software

    NASA Technical Reports Server (NTRS)

    Blech, Richard; Cole, Gary; Townsend, Scott

    1996-01-01

    Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).

  11. Parallel processing research in the former Soviet Union

    SciTech Connect

    Dongarra, J.J.; Snyder, L.; Wolcott, P.

    1992-03-01

    This technical assessment report examines strengths and weaknesses of parallel processing research and development in the Soviet Union from the 1980s to June 1991. The assessment was carried out by panel of US scientists who are experts on parallel processing hardware, software, algorithms, and applications, and on Soviet computing. Soviet computer research and development organizations have pursued many of the major avenues of inquiry related to parallel processing that the West has chosen to explore. But, the limited size and substantial breadth of their effort have limited the collective depth of Soviet activity. Even more serious limitations (and delays) of Soviet achievement in parallel processing research can be traced to shortcomings of the Soviet computer industry, which was unable to supply adequate, reliable computer components. Without the ability to build, demonstrate, and test embodiments of their ideas in actual high-performance parallel hardware, both the scope of activity and the success of Soviet parallel processing researchers were severely limited. The quality of the Soviet parallel processing research assessed varied from very sound and interesting to pedestrian, with most of the groups at the major hardware and software centers to which the work is largely confined doing good (or at least serious) research. In a few instances, interesting and competent parallel language development work was found at institutions not associated with hardware development efforts. Unlike Soviet mainframe and minicomputer developers, Soviet parallel processing researchers have not concentrated their efforts on reverse- engineering specific Western systems. No evidence was found of successful Soviet attempts to use breakthroughs in parallel processing technology to ``leapfrog`` impediments and limitations that Soviet industrial weakness in microelectronics and other computer manufacturing areas impose on the performance of high-end Soviet computers.

  12. A high resolution finite volume method for efficient parallel simulation of casting processes on unstructured meshes

    SciTech Connect

    Kothe, D.B.; Turner, J.A.; Mosso, S.J.; Ferrell, R.C.

    1997-03-01

    We discuss selected aspects of a new parallel three-dimensional (3-D) computational tool for the unstructured mesh simulation of Los Alamos National Laboratory (LANL) casting processes. This tool, known as Telluride, draws upon robust, high resolution finite volume solutions of metal alloy mass, momentum, and enthalpy conservation equations to model the filling, cooling, and solidification of LANL castings. We briefly describe the current Telluride physical models and solution methods, then detail our parallelization strategy as implemented with Fortran 90 (F90). This strategy has yielded straightforward and efficient parallelization on distributed and shared memory architectures, aided in large part by the new parallel libraries JTpack90 for Krylov-subspace iterative solution methods and PGSLib for efficient gather/scatter operations. We illustrate our methodology and current capabilities with source code examples and parallel efficiency results for a LANL casting simulation.

  13. [CMACPAR: a modified parallel neurocontroller for control processes].

    PubMed

    Ramos, E; Surós, R

    1999-01-01

    CMACPAR is a parallel neurocontroller oriented to real-time systems, for example control processes. Its main characteristics are a fast learning algorithm, a reduced number of calculations, great generalization capacity, local learning and intrinsic parallelism. This type of neurocontroller is used in real-time applications required by refineries, hydroelectric plants, factories, etc. In this work we present the analysis and the parallel implementation of a modified scheme of the CMAC cerebellar model for n-dimensional space projection using a medium-granularity parallel neurocontroller. The proposed memory management allows for a significant reduction in training time and required memory size.

  14. Parallel firing strategy on Petri nets: A review

    NASA Astrophysics Data System (ADS)

    Mavlankulov, Gairatzhan; Turaev, Sherzod; Zhumabaeva, Laula; Zhukabayeva, Tamara

    2015-05-01

    In this paper we review recent results on Petri net controlled grammars and closely related topics. Though regulated grammars are one of the classic topics in formal language theory, Petri net controlled grammars remain an interesting subject of investigation for many reasons. This type of grammar can successfully be used in modeling new problems emerging in manufacturing systems, systems biology and other areas. Moreover, their graphical illustrability, the ability to represent both a grammar and its control in one structure, and the possibility of unifying different regulated rewritings make this formalization attractive for study. We also summarize the obtained results and propose a new concept: a parallel firing strategy on Petri nets.

  15. Use of parallel computing in mass processing of laser data

    NASA Astrophysics Data System (ADS)

    Będkowski, J.; Bratuś, R.; Prochaska, M.; Rzonca, A.

    2015-12-01

    The first part of the paper includes a description of the rules used to generate the algorithm needed for the purpose of parallel computing and also discusses the origins of the idea of research on the use of graphics processors in large scale processing of laser scanning data. The next part of the paper includes the results of an efficiency assessment performed for an array of different processing options, all of which were substantially accelerated with parallel computing. The processing options were divided into the generation of orthophotos using point clouds, coloring of point clouds, transformations, and the generation of a regular grid, as well as advanced processes such as the detection of planes and edges, point cloud classification, and the analysis of data for the purpose of quality control. Most algorithms had to be formulated from scratch in the context of the requirements of parallel computing. A few of the algorithms were based on existing technology developed by the Dephos Software Company and then adapted to parallel computing in the course of this research study. Processing time was determined for each process employed for a typical quantity of data processed, which helped confirm the high efficiency of the solutions proposed and the applicability of parallel computing to the processing of laser scanning data. The high efficiency of parallel computing yields new opportunities in the creation and organization of processing methods for laser scanning data.
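
    Most of these per-point operations are mutually independent, which is what makes them suited to graphics processors and, more modestly, to CPU pools. A CPU sketch of the pattern for one such step follows, with a rigid transformation standing in for the geocoding or coloring stages; the rotation and translation values are made up.

        # Chunked, independent per-point processing of a point cloud.
        import numpy as np
        from multiprocessing import Pool

        R = np.eye(3)                        # assumed rotation matrix
        t = np.array([10.0, 0.0, 5.0])       # assumed translation vector

        def transform(chunk):
            return chunk @ R.T + t           # independent for every point

        if __name__ == "__main__":
            cloud = np.random.rand(1_000_000, 3)
            with Pool() as pool:
                out = np.vstack(pool.map(transform, np.array_split(cloud, 16)))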

  16. schwimmbad: A uniform interface to parallel processing pools in Python

    NASA Astrophysics Data System (ADS)

    Price-Whelan, Adrian M.; Foreman-Mackey, Daniel

    2017-09-01

    Many scientific and computing problems require doing some calculation on all elements of some data set. If the calculations can be executed in parallel (i.e. without any communication between calculations), these problems are said to be perfectly parallel. On computers with multiple processing cores, these tasks can be distributed and executed in parallel to greatly improve performance. A common paradigm for handling these distributed computing problems is to use a processing "pool": the "tasks" (the data) are passed in bulk to the pool, and the pool handles distributing the tasks to a number of worker processes when available. schwimmbad provides a uniform interface to parallel processing pools and enables switching easily between local development (e.g., serial processing or with multiprocessing) and deployment on a cluster or supercomputer (via, e.g., MPI or JobLib).
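
    A minimal sketch of the pattern, assuming schwimmbad's documented choose_pool helper: the worker and the map call stay fixed, while the selected pool decides whether execution is serial, spread over local cores, or spread over MPI ranks.

        # Same map code, swappable pool back end (serial/multiprocessing/MPI).
        import schwimmbad

        def worker(x):
            return x ** 2                 # stand-in for a real calculation

        if __name__ == "__main__":
            # processes=1 gives a serial pool; pass mpi=True under mpiexec
            pool = schwimmbad.choose_pool(processes=4)
            print(list(pool.map(worker, range(16))))
            pool.close()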

  17. Strategy Process in Higher Education

    ERIC Educational Resources Information Center

    Kettunen, Juha

    2010-01-01

    Higher education institutions educate those who are the most talented and best able to secure the future for the next generation. This study examines an efficient strategy process in higher education and emphasises the importance of sufficient dialogue during the process. The study describes the strategy process of the Turku University of Applied…

  18. The effects of parallel processing architectures on discrete event simulation

    NASA Astrophysics Data System (ADS)

    Cave, William; Slatt, Edward; Wassmer, Robert E.

    2005-05-01

    As systems become more complex, particularly those containing embedded decision algorithms, mathematical modeling presents a rigid framework that often impedes representation to a sufficient level of detail. Using discrete event simulation, one can build models that more closely represent physical reality, with actual algorithms incorporated in the simulations. Higher levels of detail increase simulation run time. Hardware designers have succeeded in producing parallel and distributed processor computers with theoretical speeds well into the teraflop range. However, the practical use of these machines on all but some very special problems is extremely limited. The inability to use this power is due to great difficulties encountered when trying to translate real world problems into software that makes effective use of highly parallel machines. This paper addresses the application of parallel processing to simulations of real world systems of varying inherent parallelism. It provides a brief background in modeling and simulation validity and describes a parameter that can be used in discrete event simulation to vary opportunities for parallel processing at the expense of absolute time synchronization and is constrained by validity. It focuses on the effects of model architecture, run-time software architecture, and parallel processor architecture on speed, while providing an environment where modelers can achieve sufficient model accuracy to produce valid simulation results. It describes an approach to simulation development that captures subject area expert knowledge to leverage inherent parallelism in systems in the following ways: * Data structures are separated from instructions to track which instruction sets share what data. This is used to determine independence and thus the potential for concurrent processing at run-time. * Model connectivity (independence) can be inspected visually to determine if the inherent parallelism of a physical system is properly

  19. Applying Parallel Processing Techniques to Tether Dynamics Simulation

    NASA Technical Reports Server (NTRS)

    Wells, B. Earl

    1996-01-01

    The focus of this research has been to determine the effectiveness of applying parallel processing techniques to a sizable real-world problem, the simulation of the dynamics associated with a tether which connects two objects in low earth orbit, and to explore the degree to which the parallelization process can be automated through the creation of new software tools. The goal has been to utilize this specific application problem as a base to develop more generally applicable techniques.

  1. Parallel astronomical data processing with Python: Recipes for multicore machines

    NASA Astrophysics Data System (ADS)

    Singh, Navtej; Browne, Lisa-Marie; Butler, Ray

    2013-08-01

    High performance computing has been used in various fields of astrophysical research. But most of it is implemented on massively parallel systems (supercomputers) or graphical processing unit clusters. With the advent of multicore processors in the last decade, many serial software codes have been re-implemented in parallel mode to utilize the full potential of these processors. In this paper, we propose parallel processing recipes for multicore machines for astronomical data processing. The target audience is astronomers who use Python as their preferred scripting language and who may be using PyRAF/IRAF for data processing. Three problems of varied complexity were benchmarked on three different types of multicore processors to demonstrate the benefits, in terms of execution time, of parallelizing data processing tasks. The native multiprocessing module available in Python makes it a relatively trivial task to implement the parallel code. We have also compared the three multiprocessing approaches: Pool/Map, Process/Queue, and Parallel Python. Our test codes are freely available and can be downloaded from our website.
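
    Two of the compared recipes can be sketched side by side: the high-level Pool/Map approach and the lower-level Process/Queue approach, applied here to a toy per-element task. (The third approach, Parallel Python, is a separate third-party package.)

        # Pool/Map versus Process/Queue on the same toy workload.
        from multiprocessing import Pool, Process, Queue

        def measure(x):
            return x * x                  # stand-in for per-image work

        def queue_worker(in_q, out_q):
            for item in iter(in_q.get, None):   # None is the stop sentinel
                out_q.put(measure(item))

        if __name__ == "__main__":
            data = list(range(100))

            with Pool(4) as pool:                 # recipe 1: Pool/Map
                res1 = pool.map(measure, data)

            in_q, out_q = Queue(), Queue()        # recipe 2: Process/Queue
            procs = [Process(target=queue_worker, args=(in_q, out_q))
                     for _ in range(4)]
            for p in procs:
                p.start()
            for item in data:
                in_q.put(item)
            for _ in procs:
                in_q.put(None)                    # one sentinel per worker
            res2 = sorted(out_q.get() for _ in data)
            for p in procs:
                p.join()
            assert res1 == res2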

  2. Parallel Processing and Sentence Comprehension Difficulty

    ERIC Educational Resources Information Center

    Boston, Marisa Ferrara; Hale, John T.; Vasishth, Shravan; Kliegl, Reinhold

    2011-01-01

    Eye fixation durations during normal reading correlate with processing difficulty, but the specific cognitive mechanisms reflected in these measures are not well understood. This study finds support in German readers' eye fixations for two distinct difficulty metrics: surprisal, which reflects the change in probabilities across syntactic analyses…

  3. Image Processing Using a Parallel Architecture.

    DTIC Science & Technology

    1987-12-01

    Computer," Byte, 3: 14-25 (December 1978). McGraw-Hill, 1985 24. Trussell, H. Joel . "Processing of X-ray Images," Proceedings of the IEEE, 69: 615-627...Services Electronics Program contract N00014-79-C-0424 (AD-085-846). 107 Therrien , Charles W. et al. "A Multiprocessor System for Simulation of

  4. Fault-tolerant parallel processing system

    SciTech Connect

    Harper, R.E.; Lala, J.H.

    1990-03-06

    This patent describes a fault tolerant processing system for providing processing operations while tolerating f failures in the execution thereof. It comprises: at least (3f + 1) fault containment regions. Each of the regions includes a plurality of processors; network means connected to the processors and to the network means of the others of the fault containment regions; groups of one or more processors being configured to form redundant processing sites, at least one of the groups having (2f + 1) processors, each of the processors of a group being included in a different one of the fault containment regions. Each network means of a fault containment region includes means for providing communication operations between the network means and the network means of the others of the fault containment regions, each of the network means being connected to each other network means by at least (2f + 1) disjoint communication paths, a minimum of (f + 1) rounds of communication being provided among the network means of the fault containment regions in the execution of the processing operation; and means for synchronizing the communication operations of the network means with the communication operations of the network means of the other fault containment regions.

  5. CRBLASTER: A Parallel-Processing Computational Framework for Embarrassingly Parallel Image-Analysis Algorithms

    NASA Astrophysics Data System (ADS)

    Mighell, Kenneth John

    2010-10-01

    The development of parallel-processing image-analysis codes is generally a challenging task that requires complicated choreography of interprocessor communications. If, however, the image-analysis algorithm is embarrassingly parallel, then the development of a parallel-processing implementation of that algorithm can be a much easier task to accomplish because, by definition, there is little need for communication between the compute processes. I describe the design, implementation, and performance of a parallel-processing image-analysis application, called crblaster, which does cosmic-ray rejection of CCD images using the embarrassingly parallel l.a.cosmic algorithm. crblaster is written in C using the high-performance computing industry standard Message Passing Interface (MPI) library. crblaster uses a two-dimensional image partitioning algorithm that partitions an input image into N rectangular subimages of nearly equal area; the subimages include sufficient additional pixels along common image partition edges such that the need for communication between computer processes is eliminated. The code has been designed to be used by research scientists who are familiar with C as a parallel-processing computational framework that enables the easy development of parallel-processing image-analysis programs based on embarrassingly parallel algorithms. The crblaster source code is freely available at the official application Web site at the National Optical Astronomy Observatory. Removing cosmic rays from a single 800 × 800 pixel Hubble Space Telescope WFPC2 image takes 44 s with the IRAF script lacos_im.cl running on a single core of an Apple Mac Pro computer with two 2.8 GHz quad-core Intel Xeon processors. crblaster is 7.4 times faster when processing the same image on a single core on the same machine. Processing the same image with crblaster simultaneously on all eight cores of the same machine takes 0.875 s—which is a speedup factor of 50.3 times faster than the

  6. Parallel Signal Processing and System Simulation using aCe

    NASA Technical Reports Server (NTRS)

    Dorband, John E.; Aburdene, Maurice F.

    2003-01-01

    Recently, networked and cluster computation have become very popular for both signal processing and system simulation. The new aCe language is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, the new C-based parallel language (aCe C) for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures by providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we will focus on some fundamental features of aCe C and present a signal processing application (FFT).
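
    aCe itself is not widely available, so as a stand-in, here is a hypothetical Python sketch of the idea the abstract emphasizes: the FFTs of independent channels carry no data dependencies, so the programmer can state explicitly that they may run concurrently:

      from multiprocessing import Pool
      import numpy as np

      def channel_fft(signal):
          return np.fft.fft(signal)                  # each call is independent

      if __name__ == "__main__":
          rng = np.random.default_rng(0)
          channels = [rng.standard_normal(4096) for _ in range(16)]
          with Pool(processes=4) as pool:            # explicit concurrency
              spectra = pool.map(channel_fft, channels)
          print(len(spectra), spectra[0].shape)      # 16 (4096,)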

  7. Software Engineering Challenges for Parallel Processing Systems

    DTIC Science & Technology

    2008-05-02

    [Garbled indexing snippet. Recoverable content: an example of a lost-update data race on a shared counter; a note that this type of error in the Therac-25 radiation therapy machine resulted in five deaths; mentions of data races and deadlock between processes; and execution-time scaling data for a Jacobi solver parallelized with OpenMP on 1-16 processors.]

  8. Sentence Comprehension: A Parallel Distributed Processing Approach

    DTIC Science & Technology

    1989-07-14

    ...to suggest that the model we have presented here is a tabula rasa, acquiring knowledge of language without any prior structure. Indeed, the input is... What is constructed mentally when we comprehend a sentence? How does this constructive process occur? What role... expression is a function of the semantic content of each of the parts of the expression and of the organization of the constituents. 1.2. What role do

  9. Developing Software to Use Parallel Processing Effectively

    DTIC Science & Technology

    1988-10-01

    granularity (grain size), e) potential for interleaving processing and communication, f) memory access method (global versus local), g) processor...procedure G that works on both S1 and S2 but depends only on some small common part of these two types. For example, G might be a sort routine that relies on...hierarchy is more problematical than the first, since it is unlikely that S1 and S2 have exactly the operations required by G. If they do not, then at the

  10. High power parallel ultrashort pulse laser processing

    NASA Astrophysics Data System (ADS)

    Gillner, Arnold; Gretzki, Patrick; Büsing, Lasse

    2016-03-01

    The class of ultra-short-pulse (USP) laser sources is used whenever high-precision, high-quality material processing is demanded. These laser sources deliver pulse durations in the range of ps to fs and are characterized by high peak intensities, leading to direct vaporization of the material with minimal thermal damage. With the availability of industrial laser sources with an average power of up to 1000 W, the main challenge consists of effective energy distribution and deposition. Using lasers with high repetition rates in the MHz region can cause thermal issues such as overheating, melt production, and low ablation quality. In this paper, we discuss different approaches to multibeam processing for the utilization of high pulse energies. The combination of diffractive optics and conventional galvanometer scanners can be used for high-throughput laser ablation, but is limited in optical quality. We show which applications can benefit from this hybrid optic and which improvements in productivity can be expected. In addition, the optical limitations of the system are compiled in order to evaluate the suitability of this approach for any given application.

  11. Processing data communications events by awakening threads in parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2016-03-15

    Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
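
    A toy model of the claimed wait/awaken cycle can be written with a condition variable. This is a minimal sketch, not the patented PAMI implementation; the class and method names are invented for illustration:

      import threading
      from collections import deque

      class Context:
          def __init__(self):
              self.events = deque()
              self.cond = threading.Condition()

          def post(self, event):            # called when a communication completes
              with self.cond:
                  self.events.append(event)
                  self.cond.notify()        # awaken the waiting advance thread

          def advance(self):
              with self.cond:
                  while not self.events:    # no actionable events: enter wait state
                      self.cond.wait()
                  event = self.events.popleft()
              print("processing", event)    # handle the now-pending event

      ctx = Context()
      t = threading.Thread(target=ctx.advance)
      t.start()
      ctx.post("data_arrived")
      t.join()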

  12. A novel optimized parallelization strategy to accelerate microwave tomography for breast cancer screening.

    PubMed

    Shahzad, A; O'Halloran, M; Glavin, M; Jones, E

    2014-01-01

    Microwave tomography has been proven to successfully reconstruct the dielectric profile of a human breast when used in breast imaging applications, thereby providing an alternative to other imaging modalities. However, the method suffers from high computational requirements, which restrict its use in practical imaging systems. This paper presents a novel parallelization strategy to accelerate microwave tomography for reconstruction of the dielectric properties of the human breast. A time-domain algorithm using this parallelization strategy has been validated and benchmarked against an optimized sequential implementation on a conventional high-end desktop central processing unit (CPU), and a comparison of throughput is presented in this paper. The gain in computational throughput over the sequential implementation ranges from a factor of 26 to 58 on imaging grid sizes of up to 25 cm square at 1 mm resolution.

  13. Repartitioning Strategies for Massively Parallel Simulation of Reacting Flow

    NASA Astrophysics Data System (ADS)

    Pisciuneri, Patrick; Zheng, Angen; Givi, Peyman; Labrinidis, Alexandros; Chrysanthis, Panos

    2015-11-01

    The majority of parallel CFD simulators partition the domain into equal regions and assign the calculations for a particular region to a unique processor. This type of domain decomposition is vital to the efficiency of the solver. However, as the simulation develops, the workload among the partitions often becomes uneven (e.g., through adaptive mesh refinement or chemically reacting regions), and a new partition should be considered. The process of repartitioning adjusts the current partition to evenly distribute the load again. We compare two repartitioning tools: Zoltan, an architecture-agnostic graph repartitioner developed at the Sandia National Laboratories; and Paragon, an architecture-aware graph repartitioner developed at the University of Pittsburgh. The comparative assessment is conducted via simulation of the Taylor-Green vortex flow with chemical reaction.
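
    The decision of when to repartition is often driven by a simple imbalance metric. A hedged sketch follows; the 10% threshold and the metric itself are illustrative assumptions, not taken from this work:

      def needs_repartition(loads, tolerance=1.10):
          """True when the busiest partition exceeds the mean load by more
          than `tolerance`, signaling that repartitioning may pay off."""
          mean = sum(loads) / len(loads)
          return max(loads) / mean > tolerance

      # A chemically reacting region has inflated partition 2's workload:
      loads = [100.0, 103.0, 161.0, 99.0]
      print(needs_repartition(loads))   # True -> invoke Zoltan or Paragon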

  14. Fault Tolerant Statistical Signal Processing Algorithms for Parallel Architectures.

    DTIC Science & Technology

    2014-09-26

    [Garbled report-form snippet. Recoverable content: report AD-A157 393, "Fault Tolerant Statistical Signal Processing Algorithms for Parallel Architectures," Johns Hopkins University, Baltimore, MD, Department of Electrical Engineering; indexed keywords: fault tolerance, signal processing, parallel architecture.]

  15. Improving operating room productivity via parallel anesthesia processing.

    PubMed

    Brown, Michael J; Subramanian, Arun; Curry, Timothy B; Kor, Daryl J; Moran, Steven L; Rohleder, Thomas R

    2014-01-01

    Parallel processing of regional anesthesia may improve operating room (OR) efficiency in patients undergoing upper extremity surgical procedures. The purpose of this paper is to evaluate whether performing regional anesthesia outside the OR, in parallel, increases total cases per day and improves efficiency and productivity. Data from all adult patients who underwent regional anesthesia as their primary anesthetic for upper extremity surgery over a one-year period were used to develop a simulation model. The model evaluated pure operating modes of regional anesthesia performed within the OR and outside the OR in a parallel manner. The scenarios were used to evaluate how many surgeries could be completed in a standard work day (555 minutes) and, assuming a standard three cases per day, what the predicted end-of-day overtime would be. Modeling results show that parallel processing of regional anesthesia increases the average cases per day for all surgeons included in the study. The average increase was 0.42 surgeries per day. Where it was assumed that three cases per day would be performed by all surgeons, the number of days going to overtime was reduced by 43 percent with parallel blocks. The overtime with parallel anesthesia was also projected to be 40 minutes less per day per surgeon. Key limitations include the assumption that all cases used regional anesthesia in the comparisons; many days may have both regional and general anesthesia. Also, as a case study, single-center research may limit generalizability. Perioperative care providers should consider parallel administration of regional anesthesia where there is a desire to increase daily upper extremity surgical case capacity. Where there are sufficient resources to do parallel anesthesia processing, efficiency and productivity can be significantly improved. Simulation modeling can be an effective tool to show practice change effects at a system-wide level.
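
    The effect the study reports can be illustrated with a toy Monte Carlo model. All durations below are invented for illustration and are not the paper's data:

      import random

      def day_length(n_cases, parallel_block, rng):
          """With parallel processing, the regional block overlaps the previous
          case, so only surgery and turnover occupy the OR (toy model)."""
          total = 0.0
          for _ in range(n_cases):
              block, surgery, turnover = rng.gauss(25, 5), rng.gauss(120, 25), 20
              total += surgery + turnover + (0 if parallel_block else block)
          return total

      rng = random.Random(1)
      trials = 1000
      for parallel in (False, True):
          over = sum(day_length(3, parallel, rng) > 555 for _ in range(trials))
          print("parallel" if parallel else "serial  ",
                "block: P(overtime) =", over / trials)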

  16. FPGA-Based Filterbank Implementation for Parallel Digital Signal Processing

    NASA Technical Reports Server (NTRS)

    Berner, Stephan; DeLeon, Phillip

    1999-01-01

    One approach to parallel digital signal processing decomposes a high-bandwidth signal into multiple lower-bandwidth (rate) signals by an analysis bank. After processing, the subband signals are recombined into a fullband output signal by a synthesis bank. This paper describes an implementation of the analysis and synthesis banks using field-programmable gate arrays (FPGAs).
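
    The analysis/synthesis idea can be demonstrated in a few lines with the two-band Haar filter pair, which reconstructs perfectly. This NumPy sketch is a software stand-in for the FPGA banks, not the paper's implementation:

      import numpy as np

      def analysis(x):
          """Two-band Haar analysis bank: low and high subbands at half rate."""
          x = x.reshape(-1, 2)
          low  = (x[:, 0] + x[:, 1]) / np.sqrt(2)
          high = (x[:, 0] - x[:, 1]) / np.sqrt(2)
          return low, high

      def synthesis(low, high):
          """Matching synthesis bank: recombine subbands into the fullband."""
          y = np.empty(2 * low.size)
          y[0::2] = (low + high) / np.sqrt(2)
          y[1::2] = (low - high) / np.sqrt(2)
          return y

      x = np.random.rand(16)
      low, high = analysis(x)      # the two subbands may be processed in parallel
      assert np.allclose(synthesis(low, high), x)   # perfect reconstruction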

  17. The science of computing - The evolution of parallel processing

    NASA Technical Reports Server (NTRS)

    Denning, P. J.

    1985-01-01

    The present paper is concerned with the approaches to be employed to overcome the set of limitations in software technology which currently impedes the effective use of parallel hardware technology. The process required to solve the arising problems is found to involve four different stages. At the present time, Stage One is nearly finished, while Stage Two is under way; tentative explorations are beginning on Stage Three, and Stage Four is more distant. In Stage One, parallelism is introduced into the hardware of a single computer, which consists of one or more processors, a main storage system, a secondary storage system, and various peripheral devices. In Stage Two, parallel execution of cooperating programs on different machines becomes explicit, while in Stage Three, new languages will make parallelism implicit. In Stage Four, there will be very high level user interfaces capable of interacting with scientists at the same level of abstraction as scientists do with each other.

  18. High-speed parallel-processing networks for advanced architectures

    SciTech Connect

    Morgan, D.R.

    1988-06-01

    This paper describes various parallel-processing architecture networks that are candidates for eventual airborne use. An attempt at projecting which type of network is suitable or optimum for specific metafunction or stand-alone applications is made. However, specific algorithms will need to be developed and benchmarks executed before firm conclusions can be drawn. Also, a conceptual projection of how these processors can be built in small, flyable units through the use of wafer-scale integration is offered. The use of the PAVE PILLAR system architecture to provide system-level support for these tightly coupled networks is described. The author concludes that: (1) extremely high processing speeds implemented in flyable hardware are possible through parallel-processing networks if development programs are pursued; (2) dramatic speed enhancements through parallel processing require an excellent match between the algorithm and computer-network architecture; (3) matching several high-speed parallel-oriented algorithms across the aircraft system to a limited set of hardware modules may be the most cost-effective approach to achieving speed enhancements; and (4) software-development tools and improved operating systems will need to be developed to support efficient parallel-processor use.

  19. Latency and bandwidth considerations in parallel robotics image processing

    SciTech Connect

    Webb, J.A.

    1993-12-31

    Parallel image processing for robotics applications differs in a fundamental way from parallel scientific computing applications: the problem size is fixed, and latency requirements are tight. This brings Amdahl's law into effect with full force, so that message-passing latency and bandwidth severely restrict performance. In this paper the authors examine an application from this domain, stereo image processing, which has been implemented in Adapt, a niche language for parallel image processing implemented on the Carnegie Mellon-Intel Corporation iWarp. High performance has been achieved for this application. They show how an I/O building block approach on iWarp achieved this, and then examine the implications of this performance for more traditional machines that do not have iWarp's rich I/O primitive set.
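
    The fixed-size effect of Amdahl's law is easy to quantify. A small worked example follows; the 95% parallel fraction and the 2% latency overhead are illustrative numbers, not measurements from the paper:

      def speedup(p, n, comm=0.0):
          """Amdahl's law for a fixed problem size, normalized to a serial
          time of 1, with an optional fixed communication-latency term."""
          return 1.0 / ((1 - p) + p / n + comm)

      print(round(speedup(0.95, 64), 1))             # ~15.4, far below 64
      print(round(speedup(0.95, 64, comm=0.02), 1))  # ~11.8: latency bites hard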

  20. Mapping Pixel Windows To Vectors For Parallel Processing

    NASA Technical Reports Server (NTRS)

    Duong, Tuan A.

    1996-01-01

    Mapping performed by matrices of transistor switches. Arrays of transistor switches devised for use in forming simultaneous connections from a square subarray (window) of n x n pixels within an electronic imaging device containing an np x np array of pixels to a linear array of n^2 input terminals of an electronic neural network or other parallel-processing circuit. Method helps to realize the potential for rapidity in parallel processing for such applications as enhancement of images and recognition of patterns. In providing simultaneous connections, overcomes the timing bottleneck of older multiplexing, serial-switching, and sample-and-hold methods.
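
    A software analogue of the window-to-vector mapping is a one-liner; the function below is illustrative only (the patent performs this with transistor switch matrices, not code):

      import numpy as np

      def window_to_vector(image, row, col, n):
          """Map the n x n window at (row, col) onto a length n^2 vector."""
          return image[row:row + n, col:col + n].reshape(n * n)

      image = np.arange(64).reshape(8, 8)     # an 8 x 8 "imaging device"
      print(window_to_vector(image, 2, 3, 3)) # [19 20 21 27 28 29 35 36 37]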

  2. Multi-digit number processing beyond the two-digit number range: a combination of sequential and parallel processes.

    PubMed

    Meyerhoff, Hauke S; Moeller, Korbinian; Debus, Kolja; Nuerk, Hans-Christoph

    2012-05-01

    Investigations of multi-digit number processing typically focus on two-digit numbers. Here, we aim to investigate the generality of results from two-digit numbers for four- and six-digit numbers. Previous studies on two-digit numbers mostly suggested a parallel processing of tens and units. In contrast, the few studies examining the processing of larger numbers suggest sequential processing of the individual constituting digits. In this study, we combined the methodological approaches of studies implying either parallel or sequential processing. Participants completed a number magnitude comparison task on two-, four-, and six-digit numbers including unit-decade compatible and incompatible differing digit pairs (e.g., 32_47, 3<4 and 2<7 vs. 37_52, 3<5 but 7>2, respectively) at all possible digit positions. Response latencies and fixation behavior indicated that sequential and parallel decomposition are not mutually exclusive in multi-digit number processing. Instead, our results clearly suggested that sequential and parallel processing strategies seem to be combined when processing multi-digit numbers beyond the two-digit number range. To account for the results, we propose a chunking hypothesis claiming that multi-digit numbers are separated into chunks of shorter digit strings. While the different chunks are processed sequentially, digits within these chunks are processed in parallel. Copyright © 2012 Elsevier B.V. All rights reserved.

  3. Massively parallel processing of remotely sensed hyperspectral images

    NASA Astrophysics Data System (ADS)

    Plaza, Javier; Plaza, Antonio; Valencia, David; Paz, Abel

    2009-08-01

    In this paper, we develop several parallel techniques for hyperspectral image processing that have been specifically designed to be run on massively parallel systems. The techniques developed cover the three relevant areas of hyperspectral image processing: 1) spectral mixture analysis, a popular approach to characterize mixed pixels in hyperspectral data addressed in this work via efficient implementation of a morphological algorithm for automatic identification of pure spectral signatures or endmembers from the input data; 2) supervised classification of hyperspectral data using multi-layer perceptron neural networks with back-propagation learning; and 3) automatic target detection in the hyperspectral data using orthogonal subspace projection concepts. The scalability of the proposed parallel techniques is investigated using Barcelona Supercomputing Center's MareNostrum facility, one of the most powerful supercomputers in Europe.

  4. Parallel processing of atmospheric chemistry calculations: Preliminary considerations

    SciTech Connect

    Elliott, S.; Jones, P.

    1995-01-01

    Global climate calculations are already saturating the class of modern vector supercomputers with only a few central processing units. Increased resolution and inclusion of routines to deal with biogeochemical portions of the terrestrial climate system will soon demand massively parallel approaches. The atmospheric photochemistry ensemble is intimately linked to climate through the trace greenhouse gases ozone and methane, and modules for representing it are being attached to global three-dimensional transport and GCM frameworks. Atmospheric kinetics involve dozens of highly interactive tracers and so will accentuate the need for parallel processing of earth system simulations. In the present text we lay some of the groundwork for the addition of atmospheric kinetics packages to GCM and global scale atmospheric models on massively parallel computers. The discussion is tailored for consumption by the photochemical modelling community. After a review of numerical atmospheric chemistry methods, we examine how kinetics can be implemented on a parallel computer. We concentrate especially on data layout and flexibility and how these can be implemented in various programming models. We conclude that chemistry can be implemented rather easily within existing frameworks of several parallel atmospheric models. However, memory limitations may preclude high resolution studies of global chemistry.

  5. Using Motivational Interviewing Techniques to Address Parallel Process in Supervision

    ERIC Educational Resources Information Center

    Giordano, Amanda; Clarke, Philip; Borders, L. DiAnne

    2013-01-01

    Supervision offers a distinct opportunity to experience the interconnection of counselor-client and counselor-supervisor interactions. One product of this network of interactions is parallel process, a phenomenon by which counselors unconsciously identify with their clients and subsequently present to their supervisors in a similar fashion…

  6. Parallel Processing of Objects in a Naming Task

    ERIC Educational Resources Information Center

    Meyer, Antje S.; Ouellet, Marc; Hacker, Christine

    2008-01-01

    The authors investigated whether speakers who named several objects processed them sequentially or in parallel. Speakers named object triplets, arranged in a triangle, in the order left, right, and bottom object. The left object was easy or difficult to identify and name. During the saccade from the left to the right object, the right object shown…

  7. Postscript: Parallel Distributed Processing in Localist Models without Thresholds

    ERIC Educational Resources Information Center

    Plaut, David C.; McClelland, James L.

    2010-01-01

    The current authors reply to a response by Bowers on a comment by the current authors on the original article. Bowers (2010) mischaracterizes the goals of parallel distributed processing (PDP) research--explaining performance on cognitive tasks is the primary motivation. More important, his claim that localist models, such as the interactive…

  10. The Extended Parallel Process Model: Illuminating the Gaps in Research

    ERIC Educational Resources Information Center

    Popova, Lucy

    2012-01-01

    This article examines constructs, propositions, and assumptions of the extended parallel process model (EPPM). Review of the EPPM literature reveals that its theoretical concepts are thoroughly developed, but the theory lacks consistency in operational definitions of some of its constructs. Out of the 12 propositions of the EPPM, a few have not…

  13. Proxy-equation paradigm: A strategy for massively parallel asynchronous computations

    NASA Astrophysics Data System (ADS)

    Mittal, Ankita; Girimaji, Sharath

    2017-09-01

    Massively parallel simulations of transport equation systems call for a paradigm change in algorithm development to achieve efficient scalability. Traditional approaches require time synchronization of processing elements (PEs), which severely restricts scalability. Relaxing the synchronization requirement introduces error and slows down convergence. In this paper, we propose and develop a novel "proxy equation" concept for a general transport equation that (i) tolerates asynchrony with minimal added error, (ii) preserves convergence order, and thus (iii) is expected to scale efficiently on massively parallel machines. The central idea is to modify a priori the transport equation at the PE boundaries to offset asynchrony errors. Proof-of-concept computations are performed using a one-dimensional advection (convection) diffusion equation. The results demonstrate the promise and advantages of the present strategy.
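
    The sketch below does not implement the proxy equation itself; it only reproduces, under assumed parameters, the asynchrony error that the method is designed to offset, by letting two subdomains of a 1-D diffusion solver see each other's boundary values one step late:

      import numpy as np

      def diffuse(u0, steps, r=0.25, stale=False):
          left, right = u0[:u0.size // 2].copy(), u0[u0.size // 2:].copy()
          gl, gr = right[0], left[-1]             # exchanged "ghost" values
          for _ in range(steps):
              if not stale:
                  gl, gr = right[0], left[-1]     # synchronous exchange
              nl, nr = left.copy(), right.copy()
              nl[1:-1] += r * np.diff(left, 2)
              nr[1:-1] += r * np.diff(right, 2)
              nl[-1] += r * (left[-2] - 2 * left[-1] + gl)   # interface cells
              nr[0]  += r * (gr - 2 * right[0] + right[1])
              if stale:
                  gl, gr = right[0], left[-1]     # one step old next iteration
              left, right = nl, nr
          return np.concatenate([left, right])

      u0 = np.sin(np.linspace(0, np.pi, 64))
      sync, relaxed = diffuse(u0, 50), diffuse(u0, 50, stale=True)
      print("max asynchrony error:", np.abs(sync - relaxed).max())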

  14. A novel parallel architecture for real-time image processing

    NASA Astrophysics Data System (ADS)

    Hu, Junhong; Zhang, Tianxu; Zhong, Sheng; Chen, Xujun

    2009-10-01

    A novel DSP/FPGA-based parallel architecture for real-time image processing is presented in this paper. DSPs are the main processing units, and FPGAs serve as logic units for the image interface protocol, image processing, image display, the synchronization communication protocol of the DSPs, and the DSPs' 422/485 reprogramming interface. The presented architecture is composed of two modules, a preprocessing module and a processing module, and the latter is extendable for better performance. Modules are connected by the LINK communication port, whose LVDS signaling provides anti-jamming capability. The DSPs' programs can be updated easily over 422/485 through a PC's serial port. Analysis and experimental results show that a prototype with the proposed parallel architecture has many promising characteristics, such as powerful computing capability and broad data-transfer bandwidth, and is easy to extend and update.

  15. Overview of parallel processing approaches to image and video compression

    NASA Astrophysics Data System (ADS)

    Shen, Ke; Cook, Gregory W.; Jamieson, Leah H.; Delp, Edward J., III

    1994-05-01

    In this paper we present an overview of techniques used to implement various image and video compression algorithms using parallel processing. Approaches used can largely be divided into four areas. The first is the use of special purpose architectures designed specifically for image and video compression. An example of this is the use of an array of DSP chips to implement a version of MPEG1. The second approach is the use of VLSI techniques. These include various chip sets for JPEG and MPEG1. The third approach is algorithm driven, in which the structure of the compression algorithm describes the architecture, e.g. pyramid algorithms. The fourth approach is the implementation of algorithms on high performance parallel computers. Examples of this approach are the use of a massively parallel computer such as the MasPar MP-1 or the use of a coarse-grained machine such as the Intel Touchstone Delta.

  16. Automating the parallel processing of fluid and structural dynamics calculations

    NASA Technical Reports Server (NTRS)

    Arpasi, Dale J.; Cole, Gary L.

    1987-01-01

    The NASA Lewis Research Center is actively involved in the development of expert system technology to assist users in applying parallel processing to computational fluid and structural dynamic analysis. The goal of this effort is to eliminate the necessity for the physical scientist to become a computer scientist in order to effectively use the computer as a research tool. Programming and operating software utilities have previously been developed to solve systems of ordinary nonlinear differential equations on parallel scalar processors. Current efforts are aimed at extending these capabilities to systems of partial differential equations, that describe the complex behavior of fluids and structures within aerospace propulsion systems. This paper presents some important considerations in the redesign, in particular, the need for algorithms and software utilities that can automatically identify data flow patterns in the application program and partition and allocate calculations to the parallel processors. A library-oriented multiprocessing concept for integrating the hardware and software functions is described.

  18. Parallel, semiparallel, and serial processing of visual hyperacuity

    NASA Astrophysics Data System (ADS)

    Fahle, Manfred W.

    1990-10-01

    Humans can discriminate between certain elementary stimulus features in parallel, i.e., simultaneously over the visual field. I present evidence that, in man, vernier misalignments in the hyperacuity range, i.e., below the photoreceptor diameter, can also be detected in parallel. This indicates that the visual system performs some form of spatial interpolation beyond the photoreceptor spacing simultaneously over the visual field. Vernier offsets are detected in parallel even when orientation cues are masked: deviation from straightness is an elementary feature of visual perception. However, the identification process, that classifies each vernier in a stimulus as being offset to the right (versus to the left), is serial and has to scan the visual field sequentially if orientation cues are masked. Therefore, reaction times and thresholds in vernier acuity tasks increase with the number of verniers presented simultaneously if classification of different features is required. Furthermore, when approaching vernier threshold, simple vernier detection is no longer parallel but becomes partially serial, or semi-parallel.

  19. Parallel-Processing Software for Correlating Stereo Images

    NASA Technical Reports Server (NTRS)

    Klimeck, Gerhard; Deen, Robert; Mcauley, Michael; DeJong, Eric

    2007-01-01

    A computer program implements parallel-processing algorithms for correlating images of terrain acquired by stereoscopic pairs of digital stereo cameras on an exploratory robotic vehicle (e.g., a Mars rover). Such correlations are used to create three-dimensional computational models of the terrain for navigation. In this program, the scene viewed by the cameras is segmented into subimages. Each subimage is assigned to one of a number of central processing units (CPUs) operating simultaneously.

  20. Parallel Attack and the Enemy’s Decision Making Process

    DTIC Science & Technology

    1998-04-01

    [Snippet mixing body text and endnotes. Recoverable content: the assumption that the processes within the hierarchy for the bureaucratic and organizational process models take the same amount of computational time; cited works include JaJa, Joseph, An Introduction to Parallel Algorithms, Addison-Wesley, 1992, and J.F.C. Fuller, The Foundations of the Science of War, London, Hutchinson and Company, 1925.]

  1. Parallel Processing of Broad-Band PPM Signals

    NASA Technical Reports Server (NTRS)

    Gray, Andrew; Kang, Edward; Lay, Norman; Vilnrotter, Victor; Srinivasan, Meera; Lee, Clement

    2010-01-01

    A parallel-processing algorithm and a hardware architecture to implement the algorithm have been devised for time-slot synchronization in the reception of pulse-position-modulated (PPM) optical or radio signals. As in the cases of some prior algorithms and architectures for parallel, discrete-time, digital processing of signals other than PPM, an incoming broadband signal is divided into multiple parallel narrower-band signals by means of sub-sampling and filtering. The number of parallel streams is chosen so that the frequency content of the narrower-band signals is low enough to enable processing by relatively low-speed complementary metal oxide semiconductor (CMOS) electronic circuitry. The algorithm and architecture are intended to satisfy requirements for time-varying time-slot synchronization and post-detection filtering, with correction of timing errors independent of estimation of timing errors. They are also intended to afford flexibility for dynamic reconfiguration and upgrading. The architecture is implemented in a reconfigurable CMOS processor in the form of a field-programmable gate array. The algorithm and its hardware implementation incorporate three separate time-varying filter banks for three distinct functions: correction of sub-sample timing errors, post-detection filtering, and post-detection estimation of timing errors. The design of the filter bank for correction of timing errors, the method of estimating timing errors, and the design of a feedback-loop filter are governed by a host of parameters, the most critical one, with regard to processing very broadband signals with CMOS hardware, being the number of parallel streams (equivalently, the rate-reduction parameter).
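
    The sub-sampling step that creates the parallel streams is simple to express. The sketch below shows only that step (the time-varying filter banks described above would follow), with the stream count chosen arbitrarily:

      import numpy as np

      def polyphase_split(x, m):
          """Sub-sample a broadband stream into m parallel substreams; stream k
          carries x[k], x[k+m], x[k+2m], ..., each at 1/m of the input rate."""
          x = x[:(len(x) // m) * m]     # drop any ragged tail
          return x.reshape(-1, m).T     # row k is substream k

      streams = polyphase_split(np.arange(64), 16)
      print(streams.shape)              # (16, 4): 16 low-rate parallel streams
      print(streams[3])                 # [ 3 19 35 51]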

  2. Digital image processing using parallel computing based on CUDA technology

    NASA Astrophysics Data System (ADS)

    Skirnevskiy, I. P.; Pustovit, A. V.; Abdrashitova, M. O.

    2017-01-01

    This article describes the expediency of using a graphics processing unit (GPU) for big-data processing in the context of digital image processing. It provides a short description of a parallel computing technology and its usage in different areas, a definition of image noise, and a brief overview of some noise-removal algorithms. It also describes some basic requirements that should be met by a noise-removal algorithm in application to computed tomography. It compares performance with and without the GPU, as well as with different divisions of the workload between CPU and GPU.

  3. Digital intermediate frequency QAM modulator using parallel processing

    DOEpatents

    Pao, Hsueh-Yuan [Livermore, CA; Tran, Binh-Nien [San Ramon, CA

    2008-05-27

    The digital Intermediate Frequency (IF) modulator applies to various modulation types and offers a simple and low cost method to implement a high-speed digital IF modulator using field programmable gate arrays (FPGAs). The architecture eliminates multipliers and sequential processing by storing the pre-computed modulated cosine and sine carriers in ROM look-up-tables (LUTs). The high-speed input data stream is parallel processed using the corresponding LUTs, which reduces the main processing speed, allowing the use of low cost FPGAs.
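
    The multiplier-free trick is easy to reproduce: pre-compute the fully modulated waveform for every symbol and replace run-time arithmetic with table lookups. A hypothetical NumPy sketch follows (QPSK with four carrier samples per symbol; both parameters are assumptions, not the patent's values):

      import numpy as np

      SPS = 4                                        # IF carrier samples per symbol
      n = np.arange(SPS)
      cos_c, sin_c = np.cos(2 * np.pi * n / SPS), np.sin(2 * np.pi * n / SPS)
      iq = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)  # QPSK

      # "ROM" LUT: row s holds the modulated waveform for symbol s, so no
      # multipliers are needed once the table is built.
      lut = np.real(iq)[:, None] * cos_c - np.imag(iq)[:, None] * sin_c

      symbols = np.array([0, 3, 1, 2, 0])            # input data stream
      waveform = lut[symbols].ravel()                # parallel lookups, serialized
      print(waveform.round(3))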

  4. Semi-automatic process partitioning for parallel computation

    NASA Technical Reports Server (NTRS)

    Koelbel, Charles; Mehrotra, Piyush; Vanrosendale, John

    1988-01-01

    On current multiprocessor architectures one must carefully distribute data in memory in order to achieve high performance. Process partitioning is the operation of rewriting an algorithm as a collection of tasks, each operating primarily on its own portion of the data, to carry out the computation in parallel. A semi-automatic approach to process partitioning is considered in which the compiler, guided by advice from the user, automatically transforms programs into such an interacting task system. This approach is illustrated with a picture processing example written in BLAZE, which is transformed into a task system maximizing locality of memory reference.

  5. A dataflow analysis tool for parallel processing of algorithms

    NASA Technical Reports Server (NTRS)

    Jones, Robert L., III

    1993-01-01

    A graph-theoretic design process and software tool is presented for selecting a multiprocessing scheduling solution for a class of computational problems. The problems of interest are those that can be described using a dataflow graph and are intended to be executed repetitively on a set of identical parallel processors. Typical applications include signal processing and control law problems. Graph analysis techniques are introduced and shown to effectively determine performance bounds, scheduling constraints, and resource requirements. The software tool is shown to facilitate the application of the design process to a given problem.
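
    The kind of bounds such a tool derives can be sketched in a few lines: the critical path of the dataflow DAG bounds latency, and total work divided by processor count bounds the iteration period. The graph and node times below are hypothetical:

      from functools import lru_cache

      graph = {                       # node -> (compute time, successors)
          "read":  (2, ["fft"]),
          "fft":   (5, ["mag", "phase"]),
          "mag":   (3, ["out"]),
          "phase": (4, ["out"]),
          "out":   (1, []),
      }

      @lru_cache(maxsize=None)
      def critical_path(node):
          t, succs = graph[node]
          return t + max((critical_path(s) for s in succs), default=0)

      print("latency bound:", max(critical_path(n) for n in graph))   # 12
      work = sum(t for t, _ in graph.values())                        # 15
      for p in (1, 2, 4):
          print(p, "processors: iteration period >=", -(-work // p))  # ceil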

  6. Applications of massively parallel computers in telemetry processing

    NASA Technical Reports Server (NTRS)

    El-Ghazawi, Tarek A.; Pritchard, Jim; Knoble, Gordon

    1994-01-01

    Telemetry processing refers to the reconstruction of full resolution raw instrumentation data with artifacts, of space and ground recording and transmission, removed. Being the first processing phase of satellite data, this process is also referred to as level-zero processing. This study is aimed at investigating the use of massively parallel computing technology in providing level-zero processing to spaceflights that adhere to the recommendations of the Consultative Committee on Space Data Systems (CCSDS). The workload characteristics, of level-zero processing, are used to identify processing requirements in high-performance computing systems. An example of level-zero functions on a SIMD MPP, such as the MasPar, is discussed. The requirements in this paper are based in part on the Earth Observing System (EOS) Data and Operation System (EDOS).

  7. Transactional memories: A new abstraction for parallel processing

    SciTech Connect

    Fasel, J.H.; Lubeck, O.M.; Agrawal, D.; Bruno, J.L.; El Abbadi, A.

    1997-12-01

    This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at Los Alamos National Laboratory (LANL). Current distributed memory multiprocessor computer systems make the development of parallel programs difficult. From a programmer's perspective, it would be most desirable if the underlying hardware and software could provide the programming abstraction commonly referred to as sequential consistency--a single address space and multiple threads; but enforcement of sequential consistency limits opportunities for architectural and operating system performance optimizations, leading to poor performance. Recently, Herlihy and Moss have introduced a new abstraction called transactional memories for parallel programming. The programming model is shared memory with multiple threads. However, data consistency is obtained through the use of transactions rather than mutual exclusion based on locking. The transaction approach permits the underlying system to exploit the potential parallelism in transaction processing. The authors explore the feasibility of designing parallel programs using the transaction paradigm for data consistency and a barrier type of thread synchronization.
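
    The transaction idea (optimistic execution, validation at commit, retry on conflict) can be mimicked in software. A toy sketch follows; it is not Herlihy and Moss's hardware design and not the project's code:

      import threading

      class TVar:
          def __init__(self, value):
              self.value, self.version = value, 0
              self.lock = threading.Lock()

      def atomically(tvar, update):
          while True:                              # retry instead of blocking
              val, ver = tvar.value, tvar.version  # optimistic read
              new = update(val)                    # compute outside any lock
              with tvar.lock:                      # brief commit-time validation
                  if tvar.version == ver:          # no interference: commit
                      tvar.value, tvar.version = new, ver + 1
                      return new
              # another transaction committed first; retry

      counter = TVar(0)
      ts = [threading.Thread(target=lambda: [atomically(counter, lambda x: x + 1)
                                             for _ in range(1000)]) for _ in range(4)]
      for t in ts: t.start()
      for t in ts: t.join()
      print(counter.value)    # 4000: every increment survives the contention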

  8. Morphological evidence for parallel processing of information in rat macula

    NASA Technical Reports Server (NTRS)

    Ross, M. D.

    1988-01-01

    Study of montages, tracings and reconstructions prepared from a series of 570 consecutive ultrathin sections shows that rat maculas are morphologically organized for parallel processing of linear acceleratory information. Type II cells of one terminal field distribute information to neighboring terminals as well. The findings are examined in light of physiological data which indicate that macular receptor fields have a preferred directional vector, and are interpreted by analogy to a computer technology known as an information network.

  9. Completion Probabilities and Parallel Restart Strategies under an Imposed Deadline.

    PubMed

    Lorenz, Jan-Hendrik

    2016-01-01

    Let A be any fixed cut-off restart algorithm running in parallel on multiple processors. If the algorithm is only allowed to run for up to time D, then it is no longer guaranteed that a result can be found. In this case, the probability of finding a solution within the time D becomes a measure for the quality of the algorithm. In this paper we address this issue and provide upper and lower bounds for the probability of A finding a solution before a deadline passes under varying assumptions. We also show that the optimal restart times for a fixed cut-off algorithm running in parallel are identical to the optimal restart times for the algorithm running on a single processor. Finally, we conclude that the odds of finding a solution scale superlinearly in the number of processors.
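
    The superlinear-odds conclusion is easy to verify numerically if one assumes the processors succeed independently with a common probability q (an assumption made here purely for illustration):

      def p_success(q, processors):
          return 1 - (1 - q) ** processors   # at least one processor succeeds

      q = 0.10
      for P in (1, 2, 4, 8):
          p = p_success(q, P)
          print(P, "processors: P =", round(p, 3), " odds =", round(p / (1 - p), 3))
      # The odds, (1 - q)**-P - 1, grow geometrically in P, i.e., superlinearly.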

  10. Completion Probabilities and Parallel Restart Strategies under an Imposed Deadline

    PubMed Central

    Lorenz, Jan-Hendrik

    2016-01-01

    Let A be any fixed cut-off restart algorithm running in parallel on multiple processors. If the algorithm is only allowed to run for up to time D, then it is no longer guaranteed that a result can be found. In this case, the probability of finding a solution within the time D becomes a measure for the quality of the algorithm. In this paper we address this issue and provide upper and lower bounds for the probability of A finding a solution before a deadline passes under varying assumptions. We also show that the optimal restart times for a fixed cut-off algorithm running in parallel are identical to the optimal restart times for the algorithm running on a single processor. Finally, we conclude that the odds of finding a solution scale superlinearly in the number of processors. PMID:27732631

  11. Parallel tools in HEVC for high-throughput processing

    NASA Astrophysics Data System (ADS)

    Zhou, Minhua; Sze, Vivienne; Budagavi, Madhukar

    2012-10-01

    HEVC (High Efficiency Video Coding) is the next-generation video coding standard being jointly developed by the ITU-T VCEG and ISO/IEC MPEG JCT-VC team. In addition to the high coding efficiency, which is expected to provide 50% more bit-rate reduction when compared to H.264/AVC, HEVC has built-in parallel processing tools to address bitrate, pixel-rate and motion estimation (ME) throughput requirements. This paper describes how CABAC, which is also used in H.264/AVC, has been redesigned for improved throughput, and how parallel merge/skip and tiles, which are new tools introduced for HEVC, enable high-throughput processing. CABAC has data dependencies which make it difficult to parallelize and thus limit its throughput. The prediction error/residual, represented as quantized transform coefficients, accounts for the majority of the CABAC workload. Various improvements have been made to the context selection and scans in transform coefficient coding that enable CABAC in HEVC to potentially achieve higher throughput and increased coding gains relative to H.264/AVC. The merge/skip mode is a coding efficiency enhancement tool in HEVC; the parallel merge/skip breaks dependency between the regular and merge/skip ME, which provides flexibility for high throughput and high efficiency HEVC encoder designs. For ultra high definition (UHD) video, such as 4kx2k and 8kx4k resolutions, low-latency and real-time processing may be beyond the capability of a single core codec. Tiles are an effective tool which enables pixel-rate balancing among the cores to achieve parallel processing with a throughput scalable implementation of multi-core UHD video codec. With the evenly divided tiles, a multi-core video codec can be realized by simply replicating single core codec and adding a tile boundary processing core on top of that. These tools illustrate that accounting for implementation cost when designing video coding algorithms can enable higher processing speed and reduce

  12. Parallel-Processing Equalizers for Multi-Gbps Communications

    NASA Technical Reports Server (NTRS)

    Gray, Andrew; Ghuman, Parminder; Hoy, Scott; Satorius, Edgar H.

    2004-01-01

    Architectures have been proposed for the design of frequency-domain least-mean-square complex equalizers that would be integral parts of parallel-processing digital receivers of multi-gigahertz radio signals and other quadrature-phase-shift-keying (QPSK) or 16-quadrature-amplitude-modulation (16-QAM) data signals at rates of multiple gigabits per second. Equalizers as used here denote receiver subsystems that compensate for distortions in the phase and frequency responses of the broad-band radio-frequency channels typically used to convey such signals. The proposed architectures are suitable for realization in very-large-scale integrated (VLSI) circuitry and, in particular, complementary metal oxide semiconductor (CMOS) application-specific integrated circuits (ASICs) operating at frequencies lower than modulation symbol rates. A digital receiver of the type to which the proposed architecture applies (see Figure 1) would include an analog-to-digital converter (A/D) operating at a rate, fs, of 4 samples per symbol period. To obtain the high speed necessary for sampling, the A/D and a 1:16 demultiplexer immediately following it would be constructed as GaAs integrated circuits. The parallel-processing circuitry downstream of the demultiplexer, including a demodulator followed by an equalizer, would operate at a rate of only fs/16 (in other words, at 1/4 of the symbol rate). The output from the equalizer would be four parallel streams of in-phase (I) and quadrature (Q) samples.

  13. Message passing kernel for the hypercluster parallel processing test bed

    SciTech Connect

    Blech, R.A.; Quealy, A.; Cole, G.L.

    1989-01-01

    A Message-Passing Kernel (MPK) for the Hypercluster parallel-processing test bed is described. The Hypercluster is being developed at the NASA Lewis Research Center to support investigations of parallel algorithms and architectures for computational fluid and structural mechanics applications. The Hypercluster resembles the hypercube architecture except that each node consists of multiple processors communicating through shared memory. The MPK efficiently routes information through the Hypercluster, using a message-passing protocol when necessary and faster shared-memory communication whenever possible. The MPK also interfaces all of the processors with the Hypercluster operating system (HYCLOPS), which runs on a Front-End Processor (FEP). This approach distributes many of the I/O tasks to the Hypercluster processors and eliminates the need for a separate I/O support program on the FEP.

  14. Reducing neural network training time with parallel processing

    NASA Technical Reports Server (NTRS)

    Rogers, James L., Jr.; Lamarsh, William J., II

    1995-01-01

    Obtaining optimal solutions for engineering design problems is often expensive because the process typically requires numerous iterations involving analysis and optimization programs. Previous research has shown that a near optimum solution can be obtained in less time by simulating a slow, expensive analysis with a fast, inexpensive neural network. A new approach has been developed to further reduce this time. This approach decomposes a large neural network into many smaller neural networks that can be trained in parallel. Guidelines are developed to avoid some of the pitfalls when training smaller neural networks in parallel. These guidelines allow the engineer: to determine the number of nodes on the hidden layer of the smaller neural networks; to choose the initial training weights; and to select a network configuration that will capture the interactions among the smaller neural networks. This paper presents results describing how these guidelines are developed.

  15. Probabilistic structural mechanics research for parallel processing computers

    NASA Technical Reports Server (NTRS)

    Sues, Robert H.; Chen, Heh-Chyun; Twisdale, Lawrence A.; Martin, William R.

    1991-01-01

    Aerospace structures and spacecraft are a complex assemblage of structural components that are subjected to a variety of complex, cyclic, and transient loading conditions. Significant modeling uncertainties are present in these structures, in addition to the inherent randomness of material properties and loads. To properly account for these uncertainties in evaluating and assessing the reliability of these components and structures, probabilistic structural mechanics (PSM) procedures must be used. Much research has focused on basic theory development and the development of approximate analytic solution methods in random vibrations and structural reliability. Practical application of PSM methods was hampered by their computationally intense nature. Solution of PSM problems requires repeated analyses of structures that are often large, and exhibit nonlinear and/or dynamic response behavior. These methods are all inherently parallel and ideally suited to implementation on parallel processing computers. New hardware architectures and innovative control software and solution methodologies are needed to make solution of large scale PSM problems practical.

  16. Dual-thread parallel control strategy for ophthalmic adaptive optics.

    PubMed

    Yu, Yongxin; Zhang, Yuhua

    To improve ophthalmic adaptive optics speed and compensate for ocular wavefront aberration of high temporal frequency, the adaptive optics wavefront correction has been implemented with a control scheme comprising two parallel threads: one is dedicated to wavefront detection and the other conducts wavefront reconstruction and compensation. With a custom Shack-Hartmann wavefront sensor that measures the ocular wave aberration with 193 subapertures across the pupil, the adaptive optics system has achieved a closed-loop updating frequency of up to 110 Hz and demonstrated robust compensation for ocular wave aberration up to 50 Hz in an adaptive optics scanning laser ophthalmoscope.
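
    The two-thread split can be sketched with a producer/consumer queue; sensing and mirror control are stubbed out below, so this is only an illustration of the control structure, not the authors' code:

      import queue, threading

      frames = queue.Queue(maxsize=1)     # latest wavefront-sensor measurement

      def detection(n_frames):
          """Thread 1: wavefront detection (synthetic slopes stand in here)."""
          for i in range(n_frames):
              frames.put({"frame": i, "slopes": [0.0] * 193})
          frames.put(None)                # sentinel: end of sequence

      def correction():
          """Thread 2: wavefront reconstruction and compensation."""
          while (m := frames.get()) is not None:
              print("compensated frame", m["frame"])   # drive mirror here

      t1 = threading.Thread(target=detection, args=(5,))
      t2 = threading.Thread(target=correction)
      t1.start(); t2.start(); t1.join(); t2.join()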

  17. Stellar Evolution and Social Evolution: A Study in Parallel Processes

    NASA Astrophysics Data System (ADS)

    Carneiro, Robert L.

    From the beginning of anthropology, social evolution has been one of its major interests. However, in recent years the study of this process has languished. Accordingly, those anthropologists who still consider social evolution to be of central importance to their discipline, and who continue to pursue it, find their endeavor bolstered when parallel instances of evolutionary reconstructions can be demonstrated in other fields. Stellar evolution has long been a prime interest of astronomers, and their progress in deciphering its course has been truly remarkable. In examining astronomers' reconstructions of stellar evolution, I have been struck by a number of similarities between the ways stars and societies have evolved. The parallels actually begin with the method used by both disciplines, namely, the comparative method. In astronomy, the method involves plotting stars on a Hertzsprung-Russell Diagram, and interpreting, diachronically, the pattern made by essentially synchronic data used for plotting. The comparative method is particularly appropriate when one is studying a process that cannot be observed over its full range in the life of any single individual, be it a star or a society. Parallels also occur in that stars and societies have each followed distinctive stages in their evolution. These stages are, in both cases, sometimes unilinear and sometimes multilinear. Moreover, the distinction drawn by anthropologists between a pristine and a secondary state (which depends on whether the state so represented is the first such occurrence in an area, or was a later development derivative from earlier states) finds its astronomical parallel in the relationship existing between Population II and Population I stars. These and other similarities between stellar and social evolution will be cited and discussed.

  18. Intracellular signalling proteins as 'smart' agents in parallel distributed processes.

    PubMed

    Fisher, M J; Paton, R C; Matsuno, K

    1999-06-01

    In eucaryotic organisms, responses to external signals are mediated by a repertoire of intracellular signalling pathways that ultimately bring about the activation/inactivation of protein kinases and/or protein phosphatases. Until relatively recently, little thought had been given to the intracellular distribution of the components of these signalling pathways. However, experimental evidence from a diverse range of organisms indicates that rather than being freely distributed, many of the protein components of signalling cascades show a significant degree of spatial organisation. Here, we briefly review the roles of 'anchor', 'scaffold', and 'adaptor' proteins in the organisation and functioning of intracellular signalling pathways. We then consider some of the parallel distributed processing capacities of these adaptive systems. We focus on signalling proteins--both as individual 'devices' (agents) and as 'networks' (ecologies) of parallel processes. Signalling proteins are described as 'smart thermodynamic machines' which satisfy 'gluing' (functorial) roles in the information economy of the cell. This combines two information-processing views of signalling proteins. Individually, they show 'cognitive' capacities and collectively they integrate (cohere) cellular processes. We exploit these views by drawing comparisons between signalling proteins and verbs. This text/dialogical metaphor also helps refine our view of signalling proteins as context-sensitive information-processing agents.

  19. A Multi-Core Parallelization Strategy for Statistical Significance Testing in Learning Classifier Systems.

    PubMed

    Rudd, James; Moore, Jason H; Urbanowicz, Ryan J

    2013-11-01

    Permutation-based statistics for evaluating the significance of class prediction, predictive attributes, and patterns of association have only appeared within the learning classifier system (LCS) literature since 2012. While still not widely utilized by the LCS research community, formal evaluations of test statistic confidence are imperative to large and complex real world applications such as genetic epidemiology where it is standard practice to quantify the likelihood that a seemingly meaningful statistic could have been obtained purely by chance. LCS algorithms are relatively computationally expensive on their own. The compounding requirements for generating permutation-based statistics may be a limiting factor for some researchers interested in applying LCS algorithms to real world problems. Technology has made LCS parallelization strategies more accessible and thus more popular in recent years. In the present study we examine the benefits of externally parallelizing a series of independent LCS runs such that permutation testing with cross validation becomes more feasible to complete on a single multi-core workstation. We test our python implementation of this strategy in the context of a simulated complex genetic epidemiological data mining problem. Our evaluations indicate that as long as the number of concurrent processes does not exceed the number of CPU cores, the speedup achieved is approximately linear.
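
    Externally parallelizing independent runs is straightforward with a process pool. The sketch below applies the idea to a plain permutation test with a stand-in statistic; it is not the authors' Python implementation:

      from multiprocessing import Pool
      import random

      random.seed(42)                      # same data in every worker process
      data_x = [random.gauss(0.6, 0.2) for _ in range(50)]
      data_y = [random.gauss(0.5, 0.2) for _ in range(50)]

      def statistic(xs, ys):               # stand-in for an LCS-derived score
          return abs(sum(xs) / len(xs) - sum(ys) / len(ys))

      def one_permutation(seed):
          rng = random.Random(seed)
          pooled = data_x + data_y
          rng.shuffle(pooled)
          return statistic(pooled[:len(data_x)], pooled[len(data_x):])

      if __name__ == "__main__":
          obs = statistic(data_x, data_y)
          with Pool() as pool:             # independent runs, external parallelism
              null = pool.map(one_permutation, range(1000))
          p = (1 + sum(s >= obs for s in null)) / (1 + len(null))
          print("permutation p-value:", p)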

  20. Extraction of Hydrological Proximity Measures from DEMs using Parallel Processing

    SciTech Connect

    Tesfa, Teklu K.; Tarboton, David G.; Watson, Daniel W.; Schreuders, Kimberly A.; Baker, Matthew M.; Wallace, Robert M.

    2011-12-01

    Land surface topography is one of the most important terrain properties which impact hydrological, geomorphological, and ecological processes active on a landscape. In our previous efforts to develop a soil depth model based upon topographic and land cover variables, we extracted a set of hydrological proximity measures (HPMs) from a Digital Elevation Model (DEM) as potential explanatory variables for soil depth. These HPMs may also have other, more general modeling applicability in hydrology, geomorphology and ecology, and so are described here from a general perspective. The HPMs we derived are variations of the distance up to ridge points (cells with no incoming flow) and variations of the distance down to stream points (cells with a contributing area greater than a threshold), following the flow path. These HPMs were computed using the D-infinity flow model that apportions flow between adjacent neighbors based on the direction of steepest downward slope on the eight triangular facets constructed in a 3 x 3 grid cell window using the center cell and each pair of adjacent neighboring grid cells in turn. The D-infinity model typically results in multiple flow paths between two points on the topography, with the result that distances may be computed as the minimum, maximum or average of the individual flow paths. In addition, each of the HPMs is calculated vertically, horizontally, and along the land surface. Previously, these HPMs were calculated using recursive serial algorithms which suffered from stack overflow problems when used to process large datasets, limiting the size of DEMs that could be analyzed using that method to approximately 7000 x 7000 cells. To overcome this limitation, we developed a message passing interface (MPI) parallel approach for calculating these HPMs. The parallel algorithms of the HPMs spatially partition the input grid into stripes which are each assigned to separate processes for computation. Each of those processes then uses a
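
    The stripe partitioning described above can be sketched with mpi4py as follows; compute_hpm() is a hypothetical placeholder for the per-stripe flow-path calculation, and the halo exchange and convergence loop of the real algorithm are omitted.

        # Hedged sketch of an MPI stripe decomposition for a large DEM grid.
        # Each rank owns a contiguous block of rows; compute_hpm() stands in
        # for the actual HPM kernel and is not the authors' code.
        from mpi4py import MPI
        import numpy as np

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        nrows, ncols = 7000, 7000            # full DEM size (illustrative)
        base = nrows // size
        r0 = rank * base
        r1 = nrows if rank == size - 1 else r0 + base

        stripe = np.zeros((r1 - r0, ncols))  # this rank's rows of the DEM

        def compute_hpm(block):
            return block                     # placeholder per-stripe kernel

        partial = compute_hpm(stripe)
        gathered = comm.gather(partial.shape, root=0)
        if rank == 0:
            print("stripe shapes:", gathered)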

  1. A simple hyperbolic model for communication in parallel processing environments

    NASA Technical Reports Server (NTRS)

    Stoica, Ion; Sultan, Florin; Keyes, David

    1994-01-01

    We introduce a model for communication costs in parallel processing environments called the 'hyperbolic model,' which generalizes two-parameter dedicated-link models in an analytically simple way. Dedicated interprocessor links parameterized by a latency and a transfer rate that are independent of load are assumed by many existing communication models; such models are unrealistic for workstation networks. The communication system is modeled as a directed communication graph in which terminal nodes represent the application processes that initiate the sending and receiving of the information and in which internal nodes, called communication blocks (CBs), reflect the layered structure of the underlying communication architecture. The direction of graph edges specifies the flow of the information carried through messages. Each CB is characterized by a two-parameter hyperbolic function of the message size that represents the service time needed for processing the message. The parameters are evaluated in the limits of very large and very small messages. Rules are given for reducing a communication graph consisting of many CBs to an equivalent two-parameter form, while maintaining an approximation for the service time that is exact in both the large- and small-message limits. The model is validated on a dedicated Ethernet network of workstations by experiments with communication subprograms arising in scientific applications, for which a tight fit of the model predictions with actual measurements of the communication and synchronization time between end processes is demonstrated. The model is then used to evaluate the performance of two simple parallel scientific applications from partial differential equations: domain decomposition and time-parallel multigrid. In an appropriate limit, we also show the compatibility of the hyperbolic model with the recently proposed LogP model.
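
    One two-parameter curve with the limiting behaviour described above (pure latency for very small messages, a fixed transfer rate for very large ones) is the hyperbola sketched below; the paper's exact parameterization may differ, so this is only an illustration of the idea.

        # Hedged sketch: a hyperbolic service-time curve t(m) with
        # t(m) -> latency as m -> 0 and t(m) -> m/rate as m -> infinity.
        # This is one plausible form, not necessarily the paper's.
        import math

        def service_time(m, latency, rate):
            """Service time (s) for an m-byte message through one CB."""
            return math.sqrt(latency ** 2 + (m / rate) ** 2)

        print(service_time(1, 1e-4, 1e6))      # small message: ~1e-4 s (latency)
        print(service_time(10**8, 1e-4, 1e6))  # large message: ~100 s (m/rate)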

  2. Regional-scale calculation of the LS factor using parallel processing

    NASA Astrophysics Data System (ADS)

    Liu, Kai; Tang, Guoan; Jiang, Ling; Zhu, A.-Xing; Yang, Jianyi; Song, Xiaodong

    2015-05-01

    With the increase of data resolution and the increasing application of USLE over large areas, the existing serial implementation of algorithms for computing the LS factor is becoming a bottleneck. In this paper, a parallel processing model based on message passing interface (MPI) is presented for the calculation of the LS factor, so that massive datasets at a regional scale can be processed efficiently. The parallel model contains algorithms for calculating flow direction, flow accumulation, drainage network, slope, slope length and the LS factor. According to the existence of data dependence, the algorithms are divided into local algorithms and global algorithms. Parallel strategies are designed according to the characteristics of the algorithms, including a decomposition method that maintains the integrity of the results, an optimized workflow that reduces the time spent exporting unnecessary intermediate data, and a buffer-communication-computation strategy that improves communication efficiency. Experiments on a multi-node system show that the proposed parallel model allows efficient calculation of the LS factor at a regional scale with a massive dataset.
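
    The buffer-communication-computation strategy mentioned above amounts to overlapping boundary exchange with interior work; a minimal mpi4py sketch of that overlap, using non-blocking sends and receives of stripe boundary rows, is given below (illustrative only, not the paper's code).

        # Hedged sketch of overlapping communication with computation: boundary
        # rows travel while the stripe interior is processed. Illustrative only.
        from mpi4py import MPI
        import numpy as np

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        stripe = np.random.rand(1000, 1000)
        halo_up = np.empty(1000)
        halo_down = np.empty(1000)

        reqs = []
        if rank > 0:                               # neighbour stripe above
            reqs.append(comm.Isend(stripe[0], dest=rank - 1))
            reqs.append(comm.Irecv(halo_up, source=rank - 1))
        if rank < size - 1:                        # neighbour stripe below
            reqs.append(comm.Isend(stripe[-1], dest=rank + 1))
            reqs.append(comm.Irecv(halo_down, source=rank + 1))

        interior = stripe[1:-1].sum()              # computed during transfer
        MPI.Request.Waitall(reqs)                  # halos now usable
        # ...boundary rows would now be processed using halo_up/halo_down...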

  3. Studies of random number generators for parallel processing

    SciTech Connect

    Bowman, K.O.; Robinson, M.T.

    1986-09-01

    If Monte Carlo calculations are to be performed in a parallel processing environment, a method of generating appropriate sequences of pseudorandom numbers for each process must be available. Frederickson et al. proposed an elegant algorithm based on the concept of pseudorandom or Lehmer trees: the sequence of numbers from a linear congruential generator is divided into disjoint subsequences by the members of an auxiliary sequence. One subsequence can be assigned to each process. Extensive tests show the algorithm to suffer from correlations between the parallel subsequences: this is a result of the small number of bits which participate in the auxiliary sequence and illustrates the well-known discovery of Marsaglia. Two alternative algorithms are proposed, both of which appear to be free of interprocess correlations. One relaxes the conditions on the Lehmer tree by using an arbitrary auxiliary multiplier: it is not known to what extent the subsequences are disjoint. The other partitions the main sequence into disjoint subsequences by sending one member to each process in turn, minimizing interprocess communication by defining new sequence generating parameters. 10 refs., 4 figs.
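
    The second alternative, dealing one member of the main sequence to each process in turn, is the familiar leapfrog technique; the sketch below shows how the new generating parameters follow from the original multiplier and increment (an illustration of the idea, not the report's code).

        # Hedged sketch of leapfrog partitioning of an LCG
        # x_{n+1} = (a*x_n + c) mod m across P processes: each rank takes
        # every P-th member, which is itself an LCG with multiplier a^P mod m
        # and increment c*(1 + a + ... + a^(P-1)) mod m.
        def leapfrog_lcg(a, c, m, seed, nprocs, rank):
            A = pow(a, nprocs, m)                                 # new multiplier
            C = c * sum(pow(a, k, m) for k in range(nprocs)) % m  # new increment
            x = seed
            for _ in range(rank):                 # advance to this rank's start
                x = (a * x + c) % m
            while True:
                yield x
                x = (A * x + C) % m

        gen = leapfrog_lcg(a=1664525, c=1013904223, m=2**32,
                           seed=42, nprocs=4, rank=1)
        print([next(gen) for _ in range(3)])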

  4. Parallel processing at the SSC: The fact and the fiction

    SciTech Connect

    Bourianoff, G.; Cole, B.

    1991-10-01

    Accurately modelling the behavior of particles circulating in accelerators is a computationally demanding task. The particle tracking code currently in use at the SSC is based upon a "thin element" analysis (TEAPOT). In this model each magnet in the lattice is described by a thin element at which the particle experiences an impulsive kick. Each kick requires approximately 200 floating point operations ("FLOP"). For the SSC collider lattice consisting of 10{sup 4} elements, performing a tracking study for a set of 100 particles for 10{sup 7} turns would require 2 {times} 10{sup 15} FLOPs. Even on a machine capable of 100 MFLOP/sec (MFLOPS), this would require 2 {times} 10{sup 7} seconds, and many such runs are necessary. It should be noted that the accuracy with which the kicks are to be calculated is important: the large number of iterations involved will magnify the effects of small errors. The inability of current computational resources to effectively perform the full calculation motivates the migration of this calculation to the most powerful computers available. A survey of current research into new technologies for supercomputing reveals that the supercomputers of the future will be parallel in nature. Further, numerous such machines exist today, and are being used to solve other difficult problems. Thus it seems clear that it is not too early to begin developing tracking codes for parallel architectures. This report discusses implementing parallel processing at the SSC.
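
    The operation count quoted above follows directly from the stated parameters, as the back-of-the-envelope check below shows.

        # Back-of-the-envelope check of the figures quoted above.
        elements  = 10**4     # lattice elements (one kick each per turn)
        flop_kick = 200       # floating point operations per kick
        particles = 100
        turns     = 10**7

        total_flop = elements * flop_kick * particles * turns
        seconds = total_flop / 100e6          # at 100 MFLOP/s
        print(total_flop)                     # 2e15 FLOP
        print(seconds / 86400)                # ~231 days per run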

  6. A massively parallel solution strategy for efficient thermal radiation simulation

    NASA Astrophysics Data System (ADS)

    Nguyen, P. D.; Moureau, V.; Vervisch, L.; Perret, N.

    2012-06-01

    A novel and efficient methodology to solve the Radiative Transfer Equation (RTE) in thermal radiation is discussed. The BiCGStab(2) iterative solution method, designed for non-symmetric linear equation systems, is used to solve the discretized RTE. Numerical upwind and central schemes are blended to provide a stable numerical scheme (MUCS) for interpolation of the cell facial radiation intensities in the finite volume formulation. The combination of the BiCGStab(2) and MUCS methods proved to be very efficient when coupled with the discrete ordinates method (DOM) to solve the RTE. A cost-effective tabulation technique for the gaseous radiative property model SNB-FSCK using a 7-point Gauss-Lobatto quadrature scheme is also introduced. The whole methodology is implemented in a massively parallel unstructured CFD code where the radiative and fluid flow solutions share the same domain decomposition, which is the bottleneck in current radiative solvers. A dual mesh decomposition at the cell-group and processor levels is adopted to optimize the CFD code for massively parallel computing. The method is applied to simulate radiation heat transfer in a 3D rectangular enclosure containing non-isothermal CO2 and H2O mixtures. Two test cases are studied, for homogeneous and inhomogeneous distributions of CO2 and H2O in the enclosure. Results are reported for the heat flux and the radiation energy source term, and comparisons are made between the present methodology (BiCGStab(2)/MUCS/tabulated SNB-FSCK), the benchmark method SNB-CK (implemented with a 25 cm-1 narrow band), and other methods available in the literature. The present method yields more accurate predictions, particularly for the radiation source term. When compared with the benchmark solution, the relative error of the radiation source term is reduced to less than 4% and the CPU time is drastically diminished.

  7. Parallel approach to incorporating face image information into dialogue processing

    NASA Astrophysics Data System (ADS)

    Ren, Fuji

    2000-10-01

    There are many kinds of so-called irregular expressions in natural dialogues. Even when the words of a conversation are identical, different meanings can be conveyed by the speaker's feelings or facial expression. For a good understanding of dialogues, a flexible dialogue processing system must infer the speaker's view properly. However, it is difficult to obtain the meaning of the speaker's sentences in various scenes using traditional methods. In this paper, a new approach for dialogue processing that incorporates information from the speaker's face is presented. We first divide conversation statements into several simple tasks. Second, we process each simple task using an independent processor. Third, we employ the speaker's facial information to estimate the speaker's view and so resolve ambiguities in the dialogue. The approach presented in this paper can work efficiently because the independent processors run in parallel, writing partial results to a shared memory, incorporating partial results at appropriate points, and complementing each other. A parallel algorithm and a method for employing the face information in dialogue machine translation will be discussed, and some results will be included in this paper.

  8. Parallel deterioration to language processing in a bilingual speaker.

    PubMed

    Druks, Judit; Weekes, Brendan Stuart

    2013-01-01

    The convergence hypothesis [Green, D. W. (2003). The neural basis of the lexicon and the grammar in L2 acquisition: The convergence hypothesis. In R. van Hout, A. Hulk, F. Kuiken, & R. Towell (Eds.), The interface between syntax and the lexicon in second language acquisition (pp. 197-218). Amsterdam: John Benjamins] assumes that the neural substrates of language representations are shared between the languages of a bilingual speaker. One prediction of this hypothesis is that neurodegenerative disease should produce parallel deterioration to lexical and grammatical processing in bilingual aphasia. We tested this prediction with a late bilingual Hungarian (first language, L1)-English (second language, L2) speaker J.B. who had nonfluent progressive aphasia (NFPA). J.B. had acquired L2 in adolescence but was premorbidly proficient and used English as his dominant language throughout adult life. Our investigations showed comparable deterioration to lexical and grammatical knowledge in both languages during a one-year period. Parallel deterioration to language processing in a bilingual speaker with NFPA challenges the assumption that L1 and L2 rely on different brain mechanisms as assumed in some theories of bilingual language processing [Ullman, M. T. (2001). The neural basis of lexicon and grammar in first and second language: The declarative/procedural model. Bilingualism: Language and Cognition, 4(1), 105-122].

  9. Parallel-Processing Software for Creating Mosaic Images

    NASA Technical Reports Server (NTRS)

    Klimeck, Gerhard; Deen, Robert; McCauley, Michael; DeJong, Eric

    2008-01-01

    A computer program implements parallel processing for nearly real-time creation of panoramic mosaics of images of terrain acquired by video cameras on an exploratory robotic vehicle (e.g., a Mars rover). Because the original images are typically acquired at various camera positions and orientations, it is necessary to warp the images into the reference frame of the mosaic before stitching them together to create the mosaic. [Also see "Parallel-Processing Software for Correlating Stereo Images," Software Supplement to NASA Tech Briefs, Vol. 31, No. 9 (September 2007) page 26.] The warping algorithm in this computer program reflects the considerations that (1) for every pixel in the desired final mosaic, a good corresponding point must be found in one or more of the original images and (2) for this purpose, one needs a good mathematical model of the cameras and a good correlation of individual pixels with respect to their positions in three dimensions. The desired mosaic is divided into slices, each of which is assigned to one of a number of central processing units (CPUs) operating simultaneously. The results from the CPUs are gathered and placed into the final mosaic. The time taken to create the mosaic depends upon the number of CPUs, the speed of each CPU, and whether a local or a remote data-staging mechanism is used.

  10. Bin-Hash Indexing: A Parallel Method for Fast Query Processing

    SciTech Connect

    Bethel, Edward W; Gosink, Luke J.; Wu, Kesheng; Bethel, Edward Wes; Owens, John D.; Joy, Kenneth I.

    2008-06-27

    This paper presents a new parallel indexing data structure for answering queries. The index, called Bin-Hash, offers extremely high levels of concurrency, and is therefore well-suited for emerging commodity parallel processors, such as multi-cores, cell processors, and general purpose graphics processing units (GPUs). The Bin-Hash approach first bins the base data, and then partitions and separately stores the values in each bin as a perfect spatial hash table. To answer a query, we first determine whether or not a record satisfies the query conditions based on the bin boundaries. For the bins with records that cannot be resolved, we examine the spatial hash tables. The procedures for examining the bin numbers and the spatial hash tables offer the maximum possible level of concurrency; all records can be evaluated by our procedure independently in parallel. Additionally, our Bin-Hash procedures access much smaller amounts of data than similar parallel methods, such as the projection index. This smaller data footprint is critical for certain parallel processors, like GPUs, where memory resources are limited. To demonstrate the effectiveness of Bin-Hash, we implement it on a GPU using the data-parallel programming language CUDA. The concurrency offered by the Bin-Hash index allows us to fully utilize the GPU's massive parallelism in our work; over 12,000 records can be simultaneously evaluated at any one time. We show that our new query processing method is an order of magnitude faster than current state-of-the-art CPU-based indexing technologies. Additionally, we compare our performance to existing GPU-based projection index strategies.
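
    The two-stage filtering at the heart of Bin-Hash can be sketched in a few lines of NumPy: precomputed bin numbers resolve most records, and only records in the two boundary bins need their stored values examined (in the real index, via the perfect spatial hash tables). This is a conceptual illustration, not the paper's CUDA implementation.

        # Hedged sketch of the Bin-Hash query idea: bins resolve most records;
        # only boundary-bin records need a value lookup. Conceptual only.
        import numpy as np

        data = np.random.rand(10**6)
        edges = np.linspace(0.0, 1.0, 257)          # 256 equal-width bins
        bins = np.digitize(data, edges) - 1         # precomputed per record

        lo, hi = 0.3, 0.6                           # query: lo <= value < hi
        lo_bin, hi_bin = np.digitize([lo, hi], edges) - 1

        hits = (bins > lo_bin) & (bins < hi_bin)    # resolved by bins alone
        boundary = (bins == lo_bin) | (bins == hi_bin)
        hits |= boundary & (data >= lo) & (data < hi)  # "hash-table" stage
        print(hits.sum())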

  11. Sentence comprehension: A parallel distributed processing approach. Technical report

    SciTech Connect

    McClelland, J.L.; St John, M.; Taraban, R.

    1989-07-14

    Basic aspects of conventional approaches to sentence comprehension are reviewed, and some of the difficulties faced by models that take these approaches are pointed out. An alternative approach is described, based on the principles of parallel distributed processing, and it is shown how this approach offers different answers to basic questions about the nature of the language processing mechanism. An illustrative simulation model captures the key characteristics of the approach and illustrates how it can cope with the difficulties faced by conventional models. Alternative ways of conceptualizing basic aspects of language processing within this framework are considered, several arguments that might be brought to bear against the approach are addressed, and avenues for future development are suggested.

  12. Parallel processes and the problem of internal time

    NASA Astrophysics Data System (ADS)

    Gernert, Dieter

    2001-06-01

    It is the purpose of this paper to study the problem of internal (subjective) time by a novel approach. Until now, this topic has mainly been discussed under the aspects of philosophy, psychology, and neurobiology. By way of contrast, we consider processes within an (abstract) information-processing system that fulfills certain minimum requirements. In this context, a special operator algebra is formulated, which has similarities to those occurring in quantum theory. A specific problem related to parallel (concurrent) processes leads to a concept of time windows (time slices) within the context of the operator algebra; this theoretical finding is compatible with known results. Finally, some hints for possible experiments are given, and it will be briefly discussed how this concept is related to traditional ones.

  13. Parallel Latent Semantic Analysis using a Graphics Processing Unit

    SciTech Connect

    Cui, Xiaohui; Potok, Thomas E; Cavanagh, Joseph M

    2009-01-01

    Latent Semantic Analysis (LSA) can be used to reduce the dimensions of large Term-Document datasets using Singular Value Decomposition. However, with the ever-expanding size of data sets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. The Graphics Processing Unit (GPU) can solve some highly parallel problems much faster than the traditional sequential processor (CPU). Thus, a deployable system using a GPU to speed up large-scale LSA processes would be a much more effective choice (in terms of cost/performance ratio) than using a computer cluster. In this paper, we present a parallel LSA implementation on the GPU, using NVIDIA Compute Unified Device Architecture (CUDA) and Compute Unified Basic Linear Algebra Subprograms (CUBLAS). The performance of this implementation is compared to a traditional LSA implementation on the CPU using an optimized Basic Linear Algebra Subprograms library. For large matrices that have dimensions divisible by 16, the GPU algorithm ran five to six times faster than the CPU version.
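
    In outline, the computation being accelerated is a truncated SVD of the term-document matrix; a minimal CuPy sketch of that reduction is shown below as a stand-in for the paper's CUDA/CUBLAS implementation, which it does not reproduce.

        # Hedged sketch of GPU LSA: rank-k reduction of a term-document matrix
        # via SVD on the GPU, using CuPy in place of hand-written CUDA/CUBLAS.
        import cupy as cp

        def lsa_reduce(term_doc, k):
            u, s, vt = cp.linalg.svd(term_doc, full_matrices=False)
            terms = u[:, :k] * s[:k]        # latent term coordinates
            docs = vt[:k, :].T * s[:k]      # latent document coordinates
            return terms, docs

        a = cp.random.rand(1024, 1024, dtype=cp.float32)
        terms, docs = lsa_reduce(a, k=100)
        print(terms.shape, docs.shape)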

  14. Hitch-hiking: a parallel heuristic search strategy, applied to the phylogeny problem.

    PubMed

    Charleston, M A

    2001-01-01

    The article introduces a parallel heuristic search strategy ("Hitch-hiking") which can be used in conjunction with other random-walk heuristic search strategies. It is applied to an artificial phylogeny problem, in which character sequences are evolved using pseudo-random numbers from a hypothetical ancestral sequence. The objective function to be minimized is the minimum number of character-state changes required on a binary tree that could account for the sequences observed at the tips (leaves) of the tree: the Maximum Parsimony criterion. The Hitch-hiking strategy is shown to be useful in that it is robust and that on average the solutions found using the strategy are better than those found without. Also the strategy can dynamically provide information on the characteristics of the landscape of the problem. I argue that Hitch-hiking as a scheme for parallelization of existing heuristic search strategies is of potentially very general use, in many areas of combinatorial optimization.

  15. Hippocampal-prefrontal dynamics in spatial working memory: interactions and independent parallel processing.

    PubMed

    Churchwell, John C; Kesner, Raymond P

    2011-12-01

    Memory processes may be independent, compete, operate in parallel, or interact. In accordance with this view, behavioral studies suggest that the hippocampus (HPC) and prefrontal cortex (PFC) may act as an integrated circuit during performance of tasks that require working memory over longer delays, whereas during short delays the HPC and PFC may operate in parallel or have completely dissociable functions. In the present investigation we tested rats in a spatial delayed non-match to sample working memory task using short and long time delays to evaluate the hypothesis that intermediate CA1 region of the HPC (iCA1) and medial PFC (mPFC) interact and operate in parallel under different temporal working memory constraints. In order to assess the functional role of these structures, we used an inactivation strategy in which each subject received bilateral chronic cannula implantation of the iCA1 and mPFC, allowing us to perform bilateral, contralateral, ipsilateral, and combined bilateral inactivation of structures and structure pairs within each subject. This novel approach allowed us to test for circuit-level systems interactions, as well as independent parallel processing, while we simultaneously parametrically manipulated the temporal dimension of the task. The current results suggest that, at longer delays, iCA1 and mPFC interact to coordinate retrospective and prospective memory processes in anticipation of obtaining a remote goal, whereas at short delays either structure may independently represent spatial information sufficient to successfully complete the task.

  16. An evaluation of parallelization strategies for low-frequency electromagnetic induction simulators using staggered grid discretizations

    NASA Astrophysics Data System (ADS)

    Weiss, C. J.; Schultz, A.

    2011-12-01

    The high computational cost of the forward solution for modeling low-frequency electromagnetic induction phenomena is one of the primary impediments against broad-scale adoption by the geoscience community of exploration techniques, such as magnetotellurics and geomagnetic depth sounding, that rely on fast and cheap forward solutions to make tractable the inverse problem. As geophysical observables, electromagnetic fields are direct indicators of Earth's electrical conductivity - a physical property independent of (but in some cases correlative with) seismic wavespeed. Electrical conductivity is known to be a function of Earth's physicochemical state and temperature, and to be especially sensitive to the presence of fluids, melts and volatiles. Hence, electromagnetic methods offer a critical and independent constraint on our understanding of Earth's interior processes. Existing methods for parallelization of time-harmonic electromagnetic simulators, as applied to geophysics, have relied heavily on a combination of strategies: coarse-grained decompositions of the model domain; and/or, a high-order functional decomposition across spectral components, which in turn can be domain-decomposed themselves. Hence, in terms of scaling, both approaches are ultimately limited by the growing communication cost as the granularity of the forward problem increases. In this presentation we examine alternate parallelization strategies based on OpenMP shared-memory parallelization and CUDA-based GPU parallelization. As a test case, we use two different numerical simulation packages, each based on a staggered Cartesian grid: FDM3D (Weiss, 2006) which solves the curl-curl equation directly in terms of the scattered electric field (available under the LGPL at www.openem.org); and APHID, the A-Phi Decomposition based on mixed vector and scalar potentials, in which the curl-curl operator is replaced operationally by the vector Laplacian. We describe progress made in modifying the code to

  17. A General-purpose Framework for Parallel Processing of Large-scale LiDAR Data

    NASA Astrophysics Data System (ADS)

    Li, Z.; Hodgson, M.; Li, W.

    2016-12-01

    Light detection and ranging (LiDAR) technologies have proven efficient for quickly obtaining very detailed Earth surface data over large spatial extents. Such data are important for scientific discovery in the Earth and ecological sciences and for natural disaster and environmental applications. However, handling LiDAR data poses grand geoprocessing challenges due to data intensity and computational intensity. Previous studies achieved notable success in parallel processing of LiDAR data to address these challenges. However, these studies either relied on high performance computers and specialized hardware (GPUs) or focused mostly on finding customized solutions for specific algorithms. We developed a general-purpose scalable framework coupled with a sophisticated data decomposition and parallelization strategy to efficiently handle big LiDAR data. Specifically, 1) a tile-based spatial index is proposed to manage big LiDAR data in the scalable and fault-tolerant Hadoop distributed file system, 2) two spatial decomposition techniques are developed to enable efficient parallelization of different types of LiDAR processing tasks, and 3) by coupling existing LiDAR processing tools with Hadoop, this framework is able to conduct a variety of LiDAR data processing tasks in parallel in a highly scalable distributed computing environment. The performance and scalability of the framework are evaluated with a series of experiments conducted on a real LiDAR dataset using a proof-of-concept prototype system. The results show that the proposed framework 1) is able to handle massive LiDAR data more efficiently than standalone tools; and 2) provides almost linear scalability in terms of either increased workload (data volume) or increased computing nodes with both spatial decomposition strategies. We believe that the proposed framework provides valuable references for developing a collaborative cyberinfrastructure for processing big earth science data in a highly scalable environment.

  18. Time dependent processing in a parallel pipeline architecture.

    PubMed

    Biddiscombe, John; Geveci, Berk; Martin, Ken; Moreland, Kenneth; Thompson, David

    2007-01-01

    Pipeline architectures provide a versatile and efficient mechanism for constructing visualizations, and they have been implemented in numerous libraries and applications over the past two decades. In addition to allowing developers and users to freely combine algorithms, visualization pipelines have proven to work well when streaming data and scale well on parallel distributed-memory computers. However, current pipeline visualization frameworks have a critical flaw: they are unable to manage time varying data. As data flows through the pipeline, each algorithm has access to only a single snapshot in time of the data. This prevents the implementation of algorithms that do any temporal processing such as particle tracing; plotting over time; or interpolation, fitting, or smoothing of time series data. As data acquisition technology improves, as simulation time-integration techniques become more complex, and as simulations save less frequently and regularly, the ability to analyze the time-behavior of data becomes more important. This paper describes a modification to the traditional pipeline architecture that allows it to accommodate temporal algorithms. Furthermore, the architecture allows temporal algorithms to be used in conjunction with algorithms expecting a single time snapshot, thus simplifying software design and allowing adoption into existing pipeline frameworks. Our architecture also continues to work well in parallel distributed-memory environments. We demonstrate our architecture by modifying the popular VTK framework and exposing the functionality to the ParaView application. We use this framework to apply time-dependent algorithms on large data with a parallel cluster computer and thereby exercise a functionality that previously did not exist.

  19. Parallel Processing of Large Scale Microphone Arrays for Sound Capture

    NASA Astrophysics Data System (ADS)

    Jan, Ea-Ee.

    1995-01-01

    Performance of microphone sound pickup is degraded by deleterious properties of the acoustic environment, such as multipath distortion (reverberation) and ambient noise. The degradation becomes more prominent in a teleconferencing environment in which the microphone is positioned far away from the speaker. Moreover, the ideal teleconference should feel as easy and natural as face-to-face communication with another person. This suggests hands-free sound capture with no tether or encumbrance by hand-held or body-worn sound equipment. Microphone arrays for this application represent an appropriate approach. This research develops new microphone array and signal processing techniques for high quality hands-free sound capture in noisy, reverberant enclosures. The new techniques combine matched-filtering of individual sensors and parallel processing to provide acute spatial volume selectivity which is capable of mitigating the deleterious effects of noise interference and multipath distortion. The new method outperforms traditional delay-and-sum beamformers which provide only directional spatial selectivity. The research additionally explores truncated matched-filtering and random distribution of transducers to reduce complexity and improve sound capture quality. All designs are first established by computer simulation of array performance in reverberant enclosures. The simulation is achieved by a room model which can efficiently calculate the acoustic multipath in a rectangular enclosure up to a prescribed order of images. It also calculates the incident angle of the arriving signal. Experimental arrays were constructed and their performance was measured in real rooms. Real room data were collected in a hard-walled laboratory and a controllable variable acoustics enclosure of similar size, approximately 6 x 6 x 3 m. An extensive speech database was also collected in these two enclosures for future research on microphone arrays. The simulation results are shown to be

  20. MASSIVELY PARALLEL LATENT SEMANTIC ANALYSES USING A GRAPHICS PROCESSING UNIT

    SciTech Connect

    Cavanagh, J.; Cui, S.

    2009-01-01

    Latent Semantic Analysis (LSA) aims to reduce the dimensions of large term-document datasets using Singular Value Decomposition. However, with the ever-expanding size of datasets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. A graphics processing unit (GPU) can solve some highly parallel problems much faster than a traditional sequential processor or central processing unit (CPU). Thus, a deployable system using a GPU to speed up large-scale LSA processes would be a much more effective choice (in terms of cost/performance ratio) than using a PC cluster. Due to the GPU's application-specific architecture, harnessing the GPU's computational prowess for LSA is a great challenge. We presented a parallel LSA implementation on the GPU, using NVIDIA® Compute Unified Device Architecture and Compute Unified Basic Linear Algebra Subprograms software. The performance of this implementation is compared to traditional LSA implementation on a CPU using an optimized Basic Linear Algebra Subprograms library. After implementation, we discovered that the GPU version of the algorithm was twice as fast for large matrices (1000x1000 and above) that had dimensions not divisible by 16. For large matrices that did have dimensions divisible by 16, the GPU algorithm ran five to six times faster than the CPU version. The large variation is due to architectural benefits of the GPU for matrices divisible by 16. It should be noted that the overall speeds for the CPU version did not vary from relative normal when the matrix dimensions were divisible by 16. Further research is needed in order to produce a fully implementable version of LSA. With that in mind, the research we presented shows that the GPU is a viable option for increasing the speed of LSA, in terms of cost/performance ratio.

  1. Massively Parallel Processing for Fast and Accurate Stamping Simulations

    NASA Astrophysics Data System (ADS)

    Gress, Jeffrey J.; Xu, Siguang; Joshi, Ramesh; Wang, Chuan-tao; Paul, Sabu

    2005-08-01

    The competitive automotive market drives automotive manufacturers to speed up vehicle development cycles and reduce lead time. Fast tooling development is one of the key areas supporting fast and short vehicle development programs (VDPs). In the past ten years, stamping simulation has become the most effective validation tool for predicting and resolving potential formability and quality problems before the dies are physically made. Stamping simulation and formability analysis have become a critical business segment in GM's math-based die engineering process. As simulation becomes one of the major production tools in the engineering factory, simulation speed and accuracy are two of the most important measures of stamping simulation technology. The speed and time-in-system of forming analysis become even more critical to supporting fast VDPs and tooling readiness. Since 1997, the General Motors Die Center has been working jointly with our software vendor to develop and implement a parallel version of simulation software for mass production analysis applications. By 2001, this technology had matured in the form of distributed memory processing (DMP) of draw die simulations in a networked distributed memory computing environment. In 2004, this technology was refined to massively parallel processing (MPP) and extended to line die forming analysis (draw, trim, flange, and associated spring-back) running on a dedicated computing environment. The evolution of this technology and the insight gained through the implementation of DMP/MPP technology, as well as performance benchmarks, are discussed in this publication.

  2. Parallel information processing channels created in the retina

    PubMed Central

    Schiller, Peter H.

    2010-01-01

    In the retina, several parallel channels originate that extract different attributes from the visual scene. This review describes how these channels arise and what their functions are. Following the introduction four sections deal with these channels. The first discusses the “ON” and “OFF” channels that have arisen for the purpose of rapidly processing images in the visual scene that become visible by virtue of either light increment or light decrement; the ON channel processes images that become visible by virtue of light increment and the OFF channel processes images that become visible by virtue of light decrement. The second section examines the midget and parasol channels. The midget channel processes fine detail, wavelength information, and stereoscopic depth cues; the parasol channel plays a central role in processing motion and flicker as well as motion parallax cues for depth perception. Both these channels have ON and OFF subdivisions. The third section describes the accessory optic system that receives input from the retinal ganglion cells of Dogiel; these cells play a central role, in concert with the vestibular system, in stabilizing images on the retina to prevent the blurring of images that would otherwise occur when an organism is in motion. The last section provides a brief overview of several additional channels that originate in the retina. PMID:20876118

  3. Parallel distributed processing: Implications for cognition and development. Technical report

    SciTech Connect

    McClelland, J.L.

    1988-07-11

    This paper provides a brief overview of the connectionist or parallel distributed processing framework for modeling cognitive processes, and considers the application of the connectionist framework to problems of cognitive development. Several aspects of cognitive development might result from the process of learning as it occurs in multi-layer networks. This learning process has the characteristic that it reduces the discrepancy between expected and observed events. As it does this, representations develop on hidden units which dramatically change both the way in which the network represents the environment from which it learns and the expectations that the network generates about environmental events. The learning process exhibits relatively abrupt transitions corresponding to stage shifts in cognitive development. These points are illustrated using a network that learns to anticipate which side of a balance beam will go down, based on the number of weights on each side of the fulcrum and their distance from the fulcrum on each side of the beam. The network is trained in an environment in which weight more frequently governs which side will go down. It recapitulates the states of development seen in children, as well as the stage transitions, as it learns to represent weight and distance information.

  4. The power and efficiency of advanced software and parallel processing

    NASA Technical Reports Server (NTRS)

    Singh, Ramen P.; Taylor, Lawrence W., Jr.

    1989-01-01

    Real-time simulation of flexible and articulating systems is difficult because of the computational burden of the time-varying calculations. Because the mobile servicing system of the NASA Space Station Freedom will handle heavy payloads by local arm manipulations and by translating along the spine of the Station, it is crucial to have real-time simulation available. To enable such a simulation to be of high fidelity and to be hosted on a modest computer, special care must be taken in formulating the structural dynamics. Frontal solution algorithms save considerable time in performing these calculations. In addition, the formulation must be compatible with parallel processing to take full advantage of both. An approach is offered which will result in high-fidelity, real-time simulation for flexible, articulating systems such as the Space Station remote servicing system.

  5. Efficient biased random bit generation for parallel processing

    SciTech Connect

    Slone, Dale M.

    1994-09-28

    A lattice gas automaton was implemented on a massively parallel machine (the BBN TC2000) and a vector supercomputer (the CRAY C90). The automaton models the Burgers equation ρ_t + ρρ_x = νρ_xx in one dimension. The lattice gas evolves by advecting and colliding pseudo-particles on a one-dimensional, periodic grid. The specific rules for colliding particles are stochastic in nature and require the generation of many billions of random numbers to create the random bits necessary for the lattice gas. The goal of the thesis was to speed up the process of generating the random bits and thereby lessen the computational bottleneck of the automaton.
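
    A common way to generate biased bits in bulk, and one that vectorizes naturally on machines like those named above, is to threshold uniform random words; the sketch below illustrates that general technique and is not the generator developed in the thesis.

        # Hedged sketch: n random bits, each 1 with probability p, produced by
        # thresholding uniform 32-bit words. Vectorized, so it maps well onto
        # vector or data-parallel hardware. Not the thesis's generator.
        import numpy as np

        def biased_bits(p, n, seed=0):
            rng = np.random.default_rng(seed)
            words = rng.integers(0, 2**32, size=n, dtype=np.uint64)
            return (words < int(p * 2**32)).astype(np.uint8)

        bits = biased_bits(0.25, 10**6)
        print(bits.mean())   # ~0.25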

  6. Parallel Processing of Adaptive Meshes with Load Balancing

    NASA Technical Reports Server (NTRS)

    Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2001-01-01

    Many scientific applications involve grids that lack a uniform underlying structure. These applications are often also dynamic in nature in that the grid structure significantly changes between successive phases of execution. In parallel computing environments, mesh adaptation of unstructured grids through selective refinement/coarsening has proven to be an effective approach. However, achieving load balance while minimizing interprocessor communication and redistribution costs is a difficult problem. Traditional dynamic load balancers are mostly inadequate because they lack a global view of system loads across processors. In this paper, we propose a novel and general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication topology, and compare its performance with a successful global load balancing environment, called PLUM, specifically created to handle adaptive unstructured applications. Our experimental results on an IBM SP2 demonstrate that the SBN-based load balancer achieves lower redistribution costs than those under PLUM by overlapping processing and data migration.

  7. Applying the Extended Parallel Process Model to workplace safety messages.

    PubMed

    Basil, Michael; Basil, Debra; Deshpande, Sameer; Lavack, Anne M

    2013-01-01

    The extended parallel process model (EPPM) proposes fear appeals are most effective when they combine threat and efficacy. Three studies conducted in the workplace safety context examine the use of various EPPM factors and their effects, especially multiplicative effects. Study 1 was a content analysis examining the use of EPPM factors in actual workplace safety messages. Study 2 experimentally tested these messages with 212 construction trainees. Study 3 replicated this experiment with 1,802 men across four English-speaking countries: Australia, Canada, the United Kingdom, and the United States. The results of these three studies (1) demonstrate the inconsistent use of EPPM components in real-world work safety communications, (2) support the necessity of self-efficacy for the effective use of threat, (3) show a multiplicative effect where communication effectiveness is maximized when all model components are present (severity, susceptibility, and efficacy), and (4) validate these findings with gory appeals across four English-speaking countries.

  8. Massively Parallel Latent Semantic Analyses using a Graphics Processing Unit

    SciTech Connect

    Cavanagh, Joseph M; Cui, Xiaohui

    2009-01-01

    Latent Semantic Analysis (LSA) aims to reduce the dimensions of large Term-Document datasets using Singular Value Decomposition. However, with the ever expanding size of data sets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. The Graphics Processing Unit (GPU) can solve some highly parallel problems much faster than the traditional sequential processor (CPU). Thus, a deployable system using a GPU to speed up large-scale LSA processes would be a much more effective choice (in terms of cost/performance ratio) than using a computer cluster. Due to the GPU's application-specific architecture, harnessing the GPU's computational prowess for LSA is a great challenge. We present a parallel LSA implementation on the GPU, using NVIDIA Compute Unified Device Architecture and Compute Unified Basic Linear Algebra Subprograms. The performance of this implementation is compared to traditional LSA implementation on CPU using an optimized Basic Linear Algebra Subprograms library. After implementation, we discovered that the GPU version of the algorithm was twice as fast for large matrices (1000x1000 and above) that had dimensions not divisible by 16. For large matrices that did have dimensions divisible by 16, the GPU algorithm ran five to six times faster than the CPU version. The large variation is due to architectural benefits the GPU has for matrices divisible by 16. It should be noted that the overall speeds for the CPU version did not vary from relative normal when the matrix dimensions were divisible by 16. Further research is needed in order to produce a fully implementable version of LSA. With that in mind, the research we presented shows that the GPU is a viable option for increasing the speed of LSA, in terms of cost/performance ratio.

  9. a Robust Parallel Framework for Massive Spatial Data Processing on High Performance Clusters

    NASA Astrophysics Data System (ADS)

    Guan, X.

    2012-07-01

    Massive spatial data requires considerable computing power for real-time processing. With the development of multicore technology and the reduction of computer component costs in recent years, high performance clusters have become the only economically viable solution for this requirement. Massive spatial data processing, however, demands heavy I/O operations and should be characterized as a data-intensive application. Data-intensive application parallelization strategies are incompatible with currently available processing frameworks, which are basically designed for traditional compute-intensive applications. In this paper we introduce a Split-and-Merge paradigm for spatial data processing and also propose a robust parallel framework in a cluster environment to support this paradigm. The Split-and-Merge paradigm efficiently exploits data parallelism for massive data processing. The proposed framework is based on the open-source TORQUE project and hosted on a multicore-enabled Linux cluster. One common LiDAR point cloud algorithm, Delaunay triangulation, was implemented on the proposed framework to evaluate its efficiency and scalability. Experimental results demonstrate that the system provides efficient performance speedup.
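
    The Split-and-Merge paradigm described above can be summarized in a few lines: split the input spatially, process the tiles independently, merge the partial results. In the sketch below, process_tile() is a hypothetical placeholder (a per-tile Delaunay triangulation in the paper's experiment) and the TORQUE-based scheduling is replaced by a local process pool.

        # Hedged sketch of Split-and-Merge on a point cloud: spatial split,
        # independent per-tile work, then a merge. Placeholders throughout.
        from multiprocessing import Pool
        import numpy as np

        def split(pts, n_tiles):
            edges = np.linspace(0.0, 1.0, n_tiles + 1)   # strips by x coord
            return [pts[(pts[:, 0] >= a) & (pts[:, 0] < b)]
                    for a, b in zip(edges[:-1], edges[1:])]

        def process_tile(tile):
            return len(tile)                 # stand-in for triangulating a tile

        def merge(results):
            return sum(results)              # stand-in for stitching results

        if __name__ == "__main__":
            points = np.random.rand(10**6, 2)
            with Pool() as pool:
                print(merge(pool.map(process_tile, split(points, 16))))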

  10. Mobile Devices and GPU Parallelism in Ionospheric Data Processing

    NASA Astrophysics Data System (ADS)

    Mascharka, D.; Pankratius, V.

    2015-12-01

    Scientific data acquisition in the field is often constrained by data transfer backchannels to analysis environments. Geoscientists are therefore facing practical bottlenecks with increasing sensor density and variety. Mobile devices, such as smartphones and tablets, offer promising solutions to key problems in scientific data acquisition, pre-processing, and validation by providing advanced capabilities in the field. This is due to affordable network connectivity options and the increasing mobile computational power. This contribution exemplifies a scenario faced by scientists in the field and presents the "Mahali TEC Processing App" developed in the context of the NSF-funded Mahali project. Aimed at atmospheric science and the study of ionospheric Total Electron Content (TEC), this app is able to gather data from various dual-frequency GPS receivers. It demonstrates parsing of full-day RINEX files on mobile devices and on-the-fly computation of vertical TEC values based on satellite ephemeris models that are obtained from NASA. Our experiments show how parallel computing on the mobile device GPU enables fast processing and visualization of up to 2 million datapoints in real-time using OpenGL. GPS receiver bias is estimated through minimum TEC approximations that can be interactively adjusted by scientists in the graphical user interface. Scientists can also perform approximate computations for "quickviews" to reduce CPU processing time and memory consumption. In the final stage of our mobile processing pipeline, scientists can upload data to the cloud for further processing. Acknowledgements: The Mahali project (http://mahali.mit.edu) is funded by the NSF INSPIRE grant no. AGS-1343967 (PI: V. Pankratius). We would like to acknowledge our collaborators at Boston College, Virginia Tech, Johns Hopkins University, Colorado State University, as well as the support of UNAVCO for loans of dual-frequency GPS receivers for use in this project, and Intel for loans of
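
    The core of the on-the-fly TEC computation described above is a slant-to-vertical mapping; a standard single-layer (thin-shell) version of that conversion is sketched below. The Mahali app's exact mapping function, ephemeris handling, and bias estimation are not reproduced here.

        # Hedged sketch of the standard thin-shell slant-to-vertical TEC
        # conversion; constants and bias handling are illustrative only.
        import math

        R_E = 6371.0      # Earth radius, km
        H_ION = 350.0     # assumed ionospheric shell height, km

        def vertical_tec(slant_tec, elevation_deg, receiver_bias=0.0):
            e = math.radians(elevation_deg)
            sin_zp = R_E * math.cos(e) / (R_E + H_ION)  # zenith angle at pierce point
            return (slant_tec - receiver_bias) * math.sqrt(1.0 - sin_zp ** 2)

        print(vertical_tec(40.0, 30.0))   # ~23 TECU from a 40 TECU slant value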

  11. Digital signal processor and programming system for parallel signal processing

    SciTech Connect

    Van den Bout, D.E.

    1987-01-01

    This thesis describes an integrated assault upon the problem of designing high-throughput, low-cost digital signal-processing systems. The dual prongs of this assault consist of: (1) the design of a digital signal processor (DSP) which efficiently executes signal-processing algorithms in either a uniprocessor or multiprocessor configuration, (2) the PaLS programming system which accepts an arbitrary algorithm, partitions it across a group of DSPs, synthesizes an optimal communication link topology for the DSPs, and schedules the partitioned algorithm upon the DSPs. The results of applying a new quasi-dynamic analysis technique to a set of high-level signal-processing algorithms were used to determine the uniprocessor features of the DSP design. For multiprocessing applications, the DSP contains an interprocessor communications port (IPC) which supports simple, flexible, dataflow communications while allowing the total communication bandwidth to be incrementally allocated to achieve the best link utilization. The net result is a DSP with a simple architecture that is easy to program for both uniprocessor and multi-processor modes of operation. The PaLS programming system simplifies the task of parallelizing an algorithm for execution upon a multiprocessor built with the DSP.

  12. Processing modes and parallel processors in producing familiar keying sequences.

    PubMed

    Verwey, Willem B

    2003-05-01

    Recent theorizing indicates that the acquisition of movement sequence skill involves the development of several independent sequence representations at the same time. To examine this for the discrete sequence production task, participants in Experiment 1 produced a highly practiced sequence of six key presses in two conditions that allowed little preparation so that interkey intervals were slowed. Analyses of the distributions of moderately slowed interkey intervals indicated that this slowing was caused by the occasional use of two slower processing modes that probably rely on independent sequence representations, and by reduced parallel processing in the fastest processing mode. Experiment 2 addressed the role of intention in the fast production of familiar keying sequences. It showed that the participants, who were not aware they were executing familiar sequences in a somewhat different task, had no benefits of prior practice. This suggests that the mechanisms underlying sequencing skills are not automatically activated by mere execution of familiar sequences, and that some form of top-down, intentional control remains necessary.

  13. Human pattern recognition: parallel processing and perceptual learning.

    PubMed

    Fahle, M

    1994-01-01

    A new theory of visual object recognition by Poggio et al that is based on multidimensional interpolation between stored templates requires fast, stimulus-specific learning in the visual cortex. Indeed, performance in a number of perceptual tasks improves as a result of practice. We distinguish between two phases of learning a vernier-acuity task, a fast one that takes place within less than 20 min and a slow phase that continues over 10 h of training and probably beyond. The improvement is specific for relatively 'simple' features, such as the orientation of the stimulus presented during training, for the position in the visual field, and for the eye through which learning occurred. Some of these results are simulated by means of a computer model that relies on object recognition by multidimensional interpolation between stored templates. Orientation specificity of learning is also found in a jump-displacement task. In a manner parallel to the improvement in performance, cortical potentials evoked by the jump displacement tend to decrease in latency and to increase in amplitude as a result of training. The distribution of potentials over the brain changes significantly as a result of repeated exposure to the same stimulus. The results both of psychophysical and of electrophysiological experiments indicate that some form of perceptual learning might occur very early during cortical information processing. The hypothesis that vernier breaks are detected 'early' during pattern recognition is supported by the fact that reaction times for the detection of verniers depend hardly at all on the number of stimuli presented simultaneously. Hence, vernier breaks can be detected in parallel at different locations in the visual field, indicating that deviation from straightness is an elementary feature for visual pattern recognition in humans that is detected at an early stage of pattern recognition. Several results obtained during the last few years are reviewed, some new

  14. A parallel strategy for implementing real-time expert systems using CLIPS

    NASA Technical Reports Server (NTRS)

    Ilyes, Laszlo A.; Villaseca, F. Eugenio; Delaat, John

    1994-01-01

    As evidenced by current literature, there appears to be a continued interest in the study of real-time expert systems. It is generally recognized that speed of execution is only one consideration when designing an effective real-time expert system. Some other features one must consider are the expert system's ability to perform temporal reasoning, handle interrupts, prioritize data, contend with data uncertainty, and perform context focusing as dictated by the incoming data to the expert system. This paper presents a strategy for implementing a real-time expert system on the iPSC/860 hypercube parallel computer using CLIPS. The strategy takes into consideration not only the execution time of the software, but also those features which define a true real-time expert system. The methodology is then demonstrated using a practical implementation of an expert system which performs diagnostics on the Space Shuttle Main Engine (SSME). This particular implementation uses an eight-node hypercube to process ten sensor measurements in order to simultaneously diagnose five different failure modes within the SSME. The main program is written in ANSI C and embeds CLIPS to better facilitate and debug the rule-based expert system.

  15. Programming Probabilistic Structural Analysis for Parallel Processing Computer

    NASA Technical Reports Server (NTRS)

    Sues, Robert H.; Chen, Heh-Chyun; Twisdale, Lawrence A.; Chamis, Christos C.; Murthy, Pappu L. N.

    1991-01-01

    The ultimate goal of this research program is to make Probabilistic Structural Analysis (PSA) computationally efficient and hence practical for the design environment by achieving large scale parallelism. The paper identifies the multiple levels of parallelism in PSA, identifies methodologies for exploiting this parallelism, describes the development of a parallel stochastic finite element code, and presents results of two example applications. It is demonstrated that speeds within five percent of those theoretically possible can be achieved. A special-purpose numerical technique, the stochastic preconditioned conjugate gradient method, is also presented and demonstrated to be extremely efficient for certain classes of PSA problems.

  16. Graphics processing unit parallel accelerated solution of the discrete ordinates for photon transport in biological tissues.

    PubMed

    Peng, Kuan; Gao, Xinbo; Qu, Xiaochao; Ren, Nunu; Chen, Xueli; He, Xiaowei; Wang, Xiaorei; Liang, Jimin; Tian, Jie

    2011-07-20

    As a widely used numerical solution for the radiation transport equation (RTE), the discrete ordinates can predict the propagation of photons through biological tissues more accurately relative to the diffusion equation. The discrete ordinates reduce the RTE to a series of differential equations that can be solved by source iteration (SI). However, the tremendous time consumption of SI, which is partly caused by the expensive computation of each SI step, limits its applications. In this paper, we present a graphics processing unit (GPU) parallel accelerated SI method for discrete ordinates. Utilizing the calculation independence on the levels of the discrete ordinate equation and spatial element, the proposed method reduces the time cost of each SI step by parallel calculation. The photon reflection at the boundary was calculated based on the results of the last SI step to ensure the calculation independence on the level of the discrete ordinate equation. An element sweeping strategy was proposed to detect the calculation independence on the level of the spatial element. A GPU parallel frame called the compute unified device architecture was employed to carry out the parallel computation. The simulation experiments, which were carried out with a cylindrical phantom and a numerical mouse, indicated that the time cost of each SI step can be reduced by up to a factor of 228 by the proposed method with a GTX 260 graphics card. © 2011 Optical Society of America

  17. Resolving Multiscale Processes in Tropical Cyclogenesis Using Parallel EEMD

    NASA Astrophysics Data System (ADS)

    Wu, Y.; Shen, B. W.; Cheung, S.; Li, J. L. F.; Liu, Z.

    2014-12-01

    The recent advance in high-resolution global models has suggested that improved multiscale simulations of tropical waves may help extend the lead time of tropical cyclone (TC) formation prediction (e.g., Shen et al., 2010ab, 2012, 2013a). In previous efforts in the multiscale analysis of tropical waves, the Ensemble Empirical Mode Decomposition (EEMD) has been successfully parallelized and used to detect atmospheric wave signals on different spatial scales (e.g., Shen et al., 2013b), including Mixed Rossby Gravity (MRG) waves, the Western Wind Belt (WWB), African Easterly Waves (AEWs), etc. We now extend the related studies to examine the evolution of the large-scale waves and their association with the formation of tropical cyclones in the Atlantic for an extensive time period spanning multiple years. Our goal is to analyze the multiscale interaction in the initiation and early intensification stage of an AEW and its subsequent impact on TC genesis, which involves mainly the large-scale downscaling processes. Specific focus is on the impact of barotropic instability and the critical level (CL, or steering level) that may appear in association with the AEW. The presence of the CL is believed to play an important role in providing a favorable environment in the early TC-genesis stage in the marsupial paradigm scenario. Preliminary analysis of the satellite data obtained from the newly launched Global Precipitation Measurement (GPM) mission linked to the TC genesis processes will be included.

  18. Introduction to Computers: Parallel Alternative Strategies for Students. Course No. 0200000.

    ERIC Educational Resources Information Center

    Chauvenne, Sherry; And Others

    Parallel Alternative Strategies for Students (PASS) is a content-centered package of alternative methods and materials designed to assist secondary teachers to meet the needs of mainstreamed learning-disabled and emotionally-handicapped students of various achievement levels in the basic education content courses. This supplementary text and…

  19. pMHC Multiplexing Strategy to Detect High Numbers of T Cell Responses in Parallel.

    PubMed

    Philips, Daisy; van den Braber, Marlous; Schumacher, Ton N; Kvistborg, Pia

    2017-01-01

    The development of peptide-loaded major histocompatibility complexes (pMHC) conjugated to fluorochromes by Davis and colleagues 20 years ago provided a highly useful tool to identify and characterize antigen-specific T cells. In this chapter we describe a multiplexing strategy that allows detection of high numbers of T cell responses in parallel.

  20. Introduction to Computers: Parallel Alternative Strategies for Students. Course No. 0200000.

    ERIC Educational Resources Information Center

    Chauvenne, Sherry; And Others

    Parallel Alternative Strategies for Students (PASS) is a content-centered package of alternative methods and materials designed to assist secondary teachers to meet the needs of mainstreamed learning-disabled and emotionally-handicapped students of various achievement levels in the basic education content courses. This supplementary text and…

  1. English III. Teacher's Guide [and Student Workbook]. Revised. Parallel Alternative Strategies for Students (PASS).

    ERIC Educational Resources Information Center

    Atkinson, Missy; Fresen, Sue; Goldstein, Jeren; Harrell, Stephanie; MacEnulty, Patricia; McLain, Janice

    This teacher's guide and student workbook are part of a series of content-centered supplementary curriculum packages of alternative methods and activities designed to help secondary students who have disabilities and those with diverse learning needs succeed in regular education content courses. The content of Parallel Alternative Strategies for…

  2. Configuration Management Process Assessment Strategy

    NASA Technical Reports Server (NTRS)

    Henry, Thad

    2014-01-01

    Purpose: To propose a strategy for assessing the development and effectiveness of configuration management (CM) systems within programs, projects, and design activities performed by technical organizations and their supporting development contractors. Scope: The CM systems of various entities will be assessed, depending on project scope (DDT&E), support services, and acquisition agreements. Approach: A structured model is applied against the assessed organization's CM requirements, including best-practice maturity criteria, and is tailored to the entity being assessed depending on its CM system. The assessment approach provides objective feedback to engineering and project management on the observed maturity state of the CM system versus the ideal state of the configuration management processes and outcomes. The approach: • identifies strengths and risks rather than audit "gotchas" (findings/observations); • is used recursively and iteratively throughout the program life cycle at selected points of need (typical assessment timing is post-PDR/post-CDR); • reviews ideal-state criteria and maturity targets with the assessed entity prior to an assessment (tailoring), dependent on the phase of the CM system being assessed; • supports exit success criteria for Preliminary and Critical Design Reviews; • and gives a comprehensive CM system assessment, which ultimately supports configuration verification activities.

  3. Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E

    2014-11-18

    Methods, apparatuses, and computer program products for endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface ('PAMI') of a parallel computer are provided. Embodiments include establishing by a parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry. Embodiments also include registering in each endpoint in the geometry a dispatch callback function for a collective operation and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
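
    PAMI itself is IBM-specific, but the pattern the patent describes (start a collective without blocking, overlap it with other work, then complete it) can be sketched with a standard MPI-3 non-blocking collective:

        /* Non-blocking collective sketch using standard MPI-3;
           an analogue of the PAMI pattern, not PAMI code. */
        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            int rank, local, global;
            MPI_Request req;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            local = rank + 1;

            MPI_Iallreduce(&local, &global, 1, MPI_INT, MPI_SUM,
                           MPI_COMM_WORLD, &req);
            /* ... independent computation can proceed here ... */
            MPI_Wait(&req, MPI_STATUS_IGNORE);   /* complete the collective */

            if (rank == 0) printf("sum over ranks = %d\n", global);
            MPI_Finalize();
            return 0;
        }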

  4. Evaluating In-Clique and Topological Parallelism Strategies for Junction Tree-Based Bayesian Inference Algorithm on the Cray XMT

    SciTech Connect

    Chin, George; Choudhury, Sutanay; Kangas, Lars J.; McFarlane, Sally A.; Marquez, Andres

    2011-09-01

    Long viewed as a strong statistical inference technique, Bayesian networks have emerged to be an important class of applications for high-performance computing. We have applied an architecture-conscious approach to parallelizing the Lauritzen-Spiegelhalter Junction Tree algorithm for exact inferencing in Bayesian networks. In optimizing the Junction Tree algorithm, we have implemented both in-clique and topological parallelism strategies to best leverage the fine-grained synchronization and massive-scale multithreading of the Cray XMT architecture. Two topological techniques were developed to parallelize the evidence propagation process through the Bayesian network. One technique involves performing intelligent scheduling of junction tree nodes based on their topology and relative size. The second technique involves decomposing the junction tree into a much finer tree-like representation to offer many more opportunities for parallelism. We evaluate these optimizations on five different Bayesian networks and report our findings and observations. Another important contribution of this paper is to demonstrate the application of massive-scale multithreading for load balancing and the use of implicit parallelism-based compiler optimizations in designing scalable inferencing algorithms.

  5. Parallel processing using an optical delay-based reservoir computer

    NASA Astrophysics Data System (ADS)

    Van der Sande, Guy; Nguimdo, Romain Modeste; Verschaffelt, Guy

    2016-04-01

    Delay systems subject to delayed optical feedback have recently shown great potential in solving computationally hard tasks. By implementing a neuro-inspired computational scheme relying on the transient response to optical data injection, high processing speeds have been demonstrated. However, reservoir computing systems based on delay dynamics discussed in the literature are designed by coupling many different stand-alone components, which leads to bulky, non-monolithic systems that lack long-term stability. Here we numerically investigate the possibility of implementing reservoir computing schemes based on semiconductor ring lasers (SRLs), semiconductor lasers in which the laser cavity consists of a ring-shaped waveguide. SRLs are highly integrable and scalable, making them ideal candidates for key components in photonic integrated circuits. SRLs can generate light in two counterpropagating directions, between which bistability has been demonstrated. We demonstrate that two independent machine learning tasks, even with input data signals of a different nature, can be computed simultaneously using a single photonic nonlinear node, relying on the parallelism offered by photonics. We illustrate the performance on simultaneous chaotic time series prediction and nonlinear channel equalization. We take advantage of the different directional modes to process the individual tasks: each directional mode processes one task, mitigating possible crosstalk between the tasks. Our results indicate that prediction/classification errors comparable to state-of-the-art performance can be obtained even in the presence of noise, despite the two tasks being computed simultaneously. We also find that good performance is obtained for both tasks over a broad range of parameters. The results are discussed in detail in [Nguimdo et al., IEEE Trans. Neural Netw. Learn. Syst. 26, pp. 3301-3307, 2015].

  6. Parallel processing for efficient 3D slope stability modelling

    NASA Astrophysics Data System (ADS)

    Marchesini, Ivan; Mergili, Martin; Alvioli, Massimiliano; Metz, Markus; Schneider-Muntau, Barbara; Rossi, Mauro; Guzzetti, Fausto

    2014-05-01

    We test the performance of the GIS-based, three-dimensional slope stability model r.slope.stability. The model was developed as a C- and Python-based raster module of the GRASS GIS software. It considers the three-dimensional geometry of the sliding surface, adopting a modification of the model proposed by Hovland (1977), as revised and extended by Xie and co-workers (2006). Given a terrain elevation map and a set of relevant thematic layers, the model evaluates the stability of slopes for a large number of randomly selected potential slip surfaces, ellipsoidal or truncated in shape. Any single raster cell may be intersected by multiple sliding surfaces, each associated with a value of the factor of safety, FS. For each pixel, the minimum value of FS and the depth of the associated slip surface are stored. This information is used to obtain a spatial overview of the potentially unstable slopes in the study area. We test the model in the Collazzone area, Umbria, central Italy, an area known to be susceptible to landslides of different type and size. Availability of a comprehensive and detailed landslide inventory map allowed for a critical evaluation of the model results. The r.slope.stability code automatically splits the study area into a defined number of tiles, with proper overlap in order to provide the same statistical significance for the entire study area. The tiles are then processed in parallel by a given number of processors, exploiting a multi-purpose computing environment at CNR IRPI, Perugia. The map of the FS is obtained by collecting the individual results, taking the minimum values on the overlapping cells. This procedure significantly reduces the processing time. We show how the gain in terms of processing time depends on the tile dimensions and on the number of cores.
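
    The tile-and-merge procedure can be sketched serially as follows; the raster size, tile size, overlap, and the dummy fs() stand-in are illustrative (in the real module each tile is handled by a separate process):

        /* Split an N x N raster into overlapping tiles, evaluate a
           per-cell factor of safety in each tile, and keep the minimum
           on overlapping cells.  All sizes and fs() are illustrative. */
        #include <float.h>
        #include <stdio.h>

        #define N 16     /* raster edge length    */
        #define TILE 8   /* tile edge length      */
        #define OVL 2    /* overlap between tiles */

        static double fs(int r, int c) { return 1.0 + 0.01 * (r + c); }

        int main(void)
        {
            static double fsmin[N][N];
            for (int r = 0; r < N; r++)
                for (int c = 0; c < N; c++)
                    fsmin[r][c] = DBL_MAX;

            /* tiles advance by TILE-OVL so neighbours share OVL rows/columns */
            for (int r0 = 0; r0 < N; r0 += TILE - OVL)
                for (int c0 = 0; c0 < N; c0 += TILE - OVL)
                    for (int r = r0; r < r0 + TILE && r < N; r++)
                        for (int c = c0; c < c0 + TILE && c < N; c++) {
                            double v = fs(r, c);
                            if (v < fsmin[r][c]) fsmin[r][c] = v;  /* min-merge */
                        }

            printf("FS at (0,0) = %f\n", fsmin[0][0]);
            return 0;
        }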

  7. Seeing the forest for the trees: Networked workstations as a parallel processing computer

    NASA Technical Reports Server (NTRS)

    Breen, J. O.; Meleedy, D. M.

    1992-01-01

    Unlike traditional 'serial' processing computers, in which one central processing unit performs one instruction at a time, parallel processing computers contain several processing units, thereby performing several instructions at once. Many of today's fastest supercomputers achieve their speed by employing thousands of processing elements working in parallel. Few institutions can afford these state-of-the-art parallel processors, but many already have the makings of a modest parallel processing system. Workstations on existing high-speed networks can be harnessed as nodes in a parallel processing environment, bringing the benefits of parallel processing to many. While such a system cannot rival the industry's latest machines, many common tasks can be accelerated greatly by spreading the processing burden and exploiting idle network resources. We study several aspects of this approach, from algorithms to select nodes to speed gains in specific tasks. With ever-increasing volumes of astronomical data, it becomes all the more necessary to utilize our computing resources fully.

  8. Parallel processing methods for space based power systems

    NASA Technical Reports Server (NTRS)

    Berry, F. C.

    1993-01-01

    This report presents a method for doing load-flow analysis of a power system by using a decomposition approach. The power system for the Space Shuttle is used as a basis to build a model for the load-flow analysis. To test the decomposition method for doing load-flow analysis, simulations were performed on power systems of 16, 25, 34, 43, 52, 61, 70, and 79 nodes. Each of the power systems was divided into subsystems and simulated under steady-state conditions. The results from these tests have been found to be as accurate as tests performed using a standard serial simulator. The division of the power systems into different subsystems was done by assigning a processor to each area. There were 13 transputers available; therefore, up to 13 different subsystems could be simulated at the same time. This report has preliminary results for a load-flow analysis using a decomposition principle. The report shows that the decomposition algorithm for load-flow analysis is well suited for parallel processing and provides increases in the speed of execution.

  9. Parallel Computations in Insect and Mammalian Visual Motion Processing.

    PubMed

    Clark, Damon A; Demb, Jonathan B

    2016-10-24

    Sensory systems use receptors to extract information from the environment and neural circuits to perform subsequent computations. These computations may be described as algorithms composed of sequential mathematical operations. Comparing these operations across taxa reveals how different neural circuits have evolved to solve the same problem, even when using different mechanisms to implement the underlying math. In this review, we compare how insect and mammalian neural circuits have solved the problem of motion estimation, focusing on the fruit fly Drosophila and the mouse retina. Although the two systems implement computations with grossly different anatomy and molecular mechanisms, the underlying circuits transform light into motion signals with strikingly similar processing steps. These similarities run from photoreceptor gain control and spatiotemporal tuning to ON and OFF pathway structures, motion detection, and computed motion signals. The parallels between the two systems suggest that a limited set of algorithms for estimating motion satisfies both the needs of sighted creatures and the constraints imposed on them by metabolism, anatomy, and the structure and regularities of the visual world.
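
    One classic algorithmic model of the motion computation compared in such work is a delay-and-correlate (Hassenstein-Reichardt-style) detector; the toy C sketch below, with an illustrative two-receptor stimulus, shows the operation at its core:

        /* Delay-and-correlate motion detector on two neighbouring
           receptor signals; the moving-edge stimulus is a toy example. */
        #include <stdio.h>

        #define T 6

        int main(void)
        {
            /* the edge reaches the right receptor one step after the left */
            double left[T]  = {0, 1, 1, 1, 1, 1};
            double right[T] = {0, 0, 1, 1, 1, 1};

            for (int t = 1; t < T; t++) {
                /* one-step delay on each arm; the sign encodes direction */
                double m = left[t - 1] * right[t] - right[t - 1] * left[t];
                printf("t=%d  motion signal = %+.1f\n", t, m);
            }
            return 0;
        }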

  10. An integrated approach to improving the parallel applications development process

    SciTech Connect

    Rasmussen, Craig E; Watson, Gregory R; Tibbitts, Beth R

    2009-01-01

    The development of parallel applications is becoming increasingly important to a broad range of industries. Traditionally, parallel programming was a niche area that was primarily exploited by scientists trying to model extremely complicated physical phenomena. It is becoming increasingly clear, however, that continued hardware performance improvements through clock scaling and feature-size reduction are simply not going to be achievable for much longer. The hardware vendors' approach to addressing this issue is to employ parallelism through multi-processor and multi-core technologies. While there is little doubt that this approach produces scaling improvements, there are still many significant hurdles to be overcome before parallelism can be employed as a general replacement for more traditional programming techniques. The Parallel Tools Platform (PTP) Project was created in 2005 in an attempt to provide developers with new tools aimed at addressing some of the parallel development issues. Since then, the introduction of a new generation of peta-scale and multi-core systems has highlighted the need for such a platform. In this paper, we describe some of the challenges facing parallel application developers, present the current state of PTP, and provide a simple case study that demonstrates how PTP can be used to locate a potential deadlock situation in an MPI code.

  11. A visual parallel-BCI speller based on the time-frequency coding strategy

    NASA Astrophysics Data System (ADS)

    Xu, Minpeng; Chen, Long; Zhang, Lixin; Qi, Hongzhi; Ma, Lan; Tang, Jiabei; Wan, Baikun; Ming, Dong

    2014-04-01

    Objective. Spelling is one of the most important issues in brain-computer interface (BCI) research. The aim of this paper is to develop a visual parallel-BCI speller system based on the time-frequency coding strategy, in which the sub-speller switching among four simultaneously presented sub-spellers and the character selection are identified in a parallel mode. Approach. The parallel-BCI speller was constituted by four independent P300+SSVEP-B (P300 plus SSVEP blocking) spellers with different flicker frequencies, so that every character had a specific time-frequency code. To verify its effectiveness, 11 subjects were involved in offline and online spellings. A classification strategy was designed to recognize the target character by jointly using canonical correlation analysis and stepwise linear discriminant analysis. Main results. Online spellings showed that the proposed parallel-BCI speller had a high performance, reaching a highest information transfer rate of 67.4 bit/min, with averages of 54.0 bit/min and 43.0 bit/min in the three-round and five-round conditions, respectively. Significance. The results indicated that the proposed parallel BCI could be effectively controlled by users, with attention shifting fluently among the sub-spellers, and greatly improved BCI spelling performance.

  12. Introducing data parallelism into climate model post-processing through a parallel version of the NCAR Command Language (NCL)

    NASA Astrophysics Data System (ADS)

    Jacob, R. L.; Xu, X.; Krishna, J.; Tautges, T.

    2011-12-01

    The relationship between the needs of post-processing climate model output and the capability of the available tools has reached a crisis point. The large volume of data currently produced by climate models is overwhelming the current, decades-old analysis workflow. The tools used to implement that workflow are now a bottleneck in the climate science discovery process. This crisis will only worsen as ultra-high-resolution global climate models with horizontal scales of 4 km or smaller, running on leadership computing facilities, begin to produce tens to hundreds of terabytes for a single, hundred-year climate simulation. While climate models have used parallelism for several years, the post-processing tools are still mostly single-threaded applications. We have created a Parallel Climate Analysis Library (ParCAL) which implements many common climate analysis operations in a data-parallel fashion using the Message Passing Interface. ParCAL has in turn been built on sophisticated packages for describing grids in parallel (the Mesh-Oriented datABase, MOAB) and for performing vector operations on arbitrary grids (Intrepid). ParCAL also uses parallel I/O through the PnetCDF library. ParCAL has been used to implement a parallel version of the NCAR Command Language (NCL). ParNCL/ParCAL not only speeds up analysis of large datasets but also allows operations to be performed on native grids, eliminating the need to transform everything to latitude-longitude grids. In most cases, users' NCL scripts can run unaltered in parallel using ParNCL.
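
    The data-parallel pattern underlying such tools can be sketched with plain MPI: each rank reduces its own slab of grid cells and a collective combines the partial results (the cell values below are synthetic stand-ins, not ParCAL/ParNCL code):

        /* Each rank sums its own slab of grid cells; MPI_Reduce
           combines the partials into a global mean on rank 0. */
        #include <mpi.h>
        #include <stdio.h>

        #define CELLS_PER_RANK 1000

        int main(int argc, char **argv)
        {
            int rank, size;
            double partial = 0.0, total = 0.0;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            for (int i = 0; i < CELLS_PER_RANK; i++)
                partial += (double)rank;      /* stand-in for field data */

            MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0,
                       MPI_COMM_WORLD);
            if (rank == 0)
                printf("global mean = %f\n",
                       total / ((double)size * CELLS_PER_RANK));
            MPI_Finalize();
            return 0;
        }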

  13. Parallel implementation of RX anomaly detection on multi-core processors: impact of data partitioning strategies

    NASA Astrophysics Data System (ADS)

    Molero, Jose M.; Garzón, Ester M.; García, Inmaculada; Plaza, Antonio

    2011-11-01

    Anomaly detection is an important task for remotely sensed hyperspectral data exploitation. One of the most widely used and successful algorithms for anomaly detection in hyperspectral images is the Reed-Xiaoli (RX) algorithm. Despite its wide acceptance and its high computational complexity when applied to real hyperspectral scenes, few documented parallel implementations of this algorithm exist, in particular for multi-core processors. The advantage of multi-core platforms over other specialized parallel architectures is that they are a low-power, inexpensive, widely available and well-known technology. A critical issue in the parallel implementation of RX is the sample covariance matrix calculation, which can be approached in global or local fashion. This aspect is crucial for the RX implementation since the consideration of a local or global strategy for the computation of the sample covariance matrix is expected to affect both the scalability of the parallel solution and the anomaly detection results. In this paper, we develop new parallel implementations of the RX in multi-core processors and specifically investigate the impact of different data partitioning strategies when parallelizing its computations. For this purpose, we consider both global and local data partitioning strategies in the spatial domain of the scene, and further analyze their scalability in different multi-core platforms. The numerical effectiveness of the considered solutions is evaluated using receiver operating characteristics (ROC) curves, analyzing their capacity to detect thermal hot spots (anomalies) in hyperspectral data collected by NASA's Airborne Visible Infra-Red Imaging Spectrometer system over the World Trade Center in New York, five days after the terrorist attacks of September 11th, 2001.
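
    A minimal sketch of the global data-partitioning strategy on a multi-core CPU, written here with OpenMP: the spatial domain is split across threads, and per-thread partial sums for the band means are combined (a full RX detector would also accumulate the covariance and apply the Mahalanobis distance; all sizes are illustrative):

        /* Spatial partitioning across threads for the band-mean step
           of a global RX implementation; sizes are illustrative. */
        #include <omp.h>
        #include <stdio.h>

        #define PIX 10000
        #define BANDS 4

        int main(void)
        {
            static double img[PIX][BANDS];    /* hyperspectral cube (zeroed) */
            double mean[BANDS] = {0};

            #pragma omp parallel
            {
                double local[BANDS] = {0};
                #pragma omp for nowait        /* each thread gets a pixel block */
                for (int p = 0; p < PIX; p++)
                    for (int b = 0; b < BANDS; b++)
                        local[b] += img[p][b];
                #pragma omp critical          /* combine partial sums */
                for (int b = 0; b < BANDS; b++)
                    mean[b] += local[b];
            }
            for (int b = 0; b < BANDS; b++)
                mean[b] /= PIX;
            printf("mean of band 0 = %f\n", mean[0]);
            return 0;
        }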

  14. Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing

    NASA Technical Reports Server (NTRS)

    Fricker, David M.

    1997-01-01

    The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. Also, the tests explored the effects of scaling the simulation with the number of processors in addition to distributing a constant-size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with Ethernet and ATM networking plus a shared-memory SGI Power Challenge L workstation. The results indicate that the network capabilities significantly influence the parallel efficiency, i.e., a shared-memory machine is fastest and ATM networking provides acceptable performance. The limitations of Ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.

  15. The finite element machine: An experiment in parallel processing

    NASA Technical Reports Server (NTRS)

    Storaasli, O. O.; Peebles, S. W.; Crockett, T. W.; Knott, J. D.; Adams, L.

    1982-01-01

    The finite element machine is a prototype computer designed to support parallel solutions to structural analysis problems. The hardware architecture and support software for the machine, initial solution algorithms and test applications, and preliminary results are described.

  16. Parallelized CCHE2D flow model with CUDA Fortran on Graphics Processing Units

    USDA-ARS?s Scientific Manuscript database

    This paper presents the CCHE2D implicit flow model parallelized using CUDA Fortran programming technique on Graphics Processing Units (GPUs). A parallelized implicit Alternating Direction Implicit (ADI) solver using Parallel Cyclic Reduction (PCR) algorithm on GPU is developed and tested. This solve...
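
    For reference, parallel cyclic reduction for a tridiagonal system can be written serially as below; on a GPU each equation in a reduction step is updated by its own thread. This is a generic sketch of the algorithm, not CCHE2D code, and the system size is assumed to be a power of two:

        /* Parallel Cyclic Reduction (PCR), serial reference version.
           System: a[i]*x[i-s] + b[i]*x[i] + c[i]*x[i+s] = d[i],
           with the stride s doubling each step until all equations
           decouple.  N must be a power of two in this sketch. */
        #include <stdio.h>

        #define N 8

        int main(void)
        {
            double a[N], b[N], c[N], d[N], x[N];
            double a2[N], b2[N], c2[N], d2[N];

            for (int i = 0; i < N; i++) {      /* simple test system */
                a[i] = (i > 0) ? -1.0 : 0.0;
                c[i] = (i < N - 1) ? -1.0 : 0.0;
                b[i] = 4.0;
                d[i] = 1.0;
            }

            for (int s = 1; s < N; s *= 2) {   /* log2(N) reduction steps */
                for (int i = 0; i < N; i++) {  /* on a GPU: one thread per i */
                    double alpha = (i - s >= 0) ? -a[i] / b[i - s] : 0.0;
                    double gamma = (i + s < N) ? -c[i] / b[i + s] : 0.0;
                    a2[i] = (i - s >= 0) ? alpha * a[i - s] : 0.0;
                    c2[i] = (i + s < N) ? gamma * c[i + s] : 0.0;
                    b2[i] = b[i]
                          + ((i - s >= 0) ? alpha * c[i - s] : 0.0)
                          + ((i + s < N) ? gamma * a[i + s] : 0.0);
                    d2[i] = d[i]
                          + ((i - s >= 0) ? alpha * d[i - s] : 0.0)
                          + ((i + s < N) ? gamma * d[i + s] : 0.0);
                }
                for (int i = 0; i < N; i++) {
                    a[i] = a2[i]; b[i] = b2[i]; c[i] = c2[i]; d[i] = d2[i];
                }
            }
            for (int i = 0; i < N; i++)
                x[i] = d[i] / b[i];            /* equations are decoupled */
            printf("x[0] = %f\n", x[0]);
            return 0;
        }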

  17. Parallel versus sequential processing in print and braille reading.

    PubMed

    Veispak, Anneli; Boets, Bart; Ghesquière, Pol

    2012-01-01

    In the current study we investigated word, pseudoword and story reading in Dutch-speaking braille and print readers. To examine developmental patterns, these reading skills were assessed in both children and adults. The results reveal that braille readers read less accurately and less quickly than print readers. While item length has no impact on word reading accuracy and speed in the group of print readers, it has a significant impact on reading accuracy and speed in the group of braille readers, particularly in the younger sample. This suggests that braille readers rely more strongly on an enduring sequential reading strategy. Comparison of the different reading tasks suggests that the advantage in reading accuracy and speed of adult as compared to young braille readers is achieved through semantic top-down processing. Copyright © 2012 Elsevier Ltd. All rights reserved.

  18. 3D data denoising via Nonlocal Means filter by using parallel GPU strategies.

    PubMed

    Cuomo, Salvatore; De Michele, Pasquale; Piccialli, Francesco

    2014-01-01

    Nonlocal Means (NLM) algorithm is widely considered as a state-of-the-art denoising filter in many research fields. Its high computational complexity has led researchers to the development of parallel programming approaches and to the use of massively parallel architectures such as GPUs. In recent years, GPU devices have made it possible to achieve reasonable running times by filtering 3D datasets slice-by-slice with a 2D NLM algorithm. In our approach we design and implement a fully 3D Nonlocal Means parallel approach, adopting different algorithm mapping strategies on the GPU architecture and a multi-GPU framework, in order to demonstrate its high applicability and scalability. The experimental results we obtained encourage the usability of our approach in a large spectrum of applicative scenarios such as magnetic resonance imaging (MRI) or video sequence denoising.
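
    The weighting at the heart of NLM can be sketched on a 1-D signal as follows; the patch radius and filtering parameter are illustrative, and the 3-D GPU version applies the same idea to voxel cubes with one thread per output sample:

        /* 1-D Nonlocal Means sketch: each sample becomes a weighted
           average of all samples with similar surrounding patches,
           with Gaussian weights on the patch distance. */
        #include <math.h>
        #include <stdio.h>

        #define N 16
        #define P 2            /* patch radius         */
        #define H 0.5          /* filtering parameter  */

        static double patch_dist(const double *u, int i, int j)
        {
            double d = 0.0;
            for (int k = -P; k <= P; k++) {
                int a = i + k, b = j + k;
                double ua = (a >= 0 && a < N) ? u[a] : 0.0;
                double ub = (b >= 0 && b < N) ? u[b] : 0.0;
                d += (ua - ub) * (ua - ub);
            }
            return d;
        }

        int main(void)
        {
            double u[N], out[N];
            for (int i = 0; i < N; i++) u[i] = (i < N / 2) ? 0.0 : 1.0;
            u[5] += 0.3;                        /* a noise spike */

            for (int i = 0; i < N; i++) {       /* on a GPU: one thread per i */
                double num = 0.0, den = 0.0;
                for (int j = 0; j < N; j++) {
                    double w = exp(-patch_dist(u, i, j) / (H * H));
                    num += w * u[j];
                    den += w;
                }
                out[i] = num / den;
            }
            printf("u[5]=%.3f -> denoised %.3f\n", u[5], out[5]);
            return 0;
        }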

  19. 3D Data Denoising via Nonlocal Means Filter by Using Parallel GPU Strategies

    PubMed Central

    Cuomo, Salvatore; De Michele, Pasquale; Piccialli, Francesco

    2014-01-01

    Nonlocal Means (NLM) algorithm is widely considered as a state-of-the-art denoising filter in many research fields. Its high computational complexity has led researchers to the development of parallel programming approaches and to the use of massively parallel architectures such as GPUs. In recent years, GPU devices have made it possible to achieve reasonable running times by filtering 3D datasets slice-by-slice with a 2D NLM algorithm. In our approach we design and implement a fully 3D Nonlocal Means parallel approach, adopting different algorithm mapping strategies on the GPU architecture and a multi-GPU framework, in order to demonstrate its high applicability and scalability. The experimental results we obtained encourage the usability of our approach in a large spectrum of applicative scenarios such as magnetic resonance imaging (MRI) or video sequence denoising. PMID:25045397

  20. Parallel processing architecture for computing inverse differential kinematic equations of the PUMA arm

    NASA Technical Reports Server (NTRS)

    Hsia, T. C.; Lu, G. Z.; Han, W. H.

    1987-01-01

    In advanced robot control problems, on-line computation of the inverse Jacobian solution is frequently required. A parallel processing architecture is an effective way to reduce computation time. A parallel processing architecture is developed for the inverse Jacobian (inverse differential kinematic equation) of the PUMA arm. The proposed pipeline/parallel algorithm can be implemented on an IC chip using systolic linear arrays. This implementation requires 27 processing cells and 25 time units. Computation time is thus significantly reduced.

  1. Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E

    2014-11-11

    Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.

  2. Parallel design of JPEG-LS encoder on graphics processing units

    NASA Astrophysics Data System (ADS)

    Duan, Hao; Fang, Yong; Huang, Bormin

    2012-01-01

    With recent technical advances in graphics processing units (GPUs), GPUs have outperformed CPUs in terms of compute capability and memory bandwidth. Many successful GPU applications to high-performance computing have been reported. JPEG-LS is an ISO/IEC standard for lossless image compression which utilizes adaptive context modeling and run-length coding to improve the compression ratio. However, adaptive context modeling causes data dependency among adjacent pixels and the run-length coding has to be performed in a sequential way. Hence, using JPEG-LS to compress large-volume hyperspectral image data is quite time-consuming. We implement an efficient parallel JPEG-LS encoder for lossless hyperspectral compression on an NVIDIA GPU using the compute unified device architecture (CUDA) programming technology. We use the block parallel strategy, as well as such CUDA techniques as coalesced global memory access, parallel prefix sum, and asynchronous data transfer. We also show the relation between GPU speedup and AVIRIS block size, as well as the relation between compression ratio and AVIRIS block size. When AVIRIS images are divided into blocks, each with 64×64 pixels, we gain the best GPU performance with 26.3x speedup over its original CPU code.
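
    The parallel prefix sum mentioned above is commonly the work-efficient (Blelloch) exclusive scan; a serial C sketch of its up-sweep/down-sweep structure, which maps naturally onto one GPU thread per element, is shown below for a power-of-two length:

        /* Work-efficient exclusive prefix sum (Blelloch scan),
           serial reference version for a power-of-two length. */
        #include <stdio.h>

        #define N 8

        int main(void)
        {
            int x[N] = {3, 1, 7, 0, 4, 1, 6, 3};

            for (int d = 1; d < N; d *= 2)           /* up-sweep (reduce) */
                for (int i = 2 * d - 1; i < N; i += 2 * d)
                    x[i] += x[i - d];

            x[N - 1] = 0;                            /* clear the root    */

            for (int d = N / 2; d >= 1; d /= 2)      /* down-sweep        */
                for (int i = 2 * d - 1; i < N; i += 2 * d) {
                    int t = x[i - d];
                    x[i - d] = x[i];
                    x[i] += t;
                }

            for (int i = 0; i < N; i++)              /* 0 3 4 11 11 15 16 22 */
                printf("%d ", x[i]);
            printf("\n");
            return 0;
        }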

  3. Graphical representation of parallel algorithmic processes. Master's thesis

    SciTech Connect

    Williams, E.M.

    1990-12-01

    Algorithm animation is a visualization method used to enhance understanding of the functioning of an algorithm or program. Visualization is used for many purposes, including education, algorithm research, performance analysis, and program debugging. This research applies algorithm animation techniques to programs developed for parallel architectures, with specific focus on the Intel iPSC/2 hypercube. While both P-time and NP-time algorithms can potentially benefit from using visualization techniques, the set of NP-complete problems provides fertile ground for developing parallel applications, since the combinatorial nature of the problems makes finding the optimum solution impractical. The primary goals for this visualization system are: data should be displayed as it is generated; the interface to the target program should be transparent, allowing the animation of existing programs; and flexibility, meaning the system should be able to animate any algorithm. The resulting system incorporates and extends two AFIT products: the AFIT Algorithm Animation Research Facility (AAARF) and the Parallel Resource Analysis Software Environment (PRASE). AAARF is an algorithm animation system developed primarily for sequential programs, but is easily adaptable for use with parallel programs. PRASE is an instrumentation package that extracts system performance data from programs on the Intel hypercubes. Since performance data is an essential part of analyzing any parallel program, views of the performance data are provided as an elementary part of the system. Custom software is designed to interface these systems and to display the program data. The program chosen as the example for this study is a member of the NP-complete problem set; it is a parallel implementation of a general.

  4. The convergence analysis of parallel genetic algorithm based on allied strategy

    NASA Astrophysics Data System (ADS)

    Lin, Feng; Sun, Wei; Chang, K. C.

    2010-04-01

    Genetic algorithms (GAs) have been applied to many difficult optimization problems such as track assignment and hypothesis management for multisensor integration and data fusion. However, premature convergence has been a main problem for GAs. In order to prevent premature convergence, we introduce an allied strategy based on biological evolution and present a parallel genetic algorithm with the allied strategy (PGAAS). The PGAAS can prevent premature convergence, increase the optimization speed, and has been successfully applied in a few applications. In this paper, we first present a Markov chain model of the PGAAS. Based on this model, we analyze the convergence property of PGAAS. We then present a proof of global convergence for the PGAAS algorithm. The experimental results show that PGAAS is an efficient and effective parallel genetic algorithm. Finally, we discuss several potential applications of the proposed methodology.

  5. Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study

    DOE PAGES

    Radhakrishnan, Hari; Rouson, Damian W. I.; Morris, Karla; ...

    2015-01-01

    This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP) facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were done using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure. Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.
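
    A plain-C analogue of the binary-tree summation that replaced the sequential reduction (in the paper the slots below would be coarray images exchanging partial sums):

        /* Pairwise binary-tree reduction in log2(P) rounds; each
           array slot stands in for one image's local partial sum. */
        #include <stdio.h>

        #define P 8                       /* number of images/ranks */

        int main(void)
        {
            double v[P];
            for (int i = 0; i < P; i++)
                v[i] = i + 1.0;                       /* local values   */

            for (int step = 1; step < P; step *= 2)   /* log2(P) rounds */
                for (int i = 0; i + step < P; i += 2 * step)
                    v[i] += v[i + step];              /* pairwise combine */

            printf("tree sum = %f (expect %f)\n", v[0], P * (P + 1) / 2.0);
            return 0;
        }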

  6. Parallel Processing Creates a Low-Cost Growth Path.

    ERIC Educational Resources Information Center

    Shekhel, Alex; Freeman, Eva

    1987-01-01

    Discusses the advantages of parallel processor computers in terms of expandability, cost, performance and reliability, and suggests that such computers be used in library automation systems as a cost-effective approach to planning for the growth of information services and computer applications. (CLB)

  7. Large Scale Finite Element Modeling Using Scalable Parallel Processing

    NASA Technical Reports Server (NTRS)

    Cwik, T.; Katz, D.; Zuffada, C.; Jamnejad, V.

    1995-01-01

    An iterative solver for use with finite element codes was developed for the Cray T3D massively parallel processor at the Jet Propulsion Laboratory. Finite element modeling is useful for simulating scattered or radiated electromagnetic fields from complex three-dimensional objects with geometry variations smaller than an electrical wavelength.

  8. Listening beneath the Words: Parallel Processes in Music and Psychotherapy

    ERIC Educational Resources Information Center

    Shapiro, Yakov; Marks-Tarlow, Terry; Fridman, Joseph

    2017-01-01

    The authors investigate the parallels between musical performance and psychoanalytical therapy, using the former as a metaphor for the way therapist and patient jointly compose the therapeutic experience and better the treatment it offers. [Note: The volume and issue number (v9 n1) shown on this PDF is incorrect. The correct citation is v9 n2.]

  9. Optical Digital Parallel Truth-Table Look-Up Processing

    NASA Astrophysics Data System (ADS)

    Mirsalehi, Mir Mojtaba

    During the last decade, a number of optical digital processors have been proposed that combine the parallelism and speed of optics with the accuracy and flexibility of a digital representation. In this thesis, two types of such processors (an EXCLUSIVE OR-based processor and a NAND-based processor) that function as content-addressable memories (CAMs) are analyzed. The main factors that affect the performance of the EXCLUSIVE OR-based processor are found to be the Gaussian nature of the reference beam and the finite square aperture of the crystal. A quasi-one-dimensional model is developed to analyze the effect of the Gaussian reference beam, and a circular aperture is used to increase the dynamic range in the output power. The main factors that affect the performance of the NAND-based processor are found to be the variations in the amplitudes and the relative phase of the laser beams during the recording process. A mathematical model is developed for analyzing the probability of error in the output of the processor. Using this model, the performance of the processor for some practical cases is analyzed. Techniques that have been previously used to reduce the number of reference patterns in a CAM include using the residue number system and applying logical minimization methods. In the present work, these and additional techniques are investigated. A systematic procedure is developed for selecting the optimum set of moduli. The effect of coding is investigated and it is shown that multi-level coding, when used in conjunction with logical minimization techniques, significantly reduces the number of reference patterns. The Quine-McCluskey method is extended to multiple-valued logic and a computer program based on this extension is used for logical minimization. The results show that for moduli expressible as p^n, where p is a prime number and n is an integer greater than one, p-level coding provides significant reduction. The NAND-based processor is modified for
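
    Residue-number coding can be sketched as follows: a value is stored as its residues modulo pairwise-coprime moduli and recovered with the Chinese remainder theorem; the moduli 7, 8, and 9 are illustrative, not the optimized set derived in the thesis.

        /* Residue number system: encode x as residues modulo
           pairwise-coprime moduli, decode via the Chinese remainder
           theorem.  The moduli {7, 8, 9} are illustrative. */
        #include <stdio.h>

        #define K 3
        static const long m[K] = {7, 8, 9};

        static long inv_mod(long a, long mod)   /* brute-force modular inverse */
        {
            for (long t = 1; t < mod; t++)
                if ((a * t) % mod == 1) return t;
            return 0;
        }

        int main(void)
        {
            long x = 345, M = 1, r[K], y = 0;

            for (int i = 0; i < K; i++) {       /* encode: residues */
                r[i] = x % m[i];
                M *= m[i];
            }

            for (int i = 0; i < K; i++) {       /* decode: CRT */
                long Mi = M / m[i];
                y = (y + r[i] * Mi % M * inv_mod(Mi % m[i], m[i])) % M;
            }
            printf("residues (%ld,%ld,%ld) of %ld decode to %ld\n",
                   r[0], r[1], r[2], x, y);
            return 0;
        }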

  10. Control of automatic processes: A parallel distributed-processing model of the stroop effect. Technical report

    SciTech Connect

    Cohen, J.D.; Dunbar, K.; McClelland, J.L.

    1988-06-16

    A growing body of evidence suggests that traditional views of automaticity are in need of revision. For example, automaticity has often been treated as an all-or-none phenomenon, and traditional theories have held that automatic processes are independent of attention. Yet recent empirical data suggest that automatic processes are continuous, and furthermore are subject to attentional control. In this paper we present a model of attention which addresses these issues. Using a parallel distributed processing framework we propose that the attributes of automaticity depend upon the strength of a process and that strength increases with training. Using the Stroop effect as an example, we show how automatic processes are continuous and emerge gradually with practice. Specifically, we present a computational model of the Stroop task which simulates the time course of processing as well as the effects of learning.

  11. A 16-bit parallel processing in a molecular assembly

    PubMed Central

    Bandyopadhyay, Anirban; Acharya, Somobrata

    2008-01-01

    A machine assembly consisting of 17 identical molecules of 2,3,5,6-tetramethyl-1,4-benzoquinone (DRQ) executes 16 instructions at a time. A single DRQ is positioned at the center of a circular ring formed by 16 other DRQs, controlling their operation in parallel through hydrogen-bond channels. Each molecule is a logic machine and generates four instructions by rotating its alkyl groups. A single instruction executed by a scanning tunneling microscope tip on the central molecule can change decisions of 16 machines simultaneously, in four billion (4^16) ways. This parallel communication represents a significant conceptual advance relative to today's fastest processors, which execute only one instruction at a time. PMID:18332437

  12. A Parallel Processing Algorithm for Remote Sensing Classification

    NASA Technical Reports Server (NTRS)

    Gualtieri, J. Anthony

    2005-01-01

    A current thread in parallel computation is the use of cluster computers created by networking a few to thousands of commodity general-purpose workstation-level computers using the Linux operating system. For example, on the Medusa cluster at NASA/GSFC, this provides supercomputing performance, 130 Gflops (Linpack benchmark), at moderate cost, $370K. However, to be useful for scientific computing in the area of Earth science, issues of ease of programming, access to existing scientific libraries, and portability of existing code need to be considered. In this paper, I address these issues in the context of tools for rendering earth science remote sensing data into useful products. In particular, I focus on a problem that can be decomposed into a set of independent tasks, which on a serial computer would be performed sequentially, but with a cluster computer can be performed in parallel, giving an obvious speedup. To make the ideas concrete, I consider the problem of classifying hyperspectral imagery where some ground truth is available to train the classifier. In particular I will use the Support Vector Machine (SVM) approach as applied to hyperspectral imagery. The approach will be to introduce notions about parallel computation and then to restrict the development to the SVM problem. Pseudocode (an outline of the computation) will be described and then details specific to the implementation will be given. Then timing results will be reported to show what speedups are possible using parallel computation. The paper will close with a discussion of the results.

  13. Partitioning Rectangular and Structurally Nonsymmetric Sparse Matrices for Parallel Processing

    SciTech Connect

    B. Hendrickson; T.G. Kolda

    1998-09-01

    A common operation in scientific computing is the multiplication of a sparse, rectangular or structurally nonsymmetric matrix and a vector. In many applications the matrix-transpose-vector product is also required. This paper addresses the efficient parallelization of these operations. We show that the problem can be expressed in terms of partitioning bipartite graphs. We then introduce several algorithms for this partitioning problem and compare their performance on a set of test matrices.
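
    The kernels being partitioned are the sparse matrix-vector and matrix-transpose-vector products; in compressed sparse row (CSR) storage they read as below (the 2x3 matrix is a toy example; in parallel, the row blocks are what the bipartite partitioning assigns to processors):

        /* y = A*x and yt = A'*xt for a sparse rectangular matrix in
           CSR form; A = [10 0 20; 0 30 40] is a toy example. */
        #include <stdio.h>

        #define NROW 2
        #define NCOL 3
        #define NNZ 4

        int main(void)
        {
            int    ptr[NROW + 1] = {0, 2, 4};   /* row start offsets  */
            int    col[NNZ]      = {0, 2, 1, 2};/* column indices     */
            double val[NNZ]      = {10, 20, 30, 40};
            double x[NCOL] = {1, 1, 1}, y[NROW] = {0};
            double xt[NROW] = {1, 1}, yt[NCOL] = {0};

            for (int i = 0; i < NROW; i++)      /* y = A*x   */
                for (int k = ptr[i]; k < ptr[i + 1]; k++)
                    y[i] += val[k] * x[col[k]];

            for (int i = 0; i < NROW; i++)      /* yt = A'*xt */
                for (int k = ptr[i]; k < ptr[i + 1]; k++)
                    yt[col[k]] += val[k] * xt[i];

            printf("A*x  = [%g %g]\n", y[0], y[1]);
            printf("A'*x = [%g %g %g]\n", yt[0], yt[1], yt[2]);
            return 0;
        }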

  14. An approach to real-time simulation using parallel processing

    NASA Technical Reports Server (NTRS)

    Blech, R. A.; Arpasi, D. J.

    1981-01-01

    Current applications of real-time simulations to the development of complex aircraft propulsion system controls have demonstrated the need for accurate, portable, and low-cost simulators. This paper presents a preliminary simulator design that uses a parallel computer organization to provide these features. The hardware and software for this prototype simulator are discussed. A detailed discussion of the inter-computer data transfer mechanism is also presented.

  15. Parallel processing data network of master and slave transputers controlled by a serial control network

    DOEpatents

    Crosetto, D.B.

    1996-12-31

    The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor to a plurality of slave processors to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer, a digital signal processor, a parallel transfer controller, and two three-port memory devices. A communication switch within each node connects it to a fast parallel hardware channel through which all high density data arrives or leaves the node. 6 figs.

  16. Parallel processing data network of master and slave transputers controlled by a serial control network

    DOEpatents

    Crosetto, Dario B.

    1996-01-01

    The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node (100) connects it to a fast parallel hardware channel (70) through which all high density data arrives or leaves the node.

  17. Toward a Model Framework of Generalized Parallel Componential Processing of Multi-Symbol Numbers

    ERIC Educational Resources Information Center

    Huber, Stefan; Cornelsen, Sonja; Moeller, Korbinian; Nuerk, Hans-Christoph

    2015-01-01

    In this article, we propose and evaluate a new model framework of parallel componential multi-symbol number processing, generalizing the idea of parallel componential processing of multi-digit numbers to the case of negative numbers by considering the polarity signs similar to single digits. In a first step, we evaluated this account by defining…

  18. Toward a Model Framework of Generalized Parallel Componential Processing of Multi-Symbol Numbers

    ERIC Educational Resources Information Center

    Huber, Stefan; Cornelsen, Sonja; Moeller, Korbinian; Nuerk, Hans-Christoph

    2015-01-01

    In this article, we propose and evaluate a new model framework of parallel componential multi-symbol number processing, generalizing the idea of parallel componential processing of multi-digit numbers to the case of negative numbers by considering the polarity signs similar to single digits. In a first step, we evaluated this account by defining…

  19. Studies in optical parallel processing. [All optical and electro-optic approaches

    NASA Technical Reports Server (NTRS)

    Lee, S. H.

    1978-01-01

    Threshold and A/D devices for converting a gray scale image into a binary one were investigated for all-optical and opto-electronic approaches to parallel processing. Integrated optical logic circuits (IOC) and optical parallel logic devices (OPAL) were studied as an approach to processing optical binary signals. In the IOC logic scheme, a single row of an optical image is coupled into the IOC substrate at a time through an array of optical fibers. Parallel processing is carried out, on each image element of these rows, in the IOC substrate and the resulting output exits via a second array of optical fibers. The OPAL system for parallel processing, which uses a Fabry-Perot interferometer for image thresholding and analog-to-digital conversion, achieves a higher degree of parallel processing than is possible with IOC.

  20. Tank Waste Remediation System optimized processing strategy

    SciTech Connect

    Slaathaug, E.J.; Boldt, A.L.; Boomer, K.D.; Galbraith, J.D.; Leach, C.E.; Waldo, T.L.

    1996-03-01

    This report provides an alternative strategy evolved from the current Hanford Site Tank Waste Remediation System (TWRS) programmatic baseline for accomplishing the treatment and disposal of the Hanford Site tank wastes. This optimized processing strategy performs the major elements of the TWRS Program, but modifies the deployment of selected treatment technologies to reduce the program cost. The present program for development of waste retrieval, pretreatment, and vitrification technologies continues, but the optimized processing strategy reuses a single facility to accomplish the separations/low-activity waste (LAW) vitrification and the high-level waste (HLW) vitrification processes sequentially, thereby eliminating the need for a separate HLW vitrification facility.

  1. Parallelizing flow-accumulation calculations on graphics processing units—From iterative DEM preprocessing algorithm to recursive multiple-flow-direction algorithm

    NASA Astrophysics Data System (ADS)

    Qin, Cheng-Zhi; Zhan, Lijun

    2012-06-01

    As one of the important tasks in digital terrain analysis, the calculation of flow accumulations from gridded digital elevation models (DEMs) usually involves two steps in a real application: (1) using an iterative DEM preprocessing algorithm to remove the depressions and flat areas commonly contained in real DEMs, and (2) using a recursive flow-direction algorithm to calculate the flow accumulation for every cell in the DEM. Because both algorithms are computationally intensive, quick calculation of the flow accumulations from a DEM (especially for a large area) presents a practical challenge to personal computer (PC) users. In recent years, rapid increases in the hardware capacity of the graphics processing units (GPUs) provided in modern PCs have made it possible to meet this challenge in a PC environment. Parallel computing on GPUs using a compute-unified-device-architecture (CUDA) programming model has been explored to speed up the execution of the single-flow-direction algorithm (SFD). However, the parallel implementation on a GPU of the multiple-flow-direction (MFD) algorithm, which generally performs better than the SFD algorithm, has not been reported. Moreover, GPU-based parallelization of the DEM preprocessing step in the flow-accumulation calculations has not been addressed. This paper proposes a parallel approach to calculate flow accumulations (including both iterative DEM preprocessing and a recursive MFD algorithm) on a CUDA-compatible GPU. For the parallelization of an MFD algorithm (MFD-md), two different parallelization strategies using a GPU are explored. The first parallelization strategy, which has been used in the existing parallel SFD algorithm on GPU, has the problem of redundant computation. Therefore, we designed a parallelization strategy based on graph theory. The application results show that the proposed parallel approach to calculate flow accumulations on a GPU performs much faster than either sequential algorithms or other parallel GPU
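
    The graph-theory strategy can be sketched with a Kahn-style levelization: a cell is ready once all upstream contributors have finished, and every cell in the same wave can be processed in parallel. The sketch below uses a tiny single-flow-direction grid for clarity; the MFD case splits each cell's outflow among several downstream neighbours.

        /* Levelized flow accumulation on a flow-direction DAG:
           cells with no unresolved upstream contributors form a
           wave; processing a cell releases its downstream cell. */
        #include <stdio.h>

        #define NC 5

        int main(void)
        {
            /* downstream[i] = cell receiving cell i's flow (-1 = outlet) */
            int downstream[NC] = {1, 2, 4, 4, -1};
            int indeg[NC] = {0}, queue[NC], head = 0, tail = 0;
            double acc[NC] = {1, 1, 1, 1, 1};  /* each cell contributes itself */

            for (int i = 0; i < NC; i++)
                if (downstream[i] >= 0) indeg[downstream[i]]++;

            for (int i = 0; i < NC; i++)       /* wave 0: no upstream cells */
                if (indeg[i] == 0) queue[tail++] = i;

            while (head < tail) {              /* each wave is parallel */
                int i = queue[head++];
                int d = downstream[i];
                if (d >= 0) {
                    acc[d] += acc[i];          /* pass accumulation downhill */
                    if (--indeg[d] == 0) queue[tail++] = d;
                }
            }
            for (int i = 0; i < NC; i++)
                printf("cell %d: flow accumulation = %g\n", i, acc[i]);
            return 0;
        }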

  2. Parallel architectures for image processing; Proceedings of the Meeting, Santa Clara, CA, Feb. 14, 15, 1990

    SciTech Connect

    Ghosh, J.; Harrison, C.G.

    1990-01-01

    The present conference discusses topics in the fields of VLSI-based and real-time image-processing systems, parallel architectures for image processing, image-processing algorithms, and image processing on the basis of artificial neural networks. Attention is given to a fixed-point VLSI architecture for high-speed image reconstruction, an orthogonal multiprocessor for image processing with neural networks, massively parallel processors in real-time applications, the use of the adiabatic approximation as a tool in image estimation, parallel algorithms for contour-extraction and coding, and a parallel architecture for multidimensional image processing. Also discussed are concurrent image-processing on hypercube multicomputers, neural-network simulation on a reduced-mesh-of-trees organization, and a goal-seeking neural net for recall and recognition.

  3. Signal processing applications of massively parallel charge domain computing devices

    NASA Technical Reports Server (NTRS)

    Fijany, Amir (Inventor); Barhen, Jacob (Inventor); Toomarian, Nikzad (Inventor)

    1999-01-01

    The present invention is embodied in a charge coupled device (CCD)/charge injection device (CID) architecture capable of performing a Fourier transform by simultaneous matrix vector multiplication (MVM) operations in respective plural CCD/CID arrays in parallel in O(1) steps. For example, in one embodiment, a first CCD/CID array stores charge packets representing a first matrix operator based upon permutations of a Hartley transform and computes the Fourier transform of an incoming vector. A second CCD/CID array stores charge packets representing a second matrix operator based upon different permutations of a Hartley transform and computes the Fourier transform of an incoming vector. The incoming vector is applied to the inputs of the two CCD/CID arrays simultaneously, and the real and imaginary parts of the Fourier transform are produced simultaneously in the time required to perform a single MVM operation in a CCD/CID array.

  4. CRBLASTER: a fast parallel-processing program for cosmic ray rejection

    NASA Astrophysics Data System (ADS)

    Mighell, Kenneth J.

    2008-08-01

    Many astronomical image-analysis programs are based on algorithms that can be described as being embarrassingly parallel, where the analysis of one subimage generally does not affect the analysis of another subimage. Yet few parallel-processing astrophysical image-analysis programs exist that can easily take full advantage of today's fast multi-core servers costing a few thousand dollars. A major reason for the shortage of state-of-the-art parallel-processing astrophysical image-analysis codes is that the writing of parallel codes has been perceived to be difficult. I describe a new fast parallel-processing image-analysis program called crblaster which does cosmic ray rejection using van Dokkum's L.A.Cosmic algorithm. crblaster is written in C using the industry standard Message Passing Interface (MPI) library. Processing a single 800×800 HST WFPC2 image takes 1.87 seconds using 4 processes on an Apple Xserve with two dual-core 3.0-GHz Intel Xeons; the efficiency of the program running with the 4 processors is 82%. The code can be used as a software framework for easy development of parallel-processing image-analysis programs using embarrassingly parallel algorithms; the biggest required modification is the replacement of the core image-processing function with an alternative image-analysis function based on a single-processor algorithm. I describe the design, implementation and performance of the program.
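
    The underlying pattern is simple to express with MPI: the master splits the image into subimages, each rank processes its piece independently, and the master reassembles the result. Below is a hedged mpi4py sketch of that pattern, not crblaster's actual C source; clean is a hypothetical stand-in for the per-subimage routine (crblaster plugs in L.A.Cosmic here).

        import numpy as np
        from mpi4py import MPI  # requires an MPI runtime (e.g., mpirun -n 4)

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        def clean(subimage):
            # Hypothetical stand-in for the per-subimage algorithm.
            return np.clip(subimage, 0, np.median(subimage) * 10)

        if rank == 0:
            image = np.random.rand(800, 800)
            strips = np.array_split(image, size, axis=0)  # one strip per process
        else:
            strips = None

        strip = comm.scatter(strips, root=0)    # distribute the work
        result = clean(strip)                   # each rank works independently
        cleaned = comm.gather(result, root=0)   # reassemble on the master
        if rank == 0:
            image = np.vstack(cleaned)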

  5. Implementation science: a role for parallel dual processing models of reasoning?

    PubMed Central

    Sladek, Ruth M; Phillips, Paddy A; Bond, Malcolm J

    2006-01-01

    Background A better theoretical base for understanding professional behaviour change is needed to support evidence-based changes in medical practice. Traditionally, strategies to encourage changes in clinical practices have been guided empirically, without explicit consideration of underlying theoretical rationales for such strategies. This paper considers a theoretical framework for reasoning from within psychology for identifying individual differences in cognitive processing between doctors that could moderate the decision to incorporate new evidence into their clinical decision-making. Discussion Parallel dual processing models of reasoning posit two cognitive modes of information processing that are in constant operation as humans reason. One mode has been described as experiential, fast and heuristic; the other as rational, conscious and rule based. Within such models, the uptake of new research evidence can be represented by the latter mode; it is reflective, explicit and intentional. On the other hand, well-practiced clinical judgments can be positioned in the experiential mode, being automatic, reflexive and swift. Research suggests that individual differences between people in both cognitive capacity (e.g., intelligence) and cognitive processing (e.g., thinking styles) influence how both reasoning modes interact. This being so, it is proposed that these same differences between doctors may moderate the uptake of new research evidence. Such dispositional characteristics have largely been ignored in research investigating effective strategies in implementing research evidence. Whilst medical decision-making occurs in a complex social environment with multiple influences and decision makers, it remains true that an individual doctor's judgment still retains a key position in terms of diagnostic and treatment decisions for individual patients. This paper argues, therefore, that individual differences between doctors in terms of reasoning are important

  6. Parallel processing of remotely sensed data: Application to the ATSR-2 instrument

    NASA Astrophysics Data System (ADS)

    Simpson, J.; McIntire, T.; Berg, J.; Tsou, Y.

    2007-01-01

    Massively parallel computational paradigms can mitigate many issues associated with the analysis of large and complex remotely sensed data sets. Recently, the Beowulf cluster has emerged as the most attractive, massively parallel architecture due to its low cost and high performance. Whereas most Beowulf designs have emphasized numerical modeling applications, the Parallel Image Processing Environment (PIPE) specifically addresses the unique requirements of remote sensing applications. Automated parallelization of user-defined analyses is fully supported. A neural network application, applied to Along Track Scanning Radiometer-2 (ATSR-2) data, shows the advantages and performance characteristics of PIPE.

  7. Arts Integration Parallels Between Music and Reading: Process, Product and Affective Response.

    ERIC Educational Resources Information Center

    Merrion, Margaret Dee

    The process of aesthetic education is not limited to the fine arts. Parallels may be identified in the language arts and particularly in the art of creative reading. As in a musical experience, a creative reader will apprehend the content of the literature and couple personal feelings with the events of the reading experience. Parallel brain…

  8. Control Strategy of a Parallel System Using Both Matrix Converter and Voltage Type Inverter

    NASA Astrophysics Data System (ADS)

    Itoh, Jun-Ichi; Tamura, Hiroshi

    This paper proposes a control strategy for a matrix converter and voltage type inverter in a parallel system that does not require interconnection reactors. The proposed control strategy is to divide the operation time between a matrix converter and a voltage type inverter. The operation time of each converter is divided in every carrier cycle. As a result, interconnection reactors are not required and the sinusoidal input current waveform of a matrix converter can be obtained. The total output voltage of the proposed system and the output power division ratio for a matrix converter and a voltage type inverter are controlled by the time division ratio of each converter. Furthermore, the voltage error resulting from the operation of time division control was analyzed and compensated. The availability of the proposed system and the validity of the proposed control method are confirmed by experimental results.

  9. A control strategy for parallel hybrid electric vehicles based on extremum seeking

    NASA Astrophysics Data System (ADS)

    Dinçmen, Erkin; Aksun Güvenç, Bilin

    2012-02-01

    An energy management control strategy for a parallel hybrid electric vehicle based on the extremum-seeking method for splitting torque between the internal combustion engine and electric motor is proposed in this paper. The control strategy has two levels of operation: the upper and lower levels. The upper level decision-making controller chooses the vehicle operation mode such as the simultaneous use of the internal combustion engine and electric motor, use of only the electric motor, use of only the internal combustion engine, or regenerative braking. In the simultaneous use of the internal combustion engine and electric motor, the optimum energy distribution between these two sources of energy is determined via the extremum-seeking algorithm that searches for maximum drivetrain efficiency. A dynamic programming solution is also obtained and used to form a benchmark for performance evaluation of the proposed method based on extremum seeking. Detailed simulations using a realistic model are presented to illustrate the effectiveness of the methodology.
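
    The core of perturbation-based extremum seeking can be shown in a few lines: inject a small sinusoidal dither into the torque split, demodulate the measured objective against the dither to estimate the local gradient, and integrate. The toy efficiency curve, names, and gains below are illustrative assumptions, not the paper's drivetrain model.

        import math

        def efficiency(split):          # toy stand-in for drivetrain efficiency
            return 0.9 - (split - 0.6) ** 2

        split = 0.2                     # initial torque-split estimate
        a, omega, k, dt = 0.05, 5.0, 0.8, 0.01
        J_avg = efficiency(split)       # slow low-pass estimate of the objective
        for i in range(20000):
            t = i * dt
            dither = a * math.sin(omega * t)
            J = efficiency(split + dither)                  # perturb and measure
            J_avg += 0.01 * (J - J_avg)                     # track the DC part
            grad = (J - J_avg) * math.sin(omega * t) / a    # demodulated gradient
            split += k * grad * dt                          # climb toward the peak
        print(round(split, 2))          # approaches 0.6, the efficiency maximum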

  10. A strategy for the solution-phase parallel synthesis of N-(pyrrolidinylmethyl)hydroxamic acids.

    PubMed

    Takayanagi, M; Flessner, T; Wong, C H

    2000-06-16

    Both five- and six-membered iminocyclitols have proven to be useful transition-state analogue inhibitors of glycosidases. They also mimic the transition-state sugar moiety of the nucleoside phosphate sugar in glycosyltransferase-catalyzed reactions. Described here is the development of a general strategy toward the parallel synthesis of a five-membered iminocyclitol linked to a hydroxamic acid group designed to mimic the transition state of GDP-fucose complexed with Mn(II) in fucosyltransferase reactions. The iminocyclitol 8 containing a protected hydroxylamine unit was prepared from D-mannitol. The hydroxamic acid moiety was introduced via the reaction of 8 with various acid chlorides. The strategy is generally applicable to the construction of libraries for identification of glycosyltransferase inhibitors.

  11. Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems

    NASA Technical Reports Server (NTRS)

    Guruswamy, Guru P.; Byun, Chansup; Kwak, Dochan (Technical Monitor)

    2001-01-01

    A modular process that can efficiently solve large scale multidisciplinary problems using massively parallel supercomputers is presented. The process integrates disciplines with diverse physical characteristics by retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large scale aerospace problems on several supercomputers. The super scalability and portability of the approach are demonstrated on several parallel computers.

  12. Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems

    NASA Technical Reports Server (NTRS)

    Guruswamy, Guru P.; Kwak, Dochan (Technical Monitor)

    2002-01-01

    A modular process that can efficiently solve large scale multidisciplinary problems using massively parallel supercomputers is presented. The process integrates disciplines with diverse physical characteristics by retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large scale aerospace problems on several supercomputers. The super scalability and portability of the approach are demonstrated on several parallel computers.

  13. Parallel Block Structured Adaptive Mesh Refinement on Graphics Processing Units

    SciTech Connect

    Beckingsale, D. A.; Gaudin, W. P.; Hornung, R. D.; Gunney, B. T.; Gamblin, T.; Herdman, J. A.; Jarvis, S. A.

    2014-11-17

    Block-structured adaptive mesh refinement is a technique that can be used when solving partial differential equations to reduce the number of zones necessary to achieve the required accuracy in areas of interest. These areas (shock fronts, material interfaces, etc.) are recursively covered with finer mesh patches that are grouped into a hierarchy of refinement levels. Despite the potential for large savings in computational requirements and memory usage without a corresponding reduction in accuracy, AMR adds overhead in managing the mesh hierarchy, adding complex communication and data movement requirements to a simulation. In this paper, we describe the design and implementation of a native GPU-based AMR library, including: the classes used to manage data on a mesh patch, the routines used for transferring data between GPUs on different nodes, and the data-parallel operators developed to coarsen and refine mesh data. We validate the performance and accuracy of our implementation using three test problems and two architectures: an eight-node cluster, and over four thousand nodes of Oak Ridge National Laboratory’s Titan supercomputer. Our GPU-based AMR hydrodynamics code performs up to 4.87× faster than the CPU-based implementation, and has been scaled to over four thousand GPUs using a combination of MPI and CUDA.
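
    The data-parallel coarsen and refine operators have simple serial analogues. A NumPy sketch under the usual factor-of-two refinement assumption follows; the library itself implements these as GPU kernels over patch data.

        import numpy as np

        def coarsen(fine):
            """Restrict a 2D patch by averaging each 2x2 block of fine zones."""
            h, w = fine.shape
            return fine.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

        def refine(coarse):
            """Prolong a 2D patch by injecting each coarse zone into 2x2 fine zones."""
            return np.repeat(np.repeat(coarse, 2, axis=0), 2, axis=1)

        patch = np.random.rand(8, 8)
        assert np.allclose(coarsen(refine(patch)), patch)  # refine-then-coarsen is exact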

  14. Parallel systems of error processing in the brain.

    PubMed

    Yordanova, Juliana; Falkenstein, Michael; Hohnsbein, Joachim; Kolev, Vasil

    2004-06-01

    Major neurophysiological principles of performance monitoring are not precisely known. It is a current debate in cognitive neuroscience if an error-detection neural system is involved in behavioral control and adaptation. Such a system should generate error-specific signals, but their existence is questioned by observations that correct and incorrect reactions may elicit similar neuroelectric potentials. A new approach based on a time-frequency decomposition of event-related brain potentials was applied to extract covert sub-components from the classical error-related negativity (Ne) and correct-response-related negativity (Nc) in humans. A unique error-specific sub-component from the delta (1.5-3.5 Hz) frequency band was revealed only for Ne, which was associated with error detection at the level of overall performance monitoring. A sub-component from the theta frequency band (4-8 Hz) was associated with motor response execution, but this sub-component also differentiated error from correct reactions indicating error detection at the level of movement monitoring. It is demonstrated that error-specific signals do exist in the brain. More importantly, error detection may occur in multiple functional systems operating in parallel at different levels of behavioral control.

  15. Advantages of Parallel Processing and the Effects of Communications Time

    NASA Technical Reports Server (NTRS)

    Eddy, Wesley M.; Allman, Mark

    2000-01-01

    Many computing tasks involve heavy mathematical calculations, or analyzing large amounts of data. These operations can take a long time to complete using only one computer. Networks such as the Internet provide many computers with the ability to communicate with each other. Parallel or distributed computing takes advantage of these networked computers by arranging them to work together on a problem, thereby reducing the time needed to obtain the solution. The drawback to using a network of computers to solve a problem is the time wasted in communicating between the various hosts. The application of distributed computing techniques to a space environment or to use over a satellite network would therefore be limited by the amount of time needed to send data across the network, which would typically take much longer than on a terrestrial network. This experiment shows how much faster a large job can be performed by adding more computers to the task, what role communications time plays in the total execution time, and the impact a long-delay network has on a distributed computing system.
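
    The trade-off the experiment measures can be captured with a back-of-envelope model: total time is compute time divided across hosts plus a communication cost that grows with the number of hosts. A small Python sketch, where the linear per-host communication term is an illustrative assumption:

        def total_time(work, hosts, comm_per_host):
            """Crude model: compute time divides across hosts,
            communication cost grows with the number of hosts."""
            return work / hosts + comm_per_host * hosts

        work = 1000.0                     # seconds on a single computer
        for comm in (0.1, 5.0, 25.0):     # terrestrial vs. satellite-like delays
            times = [total_time(work, n, comm) for n in range(1, 33)]
            best = min(range(len(times)), key=times.__getitem__) + 1
            print(f"comm={comm:5.1f}s/host: best with {best:2d} hosts, "
                  f"speedup {work / times[best - 1]:.1f}x")

    As the per-host communication cost rises, the optimal number of hosts and the achievable speedup both shrink, which is exactly the long-delay effect the experiment studies.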

  16. Design of a dataway processor for a parallel image signal processing system

    NASA Astrophysics Data System (ADS)

    Nomura, Mitsuru; Fujii, Tetsuro; Ono, Sadayasu

    1995-04-01

    Recently, demands for high-speed signal processing have been increasing especially in the field of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called 'dataway processor' designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link operates at 8-bit parallel in a full duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel. Therefore, sufficient throughput is available for high-speed digital video signals. The processor is designed in a top-down fashion using a CAD system called 'PARTHENON.' The hardware is fabricated using 0.5-micrometer CMOS technology and comprises about 200K gates.

  17. Parameter estimation in large-scale systems biology models: a parallel and self-adaptive cooperative strategy.

    PubMed

    Penas, David R; González, Patricia; Egea, Jose A; Doallo, Ramón; Banga, Julio R

    2017-01-21

    The development of large-scale kinetic models is one of the current key issues in computational systems biology and bioinformatics. Here we consider the problem of parameter estimation in nonlinear dynamic models. Global optimization methods can be used to solve this type of problem, but the associated computational cost is very large. Moreover, many of these methods need the tuning of a number of adjustable search parameters, requiring a number of initial exploratory runs and therefore further increasing the computation times. Here we present a novel parallel method, self-adaptive cooperative enhanced scatter search (saCeSS), to accelerate the solution of this class of problems. The method is based on the scatter search optimization metaheuristic and incorporates several key new mechanisms: (i) asynchronous cooperation between parallel processes, (ii) coarse and fine-grained parallelism, and (iii) self-tuning strategies. The performance and robustness of saCeSS are illustrated by solving a set of challenging parameter estimation problems, including medium and large-scale kinetic models of the bacterium E. coli, baker's yeast S. cerevisiae, the vinegar fly D. melanogaster, Chinese Hamster Ovary cells, and a generic signal transduction network. The results consistently show that saCeSS is a robust and efficient method, allowing very significant reduction of computation times with respect to several previous state-of-the-art methods (from days to minutes, in several cases) even when only a small number of processors is used. The new parallel cooperative method presented here allows the solution of medium and large scale parameter estimation problems in reasonable computation times and with small hardware requirements. Further, the method includes self-tuning mechanisms which facilitate its use by non-experts. We believe that this new method can play a key role in the development of large-scale and even whole-cell dynamic models.

  18. A general parallelization strategy for random path based geostatistical simulation methods

    NASA Astrophysics Data System (ADS)

    Mariethoz, Grégoire

    2010-07-01

    The size of simulation grids used for numerical models has increased by many orders of magnitude in the past years, and this trend is likely to continue. Efficient pixel-based geostatistical simulation algorithms have been developed, but for very large grids and complex spatial models, the computational burden remains heavy. As cluster computers become widely available, using parallel strategies is a natural step for increasing the usable grid size and the complexity of the models. These strategies must profit from the possibilities offered by machines with a large number of processors. On such machines, the bottleneck is often the communication time between processors. We present a strategy distributing grid nodes among all available processors while minimizing communication and latency times. It consists in centralizing the simulation on a master processor that calls other slave processors as if they were functions simulating one node every time. The key is to decouple the sending and the receiving operations to avoid synchronization. Centralization allows having a conflict management system ensuring that nodes being simulated simultaneously do not interfere in terms of neighborhood. The strategy is computationally efficient and is versatile enough to be applicable to all random path based simulation methods.
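
    A hedged Python sketch of the centralized master/worker idea follows, using multiprocessing in place of the message passing described above: the master dispatches one node per idle worker and refuses to dispatch any node whose neighborhood overlaps work already in flight. Function and parameter names are illustrative; on platforms that spawn processes, run it under an if __name__ == "__main__" guard.

        import multiprocessing as mp
        import time

        def simulate_node(node):
            # Stand-in for simulating one grid node from its neighborhood data.
            return node, hash(node) % 100

        def master(nodes, neighborhood, nworkers=4):
            """Dispatch nodes to workers, never running two nodes whose
            neighborhoods overlap at the same time (conflict management)."""
            results, in_flight, queue, pending = {}, set(), list(nodes), []
            with mp.Pool(nworkers) as pool:
                while queue or pending:
                    for node in list(queue):
                        if len(pending) < nworkers and not (neighborhood(node) & in_flight):
                            in_flight.add(node)
                            queue.remove(node)
                            pending.append(pool.apply_async(simulate_node, (node,)))
                    still = []
                    for r in pending:
                        if r.ready():
                            node, value = r.get()
                            in_flight.discard(node)
                            results[node] = value
                        else:
                            still.append(r)
                    pending = still
                    time.sleep(0.001)  # decouple dispatching from collecting
            return results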

  19. Parallel plan execution with self-processing networks

    NASA Technical Reports Server (NTRS)

    Dautrechy, C. Lynne; Reggia, James A.

    1989-01-01

    A critical issue for space operations is how to develop and apply advanced automation techniques to reduce the cost and complexity of working in space. In this context, it is important to examine how recent advances in self-processing networks can be applied for planning and scheduling tasks. For this reason, the feasibility of applying self-processing network models to a variety of planning and control problems relevant to spacecraft activities is being explored. Goals are to demonstrate that self-processing methods are applicable to these problems, and that MIRRORS/II, a general purpose software environment for implementing self-processing models, is sufficiently robust to support development of a wide range of application prototypes. Using MIRRORS/II and marker passing modelling techniques, a model of the execution of a Spaceworld plan was implemented. This is a simplified model of the Voyager spacecraft which photographed Jupiter, Saturn, and their satellites. It is shown that plan execution, a task usually solved using traditional artificial intelligence (AI) techniques, can be accomplished using a self-processing network. The fact that self-processing networks were applied to other space-related tasks, in addition to the one discussed here, demonstrates the general applicability of this approach to planning and control problems relevant to spacecraft activities. It is also demonstrated that MIRRORS/II is a powerful environment for the development and evaluation of self-processing systems.

  20. Parallel plan execution with self-processing networks

    NASA Technical Reports Server (NTRS)

    D'Autrechy, C. Lynne; Reggia, James A.

    1989-01-01

    A critical issue for space operations is how to develop and apply advanced automation techniques to reduce the cost and complexity of working in space. In this context, it is important to examine how recent advances in self-processing networks can be applied for planning and scheduling tasks. For this reason, the feasibility of applying self-processing network models to a variety of planning and control problems relevant to spacecraft activities is being explored. Goals are to demonstrate that self-processing methods are applicable to these problems, and that MIRRORS/II, a general purpose software environment for implementing self-processing models, is sufficiently robust to support development of a wide range of application prototypes. Using MIRRORS/II and marker passing modelling techniques, a model of the execution of a Spaceworld plan was implemented. This is a simplified model of the Voyager spacecraft which photographed Jupiter, Saturn, and their satellites. It is shown that plan execution, a task usually solved using traditional artificial intelligence (AI) techniques, can be accomplished using a self-processing network. The fact that self-processing networks were applied to other space-related tasks, in addition to the one discussed here, demonstrates the general applicability of this approach to planning and control problems relevant to spacecraft activities. It is also demonstrated that MIRRORS/II is a powerful environment for the development and evaluation of self-processing systems.

  1. Control of automatic processes: A parallel distributed-processing account of the Stroop effect. Technical report

    SciTech Connect

    Cohen, J.D.; Dunbar, K.; McClelland, J.L.

    1989-11-22

    A growing body of evidence suggests that traditional views of automaticity are in need of revision. For example, automaticity has often been treated as an all-or-none phenomenon, and traditional theories have held that automatic processes are independent of attention. Yet recent empirical data suggest that automatic processes are continuous, and furthermore are subject to attentional control. In this paper we present a model of attention which addresses these issues. Using a parallel distributed processing framework we propose that the attributes of automaticity depend upon the strength of a processing pathway and that strength increases with training. Using the Stroop effect as an example, we show how automatic processes are continuous and emerge gradually with practice. Specifically, we present a computational model of the Stroop task which simulates the time course of processing as well as the effects of learning. This was accomplished by combining the cascade mechanism described by McClelland (1979) with the back propagation learning algorithm (Rumelhart, Hinton, Williams, 1986). The model is able to simulate performance in the standard Stroop task, as well as aspects of performance in variants of this task which manipulate SOA, response set, and degree of practice. In the discussion we contrast our model with other models, and indicate how it relates to many of the central issues in the literature on attention, automaticity, and interference.

  2. An iterative expanding and shrinking process for processor allocation in mixed-parallel workflow scheduling.

    PubMed

    Huang, Kuo-Chan; Wu, Wei-Ya; Wang, Feng-Jian; Liu, Hsiao-Ching; Hung, Chun-Hao

    2016-01-01

    Parallel computation has been widely applied in a variety of large-scale scientific and engineering applications. Many studies indicate that exploiting both task and data parallelisms, i.e. mixed-parallel workflows, to solve large computational problems can achieve better efficiency than either pure task parallelism or pure data parallelism alone. Scheduling traditional workflows of pure task parallelism on parallel systems has long been known to be an NP-complete problem. Mixed-parallel workflow scheduling has to deal with an additional challenging issue of processor allocation. In this paper, we explore the processor allocation issue in scheduling mixed-parallel workflows of moldable tasks, called M-tasks, and propose an Iterative Allocation Expanding and Shrinking (IAES) approach. Compared to previous approaches, our IAES has two distinguishing features. The first is allocating more processors to the tasks on allocated critical paths for effectively reducing the makespan of workflow execution. The second is allowing the processor allocation of an M-task to shrink during the iterative procedure, resulting in a more flexible and effective process for finding better allocations. The proposed IAES approach has been evaluated with a series of simulation experiments and compared to several well-known previous methods, including CPR, CPA, MCPA, and MCPA2. The experimental results indicate that our IAES approach outperforms those previous methods significantly in most situations, especially when nodes of the same layer in a workflow might have unequal workloads.
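
    The expand/shrink idea can be illustrated on a toy set of independent moldable tasks with a simple power-law speedup model; this is an assumption for illustration only, since IAES itself operates on the critical paths of a workflow graph.

        def exec_time(work, procs, alpha=0.9):
            """Moldable-task model with imperfect speedup."""
            return work / procs ** alpha

        def allocate(works, total_procs):
            """Toy expand/shrink loop: expand the bottleneck task, then let
            lightly loaded tasks shrink if that further reduces the makespan."""
            procs = [1] * len(works)
            # Expand: hand each spare processor to the current bottleneck task.
            for _ in range(total_procs - len(works)):
                times = [exec_time(w, p) for w, p in zip(works, procs)]
                procs[times.index(max(times))] += 1
            # Shrink: move a processor from the least-loaded task if it helps.
            while True:
                times = [exec_time(w, p) for w, p in zip(works, procs)]
                hi, lo = times.index(max(times)), times.index(min(times))
                if procs[lo] <= 1:
                    break
                trial = procs[:]
                trial[lo] -= 1
                trial[hi] += 1
                if max(exec_time(w, p) for w, p in zip(works, trial)) >= max(times):
                    break
                procs = trial
            return procs

        print(allocate([100, 40, 10], total_procs=16))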

  3. Next Generation Parallelization Systems for Processing and Control of PDS Image Node Assets

    NASA Astrophysics Data System (ADS)

    Verma, R.

    2017-06-01

    We present next-generation parallelization tools to help the Planetary Data System (PDS) Imaging Node (IMG) better monitor, process, and control changes to nearly 650 million file assets and over a dozen machines on which they are referenced or stored.

  4. Cancer information and anxiety: applying the extended parallel process model.

    PubMed

    Evans, Ruth Ec; Beeken, Rebecca J; Steptoe, Andrew; Wardle, Jane

    2012-05-01

    There is concern that public education about testicular cancer (TC) may cause unnecessary anxiety. Psychological theory suggests that if threat (e.g., TC) information is accompanied by threat control strategies (e.g., testicular self-examination; TSE) anxiety is less likely. Male students (N=443) were randomized to either a TC or TC+TSE information group or a no information control group, and assessed at three time points. Anxiety levels did not differ between the groups and exposure to TC+TSE resulted in greater perceived message benefit, increased intention to self-examine and lower message denigration. This suggests TC information is not anxiogenic, but inclusion of TSE information may improve acceptance of disease awareness information.

  5. A new cascaded control strategy for paralleled line-interactive UPS with LCL filter

    NASA Astrophysics Data System (ADS)

    Zhang, X. Y.; Zhang, X. H.; Li, L.; Luo, F.; Zhang, Y. S.

    2016-08-01

    A traditional uninterruptible power supply (UPS) struggles to meet output voltage quality and grid-side power quality requirements at the same time, and usually suffers from disadvantages such as multi-stage conversion, a complex structure, or harmonic current pollution of the utility grid. A three-phase three-level paralleled line-interactive UPS with LCL filter is presented in this paper. It can achieve output voltage quality and grid-side power quality control simultaneously with only a single power-conversion stage, but designing the multi-objective control strategy is difficult. Based on a detailed analysis of the circuit structure and operation mechanism, a new cascaded control strategy for the power, voltage, and current is proposed. An outer current control loop based on resonant control theory is designed to ensure the grid-side power quality. An inner voltage control loop based on capacitance voltage and capacitance current feedback is designed to ensure the output voltage quality and avoid the resonance peak of the LCL filter. An improved repetitive controller is added to reduce the distortion of the output voltage. The setting of the controller parameters is discussed in detail. A 100 kVA UPS prototype is built and experiments under unbalanced resistive load and nonlinear load are carried out. Theoretical analysis and experimental results show the effectiveness of the control strategy. The paralleled line-interactive UPS not only maintains a constant three-phase balanced output voltage, but also provides comprehensive power quality management: three-phase balanced grid active power input, low THD of the output voltage and grid current, and reactive power compensation. The UPS is thus a grid-friendly load on the utility.

  6. Probabilistic Modeling of Tephra Dispersion using Parallel Processing

    NASA Astrophysics Data System (ADS)

    Hincks, T.; Bonadonna, C.; Connor, L.; Connor, C.; Sparks, S.

    2002-12-01

    Numerical models of tephra accumulation are important tools in assessing hazards of volcanic eruptions. Such tools can be used far in advance of future eruptions to calculate possible hazards as conditional probabilities. For example, given that a volcanic eruption occurs, what is the expected range of tephra deposition in a specific location or across a region? An empirical model is presented that uses physical characteristics (e.g., volume, column height, particle size distribution) of a volcanic eruption to calculate expected tephra accumulation at geographic locations distant from the vent. This model results from the combination of the Connor et al. (2001) and Bonadonna et al. (1998, 2002) numerical approaches and is based on application of the diffusion advection equation using a stratified atmosphere and particle fall velocities that account for particle shape, density, and variation in Reynolds number along the path of descent. Distribution of particles in the eruption column is a major source of uncertainty in estimation of tephra hazards. We adopt an approach in which several models of the volcanic column may be used and the impact of these various source-term models on the hazard estimates evaluated. Cast probabilistically, this model can use characteristics of historical eruptions, or data from analogous eruptions, to predict the expected tephra deposition from future eruptions. Application of such a model for computing a large number of events over a grid of many points is computationally expensive. In fact, the utility of the model for stochastic simulations of volcanic eruptions was limited by long execution time. To address this concern, we created a parallel version in C and MPI, a message passing interface, to run on a Beowulf cluster, a private network of reasonably high performance computers. We have discovered that grid or input decomposition and self-scheduling techniques lead to essentially linear speed-up in the code. This means that the code is readily

  7. Parallel Processing Response Times and Experimental Determination of the Stopping Rule

    PubMed

    Townsend; Colonius

    1997-12-01

    It was formerly demonstrated that virtually all reasonable exhaustive serial models, and a more constrained set of exhaustive parallel models, cannot predict critical effects associated with self-terminating models. The present investigation greatly generalizes the parallel class of models covered by similar "impossibility" theorems. Specifically, we prove that if an exhaustive parallel model is not super capacity, and if targets are processed at least as fast as non-targets, then it cannot predict such (self-terminating) effects. Such effects are ubiquitous in the experimental literature, offering strong confirmation for self-terminating processing. Copyright 1997 Academic Press.

  8. Performance of a VME-based parallel processing LIDAR data acquisition system (summary)

    SciTech Connect

    Moore, K.; Buttler, B.; Caffrey, M.; Soriano, C.

    1995-05-01

    It may be possible to make accurate real-time, autonomous, 2- and 3-dimensional wind measurements remotely with an elastic backscatter Light Detection and Ranging (LIDAR) system by incorporating digital parallel processing hardware into the data acquisition system. In this paper, we report the performance of a commercially available digital parallel processing system in implementing the maximum correlation technique for wind sensing using actual LIDAR data. Timing and numerical accuracy are benchmarked against a standard microprocessor implementation.

  9. Distinct lateral inhibitory circuits drive parallel processing of sensory information in the mammalian olfactory bulb

    PubMed Central

    Geramita, Matthew A; Burton, Shawn D; Urban, Nathan N

    2016-01-01

    Splitting sensory information into parallel pathways is a common strategy in sensory systems. Yet, how circuits in these parallel pathways are composed to maintain or even enhance the encoding of specific stimulus features is poorly understood. Here, we have investigated the parallel pathways formed by mitral and tufted cells of the olfactory system in mice and characterized the emergence of feature selectivity in these cell types via distinct lateral inhibitory circuits. We find differences in activity-dependent lateral inhibition between mitral and tufted cells that likely reflect newly described differences in the activation of deep and superficial granule cells. Simulations show that these circuit-level differences allow mitral and tufted cells to best discriminate odors in separate concentration ranges, indicating that segregating information about different ranges of stimulus intensity may be an important function of these parallel sensory pathways. DOI: http://dx.doi.org/10.7554/eLife.16039.001 PMID:27351103

  10. Distinct lateral inhibitory circuits drive parallel processing of sensory information in the mammalian olfactory bulb.

    PubMed

    Geramita, Matthew A; Burton, Shawn D; Urban, Nathan N

    2016-06-28

    Splitting sensory information into parallel pathways is a common strategy in sensory systems. Yet, how circuits in these parallel pathways are composed to maintain or even enhance the encoding of specific stimulus features is poorly understood. Here, we have investigated the parallel pathways formed by mitral and tufted cells of the olfactory system in mice and characterized the emergence of feature selectivity in these cell types via distinct lateral inhibitory circuits. We find differences in activity-dependent lateral inhibition between mitral and tufted cells that likely reflect newly described differences in the activation of deep and superficial granule cells. Simulations show that these circuit-level differences allow mitral and tufted cells to best discriminate odors in separate concentration ranges, indicating that segregating information about different ranges of stimulus intensity may be an important function of these parallel sensory pathways.

  11. Efficient parallel video processing techniques on GPU: from framework to implementation.

    PubMed

    Su, Huayou; Wen, Mei; Wu, Nan; Ren, Ju; Zhang, Chunyuan

    2014-01-01

    Through reorganizing the execution order and optimizing the data structure, we propose an efficient parallel framework for the H.264/AVC encoder based on a massively parallel architecture. We implemented the proposed framework with CUDA on NVIDIA's GPU. Not only are the compute-intensive components of the H.264 encoder parallelized, but the control-intensive components, such as CAVLC and the deblocking filter, are also realized effectively. In addition, we propose serial optimization methods, including multiresolution multiwindow motion estimation, a multilevel parallel strategy to enhance the parallelism of intracoding as much as possible, component-based parallel CAVLC, and a direction-priority deblocking filter. More than 96% of the workload of the H.264 encoder is offloaded to the GPU. Experimental results show that the parallel implementation achieves a 20× speedup over the serial program and satisfies the requirement of real-time HD encoding at 30 fps. The loss of PSNR ranges from 0.14 dB to 0.77 dB at the same bitrate. Through analysis of the kernels, we found that the speedup ratios of the compute-intensive algorithms are proportional to the computational power of the GPU. However, the performance of the control-intensive parts (CAVLC) is strongly tied to memory bandwidth, which offers an insight for new architecture designs.

  12. Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation

    PubMed Central

    Su, Huayou; Wen, Mei; Wu, Nan; Ren, Ju; Zhang, Chunyuan

    2014-01-01

    Through reorganizing the execution order and optimizing the data structure, we propose an efficient parallel framework for the H.264/AVC encoder based on a massively parallel architecture. We implemented the proposed framework with CUDA on NVIDIA's GPU. Not only are the compute-intensive components of the H.264 encoder parallelized, but the control-intensive components, such as CAVLC and the deblocking filter, are also realized effectively. In addition, we propose serial optimization methods, including multiresolution multiwindow motion estimation, a multilevel parallel strategy to enhance the parallelism of intracoding as much as possible, component-based parallel CAVLC, and a direction-priority deblocking filter. More than 96% of the workload of the H.264 encoder is offloaded to the GPU. Experimental results show that the parallel implementation achieves a 20× speedup over the serial program and satisfies the requirement of real-time HD encoding at 30 fps. The loss of PSNR ranges from 0.14 dB to 0.77 dB at the same bitrate. Through analysis of the kernels, we found that the speedup ratios of the compute-intensive algorithms are proportional to the computational power of the GPU. However, the performance of the control-intensive parts (CAVLC) is strongly tied to memory bandwidth, which offers an insight for new architecture designs. PMID:24757432

  13. Dynamic CT perfusion image data compression for efficient parallel processing.

    PubMed

    Barros, Renan Sales; Olabarriaga, Silvia Delgado; Borst, Jordi; van Walderveen, Marianne A A; Posthuma, Jorrit S; Streekstra, Geert J; van Herk, Marcel; Majoie, Charles B L M; Marquering, Henk A

    2016-03-01

    The increasing size of medical imaging data, in particular time series such as CT perfusion (CTP), requires new and fast approaches to deliver timely results for acute care. Cloud architectures based on graphics processing units (GPUs) can provide the processing capacity required for delivering fast results. However, the size of CTP datasets makes transfers to cloud infrastructures time-consuming and therefore not suitable in acute situations. To reduce this transfer time, this work proposes a fast and lossless compression algorithm for CTP data. The algorithm exploits redundancies in the temporal dimension and keeps random read-only access to the image elements directly from the compressed data on the GPU. To the best of our knowledge, this is the first work to present a GPU-ready method for medical image compression with random access to the image elements from the compressed data.
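
    The temporal redundancy the algorithm exploits can be illustrated with plain delta coding along the time axis; reconstructing one voxel touches only its own time column. The NumPy sketch below is only a conceptual stand-in: the paper's scheme is lossless too but additionally keeps GPU-friendly random read access directly from the compressed representation.

        import numpy as np

        def compress(series):
            """series: (T, H, W) CT perfusion volume over time. Store the first
            frame plus per-frame differences; in CTP most voxels change little
            between frames, so the deltas are small and compress well."""
            return series[0], np.diff(series, axis=0)

        def read_voxel(base, deltas, t, y, x):
            """Reconstruct one element without decompressing the full series."""
            return base[y, x] + deltas[:t, y, x].sum()

        series = np.random.poisson(100, (30, 4, 4)).astype(np.int32)
        base, deltas = compress(series)
        assert read_voxel(base, deltas, 7, 2, 3) == series[7, 2, 3]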

  14. Parallel computing for simultaneous iterative tomographic imaging by graphics processing units

    NASA Astrophysics Data System (ADS)

    Bello-Maldonado, Pedro D.; López, Ricardo; Rogers, Colleen; Jin, Yuanwei; Lu, Enyue

    2016-05-01

    In this paper, we address the problem of accelerating inversion algorithms for nonlinear acoustic tomographic imaging by parallel computing on graphics processing units (GPUs). Nonlinear inversion algorithms for tomographic imaging often rely on iterative algorithms for solving an inverse problem and are thus computationally intensive. We study the simultaneous iterative reconstruction technique (SIRT) for the multiple-input-multiple-output (MIMO) tomography algorithm, which enables parallel computation over the grid points as well as parallel execution of multiple source excitations. Using graphics processing units (GPUs) and the Compute Unified Device Architecture (CUDA) programming model, an overall improvement of 26.33x was achieved when combining both approaches compared with sequential algorithms. Furthermore, we propose an adaptive iterative relaxation factor and the use of non-uniform weights to improve the overall convergence of the algorithm. Using these techniques, fast computations can be performed in parallel without loss of image quality during the reconstruction process.
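
    The SIRT update itself is compact. A NumPy sketch for a dense toy system follows; the paper implements the same update per grid point in CUDA, with the adaptive relaxation factor and non-uniform weights mentioned above, so everything here is a simplified stand-in.

        import numpy as np

        def sirt(A, b, iters=500, relax=1.0):
            """Simultaneous iterative reconstruction:
            x += relax * C * (A.T @ (R * (b - A @ x))), where R and C are the
            inverse row and column sums of A. All unknowns update in parallel."""
            R = 1.0 / np.maximum(A.sum(axis=1), 1e-12)   # row normalization
            C = 1.0 / np.maximum(A.sum(axis=0), 1e-12)   # column normalization
            x = np.zeros(A.shape[1])
            for _ in range(iters):
                x += relax * C * (A.T @ (R * (b - A @ x)))
            return x

        A = np.abs(np.random.rand(40, 20))   # toy nonnegative projection matrix
        x_true = np.random.rand(20)
        x = sirt(A, A @ x_true)
        print(np.linalg.norm(x - x_true))    # residual shrinks with more sweeps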

  15. Parallel Distributed Processing: Implications for Cognition and Development

    DTIC Science & Technology

    1988-07-11

    ...cope with these kinds of compensation relations between variables. The so-called balance-beam task, first used by Inhelder and Piaget (1958), is illustrated in Figure 4: the child is shown a balance beam with pegs. Piaget stressed the continuity of the accommodation process, in spite of the overtly stage-like character of development, though he never gave a...

  16. Harmony Theory: A Mathematical Framework for Stochastic Parallel Processing.

    DTIC Science & Technology

    1983-12-01

    The research reported here was conducted under Contract N00014-79-C-0323, NR 667-437, at the Center for Human Information Processing. ...perspective, Hofstadter (1983) is pursuing a related approach to perceptual grouping; his ideas have been inspirational for my work (Hofstadter, 1979)...

  17. A distributed parallel genetic algorithm of placement strategy for virtual machines deployment on cloud platform.

    PubMed

    Dong, Yu-Shuang; Xu, Gao-Chao; Fu, Xiao-Dong

    2014-01-01

    The cloud platform provides various services to users. More and more cloud centers provide infrastructure as the main way of operating. To improve the utilization rate of the cloud center and to decrease the operating cost, the cloud center provides services according to requirements of users by sharding the resources with virtualization. Considering both QoS for users and cost saving for cloud computing providers, we try to maximize performance and minimize energy cost as well. In this paper, we propose a distributed parallel genetic algorithm (DPGA) of placement strategy for virtual machines deployment on cloud platform. In the first stage, it executes the genetic algorithm in parallel and in a distributed fashion on several selected physical hosts. Then it continues to execute the genetic algorithm of the second stage with solutions obtained from the first stage as the initial population. The solution calculated by the genetic algorithm of the second stage is the optimal one of the proposed approach. The experimental results show that the proposed placement strategy of VM deployment can ensure QoS for users and it is more effective and more energy efficient than other placement strategies on the cloud platform.
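
    A hedged toy version of the two-stage pattern follows, using multiprocessing for stage one and a simple continuous test function as a stand-in for the VM-placement objective; all names and GA settings below are illustrative assumptions, not the paper's algorithm.

        import random
        from multiprocessing import Pool

        def fitness(x):
            # Hypothetical stand-in for the placement objective; lower is better.
            return sum((xi - 0.3) ** 2 for xi in x)

        def ga(seed_pop, gens=100, mut=0.1):
            """Minimal GA: average crossover, Gaussian mutation, replace worst."""
            pop = list(seed_pop)
            for _ in range(gens):
                a, b = random.sample(pop, 2)
                child = [(ai + bi) / 2 + random.gauss(0, mut) for ai, bi in zip(a, b)]
                pop.sort(key=fitness)
                pop[-1] = child
            return min(pop, key=fitness)

        def random_pop(n=20, dim=4):
            return [[random.random() for _ in range(dim)] for _ in range(n)]

        if __name__ == "__main__":
            with Pool(4) as pool:  # stage 1: independent GAs on several "hosts"
                stage1 = pool.map(ga, [random_pop() for _ in range(4)])
            best = ga(stage1 + random_pop(16))  # stage 2: seeded with stage-1 winners
            print(fitness(best))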

  18. A Distributed Parallel Genetic Algorithm of Placement Strategy for Virtual Machines Deployment on Cloud Platform

    PubMed Central

    Dong, Yu-Shuang; Xu, Gao-Chao; Fu, Xiao-Dong

    2014-01-01

    The cloud platform provides various services to users. More and more cloud centers provide infrastructure as the main way of operating. To improve the utilization rate of the cloud center and to decrease the operating cost, the cloud center provides services according to requirements of users by sharding the resources with virtualization. Considering both QoS for users and cost saving for cloud computing providers, we try to maximize performance and minimize energy cost as well. In this paper, we propose a distributed parallel genetic algorithm (DPGA) of placement strategy for virtual machines deployment on cloud platform. In the first stage, it executes the genetic algorithm in parallel and in a distributed fashion on several selected physical hosts. Then it continues to execute the genetic algorithm of the second stage with solutions obtained from the first stage as the initial population. The solution calculated by the genetic algorithm of the second stage is the optimal one of the proposed approach. The experimental results show that the proposed placement strategy of VM deployment can ensure QoS for users and it is more effective and more energy efficient than other placement strategies on the cloud platform. PMID:25097872

  19. Connectionism, parallel constraint satisfaction processes, and gestalt principles: (re) introducing cognitive dynamics to social psychology.

    PubMed

    Read, S J; Vanman, E J; Miller, L C

    1997-01-01

    We argue that recent work in connectionist modeling, in particular the parallel constraint satisfaction processes that are central to many of these models, has great importance for understanding issues of both historical and current concern for social psychologists. We first provide a brief description of connectionist modeling, with particular emphasis on parallel constraint satisfaction processes. Second, we examine the tremendous similarities between parallel constraint satisfaction processes and the Gestalt principles that were the foundation for much of modern social psychology. We propose that parallel constraint satisfaction processes provide a computational implementation of the principles of Gestalt psychology that were central to the work of such seminal social psychologists as Asch, Festinger, Heider, and Lewin. Third, we then describe how parallel constraint satisfaction processes have been applied to three areas that were key to the beginnings of modern social psychology and remain central today: impression formation and causal reasoning, cognitive consistency (balance and cognitive dissonance), and goal-directed behavior. We conclude by discussing implications of parallel constraint satisfaction principles for a number of broader issues in social psychology, such as the dynamics of social thought and the integration of social information within the narrow time frame of social interaction.
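
    The computational core of such models is easy to sketch: units connected by weighted positive and negative constraints repeatedly update in parallel until the network settles into a maximally consistent state. A minimal NumPy illustration follows; the weights and the "belief" interpretation are invented for the example.

        import numpy as np

        # Symmetric weights encode constraints among four "beliefs":
        # positive weights are mutually supporting, negative ones compete.
        W = np.array([[ 0.0,  0.6, -0.5,  0.0],
                      [ 0.6,  0.0, -0.5,  0.2],
                      [-0.5, -0.5,  0.0,  0.7],
                      [ 0.0,  0.2,  0.7,  0.0]])
        a = np.zeros(4)
        a[0] = 1.0                       # external evidence clamps unit 0
        for _ in range(50):              # all units update in parallel
            a = np.clip(a + 0.2 * (W @ a), -1, 1)
            a[0] = 1.0                   # keep the evidence clamped
        print(np.round(a, 2))            # the network settles into a consistent pattern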

  20. Parallel Processing Method for Airborne Laser Scanning Data Using a PC Cluster and a Virtual Grid.

    PubMed

    Han, Soo Hee; Heo, Joon; Sohn, Hong Gyoo; Yu, Kiyun

    2009-01-01

    In this study, a parallel processing method using a PC cluster and a virtual grid is proposed for the fast processing of enormous amounts of airborne laser scanning (ALS) data. The method creates a raster digital surface model (DSM) by interpolating point data with inverse distance weighting (IDW), and produces a digital terrain model (DTM) by local minimum filtering of the DSM. To make a consistent comparison of performance between sequential and parallel processing approaches, the means of dealing with boundary data and of selecting interpolation centers were controlled for each processing node in the parallel approach. To test the speedup, efficiency and linearity of the proposed algorithm, actual ALS data up to 134 million points were processed with a PC cluster consisting of one master node and eight slave nodes. The results showed that parallel processing provides better performance when the computational overhead, the number of processors, and the data size become large. It was verified that the proposed algorithm is a linear time operation and that the products obtained by parallel processing are identical to those produced by sequential processing.
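
    The two processing steps are straightforward to sketch sequentially; the parallel version simply farms tiles of the virtual grid out to cluster nodes. A NumPy/SciPy illustration follows, using brute-force IDW over all points for brevity (an assumption of this sketch; production code would restrict each cell to nearby points).

        import numpy as np
        from scipy.ndimage import minimum_filter

        def idw_dsm(points, grid_x, grid_y, power=2.0, eps=1e-9):
            """Interpolate scattered (x, y, z) points onto a raster DSM."""
            dsm = np.empty((len(grid_y), len(grid_x)))
            for i, gy in enumerate(grid_y):
                for j, gx in enumerate(grid_x):
                    d = np.hypot(points[:, 0] - gx, points[:, 1] - gy) + eps
                    w = 1.0 / d ** power
                    dsm[i, j] = (w * points[:, 2]).sum() / w.sum()
            return dsm

        pts = np.random.rand(500, 3)            # toy ALS point cloud
        gx = gy = np.linspace(0, 1, 64)
        dsm = idw_dsm(pts, gx, gy)
        dtm = minimum_filter(dsm, size=5)       # local minimum filter -> terrain model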

  1. Parallel Processing Method for Airborne Laser Scanning Data Using a PC Cluster and a Virtual Grid

    PubMed Central

    Han, Soo Hee; Heo, Joon; Sohn, Hong Gyoo; Yu, Kiyun

    2009-01-01

    In this study, a parallel processing method using a PC cluster and a virtual grid is proposed for the fast processing of enormous amounts of airborne laser scanning (ALS) data. The method creates a raster digital surface model (DSM) by interpolating point data with inverse distance weighting (IDW), and produces a digital terrain model (DTM) by local minimum filtering of the DSM. To make a consistent comparison of performance between sequential and parallel processing approaches, the means of dealing with boundary data and of selecting interpolation centers were controlled for each processing node in the parallel approach. To test the speedup, efficiency and linearity of the proposed algorithm, actual ALS data up to 134 million points were processed with a PC cluster consisting of one master node and eight slave nodes. The results showed that parallel processing provides better performance when the computational overhead, the number of processors, and the data size become large. It was verified that the proposed algorithm is a linear time operation and that the products obtained by parallel processing are identical to those produced by sequential processing. PMID:22574032

  2. Sculpting in cyberspace: Parallel processing the development of new software

    NASA Technical Reports Server (NTRS)

    Fisher, Rob

    1993-01-01

    Stimulating creativity in problem solving, particularly where software development is involved, is applicable to many disciplines. Metaphorical thinking keeps the problem in focus but in a different light, jarring people out of their mental ruts and sparking fresh insights. It forces the mind to stretch to find patterns between dissimilar concepts, in the hope of discovering unusual ideas in odd associations (Technology Review January 1993, p. 37). With a background in Engineering and Visual Design from MIT, I have for the past 30 years pursued a career as a sculptor of interdisciplinary monumental artworks that bridge the fields of science, engineering and art. Since 1979, I have pioneered the application of computer simulation to solve the complex problems associated with these projects. A recent project for the roof of the Carnegie Science Center in Pittsburgh made particular use of the metaphoric creativity technique described above. The problem-solving process led to the creation of hybrid software combining scientific, architectural and engineering visualization techniques. David Steich, a Doctoral Candidate in Electrical Engineering at Penn State, was commissioned to develop special software that enabled me to create innovative free-form sculpture. This paper explores the process of inventing the software through a detailed analysis of the interaction between an artist and a computer programmer.

  3. Parallel conjugate gradient: effects of ordering strategies, programming paradigms, and architectural platforms

    SciTech Connect

    Oliker, L.; Li, X.; Heber, G.; Biswas, R.

    2000-05-01

    The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multithreaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
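
    A textbook CG loop makes the point concrete: each iteration performs exactly one SPMV, so the ordering and partitioning of that product dominate performance. A SciPy sketch on a random symmetric positive definite system (illustrative only, not the paper's benchmark code):

        import numpy as np
        from scipy.sparse import identity, random as sprandom

        def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
            x = np.zeros_like(b)
            r = b - A @ x
            p = r.copy()
            rs = r @ r
            for _ in range(max_iter):
                Ap = A @ p              # the SPMV: the dominant cost per iteration
                alpha = rs / (p @ Ap)
                x += alpha * p
                r -= alpha * Ap
                rs_new = r @ r
                if np.sqrt(rs_new) < tol:
                    break
                p = r + (rs_new / rs) * p
                rs = rs_new
            return x

        M = sprandom(200, 200, density=0.05, format="csr")
        A = (M @ M.T + 10 * identity(200)).tocsr()   # symmetric positive definite
        b = np.ones(200)
        x = conjugate_gradient(A, b)
        print(np.linalg.norm(A @ x - b))             # ~0 at convergence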

  4. Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Heber, Gerd; Biswas, Rupak

    2000-01-01

    The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.

  5. Parallel Processing in the Corticogeniculate Pathway of the Macaque Monkey

    PubMed Central

    Briggs, Farran; Usrey, W. Martin

    2009-01-01

    Summary Although corticothalamic feedback is ubiquitous across species and modalities, its role in sensory processing is unclear. This study provides the first detailed description of the visual physiology of corticogeniculate neurons in the primate. Using electrical stimulation to identify corticogeniculate neurons, we distinguish three groups of neurons with response properties that closely resemble those of neurons in the magnocellular, parvocellular and koniocellular layers of their target structure, the lateral geniculate nucleus (LGN) of the thalamus. Our results indicate that corticogeniculate feedback in the primate is stream-specific and provide strong evidence in support of the view that corticothalamic feedback can influence the transmission of sensory information from the thalamus to the cortex in a stream-selective manner. PMID:19376073

  6. Non-parallel processing: Gendered attrition in academic computer science

    NASA Astrophysics Data System (ADS)

    Cohoon, Joanne Louise Mcgrath

    2000-10-01

    This dissertation addresses the issue of disproportionate female attrition from computer science as an instance of gender segregation in higher education. By adopting a theoretical framework from organizational sociology, it demonstrates that the characteristics and processes of computer science departments strongly influence female retention. The empirical data identifies conditions under which women are retained in the computer science major at comparable rates to men. The research for this dissertation began with interviews of students, faculty, and chairpersons from five computer science departments. These exploratory interviews led to a survey of faculty and chairpersons at computer science and biology departments in Virginia. The data from these surveys are used in comparisons of the computer science and biology disciplines, and for statistical analyses that identify which departmental characteristics promote equal attrition for male and female undergraduates in computer science. This three-pronged methodological approach of interviews, discipline comparisons, and statistical analyses shows that departmental variation in gendered attrition rates can be explained largely by access to opportunity, relative numbers, and other characteristics of the learning environment. Using these concepts, this research identifies nine factors that affect the differential attrition of women from CS departments. These factors are: (1) The gender composition of enrolled students and faculty; (2) Faculty turnover; (3) Institutional support for the department; (4) Preferential attitudes toward female students; (5) Mentoring and supervising by faculty; (6) The local job market, starting salaries, and competitiveness of graduates; (7) Emphasis on teaching; and (8) Joint efforts for student success. This work contributes to our understanding of the gender segregation process in higher education. In addition, it contributes information that can lead to effective solutions for an

  7. Distributed Computing for Signal Processing: Modeling of Asynchronous Parallel Computation. Appendix F. Studies in Parallel Image Processing.

    DTIC Science & Technology

    1984-08-01

    …represented in the image, the greater the potential benefits to be derived from SIMD implementation of the process. This section begins with an… [Remainder of this record is unrecoverable OCR debris from DTIC report AD-A167 317 (Purdue Univ., School of Electrical Engineering); Ph.D. thesis by Gie-Hing Lin.]

  8. Climate systems modeling on massively parallel processing computers at Lawrence Livermore National Laboratory

    SciTech Connect

    Wehner, W.F.; Mirin, A.A.; Bolstad, J.H.

    1996-09-01

    A comprehensive climate system model is under development at Lawrence Livermore National Laboratory. The basis for this model is a consistent coupling of multiple complex subsystem models, each describing a major component of the Earth's climate. Among these are general circulation models of the atmosphere and ocean, a dynamic and thermodynamic sea ice model, and models of the chemical processes occurring in the air, sea water, and near-surface land. The computational resources necessary to carry out simulations at adequate spatial resolutions for durations of climatic time scales exceed those currently available. Distributed memory massively parallel processing (MPP) computers promise to affordably scale to the computational rates required by directing large numbers of relatively inexpensive processors onto a single problem. We have developed a suite of routines designed to exploit current generation MPP architectures via domain and functional decomposition strategies. These message passing techniques have been implemented in each of the component models and in their coupling interfaces. Production runs of the atmospheric and oceanic components performed on the National Environmental Supercomputing Center (NESC) Cray T3D are described.

  9. The role of parallelism in the real-time processing of anaphora

    PubMed Central

    Poirier, Josée; Walenski, Matthew; Shapiro, Lewis P.

    2012-01-01

    Parallelism effects refer to the facilitated processing of a target structure when it follows a similar, parallel structure. In coordination, a parallelism-related conjunction triggers the expectation that a second conjunct with the same structure as the first conjunct should occur. It has been proposed that parallelism effects reflect the use of the first structure as a template that guides the processing of the second. In this study, we examined the role of parallelism in real-time anaphora resolution by charting activation patterns in coordinated constructions containing anaphora, Verb-Phrase Ellipsis (VPE) and Noun-Phrase Traces (NP-traces). Specifically, we hypothesised that an expectation of parallelism would incite the parser to assume a structure similar to the first conjunct in the second, anaphora-containing conjunct. The speculation of a similar structure would result in early postulation of covert anaphora. Experiment 1 confirms that following a parallelism-related conjunction, first-conjunct material is activated in the second conjunct. Experiment 2 reveals that an NP-trace in the second conjunct is posited immediately where licensed, which is earlier than previously reported in the literature. In light of our findings, we propose an intricate relation between structural expectations and anaphor resolution. PMID:23741080

  10. Parallel processing implementation for the coupled transport of photons and electrons using OpenMP

    NASA Astrophysics Data System (ADS)

    Doerner, Edgardo

    2016-05-01

    In this work the use of OpenMP to implement the parallel processing of the Monte Carlo (MC) simulation of the coupled transport for photons and electrons is presented. This implementation was carried out using a modified EGSnrc platform which enables the use of the Microsoft Visual Studio 2013 (VS2013) environment, together with the developing tools available in the Intel Parallel Studio XE 2015 (XE2015). The performance study of this new implementation was carried out in a desktop PC with a multi-core CPU, taking as a reference the performance of the original platform. The results were satisfactory, both in terms of scalability and parallelization efficiency.
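
    As a rough illustration of the pattern (the paper itself parallelizes EGSnrc with OpenMP), a Monte Carlo workload splits naturally across workers because particle histories are independent. The Python sketch below shows the pattern only and is an assumption, not the authors' code.

```python
# Embarrassingly parallel Monte Carlo estimate of pi, split across processes.
import multiprocessing as mp
import random

def count_hits(n):
    rng = random.Random()                  # independently seeded per process
    return sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0 for _ in range(n))

if __name__ == "__main__":
    total, workers = 1_000_000, 4
    with mp.Pool(workers) as pool:
        hits = sum(pool.map(count_hits, [total // workers] * workers))
    print(4 * hits / total)                # ≈ 3.1416
```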

  11. Adapting high-level language programs for parallel processing using data flow

    NASA Technical Reports Server (NTRS)

    Standley, Hilda M.

    1988-01-01

    EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.
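
    The abstract does not give EASY-FLOW syntax, but the large-grained dataflow idea can be hinted at with futures: independent subprogram calls start as soon as their inputs exist, and a dataflow graph can be read off the dependencies. The sketch below is purely illustrative; the function names are hypothetical.

```python
# Coarse-grained dataflow expressed with futures (an analogy, not EASY-FLOW).
from concurrent.futures import ThreadPoolExecutor

def load(name):   return f"data({name})"
def transform(d): return f"t({d})"
def merge(a, b):  return f"m({a},{b})"

with ThreadPoolExecutor() as pool:
    a = pool.submit(load, "A")               # independent graph nodes run in parallel
    b = pool.submit(load, "B")
    ta = pool.submit(transform, a.result())  # fires once 'a' is available
    tb = pool.submit(transform, b.result())
    out = merge(ta.result(), tb.result())    # join node of the dataflow graph
print(out)
```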

  12. Parallel Numerical Solution Process of a Two Dimensional Time Dependent Nonlinear Partial Differential Equation

    NASA Astrophysics Data System (ADS)

    Martin, I.; Tirado, F.; Vazquez, L.

    We present a process to achieve the solution of the two dimensional nonlinear Schrödinger equation using a multigrid technique on a distributed memory machine. Some features of the multigrid technique, such as its good convergence and parallel properties, are explained in this paper. These properties make the multigrid method the optimal one to solve the systems of equations arising at each time step from an implicit numerical scheme. We give some experimental results about the parallel numerical simulation of this equation on a message passing parallel machine.

  13. Seventh SIAM Conference on Parallel Processing for Scientific Computing. Final technical report

    SciTech Connect

    1996-10-01

    The Seventh SIAM Conference on Parallel Processing for Scientific Computing was held in downtown San Francisco on the dates above. More than 400 people attended the meeting. This SIAM conference is, in this organizer's opinion, the premier forum for developments in parallel numerical algorithms, a field that has seen very lively and fruitful developments over the past decade, and whose health is still robust. Other, related areas, most notably parallel software and applications, are also well represented. The strong contributed sessions and minisymposia at the meeting attest to these claims.

  14. Parallel computer processing and modeling: applications for the ICU

    NASA Astrophysics Data System (ADS)

    Baxter, Grant; Pranger, L. Alex; Draghic, Nicole; Sims, Nathaniel M.; Wiesmann, William P.

    2003-07-01

    Current patient monitoring procedures in hospital intensive care units (ICUs) generate vast quantities of medical data, much of which is considered extraneous and not evaluated. Although sophisticated monitors to analyze individual types of patient data are routinely used in the hospital setting, this equipment lacks high order signal analysis tools for detecting long-term trends and correlations between different signals within a patient data set. Without the ability to continuously analyze disjoint sets of patient data, it is difficult to detect slow-forming complications. As a result, the early onset of conditions such as pneumonia or sepsis may not be apparent until the advanced stages. We report here on the development of a distributed software architecture test bed and software medical models to analyze both asynchronous and continuous patient data in real time. Hardware and software have been developed to support a multi-node distributed computer cluster capable of amassing data from multiple patient monitors and projecting near and long-term outcomes based upon the application of physiologic models to the incoming patient data stream. One computer acts as a central coordinating node; additional computers accommodate processing needs. A simple, non-clinical model for sepsis detection was implemented on the system for demonstration purposes. This work shows exceptional promise as a highly effective means to rapidly predict and thereby mitigate the effect of nosocomial infections.

  15. Integration Of Parallel Image Processing With Symbolic And Neural Computations For Imagery Exploitation

    NASA Astrophysics Data System (ADS)

    Roman, Evelyn

    1990-02-01

    In this paper we discuss the work being done at Itek combining parallel, symbolic, and neural methodologies at different stages of processing for imagery exploitation. We describe a prototype system we have been implementing combining real-time parallel image processing on an 8-stage parallel image-processing engine (PIPE) computer with expert system software such as our Multi-Sensor Exploitation Assistant system on the Symbolics LISP machine and with neural computations on the PIPE and on its host IBM AT for target recognition and change detection applications. We also provide a summary of basic neural concepts, and show the commonality between neural nets and related mathematics, artificial intelligence, and traditional image processing concepts. This provides us with numerous choices for the implementation of constraint satisfaction, transformational invariance, inference and representational mechanisms, and software lifecycle engineering methodologies in the different computational layers. Our future work may include optical processing as well, for a real-time capability complementing the PIPE's.

  16. Nonlinear vector eigen-solver and parallel reassembly processing for structural nonlinear vibration

    NASA Astrophysics Data System (ADS)

    Xue, D. Y.; Mei, Chuh

    1993-12-01

    In the frequency domain solution of large amplitude nonlinear vibration, two operations are computationally costly. They are: (1) the iterative eigen-solution and (2) the iterative nonlinear matrix reassembly. This study introduces a nonlinear eigen-solver which greatly speeds up the solution procedure by using a combination of vector iteration and nonlinear matrix updating. A feature of this new method is that it avoids repeatedly using a costly eigen-solver or equation solver. This solution procedure has also been implemented with parallel processing to further speed up the computation. Parallel nonlinear matrix reassembly is the main interest in this parallel processing. The Force macro package is used in the parallel program on a CRAY-2S supercomputer.

  17. CRBLASTER: A Fast Parallel-Processing Program for Cosmic Ray Rejection in Space-Based Observations

    NASA Astrophysics Data System (ADS)

    Mighell, K.

    Many astronomical image analysis tasks are based on algorithms that can be described as being embarrassingly parallel, where the analysis of one subimage generally does not affect the analysis of another subimage. Yet few parallel-processing astrophysical image-analysis programs exist that can easily take full advantage of today's fast multi-core servers costing a few thousand dollars. One reason for the shortage of state-of-the-art parallel-processing astrophysical image-analysis codes is that the writing of parallel codes has been perceived to be difficult. I describe a new fast parallel-processing image-analysis program called CRBLASTER which does cosmic ray rejection using van Dokkum's L.A.Cosmic algorithm. CRBLASTER is written in C using the industry standard Message Passing Interface library. Processing a single 800 x 800 Hubble Space Telescope Wide-Field Planetary Camera 2 (WFPC2) image takes 1.9 seconds using 4 processors on an Apple Xserve with two dual-core 3.0-GHz Intel Xeons; the efficiency of the program running with the 4 cores is 82%. The code has been designed to be used as a software framework for the easy development of parallel-processing image-analysis programs using embarrassingly parallel algorithms; all that needs to be done is to replace the core image processing task (in this case the C function that performs the L.A.Cosmic algorithm) with an alternative image analysis task based on a single-processor algorithm. I describe the design and implementation of the program and then discuss how it could possibly be used to quickly perform time-critical analysis applications such as those involved with space surveillance, or complex calibration tasks as part of the pipeline processing of images from large focal plane arrays.
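
    The framework idea is easy to convey in outline: subimages are independent, so the per-tile kernel is the only thing a new application must supply. The sketch below is a Python stand-in (CRBLASTER itself is C/MPI), and the `task` kernel shown is a hypothetical placeholder, not L.A.Cosmic.

```python
# Embarrassingly parallel subimage framework: swap `task` to swap the science.
import multiprocessing as mp
import numpy as np

def task(tile):
    """Stand-in for the per-subimage kernel (e.g., cosmic ray rejection)."""
    return np.clip(tile, 0, np.median(tile) + 3 * tile.std())

def run_parallel(image, n_workers=4):
    tiles = np.array_split(image, n_workers, axis=0)  # subimages are independent
    with mp.Pool(n_workers) as pool:
        return np.concatenate(pool.map(task, tiles), axis=0)

if __name__ == "__main__":
    img = np.random.rand(800, 800)
    cleaned = run_parallel(img)
    print(cleaned.shape)   # (800, 800)
```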

  18. Automatic Mapping Of Large Signal Processing Systems To A Parallel Machine

    NASA Astrophysics Data System (ADS)

    Printz, Harry; Kung, H. T.; Mummert, Todd; Scherer, Paul M.

    1989-12-01

    Since the spring of 1988, Carnegie Mellon University and the Naval Air Development Center have been working together to implement several large signal processing systems on the Warp parallel computer. In the course of this work, we have developed a prototype of a software tool that can automatically and efficiently map signal processing systems to distributed-memory parallel machines, such as Warp. We have used this tool to produce Warp implementations of small test systems. The automatically generated programs compare favorably with hand-crafted code. We believe this tool will be a significant aid in the creation of high speed signal processing systems. We assume that signal processing systems have the following characteristics: • They can be described by directed graphs of computational tasks; these graphs may contain thousands of task vertices. • Some tasks can be parallelized in a systolic or data-partitioned manner, while others cannot be parallelized at all. • The side effects of each task, if any, are limited to changes in local variables. • Each task has a data-independent execution time bound, which may be expressed as a function of the way it is parallelized, and the number of processors it is mapped to. In this paper we describe techniques to automatically map such systems to Warp-like parallel machines. We identify and address key issues in gracefully combining different parallel programming styles, in allocating processor, memory and communication bandwidth, and in generating and scheduling efficient parallel code. When iWarp, the VLSI version of the Warp machine, becomes available in 1990, we will extend this tool to generate efficient code for very large applications, which may require as many as 3000 iWarp processors, with an aggregate peak performance of 60 gigaflops.
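
    A toy version of the mapping problem can be stated concisely: given per-task execution time bounds and a dependence graph, assign tasks to processors. The greedy list scheduler below is a hedged stand-in for the tool's actual algorithm, which the abstract does not spell out; task names, times, and the priority rule are invented, and it prints the resulting assignment.

```python
# Greedy list scheduling of a task graph onto processors by time bounds.
import heapq

tasks = {"a": 3.0, "b": 2.0, "c": 2.5, "d": 1.0}   # task -> execution time bound
deps = {"c": ["a"], "d": ["a", "b"]}                # c after a; d after a and b

def schedule(tasks, deps, n_proc=2):
    finish = {}
    order = sorted(tasks, key=lambda t: -tasks[t])  # longest-task-first priority
    procs = [(0.0, p) for p in range(n_proc)]       # (time processor frees up, id)
    heapq.heapify(procs)
    for t in order:
        ready = max([finish[d] for d in deps.get(t, [])], default=0.0)
        free, p = heapq.heappop(procs)              # least-loaded processor
        start = max(free, ready)                    # respect dependences
        finish[t] = start + tasks[t]
        heapq.heappush(procs, (finish[t], p))
        print(f"{t}: proc {p}, start {start:.1f}, finish {finish[t]:.1f}")
    return finish

schedule(tasks, deps)
```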

  19. High Speed Publication Subscription Brokering Through Highly Parallel Processing on Field Programmable Gate Array (FPGA)

    DTIC Science & Technology

    2010-01-01

    [Extraction fragment from DTIC record AFRL-RI-RS-TR-2010-29, Final Technical Report, January 2010: High Speed Publication Subscription Brokering Through Highly Parallel Processing on Field Programmable Gate Array (FPGA); reporting period September 2007 – August 2009. The recoverable abstract text describes the hardware required for a single node and notes that Unix-style newlines are used.]

  20. Design and implementation of the parallel processing system of multi-channel polarization images

    NASA Astrophysics Data System (ADS)

    Li, Zhi-yong; Huang, Qin-chao

    2013-08-01

    Compared with traditional optical intensity image processing, polarization image processing has two main problems: the amount of data is larger, and the processing tasks are more complex. To resolve these problems, a parallel processing system for multi-channel polarization images is designed using a multi-DSP technique. It contains a communication control unit (CCU) and a data processing array (DPA). The CCU controls communications inside and outside the system; its logic is implemented in an FPGA chip. The DPA is made up of four Digital Signal Processor (DSP) chips, interlinked by a loose coupling method, and implements processing tasks including image registration and image synthesis with parallel processing methods. The polarization image parallel processing model is designed on multiple levels, covering the system task, the algorithm, and the operation; its program is written in assembly language. In the experiment, the polarization image resolution is 782x582 pixels and the pixel data length is 12 bits. After receiving three channels of polarization images simultaneously, the system performs parallel tasks to acquire the target polarization characteristics. Experimental results show that the system has good real-time performance and reliability: image registration takes 293.343 ms with a registration accuracy of 0.5 pixel, and image synthesis takes 3.199 ms.

  1. Low Activity Waste Feed Process Control Strategy

    SciTech Connect

    STAEHR, T.W.

    2000-06-14

    The primary purpose of this document is to describe the overall process control strategy for monitoring and controlling the functions associated with the Phase 1B high-level waste feed delivery. This document provides the basis for process monitoring and control functions and requirements needed throughout the double-shell tank system during Phase 1 high-level waste feed delivery. This document is intended to be used by (1) the developers of the future Process Control Plan and (2) the developers of the monitoring and control system.

  2. Reliable and Efficient Parallel Processing Algorithms and Architectures for Modern Signal Processing. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Liu, Kuojuey Ray

    1990-01-01

    Least-squares (LS) estimations and spectral decomposition algorithms constitute the heart of modern signal processing and communication problems. Implementations of recursive LS and spectral decomposition algorithms onto parallel processing architectures such as systolic arrays with efficient fault-tolerant schemes are the major concerns of this dissertation. There are four major results in this dissertation. First, we propose the systolic block Householder transformation with application to the recursive least-squares minimization. It is successfully implemented on a systolic array with a two-level pipelined implementation at the vector level as well as at the word level. Second, a real-time algorithm-based concurrent error detection scheme based on the residual method is proposed for the QRD RLS systolic array. The fault diagnosis, order degraded reconfiguration, and performance analysis are also considered. Third, the dynamic range, stability, error detection capability under finite-precision implementation, order degraded performance, and residual estimation under faulty situations for the QRD RLS systolic array are studied in detail. Finally, we propose the use of multi-phase systolic algorithms for spectral decomposition based on the QR algorithm. Two systolic architectures, one based on a triangular array and another based on a rectangular array, are presented for the multi-phase operations with fault-tolerant considerations. Eigenvectors and singular vectors can be easily obtained by using the multi-phase operations. Performance issues are also considered.

  3. The remote sensing image segmentation mean shift algorithm parallel processing based on MapReduce

    NASA Astrophysics Data System (ADS)

    Chen, Xi; Zhou, Liqing

    2015-12-01

    With the development of satellite remote sensing technology and the growth of remote sensing image data, traditional remote sensing image segmentation technology cannot meet the massive processing and storage requirements. This article applies cloud computing and parallel computing technology to the remote sensing image segmentation process, building a cheap and efficient computer cluster that parallelizes the MeanShift segmentation algorithm under the MapReduce model. This not only ensures the quality of remote sensing image segmentation but also improves segmentation speed, better meeting real-time requirements. The MapReduce-based parallel MeanShift segmentation algorithm is thus of practical significance and value.
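
    A hedged miniature of the approach: the map step runs mean-shift iterations on independent spatial tiles, and the reduce step stitches the results back together. This Python sketch stands in for a real Hadoop MapReduce job; the bandwidth, iteration count, and tile count are assumptions.

```python
# MapReduce-style mean shift over spatial tiles of an image.
from multiprocessing import Pool
import numpy as np

def shift_tile(tile, bandwidth=0.1, iters=5):
    """Map step: mean-shift iterations on one tile's pixel values."""
    pts = tile.reshape(-1, tile.shape[-1]).astype(float)
    modes = pts.copy()
    for _ in range(iters):
        for i, p in enumerate(modes):
            w = np.exp(-np.sum((pts - p) ** 2, axis=1) / (2 * bandwidth ** 2))
            modes[i] = (w[:, None] * pts).sum(0) / w.sum()
    return modes.reshape(tile.shape)

if __name__ == "__main__":
    image = np.random.rand(64, 64, 3)          # stand-in for a remote sensing image
    tiles = np.array_split(image, 4, axis=0)   # spatial partition (the "split")
    with Pool(4) as pool:                      # map: tiles processed in parallel
        shifted = pool.map(shift_tile, tiles)
    result = np.concatenate(shifted, axis=0)   # reduce: stitch tiles back together
    print(result.shape)                        # (64, 64, 3)
```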

  4. Visual analysis of inter-process communication for large-scale parallel computing.

    PubMed

    Muelder, Chris; Gygi, Francois; Ma, Kwan-Liu

    2009-01-01

    In serial computation, program profiling is often helpful for optimization of key sections of code. When moving to parallel computation, not only does the code execution need to be considered but also communication between the different processes which can induce delays that are detrimental to performance. As the number of processes increases, so does the impact of the communication delays on performance. For large-scale parallel applications, it is critical to understand how the communication impacts performance in order to make the code more efficient. There are several tools available for visualizing program execution and communications on parallel systems. These tools generally provide either views which statistically summarize the entire program execution or process-centric views. However, process-centric visualizations do not scale well as the number of processes gets very large. In particular, the most common representation of parallel processes is a Gantt chart with a row for each process. As the number of processes increases, these charts can become difficult to work with and can even exceed screen resolution. We propose a new visualization approach that affords more scalability and then demonstrate it on systems running with up to 16,384 processes.

  5. Parallel optical image processing with image-logic algebra and a polynomial approach.

    PubMed

    Bhattacharya, P

    1994-09-10

    An interesting relationship between an optical parallel-processing single-instruction-multiple-data generic language, called image-logic algebra, and a polynomial approach for processing binary images by electronic computers is shown. Using only two basic operations of the ILA, one can reformulate a number of algorithms developed earlier in the polynomial approach into algorithms in the ILA environment. Thus a large number of new algorithms for parallel optical processing of binary images can be developed in the ILA environment that are fast and efficient.

  6. New optical scheme for parallel processing of 1D gray images

    NASA Astrophysics Data System (ADS)

    Huang, Guoliang; Jin, Guofan; Wu, Minxian; Yan, Yingbai

    1994-06-01

    Based on mathematical morphology and a digital umbra shading and shadowing algorithm, a new scheme for realizing the fundamental morphological operations on one dimensional gray images is proposed. The mathematical formulas for the parallel processing of 1D gray images are summarized, and some important conclusions on extending morphological processing from binary images to gray images are obtained. The advantages of this scheme are its simple structure, high resolution in gray level, and good parallelism. It can greatly raise the speed of morphological processing of gray images and obtain more accurate results.
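
    For reference, the underlying 1D gray-scale operations can be written down directly (a software analogue only; the paper's contribution is the optical scheme). The signal values and the flat structuring element below are illustrative.

```python
# 1D gray-scale dilation and erosion over a structuring element's support.
import numpy as np

def grey_dilate(signal, se):
    """Gray dilation: max of (signal + structuring element) over each window."""
    k = len(se)
    pad = np.pad(signal, (k // 2,), mode="edge")
    return np.array([np.max(pad[i:i + k] + se) for i in range(len(signal))])

def grey_erode(signal, se):
    """Gray erosion: min of (signal - structuring element) over each window."""
    k = len(se)
    pad = np.pad(signal, (k // 2,), mode="edge")
    return np.array([np.min(pad[i:i + k] - se) for i in range(len(signal))])

signal = np.array([1, 3, 2, 5, 4, 2, 1], dtype=float)
se = np.zeros(3)                   # flat structuring element of width 3
print(grey_dilate(signal, se))     # local maxima: [3 3 5 5 5 4 2]
print(grey_erode(signal, se))      # local minima: [1 1 2 2 2 1 1]
```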

  7. Parallel processing for large-scale nonlinear control experiments in economics

    SciTech Connect

    Amman, H.M.; Kendrick, D.A. (Dept. of Economics)

    1991-01-01

    In general, the econometric models relevant for purposes of evaluating economic policy contain a large number of nonlinear equations. Therefore, in applying optimal control techniques, computational difficulties are encountered. This paper presents the most common algorithm for solving nonlinear control problems and investigates the degree to which vector processing and parallel processing can facilitate optimal control experiments.

  8. SURVEY OF HIGHLY PARALLEL INFORMATION PROCESSING TECHNOLOGY AND SYSTEMS. PHASE I OF AN IMPLICATIONS STUDY,

    DTIC Science & Technology

    The purpose of this report is to present the results of a survey of highly parallel information processing technology and systems. Completion of this study will require a survey of naval systems to determine which can benefit from the technology.

  9. Parallel Processing of the Target Language during Source Language Comprehension in Interpreting

    ERIC Educational Resources Information Center

    Dong, Yanping; Lin, Jiexuan

    2013-01-01

    Two experiments were conducted to test the hypothesis that the parallel processing of the target language (TL) during source language (SL) comprehension in interpreting may be influenced by two factors: (i) link strength from SL to TL, and (ii) the interpreter's cognitive resources supplement to TL processing during SL comprehension. The…

  11. Parallels between a Collaborative Research Process and the Middle Level Philosophy

    ERIC Educational Resources Information Center

    Dever, Robin; Ross, Diane; Miller, Jennifer; White, Paula; Jones, Karen

    2014-01-01

    The characteristics of the middle level philosophy as described in This We Believe closely parallel the collaborative research process. The journey of one research team is described in relationship to these characteristics. The collaborative process includes strengths such as professional relationships, professional development, courageous…

  12. The hypercluster: A parallel processing test-bed architecture for computational mechanics applications

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.

    1987-01-01

    The development of numerical methods and software tools for parallel processors can be aided through the use of a hardware test-bed. The test-bed architecture must be flexible enough to support investigations into architecture-algorithm interactions. One way to implement a test-bed is to use a commercial parallel processor. Unfortunately, most commercial parallel processors are fixed in their interconnection and/or processor architecture. In this paper, we describe a modified n-cube architecture, called the hypercluster, which is a superset of many other processor and interconnection architectures. The hypercluster is intended to support research into parallel processing of computational fluid and structural mechanics problems which may require a number of different architectural configurations. An example of how a typical partial differential equation solution algorithm maps onto the hypercluster is given.

  13. Solution-processed parallel tandem polymer solar cells using silver nanowires as intermediate electrode.

    PubMed

    Guo, Fei; Kubis, Peter; Li, Ning; Przybilla, Thomas; Matt, Gebhard; Stubhan, Tobias; Ameri, Tayebeh; Butz, Benjamin; Spiecker, Erdmann; Forberich, Karen; Brabec, Christoph J

    2014-12-23

    Tandem architecture is the most relevant concept to overcome the efficiency limit of single-junction photovoltaic solar cells. Series-connected tandem polymer solar cells (PSCs) have advanced rapidly during the past decade. In contrast, the development of parallel-connected tandem cells is lagging far behind due to the substantial challenge of establishing an efficient interlayer with high transparency and high in-plane conductivity. Here, we report all-solution fabrication of parallel tandem PSCs using silver nanowires as the intermediate charge-collecting electrode. Through a rational interface design, a robust interlayer is established, enabling the efficient extraction and transport of electrons from the subcells. The resulting parallel tandem cells exhibit high fill factors of ∼60% and enhanced current densities which are identical to the sum of the current densities of the subcells. These results suggest that the solution-processed parallel tandem configuration provides an alternative avenue toward high performance photovoltaic devices.

  14. Rapid Pattern Recognition of Three Dimensional Objects Using Parallel Processing Within a Hierarchy of Hexagonal Grids

    NASA Astrophysics Data System (ADS)

    Tang, Haojun

    1995-01-01

    This thesis describes using parallel processing within a hierarchy of hexagonal grids to achieve rapid recognition of patterns. A seven-pixel basic hexagonal neighborhood, a sixty-one-pixel superneighborhood and pyramids of a 2-to-4 area ratio are employed. The hexagonal network achieves improved accuracy over the square network for object boundaries. The hexagonal grid, with less directional sensitivity, is a better approximation of the human vision grid, is more suited to natural scenes than the square grid, and avoids the 4-neighbor/8-neighbor problem. Parallel processing in image analysis saves considerable time versus the traditional line-by-line method. Hexagonal parallel processing combines the optimum hexagonal geometry with the parallel structure. Our work surveys the behavior and internal properties needed to construct the image at different levels of the hexagonal pixel grid in a parallel computation scheme. A computer code has been developed to detect edges of digital images of real objects taken with a CCD camera within a hexagonal grid at any level. The algorithm uses the differences between the local gray level and those of its six neighbors, and is able to determine the boundary of a digital image in parallel. A series of algorithms and techniques has also been built up to handle edge linking, feature extraction, etc. The digital images obtained from the improved CRS digital image processing system are a good approximation to the images which would be obtained with a real physical hexagonal grid. We envision that our work in this little-known area will have important applications in real-time machine vision. A parallel two-layer hexagonal-array retina has been designed to do pattern recognition using simple operations such as differencing, ratioing and thresholding, which may occur in the human retina and other biological vision systems.

  15. Advancing the extended parallel process model through the inclusion of response cost measures.

    PubMed

    Rintamaki, Lance S; Yang, Z Janet

    2014-01-01

    This study advances the Extended Parallel Process Model through the inclusion of response cost measures, which are drawbacks associated with a proposed response to a health threat. A sample of 502 college students completed a questionnaire on perceptions regarding sexually transmitted infections and condom use after reading information from the Centers for Disease Control and Prevention on the health risks of sexually transmitted infections and the utility of latex condoms in preventing sexually transmitted infection transmission. The questionnaire included standard Extended Parallel Process Model assessments of perceived threat and efficacy, as well as questions pertaining to response costs associated with condom use. Results from hierarchical ordinary least squares regression demonstrated how the addition of response cost measures improved the predictive power of the Extended Parallel Process Model, supporting the inclusion of this variable in the model.

  16. Toward a model framework of generalized parallel componential processing of multi-symbol numbers.

    PubMed

    Huber, Stefan; Cornelsen, Sonja; Moeller, Korbinian; Nuerk, Hans-Christoph

    2015-05-01

    In this article, we propose and evaluate a new model framework of parallel componential multi-symbol number processing, generalizing the idea of parallel componential processing of multi-digit numbers to the case of negative numbers by considering the polarity signs similar to single digits. In a first step, we evaluated this account by defining and investigating a sign-decade compatibility effect for the comparison of positive and negative numbers, which extends the unit-decade compatibility effect in 2-digit number processing. Then, we evaluated whether the model is capable of accounting for previous findings in negative number processing. In a magnitude comparison task, in which participants had to single out the larger of 2 integers, we observed a reliable sign-decade compatibility effect with prolonged reaction times for incompatible (e.g., -97 vs. +53; in which the number with the larger decade digit has the smaller, i.e., negative polarity sign) as compared with sign-decade compatible number pairs (e.g., -53 vs. +97). Moreover, an analysis of participants' eye fixation behavior corroborated our model of parallel componential processing of multi-symbol numbers. These results are discussed in light of concurrent theoretical notions about negative number processing. On the basis of the present results, we propose a generalized integrated model framework of parallel componential multi-symbol processing.

  17. [Multi-DSP parallel processing technique of hyperspectral RX anomaly detection].

    PubMed

    Guo, Wen-Ji; Zeng, Xiao-Ru; Zhao, Bao-Wei; Ming, Xing; Zhang, Gui-Feng; Lü, Qun-Bo

    2014-05-01

    To satisfy the requirements of high speed, real-time operation, and mass data storage for RX anomaly detection on hyperspectral image data, the present paper proposes a multi-DSP parallel processing system for hyperspectral images based on the CPCI Express standard bus architecture. The hardware topology combines the tight coupling of four DSPs sharing a data bus and memory unit with interconnection through Link ports. On this platform, by assigning a parallel processing task to each DSP in light of the RX anomaly detection algorithm and the 3D structure of the spectral image data, a four-DSP parallel processing technique is proposed that computes the mean and covariance matrix of the whole image by spatially partitioning it. Experimental results show that, with equivalent detection performance, the proposed four-DSP technique runs 4 times faster than a single-DSP implementation, overcoming the constraint that the DSP's internal storage capacity places on processing such large images while meeting the real-time processing demands of the spectral data.
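
    The partitioning idea translates directly: each spatial partition contributes partial sums toward the global mean and covariance, after which the Mahalanobis-style RX score is computed per pixel. The numpy sketch below mirrors that strategy serially and is an assumption, not the DSP implementation; cube size and band count are invented.

```python
# RX anomaly detection with statistics accumulated over spatial partitions.
import numpy as np

def partial_stats(block):
    """Per-partition sums used to form the global mean and covariance."""
    pixels = block.reshape(-1, block.shape[-1]).astype(float)
    return pixels.sum(0), pixels.T @ pixels, len(pixels)

def rx_scores(cube, n_parts=4):
    blocks = np.array_split(cube, n_parts, axis=0)   # spatial partitioning
    s, ss, n = map(sum, zip(*[partial_stats(b) for b in blocks]))
    mean = s / n
    cov = ss / n - np.outer(mean, mean)              # global covariance
    inv = np.linalg.inv(cov)
    d = cube.reshape(-1, cube.shape[-1]) - mean
    return np.einsum("ij,jk,ik->i", d, inv, d).reshape(cube.shape[:2])

cube = np.random.rand(64, 64, 8)   # stand-in hyperspectral cube (8 bands)
scores = rx_scores(cube)           # large values flag spectral anomalies
print(scores.shape)                # (64, 64)
```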

  18. Managing internode data communications for an uninitialized process in a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Parker, Jeffrey J; Ratterman, Joseph D; Smith, Brian E

    2014-05-20

    A parallel computer includes nodes, each having main memory and a messaging unit (MU). Each MU includes computer memory, which in turn includes, MU message buffers. Each MU message buffer is associated with an uninitialized process on the compute node. In the parallel computer, managing internode data communications for an uninitialized process includes: receiving, by an MU of a compute node, one or more data communications messages in an MU message buffer associated with an uninitialized process on the compute node; determining, by an application agent, that the MU message buffer associated with the uninitialized process is full prior to initialization of the uninitialized process; establishing, by the application agent, a temporary message buffer for the uninitialized process in main computer memory; and moving, by the application agent, data communications messages from the MU message buffer associated with the uninitialized process to the temporary message buffer in main computer memory.
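
    In outline, the claimed behavior is a spill mechanism: when the fixed-capacity MU buffer for a not-yet-initialized process fills, the application agent moves its contents to a temporary buffer in main memory. The sketch below is a deliberately simplified Python model of that logic; the capacity and message names are invented.

```python
# Simplified model: spilling a full MU message buffer to a temporary buffer.
from collections import deque

MU_CAPACITY = 4
mu_buffer = deque()   # models the MU message buffer (capacity enforced below)
temp_buffer = []      # application agent's temporary buffer in main memory

def receive(message):
    if len(mu_buffer) >= MU_CAPACITY:   # buffer full before process initialization
        temp_buffer.extend(mu_buffer)   # agent drains the MU buffer...
        mu_buffer.clear()               # ...into the main-memory temporary buffer
    mu_buffer.append(message)

for i in range(10):
    receive(f"msg-{i}")
# On initialization, the process would consume temp_buffer first, then mu_buffer.
print(len(temp_buffer), len(mu_buffer))  # 8 2
```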

  19. Application of integration algorithms in a parallel processing environment for the simulation of jet engines

    SciTech Connect

    Krosel, S.M.; Milner, E.J.

    1982-01-01

    Illustrates the application of predictor-corrector integration algorithms developed for the digital parallel processing environment. The algorithms are implemented and evaluated through the use of a software simulator which provides an approximate representation of the parallel processing hardware. Test cases which focus on the use of the algorithms are presented and a specific application using a linear model of a turbofan engine is considered. Results are presented showing the effects of integration step size and the number of processors on simulation accuracy. Real-time performance, inter-processor communication and algorithm startup are also discussed. 10 references.

  20. Application of integration algorithms in a parallel processing environment for the simulation of jet engines

    NASA Technical Reports Server (NTRS)

    Krosel, S. M.; Milner, E. J.

    1982-01-01

    The application of predictor-corrector integration algorithms developed for the digital parallel processing environment is investigated. The algorithms are implemented and evaluated through the use of a software simulator which provides an approximate representation of the parallel processing hardware. Test cases which focus on the use of the algorithms are presented and a specific application using a linear model of a turbofan engine is considered. Results are presented showing the effects of integration step size and the number of processors on simulation accuracy. Real time performance, interprocessor communication, and algorithm startup are also discussed.
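
    The two records above describe the same study; the serial kernel being distributed looks roughly like the following second-order predict-evaluate-correct step. This is an illustrative sketch for dx/dt = -x with an invented step size, not the report's algorithm.

```python
# Adams-Bashforth-2 predictor with trapezoidal (Adams-Moulton-2) corrector.
def pec_step(f, t, x, f_prev, h):
    fx = f(t, x)
    xp = x + h * (1.5 * fx - 0.5 * f_prev)   # predict from current and past slopes
    xc = x + 0.5 * h * (fx + f(t + h, xp))   # correct using the predicted point
    return xc, fx

f = lambda t, x: -x
h, t, x = 0.1, 0.0, 1.0
f_prev = f(t - h, x)          # crude startup value for the first step
for _ in range(10):
    x, f_prev = pec_step(f, t, x, f_prev, h)
    t += h
print(x)                      # ≈ exp(-1) ≈ 0.368
```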

  1. Next generation Purex modeling by way of parallel processing with high performance computers

    SciTech Connect

    DeMuth, S.F.

    1993-08-01

    The Plutonium and Uranium Extraction (Purex) process is the predominant method used worldwide for solvent extraction in reprocessing spent nuclear fuels. Proper flowsheet design has a significant impact on the character of the process waste. Past Purex flowsheet modeling has been based on equilibrium conditions. It can be shown for the Purex process that optimum separation does not necessarily occur at equilibrium conditions. The next generation Purex flowsheet models should incorporate the fundamental diffusion and chemical kinetic processes required to study time-dependent behavior. Use of parallel processing with high-performance computers will permit transient multistage and multispecies design calculations based on mass transfer with simultaneous chemical reaction models. This paper presents an applicable mass transfer with chemical reaction model for the Purex system and presents a parallel processing solution methodology.

  2. American History--Part 2. Teacher's Guide [and Student Workbook]. Revised. Parallel Alternative Strategies for Students (PASS).

    ERIC Educational Resources Information Center

    Fresen, Sue; Logan, Joshua; McCarron, Kathleen

    This teacher's and student's guide is part of a series of content-centered packages of supplemental reading, activities, and methods adapted for students who have disabilities. Parallel Alternative Strategies for Students (PASS) materials are designed to help these students succeed in regular education content courses. The content in PASS differs…

  3. American History--Part 1. Teacher's Guide [and Student Workbook]. Revised. Parallel Alternative Strategies for Students (PASS).

    ERIC Educational Resources Information Center

    Fresen, Sue; Logan, Joshua; McCarron, Kathleen

    This teacher's and student's guide is part of a series of content-centered packages of supplemental reading, activities, and methods adapted for students who have disabilities. Parallel Alternative Strategies for Students (PASS) materials are designed to help these students succeed in regular education content courses. The content in PASS differs…

  4. Evaluation of parallel reduction strategies for fusion of sensory information from a robot team

    NASA Astrophysics Data System (ADS)

    Lyons, Damian M.; Leroy, Joseph

    2015-05-01

    The advantage of using a team of robots to search or to map an area is that by navigating the robots to different parts of the area, searching or mapping can be completed more quickly. A crucial aspect of the problem is the combination, or fusion, of data from team members to generate an integrated model of the search/mapping area. In prior work we looked at the issue of removing mutual robot views from an integrated point cloud model built from laser and stereo sensors, leading to a cleaner and more accurate model. This paper addresses a further challenge: even with mutual views removed, the stereo data from a team of robots can quickly swamp a WiFi connection. This paper proposes and evaluates a communication and fusion approach based on the parallel reduction operation, where data is combined in a series of steps over increasing subsets of the team. Eight different strategies for selecting the subsets are evaluated for bandwidth requirements using three robot missions, each carried out with teams of four Pioneer 3-AT robots. Our results indicate that selecting groups to combine based on similar pose but distant location yields the best results.
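
    The parallel-reduction shape common to all eight strategies can be sketched abstractly: data sets are merged pairwise over roughly log2(N) rounds, with the strategies differing in how the partners are chosen. The pairing rule and merge function below are placeholders, not any of the paper's eight strategies.

```python
# Tree-shaped parallel reduction: pairwise merges over log2(N) rounds.
def tree_reduce(clouds, merge):
    while len(clouds) > 1:
        nxt = [merge(clouds[i], clouds[i + 1])
               for i in range(0, len(clouds) - 1, 2)]
        if len(clouds) % 2:              # odd element carries over to next round
            nxt.append(clouds[-1])
        clouds = nxt
    return clouds[0]

merge = lambda a, b: a + b               # e.g., concatenating point lists
print(tree_reduce([[1], [2], [3], [4]], merge))  # [1, 2, 3, 4]
```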

  5. Processing communications events in parallel active messaging interface by awakening thread from wait state

    DOEpatents

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-10-22

    Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
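
    The mechanism reads like a textbook condition-variable pattern: the advance function parks its thread when no events are pending and is awakened when one arrives for its context. The following is a minimal Python analogy, not the PAMI implementation; event names are invented.

```python
# Condition-variable model of "wait state" and "awaken" for an advance function.
import threading, time

events, cond = [], threading.Condition()

def advance():
    with cond:
        while not events:           # no actionable events: enter wait state
            cond.wait()             # sleeps instead of spin-polling
        evt = events.pop(0)
    print(f"processed {evt}")

def post(evt):
    with cond:
        events.append(evt)          # a data communications event arrives...
        cond.notify()               # ...awakening the waiting advance thread

t = threading.Thread(target=advance)
t.start()
time.sleep(0.1)                     # advance is now parked in cond.wait()
post("recv-complete")
t.join()
```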

  6. Computation of the Density Matrix in Electronic Structure Theory in Parallel on Multiple Graphics Processing Units.

    PubMed

    Cawkwell, M J; Wood, M A; Niklasson, Anders M N; Mniszewski, S M

    2014-12-09

    The algorithm developed in Cawkwell, M. J. et al. J. Chem. Theory Comput. 2012, 8, 4094 for the computation of the density matrix in electronic structure theory on a graphics processing unit (GPU) using the second-order spectral projection (SP2) method [Niklasson, A. M. N. Phys. Rev. B 2002, 66, 155115] has been efficiently parallelized over multiple GPUs on a single compute node. The parallel implementation provides significant speed-ups with respect to the single GPU version with no loss of accuracy. The performance and accuracy of the parallel GPU-based algorithm is compared with the performance of the SP2 algorithm and traditional matrix diagonalization methods on a multicore central processing unit (CPU).
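
    For intuition, the SP2 recursion itself is short: map the Hamiltonian's spectrum into [0, 1], then repeatedly apply X^2 or 2X - X^2 to steer the trace toward the electron count, with the matrix-matrix multiply being the cost that the paper distributes across GPUs. The numpy sketch below is a serial stand-in with an invented test matrix.

```python
# Serial SP2 purification sketch; in the paper the GEMMs run on multiple GPUs.
import numpy as np

def sp2_density(H, n_occ, iters=50):
    """Purify H into an idempotent density matrix with trace n_occ (SP2)."""
    r = np.abs(H).sum(axis=1) - np.abs(np.diag(H))   # Gershgorin radii
    e_min = float(np.min(np.diag(H) - r))            # spectral bound estimates,
    e_max = float(np.max(np.diag(H) + r))            # avoiding diagonalization
    X = (e_max * np.eye(len(H)) - H) / (e_max - e_min)   # spectrum mapped to [0, 1]
    for _ in range(iters):
        X2 = X @ X                                   # dominant cost: dense GEMM
        X = X2 if np.trace(X2) >= n_occ else 2 * X - X2  # steer trace toward n_occ
    return X

H = np.diag([-2.0, -1.0, 0.5, 1.5])   # invented 4-state test Hamiltonian
D = sp2_density(H, n_occ=2)
print(np.round(np.diag(D), 3))        # ≈ [1, 1, 0, 0]: two occupied states
```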

  7. Parallel and serial processes in the human oculomotor system: bimodal integration and express saccades.

    PubMed

    Nozawa, G; Reuter-Lorenz, P A; Hughes, H C

    1994-01-01

    Saccadic reaction times (SRTs) were analyzed in the context of stochastic models of information processing (e.g., Townsend and Ashby 1983) to reveal the processing architecture(s) underlying integrative interactions between visual and auditory inputs and the mechanisms of express saccades. The results support the following conclusions. Bimodal (visual+auditory) targets are processed in parallel, and facilitate SRT to an extent that exceeds levels attainable by probability summation. This strongly implies neural summation between elements responding to spatially aligned visual and auditory inputs in the human oculomotor system. Second, express saccades are produced within a separable processing stage that is organized in series with that responsible for intersensory integration. A model is developed that implements this combination of parallel and serial processing. The activity in parallel input channels is summed within a sensory stage which is organized in series with a pre-motor and motor stage. The time course of each subprocess is considered a random variable, and different experimental manipulations can selectively influence different stages. Parallels between the model and physiological data are explored.

  8. Rapid attentional selection processes operate independently and in parallel for multiple targets.

    PubMed

    Grubert, Anna; Eimer, Martin

    2016-12-01

    The question whether multiple objects are selected serially or in parallel remains contentious. Previous studies employed the N2pc component as a marker of attentional selection to show that multiple selection processes can be activated concurrently. The present study demonstrates that the concurrent selection of multiple targets reflects genuinely parallel processing that is unaffected by whether or when an additional selection process is elicited simultaneously for another target. Experiment 1 showed that N2pc components triggered during the selection of a colour-defined target were not modulated by the presence versus absence of a second target that appeared in close temporal proximity. Experiment 2 revealed that the same rapid parallel selection processes were elicited regardless of whether two targets appeared simultaneously or in two successive displays. Results show that rapid attentional selection processes within the first 200ms after stimulus onset can be triggered in parallel for multiple objects in the visual field. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. Parallel Digital Watermarking Process on Ultrasound Medical Images in Multicores Environment

    PubMed Central

    Khor, Hui Liang; Liew, Siau-Chuin; Zain, Jasni Mohd.

    2016-01-01

    With the advancement of communication network technology, digital medical images can be transmitted to healthcare professionals via internal networks or public networks (e.g., the Internet), but this also exposes the transmitted images to security threats, such as tampering or insertion of false data, which may cause inaccurate diagnosis and treatment. Medical image distortion is not to be tolerated for diagnosis purposes; thus digital watermarking of medical images is introduced. So far most watermarking research has been done on single-frame medical images, which is impractical in the real environment. In this paper, digital watermarking of multiframe medical images is proposed. In order to speed up multiframe watermarking processing time, parallel watermark processing of medical images utilizing multicore technology is introduced. Experimental results show that the elapsed time of parallel watermark processing is much shorter than that of sequential processing. PMID:26981111

  10. Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

    NASA Astrophysics Data System (ADS)

    Calafiura, Paolo; Leggett, Charles; Seuster, Rolf; Tsulaia, Vakhtang; Van Gemmeren, Peter

    2015-12-01

    AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write mechanisms, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service) which allows the running of AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP in the diversity of ATLAS event processing workloads on various computing resources: Grid, opportunistic resources and HPC.
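
    Of the scheduling strategies named above, the Shared Event Queue is the simplest to caricature: forked workers pull event identifiers from a single queue until a stop sentinel arrives. The Python sketch below is an analogy under that assumption, not AthenaMP code.

```python
# Shared-event-queue scheduling: forked workers drain one queue of event IDs.
import multiprocessing as mp

def worker(event_queue, results):
    while (evt := event_queue.get()) is not None:
        results.put((evt, f"processed-{evt}"))   # stand-in for event processing

if __name__ == "__main__":
    events, results = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(events, results)) for _ in range(4)]
    for p in procs:
        p.start()                                # fork: pages shared copy-on-write
    for evt in range(100):
        events.put(evt)                          # enqueue event identifiers
    for _ in procs:
        events.put(None)                         # one stop sentinel per worker
    done = [results.get() for _ in range(100)]
    for p in procs:
        p.join()
    print(len(done))                             # 100 events processed
```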

  11. Parallel processing of face and house stimuli by V1 and specialized visual areas: a magnetoencephalographic (MEG) study

    PubMed Central

    Shigihara, Yoshihito; Zeki, Semir

    2014-01-01

    We used easily distinguishable stimuli of faces and houses constituted from straight lines, with the aim of learning whether they activate V1 on the one hand, and the specialized areas that are critical for the processing of faces and houses on the other, with similar latencies. Eighteen subjects took part in the experiment, which used magnetoencephalography (MEG) coupled to analytical methods to detect the time course of the earliest responses which these stimuli provoke in these cortical areas. Both categories of stimuli activated V1 and areas of the visual cortex outside it at around 40 ms after stimulus onset, and the amplitude elicited by face stimuli was significantly larger than that elicited by house stimuli. These results suggest that “low-level” and “high-level” features of form stimuli are processed in parallel by V1 and visual areas outside it. Taken together with our previous results on the processing of simple geometric forms (Shigihara and Zeki, 2013; Shigihara and Zeki, 2014), the present ones reinforce the conclusion that parallel processing is an important component in the strategy used by the brain to process and construct forms. PMID:25426050

  12. The Large Laboratory Course: Organize It to Parallel Industrial Process Development.

    ERIC Educational Resources Information Center

    Eckert, Roger E.; Ybarra, Robert M.

    1988-01-01

    Describes a senior level chemical engineering course at Purdue University that parallels an industrial process development department. Stresses the course organization, manager-engineer contract, evaluation of students, course evaluation, and gives examples of course improvements made during the course. (CW)

  13. Parallel processing in the honeybee olfactory pathway: structure, function, and evolution.

    PubMed

    Rössler, Wolfgang; Brill, Martin F

    2013-11-01

    Animals face highly complex and dynamic olfactory stimuli in their natural environments, which require fast and reliable olfactory processing. Parallel processing is a common principle of sensory systems supporting this task, for example in visual and auditory systems, but its role in olfaction remained unclear. Studies in the honeybee focused on a dual olfactory pathway. Two sets of projection neurons connect glomeruli in two antennal-lobe hemilobes via lateral and medial tracts in opposite sequence with the mushroom bodies and lateral horn. Comparative studies suggest that this dual-tract circuit represents a unique adaptation in Hymenoptera. Imaging studies indicate that glomeruli in both hemilobes receive redundant sensory input. Recent simultaneous multi-unit recordings from projection neurons of both tracts revealed widely overlapping response profiles strongly indicating parallel olfactory processing. Whereas lateral-tract neurons respond fast with broad (generalistic) profiles, medial-tract neurons are odorant specific and respond slower. In analogy to "what-" and "where" subsystems in visual pathways, this suggests two parallel olfactory subsystems providing "what-" (quality) and "when" (temporal) information. Temporal response properties may support across-tract coincidence coding in higher centers. Parallel olfactory processing likely enhances perception of complex odorant mixtures to decode the diverse and dynamic olfactory world of a social insect.

  14. An Inconvenient Truth: An Application of the Extended Parallel Process Model

    ERIC Educational Resources Information Center

    Goodall, Catherine E.; Roberto, Anthony J.

    2008-01-01

    "An Inconvenient Truth" is an Academy Award-winning documentary about global warming presented by Al Gore. This documentary is appropriate for a lesson on fear appeals and the extended parallel process model (EPPM). The EPPM is concerned with the effects of perceived threat and efficacy on behavior change. Perceived threat is composed of an…

  15. Tracking the Continuity of Language Comprehension: Computer Mouse Trajectories Suggest Parallel Syntactic Processing

    ERIC Educational Resources Information Center

    Farmer, Thomas A.; Cargill, Sarah A.; Hindy, Nicholas C.; Dale, Rick; Spivey, Michael J.

    2007-01-01

    Although several theories of online syntactic processing assume the parallel activation of multiple syntactic representations, evidence supporting simultaneous activation has been inconclusive. Here, the continuous and non-ballistic properties of computer mouse movements are exploited, by recording their streaming x, y coordinates to procure…

  16. User's guide to the Parallel Processing Extension of the Prognosis Model

    Treesearch

    Nicholas L. Crookston; Albert R. Stage

    1991-01-01

    The Parallel Processing Extension (PPE) of the Prognosis Model was designed to analyze responses of numerous stands to coordinated management and pest impacts that operate at the landscape level of forests. Vegetation-related resource supply analysis can be readily performed for a thousand or more sample stands for projections 400 years into the future. Capabilities...

  17. Recent development for the ITS code system: Parallel processing and visualization

    SciTech Connect

    Fan, W.C.; Turner, C.D.; Halbleib, J.A. Sr.; Kensek, R.P.

    1996-03-01

    A brief overview is given for two software developments related to the ITS code system. These developments provide parallel processing and visualization capabilities and thus allow users to perform ITS calculations more efficiently. Timing results and a graphical example are presented to demonstrate these capabilities.

  18. Parallel Distributed Processing at 25: Further Explorations in the Microstructure of Cognition

    ERIC Educational Resources Information Center

    Rogers, Timothy T.; McClelland, James L.

    2014-01-01

    This paper introduces a special issue of "Cognitive Science" initiated on the 25th anniversary of the publication of "Parallel Distributed Processing" (PDP), a two-volume work that introduced the use of neural network models as vehicles for understanding cognition. The collection surveys the core commitments of the PDP…

  19. Using the Extended Parallel Process Model to Examine Teachers' Likelihood of Intervening in Bullying

    ERIC Educational Resources Information Center

    Duong, Jeffrey; Bradshaw, Catherine P.

    2013-01-01

    Background: Teachers play a critical role in protecting students from harm in schools, but little is known about their attitudes toward addressing problems like bullying. Previous studies have rarely used theoretical frameworks, making it difficult to advance this area of research. Using the Extended Parallel Process Model (EPPM), we examined the…

  1. Real-time SHVC software decoding with multi-threaded parallel processing

    NASA Astrophysics Data System (ADS)

    Gudumasu, Srinivas; He, Yuwen; Ye, Yan; He, Yong; Ryu, Eun-Seok; Dong, Jie; Xiu, Xiaoyu

    2014-09-01

    This paper proposes a parallel decoding framework for scalable HEVC (SHVC). Various optimization technologies are implemented on the basis of SHVC reference software SHM-2.0 to achieve real-time decoding speed for the two-layer spatial scalability configuration. SHVC decoder complexity is analyzed with profiling information. The decoding process at each layer and the up-sampling process are designed in parallel and scheduled by a high-level application task manager. Within each layer, multi-threaded decoding is applied to accelerate the layer decoding speed. Entropy decoding, reconstruction, and in-loop processing are pipelined with multiple threads based on groups of coding tree units (CTU). A group of CTUs is treated as a processing unit in each pipeline stage to achieve a better trade-off between parallelism and synchronization. Motion compensation, inverse quantization, and inverse transform modules are further optimized with SSE4 SIMD instructions. Simulations on a desktop with an Intel i7 processor 2600 running at 3.4 GHz show that the parallel SHVC software decoder is able to decode 1080p spatial 2x at up to 60 fps (frames per second) and 1080p spatial 1.5x at up to 50 fps for those bitstreams generated with SHVC common test conditions in the JCT-VC standardization group. The decoding performance at various bitrates with different optimization technologies and different numbers of threads is compared in terms of decoding speed and resource usage, including processor and memory.
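
    The CTU-group pipelining can be pictured with a two-stage toy: groups flow from entropy decoding to reconstruction through a bounded queue, so the stages overlap in time while staying loosely synchronized. The stage names and group counts below are illustrative, not taken from SHM-2.0.

```python
# Two-stage decode pipeline over groups of CTUs, linked by a bounded queue.
import queue, threading

ctu_groups = [f"ctu-group-{i}" for i in range(8)]
stage_q = queue.Queue(maxsize=2)          # small queue bounds skew between stages

def entropy_decode():
    for g in ctu_groups:
        stage_q.put(f"coeffs({g})")       # stage 1: entropy decoding
    stage_q.put(None)                     # sentinel: end of stream

def reconstruct():
    while (item := stage_q.get()) is not None:
        _ = f"pixels({item})"             # stage 2: reconstruction + in-loop filters

t1 = threading.Thread(target=entropy_decode)
t2 = threading.Thread(target=reconstruct)
t1.start(); t2.start(); t1.join(); t2.join()
```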

  2. High Performance Parallel Processing Project: Industrial computing initiative. Progress reports for fiscal year 1995

    SciTech Connect

    Koniges, A.

    1996-02-09

    This project is a package of 11 individual CRADA`s plus hardware. This innovative project established a three-year multi-party collaboration that is significantly accelerating the availability of commercial massively parallel processing computing software technology to U.S. government, academic, and industrial end-users. This report contains individual presentations from nine principal investigators along with overall program information.

  4. Parallel Process and Isomorphism: A Model for Decision Making in the Supervisory Triad

    ERIC Educational Resources Information Center

    Koltz, Rebecca L.; Odegard, Melissa A.; Feit, Stephen S.; Provost, Kent; Smith, Travis

    2012-01-01

    Parallel process and isomorphism are two supervisory concepts that are often discussed independently but rarely discussed in connection with each other. These two concepts, philosophically, have different historical roots, as well as different implications for interventions with regard to the supervisory triad. The authors examine the difference…

  5. Cocaine Use and Delinquent Behavior among High-Risk Youths: A Growth Model of Parallel Processes

    ERIC Educational Resources Information Center

    Dembo, Richard; Sullivan, Christopher

    2009-01-01

    We report the results of a parallel-process, latent growth model analysis examining the relationships between cocaine use and delinquent behavior among youths. The study examined a sample of 278 justice-involved juveniles completing at least one of three follow-up interviews as part of a National Institute on Drug Abuse-funded study. The results…

  6. A Neurally Plausible Parallel Distributed Processing Model of Event-Related Potential Word Reading Data

    ERIC Educational Resources Information Center

    Laszlo, Sarah; Plaut, David C.

    2012-01-01

    The Parallel Distributed Processing (PDP) framework has significant potential for producing models of cognitive tasks that approximate how the brain performs the same tasks. To date, however, there has been relatively little contact between PDP modeling and data from cognitive neuroscience. In an attempt to advance the relationship between…

  7. One Factor or Two Parallel Processes? Comorbidity and Development of Adolescent Anxiety and Depressive Disorder Symptoms

    ERIC Educational Resources Information Center

    Hale, William W., III; Raaijmakers, Quinten A. W.; Muris, Peter; van Hoof, Anne; Meeus, Wim H. J.

    2009-01-01

    Background: This study investigates whether anxiety and depressive disorder symptoms of adolescents from the general community are best described by a model that assumes they are indicative of one general factor or by a model that assumes they are two distinct disorders with parallel growth processes. Additional analyses were conducted to explore…

  10. An Inconvenient Truth: An Application of the Extended Parallel Process Model

    ERIC Educational Resources Information Center

    Goodall, Catherine E.; Roberto, Anthony J.

    2008-01-01

    "An Inconvenient Truth" is an Academy Award-winning documentary about global warming presented by Al Gore. This documentary is appropriate for a lesson on fear appeals and the extended parallel process model (EPPM). The EPPM is concerned with the effects of perceived threat and efficacy on behavior change. Perceived threat is composed of an…

  12. Tracking the Continuity of Language Comprehension: Computer Mouse Trajectories Suggest Parallel Syntactic Processing

    ERIC Educational Resources Information Center

    Farmer, Thomas A.; Cargill, Sarah A.; Hindy, Nicholas C.; Dale, Rick; Spivey, Michael J.

    2007-01-01

    Although several theories of online syntactic processing assume the parallel activation of multiple syntactic representations, evidence supporting simultaneous activation has been inconclusive. Here, the continuous and non-ballistic properties of computer mouse movements are exploited, by recording their streaming x, y coordinates to procure…

  14. Parallel pulse processing and data acquisition for high speed, low error flow cytometry

    DOEpatents

    van den Engh, Gerrit J.; Stokdijk, Willem

    1992-01-01

    A digitally synchronized parallel pulse processing and data acquisition system for a flow cytometer has multiple parallel input channels with independent pulse digitization and FIFO storage buffer. A trigger circuit controls the pulse digitization on all channels. After an event has been stored in each FIFO, a bus controller moves the oldest entry from each FIFO buffer onto a common data bus. The trigger circuit generates an ID number for each FIFO entry, which is checked by an error detection circuit. The system has high speed and low error rate.
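
    A hedged software model of the scheme described in this record may help clarify the data flow; all names and values below are invented. A trigger stamps each event with an ID, every channel pushes its digitized pulse plus that ID into its own FIFO, and a bus controller pops the oldest entry from each FIFO and checks that the IDs agree.

```python
from collections import deque
import random

N_CHANNELS = 4
fifos = [deque() for _ in range(N_CHANNELS)]
event_id = 0

def trigger_event():
    """Trigger circuit: digitize one pulse per channel under a common ID."""
    global event_id
    event_id += 1
    for ch, fifo in enumerate(fifos):
        pulse_height = random.random()       # stand-in for the ADC value
        fifo.append((event_id, ch, pulse_height))

def bus_read():
    """Bus controller: move the oldest entry of each FIFO onto the 'bus'."""
    record = [fifo.popleft() for fifo in fifos]
    ids = {entry[0] for entry in record}
    if len(ids) != 1:                        # error-detection circuit
        raise RuntimeError("channel desynchronization: IDs %s" % ids)
    return record

for _ in range(3):
    trigger_event()
while fifos[0]:
    print(bus_read())
```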

  15. [Work processes in Family Health Strategy team].

    PubMed

    Pavoni, Daniela Soccoloski; Medeiros, Cássia Regina Gotler

    2009-01-01

    The Family Health Strategy requires a redefinition of the health care model, characterized by interdisciplinary team work. This study aimed to understand the work processes in a Family Health Team. The research was qualitative, and 10 team members were interviewed. Results demonstrated that the nurse performs a variety of functions that could be shared with others; this overloads the nurse and makes it difficult to carry out the tasks inherent to the job. Task planning and execution are usually done as a team, but some professionals are more involved in these activities than others. It was concluded that the team needs to reflect upon its work process and reassess task assignment, so that each member is able to perform the work and contribute to an integrated whole.

  16. Intelligent approach for parallel HEV control strategy based on driving cycles

    NASA Astrophysics Data System (ADS)

    Montazeri-Gh, M.; Asadi, M.

    2011-02-01

    This article describes a methodological approach for the intelligent control of a parallel hybrid electric vehicle (HEV) that incorporates the concept of driving cycles. In this approach, a fuzzy logic controller is designed to keep the internal combustion engine working in the vicinity of its optimal condition at each instant. In addition, based on the definition of the microtrip, several driving patterns are classified that represent congested to highway traffic conditions. The driving cycle and traffic conditions are then incorporated in an optimisation process to tune the fuzzy membership function parameters. In this study, the optimisation process is formulated to minimise the HEV fuel consumption (FC) and emissions while satisfying the driving performance constraints. Finally, optimisation results are provided for three different driving cycles: ECE-EUDC, FTP and TEH-CAR. TEH-CAR is a driving cycle developed from experimental data collected under real traffic conditions in the city of Tehran. The results from the computer simulation show the effectiveness of the approach and a reduction in FC and emissions while ensuring that the vehicle performance is not sacrificed.
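
    The following toy sketch shows the general shape of tuning fuzzy membership functions against a driving cycle: a triangular membership rule maps normalized power demand to an engine share, and its breakpoints are tuned by random search over a synthetic cycle. The memberships, cost model, and optimizer are illustrative assumptions, not the authors' TEH-CAR setup.

```python
import random

random.seed(42)

def tri(x, a, b, c):
    """Triangular membership function on breakpoints a <= b <= c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuel_cost(params, cycle):
    """Toy cost: penalize running the engine far from a 0.6 'sweet spot'."""
    a, b, c = sorted(params)
    cost = 0.0
    for demand in cycle:            # normalized power demand in [0, 1]
        engine_share = tri(demand, a, b, c)
        cost += abs(engine_share - 0.6) * demand
    return cost

cycle = [random.random() for _ in range(500)]   # stand-in driving cycle
candidates = ([random.random() for _ in range(3)] for _ in range(2000))
best = min(candidates, key=lambda p: fuel_cost(p, cycle))
print("tuned breakpoints:", [round(v, 3) for v in sorted(best)])
```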

  17. Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms

    NASA Astrophysics Data System (ADS)

    Konstantaras, Antonios; Skounakis, Emmanouil; Kilty, James-Alexander; Frantzeskakis, Theofanis; Maravelakis, Emmanuel

    2016-04-01

    Advances in graphics processing units' technology towards encompassing parallel architectures [1], comprised of thousands of cores and multiples of parallel threads, provide the foundation in terms of hardware for the rapid processing of various parallel applications regarding seismic big data analysis. Seismic data are normally stored as collections of vectors in massive matrices, growing rapidly in size as wider areas are covered, denser recording networks are established and decades of data are compiled together [2]. Yet many processes in seismic data analysis are performed on each seismic event independently or on distinct tiles [3] of specific grouped seismic events within a much larger data set. Such processes, independent of one another, can be performed in parallel, narrowing down processing times drastically [1,3]. This research work presents the development and implementation of three parallel processing algorithms using CUDA C [4] for the investigation of potentially distinct seismic regions [5,6] present in the vicinity of the southern Hellenic seismic arc. The algorithms, programmed and executed in parallel for comparison, are: fuzzy k-means clustering with expert knowledge [7] in assigning the overall number of clusters; density-based clustering [8]; and a self-developed spatio-temporal clustering algorithm encompassing expert [9] and empirical knowledge [10] for the specific area under investigation. Indexing terms: GPU parallel programming, CUDA C, heterogeneous processing, distinct seismic regions, parallel clustering algorithms, spatio-temporal clustering References [1] Kirk, D. and Hwu, W.: 'Programming massively parallel processors - A hands-on approach', 2nd Edition, Morgan Kaufman Publisher, 2013 [2] Konstantaras, A., Valianatos, F., Varley, M.R. and Makris, J.P.: 'Soft-Computing Modelling of Seismicity in the Southern Hellenic Arc', Geoscience and Remote Sensing Letters, vol. 5 (3), pp. 323-327, 2008 [3] Papadakis, S. and
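
    The per-event independence exploited above can be illustrated with the assignment step of k-means, where every epicenter is classified independently of all others; in the NumPy sketch below, broadcasting across the event axis stands in for CUDA threads. The coordinates and cluster count are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic epicenters (lat, lon) in a box loosely around the Hellenic arc
events = rng.uniform([34.0, 22.0], [37.0, 29.0], size=(10_000, 2))
centroids = events[rng.choice(len(events), 4, replace=False)]

for _ in range(10):                      # Lloyd iterations
    # distance of every event to every centroid, computed all at once
    d = np.linalg.norm(events[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)            # one independent "thread" per event
    centroids = np.array([events[labels == k].mean(axis=0)
                          if np.any(labels == k) else centroids[k]
                          for k in range(len(centroids))])
print("events per cluster:", np.bincount(labels))
```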

  18. Parallel multigrid solver of radiative transfer equation for photon transport via graphics processing unit.

    PubMed

    Gao, Hao; Phan, Lan; Lin, Yuting

    2012-09-01

    A graphics processing unit-based parallel multigrid solver for a radiative transfer equation with vacuum boundary condition or reflection boundary condition is presented for heterogeneous media with complex geometry based on two-dimensional triangular meshes or three-dimensional tetrahedral meshes. The computational complexity of this parallel solver is linearly proportional to the degrees of freedom in both angular and spatial variables, while the full multigrid method is utilized to minimize the number of iterations. The overall speed gain is roughly 30- to 300-fold with respect to our prior multigrid solver, depending on the underlying regime and the parallelization. The numerical validations are presented with the MATLAB codes at https://sites.google.com/site/rtefastsolver/.

  20. A domain decomposition parallel processing algorithm for molecular dynamics simulations of polymers

    NASA Astrophysics Data System (ADS)

    Brown, David; Clarke, Julian H. R.; Okuda, Motoi; Yamazaki, Takao

    1994-10-01

    We describe in this paper a domain decomposition molecular dynamics algorithm for use on distributed memory parallel computers which is capable of handling systems containing rigid bond constraints and three- and four-body potentials as well as non-bonded potentials. The algorithm has been successfully implemented on the Fujitsu 1024 processor element AP1000 machine. The performance has been compared with and benchmarked against the alternative cloning method of parallel processing [D. Brown, J.H.R. Clarke, M. Okuda and T. Yamazaki, J. Chem. Phys., 100 (1994) 1684] and results obtained using other scalar and vector machines. Two parallel versions of the SHAKE algorithm, which solves the bond length constraints problem, have been compared with regard to optimising the performance of this procedure.
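
    A minimal one-dimensional analogue of domain decomposition, under invented parameters and without periodic boundaries, is sketched below: particles are binned into slabs, and each slab computes short-range pair forces using a halo of copied neighbor positions, so slabs can be processed independently (serially here; a process pool could map the same per-slab function in parallel).

```python
import numpy as np

L, CUT, NSLAB = 100.0, 2.5, 4            # box length, cutoff, slab count
rng = np.random.default_rng(1)
x = rng.uniform(0, L, 400)               # 1-D particle positions

edges = np.linspace(0, L, NSLAB + 1)

def slab_forces(lo, hi):
    """Forces on this slab's particles, using a halo of copied neighbors."""
    own = x[(x >= lo) & (x < hi)]
    halo = x[((x >= lo - CUT) & (x < lo)) | ((x >= hi) & (x < hi + CUT))]
    env = np.concatenate([own, halo])
    f = np.zeros_like(own)
    for i, xi in enumerate(own):         # soft linear repulsion inside CUT
        dx = xi - env
        m = (np.abs(dx) < CUT) & (dx != 0)
        f[i] = np.sum(np.sign(dx[m]) * (CUT - np.abs(dx[m])))
    return f

# Each slab is independent given its halo; a multiprocessing.Pool could map
# slab_forces across processes instead of this serial loop.
forces = [slab_forces(edges[k], edges[k + 1]) for k in range(NSLAB)]
print("particles per slab:", [len(fk) for fk in forces])
```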

  1. Solving the Quadratic Assignment Problems using Parallel ACO with Symmetric Multi Processing

    NASA Astrophysics Data System (ADS)

    Tsutsui, Shigeyoshi

    In this paper, we propose several types of parallel ant colony optimization algorithms with symmetric multi processing for solving the quadratic assignment problem (QAP). These models include the master-slave models and the island models. As the base ant colony optimization algorithm, we used the cunning Ant System (cAS), which showed promising performance in our previous studies. We evaluated each parallel algorithm under the condition that the run time for each parallel algorithm and the base sequential algorithm is the same. The results suggest that using the master-slave model with an increased number of ant colony optimization iterations is promising for solving quadratic assignment problems on real or real-like instances.
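
    A hedged sketch of the master-slave pattern follows: the master samples candidate permutations (standing in for cAS tour construction, which is omitted, as is pheromone handling) and farms the QAP cost evaluations out to worker processes. Problem sizes and data are invented.

```python
import numpy as np
from multiprocessing import Pool

rng = np.random.default_rng(0)
N = 12
flow = rng.integers(0, 10, (N, N))        # synthetic QAP flow matrix
dist = rng.integers(0, 10, (N, N))        # synthetic QAP distance matrix

def qap_cost(perm):
    """QAP objective: sum of flow[i][j] * dist[perm[i]][perm[j]]."""
    perm = np.asarray(perm)
    return int((flow * dist[np.ix_(perm, perm)]).sum())

if __name__ == "__main__":
    # master: random permutations stand in for ant tour construction
    ants = [rng.permutation(N).tolist() for _ in range(64)]
    with Pool(4) as pool:                 # "slaves" evaluate in parallel
        costs = pool.map(qap_cost, ants)  # master collects the results
    best_cost, best_perm = min(zip(costs, ants))
    print("best cost:", best_cost)
```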

  2. Real-time target detection technology of large view-field infrared image based on multicore DSP parallel processing

    NASA Astrophysics Data System (ADS)

    Sun, Gang; Liu, Songlin; Wang, Weihua; Chen, Zengping

    2013-10-01

    In order to implement real-time detection of hedgehopping targets in large view-field infrared (LVIR) images, this paper proposes a fast algorithm flow to extract the target region of interest (ROI). Ground building regions are rejected quickly and the target ROI is segmented roughly through background classification. The background image containing the target ROI is then matched with the previous frame based on a mean removal normalized product correlation (MRNPC) similarity measure function. Finally, the target motion area is extracted by inter-frame differencing in the time domain. Following the proposed algorithm flow, this paper designs a high-speed real-time signal processing hardware platform based on FPGA + DSP and presents a new parallel processing strategy, combining function-level and task-level parallelism, which processes the LVIR image with multiple cores and multiple tasks. Experimental results show that the algorithm can effectively extract low-altitude aerial targets against complex backgrounds in a large view field, and that the new hardware platform can process IR imagery of 50000x288 pixels per second in real time in a large view-field infrared search system (LVIRSS).
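
    The two core measures above can be sketched compactly; in the NumPy example below, invented arrays replace real LVIR frames. MRNPC aligns the current frame to the previous one over a one-dimensional shift search, and an inter-frame difference then exposes the moving target.

```python
import numpy as np

def mrnpc(a, b):
    """Mean-removed normalized product correlation of two equal-size patches."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

rng = np.random.default_rng(2)
prev = rng.normal(size=(64, 64))               # stand-in background frame
curr = np.roll(prev, shift=3, axis=1)          # simulated platform drift
curr[30:34, 40:44] += 5.0                      # simulated hot target

# exhaustive 1-D shift search using MRNPC as the similarity measure
shifts = range(-5, 6)
best = max(shifts, key=lambda s: mrnpc(prev, np.roll(curr, -s, axis=1)))

# inter-frame difference on the aligned frames exposes the motion area
diff = np.abs(np.roll(curr, -best, axis=1) - prev)
mask = diff > 3.0
print("estimated shift:", best, "target pixels:", int(mask.sum()))
```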

  3. A multi-satellite orbit determination problem in a parallel processing environment

    NASA Technical Reports Server (NTRS)

    Deakyne, M. S.; Anderle, R. J.

    1988-01-01

    The Engineering Orbit Analysis Unit at GE Valley Forge used an Intel Hypercube parallel processor to investigate the performance of parallel processors on a multi-satellite orbit determination problem and to gain experience with them. A general approach was selected in which major blocks of computation for the multi-satellite orbit computations were treated as units to be assigned to the various processors on the Hypercube, so that problems encountered or successes achieved in addressing the orbit determination problem would be more likely to transfer to other parallel processors. The prime objective was to study the algorithm to allow processing of observations later in time than those employed in the state update. Expertise in ephemeris determination was exploited in addressing these problems, lending the study a realism that highlights problems which might not otherwise be anticipated. Secondary objectives were to gain experience with a non-trivial problem in a parallel processing environment; to explore the necessary interplay of serial and parallel sections of the algorithm through timing studies; and to explore granularity (coarse vs. fine grain), finding the limit above which there is a risk of starvation, with the majority of nodes idle, and below which the overhead of splitting the problem may cost more work and communication time than it saves.

  4. Massively parallel data processing for quantitative total flow imaging with optical coherence microscopy and tomography

    NASA Astrophysics Data System (ADS)

    Sylwestrzak, Marcin; Szlag, Daniel; Marchand, Paul J.; Kumar, Ashwin S.; Lasser, Theo

    2017-08-01

    We present an application of massively parallel processing of quantitative flow measurement data acquired using spectral optical coherence microscopy (SOCM). The need for massive signal processing of these particular datasets has been a major hurdle for many applications based on SOCM. In view of this difficulty, we implemented and adapted quantitative total flow estimation algorithms on graphics processing units (GPU) and achieved a 150-fold reduction in processing time when compared to a former CPU implementation. As SOCM constitutes the microscopy counterpart to spectral optical coherence tomography (SOCT), the developed processing procedure can be applied to both imaging modalities. We present the developed DLL library integrated in MATLAB (with an example) and have included the source code for adaptations and future improvements.
    Catalogue identifier: AFBT_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AFBT_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: GNU GPLv3
    No. of lines in distributed program, including test data, etc.: 913552
    No. of bytes in distributed program, including test data, etc.: 270876249
    Distribution format: tar.gz
    Programming language: CUDA/C, MATLAB
    Computer: Intel x64 CPU, GPU supporting CUDA technology
    Operating system: 64-bit Windows 7 Professional
    Has the code been vectorized or parallelized?: Yes, CPU code has been vectorized in MATLAB, CUDA code has been parallelized
    RAM: Dependent on user's parameters, typically between several gigabytes and several tens of gigabytes
    Classification: 6.5, 18
    Nature of problem: Speed up of data processing in optical coherence microscopy
    Solution method: Utilization of GPU for massively parallel data processing
    Additional comments: Compiled DLL library with source code and documentation, example of utilization (MATLAB script with raw data)
    Running time: 1.8 s for one B-scan (150x faster in comparison to the CPU implementation)

  5. Parallel processing in a host plus multiple array processor system for radar

    NASA Technical Reports Server (NTRS)

    Barkan, B. Z.

    1983-01-01

    Host plus multiple array processor architecture is demonstrated to yield a modular, fast, and cost-effective system for radar processing. Software methodology for programming such a system is developed. Parallel processing with pipelined data flow among the host, array processors, and discs is implemented. Theoretical analysis of performance is made and experimentally verified. The broad class of problems to which the architecture and methodology can be applied is indicated.

  6. An automated workflow for parallel processing of large multiview SPIM recordings

    PubMed Central

    Schmied, Christopher; Steinbach, Peter; Pietzsch, Tobias; Preibisch, Stephan; Tomancak, Pavel

    2016-01-01

    Summary: Selective Plane Illumination Microscopy (SPIM) makes it possible to image developing organisms in 3D at unprecedented temporal resolution over long periods of time. The resulting massive amounts of raw image data require extensive processing, performed interactively via dedicated graphical user interface (GUI) applications. The consecutive processing steps can be easily automated and the individual time points can be processed independently, which lends itself to trivial parallelization on a high performance computing (HPC) cluster. Here, we introduce an automated workflow for processing large multiview, multichannel, multiillumination time-lapse SPIM data on a single workstation or in parallel on an HPC cluster. The pipeline relies on snakemake to resolve dependencies among consecutive processing steps and can be easily adapted to any cluster environment, processing SPIM data in a fraction of the time required to collect it.
    Availability and implementation: The code is distributed free and open source under the MIT license http://opensource.org/licenses/MIT. The source code can be downloaded from github: https://github.com/mpicbg-scicomp/snakemake-workflows. Documentation can be found at http://fiji.sc/Automated_workflow_for_parallel_Multiview_Reconstruction.
    Contact: schmied@mpi-cbg.de
    Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26628585
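
    The property that makes this pipeline trivially parallel, independence of time points, can be mimicked in a few lines: the same processing function is mapped over time points by a local process pool, just as snakemake fans the real steps out across cluster jobs. File names and the processing stub are hypothetical.

```python
from multiprocessing import Pool

TIMEPOINTS = [f"tp{i:03d}.tif" for i in range(8)]  # hypothetical raw views

def process_timepoint(name):
    # stand-in for registering/fusing/deconvolving one multiview time point
    return name.replace(".tif", "_fused.tif")

if __name__ == "__main__":
    with Pool(4) as pool:                # each time point is independent
        outputs = pool.map(process_timepoint, TIMEPOINTS)
    print(outputs)
```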

  7. Static and dynamic load-balancing strategies for parallel reservoir simulation

    SciTech Connect

    Anguille, L.; Killough, J.E.; Li, T.M.C.; Toepfer, J.L.

    1995-12-31

    Accurate simulation of the complex phenomena that occur in flow through porous media can tax even the most powerful serial computers. The emergence of new parallel computer architectures as an efficient tool for reservoir simulation may overcome this difficulty. Unfortunately, major problems remain to be solved before parallel computers can be used commercially: production serial programs must be rewritten to be efficient in parallel environments, and load-balancing methods must be explored to distribute the workload evenly across processors during the simulation. This study implements both a static load-balancing algorithm and a receiver-initiated dynamic load-sharing algorithm to achieve high parallel efficiencies on both the IBM SP2 and Intel iPSC/860 parallel computers. Significant speedup improvement was recorded for both methods. Further optimization of these algorithms yielded a technique with efficiencies as high as 90% and 70% on 8 and 32 nodes, respectively. The increased performance was the result of minimizing message-passing overhead.
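
    A toy comparison of the two policies on a synthetic workload is sketched below. The static policy pre-assigns equal-sized contiguous chunks of blocks; the dynamic policy is approximated by a greedy rule in which the least-loaded processor takes the next block, a stand-in for receiver-initiated load sharing. Task costs are invented.

```python
import random

random.seed(3)
costs = [random.expovariate(1.0) for _ in range(64)]   # per-block work
NPROC = 8

# static: contiguous chunks of equal count; makespan = slowest processor
chunk = len(costs) // NPROC
static = max(sum(costs[p * chunk:(p + 1) * chunk]) for p in range(NPROC))

# receiver-initiated (approximated): an idle/least-loaded processor
# "requests" the next block, largest blocks first
loads = [0.0] * NPROC
for c in sorted(costs, reverse=True):
    loads[loads.index(min(loads))] += c
dynamic = max(loads)

print(f"static makespan  {static:.2f}")
print(f"dynamic makespan {dynamic:.2f}")
```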

  8. XPP A High Performance Parallel Signal Processing Platform for Space Applications

    NASA Astrophysics Data System (ADS)

    Schueler, E.; Syed, M.; Helfers, T.

    This document presents the eXtreme Processing Platform (XPP), a new runtime-reconfigurable data processing technology developed by PACT GmbH which combines the performance of ASICs with the flexibility of DSPs. The XPP is built from a scalable array of arithmetic processing elements, embedded memories, high-bandwidth I/O, an auto-synchronizing packet-oriented data network, and an internal event network that enables control of program flow using execution flags; it is designed to support different types of parallelism, such as multithreading, multitasking and multiple parallel instances. The technology promises to provide highly flexible payloads for future telecommunication satellites and scientific missions, with the short time to market needed to cope with the significant changes in the telecommunication market and rapidly changing customer needs.

  9. Parallel processing architecture for H.264 deblocking filter on multi-core platforms

    NASA Astrophysics Data System (ADS)

    Prasad, Durga P.; Sonachalam, Sekar; Kunchamwar, Mangesh K.; Gunupudi, Nageswara Rao

    2012-03-01

    This work presents a parallel processing architecture for the H.264 deblocking filter on multi-core platforms such as HyperX technology. Parallel techniques such as parallel processing of independent macroblocks, sub-blocks, and pixel rows are examined in this work. The deblocking architecture consists of a basic cell called the deblocking filter unit (DFU) and a dependent data buffer manager (DFM). The DFU can be used in several instances, catering to different performance needs; the DFM serves the data required by the different numbers of DFUs and also manages all the neighboring data required for future data processing by the DFUs. This approach achieves the scalability, flexibility, and performance excellence required in deblocking filters.

  10. Application of parallel computing to seismic damage process simulation of an arch dam

    NASA Astrophysics Data System (ADS)

    Zhong, Hong; Lin, Gao; Li, Jianbo

    2010-06-01

    The simulation of the damage process of a high arch dam subjected to strong earthquake shocks is significant to the evaluation of its performance and seismic safety, considering the catastrophic effect of dam failure. However, such numerical simulation requires rigorous computational capacity. Conventional serial computing falls short of that, and parallel computing is a fairly promising solution to this problem. The parallel finite element code PDPAD was developed for the damage prediction of arch dams, utilizing a damage model that accounts for the heterogeneity of concrete. Written in Fortran, the code uses a master/slave mode for programming, the domain decomposition method for allocation of tasks, MPI (Message Passing Interface) for communication, and solvers from the AZTEC library for the solution of large-scale equations. A speedup test showed that the performance of PDPAD was quite satisfactory. The code was employed to study the damage process of an arch dam under construction on a 4-node PC cluster, with more than one million degrees of freedom considered. The obtained damage mode was quite similar to that of a shaking table test, indicating that the proposed procedure and the parallel code PDPAD have good potential for simulating the seismic damage modes of arch dams. With the rapidly growing need for massive computation emerging from engineering problems, parallel computing will find more and more applications in pertinent areas.

  11. cljam: a library for handling DNA sequence alignment/map (SAM) with parallel processing.

    PubMed

    Takeuchi, Toshiki; Yamada, Atsuo; Aoki, Takashi; Nishimura, Kunihiro

    2016-01-01

    Next-generation sequencing can determine DNA bases, and the results of sequence alignments are generally stored in files in the Sequence Alignment/Map (SAM) format or its compressed binary version (BAM). SAMtools is a typical tool for dealing with files in the SAM/BAM format. SAMtools has various functions, including detection of variants, visualization of alignments, indexing, extraction of parts of the data and loci, and conversion of file formats. It is written in C and executes quickly. However, SAMtools requires additional implementation work to be used in parallel, for example with OpenMP (Open Multi-Processing) libraries. Given the accumulation of next-generation sequencing data, a simple parallelization program that can support cloud and PC cluster environments is required. We have developed cljam using the Clojure programming language, which simplifies parallel programming, to handle SAM/BAM data. Cljam can run in a Java runtime environment (e.g., Windows, Linux, Mac OS X) with Clojure. Cljam can process and analyze SAM/BAM files in parallel and at high speed. The execution time with cljam is almost the same as with SAMtools. The cljam code is written in Clojure and has fewer lines than other similar tools.
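
    Since cljam itself is Clojure, the Python sketch below only illustrates the chunk-parallel pattern it applies to SAM/BAM data: alignment records are mapped across processes and the results reduced. The three records are fabricated; a real file would be streamed in chunks.

```python
from multiprocessing import Pool

# fabricated SAM-like records (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, ...)
SAM_LINES = [
    "r001\t0\tchr1\t100\t60\t50M\t*\t0\t0",
    "r002\t16\tchr1\t180\t60\t50M\t*\t0\t0",
    "r003\t0\tchr2\t300\t60\t50M\t*\t0\t0",
]

def ref_of(line):
    """Extract the reference name (RNAME, third SAM column)."""
    return line.split("\t")[2]

if __name__ == "__main__":
    with Pool(2) as pool:                 # map records across processes
        refs = pool.map(ref_of, SAM_LINES)
    counts = {}                           # reduce: per-reference tally
    for r in refs:
        counts[r] = counts.get(r, 0) + 1
    print(counts)   # {'chr1': 2, 'chr2': 1}
```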

  12. Improving computational efficiency and tractability of protein design using a piecemeal approach. A strategy for parallel and distributed protein design.

    PubMed

    Pitman, Derek J; Schenkelberg, Christian D; Huang, Yao-Ming; Teets, Frank D; DiTursi, Daniel; Bystroff, Christopher

    2014-04-15

    Accuracy in protein design requires a fine-grained rotamer search, multiple backbone conformations, and a detailed energy function, creating a burden in runtime and memory requirements. A design task may be split into manageable pieces in both three-dimensional space and the rotamer search space to produce small, fast jobs that are easily distributed. However, these jobs must overlap, presenting a problem in resolving conflicting solutions in the overlap regions. Piecemeal design, in which the design space is split into overlapping regions and rotamer search spaces, accelerates the design process whether jobs are run in series or in parallel. Large jobs that could not fit in memory were made possible by splitting. Accepting the consensus amino acid selection in conflict regions led to non-optimal choices. Instead, conflicts were resolved using a second pass, in which the split regions were re-combined and designed as one, producing results that were closer to optimal with a minimal increase in runtime over the consensus strategy. Splitting the search space at the rotamer level instead of at the amino acid level further improved the efficiency by reducing the search space in the second pass.
    Availability: Programs for splitting protein design expressions are available at www.bioinfo.rpi.edu/tools/piecemeal.html
    Contact: bystrc@rpi.edu
    Supplementary information: Supplementary data are available at Bioinformatics online.
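
    An abstracted sketch of the piecemeal strategy follows, with a toy pairwise energy, a three-letter alphabet, and invented window sizes: positions are designed in overlapping windows, and where windows disagree the conflict region is re-designed jointly in a second pass instead of being settled by consensus.

```python
from itertools import product

ALPHABET = "AVL"                                 # toy residue alphabet
N, WINDOWS = 9, [(0, 5), (3, 8), (6, 9)]         # chain length, overlap windows

def energy(seq):
    """Toy pairwise energy: adjacent identical residues are penalized."""
    return sum(a == b for a, b in zip(seq, seq[1:]))

def design(lo, hi, fixed):
    """Brute-force the lowest-energy choice for free positions in [lo, hi)."""
    free = [i for i in range(lo, hi) if i not in fixed]
    best, best_e = None, float("inf")
    for combo in product(ALPHABET, repeat=len(free)):
        seq = {**fixed, **dict(zip(free, combo))}
        e = energy([seq[i] for i in range(lo, hi)])
        if e < best_e:
            best, best_e = seq, e
    return {i: best[i] for i in range(lo, hi)}

picks = [design(lo, hi, {}) for lo, hi in WINDOWS]   # first pass, splittable

# gather per-position choices; overlapping windows may disagree
choice_sets = {}
for p in picks:
    for i, aa in p.items():
        choice_sets.setdefault(i, set()).add(aa)
conflicts = sorted(i for i, s in choice_sets.items() if len(s) > 1)
fixed = {i: s.pop() for i, s in choice_sets.items() if len(s) == 1}

# second pass: re-design the conflict region jointly instead of by consensus
if conflicts:
    fixed.update(design(conflicts[0], conflicts[-1] + 1, fixed))
print("".join(fixed[i] for i in range(N)))
```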

  13. MC64-ClustalWP2: a highly-parallel hybrid strategy to align multiple sequences in many-core architectures.

    PubMed

    Díaz, David; Esteban, Francisco J; Hernández, Pilar; Caballero, Juan Antonio; Guevara, Antonio; Dorado, Gabriel; Gálvez, Sergio

    2014-01-01

    We have developed MC64-ClustalWP2 as a new implementation of the Clustal W algorithm, integrating a novel parallelization strategy and significantly increasing the performance when aligning long sequences on architectures with many cores. It must be stressed that in such a process, the detailed analysis of both the software and hardware features and peculiarities is of paramount importance to reveal key points to exploit and optimize the full potential of parallelism in many-core CPU systems. The new parallelization approach has focused on the most time-consuming stages of this algorithm. In particular, the so-called progressive alignment has drastically improved the performance, due to a fine-grained approach where the forward and backward loops were unrolled and parallelized. Another key approach has been the implementation of the new algorithm in a hybrid-computing system, integrating both an Intel Xeon multi-core CPU and a Tilera Tile64 many-core card. A comparison with other Clustal W implementations reveals the high performance of the new algorithm and strategy on many-core CPU architectures, in a scenario where the sequences to align are relatively long (more than 10 kb) and, hence, many-core GPU hardware cannot be used. Thus, MC64-ClustalWP2 runs multiple alignments more than 18x faster than the original Clustal W algorithm, and more than 7x faster than the best x86 parallel implementation to date, and is publicly available through a web service. Besides, these developments have been deployed in cost-effective personal computers and should be useful for life-science researchers, including the identification of identities and differences for mutation/polymorphism analyses, biodiversity and evolutionary studies and for the development of molecular markers for paternity testing, germplasm management and protection, to assist breeding, illegal traffic control, fraud prevention and for the protection of the intellectual property (identification

  14. Techniques for Real-Time Parallel Processing: Sensor Processing Case Studies

    DTIC Science & Technology

    1994-04-01

    probabilistic data association (IPDA) algorithm. Taken together, this is an example of a class 3 algorithm whose running time is strongly dependent on... horizontally and solve a factor of 128/m more problems, but this was not tested.) The code that computes the MGS in MPL is shown in Figure 5. There is a single... designed so that it could be implemented efficiently on both parallel and sequential machines. Our tests of the SISAL version of JPDA were performed on a

  15. A learnable parallel processing architecture towards unity of memory and computing

    PubMed Central

    Li, H.; Gao, B.; Chen, Z.; Zhao, Y.; Huang, P.; Ye, H.; Liu, L.; Liu, X.; Kang, J.

    2015-01-01

    Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named “iMemComp”, where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped “iMemComp” with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on “iMemComp” can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area. PMID:26271243

  18. Optoelectronic parallel processing with smart pixel arrays for automated screening of cervical smear imagery

    NASA Astrophysics Data System (ADS)

    Metz, John Langdon

    2000-10-01

    This thesis investigates the use of optoelectronic parallel processing systems with smart photosensor arrays (SPAs) to examine cervical smear images. The automation of cervical smear screening seeks to reduce human workload and improve the accuracy of detecting pre-cancerous and cancerous conditions. Increasing the parallelism of image processing improves the speed and accuracy of locating regions of interest (ROI) in images of the cervical smear for the first stage of a two-stage screening system. The two-stage approach first detects ROI optoelectronically before classifying them using more time-consuming electronic algorithms. The optoelectronic hit/miss transform (HMT) is computed using gray-scale modulation spatial light modulators in an optical correlator. To further the parallelism of this system, a novel CMOS SPA computes the post-processing steps required by the HMT algorithm. The SPA reduces the bandwidth passed into the second, electronic image processing stage that classifies the detected ROI. Limitations in the miss operation of the HMT suggest using only the hit operation for detecting ROI. This makes possible a single-SPA-chip approach, using only the hit operation for ROI detection, which may replace the optoelectronic correlator in the screening system. Both the HMT SPA postprocessor and the SPA ROI detector design provide compact, efficient, and low-cost optoelectronic solutions for performing ROI detection on cervical smears. Analysis of optoelectronic ROI detection combined with electronic ROI classification shows these systems have the potential to perform at, or better than, the error rates of current manual classification of cervical smears.

  19. On the design and implementation of a parallel, object-oriented, image processing toolkit

    SciTech Connect

    Kamath, C; Baldwin, C; Fodor, I; Tang, N A

    2000-06-22

    Advances in technology have enabled us to collect data from observations, experiments, and simulations at an ever-increasing pace. As these data sets approach the terabyte and petabyte range, scientists are increasingly using semi-automated techniques from data mining and pattern recognition to find useful information in the data. In order for data mining to be successful, the raw data must first be processed into a form suitable for the detection of patterns. When the data is in the form of images, this can involve a substantial amount of processing on very large data sets. To help make this task more efficient, they are designing and implementing an object-oriented image processing toolkit that specifically targets massively-parallel, distributed-memory architectures. They first show that it is possible to use object-oriented technology to effectively address the diverse needs of image applications. Next, they describe how they abstract out the similarities in image processing algorithms to enable re-use in the software. They also discuss the difficulties encountered in parallelizing image algorithms on massively parallel machines, as well as the bottlenecks to high performance. They demonstrate the work using images from an astronomical data set, and illustrate how techniques such as filters and denoising through the thresholding of wavelet coefficients can be applied when a large image is distributed across several processors.

  20. Comparing Binaural Pre-processing Strategies III

    PubMed Central

    Warzybok, Anna; Ernst, Stephan M. A.

    2015-01-01

    A comprehensive evaluation of eight signal pre-processing strategies, including directional microphones, coherence filters, single-channel noise reduction, binaural beamformers, and their combinations, was undertaken with normal-hearing (NH) and hearing-impaired (HI) listeners. Speech reception thresholds (SRTs) were measured in three noise scenarios (multitalker babble, cafeteria noise, and single competing talker). Predictions of three common instrumental measures were compared with the general perceptual benefit caused by the algorithms. The individual SRTs measured without pre-processing and individual benefits were objectively estimated using the binaural speech intelligibility model. Ten listeners with NH and 12 HI listeners participated. The participants varied in age and pure-tone threshold levels. Although HI listeners required a better signal-to-noise ratio to obtain 50% intelligibility than listeners with NH, no differences in SRT benefit from the different algorithms were found between the two groups. With the exception of single-channel noise reduction, all algorithms showed an improvement in SRT of between 2.1 dB (in cafeteria noise) and 4.8 dB (in single competing talker condition). Model predictions with binaural speech intelligibility model explained 83% of the measured variance of the individual SRTs in the no pre-processing condition. Regarding the benefit from the algorithms, the instrumental measures were not able to predict the perceptual data in all tested noise conditions. The comparable benefit observed for both groups suggests a possible application of noise reduction schemes for listeners with different hearing status. Although the model can predict the individual SRTs without pre-processing, further development is necessary to predict the benefits obtained from the algorithms at an individual level. PMID:26721922

  1. Parameters that affect parallel processing for computational electromagnetic simulation codes on high performance computing clusters

    NASA Astrophysics Data System (ADS)

    Moon, Hongsik

    What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research, and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited from the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization, and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared using benchmark software, and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to the type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS, and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the

  2. Serial processing in reading aloud: no challenge for a parallel model.

    PubMed

    Zorzi, M

    2000-04-01

    K. Rastle and M. Coltheart (1999) challenged parallel models of reading by showing that the cost of irregularity in low-frequency exception words was modulated by the position of the irregularity in the word. This position-of-irregularity effect was taken as strong evidence of serial processing in reading. This article refutes Rastle and Coltheart's theoretical conclusions in 3 ways: First, a parallel model, the connectionist dual process model (M. Zorzi, G. Houghton, & B. Butterworth, 1998b), produces a position-of-irregularity effect. Second, the supposed serial effect can be reduced to a position-specific grapheme-phoneme consistency effect. Third, the position-of-irregularity effect vanishes when the experimental data are reanalyzed using grapheme-phoneme consistency as the covariate. This demonstration has broader implications for studies aiming at adjudicating between models: Strong inferences should be avoided until the computational models are actually tested.

  3. Serial and parallel processes in eye movement control: current controversies and future directions.

    PubMed

    Murray, Wayne S; Fischer, Martin H; Tatler, Benjamin W

    2013-01-01

    In this editorial for the special issue on serial and parallel processing in reading we explore the background to the current debate concerning whether the word recognition processes in reading are strictly serial-sequential or take place in an overlapping parallel fashion. We consider the history of the controversy and some of the underlying assumptions, together with an analysis of the types of evidence and arguments that have been adduced to both sides of the debate, concluding that both accounts necessarily presuppose some weakening of, or elasticity in, the eye-mind assumption. We then consider future directions, both for reading research and for scene viewing, and wrap up the editorial with a brief overview of the following articles and their conclusions.

  4. A piloted comparison of elastic and rigid blade-element rotor models using parallel processing technology

    NASA Technical Reports Server (NTRS)

    Hill, Gary; Du Val, Ronald W.; Green, John A.; Huynh, Loc C.

    1990-01-01

    A piloted comparison of rigid and aeroelastic blade-element rotor models was conducted at the Crew Station Research and Development Facility (CSRDF) at Ames Research Center. FLIGHTLAB, a new simulation development and analysis tool, was used to implement these models in real time using parallel processing technology. Pilot comments and quantitative analysis performed both on-line and off-line confirmed that elastic degrees of freedom significantly affect perceived handling qualities. Trim comparisons show improved correlation with flight test data when elastic modes are modeled. The results demonstrate the efficiency with which the mathematical modeling sophistication of existing simulation facilities can be upgraded using parallel processing, and the importance of these upgrades to simulation fidelity.

  5. Timescale-invariant pattern recognition by feedforward inhibition and parallel signal processing.

    PubMed

    Creutzig, Felix; Benda, Jan; Wohlgemuth, Sandra; Stumpner, Andreas; Ronacher, Bernhard; Herz, Andreas V M

    2010-06-01

    The timescale-invariant recognition of temporal stimulus sequences is vital for many species and poses a challenge for their sensory systems. Here we present a simple mechanistic model to address this computational task, based on recent observations in insects that use rhythmic acoustic communication signals for mate finding. In the model framework, feedforward inhibition leads to burst-like response patterns in one neuron of the circuit. Integrating these responses over a fixed time window by a readout neuron creates a timescale-invariant stimulus representation. Only two additional processing channels, each with a feature detector and a readout neuron, plus one final coincidence detector for all three parallel signal streams, are needed to account for the behavioral data. In contrast to previous solutions to the general time-warp problem, no time delay lines or sophisticated neural architectures are required. Our results suggest a new computational role for feedforward inhibition and underscore the power of parallel signal processing.

  6. The Masterson Approach with play therapy: a parallel process between mother and child.

    PubMed

    Mulherin, M A

    2001-01-01

    This paper discusses a case in which the Masterson Approach was used with play therapy to treat a child with a developing personality disorder. It describes the parallel progression of the child and mother in adjunct therapy throughout a six-year period. The unique value of the Masterson Approach is that it provides the therapist with a framework and tool to diagnose and treat a child during the dynamic process of play. The case describes the mother-child dyad throughout therapy. It traces their parallel processes that involve separation, individuation, rapprochement, and the recovery of real self-capacities. Each stage of treatment is described, including verbal interventions. The child's internal affective state and intrapsychic structure during the various stages of treatment are illustrated by representative pictures.

  7. Parallel, multi-stage processing of colors, faces and shapes in macaque inferior temporal cortex

    PubMed Central

    Lafer-Sousa, Rosa; Conway, Bevil R.

    2014-01-01

    Visual-object processing culminates in inferior temporal (IT) cortex. To assess the organization of IT, we measured fMRI responses in alert monkey to achromatic images (faces, fruit, bodies, places) and colored gratings. IT contained multiple color-biased regions, which were typically ventral to face patches and, remarkably, yoked to them, spaced regularly at four locations predicted by known anatomy. Color and face selectivity increased for more anterior regions, indicative of a broad hierarchical arrangement. Responses to non-face shapes were found across IT, but were stronger outside color-biased regions and face patches, consistent with multiple parallel streams. IT also contained multiple coarse eccentricity maps: face patches overlapped central representations; color-biased regions spanned mid-peripheral representations; and place-biased regions overlapped peripheral representations. These results suggest that IT comprises parallel, multi-stage processing networks subject to one organizing principle. PMID:24141314

  8. Accelerating the Gillespie Exact Stochastic Simulation Algorithm using hybrid parallel execution on graphics processing units.

    PubMed

    Komarov, Ivan; D'Souza, Roshan M

    2012-01-01

    The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques to simulate reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple runs for parameter sweep exercises due to the stochastic nature of the simulation. Even very efficient variants of GSSA are prohibitively expensive to compute and perform parameter sweeps. Here we present a novel variant of the exact GSSA that is amenable to acceleration by using graphics processing units (GPUs). We parallelize the execution of a single realization across threads in a warp (fine-grained parallelism). A warp is a collection of threads that are executed synchronously on a single multi-processor. Warps executing in parallel on different multi-processors (coarse-grained parallelism) simultaneously generate multiple trajectories. Novel data-structures and algorithms reduce memory traffic, which is the bottleneck in computing the GSSA. Our benchmarks show an 8×-120× performance gain over various state-of-the-art serial algorithms when simulating different types of models.
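
    The coarse-grained parallelism described above can be imitated in NumPy: many independent trajectories of a single reaction (decay A -> 0 at rate k) advance in lockstep, one array lane per trajectory standing in for one GPU thread. The model and sizes are illustrative only; the simulated mean can be checked against the analytic value n0*exp(-kT).

```python
import numpy as np

rng = np.random.default_rng(4)
k, n0, T, R = 0.1, 50, 30.0, 10_000   # rate, initial copies, horizon, replicates
n = np.full(R, n0, dtype=np.int64)    # molecule count per trajectory
t = np.zeros(R)                       # simulation clock per trajectory
active = np.ones(R, dtype=bool)

while active.any():
    idx = np.where(active)[0]
    a = k * n[idx]                    # propensity of the single reaction
    t[idx] += rng.exponential(1.0 / a)   # direct-method waiting times
    fire = t[idx] <= T                # only fire inside the time horizon
    n[idx[fire]] -= 1                 # fire the reaction where allowed
    active[idx] = fire & (n[idx] > 0)

print("simulated mean:", n.mean(), " analytic:", n0 * np.exp(-k * T))
```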

  10. Modulator and VCSEL-MSM smart pixels for parallel pipeline networking and signal processing

    NASA Astrophysics Data System (ADS)

    Chen, C.-H.; Hoanca, Bogdan; Kuznia, C. B.; Pansatiankul, Dhawat E.; Zhang, Liping; Sawchuk, Alexander A.

    1999-07-01

    TRANslucent Smart Pixel Array (TRANSPAR) systems perform high performance parallel pipeline networking and signal processing based on optical propagation of 3D data packets. The TRANSPAR smart pixel devices use either self-electro-optic effect GaAs multiple quantum well modulators or CMOS-VCSEL-MSM (CMOS-Vertical Cavity Surface Emitting Laser-Metal-Semiconductor-Metal) technology. The data packets transfer among high-throughput photonic network nodes using multiple access/collision detection or token-ring protocols.

  11. Eighth SIAM conference on parallel processing for scientific computing: Final program and abstracts

    SciTech Connect

    1997-12-31

    This SIAM conference is the premier forum for developments in parallel numerical algorithms, a field that has seen very lively and fruitful developments over the past decade, and whose health is still robust. Themes for this conference were: combinatorial optimization; data-parallel languages; large-scale parallel applications; message-passing; molecular modeling; parallel I/O; parallel libraries; parallel software tools; parallel compilers; particle simulations; problem-solving environments; and sparse matrix computations.

  12. Fast phase processing in off-axis holography by CUDA including parallel phase unwrapping.

    PubMed

    Backoach, Ohad; Kariv, Saar; Girshovitz, Pinhas; Shaked, Natan T

    2016-02-22

    We present a parallel processing implementation for rapid extraction of quantitative phase maps from off-axis holograms on the Graphics Processing Unit (GPU) of the computer using compute unified device architecture (CUDA) programming. To obtain an efficient implementation, we parallelized both the wrapped phase map extraction algorithm and the two-dimensional phase unwrapping algorithm. In contrast to previous implementations, we utilized an unweighted least-squares phase unwrapping algorithm that better suits parallelism. We compared the proposed algorithm's run times on the CPU and the GPU of the computer for various sizes of off-axis holograms. Using the GPU implementation, we extracted the unwrapped phase maps from the recorded off-axis holograms at 35 frames per second (fps) for 4-megapixel holograms, and at 129 fps for 1-megapixel holograms, which represents the fastest processing frame rates obtained so far, to the best of our knowledge. We then used common-path off-axis interferometric imaging to quantitatively capture the phase maps of a micro-organism with rapid flagellum movements.
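
    The unweighted least-squares unwrapping step lends itself to a compact description: the wrapped phase gradients define a discrete Poisson equation that a cosine transform solves directly. The sketch below (NumPy/SciPy on the CPU, not the paper's CUDA kernels; the test phase ramp is an invented example) follows the classic Ghiglia-Romero formulation.

        import numpy as np
        from scipy.fft import dctn, idctn

        def wrap(p):
            return np.angle(np.exp(1j * p))             # wrap into (-pi, pi]

        def unwrap_lsq(psi):
            # Solve the discrete Poisson equation with Neumann boundaries via DCT.
            M, N = psi.shape
            dx = wrap(np.diff(psi, axis=1))             # wrapped phase gradients
            dy = wrap(np.diff(psi, axis=0))
            rho = np.zeros((M, N))                      # divergence of the gradients
            rho[:, :-1] += dx; rho[:, 1:] -= dx
            rho[:-1, :] += dy; rho[1:, :] -= dy
            rc = dctn(rho, norm='ortho')
            i = np.arange(M)[:, None]; j = np.arange(N)[None, :]
            denom = 2.0 * (np.cos(np.pi * i / M) + np.cos(np.pi * j / N) - 2.0)
            denom[0, 0] = 1.0                           # avoid dividing the free DC term
            phi = rc / denom
            phi[0, 0] = 0.0                             # pin the arbitrary phase offset
            return idctn(phi, norm='ortho')

        # demo: a smooth phase ramp, wrapped and then recovered up to a constant
        y, x = np.mgrid[0:128, 0:128]
        true = 0.2 * x + 0.1 * y
        rec = unwrap_lsq(wrap(true))
        print(np.std(rec - true))                       # ~0: differs only by a constant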

  13. Parallel particle swarm optimization on a graphics processing unit with application to trajectory optimization

    NASA Astrophysics Data System (ADS)

    Wu, Q.; Xiong, F.; Wang, F.; Xiong, Y.

    2016-10-01

    In order to reduce the computational time, a fully parallel implementation of the particle swarm optimization (PSO) algorithm on a graphics processing unit (GPU) is presented. Instead of being executed on the central processing unit (CPU) sequentially, PSO is executed in parallel via the GPU on the compute unified device architecture (CUDA) platform. The processes of fitness evaluation and of updating the velocities and positions of all particles are parallelized and described in detail. Comparative studies on the optimization of four benchmark functions and a trajectory optimization problem are conducted by running PSO on the GPU (GPU-PSO) and on the CPU (CPU-PSO). The impact of the design dimension, the number of particles, the size of the thread block in the GPU, and their interactions on the computational time is investigated. The results show that the computational time of the developed GPU-PSO is much shorter than that of CPU-PSO, with comparable accuracy, which demonstrates the remarkable speed-up capability of GPU-PSO.
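
    A minimal CPU sketch of the same data-parallel structure, with every particle's velocity, position and fitness updated in one vectorized step (the work a GPU would distribute across threads); the swarm size, coefficients and sphere benchmark are illustrative assumptions, not the paper's settings.

        import numpy as np

        rng = np.random.default_rng(0)
        n, dim, iters = 256, 10, 200
        w, c1, c2 = 0.7, 1.5, 1.5                      # inertia and acceleration weights

        def fitness(x):                                # sphere benchmark, minimum 0 at origin
            return np.sum(x * x, axis=1)

        x = rng.uniform(-5.0, 5.0, (n, dim))
        v = np.zeros((n, dim))
        pbest, pval = x.copy(), fitness(x)
        g = pbest[np.argmin(pval)].copy()              # global best position

        for _ in range(iters):
            r1, r2 = rng.random((n, dim)), rng.random((n, dim))
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)   # all particles at once
            x = x + v
            f = fitness(x)
            better = f < pval                          # update personal bests in bulk
            pbest[better], pval[better] = x[better], f[better]
            g = pbest[np.argmin(pval)].copy()          # update global best

        print(pval.min())                              # near 0 after convergence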

  14. Initial operating capability for the hypercluster parallel-processing test bed

    NASA Technical Reports Server (NTRS)

    Cole, Gary L.; Blech, Richard A.; Quealy, Angela

    1989-01-01

    The NASA Lewis Research Center is investigating the benefits of parallel processing to applications in computational fluid and structural mechanics. To aid this investigation, NASA Lewis is developing the Hypercluster, a multi-architecture, parallel-processing test bed. The initial operating capability (IOC) being developed for the Hypercluster is described. The IOC will provide a user with a programming/operating environment that is interactive, responsive, and easy to use. The IOC effort includes the development of the Hypercluster Operating System (HYCLOPS). HYCLOPS runs in conjunction with a vendor-supplied disk operating system on a Front-End Processor (FEP) to provide interactive, run-time operations such as program loading, execution, memory editing, and data retrieval. Run-time libraries that augment the FEP FORTRAN libraries are being developed to support parallel and vector processing on the Hypercluster. Special utilities are being provided to enable passage of information about application programs and their mapping to the operating system. Communications between the FEP and the Hypercluster are being handled by dedicated processors, each running a Message-Passing Kernel (MPK). A shared-memory interface allows rapid data exchange between HYCLOPS and the communications processors. Input/output handlers are built into the HYCLOPS-MPK interface, eliminating the need for the user to supply separate I/O support programs on the FEP.

  15. Initial operating capability for the hypercluster parallel-processing test bed

    SciTech Connect

    Cole, G.L.; Blech, R.A.; Quealy, A.

    1989-03-01

    The NASA Lewis Research Center is investigating the benefits of parallel processing to applications in computational fluid and structural mechanics. To aid this investigation, NASA Lewis is developing the Hypercluster, a multi-architecture, parallel-processing test bed. The initial operating capability (IOC) being developed for the Hypercluster is described. The IOC will provide a user with a programming/operating environment that is interactive, responsive, and easy to use. The IOC effort includes the development of the Hypercluster Operating System (HYCLOPS). HYCLOPS runs in conjunction with a vendor-supplied disk operating system on a Front-End Processor (FEP) to provide interactive, run-time operations such as program loading, execution, memory editing, and data retrieval. Run-time libraries that augment the FEP FORTRAN libraries are being developed to support parallel and vector processing on the Hypercluster. Special utilities are being provided to enable passage of information about application programs and their mapping to the operating system. Communications between the FEP and the Hypercluster are being handled by dedicated processors, each running a Message-Passing Kernel (MPK). A shared-memory interface allows rapid data exchange between HYCLOPS and the communications processors. Input/output handlers are built into the HYCLOPS-MPK interface, eliminating the need for the user to supply separate I/O support programs on the FEP.

  16. Understanding decimal proportions: discrete representations, parallel access, and privileged processing of zero.

    PubMed

    Varma, Sashank; Karl, Stacy R

    2013-05-01

    Much of the research on mathematical cognition has focused on the numbers 1, 2, 3, 4, 5, 6, 7, 8, and 9, with considerably less attention paid to more abstract number classes. The current research investigated how people understand decimal proportions--rational numbers between 0 and 1 expressed in the place-value symbol system. The results demonstrate that proportions are represented as discrete structures and processed in parallel. There was a semantic interference effect: When understanding a proportion expression (e.g., "0.29"), both the correct proportion referent (e.g., 0.29) and the incorrect natural number referent (e.g., 29) corresponding to the visually similar natural number expression (e.g., "29") are accessed in parallel, and when these referents lead to conflicting judgments, performance slows. There was also a syntactic interference effect, generalizing the unit-decade compatibility effect for natural numbers: When comparing two proportions, their tenths and hundredths components are processed in parallel, and when the different components lead to conflicting judgments, performance slows. The results also reveal that zero decimals--proportions ending in zero--serve multiple cognitive functions, including eliminating semantic interference and speeding processing. The current research also extends the distance, semantic congruence, and SNARC effects from natural numbers to decimal proportions. These findings inform how people understand the place-value symbol system, and the mental implementation of mathematical symbol systems more generally. Copyright © 2013 Elsevier Inc. All rights reserved.

  17. Tuning of tool dynamics for increased stability of parallel (simultaneous) turning processes

    NASA Astrophysics Data System (ADS)

    Ozturk, E.; Comak, A.; Budak, E.

    2016-01-01

    Parallel (simultaneous) turning operations make use of more than one cutting tool acting on a common workpiece, offering potential for higher productivity. However, dynamic interaction between the tools and the workpiece and the resulting chatter vibrations may create quality problems on machined surfaces. In order to determine chatter-free cutting process parameters, stability models can be employed. In this paper, the stability of parallel turning processes is formulated in the frequency and time domains for two different parallel turning cases. Predictions of the frequency and time domain methods demonstrated reasonable agreement with each other. In addition, the predicted stability limits are also verified experimentally. Simulation and experimental results show multi-regional stability diagrams, which can be used to select the most favorable set of process parameters for higher stable material removal rates. In addition to parameter selection, the developed models can be used to determine the natural frequency ratio of the tools that results in the highest stable depths of cut. It is concluded that the most stable operations are obtained when the natural frequencies of the tools are slightly offset from each other, and that the worst stability occurs when the natural frequencies of the tools are exactly the same.

  18. Multichannel parallel free-space VCSEL optoelectronic interconnects for digital data transmission and processing

    NASA Astrophysics Data System (ADS)

    Liu, J. Jiang; Lawler, William B.; Riely, Brian P.; Chang, Wayne H.; Shen, Paul H.; Newman, Peter G.; Taysing-Lara, Monica A.; Olver, Kimberly; Koley, Bikash; Dagenais, Mario; Simonis, George J.

    2000-07-01

    A free-space integrated optoelectronic interconnect was built to explore parallel data transmission and processing. This interconnect comprises an 8 × 8 substrate-emitting 980-nm InGaAs/GaAs quantum-well vertical-cavity surface-emitting laser (VCSEL) array and an 8 × 8 InGaAs/InP P-I-N photodetector array. Both VCSEL and detector arrays were flip-chip bonded onto the complementary metal-oxide-semiconductor (CMOS) circuitry, packaged in pin-grid array packages, and mounted on customized printed circuit boards. Individual data rates as high as 1.2 Gb/s on the VCSEL/CMOS transmitter array were measured. After the optical alignment, we carried out serial and parallel transmissions of digital data and live video scenes through this interconnect between two computers. Images captured by a CCD camera were digitized to 8-bit data signals and transferred as a serial bit-stream through multiple channels in this parallel VCSEL-detector optical interconnect. A data processing algorithm of edge detection was attempted during the data transfer. Final images were reconstructed from the optically transmitted and processed digital data. Although the transmitter and detector offered much higher data rates, we found that the overall image transfer rate was limited by the CMOS receiver circuits. A new design for the receiver circuitry was accomplished and submitted for fabrication.

  19. GWM-VI: groundwater management with parallel processing for multiple MODFLOW versions

    USGS Publications Warehouse

    Banta, Edward R.; Ahlfeld, David P.

    2013-01-01

    Groundwater Management–Version Independent (GWM–VI) is a new version of the Groundwater Management Process of MODFLOW. The Groundwater Management Process couples groundwater-flow simulation with a capability to optimize stresses on the simulated aquifer based on an objective function and constraints imposed on stresses and aquifer state. GWM–VI extends prior versions of Groundwater Management in two significant ways—(1) it can be used with any version of MODFLOW that meets certain requirements on input and output, and (2) it is structured to allow parallel processing of the repeated runs of the MODFLOW model that are required to solve the optimization problem. GWM–VI uses the same input structure for files that describe the management problem as that used by prior versions of Groundwater Management. GWM–VI requires only minor changes to the input files used by the MODFLOW model. GWM–VI uses the Joint Universal Parameter IdenTification and Evaluation of Reliability Application Programming Interface (JUPITER-API) to implement both version independence and parallel processing. GWM–VI communicates with the MODFLOW model by manipulating certain input files and interpreting results from the MODFLOW listing file and binary output files. Nearly all capabilities of prior versions of Groundwater Management are available in GWM–VI. GWM–VI has been tested with MODFLOW-2005, MODFLOW-NWT (a Newton formulation for MODFLOW-2005), MF2005-FMP2 (the Farm Process for MODFLOW-2005), SEAWAT, and CFP (Conduit Flow Process for MODFLOW-2005). This report provides sample problems that demonstrate a range of applications of GWM–VI and the directory structure and input information required to use the parallel-processing capability.
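
    The parallel-processing capability rests on the fact that the repeated forward-model runs requested by the optimizer are independent of one another. The sketch below shows only that generic pattern; run_model is a hypothetical stand-in for writing MODFLOW input files, launching the executable and reading its listing and binary output, which GWM-VI actually does through the JUPITER-API.

        import numpy as np
        from concurrent.futures import ProcessPoolExecutor

        def run_model(stress):
            # placeholder for one complete MODFLOW run under a candidate
            # stress pattern; here just a cheap synthetic aquifer response
            return np.sum(np.sqrt(stress))

        if __name__ == "__main__":
            candidates = [np.full(5, q) for q in np.linspace(1.0, 10.0, 32)]
            with ProcessPoolExecutor() as pool:        # runs execute in parallel
                objective = list(pool.map(run_model, candidates))
            print(max(objective))                      # best candidate's objective value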

  20. Real-time processing of radar return on a parallel computer

    NASA Technical Reports Server (NTRS)

    Aalfs, David D.

    1992-01-01

    NASA is working with the FAA to demonstrate the feasibility of pulse Doppler radar as a candidate airborne sensor to detect low-altitude windshears. The need to provide the pilot with timely information about possible hazards has motivated a demand for real-time processing of the radar return. Investigated here is parallel processing as a means of accommodating the high data rates required. A PC-based parallel computer built from transputers is used to investigate issues in the real-time concurrent processing of radar signals. A transputer network is made up of an array of single-instruction-stream processors that can be networked in a variety of ways. They are easily reconfigured, and software development is largely independent of the particular network topology. The performance of the transputer is evaluated in light of the computational requirements. A number of algorithms have been implemented on the transputers in OCCAM, a language specially designed for parallel processing. These include signal processing algorithms such as the Fast Fourier Transform (FFT), pulse-pair, and autoregressive modeling, as well as routing software to support concurrency. The most computationally intensive task is estimating the spectrum. Two approaches have been taken to this problem, the first and most conventional of which is to use the FFT. By using table look-ups for the basis functions and other optimizing techniques, an algorithm has been developed that is fast enough for real time. The other approach is to model the signal as an autoregressive process and estimate the spectrum based on the model coefficients. This technique is attractive because it does not suffer from the spectral leakage problem inherent in the FFT. Benchmark tests indicate that autoregressive modeling is feasible in real time.
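
    Of the algorithms named above, the pulse-pair estimator is simple enough to state in a few lines: the mean Doppler velocity follows from the phase of the lag-one autocorrelation of the complex return. The sketch below (NumPy, with invented wavelength, pulse period and velocity) illustrates the computation on synthetic data.

        import numpy as np

        wavelength, prt = 0.1, 1e-3             # radar wavelength (m), pulse repetition time (s)
        rng = np.random.default_rng(1)
        v_true = 12.0                           # simulated radial velocity (m/s)
        n = 64
        phase = 4 * np.pi * v_true * prt * np.arange(n) / wavelength
        z = np.exp(-1j * phase) + 0.1 * (rng.standard_normal(n)
                                         + 1j * rng.standard_normal(n))

        r1 = np.mean(z[1:] * np.conj(z[:-1]))   # lag-one autocorrelation
        v_est = -wavelength * np.angle(r1) / (4 * np.pi * prt)
        print(v_est)                            # approximately 12 m/s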

  1. Parallel processing of general and specific threat during early stages of perception

    PubMed Central

    2016-01-01

    Differential processing of threat can consummate as early as 100 ms post-stimulus. Moreover, early perception not only differentiates threat from non-threat stimuli but also distinguishes among discrete threat subtypes (e.g. fear, disgust and anger). Combining spatial-frequency-filtered images of fear, disgust and neutral scenes with high-density event-related potentials and intracranial source estimation, we investigated the neural underpinnings of general and specific threat processing in early stages of perception. Conveyed in low spatial frequencies, fear and disgust images evoked convergent visual responses with similarly enhanced N1 potentials and dorsal visual (middle temporal gyrus) cortical activity (relative to neutral cues; peaking at 156 ms). Nevertheless, conveyed in high spatial frequencies, fear and disgust elicited divergent visual responses, with fear enhancing and disgust suppressing P1 potentials and ventral visual (occipital fusiform) cortical activity (peaking at 121 ms). Therefore, general and specific threat processing operates in parallel in early perception, with the ventral visual pathway engaged in specific processing of discrete threats and the dorsal visual pathway in general threat processing. Furthermore, selectively tuned to distinctive spatial-frequency channels and visual pathways, these parallel processes underpin dimensional and categorical threat characterization, promoting efficient threat response. These findings thus lend support to hybrid models of emotion. PMID:26412811

  2. Parallel processing of general and specific threat during early stages of perception.

    PubMed

    You, Yuqi; Li, Wen

    2016-03-01

    Differential processing of threat can consummate as early as 100 ms post-stimulus. Moreover, early perception not only differentiates threat from non-threat stimuli but also distinguishes among discrete threat subtypes (e.g. fear, disgust and anger). Combining spatial-frequency-filtered images of fear, disgust and neutral scenes with high-density event-related potentials and intracranial source estimation, we investigated the neural underpinnings of general and specific threat processing in early stages of perception. Conveyed in low spatial frequencies, fear and disgust images evoked convergent visual responses with similarly enhanced N1 potentials and dorsal visual (middle temporal gyrus) cortical activity (relative to neutral cues; peaking at 156 ms). Nevertheless, conveyed in high spatial frequencies, fear and disgust elicited divergent visual responses, with fear enhancing and disgust suppressing P1 potentials and ventral visual (occipital fusiform) cortical activity (peaking at 121 ms). Therefore, general and specific threat processing operates in parallel in early perception, with the ventral visual pathway engaged in specific processing of discrete threats and the dorsal visual pathway in general threat processing. Furthermore, selectively tuned to distinctive spatial-frequency channels and visual pathways, these parallel processes underpin dimensional and categorical threat characterization, promoting efficient threat response. These findings thus lend support to hybrid models of emotion. © The Author (2015). Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  3. Scalability of preconditioners as a strategy for parallel computation of compressible fluid flow

    SciTech Connect

    Hansen, G.A.

    1996-05-01

    Parallel implementations of a Newton-Krylov-Schwarz algorithm are used to solve a model problem representing low Mach number compressible fluid flow over a backward-facing step. The Mach number is specifically selected to result in a numerically "stiff" matrix problem, based on an implicit finite volume discretization of the compressible 2D Navier-Stokes/energy equations using primitive variables. Newton's method is used to linearize the discrete system, and a preconditioned Krylov projection technique is used to solve the resulting linear system. Domain decomposition enables the development of a global preconditioner via the parallel construction of contributions derived from subdomains. Formation of the global preconditioner is based upon additive and multiplicative Schwarz algorithms, with and without subdomain overlap. The degree of parallelism of this technique is further enhanced with the use of a matrix-free approximation for the Jacobian used in the Krylov technique (in this case, GMRES(k)). Of paramount interest to this study is the implementation and optimization of these techniques on parallel shared-memory hardware, namely the Cray C90 and SGI Challenge architectures. These architectures were chosen as representative and commonly available to researchers interested in the solution of problems of this type. The Newton-Krylov-Schwarz solution technique is increasingly being investigated for computational fluid dynamics (CFD) applications due to the advantages of full coupling of all variables and equations, rapid non-linear convergence, and moderate memory requirements. A parallel version of this method that scales effectively on the above architectures would be extremely attractive to practitioners, resulting in efficient, cost-effective, parallel solutions exhibiting the benefits of the solution technique.
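
    A minimal illustration of the matrix-free Newton-Krylov idea, using SciPy's generic newton_krylov rather than the paper's Schwarz-preconditioned parallel solver: each Newton step is solved by GMRES driven only by Jacobian-vector products approximated from residual evaluations, so the Jacobian is never formed. The 1-D reaction-diffusion test problem is an invented stand-in for the compressible flow equations.

        import numpy as np
        from scipy.optimize import newton_krylov

        def residual(u):
            # nonlinear boundary-value problem u'' - u**3 = -1, u(0) = u(1) = 0
            n = u.size
            h = 1.0 / (n + 1)
            upad = np.concatenate(([0.0], u, [0.0]))   # Dirichlet boundaries
            return (upad[:-2] - 2 * u + upad[2:]) / h**2 - u**3 + 1.0

        u0 = np.zeros(100)                             # initial Newton iterate
        sol = newton_krylov(residual, u0, method='gmres', verbose=False)
        print(sol.max())                               # peak of the solution bump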

  4. Large-scale data-flow computer for parallel signal processing

    SciTech Connect

    Wong, F.S.; Ito, M.R.

    1982-01-01

    The authors describe a proposed data-driven, parallel computing machine for signal processing applications in which program codes are often executed repeatedly. This dataflow computer (DFC) consists of a large number of processing modules (PM) operating asynchronously; multiple concurrent activations of a single procedure could be supported by each PM without replication of codes. The architectural design emphasizes simplicity of system operations, modularity, speed, and feasibility with current technology. Performance studies are carried out via software simulations. The results give some insight into the basic organization and the various modes of computation; the speed-up and robustness of the design are also tested under variations of several system parameters. 4 references.

  5. Age-related differences in task goal processing strategies during action cascading.

    PubMed

    Stock, Ann-Kathrin; Gohil, Krutika; Beste, Christian

    2016-06-01

    We are often faced with situations requiring the execution of a coordinated cascade of different actions to achieve a goal, but we can apply different strategies to do so. Until now, these different action cascading strategies have, however, not been examined with respect to possible effects of aging. We tackled this question in a systems neurophysiological study using EEG and source localization in healthy older adults and employing mathematical constraints to determine the strategy applied. The results suggest that older adults seem to apply a less efficient strategy when cascading different actions. Compared to younger adults, older adults seem to struggle to hierarchically organize their actions, which leads to an inefficient and more parallel processing of different task goals. On a systems level, the observed deficit is most likely due to an altered processing of task goals at the response selection level (P3 ERP) and related to changes of neural processes in the temporo-parietal junction.

  6. Transform methods for developing parallel algorithms for cyclic-block signal processing

    NASA Astrophysics Data System (ADS)

    Marshall, T. G., Jr.

    A class of FIR and IIR single and multirate parallel filtering algorithms is introduced in which blocks of inputs and outputs are processed on-the-fly in a cyclic manner. There is no inherent latency introduced by the decomposition procedure giving the parallelism, the system latency being primarily due to the component processors. The structure is particularly well-suited for systems in which the component processors are the familiar DSP chips optimized for convolution although other component structures can be accommodated. In particular, the automatic data shifting feature of the TMS320 series processors can be utilized in these algorithms. A transform notation, introduced for digital filter banks, is recast in the desired form for this application. The resulting structure of the system, in this notation, is a circulant matrix for FIR filtering or a related matrix in other cases. The cyclic properties of the system and useful implementation flexibility result from this matrix structure.
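
    The circulant structure is what makes the cyclic-block decomposition work: applying a circulant matrix is exactly circular convolution, which the DFT diagonalizes. A small numerical check (NumPy/SciPy, with arbitrary taps and data) is shown below.

        import numpy as np
        from scipy.linalg import circulant

        h = np.array([1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0])  # FIR taps, zero-padded
        x = np.arange(8.0)                                       # one cyclic block of input

        y_matrix = circulant(h) @ x                              # circulant-matrix form
        y_fft = np.fft.ifft(np.fft.fft(h) * np.fft.fft(x)).real  # DFT-diagonalized form
        print(np.allclose(y_matrix, y_fft))                      # True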

  7. A message passing kernel for the hypercluster parallel processing test bed

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.; Quealy, Angela; Cole, Gary L.

    1989-01-01

    A Message-Passing Kernel (MPK) for the Hypercluster parallel-processing test bed is described. The Hypercluster is being developed at the NASA Lewis Research Center to support investigations of parallel algorithms and architectures for computational fluid and structural mechanics applications. The Hypercluster resembles the hypercube architecture except that each node consists of multiple processors communicating through shared memory. The MPK efficiently routes information through the Hypercluster, using a message-passing protocol when necessary and faster shared-memory communication whenever possible. The MPK also interfaces all of the processors with the Hypercluster operating system (HYCLOPS), which runs on a Front-End Processor (FEP). This approach distributes many of the I/O tasks to the Hypercluster processors and eliminates the need for a separate I/O support program on the FEP.
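
    A toy sketch of the routing decision described above, vastly simpler than the real MPK and with invented names: deliver through shared memory when source and destination share a node, and fall back to message passing across a link otherwise.

        import queue

        class Node:
            def __init__(self, name):
                self.name = name
                self.shared = {}                 # stand-in for shared on-node memory
                self.link = queue.Queue()        # stand-in for an inter-node channel

        def send(src, dst, tag, payload):
            if src is dst:                       # same node: fast shared-memory path
                src.shared[tag] = payload
            else:                                # different node: message-passing path
                dst.link.put((tag, payload))

        a, b = Node("a"), Node("b")
        send(a, a, "x", 1.0)                     # shared-memory delivery
        send(a, b, "y", 2.0)                     # message-passing delivery
        print(a.shared["x"], b.link.get())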

  8. Nonlinear structural response using adaptive dynamic relaxation on a massively-parallel-processing system

    NASA Technical Reports Server (NTRS)

    Oakley, David R.; Knight, Norman F., Jr.

    1994-01-01

    A parallel adaptive dynamic relaxation (ADR) algorithm has been developed for nonlinear structural analysis. This algorithm has minimal memory requirements, is easily parallelizable and scalable to many processors, and is generally very reliable and efficient for highly nonlinear problems. Performance evaluations on single-processor computers have shown that the ADR algorithm is reliable and highly vectorizable, and that it is competitive with direct solution methods for the highly nonlinear problems considered. The present algorithm is implemented on the 512-processor Intel Touchstone DELTA system at Caltech, and it is designed to minimize the extent and frequency of interprocessor communication. The algorithm has been used to solve for the nonlinear static response of two- and three-dimensional hyperelastic systems involving contact. Impressive relative speedups have been achieved and demonstrate the high scalability of the ADR algorithm. For the class of problems addressed, the ADR algorithm represents a very promising approach for parallel-vector processing.
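
    A minimal sketch of (non-adaptive) dynamic relaxation on an invented toy problem, a chain of three hardening springs, rather than the paper's hyperelastic contact problems: the static equilibrium is reached as the damped steady state of a fictitious dynamical system, and every nodal update is local, which is what makes the method easy to parallelize.

        import numpy as np

        k, g = 1.0, 0.1                             # linear and cubic spring stiffness
        f_ext = np.array([0.0, 0.0, 1.0])           # external pull on the free end node

        def residual(x):
            e = np.diff(np.concatenate(([0.0], x)))         # elongation of each spring
            s = k * e + g * e ** 3                          # nonlinear spring forces
            return f_ext - (s - np.concatenate((s[1:], [0.0])))

        x, v = np.zeros(3), np.zeros(3)
        dt, c = 0.1, 0.1                            # fictitious time step and damping
        for _ in range(3000):                       # damped pseudo-dynamics to steady state
            r = residual(x)                         # out-of-balance (residual) force
            v = (1.0 - c) * v + dt * r              # unit fictitious nodal masses
            x = x + dt * v
        print(x, np.abs(residual(x)).max())         # equilibrium positions, ~0 residual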

  9. Parallel architecture for labeling, segmentation, and lexical processing in speech understanding

    SciTech Connect

    Bronson, E.C.; Siegel, L.J.

    1983-01-01

    Speech understanding is a complex task which requires extensive computation. To increase the processing speed, a speech understanding system is decomposed into tasks which can be performed by a series of distributed processing subsystems. An architecture to perform labeling, segmentation, and lexical processing is described. Using a parametric characterization of the speech signal, this system divides an utterance into labeled homogeneous regions. The system then performs dictionary lookups based on all probable labelings and segmentations in order to generate a complete set of word hypotheses. Using realistic assumptions from existing speech understanding systems, a statistical model of speech input, and simulations of the speech processing algorithms, the attributes of the parallel system to perform labeling, segmentation, and lexical processing for real-time speech understanding are derived. 36 references.

  10. A targeted enrichment strategy for massively parallel sequencing of angiosperm plastid genomes

    PubMed Central

    Stull, Gregory W.; Moore, Michael J.; Mandala, Venkata S.; Douglas, Norman A.; Kates, Heather-Rose; Qi, Xinshuai; Brockington, Samuel F.; Soltis, Pamela S.; Soltis, Douglas E.; Gitzendanner, Matthew A.

    2013-01-01

    • Premise of the study: We explored a targeted enrichment strategy to facilitate rapid and low-cost next-generation sequencing (NGS) of numerous complete plastid genomes from across the phylogenetic breadth of angiosperms. • Methods and Results: A custom RNA probe set including the complete sequences of 22 previously sequenced eudicot plastomes was designed to facilitate hybridization-based targeted enrichment of eudicot plastid genomes. Using this probe set and an Agilent SureSelect targeted enrichment kit, we conducted an enrichment experiment including 24 angiosperms (22 eudicots, two monocots), which were subsequently sequenced on a single lane of the Illumina GAIIx with single-end, 100-bp reads. This approach yielded nearly complete to complete plastid genomes with exceptionally high coverage (mean coverage: 717×), even for the two monocots. • Conclusions: Our enrichment experiment was highly successful even though many aspects of the capture process employed were suboptimal. Hence, significant improvements to this methodology are feasible. With this general approach and probe set, it should be possible to sequence more than 300 essentially complete plastid genomes in a single Illumina GAIIx lane (achieving ∼50× mean coverage). However, given the complications of pooling numerous samples for multiplex sequencing and the limited number of barcodes (e.g., 96) available in commercial kits, we recommend 96 samples as a current practical maximum for multiplex plastome sequencing. This high-throughput approach should facilitate large-scale plastid genome sequencing at any level of phylogenetic diversity in angiosperms. PMID:25202518

  11. A Parallel and Distributed Processing Model of Joint Attention, Social-Cognition and Autism

    PubMed Central

    Mundy, Peter; Sullivan, Lisa; Mastergeorge, Ann M.

    2009-01-01

    The impaired development of joint attention is a cardinal feature of autism. Therefore, understanding the nature of joint attention is central to research on this disorder. Joint attention may be best defined in terms of an information processing system that begins to develop by 4–6 months of age. This system integrates the parallel processing of internal information about one’s own visual attention with external information about the visual attention of other people. This type of joint encoding of information about self and other attention requires the activation of a distributed anterior and posterior cortical attention network. Genetic regulation, in conjunction with self-organizing behavioral activity, guides the development of functional connectivity in this network. With practice in infancy the joint processing of self-other attention becomes automatically engaged as an executive function. It can be argued that this executive joint-attention is fundamental to human learning, as well as the development of symbolic thought, social-cognition and social-competence throughout the life span. One advantage of this parallel and distributed processing model of joint attention (PDPM) is that it directly connects theory on social pathology to a range of phenomena in autism associated with neural connectivity, constructivist and connectionist models of cognitive development, early intervention, activity-dependent gene expression, and atypical ocular motor control. PMID:19358304

  12. A parallel and distributed-processing model of joint attention, social cognition and autism.

    PubMed

    Mundy, Peter; Sullivan, Lisa; Mastergeorge, Ann M

    2009-02-01

    The impaired development of joint attention is a cardinal feature of autism. Therefore, understanding the nature of joint attention is central to research on this disorder. Joint attention may be best defined in terms of an information-processing system that begins to develop by 4-6 months of age. This system integrates the parallel processing of internal information about one's own visual attention with external information about the visual attention of other people. This type of joint encoding of information about self and other attention requires the activation of a distributed anterior and posterior cortical attention network. Genetic regulation, in conjunction with self-organizing behavioral activity, guides the development of functional connectivity in this network. With practice in infancy the joint processing of self-other attention becomes automatically engaged as an executive function. It can be argued that this executive joint attention is fundamental to human learning as well as the development of symbolic thought, social cognition and social competence throughout the life span. One advantage of this parallel and distributed-processing model of joint attention is that it directly connects theory on social pathology to a range of phenomena in autism associated with neural connectivity, constructivist and connectionist models of cognitive development, early intervention, activity-dependent gene expression and atypical ocular motor control.

  13. An experimental research on the mixing process of supersonic oxygen-iodine parallel streams

    NASA Astrophysics Data System (ADS)

    Wang, Zengqiang; Sang, Fengting; Zhang, Yuelong; Hui, Xiaokang; Xu, Mingxiu; Zhang, Peng; Zhao, Weili; Fang, Benjie; Duo, Liping; Jin, Yuqi

    2014-12-01

    The O2(1Δ)/I2 mixing process is one of the most important steps in chemical oxygen-iodine laser (COIL). Based on the chemical fluorescence method (CFM), a diagnostic system was set up to image electronically excited fluorescent I2(B3П0) by means of a high speed camera. An optimized data analysis approach was proposed to analyze the mixing process of supersonic oxygen-iodine parallel streams, employing a set of qualitative and quantitative parameters and a proper percentage boundary threshold of the fluorescence zone. A slit nozzle bank with supersonic parallel streams and a trip tab set for enhancing the mixing process were designed and fabricated. With the diagnostic system and the data analysis approach, the performance of the trip tab set was examined and is demonstrated in this work. With the mixing enhancement, the fluorescence zone area was enlarged 3.75 times. We have studied the mixing process under different flow conditions and demonstrated the mixing properties with different iodine buffer gases, including N2, Ar, He and CO2. It was found that, among the four tested gases, Ar had the best penetration ability, whilst He showed the best free diffusion ability, and both of them could be well used as the buffer gas in our experiments. These experimental results can be useful for designing and optimizing COIL systems.

  14. Locality-Aware Parallel Process Mapping for Multi-Core HPC Systems

    SciTech Connect

    Hursey, Joshua J; Squyres, Jeffrey M.; Dontje, Terry

    2011-01-01

    High Performance Computing (HPC) systems are composed of servers containing an ever-increasing number of cores. With such high processor core counts, non-uniform memory access (NUMA) architectures are almost universally used to reduce inter-processor and memory communication bottlenecks by distributing processors and memory throughout a server-internal networking topology. Application studies have shown that tuning the placement of processes within a server's NUMA networking topology to the application can have a dramatic impact on performance. The performance implications are magnified when running a parallel job across multiple server nodes, especially with large-scale HPC applications. This paper presents the Locality-Aware Mapping Algorithm (LAMA) for distributing the individual processes of a parallel application across processing resources in an HPC system, paying particular attention to the internal server NUMA topologies. The algorithm is able to support both homogeneous and heterogeneous hardware systems, and dynamically adapts to the available hardware and user-specified process layout at run-time. As implemented in Open MPI, the LAMA provides 362,880 mapping permutations and is able to naturally scale out to additional hardware resources as they become available in future architectures.
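
    The figure of 362,880 permutations is 9 factorial, the number of distinct orderings in which nine hardware levels can be traversed when laying out processes; the level names below are illustrative, not LAMA's exact tokens.

        from itertools import permutations

        # hypothetical hardware-level names; Open MPI's LAMA defines its own tokens
        levels = ["node", "board", "socket", "numa", "L3", "L2", "L1", "core", "hwthread"]
        orderings = list(permutations(levels))
        print(len(orderings))            # 362880 == 9!
        print(orderings[0][:3])          # one mapping order: fill nodes, then boards, ...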

  15. Parallel Optical Control of Spatiotemporal Neuronal Spike Activity Using High-Speed Digital Light Processing

    PubMed Central

    Jerome, Jason; Foehring, Robert C.; Armstrong, William E.; Spain, William J.; Heck, Detlef H.

    2011-01-01

    Neurons in the mammalian neocortex receive inputs from and communicate back to thousands of other neurons, creating complex spatiotemporal activity patterns. The experimental investigation of these parallel dynamic interactions has been limited due to the technical challenges of monitoring or manipulating neuronal activity at that level of complexity. Here we describe a new massively parallel photostimulation system that can be used to control action potential firing in in vitro brain slices with high spatial and temporal resolution while performing extracellular or intracellular electrophysiological measurements. The system uses digital light processing technology to generate 2-dimensional (2D) stimulus patterns with >780,000 independently controlled photostimulation sites that operate at high spatial (5.4 μm) and temporal (>13 kHz) resolution. Light is projected through the quartz–glass bottom of the perfusion chamber providing access to a large area (2.76 mm × 2.07 mm) of the slice preparation. This system has the unique capability to induce temporally precise action potential firing in large groups of neurons distributed over a wide area covering several cortical columns. Parallel photostimulation opens up new opportunities for the in vitro experimental investigation of spatiotemporal neuronal interactions at a broad range of anatomical scales. PMID:21904526

  16. Equivalency-processing parallel photonic integrated circuit (EP3IC): equivalence search module based on multiwavelength guided-wave technology.

    PubMed

    Detofsky, A; Choo, P Y; Louri, A

    2000-02-10

    We present an optoelectronic module called the equivalency-processing parallel photonic integrated circuit (EP(3)IC) that is created specifically to implement high-speed parallel equivalence searches (i.e., database word searches). The module combines a parallel-computation model with multiwavelength photonic integrated-circuit technology to achieve high-speed data processing. On the basis of simulation and initial analytical computation, a single-step multicomparand word-parallel bit-parallel equality search can attain an aggregate processing speed of 82 Tbit/s. We outline the theoretical design of the monolithic module and the integrated components and compare this with a functionally identical bulk-optics implementation. This integrated-circuit solution provides relatively low-power operation, fast switching speed, a compact system footprint, vibration tolerance, and ease of manufacturing.
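
    A software analogue of a multicomparand, word-parallel, bit-parallel equality search (NumPy on a CPU, standing in for the optical hardware): every stored word is compared against every comparand in a single broadcast operation.

        import numpy as np

        rng = np.random.default_rng(2)
        database = rng.integers(0, 2, size=(1024, 32), dtype=np.uint8)  # 1024 words x 32 bits
        comparands = database[[3, 500]]                                 # search for two known words

        # compare all words against all comparands at once (word- and bit-parallel)
        hits = (database[None, :, :] == comparands[:, None, :]).all(axis=2)
        print(np.argwhere(hits))        # [[0, 3], [1, 500]]: comparand index, word index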

  17. A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale

    SciTech Connect

    Moreland, Kenneth; Geveci, Berk

    2014-11-01

    The evolution of the computing world from teraflop to petaflop has been relatively effortless, with several of the existing programming models scaling effectively to the petascale. The migration to exascale, however, poses considerable challenges. All industry trends indicate that the exascale machine will be built using processors containing hundreds to thousands of cores per chip. It can be inferred that efficient concurrency on exascale machines requires a massive amount of concurrent threads, each performing many operations on a localized piece of data. Currently, visualization libraries and applications are based on what is known as the visualization pipeline. In the pipeline model, algorithms are encapsulated as filters with inputs and outputs. These filters are connected by setting the output of one component to the input of another. Parallelism in the visualization pipeline is achieved by replicating the pipeline for each processing thread. This works well for today’s distributed memory parallel computers but cannot be sustained when operating on processors with thousands of cores. Our project investigates a new visualization framework designed to exhibit the pervasive parallelism necessary for extreme scale machines. Our framework achieves this by defining algorithms in terms of worklets, which are localized stateless operations. Worklets are atomic operations that execute when invoked, unlike filters, which execute when a pipeline request occurs. The worklet design allows execution on a massive amount of lightweight threads with minimal overhead. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale machine.
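
    A toy rendering of the worklet idea, not the project's actual API: the algorithm is expressed as a stateless per-element operation, and a scheduler, here simply a process pool, maps it over the data at whatever granularity the hardware offers.

        import numpy as np
        from multiprocessing import Pool

        def worklet(point):                        # stateless, per-element operation
            x, y = point
            return (x * x + y * y) ** 0.5          # e.g. compute a field magnitude

        if __name__ == "__main__":
            field = np.random.default_rng(4).normal(size=(10000, 2))
            with Pool() as pool:                   # the "scheduler" invoking worklets
                mags = pool.map(worklet, field.tolist())
            print(max(mags))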

  18. A Theory of Interactive Parallel Processing: New Capacity Measures and Predictions for a Response Time Inequality Series

    ERIC Educational Resources Information Center

    Townsend, James T.; Wenger, Michael J.

    2004-01-01

    The authors present a theory of stochastic interactive parallel processing with special emphasis on channel interactions and their relation to system capacity. The approach is based both on linear systems theory augmented with stochastic elements and decisional operators and on a metatheory of parallel channels' dependencies that incorporates…

  19. Parallel Demand-Withdraw Processes in Family Therapy for Adolescent Drug Abuse

    PubMed Central

    Rynes, Kristina N.; Rohrbaugh, Michael J.; Lebensohn-Chialvo, Florencia; Shoham, Varda

    2013-01-01

    Isomorphism, or parallel process, occurs in family therapy when patterns of therapist-client interaction replicate problematic interaction patterns within the family. This study investigated parallel demand-withdraw processes in Brief Strategic Family Therapy (BSFT) for adolescent drug abuse, hypothesizing that therapist-demand/adolescent-withdraw interaction (TD/AW) cycles observed early in treatment would predict poor adolescent outcomes at follow-up for families who exhibited entrenched parent-demand/adolescent-withdraw interaction (PD/AW) before treatment began. Participants were 91 families who received at least 4 sessions of BSFT in a multi-site clinical trial on adolescent drug abuse (Robbins et al., 2011). Prior to receiving therapy, families completed videotaped family interaction tasks from which trained observers coded PD/AW. Another team of raters coded TD/AW during two early BSFT sessions. The main dependent variable was the number of drug use days that adolescents reported in Timeline Follow-Back interviews 7 to 12 months after family therapy began. Zero-inflated Poisson (ZIP) regression analyses supported the main hypothesis, showing that PD/AW and TD/AW interacted to predict adolescent drug use at follow-up. For adolescents in high PD/AW families, higher levels of TD/AW predicted significant increases in drug use at follow-up, whereas for low PD/AW families, TD/AW and follow-up drug use were unrelated. Results suggest that attending to parallel demand-withdraw processes in parent/adolescent and therapist/adolescent dyads may be useful in family therapy for substance-using adolescents. PMID:23438248
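
    For readers unfamiliar with the method, the sketch below shows the shape of such an analysis on simulated data (Python/statsmodels, with hypothetical variable names and effect sizes, not the study's data): a zero-inflated Poisson regression of follow-up drug-use days on the PD/AW × TD/AW interaction.

        import numpy as np
        import statsmodels.api as sm
        from statsmodels.discrete.count_model import ZeroInflatedPoisson

        rng = np.random.default_rng(5)
        n = 91                                           # families, as in the study
        pdaw, tdaw = rng.normal(size=n), rng.normal(size=n)
        lam = np.exp(0.5 + 0.4 * pdaw * tdaw)            # interaction drives the count
        y = np.where(rng.random(n) < 0.4, 0, rng.poisson(lam))   # excess zeros

        X = sm.add_constant(np.column_stack([pdaw, tdaw, pdaw * tdaw]))
        model = ZeroInflatedPoisson(y, X, exog_infl=np.ones((n, 1)))
        print(model.fit(maxiter=100, disp=False).summary())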

  20. One factor or two parallel processes? Comorbidity and development of adolescent anxiety and depressive disorder symptoms.

    PubMed

    Hale, William W; Raaijmakers, Quinten A W; Muris, Peter; van Hoof, Anne; Meeus, Wim H J

    2009-10-01

    This study investigates whether anxiety and depressive disorder symptoms of adolescents from the general community are best described by a model that assumes they are indicative of one general factor or by a model that assumes they are two distinct disorders with parallel growth processes. Additional analyses were conducted to explore the comorbidity of adolescent anxiety and depressive disorder symptoms and the effects that adolescent anxiety and depressive disorder symptoms have on each other's symptom severity growth. Two cohorts of early (N = 923; Age range 10-15 years; Mean age = 12.4, SD = .59; Girls = 49%) and middle adolescent (N = 390; Age range 16-20 years; Mean age = 16.7, SD = .80; Girls = 57%) boys and girls from the general community were prospectively studied annually for five years. These two adolescent cohorts were divided into five groups: one group at-risk for developing a specific anxiety disorder and four additional groups of healthy adolescents that differed in age and sex. Self-reported anxiety and depressive disorder symptoms were analyzed with latent growth modeling. Comparison of the fit statistics of the two models clearly demonstrates the superiority of the distinct disorders with parallel growth processes model above the one factor model. It was also demonstrated that the initial symptom severity of either anxiety or depression is predictive of the development of the other, though in different ways for the at-risk and healthy adolescent groups. The results of this study established that the development of anxiety and depressive disorder symptoms of adolescents from the general community occurs as two distinct disorders with parallel growth processes, each with their own unique growth characteristics.

  1. Parallel demand-withdraw processes in family therapy for adolescent drug abuse.

    PubMed

    Rynes, Kristina N; Rohrbaugh, Michael J; Lebensohn-Chialvo, Florencia; Shoham, Varda

    2014-06-01

    Isomorphism, or parallel process, occurs in family therapy when patterns of therapist-client interaction replicate problematic interaction patterns within the family. This study investigated parallel demand-withdraw processes in brief strategic family therapy (BSFT) for adolescent drug abuse, hypothesizing that therapist-demand/adolescent-withdraw interaction (TD/AW) cycles observed early in treatment would predict poor adolescent outcomes at follow-up for families who exhibited entrenched parent-demand/adolescent-withdraw interaction (PD/AW) before treatment began. Participants were 91 families who received at least four sessions of BSFT in a multisite clinical trial on adolescent drug abuse (Robbins et al., 2011). Prior to receiving therapy, families completed videotaped family interaction tasks from which trained observers coded PD/AW. Another team of raters coded TD/AW during two early BSFT sessions. The main dependent variable was the number of drug-use days that adolescents reported in timeline follow-back interviews 7 to 12 months after family therapy began. Zero-inflated Poisson regression analyses supported the main hypothesis, showing that PD/AW and TD/AW interacted to predict adolescent drug use at follow-up. For adolescents in high PD/AW families, higher levels of TD/AW predicted significant increases in drug use at follow-up, whereas for low PD/AW families, TD/AW and follow-up drug use were unrelated. Results suggest that attending to parallel demand-withdraw processes in parent-adolescent and therapist-adolescent dyads may be useful in family therapy for substance-using adolescents.

  2. Adventures in Parallel Processing: Entry, Descent and Landing Simulation for the Genesis and Stardust Missions

    NASA Technical Reports Server (NTRS)

    Lyons, Daniel T.; Desai, Prasun N.

    2005-01-01

    This paper will describe the Entry, Descent and Landing simulation tradeoffs and techniques that were used to provide the Monte Carlo data required to approve entry during a critical period just before entry of the Genesis Sample Return Capsule. The same techniques will be used again when Stardust returns on January 15, 2006. Only one hour was available for the simulation which propagated 2000 dispersed entry states to the ground. Creative simulation tradeoffs combined with parallel processing were needed to provide the landing footprint statistics that were an essential part of the Go/NoGo decision that authorized release of the Sample Return Capsule a few hours before entry.
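
    Dispersed Monte Carlo entry simulation is embarrassingly parallel, since each of the 2000 entry states propagates independently. The sketch below shows only that generic pattern, with invented point-mass dynamics standing in for the actual entry, descent and landing simulation.

        import numpy as np
        from multiprocessing import Pool

        def propagate(seed):
            rng = np.random.default_rng(seed)
            state = rng.normal([0.0, 0.0], [1.0, 0.1])   # dispersed entry state
            for _ in range(100):                         # toy point-mass dynamics
                state = state + np.array([state[1], -0.01 * state[1]])
            return state[0]                              # landing coordinate

        if __name__ == "__main__":
            with Pool() as pool:                         # one trajectory per task
                footprint = pool.map(propagate, range(2000))
            print(np.percentile(footprint, [1, 99]))     # landing footprint statistics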

  3. An Optimization System with Parallel Processing for Reducing Common-Mode Current on Electronic Control Unit

    NASA Astrophysics Data System (ADS)

    Okazaki, Yuji; Uno, Takanori; Asai, Hideki

    In this paper, we propose an optimization system with parallel processing for reducing electromagnetic interference (EMI) on an electronic control unit (ECU). We adopt simulated annealing (SA), genetic algorithm (GA) and taboo search (TS) to seek optimal solutions, and a Spice-like circuit simulator to analyze the common-mode current. The proposed system can therefore determine adequate combinations of parasitic inductance and capacitance values on a printed circuit board (PCB) efficiently and practically, to reduce the EMI caused by the common-mode current. Finally, we apply the proposed system to an example circuit to verify the validity and efficiency of the system.
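
    A minimal sketch of one of the three search strategies named above, simulated annealing over discrete component-value combinations; the candidate values and the cost function (standing in for the common-mode current returned by the circuit simulator) are invented.

        import math, random

        random.seed(0)
        values = [1, 2, 5, 10, 22, 47]          # candidate L/C values (arbitrary units)

        def cost(combo):                        # placeholder for the common-mode current
            return abs(sum(combo) - 60)         # that a circuit simulator would compute

        state = [random.choice(values) for _ in range(4)]
        temp = 10.0
        while temp > 0.01:
            cand = state[:]
            cand[random.randrange(4)] = random.choice(values)   # perturb one component
            delta = cost(cand) - cost(state)
            if delta < 0 or random.random() < math.exp(-delta / temp):
                state = cand                    # accept better, or occasionally worse
            temp *= 0.99                        # cool down
        print(state, cost(state))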

  4. Parallel processing system for rapid analysis of speckle-photography and particle-image-velocimetry data.

    PubMed

    Huntley, J M; Goldrein, H T; Benckert, L R

    1993-06-10

    An automated system has been constructed to process double-exposure speckle-photography and particle-image-velocimetry images. A 3 × 3 array of laser beams probes the photograph, forming nine fringe patterns in parallel; these are then analyzed sequentially by a digital computer using a two-dimensional Fourier-transform method. Results are presented showing that the random errors in the measured displacements from such a system approach the expected speckle-noise-limited performance, with a total analysis time per displacement vector of 160 ms.
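
    The Fourier-transform step can be illustrated compactly: the Young's fringes produced by one probe beam are transformed, and the location of the spectral peak gives the fringe frequency, which is inversely proportional to the fringe spacing and hence proportional to the local displacement. The sketch below uses a synthetic fringe pattern with an invented frequency.

        import numpy as np

        n = 256
        y, x = np.mgrid[0:n, 0:n]
        fx, fy = 12, 5                                   # simulated fringe frequency (cycles/frame)
        fringes = 1 + np.cos(2 * np.pi * (fx * x + fy * y) / n)

        spec = np.abs(np.fft.fft2(fringes - fringes.mean()))  # remove DC, transform
        peak = np.unravel_index(np.argmax(spec), spec.shape)
        print(peak)                                      # (5, 12), or its conjugate (251, 244)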

  5. Leveraging human oversight and intervention in large-scale parallel processing of open-source data

    NASA Astrophysics Data System (ADS)

    Casini, Enrico; Suri, Niranjan; Bradshaw, Jeffrey M.

    2015-05-01

    The popularity of cloud computing, along with the increased availability of cheap storage, has made it necessary to process and transform large volumes of open-source data in parallel. One way to handle such extensive volumes of information properly is to take advantage of distributed computing frameworks like Map-Reduce. Unfortunately, an entirely automated approach that excludes human intervention is often unpredictable and error prone. Highly accurate data processing and decision-making can be achieved by supporting an automatic process through human collaboration, in a variety of environments such as warfare, cyber security and threat monitoring. Although this mutual participation seems easily exploitable, human-machine collaboration in the field of data analysis presents several challenges. First, due to the asynchronous nature of human intervention, it is necessary to verify that once a correction is made, all the necessary reprocessing is propagated down the processing chain. Second, the amount of reprocessing must often be minimized in order to make the best use of limited resources. To meet these strict requirements, this paper introduces improvements to an innovative approach for human-machine collaboration in the processing of large amounts of open-source data in parallel.

  6. Parallel photonic information processing at gigabyte per second data rates using transient states.

    PubMed

    Brunner, Daniel; Soriano, Miguel C; Mirasso, Claudio R; Fischer, Ingo

    2013-01-01

    The increasing demands on information processing require novel computational concepts and true parallelism. Nevertheless, hardware realizations of unconventional computing approaches never exceeded a marginal existence. While the application of optics in super-computing receives reawakened interest, new concepts, partly neuro-inspired, are being considered and developed. Here we experimentally demonstrate the potential of a simple photonic architecture to process information at unprecedented data rates, implementing a learning-based approach. A semiconductor laser subject to delayed self-feedback and optical data injection is employed to solve computationally hard tasks. We demonstrate simultaneous spoken digit and speaker recognition and chaotic time-series prediction at data rates beyond 1 Gbyte/s. We identify all digits with very low classification errors and perform chaotic time-series prediction with 10% error. Our approach bridges the areas of photonic information processing, cognitive and information science.
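
    A software caricature of the learning-based scheme (an echo-state network in NumPy): only a linear readout is trained on the transient states of a fixed dynamical reservoir, which in the experiment is the delayed-feedback semiconductor laser itself; the reservoir size, input and one-step prediction task below are invented.

        import numpy as np

        rng = np.random.default_rng(3)
        n_res, T = 200, 2000
        Win = rng.uniform(-0.5, 0.5, (n_res, 1))
        W = rng.normal(0, 1, (n_res, n_res))
        W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius < 1 (fading memory)

        u = np.sin(np.arange(T + 1) * 0.2)[:, None]       # task: predict the next sample
        states = np.zeros((T, n_res))
        xs = np.zeros(n_res)
        for t in range(T):
            xs = np.tanh(W @ xs + Win @ u[t])             # reservoir transient state
            states[t] = xs

        ridge = 1e-6                                      # train only the linear readout
        Wout = np.linalg.solve(states.T @ states + ridge * np.eye(n_res),
                               states.T @ u[1:])
        pred = states @ Wout
        print(np.sqrt(np.mean((pred - u[1:]) ** 2)))      # small one-step prediction error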

  7. Parallel photonic information processing at gigabyte per second data rates using transient states

    NASA Astrophysics Data System (ADS)

    Brunner, Daniel; Soriano, Miguel C.; Mirasso, Claudio R.; Fischer, Ingo

    2013-01-01

    The increasing demands on information processing require novel computational concepts and true parallelism. Nevertheless, hardware realizations of unconventional computing approaches never exceeded a marginal existence. While the application of optics in super-computing receives reawakened interest, new concepts, partly neuro-inspired, are being considered and developed. Here we experimentally demonstrate the potential of a simple photonic architecture to process information at unprecedented data rates, implementing a learning-based approach. A semiconductor laser subject to delayed self-feedback and optical data injection is employed to solve computationally hard tasks. We demonstrate simultaneous spoken digit and speaker recognition and chaotic time-series prediction at data rates beyond 1 Gbyte/s. We identify all digits with very low classification errors and perform chaotic time-series prediction with 10% error. Our approach bridges the areas of photonic information processing, cognitive and information science.

  8. Towards a Standard Mixed-Signal Parallel Processing Architecture for Miniature and Microrobotics

    PubMed Central

    Sadler, Brian M; Hoyos, Sebastian

    2014-01-01

    The conventional analog-to-digital conversion (ADC) and digital signal processing (DSP) architecture has led to major advances in miniature and micro-systems technology over the past several decades. The outlook for these systems is significantly enhanced by advances in sensing, signal processing, communications and control, and the combination of these technologies enables autonomous robotics on the miniature to micro scales. In this article we look at trends in the combination of analog and digital (mixed-signal) processing, and consider a generalized sampling architecture. Employing a parallel analog basis expansion of the input signal, this scalable approach is adaptable and reconfigurable, and is suitable for a large variety of current and future applications in networking, perception, cognition, and control. PMID:26601042
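
    The generalized sampling idea can be stated in a few lines of linear algebra: each parallel analog branch correlates the input against one basis function, and the sampled correlations reconstruct any signal in the span of the basis. The sketch below (NumPy, with an invented Fourier basis and test signal) checks this numerically.

        import numpy as np

        t = np.linspace(0, 1, 1000, endpoint=False)
        signal = np.sin(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 7 * t)

        # parallel branches: an orthonormal Fourier basis over [0, 1)
        ks = np.arange(1, 9)
        basis = np.concatenate([np.sqrt(2) * np.sin(2 * np.pi * np.outer(ks, t)),
                                np.sqrt(2) * np.cos(2 * np.pi * np.outer(ks, t))])
        coeffs = basis @ signal / len(t)          # per-branch correlate-and-sample
        recon = coeffs @ basis                    # reconstruct from branch samples
        print(np.max(np.abs(recon - signal)))     # ~0: the signal lies in the basis span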

  9. Towards a Standard Mixed-Signal Parallel Processing Architecture for Miniature and Microrobotics.

    PubMed

    Sadler, Brian M; Hoyos, Sebastian

    2014-01-01

    The conventional analog-to-digital conversion (ADC) and digital signal processing (DSP) architecture has led to major advances in miniature and micro-systems technology over the past several decades. The outlook for these systems is significantly enhanced by advances in sensing, signal processing, communications and control, and the combination of these technologies enables autonomous robotics on the miniature to micro scales. In this article we look at trends in the combination of analog and digital (mixed-signal) processing, and consider a generalized sampling architecture. Employing a parallel analog basis expansion of the input signal, this scalable approach is adaptable and reconfigurable, and is suitable for a large variety of current and future applications in networking, perception, cognition, and control.

  10. Parenting and the parallel processes in parents' counseling supervision for eating-related problems.

    PubMed

    Golan, Moria

    2014-04-01

    This paper presents an integrative model for supervising counselors of parents who face eating-related problems in their families. The model is grounded in the theory of parallel processes, which occur during the supervision of health-care professionals as well as the counseling of parents and patients. The aim of this model is to conceptualize components and processes in the supervision space, in order to: (a) create a nurturing environment for health-care facilitators, parents and children, (b) better understand the complex and difficult nature of parenting, the challenge counselors face, and the skills and practices used in parenting and in counseling, and (c) encourage better ownership of practices and oppose the judgment that often dominates counseling and supervision. This paper reflects upon the tradition of supervision and offers a comprehensive view of this process, including its challenges, skills and practices.

  11. Parallelized multi-graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy

    PubMed Central

    Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.

    2014-01-01

    Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing. PMID:24695868

  12. Parallelized multi-graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy.

    PubMed

    Tankam, Patrice; Santhanam, Anand P; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P

    2014-07-01

    Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing.

  13. Parallelized multi-graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy

    NASA Astrophysics Data System (ADS)

    Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.

    2014-07-01

    Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing.
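
    The control idea, batching A-scans per device and processing the batches concurrently, can be sketched in a few lines. In the hedged stand-in below, CPU worker processes play the role of GPUs and an FFT plays the role of the GD-OCM processing chain; the batch sizes and volume dimensions are illustrative.

```python
# Sketch of the parallelized control mechanism: split the A-scans into
# batches, assign each batch to one worker ("GPU"), and process batches
# concurrently. FFT stands in for the actual GD-OCM processing chain.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def process_batch(batch):
    # stand-in for resampling + FFT + log-magnitude of each A-scan
    return 20 * np.log10(np.abs(np.fft.rfft(batch, axis=1)) + 1e-12)

def process_volume(ascans, n_devices=4):
    # one contiguous batch per device, sized to fill its memory evenly
    batches = np.array_split(ascans, n_devices)
    with ProcessPoolExecutor(max_workers=n_devices) as pool:
        results = list(pool.map(process_batch, batches))
    return np.concatenate(results)

if __name__ == "__main__":
    ascans = np.random.rand(1000, 2048)   # scaled-down volume for the sketch
    image = process_volume(ascans)
    print(image.shape)
```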

  14. Decomposition: A Strategy for Query Processing.

    ERIC Educational Resources Information Center

    Wong, Eugene; Youssefi, Karel

    Multivariable queries can be processed in the data base management system INGRES. The general procedure is to decompose the query into a sequence of one-variable queries using two processes. One process is reduction, which requires breaking off components of the query that are joined to it by a single variable. The other process,…

  15. Efficient Process Migration for Parallel Processing on Non-Dedicated Networks of Workstations

    NASA Technical Reports Server (NTRS)

    Chanchio, Kasidit; Sun, Xian-He

    1996-01-01

    This paper presents the design and preliminary implementation of MpPVM, a software system that supports process migration for PVM application programs in a non-dedicated heterogeneous computing environment. New concepts of migration point as well as migration point analysis and necessary data analysis are introduced. In MpPVM, process migrations occur only at previously inserted migration points. Migration point analysis determines appropriate locations to insert migration points, whereas necessary data analysis provides a minimum set of variables to be transferred at each migration point. A new methodology to perform reliable point-to-point data communications in a migration environment is also discussed. Finally, a preliminary implementation of MpPVM and its experimental results are presented, showing the correctness and promising performance of our process migration mechanism in a scalable non-dedicated heterogeneous computing environment. While MpPVM is developed on top of PVM, the process migration methodology introduced in this study is general and can be applied to any distributed software environment.
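
    The migration-point concept is easy to illustrate: at previously chosen points, only the minimal set of live variables is saved, so execution can resume elsewhere from that point. The Python sketch below is a loose analogy of that mechanism, not MpPVM code; the checkpoint file name and loop structure are invented for illustration.

```python
# Sketch of the migration-point idea: at inserted points, save only the
# minimal live state (the result of "necessary data analysis"), so the
# process can restart on another host from that point.
import os
import pickle

CKPT = "migration_point.pkl"   # hypothetical checkpoint file

def long_computation():
    # restore minimal state if we were migrated here
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            i, acc = pickle.load(f)
    else:
        i, acc = 0, 0.0

    while i < 1_000_000:
        acc += i * 0.5           # the actual work
        i += 1
        if i % 100_000 == 0:     # a migration point inserted by analysis
            # only (i, acc) are live across this point: save just those
            with open(CKPT, "wb") as f:
                pickle.dump((i, acc), f)
    os.remove(CKPT)              # computation finished, discard checkpoint
    return acc

print(long_computation())
```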

  16. Investigation of Mediational Processes Using Parallel Process Latent Growth Curve Modeling.

    ERIC Educational Resources Information Center

    Cheong, JeeWon; MacKinnon, David P.; Khoo, Siek Toon

    2003-01-01

    Investigated a method to evaluate mediational processes using latent growth curve modeling and tested it with empirical data from a longitudinal steroid use prevention program focusing on 1,506 high school football players over 4 years. Findings suggest the usefulness of the approach. (SLD)

  17. Parallel processing and image analysis in the eyes of mantis shrimps.

    PubMed

    Cronin, T W; Marshall, J

    2001-04-01

    The compound eyes of mantis shrimps, a group of tropical marine crustaceans, incorporate principles of serial and parallel processing of visual information that may be applicable to artificial imaging systems. Their eyes include numerous specializations for analysis of the spectral and polarizational properties of light, and include more photoreceptor classes for analysis of ultraviolet light, color, and polarization than occur in any other known visual system. This is possible because receptors in different regions of the eye are anatomically diverse and incorporate unusual structural features, such as spectral filters, not seen in other compound eyes. Unlike eyes of most other animals, eyes of mantis shrimps must move to acquire some types of visual information and to integrate color and polarization with spatial vision. Information leaving the retina appears to be processed into numerous parallel data streams leading into the central nervous system, greatly reducing the analytical requirements at higher levels. Many of these unusual features of mantis shrimp vision may inspire new sensor designs for machine vision.

  18. Distributed representation of social odors indicates parallel processing in the antennal lobe of ants.

    PubMed

    Brandstaetter, Andreas Simon; Kleineidam, Christoph Johannes

    2011-11-01

    In colonies of eusocial Hymenoptera cooperation is organized through social odors, and particularly ants rely on a sophisticated odor communication system. Neuronal information about odors is represented in spatial activity patterns in the primary olfactory neuropile of the insect brain, the antennal lobe (AL), which is analog to the vertebrate olfactory bulb. The olfactory system is characterized by neuroanatomical compartmentalization, yet the functional significance of this organization is unclear. Using two-photon calcium imaging, we investigated the neuronal representation of multicomponent colony odors, which the ants assess to discriminate friends (nestmates) from foes (nonnestmates). In the carpenter ant Camponotus floridanus, colony odors elicited spatial activity patterns distributed across different AL compartments. Activity patterns in response to nestmate and nonnestmate colony odors were overlapping. This was expected since both consist of the same components at differing ratios. Colony odors change over time and the nervous system has to constantly adjust for this (template reformation). Measured activity patterns were variable, and variability was higher in response to repeated nestmate than to repeated nonnestmate colony odor stimulation. Variable activity patterns may indicate neuronal plasticity within the olfactory system, which is necessary for template reformation. Our results indicate that information about colony odors is processed in parallel in different neuroanatomical compartments, using the computational power of the whole AL network. Parallel processing might be advantageous, allowing reliable discrimination of highly complex social odors.

  19. Process performance of parallel bioreactors for batch cultivation of Streptomyces tendae.

    PubMed

    Hortsch, Ralf; Krispin, Harald; Weuster-Botz, Dirk

    2011-03-01

    Batch cultivations of the nikkomycin Z producer Streptomyces tendae were performed in three different parallel bioreactor systems (milliliter-scale stirred-tank reactors, shake flasks and shaken microtiter plate) in comparison to a standard liter-scale stirred-tank reactor as reference. Similar dry cell weight concentrations were measured as a function of process time in stirred-tank reactors and shake flasks, whereas only poor growth was observed in the shaken microtiter plate. In contrast, the nikkomycin Z production differed significantly between the stirred and shaken bioreactors. The measured product concentrations and product formation kinetics were almost the same in the stirred-tank bioreactors of different scale. Much less nikkomycin Z was formed in the shake flasks and MTP cultivations, most probably due to oxygen limitations. To investigate the non-Newtonian shear-thinning behavior of the culture broth in small-scale bioreactors, a new and simple method was applied to estimate its rheology. The apparent viscosities were found to be very similar in the stirred-tank bioreactors, whereas the apparent viscosity was up to two times higher in the shake flask cultivations due to the lower average shear rate of this reactor system. These data illustrate that different engineering characteristics of parallel bioreactors applied for process development can have major implications for scale-up of bioprocesses with non-Newtonian viscous culture broths.

  20. Accelerated multidimensional radiofrequency pulse design for parallel transmission using concurrent computation on multiple graphics processing units.

    PubMed

    Deng, Weiran; Yang, Cungeng; Stenger, V Andrew

    2011-02-01

    Multidimensional radiofrequency (RF) pulses are of current interest because of their promise for improving high-field imaging and for optimizing parallel transmission methods. One major drawback is that the computation time of numerically designed multidimensional RF pulses increases rapidly with their resolution and number of transmitters. This is critical because the construction of multidimensional RF pulses often needs to be in real time. The use of graphics processing units for computations is a recent approach for accelerating image reconstruction applications. We propose the use of graphics processing units for the design of multidimensional RF pulses including the utilization of parallel transmitters. Using a desktop computer with four NVIDIA Tesla C1060 computing processors, we found acceleration factors on the order of 20 for standard eight-transmitter two-dimensional spiral RF pulses with a 64 × 64 excitation resolution and a 10-μsec dwell time. We also show that even greater acceleration factors can be achieved for more complex RF pulses. Copyright © 2010 Wiley-Liss, Inc.

  1. A Parallel Process Growth Model of Avoidant Personality Disorder Symptoms and Personality Traits

    PubMed Central

    Wright, Aidan G. C.; Pincus, Aaron L.; Lenzenweger, Mark F.

    2012-01-01

    Background Avoidant personality disorder (AVPD), like other personality disorders, has historically been construed as a highly stable disorder. However, results from a number of longitudinal studies have found that the symptoms of AVPD demonstrate marked change over time. Little is known about which other psychological systems are related to this change. Although cross-sectional research suggests a strong relationship between AVPD and personality traits, no work has examined the relationship of their change trajectories. The current study sought to establish the longitudinal relationship between AVPD and basic personality traits using parallel process growth curve modeling. Methods Parallel process growth curve modeling was applied to the trajectories of AVPD and basic personality traits from the Longitudinal Study of Personality Disorders (Lenzenweger, 2006), a naturalistic, prospective, multiwave, longitudinal study of personality disorder, temperament, and normal personality. The focus of these analyses is on the relationship between the rates of change in both AVPD symptoms and basic personality traits. Results AVPD symptom trajectories demonstrated significant negative relationships with the trajectories of interpersonal dominance and affiliation, and a significant positive relationship to rates of change in neuroticism. Conclusions These results provide some of the first compelling evidence that trajectories of change in PD symptoms and personality traits are linked. These results have important implications for the ways in which temporal stability is conceptualized in AVPD specifically, and PD in general. PMID:22506627

  2. A parallel process growth model of avoidant personality disorder symptoms and personality traits.

    PubMed

    Wright, Aidan G C; Pincus, Aaron L; Lenzenweger, Mark F

    2013-07-01

    Avoidant personality disorder (AVPD), like other personality disorders, has historically been construed as a highly stable disorder. However, results from a number of longitudinal studies have found that the symptoms of AVPD demonstrate marked change over time. Little is known about which other psychological systems are related to this change. Although cross-sectional research suggests a strong relationship between AVPD and personality traits, no work has examined the relationship of their change trajectories. The current study sought to establish the longitudinal relationship between AVPD and basic personality traits using parallel process growth curve modeling. Parallel process growth curve modeling was applied to the trajectories of AVPD and basic personality traits from the Longitudinal Study of Personality Disorders (Lenzenweger, M. F., 2006, The longitudinal study of personality disorders: History, design considerations, and initial findings. Journal of Personality Disorders, 20, 645-670. doi:10.1521/pedi.2006.20.6.645), a naturalistic, prospective, multiwave, longitudinal study of personality disorder, temperament, and normal personality. The focus of these analyses is on the relationship between the rates of change in both AVPD symptoms and basic personality traits. AVPD symptom trajectories demonstrated significant negative relationships with the trajectories of interpersonal dominance and affiliation, and a significant positive relationship to rates of change in neuroticism. These results provide some of the first compelling evidence that trajectories of change in PD symptoms and personality traits are linked. These results have important implications for the ways in which temporal stability is conceptualized in AVPD specifically, and PD in general.

  3. Mobile Monitoring Data Processing and Analysis Strategies

    EPA Science Inventory

    The development of portable, high-time resolution instruments for measuring the concentrations of a variety of air pollutants has made it possible to collect data while in motion. This strategy, known as mobile monitoring, involves mounting air sensors on variety of different pla...

  4. Mobile Monitoring Data Processing & Analysis Strategies

    EPA Science Inventory

    The development of portable, high-time resolution instruments for measuring the concentrations of a variety of air pollutants has made it possible to collect data while in motion. This strategy, known as mobile monitoring, involves mounting air sensors on variety of different pla...

  5. Mobile Monitoring Data Processing & Analysis Strategies

    EPA Science Inventory

    The development of portable, high-time resolution instruments for measuring the concentrations of a variety of air pollutants has made it possible to collect data while in motion. This strategy, known as mobile monitoring, involves mounting air sensors on variety of different pla...

  6. Mobile Monitoring Data Processing and Analysis Strategies

    EPA Science Inventory

    The development of portable, high-time resolution instruments for measuring the concentrations of a variety of air pollutants has made it possible to collect data while in motion. This strategy, known as mobile monitoring, involves mounting air sensors on variety of different pla...

  7. Recurrent modification of floral morphology in heterantherous Solanum reveals a parallel shift in reproductive strategy

    PubMed Central

    Vallejo-Marín, Mario; Walker, Catriona; Friston-Reilly, Philip; Solís-Montero, Lislie; Igic, Boris

    2014-01-01

    Floral morphology determines the pattern of pollen transfer within and between individuals. In hermaphroditic species, the spatial arrangement of sexual organs influences the rate of self-pollination as well as the placement of pollen in different areas of the pollinator's body. Studying the evolutionary modification of floral morphology in closely related species offers an opportunity to investigate the causes and consequences of floral variation. Here, we investigate the recurrent modification of flower morphology in three closely related pairs of taxa in Solanum section Androceras (Solanaceae), a group characterized by the presence of two morphologically distinct types of anthers in the same flower (heteranthery). We use morphometric analyses of plants grown in a common garden to characterize and compare the changes in floral morphology observed in parallel evolutionary transitions from relatively larger to smaller flowers. Our results indicate that the transition to smaller flowers is associated with a reduction in the spatial separation of anthers and stigma, changes in the allometric relationships among floral traits, shifts in pollen allocation to the two anther morphs and reduced pollen : ovule ratios. We suggest that floral modification in this group reflects parallel evolution towards increased self-fertilization and discuss potential selective scenarios that may favour this recurrent shift in floral morphology and function. PMID:25002701

  8. High speed image space parallel processing for computer-generated integral imaging system.

    PubMed

    Kwon, Ki-Chul; Park, Chan; Erdenebat, Munkh-Uchral; Jeong, Ji-Seong; Choi, Jeong-Hun; Kim, Nam; Park, Jae-Hyeung; Lim, Young-Tae; Yoo, Kwan-Hee

    2012-01-16

    In an integral imaging display, the computer-generated integral imaging method has been widely used to create the elemental images from given three-dimensional object data. Long processing time, however, has been problematic, especially when the three-dimensional object data set or the number of elemental lenses is large. In this paper, we propose an image space parallel processing method, which is implemented by using Open Computing Language (OpenCL) for rapid generation of the elemental image sets from large three-dimensional volume data. Using the proposed technique, it is possible to realize a real-time interactive integral imaging display system for 3D volume data constructed from computed tomography (CT) or magnetic resonance imaging (MRI) data.

  9. A parallel-processing approach to computing for the geographic sciences

    USGS Publications Warehouse

    Crane, Michael; Steinwand, Dan; Beckmann, Tim; Krpan, Greg; Haga, Jim; Maddox, Brian; Feller, Mark

    2001-01-01

    The overarching goal of this project is to build a spatially distributed infrastructure for information science research by forming a team of information science researchers and providing them with similar hardware and software tools to perform collaborative research. Four geographically distributed Centers of the U.S. Geological Survey (USGS) are developing their own clusters of low-cost personal computers into parallel computing environments that provide a cost-effective way for the USGS to increase participation in the high-performance computing community. Referred to as Beowulf clusters, these hybrid systems provide the robust computing power required for conducting research into various areas, such as advanced computer architecture, algorithms to meet the processing needs for real-time image and data processing, the creation of custom datasets from seamless source data, rapid turn-around of products for emergency response, and support for computationally intense spatial and temporal modeling.

  10. Churchill: an ultra-fast, deterministic, highly scalable and balanced parallelization strategy for the discovery of human genetic variation in clinical and population-scale genomics.

    PubMed

    Kelly, Benjamin J; Fitch, James R; Hu, Yangqiu; Corsmeier, Donald J; Zhong, Huachun; Wetzel, Amy N; Nordquist, Russell D; Newsom, David L; White, Peter

    2015-01-20

    While advances in genome sequencing technology make population-scale genomics a possibility, current approaches for analysis of these data rely upon parallelization strategies that have limited scalability, complex implementation and lack reproducibility. Churchill, a balanced regional parallelization strategy, overcomes these challenges, fully automating the multiple steps required to go from raw sequencing reads to variant discovery. Through implementation of novel deterministic parallelization techniques, Churchill allows computationally efficient analysis of a high-depth whole genome sample in less than two hours. The method is highly scalable, enabling full analysis of the 1000 Genomes raw sequence dataset in a week using cloud resources. http://churchill.nchri.org/.
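
    The core of a balanced regional strategy is to split the genome into fixed-size regions, rather than whole chromosomes, so every worker receives a similar amount of work. Below is a minimal sketch of that partitioning idea, with illustrative chromosome lengths and region size and a stand-in for the per-region variant-calling step; it is not Churchill's actual implementation.

```python
# Sketch of balanced regional parallelization: cut the genome into
# equally sized regions so every worker gets the same amount of work,
# then process regions independently.
from concurrent.futures import ProcessPoolExecutor

CHROMS = {"chr1": 248_956_422, "chr2": 242_193_529, "chr3": 198_295_559}
REGION = 50_000_000  # region size chosen for load balance (assumed)

def make_regions(chroms, size):
    for name, length in chroms.items():
        for start in range(0, length, size):
            yield (name, start, min(start + size, length))

def call_variants(region):
    name, start, end = region
    # stand-in for per-region alignment/variant-calling work
    return (name, start, end, (end - start) // 1_000_000)

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        for res in pool.map(call_variants, make_regions(CHROMS, REGION)):
            print(res)
```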

  11. Teaching ethics to engineers: ethical decision making parallels the engineering design process.

    PubMed

    Bero, Bridget; Kuhlman, Alana

    2011-09-01

    In order to fulfill ABET requirements, Northern Arizona University's Civil and Environmental engineering programs incorporate professional ethics in several of its engineering courses. This paper discusses an ethics module in a 3rd year engineering design course that focuses on the design process and technical writing. Engineering students early in their student careers generally possess good black/white critical thinking skills on technical issues. Engineering design is the first time students are exposed to "grey" or multiple possible solution technical problems. To identify and solve these problems, the engineering design process is used. Ethical problems are also "grey" problems and present similar challenges to students. Students need a practical tool for solving these ethical problems. The step-wise engineering design process was used as a model to demonstrate a similar process for ethical situations. The ethical decision making process of Martin and Schinzinger was adapted for parallelism to the design process and presented to students as a step-wise technique for identification of the pertinent ethical issues, relevant moral theories, possible outcomes and a final decision. Students had greatest difficulty identifying the broader, global issues presented in an ethical situation, but by the end of the module, were better able to not only identify the broader issues, but also to more comprehensively assess specific issues, generate solutions and a desired response to the issue.

  12. Integration of optoelectronic technologies for chip-to- chip interconnections and parallel pipeline processing

    NASA Astrophysics Data System (ADS)

    Wu, Jenming

    Digital information services such as multimedia systems and data communications require the processing and transfer of a tremendous amount of data. These data need to be stored, accessed and delivered efficiently and reliably at high speed for various user applications. This represents a great challenge for current electronic systems. Electronics is effective in providing high performance processing and computation, but its input/output (I/O) bandwidth is unable to scale with its processing power. Signal I/Os or interconnections are needed between processors and input devices, between processors for multiprocessor systems, and between processors and storage devices. Novel chip-to-chip interconnect technologies are needed to meet this challenge. This work integrates optoelectronic technologies for chip-to-chip interconnects and parallel pipeline processing. Photonic and electronic technologies are complementary to each other in the sense that electronics is more suitable for high-speed, low cost computation, and photonics is more suitable for high-bandwidth information transmission. Smart pixel technology uses electronics for logic switching and optics for chip-to- chip interconnects, thus combining the abilities of photonics and electronics nicely. This work describes both vertical and horizontal integration of smart pixel technologies for chip-to-chip optical interconnects and its applications. We present smart pixel VLSI designs in both hybrid CMOS/MQW smart pixel and monolithic GaAs smart pixel technologies. We use the CMOS/MQW technology for smart pixel array cellular logic (SPARCL) processors for SIMD parallel pipeline processing. We have tested the chip and constructed a prototype system for device characterization and system demonstration. We have verified the functionality of the system and characterized the electrical functions of the chip and the optoelectronic properties of the MQW devices. We have developed algorithms that utilize SPARCL for various

  13. MC64-ClustalWP2: A Highly-Parallel Hybrid Strategy to Align Multiple Sequences in Many-Core Architectures

    PubMed Central

    Díaz, David; Esteban, Francisco J.; Hernández, Pilar; Caballero, Juan Antonio; Guevara, Antonio

    2014-01-01

    We have developed the MC64-ClustalWP2 as a new implementation of the Clustal W algorithm, integrating a novel parallelization strategy and significantly increasing the performance when aligning long sequences in architectures with many cores. It must be stressed that in such a process, the detailed analysis of both the software and hardware features and peculiarities is of paramount importance to reveal key points to exploit and optimize the full potential of parallelism in many-core CPU systems. The new parallelization approach has focused on the most time-consuming stages of this algorithm. In particular, the so-called progressive alignment stage has drastically improved in performance, due to a fine-grained approach where the forward and backward loops were unrolled and parallelized. Another key approach has been the implementation of the new algorithm in a hybrid-computing system, integrating both an Intel Xeon multi-core CPU and a Tilera Tile64 many-core card. A comparison with other Clustal W implementations reveals the high performance of the new algorithm and strategy in many-core CPU architectures, in a scenario where the sequences to align are relatively long (more than 10 kb) and, hence, a many-core GPU hardware cannot be used. Thus, the MC64-ClustalWP2 runs multiple alignments more than 18x faster than the original Clustal W algorithm, and more than 7x faster than the best x86 parallel implementation to date, being publicly available through a web service. Besides, these developments have been deployed in cost-effective personal computers and should be useful for life-science researchers, including the identification of identities and differences for mutation/polymorphism analyses, biodiversity and evolutionary studies and for the development of molecular markers for paternity testing, germplasm management and protection, to assist breeding, illegal traffic control, fraud prevention and for the protection of the intellectual property (identification
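
    One stage that parallelizes naturally in Clustal-style pipelines is the all-against-all pairwise comparison that feeds the guide tree. The sketch below distributes sequence pairs across worker processes; the k-mer Jaccard distance is a simplified stand-in for the real pairwise alignment, and the sequences are toy data.

```python
# Sketch of the parallelization idea: the all-against-all pairwise stage
# of Clustal-style alignment is embarrassingly parallel, so pairs are
# farmed out to workers. A k-mer distance stands in for real alignment.
from concurrent.futures import ProcessPoolExecutor
from itertools import combinations

SEQS = ["ACGTACGTGG", "ACGTTCGTGA", "TTGTACGAGG", "ACGAACGTGG"]

def kmers(s, k=3):
    return {s[i:i + k] for i in range(len(s) - k + 1)}

def pair_distance(pair):
    i, j = pair
    a, b = kmers(SEQS[i]), kmers(SEQS[j])
    return i, j, 1.0 - len(a & b) / len(a | b)   # Jaccard distance

if __name__ == "__main__":
    pairs = list(combinations(range(len(SEQS)), 2))
    with ProcessPoolExecutor() as pool:
        for i, j, d in pool.map(pair_distance, pairs):
            print(f"d({i},{j}) = {d:.3f}")
```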

  14. Rankings in Institutional Strategies and Processes: Impact or Illusion?

    ERIC Educational Resources Information Center

    Hazelkorn, Ellen; Loukkola, Tia; Zhang, Thérèse

    2014-01-01

    The "Rankings in Institutional Strategies and Processes" (RISP) project is the first pan-European study of the impact and influence of rankings on European higher education institutions. The project has sought to build understanding of how rankings impact and influence the development of institutional strategies and processes and its…

  15. Rankings in Institutional Strategies and Processes: Impact or Illusion?

    ERIC Educational Resources Information Center

    Hazelkorn, Ellen; Loukkola, Tia; Zhang, Thérèse

    2014-01-01

    The "Rankings in Institutional Strategies and Processes" (RISP) project is the first pan-European study of the impact and influence of rankings on European higher education institutions. The project has sought to build understanding of how rankings impact and influence the development of institutional strategies and processes and its…

  16. Surface preparation strategies for improved parallelization and reproducible MALDI-TOF MS ligand binding assays.

    PubMed

    Roth, Michael J; Maresh, Erica M; Plymire, Daniel A; Zhang, Junmei; Corbett, John R; Robbins, Roger; Patrie, Steven M

    2013-01-01

    Immunoassays are employed in academia and the healthcare and biotech industries for high-throughput, quantitative screens of biomolecules. We have developed monolayer-based immunoassays for MALDI-TOF MS. To improve parallelization, we adapted the workflow to photolithography-generated arrays. Our work shows Parylene-C coatings provide excellent "solvent pinning" for reagents and biofluids, enabling sensitive MS detection of immobilized components. With a unique MALDI-matrix crystallization technique we show routine interassay RSD <10% at picomolar concentrations and highlight platform compatibility for relative and label-free quantitation applications. Parylene-arrays provide high sample densities and promise screening throughputs exceeding 1000 samples/h with modern liquid-handlers and MALDI-TOF systems.

  17. FPGA implementation of current-sharing strategy for parallel-connected SEPICs

    NASA Astrophysics Data System (ADS)

    Ezhilarasi, A.; Ramaswamy, M.

    2016-01-01

    The aim is to evolve an equal current-sharing algorithm for a number of single-ended primary inductance converters connected in parallel. The methodology involves the development of a state-space model to predict the condition for the existence of a stable equilibrium portrait. A variable structure controller guides the trajectory, with a view to circumvent the circuit non-linearities and arrive at a stable performance through a preferred operating range. The design elicits acceptable servo and regulatory characteristics and the desired time response, and ensures regulation of the load voltage. The simulation results, validated through a field programmable gate array-based prototype, serve to illustrate its suitability for present-day applications.

  18. Research on B Cell Algorithm for Learning to Rank Method Based on Parallel Strategy

    PubMed Central

    Tian, Yuling; Zhang, Hongxian

    2016-01-01

    For the purposes of information retrieval, users must find highly relevant documents from within a system (and often a quite large one comprised of many individual documents) based on input query. Ranking the documents according to their relevance within the system to meet user needs is a challenging endeavor, and a hot research topic; there already exist several rank-learning methods based on machine learning techniques which can generate ranking functions automatically. This paper proposes a parallel B cell algorithm, RankBCA, for rank learning which utilizes a clonal selection mechanism based on biological immunity. The novel algorithm is compared with traditional rank-learning algorithms through experimentation and shown to outperform the others with respect to accuracy, learning time, and convergence rate; taken together, the experimental results show that the proposed algorithm indeed effectively and rapidly identifies optimal ranking functions. PMID:27487242
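
    The clonal selection principle behind such B cell algorithms condenses to: select high-affinity candidates, clone them, hypermutate the clones, and re-select. Below is a minimal sketch applied to a linear ranking function; the population size, mutation scale, and pairwise-accuracy fitness are assumptions, not RankBCA's actual operators.

```python
# Minimal clonal-selection sketch in the spirit of a B cell algorithm:
# keep a population of candidate ranking weight vectors, clone and mutate
# the best ones, and keep improvements. Fitness is toy pairwise accuracy.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))              # document feature vectors (toy)
true_w = np.array([2.0, -1.0, 0.5, 0.0, 1.0])
scores = X @ true_w                        # toy relevance scores

def fitness(w):
    pred = X @ w
    i, j = rng.integers(0, len(X), (2, 500))
    return np.mean((pred[i] > pred[j]) == (scores[i] > scores[j]))

pop = rng.normal(size=(20, 5))
for gen in range(50):
    fit = np.array([fitness(w) for w in pop])
    best = pop[np.argsort(fit)[-5:]]                    # select best B cells
    clones = np.repeat(best, 4, axis=0)                 # clonal expansion
    clones += rng.normal(scale=0.1, size=clones.shape)  # hypermutation
    pop = clones
print("best pairwise accuracy:", max(fitness(w) for w in pop))
```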

  19. Neural processes in symmetry perception: a parallel spatio-temporal model.

    PubMed

    Zhu, Tao

    2014-04-01

    Symmetry is usually computationally expensive to detect reliably, while it is relatively easy to perceive. In spite of many attempts to understand the neurofunctional properties of symmetry processing, no symmetry-specific activation was found in earlier cortical areas. Psychophysical evidence relating to the processing mechanisms suggests that the basic processes of symmetry perception would not perform a serial, point-by-point comparison of structural features but rather operate in parallel. Here, modeling of neural processes in psychophysical detection of bilateral texture symmetry is considered. A simple fine-grained algorithm that is capable of performing symmetry estimation without explicit comparison of remote elements is introduced. A computational model of symmetry perception is then described to characterize the underlying mechanisms as one-dimensional spatio-temporal neural processes, each of which is mediated by intracellular horizontal connections in primary visual cortex and adopts the proposed algorithm for the neural computation. Simulated experiments have been performed to show the efficiency and the dynamics of the model. Model and human performances are comparable for symmetry perception of intensity images. Interestingly, the responses of V1 neurons to propagation activities reflecting higher-order perceptual computations have been reported in neurophysiologic experiments.

  20. P3BSseq: parallel processing pipeline software for automatic analysis of bisulfite sequencing data.

    PubMed

    Luu, Phuc-Loi; Gerovska, Daniela; Arrospide-Elgarresta, Mikel; Retegi-Carrión, Sugoi; Schöler, Hans R; Araúzo-Bravo, Marcos J

    2017-02-01

    Bisulfite sequencing (BSseq) processing is among the most cumbersome next generation sequencing (NGS) applications. Though some BSseq processing tools are available, they are scattered, require puzzling parameters, and demand long running times and heavy memory usage. We developed P3BSseq, a parallel processing pipeline for fast, accurate and automatic analysis of BSseq reads that trims, aligns, annotates, records the intermediate results, performs bisulfite conversion quality assessment, generates BED methylome and report files following the NIH standards. P3BSseq outperforms the known BSseq mappers regarding running time, computer hardware requirements (processing power and memory use) and is optimized to process the upcoming, extended BSseq reads. We optimized the P3BSseq parameters for directional and non-directional libraries, and for single-end and paired-end reads of Whole Genome and Reduced Representation BSseq. P3BSseq is a user-friendly streamlined solution for BSseq upstream analysis, requiring only basic computer and NGS knowledge. P3BSseq binaries and documentation are available at: http://sourceforge.net/p/p3bsseq/wiki/Home/ mararabra@yahoo.co.uk Supplementary data are available at Bioinformatics online.
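
    Structurally, a BSseq pipeline is a chain of per-read stages (trim, align against a converted reference, call methylation) that can run over read batches in parallel. The sketch below shows that shape only; every stage body is a toy stand-in rather than P3BSseq's implementation.

```python
# Sketch of a parallel BSseq-style pipeline: reads flow through
# trim -> align -> methylation-call stages, processed concurrently.
# All stage bodies are simplified stand-ins, not P3BSseq internals.
from concurrent.futures import ProcessPoolExecutor

def trim(read):
    return read.strip("N")                 # stand-in for adapter/quality trim

def bisulfite_align(read):
    # real tools align against a C->T converted genome; stand-in here
    return read.replace("C", "T"), read

def call_methylation(aligned):
    conv, orig = aligned
    # count positions read as converted (toy methylation call)
    return sum(1 for a, b in zip(conv, orig) if a == "T" and b == "C")

def process_read(read):
    return call_methylation(bisulfite_align(trim(read)))

if __name__ == "__main__":
    reads = ["NNACGTCCGTNN", "CCGGTACGTA", "TTCCGGAACC"]
    with ProcessPoolExecutor() as pool:
        print(list(pool.map(process_read, reads)))
```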

  1. Method of moment solutions to scattering problems in a parallel processing environment

    NASA Technical Reports Server (NTRS)

    Cwik, Tom; Partee, Jonathan; Patterson, Jean

    1991-01-01

    This paper describes the implementation of a parallelized method of moments (MOM) code into an interactive workstation environment. The workstation allows interactive solid body modeling and mesh generation, MOM analysis, and the graphical display of results. After describing the parallel computing environment, the implementation and results of parallelizing a general MOM code are presented in detail.

  2. Massively Parallel Signal Processing using the Graphics Processing Unit for Real-Time Brain–Computer Interface Feature Extraction

    PubMed Central

    Wilson, J. Adam; Williams, Justin C.

    2009-01-01

    The clock speeds of modern computer processors have nearly plateaued in the past 5 years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck, in that it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. Therefore, in this study a method of using the processing capabilities of a graphics card [graphics processing unit (GPU)] was developed for real-time neural signal processing of a brain–computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which is capable of running many operations in parallel, potentially greatly increasing the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix–matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density on every channel using an auto-regressive method, and finally classifying appropriate features for control. In this study, the first two computationally intensive steps were implemented on the GPU, and the speed was compared to both the current implementation and a central processing unit-based implementation that uses multi-threading. Significant performance gains were obtained with GPU processing: the current implementation processed 1000 channels of 250 ms in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times. PMID:19636394

  3. Massively Parallel Signal Processing using the Graphics Processing Unit for Real-Time Brain-Computer Interface Feature Extraction.

    PubMed

    Wilson, J Adam; Williams, Justin C

    2009-01-01

    The clock speeds of modern computer processors have nearly plateaued in the past 5 years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck, in that it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. Therefore, in this study a method of using the processing capabilities of a graphics card [graphics processing unit (GPU)] was developed for real-time neural signal processing of a brain-computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which is capable of running many operations in parallel, potentially greatly increasing the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix-matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density on every channel using an auto-regressive method, and finally classifying appropriate features for control. In this study, the first two computationally intensive steps were implemented on the GPU, and the speed was compared to both the current implementation and a central processing unit-based implementation that uses multi-threading. Significant performance gains were obtained with GPU processing: the current implementation processed 1000 channels of 250 ms in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times.
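
    The two accelerated steps, a spatial filter expressed as a matrix-matrix multiply and a per-channel autoregressive spectral estimate, look roughly like the NumPy sketch below. NumPy stands in for the CUDA kernels, and the filter weights, model order, and block size are illustrative assumptions.

```python
# Sketch of the BCI signal chain described above: a spatial filter as a
# matrix-matrix multiply, then a per-channel autoregressive (Yule-Walker)
# spectral estimate.
import numpy as np

rng = np.random.default_rng(0)
n_ch, n_samp, order = 16, 250, 6
data = rng.normal(size=(n_ch, n_samp))    # one block of recorded samples
W = rng.normal(size=(n_ch, n_ch))         # spatial filter weights (assumed)

filtered = W @ data                       # step 1: matrix-matrix multiply

def ar_psd(x, order, n_freq=64):
    # Yule-Walker AR fit, then evaluate the AR model spectrum
    r = np.correlate(x, x, "full")[len(x) - 1:] / len(x)
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])          # AR coefficients
    sigma2 = r[0] - a @ r[1:order + 1]              # innovation variance
    w = np.linspace(0, np.pi, n_freq)
    denom = np.abs(1 - np.exp(-1j * np.outer(w, np.arange(1, order + 1))) @ a) ** 2
    return sigma2 / denom

psd = np.array([ar_psd(ch, order) for ch in filtered])  # step 2, per channel
print(psd.shape)                                        # (16, 64)
```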

  4. Process control strategies key to refining operations

    SciTech Connect

    1996-07-22

    Panelists and attendees at the most recent National Petroleum Refiners Association Question and Answer Session on Refining and Petrochemical Technology discussed process control issues in detail. Participants shared their experiences on: personal computers (PCs) in process control; programmable logic control issues; neural networks; fieldbus technology; and statistical analyses of refinery data. Questions and answers on each of these subjects are presented.

  5. Improving Learning Processes: Principles, Strategies and Techniques.

    ERIC Educational Resources Information Center

    Cox, Philip

    This guide, which examines the relationship between learning processes and learning outcomes, is aimed at senior managers, quality managers, and others at colleges and other post-16 learning providers in the United Kingdom. It is intended to help them define the key processes undertaken by learning providers, understand the critical relationships…

  6. On Cognitive Strategies for Processing Text.

    ERIC Educational Resources Information Center

    Rigney, Joseph W.; Munro, Allen

    Recent developments in cognitive psychology and artificial intelligence have shown that various types of prior knowledge play important roles in understanding during text processing and have resulted in a new kind of model for conceptual processing, "procedural semantics." This paper discusses two types of units, or schemata, which,…

  7. MiniGhost : a miniapp for exploring boundary exchange strategies using stencil computations in scientific parallel computing.

    SciTech Connect

    Barrett, Richard Frederick; Heroux, Michael Allen; Vaughan, Courtenay Thomas

    2012-04-01

    A broad range of scientific computation involves the use of difference stencils. In a parallel computing environment, this computation is typically implemented by decomposing the spatial domain, inducing a 'halo exchange' of process-owned boundary data. This approach adheres to the Bulk Synchronous Parallel (BSP) model. Because commonly available architectures provide strong inter-node bandwidth relative to latency costs, many codes 'bulk up' these messages by aggregating data into a message as a means of reducing the number of messages. A renewed focus on non-traditional architectures and architecture features provides new opportunities for exploring alternatives to this programming approach. In this report we describe miniGhost, a 'miniapp' designed for exploration of the capabilities of current as well as emerging and future architectures within the context of these sorts of applications. MiniGhost joins the suite of miniapps developed as part of the Mantevo project.
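
    The halo-exchange pattern miniGhost exercises can be shown in a one-dimensional sketch: each rank owns an interior block plus one halo cell per side, swaps halos with its neighbors, then applies the stencil. A minimal mpi4py version, assuming a 3-point averaging stencil and a 1-D decomposition (the script name in the comment is hypothetical):

```python
# Sketch of the BSP halo exchange: a 1-D domain decomposition where each
# rank swaps one-cell-wide boundary "halos" with its neighbors before
# applying a 3-point stencil. Boundary ranks keep fixed halo values.
# Run with e.g.: mpiexec -n 4 python halo.py   (requires mpi4py)
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

n_local = 100
u = np.full(n_local + 2, float(rank))        # interior + 2 halo cells

for step in range(10):
    # send right edge to the right neighbor, receive into the left halo
    comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
    # send left edge to the left neighbor, receive into the right halo
    comm.Sendrecv(u[1:2], dest=left, recvbuf=u[-1:], source=right)
    # 3-point averaging stencil on the interior
    u[1:-1] = (u[:-2] + u[1:-1] + u[2:]) / 3.0

print(rank, u[1], u[-2])
```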

  8. Task-determined strategies of visual process.

    PubMed

    Geiger, G; Lettvin, J Y; Zegarra-Moran, O

    1992-06-01

    Lateral masking in the peripheral field of vision obscures letter recognition and is not accounted for by diminished acuity. In measuring lateral masking between letters in the peripheral visual field we accidentally discovered that ordinary readers and severe dyslexics differ markedly in tachistoscopic letter recognition tasks. Tests were devised to measure the differences accurately. Ordinary readers recognize letters best in and near the center of gaze. Recognition falls off rapidly with angular distance in the peripheral field. Severe dyslexics recognize letters farther in the periphery in the direction of reading (English-natives to the right, Hebrew-natives to the left). They have marked lateral masking in and near the center of the field when letters are presented in aggregates. With dyslexia as an example, we proposed that the distribution of lateral masking is a task-dependent strategy in visual perception. To test this notion we designed an active practice regimen for 4 severe adult dyslexics, who within a few months improved sharply in reading. At the same time their test results changed to those of ordinary readers. We conclude that there are switchable task-determined pre-cognitive strategies of vision that can be learned and that the distribution of lateral masking may be part of what is learned.

  9. Distinct cerebellar lobules process arousal, valence and their interaction in parallel following a temporal hierarchy.

    PubMed

    Styliadis, Charis; Ioannides, Andreas A; Bamidis, Panagiotis D; Papadelis, Christos

    2015-04-15

    The cerebellum participates in emotion-related neural circuits formed by different cortical and subcortical areas, which sub-serve arousal and valence. Recent neuroimaging studies have shown a functional specificity of cerebellar lobules in the processing of emotional stimuli. However, little is known about the temporal component of this process. The goal of the current study is to assess the spatiotemporal profile of neural responses within the cerebellum during the processing of arousal and valence. We hypothesized that the excitation and timing of distinct cerebellar lobules is influenced by the emotional content of the stimuli. By using magnetoencephalography, we recorded magnetic fields from twelve healthy human individuals while passively viewing affective pictures rated along arousal and valence. By using a beamformer, we localized gamma-band activity in the cerebellum across time and we related the foci of activity to the anatomical organization of the cerebellum. Successive cerebellar activations were observed within distinct lobules starting ~160 ms after stimulus onset. Arousal was processed within both vermal (VI and VIIIa) and hemispheric (left Crus II) lobules. Valence (left VI) and its interaction (left V and left Crus I) with arousal were processed only within hemispheric lobules. Arousal processing was identified first at early latencies (160 ms) and was long-lived (until 980 ms). In contrast, the processing of valence and its interaction with arousal was short-lived at later stages (420-530 ms and 570-640 ms, respectively). Our findings provide for the first time evidence that distinct cerebellar lobules process arousal, valence, and their interaction in a parallel yet temporally hierarchical manner determined by the emotional content of the stimuli.

  10. Parallel evolution of the make–accumulate–consume strategy in Saccharomyces and Dekkera yeasts

    PubMed Central

    Rozpędowska, Elżbieta; Hellborg, Linda; Ishchuk, Olena P.; Orhan, Furkan; Galafassi, Silvia; Merico, Annamaria; Woolfit, Megan; Compagno, Concetta; Piškur, Jure

    2011-01-01

    Saccharomyces yeasts degrade sugars to two-carbon components, in particular ethanol, even in the presence of excess oxygen. This characteristic is called the Crabtree effect and is the background for the 'make–accumulate–consume' life strategy, which in natural habitats helps Saccharomyces yeasts to out-compete other microorganisms. A global promoter rewiring in the Saccharomyces cerevisiae lineage, which occurred around 100 mya, was one of the main molecular events providing the background for evolution of this strategy. Here we show that the Dekkera bruxellensis lineage, which separated from the Saccharomyces yeasts more than 200 mya, also efficiently makes, accumulates and consumes ethanol and acetic acid. Analysis of promoter sequences indicates that both lineages independently underwent a massive loss of a specific cis-regulatory element from dozens of genes associated with respiration, and we show that also in D. bruxellensis this promoter rewiring contributes to the observed Crabtree effect. PMID:21556056

  11. Parallel evolution of the make-accumulate-consume strategy in Saccharomyces and Dekkera yeasts.

    PubMed

    Rozpędowska, Elzbieta; Hellborg, Linda; Ishchuk, Olena P; Orhan, Furkan; Galafassi, Silvia; Merico, Annamaria; Woolfit, Megan; Compagno, Concetta; Piskur, Jure

    2011-01-01

    Saccharomyces yeasts degrade sugars to two-carbon components, in particular ethanol, even in the presence of excess oxygen. This characteristic is called the Crabtree effect and is the background for the 'make-accumulate-consume' life strategy, which in natural habitats helps Saccharomyces yeasts to out-compete other microorganisms. A global promoter rewiring in the Saccharomyces cerevisiae lineage, which occurred around 100 mya, was one of the main molecular events providing the background for evolution of this strategy. Here we show that the Dekkera bruxellensis lineage, which separated from the Saccharomyces yeasts more than 200 mya, also efficiently makes, accumulates and consumes ethanol and acetic acid. Analysis of promoter sequences indicates that both lineages independently underwent a massive loss of a specific cis-regulatory element from dozens of genes associated with respiration, and we show that also in D. bruxellensis this promoter rewiring contributes to the observed Crabtree effect.

  12. Parallel processing implementations of a contextual classifier for multispectral remote sensing data

    NASA Technical Reports Server (NTRS)

    Siegel, H. J.; Swain, P. H.; Smith, B. W.

    1980-01-01

    The applicability of parallel processing schemes to the implementation of a contextual classification algorithm which exploits the spatial and spectral context of a multispectral remote sensing pixel to achieve classification is examined. Two algorithms for classifying each multivariate pixel taking into account the probable classifications of neighboring pixels are presented which make use of a size three horizontally linear neighborhood, and the serial computational complexity of the more efficient algorithm is shown to grow in proportion to the number of pixels and the cube of the number of possible categories. The implementation of the more efficient algorithm on a CDC Flexible Processor system and on a multimicroprocessor system such as the proposed PASM is then discussed. It is noted that the use of N processors to perform the calculations N times faster than a single processor overcomes the principal disadvantage of contextual classifiers, i.e., their computational complexity.
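
    The flavor of such a contextual rule for a size-three horizontal neighborhood: a pixel's class score combines its own likelihood with compatibility terms from its left and right neighbors' provisional labels. The sketch below uses two Gaussian classes and an invented compatibility matrix; it illustrates the neighborhood idea, not the authors' exact algorithm.

```python
# Sketch of a contextual classifier over a size-3 horizontal neighborhood:
# each pixel's class score combines its own likelihood with the provisional
# labels of its left and right neighbors. Two classes; parameters invented.
import numpy as np

rng = np.random.default_rng(0)
row = np.concatenate([rng.normal(0, 1, 20), rng.normal(3, 1, 20)])  # scanline
means = np.array([0.0, 3.0])

def likelihood(x):
    return np.exp(-0.5 * (x[:, None] - means[None, :]) ** 2)

L = likelihood(row)                       # (n_pixels, n_classes)
labels = L.argmax(axis=1)                 # non-contextual first pass

compat = np.array([[1.0, 0.2],            # neighbor-class compatibility
                   [0.2, 1.0]])
for _ in range(5):                        # iterate to a stable labeling
    new = labels.copy()
    for i in range(1, len(row) - 1):
        score = L[i] * compat[labels[i - 1]] * compat[labels[i + 1]]
        new[i] = score.argmax()
    labels = new
print(labels)
```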

  13. Creation of the BMA ensemble for SST using a parallel processing technique

    NASA Astrophysics Data System (ADS)

    Kim, Kwangjin; Lee, Yang Won

    2013-10-01

    Despite serving the same purpose, each satellite product has a different value because of its inescapable uncertainty. These products have also been computed over long periods, and they are numerous and varied, so efforts to reduce the uncertainty and to handle such enormous data volumes are necessary. In this paper, we create an ensemble Sea Surface Temperature (SST) using MODIS Aqua, MODIS Terra and COMS (Communication Ocean and Meteorological Satellite). We used Bayesian Model Averaging (BMA) as the ensemble method. The principle of BMA is to synthesize the conditional probability density functions (PDFs) using the posterior probabilities as weights; the posterior probabilities are estimated using the EM algorithm, and the BMA PDF is obtained as the weighted average. As a result, the ensemble SST showed the lowest RMSE and MAE, which demonstrates the applicability of BMA for satellite data ensembles. As future work, parallel processing techniques using the Hadoop framework will be adopted for more efficient computation of very big satellite data.
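
    The BMA recipe summarized above, EM-estimated weights followed by a weighted average, can be sketched compactly. The data below are synthetic stand-ins for the satellite/in situ matchups, and the simple Gaussian EM loop is an assumption about the implementation:

```python
# Sketch of the BMA idea used for the ensemble SST: estimate member weights
# (posterior probabilities) with an EM loop against matchup observations,
# then form the ensemble as the weighted average of the members.
import numpy as np

rng = np.random.default_rng(0)
truth = 20 + rng.normal(0, 0.5, 500)                # pseudo "in situ" SST
members = np.stack([truth + rng.normal(b, s, 500)   # e.g. Aqua, Terra, COMS
                    for b, s in [(0.2, 0.4), (-0.1, 0.6), (0.3, 0.8)]])

K = members.shape[0]
w = np.full(K, 1.0 / K)
sigma = np.ones(K)
for _ in range(50):                                 # EM for weights/variances
    dens = np.exp(-0.5 * ((truth - members) / sigma[:, None]) ** 2) / sigma[:, None]
    z = w[:, None] * dens
    z /= z.sum(axis=0, keepdims=True)               # E-step: responsibilities
    w = z.mean(axis=1)                              # M-step: weights
    sigma = np.sqrt((z * (truth - members) ** 2).sum(axis=1) / z.sum(axis=1))

ensemble = (w[:, None] * members).sum(axis=0)       # BMA weighted average
print("weights:", w.round(3), "RMSE:", np.sqrt(np.mean((ensemble - truth) ** 2)))
```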

  14. Mean-field analysis for parallel asymmetric exclusion process with anticipation effect.

    PubMed

    Hao, Qing-Yi; Jiang, Rui; Hu, Mao-Bin; Wu, Qing-Song

    2010-08-01

    This paper studies an extended parallel asymmetric exclusion process, in which the anticipation effect is taken into account. The fundamental diagram of the model has been investigated via cluster mean-field analysis. Different from previous mean-field analyses, in which the n-cluster probabilities P(σ_i, …, σ_{i+n-1}) involve the (n+2)-cluster probabilities P(τ_{i-1}, …, τ_{i+n}), our mean-field analysis is asymmetric because the three-cluster probabilities P(σ_i, σ_{i+1}, σ_{i+2}) involve the six-cluster probabilities P(τ_{i-1}, …, τ_{i+4}). We find an excellent agreement between Monte Carlo simulations and cluster mean-field analysis, which indicates that the mean-field analysis might give the exact expression.
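
    For readers who want to reproduce the simulation side, a parallel-update exclusion process on a ring takes only a few lines. The sketch below omits the paper's anticipation effect and uses illustrative parameters:

```python
# Monte Carlo sketch of an asymmetric exclusion process with fully parallel
# updates: every particle attempts a rightward hop simultaneously,
# succeeding with probability p if the target site is empty (periodic ring).
import numpy as np

rng = np.random.default_rng(0)
L, density, p, steps = 1000, 0.3, 0.75, 5000
tau = rng.random(L) < density               # occupation numbers

flow = 0
for t in range(steps):
    can_hop = tau & ~np.roll(tau, -1)       # occupied, right neighbor empty
    hop = can_hop & (rng.random(L) < p)     # parallel stochastic update
    tau = (tau & ~hop) | np.roll(hop, 1)    # move the hopping particles
    flow += hop.sum()

print("density:", tau.mean(), "current:", flow / (steps * L))
```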

  15. A neurally plausible parallel distributed processing model of event-related potential word reading data.

    PubMed

    Laszlo, Sarah; Plaut, David C

    2012-03-01

    The Parallel Distributed Processing (PDP) framework has significant potential for producing models of cognitive tasks that approximate how the brain performs the same tasks. To date, however, there has been relatively little contact between PDP modeling and data from cognitive neuroscience. In an attempt to advance the relationship between explicit, computational models and physiological data collected during the performance of cognitive tasks, we developed a PDP model of visual word recognition which simulates key results from the ERP reading literature, while simultaneously being able to successfully perform lexical decision, a benchmark task for reading models. Simulations reveal that the model's success depends on the implementation of several neurally plausible features in its architecture which are sufficiently domain-general to be relevant to cognitive modeling more generally.

  16. Parallel-Processing CMOS Circuitry for M-QAM and 8PSK TCM

    NASA Technical Reports Server (NTRS)

    Gray, Andrew; Lee, Dennis; Hoy, Scott; Fisher, Dave; Fong, Wai; Ghuman, Parminder

    2009-01-01

    There has been some additional development of parts reported in "Multi-Modulator for Bandwidth-Efficient Communication" (NPO-40807), NASA Tech Briefs, Vol. 32, No. 6 (June 2009), page 34. The focus was on (1) the generation of M-order quadrature amplitude modulation (M-QAM) and octonary-phase-shift-keying, trellis-coded modulation (8PSK TCM); (2) the use of square-root raised-cosine pulse-shaping filters; (3) a parallel-processing architecture that enables low-speed [complementary metal oxide/semiconductor (CMOS)] circuitry to perform the coding, modulation, and pulse-shaping computations at a high rate; and (4) implementation of the architecture in a CMOS field-programmable gate array.
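
    The modulation and pulse-shaping stages named above are standard and easy to prototype offline. The sketch below maps bits to 16-QAM symbols and applies a square-root raised-cosine filter; the rolloff, samples per symbol, and filter span are assumptions, and the code is a software model, not the CMOS architecture.

```python
# Sketch of the modulator pieces named above: Gray-coded 16-QAM symbol
# mapping followed by square-root raised-cosine (SRRC) pulse shaping.
import numpy as np

def srrc(beta, sps, span):
    t = np.arange(-span * sps, span * sps + 1) / sps
    h = np.zeros_like(t)
    for i, ti in enumerate(t):
        if abs(1 - (4 * beta * ti) ** 2) < 1e-12:   # singular points
            h[i] = (beta / np.sqrt(2)) * ((1 + 2 / np.pi) * np.sin(np.pi / (4 * beta))
                                          + (1 - 2 / np.pi) * np.cos(np.pi / (4 * beta)))
        elif ti == 0:
            h[i] = 1 + beta * (4 / np.pi - 1)
        else:
            h[i] = (np.sin(np.pi * ti * (1 - beta))
                    + 4 * beta * ti * np.cos(np.pi * ti * (1 + beta))) / \
                   (np.pi * ti * (1 - (4 * beta * ti) ** 2))
    return h / np.sqrt(np.sum(h ** 2))              # unit-energy filter

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, 4000)
levels = np.array([-3, -1, 3, 1])                   # Gray-coded PAM levels
sym = levels[bits[0::4] * 2 + bits[1::4]] + 1j * levels[bits[2::4] * 2 + bits[3::4]]

sps = 4                                             # samples per symbol
upsampled = np.zeros(len(sym) * sps, dtype=complex)
upsampled[::sps] = sym
tx = np.convolve(upsampled, srrc(beta=0.35, sps=sps, span=6))
print(tx.shape, np.mean(np.abs(sym) ** 2))
```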

  17. When parallel processing in visual word recognition is not enough: new evidence from naming.

    PubMed

    Roberts, Martha Anne; Rastle, Kathleen; Coltheart, Max; Besner, Derek

    2003-06-01

    Low-frequency irregular words are named more slowly and are more error prone than low-frequency regular words (the regularity effect). Rastle and Coltheart (1999) reported that this irregularity cost is modulated by the serial position of the irregular grapheme-phoneme correspondence, such that words with early irregularities exhibit a larger cost than words with late ones. They argued that these data implicate rule-based serial processing, and they also reported a successful simulation with a model that has a rule-based serial component--the DRC model of reading aloud (Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001). However, Zorzi (2000) also simulated these data with a model that operates solely in parallel. Furthermore, Kwantes and Mewhort (1999) simulated these data with a serial processing model that has no rules for converting orthography to phonology. The human data reported by Rastle and Coltheart therefore neither require a serial processing account, nor successfully discriminate among a number of computational models of reading aloud. New data are presented wherein an interaction between the effects of regularity and serial position of irregularity is again reported for human readers. The DRC model simulated this interaction; no other implemented computational model does so. The present results are thus consistent with rule-based serial processing in reading aloud.

  18. Improved object segmentation using Markov random fields, artificial neural networks, and parallel processing techniques

    NASA Astrophysics Data System (ADS)

    Foulkes, Stephen B.; Booth, David M.

    1997-07-01

    Object segmentation is the process by which a mask is generated that identifies the area of an image occupied by an object. Many object recognition techniques depend on the quality of such masks for shape and underlying brightness information; however, segmentation remains notoriously unreliable. This paper considers how the image restoration technique of Geman and Geman can be applied to improve object segmentations generated by a locally adaptive background subtraction technique. Also presented is how an artificial neural network hybrid, consisting of a single-layer Kohonen network with each of its nodes connected to a different multi-layer perceptron, can be used to approximate the image restoration process. It is shown that the restoration techniques are very well suited to parallel processing and that, in particular, the artificial neural network hybrid has the potential for near real-time image processing. Results are presented for the detection of ships in SPOT panchromatic imagery and the detection of vehicles in infrared linescan images, these being a fair representation of the wider class of problem.
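
    Geman-and-Geman-style restoration amounts to minimizing an energy that trades data fidelity against label smoothness. As an illustration only (the paper uses stochastic relaxation and a neural approximation, not this exact scheme), a synchronous iterated-conditional-modes pass over a binary mask might look like:

      import numpy as np

      def icm_refine(mask, data, beta=1.5, lam=2.0, n_iter=5):
          """Iterated-conditional-modes refinement of a binary mask.
          mask: initial 0/1 segmentation; data: object evidence in [0, 1]
          beta: neighbour-agreement weight; lam: data-fidelity weight.
          Synchronous (all-pixel) updates, well suited to parallel hardware;
          periodic boundaries for brevity."""
          m = mask.astype(int).copy()
          for _ in range(n_iter):
              # sum of the 4-connected neighbours at every pixel
              nb = (np.roll(m, 1, 0) + np.roll(m, -1, 0)
                    + np.roll(m, 1, 1) + np.roll(m, -1, 1))
              # local energy of labelling a pixel 1 versus 0
              e1 = lam * (1 - data) + beta * (4 - nb)
              e0 = lam * data + beta * nb
              m = (e1 < e0).astype(int)
          return m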

  19. Real time decision support system for diagnosis of rare cancers, trained in parallel, on a graphics processing unit.

    PubMed

    Sidiropoulos, Konstantinos; Glotsos, Dimitrios; Kostopoulos, Spiros; Ravazoula, Panagiota; Kalatzis, Ioannis; Cavouras, Dionisis; Stonham, John

    2012-04-01

    In the present study a new strategy is introduced for designing and developing an efficient dynamic Decision Support System (DSS) for supporting rare cancers decision making. The proposed DSS operates on a Graphics Processing Unit (GPU) and is capable of adjusting its design in real time based on user-defined clinical questions, in contrast to standard CPU implementations that are limited by processing and memory constraints. The core of the proposed DSS was a Probabilistic Neural Network classifier, evaluated on 140 rare brain cancer cases regarding its ability to predict tumors' malignancy using a panel of 20 morphological and textural features. Generalization was estimated using an external 10-fold cross-validation. The proposed GPU-based DSS achieved significantly higher training speed, outperforming the CPU-based system by a factor ranging from 267 to 288. System design was optimized using a combination of 4 textural and morphological features with 78.6% overall accuracy, whereas system generalization was 73.8%±3.2%. By exploiting the inherently parallel architecture of a consumer-level GPU, the proposed approach enables real-time, optimal design of a DSS for any user-defined clinical question, improving diagnostic assessments, prognostic relevance and concordance rates for rare cancers in clinical practice. Copyright © 2011 Elsevier Ltd. All rights reserved.
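
    A Probabilistic Neural Network is essentially a Parzen-window classifier whose per-class kernel sums are independent, which is why they map naturally onto GPU threads. A minimal CPU sketch of the scoring rule (names and the smoothing parameter are illustrative, not from the paper):

      import numpy as np

      def pnn_predict(X_train, y_train, X_test, sigma=0.3):
          """Probabilistic Neural Network: the score of a class is the mean
          Gaussian kernel between a test vector and that class's patterns."""
          classes = np.unique(y_train)
          scores = np.empty((len(X_test), len(classes)))
          for j, c in enumerate(classes):
              Xc = X_train[y_train == c]
              # squared distances, test points x class-c training patterns
              d2 = ((X_test[:, None, :] - Xc[None, :, :]) ** 2).sum(-1)
              scores[:, j] = np.exp(-d2 / (2 * sigma ** 2)).mean(axis=1)
          return classes[np.argmax(scores, axis=1)]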

  20. Calculating Floquet states of large quantum systems: A parallelization strategy and its cluster implementation

    NASA Astrophysics Data System (ADS)

    Laptyeva, T. V.; Kozinov, E. A.; Meyerov, I. B.; Ivanchenko, M. V.; Denisov, S. V.; Hänggi, P.

    2016-04-01

    We present a numerical approach to calculate non-equilibrium eigenstates of a periodically time-modulated quantum system. The approach is based on the use of a chain of single-step propagating operators. Each operator is time-specific and constructed by combining the Magnus expansion of the time-dependent system Hamiltonian with the Chebyshev expansion of an operator exponent. The construction of the unitary Floquet operator, which evolves a system state over the full modulation period, is performed by propagating the identity matrix over the period. The independence of the evolution of basis vectors makes the propagation stage suitable for realization on a parallel cluster. Once the propagation stage is completed, a routine diagonalization of the Floquet matrix is performed. Finally, an additional propagation round, now involving the eigenvectors as the initial states, makes it possible to resolve the time dependence of the Floquet states and calculate their characteristics. We demonstrate the accuracy and scalability of the algorithm by applying it to calculate the Floquet states of two quantum models, namely (i) a synthesized random-matrix Hamiltonian and (ii) a many-body Bose-Hubbard dimer, both with sizes of up to 10^4 states.
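
    The overall pipeline, stepwise propagation of the identity over one period followed by diagonalization, can be sketched compactly if a dense matrix exponential stands in for the Magnus/Chebyshev machinery (a small-system illustration under that assumption, not the cluster implementation):

      import numpy as np
      from scipy.linalg import expm

      def floquet_states(H, T, n_steps=200):
          """Floquet operator of a T-periodic Hamiltonian H(t) (callable).
          First-order Magnus per step: U_k = exp(-i H(t_mid) dt)."""
          dt = T / n_steps
          dim = H(0.0).shape[0]
          U = np.eye(dim, dtype=complex)   # propagate the identity over one period
          for k in range(n_steps):
              t_mid = (k + 0.5) * dt
              U = expm(-1j * H(t_mid) * dt) @ U
          # Floquet states and quasi-energies from the one-period propagator
          evals, evecs = np.linalg.eig(U)
          quasi = -np.angle(evals) / T
          return quasi, evecs

      # each column of the identity evolves independently of the others, which
      # is what makes the propagation stage distributable across cluster nodes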

  1. Molecular tailoring approach for geometry optimization of large molecules: energy evaluation and parallelization strategies.

    PubMed

    Ganesh, V; Dongare, Rameshwar K; Balanarayan, P; Gadre, Shridhar R

    2006-09-14

    A linear-scaling scheme for estimating the electronic energy, gradients, and Hessian of a large molecule at ab initio level of theory based on fragment set cardinality is presented. With this proposition, a general, cardinality-guided molecular tailoring approach (CG-MTA) for ab initio geometry optimization of large molecules is implemented. The method employs energy gradients extracted from fragment wave functions, enabling computations otherwise impractical on PC hardware. Further, the method is readily amenable to large scale coarse-grain parallelization with minimal communication among nodes, resulting in a near-linear speedup. CG-MTA is applied for density-functional-theory-based geometry optimization of a variety of molecules including alpha-tocopherol, taxol, gamma-cyclodextrin, and two conformations of polyglycine. In the tests performed, energy and gradient estimates obtained from CG-MTA during optimization runs show an excellent agreement with those obtained from actual computation. Accuracy of the Hessian obtained employing CG-MTA provides good hope for the application of Hessian-based geometry optimization to large molecules.

  2. Molecular tailoring approach for geometry optimization of large molecules: Energy evaluation and parallelization strategies

    NASA Astrophysics Data System (ADS)

    Ganesh, V.; Dongare, Rameshwar K.; Balanarayan, P.; Gadre, Shridhar R.

    2006-09-01

    A linear-scaling scheme for estimating the electronic energy, gradients, and Hessian of a large molecule at ab initio level of theory based on fragment set cardinality is presented. With this proposition, a general, cardinality-guided molecular tailoring approach (CG-MTA) for ab initio geometry optimization of large molecules is implemented. The method employs energy gradients extracted from fragment wave functions, enabling computations otherwise impractical on PC hardware. Further, the method is readily amenable to large scale coarse-grain parallelization with minimal communication among nodes, resulting in a near-linear speedup. CG-MTA is applied for density-functional-theory-based geometry optimization of a variety of molecules including α-tocopherol, taxol, γ-cyclodextrin, and two conformations of polyglycine. In the tests performed, energy and gradient estimates obtained from CG-MTA during optimization runs show an excellent agreement with those obtained from actual computation. Accuracy of the Hessian obtained employing CG-MTA provides good hope for the application of Hessian-based geometry optimization to large molecules.
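
    The cardinality expression underlying CG-MTA combines fragment energies by inclusion-exclusion over their overlaps. A schematic sketch of that bookkeeping (fragments as atom-index sets, with an energy() callback standing in for the ab initio calculation, which in CG-MTA runs for each fragment independently and in parallel; all names are illustrative):

      from itertools import combinations

      def mta_energy(fragments, energy):
          """Cardinality-based estimate of the total energy from
          overlapping fragments (each a frozenset of atom indices).
          energy(atoms) -> energy of that sub-molecule.
          Only subsets with a non-empty common overlap contribute."""
          total, n = 0.0, len(fragments)
          for k in range(1, n + 1):
              for subset in combinations(fragments, k):
                  overlap = frozenset.intersection(*subset)
                  if overlap:
                      total += (-1) ** (k + 1) * energy(overlap)
          return total

      # gradients and Hessian blocks are patched together from the same
      # independent fragment calculations in an analogous way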

  3. Parallelism and Epistasis in Skeletal Evolution Identified through Use of Phylogenomic Mapping Strategies

    PubMed Central

    Daane, Jacob M.; Rohner, Nicolas; Konstantinidis, Peter; Djuranovic, Sergej; Harris, Matthew P.

    2016-01-01

    The identification of genetic mechanisms underlying evolutionary change is critical to our understanding of natural diversity, but is presently limited by the lack of genetic and genomic resources for most species. Here, we present a new comparative genomic approach that can be applied to a broad taxonomic sampling of nonmodel species to investigate the genetic basis of evolutionary change. Using our analysis pipeline, we show that duplication and divergence of fgfr1a is correlated with the reduction of scales within fishes of the genus Phoxinellus. As a parallel genetic mechanism is observed in scale-reduction within independent lineages of cypriniforms, our finding exposes significant developmental constraint guiding morphological evolution. In addition, we identified fixed variation in fgf20a within Phoxinellus and demonstrated that combinatorial loss-of-function of fgfr1a and fgf20a within zebrafish phenocopies the evolved scalation pattern. Together, these findings reveal epistatic interactions between fgfr1a and fgf20a as a developmental mechanism regulating skeletal variation among fishes. PMID:26452532

  4. Innovative materials processing strategies: A biomimetic approach

    SciTech Connect

    Heuer, A.H.; Blackwell, J.; Caplan, A.I. ); Fink, D.J. ); Laraia, V.J. ); Arias, J.L. ); Calvert, P.D. ); Kendall, K. ); Messing, G.L. ); Rieke, P.C. ); Thompson, D.H. ); Wheeler, A.P. ); Veis, A. )

    1992-02-28

    Many organisms construct structural ceramic (biomineral) composites from seemingly mundane materials; cell-mediated processes control both the nucleation and growth of mineral and the development of composite microarchitecture. Living systems fabricate biocomposites by: confining biomineralization within specific subunit compartments; producing a specific mineral with defined crystal size and orientation; and packaging many incremental units together in a moving front process to form fully densified, macroscopic structures. By adapting biological principles, materials scientists are attempting to produce novel materials. To date, neither the elegance of the biomineral assembly mechanisms nor the intricate composite microarchitectures have been duplicated by nonbiological processing. However, substantial progress has been made in the understanding of how biomineralization occurs, and the first steps are now being taken to exploit the basic principles involved.

  5. Mars sampling strategy and aeolian processes

    NASA Technical Reports Server (NTRS)

    Greeley, Ronald

    1988-01-01

    It is critical that the geological context of planetary samples (both in situ analyses and return samples) be well known and documented. Apollo experience showed that this goal is often difficult to achieve even for a planet on which surficial processes are relatively restricted. On Mars, the variety of present and past surface processes is much greater than on the Moon and establishing the geological context of samples will be much more difficult. In addition to impact hardening, Mars has been modified by running water, periglacial activity, wind, and other processes, all of which have the potential for profoundly affecting the geological integrity of potential samples. Aeolian, or wind, processes are ubiquitous on Mars. In the absence of liquid water on the surface, aeolian activity dominates the present surface as documented by frequent dust storms (both local and global), landforms such as dunes, and variable features, i.e., albedo patterns which change their size, shape, and position with time in response to the wind.

  6. Architecture and design of a 500-MHz gallium-arsenide processing element for a parallel supercomputer

    NASA Technical Reports Server (NTRS)

    Fouts, Douglas J.; Butner, Steven E.

    1991-01-01

    The design of the processing element of GASP, a GaAs supercomputer with a 500-MHz instruction issue rate and 1-GHz subsystem clocks, is presented. The novel, functionally modular, block data flow architecture of GASP is described. The architecture and design of a GASP processing element is then presented. The processing element (PE) is implemented in a hybrid semiconductor module with 152 custom GaAs ICs of eight different types. The effects of the implementation technology on both the system-level architecture and the PE design are discussed. SPICE simulations indicate that parts of the PE are capable of being clocked at 1 GHz, while the rest of the PE uses a 500-MHz clock. The architecture utilizes data flow techniques at a program block level, which allows efficient execution of parallel programs while maintaining reasonably good performance on sequential programs. A simulation study of the architecture indicates that an instruction execution rate of over 30,000 MIPS can be attained with 65 PEs.

  8. Comparison of microbial community shifts in two parallel multi-step drinking water treatment processes.

    PubMed

    Xu, Jiajiong; Tang, Wei; Ma, Jun; Wang, Hong

    2017-04-11

    Drinking water treatment processes remove undesirable chemicals and microorganisms from source water, which is vital to public health protection. The purpose of this study was to investigate the effects of treatment processes and configuration on the microbiome by comparing microbial community shifts in two series of different treatment processes operated in parallel within a full-scale drinking water treatment plant (DWTP) in Southeast China. Illumina sequencing of 16S rRNA genes of water samples demonstrated little effect of coagulation/sedimentation and pre-oxidation steps on bacterial communities, in contrast to dramatic and concurrent microbial community shifts during ozonation, granular activated carbon treatment, sand filtration, and disinfection for both series. A large number of unique operational taxonomic units (OTUs) at these four treatment steps further illustrated their strong shaping power towards the drinking water microbial communities. Interestingly, multidimensional scaling analysis revealed tight clustering of biofilm samples collected from different treatment steps, with Nitrospira, the nitrite-oxidizing bacteria, noted at higher relative abundances in biofilm compared to water samples. Overall, this study provides a snapshot of step-to-step microbial evolvement in multi-step drinking water treatment systems, and the results provide insight to control and manipulation of the drinking water microbiome via optimization of DWTP design and operation.

  9. Membrane Transport Processes Analyzed by a Highly Parallel Nanopore Chip System at Single Protein Resolution.

    PubMed

    Urban, Michael; Vor der Brüggen, Marc; Tampé, Robert

    2016-08-16

    Membrane protein transport on the single-protein level still evades detailed analysis if the substrate translocated is non-electrogenic. Considerable efforts have been made in this field, but techniques that enable automated high-throughput transport analysis in combination with the solvent-free lipid bilayer techniques required for the analysis of membrane transporters are rare. This class of transporters, however, is crucial in cell homeostasis and is therefore a key target in drug development, and methodologies to gain new insights are desperately needed. The manuscript presented here describes the establishment and handling of a novel biochip for the analysis of membrane-protein-mediated transport processes at single-transporter resolution. The biochip is composed of microcavities enclosed by nanopores; it is highly parallel in its design and can be produced at industrial grade and quantity. Protein-harboring liposomes can be applied directly to the chip surface, forming self-assembled pore-spanning lipid bilayers using SSM techniques (solid supported lipid membranes). Pore-spanning parts of the membrane are freestanding, providing the interface for substrate translocation into or out of the cavity space, which can be followed by multi-spectral fluorescent readout in real time. The establishment of standard operating procedures (SOPs) allows the straightforward formation of protein-harboring lipid bilayers on the chip surface from virtually every membrane protein that can be reconstituted functionally. The sole prerequisite is the establishment of a fluorescent readout system for non-electrogenic transport substrates. High-content screening applications are accomplishable by the use of automated inverted fluorescent microscopes recording multiple chips in parallel. Large data sets can be analyzed using the freely available custom-designed analysis software. Three-color multi-spectral fluorescent readout furthermore allows for unbiased data discrimination into different

  10. SIAM Conference on Parallel Processing for Scientific Computing - March 12-14, 2008

    SciTech Connect

    Kolata, William G.

    2008-09-08

    The themes of the 2008 conference included, but were not limited to: Programming languages, models, and compilation techniques; The transition to ubiquitous multicore/manycore processors; Scientific computing on special-purpose processors (Cell, GPUs, etc.); Architecture-aware algorithms; From scalable algorithms to scalable software; Tools for software development and performance evaluation; Global perspectives on HPC; Parallel computing in industry; Distributed/grid computing; Fault tolerance; Parallel visualization and large scale data management; and The future of parallel architectures.

  11. Computational Investigation of Synchronized Multibeam Strategies for the Selective Laser Melting Process

    NASA Astrophysics Data System (ADS)

    Heeling, Thorsten; Wegener, Konrad

    The selective laser melting process features a nearly unmatched freedom of design, but its potential is still limited by remaining porosity, cracking, distortion, low build-up rates and a limited range of materials. While there is some progress in process control and in the use of multiple parallel scan fields to tackle these issues, the potential of synchronized multibeam strategies has not yet been investigated. The presented synchronized multibeam approach is characterized by two widely overlapping scan fields, fed by two independent laser sources, that can be controlled to work in a synchronized manner with or without a defined offset. This allows a selective manipulation of the local temperature field and thus of the melt pool dynamics, the temperature gradients and the cooling rates, all of which influence the porosity, cracking and distortion behavior of the process. The influences of these strategies on the melt pool dimensions and dynamics, as well as on the temperature gradients, are therefore investigated in this work.

  12. A Latent Trait Model for Differential Strategies in Cognitive Processes.

    ERIC Educational Resources Information Center

    Samejima, Fumiko

    Some cognitive psychologists, who have tried to approach psychometric theories, say that the psychometric approach does not provide them with theories and methods with which they can deal with differential strategies. In this paper, a general latent trait model for differential strategies in cognitive processes is proposed which includes three…

  13. The Effects of Cultural Schemata on Reading Processing Strategies.

    ERIC Educational Resources Information Center

    Pritchard, Robert

    A study examined the relationship between cultural schemata and the reading process to identify the strategies proficient readers employ to develop their understanding of culturally familiar and unfamiliar passages and to examine those strategies in relation to the cultural backgrounds of the readers and the cultural perspectives of the reading…

  14. QUESTIONS OF DYNAMIC OPTIMIZATION OF THE INFORMATION PROCESS IN A DIGITAL COMPUTER (ETsVM) WITH HIGHLY DEVELOPED PARALLELISM,

    DTIC Science & Technology

    The information process is defined as all the interconnected activity of a digital computer when information and questions are fed to its input. To...digital computer in which the information process occurs has a high degree of parallelism. The most important characteristics of such a computer are

  15. Early and parallel processing of pragmatic and semantic information in speech acts: neurophysiological evidence.

    PubMed

    Egorova, Natalia; Shtyrov, Yury; Pulvermüller, Friedemann

    2013-01-01

    Although language is a tool for communication, most research in the neuroscience of language has focused on studying words and sentences, while little is known about the brain mechanisms of speech acts, or communicative functions, for which words and sentences are used as tools. Here the neural processing of two types of speech acts, Naming and Requesting, was addressed using the time-resolved event-related potential (ERP) technique. The brain responses for Naming and Request diverged as early as ~120 ms after the onset of the critical words, at the same time as, or even before, the earliest brain manifestations of semantic word properties could be detected. Request-evoked potentials were generally larger in amplitude than those for Naming. The use of identical words in closely matched settings for both speech acts rules out explanation of the difference in terms of phonological, lexical, semantic properties, or word expectancy. The cortical sources underlying the ERP enhancement for Requests were found in the fronto-central cortex, consistent with the activation of action knowledge, as well as in the right temporo-parietal junction (TPJ), possibly reflecting additional implications of speech acts for social interaction and theory of mind. These results provide the first evidence for surprisingly early access to pragmatic and social interactive knowledge, which possibly occurs in parallel with other types of linguistic processing, and thus supports the near-simultaneous access to different subtypes of psycholinguistic information.

  16. A model of saccade generation based on parallel processing and competitive inhibition.

    PubMed

    Findlay, J M; Walker, R

    1999-08-01

    During active vision, the eyes continually scan the visual environment using saccadic scanning movements. This target article presents an information processing model for the control of these movements, with some close parallels to established physiological processes in the oculomotor system. Two separate pathways are concerned with the spatial and the temporal programming of the movement. In the temporal pathway there is spatially distributed coding and the saccade target is selected from a "salience map." Both pathways descend through a hierarchy of levels, the lower ones operating automatically. Visual onsets have automatic access to the eye control system via the lower levels. Various centres in each pathway are interconnected via reciprocal inhibition. The model accounts for a number of well-established phenomena in target-elicited saccades: the gap effect, express saccades, the remote distractor effect, and the global effect. High-level control of the pathways in tasks such as visual search and reading is discussed; it operates through spatial selection and search selection, which generally combine in an automated way. The model is examined in relation to data from patients with unilateral neglect.

  17. Individual differences in speech-in-noise perception parallel neural speech processing and attention in preschoolers.

    PubMed

    Thompson, Elaine C; Woodruff Carr, Kali; White-Schwoch, Travis; Otto-Meyer, Sebastian; Kraus, Nina

    2017-02-01

    From bustling classrooms to unruly lunchrooms, school settings are noisy. To learn effectively in the unwelcome company of numerous distractions, children must clearly perceive speech in noise. In older children and adults, speech-in-noise perception is supported by sensory and cognitive processes, but the correlates underlying this critical listening skill in young children (3-5 year olds) remain undetermined. Employing a longitudinal design (two evaluations separated by ∼12 months), we followed a cohort of 59 preschoolers, ages 3.0-4.9, assessing word-in-noise perception, cognitive abilities (intelligence, short-term memory, attention), and neural responses to speech. Results reveal that changes in word-in-noise perception parallel changes in processing of the fundamental frequency (F0), an acoustic cue known to play a role central to speaker identification and auditory scene analysis. Four unique developmental trajectories (speech-in-noise perception groups) confirm this relationship, in that improvements and declines in word-in-noise perception couple with enhancements and diminishments of F0 encoding, respectively. Improvements in word-in-noise perception also pair with gains in attention. Word-in-noise perception does not relate to strength of neural harmonic representation or short-term memory. These findings reinforce previously reported roles of F0 and attention in hearing speech in noise in older children and adults, and extend this relationship to preschool children. Copyright © 2016 Elsevier B.V. All rights reserved.

  18. Evidence for a parallel input serial analysis model of word processing.

    PubMed

    Allen, P A; Madden, D J

    1990-02-01

    A parallel input serial analysis (PISA) model of word processing was developed and tested. The goal was to expand on the "critical processing duration" hypothesis of Johnson, Allen, and Strand (1989) so that letter detection data from both single-word and multiple-word presentation could be explained. Experiments 1-3 used four word-frequency categories in a single-presentation letter detection task. These three experiments indicated a curvilinear relationship between word frequency and letter detection reaction time (RT): letter detection RTs for medium-high-frequency words were significantly longer than letter detection RTs for very-high-, low-, and very-low-frequency words. These results support the PISA model rather than the Healy, Oliver, and McNamara (1987) version of the unitization model. Experiments 4 and 5 used multiple-presentation (i.e., two-word) letter detection tasks. The PISA model could also account for the results of these two experiments, but the unitization model could not.

  19. Early and parallel processing of pragmatic and semantic information in speech acts: neurophysiological evidence

    PubMed Central

    Egorova, Natalia; Shtyrov, Yury; Pulvermüller, Friedemann

    2013-01-01

    Although language is a tool for communication, most research in the neuroscience of language has focused on studying words and sentences, while little is known about the brain mechanisms of speech acts, or communicative functions, for which words and sentences are used as tools. Here the neural processing of two types of speech acts, Naming and Requesting, was addressed using the time-resolved event-related potential (ERP) technique. The brain responses for Naming and Request diverged as early as ~120 ms after the onset of the critical words, at the same time as, or even before, the earliest brain manifestations of semantic word properties could be detected. Request-evoked potentials were generally larger in amplitude than those for Naming. The use of identical words in closely matched settings for both speech acts rules out explanation of the difference in terms of phonological, lexical, semantic properties, or word expectancy. The cortical sources underlying the ERP enhancement for Requests were found in the fronto-central cortex, consistent with the activation of action knowledge, as well as in the right temporo-parietal junction (TPJ), possibly reflecting additional implications of speech acts for social interaction and theory of mind. These results provide the first evidence for surprisingly early access to pragmatic and social interactive knowledge, which possibly occurs in parallel with other types of linguistic processing, and thus supports the near-simultaneous access to different subtypes of psycholinguistic information. PMID:23543248

  20. Examining Mechanisms Underlying Fear-Control in the Extended Parallel Process Model.

    PubMed

    Quick, Brian L; LaVoie, Nicole R; Reynolds-Tylus, Tobias; Martinez-Gonzalez, Andrea; Skurka, Chris

    2017-01-17

    This investigation sought to advance the extended parallel process model in important ways by testing the associations of the strengths of efficacy and threat appeals with fear, as well as with two outcomes of fear-control processing, psychological reactance and message minimization. Within the context of print ads admonishing against noise-induced hearing loss (NIHL) and the fictitious Trepidosis virus, partial support was found for the additive model, with no support for the multiplicative model. High-efficacy appeals mitigated freedom-threat perceptions across both contexts. Fear was positively associated with freedom-threat perceptions within the NIHL context and with favorable attitudes in both the NIHL and Trepidosis virus contexts. In line with psychological reactance theory, a freedom threat was positively associated with psychological reactance. Reactance, in turn, was positively associated with message minimization. The models supported reactance preceding message minimization across both message contexts. Both the theoretical and practical implications are discussed, with an emphasis on future research opportunities within the fear-appeal literature.