Sample records for parallel processing framework

  1. Toward a Model Framework of Generalized Parallel Componential Processing of Multi-Symbol Numbers

    ERIC Educational Resources Information Center

    Huber, Stefan; Cornelsen, Sonja; Moeller, Korbinian; Nuerk, Hans-Christoph

    2015-01-01

    In this article, we propose and evaluate a new model framework of parallel componential multi-symbol number processing, generalizing the idea of parallel componential processing of multi-digit numbers to the case of negative numbers by considering the polarity signs similar to single digits. In a first step, we evaluated this account by defining…

  2. ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.

    PubMed

    Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping

    2018-04-27

    A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.
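
    The abstract does not say which load balancing strategy paraBTM uses. As an illustration only, the sketch below shows one common low-cost approach, greedy longest-first assignment, in which documents are sorted by estimated cost and each is handed to the currently least-loaded worker. The function name, the cost measure and the sample sizes are assumptions made for this example, not details from the paper.

```python
import heapq


def balance(doc_sizes, n_workers):
    """Greedy longest-first assignment of documents to workers.

    doc_sizes: dict mapping a document id to an estimated cost
               (for example its length in bytes; an assumption here).
    Returns one (total_cost, [document ids]) pair per worker.
    """
    # Min-heap of (current load, worker index): the least-loaded worker pops first.
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assigned = [[] for _ in range(n_workers)]

    # Place the most expensive documents first so small ones fill the gaps.
    for doc, size in sorted(doc_sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, w = heapq.heappop(heap)
        assigned[w].append(doc)
        heapq.heappush(heap, (load + size, w))

    loads = [0] * n_workers
    for load, w in heap:
        loads[w] = load
    return [(loads[w], assigned[w]) for w in range(n_workers)]


# Hypothetical document costs, split across two workers.
sizes = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}
plan = balance(sizes, 2)
```

    The heuristic needs only one sort and one heap operation per document, which is why it is often chosen when per-document cost can be estimated cheaply before processing starts.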

  3. Parallelized multi–graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy

    PubMed Central

    Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.

    2014-01-01

    Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm³ skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing. PMID:24695868

  4. Parallelized multi-graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy.

    PubMed

    Tankam, Patrice; Santhanam, Anand P; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P

    2014-07-01

    Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm³ skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing.
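
    To make the task assignment concrete, the following sketch splits the 1000×1000 A-scans into one contiguous range per GPU and then into fixed-size batches, and checks the scaling implied by the reported timings. It is plain Python with no GPU calls; the function name and the batch size of 4096 are assumptions for illustration, since the abstract does not give the tuned per-GPU batch size.

```python
def plan_batches(n_ascans, n_gpus, batch_size):
    """Split A-scan indices into per-GPU ranges, then into batches.

    Returns a list with one entry per GPU; each entry is a list of
    (start, stop) half-open index ranges to process in parallel.
    """
    plan = []
    base, extra = divmod(n_ascans, n_gpus)
    start = 0
    for gpu in range(n_gpus):
        # Spread any remainder over the first `extra` GPUs.
        count = base + (1 if gpu < extra else 0)
        stop = start + count
        batches = [(s, min(s + batch_size, stop))
                   for s in range(start, stop, batch_size)]
        plan.append(batches)
        start = stop
    return plan


# Batch size is a hypothetical value, not taken from the paper.
plan = plan_batches(1000 * 1000, n_gpus=4, batch_size=4096)

# Timings reported in the abstract: about 13 s on two GPUs, 6.5 s on four.
speedup = 13.0 / 6.5             # doubling the GPUs halves the time
efficiency = speedup / (4 / 2)   # 1.0 means linear scaling from two to four GPUs
```

    On the reported numbers, going from two to four GPUs gives a twofold speedup, which is consistent with near-linear scaling over that range.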

  5. A General-purpose Framework for Parallel Processing of Large-scale LiDAR Data

    NASA Astrophysics Data System (ADS)

    Li, Z.; Hodgson, M.; Li, W.

    2016-12-01

    Light detection and ranging (LiDAR) technologies have proven efficient at quickly obtaining very detailed Earth surface data over a large spatial extent. Such data is important for scientific discovery in areas such as Earth and ecological sciences, and for natural disaster and environmental applications. However, handling LiDAR data poses grand geoprocessing challenges due to data intensity and computational intensity. Previous studies achieved notable success in applying parallel processing of LiDAR data to these challenges. However, these studies either relied on high performance computers and specialized hardware (GPUs) or focused mostly on finding customized solutions for some specific algorithms. We developed a general-purpose scalable framework coupled with a sophisticated data decomposition and parallelization strategy to efficiently handle big LiDAR data. Specifically, 1) a tile-based spatial index is proposed to manage big LiDAR data in the scalable and fault-tolerant Hadoop distributed file system, 2) two spatial decomposition techniques are developed to enable efficient parallelization of different types of LiDAR processing tasks, and 3) by coupling existing LiDAR processing tools with Hadoop, this framework is able to conduct a variety of LiDAR data processing tasks in parallel in a highly scalable distributed computing environment. The performance and scalability of the framework are evaluated with a series of experiments conducted on a real LiDAR dataset using a proof-of-concept prototype system. The results show that the proposed framework 1) is able to handle massive LiDAR data more efficiently than standalone tools; and 2) provides almost linear scalability in terms of either increased workload (data volume) or increased computing nodes with both spatial decomposition strategies. We believe that the proposed framework provides a valuable reference for developing a collaborative cyberinfrastructure for processing big earth science data in a highly scalable environment.
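
    A tile-based spatial index can be pictured as a mapping from a regular grid cell to the points that fall inside it, so that each tile becomes an independent unit of parallel work. The sketch below is a minimal in-memory version under assumed parameters (a 100-unit tile size and an origin at zero); the paper's actual index layout on HDFS is not described in the abstract, and all names here are hypothetical.

```python
import math
from collections import defaultdict


def tile_key(x, y, origin=(0.0, 0.0), tile_size=100.0):
    """Return the (column, row) of the grid tile containing point (x, y)."""
    col = math.floor((x - origin[0]) / tile_size)
    row = math.floor((y - origin[1]) / tile_size)
    return (col, row)


def build_index(points, tile_size=100.0):
    """Group (x, y, z) points by tile so each tile can be processed on its own."""
    index = defaultdict(list)
    for p in points:
        index[tile_key(p[0], p[1], tile_size=tile_size)].append(p)
    return dict(index)


# Hypothetical sample points as (x, y, z) tuples.
points = [
    (12.5, 40.0, 3.1),
    (180.0, 20.0, 2.7),
    (199.9, 99.9, 5.0),
    (-5.0, 10.0, 1.2),
]
index = build_index(points)
```

    Using floor rather than integer truncation keeps tiles contiguous across the origin, so the point at x = -5.0 lands in column -1 instead of sharing column 0.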

  6. Knowledge representation into Ada parallel processing

    NASA Technical Reports Server (NTRS)

    Masotto, Tom; Babikyan, Carol; Harper, Richard

    1990-01-01

    The Knowledge Representation into Ada Parallel Processing project is a joint NASA and Air Force funded project to demonstrate the execution of intelligent systems in Ada on the Charles Stark Draper Laboratory fault-tolerant parallel processor (FTPP). Two applications were demonstrated - a portion of the adaptive tactical navigator and a real time controller. Both systems are implemented as Activation Framework Objects on the Activation Framework intelligent scheduling mechanism developed by Worcester Polytechnic Institute. The implementations, results of performance analyses showing speedup due to parallelism and initial efficiency improvements are detailed and further areas for performance improvements are suggested.

  7. Mathematical Abstraction: Constructing Concept of Parallel Coordinates

    NASA Astrophysics Data System (ADS)

    Nurhasanah, F.; Kusumah, Y. S.; Sabandar, J.; Suryadi, D.

    2017-09-01

    Mathematical abstraction is an important process in teaching and learning mathematics, so pre-service mathematics teachers need to understand and experience this process. One of the theoretical-methodological frameworks for studying this process is Abstraction in Context (AiC). Based on this framework, the abstraction process comprises observable epistemic actions, Recognition, Building-With, Construction, and Consolidation, known as the RBC + C model. This study investigates and analyzes how pre-service mathematics teachers constructed and consolidated the concept of Parallel Coordinates in a group discussion. It uses the AiC framework to analyze the mathematical abstraction of a group of pre-service teachers, consisting of four students, in learning Parallel Coordinates concepts. The data were collected through video recording, students’ worksheet, test, and field notes. The results show that the students’ prior knowledge of the concept of the Cartesian coordinate has a significant role in the process of constructing the Parallel Coordinates concept as new knowledge. The consolidation process is influenced by the social interaction between group members. The abstraction processes that took place in this group were dominated by empirical abstraction, which emphasizes identifying characteristics of manipulated or imagined objects during the process of recognizing and building-with.

  8. New Parallel computing framework for radiation transport codes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kostin, M.A.; /Michigan State U., NSCL; Mokhov, N.V.

    A new parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. The module is significantly independent of radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was integrated with the MARS15 code, and an effort is under way to deploy it in PHITS. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations with a saved checkpoint file. The checkpoint facility can be used in single process calculations as well as in the parallel regime. Several checkpoint files can be merged into one thus combining results of several calculations. The framework also corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where the interference from the other users is possible.
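
    The idea of merging checkpoint files can be sketched in a few lines. The example below is in Python rather than the framework's C++ and MPI, and it assumes a made-up checkpoint format (a JSON object holding a history count and running tally sums), because the abstract does not describe the real file layout. It shows why merging works for Monte Carlo style runs: independent runs combine by adding their counts and sums.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the run state so a later job can resume from it or merge it."""
    with open(path, "w") as f:
        json.dump(state, f)


def load_checkpoint(path):
    """Read a previously saved run state."""
    with open(path) as f:
        return json.load(f)


def merge_checkpoints(states):
    """Combine independent runs by summing history counts and tally sums."""
    merged = {"histories": 0, "tally_sums": {}}
    for state in states:
        merged["histories"] += state["histories"]
        for name, total in state["tally_sums"].items():
            merged["tally_sums"][name] = merged["tally_sums"].get(name, 0.0) + total
    return merged


# Two hypothetical runs with a single tally named "dose".
runs = [
    {"histories": 1000, "tally_sums": {"dose": 2.5}},
    {"histories": 3000, "tally_sums": {"dose": 5.5}},
]

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i, state in enumerate(runs):
        path = os.path.join(tmp, f"run{i}.json")
        save_checkpoint(path, state)
        paths.append(path)
    merged = merge_checkpoints(load_checkpoint(p) for p in paths)

# Mean per history over the combined runs.
mean_dose = merged["tally_sums"]["dose"] / merged["histories"]
```

    Storing sums rather than averages is the design choice that makes the merge a plain addition; averages would have to be reweighted by the number of histories in each file.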

  9. Development of Parallel Computing Framework to Enhance Radiation Transport Code Capabilities for Rare Isotope Beam Facility Design

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kostin, Mikhail; Mokhov, Nikolai; Niita, Koji

    A parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. It is intended to be used with older radiation transport codes implemented in Fortran 77, Fortran 90 or C. The module is significantly independent of radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was developed and tested in conjunction with the MARS15 code. It is possible to use it with other codes such as PHITS, FLUKA and MCNP after certain adjustments. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations with a saved checkpoint file. The checkpoint facility can be used in single process calculations as well as in the parallel regime. The framework corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where the interference from the other users is possible.

  10. On the Optimality of Serial and Parallel Processing in the Psychological Refractory Period Paradigm: Effects of the Distribution of Stimulus Onset Asynchronies

    ERIC Educational Resources Information Center

    Miller, Jeff; Ulrich, Rolf; Rolke, Bettina

    2009-01-01

    Within the context of the psychological refractory period (PRP) paradigm, we developed a general theoretical framework for deciding when it is more efficient to process two tasks in serial and when it is more efficient to process them in parallel. This analysis suggests that a serial mode is more efficient than a parallel mode under a wide variety…

  11. ng: What next-generation languages can teach us about HENP frameworks in the manycore era

    NASA Astrophysics Data System (ADS)

    Binet, Sébastien

    2011-12-01

    Current High Energy and Nuclear Physics (HENP) frameworks were written before multicore systems became widely deployed. A 'single-thread' execution model naturally emerged from that environment; however, this no longer fits into the processing model at the dawn of the manycore era. Although previous work focused on minimizing the changes to be applied to the LHC frameworks (because of the data taking phase) while still trying to reap the benefits of the parallel-enhanced CPU architectures, this paper explores what new languages could bring to the design of the next-generation frameworks. Parallel programming is still in an intensive phase of R&D and no silver bullet exists despite the 30+ years of literature on the subject. Yet, several parallel programming styles have emerged: actors, message passing, communicating sequential processes, task-based programming, data flow programming, ... to name a few. We present the work of the prototyping of a next-generation framework in new and expressive languages (Python and Go) to investigate how code clarity and robustness are affected and what the downsides are of using languages younger than FORTRAN/C/C++.

  12. Distributed parallel computing in stochastic modeling of groundwater systems.

    PubMed

    Dong, Yanhui; Li, Guomin; Xu, Haizhen

    2013-03-01

    Stochastic modeling is a rapidly evolving, popular approach to the study of the uncertainty and heterogeneity of groundwater systems. However, the use of Monte Carlo-type simulations to solve practical groundwater problems often encounters computational bottlenecks that hinder the acquisition of meaningful results. To improve the computational efficiency, a system that combines stochastic model generation with MODFLOW-related programs and distributed parallel processing is investigated. The distributed computing framework, called the Java Parallel Processing Framework, is integrated into the system to allow the batch processing of stochastic models in distributed and parallel systems. As an example, the system is applied to the stochastic delineation of well capture zones in the Pinggu Basin in Beijing. Through the use of 50 processing threads on a cluster with 10 multicore nodes, the execution times of 500 realizations are reduced to 3% compared with those of a serial execution. Through this application, the system demonstrates its potential in solving difficult computational problems in practical stochastic modeling. © 2012, The Author(s). Groundwater © 2012, National Ground Water Association.
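
    The reported figures can be turned into the usual speedup and efficiency numbers with a short calculation. The sketch below assumes that "reduced to 3%" means the parallel wall-clock time is 3% of the serial time and that all 50 threads count as workers; the helper names are chosen for this example.

```python
def speedup_from_fraction(time_fraction):
    """Speedup when the parallel run takes `time_fraction` of the serial time."""
    return 1.0 / time_fraction


def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / workers


speedup = speedup_from_fraction(0.03)           # roughly 33x
efficiency = parallel_efficiency(speedup, 50)   # roughly 0.67
print(f"speedup ~ {speedup:.1f}x, efficiency ~ {efficiency:.2f}")
```

    Under these assumptions the 50 threads deliver about two thirds of ideal linear scaling, with the remainder presumably going to scheduling, data transfer and other overheads that the abstract does not break down.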

  13. Hadoop neural network for parallel and distributed feature selection.

    PubMed

    Hodge, Victoria J; O'Keefe, Simon; Austin, Jim

    2016-06-01

    In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel) allowing multiple feature selectors to be compared. We identify commonalities among the five feature selectors. All can be processed in the framework using a single representation and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector, and the actual features to select, to be identified for large and high dimensional data sets through exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  14. Optimization and quantization in gradient symbol systems: a framework for integrating the continuous and the discrete in cognition.

    PubMed

    Smolensky, Paul; Goldrick, Matthew; Mathis, Donald

    2014-08-01

    Mental representations have continuous as well as discrete, combinatorial properties. For example, while predominantly discrete, phonological representations also vary continuously; this is reflected by gradient effects in instrumental studies of speech production. Can an integrated theoretical framework address both aspects of structure? The framework we introduce here, Gradient Symbol Processing, characterizes the emergence of grammatical macrostructure from the Parallel Distributed Processing microstructure (McClelland, Rumelhart, & The PDP Research Group, 1986) of language processing. The mental representations that emerge, Distributed Symbol Systems, have both combinatorial and gradient structure. They are processed through Subsymbolic Optimization-Quantization, in which an optimization process favoring representations that satisfy well-formedness constraints operates in parallel with a distributed quantization process favoring discrete symbolic structures. We apply a particular instantiation of this framework, λ-Diffusion Theory, to phonological production. Simulations of the resulting model suggest that Gradient Symbol Processing offers a way to unify accounts of grammatical competence with both discrete and continuous patterns in language performance. Copyright © 2013 Cognitive Science Society, Inc.

  15. Efficient LIDAR Point Cloud Data Managing and Processing in a Hadoop-Based Distributed Framework

    NASA Astrophysics Data System (ADS)

    Wang, C.; Hu, F.; Sha, D.; Han, X.

    2017-10-01

    Light Detection and Ranging (LiDAR) is one of the most promising technologies in surveying and mapping, city management, forestry, object recognition, computer vision engineering and others. However, it is challenging to efficiently store, query and analyze the high-resolution 3D LiDAR data due to its volume and complexity. In order to improve the productivity of LiDAR data processing, this study proposes a Hadoop-based framework to efficiently manage and process LiDAR data in a distributed and parallel manner, which takes advantage of Hadoop's storage and computing ability. At the same time, the Point Cloud Library (PCL), an open-source project for 2D/3D image and point cloud processing, is integrated with HDFS and MapReduce to conduct the LiDAR data analysis algorithms provided by PCL in a parallel fashion. The experiment results show that the proposed framework can efficiently manage and process big LiDAR data.

  16. A Hadoop-Based Distributed Framework for Efficient Managing and Processing Big Remote Sensing Images

    NASA Astrophysics Data System (ADS)

    Wang, C.; Hu, F.; Hu, X.; Zhao, S.; Wen, W.; Yang, C.

    2015-07-01

    Various sensors from airborne and satellite platforms are producing large volumes of remote sensing images for mapping, environmental monitoring, disaster management, military intelligence, and others. However, it is challenging to efficiently store, query and process such big data due to the data- and computing-intensive issues. In this paper, a Hadoop-based framework is proposed to manage and process the big remote sensing data in a distributed and parallel manner. In particular, remote sensing data can be directly fetched from other data platforms into the Hadoop Distributed File System (HDFS). The Orfeo toolbox, a ready-to-use tool for large image processing, is integrated into MapReduce to provide rich image processing operations. With the integration of HDFS, Orfeo toolbox and MapReduce, these remote sensing images can be directly processed in parallel in a scalable computing environment. The experiment results show that the proposed framework can efficiently manage and process such big remote sensing data.

  17. A Neurally Plausible Parallel Distributed Processing Model of Event-Related Potential Word Reading Data

    ERIC Educational Resources Information Center

    Laszlo, Sarah; Plaut, David C.

    2012-01-01

    The Parallel Distributed Processing (PDP) framework has significant potential for producing models of cognitive tasks that approximate how the brain performs the same tasks. To date, however, there has been relatively little contact between PDP modeling and data from cognitive neuroscience. In an attempt to advance the relationship between…

  18. War and peace: morphemes and full forms in a noninteractive activation parallel dual-route model.

    PubMed

    Baayen, H; Schreuder, R

    This article introduces a computational tool for modeling the process of morphological segmentation in visual and auditory word recognition in the framework of a parallel dual-route model. Copyright 1999 Academic Press.

  19. Using the Extended Parallel Process Model to Examine Teachers' Likelihood of Intervening in Bullying

    ERIC Educational Resources Information Center

    Duong, Jeffrey; Bradshaw, Catherine P.

    2013-01-01

    Background: Teachers play a critical role in protecting students from harm in schools, but little is known about their attitudes toward addressing problems like bullying. Previous studies have rarely used theoretical frameworks, making it difficult to advance this area of research. Using the Extended Parallel Process Model (EPPM), we examined the…

  20. Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation

    PubMed Central

    Su, Huayou; Wen, Mei; Wu, Nan; Ren, Ju; Zhang, Chunyuan

    2014-01-01

    By reorganizing the execution order and optimizing the data structure, we proposed an efficient parallel framework for an H.264/AVC encoder based on a massively parallel architecture. We implemented the proposed framework with CUDA on NVIDIA GPUs. Not only are the compute-intensive components of the H.264 encoder parallelized, but the control-intensive components, such as CAVLC and the deblocking filter, are also realized effectively. In addition, we proposed serial optimization methods, including the multiresolution multiwindow for motion estimation, a multilevel parallel strategy to enhance the parallelism of intracoding as much as possible, component-based parallel CAVLC, and a direction-priority deblocking filter. More than 96% of the workload of the H.264 encoder is offloaded to the GPU. Experimental results show that the parallel implementation achieves a 20-fold speedup over the serial program and satisfies the requirement of real-time HD encoding at 30 fps. The loss of PSNR ranges from 0.14 dB to 0.77 dB at the same bitrate. Through analysis of the kernels, we found that the speedup ratios of the compute-intensive algorithms are proportional to the computation power of the GPU. However, the performance of the control-intensive parts (CAVLC) depends strongly on the memory bandwidth, which gives insight for new architecture design. PMID:24757432
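
    The two headline numbers, 96% of the workload offloaded and a speedup of 20 times, can be related through Amdahl's law. The sketch below treats the 96% figure as the fraction of serial run time that is accelerated, which is an assumption made for this example (the abstract speaks of workload, not time), and ignores transfer overheads.

```python
def amdahl_speedup(parallel_fraction, accel):
    """Overall speedup when `parallel_fraction` of the run time is sped up `accel` times."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / accel)


def required_accel(parallel_fraction, target_speedup):
    """Acceleration of the offloaded part needed to reach `target_speedup` overall."""
    serial = 1.0 - parallel_fraction
    budget = 1.0 / target_speedup - serial  # time left for the offloaded part
    if budget <= 0:
        raise ValueError("target exceeds the Amdahl limit for this fraction")
    return parallel_fraction / budget


limit = 1.0 / (1.0 - 0.96)          # ceiling of about 25x as accel grows without bound
needed = required_accel(0.96, 20)   # about 96x on the offloaded part
check = amdahl_speedup(0.96, needed)
```

    Read this way, a 20-fold overall speedup sits close to the 25-fold ceiling set by the remaining 4% of serial work, which implies the offloaded kernels themselves run roughly two orders of magnitude faster than their serial counterparts.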

  1. Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

    NASA Astrophysics Data System (ADS)

    Calafiura, Paolo; Leggett, Charles; Seuster, Rolf; Tsulaia, Vakhtang; Van Gemmeren, Peter

    2015-12-01

    AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write mechanisms, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service) which allows the running of AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP in the diversity of ATLAS event processing workloads on various computing resources: Grid, opportunistic resources and HPC.
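
    The combination of fork, copy-on-write sharing and a shared event queue can be illustrated with Python's standard multiprocessing module. This is only a toy analogue, not AthenaMP code: the names, the small dictionary standing in for large read-only data, and the trivial per-event computation are all assumptions. It requires a platform that supports the fork start method, such as Linux.

```python
import multiprocessing as mp


def worker(event_queue, result_queue, shared_config):
    """Pull event ids from the shared queue until a None sentinel arrives."""
    while True:
        event = event_queue.get()
        if event is None:
            break
        # shared_config was set up before the fork, so its memory pages are
        # shared with the parent until some process writes to them.
        result_queue.put((event, event * shared_config["scale"]))


def process_events(events, n_workers=4):
    """Process a list of event ids with forked workers fed by one queue."""
    ctx = mp.get_context("fork")       # fork start method: Unix-like systems only
    shared_config = {"scale": 10}      # stand-in for large read-only data
    event_queue = ctx.Queue()
    result_queue = ctx.Queue()

    procs = [ctx.Process(target=worker,
                         args=(event_queue, result_queue, shared_config))
             for _ in range(n_workers)]
    for p in procs:
        p.start()

    for event in events:
        event_queue.put(event)
    for _ in procs:
        event_queue.put(None)          # one stop sentinel per worker

    # Collect every result before joining so no worker blocks on a full pipe.
    results = dict(result_queue.get() for _ in events)
    for p in procs:
        p.join()
    return results


results = process_events(list(range(8)))
```

    Because workers take events from a single queue as they become free, faster workers naturally process more events, which is the main appeal of a shared-queue scheduling strategy over a fixed split.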

  2. Optimizing SIEM Throughput on the Cloud Using Parallelization.

    PubMed

    Alam, Masoom; Ihsan, Asif; Khan, Muazzam A; Javaid, Qaisar; Khan, Abid; Manzoor, Jawad; Akhundzada, Adnan; Khan, Muhammad Khurram; Farooq, Sajid

    2016-01-01

    Processing large amounts of data in real time to identify security issues poses several performance challenges, especially when hardware infrastructure is limited. Managed Security Service Providers (MSSP), mostly hosting their applications on the Cloud, receive events at a very high rate that varies from a few hundred to a couple of thousand events per second (EPS). It is critical to process this data efficiently, so that attacks can be identified quickly and the necessary response initiated. This paper evaluates the performance of a security framework OSTROM built on the Esper complex event processing (CEP) engine under a parallel and non-parallel computational framework. We explain three architectures under which Esper can be used to process events. We investigated the effect on throughput, memory and CPU usage in each configuration setting. The results indicate that the performance of the engine is limited by the number of events coming in rather than the queries being processed. The architecture where one quarter of the total events is submitted to each instance and all the queries are processed by all the units shows the best results in terms of throughput, memory and CPU usage.
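
    The best-performing architecture, in which each instance receives a quarter of the events and runs every query, can be sketched without any CEP engine. The example below uses plain Python predicates in place of Esper queries and runs the instances one after another; the event fields, query names and round-robin split are assumptions for illustration, and the approach as written only suits queries that look at one event at a time.

```python
def partition_events(events, n_instances=4):
    """Round-robin split: instance i receives events i, i + n, i + 2n, ..."""
    return [events[i::n_instances] for i in range(n_instances)]


def run_instance(events, queries):
    """Evaluate every query against this instance's share of the events."""
    return {name: sum(1 for e in events if matches(e))
            for name, matches in queries.items()}


def merge_counts(per_instance):
    """Add up the per-query match counts from all instances."""
    totals = {}
    for counts in per_instance:
        for name, n in counts.items():
            totals[name] = totals.get(name, 0) + n
    return totals


# Hypothetical stateless queries standing in for CEP statements.
queries = {
    "failed_login": lambda e: e["type"] == "login" and not e["ok"],
    "port_scan": lambda e: e["type"] == "scan",
}

# Hypothetical event stream of twelve events.
events = [{"type": "login" if i % 3 else "scan", "ok": i % 2 == 0}
          for i in range(12)]

shards = partition_events(events)
totals = merge_counts(run_instance(shard, queries) for shard in shards)
```

    Queries that correlate several events, for example a time window per source address, would need events routed by key rather than round-robin so that related events reach the same instance.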

  3. Modeling Cooperative Threads to Project GPU Performance for Adaptive Parallelism

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Meng, Jiayuan; Uram, Thomas; Morozov, Vitali A.

    Most accelerators, such as graphics processing units (GPUs) and vector processors, are particularly suitable for accelerating massively parallel workloads. On the other hand, conventional workloads are developed for multi-core parallelism, which often scale to only a few dozen OpenMP threads. When hardware threads significantly outnumber the degree of parallelism in the outer loop, programmers are challenged with efficient hardware utilization. A common solution is to further exploit the parallelism hidden deep in the code structure. Such parallelism is less structured: parallel and sequential loops may be imperfectly nested within each other, neighboring inner loops may exhibit different concurrency patterns (e.g. Reduction vs. Forall), yet have to be parallelized in the same parallel section. Many input-dependent transformations have to be explored. A programmer often employs a larger group of hardware threads to cooperatively walk through a smaller outer loop partition and adaptively exploit any encountered parallelism. This process is time-consuming and error-prone, yet the risk of gaining little or no performance remains high for such workloads. To reduce risk and guide implementation, we propose a technique to model workloads with limited parallelism that can automatically explore and evaluate transformations involving cooperative threads. Eventually, our framework projects the best achievable performance and the most promising transformations without implementing GPU code or using physical hardware. We envision our technique to be integrated into future compilers or optimization frameworks for autotuning.

  4. Study on parallel and distributed management of RS data based on spatial database

    NASA Astrophysics Data System (ADS)

    Chen, Yingbiao; Qian, Qinglan; Wu, Hongqiao; Liu, Shijin

    2009-10-01

    With the rapid development of earth-observing technology, the storage, management and publication of remote sensing (RS) image data have become a bottleneck for its application and popularization. RS image data storage and management systems face two prominent problems. First, the background server can hardly handle the heavy processing load of large volumes of RS data stored at different nodes in a distributed environment, which places a heavy burden on that server. Second, there is no unique, standard and rational organization of multi-sensor RS data for storage and management, so much information is lost or not included at storage. To address these two problems, the paper puts forward a framework for a parallel and distributed RS image data management and storage system, built on a parallel background server and a distributed data management system. The paper studies the following key techniques and draws some instructive conclusions. It proposes a solid index of "Pyramid, Block, Layer, Epoch" according to the properties of RS image data. With this solid index mechanism, multi-sensor RS image data of different resolutions, areas, bands and periods are organized rationally. For data storage, RS data is not divided into binary large objects stored in a conventional relational database system; instead, it is reorganized through the solid index mechanism, and a logical image database for the RS image data files is constructed. For the system architecture, the paper sets up a framework based on a parallel server made up of several ordinary computers. Under this framework, the background process is divided into two parts: the common web process and the parallel process.
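
    One way to picture a "Pyramid, Block, Layer, Epoch" index is as a four-part key that identifies a stored image tile by resolution level, spatial block, band and acquisition period. The sketch below is an interpretation made for illustration: the field meanings, the power-of-two pyramid, the 256-pixel block size and the file paths are all assumptions, since the abstract does not define them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileKey:
    pyramid: int     # resolution level, 0 = full resolution (assumed)
    block_row: int   # spatial block position within that level
    block_col: int
    layer: str       # spectral band or sensor layer (assumed meaning)
    epoch: str       # acquisition period (assumed meaning)


def block_of(px, py, pyramid, block_size=256):
    """Return (row, col) of the block holding full-resolution pixel (px, py)."""
    scale = 2 ** pyramid   # each pyramid level halves the resolution (assumed)
    return (py // scale) // block_size, (px // scale) // block_size


# A logical image database: keys point at where each tile is stored.
catalog = {}
row, col = block_of(1000, 300, pyramid=0)
catalog[TileKey(0, row, col, "band4", "2006-07")] = "node2:/tiles/0/1_3_band4.bin"

# The same ground location at a coarser pyramid level falls in block (0, 0).
coarse = block_of(1000, 300, pyramid=2)
```

    Because the key is immutable and hashable, it can serve directly as a dictionary or database key, and tiles for different bands or periods of the same area differ only in the last two fields.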

  5. Study on parallel and distributed management of RS data based on spatial data base

    NASA Astrophysics Data System (ADS)

    Chen, Yingbiao; Qian, Qinglan; Liu, Shijin

    2006-12-01

    With the rapid development of earth-observing technology, the storage, management and publication of remote sensing (RS) image data have become a bottleneck for its application and popularization. RS image data storage and management systems face two prominent problems. First, the background server can hardly handle the heavy processing load of large volumes of RS data stored at different nodes in a distributed environment, which places a heavy burden on that server. Second, there is no unique, standard and rational organization of multi-sensor RS data for storage and management, so much information is lost or not included at storage. To address these two problems, the paper puts forward a framework for a parallel and distributed RS image data management and storage system, built on a parallel background server and a distributed data management system. The paper studies the following key techniques and draws some instructive conclusions. It proposes a solid index of "Pyramid, Block, Layer, Epoch" according to the properties of RS image data. With this solid index mechanism, multi-sensor RS image data of different resolutions, areas, bands and periods are organized rationally. For data storage, RS data is not divided into binary large objects stored in a conventional relational database system; instead, it is reorganized through the solid index mechanism, and a logical image database for the RS image data files is constructed. For the system architecture, the paper sets up a framework based on a parallel server made up of several ordinary computers. Under this framework, the background process is divided into two parts: the common web process and the parallel process.

  6. Metascalable molecular dynamics simulation of nano-mechano-chemistry

    NASA Astrophysics Data System (ADS)

    Shimojo, F.; Kalia, R. K.; Nakano, A.; Nomura, K.; Vashishta, P.

    2008-07-01

    We have developed a metascalable (or 'design once, scale on new architectures') parallel application-development framework for first-principles based simulations of nano-mechano-chemical processes on emerging petaflops architectures based on spatiotemporal data locality principles. The framework consists of (1) an embedded divide-and-conquer (EDC) algorithmic framework based on spatial locality to design linear-scaling algorithms, (2) a space-time-ensemble parallel (STEP) approach based on temporal locality to predict long-time dynamics, and (3) a tunable hierarchical cellular decomposition (HCD) parallelization framework to map these scalable algorithms onto hardware. The EDC-STEP-HCD framework exposes and expresses maximal concurrency and data locality, thereby achieving parallel efficiency as high as 0.99 for 1.59-billion-atom reactive force field molecular dynamics (MD) and 17.7-million-atom (1.56 trillion electronic degrees of freedom) quantum mechanical (QM) MD in the framework of the density functional theory (DFT) on adaptive multigrids, in addition to 201-billion-atom nonreactive MD, on 196 608 IBM BlueGene/L processors. We have also used the framework for automated execution of adaptive hybrid DFT/MD simulation on a grid of six supercomputers in the US and Japan, in which the number of processors changed dynamically on demand and tasks were migrated according to unexpected faults. The paper presents the application of the framework to the study of nanoenergetic materials: (1) combustion of an Al/Fe2O3 thermite and (2) shock initiation and reactive nanojets at a void in an energetic crystal.

  7. Developing a Hadoop-based Middleware for Handling Multi-dimensional NetCDF

    NASA Astrophysics Data System (ADS)

    Li, Z.; Yang, C. P.; Schnase, J. L.; Duffy, D.; Lee, T. J.

    2014-12-01

    Climate observations and model simulations are collecting and generating vast amounts of climate data, and these data are ever-increasing and accumulating at a rapid pace. Effectively managing and analyzing these data is essential for climate change studies. Hadoop, a distributed storage and processing framework for large data sets, has attracted increasing attention in dealing with the Big Data challenge. The maturity of Infrastructure as a Service (IaaS) in cloud computing further accelerates the adoption of Hadoop in solving Big Data problems. However, Hadoop is designed to process unstructured data such as texts, documents and web pages, and cannot effectively handle scientific data formats such as array-based NetCDF files and other binary data formats. In this paper, we propose to build a Hadoop-based middleware for transparently handling big NetCDF data by 1) designing a distributed climate data storage mechanism based on a POSIX-enabled parallel file system to enable parallel big data processing with MapReduce, as well as to support data access by other systems; 2) modifying the Hadoop framework to transparently process NetCDF data in parallel without sequencing or converting the data into other file formats, or loading them to HDFS; and 3) seamlessly integrating Hadoop, cloud computing and climate data in a highly scalable and fault-tolerant framework.

  8. An Extension of a Parallel-Distributed Processing Framework of Reading Aloud in Japanese: Human Nonword Reading Accuracy Does Not Require a Sequential Mechanism

    ERIC Educational Resources Information Center

    Ikeda, Kenji; Ueno, Taiji; Ito, Yuichi; Kitagami, Shinji; Kawaguchi, Jun

    2017-01-01

    Humans can pronounce a nonword (e.g., rint). Some researchers have interpreted this behavior as requiring a sequential mechanism by which a grapheme-phoneme correspondence rule is applied to each grapheme in turn. However, several parallel-distributed processing (PDP) models in English have simulated human nonword reading accuracy without a…

  9. Framework Programmable Platform for the Advanced Software Development Workstation (FPP/ASDW). Demonstration framework document. Volume 2: Framework process description

    NASA Technical Reports Server (NTRS)

    Mayer, Richard J.; Blinn, Thomas M.; Dewitte, Paula S.; Crump, John W.; Ackley, Keith A.

    1992-01-01

    In the second volume of the Demonstration Framework Document, the graphical representation of the demonstration framework is given. This second document was created to facilitate the reading and comprehension of the demonstration framework. It is designed to be viewed in parallel with Section 4.2 of the first volume to help give a picture of the relationships between the UOB's (Unit of Behavior) of the model. The model is quite large and the design team felt that this form of presentation would make it easier for the reader to get a feel for the processes described in this document. The IDEF3 (Process Description Capture Method) diagrams of the processes of an Information System Development are presented. Volume 1 describes the processes and the agents involved with each process, while this volume graphically shows the precedence relationships among the processes.

  10. Introducing concurrency in the Gaudi data processing framework

    NASA Astrophysics Data System (ADS)

    Clemencic, Marco; Hegner, Benedikt; Mato, Pere; Piparo, Danilo

    2014-06-01

    In the past, the increasing demands for HEP processing resources could be fulfilled by the ever increasing clock-frequencies and by distributing the work to more and more physical machines. Limitations in power consumption of both CPUs and entire data centres are bringing an end to this era of easy scalability. To get the most CPU performance per watt, future hardware will be characterised by less and less memory per processor, as well as thinner, more specialized and more numerous cores per die, and rather heterogeneous resources. To fully exploit the potential of the many cores, HEP data processing frameworks need to allow for parallel execution of reconstruction or simulation algorithms on several events simultaneously. We describe our experience in introducing concurrency related capabilities into Gaudi, a generic data processing software framework, which is currently being used by several HEP experiments, including the ATLAS and LHCb experiments at the LHC. After a description of the concurrent framework and the most relevant design choices driving its development, we describe the behaviour of the framework in a more realistic environment, using a subset of the real LHCb reconstruction workflow, and present our strategy and the used tools to validate the physics outcome of the parallel framework against the results of the present, purely sequential LHCb software. We then summarize the measurement of the code performance of the multithreaded application in terms of memory and CPU usage.

  11. Optimizing SIEM Throughput on the Cloud Using Parallelization

    PubMed Central

    Alam, Masoom; Ihsan, Asif; Javaid, Qaisar; Khan, Abid; Manzoor, Jawad; Akhundzada, Adnan; Khan, M Khurram; Farooq, Sajid

    2016-01-01

    Processing large amounts of data in real time to identify security issues poses several performance challenges, especially when hardware infrastructure is limited. Managed Security Service Providers (MSSP), mostly hosting their applications on the Cloud, receive events at a very high rate that varies from a few hundred to a couple of thousand events per second (EPS). It is critical to process this data efficiently, so that attacks can be identified quickly and the necessary response can be initiated. This paper evaluates the performance of OSTROM, a security framework built on the Esper complex event processing (CEP) engine, under a parallel and a non-parallel computational framework. We explain three architectures under which Esper can be used to process events. We investigated the effect on throughput, memory and CPU usage in each configuration setting. The results indicate that the performance of the engine is limited by the number of incoming events rather than by the queries being processed. The architecture in which 1/4th of the total events are submitted to each instance and all the queries are processed by all the units shows the best results in terms of throughput, memory and CPU usage. PMID:27851762
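
    The best-performing layout described above (each instance receives a quarter of the events and runs every query) can be pictured with a minimal sketch. This is plain Python, not the Esper API: `partition_events`, `run_instance` and the two example queries are hypothetical stand-ins, and the round-robin split is an assumption, since the abstract does not say how events are assigned to instances.

```python
from collections import Counter


def partition_events(events, n_instances=4):
    """Round-robin split: instance i gets events i, i+n, i+2n, ..."""
    return [events[i::n_instances] for i in range(n_instances)]


def run_instance(events, queries):
    """Apply every query (a per-event predicate) to every event in a shard."""
    hits = Counter()
    for event in events:
        for name, predicate in queries.items():
            if predicate(event):
                hits[name] += 1
    return hits


# Hypothetical queries; real CEP queries would be far richer.
queries = {
    "failed_login": lambda e: e["type"] == "login" and not e["ok"],
    "port_scan": lambda e: e["type"] == "scan",
}

# Synthetic event stream: even positions are logins, odd positions are scans.
events = [
    {"type": "login", "ok": i % 3 != 0} if i % 2 == 0 else {"type": "scan", "ok": True}
    for i in range(12)
]

shards = partition_events(events)
totals = sum((run_instance(shard, queries) for shard in shards), Counter())
```

    Merging per-shard counts reproduces the single-instance result here only because the example predicates are stateless and look at one event at a time; queries that correlate several events would need a partitioning key that keeps related events on the same instance.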

  12. Parallel task processing of very large datasets

    NASA Astrophysics Data System (ADS)

    Romig, Phillip Richardson, III

    This research concerns the use of distributed computer technologies for the analysis and management of very large datasets. Improvements in sensor technology, an emphasis on global change research, and greater access to data warehouses are all increasing the number of non-traditional users of remotely sensed data. We present a framework for distributed solutions to the challenges of datasets which exceed the online storage capacity of individual workstations. This framework, called parallel task processing (PTP), incorporates both the task- and data-level parallelism exemplified by many image processing operations. An implementation based on the principles of PTP, called Tricky, is also presented. Additionally, we describe the challenges and practical issues in modeling the performance of parallel task processing with large datasets. We present a mechanism for estimating the running time of each unit of work within a system and an algorithm that uses these estimates to simulate the execution environment and produce estimated runtimes. Finally, we describe and discuss experimental results which validate the design. Specifically, the system (a) is able to perform computation on datasets which exceed the capacity of any one disk, (b) provides reduction of overall computation time as a result of the task distribution even with the additional cost of data transfer and management, and (c) in the simulation mode accurately predicts the performance of the real execution environment.

  13. Development of a Distributed Parallel Computing Framework to Facilitate Regional/Global Gridded Crop Modeling with Various Scenarios

    NASA Astrophysics Data System (ADS)

    Jang, W.; Engda, T. A.; Neff, J. C.; Herrick, J.

    2017-12-01

    Many crop models are increasingly used to evaluate crop yields at regional and global scales. However, implementation of these models across large areas using fine-scale grids is limited by computational time requirements. In order to facilitate global gridded crop modeling with various scenarios (i.e., different crop, management schedule, fertilizer, and irrigation) using the Environmental Policy Integrated Climate (EPIC) model, we developed a distributed parallel computing framework in Python. Our local desktop with 14 cores (28 threads) was used to test the distributed parallel computing framework in Iringa, Tanzania, which has 406,839 grid cells. High-resolution soil data, SoilGrids (250 x 250 m), and climate data, AgMERRA (0.25 x 0.25 deg), were also used as input data for the gridded EPIC model. The framework includes a master file for parallel computing, input database, input data formatters, EPIC model execution, and output analyzers. Through the master file for parallel computing, the EPIC simulation is divided into jobs according to the user-defined number of CPU threads. Then, using the EPIC input data formatters, the raw database is formatted as EPIC input data and the formatted data is passed to the EPIC simulation jobs. Then, 28 EPIC jobs run simultaneously, and only the result files of interest are parsed and moved into the output analyzers. We applied various scenarios with seven different slopes and twenty-four fertilizer ranges. Parallelized input generators create the different scenarios as a list for distributed parallel computing. After all simulations are completed, parallelized output analyzers are used to analyze all outputs according to the different scenarios. This saves significant computing time and resources, making it possible to conduct gridded modeling at regional to global scales with high-resolution data. For example, serial processing for the Iringa test case would require 113 hours, while using the framework developed in this study requires only approximately 6 hours, a nearly 95% reduction in computing time.
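
    The job division and the reported time saving can be checked with a short sketch. It uses only the figures given in the abstract (406,839 grid cells, 28 threads, 113 versus roughly 6 hours); the helper `split_into_jobs` and the choice of contiguous, near-equal ranges are assumptions, since the abstract does not describe how cells are assigned to jobs.

```python
def split_into_jobs(n_cells, n_jobs):
    """Split cell indices 0..n_cells-1 into n_jobs contiguous, near-equal ranges."""
    base, extra = divmod(n_cells, n_jobs)
    jobs, start = [], 0
    for j in range(n_jobs):
        size = base + (1 if j < extra else 0)  # first `extra` jobs take one more cell
        jobs.append(range(start, start + size))
        start += size
    return jobs


# Figures from the abstract: Iringa grid cells and desktop thread count.
jobs = split_into_jobs(406_839, 28)
total_cells = sum(len(job) for job in jobs)

# Reported runtimes in hours (serial versus parallel framework).
serial_h, parallel_h = 113, 6
speedup = serial_h / parallel_h
reduction = 1 - parallel_h / serial_h
print(f"jobs={len(jobs)}, cells={total_cells}, "
      f"speedup={speedup:.1f}x, reduction={reduction:.1%}")
```

    With these inputs each job covers about 14,530 cells, and the reported runtimes correspond to a speedup of roughly 18.8x, or a 94.7% reduction, which matches the "nearly 95%" stated above.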

  14. Framework for Parallel Preprocessing of Microarray Data Using Hadoop

    PubMed Central

    2018-01-01

    Nowadays, microarray technology has become one of the popular ways to study gene expression and to diagnose disease. The National Center for Biotechnology Information (NCBI) hosts public databases containing large volumes of biological data that need to be preprocessed, since they carry high levels of noise and bias. Robust Multiarray Average (RMA) is one of the standard and popular methods used to preprocess the data and remove the noise. Most of the preprocessing algorithms are time-consuming and not able to handle a large number of datasets with thousands of experiments. Parallel processing can be used to address these issues. Hadoop is a well-known distributed file system framework that provides a parallel environment in which to run the experiment. In this research, for the first time, the capability of Hadoop and the statistical power of R have been leveraged to parallelize the available preprocessing algorithm, RMA, to efficiently process microarray data. The experiment has been run on a cluster containing 5 nodes, each with 16 cores and 16 GB of memory. It compares the efficiency and performance of parallelized RMA using Hadoop with parallelized RMA using the affyPara package as well as with sequential RMA. The result shows that the speed-up rate of the proposed approach outperforms the sequential approach and the affyPara approach. PMID:29796018

  15. Efficient particle-in-cell simulation of auroral plasma phenomena using a CUDA enabled graphics processing unit

    NASA Astrophysics Data System (ADS)

    Sewell, Stephen

    This thesis introduces a software framework that effectively utilizes low-cost commercially available Graphic Processing Units (GPUs) to simulate complex scientific plasma phenomena that are modeled using the Particle-In-Cell (PIC) paradigm. The software framework that was developed conforms to the Compute Unified Device Architecture (CUDA), a standard for general purpose graphic processing that was introduced by NVIDIA Corporation. This framework has been verified for correctness and applied to advance the state of understanding of the electromagnetic aspects of the development of the Aurora Borealis and Aurora Australis. For each phase of the PIC methodology, this research has identified one or more methods to exploit the problem's natural parallelism and effectively map it for execution on the graphic processing unit and its host processor. The sources of overhead that can reduce the effectiveness of parallelization for each of these methods have also been identified. One of the novel aspects of this research was the utilization of particle sorting during the grid interpolation phase. The final representation resulted in simulations that executed about 38 times faster than simulations that were run on a single-core general-purpose processing system. The scalability of this framework to larger problem sizes and future generation systems has also been investigated.

  16. GPU based framework for geospatial analyses

    NASA Astrophysics Data System (ADS)

    Cosmin Sandric, Ionut; Ionita, Cristian; Dardala, Marian; Furtuna, Titus

    2017-04-01

    Parallel processing on multiple CPU cores is already used at large scale in geocomputing, but parallel processing on graphics cards is just at the beginning. Being able to use a simple laptop with a dedicated graphics card for advanced and very fast geocomputation is an advantage that every scientist wants to have. The need for high-speed computation in the geosciences has increased over the last 10 years, mostly due to the growth in available datasets. These datasets are becoming more and more detailed and hence require more space to store and more time to process. Distributed computation on multicore CPUs and GPUs plays an important role by processing small parts of these big datasets one by one. This approach speeds up processing because, instead of using just one process for each dataset, the user can use all the cores of a CPU or up to hundreds of cores of a GPU. The framework provides the end user with standalone tools for morphometry analyses at multiple scales. An important part of the framework is dedicated to uncertainty propagation in geospatial analyses. The uncertainty may come from the data collection, may be induced by the model, or may have any number of other sources. These uncertainties play important roles when a spatial delineation of the phenomena is modelled. Uncertainty propagation is implemented inside the GPU framework using Monte Carlo simulations. The GPU framework with the standalone tools proved to be a reliable tool for modelling complex natural phenomena. The framework is based on NVIDIA CUDA technology and is written in the C++ programming language. The source code will be available on GitHub at https://github.com/sandricionut/GeoRsGPU Acknowledgement: GPU framework for geospatial analysis, Young Researchers Grant (ICUB-University of Bucharest) 2016, director Ionut Sandric
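
    Monte Carlo uncertainty propagation, as mentioned above, can be illustrated on a single morphometric quantity. The sketch below is a CPU-only Python illustration, not the framework's CUDA/C++ code: it assumes independent Gaussian elevation errors with standard deviation `sigma` and propagates them into the slope between two elevation samples. The function names and all numeric inputs are hypothetical.

```python
import math
import random
import statistics


def slope_deg(z_left, z_right, spacing):
    """Slope angle in degrees between two elevation samples."""
    return math.degrees(math.atan2(abs(z_right - z_left), spacing))


def monte_carlo_slope(z_left, z_right, spacing, sigma, n=10_000, seed=0):
    """Perturb both elevations with Gaussian noise and summarize the slope."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    samples = [
        slope_deg(z_left + rng.gauss(0, sigma),
                  z_right + rng.gauss(0, sigma),
                  spacing)
        for _ in range(n)
    ]
    return statistics.mean(samples), statistics.stdev(samples)


# Hypothetical inputs: 10 m rise over a 30 m cell, 1 m elevation error.
mean_slope, slope_spread = monte_carlo_slope(100.0, 110.0, spacing=30.0, sigma=1.0)
```

    Each sample is independent of the others, which is why this kind of computation maps naturally onto the many cores of a GPU.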

  17. Simulating electron wave dynamics in graphene superlattices exploiting parallel processing advantages

    NASA Astrophysics Data System (ADS)

    Rodrigues, Manuel J.; Fernandes, David E.; Silveirinha, Mário G.; Falcão, Gabriel

    2018-01-01

    This work introduces a parallel computing framework to characterize the propagation of electron waves in graphene-based nanostructures. The electron wave dynamics is modeled using both "microscopic" and effective medium formalisms and the numerical solution of the two-dimensional massless Dirac equation is determined using a Finite-Difference Time-Domain scheme. The propagation of electron waves in graphene superlattices with localized scattering centers is studied, and the role of the symmetry of the microscopic potential in the electron velocity is discussed. The computational methodologies target the parallel capabilities of heterogeneous multi-core CPU and multi-GPU environments and are built with the OpenCL parallel programming framework which provides a portable, vendor agnostic and high throughput-performance solution. The proposed heterogeneous multi-GPU implementation achieves speedup ratios up to 75x when compared to multi-thread and multi-core CPU execution, reducing simulation times from several hours to a couple of minutes.

  18. Work stealing for GPU-accelerated parallel programs in a global address space framework: WORK STEALING ON GPU-ACCELERATED SYSTEMS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram

    Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.

  19. Work stealing for GPU-accelerated parallel programs in a global address space framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram

    Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.

  20. MOOSE: A PARALLEL COMPUTATIONAL FRAMEWORK FOR COUPLED SYSTEMS OF NONLINEAR EQUATIONS.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    G. Hansen; C. Newman; D. Gaston

    Systems of coupled, nonlinear partial differential equations often arise in simulation of nuclear processes. MOOSE: Multiphysics Object Oriented Simulation Environment, a parallel computational framework targeted at solving these systems is presented. As opposed to traditional data/flow oriented computational frameworks, MOOSE is instead founded on mathematics based on Jacobian-free Newton Krylov (JFNK). Utilizing the mathematical structure present in JFNK, physics are modularized into “Kernels” allowing for rapid production of new simulation tools. In addition, systems are solved fully coupled and fully implicit employing physics based preconditioning allowing for a large amount of flexibility even with large variance in time scales. Background on the mathematics, an inspection of the structure of MOOSE and several representative solutions from applications built on the framework are presented.

  1. MOOSE: A parallel computational framework for coupled systems of nonlinear equations.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Derek Gaston; Chris Newman; Glen Hansen

    Systems of coupled, nonlinear partial differential equations (PDEs) often arise in simulation of nuclear processes. MOOSE: Multiphysics Object Oriented Simulation Environment, a parallel computational framework targeted at the solution of such systems, is presented. As opposed to traditional data-flow oriented computational frameworks, MOOSE is instead founded on the mathematical principle of Jacobian-free Newton-Krylov (JFNK) solution methods. Utilizing the mathematical structure present in JFNK, physics expressions are modularized into "Kernels," allowing for rapid production of new simulation tools. In addition, systems are solved implicitly and fully coupled, employing physics based preconditioning, which provides great flexibility even with large variance in time scales. A summary of the mathematics, an overview of the structure of MOOSE, and several representative solutions from applications built on the framework are presented.
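
    The "Jacobian-free" part of JFNK rests on the fact that a Krylov solver only needs products of the Jacobian with a vector, and such a product can be approximated from residual evaluations alone: J(u) v ≈ [F(u + εv) − F(u)] / ε. The sketch below shows this general idea on a two-equation toy system. It is not MOOSE code; the residual, the step size `eps` and the function names are illustrative assumptions.

```python
def residual(u):
    """Small nonlinear system F(u) = 0, used only for illustration."""
    x, y = u
    return [x * x + y - 3.0, x + y * y - 5.0]


def jacobian_free_product(F, u, v, eps=1e-7):
    """Approximate J(u) v by a forward difference of F, without forming J."""
    base = F(u)
    shifted = F([ui + eps * vi for ui, vi in zip(u, v)])
    return [(s - b) / eps for s, b in zip(shifted, base)]


def exact_product(u, v):
    """Analytic J(u) v for the residual above, kept only for comparison."""
    x, y = u
    return [2 * x * v[0] + v[1], v[0] + 2 * y * v[1]]


u, v = [1.0, 2.0], [0.5, -1.0]
approx = jacobian_free_product(residual, u, v)
exact = exact_product(u, v)
```

    The two products agree to within the finite-difference error, so a Krylov method inside each Newton step can work from the residual function alone, which is consistent with physics being supplied as residual-style pieces such as the "Kernels" mentioned above.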

  2. A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moreland, Kenneth; Geveci, Berk

    2014-11-01

    The evolution of the computing world from teraflop to petaflop has been relatively effortless, with several of the existing programming models scaling effectively to the petascale. The migration to exascale, however, poses considerable challenges. All industry trends indicate that the exascale machine will be built using processors containing hundreds to thousands of cores per chip. It can be inferred that efficient concurrency on exascale machines requires a massive number of concurrent threads, each performing many operations on a localized piece of data. Currently, visualization libraries and applications are based on what is known as the visualization pipeline. In the pipeline model, algorithms are encapsulated as filters with inputs and outputs. These filters are connected by setting the output of one component to the input of another. Parallelism in the visualization pipeline is achieved by replicating the pipeline for each processing thread. This works well for today’s distributed memory parallel computers but cannot be sustained when operating on processors with thousands of cores. Our project investigates a new visualization framework designed to exhibit the pervasive parallelism necessary for extreme scale machines. Our framework achieves this by defining algorithms in terms of worklets, which are localized stateless operations. Worklets are atomic operations that execute when invoked, unlike filters, which execute when a pipeline request occurs. The worklet design allows execution on a massive number of lightweight threads with minimal overhead. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale machine.
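
    The worklet idea, a stateless operation applied independently to each piece of data, can be pictured with a minimal sketch. This is not the project's API: `magnitude_worklet` and `dispatch` are hypothetical names, and a small Python thread pool stands in for the lightweight threads of a many-core device.

```python
from concurrent.futures import ThreadPoolExecutor


def magnitude_worklet(vec):
    """Stateless per-element operation: the result depends only on its input."""
    x, y, z = vec
    return (x * x + y * y + z * z) ** 0.5


def dispatch(worklet, field, max_workers=4):
    """Hypothetical scheduler: apply the worklet to every element independently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worklet, field))


# A tiny vector field; a real dataset would hold millions of elements.
field = [(3.0, 4.0, 0.0), (0.0, 0.0, 2.0), (1.0, 2.0, 2.0)]
magnitudes = dispatch(magnitude_worklet, field)
```

    Because the worklet holds no state and touches no shared data, the scheduler is free to run the elements in any order and on any number of threads, which is the property that fine-grained parallelism depends on.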

  3. Multiscale Simulations of Magnetic Island Coalescence

    NASA Technical Reports Server (NTRS)

    Dorelli, John C.

    2010-01-01

    We describe a new interactive parallel Adaptive Mesh Refinement (AMR) framework written in the Python programming language. This new framework, PyAMR, hides the details of parallel AMR data structures and algorithms (e.g., domain decomposition, grid partition, and inter-process communication), allowing the user to focus on the development of algorithms for advancing the solution of a system of partial differential equations on a single uniform mesh. We demonstrate the use of PyAMR by simulating the pairwise coalescence of magnetic islands using the resistive Hall MHD equations. Techniques for coupling different physics models on different levels of the AMR grid hierarchy are discussed.

  4. Parallelization strategies for continuum-generalized method of moments on the multi-thread systems

    NASA Astrophysics Data System (ADS)

    Bustamam, A.; Handhika, T.; Ernastuti, Kerami, D.

    2017-07-01

    The Continuum-Generalized Method of Moments (C-GMM) addresses a shortfall of the Generalized Method of Moments (GMM), which is not as efficient as the maximum likelihood estimator, by using a continuum set of moment conditions in a GMM framework. However, this computation takes a very long time because the regularization parameter must be optimized. Moreover, these calculations are usually processed sequentially, even though modern computers support hierarchical memory systems and hyperthreading technology, which allow for parallel computing. This paper aims to speed up the calculation process of C-GMM by designing a parallel algorithm for C-GMM on multi-thread systems. First, parallel regions are detected in the original C-GMM algorithm. There are two parallel regions in the original C-GMM algorithm that contribute significantly to the reduction of computational time: the outer loop and the inner loop. The parallel algorithm is then implemented with a standard shared-memory application programming interface, Open Multi-Processing (OpenMP). The experiment shows that outer-loop parallelization is the best strategy for any number of observations.

  5. What is adaptive about adaptive decision making? A parallel constraint satisfaction account.

    PubMed

    Glöckner, Andreas; Hilbig, Benjamin E; Jekel, Marc

    2014-12-01

    There is broad consensus that human cognition is adaptive. However, the vital question of how exactly this adaptivity is achieved has remained largely open. Herein, we contrast two frameworks which account for adaptive decision making, namely broad and general single-mechanism accounts vs. multi-strategy accounts. We propose and fully specify a single-mechanism model for decision making based on parallel constraint satisfaction processes (PCS-DM) and contrast it theoretically and empirically against a multi-strategy account. To achieve sufficiently sensitive tests, we rely on a multiple-measure methodology including choice, reaction time, and confidence data as well as eye-tracking. Results show that manipulating the environmental structure produces clear adaptive shifts in choice patterns - as both frameworks would predict. However, results on the process level (reaction time, confidence), in information acquisition (eye-tracking), and from cross-predicting choice consistently corroborate single-mechanisms accounts in general, and the proposed parallel constraint satisfaction model for decision making in particular. Copyright © 2014 Elsevier B.V. All rights reserved.

  6. Parallel Distributed Processing at 25: further explorations in the microstructure of cognition.

    PubMed

    Rogers, Timothy T; McClelland, James L

    2014-08-01

    This paper introduces a special issue of Cognitive Science initiated on the 25th anniversary of the publication of Parallel Distributed Processing (PDP), a two-volume work that introduced the use of neural network models as vehicles for understanding cognition. The collection surveys the core commitments of the PDP framework, the key issues the framework has addressed, and the debates the framework has spawned, and presents viewpoints on the current status of these issues. The articles focus on both historical roots and contemporary developments in learning, optimality theory, perception, memory, language, conceptual knowledge, cognitive control, and consciousness. Here we consider the approach more generally, reviewing the original motivations, the resulting framework, and the central tenets of the underlying theory. We then evaluate the impact of PDP both on the field at large and within specific subdomains of cognitive science and consider the current role of PDP models within the broader landscape of contemporary theoretical frameworks in cognitive science. Looking to the future, we consider the implications for cognitive science of the recent success of machine learning systems called "deep networks", systems that build on key ideas presented in the PDP volumes. Copyright © 2014 Cognitive Science Society, Inc.

  7. NexGen PVAs: Incorporating Eco-Evolutionary Processes into Population Viability Models

    EPA Science Inventory

    We examine how the integration of evolutionary and ecological processes in population dynamics – an emerging framework in ecology – could be incorporated into population viability analysis (PVA). Driven by parallel, complementary advances in population genomics and computational ...

  8. Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms

    NASA Astrophysics Data System (ADS)

    Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian

    2018-01-01

    We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs.

  9. Reply to Comments.

    PubMed

    Sripada, Chandra; Railton, Peter; Baumeister, Roy F; Seligman, Martin E P

    2013-03-01

    Evidence of prospective processes is increasingly common in psychological research, which suggests the fruitfulness of a theoretical framework for mind and brain built around future orientation. No metaphysics of determinism or indeterminism is presupposed by this framework, nor do considerations of scientific method require determinism: successful scientific theories in the natural sciences all involve probabilistic elements. We speculate that expressive behavior and moral decision making use prospective processes parallel to those used in nonmoral decisions. © The Author(s) 2013.

  10. Exploiting parallel computing with limited program changes using a network of microcomputers

    NASA Technical Reports Server (NTRS)

    Rogers, J. L., Jr.; Sobieszczanski-Sobieski, J.

    1985-01-01

    Network computing and multiprocessor computers are two discernible trends in parallel processing. The computational behavior of an iterative distributed process in which some subtasks are completed later than others because of an imbalance in computational requirements is of significant interest. The effects of asynchronous processing were studied. A small existing program was converted to perform finite element analysis by distributing substructure analysis over a network of four Apple IIe microcomputers connected to a shared disk, simulating a parallel computer. The substructure analysis uses an iterative, fully stressed, structural resizing procedure. A framework of beams divided into three substructures is used as the finite element model. The effects of asynchronous processing on the convergence of the design variables are determined by not resizing particular substructures on various iterations.

  11. Fenix, A Fault Tolerant Programming Framework for MPI Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gamel, Marc; Teranihi, Keita; Valenzuela, Eric

    2016-10-05

    Fenix provides APIs to allow the users to add fault tolerance capability to MPI-based parallel programs in a transparent manner. Fenix-enabled programs can run through process failures during program execution using a pool of spare processes accommodated by Fenix.

  12. Expressing Parallelism with ROOT

    NASA Astrophysics Data System (ADS)

    Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.

    2017-10-01

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  13. Expressing Parallelism with ROOT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Piparo, D.; Tejedor, E.; Guiraud, E.

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  14. Quantitative Image Feature Engine (QIFE): an Open-Source, Modular Engine for 3D Quantitative Feature Extraction from Volumetric Medical Images.

    PubMed

    Echegaray, Sebastian; Bakr, Shaimaa; Rubin, Daniel L; Napel, Sandy

    2017-10-06

    The aim of this study was to develop an open-source, modular, locally run or server-based system for 3D radiomics feature computation that can be used on any computer system and included in existing workflows for understanding associations and building predictive models between image features and clinical data, such as survival. The QIFE exploits various levels of parallelization for use on multiprocessor systems. It consists of a managing framework and four stages: input, pre-processing, feature computation, and output. Each stage contains one or more swappable components, allowing run-time customization. We benchmarked the engine using various levels of parallelization on a cohort of CT scans presenting 108 lung tumors. Two versions of the QIFE have been released: (1) the open-source MATLAB code posted to GitHub, (2) a compiled version loaded in a Docker container, posted to DockerHub, which can be easily deployed on any computer. The QIFE processed 108 objects (tumors) in 2:12 (h:mm) using one core, and 1:04 (h:mm) using four cores with object-level parallelization. We developed the Quantitative Image Feature Engine (QIFE), an open-source feature-extraction framework that focuses on modularity, standards, parallelism, provenance, and integration. Researchers can easily integrate it with their existing segmentation and imaging workflows by creating input and output components that implement their existing interfaces. Computational efficiency can be improved by parallelizing execution at the cost of memory usage. Different parallelization levels provide different trade-offs, and the optimal setting will depend on the size and composition of the dataset to be processed.
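
    The two timings reported in this record imply a speedup and a parallel efficiency that can be worked out directly. The following is a minimal sketch, assuming the figures are wall-clock times in hours:minutes; the helper names are hypothetical and are not part of the QIFE (which is distributed as MATLAB code).

```python
def to_minutes(h_mm: str) -> int:
    """Convert an 'h:mm' wall-clock string to whole minutes."""
    hours, minutes = h_mm.split(":")
    return int(hours) * 60 + int(minutes)


def speedup_and_efficiency(serial: str, parallel: str, cores: int):
    """Return (speedup, efficiency) for a serial and a parallel run time."""
    t_serial = to_minutes(serial)
    t_parallel = to_minutes(parallel)
    speedup = t_serial / t_parallel
    return speedup, speedup / cores


# Timings taken from the abstract: 2:12 on one core, 1:04 on four cores.
s, e = speedup_and_efficiency("2:12", "1:04", cores=4)
print(f"speedup = {s:.2f}, efficiency = {e:.2f}")
# prints "speedup = 2.06, efficiency = 0.52"
```

    A speedup of about 2 on four cores is consistent with the abstract's remark that parallelization levels involve trade-offs rather than scaling linearly.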

  15. Corral framework: Trustworthy and fully functional data intensive parallel astronomical pipelines

    NASA Astrophysics Data System (ADS)

    Cabral, J. B.; Sánchez, B.; Beroiz, M.; Domínguez, M.; Lares, M.; Gurovich, S.; Granitto, P.

    2017-07-01

    Data processing pipelines represent an important slice of the astronomical software library that includes chains of processes that transform raw data into valuable information via data reduction and analysis. In this work we present Corral, a Python framework for astronomical pipeline generation. Corral features a Model-View-Controller design pattern on top of an SQL Relational Database capable of handling: custom data models; processing stages; and communication alerts, and also provides automatic quality and structural metrics based on unit testing. The Model-View-Controller provides concept separation between the user logic and the data models, delivering at the same time multi-processing and distributed computing capabilities. Corral represents an improvement over commonly found data processing pipelines in astronomy since the design pattern frees the programmer from dealing with processing flow and parallelization issues, allowing them to focus on the specific algorithms needed for the successive data transformations, and at the same time provides a broad measure of quality over the created pipeline. Corral and working examples of pipelines that use it are available to the community at https://github.com/toros-astro.

  16. Barista: A Framework for Concurrent Speech Processing by USC-SAIL

    PubMed Central

    Can, Doğan; Gibson, James; Vaz, Colin; Georgiou, Panayiotis G.; Narayanan, Shrikanth S.

    2016-01-01

    We present Barista, an open-source framework for concurrent speech processing based on the Kaldi speech recognition toolkit and the libcppa actor library. With Barista, we aim to provide an easy-to-use, extensible framework for constructing highly customizable concurrent (and/or distributed) networks for a variety of speech processing tasks. Each Barista network specifies a flow of data between simple actors, concurrent entities communicating by message passing, modeled after Kaldi tools. Leveraging the fast and reliable concurrency and distribution mechanisms provided by libcppa, Barista lets demanding speech processing tasks, such as real-time speech recognizers and complex training workflows, be scheduled and executed on parallel (and/or distributed) hardware. Barista is released under the Apache License v2.0. PMID:27610047

  17. Barista: A Framework for Concurrent Speech Processing by USC-SAIL.

    PubMed

    Can, Doğan; Gibson, James; Vaz, Colin; Georgiou, Panayiotis G; Narayanan, Shrikanth S

    2014-05-01

    We present Barista, an open-source framework for concurrent speech processing based on the Kaldi speech recognition toolkit and the libcppa actor library. With Barista, we aim to provide an easy-to-use, extensible framework for constructing highly customizable concurrent (and/or distributed) networks for a variety of speech processing tasks. Each Barista network specifies a flow of data between simple actors, concurrent entities communicating by message passing, modeled after Kaldi tools. Leveraging the fast and reliable concurrency and distribution mechanisms provided by libcppa, Barista lets demanding speech processing tasks, such as real-time speech recognizers and complex training workflows, be scheduled and executed on parallel (and/or distributed) hardware. Barista is released under the Apache License v2.0.

  18. Parallelizing Compiler Framework and API for Power Reduction and Software Productivity of Real-Time Heterogeneous Multicores

    NASA Astrophysics Data System (ADS)

    Hayashi, Akihiro; Wada, Yasutaka; Watanabe, Takeshi; Sekiguchi, Takeshi; Mase, Masayoshi; Shirako, Jun; Kimura, Keiji; Kasahara, Hironori

    Heterogeneous multicores have been attracting much attention as a way to attain high performance while keeping power consumption low in a wide range of areas. However, heterogeneous multicores make programming very difficult, and the resulting long application development period lowers product competitiveness. To overcome this situation, this paper proposes a compilation framework which bridges the gap between programmers and heterogeneous multicores. In particular, this paper describes the compilation framework based on the OSCAR compiler. It realizes coarse grain task parallel processing, data transfer using a DMA controller, and power reduction control from user programs with DVFS and clock gating on various heterogeneous multicores from different vendors. This paper also evaluates processing performance and the power reduction by the proposed framework on a newly developed 15 core heterogeneous multicore chip named RP-X integrating 8 general purpose processor cores and 3 types of accelerator cores which was developed by Renesas Electronics, Hitachi, Tokyo Institute of Technology and Waseda University. The framework attains speedups up to 32x for an optical flow program with eight general purpose processor cores and four DRP (Dynamically Reconfigurable Processor) accelerator cores against sequential execution by a single processor core and 80% of power reduction for the real-time AAC encoding.

  19. The Masterson Approach with play therapy: a parallel process between mother and child.

    PubMed

    Mulherin, M A

    2001-01-01

    This paper discusses a case in which the Masterson Approach was used with play therapy to treat a child with a developing personality disorder. It describes the parallel progression of the child and mother in adjunct therapy throughout a six-year period. The unique value of the Masterson Approach is that it provides the therapist with a framework and tool to diagnose and treat a child during the dynamic process of play. The case describes the mother-child dyad throughout therapy. It traces their parallel processes that involve separation, individuation, rapprochement, and the recovery of real self-capacities. Each stage of treatment is described, including verbal interventions. The child's internal affective state and intrapsychic structure during the various stages of treatment are illustrated by representative pictures.

  20. A unifying framework for rigid multibody dynamics and serial and parallel computational issues

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Jain, Abhinandan

    1989-01-01

    A unifying framework for various formulations of the dynamics of open-chain rigid multibody systems is discussed. Their suitability for serial and parallel processing is assessed. The framework is based on the derivation of intrinsic, i.e., coordinate-free, equations of the algorithms which provides a suitable abstraction and permits a distinction to be made between the computational redundancy in the intrinsic and extrinsic equations. A set of spatial notation is used which allows the derivation of the various algorithms in a common setting and thus clarifies the relationships among them. The three classes of algorithms, viz., O(n), O(n exp 2) and O(n exp 3), for the solution of the dynamics problem are investigated. Researchers begin with the derivation of O(n exp 3) algorithms based on the explicit computation of the mass matrix, which provides insight into the underlying basis of the O(n) algorithms. From a computational perspective, the optimal choice of a coordinate frame for the projection of the intrinsic equations is discussed and the serial computational complexity of the different algorithms is evaluated. The three classes of algorithms are also analyzed for suitability for parallel processing. It is shown that the problem belongs to the class of NC and the time and processor bounds are of O(log2/2(n)) and O(n exp 4), respectively. However, the algorithm that achieves the above bounds is not stable. Researchers show that the fastest stable parallel algorithm achieves a computational complexity of O(n) with O(n exp 2) processors, and results from the parallelization of the O(n exp 3) serial algorithm.

  1. Self-Referent Constructs and Medical Sociology: In Search of an Integrative Framework*

    PubMed Central

    Kaplan, Howard B.

    2010-01-01

    A theoretical framework centering on four classes of self-referent constructs is offered as a device for integrating the diverse areas constituting medical sociology. Guidance by this framework sensitizes the researcher to the occurrence of parallel processes in adjacent disciplines, facilitates recognition of the etiological significance of findings from other disciplines for explaining medical sociological phenomena, and encourages transactions between sociology and medical sociology whereby each informs and is informed by the other. PMID:17583268

  2. Processing Solutions for Big Data in Astronomy

    NASA Astrophysics Data System (ADS)

    Fillatre, L.; Lepiller, D.

    2016-09-01

    This paper gives a simple introduction to processing solutions applied to massive amounts of data. It proposes a general presentation of the Big Data paradigm. The Hadoop framework, which is considered as the pioneering processing solution for Big Data, is described together with YARN, the integrated Hadoop tool for resource allocation. This paper also presents the main tools for the management of both the storage (NoSQL solutions) and computing capacities (MapReduce parallel processing schema) of a cluster of machines. Finally, more recent processing solutions like Spark are discussed. Big Data frameworks are now able to run complex applications while keeping the programming simple and greatly improving the computing speed.

  3. A Flexible Computational Framework Using R and Map-Reduce for Permutation Tests of Massive Genetic Analysis of Complex Traits.

    PubMed

    Mahjani, Behrang; Toor, Salman; Nettelblad, Carl; Holmgren, Sverker

    2017-01-01

    In quantitative trait locus (QTL) mapping significance of putative QTL is often determined using permutation testing. The computational needs to calculate the significance level are immense, 10^4 up to 10^8 or even more permutations can be needed. We have previously introduced the PruneDIRECT algorithm for multiple QTL scan with epistatic interactions. This algorithm has specific strengths for permutation testing. Here, we present a flexible, parallel computing framework for identifying multiple interacting QTL using the PruneDIRECT algorithm which uses the map-reduce model as implemented in Hadoop. The framework is implemented in R, a widely used software tool among geneticists. This enables users to rearrange algorithmic steps to adapt genetic models, search algorithms, and parallelization steps to their needs in a flexible way. Our work underlines the maturity of accessing distributed parallel computing for computationally demanding bioinformatics applications through building workflows within existing scientific environments. We investigate the PruneDIRECT algorithm, comparing its performance to exhaustive search and DIRECT algorithm using our framework on a public cloud resource. We find that PruneDIRECT is vastly superior for permutation testing, and perform 2 × 10^5 permutations for a 2D QTL problem in 15 hours, using 100 cloud processes. We show that our framework scales out almost linearly for a 3D QTL search.
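
    Permutation testing fits the map-reduce model because every permutation is independent. The sketch below illustrates only that general structure in Python; it is not the PruneDIRECT algorithm or the authors' R/Hadoop framework, and the test statistic, function names, and toy data are assumptions made for illustration.

```python
import random


def abs_mean_difference(genotypes, phenotypes):
    """Test statistic: |mean phenotype of carriers - mean of non-carriers|."""
    carriers = [p for g, p in zip(genotypes, phenotypes) if g == 1]
    others = [p for g, p in zip(genotypes, phenotypes) if g == 0]
    return abs(sum(carriers) / len(carriers) - sum(others) / len(others))


def permuted_statistic(task):
    """Map step: shuffle the phenotypes with one seed, recompute the statistic."""
    seed, genotypes, phenotypes = task
    shuffled = list(phenotypes)
    random.Random(seed).shuffle(shuffled)
    return abs_mean_difference(genotypes, shuffled)


def permutation_p_value(genotypes, phenotypes, n_permutations=2000):
    """Reduce step: count permutations at least as extreme as the observation."""
    observed = abs_mean_difference(genotypes, phenotypes)
    tasks = ((seed, genotypes, phenotypes) for seed in range(n_permutations))
    # The built-in map could be swapped for a process pool or a cluster
    # job, since the tasks share no state.
    null_stats = map(permuted_statistic, tasks)
    exceed = sum(1 for stat in null_stats if stat >= observed)
    return (exceed + 1) / (n_permutations + 1)


# Toy data (hypothetical): carriers have clearly higher phenotype values.
genotypes = [0] * 10 + [1] * 10
phenotypes = [float(i) for i in range(10)] + [float(i) + 20 for i in range(10)]
p_value = permutation_p_value(genotypes, phenotypes)
print(p_value)
```

    Seeding each task explicitly keeps the result reproducible no matter how the tasks are distributed across workers.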

  4. Node Resource Manager: A Distributed Computing Software Framework Used for Solving Geophysical Problems

    NASA Astrophysics Data System (ADS)

    Lawry, B. J.; Encarnacao, A.; Hipp, J. R.; Chang, M.; Young, C. J.

    2011-12-01

    With the rapid growth of multi-core computing hardware, it is now possible for scientific researchers to run complex, computationally intensive software on affordable, in-house commodity hardware. Multi-core CPUs (Central Processing Unit) and GPUs (Graphics Processing Unit) are now commonplace in desktops and servers. Developers today have access to extremely powerful hardware that enables the execution of software that could previously only be run on expensive, massively-parallel systems. It is no longer cost-prohibitive for an institution to build a parallel computing cluster consisting of commodity multi-core servers. In recent years, our research team has developed a distributed, multi-core computing system and used it to construct global 3D earth models using seismic tomography. Traditionally, computational limitations forced certain assumptions and shortcuts in the calculation of tomographic models; however, with the recent rapid growth in computational hardware including faster CPU's, increased RAM, and the development of multi-core computers, we are now able to perform seismic tomography, 3D ray tracing and seismic event location using distributed parallel algorithms running on commodity hardware, thereby eliminating the need for many of these shortcuts. We describe Node Resource Manager (NRM), a system we developed that leverages the capabilities of a parallel computing cluster. NRM is a software-based parallel computing management framework that works in tandem with the Java Parallel Processing Framework (JPPF, http://www.jppf.org/), a third party library that provides a flexible and innovative way to take advantage of modern multi-core hardware. NRM enables multiple applications to use and share a common set of networked computers, regardless of their hardware platform or operating system. 
    Using NRM, algorithms can be parallelized to run on multiple processing cores of a distributed computing cluster of servers and desktops, which results in a dramatic speedup in execution time. NRM is sufficiently generic to support applications in any domain, as long as the application is parallelizable (i.e., can be subdivided into multiple individual processing tasks). At present, NRM has been effective in decreasing the overall runtime of several algorithms: 1) the generation of a global 3D model of the compressional velocity distribution in the Earth using tomographic inversion, 2) the calculation of the model resolution matrix, model covariance matrix, and travel time uncertainty for the aforementioned velocity model, and 3) the correlation of waveforms with archival data on a massive scale for seismic event detection. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

  5. DataForge: Modular platform for data storage and analysis

    NASA Astrophysics Data System (ADS)

    Nozik, Alexander

    2018-04-01

    DataForge is a framework for automated data acquisition, storage and analysis based on modern achievements of applied programming. The aim of the DataForge is to automate some standard tasks like parallel data processing, logging, output sorting and distributed computing. Also the framework extensively uses declarative programming principles via meta-data concept which allows a certain degree of meta-programming and improves results reproducibility.

  6. Parallel k-means++

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data, by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.
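
    For orientation, the seed selection step described in this record can be written serially in a few lines. This is a minimal Python sketch of the standard k-means++ seeding rule (each new seed is drawn with probability proportional to its squared distance from the nearest existing seed), not the C++ code the record describes; the function names and sample points are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial seeds from distinct points using D^2 weighting."""
    rng = rng or random.Random(0)
    seeds = [rng.choice(points)]  # the first seed is chosen uniformly
    # d2[i] holds the squared distance from points[i] to its nearest seed.
    d2 = [squared_distance(p, seeds[0]) for p in points]
    while len(seeds) < k:
        # Points far from every existing seed are more likely to be drawn;
        # already chosen points have weight 0 and cannot be drawn again.
        new_seed = rng.choices(points, weights=d2, k=1)[0]
        seeds.append(new_seed)
        # Each point's update is independent of the others, so this loop
        # is a natural candidate for data-parallel execution (an
        # assumption here; the record does not say what was parallelized).
        d2 = [min(old, squared_distance(p, new_seed))
              for old, p in zip(d2, points)]
    return seeds


points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (10.0, 0.0), (9.8, 0.3)]
seeds = kmeans_pp_seeds(points, k=3)
print(seeds)
```

    The sketch assumes k does not exceed the number of distinct points; otherwise all weights reach zero and `random.choices` raises an error.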

  7. Knowledge Support and Automation for Performance Analysis with PerfExplorer 2.0

    DOE PAGES

    Huck, Kevin A.; Malony, Allen D.; Shende, Sameer; ...

    2008-01-01

    The integration of scalable performance analysis in parallel development tools is difficult. The potential size of data sets and the need to compare results from multiple experiments present a challenge to manage and process the information. Simply to characterize the performance of parallel applications running on potentially hundreds of thousands of processor cores requires new scalable analysis techniques. Furthermore, many exploratory analysis processes are repeatable and could be automated, but are now implemented as manual procedures. In this paper, we will discuss the current version of PerfExplorer, a performance analysis framework which provides dimension reduction, clustering and correlation analysis of individual trials of large dimensions, and can perform relative performance analysis between multiple application executions. PerfExplorer analysis processes can be captured in the form of Python scripts, automating what would otherwise be time-consuming tasks. We will give examples of large-scale analysis results, and discuss the future development of the framework, including the encoding and processing of expert performance rules, and the increasing use of performance metadata.

  8. Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends

    PubMed Central

    2014-01-01

    The emergence of massive datasets in a clinical setting presents both challenges and opportunities in data storage and analysis. This so called “big data” challenges traditional analytic tools and will increasingly require novel solutions adapted from other fields. Advances in information and communication technology present the most viable solutions to big data analysis in terms of efficiency and scalability. It is vital those big data solutions are multithreaded and that data access approaches be precisely tailored to large volumes of semi-structured/unstructured data. The MapReduce programming framework uses two tasks common in functional programming: Map and Reduce. MapReduce is a new parallel processing framework and Hadoop is its open-source implementation on a single computing node or on clusters. Compared with existing parallel processing paradigms (e.g. grid computing and graphical processing unit (GPU)), MapReduce and Hadoop have two advantages: 1) fault-tolerant storage resulting in reliable data processing by replicating the computing tasks, and cloning the data chunks on different computing nodes across the computing cluster; 2) high-throughput data processing via a batch processing framework and the Hadoop distributed file system (HDFS). Data are stored in the HDFS and made available to the slave nodes for computation. In this paper, we review the existing applications of the MapReduce programming framework and its implementation platform Hadoop in clinical big data and related medical health informatics fields. The usage of MapReduce and Hadoop on a distributed system represents a significant advance in clinical big data processing and utilization, and opens up new opportunities in the emerging era of big data analytics. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools. 
    This paper is concluded by summarizing the potential usage of the MapReduce programming framework and Hadoop platform to process huge volumes of clinical data in medical health informatics related fields. PMID:25383096
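
    The Map and Reduce tasks named in this record can be illustrated without a cluster. The following is a minimal in-memory Python sketch of the programming model only; it does not use Hadoop or HDFS, and the function names and the sample diagnosis-code records are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def map_phase(record):
    """Map: emit a (key, 1) pair for every code found in one record."""
    return [(code, 1) for code in record.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, reduce(lambda a, b: a + b, values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence on a list of records."""
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical records: one line of diagnosis codes per patient visit.
records = ["I10 E11", "E11 J45", "I10 E11"]
counts = map_reduce(records)
print(counts)
# prints {'I10': 2, 'E11': 3, 'J45': 1}
```

    Because each call to `map_phase` touches one record and each call to `reduce_phase` touches one key, both phases can be spread across nodes, which is the property the record attributes to the framework.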

  9. Applications of the MapReduce programming framework to clinical big data analysis: current landscape and future trends.

    PubMed

    Mohammed, Emad A; Far, Behrouz H; Naugler, Christopher

    2014-01-01

    The emergence of massive datasets in a clinical setting presents both challenges and opportunities in data storage and analysis. This so called "big data" challenges traditional analytic tools and will increasingly require novel solutions adapted from other fields. Advances in information and communication technology present the most viable solutions to big data analysis in terms of efficiency and scalability. It is vital those big data solutions are multithreaded and that data access approaches be precisely tailored to large volumes of semi-structured/unstructured data. The MapReduce programming framework uses two tasks common in functional programming: Map and Reduce. MapReduce is a new parallel processing framework and Hadoop is its open-source implementation on a single computing node or on clusters. Compared with existing parallel processing paradigms (e.g. grid computing and graphical processing unit (GPU)), MapReduce and Hadoop have two advantages: 1) fault-tolerant storage resulting in reliable data processing by replicating the computing tasks, and cloning the data chunks on different computing nodes across the computing cluster; 2) high-throughput data processing via a batch processing framework and the Hadoop distributed file system (HDFS). Data are stored in the HDFS and made available to the slave nodes for computation. In this paper, we review the existing applications of the MapReduce programming framework and its implementation platform Hadoop in clinical big data and related medical health informatics fields. The usage of MapReduce and Hadoop on a distributed system represents a significant advance in clinical big data processing and utilization, and opens up new opportunities in the emerging era of big data analytics. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools. 
    This paper is concluded by summarizing the potential usage of the MapReduce programming framework and Hadoop platform to process huge volumes of clinical data in medical health informatics related fields.

  10. A Metascalable Computing Framework for Large Spatiotemporal-Scale Atomistic Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nomura, K; Seymour, R; Wang, W

    2009-02-17

    A metascalable (or 'design once, scale on new architectures') parallel computing framework has been developed for large spatiotemporal-scale atomistic simulations of materials based on spatiotemporal data locality principles, which is expected to scale on emerging multipetaflops architectures. The framework consists of: (1) an embedded divide-and-conquer (EDC) algorithmic framework based on spatial locality to design linear-scaling algorithms for high complexity problems; (2) a space-time-ensemble parallel (STEP) approach based on temporal locality to predict long-time dynamics, while introducing multiple parallelization axes; and (3) a tunable hierarchical cellular decomposition (HCD) parallelization framework to map these O(N) algorithms onto a multicore cluster based on hybrid implementation combining message passing and critical section-free multithreading. The EDC-STEP-HCD framework exposes maximal concurrency and data locality, thereby achieving: (1) inter-node parallel efficiency well over 0.95 for 218 billion-atom molecular-dynamics and 1.68 trillion electronic-degrees-of-freedom quantum-mechanical simulations on 212,992 IBM BlueGene/L processors (superscalability); (2) high intra-node, multithreading parallel efficiency (nanoscalability); and (3) nearly perfect time/ensemble parallel efficiency (eon-scalability). The spatiotemporal scale covered by MD simulation on a sustained petaflops computer per day (i.e. petaflops · day of computing) is estimated as NT = 2.14 (e.g. N = 2.14 million atoms for T = 1 microseconds).

  11. CMS event processing multi-core efficiency status

    NASA Astrophysics Data System (ADS)

    Jones, C. D.; CMS Collaboration

    2017-10-01

    In 2015, CMS was the first LHC experiment to begin using a multi-threaded framework for doing event processing. This new framework utilizes Intel’s Thread Building Block library to manage concurrency via a task based processing model. During the 2015 LHC run period, CMS only ran reconstruction jobs using multiple threads because only those jobs were sufficiently thread efficient. Recent work now allows simulation and digitization to be thread efficient. In addition, during 2015 the multi-threaded framework could run events in parallel but could only use one thread per event. Work done in 2016 now allows multiple threads to be used while processing one event. In this presentation we will show how these recent changes have improved CMS’s overall threading and memory efficiency and we will discuss work to be done to further increase those efficiencies.

  12. Exploiting Vector and Multicore Parallelism for Recursive, Data- and Task-Parallel Programs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ren, Bin; Krishnamoorthy, Sriram; Agrawal, Kunal

    Modern hardware contains parallel execution resources that are well-suited for data parallelism (vector units) and task parallelism (multicores). However, most work on parallel scheduling focuses on one type of hardware or the other. In this work, we present a scheduling framework that allows for a unified treatment of task- and data-parallelism. Our key insight is an abstraction, task blocks, that uniformly handles data-parallel iterations and task-parallel tasks, allowing them to be scheduled on vector units or executed independently on multicores. Our framework allows us to define schedulers that can dynamically select between executing task blocks on vector units or multicores. We show that these schedulers are asymptotically optimal, and deliver the maximum amount of parallelism available in computation trees. To evaluate our schedulers, we develop program transformations that can convert mixed data- and task-parallel programs into task block-based programs. Using a prototype instantiation of our scheduling framework, we show that, on an 8-core system, we can simultaneously exploit vector and multicore parallelism to achieve 14×-108× speedup over sequential baselines.

  13. A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines

    PubMed Central

    2011-01-01

    Background: Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. Results: To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, allowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). Conclusions: PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples. PMID:21352538

  14. A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines.

    PubMed

    Cieślik, Marcin; Mura, Cameron

    2011-02-25

    Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, allowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples.
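
    The batching idea in the PaPy abstract above (nested map functions evaluated lazily, one batch at a time, on a worker pool) can be illustrated with a minimal sketch. This is not PaPy's API: the helper names `batched`, `parallel_stage`, and `run_pipeline`, the two toy sequence functions, and the use of a standard-library thread pool are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def parallel_stage(func, inputs, pool, batch_size):
    """Lazily apply `func` to `inputs`, one batch at a time, on the pool."""
    for batch in batched(inputs, batch_size):
        # At most `batch_size` items are in flight; input order is preserved.
        yield from pool.map(func, batch)


def reverse_complement(seq):
    """Toy pipeline component: reverse-complement a DNA string."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def gc_fraction(seq):
    """Toy pipeline component: fraction of G and C bases in a DNA string."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def run_pipeline(sequences, batch_size=2, workers=4):
    """Chain two stages; each stage pulls batches from the one before it."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        stage1 = parallel_stage(reverse_complement, sequences, pool, batch_size)
        stage2 = parallel_stage(gc_fraction, stage1, pool, batch_size)
        return list(stage2)
```

    A small `batch_size` keeps few intermediate results in memory but exposes less parallelism per step; a large one does the opposite, which is the trade-off the abstract describes.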

  15. Designing for Peta-Scale in the LSST Database

    NASA Astrophysics Data System (ADS)

    Kantor, J.; Axelrod, T.; Becla, J.; Cook, K.; Nikolaev, S.; Gray, J.; Plante, R.; Nieto-Santisteban, M.; Szalay, A.; Thakar, A.

    2007-10-01

    The Large Synoptic Survey Telescope (LSST), a proposed ground-based 8.4 m telescope with a 10 deg^2 field of view, will generate 15 TB of raw images every observing night. When calibration and processed data are added, the image archive, catalogs, and meta-data will grow 15 PB yr^{-1} on average. The LSST Data Management System (DMS) must capture, process, store, index, replicate, and provide open access to this data. Alerts must be triggered within 30 s of data acquisition. To do this in real-time at these data volumes will require advances in data management, database, and file system techniques. This paper describes the design of the LSST DMS and emphasizes features for peta-scale data. The LSST DMS will employ a combination of distributed database and file systems, with schema, partitioning, and indexing oriented for parallel operations. Image files are stored in a distributed file system with references to, and meta-data from, each file stored in the databases. The schema design supports pipeline processing, rapid ingest, and efficient query. Vertical partitioning reduces disk input/output requirements, horizontal partitioning allows parallel data access using arrays of servers and disks. Indexing is extensive, utilizing both conventional RAM-resident indexes and column-narrow, row-deep tag tables/covering indices that are extracted from tables that contain many more attributes. The DMS Data Access Framework is encapsulated in a middleware framework to provide a uniform service interface to all framework capabilities. This framework will provide the automated work-flow, replication, and data analysis capabilities necessary to make data processing and data quality analysis feasible at this scale.

  16. Parallelizing Timed Petri Net simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1993-01-01

    The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.

  17. A Cyber-ITS Framework for Massive Traffic Data Analysis Using Cyber Infrastructure

    PubMed Central

    Fontaine, Michael D.

    2013-01-01

    Traffic data is commonly collected from widely deployed sensors in urban areas. This brings up a new research topic, data-driven intelligent transportation systems (ITSs), which means to integrate heterogeneous traffic data from different kinds of sensors and apply it for ITS applications. This research, taking into consideration the significant increase in the amount of traffic data and the complexity of data analysis, focuses mainly on the challenge of solving data-intensive and computation-intensive problems. As a solution to the problems, this paper proposes a Cyber-ITS framework to perform data analysis on Cyber Infrastructure (CI), by nature parallel-computing hardware and software systems, in the context of ITS. The techniques of the framework include data representation, domain decomposition, resource allocation, and parallel processing. All these techniques are based on data-driven and application-oriented models and are organized as a component-and-workflow-based model in order to achieve technical interoperability and data reusability. A case study of the Cyber-ITS framework is presented later based on a traffic state estimation application that uses the fusion of massive Sydney Coordinated Adaptive Traffic System (SCATS) data and GPS data. The results prove that the Cyber-ITS-based implementation can achieve a high accuracy rate of traffic state estimation and provide a significant computational speedup for the data fusion by parallel computing. PMID:23766690

  18. A Cyber-ITS framework for massive traffic data analysis using cyber infrastructure.

    PubMed

    Xia, Yingjie; Hu, Jia; Fontaine, Michael D

    2013-01-01

    Traffic data is commonly collected from widely deployed sensors in urban areas. This brings up a new research topic, data-driven intelligent transportation systems (ITSs), which means to integrate heterogeneous traffic data from different kinds of sensors and apply it for ITS applications. This research, taking into consideration the significant increase in the amount of traffic data and the complexity of data analysis, focuses mainly on the challenge of solving data-intensive and computation-intensive problems. As a solution to the problems, this paper proposes a Cyber-ITS framework to perform data analysis on Cyber Infrastructure (CI), by nature parallel-computing hardware and software systems, in the context of ITS. The techniques of the framework include data representation, domain decomposition, resource allocation, and parallel processing. All these techniques are based on data-driven and application-oriented models and are organized as a component-and-workflow-based model in order to achieve technical interoperability and data reusability. A case study of the Cyber-ITS framework is presented later based on a traffic state estimation application that uses the fusion of massive Sydney Coordinated Adaptive Traffic System (SCATS) data and GPS data. The results prove that the Cyber-ITS-based implementation can achieve a high accuracy rate of traffic state estimation and provide a significant computational speedup for the data fusion by parallel computing.

  19. Enabling Big Geoscience Data Analytics with a Cloud-Based, MapReduce-Enabled and Service-Oriented Workflow Framework

    PubMed Central

    Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew

    2015-01-01

    Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data is essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing and data intensive in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework techniques are proposed by leveraging cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. A MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. A service-oriented workflow architecture is built for supporting on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists. PMID:25742012

  20. Enabling big geoscience data analytics with a cloud-based, MapReduce-enabled and service-oriented workflow framework.

    PubMed

    Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew

    2015-01-01

    Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data is essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing and data intensive in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework techniques are proposed by leveraging cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. A MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. A service-oriented workflow architecture is built for supporting on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists.
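
    The MapReduce pattern named in the abstract above can be sketched in a few lines. The example below is only an illustration of the map and reduce steps on a single machine with a thread pool; it does not use Hadoop or HBase, and the record layout (`cell_id`, `value`) and the function names are hypothetical, not taken from the paper.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(records):
    """Map step: reduce one chunk to per-key (sum, count) partial aggregates."""
    partial = defaultdict(lambda: [0.0, 0])
    for cell_id, value in records:
        partial[cell_id][0] += value
        partial[cell_id][1] += 1
    return {key: tuple(acc) for key, acc in partial.items()}


def reduce_partials(partials):
    """Reduce step: merge partial aggregates and compute the mean per key."""
    totals = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (subtotal, count) in partial.items():
            totals[key][0] += subtotal
            totals[key][1] += count
    return {key: subtotal / count for key, (subtotal, count) in totals.items()}


def mean_by_cell(chunks, workers=4):
    """Run the map step over all chunks in parallel, then reduce once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_partials(partials)
```

    Emitting (sum, count) pairs rather than per-chunk means keeps the reduce step exact when chunks hold different numbers of observations for the same grid cell.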

  1. A Parallel Framework with Block Matrices of a Discrete Fourier Transform for Vector-Valued Discrete-Time Signals.

    PubMed

    Soto-Quiros, Pablo

    2015-01-01

    This paper presents a parallel implementation of a kind of discrete Fourier transform (DFT): the vector-valued DFT. The vector-valued DFT is a novel tool to analyze the spectra of vector-valued discrete-time signals. This parallel implementation is developed in terms of a mathematical framework with a set of block matrix operations. These block matrix operations contribute to analysis, design, and implementation of parallel algorithms in multicore processors. In this work, an implementation and experimental investigation of the mathematical framework are performed using MATLAB with the Parallel Computing Toolbox. We found that there is an advantage to using multicore processors and a parallel computing environment to minimize the high execution time. Additionally, speedup increases when the number of logical processors and length of the signal increase.

  2. Three-dimensional photoacoustic tomography based on graphics-processing-unit-accelerated finite element method.

    PubMed

    Peng, Kuan; He, Ling; Zhu, Ziqiang; Tang, Jingtian; Xiao, Jiaying

    2013-12-01

    Compared with commonly used analytical reconstruction methods, the frequency-domain finite element method (FEM) based approach has proven to be an accurate and flexible algorithm for photoacoustic tomography. However, the FEM-based algorithm is computationally demanding, especially for three-dimensional cases. To enhance the algorithm's efficiency, in this work a parallel computational strategy is implemented in the framework of the FEM-based reconstruction algorithm using a graphic-processing-unit parallel frame named the "compute unified device architecture." A series of simulation experiments is carried out to test the accuracy and accelerating effect of the improved method. The results obtained indicate that the parallel calculation does not change the accuracy of the reconstruction algorithm, while its computational cost is significantly reduced by a factor of 38.9 with a GTX 580 graphics card using the improved method.

  3. Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

    NASA Technical Reports Server (NTRS)

    Harper, Richard

    1989-01-01

    In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.

  4. PyPele Rewritten To Use MPI

    NASA Technical Reports Server (NTRS)

    Hockney, George; Lee, Seungwon

    2008-01-01

    A computer program known as PyPele, originally written as a Python-language extension module of a C++ language program, has been rewritten in pure Python language. The original version of PyPele dispatches and coordinates parallel-processing tasks on cluster computers and provides a conceptual framework for spacecraft-mission-design and -analysis software tools to run in an embarrassingly parallel mode. The original version of PyPele uses SSH (Secure Shell, a set of standards and an associated network protocol for establishing a secure channel between a local and a remote computer) to coordinate parallel processing. Instead of SSH, the present Python version of PyPele uses Message Passing Interface (MPI) [an unofficial de-facto standard language-independent application programming interface for message-passing on a parallel computer] while keeping the same user interface. The use of MPI instead of SSH and the preservation of the original PyPele user interface make it possible for parallel application programs written previously for the original version of PyPele to run on MPI-based cluster computers. As a result, engineers using the previously written application programs can take advantage of embarrassing parallelism without need to rewrite those programs.

  5. Composing Data Parallel Code for a SPARQL Graph Engine

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Castellana, Vito G.; Tumeo, Antonino; Villa, Oreste

    Big data analytics processes large amounts of data to extract knowledge from them. Semantic databases are big data applications that adopt the Resource Description Framework (RDF) to structure metadata through a graph-based representation. The graph-based representation provides several benefits, such as the possibility to perform in-memory processing with large amounts of parallelism. SPARQL is a language used to perform queries on RDF-structured data through graph matching. In this paper we present a tool that automatically translates SPARQL queries to parallel graph crawling and graph matching operations. The tool also supports complex SPARQL constructs, which require more than basic graph matching for their implementation. The tool generates parallel code annotated with OpenMP pragmas for x86 Shared-memory Multiprocessors (SMPs). With respect to commercial database systems such as Virtuoso, our approach reduces memory occupation due to join operations and provides higher performance. We show the scaling of the automatically generated graph-matching code on a 48-core SMP.

  6. Approximate kernel competitive learning.

    PubMed

    Wu, Jian-Sheng; Zheng, Wei-Shi; Lai, Jian-Huang

    2015-03-01

    Kernel competitive learning has been successfully used to achieve robust clustering. However, kernel competitive learning (KCL) is not scalable for large scale data processing, because (1) it has to calculate and store the full kernel matrix that is too large to be calculated and kept in the memory and (2) it cannot be computed in parallel. In this paper we develop a framework of approximate kernel competitive learning for processing large scale datasets. The proposed framework consists of two parts. First, it derives an approximate kernel competitive learning (AKCL), which learns kernel competitive learning in a subspace via sampling. We provide solid theoretical analysis on why the proposed approximation modelling would work for kernel competitive learning, and furthermore, we show that the computational complexity of AKCL is largely reduced. Second, we propose a pseudo-parallelled approximate kernel competitive learning (PAKCL) based on a set-based kernel competitive learning strategy, which overcomes the obstacle of using parallel programming in kernel competitive learning and significantly accelerates the approximate kernel competitive learning for large scale clustering. The empirical evaluation on publicly available datasets shows that the proposed AKCL and PAKCL can perform comparably to KCL, with a large reduction in computational cost. Also, the proposed methods achieve more effective clustering performance in terms of clustering precision against related approximate clustering approaches.

  7. Jagged Tiling for Intra-tile Parallelism and Fine-Grain Multithreading

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shrestha, Sunil; Manzano Franco, Joseph B.; Marquez, Andres

    In this paper, we have developed a novel methodology that takes into consideration multithreaded many-core designs to better utilize memory/processing resources and improve memory residence on tileable applications. It takes advantage of polyhedral analysis and transformation in the form of PLUTO, combined with a highly optimized fine-grain tile runtime to exploit parallelism at all levels. The main contributions of this paper include the introduction of multi-hierarchical tiling techniques that increase intra-tile parallelism; and a data-flow inspired runtime library that allows the expression of parallel tiles with an efficient synchronization registry. Our current implementation shows performance improvements on an Intel Xeon Phi board up to 32.25% against instances produced by state-of-the-art compiler frameworks for selected stencil applications.

  8. Parallel Domain Decomposition Formulation and Software for Large-Scale Sparse Symmetrical/Unsymmetrical Aeroacoustic Applications

    NASA Technical Reports Server (NTRS)

    Nguyen, D. T.; Watson, Willie R. (Technical Monitor)

    2005-01-01

    The overall objectives of this research work are to formulate and validate efficient parallel algorithms, and to efficiently design/implement computer software for solving large-scale acoustic problems arising from the unified frameworks of the finite element procedures. The adopted parallel Finite Element (FE) Domain Decomposition (DD) procedures should take full advantage of the multiple processing capabilities offered by most modern high performance computing platforms for efficient parallel computation. To achieve this objective, the formulation needs to integrate efficient sparse (and dense) assembly techniques, hybrid (or mixed) direct and iterative equation solvers, proper pre-conditioned strategies, unrolling strategies, and effective processors' communicating schemes. Finally, the numerical performance of the developed parallel finite element procedures will be evaluated by solving series of structural and acoustic (symmetrical and un-symmetrical) problems (on different computing platforms). Comparisons with existing "commercialized" and/or "public domain" software are also included, whenever possible.

  9. pWeb: A High-Performance, Parallel-Computing Framework for Web-Browser-Based Medical Simulation.

    PubMed

    Halic, Tansel; Ahn, Woojin; De, Suvranu

    2014-01-01

    This work presents pWeb, a new language and compiler for parallelization of client-side compute intensive web applications such as surgical simulations. The recently introduced HTML5 standard has enabled creating unprecedented applications on the web. Low performance of the web browser, however, remains the bottleneck of computationally intensive applications including visualization of complex scenes, real time physical simulations and image processing compared to native ones. The new proposed language is built upon web workers for multithreaded programming in HTML5. The language provides fundamental functionalities of parallel programming languages as well as the fork/join parallel model which is not supported by web workers. The language compiler automatically generates an equivalent parallel script that complies with the HTML5 standard. A case study on realistic rendering for surgical simulations demonstrates enhanced performance with a compact set of instructions.
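
    The fork/join model mentioned in the pWeb abstract above can be illustrated independently of pWeb itself. The sketch below is written in Python with standard-library threads rather than in pWeb or JavaScript, whose syntax the abstract does not show; the function name `fork_join_sum` and the `threshold` parameter are hypothetical.

```python
import threading


def fork_join_sum(values, threshold=1000):
    """Sum `values` by recursively forking the left half into a new thread."""
    if len(values) <= threshold:
        # Small enough: do the work directly instead of forking further.
        return sum(values)

    mid = len(values) // 2
    result = {}

    def left_task():
        result["left"] = fork_join_sum(values[:mid], threshold)

    worker = threading.Thread(target=left_task)
    worker.start()                                   # fork the left half
    right = fork_join_sum(values[mid:], threshold)   # right half runs here
    worker.join()                                    # join before combining
    return result["left"] + right
```

    The parent only reads `result["left"]` after `join()` returns, so no extra locking is needed. The `threshold` bounds how many threads are created.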

  10. Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arampatzis, Giorgos, E-mail: garab@math.uoc.gr; Katsoulakis, Markos A., E-mail: markos@math.umass.edu; Plechac, Petr, E-mail: plechac@math.udel.edu

    2012-10-01

    We present a mathematical framework for constructing and analyzing parallel algorithms for lattice kinetic Monte Carlo (KMC) simulations. The resulting algorithms have the capacity to simulate a wide range of spatio-temporal scales in spatially distributed, non-equilibrium physiochemical processes with complex chemistry and transport micro-mechanisms. Rather than focusing on constructing exactly the stochastic trajectories, our approach relies on approximating the evolution of observables, such as density, coverage, correlations and so on. More specifically, we develop a spatial domain decomposition of the Markov operator (generator) that describes the evolution of all observables according to the kinetic Monte Carlo algorithm. This domain decomposition corresponds to a decomposition of the Markov generator into a hierarchy of operators and can be tailored to specific hierarchical parallel architectures such as multi-core processors or clusters of Graphical Processing Units (GPUs). Based on this operator decomposition, we formulate parallel Fractional step kinetic Monte Carlo algorithms by employing the Trotter Theorem and its randomized variants; these schemes, (a) are partially asynchronous on each fractional step time-window, and (b) are characterized by their communication schedule between processors. The proposed mathematical framework allows us to rigorously justify the numerical and statistical consistency of the proposed algorithms, showing the convergence of our approximating schemes to the original serial KMC. The approach also provides a systematic evaluation of different processor communicating schedules. We carry out a detailed benchmarking of the parallel KMC schemes using available exact solutions, for example, in Ising-type systems and we demonstrate the capabilities of the method to simulate complex spatially distributed reactions at very large scales on GPUs. Finally, we discuss work load balancing between processors and propose a re-balancing scheme based on probabilistic mass transport methods.
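
    For orientation, the Trotter product formula that the abstract above invokes can be written out in its simplest form. The two-term splitting of the generator shown here is an assumption made for illustration; the abstract describes a hierarchy of operators, and the symbols below are not taken from the paper.

```latex
% Assumed splitting of the Markov generator into two parts, e.g. one per
% group of non-interacting sub-lattices:
%   \mathcal{L} = \mathcal{L}_1 + \mathcal{L}_2 .
% The Trotter product formula then reads
\[
  e^{t(\mathcal{L}_1 + \mathcal{L}_2)}
  = \lim_{n \to \infty}
    \left( e^{\frac{t}{n}\mathcal{L}_1} \, e^{\frac{t}{n}\mathcal{L}_2} \right)^{n} ,
\]
% so for a finite fractional time step \Delta t = t/n the serial evolution is
% approximated by alternating the two sub-evolutions:
\[
  e^{\Delta t\,\mathcal{L}} \approx
  e^{\Delta t\,\mathcal{L}_1} \, e^{\Delta t\,\mathcal{L}_2} .
\]
```

    Each factor evolves only part of the lattice over the window of length Δt, which is consistent with the abstract's description of schemes that are partially asynchronous within each fractional-step time window.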

  11. Harmony Theory: A Mathematical Framework for Stochastic Parallel Processing.

    ERIC Educational Resources Information Center

    Smolensky, Paul

    This paper presents preliminary results of research founded on the hypothesis that in real environments there exist regularities that can be idealized as mathematical structures that are simple enough to be analyzed. The author considered three steps in analyzing the encoding of modularity of the environment. First, a general information…

  12. CLAss-Specific Subspace Kernel Representations and Adaptive Margin Slack Minimization for Large Scale Classification.

    PubMed

    Yu, Yinan; Diamantaras, Konstantinos I; McKelvey, Tomas; Kung, Sun-Yuan

    2018-02-01

    In kernel-based classification models, given limited computational power and storage capacity, operations over the full kernel matrix become prohibitive. In this paper, we propose a new supervised learning framework using kernel models for sequential data processing. The framework is based on two components that both aim at enhancing the classification capability with a subset selection scheme. The first part is a subspace projection technique in the reproducing kernel Hilbert space using a CLAss-specific Subspace Kernel representation for kernel approximation. In the second part, we propose a novel structural risk minimization algorithm called the adaptive margin slack minimization to iteratively improve the classification accuracy by an adaptive data selection. We motivate each part separately, and then integrate them into learning frameworks for large scale data. We propose two such frameworks: the memory efficient sequential processing for sequential data processing and the parallelized sequential processing for distributed computing with sequential data acquisition. We test our methods on several benchmark data sets and compare them with the state-of-the-art techniques to verify the validity of the proposed techniques.

  13. Real-time SHVC software decoding with multi-threaded parallel processing

    NASA Astrophysics Data System (ADS)

    Gudumasu, Srinivas; He, Yuwen; Ye, Yan; He, Yong; Ryu, Eun-Seok; Dong, Jie; Xiu, Xiaoyu

    2014-09-01

    This paper proposes a parallel decoding framework for scalable HEVC (SHVC). Various optimization technologies are implemented on the basis of SHVC reference software SHM-2.0 to achieve real-time decoding speed for the two layer spatial scalability configuration. SHVC decoder complexity is analyzed with profiling information. The decoding process at each layer and the up-sampling process are designed in parallel and scheduled by a high level application task manager. Within each layer, multi-threaded decoding is applied to accelerate the layer decoding speed. Entropy decoding, reconstruction, and in-loop processing are pipeline designed with multiple threads based on groups of coding tree units (CTU). A group of CTUs is treated as a processing unit in each pipeline stage to achieve a better trade-off between parallelism and synchronization. Motion compensation, inverse quantization, and inverse transform modules are further optimized with SSE4 SIMD instructions. Simulations on a desktop with an Intel i7 processor 2600 running at 3.4 GHz show that the parallel SHVC software decoder is able to decode 1080p spatial 2x at up to 60 fps (frames per second) and 1080p spatial 1.5x at up to 50 fps for those bitstreams generated with SHVC common test conditions in the JCT-VC standardization group. The decoding performance at various bitrates with different optimization technologies and different numbers of threads are compared in terms of decoding speed and resource usage, including processor and memory.

  14. A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ma, Kwan-Liu

    Most of today’s visualization libraries and applications are based off of what is known today as the visualization pipeline. In the visualization pipeline model, algorithms are encapsulated as “filtering” components with inputs and outputs. These components can be combined by connecting the outputs of one filter to the inputs of another filter. The visualization pipeline model is popular because it provides a convenient abstraction that allows users to combine algorithms in powerful ways. Unfortunately, the visualization pipeline cannot run effectively on exascale computers. Experts agree that the exascale machine will comprise processors that contain many cores. Furthermore, physical limitations will prevent data movement in and out of the chip (that is, between main memory and the processing cores) from keeping pace with improvements in overall compute performance. To use these processors to their fullest capability, it is essential to carefully consider memory access. This is where the visualization pipeline fails. Each filtering component in the visualization library is expected to take a data set in its entirety, perform some computation across all of the elements, and output the complete results. The process of iterating over all elements must be repeated in each filter, which is one of the worst possible ways to traverse memory when trying to maximize the number of executions per memory access. This project investigates a new type of visualization framework that exhibits the pervasive parallelism necessary to run on exascale machines. Our framework achieves this by defining algorithms in terms of functors, which are localized, stateless operations. Functors can be composited in much the same way as filters in the visualization pipeline. But functors’ design allows them to run concurrently on massive numbers of lightweight threads. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale computer. This project concludes with a functional prototype containing pervasively parallel algorithms that perform demonstrably well on many-core processors. These algorithms are fundamental for performing data analysis and visualization at extreme scale.
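
    The contrast between filter-by-filter traversal and composed functors can be illustrated with a small sketch. The functor names (`scale`, `offset`, `clamp`) and the helpers `compose` and `parallel_map` are hypothetical and are not taken from the project's prototype; a thread pool stands in for the lightweight threads of a many-core device.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def compose(*functors):
    """Fuse stateless per-element functors into one functor (applied left to right)."""
    return lambda x: reduce(lambda acc, f: f(acc), functors, x)


def parallel_map(functor, data, workers=4):
    """Apply a functor to every element independently; input order is preserved."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(functor, data))


# Hypothetical functors: each reads one element and keeps no state.
def scale(x):
    return 2.0 * x


def offset(x):
    return x + 1.0


def clamp(x):
    return min(x, 5.0)


data = [0.0, 1.0, 2.0, 3.0]

# Pipeline style: three full passes over the data, one per filter.
filtered = [clamp(v) for v in [offset(v) for v in [scale(v) for v in data]]]

# Functor style: one fused pass, each element visited once.
fused = parallel_map(compose(scale, offset, clamp), data)
```

    Because the functors carry no state, the fused pass gives the same values as the three separate passes while reading each element from memory only once.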

  15. An efficient route to bispecific antibody production using single-reactor mammalian co-culture

    PubMed Central

    Shatz, Whitney; Ng, Domingos; Dutina, George; Wong, Athena W.; Sonoda, Junichiro; Scheer, Justin M.

    2016-01-01

    ABSTRACT Bispecific antibodies have shown promise in the clinic as medicines with novel mechanisms of action. Lack of efficient production of bispecific IgGs, however, has limited their rapid advancement. Here, we describe a single-reactor process using mammalian cell co-culture production to efficiently produce a bispecific IgG with 4 distinct polypeptide chains without the need for parallel processing of each half-antibody or additional framework mutations. This method resembles a conventional process, and the quality and yield of the monoclonal antibodies are equal to those produced using parallel processing methods. We demonstrate the application of the approach to diverse bispecific antibodies, and its suitability for production of a tissue specific molecule targeting fibroblast growth factor receptor 1 and klotho β that is being developed for type 2 diabetes and other obesity-linked disorders. PMID:27680183

  16. GPU-based ultra-fast dose calculation using a finite size pencil beam model.

    PubMed

    Gu, Xuejun; Choi, Dongju; Men, Chunhua; Pan, Hubert; Majumdar, Amitava; Jiang, Steve B

    2009-10-21

    Online adaptive radiation therapy (ART) is an attractive concept that promises the ability to deliver an optimal treatment in response to the inter-fraction variability in patient anatomy. However, it has yet to be realized due to technical limitations. Fast dose deposit coefficient calculation is a critical component of the online planning process that is required for plan optimization of intensity-modulated radiation therapy (IMRT). Computer graphics processing units (GPUs) are well suited to provide the requisite fast performance for the data-parallel nature of dose calculation. In this work, we develop a dose calculation engine based on a finite-size pencil beam (FSPB) algorithm and a GPU parallel computing framework. The developed framework can accommodate any FSPB model. We test our implementation in the case of a water phantom and the case of a prostate cancer patient with varying beamlet and voxel sizes. All testing scenarios achieved speedups ranging from 200 to 400 times when using an NVIDIA Tesla C1060 card in comparison with a 2.27 GHz Intel Xeon CPU. The computational time for calculating dose deposition coefficients for a nine-field prostate IMRT plan with this new framework is less than 1 s. This indicates that the GPU-based FSPB algorithm is well suited for online re-planning for adaptive radiotherapy.

  17. A Framework for Parallel Unstructured Grid Generation for Complex Aerodynamic Simulations

    NASA Technical Reports Server (NTRS)

    Zagaris, George; Pirzadeh, Shahyar Z.; Chrisochoides, Nikos

    2009-01-01

    A framework for parallel unstructured grid generation targeting both shared memory multi-processors and distributed memory architectures is presented. The two fundamental building-blocks of the framework consist of: (1) the Advancing-Partition (AP) method used for domain decomposition and (2) the Advancing Front (AF) method used for mesh generation. Starting from the surface mesh of the computational domain, the AP method is applied recursively to generate a set of sub-domains. Next, the sub-domains are meshed in parallel using the AF method. The recursive nature of domain decomposition naturally maps to a divide-and-conquer algorithm which exhibits inherent parallelism. For the parallel implementation, the Master/Worker pattern is employed to dynamically balance the varying workloads of each task on the set of available CPUs. Performance results by this approach are presented and discussed in detail as well as future work and improvements.
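
    The decompose-then-mesh structure described above can be sketched in a few lines. This is a toy 1-D illustration under stated assumptions: `decompose` bisects an integer interval as a stand-in for the Advancing-Partition method, `mesh` emits unit-length elements as a stand-in for the Advancing Front method, and a thread pool plays the Master/Worker role. None of these names come from the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(domain, max_size):
    """Recursively bisect a 1-D domain (lo, hi) until each piece is small enough."""
    lo, hi = domain
    if hi - lo <= max_size:
        return [domain]
    mid = (lo + hi) // 2
    return decompose((lo, mid), max_size) + decompose((mid, hi), max_size)


def mesh(subdomain):
    """Stand-in for mesh generation: unit-length elements covering the sub-domain."""
    lo, hi = subdomain
    return [(i, i + 1) for i in range(lo, hi)]


def generate(domain, max_size=4, workers=3):
    subdomains = decompose(domain, max_size)
    # The executor acts as the master: idle workers pull the next sub-domain,
    # so uneven task sizes are balanced dynamically.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pieces = list(pool.map(mesh, subdomains))
    return [element for piece in pieces for element in piece]


elements = generate((0, 21))
```

    The recursion yields sub-domains of unequal size, which is why a pull-based worker pool is a natural fit; in CPython a process pool would be needed for real CPU-bound meshing.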

  18. An Advanced Simulation Framework for Parallel Discrete-Event Simulation

    NASA Technical Reports Server (NTRS)

    Li, P. P.; Tyrrell, R. Yeung D.; Adhami, N.; Li, T.; Henry, H.

    1994-01-01

    Discrete-event simulation (DEVS) users have long been faced with a three-way trade-off of balancing execution time, model fidelity, and number of objects simulated. Because of the limits of computer processing power the analyst is often forced to settle for less than desired performances in one or more of these areas.

  19. Simulation framework for intelligent transportation systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ewing, T.; Doss, E.; Hanebutte, U.

    1996-10-01

    A simulation framework has been developed for a large-scale, comprehensive, scaleable simulation of an Intelligent Transportation System (ITS). The simulator is designed for running on parallel computers and distributed (networked) computer systems, but can run on standalone workstations for smaller simulations. The simulator currently models instrumented smart vehicles with in-vehicle navigation units capable of optimal route planning and Traffic Management Centers (TMC). The TMC has probe vehicle tracking capabilities (display position and attributes of instrumented vehicles), and can provide two-way interaction with traffic to provide advisories and link times. Both the in-vehicle navigation module and the TMC feature detailed graphical user interfaces to support human-factors studies. Realistic modeling of variations of the posted driving speed are based on human factors studies that take into consideration weather, road conditions, driver personality and behavior, and vehicle type. The prototype has been developed on a distributed system of networked UNIX computers but is designed to run on parallel computers, such as ANL's IBM SP-2, for large-scale problems. A novel feature of the approach is that vehicles are represented by autonomous computer processes which exchange messages with other processes. The vehicles have a behavior model which governs route selection and driving behavior, and can react to external traffic events much like real vehicles. With this approach, the simulation is scaleable to take advantage of emerging massively parallel processor (MPP) systems.

  20. Neuromorphic Hardware Architecture Using the Neural Engineering Framework for Pattern Recognition.

    PubMed

    Wang, Runchun; Thakur, Chetan Singh; Cohen, Gregory; Hamilton, Tara Julia; Tapson, Jonathan; van Schaik, Andre

    2017-06-01

    We present a hardware architecture that uses the neural engineering framework (NEF) to implement large-scale neural networks on field programmable gate arrays (FPGAs) for performing massively parallel real-time pattern recognition. NEF is a framework that is capable of synthesising large-scale cognitive systems from subnetworks and we have previously presented an FPGA implementation of the NEF that successfully performs nonlinear mathematical computations. That work was developed based on a compact digital neural core, which consists of 64 neurons that are instantiated by a single physical neuron using a time-multiplexing approach. We have now scaled this approach up to build a pattern recognition system by combining identical neural cores together. As a proof of concept, we have developed a handwritten digit recognition system using the MNIST database and achieved a recognition rate of 96.55%. The system is implemented on a state-of-the-art FPGA and can process 5.12 million digits per second. The architecture and hardware optimisations presented offer high-speed and resource-efficient means for performing high-speed, neuromorphic, and massively parallel pattern recognition and classification tasks.

  1. Chromium: A Stream-Processing Framework for Interactive Rendering on Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Humphreys, G.; Houston, M.; Ng, Y.-R.

    2002-01-11

    We describe Chromium, a system for manipulating streams of graphics API commands on clusters of workstations. Chromium's stream filters can be arranged to create sort-first and sort-last parallel graphics architectures that, in many cases, support the same applications while using only commodity graphics accelerators. In addition, these stream filters can be extended programmatically, allowing the user to customize the stream transformations performed by nodes in a cluster. Because our stream processing mechanism is completely general, any cluster-parallel rendering algorithm can be either implemented on top of or embedded in Chromium. In this paper, we give examples of real-world applications that use Chromium to achieve good scalability on clusters of workstations, and describe other potential uses of this stream processing technology. By completely abstracting the underlying graphics architecture, network topology, and API command processing semantics, we allow a variety of applications to run in different environments.

  2. CUDAMPF: a multi-tiered parallel framework for accelerating protein sequence search in HMMER on CUDA-enabled GPU.

    PubMed

    Jiang, Hanyu; Ganesan, Narayan

    2016-02-27

    The HMMER software suite is widely used for analysis of homologous protein and nucleotide sequences with high sensitivity. The latest version of hmmsearch in HMMER 3.x utilizes a heuristic pipeline which consists of the MSV/SSV (Multiple/Single ungapped Segment Viterbi) stage, the P7Viterbi stage and the Forward scoring stage to accelerate homology detection. Since the latest version is highly optimized for performance on modern multi-core CPUs with SSE capabilities, only a few acceleration attempts report speedup. However, the most compute-intensive tasks within the pipeline (viz., MSV/SSV and P7Viterbi stages) still stand to benefit from the computational capabilities of massively parallel processors. A Multi-Tiered Parallel Framework (CUDAMPF) implemented on CUDA-enabled GPUs, presented here, offers finer-grained parallelism for the MSV/SSV and Viterbi algorithms. We couple the SIMT (Single Instruction Multiple Threads) mechanism with SIMD (Single Instruction Multiple Data) video instructions with warp-synchronism to achieve high-throughput processing and eliminate thread idling. We also propose a hardware-aware optimal allocation scheme for scarce resources such as on-chip memory and caches in order to boost the performance and scalability of CUDAMPF. In addition, runtime compilation via NVRTC, available with CUDA 7.0, is incorporated into the presented framework; it not only helps unroll the innermost loop to yield up to a 2- to 3-fold speedup over static compilation but also enables dynamic loading and switching of kernels depending on the query model size, in order to achieve optimal performance. CUDAMPF is designed as a hardware-aware parallel framework for accelerating computational hotspots within the hmmsearch pipeline as well as other sequence alignment applications. It achieves significant speedup by exploiting hierarchical parallelism on a single GPU and takes full advantage of limited resources based on their own performance features. In addition to exceeding the performance of other acceleration attempts, comprehensive evaluations against high-end CPUs (Intel i5, i7 and Xeon) show that CUDAMPF yields up to 440 GCUPS for SSV, 277 GCUPS for MSV and 14.3 GCUPS for P7Viterbi, all with 100% accuracy, which translates to a maximum speedup of 37.5-, 23.1- and 11.6-fold for MSV, SSV and P7Viterbi respectively. The source code is available at https://github.com/Super-Hippo/CUDAMPF.

  3. Status of the calibration and alignment framework at the Belle II experiment

    NASA Astrophysics Data System (ADS)

    Dossett, D.; Sevior, M.; Ritter, M.; Kuhr, T.; Bilka, T.; Yaschenko, S.; Belle II Software Group

    2017-10-01

    The Belle II detector at the SuperKEKB e+e- collider plans to take first collision data in 2018. The monetary and CPU time costs associated with storing and processing the data mean that it is crucial for the detector components at Belle II to be calibrated quickly and accurately. A fast and accurate calibration system would allow the high level trigger to increase the efficiency of event selection, and can give users analysis-quality reconstruction promptly. A flexible framework to automate the fast production of calibration constants is being developed in the Belle II Analysis Software Framework (basf2). Detector experts only need to create two components from C++ base classes in order to use the automation system. The first collects data from Belle II event data files and outputs much smaller files to pass to the second component. This runs the main calibration algorithm to produce calibration constants ready for upload into the conditions database. A Python framework coordinates the input files, order of processing, and submission of jobs. Splitting the operation into collection and algorithm processing stages allows the framework to optionally parallelize the collection stage on a batch system.
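
    The split into a parallelizable collection stage and a single algorithm stage can be sketched as below. This is not the basf2 API: `collect`, `calibrate`, and `run_calibration` are hypothetical names, the "event files" are plain lists of numbers, and the calibration constant is simply a mean.

```python
from concurrent.futures import ThreadPoolExecutor


def collect(event_file):
    """Collector stage: reduce one event file to a small summary (sum, count)."""
    return (sum(event_file), len(event_file))


def calibrate(summaries):
    """Algorithm stage: merge the summaries into one calibration constant (a mean)."""
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    return total / count


def run_calibration(event_files, workers=4):
    # Collection is independent per file, so it can be spread over workers;
    # the algorithm stage then runs once on the much smaller outputs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        summaries = list(pool.map(collect, event_files))
    return calibrate(summaries)


# Hypothetical "event files": lists of raw measurements.
files = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
constant = run_calibration(files)
```

    Only the collector touches the large inputs, so it is the stage worth distributing; the algorithm stage sees a few small summaries regardless of how many files were processed.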

  4. Quantifying the effect of hydrogen on dislocation dynamics: A three-dimensional discrete dislocation dynamics framework

    NASA Astrophysics Data System (ADS)

    Gu, Yejun; El-Awady, Jaafar A.

    2018-03-01

    We present a new framework to quantify the effect of hydrogen on dislocations using large scale three-dimensional (3D) discrete dislocation dynamics (DDD) simulations. In this model, the first order elastic interaction energy associated with the hydrogen-induced volume change is accounted for. The three-dimensional stress tensor induced by hydrogen concentration, which is in equilibrium with respect to the dislocation stress field, is derived using the Eshelby inclusion model, while the hydrogen bulk diffusion is treated as a continuum process. This newly developed framework is utilized to quantify the effect of different hydrogen concentrations on the dynamics of a glide dislocation in the absence of an applied stress field as well as on the spacing between dislocations in an array of parallel edge dislocations. A shielding effect is observed for materials having a large hydrogen diffusion coefficient, with the shield effect leading to the homogenization of the shrinkage process leading to the glide loop maintaining its circular shape, as well as resulting in a decrease in dislocation separation distances in the array of parallel edge dislocations. On the other hand, for materials having a small hydrogen diffusion coefficient, the high hydrogen concentrations around the edge characters of the dislocations act to pin them. Higher stresses are required to be able to unpin the dislocations from the hydrogen clouds surrounding them. Finally, this new framework can open the door for further large scale studies on the effect of hydrogen on the different aspects of dislocation-mediated plasticity in metals. With minor modifications of the current formulations, the framework can also be extended to account for general inclusion-induced stress field in discrete dislocation dynamics simulations.

  5. Software Issues in High-Performance Computing and a Framework for the Development of HPC Applications

    DTIC Science & Technology

    1995-01-01

    possible to determine communication points. For this version, a C program spawning Posix threads and using semaphores to synchronize would have to...performance such as the time required for network communication and synchronization as well as issues of asynchrony and memory hierarchy. For example...enhances reusability. Process (or task) parallel computations can also be succinctly expressed with a small set of process creation and synchronization

  6. Parallel implementation of a Lagrangian-based model on an adaptive mesh in C++: Application to sea-ice

    NASA Astrophysics Data System (ADS)

    Samaké, Abdoulaye; Rampal, Pierre; Bouillon, Sylvain; Ólason, Einar

    2017-12-01

    We present a parallel implementation framework for a new dynamic/thermodynamic sea-ice model, called neXtSIM, based on the Elasto-Brittle rheology and using an adaptive mesh. The spatial discretisation of the model is done using the finite-element method. The temporal discretisation is semi-implicit and the advection is achieved using either a pure Lagrangian scheme or an Arbitrary Lagrangian Eulerian scheme (ALE). The parallel implementation presented here focuses on the distributed-memory approach using the message-passing library MPI. The efficiency and the scalability of the parallel algorithms are illustrated by numerical experiments performed using up to 500 processor cores of a cluster computing system. The performance obtained by the proposed parallel implementation of the neXtSIM code is shown to be sufficient to perform simulations for state-of-the-art sea ice forecasting and geophysical process studies over geographical domains of several million square kilometers, such as the Arctic region.

  7. A hybrid parallel framework for the cellular Potts model simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jiang, Yi; He, Kejing; Dong, Shoubin

    2009-01-01

    The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, which can't be used for large scale complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operation are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP system using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied the avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large scale simulation (~10^8 sites) of complex collective behavior of numerous cells (~10^6).

  8. A Stream Tilling Approach to Surface Area Estimation for Large Scale Spatial Data in a Shared Memory System

    NASA Astrophysics Data System (ADS)

    Liu, Jiping; Kang, Xiaochen; Dong, Chun; Xu, Shenghua

    2017-12-01

    Surface area estimation is a widely used tool for resource evaluation in the physical world. When processing large scale spatial data, the input/output (I/O) can easily become the bottleneck in parallelizing the algorithm due to the limited physical memory resources and the very slow disk transfer rate. In this paper, we propose a stream tilling approach to surface area estimation that first decomposes a spatial data set into tiles with topological expansions. With these tiles, the one-to-one mapping relationship between the input and the computing process is broken. Then, we realize a streaming framework for scheduling the I/O processes and computing units. Each computing unit encapsulates an identical copy of the estimation algorithm, and multiple asynchronous computing units can work individually in parallel. Finally, the experiment demonstrates that our stream tilling estimation can efficiently alleviate the heavy pressure from I/O-bound work, and the measured speedup after optimization greatly exceeds that of the directly parallelized versions in shared memory systems with multi-core processors.

  9. Nice Guys Finish Fast and Bad Guys Finish Last: Facilitatory vs. Inhibitory Interaction in Parallel Systems

    PubMed Central

    Eidels, Ami; Houpt, Joseph W.; Altieri, Nicholas; Pei, Lei; Townsend, James T.

    2011-01-01

    Systems Factorial Technology is a powerful framework for investigating the fundamental properties of human information processing such as architecture (i.e., serial or parallel processing) and capacity (how processing efficiency is affected by increased workload). The Survivor Interaction Contrast (SIC) and the Capacity Coefficient are effective measures in determining these underlying properties, based on response-time data. Each of the different architectures, under the assumption of independent processing, predicts a specific form of the SIC along with some range of capacity. In this study, we explored SIC predictions of discrete-state (Markov process) and continuous-state (Linear Dynamic) models that allow for certain types of cross-channel interaction. The interaction can be facilitatory or inhibitory: one channel can either facilitate or slow down processing in its counterpart. Despite the relative generality of these models, the combination of the architecture-oriented and the capacity-oriented analyses provides for precise identification of the underlying system. PMID:21516183
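
    The abstract does not spell out the SIC. As it is usually defined in the Systems Factorial Technology literature, it is a double difference of survivor functions over the four salience conditions, SIC(t) = [S_LL(t) - S_LH(t)] - [S_HL(t) - S_HH(t)], where L and H denote low and high salience in each channel. A minimal sketch of the empirical version follows; the response times are made-up numbers chosen for illustration, not data from the study.

```python
def survivor(rts, t):
    """Empirical survivor function S(t) = P(RT > t)."""
    return sum(rt > t for rt in rts) / len(rts)


def sic(rt_ll, rt_lh, rt_hl, rt_hh, t):
    """SIC(t) = [S_LL(t) - S_LH(t)] - [S_HL(t) - S_HH(t)]."""
    return (survivor(rt_ll, t) - survivor(rt_lh, t)) - (
        survivor(rt_hl, t) - survivor(rt_hh, t))


# Made-up response times (ms) for the four salience conditions.
rt_ll = [500, 520, 540, 560]
rt_lh = [450, 470, 520, 530]
rt_hl = [440, 480, 510, 525]
rt_hh = [400, 420, 440, 460]

value = sic(rt_ll, rt_lh, rt_hl, rt_hh, t=475)
```

    Evaluating `sic` over a grid of `t` values gives the curve whose shape is compared against the predictions of each architecture.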

  10. Nice Guys Finish Fast and Bad Guys Finish Last: Facilitatory vs. Inhibitory Interaction in Parallel Systems.

    PubMed

    Eidels, Ami; Houpt, Joseph W; Altieri, Nicholas; Pei, Lei; Townsend, James T

    2011-04-01

    Systems Factorial Technology is a powerful framework for investigating the fundamental properties of human information processing such as architecture (i.e., serial or parallel processing) and capacity (how processing efficiency is affected by increased workload). The Survivor Interaction Contrast (SIC) and the Capacity Coefficient are effective measures in determining these underlying properties, based on response-time data. Each of the different architectures, under the assumption of independent processing, predicts a specific form of the SIC along with some range of capacity. In this study, we explored SIC predictions of discrete-state (Markov process) and continuous-state (Linear Dynamic) models that allow for certain types of cross-channel interaction. The interaction can be facilitatory or inhibitory: one channel can either facilitate or slow down processing in its counterpart. Despite the relative generality of these models, the combination of the architecture-oriented and the capacity-oriented analyses provides for precise identification of the underlying system.

  11. Scalable isosurface visualization of massive datasets on commodity off-the-shelf clusters

    PubMed Central

    Bajaj, Chandrajit

    2009-01-01

    Tomographic imaging and computer simulations are increasingly yielding massive datasets. Interactive and exploratory visualizations have rapidly become indispensable tools to study large volumetric imaging and simulation data. Our scalable isosurface visualization framework on commodity off-the-shelf clusters is an end-to-end parallel and progressive platform, from initial data access to the final display. Interactive browsing of extracted isosurfaces is made possible by using parallel isosurface extraction, and rendering in conjunction with a new specialized piece of image compositing hardware called Metabuffer. In this paper, we focus on the back end scalability by introducing a fully parallel and out-of-core isosurface extraction algorithm. It achieves scalability by using both parallel and out-of-core processing and parallel disks. It statically partitions the volume data to parallel disks with a balanced workload spectrum, and builds I/O-optimal external interval trees to minimize the number of I/O operations of loading large data from disk. We also describe an isosurface compression scheme that is efficient for progress extraction, transmission and storage of isosurfaces. PMID:19756231

  12. Leveraging human oversight and intervention in large-scale parallel processing of open-source data

    NASA Astrophysics Data System (ADS)

    Casini, Enrico; Suri, Niranjan; Bradshaw, Jeffrey M.

    2015-05-01

    The popularity of cloud computing along with the increased availability of cheap storage have led to the necessity of elaboration and transformation of large volumes of open-source data, all in parallel. One way to handle such extensive volumes of information properly is to take advantage of distributed computing frameworks like Map-Reduce. Unfortunately, an entirely automated approach that excludes human intervention is often unpredictable and error prone. Highly accurate data processing and decision-making can be achieved by supporting an automatic process through human collaboration, in a variety of environments such as warfare, cyber security and threat monitoring. Although this mutual participation seems easily exploitable, human-machine collaboration in the field of data analysis presents several challenges. First, due to the asynchronous nature of human intervention, it is necessary to verify that once a correction is made, all the necessary reprocessing is done in chain. Second, it is often needed to minimize the amount of reprocessing in order to optimize the usage of resources due to limited availability. In order to improve on these strict requirements, this paper introduces improvements to an innovative approach for human-machine collaboration in the processing of large amounts of open-source data in parallel.
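
    The two requirements named above (reprocess everything affected by a correction, and nothing more) can be illustrated with a dependency-graph sketch. This is an assumption about how such tracking could work, not the paper's method; the workflow steps and the function names `downstream` and `reprocess_order` are hypothetical.

```python
from collections import deque


def downstream(dependents, corrected):
    """Return every step reachable from the corrected one (the steps made stale)."""
    stale, frontier = set(), deque([corrected])
    while frontier:
        step = frontier.popleft()
        for nxt in dependents.get(step, []):
            if nxt not in stale:
                stale.add(nxt)
                frontier.append(nxt)
    return stale


def reprocess_order(dependents, order, corrected):
    """Keep the original processing order, restricted to the stale steps."""
    stale = downstream(dependents, corrected)
    return [step for step in order if step in stale]


# Hypothetical workflow: step -> steps that consume its output.
dependents = {
    "ingest": ["translate", "geotag"],
    "translate": ["extract_entities"],
    "geotag": ["map_view"],
    "extract_entities": ["summary"],
}
order = ["ingest", "translate", "geotag", "extract_entities", "map_view", "summary"]

todo = reprocess_order(dependents, order, corrected="translate")
```

    A human correction to the output of `translate` invalidates only its consumers, so `geotag` and `map_view` are left untouched.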

  13. A Micro-Level Data-Calibrated Agent-Based Model: The Synergy between Microsimulation and Agent-Based Modeling.

    PubMed

    Singh, Karandeep; Ahn, Chang-Won; Paik, Euihyun; Bae, Jang Won; Lee, Chun-Hee

    2018-01-01

    Artificial life (ALife) examines systems related to natural life, its processes, and its evolution, using simulations with computer models, robotics, and biochemistry. In this article, we focus on the computer modeling, or "soft," aspects of ALife and prepare a framework for scientists and modelers to be able to support such experiments. The framework is designed and built to be a parallel as well as distributed agent-based modeling environment, and does not require end users to have expertise in parallel or distributed computing. Furthermore, we use this framework to implement a hybrid model using microsimulation and agent-based modeling techniques to generate an artificial society. We leverage this artificial society to simulate and analyze population dynamics using Korean population census data. The agents in this model derive their decisional behaviors from real data (microsimulation feature) and interact among themselves (agent-based modeling feature) to proceed in the simulation. The behaviors, interactions, and social scenarios of the agents are varied to perform an analysis of population dynamics. We also estimate the future cost of pension policies based on the future population structure of the artificial society. The proposed framework and model demonstrate how ALife techniques can be used by researchers in relation to social issues and policies.

  14. Software Engineering Support of the Third Round of Scientific Grand Challenge Investigations: Earth System Modeling Software Framework Survey

    NASA Technical Reports Server (NTRS)

    Talbot, Bryan; Zhou, Shu-Jia; Higgins, Glenn; Zukor, Dorothy (Technical Monitor)

    2002-01-01

    One of the most significant challenges in large-scale climate modeling, as well as in high-performance computing in other scientific fields, is that of effectively integrating many software models from multiple contributors. A software framework facilitates the integration task, both in the development and runtime stages of the simulation. Effective software frameworks reduce the programming burden for the investigators, freeing them to focus more on the science and less on the parallel communication implementation, while maintaining high performance across numerous supercomputer and workstation architectures. This document surveys numerous software frameworks for potential use in Earth science modeling. Several frameworks are evaluated in depth, including Parallel Object-Oriented Methods and Applications (POOMA), Cactus (from the relativistic physics community), Overture, Goddard Earth Modeling System (GEMS), the National Center for Atmospheric Research Flux Coupler, and UCLA/UCB Distributed Data Broker (DDB). Frameworks evaluated in less detail include ROOT, Parallel Application Workspace (PAWS), and Advanced Large-Scale Integrated Computational Environment (ALICE). A host of other frameworks and related tools are referenced in this context. The frameworks are evaluated individually and also compared with each other.

  15. Accelerating large-scale protein structure alignments with graphics processing units

    PubMed Central

    2012-01-01

    Background Large-scale protein structure alignment, an indispensable tool to structural bioinformatics, poses a tremendous challenge on computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings We present ppsAlign, a parallel protein structure Alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using massive parallel computing power of GPU. PMID:22357132

  16. SISYPHUS: A high performance seismic inversion factory

    NASA Astrophysics Data System (ADS)

    Gokhberg, Alexey; Simutė, Saulė; Boehm, Christian; Fichtner, Andreas

    2016-04-01

    In the recent years the massively parallel high performance computers became the standard instruments for solving the forward and inverse problems in seismology. The respective software packages dedicated to forward and inverse waveform modelling specially designed for such computers (SPECFEM3D, SES3D) became mature and widely available. These packages achieve significant computational performance and provide researchers with an opportunity to solve problems of bigger size at higher resolution within a shorter time. However, a typical seismic inversion process contains various activities that are beyond the common solver functionality. They include management of information on seismic events and stations, 3D models, observed and synthetic seismograms, pre-processing of the observed signals, computation of misfits and adjoint sources, minimization of misfits, and process workflow management. These activities are time consuming, seldom sufficiently automated, and therefore represent a bottleneck that can substantially offset performance benefits provided by even the most powerful modern supercomputers. Furthermore, a typical system architecture of modern supercomputing platforms is oriented towards the maximum computational performance and provides limited standard facilities for automation of the supporting activities. We present a prototype solution that automates all aspects of the seismic inversion process and is tuned for the modern massively parallel high performance computing systems. We address several major aspects of the solution architecture, which include (1) design of an inversion state database for tracing all relevant aspects of the entire solution process, (2) design of an extensible workflow management framework, (3) integration with wave propagation solvers, (4) integration with optimization packages, (5) computation of misfits and adjoint sources, and (6) process monitoring. 
    The inversion state database represents a hierarchical structure with branches for the static process setup, inversion iterations, and solver runs, each branch specifying information at the event, station and channel levels. The workflow management framework is based on an embedded scripting engine that allows definition of various workflow scenarios using a high-level scripting language and provides access to all available inversion components represented as standard library functions. At present the SES3D wave propagation solver is integrated in the solution; the work is in progress for interfacing with SPECFEM3D. A separate framework is designed for interoperability with an optimization module; the workflow manager and optimization process run in parallel and cooperate by exchanging messages according to a specially designed protocol. A library of high-performance modules implementing signal pre-processing, misfit and adjoint computations according to established good practices is included. Monitoring is based on information stored in the inversion state database and at present implements a command line interface; design of a graphical user interface is in progress. The software design fits well into the common massively parallel system architecture featuring a large number of computational nodes running distributed applications under control of batch-oriented resource managers. The solution prototype has been implemented on the "Piz Daint" supercomputer provided by the Swiss Supercomputing Centre (CSCS).

  17. Coarse-grained component concurrency in Earth system modeling: parallelizing atmospheric radiative transfer in the GFDL AM3 model using the Flexible Modeling System coupling framework

    NASA Astrophysics Data System (ADS)

    Balaji, V.; Benson, Rusty; Wyman, Bruce; Held, Isaac

    2016-10-01

    Climate models represent a large variety of processes on a variety of timescales and space scales, a canonical example of multi-physics multi-scale modeling. Current hardware trends, such as Graphical Processing Units (GPUs) and Many Integrated Core (MIC) chips, are based on, at best, marginal increases in clock speed, coupled with vast increases in concurrency, particularly at the fine grain. Multi-physics codes face particular challenges in achieving fine-grained concurrency, as different physics and dynamics components have different computational profiles, and universal solutions are hard to come by. We propose here one approach for multi-physics codes. These codes are typically structured as components interacting via software frameworks. The component structure of a typical Earth system model consists of a hierarchical and recursive tree of components, each representing a different climate process or dynamical system. This recursive structure generally encompasses a modest level of concurrency at the highest level (e.g., atmosphere and ocean on different processor sets) with serial organization underneath. We propose to extend concurrency much further by running more and more lower- and higher-level components in parallel with each other. Each component can further be parallelized on the fine grain, potentially offering a major increase in the scalability of Earth system models. We present here first results from this approach, called coarse-grained component concurrency, or CCC. Within the Geophysical Fluid Dynamics Laboratory (GFDL) Flexible Modeling System (FMS), the atmospheric radiative transfer component has been configured to run in parallel with a composite component consisting of every other atmospheric component, including the atmospheric dynamics and all other atmospheric physics components. We will explore the algorithmic challenges involved in such an approach, and present results from such simulations. 
    Plans to achieve even greater levels of coarse-grained concurrency by extending this approach within other components, such as the ocean, will be discussed.
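
    The idea of running the radiative transfer component alongside a composite of all other atmospheric components can be illustrated with a minimal Python sketch. This is not FMS code: the functions `radiation` and `other_atmosphere`, the toy physics inside them, and the choice to apply the heating one step late are all assumptions made purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def radiation(state):
    # Hypothetical stand-in for the radiative transfer component.
    return 0.1 * state["temperature"]


def other_atmosphere(state, heating):
    # Hypothetical stand-in for dynamics plus all remaining physics.
    return {"temperature": state["temperature"] + heating - 0.5}


def run_concurrent(state, steps):
    # Heating rates for the first step are computed up front.
    heating = radiation(state)
    with ThreadPoolExecutor(max_workers=2) as pool:
        for _ in range(steps):
            # Both components read the same input state, so they can overlap.
            rad_future = pool.submit(radiation, state)
            atm_future = pool.submit(other_atmosphere, state, heating)
            state = atm_future.result()
            # Assumption: the new heating is applied on the following step.
            heating = rad_future.result()
    return state


if __name__ == "__main__":
    print(run_concurrent({"temperature": 10.0}, steps=2))
```

    In this sketch the two components never wait on each other within a step, which is what lets them occupy separate processor sets; the price is that the composite component sees heating rates that lag by one step.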

  18. Study of a Fine Grained Threaded Framework Design

    NASA Astrophysics Data System (ADS)

    Jones, C. D.

    2012-12-01

    Traditionally, HEP experiments exploit the multiple cores in a CPU by having each core process one event. However, future PC designs are expected to use CPUs which double the number of processing cores at the same rate as the cost of memory falls by a factor of two. This effectively means the amount of memory per processing core will remain constant. This is a major challenge for LHC processing frameworks since the LHC is expected to deliver more complex events (e.g. greater pileup events) in the coming years while the LHC experiment's frameworks are already memory constrained. Therefore in the not so distant future we may need to be able to efficiently use multiple cores to process one event. In this presentation we will discuss a design for an HEP processing framework which can allow very fine grained parallelization within one event as well as supporting processing multiple events simultaneously while minimizing the memory footprint of the job. The design is built around the libdispatch framework created by Apple Inc. (a port for Linux is available) whose central concept is the use of task queues. This design also accommodates the reality that not all code will be thread safe and therefore allows one to easily mark modules or sub parts of modules as being thread unsafe. In addition, the design efficiently handles the requirement that events in one run must all be processed before starting to process events from a different run. After explaining the design we will provide measurements from simulating different processing scenarios where the processing times used for the simulation are drawn from processing times measured from actual CMS event processing.
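
    The handling of thread-unsafe modules described above can be sketched in Python. This is only an analogue of the task-queue idea and does not use the libdispatch API: the `Module` class, its `thread_safe` flag, and `process_events` are hypothetical names, and a per-module lock stands in for a serial queue.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Module:
    """A processing step; thread-unsafe modules are serialized by a lock."""

    def __init__(self, name, func, thread_safe=True):
        self.name = name
        self.func = func
        # The lock plays the role of a serial queue for unsafe modules.
        self._lock = None if thread_safe else threading.Lock()

    def run(self, event):
        if self._lock is None:
            return self.func(event)
        with self._lock:
            return self.func(event)


def process_events(events, modules, workers=4):
    # Every (event, module) pair becomes a task, so work can overlap
    # both within one event and across several events.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            (event_id, module.name): pool.submit(module.run, event)
            for event_id, event in enumerate(events)
            for module in modules
        }
        return {key: future.result() for key, future in futures.items()}


if __name__ == "__main__":
    mods = [Module("sum", sum), Module("count", len, thread_safe=False)]
    print(process_events([[1, 2], [3, 4, 5]], mods))
```

    Marking a module as unsafe only limits that module to one task at a time; all other modules still run concurrently, which mirrors the design goal of tolerating code that is not thread safe.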

  19. National Combustion Code, a Multidisciplinary Combustor Design System, Will Be Transferred to the Commercial Sector

    NASA Technical Reports Server (NTRS)

    Steele, Gynelle C.

    1999-01-01

    The NASA Lewis Research Center and Flow Parametrics will enter into an agreement to commercialize the National Combustion Code (NCC). This multidisciplinary combustor design system utilizes computer-aided design (CAD) tools for geometry creation, advanced mesh generators for creating solid model representations, a common framework for fluid flow and structural analyses, modern postprocessing tools, and parallel processing. This integrated system can facilitate and enhance various phases of the design and analysis process.

  20. Parallel asynchronous systems and image processing algorithms

    NASA Technical Reports Server (NTRS)

    Coon, D. D.; Perera, A. G. U.

    1989-01-01

    A new hardware approach to implementation of image processing algorithms is described. The approach is based on silicon devices which would permit an independent analog processing channel to be dedicated to every pixel. A laminar architecture consisting of a stack of planar arrays of the device would form a two-dimensional array processor with a 2-D array of inputs located directly behind a focal plane detector array. A 2-D image data stream would propagate in neuronlike asynchronous pulse coded form through the laminar processor. Such systems would integrate image acquisition and image processing. Acquisition and processing would be performed concurrently as in natural vision systems. The research is aimed at implementation of algorithms, such as the intensity dependent summation algorithm and pyramid processing structures, which are motivated by the operation of natural vision systems. Implementation of natural vision algorithms would benefit from the use of neuronlike information coding and the laminar, 2-D parallel, vision system type architecture. Besides providing a neural network framework for implementation of natural vision algorithms, a 2-D parallel approach could eliminate the serial bottleneck of conventional processing systems. Conversion to serial format would occur only after raw intensity data has been substantially processed. An interesting challenge arises from the fact that the mathematical formulation of natural vision algorithms does not specify the means of implementation, so that hardware implementation poses intriguing questions involving vision science.

  1. What is the truth? An application of the Extended Parallel Process Model to televised truth® ads.

    PubMed

    Lavoie, Nicole R; Quick, Brian L

    2013-01-01

    The purpose of this study was to analyze television ads in the truth® campaign using the Extended Parallel Process Model (EPPM) as a framework. Among the ads (n = 86) analyzed, results revealed a heavy reliance on severity messages, modest attention to susceptibility messages, and no inclusion of recommended response messages in the form of self-efficacy and response efficacy. The reliance on emphasizing the health threat, without incorporating recommended response messages, is discussed with respect to the likelihood of galvanizing maladaptive responses such as psychological reactance, denial, and defensive avoidance resulting from exposure to these ads. Additionally, the unintended outcomes for secondary audiences including but not limited to stigma are considered. Implications and suggestions for practitioners and theorists are explored.

  2. Vector processing efficiency of plasma MHD codes by use of the FACOM 230-75 APU

    NASA Astrophysics Data System (ADS)

    Matsuura, T.; Tanaka, Y.; Naraoka, K.; Takizuka, T.; Tsunematsu, T.; Tokuda, S.; Azumi, M.; Kurita, G.; Takeda, T.

    1982-06-01

    In the framework of pipelined vector architecture, the efficiency of vector processing is assessed with respect to plasma MHD codes in nuclear fusion research. By using a vector processor, the FACOM 230-75 APU, the limit of the enhancement factor due to parallelism of current vector machines is examined for three numerical codes based on a fluid model. Reasonable speed-up factors of approximately 6, 6 and 4 relative to the highly optimized scalar version are obtained for ERATO (linear stability code), AEOLUS-R1 (nonlinear stability code) and APOLLO (1-1/2D transport code), respectively. Problems of the pipelined vector processors are discussed from the viewpoint of restructuring, optimization and choice of algorithms. In conclusion, the important concept of "concurrency within pipelined parallelism" is emphasized.

  3. A framework for grand scale parallelization of the combined finite discrete element method in 2d

    NASA Astrophysics Data System (ADS)

    Lei, Z.; Rougier, E.; Knight, E. E.; Munjiza, A.

    2014-09-01

    Within the context of rock mechanics, the Combined Finite-Discrete Element Method (FDEM) has been applied to many complex industrial problems such as block caving, deep mining techniques (tunneling, pillar strength, etc.), rock blasting, seismic wave propagation, packing problems, dam stability, rock slope stability, rock mass strength characterization problems, etc. The reality is that most of these were accomplished in a 2D and/or single processor realm. In this work a hardware independent FDEM parallelization framework has been developed using the Virtual Parallel Machine for FDEM (V-FDEM). With V-FDEM, parallel FDEM software can be adapted to different parallel architecture systems ranging from just a few to thousands of cores.

  4. Construction Morphology and the Parallel Architecture of Grammar

    ERIC Educational Resources Information Center

    Booij, Geert; Audring, Jenny

    2017-01-01

    This article presents a systematic exposition of how the basic ideas of Construction Grammar (CxG) (Goldberg, 2006) and the Parallel Architecture (PA) of grammar (Jackendoff, 2002) provide the framework for a proper account of morphological phenomena, in particular word formation. This framework is referred to as Construction Morphology (CxM). As…

  5. Bonsai: an event-based framework for processing and controlling data streams

    PubMed Central

    Lopes, Gonçalo; Bonacchi, Niccolò; Frazão, João; Neto, Joana P.; Atallah, Bassam V.; Soares, Sofia; Moreira, Luís; Matias, Sara; Itskov, Pavel M.; Correia, Patrícia A.; Medina, Roberto E.; Calcaterra, Lorenza; Dreosti, Elena; Paton, Joseph J.; Kampff, Adam R.

    2015-01-01

    The design of modern scientific experiments requires the control and monitoring of many different data streams. However, the serial execution of programming instructions in a computer makes it a challenge to develop software that can deal with the asynchronous, parallel nature of scientific data. Here we present Bonsai, a modular, high-performance, open-source visual programming framework for the acquisition and online processing of data streams. We describe Bonsai's core principles and architecture and demonstrate how it allows for the rapid and flexible prototyping of integrated experimental designs in neuroscience. We specifically highlight some applications that require the combination of many different hardware and software components, including video tracking of behavior, electrophysiology and closed-loop control of stimulation. PMID:25904861

  6. Improving the scalability of hyperspectral imaging applications on heterogeneous platforms using adaptive run-time data compression

    NASA Astrophysics Data System (ADS)

    Plaza, Antonio; Plaza, Javier; Paz, Abel

    2010-10-01

    Latest generation remote sensing instruments (called hyperspectral imagers) are now able to generate hundreds of images, corresponding to different wavelength channels, for the same area on the surface of the Earth. In previous work, we have reported that the scalability of parallel processing algorithms dealing with these high-dimensional data volumes is affected by the amount of data to be exchanged through the communication network of the system. However, large messages are common in hyperspectral imaging applications since processing algorithms are pixel-based, and each pixel vector to be exchanged through the communication network is made up of hundreds of spectral values. Thus, decreasing the amount of data to be exchanged could improve the scalability and parallel performance. In this paper, we propose a new framework based on intelligent utilization of wavelet-based data compression techniques for improving the scalability of a standard hyperspectral image processing chain on heterogeneous networks of workstations. This type of parallel platform is quickly becoming a standard in hyperspectral image processing due to the distributed nature of collected hyperspectral data as well as its flexibility and low cost. Our experimental results indicate that adaptive lossy compression can lead to improvements in the scalability of the hyperspectral processing chain without sacrificing analysis accuracy, even at sub-pixel precision levels.
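
    The principle of shrinking each pixel vector before it crosses the network can be sketched with a one-level Haar transform in Python. This is not the authors' compression scheme: the function names, the single decomposition level, and the fixed `threshold` (which an adaptive scheme would tune at run time) are assumptions for illustration, and the input length is assumed to be even.

```python
def haar_forward(values):
    # One level of an unnormalized Haar transform:
    # pairwise averages and pairwise half-differences.
    pairs = list(zip(values[0::2], values[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages, details


def haar_inverse(averages, details):
    values = []
    for avg, det in zip(averages, details):
        values.extend([avg + det, avg - det])
    return values


def compress(values, threshold):
    # Lossy step: detail coefficients at or below the threshold are dropped,
    # and the survivors are stored sparsely as (index, value) pairs.
    averages, details = haar_forward(values)
    kept = [(i, d) for i, d in enumerate(details) if abs(d) > threshold]
    return averages, kept


def decompress(averages, kept):
    details = [0.0] * len(averages)
    for i, d in kept:
        details[i] = d
    return haar_inverse(averages, details)


if __name__ == "__main__":
    spectrum = [10.0, 10.2, 20.0, 30.0]  # hypothetical spectral values
    packed = compress(spectrum, threshold=0.5)
    print(packed, decompress(*packed))
```

    A sender would transmit the averages and the sparse detail list instead of the full pixel vector; smooth spectral regions lose their small details, while sharp features survive because their detail coefficients exceed the threshold.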

  7. PFLOTRAN: Reactive Flow & Transport Code for Use on Laptops to Leadership-Class Supercomputers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hammond, Glenn E.; Lichtner, Peter C.; Lu, Chuan

    PFLOTRAN, a next-generation reactive flow and transport code for modeling subsurface processes, has been designed from the ground up to run efficiently on machines ranging from leadership-class supercomputers to laptops. Based on an object-oriented design, the code is easily extensible to incorporate additional processes. It can interface seamlessly with Fortran 9X, C and C++ codes. Domain decomposition parallelism is employed, with the PETSc parallel framework used to manage parallel solvers, data structures and communication. Features of the code include a modular input file, implementation of high-performance I/O using parallel HDF5, ability to perform multiple realization simulations with multiple processors per realization in a seamless manner, and multiple modes for multiphase flow and multicomponent geochemical transport. Chemical reactions currently implemented in the code include homogeneous aqueous complexing reactions and heterogeneous mineral precipitation/dissolution, ion exchange, surface complexation and a multirate kinetic sorption model. PFLOTRAN has demonstrated petascale performance using 2^17 processor cores with over 2 billion degrees of freedom. Accomplishments achieved to date include applications to the Hanford 300 Area and modeling CO2 sequestration in deep geologic formations.

  8. Molecular Monte Carlo Simulations Using Graphics Processing Units: To Waste Recycle or Not?

    PubMed

    Kim, Jihan; Rodgers, Jocelyn M; Athènes, Manuel; Smit, Berend

    2011-10-11

    In the waste recycling Monte Carlo (WRMC) algorithm, (1) multiple trial states may be simultaneously generated and utilized during Monte Carlo moves to improve the statistical accuracy of the simulations, suggesting that such an algorithm may be well posed for implementation in parallel on graphics processing units (GPUs). In this paper, we implement two waste recycling Monte Carlo algorithms in CUDA (Compute Unified Device Architecture) using uniformly distributed random trial states and trial states based on displacement random-walk steps, and we test the methods on a methane-zeolite MFI framework system to evaluate their utility. We discuss the specific implementation details of the waste recycling GPU algorithm and compare the methods to other parallel algorithms optimized for the framework system. We analyze the relationship between the statistical accuracy of our simulations and the CUDA block size to determine the efficient allocation of the GPU hardware resources. We make comparisons between the GPU and the serial CPU Monte Carlo implementations to assess speedup over conventional microprocessors. Finally, we apply our optimized GPU algorithms to the important problem of determining free energy landscapes, in this case for molecular motion through the zeolite LTA.
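
    The waste recycling idea, in which trial states contribute to the average whether or not they are accepted, can be sketched in Python for the simplest case of one trial state per Metropolis move. This is not the multi-trial CUDA implementation of the paper: the function name `wrmc_average`, the one-dimensional system, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def wrmc_average(observable, energy, beta, steps, step_size, seed=0):
    """Estimate an ensemble average with single-trial waste recycling."""
    rng = random.Random(seed)
    x = 0.0
    total = 0.0
    for _ in range(steps):
        trial = x + rng.uniform(-step_size, step_size)
        delta = energy(trial) - energy(x)
        # Metropolis acceptance probability for the symmetric proposal.
        p_acc = 1.0 if delta <= 0 else math.exp(-beta * delta)
        # Waste recycling: the trial state is counted with weight p_acc
        # and the current state with weight 1 - p_acc, before the
        # accept/reject decision is drawn.
        total += p_acc * observable(trial) + (1.0 - p_acc) * observable(x)
        if rng.random() < p_acc:
            x = trial
    return total / steps


if __name__ == "__main__":
    # Harmonic well E(x) = x^2 / 2 at beta = 1, where <x^2> equals 1.
    estimate = wrmc_average(
        observable=lambda x: x * x,
        energy=lambda x: 0.5 * x * x,
        beta=1.0,
        steps=100_000,
        step_size=2.0,
    )
    print(estimate)
```

    With several trial states per move, each trial's contribution can be evaluated independently, which is the property that makes the method a candidate for GPU execution.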

  9. Distributed memory parallel Markov random fields using graph partitioning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Heinemann, C.; Perciano, T.; Ushizima, D.

    Markov random fields (MRF) based algorithms have attracted a large amount of interest in image analysis due to their ability to exploit contextual information about data. Image data generated by experimental facilities, though, continues to grow larger and more complex, making it more difficult to analyze in a reasonable amount of time. Applying image processing algorithms to large datasets requires alternative approaches to circumvent performance problems. Aiming to provide scientists with a new tool to recover valuable information from such datasets, we developed a general purpose distributed memory parallel MRF-based image analysis framework (MPI-PMRF). MPI-PMRF overcomes performance and memory limitations by distributing data and computations across processors. The proposed approach was successfully tested with synthetic and experimental datasets. Additionally, the performance of the MPI-PMRF framework is analyzed through a detailed scalability study. We show that a performance increase is obtained while maintaining an accuracy of the segmentation results higher than 98%. The contributions of this paper are: (a) development of a distributed memory MRF framework; (b) measurement of the performance increase of the proposed approach; (c) verification of segmentation accuracy in both synthetic and experimental, real-world datasets.

  10. A Framework for Performing V&V within Reuse-Based Software Engineering

    NASA Technical Reports Server (NTRS)

    Addy, Edward A.

    1996-01-01

    Verification and validation (V&V) is performed during application development for many systems, especially safety-critical and mission-critical systems. The V&V process is intended to discover errors, especially errors related to critical processing, as early as possible during the development process. Early discovery is important in order to minimize the cost and other impacts of correcting these errors. In order to provide early detection of errors, V&V is conducted in parallel with system development, often beginning with the concept phase. In reuse-based software engineering, however, decisions on the requirements, design and even implementation of domain assets can be made prior to beginning development of a specific system. In this case, V&V must be performed during domain engineering in order to have an impact on system development. This paper describes a framework for performing V&V within architecture-centric, reuse-based software engineering. This framework includes the activities of traditional application-level V&V, and extends these activities into domain engineering and into the transition between domain engineering and application engineering. The framework includes descriptions of the types of activities to be performed during each of the life-cycle phases, and provides motivation for the activities.

  11. A Parallel Processing and Diversified-Hidden-Gene-Based Genetic Algorithm Framework for Fuel-Optimal Trajectory Design for Interplanetary Spacecraft Missions

    NASA Astrophysics Data System (ADS)

    Somavarapu, Dhathri H.

    This thesis proposes a new parallel computing genetic algorithm framework for designing fuel-optimal trajectories for interplanetary spacecraft missions. The framework can capture the deep search space of the problem with the use of a fixed chromosome structure and hidden-genes concept, can explore the diverse set of candidate solutions with the use of the adaptive and twin-space crowding techniques and, can execute on any high-performance computing (HPC) platform with the adoption of the portable message passing interface (MPI) standard. The algorithm is implemented in C++ with the use of the MPICH implementation of the MPI standard. The algorithm uses a patched-conic approach with two-body dynamics assumptions. New procedures are developed for determining trajectories in the Vinfinity-leveraging legs of the flight from the launch and non-launch planets and, deep-space maneuver legs of the flight from the launch and non-launch planets. The chromosome structure maintains the time of flight as a free parameter within certain boundaries. The fitness or the cost function of the algorithm uses only the mission Delta V, and does not include time of flight. The optimization is conducted with two variations for the minimum mission gravity-assist sequence, the 4-gravity-assist, and the 3-gravity-assist, with a maximum of 5 gravity-assists allowed in both the cases. The optimal trajectories discovered using the framework in both of the cases demonstrate the success of this framework.

  12. Reconstruction for time-domain in vivo EPR 3D multigradient oximetric imaging--a parallel processing perspective.

    PubMed

    Dharmaraj, Christopher D; Thadikonda, Kishan; Fletcher, Anthony R; Doan, Phuc N; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A; Cook, John A; Mitchell, James B; Subramanian, Sankaran; Krishna, Murali C

    2009-01-01

    Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speed up factor of 46.66 as against the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23 x 23 x 23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that the parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time.

  13. Toward a comprehensive landscape vegetation monitoring framework

    NASA Astrophysics Data System (ADS)

    Kennedy, Robert; Hughes, Joseph; Neeti, Neeti; Larrue, Tara; Gregory, Matthew; Roberts, Heather; Ohmann, Janet; Kane, Van; Kane, Jonathan; Hooper, Sam; Nelson, Peder; Cohen, Warren; Yang, Zhiqiang

    2016-04-01

    Blossoming Earth observation resources provide great opportunity to better understand land vegetation dynamics, but also require new techniques and frameworks to exploit their potential. Here, I describe several parallel projects that leverage time-series Landsat imagery to describe vegetation dynamics at regional and continental scales. At the core of these projects are the LandTrendr algorithms, which distill time-series earth observation data into periods of consistent long or short-duration dynamics. In one approach, we built an integrated, empirical framework to blend these algorithmically-processed time-series data with field data and lidar data to ascribe yearly change in forest biomass across the US states of Washington, Oregon, and California. In a separate project, we expanded from forest-only monitoring to full landscape land cover monitoring over the same regional scale, including both categorical class labels and continuous-field estimates. In these and other projects, we apply machine-learning approaches to ascribe all changes in vegetation to driving processes such as harvest, fire, urbanization, etc., allowing full description of both disturbance and recovery processes and drivers. Finally, we are moving toward extension of these same techniques to continental and eventually global scales using Google Earth Engine. Taken together, these approaches provide one framework for describing and understanding processes of change in vegetation communities at broad scales.

  14. ANNarchy: a code generation approach to neural simulations on parallel hardware

    PubMed Central

    Vitay, Julien; Dinkelbach, Helge Ü.; Hamker, Fred H.

    2015-01-01

    Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which makes it easy to define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into efficient C++ code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957

  15. A neurally plausible parallel distributed processing model of event-related potential word reading data.

    PubMed

    Laszlo, Sarah; Plaut, David C

    2012-03-01

    The Parallel Distributed Processing (PDP) framework has significant potential for producing models of cognitive tasks that approximate how the brain performs the same tasks. To date, however, there has been relatively little contact between PDP modeling and data from cognitive neuroscience. In an attempt to advance the relationship between explicit, computational models and physiological data collected during the performance of cognitive tasks, we developed a PDP model of visual word recognition which simulates key results from the ERP reading literature, while simultaneously being able to successfully perform lexical decision, a benchmark task for reading models. Simulations reveal that the model's success depends on the implementation of several neurally plausible features in its architecture which are sufficiently domain-general to be relevant to cognitive modeling more generally. Copyright © 2011 Elsevier Inc. All rights reserved.

  16. Concurrent Probabilistic Simulation of High Temperature Composite Structural Response

    NASA Technical Reports Server (NTRS)

    Abdi, Frank

    1996-01-01

    A computational structural/material analysis and design tool which would meet industry's future demand for expedience and reduced cost is presented. This unique software 'GENOA' is dedicated to parallel and high speed analysis to perform probabilistic evaluation of high temperature composite response of aerospace systems. The development is based on detailed integration and modification of diverse fields of specialized analysis techniques and mathematical models to combine their latest innovative capabilities into a commercially viable software package. The technique is specifically designed to exploit the availability of processors to perform computationally intense probabilistic analysis assessing uncertainties in structural reliability analysis and composite micromechanics. The primary objectives which were achieved in performing the development were: (1) Utilization of the power of parallel processing and static/dynamic load balancing optimization to make the complex simulation of structure, material and processing of high temperature composite affordable; (2) Computational integration and synchronization of probabilistic mathematics, structural/material mechanics and parallel computing; (3) Implementation of an innovative multi-level domain decomposition technique to identify the inherent parallelism, and increasing convergence rates through high- and low-level processor assignment; (4) Creating the framework for Portable Paralleled architecture for the machine independent Multi Instruction Multi Data (MIMD), Single Instruction Multi Data (SIMD), hybrid and distributed workstation type of computers; and (5) Market evaluation. The results of the Phase-2 effort provide a good basis for continuation and warrant a Phase-3 government and industry partnership.

  17. Accelerating Fibre Orientation Estimation from Diffusion Weighted Magnetic Resonance Imaging Using GPUs

    PubMed Central

    Hernández, Moisés; Guerrero, Ginés D.; Cecilia, José M.; García, José M.; Inuggi, Alberto; Jbabdi, Saad; Behrens, Timothy E. J.; Sotiropoulos, Stamatios N.

    2013-01-01

    With the performance of central processing units (CPUs) having effectively reached a limit, parallel processing offers an alternative for applications with high computational demands. Modern graphics processing units (GPUs) are massively parallel processors that can execute simultaneously thousands of light-weight processes. In this study, we propose and implement a parallel GPU-based design of a popular method that is used for the analysis of brain magnetic resonance imaging (MRI). More specifically, we are concerned with a model-based approach for extracting tissue structural information from diffusion-weighted (DW) MRI data. DW-MRI offers, through tractography approaches, the only way to study brain structural connectivity, non-invasively and in-vivo. We parallelise the Bayesian inference framework for the ball & stick model, as it is implemented in the tractography toolbox of the popular FSL software package (University of Oxford). For our implementation, we utilise the Compute Unified Device Architecture (CUDA) programming model. We show that the parameter estimation, performed through Markov Chain Monte Carlo (MCMC), is accelerated by at least two orders of magnitude, when comparing a single GPU with the respective sequential single-core CPU version. We also illustrate similar speed-up factors (up to 120x) when comparing a multi-GPU with a multi-CPU implementation. PMID:23658616

  18. The Parallel System for Integrating Impact Models and Sectors (pSIMS)

    NASA Technical Reports Server (NTRS)

    Elliott, Joshua; Kelly, David; Chryssanthacopoulos, James; Glotter, Michael; Jhunjhnuwala, Kanika; Best, Neil; Wilde, Michael; Foster, Ian

    2014-01-01

    We present a framework for massively parallel climate impact simulations: the parallel System for Integrating Impact Models and Sectors (pSIMS). This framework comprises a) tools for ingesting and converting large amounts of data to a versatile datatype based on a common geospatial grid; b) tools for translating this datatype into custom formats for site-based models; c) a scalable parallel framework for performing large ensemble simulations, using any one of a number of different impacts models, on clusters, supercomputers, distributed grids, or clouds; d) tools and data standards for reformatting outputs to common datatypes for analysis and visualization; and e) methodologies for aggregating these datatypes to arbitrary spatial scales such as administrative and environmental demarcations. By automating many time-consuming and error-prone aspects of large-scale climate impacts studies, pSIMS accelerates computational research, encourages model intercomparison, and enhances reproducibility of simulation results. We present the pSIMS design and use example assessments to demonstrate its multi-model, multi-scale, and multi-sector versatility.

  19. SU-F-SPS-09: Parallel MC Kernel Calculations for VMAT Plan Improvement

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chamberlain, S; Roswell Park Cancer Institute, Buffalo, NY; French, S

    Purpose: Adding kernels (small perturbations in leaf positions) to the existing apertures of VMAT control points may improve plan quality. We investigate the calculation of kernel doses using a parallelized Monte Carlo (MC) method. Methods: A clinical prostate VMAT DICOM plan was exported from Eclipse. An arbitrary control point and leaf were chosen, and a modified MLC file was created, corresponding to the leaf position offset by 0.5 cm. The additional dose produced by this 0.5 cm × 0.5 cm kernel was calculated using the DOSXYZnrc component module of BEAMnrc. A range of particle history counts were run (varying from 3 × 10^6 to 3 × 10^7); each job was split among 1, 10, or 100 parallel processes. A particle count of 3 × 10^6 was established as the lower range because it provided the minimal accuracy level. Results: As expected, an increase in particle counts linearly increases run time. For the lowest particle count, the time varied from 30 hours for the single-processor run, to 0.30 hours for the 100-processor run. Conclusion: Parallel processing of MC calculations in the EGS framework significantly decreases time necessary for each kernel dose calculation. Particle counts lower than 1 × 10^6 have too large of an error to output accurate dose for a Monte Carlo kernel calculation. Future work will investigate increasing the number of parallel processes and optimizing run times for multiple kernel calculations.
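
    The two run times quoted in this abstract (30 hours on 1 process, 0.30 hours on 100 processes) imply a speed-up that can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the function names are illustrative and are not part of BEAMnrc, DOSXYZnrc, or the EGS framework.

```python
def speedup(t_serial_hours, t_parallel_hours):
    """Ratio of single-process run time to parallel run time."""
    return t_serial_hours / t_parallel_hours


def parallel_efficiency(t_serial_hours, t_parallel_hours, n_processes):
    """Speed-up divided by the number of processes (1.0 means ideal scaling)."""
    return speedup(t_serial_hours, t_parallel_hours) / n_processes


# Run times taken from the abstract for the lowest particle count.
s = speedup(30.0, 0.30)
e = parallel_efficiency(30.0, 0.30, 100)
print(f"speed-up: {s:.1f}x, efficiency: {e:.2f}")
```

    With these inputs the speed-up is about 100 on 100 processes, i.e. close to ideal scaling, which is what one would expect when independent particle histories are split evenly across processes.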

  20. GraphReduce: Processing Large-Scale Graphs on Accelerator-Based Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sengupta, Dipanjan; Song, Shuaiwen; Agarwal, Kapil

    2015-11-15

    Recent work on real-world graph analytics has sought to leverage the massive amount of parallelism offered by GPU devices, but challenges remain due to the inherent irregularity of graph algorithms and limitations in GPU-resident memory for storing large graphs. We present GraphReduce, a highly efficient and scalable GPU-based framework that operates on graphs that exceed the device’s internal memory capacity. GraphReduce adopts a combination of edge- and vertex-centric implementations of the Gather-Apply-Scatter programming model and operates on multiple asynchronous GPU streams to fully exploit the high degrees of parallelism in GPUs with efficient graph data movement between the host and device.

  1. A Multiprocessor SoC Architecture with Efficient Communication Infrastructure and Advanced Compiler Support for Easy Application Development

    NASA Astrophysics Data System (ADS)

    Urfianto, Mohammad Zalfany; Isshiki, Tsuyoshi; Khan, Arif Ullah; Li, Dongju; Kunieda, Hiroaki

    This paper presents a Multiprocessor System-on-Chips (MPSoC) architecture used as an execution platform for the new C-language based MPSoC design framework we are currently developing. The MPSoC architecture is based on an existing SoC platform with a commercial RISC core acting as the host CPU. We extend the existing SoC with a multiprocessor-array block that is used as the main engine to run parallel applications modeled in our design framework. Utilizing several optimizations provided by our compiler, an efficient inter-communication between processing elements with minimum overhead is implemented. A host-interface is designed to integrate the existing RISC core to the multiprocessor-array. The experimental results show that an efficacious integration is achieved, proving that the designed communication module can be used to efficiently incorporate off-the-shelf processors as a processing element for MPSoC architectures designed using our framework.

  2. A unified framework for building high performance DVEs

    NASA Astrophysics Data System (ADS)

    Lei, Kaibin; Ma, Zhixia; Xiong, Hua

    2011-10-01

    A unified framework for integrating PC cluster based parallel rendering with distributed virtual environments (DVEs) is presented in this paper. While various scene graphs have been proposed in DVEs, it is difficult to enable collaboration of different scene graphs. This paper proposes a technique for non-distributed scene graphs with the capability of object and event distribution. With the increase of graphics data, DVEs require more powerful rendering ability. But general scene graphs are inefficient in parallel rendering. The paper also proposes a technique to connect a DVE and a PC cluster based parallel rendering environment. A distributed multi-player video game is developed to show the interaction of different scene graphs and the parallel rendering performance on a large tiled display wall.

  3. A non-voxel-based broad-beam (NVBB) framework for IMRT treatment planning.

    PubMed

    Lu, Weiguo

    2010-12-07

    We present a novel framework that enables very large scale intensity-modulated radiation therapy (IMRT) planning with limited computation resources, with improvements in cost, plan quality and planning throughput. Current IMRT optimization uses a voxel-based beamlet superposition (VBS) framework that requires pre-calculation and storage of a large amount of beamlet data, resulting in large temporal and spatial complexity. We developed a non-voxel-based broad-beam (NVBB) framework for IMRT capable of direct treatment parameter optimization (DTPO). In this framework, both objective function and derivative are evaluated based on the continuous viewpoint, abandoning 'voxel' and 'beamlet' representations. Thus pre-calculation and storage of beamlets are no longer needed. The NVBB framework has linear complexities (O(N^3)) in both space and time. The low memory, full computation and data parallelization nature of the framework enables its efficient implementation on the graphics processing unit (GPU). We implemented the NVBB framework and incorporated it with the TomoTherapy treatment planning system (TPS). The new TPS runs on a single workstation with one GPU card (NVBB-GPU). Extensive verification/validation tests were performed in house and via third parties. Benchmarks on dose accuracy, plan quality and throughput were compared with the commercial TomoTherapy TPS that is based on the VBS framework and uses a computer cluster with 14 nodes (VBS-cluster). For all tests, the dose accuracy of these two TPSs is comparable (within 1%). Plan qualities were comparable with no clinically significant difference for most cases except that superior target uniformity was seen in the NVBB-GPU for some cases. However, the planning time using the NVBB-GPU was reduced manyfold compared with the VBS-cluster. In conclusion, we developed a novel NVBB framework for IMRT optimization. The continuous viewpoint and DTPO nature of the algorithm eliminate the need for beamlets and lead to better plan quality. The computation parallelization on a GPU instead of a computer cluster significantly reduces hardware and service costs. Compared with using the current VBS framework on a computer cluster, the planning time is significantly reduced using the NVBB framework on a single workstation with a GPU card.

  4. A lightweight messaging-based distributed processing and workflow execution framework for real-time and big data analysis

    NASA Astrophysics Data System (ADS)

    Laban, Shaban; El-Desouky, Aly

    2014-05-01

    To achieve rapid, simple and reliable parallel processing of different types of tasks and big data on any compute cluster, a lightweight messaging-based distributed application processing and workflow execution framework model is proposed. The framework is based on Apache ActiveMQ and the Simple (or Streaming) Text Oriented Message Protocol (STOMP). ActiveMQ, a popular and powerful open source persistence messaging and integration patterns server with scheduler capabilities, acts as a message broker in the framework. STOMP provides an interoperable wire format that allows framework programs to communicate easily with each other and with ActiveMQ. To use the message broker efficiently, a unified message and topic naming pattern is employed. Only three Python programs and a simple library, which unifies and simplifies the use of ActiveMQ and the STOMP protocol, are needed to use the framework. A watchdog program is used to monitor, remove, add, start and stop any machine and/or its different tasks when necessary. On every machine, exactly one dedicated zoo keeper program is used to start the different functions or tasks (stompShell programs) needed for executing the user's required workflow. The stompShell instances execute workflow jobs based on received messages. A well-defined, simple and flexible message structure, based on JavaScript Object Notation (JSON), is used to build any complex workflow system. The JSON format is also used for configuration and for communication between machines and programs. The framework is platform independent. Although the framework is built using Python, the actual workflow programs or jobs can be implemented in any programming language. The generic framework can be used in small national data centres for processing seismological and radionuclide data received from the International Data Centre (IDC) of the Preparatory Commission for the Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO). It is also possible to extend the use of the framework to monitoring the IDC pipeline. The detailed design, implementation, conclusion and future work of the proposed framework will be presented.

  5. Implementation of a parallel protein structure alignment service on cloud.

    PubMed

    Hung, Che-Lun; Lin, Yaw-Ling

    2013-01-01

    Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform.
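
    The MapReduce pattern this abstract relies on can be illustrated in a single process. The sketch below is an assumption-laden toy, not the authors' service: `toy_score` is a placeholder similarity (fraction of matching characters in secondary-structure strings) standing in for a real structure alignment and refinement algorithm, and the map, shuffle and reduce steps run locally rather than on Hadoop.

```python
from collections import defaultdict


def toy_score(a, b):
    """Placeholder similarity: fraction of equal positions.

    This is NOT a structure alignment algorithm; it only stands in for one.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x == y) / longest


def map_phase(query_id, query, database):
    """Map: emit (query_id, (target_id, score)) for every other entry."""
    for target_id, target in database.items():
        if target_id != query_id:
            yield query_id, (target_id, toy_score(query, target))


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(query_id, hits):
    """Reduce: keep the best-scoring target for each query."""
    return query_id, max(hits, key=lambda hit: hit[1])


def run_job(database):
    """Run map, shuffle and reduce over a dict of id -> structure string."""
    mapped = [kv for qid, q in database.items()
              for kv in map_phase(qid, q, database)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical secondary-structure strings (H = helix, E = strand, C = coil).
structures = {"a": "HHHEEE", "b": "HHHEEC", "c": "CCCCCC"}
best_hits = run_job(structures)
print(best_hits)
```

    Because each map call depends only on one query and the read-only database, the map tasks are independent, which is the property that lets a framework such as Hadoop distribute them across processors.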

  6. Implementation of a Parallel Protein Structure Alignment Service on Cloud

    PubMed Central

    Hung, Che-Lun; Lin, Yaw-Ling

    2013-01-01

    Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform. PMID:23671842

  7. A Web-Based Framework for Visualizing Industrial Spatiotemporal Distribution Using Standard Deviational Ellipse and Shifting Routes of Gravity Centers

    NASA Astrophysics Data System (ADS)

    Song, Y.; Gui, Z.; Wu, H.; Wei, Y.

    2017-09-01

    Analysing the spatiotemporal distribution patterns and dynamics of different industries can help us learn the macro-level development trends of those industries, and in turn provides references for industrial spatial planning. However, the analysis process is a challenging task which requires an easy-to-understand information presentation mechanism and a powerful computational technology to support the visual analytics of big data on the fly. For this reason, this research proposes a web-based framework to meet such a visual analytics requirement. The framework uses the standard deviational ellipse (SDE) and the shifting route of gravity centers to show the spatial distribution and yearly development trends of different enterprise types according to their industry categories. The calculation of gravity centers and ellipses is parallelized using Apache Spark to accelerate the processing. In the experiments, we use the enterprise registration dataset for Mainland China from 1960 to 2015, which contains fine-grained location information (i.e., coordinates of each individual enterprise), to demonstrate the feasibility of this framework. The experimental results show that the developed visual analytics method is helpful for understanding the multi-level patterns and development trends of different industries in China. Moreover, the proposed framework can be used to analyse any natural or social spatiotemporal point process with large data volume, such as crime and disease.
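
    The gravity-center part of this analysis can be sketched in a few lines. The example below is a minimal, single-machine illustration under stated assumptions: each record is a hypothetical `(year, x, y)` tuple in planar coordinates, every enterprise has equal weight, and the shifting route is the sequence of straight-line moves between consecutive yearly centers. It does not use Apache Spark and does not compute the standard deviational ellipse.

```python
import math
from collections import defaultdict


def gravity_centers(records):
    """Return {year: (mean_x, mean_y)} for an iterable of (year, x, y)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for year, x, y in records:
        acc = sums[year]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {year: (sx / n, sy / n) for year, (sx, sy, n) in sums.items()}


def shifting_route(centers):
    """Return [(year_from, year_to, distance)] between consecutive centers."""
    years = sorted(centers)
    route = []
    for a, b in zip(years, years[1:]):
        dx = centers[b][0] - centers[a][0]
        dy = centers[b][1] - centers[a][1]
        route.append((a, b, math.hypot(dx, dy)))
    return route


# Hypothetical enterprise locations, two per year.
sample = [(2000, 0.0, 0.0), (2000, 2.0, 0.0), (2001, 3.0, 4.0), (2001, 5.0, 4.0)]
centers = gravity_centers(sample)
print(centers)
print(shifting_route(centers))
```

    The per-year sums are associative, so the same aggregation could in principle be expressed as a keyed reduction in a distributed engine; that mapping is left as an assumption here rather than shown.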

  8. `Dhara': An Open Framework for Critical Zone Modeling

    NASA Astrophysics Data System (ADS)

    Le, P. V.; Kumar, P.

    2016-12-01

    Processes in the Critical Zone, which sustain terrestrial life, are tightly coupled across hydrological, physical, biological, chemical, pedological, geomorphological and ecological domains over both short and long timescales. Observations and quantification of the Earth's surface across these domains using emerging high resolution measurement technologies such as light detection and ranging (lidar) and hyperspectral remote sensing are enabling us to characterize fine scale landscape attributes over large spatial areas. This presents a unique opportunity to develop novel approaches to model the Critical Zone that can capture fine scale intricate dependencies across the different processes in 3D. The development of interdisciplinary tools that transcend individual disciplines and capture new levels of complexity and emergent properties is at the core of Critical Zone science. Here we introduce an open framework for high-performance computing model (`Dhara') for modeling complex processes in the Critical Zone. The framework is designed to be modular in structure with the aim to create uniform and efficient tools to facilitate and leverage process modeling. It also provides flexibility to maintain, collaborate, and co-develop additional components by the scientific community. We show the essential framework that simulates ecohydrologic dynamics, and surface - sub-surface coupling in 3D using hybrid parallel CPU-GPU. We demonstrate that the open framework in Dhara is feasible for detailed, multi-processes, and large-scale modeling of the Critical Zone, which opens up exciting possibilities. We will also present outcomes from a Modeling Summer Institute led by Intensively Managed Critical Zone Observatory (IMLCZO) with representation from several CZOs and international representatives.

  9. Cloud-Based Perception and Control of Sensor Nets and Robot Swarms

    DTIC Science & Technology

    2016-04-01

    distributed stream processing framework provides the necessary API and infrastructure to develop and execute such applications in a cluster of computation...streaming DDDAS applications based on challenges they present to the backend Cloud control system. Figure 2 Parallel SLAM Application 3 1) Set of...the art deep learning- based object detectors can recognize among hundreds of object classes and this capability would be very useful for mobile

  10. MRUniNovo: an efficient tool for de novo peptide sequencing utilizing the hadoop distributed computing framework.

    PubMed

    Li, Chuang; Chen, Tao; He, Qiang; Zhu, Yunping; Li, Kenli

    2017-03-15

    Tandem mass spectrometry-based de novo peptide sequencing is a complex and time-consuming process. The current algorithms for de novo peptide sequencing cannot rapidly and thoroughly process large mass spectrometry datasets. In this paper, we propose MRUniNovo, a novel tool for parallel de novo peptide sequencing. MRUniNovo parallelizes UniNovo based on the Hadoop compute platform. Our experimental results demonstrate that MRUniNovo significantly reduces the computation time of de novo peptide sequencing without sacrificing the correctness and accuracy of the results, and thus can process very large datasets that UniNovo cannot. MRUniNovo is an open source software tool implemented in Java. The source code and the parameter settings are available at http://bioinfo.hupo.org.cn/MRUniNovo/index.php. Contact: s131020002@hnu.edu.cn; taochen1019@163.com. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.

  11. Emerging Adults’ Lived Experience of Formative Family Stress: The Family’s Lasting Influence

    PubMed Central

    Valdez, Carmen R.; Chavez, Tom; Woulfe, Julie

    2014-01-01

    In this article, we use a phenomenology framework to explore emerging adults’ formative experiences of family stress. Fourteen college students participated in a qualitative interview about their experience of family stress. We analyzed the interviews using the empirical phenomenological psychology method. Participants described a variety of family stressors, including parental conflict and divorce, physical or mental illness, and emotional or sexual abuse by a family member. Two general types of parallel processes were essential to the experience of family stress for participants. First, the family stressor was experienced in shifts and progressions reflecting the young person’s attempts to manage the stressor, and second, these shifts and progressions were interdependent with deeply personal psychological meanings of self, sociality, physical and emotional expression, agency, place, space, project, and discourse. We describe each one of these parallel processes, and their subprocesses, and conclude with implications for mental health practice and research. PMID:23771635

  12. Emerging adults' lived experience of formative family stress: the family's lasting influence.

    PubMed

    Valdez, Carmen R; Chavez, Tom; Woulfe, Julie

    2013-08-01

    In this article, we use a phenomenology framework to explore emerging adults' formative experiences of family stress. Fourteen college students participated in a qualitative interview about their experience of family stress. We analyzed the interviews using the empirical phenomenological psychology method. Participants described a variety of family stressors, including parental conflict and divorce, physical or mental illness, and emotional or sexual abuse by a family member. Two general types of parallel processes were essential to the experience of family stress for participants. First, the family stressor was experienced in shifts and progressions reflecting the young person's attempts to manage the stressor, and second, these shifts and progressions were interdependent with deeply personal psychological meanings of self, sociality, physical and emotional expression, agency, place, space, project, and discourse. We describe each of these parallel processes and their subprocesses, and conclude with implications for mental health practice and research.

  13. Bifrost: a Modular Python/C++ Framework for Development of High-Throughput Data Analysis Pipelines

    NASA Astrophysics Data System (ADS)

    Cranmer, Miles; Barsdell, Benjamin R.; Price, Danny C.; Garsden, Hugh; Taylor, Gregory B.; Dowell, Jayce; Schinzel, Frank; Costa, Timothy; Greenhill, Lincoln J.

    2017-01-01

    Large radio interferometers have data rates that render long-term storage of raw correlator data infeasible, thus motivating development of real-time processing software. For high-throughput applications, processing pipelines are challenging to design and implement. Motivated by science efforts with the Long Wavelength Array, we have developed Bifrost, a novel Python/C++ framework that eases the development of high-throughput data analysis software by packaging algorithms as black box processes in a directed graph. This strategy to modularize code allows astronomers to create parallelism without code adjustment. Bifrost uses CPU/GPU ’circular memory’ data buffers that enable ready introduction of arbitrary functions into the processing path for ’streams’ of data, and allow pipelines to automatically reconfigure in response to astrophysical transient detection or input of new observing settings. We have deployed and tested Bifrost at the latest Long Wavelength Array station, in Sevilleta National Wildlife Refuge, NM, where it handles throughput exceeding 10 Gbps per CPU core.

  14. Towards a Better Distributed Framework for Learning Big Data

    DTIC Science & Technology

    2017-06-14

    This work aimed at solving issues in distributed machine learning. The PI’s team proposed...communication load. Finally, the team proposed the parallel least-squares policy iteration (parallel LSPI) to parallelize a reinforcement policy learning.

  15. An Extensible Processing Framework for Eddy-covariance Data

    NASA Astrophysics Data System (ADS)

    Durden, D.; Fox, A. M.; Metzger, S.; Sturtevant, C.; Durden, N. P.; Luo, H.

    2016-12-01

    The evolution of large data collecting networks has led not only to an increase in available information, but also to greater complexity in analyzing the observations. Timely dissemination of readily usable data products necessitates a streaming processing framework that is both automatable and flexible. Tower networks, such as ICOS, Ameriflux, and NEON, exemplify this issue by requiring large amounts of data to be processed from dispersed measurement sites. Eddy-covariance data from across the NEON network are expected to amount to 100 Gigabytes per day. The complexity of the algorithmic processing necessary to produce high-quality data products, together with the continued development of new analysis techniques, led to the development of a modular R-package, eddy4R. This allows algorithms provided by NEON and the larger community to be deployed in streaming processing, and to be used by community members alike. In order to control the processing environment, provide a proficient parallel processing structure, and ensure that dependencies are available during processing, we chose Docker as our "Development and Operations" (DevOps) platform. The Docker framework allows our processing algorithms to be developed, maintained and deployed at scale. Additionally, the eddy4R-Docker framework fosters community use and extensibility via pre-built Docker images and the GitHub distributed version control system. The capability to process large data sets is reliant upon efficient input and output of data, data compressibility to reduce compute resource loads, and the ability to easily package metadata. The Hierarchical Data Format (HDF5) is a file format that can meet these needs. A NEON standard HDF5 file structure and metadata attributes allow users to explore larger data sets in an intuitive "directory-like" structure adopting the NEON data product naming conventions.

  16. AMRZone: A Runtime AMR Data Sharing Framework For Scientific Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Wenzhao; Tang, Houjun; Harenberg, Steven

    Frameworks that facilitate runtime data sharing across multiple applications are of great importance for scientific data analytics. Although existing frameworks work well over uniform mesh data, they cannot effectively handle adaptive mesh refinement (AMR) data. The challenges in constructing an AMR-capable framework include: (1) designing an architecture that facilitates online AMR data management; (2) achieving a load-balanced AMR data distribution for the data staging space at runtime; and (3) building an effective online index to support the unique spatial data retrieval requirements for AMR data. Towards addressing these challenges to support runtime AMR data sharing across scientific applications, we present the AMRZone framework. Experiments over real-world AMR datasets demonstrate AMRZone's effectiveness at achieving a balanced workload distribution, reading/writing large-scale datasets with thousands of parallel processes, and satisfying queries with spatial constraints. Moreover, AMRZone's performance and scalability are even comparable with existing state-of-the-art work when tested over uniform mesh data with up to 16384 cores; in the best case, our framework achieves a 46% performance improvement.

  17. Heterogeneous scalable framework for multiphase flows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morris, Karla Vanessa

    2013-09-01

    Two categories of challenges confront the developer of computational spray models: those related to the computation and those related to the physics. Regarding the computation, the trend towards heterogeneous, multi- and many-core platforms will require considerable re-engineering of codes written for the current supercomputing platforms. Regarding the physics, accurate methods for transferring mass, momentum and energy from the dispersed phase onto the carrier fluid grid have so far eluded modelers. Significant challenges also lie at the intersection between these two categories. To be competitive, any physics model must be expressible in a parallel algorithm that performs well on evolving computer platforms. This work created an application based on a software architecture where the physics and software concerns are separated in a way that adds flexibility to both. The developed spray-tracking package includes an application programming interface (API) that abstracts away the platform-dependent parallelization concerns, enabling the scientific programmer to write serial code that the API resolves into parallel processes and threads of execution. The project also developed the infrastructure required to provide similar APIs to other applications. The API allows object-oriented Fortran applications direct interaction with Trilinos to support memory management of distributed objects in central processing unit (CPU) and graphics processing unit (GPU) nodes for applications using C++.

  18. Integrable Floquet dynamics, generalized exclusion processes and "fused" matrix ansatz

    NASA Astrophysics Data System (ADS)

    Vanicat, Matthieu

    2018-04-01

    We present a general method for constructing integrable stochastic processes, with two-step discrete time Floquet dynamics, from the transfer matrix formalism. The models can be interpreted as a discrete time parallel update. The method can be applied for both periodic and open boundary conditions. We also show how the stationary distribution can be built as a matrix product state. As an illustration we construct parallel discrete time dynamics associated with the R-matrix of the SSEP and of the ASEP, and provide the associated stationary distributions in a matrix product form. We use this general framework to introduce new integrable generalized exclusion processes, where a fixed number of particles is allowed on each lattice site in opposition to the (single particle) exclusion process models. They are constructed using the fusion procedure of R-matrices (and K-matrices for open boundary conditions) for the SSEP and ASEP. We develop a new method, that we named "fused" matrix ansatz, to build explicitly the stationary distribution in a matrix product form. We use this algebraic structure to compute physical observables such as the correlation functions and the mean particle current.

  19. A Framework for Load Balancing of Tensor Contraction Expressions via Dynamic Task Partitioning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lai, Pai-Wei; Stock, Kevin; Rajbhandari, Samyam

    In this paper, we introduce Dynamic Load-balanced Tensor Contractions (DLTC), a domain-specific library for efficient task-parallel execution of tensor contraction expressions, a class of computation encountered in quantum chemistry and physics. Our framework decomposes each contraction into smaller units of tasks, represented by an abstraction referred to as iterators. We exploit an extra level of parallelism by having tasks across independent contractions executed concurrently through a dynamic load balancing runtime. We demonstrate the improved performance, scalability, and flexibility for the computation of tensor contraction expressions on parallel computers using examples from coupled cluster methods.
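
    As a rough illustration of the dynamic load balancing idea in this abstract, the sketch below lets idle workers pull the next task from a shared queue, so unevenly sized tasks spread themselves across workers. It is not DLTC's API: `run_dynamic`, the task list and the task sizes are hypothetical, and Python threads are used only to show the scheduling pattern, not to obtain a real speed-up.

```python
import queue
import threading


def run_dynamic(tasks, num_workers):
    """Run callables from a shared queue; each idle worker pulls the next task."""
    work = queue.Queue()
    for index, task in enumerate(tasks):
        work.put((index, task))
    results = [None] * len(tasks)
    handled_by = [None] * len(tasks)

    def worker(worker_id):
        while True:
            try:
                index, task = work.get_nowait()
            except queue.Empty:
                return  # no work left, so this worker stops
            results[index] = task()
            handled_by[index] = worker_id

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, handled_by


# Hypothetical tasks of very different sizes (stand-ins for contraction tasks)
sizes = [50_000, 10, 20_000, 5, 1_000]
tasks = [lambda n=n: sum(i * i for i in range(n)) for n in sizes]
results, handled_by = run_dynamic(tasks, num_workers=3)
```

    Because tasks are claimed at run time rather than assigned up front, a worker that draws a small task simply returns for another one, which is the property a static partition lacks.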

  20. Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment.

    PubMed

    Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che

    2014-01-16

    To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. 
    By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within a relatively short time. This integrated approach is a promising way of inferring large networks.
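
    The MapReduce pattern mentioned in this abstract can be sketched in a few lines, assuming a map step that scores each candidate parameter set independently and a reduce step that keeps the best one. This is not the authors' GA-PSO method or Hadoop code: `TARGET`, `fitness` and the sample population are hypothetical placeholders.

```python
from functools import reduce

TARGET = [0.5, -1.0, 2.0]  # hypothetical "true" network parameters


def fitness(candidate):
    """Squared error against TARGET; lower is better."""
    return sum((c - t) ** 2 for c, t in zip(candidate, TARGET))


def map_phase(population):
    """Map step: score every candidate independently (the parallelizable part)."""
    return [(fitness(candidate), candidate) for candidate in population]


def reduce_phase(scored):
    """Reduce step: keep the candidate with the lowest error."""
    return reduce(lambda best, item: item if item[0] < best[0] else best, scored)


population = [[0.0, 0.0, 0.0], [0.5, -1.0, 1.5], [1.5, -1.0, 2.0]]
best_error, best_candidate = reduce_phase(map_phase(population))
```

    In a real deployment the map step would run on separate nodes, since each fitness evaluation needs no information from the others.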

  1. Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment

    PubMed Central

    2014-01-01

    Background: To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. Results: This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Conclusions: Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within a relatively short time. This integrated approach is a promising way of inferring large networks. PMID:24428926

  2. Decentralized diagnostics based on a distributed micro-genetic algorithm for transducer networks monitoring large experimental systems.

    PubMed

    Arpaia, P; Cimmino, P; Girone, M; La Commara, G; Maisto, D; Manna, C; Pezzetti, M

    2014-09-01

    Evolutionary approach to centralized multiple-faults diagnostics is extended to distributed transducer networks monitoring large experimental systems. Given a set of anomalies detected by the transducers, each instance of the multiple-fault problem is formulated as several parallel communicating sub-tasks running on different transducers, and thus solved one-by-one on spatially separated parallel processes. A micro-genetic algorithm merges evaluation time efficiency, arising from a small-size population distributed on parallel-synchronized processors, with the effectiveness of centralized evolutionary techniques due to optimal mix of exploitation and exploration. In this way, holistic view and effectiveness advantages of evolutionary global diagnostics are combined with reliability and efficiency benefits of distributed parallel architectures. The proposed approach was validated both (i) by simulation at CERN, on a case study of a cold box for enhancing the cryogeny diagnostics of the Large Hadron Collider, and (ii) by experiments, under the framework of the industrial research project MONDIEVOB (Building Remote Monitoring and Evolutionary Diagnostics), co-funded by EU and the company Del Bo srl, Napoli, Italy.

  3. Algorithms and analyses for stochastic optimization for turbofan noise reduction using parallel reduced-order modeling

    NASA Astrophysics Data System (ADS)

    Yang, Huanhuan; Gunzburger, Max

    2017-06-01

    Simulation-based optimization of acoustic liner design in a turbofan engine nacelle for noise reduction purposes can dramatically reduce the cost and time needed for experimental designs. Because uncertainties are inevitable in the design process, a stochastic optimization algorithm is posed based on the conditional value-at-risk measure so that an ideal acoustic liner impedance is determined that is robust in the presence of uncertainties. A parallel reduced-order modeling framework is developed that dramatically improves the computational efficiency of the stochastic optimization solver for a realistic nacelle geometry. The reduced stochastic optimization solver takes less than 500 seconds to execute. In addition, well-posedness and finite element error analyses of the state system and optimization problem are provided.
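
    The conditional value-at-risk measure named in this abstract can be illustrated with a simple sample-based estimate: the mean of the worst (1 - alpha) fraction of observed losses. This is only a sketch under that assumption, not the authors' solver; the function name and the sample data are hypothetical.

```python
import math


def conditional_value_at_risk(losses, alpha):
    """Mean of the worst (1 - alpha) fraction of sampled losses (larger is worse)."""
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(losses, reverse=True)
    # Number of tail samples; rounding guards against floating-point noise
    tail_size = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    tail = ordered[:tail_size]
    return sum(tail) / len(tail)


sample_losses = [float(v) for v in range(1, 11)]  # hypothetical losses 1.0 .. 10.0
cvar_80 = conditional_value_at_risk(sample_losses, 0.8)  # mean of the two worst
```

    Optimizing this quantity rather than the mean loss penalizes designs that perform badly in the unfavourable tail, which is why it suits design under uncertainty.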

  4. CA1 pyramidal cell diversity enabling parallel information processing in the hippocampus

    PubMed Central

    Soltesz, Ivan; Losonczy, Attila

    2018-01-01

    Hippocampal network operations supporting spatial navigation and declarative memory are traditionally interpreted in a framework where each hippocampal area, such as the dentate gyrus, CA3, and CA1, consists of homogeneous populations of functionally equivalent principal neurons. However, heterogeneity within hippocampal principal cell populations, in particular within pyramidal cells at the main CA1 output node, is increasingly recognized and includes developmental, molecular, anatomical, and functional differences. Here we review recent progress in the delineation of hippocampal principal cell subpopulations by focusing on radially defined subpopulations of CA1 pyramidal cells, and we consider how functional segregation of information streams, in parallel channels with nonuniform properties, could represent a general organizational principle of the hippocampus supporting diverse behaviors. PMID:29593317

  5. Parallel Processing and Learning: Variability and Chaos in Self-Organization of Activity in Groups of Neurons

    DTIC Science & Technology

    1993-03-09

    neurotransmission and neuromodulation (Soinila and Mpitsos, 1992; Soinila et al., 1992). It is necessary, as these and other publications (e.g., Mpitsos and...neurotransmitters and neuromodulators affect the activity of neural assemblies, and (b) how individual transmitters act within the framework of the many...examined mammalian tissues that may be useful as model systems to examine distributed function in neurotransmission and neuromodulation (Soinila and

  6. Proceedings of the Expert Systems Workshop Held in Pacific Grove, California on 16-18 April 1986

    DTIC Science & Technology

    1986-04-18

    13. NUMBER OF PAGES 197 15. SECURITY CLASS. (of this report) UNCLASSIFIED 15a. DECLASSIFICATION/DOWNGRADING SCHEDULE 16. DISTRIBUTION...are distributed and parallel. * - Features unimplemented at present; scheduled for phase 2. Table 1-1: Key design characteristics of ABE 2. a...data structuring techniques and a semi-deterministic scheduler. A program for the DF framework consists of a number of independent processing modules

  7. Argonne simulation framework for intelligent transportation systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ewing, T.; Doss, E.; Hanebutte, U.

    1996-04-01

    A simulation framework has been developed which defines a high-level architecture for a large-scale, comprehensive, scalable simulation of an Intelligent Transportation System (ITS). The simulator is designed to run on parallel computers and distributed (networked) computer systems; however, a version for a stand-alone workstation is also available. The ITS simulator includes an Expert Driver Model (EDM) of instrumented "smart" vehicles with in-vehicle navigation units. The EDM is capable of performing optimal route planning and communicating with Traffic Management Centers (TMC). A dynamic road map database is used for optimum route planning, where the data is updated periodically to reflect any changes in road or weather conditions. The TMC has probe vehicle tracking capabilities (display position and attributes of instrumented vehicles), and can provide 2-way interaction with traffic to provide advisories and link times. Both the in-vehicle navigation module and the TMC feature detailed graphical user interfaces that include human-factors studies to support safety and operational research. Realistic modeling of variations of the posted driving speed is based on human factor studies that take into consideration weather, road conditions, driver's personality and behavior and vehicle type. The simulator has been developed on a distributed system of networked UNIX computers, but is designed to run on ANL's IBM SP-X parallel computer system for large scale problems. A novel feature of the developed simulator is that vehicles will be represented by autonomous computer processes, each with a behavior model which performs independent route selection and reacts to external traffic events much like real vehicles. Vehicle processes interact with each other and with ITS components by exchanging messages. With this approach, one will be able to take advantage of emerging massively parallel processor (MPP) systems.

  8. Strategies for Large Scale Implementation of a Multiscale, Multiprocess Integrated Hydrologic Model

    NASA Astrophysics Data System (ADS)

    Kumar, M.; Duffy, C.

    2006-05-01

    Distributed models simulate hydrologic state variables in space and time while taking into account the heterogeneities in terrain, surface, subsurface properties and meteorological forcings. The computational cost and complexity of these models increase as they attempt to accurately simulate a large number of interacting physical processes at fine spatio-temporal resolution in a large basin. A hydrologic model run on a coarse spatial discretization of the watershed with a limited number of physical processes carries a smaller computational load, but this negatively affects the accuracy of model results and restricts physical realization of the problem. So it is imperative to have an integrated modeling strategy (a) which can be universally applied at various scales in order to study the tradeoffs between computational complexity (determined by spatio-temporal resolution), accuracy and predictive uncertainty in relation to various approximations of physical processes, (b) which can be applied at adaptively different spatial scales in the same domain by taking into account the local heterogeneity of topography and hydrogeologic variables, and (c) which is flexible enough to incorporate a different number and approximation of process equations depending on model purpose and computational constraint. An efficient implementation of this strategy becomes all the more important for the Great Salt Lake river basin, which is relatively large (~89000 sq. km) and complex in terms of hydrologic and geomorphic conditions. Also, the types and time scales of the hydrologic processes that dominate differ between parts of the basin. Part of the snowmelt runoff generated in the Uinta Mountains infiltrates and contributes as base flow to the Great Salt Lake over a time scale of decades to centuries. The adaptive strategy helps capture the steep topographic and climatic gradient along the Wasatch front.
    Here we present the aforesaid modeling strategy along with an associated hydrologic modeling framework which facilitates a seamless, computationally efficient and accurate integration of the process model with the data model. The flexibility of this framework leads to implementation of multiscale, multiresolution, adaptive refinement/de-refinement and nested modeling simulations with the least computational burden. However, performing these simulations and the related calibration of these models over a large basin at higher spatio-temporal resolutions is computationally intensive and requires increasing computing power. With the advent of parallel processing architectures, high computing performance can be achieved by parallelization of existing serial integrated-hydrologic-model code. This translates to running the same model simulation on a network of a large number of processors, thereby reducing the time needed to obtain a solution. The paper also discusses the implementation of the integrated model on parallel processors, including the mapping of the problem onto a multi-processor environment, a method to incorporate coupling between hydrologic processes using interprocessor communication models, the model data structure, and parallel numerical algorithms to obtain high performance.

  9. Processes of recovery through routine or specialist treatment for borderline personality disorder (BPD): a qualitative study.

    PubMed

    Katsakou, Christina; Pistrang, Nancy; Barnicot, Kirsten; White, Hayley; Priebe, Stefan

    2017-07-04

    Recovery processes in borderline personality disorder (BPD) are poorly understood. This study explored how recovery in BPD occurs through routine or specialist treatment, as perceived by service users (SUs) and therapists. SUs were recruited from two specialist BPD services, three community mental health teams, and one psychological therapies service. Semi-structured interviews were conducted with 48 SUs and 15 therapists. The "framework" approach was used to analyse the data. The findings were organized into two domains of themes. The first domain described three parallel processes that constituted SUs' recovery journey: fighting ambivalence and committing to taking action; moving from shame to self-acceptance and compassion; and moving from distrust and defensiveness to opening up to others. The second domain described four therapeutic challenges that needed to be addressed to support this journey: balancing self-exploration and finding solutions; balancing structure and flexibility; confronting interpersonal difficulties and practicing new ways of relating; and balancing support and independence. Therapies facilitating the identified processes may promote recovery. The recovery processes and therapeutic challenges identified in this study could provide a framework to guide future research.

  10. Development of a software framework for data assimilation and its applications for streamflow forecasting in Japan

    NASA Astrophysics Data System (ADS)

    Noh, S. J.; Tachikawa, Y.; Shiiba, M.; Yorozu, K.; Kim, S.

    2012-04-01

    Data assimilation methods have received increased attention as a means of uncertainty assessment and enhancement of forecasting capability in various areas. Despite their potential, software frameworks applicable to probabilistic approaches and data assimilation are still limited because most hydrologic modeling software is based on a deterministic approach. In this study, we developed a hydrological modeling framework for sequential data assimilation, called MPI-OHyMoS. MPI-OHyMoS allows users to develop their own element models and to easily build a total simulation system model for hydrological simulations. Unlike process-based modeling frameworks, this software framework benefits from its object-oriented design to flexibly represent hydrological processes without any change to the main library. Sequential data assimilation based on particle filters is available for any hydrologic model based on MPI-OHyMoS, considering various sources of uncertainty originating from input forcing, parameters and observations. Particle filters are a Bayesian learning process in which the propagation of all uncertainties is carried out by a suitable selection of randomly generated particles without any assumptions about the nature of the distributions. In MPI-OHyMoS, ensemble simulations are parallelized, which can take advantage of high performance computing (HPC) systems. We applied this software framework to short-term streamflow forecasting of several catchments in Japan using a distributed hydrologic model. Uncertainty of model parameters and remotely-sensed rainfall data such as X-band or C-band radar is estimated and mitigated in the sequential data assimilation.
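
    The predict, weight and resample cycle behind sequential data assimilation with particle filters can be sketched as below. This is a generic bootstrap particle filter, not MPI-OHyMoS code: the random-walk state model, the Gaussian observation likelihood, and all names and numbers are assumptions chosen for illustration.

```python
import math
import random


def particle_filter_step(particles, observation, process_sd, obs_sd, rng):
    """One predict-weight-resample cycle of a bootstrap particle filter."""
    # Predict: propagate each particle through an assumed random-walk state model
    predicted = [p + rng.gauss(0.0, process_sd) for p in particles]
    # Weight: assumed Gaussian likelihood of the observation given each particle
    weights = [math.exp(-0.5 * ((observation - p) / obs_sd) ** 2) for p in predicted]
    if sum(weights) == 0.0:
        weights = [1.0] * len(predicted)  # degenerate case: fall back to uniform
    # Resample: draw particles in proportion to their weights
    return rng.choices(predicted, weights=weights, k=len(predicted))


rng = random.Random(0)
particles = [rng.uniform(0.0, 10.0) for _ in range(500)]  # broad initial guess
for observed in [4.8, 5.1, 5.0, 4.9, 5.2]:  # hypothetical observations
    particles = particle_filter_step(particles, observed, 0.2, 0.5, rng)
estimate = sum(particles) / len(particles)
```

    Each particle is propagated independently in the predict step, which is the part an ensemble framework can distribute across processors.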

  11. A person-centered integrated care quality framework, based on a qualitative study of patients' evaluation of care in light of chronic care ideals.

    PubMed

    Berntsen, Gro; Høyem, Audhild; Lettrem, Idar; Ruland, Cornelia; Rumpsfeld, Markus; Gammon, Deede

    2018-06-20

    Person-Centered Integrated Care (PC-IC) is believed to improve outcomes and experience for persons with multiple long-term and complex conditions. No broad consensus exists regarding how to capture the patient-experienced quality of PC-IC. Most PC-IC evaluation tools focus on care events or care in general. Building on others' and our previous work, we outlined a 4-stage goal-oriented PC-IC process ideal: 1) personalized goal setting, 2) care planning aligned with goals, 3) care delivery according to plan, and 4) evaluation of goal attainment. We aimed to explore, apply, refine and operationalize this quality of care framework. This paper is a qualitative evaluative review of the individual Patient Pathways (iPP) experiences of 19 strategically chosen persons with multimorbidity in light of ideals for chronic care. The iPP includes all care events, addressing the person's collected health issues, organized by time. We constructed iPPs based on the electronic health record (from general practice, nursing services, and hospital) with patient follow-up interviews. The application of the framework and its refinement were parallel processes. Both were based on analysis of salient themes in the empirical material in light of the PC-IC process ideal and progressively more informed applications of themes and questions. The informants consistently reviewed care quality by how care supported/threatened their long-term goals. Personal goals were either implicit or identified by "What matters to you?" Informants expected care to address their long-term goals and placed responsibility for care quality and delivery at the system level. The PC-IC process framework exposed system failure in identifying long-term goals, provision of shared long-term multimorbidity care plans, monitoring of care delivery and goal evaluation. The PC-IC framework includes descriptions of ideal care, key questions and literature references for each stage of the PC-IC process.
    This first version of a PC-IC process framework needs further validation in other settings. Gaps in care that are invisible with event-based quality of care frameworks become apparent when evaluated by a long-term goal-driven PC-IC process framework. The framework appears meaningful to persons with multimorbidity.

  12. Parallelization of fine-scale computation in Agile Multiscale Modelling Methodology

    NASA Astrophysics Data System (ADS)

    Macioł, Piotr; Michalik, Kazimierz

    2016-10-01

    Nowadays, multiscale modelling of material behavior is an extensively developed area. An important obstacle to its wide application is its high computational demand. Among other approaches, the parallelization of multiscale computations is a promising solution. Heterogeneous multiscale models are good candidates for parallelization, since communication between sub-models is limited. In this paper, the possibility of parallelizing multiscale models based on the Agile Multiscale Methodology framework is discussed. A sequential, FEM-based macroscopic model has been combined with concurrently computed fine-scale models, employing a MatCalc thermodynamic simulator. The main issues investigated in this work are: (i) the speed-up of multiscale models, with special focus on fine-scale computations, and (ii) the decrease in the quality of computations caused by parallel execution. Speed-up has been evaluated on the basis of Amdahl's law equations. The problem of 'delay error', arising from the parallel execution of fine-scale sub-models controlled by the sequential macroscopic sub-model, is discussed. Some technical aspects of combining third-party commercial modelling software with an in-house multiscale framework and an MPI library are also discussed.
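
    Amdahl's law, on which the speed-up evaluation in this abstract rests, states that if a fraction p of the runtime parallelizes perfectly over n workers, the speed-up is 1 / ((1 - p) + p / n). The short sketch below evaluates that formula; the function names and the example fraction are illustrative assumptions, not values from the paper.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speed-up when `parallel_fraction` of the runtime scales with `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def speedup_limit(parallel_fraction: float) -> float:
    """Upper bound on speed-up as the worker count grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


# Hypothetical case: 90% of the runtime is fine-scale work that parallelizes
speedup_10 = amdahl_speedup(0.9, 10)
```

    In a setting like the one described, the sequential macroscopic model plays the role of the serial fraction, so it bounds the achievable speed-up however many fine-scale models run concurrently.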

  13. Framework for analysis of guaranteed QOS systems

    NASA Astrophysics Data System (ADS)

    Chaudhry, Shailender; Choudhary, Alok

    1997-01-01

    Multimedia data is isochronous in nature and entails managing and delivering high volumes of data. Multiprocessors, with their large processing power, vast memory, and fast interconnects, are ideal candidates for the implementation of multimedia applications. Initially, multiprocessors were designed to execute scientific programs and thus their architecture was optimized to provide low message latency and efficiently support regular communication patterns. Hence, they have a regular network topology and most use wormhole routing. The design offers the benefits of a simple router, small buffer size, and network latency that is almost independent of path length. Among the various multimedia applications, the video on demand (VOD) server is well-suited for implementation using parallel multiprocessors. Logical models for VOD servers are presently mapped onto multiprocessors. Our paper provides a framework for calculating bounds on utilization of system resources with which QoS parameters for each isochronous stream can be guaranteed. Effects of the architecture of multiprocessors, and efficiency of various local models and mapping on particular architectures can be investigated within our framework. Our framework is based on rigorous proofs and provides tight bounds. The results obtained may be used as the basis for admission control tests. To illustrate the versatility of our framework, we provide bounds on utilization for various logical models applied to mesh connected architectures for a video on demand server. Our results show that wormhole routing can lead to packets waiting for transmission of other packets that apparently share no common resources. This situation is analogous to head-of-the-line blocking. We find that the provision of multiple VCs per link and multiple flit buffers improves utilization (even under guaranteed QoS parameters). This is analogous to parallel iterative matching.

  14. Distributed Processing System for Restoration of Electric Power Distribution Network Using Two-Layered Contract Net Protocol

    NASA Astrophysics Data System (ADS)

    Kodama, Yu; Hamagami, Tomoki

    A distributed processing system for restoration of electric power distribution networks using a two-layered contract net protocol (CNP) is proposed. The goal of this study is to develop a restoration system that adapts to future power networks with distributed generators. The distinguishing feature of this study is that the two-layered CNP is applied to a distributed computing environment in practical use. The two-layered CNP has two classes of agents in the network, named field agents and operating agents. In order to avoid conflicts between tasks, the operating agent controls the privilege of managers to send task announcement messages in CNP. This technique realizes coordination between agents that work asynchronously in parallel with others. Moreover, this study implements the distributed processing system using a de facto standard multi-agent framework, JADE (Java Agent DEvelopment framework). This study conducts simulation experiments of power distribution network restoration and compares the proposed system with the previous system. The results confirm the effectiveness of the proposed system.

  15. Parallel Index and Query for Large Scale Data Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chou, Jerry; Wu, Kesheng; Ruebel, Oliver

    2011-07-18

    Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for processing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that addresses these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process massive datasets on modern supercomputing platforms. We apply FastQuery to processing of a massive 50TB dataset generated by a large scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for interesting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.

  16. Online Meta-data Collection and Monitoring Framework for the STAR Experiment at RHIC

    NASA Astrophysics Data System (ADS)

    Arkhipkin, D.; Lauret, J.; Betts, W.; Van Buren, G.

    2012-12-01

    The STAR Experiment further exploits scalable message-oriented model principles to achieve a high level of control over online data streams. In this paper we present an AMQP-powered Message Interface and Reliable Architecture framework (MIRA), which allows STAR to orchestrate the activities of Meta-data Collection, Monitoring, Online QA and several Run-Time and Data Acquisition system components in a very efficient manner. The very nature of the reliable message bus suggests parallel usage of multiple independent storage mechanisms for our meta-data. We describe our experience with a robust data-taking setup employing MySQL- and HyperTable-based archivers for meta-data processing. In addition, MIRA has an AJAX-enabled web GUI, which allows real-time visualisation of online process flow and detector subsystem states, and doubles as a sophisticated alarm system when combined with complex event processing engines like Esper, Borealis or Cayuga. The performance data and our planned path forward are based on our experience during the 2011-2012 running of STAR.

  17. Craniux: A LabVIEW-Based Modular Software Framework for Brain-Machine Interface Research

    PubMed Central

    Degenhart, Alan D.; Kelly, John W.; Ashmore, Robin C.; Collinger, Jennifer L.; Tyler-Kabara, Elizabeth C.; Weber, Douglas J.; Wang, Wei

    2011-01-01

    This paper presents “Craniux,” an open-access, open-source software framework for brain-machine interface (BMI) research. Developed in LabVIEW, a high-level graphical programming environment, Craniux offers both out-of-the-box functionality and a modular BMI software framework that is easily extendable. Specifically, it allows researchers to take advantage of multiple features inherent to the LabVIEW environment for on-the-fly data visualization, parallel processing, multithreading, and data saving. This paper introduces the basic features and system architecture of Craniux and describes the validation of the system under real-time BMI operation using simulated and real electrocorticographic (ECoG) signals. Our results indicate that Craniux is able to operate consistently in real time, enabling a seamless work flow to achieve brain control of cursor movement. The Craniux software framework is made available to the scientific research community to provide a LabVIEW-based BMI software platform for future BMI research and development. PMID:21687575

  18. Craniux: a LabVIEW-based modular software framework for brain-machine interface research.

    PubMed

    Degenhart, Alan D; Kelly, John W; Ashmore, Robin C; Collinger, Jennifer L; Tyler-Kabara, Elizabeth C; Weber, Douglas J; Wang, Wei

    2011-01-01

    This paper presents "Craniux," an open-access, open-source software framework for brain-machine interface (BMI) research. Developed in LabVIEW, a high-level graphical programming environment, Craniux offers both out-of-the-box functionality and a modular BMI software framework that is easily extendable. Specifically, it allows researchers to take advantage of multiple features inherent to the LabVIEW environment for on-the-fly data visualization, parallel processing, multithreading, and data saving. This paper introduces the basic features and system architecture of Craniux and describes the validation of the system under real-time BMI operation using simulated and real electrocorticographic (ECoG) signals. Our results indicate that Craniux is able to operate consistently in real time, enabling a seamless work flow to achieve brain control of cursor movement. The Craniux software framework is made available to the scientific research community to provide a LabVIEW-based BMI software platform for future BMI research and development.

  19. Reliable, Memory Speed Storage for Cluster Computing Frameworks

    DTIC Science & Technology

    2014-06-16

    specification API that can capture computations in many of today’s popular data-parallel computing models, e.g., MapReduce and SQL. We also ported the Hadoop ...today’s big data workloads: • Immutable data: Data is immutable once written, since dominant underlying storage systems, such as HDFS [3], only support...network transfers, so reads can be data-local. • Program size vs. data size: In big data processing, the same operation is repeatedly applied on massive

  20. Anisotropic three-dimensional inversion of CSEM data using finite-element techniques on unstructured grids

    NASA Astrophysics Data System (ADS)

    Wang, Feiyan; Morten, Jan Petter; Spitzer, Klaus

    2018-05-01

    In this paper, we present a recently developed anisotropic 3-D inversion framework for interpreting controlled-source electromagnetic (CSEM) data in the frequency domain. The framework integrates a high-order finite-element forward operator and a Gauss-Newton inversion algorithm. Conductivity constraints are applied using a parameter transformation. We discretize the continuous forward and inverse problems on unstructured grids for a flexible treatment of arbitrarily complex geometries. Moreover, an unstructured mesh is more desirable in comparison to a single rectilinear mesh for multisource problems because local grid refinement will not significantly influence the mesh density outside the region of interest. The non-uniform spatial discretization facilitates parametrization of the inversion domain at a suitable scale. For a rapid simulation of multisource EM data, we opt to use a parallel direct solver. We further accelerate the inversion process by decomposing the entire data set into subsets with respect to frequencies (and transmitters if memory requirement is affordable). The computational tasks associated with each data subset are distributed to different processes and run in parallel. We validate the scheme using a synthetic marine CSEM model with rough bathymetry, and finally, apply it to an industrial-size 3-D data set from the Troll field oil province in the North Sea acquired in 2008 to examine its robustness and practical applicability.

  1. GPU: the biggest key processor for AI and parallel processing

    NASA Astrophysics Data System (ADS)

    Baji, Toru

    2017-07-01

    Two types of processors exist in the market: the conventional CPU and the Graphics Processing Unit (GPU). A typical CPU is composed of 1 to 8 cores, while a GPU has thousands of cores. The CPU is good for sequential processing, while the GPU is good at accelerating software with heavy parallel execution. The GPU was initially dedicated to 3D graphics. However, from 2006, when GPUs started to use general-purpose cores, it was noticed that this architecture can be used as a general-purpose massively parallel processor. NVIDIA developed a software framework, Compute Unified Device Architecture (CUDA), that makes it possible to easily program the GPU for these applications. With CUDA, GPUs started to be widely used in workstations and supercomputers. Recently, two key technologies have been highlighted in the industry: Artificial Intelligence (AI) and autonomous driving cars. AI requires massively parallel operations to train many-layered neural networks. With the CPU alone, it was impossible to finish the training in a practical time. The latest multi-GPU system with P100 makes it possible to finish the training in a few hours. For autonomous driving cars, TOPS-class performance is required to implement perception, localization, and path planning processing, and again an SoC with an integrated GPU will play a key role there. In this paper, the evolution of the GPU, which is one of the biggest commercial devices requiring state-of-the-art fabrication technology, will be introduced. An overview of key GPU-demanding applications like the ones described above will also be introduced.

  2. Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shi, Xuanhua; Luo, Xuan; Liang, Junling

    GPUs have been increasingly used to accelerate graph processing for complicated computational problems regarding graph theory. Many parallel graph algorithms adopt the asynchronous computing model to accelerate the iterative convergence. Unfortunately, the consistent asynchronous computing requires locking or atomic operations, leading to significant penalties/overheads when implemented on GPUs. As such, a coloring algorithm is adopted to separate the vertices with potential updating conflicts, guaranteeing the consistency/correctness of the parallel processing. Common coloring algorithms, however, may suffer from low parallelism because of a large number of colors generally required for processing a large-scale graph with billions of vertices. We propose a light-weight asynchronous processing framework called Frog with a preprocessing/hybrid coloring model. The fundamental idea is based on the Pareto principle (or 80-20 rule) about coloring algorithms as we observed through masses of real-world graph coloring cases. We find that a majority of vertices (about 80%) are colored with only a few colors, such that they can be read and updated in a very high degree of parallelism without violating the sequential consistency. Accordingly, our solution separates the processing of the vertices based on the distribution of colors. In this work, we mainly answer three questions: (1) how to partition the vertices in a sparse graph with maximized parallelism, (2) how to process large-scale graphs that cannot fit into GPU memory, and (3) how to reduce the overhead of data transfers on PCIe while processing each partition. We conduct experiments on real-world data (Amazon, DBLP, YouTube, RoadNet-CA, WikiTalk and Twitter) to evaluate our approach and make comparisons with well-known non-preprocessed (such as Totem, Medusa, MapGraph and Gunrock) and preprocessed (Cusha) approaches, by testing four classical algorithms (BFS, PageRank, SSSP and CC).
On all the tested applications and datasets, Frog is able to significantly outperform existing GPU-based graph processing systems except Gunrock and MapGraph. MapGraph gets better performance than Frog when running BFS on RoadNet-CA. The comparison between Gunrock and Frog is inconclusive. Frog can outperform Gunrock by more than 1.04X when running PageRank and SSSP, while the advantage of Frog is not obvious when running BFS and CC on some datasets, especially RoadNet-CA.
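
    The coloring idea in this abstract can be illustrated with a small sketch. It is not the Frog implementation: the function names (`greedy_coloring`, `group_by_color`, `split_hybrid`) and the parameter `n_parallel_colors` are hypothetical, and a plain greedy coloring stands in for whatever coloring the authors use. The point is only that vertices sharing a color have no edge between them, so each color class could be updated concurrently, while the small leftover classes are lumped into one residual set that would still need synchronized updates.

```python
from collections import defaultdict


def greedy_coloring(adjacency):
    """Assign each vertex the smallest color not used by an already-colored neighbour.

    adjacency: dict mapping vertex -> iterable of neighbouring vertices (undirected).
    """
    color = {}
    for v in sorted(adjacency):
        used = {color[u] for u in adjacency[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def group_by_color(color):
    """Collect vertices into color classes; no two vertices in a class are adjacent."""
    groups = defaultdict(list)
    for v, c in color.items():
        groups[c].append(v)
    return dict(groups)


def split_hybrid(groups, n_parallel_colors):
    """Hypothetical hybrid split: keep the largest color classes as conflict-free
    batches and merge all remaining vertices into one residual batch."""
    ordered = sorted(groups.values(), key=len, reverse=True)
    conflict_free = ordered[:n_parallel_colors]
    residual = [v for g in ordered[n_parallel_colors:] for v in g]
    return conflict_free, residual


# Toy graph: a triangle 0-1-2 with an extra vertex 3 attached to vertex 0.
graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
colors = greedy_coloring(graph)
batches, leftover = split_hybrid(group_by_color(colors), n_parallel_colors=2)
```

    In this toy graph, vertices 1 and 3 end up in the same color class, so they could be processed together without locks or atomic operations.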

  3. The island dynamics model on parallel quadtree grids

    NASA Astrophysics Data System (ADS)

    Mistani, Pouria; Guittet, Arthur; Bochkov, Daniil; Schneider, Joshua; Margetis, Dionisios; Ratsch, Christian; Gibou, Frederic

    2018-05-01

    We introduce an approach for simulating epitaxial growth by use of an island dynamics model on a forest of quadtree grids, and in a parallel environment. To this end, we use a parallel framework introduced in the context of the level-set method. This framework utilizes: discretizations that achieve a second-order accurate level-set method on non-graded adaptive Cartesian grids for solving the associated free boundary value problem for surface diffusion; and an established library for the partitioning of the grid. We consider the cases with: irreversible aggregation, which amounts to applying Dirichlet boundary conditions at the island boundary; and an asymmetric (Ehrlich-Schwoebel) energy barrier for attachment/detachment of atoms at the island boundary, which entails the use of a Robin boundary condition. We provide the scaling analyses performed on the Stampede supercomputer and numerical examples that illustrate the capability of our methodology to efficiently simulate different aspects of epitaxial growth. The combination of adaptivity and parallelism in our approach enables simulations that are several orders of magnitude faster than those reported in the recent literature and, thus, provides a viable framework for the systematic study of mound formation on crystal surfaces.

  4. Scalable geocomputation: evolving an environmental model building platform from single-core to supercomputers

    NASA Astrophysics Data System (ADS)

    Schmitz, Oliver; de Jong, Kor; Karssenberg, Derek

    2017-04-01

    There is an increasing demand to run environmental models on a big scale: simulations over large areas at high resolution. The heterogeneity of available computing hardware such as multi-core CPUs, GPUs or supercomputer potentially provides significant computing power to fulfil this demand. However, this requires detailed knowledge of the underlying hardware, parallel algorithm design and the implementation thereof in an efficient system programming language. Domain scientists such as hydrologists or ecologists often lack this specific software engineering knowledge, their emphasis is (and should be) on exploratory building and analysis of simulation models. As a result, models constructed by domain specialists mostly do not take full advantage of the available hardware. A promising solution is to separate the model building activity from software engineering by offering domain specialists a model building framework with pre-programmed building blocks that they combine to construct a model. The model building framework, consequently, needs to have built-in capabilities to make full usage of the available hardware. Developing such a framework providing understandable code for domain scientists and being runtime efficient at the same time poses several challenges on developers of such a framework. For example, optimisations can be performed on individual operations or the whole model, or tasks need to be generated for a well-balanced execution without explicitly knowing the complexity of the domain problem provided by the modeller. Ideally, a modelling framework supports the optimal use of available hardware whichsoever combination of model building blocks scientists use. 
We demonstrate our ongoing work on developing parallel algorithms for spatio-temporal modelling and demonstrate 1) PCRaster, an environmental software framework (http://www.pcraster.eu) providing spatio-temporal model building blocks and 2) parallelisation of about 50 of these building blocks using the new Fern library (https://github.com/geoneric/fern/), an independent generic raster processing library. Fern is a highly generic software library and its algorithms can be configured according to the configuration of a modelling framework. With manageable programming effort (e.g. matching data types between programming and domain language) we created a binding between Fern and PCRaster. The resulting PCRaster Python multicore module can be used to execute existing PCRaster models without having to make any changes to the model code. We show initial results on synthetic and geoscientific models indicating significant runtime improvements provided by parallel local and focal operations. We further outline challenges in improving remaining algorithms such as flow operations over digital elevation maps and further potential improvements like enhancing disk I/O.

  5. Rapid indirect trajectory optimization on highly parallel computing architectures

    NASA Astrophysics Data System (ADS)

    Antony, Thomas

    Trajectory optimization is a field which can benefit greatly from the advantages offered by parallel computing. The current state-of-the-art in trajectory optimization focuses on the use of direct optimization methods, such as the pseudo-spectral method. These methods are favored due to their ease of implementation and large convergence regions while indirect methods have largely been ignored in the literature in the past decade except for specific applications in astrodynamics. It has been shown that the shortcomings conventionally associated with indirect methods can be overcome by the use of a continuation method in which complex trajectory solutions are obtained by solving a sequence of progressively difficult optimization problems. High performance computing hardware is trending towards more parallel architectures as opposed to powerful single-core processors. Graphics Processing Units (GPU), which were originally developed for 3D graphics rendering have gained popularity in the past decade as high-performance, programmable parallel processors. The Compute Unified Device Architecture (CUDA) framework, a parallel computing architecture and programming model developed by NVIDIA, is one of the most widely used platforms in GPU computing. GPUs have been applied to a wide range of fields that require the solution of complex, computationally demanding problems. A GPU-accelerated indirect trajectory optimization methodology which uses the multiple shooting method and continuation is developed using the CUDA platform. The various algorithmic optimizations used to exploit the parallelism inherent in the indirect shooting method are described. The resulting rapid optimal control framework enables the construction of high quality optimal trajectories that satisfy problem-specific constraints and fully satisfy the necessary conditions of optimality. 
The benefits of the framework are highlighted by construction of maximum terminal velocity trajectories for a hypothetical long range weapon system. The techniques used to construct an initial guess from an analytic near-ballistic trajectory and the methods used to formulate the necessary conditions of optimality in a manner that is transparent to the designer are discussed. Various hypothetical mission scenarios that enforce different combinations of initial, terminal, interior point and path constraints demonstrate the rapid construction of complex trajectories without requiring any a-priori insight into the structure of the solutions. Trajectory problems of this kind were previously considered impractical to solve using indirect methods. The performance of the GPU-accelerated solver is found to be 2x--4x faster than MATLAB's bvp4c, even while running on GPU hardware that is five years behind the state-of-the-art.

  6. An efficient implementation of 3D high-resolution imaging for large-scale seismic data with GPU/CPU heterogeneous parallel computing

    NASA Astrophysics Data System (ADS)

    Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng

    2018-02-01

    De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, the associated computational efficiency is still the main problem in the processing of 3D, high-resolution images for real large-scale seismic data. In the current paper, we proposed a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPU). Then, we designed an imaging point parallel strategy to achieve an optimal parallel computing performance. Afterward, we adopted an asynchronous double buffering scheme for multi-stream to perform the GPU/CPU parallel computing. Moreover, several key optimization strategies of computation and storage based on the compute unified device architecture (CUDA) were adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFU), greatly improved the efficiency. A numerical example employing real large-scale, 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significantly reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.

  7. A Parallel Stochastic Framework for Reservoir Characterization and History Matching

    DOE PAGES

    Thomas, Sunil G.; Klie, Hector M.; Rodriguez, Adolfo A.; ...

    2011-01-01

    The spatial distribution of parameters that characterize the subsurface is never known to any reasonable level of accuracy required to solve the governing PDEs of multiphase flow or species transport through porous media. This paper presents a numerically cheap, yet efficient, accurate and parallel framework to estimate reservoir parameters, for example, medium permeability, using sensor information from measurements of the solution variables such as phase pressures, phase concentrations, fluxes, and seismic and well log data. Numerical results are presented to demonstrate the method.

  8. Component Framework for Loosely Coupled High Performance Integrated Plasma Simulations

    NASA Astrophysics Data System (ADS)

    Elwasif, W. R.; Bernholdt, D. E.; Shet, A. G.; Batchelor, D. B.; Foley, S.

    2010-11-01

    We present the design and implementation of a component-based simulation framework for the execution of coupled time-dependent plasma modeling codes. The Integrated Plasma Simulator (IPS) provides a flexible lightweight component model that streamlines the integration of standalone codes into coupled simulations. Standalone codes are adapted to the IPS component interface specification using a thin wrapping layer implemented in the Python programming language. The framework provides services for inter-component method invocation, configuration, task, and data management, asynchronous event management, simulation monitoring, and checkpoint/restart capabilities. Services are invoked, as needed, by the computational components to coordinate the execution of different aspects of coupled simulations on Massively Parallel Processing (MPP) machines. A common plasma state layer serves as the foundation for inter-component, file-based data exchange. The IPS design principles, implementation details, and execution model will be presented, along with an overview of several use cases.

  9. A framework for human microbiome research.

    PubMed

    2012-06-13

    A variety of microbial communities and their genes (the microbiome) exist throughout the human body, with fundamental roles in human health and disease. The National Institutes of Health (NIH)-funded Human Microbiome Project Consortium has established a population-scale framework to develop metagenomic protocols, resulting in a broad range of quality-controlled resources and data including standardized methods for creating, processing and interpreting distinct types of high-throughput metagenomic data available to the scientific community. Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far. In parallel, approximately 800 reference strains isolated from the human body have been sequenced. Collectively, these data represent the largest resource describing the abundance and variety of the human microbiome, while providing a framework for current and future studies.

  10. PIXIE3D: A Parallel, Implicit, eXtended MHD 3D Code.

    NASA Astrophysics Data System (ADS)

    Chacon, L.; Knoll, D. A.

    2004-11-01

    We report on the development of PIXIE3D, a 3D parallel, fully implicit Newton-Krylov extended primitive-variable MHD code in general curvilinear geometry. PIXIE3D employs a second-order, finite-volume-based spatial discretization that satisfies remarkable properties such as being conservative, solenoidal in the magnetic field, non-dissipative, and stable in the absence of physical dissipation (L. Chacón, Comput. Phys. Comm., submitted (2004)). PIXIE3D employs fully-implicit Newton-Krylov methods for the time advance. Currently, first and second-order implicit schemes are available, although higher-order temporal implicit schemes can be effortlessly implemented within the Newton-Krylov framework. A successful, scalable, MG physics-based preconditioning strategy, similar in concept to previous 2D MHD efforts (L. Chacón et al., J. Comput. Phys. 178 (1), 15-36 (2002); J. Comput. Phys. 188 (2), 573-592 (2003)), has been developed. We are currently in the process of parallelizing the code using the PETSc library, and a Newton-Krylov-Schwarz approach for the parallel treatment of the preconditioner. In this poster, we will report on both the serial and parallel performance of PIXIE3D, focusing primarily on scalability and CPU speedup vs. an explicit approach.

  11. PENTACLE: Parallelized particle-particle particle-tree code for planet formation

    NASA Astrophysics Data System (ADS)

    Iwasawa, Masaki; Oshino, Shoichi; Fujii, Michiko S.; Hori, Yasunori

    2017-10-01

    We have newly developed a parallelized particle-particle particle-tree code for planet formation, PENTACLE, which is a parallelized hybrid N-body integrator executed on a CPU-based (super)computer. PENTACLE uses a fourth-order Hermite algorithm to calculate gravitational interactions between particles within a cut-off radius and a Barnes-Hut tree method for gravity from particles beyond. It also implements an open-source library designed for full automatic parallelization of particle simulations, FDPS (Framework for Developing Particle Simulator), to parallelize a Barnes-Hut tree algorithm for a memory-distributed supercomputer. These allow us to handle 1-10 million particles in a high-resolution N-body simulation on CPU clusters for collisional dynamics, including physical collisions in a planetesimal disc. In this paper, we show the performance and the accuracy of PENTACLE in terms of $\tilde{R}_{\mathrm{cut}}$ and a time-step $\Delta t$. It turns out that the accuracy of a hybrid N-body simulation is controlled through $\Delta t / \tilde{R}_{\mathrm{cut}}$, and $\Delta t / \tilde{R}_{\mathrm{cut}} \sim 0.1$ is necessary to simulate accurately the accretion process of a planet for $\geq 10^6$ yr. For all those interested in large-scale particle simulations, PENTACLE, customized for planet formation, will be freely available from https://github.com/PENTACLE-Team/PENTACLE under the MIT licence.
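
    The near/far split that this abstract describes can be sketched as follows. This is an illustration only, not PENTACLE or FDPS code: the function name `split_by_cutoff` and the sample coordinates are hypothetical, and a sharp distance test stands in for however the real code separates the two regimes. Neighbours inside the cut-off radius would go to the direct (Hermite) integrator and the rest to the tree solver.

```python
import math


def split_by_cutoff(positions, i, r_cut):
    """Partition all particles other than i into those within r_cut of particle i
    (candidates for direct summation) and those beyond it (candidates for the tree)."""
    near, far = [], []
    for j, p in enumerate(positions):
        if j == i:
            continue
        if math.dist(positions[i], p) <= r_cut:
            near.append(j)
        else:
            far.append(j)
    return near, far


# Hypothetical particle positions in arbitrary length units.
positions = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 0.2, 0.0)]
near, far = split_by_cutoff(positions, i=0, r_cut=1.0)
```

    A larger cut-off moves more pairs into the expensive direct part, which is one reason the choice of cut-off and time-step matters for both cost and accuracy.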

  12. A parallel algorithm for step- and chain-growth polymerization in molecular dynamics.

    PubMed

    de Buyl, Pierre; Nies, Erik

    2015-04-07

    Classical Molecular Dynamics (MD) simulations provide insight into the properties of many soft-matter systems. In some situations, it is interesting to model the creation of chemical bonds, a process that is not part of the MD framework. In this context, we propose a parallel algorithm for step- and chain-growth polymerization that is based on a generic reaction scheme, works at a given intrinsic rate and produces continuous trajectories. We present an implementation in the ESPResSo++ simulation software and compare it with the corresponding feature in LAMMPS. For chain growth, our results are compared to the existing simulation literature. For step growth, a rate equation is proposed for the evolution of the crosslinker population that compares well to the simulations for low crosslinker functionality or for short times.

  13. A parallel algorithm for step- and chain-growth polymerization in molecular dynamics

    NASA Astrophysics Data System (ADS)

    de Buyl, Pierre; Nies, Erik

    2015-04-01

    Classical Molecular Dynamics (MD) simulations provide insight into the properties of many soft-matter systems. In some situations, it is interesting to model the creation of chemical bonds, a process that is not part of the MD framework. In this context, we propose a parallel algorithm for step- and chain-growth polymerization that is based on a generic reaction scheme, works at a given intrinsic rate and produces continuous trajectories. We present an implementation in the ESPResSo++ simulation software and compare it with the corresponding feature in LAMMPS. For chain growth, our results are compared to the existing simulation literature. For step growth, a rate equation is proposed for the evolution of the crosslinker population that compares well to the simulations for low crosslinker functionality or for short times.

  14. COMPUTERIZED TRAINING OF CRYOSURGERY – A SYSTEM APPROACH

    PubMed Central

    Keelan, Robert; Yamakawa, Soji; Shimada, Kenji; Rabin, Yoed

    2014-01-01

    The objective of the current study is to provide the foundation for a computerized training platform for cryosurgery. Consistent with clinical practice, the training process targets the correlation of the frozen region contour with the target region shape, using medical imaging and accepted criteria for clinical success. The current study focuses on system design considerations, including a bioheat transfer model, simulation techniques, optimal cryoprobe layout strategy, and a simulation core framework. Two fundamentally different approaches were considered for the development of a cryosurgery simulator, based on a finite-elements (FE) commercial code (ANSYS) and a proprietary finite-difference (FD) code. Results of this study demonstrate that the FE simulator is superior in terms of geometric modeling, while the FD simulator is superior in terms of runtime. Benchmarking results further indicate that the FD simulator is superior in terms of usage of memory resources, pre-processing, parallel processing, and post-processing. It is envisioned that future integration of a human-interface module and clinical data into the proposed computer framework will make computerized training of cryosurgery a practical reality. PMID:23995400

  15. Effects of Plant Traits on Ecosystem and Regional Processes: a Conceptual Framework for Predicting the Consequences of Global Change

    PubMed Central

    CHAPIN, F. STUART

    2003-01-01

    Human activities are causing widespread changes in the species composition of natural and managed ecosystems, but the consequences of these changes are poorly understood. This paper presents a conceptual framework for predicting the ecosystem and regional consequences of changes in plant species composition. Changes in species composition have greatest ecological effects when they modify the ecological factors that directly control (and respond to) ecosystem processes. These interactive controls include: functional types of organisms present in the ecosystem; soil resources used by organisms to grow and reproduce; modulators such as microclimate that influence the activity of organisms; disturbance regime; and human activities. Plant traits related to size and growth rate are particularly important because they determine the productive capacity of vegetation and the rates of decomposition and nitrogen mineralization. Because the same plant traits affect most key processes in the cycling of carbon and nutrients, changes in plant traits tend to affect most biogeochemical cycling processes in parallel. Plant traits also have landscape and regional effects through their effects on water and energy exchange and disturbance regime. PMID:12588725

  16. Visual Data-Analytics of Large-Scale Parallel Discrete-Event Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ross, Caitlin; Carothers, Christopher D.; Mubarak, Misbah

    Parallel discrete-event simulation (PDES) is an important tool in the codesign of extreme-scale systems because PDES provides a cost-effective way to evaluate designs of high-performance computing systems. Optimistic synchronization algorithms for PDES, such as Time Warp, allow events to be processed without global synchronization among the processing elements. A rollback mechanism is provided when events are processed out of timestamp order. Although optimistic synchronization protocols enable the scalability of large-scale PDES, the performance of the simulations must be tuned to reduce the number of rollbacks and provide an improved simulation runtime. To enable efficient large-scale optimistic simulations, one has to gain insight into the factors that affect the rollback behavior and simulation performance. We developed a tool for ROSS model developers that gives them detailed metrics on the performance of their large-scale optimistic simulations at varying levels of simulation granularity. Model developers can use this information for parameter tuning of optimistic simulations in order to achieve better runtime and fewer rollbacks. In this work, we instrument the ROSS optimistic PDES framework to gather detailed statistics about the simulation engine. We have also developed an interactive visualization interface that uses the data collected by the ROSS instrumentation to understand the underlying behavior of the simulation engine. The interface connects real time to virtual time in the simulation and provides the ability to view simulation data at different granularities. We demonstrate the usefulness of our framework by performing a visual analysis of the dragonfly network topology model provided by the CODES simulation framework built on top of ROSS. The instrumentation needs to minimize overhead in order to accurately collect data about the simulation performance.
To ensure that the instrumentation does not introduce unnecessary overhead, we perform a scaling study that compares instrumented ROSS simulations with their noninstrumented counterparts in order to determine the amount of perturbation when running at different simulation scales.
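
    The rollback mechanism mentioned in this abstract can be shown with a toy example. It is a minimal sketch under stated assumptions, not ROSS code: the class `LogicalProcess`, its attributes, and the toy state update are all hypothetical, rollback is done by restoring saved state (a real engine may use another technique), and cancelling messages already sent to other processes is left out. The `rollbacks` counter is the kind of metric such instrumentation would collect.

```python
class LogicalProcess:
    """Toy optimistic logical process: executes events as they arrive and rolls
    back when a straggler (an event older than local virtual time) shows up."""

    def __init__(self):
        self.state = 0        # toy model state; the update below is order-dependent
        self.lvt = 0.0        # local virtual time
        self.processed = []   # history of (timestamp, payload, state_before)
        self.rollbacks = 0    # number of rollbacks performed so far

    def _execute(self, ts, payload):
        self.processed.append((ts, payload, self.state))
        self.state = self.state * 2 + payload
        self.lvt = ts

    def receive(self, ts, payload):
        if ts >= self.lvt:
            self._execute(ts, payload)
            return
        # Straggler: undo every event with a later timestamp, restoring saved state.
        self.rollbacks += 1
        undone = []
        while self.processed and self.processed[-1][0] > ts:
            old_ts, old_payload, state_before = self.processed.pop()
            self.state = state_before
            undone.append((old_ts, old_payload))
        self.lvt = self.processed[-1][0] if self.processed else 0.0
        # Execute the straggler, then replay the undone events in timestamp order.
        self._execute(ts, payload)
        for old_ts, old_payload in sorted(undone):
            self._execute(old_ts, old_payload)


lp = LogicalProcess()
for ts, payload in [(1.0, 1), (3.0, 3), (2.0, 2)]:  # the third event arrives late
    lp.receive(ts, payload)
```

    After the late event is handled, the state matches what strictly timestamp-ordered processing would have produced, at the cost of one rollback and one re-executed event.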

  17. Creation of the BMA ensemble for SST using a parallel processing technique

    NASA Astrophysics Data System (ADS)

    Kim, Kwangjin; Lee, Yang Won

    2013-10-01

    Despite serving the same purpose, each satellite product has a different value because of its inescapable uncertainty. In addition, satellite products have been calculated for a long time, and they are numerous and varied. Efforts to reduce the uncertainty and to deal with the enormous volume of data are therefore necessary. In this paper, we create an ensemble Sea Surface Temperature (SST) using MODIS Aqua, MODIS Terra and COMS (Communication Ocean and Meteorological Satellite). We used Bayesian Model Averaging (BMA) as the ensemble method. The principle of BMA is to synthesize the conditional probability density function (PDF) using the posterior probability as a weight. The posterior probability is estimated using the EM algorithm. The BMA PDF is obtained as a weighted average. As a result, the ensemble SST showed the lowest RMSE and MAE, which proves the applicability of BMA for satellite data ensembles. As future work, parallel processing techniques using the Hadoop framework will be adopted for more efficient computation of very big satellite data.
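
    The weighting scheme summarized in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes Gaussian member PDFs with a single shared spread, the function names (`gaussian_pdf`, `fit_bma_weights`, `bma_mean`) are hypothetical, and the numbers are made-up values rather than satellite data. EM alternates between computing how likely each member is to explain each observation and re-estimating the weights from those responsibilities.

```python
import math


def gaussian_pdf(y, mean, sigma):
    """Density of a normal distribution with the given mean and spread at y."""
    return math.exp(-0.5 * ((y - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def fit_bma_weights(forecasts, observations, n_iter=100):
    """Estimate member weights and a shared spread with EM.

    forecasts[k][t] is the value of member k at sample t; observations[t] is the truth.
    """
    n_members, n_obs = len(forecasts), len(observations)
    weights = [1.0 / n_members] * n_members
    sq_err = [[(observations[t] - forecasts[k][t]) ** 2 for t in range(n_obs)]
              for k in range(n_members)]
    sigma = math.sqrt(sum(map(sum, sq_err)) / (n_members * n_obs)) or 1.0
    for _ in range(n_iter):
        # E-step: responsibility of member k for observation t.
        z = [[0.0] * n_obs for _ in range(n_members)]
        for t in range(n_obs):
            dens = [weights[k] * gaussian_pdf(observations[t], forecasts[k][t], sigma)
                    for k in range(n_members)]
            total = sum(dens)
            for k in range(n_members):
                z[k][t] = dens[k] / total if total > 0.0 else 1.0 / n_members
        # M-step: update the weights and the shared spread.
        weights = [sum(z[k]) / n_obs for k in range(n_members)]
        var = sum(z[k][t] * sq_err[k][t]
                  for k in range(n_members) for t in range(n_obs)) / n_obs
        sigma = max(math.sqrt(var), 1e-6)
    return weights, sigma


def bma_mean(weights, member_values):
    """BMA point estimate: weighted average of the member values."""
    return sum(w * v for w, v in zip(weights, member_values))


# Made-up example: the first member tracks the observations closely, the second does not.
obs = [20.0, 21.0, 19.5, 22.0, 20.5]
members = [[20.1, 20.9, 19.6, 21.9, 20.6],
           [22.0, 19.0, 21.5, 20.0, 22.5]]
weights, sigma = fit_bma_weights(members, obs)
```

    In this made-up case the member that tracks the observations closely receives nearly all of the weight, so the ensemble estimate follows it.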

  18. gadfly: A pandas-based Framework for Analyzing GADGET Simulation Data

    NASA Astrophysics Data System (ADS)

    Hummel, Jacob A.

    2016-11-01

    We present the first public release (v0.1) of the open-source gadget Dataframe Library: gadfly. The aim of this package is to leverage the capabilities of the broader python scientific computing ecosystem by providing tools for analyzing simulation data from the astrophysical simulation codes gadget and gizmo using pandas, a thoroughly documented, open-source library providing high-performance, easy-to-use data structures that is quickly becoming the standard for data analysis in python. Gadfly is a framework for analyzing particle-based simulation data stored in the HDF5 format using pandas DataFrames. The package enables efficient memory management, includes utilities for unit handling, coordinate transformations, and parallel batch processing, and provides highly optimized routines for visualizing smoothed-particle hydrodynamics data sets.

  19. GPU-completeness: theory and implications

    NASA Astrophysics Data System (ADS)

    Lin, I.-Jong

    2011-01-01

    This paper formalizes a major insight into a class of algorithms that relate parallelism and performance. The purpose of this paper is to define a class of algorithms that trades off parallelism for quality of result (e.g. visual quality, compression rate), and we propose a similar method for algorithmic classification based on NP-Completeness techniques, applied toward parallel acceleration. We will define this class of algorithm as "GPU-Complete" and will postulate the necessary properties of the algorithms for admission into this class. We will also formally relate this algorithmic space and the imaging algorithms space. This concept is based upon our experience in the print production area where GPUs (Graphics Processing Units) have shown a substantial cost/performance advantage within the context of HP-delivered enterprise services and commercial printing infrastructure. While CPUs and GPUs are converging in their underlying hardware and functional blocks, their system behaviors are clearly distinct in many ways: memory system design, programming paradigms, and massively parallel SIMD architecture. There are applications that are clearly suited to each architecture: for CPU: language compilation, word processing, operating systems, and other applications that are highly sequential in nature; for GPU: video rendering, particle simulation, pixel color conversion, and other problems clearly amenable to massive parallelization. While GPUs are establishing themselves as a second, distinct computing architecture from CPUs, their end-to-end system cost/performance advantage in certain parts of computation informs the structure of algorithms and their efficient parallel implementations. While GPUs are merely one type of architecture for parallelization, we show that their introduction into the design space of printing systems demonstrates the trade-offs against competing multi-core, FPGA, and ASIC architectures. While each architecture has its own optimal application, we believe that the selection of architecture can be defined in terms of properties of GPU-Completeness. For a well-defined subset of algorithms, GPU-Completeness is intended to connect the parallelism, algorithms and efficient architectures into a unified framework to show that multiple layers of parallel implementation are guided by the same underlying trade-off.

  20. A pluggable framework for parallel pairwise sequence search.

    PubMed

    Archuleta, Jeremy; Feng, Wu-chun; Tilevich, Eli

    2007-01-01

    The current and near future of the computing industry is one of multi-core and multi-processor technology. Most existing sequence-search tools have been designed with a focus on single-core, single-processor systems. This discrepancy between software design and hardware architecture substantially hinders sequence-search performance by not allowing full utilization of the hardware. This paper presents a novel framework that will aid the conversion of serial sequence-search tools into a parallel version that can take full advantage of the available hardware. The framework, which is based on a software architecture called mixin layers with refined roles, enables modules to be plugged into the framework with minimal effort. The inherent modular design improves maintenance and extensibility, thus opening up a plethora of opportunities for advanced algorithmic features to be developed and incorporated while routine maintenance of the codebase persists.

  1. Equalizer: a scalable parallel rendering framework.

    PubMed

    Eilemann, Stefan; Makhinya, Maxim; Pajarola, Renato

    2009-01-01

    Continuing improvements in CPU and GPU performance, as well as increasing multi-core processor and cluster-based parallelism, demand flexible and scalable parallel rendering solutions that can exploit multipipe hardware-accelerated graphics. In fact, to achieve interactive visualization, scalable rendering systems are essential to cope with the rapid growth of data sets. However, parallel rendering systems are non-trivial to develop, and often only application-specific implementations have been proposed. The task of developing a scalable parallel rendering framework is even more difficult if it should be generic enough to support various types of data and visualization applications, and at the same time work efficiently on a cluster with distributed graphics cards. In this paper we introduce a novel system called Equalizer, a toolkit for scalable parallel rendering based on OpenGL which provides an application programming interface (API) to develop scalable graphics applications for a wide range of systems ranging from large distributed visualization clusters and multi-processor multipipe graphics systems to single-processor single-pipe desktop machines. We describe the system architecture and the basic API, discuss its advantages over previous approaches, and present example configurations and usage scenarios as well as scalability results.

  2. Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment.

    PubMed

    Meng, Bowen; Pratx, Guillem; Xing, Lei

    2011-12-01

    Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT/CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. In this work, we accelerated the Feldkamp-Davis-Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from 4DCT scans were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and a Reduce function was used to aggregate those partial backprojections into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT/CT reconstruction algorithm. Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10^-7. Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. An ultrafast, reliable and scalable 4D CBCT/CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment.
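
    The Map/Reduce split described in this abstract can be illustrated with a small sketch. It is not the authors' Hadoop code: the backproject function is a placeholder that only mimics the additive nature of backprojection (it is not the FDK algorithm), and every name and value below is hypothetical. The point is that partial volumes computed from independent subsets of projections can be summed in any grouping.

```python
from functools import reduce

N_VOXELS = 8  # toy volume size (assumption)


def backproject(projection):
    """Placeholder contribution of one projection to every voxel.

    A real implementation would filter the projection and smear it
    along the cone-beam geometry; this stand-in is only additive.
    """
    angle_index, values = projection
    return [values[(v + angle_index) % len(values)] for v in range(N_VOXELS)]


def map_subset(projections):
    """Map step: accumulate a partial volume from one subset."""
    partial = [0.0] * N_VOXELS
    for projection in projections:
        contribution = backproject(projection)
        partial = [a + b for a, b in zip(partial, contribution)]
    return partial


def reduce_volumes(left, right):
    """Reduce step: add two partial volumes voxel by voxel."""
    return [a + b for a, b in zip(left, right)]


def reconstruct(projections, n_subsets):
    """Split the projections, map each subset, then reduce the partials."""
    subsets = [projections[i::n_subsets] for i in range(n_subsets)]
    partials = map(map_subset, subsets)  # a cluster would run these in parallel
    return reduce(reduce_volumes, partials)


# Made-up projection data: (angle index, detector values).
projections = [(i, [0.1 * i + 0.01 * j for j in range(N_VOXELS)])
               for i in range(12)]

single_node = reconstruct(projections, 1)
four_subsets = reconstruct(projections, 4)
max_diff = max(abs(a - b) for a, b in zip(single_node, four_subsets))
```

    The two results agree up to floating-point rounding, since only the order of the additions changes between the serial and the partitioned run.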

  3. Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment

    PubMed Central

    Meng, Bowen; Pratx, Guillem; Xing, Lei

    2011-01-01

    Purpose: Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT/CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. Methods: In this work, we accelerated the Feldkamp–Davis–Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from 4DCT scans were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and a Reduce function was used to aggregate those partial backprojections into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed on a commercial cloud computing environment using the proposed 4D CBCT/CT reconstruction algorithm. Results: Speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. Root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10^-7. Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed successfully with identical results even when half of the nodes were manually terminated in the middle of the process. Conclusions: An ultrafast, reliable and scalable 4D CBCT/CT reconstruction method was developed using the MapReduce framework. Unlike other parallel computing approaches, the parallelization and speedup required little modification of the original reconstruction code. MapReduce provides an efficient and fault tolerant means of solving large-scale computing problems in a cloud computing environment. PMID:22149842

  4. An Evaluation of Different Statistical Targets for Assembling Parallel Forms in Item Response Theory

    PubMed Central

    Ali, Usama S.; van Rijn, Peter W.

    2015-01-01

    Assembly of parallel forms is an important step in the test development process. Therefore, choosing a suitable theoretical framework to generate well-defined test specifications is critical. The performance of different statistical targets of test specifications using the test characteristic curve (TCC) and the test information function (TIF) was investigated. Test length, the number of test forms, and content specifications are considered as well. The TCC target results in forms that are parallel in difficulty, but not necessarily in terms of precision. Conversely, test forms created using a TIF target are parallel in terms of precision, but not necessarily in terms of difficulty. Because the focus is sometimes on either the TIF or the TCC alone, differences in either difficulty or precision can arise. Differences in difficulty can be mitigated by equating, but differences in precision cannot. In a series of simulations using a real item bank, the two-parameter logistic model, and mixed integer linear programming for automated test assembly, these differences were found to be quite substantial. When both TIF and TCC are combined into one target, with their relative importance adjusted, these differences can be made to disappear.
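
    The difference between the two targets can be made concrete with a short sketch under the two-parameter logistic (2PL) model named in the abstract. It uses the standard forms P(theta) = 1 / (1 + exp(-a (theta - b))), TCC(theta) = sum of P_i, and TIF(theta) = sum of a_i^2 P_i (1 - P_i), without the optional 1.7 scaling constant. The two toy forms are invented for illustration and do not come from the study's item bank.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (discrimination a, difficulty b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def tcc(items, theta):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p_correct(theta, a, b) for a, b in items)


def tif(items, theta):
    """Test information function: sum of the item informations at theta."""
    total = 0.0
    for a, b in items:
        p = p_correct(theta, a, b)
        total += a * a * p * (1.0 - p)
    return total


# Two hypothetical four-item forms with equal difficulty parameters
# but different discrimination parameters.
form_a = [(1.0, 0.0)] * 4
form_b = [(2.0, 0.0)] * 4

theta = 0.0
tcc_a, tcc_b = tcc(form_a, theta), tcc(form_b, theta)  # both 2.0
tif_a, tif_b = tif(form_a, theta), tif(form_b, theta)  # 1.0 and 4.0
```

    At theta = 0 both forms have the same expected score, yet form_b carries four times the information, so matching the TCC at a point does not by itself match precision.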

  5. Argonne Simulation Framework for Intelligent Transportation Systems

    DOT National Transportation Integrated Search

    1996-01-01

    A simulation framework has been developed which defines a high-level architecture for a large-scale, comprehensive, scalable simulation of an Intelligent Transportation System (ITS). The simulator is designed to run on parallel computers and distribu...

  6. Π4U: A high performance computing framework for Bayesian uncertainty quantification of complex models

    NASA Astrophysics Data System (ADS)

    Hadjidoukas, P. E.; Angelikopoulos, P.; Papadimitriou, C.; Koumoutsakos, P.

    2015-03-01

    We present Π4U, an extensible framework, for non-intrusive Bayesian Uncertainty Quantification and Propagation (UQ+P) of complex and computationally demanding physical models, that can exploit massively parallel computer architectures. The framework incorporates Laplace asymptotic approximations as well as stochastic algorithms, along with distributed numerical differentiation and task-based parallelism for heterogeneous clusters. Sampling is based on the Transitional Markov Chain Monte Carlo (TMCMC) algorithm and its variants. The optimization tasks associated with the asymptotic approximations are treated via the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). A modified subset simulation method is used for posterior reliability measurements of rare events. The framework accommodates scheduling of multiple physical model evaluations based on an adaptive load balancing library and shows excellent scalability. In addition to the software framework, we also provide guidelines as to the applicability and efficiency of Bayesian tools when applied to computationally demanding physical models. Theoretical and computational developments are demonstrated with applications drawn from molecular dynamics, structural dynamics and granular flow.

  7. Coupling between a multi-physics workflow engine and an optimization framework

    NASA Astrophysics Data System (ADS)

    Di Gallo, L.; Reux, C.; Imbeaux, F.; Artaud, J.-F.; Owsiak, M.; Saoutic, B.; Aiello, G.; Bernardi, P.; Ciraolo, G.; Bucalossi, J.; Duchateau, J.-L.; Fausser, C.; Galassi, D.; Hertout, P.; Jaboulay, J.-C.; Li-Puma, A.; Zani, L.

    2016-03-01

    A generic coupling method between a multi-physics workflow engine and an optimization framework is presented in this paper. The coupling architecture has been developed in order to preserve the integrity of the two frameworks. The objective is to make it possible to replace a framework, a workflow or an optimizer with another one without changing the whole coupling procedure or modifying the main content of each framework. The coupling is achieved by using a socket-based communication library for exchanging data between the two frameworks. Among the algorithms provided by optimization frameworks, Genetic Algorithms (GAs) have demonstrated their efficiency on single- and multiple-criteria optimization. In addition to their robustness, GAs can handle non-valid data which may appear during the optimization. Consequently, GAs work in most general cases. A parallelized framework has been developed to reduce the time spent on optimization and on the evaluation of large samples. A test has shown good scaling efficiency for this parallelized framework. This coupling method has been applied to the case of SYCOMORE (SYstem COde for MOdeling tokamak REactor), which is a system code developed in the form of a modular workflow for designing magnetic fusion reactors. The coupling of SYCOMORE with the optimization platform URANIE enables design optimization along various figures of merit and constraints.

  8. Carpet: Adaptive Mesh Refinement for the Cactus Framework

    NASA Astrophysics Data System (ADS)

    Schnetter, Erik; Hawley, Scott; Hawke, Ian

    2016-11-01

    Carpet is an adaptive mesh refinement and multi-patch driver for the Cactus Framework (ascl:1102.013). Cactus is a software framework for solving time-dependent partial differential equations on block-structured grids, and Carpet acts as driver layer providing adaptive mesh refinement, multi-patch capability, as well as parallelization and efficient I/O.

  9. Does the Intel Xeon Phi processor fit HEP workloads?

    NASA Astrophysics Data System (ADS)

    Nowak, A.; Bitzes, G.; Dotti, A.; Lazzaro, A.; Jarp, S.; Szostek, P.; Valsan, L.; Botezatu, M.; Leduc, J.

    2014-06-01

    This paper summarizes the five years of CERN openlab's efforts focused on the Intel Xeon Phi co-processor, from the time of its inception to public release. We consider the architecture of the device vis a vis the characteristics of HEP software and identify key opportunities for HEP processing, as well as scaling limitations. We report on improvements and speedups linked to parallelization and vectorization on benchmarks involving software frameworks such as Geant4 and ROOT. Finally, we extrapolate current software and hardware trends and project them onto accelerators of the future, with the specifics of offline and online HEP processing in mind.

  10. Shared direct memory access on the Explorer 2-LX

    NASA Technical Reports Server (NTRS)

    Musgrave, Jeffrey L.

    1990-01-01

    Advances in Expert System technology and Artificial Intelligence have provided a framework for applying automated intelligence to the solution of problems which were generally perceived as intractable using more classical approaches. As a result, hybrid architectures and parallel processing capability have become more common in computing environments. The Texas Instruments Explorer II-LX is an example of a machine which combines a symbolic processing environment and a computationally oriented environment in a single chassis for integrated problem solutions. This user's manual is an attempt to make these capabilities more accessible to a wider range of engineers and programmers with problems well suited to solution in such an environment.

  11. CAVIAR: a 45k neuron, 5M synapse, 12G connects/s AER hardware sensory-processing- learning-actuating system for high-speed visual object recognition and tracking.

    PubMed

    Serrano-Gotarredona, Rafael; Oster, Matthias; Lichtsteiner, Patrick; Linares-Barranco, Alejandro; Paz-Vicente, Rafael; Gomez-Rodriguez, Francisco; Camunas-Mesa, Luis; Berner, Raphael; Rivas-Perez, Manuel; Delbruck, Tobi; Liu, Shih-Chii; Douglas, Rodney; Hafliger, Philipp; Jimenez-Moreno, Gabriel; Civit Ballcels, Anton; Serrano-Gotarredona, Teresa; Acosta-Jimenez, Antonio J; Linares-Barranco, Bernabé

    2009-09-01

    This paper describes CAVIAR, a massively parallel hardware implementation of a spike-based sensing-processing-learning-actuating system inspired by the physiology of the nervous system. CAVIAR uses the asynchronous address-event representation (AER) communication framework and was developed in the context of a European Union funded project. It has four custom mixed-signal AER chips, five custom digital AER interface components, 45k neurons (spiking cells), up to 5M synapses, performs 12G synaptic operations per second, and achieves millisecond object recognition and tracking latencies.

  12. The Climate Data Analytic Services (CDAS) Framework.

    NASA Astrophysics Data System (ADS)

    Maxwell, T. P.; Duffy, D.

    2016-12-01

    Faced with unprecedented growth in climate data volume and demand, NASA has developed the Climate Data Analytic Services (CDAS) framework. This framework enables scientists to execute data processing workflows combining common analysis operations in a high performance environment close to the massive data stores at NASA. The data is accessed in standard (NetCDF, HDF, etc.) formats in a POSIX file system and processed using vetted climate data analysis tools (ESMF, CDAT, NCO, etc.). A dynamic caching architecture enables interactive response times. CDAS utilizes Apache Spark for parallelization and a custom array framework for processing huge datasets within limited memory spaces. CDAS services are accessed via a WPS API being developed in collaboration with the ESGF Compute Working Team to support server-side analytics for ESGF. The API can be accessed using direct web service calls, a Python script, a Unix-like shell client, or a JavaScript-based web application. Client packages in Python, Scala, or JavaScript contain everything needed to make CDAS requests. The CDAS architecture brings together the tools, data storage, and high-performance computing required for timely analysis of large-scale data sets, where the data resides, to ultimately produce societal benefits. It is currently deployed at NASA in support of the Collaborative REAnalysis Technical Environment (CREATE) project, which centralizes numerous global reanalysis datasets onto a single advanced data analytics platform. This service permits decision makers to investigate climate changes around the globe, inspect model trends and variability, and compare multiple reanalysis datasets.

  13. PFLOTRAN User Manual: A Massively Parallel Reactive Flow and Transport Model for Describing Surface and Subsurface Processes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lichtner, Peter C.; Hammond, Glenn E.; Lu, Chuan

    PFLOTRAN solves a system of generally nonlinear partial differential equations describing multi-phase, multicomponent and multiscale reactive flow and transport in porous materials. The code is designed to run on massively parallel computing architectures as well as workstations and laptops (e.g. Hammond et al., 2011). Parallelization is achieved through domain decomposition using the PETSc (Portable Extensible Toolkit for Scientific Computation) libraries for the parallelization framework (Balay et al., 1997). PFLOTRAN has been developed from the ground up for parallel scalability and has been run on up to 2^18 processor cores with problem sizes up to 2 billion degrees of freedom. Written in object oriented Fortran 90, the code requires the latest compilers compatible with Fortran 2003. At the time of this writing this requires gcc 4.7.x, Intel 12.1.x and PGC compilers. As a requirement of running problems with a large number of degrees of freedom, PFLOTRAN allows reading input data that is too large to fit into memory allotted to a single processor core. The current limitation to the problem size PFLOTRAN can handle is the limitation of the HDF5 file format used for parallel IO to 32 bit integers. Noting that 2^32 = 4,294,967,296, this gives an estimate of the maximum problem size that can be currently run with PFLOTRAN. Hopefully this limitation will be remedied in the near future.

  14. Neptune: An astrophysical smooth particle hydrodynamics code for massively parallel computer architectures

    NASA Astrophysics Data System (ADS)

    Sandalski, Stou

    Smooth particle hydrodynamics is an efficient method for modeling the dynamics of fluids. It is commonly used to simulate astrophysical processes such as binary mergers. We present a newly developed GPU accelerated smooth particle hydrodynamics code for astrophysical simulations. The code is named neptune after the Roman god of water. It is written in OpenMP parallelized C++ and OpenCL and includes octree based hydrodynamic and gravitational acceleration. The design relies on object-oriented methodologies in order to provide a flexible and modular framework that can be easily extended and modified by the user. Several pre-built scenarios for simulating collisions of polytropes and black-hole accretion are provided. The code is released under the MIT Open Source license and publicly available at http://code.google.com/p/neptune-sph/.

  15. Surrogates for numerical simulations; optimization of eddy-promoter heat exchangers

    NASA Technical Reports Server (NTRS)

    Patera, Anthony T.; Patera, Anthony

    1993-01-01

    Although the advent of fast and inexpensive parallel computers has rendered numerous previously intractable calculations feasible, many numerical simulations remain too resource-intensive to be directly inserted in engineering optimization efforts. An attractive alternative to direct insertion considers models for computational systems: the expensive simulation is invoked only to construct and validate a simplified, input-output model; this simplified input-output model then serves as a simulation surrogate in subsequent engineering optimization studies. A simple 'Bayesian-validated' statistical framework for the construction, validation, and purposive application of static computer simulation surrogates is presented. As an example, dissipation-transport optimization of laminar-flow eddy-promoter heat exchangers is considered: parallel spectral element Navier-Stokes calculations serve to construct and validate surrogates for the flowrate and Nusselt number; these surrogates then represent the originating Navier-Stokes equations in the ensuing design process.

  16. Accelerating electron tomography reconstruction algorithm ICON with GPU.

    PubMed

    Chen, Yu; Wang, Zihao; Zhang, Jingrong; Li, Lun; Wan, Xiaohua; Sun, Fei; Zhang, Fa

    2017-01-01

    Electron tomography (ET) plays an important role in studying in situ cell ultrastructure in three-dimensional space. Due to limited tilt angles, ET reconstruction always suffers from the "missing wedge" problem. With a validation procedure, iterative compressed-sensing optimized NUFFT reconstruction (ICON) demonstrates its power in the restoration of validated missing information for low-SNR biological ET datasets. However, the huge computational demand has become a major problem for the application of ICON. In this work, we analyzed the framework of ICON and classified the operations of the major steps of ICON reconstruction into three types. Accordingly, we designed parallel strategies and implemented them on graphics processing units (GPUs) to generate a parallel program, ICON-GPU. While maintaining high accuracy, ICON-GPU achieves a speedup of up to 83.7× over its CPU version, greatly relieving ICON's dependence on computing resources.

  17. A hybrid framework of first principles molecular orbital calculations and a three-dimensional integral equation theory for molecular liquids: Multi-center molecular Ornstein-Zernike self-consistent field approach

    NASA Astrophysics Data System (ADS)

    Kido, Kentaro; Kasahara, Kento; Yokogawa, Daisuke; Sato, Hirofumi

    2015-07-01

    In this study, we report the development of a new quantum mechanics/molecular mechanics (QM/MM)-type framework to describe chemical processes in solution by combining standard molecular-orbital calculations with a three-dimensional formalism of integral equation theory for molecular liquids (multi-center molecular Ornstein-Zernike (MC-MOZ) method). The theoretical procedure is very similar to the 3D-reference interaction site model self-consistent field (RISM-SCF) approach. Since the MC-MOZ method is highly parallelized for computation, the present approach has the potential to be one of the most efficient procedures to treat chemical processes in solution. Benchmark tests to check the validity of this approach were performed for two solute (solute water and formaldehyde) systems and a simple SN2 reaction (Cl- + CH3Cl → ClCH3 + Cl-) in aqueous solution. The results for solute molecular properties and solvation structures obtained by the present approach were in reasonable agreement with those obtained by other hybrid frameworks and experiments. In particular, the results of the proposed approach are in excellent agreement with those of 3D-RISM-SCF.

  18. A hybrid framework of first principles molecular orbital calculations and a three-dimensional integral equation theory for molecular liquids: multi-center molecular Ornstein-Zernike self-consistent field approach.

    PubMed

    Kido, Kentaro; Kasahara, Kento; Yokogawa, Daisuke; Sato, Hirofumi

    2015-07-07

    In this study, we report the development of a new quantum mechanics/molecular mechanics (QM/MM)-type framework to describe chemical processes in solution by combining standard molecular-orbital calculations with a three-dimensional formalism of integral equation theory for molecular liquids (multi-center molecular Ornstein-Zernike (MC-MOZ) method). The theoretical procedure is very similar to the 3D-reference interaction site model self-consistent field (RISM-SCF) approach. Since the MC-MOZ method is highly parallelized for computation, the present approach has the potential to be one of the most efficient procedures to treat chemical processes in solution. Benchmark tests to check the validity of this approach were performed for two solute (solute water and formaldehyde) systems and a simple SN2 reaction (Cl(-) + CH3Cl → ClCH3 + Cl(-)) in aqueous solution. The results for solute molecular properties and solvation structures obtained by the present approach were in reasonable agreement with those obtained by other hybrid frameworks and experiments. In particular, the results of the proposed approach are in excellent agreement with those of 3D-RISM-SCF.

  19. GELATIO: a general framework for modular digital analysis of high-purity Ge detector signals

    NASA Astrophysics Data System (ADS)

    Agostini, M.; Pandola, L.; Zavarise, P.; Volynets, O.

    2011-08-01

    GELATIO is a new software framework for advanced data analysis and digital signal processing developed for the GERDA neutrinoless double beta decay experiment. The framework is tailored to handle the full analysis flow of signals recorded by high purity Ge detectors and photo-multipliers from the veto counters. It is designed to support a multi-channel modular and flexible analysis, widely customizable by the user either via human-readable initialization files or via a graphical interface. The framework organizes the data into a multi-level structure, from the raw data up to the condensed analysis parameters, and includes tools and utilities to handle the data stream between the different levels. GELATIO is implemented in C++. It relies upon ROOT and its extension TAM, which provides compatibility with PROOF, enabling the software to run in parallel on clusters of computers or many-core machines. It was tested on different platforms and benchmarked in several GERDA-related applications. A stable version is presently available for the GERDA Collaboration and it is used to provide the reference analysis of the experiment data.

  20. MRXCAT: Realistic numerical phantoms for cardiovascular magnetic resonance

    PubMed Central

    2014-01-01

    Background: Computer simulations are important for validating novel image acquisition and reconstruction strategies. In cardiovascular magnetic resonance (CMR), numerical simulations need to combine anatomical information and the effects of cardiac and/or respiratory motion. To this end, a framework for realistic CMR simulations is proposed and its use for image reconstruction from undersampled data is demonstrated. Methods: The extended Cardiac-Torso (XCAT) anatomical phantom framework with various motion options was used as a basis for the numerical phantoms. Different tissue, dynamic contrast and signal models, multiple receiver coils and noise are simulated. Arbitrary trajectories and undersampled acquisition can be selected. The utility of the framework is demonstrated for accelerated cine and first-pass myocardial perfusion imaging using k-t PCA and k-t SPARSE. Results: MRXCAT phantoms allow for realistic simulation of CMR including optional cardiac and respiratory motion. Example reconstructions from simulated undersampled k-t parallel imaging demonstrate the feasibility of simulated acquisition and reconstruction using the presented framework. Myocardial blood flow assessment from simulated myocardial perfusion images highlights the suitability of MRXCAT for quantitative post-processing simulation. Conclusion: The proposed MRXCAT phantom framework enables versatile and realistic simulations of CMR including breathhold and free-breathing acquisitions. PMID:25204441

  1. Scaling Up Coordinate Descent Algorithms for Large ℓ1 Regularization Problems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Scherrer, Chad; Halappanavar, Mahantesh; Tewari, Ambuj

    2012-07-03

    We present a generic framework for parallel coordinate descent (CD) algorithms that has as special cases the original sequential algorithms of Cyclic CD and Stochastic CD, as well as the recent parallel Shotgun algorithm of Bradley et al. We introduce two novel parallel algorithms that are also special cases, Thread-Greedy CD and Coloring-Based CD, and give performance measurements for an OpenMP implementation of these.
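    The idea that Cyclic CD and Stochastic CD are special cases of one framework can be illustrated with a small sequential sketch. It assumes the lasso objective 0.5*||y - Xw||^2 + lam*||w||_1; the function names and the `select` callback are illustrative only and do not come from the paper. The parallel variants (Shotgun, Thread-Greedy, Coloring-Based) would update several coordinates per step and are not shown.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal step for the l1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def coordinate_descent(X, y, lam, select, sweeps=100):
    """Minimise 0.5*||y - Xw||^2 + lam*||w||_1 one coordinate at a time.

    select(step, n_features) returns the coordinate to update next;
    swapping it changes the variant (cyclic, stochastic, ...).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # residual = y - Xw, kept in sync with w
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(d)]
    for step in range(sweeps * d):
        j = select(step, d)
        if col_sq[j] == 0.0:
            continue
        # Correlation of column j with the residual, with the current
        # contribution of w[j] added back in.
        rho = sum(X[i][j] * residual[i] for i in range(n)) + col_sq[j] * w[j]
        new_wj = soft_threshold(rho, lam) / col_sq[j]
        delta = new_wj - w[j]
        if delta != 0.0:
            for i in range(n):
                residual[i] -= X[i][j] * delta
            w[j] = new_wj
    return w


def cyclic(step, d):
    """Cyclic CD: visit coordinates in round-robin order."""
    return step % d


def make_stochastic(seed=0):
    """Stochastic CD: pick a coordinate uniformly at random each step."""
    rng = random.Random(seed)
    return lambda step, d: rng.randrange(d)
```

    Passing `cyclic` or `make_stochastic()` as `select` switches between the two sequential variants without touching the update rule, which is the sense in which they share one framework here.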

  2. YAPPA: a Compiler-Based Parallelization Framework for Irregular Applications on MPSoCs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lovergine, Silvia; Tumeo, Antonino; Villa, Oreste

    Modern embedded systems include hundreds of cores. Because of the difficulty in providing a fast, coherent memory architecture, these systems usually rely on non-coherent, non-uniform memory architectures with private memories for each core. However, programming these systems poses significant challenges. The developer must extract large amounts of parallelism, while orchestrating communication among cores to optimize application performance. These issues become even more significant with irregular applications, which present data sets difficult to partition, unpredictable memory accesses, unbalanced control flow and fine-grained communication. Hand-optimizing every single aspect is hard and time-consuming, and it often does not lead to the expected performance. There is a growing gap between such complex and highly-parallel architectures and the high level languages used to describe the specification, which were designed for simpler systems and do not consider these new issues. In this paper we introduce YAPPA (Yet Another Parallel Programming Approach), a compilation framework for the automatic parallelization of irregular applications on modern MPSoCs based on LLVM. We start by considering an efficient parallel programming approach for irregular applications on distributed memory systems. We then propose a set of transformations that can reduce the development and optimization effort. The results of our initial prototype confirm the correctness of the proposed approach.

  3. The path toward HEP High Performance Computing

    NASA Astrophysics Data System (ADS)

    Apostolakis, John; Brun, René; Carminati, Federico; Gheata, Andrei; Wenzel, Sandro

    2014-06-01

    High Energy Physics code has been known for making poor use of high performance computing architectures. Efforts in optimising HEP code on vector and RISC architectures have yielded limited results and recent studies have shown that, on modern architectures, it achieves a performance between 10% and 50% of the peak one. Although several successful attempts have been made to port selected codes to GPUs, no major HEP code suite has a "High Performance" implementation. With LHC undergoing a major upgrade and a number of challenging experiments on the drawing board, HEP cannot any longer neglect the less-than-optimal performance of its code and it has to try making the best usage of the hardware. This activity is one of the foci of the SFT group at CERN, which hosts, among others, the ROOT and Geant4 projects. The activity of the experiments is shared and coordinated via a Concurrency Forum, where the experience in optimising HEP code is presented and discussed. Another activity is the Geant-V project, centred on the development of a high-performance prototype for particle transport. Achieving a good concurrency level on the emerging parallel architectures without a complete redesign of the framework can only be done by parallelizing at event level, or with a much larger effort at track level. Apart from the shareable data structures, this typically implies a multiplication factor in terms of memory consumption compared to the single threaded version, together with sub-optimal handling of event processing tails. Besides this, the low level instruction pipelining of modern processors cannot be used efficiently to speed up the program. We have implemented a framework that allows scheduling vectors of particles to an arbitrary number of computing resources in a fine grain parallel approach. The talk will review the current optimisation activities within the SFT group with a particular emphasis on the development perspectives towards a simulation framework able to profit best from the recent technology evolution in computing.

  4. The Ophidia framework: toward cloud-based data analytics for climate change

    NASA Astrophysics Data System (ADS)

    Fiore, Sandro; D'Anca, Alessandro; Elia, Donatello; Mancini, Marco; Mariello, Andrea; Mirto, Maria; Palazzo, Cosimo; Aloisio, Giovanni

    2015-04-01

    The Ophidia project is a research effort on big data analytics facing scientific data analysis challenges in the climate change domain. It provides parallel (server-side) data analysis, an internal storage model and a hierarchical data organization to manage large amounts of multidimensional scientific data. The Ophidia analytics platform provides several MPI-based parallel operators to manipulate large datasets (data cubes) and array-based primitives to perform data analysis on large arrays of scientific data. The most relevant data analytics use cases implemented in national and international projects target fire danger prevention (OFIDIA), interactions between climate change and biodiversity (EUBrazilCC), climate indicators and remote data analysis (CLIP-C), sea situational awareness (TESSA), large scale data analytics on CMIP5 data in NetCDF format, Climate and Forecast (CF) convention compliant (ExArch). Two use cases regarding the EU FP7 EUBrazil Cloud Connect and the INTERREG OFIDIA projects will be presented during the talk. In the former case (EUBrazilCC) the Ophidia framework is being extended to integrate scalable VM-based solutions for the management of large volumes of scientific data (both climate and satellite data) in a cloud-based environment to study how climate change affects biodiversity. In the latter one (OFIDIA) the data analytics framework is being exploited to provide operational support regarding processing chains devoted to fire danger prevention. To tackle the project challenges, data analytics workflows consisting of about 130 operators perform, among others, parallel data analysis, metadata management, virtual file system tasks, map generation, rolling of datasets, and import/export of datasets in NetCDF format. Finally, the entire Ophidia software stack has been deployed at CMCC on 24 nodes (16 cores/node) of the Athena HPC cluster. Moreover, a cloud-based release tested with OpenNebula is also available and running in the private cloud infrastructure of the CMCC Supercomputing Centre.

  5. Biologically driven neural platform invoking parallel electrophoretic separation and urinary metabolite screening.

    PubMed

    Page, Tessa; Nguyen, Huong Thi Huynh; Hilts, Lindsey; Ramos, Lorena; Hanrahan, Grady

    2012-06-01

    This work presents a computational framework for parallel electrophoretic separation of complex biological macromolecules and model urinary metabolites. More specifically, the implementation of a particle swarm optimization (PSO) algorithm on a neural network platform for multiparameter optimization of multiplexed 24-capillary electrophoresis technology with UV detection is highlighted. Two experimental systems were examined: (1) separation of purified rabbit metallothioneins and (2) separation of model toluene urinary metabolites and selected organic acids. Results proved superior to the use of neural networks employing standard back propagation when examining training error, fitting response, and predictive abilities. Simulation runs were obtained as a result of metaheuristic examination of the global search space with experimental responses in good agreement with predicted values. Full separation of selected analytes was realized after employing optimal model conditions. This framework provides guidance for the application of metaheuristic computational tools to aid in future studies involving parallel chemical separation and screening. Adaptable pseudo-code is provided to enable users of varied software packages and modeling frameworks to implement the PSO algorithm for their desired use.
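    For readers unfamiliar with PSO, the following is a minimal global-best sketch in plain Python. It is not the pseudo-code from the paper: the objective (a toy sphere function), the parameter values `w`, `c1`, `c2`, and all names are assumptions chosen for illustration. In the setting described above, the objective would instead score a neural network model of the separation.

```python
import random


def pso_minimize(f, bounds, n_particles=20, iters=200,
                 w=0.73, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimisation over a box.

    f      : objective taking a list of floats
    bounds : list of (low, high) pairs, one per dimension
    w      : inertia weight; c1, c2 : cognitive and social weights
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]          # best position seen per particle
    best_val = [f(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]   # best position seen by swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def sphere(p):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(x * x for x in p)
```

    Calling `pso_minimize(sphere, [(-5.0, 5.0)] * 2)` returns the best position found and its objective value; the fixed seed makes runs repeatable.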

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wylie, Brian Neil; Moreland, Kenneth D.

    Graphs are a vital way of organizing data with complex correlations. A good visualization of a graph can fundamentally change human understanding of the data. Consequently, there is a rich body of work on graph visualization. Although there are many techniques that are effective on small to medium sized graphs (tens of thousands of nodes), there is a void in the research for visualizing massive graphs containing millions of nodes. Sandia is one of the few entities in the world that has the means and motivation to handle data on such a massive scale. For example, homeland security generates graphs from prolific media sources such as television, telephone, and the Internet. The purpose of this project is to provide the groundwork for visualizing such massive graphs. The research provides for two major feature gaps: a parallel, interactive visualization framework and scalable algorithms to make the framework usable to a practical application. Both the frameworks and algorithms are designed to run on distributed parallel computers, which are already available at Sandia. Some features are integrated into the ThreatView™ application and future work will integrate further parallel algorithms.

  7. Parallels in Computer-Aided Design Framework and Software Development Environment Efforts.

    DTIC Science & Technology

    1992-05-01

    design kits, and tool and design management frameworks. Also, books about software engineering environments [Long 91] and electronic design...tool integration [Zarrella 90], and agreement upon a universal design automation framework, such as the CAD Framework Initiative (CFI) [Malasky 91...ments: identification, control, status accounting, and audit and review. The paper by Dart extracts 15 CM concepts from existing SDEs and tools

  8. Information processing capacity in psychopathy: Effects of anomalous attention.

    PubMed

    Hamilton, Rachel K B; Newman, Joseph P

    2018-03-01

    Hamilton and colleagues (2015) recently proposed that an integrative deficit in psychopathy restricts simultaneous processing, thereby leaving fewer resources available for information encoding, narrowing the scope of attention, and undermining associative processing. The current study evaluated this parallel processing deficit proposal using the Simultaneous-Sequential paradigm. This investigation marks the first a priori test of Hamilton et al.'s theoretical framework. We predicted that psychopathy would be associated with inferior performance (as indexed by lower accuracy and longer response time) on trials requiring simultaneous processing of visual information relative to trials necessitating sequential processing. Results were consistent with these predictions, supporting the proposal that psychopathy is characterized by a reduced capacity to process multicomponent perceptual information concurrently. We discuss the potential implications of impaired simultaneous processing for the conceptualization of the psychopathic deficit. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  9. Mathematical Frameworks for Diagnostics, Prognostics and Condition Based Maintenance Problems

    DTIC Science & Technology

    2008-08-15

    REPORT Mathematical Frameworks for Diagnostics, Prognostics and Condition Based Maintenance Problems (W911NF-05-1-0426) 14. ABSTRACT 16. SECURITY ...other documentation. 12. DISTRIBUTION AVAILIBILITY STATEMENT Approved for Public Release; Distribution Unlimited 9. SPONSORING/MONITORING AGENCY NAME...parallel and distributed computing environment were researched. In support of the Condition Based Maintenance (CBM) philosophy, a theoretical framework

  10. Advances in the spatially distributed ages-w model: parallel computation, java connection framework (JCF) integration, and streamflow/nitrogen dynamics assessment

    USDA-ARS's Scientific Manuscript database

    AgroEcoSystem-Watershed (AgES-W) is a modular, Java-based spatially distributed model which implements hydrologic and water quality (H/WQ) simulation components under the Java Connection Framework (JCF) and the Object Modeling System (OMS) environmental modeling framework. AgES-W is implicitly scala...

  11. Semantic biomedical resource discovery: a Natural Language Processing framework.

    PubMed

    Sfakianaki, Pepi; Koumakis, Lefteris; Sfakianakis, Stelios; Iatraki, Galatia; Zacharioudakis, Giorgos; Graf, Norbert; Marias, Kostas; Tsiknakis, Manolis

    2015-09-30

    A plethora of publicly available biomedical resources currently exists and is growing at a fast rate. In parallel, specialized repositories are being developed, indexing numerous clinical and biomedical tools. The main drawback of such repositories is the difficulty in locating appropriate resources for a clinical or biomedical decision task, especially for non-Information Technology expert users. In parallel, although NLP research in the clinical domain has been active since the 1960s, progress in the development of NLP applications has been slow and lags behind progress in the general NLP domain. The aim of the present study is to investigate the use of semantics for biomedical resources annotation with domain specific ontologies and exploit Natural Language Processing methods in empowering the non-Information Technology expert users to efficiently search for biomedical resources using natural language. A Natural Language Processing engine which can "translate" free text into targeted queries, automatically transforming a clinical research question into a request description that contains only terms of ontologies, has been implemented. The implementation is based on information extraction techniques for text in natural language, guided by integrated ontologies. Furthermore, knowledge from robust text mining methods has been incorporated to map descriptions into suitable domain ontologies in order to ensure that the biomedical resources descriptions are domain oriented and enhance the accuracy of services discovery. The framework is freely available as a web application at (http://calchas.ics.forth.gr/). For our experiments, a range of clinical questions were established based on descriptions of clinical trials from the ClinicalTrials.gov registry as well as recommendations from clinicians. Domain experts manually identified the available tools in a tools repository which are suitable for addressing the clinical questions at hand, either individually or as a set of tools forming a computational pipeline. The results were compared with those obtained from an automated discovery of candidate biomedical tools. For the evaluation of the results, precision and recall measurements were used. Our results indicate that the proposed framework has a high precision and low recall, implying that the system returns essentially more relevant results than irrelevant. Adequate biomedical ontologies, sufficient NLP tools and biomedical annotation systems of sufficient quality are already available for the implementation of a biomedical resources discovery framework based on the semantic annotation of resources and the use of NLP techniques. The results of the present study demonstrate the clinical utility of the application of the proposed framework, which aims to bridge the gap between clinical questions in natural language and efficient dynamic biomedical resources discovery.
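    The evaluation measures mentioned above can be made concrete with a short sketch. Precision is the fraction of returned tools that experts judged suitable; recall is the fraction of expert-selected tools that the system returned. The tool names and counts below are invented for illustration and are not results from the study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items
    judged against a gold-standard set of relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: experts marked five tools as suitable,
# the automated discovery returned two, both of them suitable.
EXPERT_TOOLS = {"toolA", "toolB", "toolC", "toolD", "toolE"}
SYSTEM_TOOLS = {"toolA", "toolB"}

precision, recall = precision_recall(SYSTEM_TOOLS, EXPERT_TOOLS)
print(precision, recall)  # 1.0 0.4
```

    This toy case shows the high-precision, low-recall pattern the abstract reports: everything returned is relevant, but most relevant tools are missed.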

  12. Semantics-based distributed I/O with the ParaMEDIC framework.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Balaji, P.; Feng, W.; Lin, H.

    2008-01-01

    Many large-scale applications simultaneously rely on multiple resources for efficient execution. For example, such applications may require both large compute and storage resources; however, very few supercomputing centers can provide large quantities of both. Thus, data generated at the compute site oftentimes has to be moved to a remote storage site for either storage or visualization and analysis. Clearly, this is not an efficient model, especially when the two sites are distributed over a wide-area network. Thus, we present a framework called 'ParaMEDIC: Parallel Metadata Environment for Distributed I/O and Computing' which uses application-specific semantic information to convert the generated data to orders-of-magnitude smaller metadata at the compute site, transfer the metadata to the storage site, and re-process the metadata at the storage site to regenerate the output. Specifically, ParaMEDIC trades a small amount of additional computation (in the form of data post-processing) for a potentially significant reduction in data that needs to be transferred in distributed environments.

  13. High-Throughput Characterization of Porous Materials Using Graphics Processing Units

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Jihan; Martin, Richard L.; Rübel, Oliver

    We have developed a high-throughput graphics processing units (GPU) code that can characterize a large database of crystalline porous materials. In our algorithm, the GPU is utilized to accelerate energy grid calculations where the grid values represent interactions (i.e., Lennard-Jones + Coulomb potentials) between gas molecules (i.e., CH₄ and CO₂) and material's framework atoms. Using a parallel flood fill CPU algorithm, inaccessible regions inside the framework structures are identified and blocked based on their energy profiles. Finally, we compute the Henry coefficients and heats of adsorption through statistical Widom insertion Monte Carlo moves in the domain restricted to the accessible space. The code offers significant speedup over a single core CPU code and allows us to characterize a set of porous materials at least an order of magnitude larger than ones considered in earlier studies. For structures selected from such a prescreening algorithm, full adsorption isotherms can be calculated by conducting multiple grand canonical Monte Carlo simulations concurrently within the GPU.

  14. Accumulating pyramid spatial-spectral collaborative coding divergence for hyperspectral anomaly detection

    NASA Astrophysics Data System (ADS)

    Sun, Hao; Zou, Huanxin; Zhou, Shilin

    2016-03-01

    Detection of anomalous targets of various sizes in hyperspectral data has received a lot of attention in reconnaissance and surveillance applications. Many anomaly detectors have been proposed in the literature. However, current methods are susceptible to anomalies in the processing window range and often make critical assumptions about the distribution of the background data. Motivated by the fact that anomaly pixels are often distinctive from their local background, in this letter we propose a novel hyperspectral anomaly detection framework for real-time remote sensing applications. The proposed framework consists of four major components: sparse feature learning, pyramid grid window selection, joint spatial-spectral collaborative coding and multi-level divergence fusion. It exploits the collaborative representation difference in the feature space to locate potential anomalies and is totally unsupervised without any prior assumptions. Experimental results on airborne recorded hyperspectral data demonstrate that the proposed method is adaptive to anomalies in a large range of sizes and is well suited for parallel processing.

  15. Integrated geophysical and geological study of the tectonic framework of the 38th Parallel Lineament in the vicinity of its intersection with the extension of the New Madrid Fault Zone

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Braile, L.W.; Hinze, W.J.; Keller, G.R.

    1978-06-01

    Extensive gravity and aeromagnetic surveys have been conducted in critical areas of Kentucky, Illinois, and Indiana centering around the intersection of the 38th Parallel Lineament and the extension of the New Madrid Fault Zone. Available aeromagnetic maps have been digitized and these data have been processed by a suite of computer programs developed for this purpose. Seismic equipment has been prepared for crustal seismic studies and a 150 km long seismic refraction line has been observed along the Wabash River Valley Fault System. Preliminary basement rock and configuration maps have been prepared based on studies of the samples derived from basement drill holes. Interpretation of these data is at a preliminary stage, but studies to this date indicate that the 38th Parallel Lineament features extend as far north as 39°N and a subtle northeasterly striking magnetic and gravity anomaly cuts across Indiana from the southwest corner of the state, roughly on strike with the New Madrid Seismic Zone.

  16. A Programming Framework for Scientific Applications on CPU-GPU Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Owens, John

    2013-03-24

    At a high level, my research interests center around designing, programming, and evaluating computer systems that use new approaches to solve interesting problems. The rapid change of technology allows a variety of different architectural approaches to computationally difficult problems, and a constantly shifting set of constraints and trends makes the solutions to these problems both challenging and interesting. One of the most important recent trends in computing has been a move to commodity parallel architectures. This sea change is motivated by the industry’s inability to continue to profitably increase performance on a single processor and instead to move to multiple parallel processors. In the period of review, my most significant work has been leading a research group looking at the use of the graphics processing unit (GPU) as a general-purpose processor. GPUs can potentially deliver superior performance on a broad range of problems relative to their CPU counterparts, but effectively mapping complex applications to a parallel programming model with an emerging programming environment is a significant and important research problem.

  17. PARAMO: A Parallel Predictive Modeling Platform for Healthcare Analytic Research using Electronic Health Records

    PubMed Central

    Ng, Kenney; Ghoting, Amol; Steinhubl, Steven R.; Stewart, Walter F.; Malin, Bradley; Sun, Jimeng

    2014-01-01

    Objective Healthcare analytics research increasingly involves the construction of predictive models for disease targets across varying patient cohorts using electronic health records (EHRs). To facilitate this process, it is critical to support a pipeline of tasks: 1) cohort construction, 2) feature construction, 3) cross-validation, 4) feature selection, and 5) classification. To develop an appropriate model, it is necessary to compare and refine models derived from a diversity of cohorts, patient-specific features, and statistical frameworks. The goal of this work is to develop and evaluate a predictive modeling platform that can be used to simplify and expedite this process for health data. Methods To support this goal, we developed a PARAllel predictive MOdeling (PARAMO) platform which 1) constructs a dependency graph of tasks from specifications of predictive modeling pipelines, 2) schedules the tasks in a topological ordering of the graph, and 3) executes those tasks in parallel. We implemented this platform using Map-Reduce to enable independent tasks to run in parallel in a cluster computing environment. Different task scheduling preferences are also supported. Results We assess the performance of PARAMO on various workloads using three datasets derived from the EHR systems in place at Geisinger Health System and Vanderbilt University Medical Center and an anonymous longitudinal claims database. We demonstrate significant gains in computational efficiency against a standard approach. In particular, PARAMO can build 800 different models on a 300,000 patient data set in 3 hours in parallel compared to 9 days if running sequentially. Conclusion This work demonstrates that an efficient parallel predictive modeling platform can be developed for EHR data. This platform can facilitate large-scale modeling endeavors and speed up the research workflow and reuse of health information. This platform is only a first step and provides the foundation for our ultimate goal of building analytic pipelines that are specialized for health data researchers. PMID:24370496
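    The three steps named in the Methods (build a dependency graph, order it topologically, run independent tasks in parallel) can be sketched with the Python standard library. This is only an illustration under stated assumptions: PARAMO itself uses Map-Reduce on a cluster, whereas the sketch uses a local thread pool, and the task names and the `run_pipeline` helper are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
DEPS = {
    "cohort": set(),
    "features_labs": {"cohort"},
    "features_meds": {"cohort"},
    "cross_validation": {"features_labs", "features_meds"},
    "feature_selection": {"cross_validation"},
    "classification": {"feature_selection"},
}


def run_pipeline(deps, task_fn, max_workers=4):
    """Run task_fn(task) for every task, never before its dependencies.

    Tasks whose dependencies are all finished are submitted together,
    so independent tasks can run concurrently.
    Returns the tasks in the order in which they completed.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on cyclic graphs
    finished = []
    running = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            for task in sorter.get_ready():
                running[pool.submit(task_fn, task)] = task
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any error from the task
                task = running.pop(future)
                finished.append(task)
                sorter.done(task)  # unlocks tasks that depended on it
    return finished
```

    In this example graph, `features_labs` and `features_meds` become ready at the same time once `cohort` finishes, which is where the parallelism comes from.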

  18. PARAMO: a PARAllel predictive MOdeling platform for healthcare analytic research using electronic health records.

    PubMed

    Ng, Kenney; Ghoting, Amol; Steinhubl, Steven R; Stewart, Walter F; Malin, Bradley; Sun, Jimeng

    2014-04-01

    Healthcare analytics research increasingly involves the construction of predictive models for disease targets across varying patient cohorts using electronic health records (EHRs). To facilitate this process, it is critical to support a pipeline of tasks: (1) cohort construction, (2) feature construction, (3) cross-validation, (4) feature selection, and (5) classification. To develop an appropriate model, it is necessary to compare and refine models derived from a diversity of cohorts, patient-specific features, and statistical frameworks. The goal of this work is to develop and evaluate a predictive modeling platform that can be used to simplify and expedite this process for health data. To support this goal, we developed a PARAllel predictive MOdeling (PARAMO) platform which (1) constructs a dependency graph of tasks from specifications of predictive modeling pipelines, (2) schedules the tasks in a topological ordering of the graph, and (3) executes those tasks in parallel. We implemented this platform using Map-Reduce to enable independent tasks to run in parallel in a cluster computing environment. Different task scheduling preferences are also supported. We assess the performance of PARAMO on various workloads using three datasets derived from the EHR systems in place at Geisinger Health System and Vanderbilt University Medical Center and an anonymous longitudinal claims database. We demonstrate significant gains in computational efficiency against a standard approach. In particular, PARAMO can build 800 different models on a 300,000 patient data set in 3 h in parallel compared to 9 days if running sequentially. This work demonstrates that an efficient parallel predictive modeling platform can be developed for EHR data. This platform can facilitate large-scale modeling endeavors and speed up the research workflow and reuse of health information. This platform is only a first step and provides the foundation for our ultimate goal of building analytic pipelines that are specialized for health data researchers. Copyright © 2013 Elsevier Inc. All rights reserved.

  19. The NAS parallel benchmarks

    NASA Technical Reports Server (NTRS)

    Bailey, D. H.; Barszcz, E.; Barton, J. T.; Carter, R. L.; Lasinski, T. A.; Browning, D. S.; Dagum, L.; Fatoohi, R. A.; Frederickson, P. O.; Schreiber, R. S.

    1991-01-01

    A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers in the framework of the NASA Ames Numerical Aerodynamic Simulation (NAS) Program. These consist of five 'parallel kernel' benchmarks and three 'simulated application' benchmarks. Together they mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification: all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.

  20. Streaming data analytics via message passing with application to graph algorithms

    DOE PAGES

    Plimpton, Steven J.; Shead, Tim

    2014-05-06

    The need to process streaming data, which arrives continuously at high volume in real time, arises in a variety of contexts including data produced by experiments, collections of environmental or network sensors, and running simulations. Streaming data can also be formulated as queries or transactions which operate on a large dynamic data store, e.g. a distributed database. We describe a lightweight, portable framework named PHISH which enables a set of independent processes to compute on a stream of data in a distributed-memory parallel manner. Datums are routed between processes in patterns defined by the application. PHISH can run on top of either message-passing via MPI or sockets via ZMQ. The former means streaming computations can be run on any parallel machine which supports MPI; the latter allows them to run on a heterogeneous, geographically dispersed network of machines. We illustrate how PHISH can support streaming MapReduce operations, and describe streaming versions of three algorithms for large, sparse graph analytics: triangle enumeration, subgraph isomorphism matching, and connected component finding. Lastly, we also provide benchmark timings for MPI versus socket performance of several kernel operations useful in streaming algorithms.
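    To give a feel for what "streaming" connected component finding means, here is a single-process sketch that consumes edges one at a time and keeps components up to date with a union-find structure. It is an assumption-laden illustration, not the PHISH algorithm: PHISH distributes the work across processes connected by MPI or ZMQ, and the class and method names below are hypothetical.

```python
class StreamingComponents:
    """Maintain connected components while edges arrive one by one."""

    def __init__(self):
        self.parent = {}  # vertex -> parent vertex in the union-find forest

    def find(self, v):
        """Return the representative of v's component."""
        self.parent.setdefault(v, v)
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every vertex on the path at the root.
        while self.parent[v] != root:
            self.parent[v], v = root, self.parent[v]
        return root

    def add_edge(self, u, v):
        """Process one streamed edge, merging two components if needed."""
        ru, rv = self.find(u), self.find(v)
        if ru != rv:
            self.parent[ru] = rv

    def components(self):
        """Snapshot of the current components as sorted lists."""
        groups = {}
        for v in list(self.parent):
            groups.setdefault(self.find(v), set()).add(v)
        return sorted(sorted(g) for g in groups.values())
```

    Each call to `add_edge` touches only the two endpoints' trees, so the structure can answer component queries at any point in the stream without reprocessing earlier edges.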

  1. Falco: a quick and flexible single-cell RNA-seq processing framework on the cloud.

    PubMed

    Yang, Andrian; Troup, Michael; Lin, Peijie; Ho, Joshua W K

    2017-03-01

    Single-cell RNA-seq (scRNA-seq) is increasingly used in a range of biomedical studies. Nonetheless, current RNA-seq analysis tools are not specifically designed to efficiently process scRNA-seq data due to their limited scalability. Here we introduce Falco, a cloud-based framework to enable parallelization of existing RNA-seq processing pipelines using big data technologies of Apache Hadoop and Apache Spark for performing massively parallel analysis of large scale transcriptomic data. Using two public scRNA-seq datasets and two popular RNA-seq alignment/feature quantification pipelines, we show that the same processing pipeline runs 2.6-145.4 times faster using Falco than running on a highly optimized standalone computer. Falco also allows users to utilize low-cost spot instances of Amazon Web Services, providing a ∼65% reduction in cost of analysis. Falco is available via a GNU General Public License at https://github.com/VCCRI/Falco/. Contact: j.ho@victorchang.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  2. The Earth Data Analytic Services (EDAS) Framework

    NASA Astrophysics Data System (ADS)

    Maxwell, T. P.; Duffy, D.

    2017-12-01

    Faced with unprecedented growth in earth data volume and demand, NASA has developed the Earth Data Analytic Services (EDAS) framework, a high performance big data analytics framework built on Apache Spark. This framework enables scientists to execute data processing workflows combining common analysis operations close to the massive data stores at NASA. The data is accessed in standard (NetCDF, HDF, etc.) formats in a POSIX file system and processed using vetted earth data analysis tools (ESMF, CDAT, NCO, etc.). EDAS utilizes a dynamic caching architecture, a custom distributed array framework, and a streaming parallel in-memory workflow for efficiently processing huge datasets within limited memory spaces with interactive response times. EDAS services are accessed via a WPS API being developed in collaboration with the ESGF Compute Working Team to support server-side analytics for ESGF. The API can be accessed using direct web service calls, a Python script, a Unix-like shell client, or a JavaScript-based web application. New analytic operations can be developed in Python, Java, or Scala (with support for other languages planned). Client packages in Python, Java/Scala, or JavaScript contain everything needed to build and submit EDAS requests. The EDAS architecture brings together the tools, data storage, and high-performance computing required for timely analysis of large-scale data sets, where the data resides, to ultimately produce societal benefits. It is currently deployed at NASA in support of the Collaborative REAnalysis Technical Environment (CREATE) project, which centralizes numerous global reanalysis datasets onto a single advanced data analytics platform. This service enables decision makers to compare multiple reanalysis datasets and investigate trends, variability, and anomalies in earth system dynamics around the globe.

  3. Efficient diagonalization of the sparse matrices produced within the framework of the UK R-matrix molecular codes

    NASA Astrophysics Data System (ADS)

    Galiatsatos, P. G.; Tennyson, J.

    2012-11-01

    The most time consuming step within the framework of the UK R-matrix molecular codes is the diagonalization of the inner region Hamiltonian matrix (IRHM). Here we present the method that we follow to speed up this step. We use shared memory machines (SMM), distributed memory machines (DMM), the OpenMP directive based parallel language, the MPI function based parallel language, the sparse matrix diagonalizers ARPACK and PARPACK, a variation for real symmetric matrices of the official coordinate sparse matrix format and finally a parallel sparse matrix-vector product (PSMV). The efficient application of these techniques relies on two important facts: the sparsity of the matrix is large enough (more than 98%), and only a small part of the matrix spectrum is needed to obtain converged results.
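
    The abstract does not spell out its variation of the coordinate format, so the following is only a sketch under one plausible assumption: a real symmetric matrix is stored as coordinate (row, column, value) triples for the upper triangle only, and the matrix-vector product mirrors each off-diagonal entry. The names `SymCoo` and `sym_coo_matvec` are hypothetical. Iterative eigensolvers of the ARPACK family need only such a product, which is why it is the natural kernel to parallelize.

```python
from typing import List, Tuple

# Assumed storage: only entries with row <= col are kept (upper triangle).
SymCoo = List[Tuple[int, int, float]]


def sym_coo_matvec(n: int, entries: SymCoo, x: List[float]) -> List[float]:
    """Return y = A x for a real symmetric A stored as upper-triangle COO triples."""
    y = [0.0] * n
    for i, j, a in entries:
        y[i] += a * x[j]
        if i != j:
            # The mirrored lower-triangle entry A[j][i] == A[i][j] is implied.
            y[j] += a * x[i]
    return y


# Small example: A = [[2, 1, 0], [1, 3, 0], [0, 0, 4]]
entries = [(0, 0, 2.0), (0, 1, 1.0), (1, 1, 3.0), (2, 2, 4.0)]
y = sym_coo_matvec(3, entries, [1.0, 1.0, 1.0])
```

    In a parallel variant the triples would be split among workers and the partial result vectors summed; that step is omitted here.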

  4. Design and development of a medical big data processing system based on Hadoop.

    PubMed

    Yao, Qin; Tian, Yu; Li, Peng-Fei; Tian, Li-Li; Qian, Yang-Ming; Li, Jing-Song

    2015-03-01

    Secondary use of medical big data is increasingly popular in healthcare services and clinical research. Understanding the logic behind medical big data demonstrates tendencies in hospital information technology and shows great significance for hospital information systems that are designing and expanding services. Big data has four characteristics--Volume, Variety, Velocity and Value (the 4 Vs)--that make traditional systems incapable of processing these data using standalones. Apache Hadoop MapReduce is a promising software framework for developing applications that process vast amounts of data in parallel with large clusters of commodity hardware in a reliable, fault-tolerant manner. With the Hadoop framework and MapReduce application program interface (API), we can more easily develop our own MapReduce applications to run on a Hadoop framework that can scale up from a single node to thousands of machines. This paper investigates a practical case of a Hadoop-based medical big data processing system. We developed this system to intelligently process medical big data and uncover some features of hospital information system user behaviors. This paper studies user behaviors regarding various data produced by different hospital information systems for daily work. In this paper, we also built a five-node Hadoop cluster to execute distributed MapReduce algorithms. Our distributed algorithms show promise in facilitating efficient data processing with medical big data in healthcare services and clinical research compared with single nodes. Additionally, with medical big data analytics, we can design our hospital information systems to be much more intelligent and easier to use by making personalized recommendations.
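
    To make the MapReduce programming model mentioned above concrete, here is a minimal single-process sketch of its three phases (map, shuffle, reduce). It does not use the Hadoop API. The log format `user,action` and all function names are assumptions chosen for illustration, not details taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Emit (user, 1) for one log line of the assumed form 'user,action'."""
    user, _action = record.split(",", 1)
    yield user, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for one user."""
    return key, sum(values)


logs = ["alice,open_record", "bob,print_report", "alice,close_record"]
pairs = [pair for line in logs for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

    On a real cluster the map and reduce calls run on different nodes and the shuffle moves data over the network; the per-key logic stays the same.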

  5. Intersection of migration and turnover theories-What can we learn?

    PubMed

    Brewer, Carol S; Kovner, Christine T

    2014-01-01

    The international migration of nurses has become a major issue in the international health and workforce policy circles, but analyses are not based on a comprehensive theory. The purpose of this article was to compare the concepts of an integrated nursing turnover theory with the concepts of one international migration framework. An integrated turnover theory is compared with a frequently used migration framework using examples of each. Migration concepts relate well to turnover concepts, but the relative importance and strength of various concepts may differ. For example, identification, development, and measurement of the concept of national commitment, if it exists, is parallel to organizational commitment and may be fruitful in understanding the processes that lead to nurse migration. The turnover theory provides a framework for examining migration concepts and considering how these concepts could relate to each other in a future theory of migration. Ultimately, a better understanding of the relationships and strengths of these concepts could lead to more effective policy. Copyright © 2014 Elsevier Inc. All rights reserved.

  6. A New Approach to Parallel Dynamic Partitioning for Adaptive Unstructured Meshes

    NASA Technical Reports Server (NTRS)

    Heber, Gerd; Biswas, Rupak; Gao, Guang R.

    1999-01-01

    Classical mesh partitioning algorithms were designed for rather static situations, and their straightforward application in a dynamical framework may lead to unsatisfactory results, e.g., excessive data migration among processors. Furthermore, special attention should be paid to their amenability to parallelization. In this paper, a novel parallel method for the dynamic partitioning of adaptive unstructured meshes is described. It is based on a linear representation of the mesh using self-avoiding walks.

  7. Engineering Digestion: Multiscale Processes of Food Digestion.

    PubMed

    Bornhorst, Gail M; Gouseti, Ourania; Wickham, Martin S J; Bakalis, Serafim

    2016-03-01

    Food digestion is a complex, multiscale process that has recently become of interest to the food industry due to the developing links between food and health or disease. Food digestion can be studied by using either in vitro or in vivo models, each having certain advantages or disadvantages. The recent interest in food digestion has resulted in a large number of studies in this area, yet few have provided an in-depth, quantitative description of digestion processes. To provide a framework to develop these quantitative comparisons, a summary is given here between digestion processes and parallel unit operations in the food and chemical industry. Characterization parameters and phenomena are suggested for each step of digestion. In addition to the quantitative characterization of digestion processes, the multiscale aspect of digestion must also be considered. In both food systems and the gastrointestinal tract, multiple length scales are involved in food breakdown, mixing, absorption. These different length scales influence digestion processes independently as well as through interrelated mechanisms. To facilitate optimized development of functional food products, a multiscale, engineering approach may be taken to describe food digestion processes. A framework for this approach is described in this review, as well as examples that demonstrate the importance of process characterization as well as the multiple, interrelated length scales in the digestion process. © 2016 Institute of Food Technologists®

  8. Relative entropy and optimization-driven coarse-graining methods in VOTCA

    DOE PAGES

    Mashayak, S. Y.; Jochum, Mara N.; Koschke, Konstantin; ...

    2015-07-20

    We discuss recent advances of the VOTCA package for systematic coarse-graining. Two methods have been implemented, namely the downhill simplex optimization and the relative entropy minimization. We illustrate the new methods by coarse-graining SPC/E bulk water and more complex water-methanol mixture systems. The CG potentials obtained from both methods are then evaluated by comparing the pair distributions from the coarse-grained to the reference atomistic simulations. We have also added a parallel analysis framework to improve the computational efficiency of the coarse-graining process.

  9. Formation Flying With Decentralized Control in Libration Point Orbits

    NASA Technical Reports Server (NTRS)

    Folta, David; Carpenter, J. Russell; Wagner, Christoph

    2000-01-01

    A decentralized control framework is investigated for applicability of formation flying control in libration orbits. The decentralized approach, being non-hierarchical, processes only direct measurement data, in parallel with the other spacecraft. Control is accomplished via linearization about a reference libration orbit with standard control using a Linear Quadratic Regulator (LQR) or the GSFC control algorithm. Both are linearized about the current state estimate as with the extended Kalman filter. Based on this preliminary work, the decentralized approach appears to be feasible for upcoming libration missions using distributed spacecraft.

  10. Use Hierarchical Storage and Analysis to Exploit Intrinsic Parallelism

    NASA Astrophysics Data System (ADS)

    Zender, C. S.; Wang, W.; Vicente, P.

    2013-12-01

    Big Data is an ugly name for the scientific opportunities and challenges created by the growing wealth of geoscience data. How to weave large, disparate datasets together to best reveal their underlying properties, to exploit their strengths and minimize their weaknesses, to continually aggregate more information than the world knew yesterday and less than we will learn tomorrow? Data analytics techniques (statistics, data mining, machine learning, etc.) can accelerate pattern recognition and discovery. However, often researchers must, prior to analysis, organize multiple related datasets into a coherent framework. Hierarchical organization permits entire datasets to be stored in nested groups that reflect their intrinsic relationships and similarities. Hierarchical data can be simpler and faster to analyze by coding operators to automatically parallelize processes over isomorphic storage units, i.e., groups. The newest generation of netCDF Operators (NCO) embodies this hierarchical approach, while still supporting traditional analysis approaches. We will use NCO to demonstrate the trade-offs involved in processing a prototypical Big Data application (analysis of CMIP5 datasets) using hierarchical and traditional analysis approaches.

  11. A further extension of the Extended Parallel Process Model (E-EPPM): implications of cognitive appraisal theory of emotion and dispositional coping style.

    PubMed

    So, Jiyeon

    2013-01-01

    For two decades, the extended parallel process model (EPPM; Witte, 1992) has been one of the most widely used theoretical frameworks in health risk communication. The model has gained much popularity because it recognizes that, ironically, preceding fear appeal models do not incorporate the concept of fear as a legitimate and central part of them. As a remedy to this situation, the EPPM aims at "putting the fear back into fear appeals" (Witte, 1992, p. 330). Despite this attempt, however, this article argues that the EPPM still does not fully capture the essence of fear as an emotion. Specifically, drawing upon Lazarus's (1991) cognitive appraisal theory of emotion and the concept of dispositional coping style (Miller, 1995), this article seeks to further extend the EPPM. The revised EPPM incorporates a more comprehensive perspective on risk perceptions as a construct involving both cognitive and affective aspects (i.e., fear and anxiety) and integrates the concept of monitoring and blunting coping style as a moderator of further information seeking regarding a given risk topic.

  12. Integrating Remote and Social Sensing Data for a Scenario on Secure Societies in Big Data Platform

    NASA Astrophysics Data System (ADS)

    Albani, Sergio; Lazzarini, Michele; Koubarakis, Manolis; Taniskidou, Efi Karra; Papadakis, George; Karkaletsis, Vangelis; Giannakopoulos, George

    2016-08-01

    In the framework of the Horizon 2020 project BigDataEurope (Integrating Big Data, Software & Communities for Addressing Europe's Societal Challenges), a pilot for the Secure Societies Societal Challenge was designed considering the requirements coming from relevant stakeholders. The pilot is focusing on the integration in a Big Data platform of data coming from remote and social sensing. The information on land changes coming from the Copernicus Sentinel 1A sensor (Change Detection workflow) is integrated with information coming from selected Twitter and news agencies accounts (Event Detection workflow) in order to provide the user with multiple sources of information. The Change Detection workflow implements a processing chain in a distributed parallel manner, exploiting the Big Data capabilities in place; the Event Detection workflow implements parallel and distributed social media and news agencies monitoring as well as suitable mechanisms to detect and geo-annotate the related events.

  13. Atlas : A library for numerical weather prediction and climate modelling

    NASA Astrophysics Data System (ADS)

    Deconinck, Willem; Bauer, Peter; Diamantakis, Michail; Hamrud, Mats; Kühnlein, Christian; Maciel, Pedro; Mengaldo, Gianmarco; Quintino, Tiago; Raoult, Baudouin; Smolarkiewicz, Piotr K.; Wedi, Nils P.

    2017-11-01

    The algorithms underlying numerical weather prediction (NWP) and climate models that have been developed in the past few decades face an increasing challenge caused by the paradigm shift imposed by hardware vendors towards more energy-efficient devices. In order to provide a sustainable path to exascale High Performance Computing (HPC), applications become increasingly restricted by energy consumption. As a result, the emerging diverse and complex hardware solutions have a large impact on the programming models traditionally used in NWP software, triggering a rethink of design choices for future massively parallel software frameworks. In this paper, we present Atlas, a new software library that is currently being developed at the European Centre for Medium-Range Weather Forecasts (ECMWF), with the scope of handling data structures required for NWP applications in a flexible and massively parallel way. Atlas provides a versatile framework for the future development of efficient NWP and climate applications on emerging HPC architectures. The applications range from full Earth system models, to specific tools required for post-processing weather forecast products. The Atlas library thus constitutes a step towards affordable exascale high-performance simulations by providing the necessary abstractions that facilitate the application in heterogeneous HPC environments by promoting the co-design of NWP algorithms with the underlying hardware.

  14. Efficient Parallel Algorithms for Landscape Evolution Modelling

    NASA Astrophysics Data System (ADS)

    Moresi, L. N.; Mather, B.; Beucher, R.

    2017-12-01

    Landscape erosion and the deposition of sediments by river systems are strongly controlled by topography, rainfall patterns, and the susceptibility of the basement to the action of running water. It is well understood that each of these processes depends on the other, for example: topography results from active tectonic processes; deformation, metamorphosis and exhumation alter the competence of the basement; rainfall patterns depend on topography; uplift and subsidence in response to tectonic stress can be amplified by erosion and sediment deposition. We typically gain understanding of such coupled systems through forward models which capture the essential interactions of the various components and attempt to parameterise those parts of the individual system that are unresolvable at the scale of the interaction. Here we address the problem of predicting erosion and deposition rates at a continental scale with a resolution of tens to hundreds of metres in a dynamic, Lagrangian framework. This is a typical requirement for a code to interface with a mantle / lithosphere dynamics model and demands an efficient, unstructured, parallel implementation. We address this through a very general algorithm that treats all parts of the landscape evolution equations in sparse-matrix form including those for stream-flow accumulation, dam-filling and catchment determination. This gives us considerable flexibility in developing unstructured, parallel code, and in creating a modular package that can be configured by users to work at different temporal and spatial scales, but it also has potential advantages in treating the non-linear parts of the problem in a general manner.

  15. Interlaboratory Validation of the Leaching Environmental Assessment Framework (LEAF) Method 1313 and Method 1316

    EPA Science Inventory

    This document summarizes the results of an interlaboratory study conducted to generate precision estimates for two parallel batch leaching methods which are part of the Leaching Environmental Assessment Framework (LEAF). These methods are: (1) Method 1313: Liquid-Solid Partition...

  16. Hybrid Optimization Parallel Search PACKage

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2009-11-10

    HOPSPACK is open source software for solving optimization problems without derivatives. Application problems may have a fully nonlinear objective function, bound constraints, and linear and nonlinear constraints. Problem variables may be continuous, integer-valued, or a mixture of both. The software provides a framework that supports any derivative-free type of solver algorithm. Through the framework, solvers request parallel function evaluation, which may use MPI (multiple machines) or multithreading (multiple processors/cores on one machine). The framework provides a Cache and Pending Cache of saved evaluations that reduces execution time and facilitates restarts. Solvers can dynamically create other algorithms to solve subproblems, a useful technique for handling multiple start points and integer-valued variables. HOPSPACK ships with the Generating Set Search (GSS) algorithm, developed at Sandia as part of the APPSPACK open source software project.
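
    The combination of parallel function evaluation and a cache of saved evaluations can be sketched as follows. This is not HOPSPACK code (HOPSPACK is a separate software package with its own interfaces). The class `CachedEvaluator` and its attributes are hypothetical, it uses Python threads rather than MPI, and it models only the cache of finished evaluations, not the Pending Cache.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


class CachedEvaluator:
    """Evaluate an objective at many points in parallel, reusing saved values."""

    def __init__(self, func: Callable[[Point], float], max_workers: int = 4):
        self.func = func
        self.max_workers = max_workers
        self.cache: Dict[Point, float] = {}
        self.calls = 0  # number of real (non-cached) evaluations

    def evaluate(self, points: Sequence[Point]) -> List[float]:
        # Deduplicate while keeping order, then drop points already cached.
        missing = [p for p in dict.fromkeys(points) if p not in self.cache]
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            for point, value in zip(missing, pool.map(self.func, missing)):
                self.cache[point] = value
        self.calls += len(missing)
        return [self.cache[p] for p in points]


def sphere(p: Point) -> float:
    """Toy derivative-free objective: sum of squares."""
    return sum(x * x for x in p)


evaluator = CachedEvaluator(sphere)
first = evaluator.evaluate([(1.0, 2.0), (0.0, 0.0)])
second = evaluator.evaluate([(1.0, 2.0), (3.0, 0.0)])  # (1.0, 2.0) is a cache hit
```

    Because trial points often repeat across iterations of a pattern-search method, the cache saves evaluations exactly when they are most expensive.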

  17. A WENO-Limited, ADER-DT, Finite-Volume Scheme for Efficient, Robust, and Communication-Avoiding Multi-Dimensional Transport

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Norman, Matthew R

    2014-01-01

    The novel ADER-DT time discretization is applied to two-dimensional transport in a quadrature-free, WENO- and FCT-limited, Finite-Volume context. Emphasis is placed on (1) the serial and parallel computational properties of ADER-DT and this framework and (2) the flexibility of ADER-DT and this framework in efficiently balancing accuracy with other constraints important to transport applications. This study demonstrates a range of choices for the user when approaching their specific application while maintaining good parallel properties. In this method, genuine multi-dimensionality, single-step and single-stage time stepping, strict positivity, and a flexible range of limiting are all achieved with only one parallel synchronization and data exchange per time step. In terms of parallel data transfers per simulated time interval, this improves upon multi-stage time stepping and post-hoc filtering techniques such as hyperdiffusion. This method is evaluated with standard transport test cases over a range of limiting options to demonstrate quantitatively and qualitatively what a user should expect when employing this method in their application.

  18. Highly Asynchronous VisitOr Queue Graph Toolkit

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pearce, R.

    2012-10-01

    HAVOQGT is a C++ framework that can be used to create highly parallel graph traversal algorithms. The framework stores the graph and algorithmic data structures on external memory that is typically mapped to high performance locally attached NAND FLASH arrays. The framework supports a vertex-centered visitor programming model. The framework has been used to implement breadth first search, connected components, and single source shortest path.
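
    A vertex-centered visitor model can be illustrated with a small breadth first search. The sketch below is sequential, in-memory Python, whereas the framework described above is asynchronous C++ on external memory. The names `BfsVisitor`, `pre_visit` and `visit` are hypothetical and are not claimed to match the HAVOQGT interface.

```python
from collections import deque
from typing import Deque, Dict, List


class BfsVisitor:
    """A unit of work addressed to one vertex, carrying a candidate BFS level."""

    def __init__(self, vertex: int, level: int):
        self.vertex = vertex
        self.level = level

    def pre_visit(self, levels: Dict[int, int]) -> bool:
        """Record the level if it improves on what the vertex already holds."""
        if self.level < levels.get(self.vertex, float("inf")):
            levels[self.vertex] = self.level
            return True
        return False

    def visit(self, graph: Dict[int, List[int]], queue: Deque["BfsVisitor"]) -> None:
        """Send a new visitor to every neighbour of this vertex."""
        for neighbour in graph[self.vertex]:
            queue.append(BfsVisitor(neighbour, self.level + 1))


def bfs(graph: Dict[int, List[int]], source: int) -> Dict[int, int]:
    levels: Dict[int, int] = {}
    queue: Deque[BfsVisitor] = deque([BfsVisitor(source, 0)])
    while queue:
        visitor = queue.popleft()
        if visitor.pre_visit(levels):
            visitor.visit(graph, queue)
    return levels


graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
levels = bfs(graph, 0)
```

    Because each visitor touches only the state of its own vertex, visitors for different vertices could in principle be processed concurrently, which is the property such a model relies on.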

  19. Development and Evaluation of Vectorised and Multi-Core Event Reconstruction Algorithms within the CMS Software Framework

    NASA Astrophysics Data System (ADS)

    Hauth, T.; Innocente, V.; Piparo, D.

    2012-12-01

    The processing of data acquired by the CMS detector at LHC is carried out with an object-oriented C++ software framework: CMSSW. With the increasing luminosity delivered by the LHC, the treatment of recorded data requires extraordinarily large computing resources, also in terms of CPU usage. A possible solution to cope with this task is the exploitation of the features offered by the latest microprocessor architectures. Modern CPUs present several vector units, the capacity of which is growing steadily with the introduction of new processor generations. Moreover, an increasing number of cores per die is offered by the main vendors, even on consumer hardware. Most recent C++ compilers provide facilities to take advantage of such innovations, either by explicit statements in the program sources or by automatically adapting the generated machine instructions to the available hardware, without the need to modify the existing code base. Programming techniques to implement reconstruction algorithms and optimised data structures are presented that aim at scalable vectorization and parallelization of the calculations. One of their features is the usage of new language features of the C++11 standard. Portions of the CMSSW framework are illustrated which have been found to be especially profitable for the application of vectorization and multi-threading techniques. Specific utility components have been developed to help vectorization and parallelization. They can easily become part of a larger common library. To conclude, careful measurements are described, which show the execution speedups achieved via vectorised and multi-threaded code in the context of CMSSW.

  20. A constraint logic programming approach to associate 1D and 3D structural components for large protein complexes.

    PubMed

    Dal Palù, Alessandro; Pontelli, Enrico; He, Jing; Lu, Yonggang

    2007-01-01

    The paper describes a novel framework, constructed using Constraint Logic Programming (CLP) and parallelism, to determine the association between parts of the primary sequence of a protein and alpha-helices extracted from 3D low-resolution descriptions of large protein complexes. The association is determined by extracting constraints from the 3D information, regarding length, relative position and connectivity of helices, and solving these constraints with the guidance of a secondary structure prediction algorithm. Parallelism is employed to enhance performance on large proteins. The framework provides a fast, inexpensive alternative to determine the exact tertiary structure of unknown proteins.

  1. Implementation and performance of FDPS: a framework for developing parallel particle simulation codes

    NASA Astrophysics Data System (ADS)

    Iwasawa, Masaki; Tanikawa, Ataru; Hosono, Natsuki; Nitadori, Keigo; Muranushi, Takayuki; Makino, Junichiro

    2016-08-01

    We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle Simulators). FDPS is an application-development framework which helps researchers to develop simulation programs using particle methods for large-scale distributed-memory parallel supercomputers. A particle-based simulation program for distributed-memory parallel computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information in other nodes which are necessary for interaction calculation. Also, even if distributed-memory parallel computers are not used, in order to reduce the amount of computation, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used in the case of long-range interactions. For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions which are necessary for efficient parallel execution of particle-based simulations as "templates," which are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N^2) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale parallel supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 10^7) to 300 ms (N = 10^9). These are currently limited by the time for the calculation of the domain decomposition and communication necessary for the interaction calculation. We discuss how we can overcome these bottlenecks.
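
    For reference, the kind of "simple, sequential and unoptimized" direct-summation program the abstract alludes to might look like the sketch below. It is plain Python, not FDPS code. The gravitational constant is set to 1 and a softening length `eps` is included; both are assumptions made for illustration. The double loop over particle pairs is what gives the quadratic cost in the particle number N.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]


def accelerations(pos: List[Vec], mass: List[float], eps: float = 1e-3) -> List[Vec]:
    """Direct-summation gravitational accelerations (G = 1, softening eps)."""
    n = len(pos)
    acc: List[Vec] = []
    for i in range(n):
        ax = ay = az = 0.0
        for j in range(n):  # every particle interacts with every other one
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            dz = pos[j][2] - pos[i][2]
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            ax += mass[j] * dx * inv_r3
            ay += mass[j] * dy * inv_r3
            az += mass[j] * dz * inv_r3
        acc.append((ax, ay, az))
    return acc


# Two unit masses one unit apart attract each other with unit acceleration.
acc = accelerations([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0], eps=0.0)
```

    Tree methods such as Barnes-Hut replace the inner loop over all particles with a loop over a much shorter interaction list, which is the part a framework of this kind takes over from the user.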

  2. Partitioning in parallel processing of production systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oflazer, K.

    1987-01-01

    This thesis presents research on certain issues related to parallel processing of production systems. It first presents a parallel production system interpreter that has been implemented on a four-processor multiprocessor. This parallel interpreter is based on Forgy's OPS5 interpreter and exploits production-level parallelism in production systems. Runs on the multiprocessor system indicate that it is possible to obtain speed-up of around 1.7 in the match computation for certain production systems when productions are split into three sets that are processed in parallel. The next issue addressed is that of partitioning a set of rules to processors in a parallel interpreter with production-level parallelism, and the extent of additional improvement in performance. The partitioning problem is formulated and an algorithm for approximate solutions is presented. The thesis next presents a parallel processing scheme for OPS5 production systems that allows some redundancy in the match computation. This redundancy enables the processing of a production to be divided into units of medium granularity each of which can be processed in parallel. Subsequently, a parallel processor architecture for implementing the parallel processing algorithm is presented.

  3. Parallel processing considerations for image recognition tasks

    NASA Astrophysics Data System (ADS)

    Simske, Steven J.

    2011-01-01

    Many image recognition tasks are well-suited to parallel processing. The most obvious example is that many imaging tasks require the analysis of multiple images. From this standpoint, then, parallel processing need be no more complicated than assigning individual images to individual processors. However, there are three less trivial categories of parallel processing that will be considered in this paper: parallel processing (1) by task; (2) by image region; and (3) by meta-algorithm. Parallel processing by task allows the assignment of multiple workflows (as diverse as optical character recognition [OCR], document classification and barcode reading) to parallel pipelines. This can substantially decrease time to completion for the document tasks. For this approach, each parallel pipeline is generally performing a different task. Parallel processing by image region allows a larger imaging task to be sub-divided into a set of parallel pipelines, each performing the same task but on a different data set. This type of image analysis is readily addressed by a map-reduce approach. Examples include document skew detection and multiple face detection and tracking. Finally, parallel processing by meta-algorithm allows different algorithms to be deployed on the same image simultaneously. This approach may result in improved accuracy.
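
    Parallel processing by image region can be sketched as a small map-reduce over horizontal strips. The task chosen here (counting dark pixels in a grayscale image held as nested lists) and all function names are assumptions made for illustration; they are not examples from the paper.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Image = List[List[int]]  # grayscale pixel values, one inner list per row


def split_rows(image: Image, parts: int) -> List[Image]:
    """Split an image into at most `parts` horizontal strips."""
    step = max(1, -(-len(image) // parts))  # ceiling division, at least 1
    return [image[i:i + step] for i in range(0, len(image), step)]


def count_dark(strip: Image, threshold: int = 128) -> int:
    """Map step: the same task applied to one region."""
    return sum(1 for row in strip for pixel in row if pixel < threshold)


def dark_pixels(image: Image, parts: int = 2) -> int:
    """Run the map step on each strip in parallel, then reduce by summing."""
    strips = split_rows(image, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        return sum(pool.map(count_dark, strips))


image = [
    [0, 200, 50],
    [255, 255, 10],
    [130, 120, 128],
    [0, 0, 0],
]
total = dark_pixels(image)
```

    Tasks whose result depends on neighbouring pixels would need overlapping strips; a per-pixel count avoids that complication.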

  4. A Novel Design Framework for Structures/Materials with Enhanced Mechanical Performance

    PubMed Central

    Liu, Jie; Fan, Xiaonan; Wen, Guilin; Qing, Qixiang; Wang, Hongxin; Zhao, Gang

    2018-01-01

    Structure/material requires simultaneous consideration of both its design and manufacturing processes to dramatically enhance its manufacturability, assembly and maintainability. In this work, a novel design framework for structural/material with a desired mechanical performance and compelling topological design properties achieved using origami techniques is presented. The framework comprises four procedures, including topological design, unfold, reduction manufacturing, and fold. The topological design method, i.e., the solid isotropic material penalization (SIMP) method, serves to optimize the structure in order to achieve the preferred mechanical characteristics, and the origami technique is exploited to allow the structure to be rapidly and easily fabricated. Topological design and unfold procedures can be conveniently completed in a computer; then, reduction manufacturing, i.e., cutting, is performed to remove materials from the unfolded flat plate; the final structure is obtained by folding out the plate from the previous procedure. A series of cantilevers, consisting of origami parallel creases and Miura-ori (usually regarded as a metamaterial) and made of paperboard, are designed with the least weight and the required stiffness by using the proposed framework. The findings here furnish an alternative design framework for engineering structures that could be better than the 3D-printing technique, especially for large structures made of thin metal materials. PMID:29642555

  5. A Novel Design Framework for Structures/Materials with Enhanced Mechanical Performance.

    PubMed

    Liu, Jie; Fan, Xiaonan; Wen, Guilin; Qing, Qixiang; Wang, Hongxin; Zhao, Gang

    2018-04-09

    Structure/material requires simultaneous consideration of both its design and manufacturing processes to dramatically enhance its manufacturability, assembly and maintainability. In this work, a novel design framework for structural/material with a desired mechanical performance and compelling topological design properties achieved using origami techniques is presented. The framework comprises four procedures, including topological design, unfold, reduction manufacturing, and fold. The topological design method, i.e., the solid isotropic material penalization (SIMP) method, serves to optimize the structure in order to achieve the preferred mechanical characteristics, and the origami technique is exploited to allow the structure to be rapidly and easily fabricated. Topological design and unfold procedures can be conveniently completed in a computer; then, reduction manufacturing, i.e., cutting, is performed to remove materials from the unfolded flat plate; the final structure is obtained by folding out the plate from the previous procedure. A series of cantilevers, consisting of origami parallel creases and Miura-ori (usually regarded as a metamaterial) and made of paperboard, are designed with the least weight and the required stiffness by using the proposed framework. The findings here furnish an alternative design framework for engineering structures that could be better than the 3D-printing technique, especially for large structures made of thin metal materials.

  6. Photonic reservoir computing: a new approach to optical information processing

    NASA Astrophysics Data System (ADS)

    Vandoorne, Kristof; Fiers, Martin; Verstraeten, David; Schrauwen, Benjamin; Dambre, Joni; Bienstman, Peter

    2010-06-01

    Despite ever increasing computational power, recognition and classification problems remain challenging to solve. Recently, advances have been made by the introduction of the new concept of reservoir computing. This is a methodology coming from the field of machine learning and neural networks that has been successfully used in several pattern classification problems, like speech and image recognition. Thus far, most implementations have been in software, limiting their speed and power efficiency. Photonics could be an excellent platform for a hardware implementation of this concept because of its inherent parallelism and unique nonlinear behaviour. Moreover, a photonic implementation offers the promise of massively parallel information processing with low power and high speed. We propose using a network of coupled Semiconductor Optical Amplifiers (SOA) and show in simulation that it could be used as a reservoir by comparing it to conventional software implementations using a benchmark speech recognition task. In spite of the differences with classical reservoir models, the performance of our photonic reservoir is comparable to that of conventional implementations and sometimes slightly better. As our implementation uses coherent light for information processing, we find that phase tuning is crucial to obtain high performance. In parallel we investigate the use of a network of photonic crystal cavities. The coupled mode theory (CMT) is used to investigate these resonators. A new framework is designed to model networks of resonators and SOAs. The same network topologies are used, but feedback is added to control the internal dynamics of the system. By adjusting the readout weights of the network in a controlled manner, we can generate arbitrary periodic patterns.

  7. A real-time multi-scale 2D Gaussian filter based on FPGA

    NASA Astrophysics Data System (ADS)

    Luo, Haibo; Gai, Xingqin; Chang, Zheng; Hui, Bin

    2014-11-01

    Multi-scale 2-D Gaussian filters have been widely used in feature extraction (e.g. SIFT, edges, etc.), image segmentation, image enhancement, image noise removal, multi-scale shape description, etc. However, their computational complexity remains an issue for real-time image processing systems. To address this problem, we propose an FPGA-based framework for multi-scale 2-D Gaussian filtering in this paper. Firstly, a full-hardware architecture based on a parallel pipeline was designed to achieve a high throughput rate. Secondly, in order to save multipliers, the 2-D convolution is separated into two 1-D convolutions. Thirdly, a dedicated first-in-first-out memory named CAFIFO (Column Addressing FIFO) was designed to avoid the error propagation induced by sparks on the clock. Finally, a shared memory framework was designed to reduce memory costs. As a demonstration, we realized a 3-scale 2-D Gaussian filter on a single ALTERA Cyclone III FPGA chip. Experimental results show that the proposed framework can compute a multi-scale 2-D Gaussian filtering within one pixel clock period and is therefore suitable for real-time image processing. Moreover, the main principle can be extended to other convolution-based operators, such as the Gabor filter, the Sobel operator and so on.
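
    The multiplier saving mentioned in this abstract comes from the separability of the Gaussian kernel: a (2r+1) x (2r+1) 2-D kernel needs (2r+1)^2 multiplications per pixel, whereas two 1-D passes need 2(2r+1). A minimal software sketch of that identity follows; it says nothing about the FPGA pipeline or CAFIFO, and the function names and the clamped border handling are assumptions.

```python
import math


def gaussian_kernel_1d(sigma, radius):
    """Return a normalized 1-D Gaussian kernel with 2*radius + 1 taps."""
    taps = [math.exp(-(i * i) / (2.0 * sigma * sigma))
            for i in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]


def convolve_rows(image, kernel):
    """Filter every row with a 1-D kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_separable(image, sigma, radius):
    """Two 1-D passes: rows first, then columns (via a transpose)."""
    k = gaussian_kernel_1d(sigma, radius)
    rows_done = convolve_rows(image, k)
    return transpose(convolve_rows(transpose(rows_done), k))


def gaussian_full_2d(image, sigma, radius):
    """Reference: direct 2-D filtering with the outer-product kernel."""
    k = gaussian_kernel_1d(sigma, radius)
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j, wy in enumerate(k):
                yy = min(max(y + j - radius, 0), h - 1)
                for i, wx in enumerate(k):
                    xx = min(max(x + i - radius, 0), w - 1)
                    acc += wy * wx * image[yy][xx]
            out[y][x] = acc
    return out
```

    For radius 2 the direct form uses 25 multiplications per pixel and the separable form 10, and both functions return the same image up to floating-point rounding.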

  8. Modelling and analysis of the sugar cataract development process using stochastic hybrid systems.

    PubMed

    Riley, D; Koutsoukos, X; Riley, K

    2009-05-01

    Modelling and analysis of biochemical systems such as sugar cataract development (SCD) are critical because they can provide new insights into systems, which cannot be easily tested with experiments; however, they are challenging problems due to the highly coupled chemical reactions that are involved. The authors present a stochastic hybrid system (SHS) framework for modelling biochemical systems and demonstrate the approach for the SCD process. A novel feature of the framework is that it allows modelling the effect of drug treatment on the system dynamics. The authors validate the three sugar cataract models by comparing trajectories computed by two simulation algorithms. Further, the authors present a probabilistic verification method for computing the probability of sugar cataract formation for different chemical concentrations using safety and reachability analysis methods for SHSs. The verification method employs dynamic programming based on a discretisation of the state space and therefore suffers from the curse of dimensionality. To analyse the SCD process, a parallel dynamic programming implementation that can handle large, realistic systems was developed. Although scalability is a limiting factor, this work demonstrates that the proposed method is feasible for realistic biochemical systems.

  9. Substitute decision-making for adults with intellectual disabilities living in residential care: learning through experience.

    PubMed

    Dunn, Michael C; Clare, Isabel C H; Holland, Anthony J

    2008-03-01

    In the UK, current policies and services for people with mental disorders, including those with intellectual disabilities (ID), presume that these men and women can, do, and should, make decisions for themselves. The new Mental Capacity Act (England and Wales) 2005 (MCA) sets this presumption into statute, and codifies how decisions relating to health and welfare should be made for those adults judged unable to make one or more such decisions autonomously. The MCA uses a procedural checklist to guide this process of substitute decision-making. The personal experiences of providing direct support to seven men and women with ID living in residential care, however, showed that substitute decision-making took two forms, depending on the type of decision to be made. The first process, 'strategic substitute decision-making', paralleled the MCA's legal and ethical framework, whilst the second process, 'relational substitute decision-making', was markedly different from these statutory procedures. In this setting, 'relational substitute decision-making' underpinned everyday personal and social interventions connected with residents' daily living, and was situated within a framework of interpersonal and interdependent care relationships. The implications of these findings for residential services and the implementation of the MCA are discussed.

  10. A Framework for Understanding Community Colleges' Organizational Capacity for Data Use: A Convergent Parallel Mixed Methods Study

    ERIC Educational Resources Information Center

    Kerrigan, Monica Reid

    2014-01-01

    This convergent parallel design mixed methods case study of four community colleges explores the relationship between organizational capacity and implementation of data-driven decision making (DDDM). The article also illustrates purposive sampling using replication logic for cross-case analysis and the strengths and weaknesses of quantitizing…

  11. Accelerating population balance-Monte Carlo simulation for coagulation dynamics from the Markov jump model, stochastic algorithm and GPU parallel computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, Zuwei; Zhao, Haibo, E-mail: klinsmannzhb@163.com; Zheng, Chuguang

    2015-01-15

    This paper proposes a comprehensive framework for accelerating population balance-Monte Carlo (PBMC) simulation of particle coagulation dynamics. By combining a Markov jump model, a weighted majorant kernel and GPU (graphics processing unit) parallel computing, a significant gain in computational efficiency is achieved. The Markov jump model constructs a coagulation-rule matrix of differentially-weighted simulation particles, so as to capture the time evolution of particle size distribution with low statistical noise over the full size range and as far as possible to reduce the number of time loopings. Here three coagulation rules are highlighted and it is found that constructing an appropriate coagulation rule provides a route to attain the compromise between accuracy and cost of PBMC methods. Further, in order to avoid double looping over all simulation particles when considering the two-particle events (typically, particle coagulation), the weighted majorant kernel is introduced to estimate the maximum coagulation rates being used for acceptance–rejection processes by single-looping over all particles, and meanwhile the mean time-step of coagulation event is estimated by summing the coagulation kernels of rejected and accepted particle pairs. The computational load of these fast differentially-weighted PBMC simulations (based on the Markov jump model) is reduced greatly to be proportional to the number of simulation particles in a zero-dimensional system (single cell). Finally, for a spatially inhomogeneous multi-dimensional (multi-cell) simulation, the proposed fast PBMC is performed in each cell, and multiple cells are processed in parallel by multiple cores on a GPU that can implement the massively threaded data-parallel tasks to obtain a remarkable speedup ratio (compared with CPU computation, the speedup ratio of GPU parallel computing is as high as 200 in a case of 100 cells with 10 000 simulation particles per cell). 
These acceleration approaches for PBMC are demonstrated in a physically realistic Brownian coagulation case. The computational accuracy is validated with the benchmark solution of the discrete-sectional method. The simulation results show that the comprehensive approach can attain very favorable improvement in cost without sacrificing computational accuracy.

  12. Cerebellarlike corrective model inference engine for manipulation tasks.

    PubMed

    Luque, Niceto Rafael; Garrido, Jesús Alberto; Carrillo, Richard Rafael; Coenen, Olivier J-M D; Ros, Eduardo

    2011-10-01

    This paper presents how a simple cerebellumlike architecture can infer corrective models in the framework of a control task when manipulating objects that significantly affect the dynamics model of the system. The main motivation of this paper is to evaluate a simplified bio-mimetic approach in the framework of a manipulation task. More concretely, the paper focuses on how the model inference process takes place within a feedforward control loop based on the cerebellar structure and on how these internal models are built up by means of biologically plausible synaptic adaptation mechanisms. This kind of investigation may provide clues on how biology achieves accurate control of non-stiff-joint robot with low-power actuators which involve controlling systems with high inertial components. This paper studies how a basic temporal-correlation kernel including long-term depression (LTD) and a constant long-term potentiation (LTP) at parallel fiber-Purkinje cell synapses can effectively infer corrective models. We evaluate how this spike-timing-dependent plasticity correlates sensorimotor activity arriving through the parallel fibers with teaching signals (dependent on error estimates) arriving through the climbing fibers from the inferior olive. This paper addresses the study of how these LTD and LTP components need to be well balanced with each other to achieve accurate learning. This is of interest to evaluate the relevant role of homeostatic mechanisms in biological systems where adaptation occurs in a distributed manner. Furthermore, we illustrate how the temporal-correlation kernel can also work in the presence of transmission delays in sensorimotor pathways. We use a cerebellumlike spiking neural network which stores the corrective models as well-structured weight patterns distributed among the parallel fibers to Purkinje cell connections.

  13. An Advanced Framework for Improving Situational Awareness in Electric Power Grid Operation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Yousu; Huang, Zhenyu; Zhou, Ning

    With the deployment of new smart grid technologies and the penetration of renewable energy in power systems, significant uncertainty and variability are being introduced into power grid operation. Traditionally, the Energy Management System (EMS) operates the power grid in a deterministic mode, and thus will not be sufficient for the future control center in a stochastic environment with faster dynamics. One of the main challenges is to improve situational awareness. This paper reviews the current status of power grid operation and presents a vision of improving wide-area situational awareness for a future control center. An advanced framework, consisting of parallel state estimation, state prediction, parallel contingency selection, parallel contingency analysis, and advanced visual analytics, is proposed to provide capabilities needed for better decision support by utilizing high performance computing (HPC) techniques and advanced visual analytic techniques. Research results are presented to support the proposed vision and framework.

  14. Implementation science: a role for parallel dual processing models of reasoning?

    PubMed Central

    Sladek, Ruth M; Phillips, Paddy A; Bond, Malcolm J

    2006-01-01

    Background: A better theoretical base for understanding professional behaviour change is needed to support evidence-based changes in medical practice. Traditionally, strategies to encourage changes in clinical practices have been guided empirically, without explicit consideration of underlying theoretical rationales for such strategies. This paper considers a theoretical framework for reasoning from within psychology for identifying individual differences in cognitive processing between doctors that could moderate the decision to incorporate new evidence into their clinical decision-making. Discussion: Parallel dual processing models of reasoning posit two cognitive modes of information processing that are in constant operation as humans reason. One mode has been described as experiential, fast and heuristic; the other as rational, conscious and rule based. Within such models, the uptake of new research evidence can be represented by the latter mode; it is reflective, explicit and intentional. On the other hand, well practiced clinical judgments can be positioned in the experiential mode, being automatic, reflexive and swift. Research suggests that individual differences between people in both cognitive capacity (e.g., intelligence) and cognitive processing (e.g., thinking styles) influence how both reasoning modes interact. This being so, it is proposed that these same differences between doctors may moderate the uptake of new research evidence. Such dispositional characteristics have largely been ignored in research investigating effective strategies in implementing research evidence. Whilst medical decision-making occurs in a complex social environment with multiple influences and decision makers, it remains true that an individual doctor's judgment still retains a key position in terms of diagnostic and treatment decisions for individual patients. 
This paper therefore argues that individual differences between doctors in terms of reasoning are important considerations in any discussion relating to changing clinical practice. Summary: It is imperative that change strategies in healthcare consider relevant theoretical frameworks from other disciplines such as psychology. Generic dual processing models of reasoning are proposed as potentially useful in identifying factors within doctors that may moderate their individual uptake of evidence into clinical decision-making. Such factors can then inform strategies to change practice. PMID:16725023

  15. Implementation science: a role for parallel dual processing models of reasoning?

    PubMed

    Sladek, Ruth M; Phillips, Paddy A; Bond, Malcolm J

    2006-05-25

    A better theoretical base for understanding professional behaviour change is needed to support evidence-based changes in medical practice. Traditionally strategies to encourage changes in clinical practices have been guided empirically, without explicit consideration of underlying theoretical rationales for such strategies. This paper considers a theoretical framework for reasoning from within psychology for identifying individual differences in cognitive processing between doctors that could moderate the decision to incorporate new evidence into their clinical decision-making. Parallel dual processing models of reasoning posit two cognitive modes of information processing that are in constant operation as humans reason. One mode has been described as experiential, fast and heuristic; the other as rational, conscious and rule based. Within such models, the uptake of new research evidence can be represented by the latter mode; it is reflective, explicit and intentional. On the other hand, well practiced clinical judgments can be positioned in the experiential mode, being automatic, reflexive and swift. Research suggests that individual differences between people in both cognitive capacity (e.g., intelligence) and cognitive processing (e.g., thinking styles) influence how both reasoning modes interact. This being so, it is proposed that these same differences between doctors may moderate the uptake of new research evidence. Such dispositional characteristics have largely been ignored in research investigating effective strategies in implementing research evidence. Whilst medical decision-making occurs in a complex social environment with multiple influences and decision makers, it remains true that an individual doctor's judgment still retains a key position in terms of diagnostic and treatment decisions for individual patients. 
This paper therefore argues that individual differences between doctors in terms of reasoning are important considerations in any discussion relating to changing clinical practice. It is imperative that change strategies in healthcare consider relevant theoretical frameworks from other disciplines such as psychology. Generic dual processing models of reasoning are proposed as potentially useful in identifying factors within doctors that may moderate their individual uptake of evidence into clinical decision-making. Such factors can then inform strategies to change practice.

  16. Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Kyungjoo; Rajamanickam, Sivasankaran; Stelle, George Widgery

    We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-by-blocks approach induces a task graph for the factorization. These tasks are inter-related to each other through their data dependences in the factorization algorithm. To process the tasks on various manycore architectures in a portable manner, we also present a portable tasking API that incorporates different tasking backends and device-specific features using an open-source framework for manycore platforms, i.e., Kokkos. A performance evaluation is presented on both Intel Sandy Bridge and Xeon Phi platforms for matrices from the University of Florida sparse matrix collection to illustrate the merits of the proposed task-based factorization. Experimental results demonstrate that our task-parallel implementation delivers about 26.6x speedup (geometric mean) over single-threaded incomplete Cholesky-by-blocks and 19.2x speedup over serial Cholesky performance, which does not carry tasking overhead, using 56 threads on the Intel Xeon Phi processor for sparse matrices arising from various application problems.

  17. An integrated geophysical and geological study of the tectonic framework of the 38th Parallel Lineament in the vicinity of its intersection with the extension of the New Madrid Fault Zone. Geotechnical report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Braile, L.W.; Hinze, J.H.; Keller, G.R.

    1978-09-01

    Extensive gravity and aeromagnetic surveys have been conducted in critical areas of Kentucky, Illinois, and Indiana centering around the intersection of the 38th Parallel Lineament and the extension of the New Madrid Fault Zone. Available aeromagnetic maps have been digitized and these data have been processed by a suite of computer programs developed for this purpose. Seismic equipment has been prepared for crustal seismic studies and a 150 km long seismic refraction line has been observed along the Wabash River Valley Fault System. Preliminary basement rock and configuration maps have been prepared based on studies of the samples derived from basement drill holes. Interpretation of these data is only at a preliminary stage, but studies to this date indicate that the 38th Parallel Lineament features extend as far north as 39 degrees N and a subtle northeasterly-striking magnetic and gravity anomaly cuts across Indiana from the southwest corner of the state, roughly on strike with the New Madrid Seismic Zone.

  18. A matrix-algebraic formulation of distributed-memory maximal cardinality matching algorithms in bipartite graphs

    DOE PAGES

    Azad, Ariful; Buluç, Aydın

    2016-05-16

    We describe parallel algorithms for computing maximal cardinality matching in a bipartite graph on distributed-memory systems. Unlike traditional algorithms that match one vertex at a time, our algorithms process many unmatched vertices simultaneously using a matrix-algebraic formulation of maximal matching. This generic matrix-algebraic framework is used to develop three efficient maximal matching algorithms with minimal changes. The newly developed algorithms have two benefits over existing graph-based algorithms. First, unlike existing parallel algorithms, cardinality of matching obtained by the new algorithms stays constant with increasing processor counts, which is important for predictable and reproducible performance. Second, relying on bulk-synchronous matrix operations, these algorithms expose a higher degree of parallelism on distributed-memory platforms than existing graph-based algorithms. We report high-performance implementations of three maximal matching algorithms using hybrid OpenMP-MPI and evaluate the performance of these algorithms using more than 35 real and randomly generated graphs. On real instances, our algorithms achieve up to 200× speedup on 2048 cores of a Cray XC30 supercomputer. Even higher speedups are obtained on larger synthetically generated graphs where our algorithms show good scaling on up to 16,384 cores.
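
    The contrast drawn here, between matching one vertex at a time and processing many unmatched vertices per step, can be illustrated with a small round-based sketch. This is a generic greedy scheme written for illustration, not one of the three algorithms from the paper; the adjacency-list input, the tie-breaking rules and the function name are assumptions. In matrix terms, the proposal step plays the role of one bulk operation over all unmatched rows.

```python
def maximal_matching(adj, n_cols):
    """Round-based maximal matching in a bipartite graph.

    adj[r] lists the column vertices adjacent to row vertex r.
    Returns (row_mate, col_mate), with -1 marking unmatched vertices.
    """
    row_mate = [-1] * len(adj)
    col_mate = [-1] * n_cols
    while True:
        # Step 1: every unmatched row proposes to its lowest-numbered free column.
        # All proposals in a round are independent, so they could run in parallel.
        proposals = {}
        for r, neighbors in enumerate(adj):
            if row_mate[r] != -1:
                continue
            free = [c for c in neighbors if col_mate[c] == -1]
            if free:
                c = min(free)
                # Step 2: a column with several suitors keeps the lowest row index.
                proposals[c] = min(proposals.get(c, r), r)
        if not proposals:
            break  # no unmatched row has a free neighbor: the matching is maximal
        # Step 3: commit the winning proposals of this round.
        for c, r in proposals.items():
            row_mate[r] = c
            col_mate[c] = r
    return row_mate, col_mate
```

    Because the tie-breaking is deterministic, the result does not depend on how the per-round work would be split among workers, which loosely mirrors the reproducibility point made in the abstract.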

  19. FPGA-Based High-Performance Embedded Systems for Adaptive Edge Computing in Cyber-Physical Systems: The ARTICo³ Framework.

    PubMed

    Rodríguez, Alfonso; Valverde, Juan; Portilla, Jorge; Otero, Andrés; Riesgo, Teresa; de la Torre, Eduardo

    2018-06-08

    Cyber-Physical Systems are experiencing a paradigm shift in which processing has been relocated to the distributed sensing layer and is no longer performed in a centralized manner. This approach, usually referred to as Edge Computing, demands the use of hardware platforms that are able to manage the steadily increasing requirements in computing performance, while keeping energy efficiency and the adaptability imposed by the interaction with the physical world. In this context, SRAM-based FPGAs and their inherent run-time reconfigurability, when coupled with smart power management strategies, are a suitable solution. However, they usually fail in user accessibility and ease of development. In this paper, an integrated framework to develop FPGA-based high-performance embedded systems for Edge Computing in Cyber-Physical Systems is presented. This framework provides a hardware-based processing architecture, an automated toolchain, and a runtime to transparently generate and manage reconfigurable systems from high-level system descriptions without additional user intervention. Moreover, it provides users with support for dynamically adapting the available computing resources to switch the working point of the architecture in a solution space defined by computing performance, energy consumption and fault tolerance. Results show that it is indeed possible to explore this solution space at run time and prove that the proposed framework is a competitive alternative to software-based edge computing platforms, being able to provide not only faster solutions, but also higher energy efficiency for computing-intensive algorithms with significant levels of data-level parallelism.

  20. COBRApy: COnstraints-Based Reconstruction and Analysis for Python.

    PubMed

    Ebrahim, Ali; Lerman, Joshua A; Palsson, Bernhard O; Hyduke, Daniel R

    2013-08-08

    COnstraint-Based Reconstruction and Analysis (COBRA) methods are widely used for genome-scale modeling of metabolic networks in both prokaryotes and eukaryotes. Due to the successes with metabolism, there is an increasing effort to apply COBRA methods to reconstruct and analyze integrated models of cellular processes. The COBRA Toolbox for MATLAB is a leading software package for genome-scale analysis of metabolism; however, it was not designed to elegantly capture the complexity inherent in integrated biological networks and lacks an integration framework for the multiomics data used in systems biology. The openCOBRA Project is a community effort to promote constraints-based research through the distribution of freely available software. Here, we describe COBRA for Python (COBRApy), a Python package that provides support for basic COBRA methods. COBRApy is designed in an object-oriented fashion that facilitates the representation of the complex biological processes of metabolism and gene expression. COBRApy does not require MATLAB to function; however, it includes an interface to the COBRA Toolbox for MATLAB to facilitate use of legacy codes. For improved performance, COBRApy includes parallel processing support for computationally intensive processes. COBRApy is an object-oriented framework designed to meet the computational challenges associated with the next generation of stoichiometric constraint-based models and high-density omics data sets. http://opencobra.sourceforge.net/

  1. Password Cracking Using Sony Playstations

    NASA Astrophysics Data System (ADS)

    Kleinhans, Hugo; Butts, Jonathan; Shenoi, Sujeet

    Law enforcement agencies frequently encounter encrypted digital evidence for which the cryptographic keys are unknown or unavailable. Password cracking - whether it employs brute force or sophisticated cryptanalytic techniques - requires massive computational resources. This paper evaluates the benefits of using the Sony PlayStation 3 (PS3) to crack passwords. The PS3 offers massive computational power at relatively low cost. Moreover, multiple PS3 systems can be introduced easily to expand parallel processing when additional power is needed. This paper also describes a distributed framework designed to enable law enforcement agents to crack encrypted archives and applications in an efficient and cost-effective manner.

  2. Enterprise Imaging Governance: HIMSS-SIIM Collaborative White Paper.

    PubMed

    Roth, Christopher J; Lannum, Louis M; Joseph, Carol L

    2016-10-01

    Enterprise imaging governance is an emerging need in health enterprises today. This white paper highlights the decision-making body, framework, and process for optimal enterprise imaging governance inclusive of five areas of focus: program governance, technology governance, information governance, clinical governance, and financial governance. It outlines relevant parallels and differences when forming or optimizing imaging governance as compared with other established broad horizontal governance groups, such as for the electronic health record. It is intended for CMIOs and health informatics leaders looking to grow and govern a program to optimally capture, store, index, distribute, view, exchange, and analyze the images of their enterprise.

  3. Thermodynamic Model of Spatial Memory

    NASA Astrophysics Data System (ADS)

    Kaufman, Miron; Allen, P.

    1998-03-01

    We develop and test a thermodynamic model of spatial memory. Our model is an application of statistical thermodynamics to cognitive science. It is related to applications of the statistical mechanics framework in parallel distributed processes research. Our macroscopic model allows us to evaluate an entropy associated with spatial memory tasks. We find that older adults exhibit higher levels of entropy than younger adults. Thurstone's Law of Categorical Judgment, according to which the discriminal processes along the psychological continuum produced by presentations of a single stimulus are normally distributed, is explained by using a Hooke spring model of spatial memory. We have also analyzed a nonlinear modification of the ideal spring model of spatial memory. This work is supported by NIH/NIA grant AG09282-06.

  4. Cheetah: A Framework for Scalable Hierarchical Collective Operations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Graham, Richard L; Gorentla Venkata, Manjunath; Ladd, Joshua S

    2011-01-01

    Collective communication operations, used by many scientific applications, tend to limit overall parallel application performance and scalability. Computer systems are becoming more heterogeneous with increasing node and core-per-node counts. Also, a growing number of data-access mechanisms, of varying characteristics, are supported within a single computer system. We describe a new hierarchical collective communication framework that takes advantage of hardware-specific data-access mechanisms. It is flexible, with run-time hierarchy specification, and sharing of collective communication primitives between collective algorithms. Data buffers are shared between levels in the hierarchy, reducing collective communication management overhead. We have implemented several versions of the Message Passing Interface (MPI) collective operations, MPI_Barrier() and MPI_Bcast(), and run experiments using up to 49,152 processes on a Cray XT5, and a small InfiniBand based cluster. At 49,152 processes our barrier implementation outperforms the optimized native implementation by 75%. 32-byte and one-megabyte broadcasts outperform it by 62% and 11%, respectively, with better scalability characteristics. Improvements relative to the default Open MPI implementation are much larger.

  5. Using a source-to-source transformation to introduce multi-threading into the AliRoot framework for a parallel event reconstruction

    NASA Astrophysics Data System (ADS)

    Lohn, Stefan B.; Dong, Xin; Carminati, Federico

    2012-12-01

    Chip multiprocessors are going to support massive parallelism through many additional physical and logical cores. Performance can no longer be improved by increasing clock frequency because the technical limits have almost been reached. Instead, parallel execution must be used to gain performance. Resources like main memory, the cache hierarchy, the bandwidth of the memory bus or the links between cores and sockets are not going to improve as fast. Hence, parallelism can only result in performance gains if memory usage is optimized and communication between threads is minimized. Besides, concurrent programming has become a domain for experts: implementing multi-threading is error-prone and labor-intensive. A full reimplementation of the whole AliRoot source code is unaffordable. This paper describes the effort to evaluate the adaptation of AliRoot to the needs of multi-threading and to provide the capability of parallel processing by using a semi-automatic source-to-source transformation, both to address the problems described above and to provide a straightforward way of parallelization with almost no interference between threads. This makes the approach simple and reduces the required manual changes in the code. In a first step, unconditional thread-safety will be introduced to bring the original sequential and thread-unaware source code into a position to utilize multi-threading. Afterwards, further investigations have to be performed to point out candidate classes that are useful to share amongst threads. Then, in a second step, the transformation has to change the code to share these classes and finally to verify whether any invalid interferences between threads remain.

  6. Parameter-induced uncertainty quantification of crop yields, soil N2O and CO2 emission for 8 arable sites across Europe using the LandscapeDNDC model

    NASA Astrophysics Data System (ADS)

    Santabarbara, Ignacio; Haas, Edwin; Kraus, David; Herrera, Saul; Klatt, Steffen; Kiese, Ralf

    2014-05-01

    When using biogeochemical models to estimate greenhouse gas emissions at site to regional/national levels, the assessment and quantification of the uncertainties of simulation results are of significant importance. The uncertainties in simulation results of process-based ecosystem models may result from uncertainties of the process parameters that describe the processes of the model, model structure inadequacy as well as uncertainties in the observations. Data for development and testing of the uncertainty analysis were crop yield observations and measurements of soil fluxes of nitrous oxide (N2O) and carbon dioxide (CO2) from 8 arable sites across Europe. Using the process-based biogeochemical model LandscapeDNDC for simulating crop yields, N2O and CO2 emissions, our aim is to assess the simulation uncertainty by setting up a Bayesian framework based on the Metropolis-Hastings algorithm. Using Gelman statistics as convergence criteria together with parallel computing techniques, we enable multiple Markov chains to run independently in parallel and create a random walk to estimate the joint model parameter distribution. Through this distribution we limit the parameter space, obtain probabilities of parameter values and find the complex dependencies among them. With this parameter distribution that determines soil-atmosphere C and N exchange, we are able to obtain the parameter-induced uncertainty of simulation results and compare them with the measurement data.
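
    The workflow described here (several independent Metropolis-Hastings chains, checked for convergence before their samples are pooled) can be sketched as follows. Everything specific is an assumption: the one-parameter standard-normal target stands in for the LandscapeDNDC posterior, the "Gelman statistics" are taken to mean the Gelman-Rubin potential scale reduction factor, and the chains run in a plain loop although, being independent, they could equally be handed to separate workers.

```python
import math
import random
import statistics


def log_posterior(theta):
    """Toy target (assumption): standard normal log-density up to a constant."""
    return -0.5 * theta * theta


def metropolis_hastings(log_target, start, n_steps, step_size, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    chain = []
    current = start
    current_lp = log_target(current)
    for _ in range(n_steps):
        candidate = current + rng.gauss(0.0, step_size)
        candidate_lp = log_target(candidate)
        # Accept with probability min(1, p(candidate) / p(current)).
        accept_prob = math.exp(min(0.0, candidate_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = candidate, candidate_lp
        chain.append(current)
    return chain


def run_chains(n_chains=4, n_steps=4000, burn_in=1000, seed=0):
    """Run independent chains from over-dispersed starts; drop the burn-in."""
    chains = []
    for k in range(n_chains):
        rng = random.Random(seed + k)  # one private generator per chain
        start = rng.uniform(-5.0, 5.0)
        chain = metropolis_hastings(log_posterior, start, n_steps, 1.0, rng)
        chains.append(chain[burn_in:])
    return chains


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for equally long chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled_var = (n - 1) / n * within + between / n
    return math.sqrt(pooled_var / within)
```

    An R-hat close to 1 suggests that the chains agree with each other; only then would the pooled samples be read as an estimate of the parameter distribution.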

  7. COMP Superscalar, an interoperable programming framework

    NASA Astrophysics Data System (ADS)

    Badia, Rosa M.; Conejero, Javier; Diaz, Carlos; Ejarque, Jorge; Lezzi, Daniele; Lordan, Francesc; Ramon-Cortes, Cristian; Sirvent, Raul

    2015-12-01

    COMPSs is a programming framework that aims to facilitate the parallelization of existing applications written in Java, C/C++ and Python scripts. For that purpose, it offers a simple programming model based on sequential development in which the user is mainly responsible for (i) identifying the functions to be executed as asynchronous parallel tasks and (ii) annotating them with annotations or standard Python decorators. A runtime system is in charge of exploiting the inherent concurrency of the code, automatically detecting and enforcing the data dependencies between tasks and spawning these tasks to the available resources, which can be nodes in a cluster, clouds or grids. In cloud environments, COMPSs provides scalability and elasticity features allowing the dynamic provision of resources.
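
    The programming model described here (sequential code in which decorated functions become asynchronous tasks whose data dependencies are tracked by a runtime) can be illustrated with a small sketch built on the Python standard library. The `task` decorator below is a hypothetical stand-in written for this example; it is not the COMPSs API, and it runs on local threads rather than on cluster, cloud, or grid resources.

```python
import functools
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical local "runtime": a small thread pool standing in for remote resources.
_pool = ThreadPoolExecutor(max_workers=4)


def task(func):
    """Hypothetical decorator: calling the function schedules it and returns a Future."""
    @functools.wraps(func)
    def submit(*args):
        def run():
            # Waiting on Future arguments enforces the data dependencies:
            # a task starts computing only once its inputs have been produced.
            resolved = [a.result() if isinstance(a, Future) else a for a in args]
            return func(*resolved)
        return _pool.submit(run)
    return submit


@task
def square(x):
    return x * x


@task
def add(a, b):
    return a + b


def sum_of_squares(values):
    """Reads like sequential code; every call below is an asynchronous task."""
    total = square(values[0])
    for v in values[1:]:
        total = add(total, square(v))
    return total.result()  # synchronize only when the value is needed


if __name__ == "__main__":
    print(sum_of_squares([1, 2, 3, 4]))
```

    Because a task can only receive Futures that were created before it was submitted, and the pool starts tasks in submission order, a waiting task always depends on work that is already running or finished.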

  8. Polyphony: A Workflow Orchestration Framework for Cloud Computing

    NASA Technical Reports Server (NTRS)

    Shams, Khawaja S.; Powell, Mark W.; Crockett, Tom M.; Norris, Jeffrey S.; Rossi, Ryan; Soderstrom, Tom

    2010-01-01

    Cloud Computing has delivered unprecedented compute capacity to NASA missions at affordable rates. Missions like the Mars Exploration Rovers (MER) and Mars Science Lab (MSL) are enjoying the elasticity that enables them to leverage hundreds, if not thousands, of machines for short durations without making any hardware procurements. In this paper, we describe Polyphony, a resilient, scalable, and modular framework that efficiently leverages a large set of computing resources to perform parallel computations. Polyphony can employ resources on the cloud, excess capacity on local machines, as well as spare resources on the supercomputing center, and it enables these resources to work in concert to accomplish a common goal. Polyphony is resilient to node failures, even if they occur in the middle of a transaction. We will conclude with an evaluation of a production-ready application built on top of Polyphony to perform image-processing operations of images from around the solar system, including Mars, Saturn, and Titan.

  9. A framework for human microbiome research

    PubMed Central

    Methé, Barbara A.; Nelson, Karen E.; Pop, Mihai; Creasy, Heather H.; Giglio, Michelle G.; Huttenhower, Curtis; Gevers, Dirk; Petrosino, Joseph F.; Abubucker, Sahar; Badger, Jonathan H.; Chinwalla, Asif T.; Earl, Ashlee M.; FitzGerald, Michael G.; Fulton, Robert S.; Hallsworth-Pepin, Kymberlie; Lobos, Elizabeth A.; Madupu, Ramana; Magrini, Vincent; Martin, John C.; Mitreva, Makedonka; Muzny, Donna M.; Sodergren, Erica J.; Versalovic, James; Wollam, Aye M.; Worley, Kim C.; Wortman, Jennifer R.; Young, Sarah K.; Zeng, Qiandong; Aagaard, Kjersti M.; Abolude, Olukemi O.; Allen-Vercoe, Emma; Alm, Eric J.; Alvarado, Lucia; Andersen, Gary L.; Anderson, Scott; Appelbaum, Elizabeth; Arachchi, Harindra M.; Armitage, Gary; Arze, Cesar A.; Ayvaz, Tulin; Baker, Carl C.; Begg, Lisa; Belachew, Tsegahiwot; Bhonagiri, Veena; Bihan, Monika; Blaser, Martin J.; Bloom, Toby; Vivien Bonazzi, J.; Brooks, Paul; Buck, Gregory A.; Buhay, Christian J.; Busam, Dana A.; Campbell, Joseph L.; Canon, Shane R.; Cantarel, Brandi L.; Chain, Patrick S.; Chen, I-Min A.; Chen, Lei; Chhibba, Shaila; Chu, Ken; Ciulla, Dawn M.; Clemente, Jose C.; Clifton, Sandra W.; Conlan, Sean; Crabtree, Jonathan; Cutting, Mary A.; Davidovics, Noam J.; Davis, Catherine C.; DeSantis, Todd Z.; Deal, Carolyn; Delehaunty, Kimberley D.; Dewhirst, Floyd E.; Deych, Elena; Ding, Yan; Dooling, David J.; Dugan, Shannon P.; Dunne, Wm. Michael; Durkin, A. 
Scott; Edgar, Robert C.; Erlich, Rachel L.; Farmer, Candace N.; Farrell, Ruth M.; Faust, Karoline; Feldgarden, Michael; Felix, Victor M.; Fisher, Sheila; Fodor, Anthony A.; Forney, Larry; Foster, Leslie; Di Francesco, Valentina; Friedman, Jonathan; Friedrich, Dennis C.; Fronick, Catrina C.; Fulton, Lucinda L.; Gao, Hongyu; Garcia, Nathalia; Giannoukos, Georgia; Giblin, Christina; Giovanni, Maria Y.; Goldberg, Jonathan M.; Goll, Johannes; Gonzalez, Antonio; Griggs, Allison; Gujja, Sharvari; Haas, Brian J.; Hamilton, Holli A.; Harris, Emily L.; Hepburn, Theresa A.; Herter, Brandi; Hoffmann, Diane E.; Holder, Michael E.; Howarth, Clinton; Huang, Katherine H.; Huse, Susan M.; Izard, Jacques; Jansson, Janet K.; Jiang, Huaiyang; Jordan, Catherine; Joshi, Vandita; Katancik, James A.; Keitel, Wendy A.; Kelley, Scott T.; Kells, Cristyn; Kinder-Haake, Susan; King, Nicholas B.; Knight, Rob; Knights, Dan; Kong, Heidi H.; Koren, Omry; Koren, Sergey; Kota, Karthik C.; Kovar, Christie L.; Kyrpides, Nikos C.; La Rosa, Patricio S.; Lee, Sandra L.; Lemon, Katherine P.; Lennon, Niall; Lewis, Cecil M.; Lewis, Lora; Ley, Ruth E.; Li, Kelvin; Liolios, Konstantinos; Liu, Bo; Liu, Yue; Lo, Chien-Chi; Lozupone, Catherine A.; Lunsford, R. 
Dwayne; Madden, Tessa; Mahurkar, Anup A.; Mannon, Peter J.; Mardis, Elaine R.; Markowitz, Victor M.; Mavrommatis, Konstantinos; McCorrison, Jamison M.; McDonald, Daniel; McEwen, Jean; McGuire, Amy L.; McInnes, Pamela; Mehta, Teena; Mihindukulasuriya, Kathie A.; Miller, Jason R.; Minx, Patrick J.; Newsham, Irene; Nusbaum, Chad; O’Laughlin, Michelle; Orvis, Joshua; Pagani, Ioanna; Palaniappan, Krishna; Patel, Shital M.; Pearson, Matthew; Peterson, Jane; Podar, Mircea; Pohl, Craig; Pollard, Katherine S.; Priest, Margaret E.; Proctor, Lita M.; Qin, Xiang; Raes, Jeroen; Ravel, Jacques; Reid, Jeffrey G.; Rho, Mina; Rhodes, Rosamond; Riehle, Kevin P.; Rivera, Maria C.; Rodriguez-Mueller, Beltran; Rogers, Yu-Hui; Ross, Matthew C.; Russ, Carsten; Sanka, Ravi K.; Pamela Sankar, J.; Sathirapongsasuti, Fah; Schloss, Jeffery A.; Schloss, Patrick D.; Schmidt, Thomas M.; Scholz, Matthew; Schriml, Lynn; Schubert, Alyxandria M.; Segata, Nicola; Segre, Julia A.; Shannon, William D.; Sharp, Richard R.; Sharpton, Thomas J.; Shenoy, Narmada; Sheth, Nihar U.; Simone, Gina A.; Singh, Indresh; Smillie, Chris S.; Sobel, Jack D.; Sommer, Daniel D.; Spicer, Paul; Sutton, Granger G.; Sykes, Sean M.; Tabbaa, Diana G.; Thiagarajan, Mathangi; Tomlinson, Chad M.; Torralba, Manolito; Treangen, Todd J.; Truty, Rebecca M.; Vishnivetskaya, Tatiana A.; Walker, Jason; Wang, Lu; Wang, Zhengyuan; Ward, Doyle V.; Warren, Wesley; Watson, Mark A.; Wellington, Christopher; Wetterstrand, Kris A.; White, James R.; Wilczek-Boney, Katarzyna; Wu, Yuan Qing; Wylie, Kristine M.; Wylie, Todd; Yandava, Chandri; Ye, Liang; Ye, Yuzhen; Yooseph, Shibu; Youmans, Bonnie P.; Zhang, Lan; Zhou, Yanjiao; Zhu, Yiming; Zoloth, Laurie; Zucker, Jeremy D.; Birren, Bruce W.; Gibbs, Richard A.; Highlander, Sarah K.; Weinstock, George M.; Wilson, Richard K.; White, Owen

    2012-01-01

    A variety of microbial communities and their genes (microbiome) exist throughout the human body, playing fundamental roles in human health and disease. The NIH-funded Human Microbiome Project (HMP) Consortium has established a population-scale framework which catalyzed significant development of metagenomic protocols, resulting in a broad range of quality-controlled resources and data, including standardized methods for creating, processing and interpreting distinct types of high-throughput metagenomic data available to the scientific community. Here we present resources from a population of 242 healthy adults sampled at 15 to 18 body sites up to three times, which, to date, have generated 5,177 microbial taxonomic profiles from 16S rRNA genes and over 3.5 Tb of metagenomic sequence. In parallel, approximately 800 human-associated reference genomes have been sequenced. Collectively, these data represent the largest resource to date describing the abundance and variety of the human microbiome, while providing a platform for current and future studies. PMID:22699610

  10. Development of the Internet-Based Customer-Oriented Ordering System Framework for Complicated Mechanical Product

    NASA Astrophysics Data System (ADS)

    Ong, Mingwei; Watanuki, Keiichi

    Recently, as consumers increasingly prefer products that reflect their own personality, some consumers wish to be involved in the product design process. In parallel with the popularization of e-business, many manufacturers have used the Internet to promote their products, and some have even built websites that let consumers select their desired product specifications. Nevertheless, this method has not been applied to complicated mechanical products, because such products have a large number of interrelated specifications. In such cases, ordinary consumers who lack design knowledge are not capable of determining these specifications. In this paper, a prototype framework called the Internet-based consumer-oriented product ordering system has been developed, which gives ordinary consumers considerable freedom in determining complicated mechanical product specifications while ensuring that the determined product is feasible to manufacture.

  11. Computational Models of Anterior Cingulate Cortex: At the Crossroads between Prediction and Effort.

    PubMed

    Vassena, Eliana; Holroyd, Clay B; Alexander, William H

    2017-01-01

    In the last two decades the anterior cingulate cortex (ACC) has become one of the most investigated areas of the brain. Extensive neuroimaging evidence suggests countless functions for this region, ranging from conflict and error coding, to social cognition, pain and effortful control. In response to this burgeoning amount of data, a proliferation of computational models has tried to characterize the neurocognitive architecture of ACC. Early seminal models provided a computational explanation for a relatively circumscribed set of empirical findings, mainly accounting for EEG and fMRI evidence. More recent models have focused on ACC's contribution to effortful control. In parallel to these developments, several proposals attempted to explain within a single computational framework a wider variety of empirical findings that span different cognitive processes and experimental modalities. Here we critically evaluate these modeling attempts, highlighting the continued need to reconcile the array of disparate ACC observations within a coherent, unifying framework.

  12. Parallel-In-Time For Moving Meshes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Falgout, R. D.; Manteuffel, T. A.; Southworth, B.

    2016-02-04

    With steadily growing computational resources available, scientists must develop effective ways to utilize the increased resources. High performance, highly parallel software has become a standard. However, until recent years parallelism has focused primarily on the spatial domain. When solving a space-time partial differential equation (PDE), this leads to a sequential bottleneck in the temporal dimension, particularly when taking a large number of time steps. The XBraid parallel-in-time library was developed as a practical way to add temporal parallelism to existing sequential codes with only minor modifications. In this work, a rezoning-type moving mesh is applied to a diffusion problem and formulated in a parallel-in-time framework. Tests and scaling studies are run using XBraid and demonstrate excellent results for the simple model problem considered herein.

  13. A distributed pipeline for DIDSON data processing

    USGS Publications Warehouse

    Li, Liling; Danner, Tyler; Eickholt, Jesse; McCann, Erin L.; Pangle, Kevin; Johnson, Nicholas

    2018-01-01

    Technological advances in the field of ecology allow data on ecological systems to be collected at high resolution, both temporally and spatially. Devices such as Dual-frequency Identification Sonar (DIDSON) can be deployed in aquatic environments for extended periods and easily generate several terabytes of underwater surveillance data which may need to be processed multiple times. Due to the large amount of data generated and need for flexibility in processing, a distributed pipeline was constructed for DIDSON data making use of the Hadoop ecosystem. The pipeline is capable of ingesting raw DIDSON data, transforming the acoustic data to images, filtering the images, detecting and extracting motion, and generating feature data for machine learning and classification. All of the tasks in the pipeline can be run in parallel and the framework allows for custom processing. Applications of the pipeline include monitoring migration times, determining the presence of a particular species, estimating population size and other fishery management tasks.

  14. Diffusion of the Internet within a Graduate School.

    ERIC Educational Resources Information Center

    Sherry, Lorraine

    This paper reports the results of a five-year case study of the use of online tools: Internet, e-mail, and the World Wide Web, within a Graduate School of Education. The conceptual framework was independently developed, but because of the striking parallel with activity theory, activity theory became the overall framework for interpreting…

  15. Making Learning Personally Meaningful: A New Framework for Relevance Research

    ERIC Educational Resources Information Center

    Priniski, Stacy J.; Hecht, Cameron A.; Harackiewicz, Judith M.

    2018-01-01

    Personal relevance goes by many names in the motivation literature, stemming from a number of theoretical frameworks. Currently these lines of research are being conducted in parallel with little synthesis across them, perhaps because there is no unifying definition of the relevance construct within which this research can be situated. In this…

  16. Some thoughts about parallel process and psychotherapy supervision: when is a parallel just a parallel?

    PubMed

    Watkins, C Edward

    2012-09-01

    In a way not done before, Tracey, Bludworth, and Glidden-Tracey ("Are there parallel processes in psychotherapy supervision: An empirical examination," Psychotherapy, 2011, advance online publication, doi:10.1037/a0026246) have shown us that parallel process in psychotherapy supervision can indeed be rigorously and meaningfully researched, and their groundbreaking investigation provides a nice prototype for future supervision studies to emulate. In what follows, I offer a brief complementary comment to Tracey et al., addressing one matter that seems to be a potentially important conceptual and empirical parallel process consideration: When is a parallel just a parallel? PsycINFO Database Record (c) 2012 APA, all rights reserved.

  17. A Component-Based FPGA Design Framework for Neuronal Ion Channel Dynamics Simulations

    PubMed Central

    Mak, Terrence S. T.; Rachmuth, Guy; Lam, Kai-Pui; Poon, Chi-Sang

    2008-01-01

    Neuron-machine interfaces such as dynamic clamp and brain-implantable neuroprosthetic devices require real-time simulations of neuronal ion channel dynamics. Field Programmable Gate Array (FPGA) has emerged as a high-speed digital platform ideal for such application-specific computations. We propose an efficient and flexible component-based FPGA design framework for neuronal ion channel dynamics simulations, which overcomes certain limitations of the recently proposed memory-based approach. A parallel processing strategy is used to minimize computational delay, and a hardware-efficient factoring approach for calculating exponential and division functions in neuronal ion channel models is used to conserve resource consumption. Performances of the various FPGA design approaches are compared theoretically and experimentally in corresponding implementations of the AMPA and NMDA synaptic ion channel models. Our results suggest that the component-based design framework provides a more memory economic solution as well as more efficient logic utilization for large word lengths, whereas the memory-based approach may be suitable for time-critical applications where a higher throughput rate is desired. PMID:17190033

  18. Seeing the forest for the trees: Networked workstations as a parallel processing computer

    NASA Technical Reports Server (NTRS)

    Breen, J. O.; Meleedy, D. M.

    1992-01-01

    Unlike traditional 'serial' processing computers in which one central processing unit performs one instruction at a time, parallel processing computers contain several processing units, thereby performing several instructions at once. Many of today's fastest supercomputers achieve their speed by employing thousands of processing elements working in parallel. Few institutions can afford these state-of-the-art parallel processors, but many already have the makings of a modest parallel processing system. Workstations on existing high-speed networks can be harnessed as nodes in a parallel processing environment, bringing the benefits of parallel processing to many. While such a system cannot rival the industry's latest machines, many common tasks can be accelerated greatly by spreading the processing burden and exploiting idle network resources. We study several aspects of this approach, from algorithms to select nodes to speed gains in specific tasks. With ever-increasing volumes of astronomical data, it becomes all the more necessary to utilize our computing resources fully.

  19. Developing a Shuffled Complex-Self Adaptive Hybrid Evolution (SC-SAHEL) Framework for Water Resources Management and Water-Energy System Optimization

    NASA Astrophysics Data System (ADS)

    Rahnamay Naeini, M.; Sadegh, M.; AghaKouchak, A.; Hsu, K. L.; Sorooshian, S.; Yang, T.

    2017-12-01

    Meta-heuristic optimization algorithms have gained a great deal of attention in a wide variety of fields. The simplicity and flexibility of these algorithms, along with their robustness, make them attractive tools for solving optimization problems. Different optimization methods, however, hold algorithm-specific strengths and limitations. The performance of each individual algorithm obeys the "No-Free-Lunch" theorem, which means that no single algorithm can consistently outperform all others over a variety of problems. From the user's perspective, it is a tedious process to compare, validate, and select the best-performing algorithm for a specific problem or a set of test cases. In this study, we introduce a new hybrid optimization framework, entitled Shuffled Complex-Self Adaptive Hybrid EvoLution (SC-SAHEL), which combines the strengths of different evolutionary algorithms (EAs) in a parallel computing scheme and allows users to select the most suitable algorithm tailored to the problem at hand. The concept of SC-SAHEL is to execute different EAs as separate parallel search cores and let all participating EAs compete during the course of the search. The newly developed SC-SAHEL algorithm is designed to automatically select the best-performing algorithm for the given optimization problem. The algorithm is effective in finding the global optimum for several strenuous benchmark test functions, and computationally efficient compared to individual EAs. We benchmark the proposed SC-SAHEL algorithm over 29 conceptual test functions and two real-world case studies: one hydropower reservoir model and one hydrological model (SAC-SMA). Results show that the proposed framework outperforms individual EAs in an absolute majority of the test problems, and can provide results competitive with the fittest EA algorithm, with more comprehensive information during the search. The proposed framework is also flexible for merging additional EAs, boundary-handling techniques, and sampling schemes, and has good potential to be used in Water-Energy system optimal operation and management.

  20. Integrated Network Decompositions and Dynamic Programming for Graph Optimization (INDDGO)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    The INDDGO software package offers a set of tools for finding exact solutions to graph optimization problems via tree decompositions and dynamic programming algorithms. Currently the framework offers serial and parallel (distributed memory) algorithms for finding tree decompositions and solving the maximum weighted independent set problem. The parallel dynamic programming algorithm is implemented on top of the MADNESS task-based runtime.
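
    To make the dynamic programming idea concrete, the sketch below solves the maximum weighted independent set problem in the simplest special case, where the input graph is itself a tree, so only two values per vertex need to be stored. This is an illustrative serial example under that assumption, not INDDGO code; on general graphs the same principle is applied over the bags of a tree decomposition, with tables that grow with the width of the decomposition.

```python
def max_weight_independent_set(weights, edges, root=0):
    """Maximum total weight of pairwise non-adjacent vertices in a tree.

    weights: list of vertex weights, indexed by vertex id 0..n-1
    edges:   list of (u, v) pairs that together form a tree
    """
    adjacency = [[] for _ in weights]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    # Iterative depth-first search: every vertex appears after its parent.
    parent = [-1] * len(weights)
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        for neighbor in adjacency[node]:
            if neighbor != parent[node]:
                parent[neighbor] = node
                stack.append(neighbor)

    take = list(weights)       # best weight in the subtree if the vertex is selected
    skip = [0] * len(weights)  # best weight in the subtree if the vertex is left out
    for node in reversed(order):  # children are finished before their parent
        p = parent[node]
        if p != -1:
            take[p] += skip[node]                   # selected parent excludes the child
            skip[p] += max(take[node], skip[node])  # otherwise the child is free to choose
    return max(take[root], skip[root])


if __name__ == "__main__":
    # Path 0 - 1 - 2 - 3 with weights 3, 5, 4, 6: the best set is {1, 3}.
    print(max_weight_independent_set([3, 5, 4, 6], [(0, 1), (1, 2), (2, 3)]))
```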

  1. FleCSPH - a parallel and distributed SPH implementation based on the FleCSI framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Junghans, Christoph; Loiseau, Julien

    2017-06-20

    FleCSPH is a multi-physics compact application that exercises FleCSI parallel data structures for tree-based particle methods. In particular, FleCSPH implements a smoothed-particle hydrodynamics (SPH) solver for the solution of Lagrangian problems in astrophysics and cosmology. FleCSPH includes support for gravitational forces using the fast multipole method (FMM).

  2. SCaLeM: A Framework for Characterizing and Analyzing Execution Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chavarría-Miranda, Daniel; Manzano Franco, Joseph B.; Krishnamoorthy, Sriram

    2014-10-13

    As scalable parallel systems evolve towards more complex nodes with many-core architectures and larger trans-petascale and upcoming exascale deployments, there is a need to understand, characterize and quantify the underlying execution models being used on such systems. Execution models are a conceptual layer between applications and algorithms and the underlying parallel hardware and systems software on which those applications run. This paper presents the SCaLeM (Synchronization, Concurrency, Locality, Memory) framework for characterizing and analyzing execution models. SCaLeM consists of three basic elements: attributes, compositions and mapping of these compositions to abstract parallel systems. The fundamental Synchronization, Concurrency, Locality and Memory attributes are used to characterize each execution model, while the combinations of those attributes in the form of compositions are used to describe the primitive operations of the execution model. The mapping of the execution model's primitive operations, described by compositions, to an underlying abstract parallel system can be evaluated quantitatively to determine its effectiveness. Finally, SCaLeM also enables the representation and analysis of applications in terms of execution models, for the purpose of evaluating the effectiveness of such mapping.

  3. Dynamic Load Balancing for Grid Partitioning on a SP-2 Multiprocessor: A Framework

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Simon, Horst; Lasinski, T. A. (Technical Monitor)

    1994-01-01

    Computational requirements of full scale computational fluid dynamics change as computation progresses on a parallel machine. The change in computational intensity causes workload imbalance of processors, which in turn requires a large amount of data movement at runtime. If parallel CFD is to be successful on a parallel or massively parallel machine, balancing of the runtime load is indispensable. Here a framework is presented for dynamic load balancing for CFD applications, called Jove. One processor is designated as a decision maker Jove while others are assigned to computational fluid dynamics. Processors running CFD send flags to Jove in a predetermined number of iterations to initiate load balancing. Jove starts working on load balancing while other processors continue working with the current data and load distribution. Jove goes through several steps to decide if the new data should be taken, including preliminary evaluation, partition, processor reassignment, cost evaluation, and decision. Jove running on a single IBM SP2 node has been completely implemented. Preliminary experimental results show that the Jove approach to dynamic load balancing can be effective for full scale grid partitioning on the target machine IBM SP2.
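
    The abstract does not give the cost model behind the cost evaluation and decision steps. As a rough illustration of the kind of trade-off such a decision has to weigh, the sketch below accepts a new partition only if the compute time it is expected to save over the remaining iterations exceeds the time needed to move the data. The formula, the function names, and all parameters are assumptions made for this example, not Jove's actual criteria.

```python
def imbalance(loads):
    """Ratio of the busiest processor's load to the mean load (1.0 means perfectly balanced)."""
    return max(loads) / (sum(loads) / len(loads))


def should_rebalance(old_loads, new_loads, remaining_iterations,
                     seconds_per_unit, units_moved, seconds_per_moved_unit):
    """Hypothetical decision rule: rebalance only if the projected saving exceeds the move cost."""
    # Each iteration lasts as long as the most heavily loaded processor needs.
    saved = remaining_iterations * seconds_per_unit * (max(old_loads) - max(new_loads))
    cost = units_moved * seconds_per_moved_unit
    return saved > cost


if __name__ == "__main__":
    old = [100, 40, 40, 20]   # grid units per processor before repartitioning
    new = [50, 50, 50, 50]    # proposed distribution after repartitioning
    print("imbalance before:", imbalance(old))
    print("rebalance with 10 iterations left: ",
          should_rebalance(old, new, 10, 0.01, 50, 0.2))
    print("rebalance with 100 iterations left:",
          should_rebalance(old, new, 100, 0.01, 50, 0.2))
```

    Under this simple model, the same repartitioning can be rejected late in a run and accepted early in it, because the one-time data movement cost is spread over fewer or more remaining iterations.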

  5. Automatic Thread-Level Parallelization in the Chombo AMR Library

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Christen, Matthias; Keen, Noel; Ligocki, Terry

    2011-05-26

    The increasing on-chip parallelism has some substantial implications for HPC applications. Currently, hybrid programming models (typically MPI+OpenMP) are employed for mapping software to the hardware in order to leverage the hardware's architectural features. In this paper, we present an approach that automatically introduces thread-level parallelism into Chombo, a parallel adaptive mesh refinement framework for finite difference type PDE solvers. In Chombo, core algorithms are specified in ChomboFortran, a macro language extension to F77 that is part of the Chombo framework. This domain-specific language forms an already used target language for an automatic migration of the large number of existing algorithms into a hybrid MPI+OpenMP implementation. It also provides access to the auto-tuning methodology that enables tuning certain aspects of an algorithm to hardware characteristics. Performance measurements are presented for a few of the most relevant kernels with respect to a specific application benchmark using this technique, as well as benchmark results for the entire application. The kernel benchmarks show that, using auto-tuning, up to a factor of 11 in performance was gained with 4 threads with respect to the serial reference implementation.

  6. Parallel Processing at the High School Level.

    ERIC Educational Resources Information Center

    Sheary, Kathryn Anne

    This study investigated the ability of high school students to cognitively understand and implement parallel processing. Data indicates that most parallel processing is being taught at the university level. Instructional modules on C, Linux, and the parallel processing language, P4, were designed to show that high school students are highly…

  7. Bilingual parallel programming

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Foster, I.; Overbeek, R.

    1990-01-01

    Numerous experiments have demonstrated that computationally intensive algorithms support adequate parallelism to exploit the potential of large parallel machines. Yet successful parallel implementations of serious applications are rare. The limiting factor is clearly programming technology. None of the approaches to parallel programming that have been proposed to date (whether parallelizing compilers, language extensions, or new concurrent languages) seems to adequately address the central problems of portability, expressiveness, efficiency, and compatibility with existing software. In this paper, we advocate an alternative approach to parallel programming based on what we call bilingual programming. We present evidence that this approach provides an effective solution to parallel programming problems. The key idea in bilingual programming is to construct the upper levels of applications in a high-level language while coding selected low-level components in low-level languages. This approach permits the advantages of a high-level notation (expressiveness, elegance, conciseness) to be obtained without the cost in performance normally associated with high-level approaches. In addition, it provides a natural framework for reusing existing code.

  8. A multi-fidelity framework for physics based rotor blade simulation and optimization

    NASA Astrophysics Data System (ADS)

    Collins, Kyle Brian

    New helicopter rotor designs are desired that offer increased efficiency, reduced vibration, and reduced noise. Rotor Designers in industry need methods that allow them to use the most accurate simulation tools available to search for these optimal designs. Computer based rotor analysis and optimization have been advanced by the development of industry standard codes known as "comprehensive" rotorcraft analysis tools. These tools typically use table look-up aerodynamics, simplified inflow models and perform aeroelastic analysis using Computational Structural Dynamics (CSD). Due to the simplified aerodynamics, most design studies are performed varying structural related design variables like sectional mass and stiffness. The optimization of shape related variables in forward flight using these tools is complicated and results are viewed with skepticism because rotor blade loads are not accurately predicted. The most accurate methods of rotor simulation utilize Computational Fluid Dynamics (CFD) but have historically been considered too computationally intensive to be used in computer based optimization, where numerous simulations are required. An approach is needed where high fidelity CFD rotor analysis can be utilized in a shape variable optimization problem with multiple objectives. Any approach should be capable of working in forward flight in addition to hover. An alternative is proposed and founded on the idea that efficient hybrid CFD methods of rotor analysis are ready to be used in preliminary design. In addition, the proposed approach recognizes the usefulness of lower fidelity physics based analysis and surrogate modeling. Together, they are used with high fidelity analysis in an intelligent process of surrogate model building of parameters in the high fidelity domain. Closing the loop between high and low fidelity analysis is a key aspect of the proposed approach. 
This is done by using information from higher fidelity analysis to improve predictions made with lower fidelity models. This thesis documents the development of automated low and high fidelity physics based rotor simulation frameworks. The low fidelity framework uses a comprehensive code with simplified aerodynamics. The high fidelity model uses a parallel processor capable CFD/CSD methodology. Both low and high fidelity frameworks include an aeroacoustic simulation for prediction of noise. A synergistic process is developed that uses both the low and high fidelity frameworks together to build approximate models of important high fidelity metrics as functions of certain design variables. To test the process, a 4-bladed hingeless rotor model is used as a baseline. The design variables investigated include tip geometry and spanwise twist distribution. Approximation models are built for metrics related to rotor efficiency and vibration using the results from 60+ high fidelity (CFD/CSD) experiments and 400+ low fidelity experiments. Optimization using the approximation models found the Pareto Frontier anchor points, or the design having maximum rotor efficiency and the design having minimum vibration. Various Pareto generation methods are used to find designs on the frontier between these two anchor designs. When tested in the high fidelity framework, the Pareto anchor designs are shown to be very good designs when compared with other designs from the high fidelity database. This provides evidence that the process proposed has merit. Ultimately, this process can be utilized by industry rotor designers with their existing tools to bring high fidelity analysis into the preliminary design stage of rotors. In conclusion, the methods developed and documented in this thesis have made several novel contributions. First, an automated high fidelity CFD based forward flight simulation framework has been built for use in preliminary design optimization. 
The framework was built around an integrated, parallel processor capable CFD/CSD/AA process. Second, a novel method of building approximate models of high fidelity parameters has been developed. The method uses a combination of low and high fidelity results and combines Design of Experiments, statistical effects analysis, and aspects of approximation model management. And third, the determination of rotor blade shape variables through optimization using CFD based analysis in forward flight has been performed. This was done using the high fidelity CFD/CSD/AA framework and method mentioned above. While the low and high fidelity prediction methods used in the work still have inaccuracies that can affect the absolute levels of the results, a framework has been successfully developed and demonstrated that allows for an efficient process to improve rotor blade designs in terms of a selected choice of objective function(s). Using engineering judgment, this methodology could be applied today to investigate opportunities to improve existing designs. With improvements in the low and high fidelity prediction components that will certainly occur, this framework could become a powerful tool for future rotorcraft design work. (Abstract shortened by UMI.)
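
    The Pareto frontier and its anchor points, as described in the abstract above, can be illustrated with a small sketch. It assumes two objectives (rotor efficiency to maximize, vibration to minimize); the design names and metric values are hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple

# (name, efficiency to maximize, vibration to minimize); hypothetical values
Design = Tuple[str, float, float]


def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep only the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


def anchor_points(front: List[Design]) -> Tuple[Design, Design]:
    """Anchor designs: best efficiency and lowest vibration on the front."""
    return max(front, key=lambda d: d[1]), min(front, key=lambda d: d[2])


designs = [
    ("A", 0.70, 0.30),
    ("B", 0.75, 0.40),
    ("C", 0.68, 0.35),  # dominated by A
    ("D", 0.72, 0.32),
    ("E", 0.74, 0.45),  # dominated by B
]
front = pareto_front(designs)
best_efficiency, least_vibration = anchor_points(front)
```

    In this toy data set the anchors are the two ends of the front, and the remaining non-dominated design lies between them.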

  9. Efficient in-situ visualization of unsteady flows in climate simulation

    NASA Astrophysics Data System (ADS)

    Vetter, Michael; Olbrich, Stephan

    2017-04-01

    The simulation of climate data tends to produce very large data sets, which hardly can be processed in classical post-processing visualization applications. Typically, the visualization pipeline consisting of the processes data generation, visualization mapping and rendering is distributed into two parts over the network or separated via file transfer. Within most traditional post-processing scenarios the simulation is done on a supercomputer whereas the data analysis and visualization is done on a graphics workstation. That way temporary data sets with huge volume have to be transferred over the network, which leads to bandwidth bottlenecks and volume limitations. The solution to this issue is the avoidance of temporary storage, or at least significant reduction of data complexity. Within the Climate Visualization Lab - as part of the Cluster of Excellence "Integrated Climate System Analysis and Prediction" (CliSAP) at the University of Hamburg, in cooperation with the German Climate Computing Center (DKRZ) - we develop and integrate an in-situ approach. Our software framework DSVR is based on the separation of the process chain between the mapping and the rendering processes. It couples the mapping process directly to the simulation by calling methods of a parallelized data extraction library, which create a time-based sequence of geometric 3D scenes. This sequence is stored on a special streaming server with an interactive post-filtering option and then played-out asynchronously in a separate 3D viewer application. Since the rendering is part of this viewer application, the scenes can be navigated interactively. In contrast to other in-situ approaches where 2D images are created as part of the simulation or synchronous co-visualization takes place, our method supports interaction in 3D space and in time, as well as fixed frame rates. 
To integrate in-situ processing based on our DSVR framework and methods in the ICON climate model, we are continuously evolving the data structures and mapping algorithms of the framework to support the ICON model's native grid structures, since DSVR originally was designed for rectilinear grids only. We now have implemented a new output module for ICON to take advantage of the DSVR visualization. The visualization can be configured as most output modules by using a specific namelist and is exemplarily integrated within the non-hydrostatic atmospheric model time loop. With the integration of a DSVR based in-situ pathline extraction within ICON, a further milestone is reached. The pathline algorithm as well as the grid data structures have been optimized for the domain decomposition used for the parallelization of ICON based on MPI and OpenMP. The software implementation and evaluation is done on the supercomputers at DKRZ. In principle, the data complexity is reduced from O(n^3) to O(m), where n is the grid resolution and m the number of supporting points of all pathlines. The stability and scalability evaluation is done using Atmospheric Model Intercomparison Project (AMIP) runs. We will give a short introduction to our software framework, as well as a short overview on the implementation and usage of DSVR within ICON. Furthermore, we will present visualization and evaluation results of sample applications.
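
    The data reduction mentioned above (storing pathline points rather than the full grid) can be illustrated with a minimal pathline tracer. This is a sketch only: the velocity field is a hypothetical analytic rotation, the integrator is classical fourth-order Runge-Kutta, and none of the names correspond to DSVR or ICON interfaces.

```python
import math


def velocity(x, y, t):
    """Hypothetical velocity field: rigid rotation about the origin."""
    return -y, x


def rk4_step(x, y, t, dt):
    """Advance one particle position by dt with classical RK4."""
    k1x, k1y = velocity(x, y, t)
    k2x, k2y = velocity(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y, t + 0.5 * dt)
    k3x, k3y = velocity(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y, t + 0.5 * dt)
    k4x, k4y = velocity(x + dt * k3x, y + dt * k3y, t + dt)
    x_new = x + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
    y_new = y + dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0
    return x_new, y_new


def trace_pathline(x0, y0, t0, dt, steps):
    """Return the list of supporting points of one pathline."""
    points = [(x0, y0)]
    x, y, t = x0, y0, t0
    for _ in range(steps):
        x, y = rk4_step(x, y, t, dt)
        t += dt
        points.append((x, y))
    return points


# Only these points are kept, however fine the simulation grid is.
path = trace_pathline(1.0, 0.0, 0.0, 0.01, 100)
```

    The stored output grows with the number of pathline points, not with the number of grid cells sampled along the way.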

  10. A framework for accelerated phototrophic bioprocess development: integration of parallelized microscale cultivation, laboratory automation and Kriging-assisted experimental design.

    PubMed

    Morschett, Holger; Freier, Lars; Rohde, Jannis; Wiechert, Wolfgang; von Lieres, Eric; Oldiges, Marco

    2017-01-01

    Even though microalgae-derived biodiesel has regained interest within the last decade, industrial production is still challenging for economic reasons. Besides reactor design, as well as value chain and strain engineering, laborious and slow early-stage parameter optimization represents a major drawback. The present study introduces a framework for the accelerated development of phototrophic bioprocesses. A state-of-the-art micro-photobioreactor supported by a liquid-handling robot for automated medium preparation and product quantification was used. To take full advantage of the technology's experimental capacity, Kriging-assisted experimental design was integrated to enable highly efficient execution of screening applications. The resulting platform was used for medium optimization of a lipid production process using Chlorella vulgaris toward maximum volumetric productivity. Within only four experimental rounds, lipid production was increased approximately threefold to 212 ± 11 mg L⁻¹ d⁻¹. Besides nitrogen availability as a key parameter, magnesium, calcium and various trace elements were shown to be of crucial importance. Here, synergistic multi-parameter interactions as revealed by the experimental design introduced significant further optimization potential. The integration of parallelized microscale cultivation, laboratory automation and Kriging-assisted experimental design proved to be a fruitful tool for the accelerated development of phototrophic bioprocesses. By means of the proposed technology, the targeted optimization task was conducted in a very timely and material-efficient manner.

  11. Parallel trends in cortical gray and white matter architecture and connections in primates allow fine study of pathways in humans and reveal network disruptions in autism

    PubMed Central

    García-Cabezas, Miguel Ángel; Barbas, Helen

    2018-01-01

    Noninvasive imaging and tractography methods have yielded information on broad communication networks but lack resolution to delineate intralaminar cortical and subcortical pathways in humans. An important unanswered question is whether we can use the wealth of precise information on pathways from monkeys to understand connections in humans. We addressed this question within a theoretical framework of systematic cortical variation and used identical high-resolution methods to compare the architecture of cortical gray matter and the white matter beneath, which gives rise to short- and long-distance pathways in humans and rhesus monkeys. We used the prefrontal cortex as a model system because of its key role in attention, emotions, and executive function, which are processes often affected in brain diseases. We found striking parallels and consistent trends in the gray and white matter architecture in humans and monkeys and between the architecture and actual connections mapped with neural tracers in rhesus monkeys and, by extension, in humans. Using the novel architectonic portrait as a base, we found significant changes in pathways between nearby prefrontal and distant areas in autism. Our findings reveal that a theoretical framework allows study of normal neural communication in humans at high resolution and specific disruptions in diverse psychiatric and neurodegenerative diseases. PMID:29401206

  12. Iterative load-balancing method with multigrid level relaxation for particle simulation with short-range interactions

    NASA Astrophysics Data System (ADS)

    Furuichi, Mikito; Nishiura, Daisuke

    2017-10-01

    We developed dynamic load-balancing algorithms for Particle Simulation Methods (PSM) involving short-range interactions, such as Smoothed Particle Hydrodynamics (SPH), Moving Particle Semi-implicit method (MPS), and Discrete Element method (DEM). These are needed to handle billions of particles modeled in large distributed-memory computer systems. Our method utilizes flexible orthogonal domain decomposition, allowing the sub-domain boundaries in the column to be different for each row. The imbalances in the execution time between parallel logical processes are treated as a nonlinear residual. Load-balancing is achieved by minimizing the residual within the framework of an iterative nonlinear solver, combined with a multigrid technique in the local smoother. Our iterative method is suitable for adjusting the sub-domain frequently by monitoring the performance of each computational process because it is computationally cheaper in terms of communication and memory costs than non-iterative methods. Numerical tests demonstrated the ability of our approach to handle workload imbalances arising from a non-uniform particle distribution, differences in particle types, or heterogeneous computer architecture which was difficult with previously proposed methods. We analyzed the parallel efficiency and scalability of our method using Earth simulator and K-computer supercomputer systems.
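
    The idea of iteratively shifting sub-domain boundaries until the load imbalance (the residual) vanishes can be sketched in one dimension. This is a simplified illustration under stated assumptions, not the authors' multigrid solver: particle counts stand in for measured execution time, the particle distribution is hypothetical, and each sweep simply moves every interior boundary to the point that evens out its two neighbouring sub-domains.

```python
from bisect import bisect_left


def loads(xs, bounds):
    """Number of particles owned by each sub-domain [bounds[i], bounds[i+1])."""
    idx = [bisect_left(xs, b) for b in bounds]
    return [idx[i + 1] - idx[i] for i in range(len(bounds) - 1)]


def relax_boundaries(xs, bounds, sweeps):
    """Gauss-Seidel-style sweeps over the interior boundaries (xs must be sorted)."""
    bounds = list(bounds)
    history = []
    for _ in range(sweeps):
        for i in range(1, len(bounds) - 1):
            lo = bisect_left(xs, bounds[i - 1])
            hi = bisect_left(xs, bounds[i + 1])
            mid = min((lo + hi) // 2, len(xs) - 1)
            bounds[i] = xs[mid]  # split the two neighbours' particles evenly
        current = loads(xs, bounds)
        history.append(max(current) - min(current))  # imbalance residual
    return bounds, history


# Hypothetical non-uniform distribution: particles clustered near x = 0.
n_particles = 1000
xs = [(j / n_particles) ** 2 for j in range(n_particles)]
initial = [0.0, 0.25, 0.5, 0.75, 1.0]  # uniform spatial decomposition

initial_loads = loads(xs, initial)
balanced, residuals = relax_boundaries(xs, initial, sweeps=20)
final_loads = loads(xs, balanced)
```

    Each sweep needs only the loads of neighbouring sub-domains, which is the kind of local information an iterative scheme can exploit cheaply.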

  13. Jali - Unstructured Mesh Infrastructure for Multi-Physics Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Garimella, Rao V; Berndt, Markus; Coon, Ethan

    2017-04-13

    Jali is a parallel unstructured mesh infrastructure library designed for use by multi-physics simulations. It supports 2D and 3D arbitrary polyhedral meshes distributed over hundreds to thousands of nodes. Jali can read and write Exodus II meshes along with fields and sets on the mesh, and support for other formats is partially implemented or is (https://github.com/MeshToolkit/MSTK), an open source general purpose unstructured mesh infrastructure library from Los Alamos National Laboratory. While it has been made to work with other mesh frameworks such as MOAB and STKmesh in the past, support for maintaining the interface to these frameworks has been suspended for now. Jali supports distributed as well as on-node parallelism. Support of on-node parallelism is through direct use of the mesh in multi-threaded constructs or through the use of "tiles" which are submeshes or sub-partitions of a partition destined for a compute node.

  14. Moose: An Open-Source Framework to Enable Rapid Development of Collaborative, Multi-Scale, Multi-Physics Simulation Tools

    NASA Astrophysics Data System (ADS)

    Slaughter, A. E.; Permann, C.; Peterson, J. W.; Gaston, D.; Andrs, D.; Miller, J.

    2014-12-01

    The Idaho National Laboratory (INL)-developed Multiphysics Object Oriented Simulation Environment (MOOSE; www.mooseframework.org), is an open-source, parallel computational framework for enabling the solution of complex, fully implicit multiphysics systems. MOOSE provides a set of computational tools that scientists and engineers can use to create sophisticated multiphysics simulations. Applications built using MOOSE have computed solutions for chemical reaction and transport equations, computational fluid dynamics, solid mechanics, heat conduction, mesoscale materials modeling, geomechanics, and others. To facilitate the coupling of diverse and highly-coupled physical systems, MOOSE employs the Jacobian-free Newton-Krylov (JFNK) method when solving the coupled nonlinear systems of equations arising in multiphysics applications. The MOOSE framework is written in C++, and leverages other high-quality, open-source scientific software packages such as LibMesh, Hypre, and PETSc. MOOSE uses a "hybrid parallel" model which combines both shared memory (thread-based) and distributed memory (MPI-based) parallelism to ensure efficient resource utilization on a wide range of computational hardware. MOOSE-based applications are inherently modular, which allows for simulation expansion (via coupling of additional physics modules) and the creation of multi-scale simulations. Any application developed with MOOSE supports running (in parallel) any other MOOSE-based application. Each application can be developed independently, yet easily communicate with other applications (e.g., conductivity in a slope-scale model could be a constant input, or a complete phase-field micro-structure simulation) without additional code being written. This method of development has proven effective at INL and expedites the development of sophisticated, sustainable, and collaborative simulation tools.
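
    The "Jacobian-free" part of JFNK rests on approximating Jacobian-vector products with a finite difference of the residual, so the Jacobian matrix is never assembled. The sketch below shows only that ingredient on a hypothetical two-equation system; it is not MOOSE or PETSc code, the step size eps is an assumed value, and the Krylov solver that would consume these products is omitted.

```python
def residual(u):
    """Hypothetical nonlinear residual F(u) for a 2-unknown system."""
    x, y = u
    return [x * x + y - 3.0, x + y * y - 5.0]


def jacobian_vector_product(F, u, v, eps=1e-7):
    """Approximate J(u) v as (F(u + eps*v) - F(u)) / eps without forming J."""
    base = F(u)
    perturbed = F([ui + eps * vi for ui, vi in zip(u, v)])
    return [(p - b) / eps for p, b in zip(perturbed, base)]


def exact_jacobian_vector_product(u, v):
    """Analytic J(u) v for the system above, used only as a check."""
    x, y = u
    return [2.0 * x * v[0] + v[1], v[0] + 2.0 * y * v[1]]


u = [1.0, 2.0]
v = [0.5, -1.0]
approx = jacobian_vector_product(residual, u, v)
exact = exact_jacobian_vector_product(u, v)
```

    A Krylov method needs only such products, which is why coupled physics can be added without hand-coding a full Jacobian.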

  15. A framework for plasticity implementation on the SpiNNaker neural architecture.

    PubMed

    Galluppi, Francesco; Lagorce, Xavier; Stromatias, Evangelos; Pfeiffer, Michael; Plana, Luis A; Furber, Steve B; Benosman, Ryad B

    2014-01-01

    Many of the precise biological mechanisms of synaptic plasticity remain elusive, but simulations of neural networks have greatly enhanced our understanding of how specific global functions arise from the massively parallel computation of neurons and local Hebbian or spike-timing dependent plasticity rules. For simulating large portions of neural tissue, this has created an increasingly strong need for large scale simulations of plastic neural networks on special purpose hardware platforms, because synaptic transmissions and updates are badly matched to computing style supported by current architectures. Because of the great diversity of biological plasticity phenomena and the corresponding diversity of models, there is a great need for testing various hypotheses about plasticity before committing to one hardware implementation. Here we present a novel framework for investigating different plasticity approaches on the SpiNNaker distributed digital neural simulation platform. The key innovation of the proposed architecture is to exploit the reconfigurability of the ARM processors inside SpiNNaker, dedicating a subset of them exclusively to process synaptic plasticity updates, while the rest perform the usual neural and synaptic simulations. We demonstrate the flexibility of the proposed approach by showing the implementation of a variety of spike- and rate-based learning rules, including standard Spike-Timing dependent plasticity (STDP), voltage-dependent STDP, and the rate-based BCM rule. We analyze their performance and validate them by running classical learning experiments in real time on a 4-chip SpiNNaker board. The result is an efficient, modular, flexible and scalable framework, which provides a valuable tool for the fast and easy exploration of learning models of very different kinds on the parallel and reconfigurable SpiNNaker system.
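
    As an illustration of the kind of learning rule mentioned above, a textbook pair-based STDP weight update can be written in a few lines. This is a generic sketch, not the SpiNNaker implementation: the amplitudes and time constants are hypothetical, and the sign convention assumes dt = t_post - t_pre in milliseconds.

```python
import math


def stdp_weight_change(dt_ms, a_plus=0.01, a_minus=0.012,
                       tau_plus_ms=20.0, tau_minus_ms=20.0):
    """Pair-based STDP: potentiate if pre precedes post, depress otherwise."""
    if dt_ms > 0:   # presynaptic spike came first
        return a_plus * math.exp(-dt_ms / tau_plus_ms)
    if dt_ms < 0:   # postsynaptic spike came first
        return -a_minus * math.exp(dt_ms / tau_minus_ms)
    return 0.0      # simultaneous spikes: no change in this sketch


def apply_stdp(weight, dt_ms, w_min=0.0, w_max=1.0):
    """Update a weight and clip it to the assumed range [w_min, w_max]."""
    return min(w_max, max(w_min, weight + stdp_weight_change(dt_ms)))


potentiated = apply_stdp(0.5, dt_ms=20.0)
depressed = apply_stdp(0.5, dt_ms=-20.0)
```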

  16. A framework for plasticity implementation on the SpiNNaker neural architecture

    PubMed Central

    Galluppi, Francesco; Lagorce, Xavier; Stromatias, Evangelos; Pfeiffer, Michael; Plana, Luis A.; Furber, Steve B.; Benosman, Ryad B.

    2015-01-01

    Many of the precise biological mechanisms of synaptic plasticity remain elusive, but simulations of neural networks have greatly enhanced our understanding of how specific global functions arise from the massively parallel computation of neurons and local Hebbian or spike-timing dependent plasticity rules. For simulating large portions of neural tissue, this has created an increasingly strong need for large scale simulations of plastic neural networks on special purpose hardware platforms, because synaptic transmissions and updates are badly matched to computing style supported by current architectures. Because of the great diversity of biological plasticity phenomena and the corresponding diversity of models, there is a great need for testing various hypotheses about plasticity before committing to one hardware implementation. Here we present a novel framework for investigating different plasticity approaches on the SpiNNaker distributed digital neural simulation platform. The key innovation of the proposed architecture is to exploit the reconfigurability of the ARM processors inside SpiNNaker, dedicating a subset of them exclusively to process synaptic plasticity updates, while the rest perform the usual neural and synaptic simulations. We demonstrate the flexibility of the proposed approach by showing the implementation of a variety of spike- and rate-based learning rules, including standard Spike-Timing dependent plasticity (STDP), voltage-dependent STDP, and the rate-based BCM rule. We analyze their performance and validate them by running classical learning experiments in real time on a 4-chip SpiNNaker board. The result is an efficient, modular, flexible and scalable framework, which provides a valuable tool for the fast and easy exploration of learning models of very different kinds on the parallel and reconfigurable SpiNNaker system. PMID:25653580

  17. The NOvA software testing framework

    NASA Astrophysics Data System (ADS)

    Tamsett, M.; C Group

    2015-12-01

    The NOvA experiment at Fermilab is a long-baseline neutrino experiment designed to study νe appearance in a νμ beam. NOvA has already produced more than one million Monte Carlo and detector generated files amounting to more than 1 PB in size. This data is divided between a number of parallel streams such as far and near detector beam spills, cosmic ray backgrounds, a number of data-driven triggers and over 20 different Monte Carlo configurations. Each of these data streams must be processed through the appropriate steps of the rapidly evolving, multi-tiered, interdependent NOvA software framework. In total there are greater than 12 individual software tiers, each of which performs a different function and can be configured differently depending on the input stream. In order to regularly test and validate that all of these software stages are working correctly NOvA has designed a powerful, modular testing framework that enables detailed validation and benchmarking to be performed in a fast, efficient and accessible way with minimal expert knowledge. The core of this system is a novel series of python modules which wrap, monitor and handle the underlying C++ software framework and then report the results to a slick front-end web-based interface. This interface utilises modern, cross-platform, visualisation libraries to render the test results in a meaningful way. They are fast and flexible, allowing for the easy addition of new tests and datasets. In total upwards of 14 individual streams are regularly tested amounting to over 70 individual software processes, producing over 25 GB of output files. The rigour enforced through this flexible testing framework enables NOvA to rapidly verify configurations, results and software and thus ensure that data is available for physics analysis in a timely and robust manner.

  18. Enhancing GIS Capabilities for High Resolution Earth Science Grids

    NASA Astrophysics Data System (ADS)

    Koziol, B. W.; Oehmke, R.; Li, P.; O'Kuinghttons, R.; Theurich, G.; DeLuca, C.

    2017-12-01

    Applications for high performance GIS will continue to increase as Earth system models pursue more realistic representations of Earth system processes. Finer spatial resolution model input and output, unstructured or irregular modeling grids, data assimilation, and regional coordinate systems present novel challenges for GIS frameworks operating in the Earth system modeling domain. This presentation provides an overview of two GIS-driven applications that combine high performance software with big geospatial datasets to produce value-added tools for the modeling and geoscientific community. First, a large-scale interpolation experiment using National Hydrography Dataset (NHD) catchments, a high resolution rectilinear CONUS grid, and the Earth System Modeling Framework's (ESMF) conservative interpolation capability will be described. ESMF is a parallel, high-performance software toolkit that provides capabilities (e.g. interpolation) for building and coupling Earth science applications. ESMF is developed primarily by the NOAA Environmental Software Infrastructure and Interoperability (NESII) group. The purpose of this experiment was to test and demonstrate the utility of high performance scientific software in traditional GIS domains. Special attention will be paid to the nuanced requirements for dealing with high resolution, unstructured grids in scientific data formats. Second, a chunked interpolation application using ESMF and OpenClimateGIS (OCGIS) will demonstrate how spatial subsetting can virtually remove computing resource ceilings for very high spatial resolution interpolation operations. OCGIS is a NESII-developed Python software package designed for the geospatial manipulation of high-dimensional scientific datasets. An overview of the data processing workflow, why a chunked approach is required, and how the application could be adapted to meet operational requirements will be discussed here. 
In addition, we'll provide a general overview of OCGIS's parallel subsetting capabilities including challenges in the design and implementation of a scientific data subsetter.

  19. The neurobiology of syntax: beyond string sets.

    PubMed

    Petersson, Karl Magnus; Hagoort, Peter

    2012-07-19

    The human capacity to acquire language is an outstanding scientific challenge to understand. Somehow our language capacities arise from the way the human brain processes, develops and learns in interaction with its environment. To set the stage, we begin with a summary of what is known about the neural organization of language and what our artificial grammar learning (AGL) studies have revealed. We then review the Chomsky hierarchy in the context of the theory of computation and formal learning theory. Finally, we outline a neurobiological model of language acquisition and processing based on an adaptive, recurrent, spiking network architecture. This architecture implements an asynchronous, event-driven, parallel system for recursive processing. We conclude that the brain represents grammars (or more precisely, the parser/generator) in its connectivity, and its ability for syntax is based on neurobiological infrastructure for structured sequence processing. The acquisition of this ability is accounted for in an adaptive dynamical systems framework. Artificial language learning (ALL) paradigms might be used to study the acquisition process within such a framework, as well as the processing properties of the underlying neurobiological infrastructure. However, it is necessary to combine and constrain the interpretation of ALL results by theoretical models and empirical studies on natural language processing. Given that the faculty of language is captured by classical computational models to a significant extent, and that these can be embedded in dynamic network architectures, there is hope that significant progress can be made in understanding the neurobiology of the language faculty.

  20. The neurobiology of syntax: beyond string sets

    PubMed Central

    Petersson, Karl Magnus; Hagoort, Peter

    2012-01-01

    The human capacity to acquire language is an outstanding scientific challenge to understand. Somehow our language capacities arise from the way the human brain processes, develops and learns in interaction with its environment. To set the stage, we begin with a summary of what is known about the neural organization of language and what our artificial grammar learning (AGL) studies have revealed. We then review the Chomsky hierarchy in the context of the theory of computation and formal learning theory. Finally, we outline a neurobiological model of language acquisition and processing based on an adaptive, recurrent, spiking network architecture. This architecture implements an asynchronous, event-driven, parallel system for recursive processing. We conclude that the brain represents grammars (or more precisely, the parser/generator) in its connectivity, and its ability for syntax is based on neurobiological infrastructure for structured sequence processing. The acquisition of this ability is accounted for in an adaptive dynamical systems framework. Artificial language learning (ALL) paradigms might be used to study the acquisition process within such a framework, as well as the processing properties of the underlying neurobiological infrastructure. However, it is necessary to combine and constrain the interpretation of ALL results by theoretical models and empirical studies on natural language processing. Given that the faculty of language is captured by classical computational models to a significant extent, and that these can be embedded in dynamic network architectures, there is hope that significant progress can be made in understanding the neurobiology of the language faculty. PMID:22688633

  1. Experimental implementation of parallel riverbed erosion to study vegetation uprooting by flow

    NASA Astrophysics Data System (ADS)

    Perona, Paolo; Edmaier, Katharina; Crouzy, Benoît

    2014-05-01

    In nature, flow erosion leading to the uprooting of vegetation is often a delayed process that gradually reduces anchoring by root exposure and correspondingly increases drag on the exposed biomass. The process determining scouring or deposition of the riverbed, and consequently plant root exposure is complex and scale dependent. At the local scale, it is hydrodynamically driven and depends on obstacle porosity, as well as sediment vs obstacle size ratio. At a larger scale it results from morphodynamic conditions, which mostly depend on riverbed topography and stream bedload transport capacity. In the latter case, ablation of sediment gradually reduces local bed elevation around the obstacle at a scale larger than the obstacle size, and uprooting eventually occurs when flow drag exceeds the residual anchoring. Ideally, one would study the timescales of vegetation uprooting by flow by inducing parallel bed erosion. This condition is not trivial to obtain experimentally because bed elevation adjustments occur in relation to longitudinal changes in sediment apportion as described by Exner's equation. In this work, we study the physical conditions leading to parallel bed erosion by reducing Exner equation closed for bedload transport to a nonlinear partial differential equation, and showing that this is a particular "boundary value" problem. Eventually, we use the data of Edmaier (2014) from a small scale mobile-bed flume setup to verify the proposed theoretical framework, and to show how such a simple experiment can provide useful insights into the timescales of the uprooting process (Edmaier et al., 2011). REFERENCES - Edmaier, K., P. Burlando, and P. Perona (2011). Mechanisms of vegetation uprooting by flow in alluvial non-cohesive sediment. Hydrology and Earth System Sciences, vol. 15, p. 1615-1627. - Edmaier, K. Uprooting mechanisms of juvenile vegetation by flow. PhD thesis, EPFL, in preparation.

  2. Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

    DOE PAGES

    Olivier, Stephen L.; de Supinski, Bronis R.; Schulz, Martin; ...

    2013-01-01

    Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.
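
    The three sources of lost efficiency named above can be separated with simple bookkeeping. The sketch below follows the definition of work time inflation given in the abstract; the timing numbers are hypothetical and the function names are illustrative, not part of OpenMP or Qthreads.

```python
def work_time_inflation(sequential_time, thread_work_times):
    """Extra time threads spend doing the work, relative to a sequential run."""
    return sum(thread_work_times) - sequential_time


def parallel_efficiency(sequential_time, wall_time, n_threads):
    """Fraction of the total thread time that a sequential run would need."""
    return sequential_time / (wall_time * n_threads)


# Hypothetical per-thread measurements in seconds for a 4-thread run.
sequential_time = 100.0
work = [30.0, 31.0, 29.0, 30.0]      # time spent executing tasks
idle = [2.0, 1.0, 3.0, 2.0]          # time spent waiting for work
overhead = [1.0, 1.0, 1.0, 1.0]      # time spent in the scheduler

wall_time = work[0] + idle[0] + overhead[0]  # every thread runs for 33 s here
inflation = work_time_inflation(sequential_time, work)
efficiency = parallel_efficiency(sequential_time, wall_time, len(work))
lost = {"inflation": inflation, "idle": sum(idle), "overhead": sum(overhead)}
```

    In this made-up case the lost thread time (32 s out of 132 s) is dominated by inflation rather than by idleness or scheduling overhead.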

  3. The GBS code for tokamak scrape-off layer simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Halpern, F.D., E-mail: federico.halpern@epfl.ch; Ricci, P.; Jolliet, S.

    2016-06-15

    We describe a new version of GBS, a 3D global, flux-driven plasma turbulence code to simulate the turbulent dynamics in the tokamak scrape-off layer (SOL), superseding the code presented by Ricci et al. (2012) [14]. The present work is driven by the objective of studying SOL turbulent dynamics in medium size tokamaks and beyond with a high-fidelity physics model. We emphasize an intertwining framework of improved physics models and the computational improvements that allow them. The model extensions include neutral atom physics, finite ion temperature, the addition of a closed field line region, and a non-Boussinesq treatment of the polarization drift. GBS has been completely refactored with the introduction of a 3-D Cartesian communicator and a scalable parallel multigrid solver. We report dramatically enhanced parallel scalability, with the possibility of treating electromagnetic fluctuations very efficiently. The method of manufactured solutions as a verification process has been carried out for this new code version, demonstrating the correct implementation of the physical model.

  4. Status of parallel Python-based implementation of UEDGE

    NASA Astrophysics Data System (ADS)

    Umansky, M. V.; Pankin, A. Y.; Rognlien, T. D.; Dimits, A. M.; Friedman, A.; Joseph, I.

    2017-10-01

    The tokamak edge transport code UEDGE has long used the code-development and run-time framework Basis. However, with the support for Basis expected to terminate in the coming years, and with the advent of the modern numerical language Python, it has become desirable to move UEDGE to Python, to ensure its long-term viability. Our new Python-based UEDGE implementation takes advantage of the portable build system developed for FACETS. The new implementation gives access to Python's graphical libraries and numerical packages for pre- and post-processing, and support of HDF5 simplifies exchanging data. The older serial version of UEDGE has used for time-stepping the Newton-Krylov solver NKSOL. The renovated implementation uses backward Euler discretization with nonlinear solvers from PETSc, which has the promise to significantly improve the UEDGE parallel performance. We will report on assessment of some of the extended UEDGE capabilities emerging in the new implementation, and will discuss the future directions. Work performed for U.S. DOE by LLNL under contract DE-AC52-07NA27344.
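
    Backward Euler time-stepping with a nonlinear solve per step, as mentioned above, can be sketched for a scalar equation du/dt = f(u). This is a minimal illustration using a plain Newton iteration; it does not use UEDGE or PETSc, and the right-hand sides, tolerance and step size are assumptions chosen for the example.

```python
def backward_euler_step(f, dfdu, u_old, dt, tol=1e-12, max_iter=50):
    """Solve g(u) = u - u_old - dt*f(u) = 0 for the new state by Newton's method."""
    u = u_old  # initial guess: previous state
    for _ in range(max_iter):
        g = u - u_old - dt * f(u)
        if abs(g) < tol:
            break
        u -= g / (1.0 - dt * dfdu(u))  # Newton update with g'(u) = 1 - dt*f'(u)
    return u


# Stiff linear decay: explicit Euler would be unstable at this step size.
u_linear = backward_euler_step(lambda u: -50.0 * u, lambda u: -50.0, 1.0, 0.1)

# Nonlinear example: cubic damping.
cubic = lambda u: -u ** 3
u_cubic = backward_euler_step(cubic, lambda u: -3.0 * u ** 2, 1.0, 0.1)
cubic_residual = u_cubic - 1.0 - 0.1 * cubic(u_cubic)
```

    For the linear case the result matches the closed form u_old / (1 + 50*dt); in a large code the scalar Newton update would be replaced by a library nonlinear solver.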

  5. The source of dual-task limitations: Serial or parallel processing of multiple response selections?

    PubMed Central

    Marois, René

    2014-01-01

    Although it is generally recognized that the concurrent performance of two tasks incurs costs, the sources of these dual-task costs remain controversial. The serial bottleneck model suggests that serial postponement of task performance in dual-task conditions results from a central stage of response selection that can only process one task at a time. Cognitive-control models, by contrast, propose that multiple response selections can proceed in parallel, but that serial processing of task performance is predominantly adopted because its processing efficiency is higher than that of parallel processing. In the present study, we empirically tested this proposition by examining whether parallel processing would occur when it was more efficient and financially rewarded. The results indicated that even when parallel processing was more efficient and was incentivized by financial reward, participants still failed to process tasks in parallel. We conclude that central information processing is limited by a serial bottleneck. PMID:23864266

  6. Parallel Activation in Bilingual Phonological Processing

    ERIC Educational Resources Information Center

    Lee, Su-Yeon

    2011-01-01

    In bilingual language processing, the parallel activation hypothesis suggests that bilinguals activate their two languages simultaneously during language processing. Support for the parallel activation mainly comes from studies of lexical (word-form) processing, with relatively less attention to phonological (sound) processing. According to…

  7. The Role of Nonlinear Gradients in Parallel Imaging: A k-Space Based Analysis.

    PubMed

    Galiana, Gigi; Stockmann, Jason P; Tam, Leo; Peters, Dana; Tagare, Hemant; Constable, R Todd

    2012-09-01

    Sequences that encode the spatial information of an object using nonlinear gradient fields are a new frontier in MRI, with potential to provide lower peripheral nerve stimulation, windowed fields of view, tailored spatially-varying resolution, curved slices that mirror physiological geometry, and, most importantly, very fast parallel imaging with multichannel coils. The acceleration for multichannel images is generally explained by the fact that curvilinear gradient isocontours better complement the azimuthal spatial encoding provided by typical receiver arrays. However, the details of this complementarity have been more difficult to specify. We present a simple and intuitive framework for describing the mechanics of image formation with nonlinear gradients, and we use this framework to review some of the main classes of nonlinear encoding schemes.

  8. Understanding Science: Frameworks for using stories to facilitate systems thinking

    NASA Astrophysics Data System (ADS)

    ElShafie, S. J.; Bean, J. R.

    2017-12-01

    Studies indicate that using a narrative structure for teaching and learning helps audiences to process and recall new information. Stories also help audiences retain specific information, such as character names or plot points, in the context of a broader narrative. Stories can therefore facilitate high-context systems learning in addition to low-context declarative learning. Here we incorporate a framework for science storytelling, which we use in communication workshops, with the Understanding Science framework developed by the UC Museum of Paleontology (UCMP) to explore the application of storytelling to systems thinking. We translate portions of the Understanding Science flowchart into narrative terms. Placed side by side, the two charts illustrate the parallels between the scientific process and the story development process. They offer a roadmap for developing stories about scientific studies and concepts. We also created a series of worksheets for use with the flowcharts. These new tools can generate stories from any perspective, including a scientist conducting a study; a character that plays a role in a larger system (e.g., foraminifera or a carbon atom); an entire system that interacts with other systems (e.g., the carbon cycle). We will discuss exemplar stories about climate change from each of these perspectives, which we are developing for workshops using content and storyboard models from the new UCMP website Understanding Global Change. This conceptual framework and toolkit will help instructors to develop stories about scientific concepts for use in a classroom setting. It will also help students to analyze stories presented in class, and to create their own stories about new concepts. This approach facilitates student metacognition of the learning process, and can also be used as a form of evaluation. 
    We are testing this flowchart and its use in systems teaching with focus groups, in preparation for use in teacher professional development workshops.

  9. Molpher: a software framework for systematic chemical space exploration

    PubMed Central

    2014-01-01

    Background: Chemical space is virtual space occupied by all chemically meaningful organic compounds. It is an important concept in contemporary chemoinformatics research, and its systematic exploration is vital to the discovery of either novel drugs or new tools for chemical biology. Results: In this paper, we describe Molpher, an open-source framework for the systematic exploration of chemical space. Through a process we term ‘molecular morphing’, Molpher produces a path of structurally-related compounds. This path is generated by the iterative application of so-called ‘morphing operators’ that represent simple structural changes, such as the addition or removal of an atom or a bond. Molpher incorporates an optimized parallel exploration algorithm, compound logging and a two-dimensional visualization of the exploration process. Its feature set can be easily extended by implementing additional morphing operators, chemical fingerprints, similarity measures and visualization methods. Molpher not only offers an intuitive graphical user interface, but also can be run in batch mode. This enables users to easily incorporate molecular morphing into their existing drug discovery pipelines. Conclusions: Molpher is an open-source software framework for the design of virtual chemical libraries focused on a particular mechanistic class of compounds. These libraries, represented by a morphing path and its surroundings, provide valuable starting data for future in silico and in vitro experiments. Molpher is highly extensible and can be easily incorporated into any existing computational drug design pipeline. PMID:24655571

  10. Molpher: a software framework for systematic chemical space exploration.

    PubMed

    Hoksza, David; Skoda, Petr; Voršilák, Milan; Svozil, Daniel

    2014-03-21

    Chemical space is virtual space occupied by all chemically meaningful organic compounds. It is an important concept in contemporary chemoinformatics research, and its systematic exploration is vital to the discovery of either novel drugs or new tools for chemical biology. In this paper, we describe Molpher, an open-source framework for the systematic exploration of chemical space. Through a process we term 'molecular morphing', Molpher produces a path of structurally-related compounds. This path is generated by the iterative application of so-called 'morphing operators' that represent simple structural changes, such as the addition or removal of an atom or a bond. Molpher incorporates an optimized parallel exploration algorithm, compound logging and a two-dimensional visualization of the exploration process. Its feature set can be easily extended by implementing additional morphing operators, chemical fingerprints, similarity measures and visualization methods. Molpher not only offers an intuitive graphical user interface, but also can be run in batch mode. This enables users to easily incorporate molecular morphing into their existing drug discovery pipelines. Molpher is an open-source software framework for the design of virtual chemical libraries focused on a particular mechanistic class of compounds. These libraries, represented by a morphing path and its surroundings, provide valuable starting data for future in silico and in vitro experiments. Molpher is highly extensible and can be easily incorporated into any existing computational drug design pipeline.

  11. An object-oriented approach to nested data parallelism

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.; Chatterjee, Siddhartha

    1994-01-01

    This paper describes an implementation technique for integrating nested data parallelism into an object-oriented language. Data-parallel programming employs sets of data called 'collections' and expresses parallelism as operations performed over the elements of a collection. When the elements of a collection are also collections, there is the possibility of 'nested data parallelism.' Few current programming languages support nested data parallelism, however. In an object-oriented framework, a collection is a single object. Its type defines the parallel operations that may be applied to it. Our goal is to design and build an object-oriented data-parallel programming environment supporting nested data parallelism. Our initial approach is built upon three fundamental additions to C++. We add new parallel base types by implementing them as classes, and add a new parallel collection type called a 'vector' that is implemented as a template. Only one new language feature is introduced: the 'foreach' construct, which is the basis for exploiting elementwise parallelism over collections. The strength of the method lies in the compilation strategy, which translates nested data-parallel C++ into ordinary C++. Extracting the potential parallelism in nested 'foreach' constructs is called 'flattening' nested parallelism. We show how to flatten 'foreach' constructs using a simple program transformation. Our prototype system produces vector code which has been successfully run on workstations, a CM-2, and a CM-5.
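
The flattening idea in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's C++ transformation: the function names (`flatten`, `unflatten`, `nested_foreach`) are hypothetical, and the representation of a nested collection as one flat value list plus a list of segment lengths is an assumption chosen for illustration. The point is only that a nested elementwise loop becomes a single flat pass, which is the part a data-parallel backend could execute in one step.

```python
from itertools import accumulate


def flatten(nested):
    """Represent a list of lists as (flat values, segment lengths)."""
    values = [x for segment in nested for x in segment]
    lengths = [len(segment) for segment in nested]
    return values, lengths


def unflatten(values, lengths):
    """Rebuild the nested structure from flat values and segment lengths."""
    offsets = [0] + list(accumulate(lengths))
    return [values[offsets[i]:offsets[i + 1]] for i in range(len(lengths))]


def nested_foreach(nested, f):
    """Apply f to every inner element using one flat elementwise pass.

    The nested 'foreach over foreach' is replaced by a single loop over
    the flat value list; the segment lengths carry the nesting.
    """
    values, lengths = flatten(nested)
    mapped = [f(x) for x in values]  # the single, flat data-parallel step
    return unflatten(mapped, lengths)


if __name__ == "__main__":
    print(nested_foreach([[1, 2], [], [3]], lambda x: x * x))
```

Because the segment lengths are kept separately, empty inner collections survive the round trip, and irregular nesting does not affect the flat pass.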

  12. The family medicine curriculum resource project structural framework.

    PubMed

    Stearns, Jeffrey A; Stearns, Marjorie A; Davis, Ardis K; Chessman, Alexander W

    2007-01-01

    In the original contract for the Family Medicine Curricular Resource Project (FMCRP), the Health Resources and Services Administration (HRSA), Division of Medicine and Dentistry, charged the FMCRP executive committee with reviewing recent medical education reform proposals and relevant recent curricula to develop an analytical framework for the project. The FMCRP executive and advisory committees engaged in a review and analysis of a variety of curricular reform proposals generated during the last decade of the 20th century. At the same time, in a separate and parallel process, representative individuals from all the family medicine organizations, all levels of learners, internal medicine and pediatric faculty, and the national associations of medical and osteopathic colleges (Association of American Medical Colleges and the American Association of Colleges of Osteopathic Medicine) were involved in group discussions to identify educational needs for physicians practicing in the 21st century. After deliberation, a theoretical framework was chosen for this undergraduate medical education resource that mirrors the Accreditation Council for Graduate Medical Education (ACGME) competencies, a conceptual design originated for graduate medical education. In addition to reflecting the current environment calling for change and greater accountability in medical education, use of the ACGME competencies as the theoretical framework for the FMCR provides a continuum of focus between the two major segments of physician education: medical school and residency.

  13. The Transition to a Many-core World

    NASA Astrophysics Data System (ADS)

    Mattson, T. G.

    2012-12-01

    The need to increase performance within a fixed energy budget has pushed the computer industry to many-core processors. This is grounded in the physics of computing and is not a trend that will just go away. It is hard to overestimate the profound impact of many-core processors on software developers. Virtually every facet of the software development process will need to change to adapt to these new processors. In this talk, we will look at many-core hardware and consider its evolution from a perspective grounded in the CPU. We will show that the number of cores will inevitably increase, but in addition, a quest to maximize performance per watt will push these cores to be heterogeneous. We will show that the inevitable result of these changes is a computing landscape where the distinction between the CPU and the GPU is blurred. We will then consider the much more pressing problem of software in a many-core world. Writing software for heterogeneous many-core processors is well beyond the ability of current programmers. One solution is to support a software development process where programmer teams are split into two distinct groups: a large group of domain-expert productivity programmers and a much smaller team of computer-scientist efficiency programmers. The productivity programmers work in terms of high-level frameworks to express the concurrency in their problems while avoiding any details of how that concurrency is exploited. The second group, the efficiency programmers, map applications expressed in terms of these frameworks onto the target many-core system. In other words, we can solve the many-core software problem by creating a software infrastructure that only requires a small subset of programmers to become master parallel programmers. This is different from the discredited dream of automatic parallelism. Note that productivity programmers still need to define the architecture of their software in a way that exposes the concurrency inherent in their problem. We submit that domain-expert programmers understand "what is concurrent". The parallel programming problem emerges from the complexity of "how that concurrency is utilized" on real hardware. The research described in this talk was carried out in collaboration with the ParLab at UC Berkeley. We use a design pattern language to define the high-level frameworks exposed to domain-expert productivity programmers. We then use tools from the SEJITS project (Selective embedded Just In time Specializers) to build the software transformation tool chains that turn these framework-oriented designs into highly efficient code. The final ingredient is a software platform to serve as a target for these tools. One such platform is the OpenCL industry standard for programming heterogeneous systems. We will briefly describe OpenCL and show how it provides a vendor-neutral software target for current and future many-core systems: CPU-based, GPU-based, and heterogeneous combinations of the two.

  14. EvoGraph: On-The-Fly Efficient Mining of Evolving Graphs on GPU

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sengupta, Dipanjan; Song, Shuaiwen

    With the prevalence of the World Wide Web and social networks, there has been a growing interest in high performance analytics for constantly-evolving dynamic graphs. Modern GPUs provide a massive amount of parallelism for efficient graph processing, but challenges remain due to their lack of support for the near real-time streaming nature of dynamic graphs. Specifically, due to the current high volume and velocity of graph data combined with the complexity of user queries, traditional processing methods that first store the updates and then repeatedly run static graph analytics on a sequence of versions or snapshots are deemed undesirable and computationally infeasible on GPU. We present EvoGraph, a highly efficient and scalable GPU-based dynamic graph analytics framework.

  15. Synthesizing parallel imaging applications using the CAP (computer-aided parallelization) tool

    NASA Astrophysics Data System (ADS)

    Gennart, Benoit A.; Mazzariol, Marc; Messerli, Vincent; Hersch, Roger D.

    1997-12-01

    Imaging applications such as filtering, image transforms and compression/decompression require vast amounts of computing power when applied to large data sets. These applications would potentially benefit from the use of parallel processing. However, dedicated parallel computers are expensive and their processing power per node lags behind that of the most recent commodity components. Furthermore, developing parallel applications remains a difficult task: writing and debugging the application is difficult (deadlocks), programs may not be portable from one parallel architecture to the other, and performance often comes short of expectations. In order to facilitate the development of parallel applications, we propose the CAP computer-aided parallelization tool which enables application programmers to specify at a high-level of abstraction the flow of data between pipelined-parallel operations. In addition, the CAP tool supports the programmer in developing parallel imaging and storage operations. CAP enables combining efficiently parallel storage access routines and image processing sequential operations. This paper shows how processing and I/O intensive imaging applications must be implemented to take advantage of parallelism and pipelining between data access and processing. This paper's contribution is (1) to show how such implementations can be compactly specified in CAP, and (2) to demonstrate that CAP specified applications achieve the performance of custom parallel code. The paper analyzes theoretically the performance of CAP specified applications and demonstrates the accuracy of the theoretical analysis through experimental measurements.

  16. On the Use of CAD and Cartesian Methods for Aerodynamic Optimization

    NASA Technical Reports Server (NTRS)

    Nemec, M.; Aftosmis, M. J.; Pulliam, T. H.

    2004-01-01

    The objective of this paper is to present the development of an optimization capability for Cart3D, a Cartesian inviscid-flow analysis package. We present the construction of a new optimization framework and we focus on the following issues: 1) Component-based geometry parameterization approach using parametric-CAD models and CAPRI. A novel geometry server is introduced that addresses the issue of parallel efficiency while only sparingly consuming CAD resources; 2) The use of genetic and gradient-based algorithms for three-dimensional aerodynamic design problems. The influence of noise on the optimization methods is studied. Our goal is to create a responsive and automated framework that efficiently identifies design modifications that result in substantial performance improvements. In addition, we examine the architectural issues associated with the deployment of a CAD-based approach in a heterogeneous parallel computing environment that contains both CAD workstations and dedicated compute engines. We demonstrate the effectiveness of the framework for a design problem that features topology changes and complex geometry.

  17. A Component-Based Extension Framework for Large-Scale Parallel Simulations in NEURON

    PubMed Central

    King, James G.; Hines, Michael; Hill, Sean; Goodman, Philip H.; Markram, Henry; Schürmann, Felix

    2008-01-01

    As neuronal simulations approach larger scales with increasing levels of detail, the neurosimulator software represents only a part of a chain of tools ranging from setup, simulation, interaction with virtual environments to analysis and visualizations. Previously published approaches to abstracting simulator engines have not received wide-spread acceptance, which in part may be due to the fact that they tried to address the challenge of solving the model specification problem. Here, we present an approach that uses a neurosimulator, in this case NEURON, to describe and instantiate the network model in the simulator's native model language but then replaces the main integration loop with its own. Existing parallel network models are easily adopted to run in the presented framework. The presented approach is thus an extension to NEURON but uses a component-based architecture to allow for replaceable spike exchange components and pluggable components for monitoring, analysis, or control that can run in this framework alongside the simulation. PMID:19430597

  18. GOCE gravity field simulation based on actual mission scenario

    NASA Astrophysics Data System (ADS)

    Pail, R.; Goiginger, H.; Mayrhofer, R.; Höck, E.; Schuh, W.-D.; Brockmann, J. M.; Krasbutter, I.; Fecher, T.; Gruber, T.

    2009-04-01

    In the framework of the ESA-funded project "GOCE High-level Processing Facility", an operational hardware and software system for the scientific processing (Level 1B to Level 2) of GOCE data has been set up by the European GOCE Gravity Consortium EGG-C. One key component of this software system is the processing of a spherical harmonic Earth's gravity field model and the corresponding full variance-covariance matrix from the precise GOCE orbit and calibrated and corrected satellite gravity gradiometry (SGG) data. In the framework of the time-wise approach a combination of several processing strategies for the optimum exploitation of the information content of the GOCE data has been set up: The Quick-Look Gravity Field Analysis is applied to derive a fast diagnosis of the GOCE system performance and to monitor the quality of the input data. In the Core Solver processing a rigorous high-precision solution of the very large normal equation systems is derived by applying parallel processing techniques on a PC cluster. Before the availability of real GOCE data, by means of a realistic numerical case study, which is based on the actual GOCE orbit and mission scenario and simulation data stemming from the most recent ESA end-to-end simulation, the expected GOCE gravity field performance is evaluated. Results from this simulation as well as recently developed features of the software system are presented. Additionally some aspects on data combination with complementary data sources are addressed.

  19. Probing the hydrogen equilibrium and kinetics in zeolite imidazolate frameworks via molecular dynamics and quasi-elastic neutron scattering experiments.

    PubMed

    Pantatosaki, Evangelia; Jobic, Hervé; Kolokolov, Daniil I; Karmakar, Shilpi; Biniwale, Rajesh; Papadopoulos, George K

    2013-01-21

    The problem of simulating processes involving equilibria and dynamics of guest sorbates within zeolitic imidazolate frameworks (ZIF) by means of molecular dynamics (MD) computer experiments is of growing importance because of the promising role of ZIFs as molecular "traps" for clean energy applications. A key issue for validating such an atomistic modeling attempt is the possibility of comparing the MD results, with real experiments being able to capture analogous space and time scales to the ones pertained to the computer experiments. In the present study, this prerequisite is fulfilled through the quasi-elastic neutron scattering technique (QENS) for measuring self-diffusivity, by elaborating the incoherent scattering signal of hydrogen nuclei. QENS and MD experiments were performed in parallel to probe the hydrogen motion, for the first time in ZIF members. The predicted and measured dynamics behaviors show considerable concentration variation of the hydrogen self-diffusion coefficient in the two topologically different ZIF pore networks of this study, the ZIF-3 and ZIF-8. Modeling options such as the flexibility of the entire matrix versus a rigid framework version, the mobility of the imidazolate ligand, and the inclusion of quantum mechanical effects in the potential functions were examined in detail for the sorption thermodynamics and kinetics of hydrogen and also of deuterium, by employing MD combined with Widom averaging towards studying phase equilibria. The latter methodology ensures a rigorous and efficient way for post-processing the dynamics trajectory, thereby avoiding stochastic moves via Monte Carlo simulation, over the large number of configurational degrees of freedom a nonrigid framework encompasses.

  20. The Goddard Space Flight Center Program to develop parallel image processing systems

    NASA Technical Reports Server (NTRS)

    Schaefer, D. H.

    1972-01-01

    Parallel image processing, which is defined as image processing where all points of an image are operated upon simultaneously, is discussed. Coherent optical, noncoherent optical, and electronic methods are considered as parallel image processing techniques.

  1. Mobile Ultrasound Plane Wave Beamforming on iPhone or iPad using Metal-based GPU Processing

    NASA Astrophysics Data System (ADS)

    Hewener, Holger J.; Tretbar, Steffen H.

    Mobile and cost-effective ultrasound devices are being used in point-of-care scenarios or the trauma room. To reduce the costs of such devices, we have already presented the possibilities of consumer devices like the Apple iPad for full signal processing of raw data for ultrasound image generation. Using technologies like plane wave imaging to generate a full image with only one excitation/reception event, the acquisition times and power consumption of ultrasound imaging can be reduced for low-power mobile devices based on consumer electronics, realizing the transition from FPGA- or ASIC-based beamforming to more flexible software beamforming. The massively parallel beamforming processing can be done with the Apple framework "Metal" for advanced graphics and general-purpose GPU processing on the iOS platform. We were able to integrate the beamforming reconstruction into our mobile ultrasound processing application with imaging rates up to 70 Hz on iPad Air 2 hardware.

  2. Rapid performance modeling and parameter regression of geodynamic models

    NASA Astrophysics Data System (ADS)

    Brown, J.; Duplyakin, D.

    2016-12-01

    Geodynamic models run in a parallel environment have many parameters with complicated effects on performance and scientifically-relevant functionals. Manually choosing an efficient machine configuration and mapping out the parameter space requires a great deal of expert knowledge and time-consuming experiments. We propose an active learning technique based on Gaussian Process Regression to automatically select experiments to map out the performance landscape with respect to scientific and machine parameters. The resulting performance model is then used to select optimal experiments for improving the accuracy of a reduced order model per unit of computational cost. We present the framework and evaluate its quality and capability using popular lithospheric dynamics models.
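
As a rough illustration of active learning with Gaussian process regression, the sketch below picks the next experiment as the candidate with the largest predictive variance. This is an assumption made for illustration: the abstract does not state which selection criterion, kernel, or parameters the authors use. The one-dimensional input, the RBF kernel with unit length scale, and all function names (`rbf`, `gp_predict`, `next_experiment`) are hypothetical. The code uses only the standard library so it can be run as is.

```python
import math


def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i, row in enumerate(L):
        y.append((b[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y


def gp_predict(xs, ys, x_new, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    k_star = [rbf(a, x_new) for a in xs]
    v = forward_solve(L, k_star)  # v = L^-1 k*
    w = forward_solve(L, ys)      # w = L^-1 y
    mean = sum(vi * wi for vi, wi in zip(v, w))
    var = rbf(x_new, x_new) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)


def next_experiment(xs, ys, candidates):
    """Choose the candidate where the model is least certain."""
    return max(candidates, key=lambda c: gp_predict(xs, ys, c)[1])


if __name__ == "__main__":
    measured_x = [0.0, 1.0]  # hypothetical parameter settings already run
    measured_y = [1.0, 2.0]  # hypothetical measured run times
    print(next_experiment(measured_x, measured_y, [0.1, 0.9, 3.0]))
```

For clarity the sketch refactors the kernel matrix for every candidate; a practical version would factor it once per round and would normally handle several input dimensions.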

  3. A Correlation-Based Transition Model using Local Variables. Part 2; Test Cases and Industrial Applications

    NASA Technical Reports Server (NTRS)

    Langtry, R. B.; Menter, F. R.; Likki, S. R.; Suzen, Y. B.; Huang, P. G.; Volker, S.

    2006-01-01

    A new correlation-based transition model has been developed, which is built strictly on local variables. As a result, the transition model is compatible with modern computational fluid dynamics (CFD) methods using unstructured grids and massive parallel execution. The model is based on two transport equations, one for the intermittency and one for the transition onset criteria in terms of momentum thickness Reynolds number. The proposed transport equations do not attempt to model the physics of the transition process (unlike, e.g., turbulence models), but form a framework for the implementation of correlation-based models into general-purpose CFD methods.

  4. A Correlation-Based Transition Model using Local Variables. Part 1; Model Formation

    NASA Technical Reports Server (NTRS)

    Menter, F. R.; Langtry, R. B.; Likki, S. R.; Suzen, Y. B.; Huang, P. G.; Volker, S.

    2006-01-01

    A new correlation-based transition model has been developed, which is based strictly on local variables. As a result, the transition model is compatible with modern computational fluid dynamics (CFD) approaches, such as unstructured grids and massive parallel execution. The model is based on two transport equations, one for intermittency and one for the transition onset criteria in terms of momentum thickness Reynolds number. The proposed transport equations do not attempt to model the physics of the transition process (unlike, e.g., turbulence models) but form a framework for the implementation of correlation-based models into general-purpose CFD methods.

  5. Advances in simulation of wave interactions with extended MHD phenomena

    NASA Astrophysics Data System (ADS)

    Batchelor, D.; Abla, G.; D'Azevedo, E.; Bateman, G.; Bernholdt, D. E.; Berry, L.; Bonoli, P.; Bramley, R.; Breslau, J.; Chance, M.; Chen, J.; Choi, M.; Elwasif, W.; Foley, S.; Fu, G.; Harvey, R.; Jaeger, E.; Jardin, S.; Jenkins, T.; Keyes, D.; Klasky, S.; Kruger, S.; Ku, L.; Lynch, V.; McCune, D.; Ramos, J.; Schissel, D.; Schnack, D.; Wright, J.

    2009-07-01

    The Integrated Plasma Simulator (IPS) provides a framework within which some of the most advanced, massively-parallel fusion modeling codes can be interoperated to provide a detailed picture of the multi-physics processes involved in fusion experiments. The presentation will cover four topics: 1) recent improvements to the IPS, 2) application of the IPS for very high resolution simulations of ITER scenarios, 3) studies of resistive and ideal MHD stability in tokamak discharges using IPS facilities, and 4) the application of RF power in the electron cyclotron range of frequencies to control slowly growing MHD modes in tokamaks and initial evaluations of optimized location for RF power deposition.

  6. Emulating the logic of monoterpenoid alkaloid biogenesis to access a skeletally diverse chemical library.

    PubMed

    Liu, Song; Scotti, John S; Kozmin, Sergey A

    2013-09-06

    We have developed a synthetic strategy that mimics the diversity-generating power of monoterpenoid indole alkaloid biosynthesis. Our general approach goes beyond diversification of a single natural product-like substructure and enables production of a highly diverse collection of small molecules. The reaction sequence begins with rapid and highly modular assembly of the tetracyclic indoloquinolizidine core, which can be chemoselectively processed into several additional skeletally diverse structural frameworks. The general utility of this approach was demonstrated by parallel synthesis of two representative chemical libraries containing 847 compounds with favorable physicochemical properties to enable its subsequent broad pharmacological evaluation.

  7. Massively parallel GPU-accelerated minimization of classical density functional theory

    NASA Astrophysics Data System (ADS)

    Stopper, Daniel; Roth, Roland

    2017-08-01

    In this paper, we discuss the ability to numerically minimize the grand potential of hard disks in two-dimensional and of hard spheres in three-dimensional space within the framework of classical density functional and fundamental measure theory on modern graphics cards. Our main finding is that a massively parallel minimization leads to an enormous performance gain in comparison to standard sequential minimization schemes. Furthermore, the results indicate that in complex multi-dimensional situations, a heavy parallel minimization of the grand potential seems to be mandatory in order to reach a reasonable balance between accuracy and computational cost.

  8. Portable parallel portfolio optimization in the Aurora Financial Management System

    NASA Astrophysics Data System (ADS)

    Laure, Erwin; Moritsch, Hans

    2001-07-01

    Financial planning problems are formulated as large scale, stochastic, multiperiod, tree structured optimization problems. An efficient technique for solving this kind of problem is the nested Benders decomposition method. In this paper we present a parallel, portable, asynchronous implementation of this technique. To achieve our portability goals we chose the programming language Java for our implementation and used a high level Java based framework, called OpusJava, for expressing the parallelism potential as well as synchronization constraints. Our implementation is embedded within a modular decision support tool for portfolio and asset liability management, the Aurora Financial Management System.

  9. The Brazilian Audit Tribunal's role in improving the federal environmental licensing process

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lima, Luiz Henrique, E-mail: luizhlima@terra.com.b; Magrini, Alessandra, E-mail: ale@ppe.ufrj.b; Centro de Tecnologia - Bloco C Sala 211, Ilha do Fundao, 21949-900 - Rio de Janeiro, Caixa-Postal: 68565, RJ

    This article describes the role played by the Brazilian Audit Tribunal (Tribunal de Contas da Uniao - TCU) in the external auditing of environmental management in Brazil, highlighting the findings of an operational audit conducted in 2007 of the federal environmental licensing process. Initially, it records the constitutional and legal framework of Brazilian environmental licensing, describing the powers and duties granted to federal, state and municipal institutions. In addition, it presents the responsibilities of the TCU in the environmental area, comparing these with those of other Supreme Audit Institutions (SAI) that are members of the International Organization of Supreme Audit Institutions (INTOSAI). It also describes the work carried out in the operational audit of the Brazilian environmental licensing process and its main conclusions and recommendations. Finally, it draws a parallel between the findings and recommendations made in Brazil with those of academic studies and audits conducted in other countries.

  10. A genetic algorithm-based job scheduling model for big data analytics.

    PubMed

    Lu, Qinghua; Li, Shanshan; Zhang, Weishan; Zhang, Lei

    Big data analytics (BDA) applications are a new category of software applications that process large amounts of data using scalable parallel processing infrastructure to obtain hidden value. Hadoop is the most mature open-source big data analytics framework, which implements the MapReduce programming model to process big data with MapReduce jobs. Big data analytics jobs are often continuous and not mutually separated. The existing work mainly focuses on executing jobs in sequence, which is often inefficient and consumes considerable energy. In this paper, we propose a genetic algorithm-based job scheduling model for big data analytics applications to improve the efficiency of big data analytics. To implement the job scheduling model, we leverage an estimation module to predict the performance of clusters when executing analytics jobs. We have evaluated the proposed job scheduling model in terms of feasibility and accuracy.

  11. A 2D MTF approach to evaluate and guide dynamic imaging developments.

    PubMed

    Chao, Tzu-Cheng; Chung, Hsiao-Wen; Hoge, W Scott; Madore, Bruno

    2010-02-01

    As the number and complexity of partially sampled dynamic imaging methods continue to increase, reliable strategies to evaluate performance may prove most useful. In the present work, an analytical framework to evaluate given reconstruction methods is presented. A perturbation algorithm allows the proposed evaluation scheme to perform robustly without requiring knowledge about the inner workings of the method being evaluated. A main output of the evaluation process consists of a two-dimensional modulation transfer function, an easy-to-interpret visual rendering of a method's ability to capture all combinations of spatial and temporal frequencies. Approaches to evaluate noise properties and artifact content at all spatial and temporal frequencies are also proposed. One fully sampled phantom and three fully sampled cardiac cine datasets were subsampled (R = 4 and 8) and reconstructed with the different methods tested here. A hybrid method, which combines the main advantageous features observed in our assessments, was proposed and tested in a cardiac cine application, with acceleration factors of 3.5 and 6.3 (skip factors of 4 and 8, respectively). This approach combines features from methods such as k-t sensitivity encoding, unaliasing by Fourier encoding the overlaps in the temporal dimension-sensitivity encoding, generalized autocalibrating partially parallel acquisition, sensitivity profiles from an array of coils for encoding and reconstruction in parallel, self, hybrid referencing with unaliasing by Fourier encoding the overlaps in the temporal dimension and generalized autocalibrating partially parallel acquisition, and generalized autocalibrating partially parallel acquisition-enhanced sensitivity maps for sensitivity encoding reconstructions.

  12. Crops in silico: A community wide multi-scale computational modeling framework of plant canopies

    NASA Astrophysics Data System (ADS)

    Srinivasan, V.; Christensen, A.; Borkiewic, K.; Yiwen, X.; Ellis, A.; Panneerselvam, B.; Kannan, K.; Shrivastava, S.; Cox, D.; Hart, J.; Marshall-Colon, A.; Long, S.

    2016-12-01

    Current crop models predict a looming gap between supply and demand for primary foodstuffs over the next 100 years. While significant yield increases were achieved in major food crops during the early years of the green revolution, the current rates of yield increases are insufficient to meet future projected food demand. Furthermore, with projected reduction in arable land, decrease in water availability, and increasing impacts of climate change on future food production, innovative technologies are required to sustainably improve crop yield. To meet these challenges, we are developing Crops in silico (Cis), a biologically informed, multi-scale, computational modeling framework that can facilitate whole plant simulations of crop systems. The Cis framework is capable of linking models of gene networks, protein synthesis, metabolic pathways, physiology, growth, and development in order to investigate crop response to different climate scenarios and resource constraints. This modeling framework will provide the mechanistic details to generate testable hypotheses toward accelerating directed breeding and engineering efforts to increase future food security. A primary objective for building such a framework is to create synergy among an inter-connected community of biologists and modelers to create a realistic virtual plant. This framework advantageously casts the detailed mechanistic understanding of individual plant processes across various scales in a common scalable framework that makes use of current advances in high performance and parallel computing. We are currently designing a user friendly interface that will make this tool equally accessible to biologists and computer scientists. 
    Critically, this framework will provide the community with much needed tools for guiding future crop breeding and engineering, understanding the emergent implications of discoveries at the molecular level for whole plant behavior, and improved prediction of plant and ecosystem responses to the environment.

  13. Implementation of highly parallel and large scale GW calculations within the OpenAtom software

    NASA Astrophysics Data System (ADS)

    Ismail-Beigi, Sohrab

    The need to describe electronic excitations with better accuracy than provided by band structures produced by Density Functional Theory (DFT) has been a long-term enterprise for the computational condensed matter and materials theory communities. In some cases, appropriate theoretical frameworks have existed for some time but have been difficult to apply widely due to computational cost. For example, the GW approximation incorporates a great deal of important non-local and dynamical electronic interaction effects but has been too computationally expensive for routine use in large materials simulations. OpenAtom is an open source massively parallel ab initio density functional software package based on plane waves and pseudopotentials (http://charm.cs.uiuc.edu/OpenAtom/) that takes advantage of the Charm++ parallel framework. At present, it is developed via a three-way collaboration, funded by an NSF SI2-SSI grant (ACI-1339804), between Yale (Ismail-Beigi), IBM T. J. Watson (Glenn Martyna) and the University of Illinois at Urbana Champaign (Laxmikant Kale). We will describe the project and our current approach towards implementing large scale GW calculations with OpenAtom. Potential applications of large scale parallel GW software for problems involving electronic excitations in semiconductor and/or metal oxide systems will also be pointed out.

  14. LORAKS Makes Better SENSE: Phase-Constrained Partial Fourier SENSE Reconstruction without Phase Calibration

    PubMed Central

    Kim, Tae Hyung; Setsompop, Kawin; Haldar, Justin P.

    2016-01-01

    Purpose: Parallel imaging and partial Fourier acquisition are two classical approaches for accelerated MRI. Methods that combine these approaches often rely on prior knowledge of the image phase, but the need to obtain this prior information can place practical restrictions on the data acquisition strategy. In this work, we propose and evaluate SENSE-LORAKS, which enables combined parallel imaging and partial Fourier reconstruction without requiring prior phase information. Theory and Methods: The proposed formulation is based on combining the classical SENSE model for parallel imaging data with the more recent LORAKS framework for MR image reconstruction using low-rank matrix modeling. Previous LORAKS-based methods have successfully enabled calibrationless partial Fourier parallel MRI reconstruction, but have been most successful with nonuniform sampling strategies that may be hard to implement for certain applications. By combining LORAKS with SENSE, we enable highly-accelerated partial Fourier MRI reconstruction for a broader range of sampling trajectories, including widely-used calibrationless uniformly-undersampled trajectories. Results: Our empirical results with retrospectively undersampled datasets indicate that when SENSE-LORAKS reconstruction is combined with an appropriate k-space sampling trajectory, it can provide substantially better image quality at high-acceleration rates relative to existing state-of-the-art reconstruction approaches. Conclusion: The SENSE-LORAKS framework provides promising new opportunities for highly-accelerated MRI. PMID:27037836

  15. Parallel Hough Transform-Based Straight Line Detection and Its FPGA Implementation in Embedded Vision

    PubMed Central

    Lu, Xiaofeng; Song, Li; Shen, Sumin; He, Kang; Yu, Songyu; Ling, Nam

    2013-01-01

    Hough Transform has been widely used for straight line detection in low-definition and still images, but it suffers from execution time and resource requirements. Field Programmable Gate Arrays (FPGA) provide a competitive alternative for hardware acceleration to reap tremendous computing performance. In this paper, we propose a novel parallel Hough Transform (PHT) and FPGA architecture-associated framework for real-time straight line detection in high-definition videos. A resource-optimized Canny edge detection method with enhanced non-maximum suppression conditions is presented to suppress most possible false edges and obtain more accurate candidate edge pixels for subsequent accelerated computation. Then, a novel PHT algorithm exploiting spatial angle-level parallelism is proposed to upgrade computational accuracy by improving the minimum computational step. Moreover, the FPGA based multi-level pipelined PHT architecture optimized by spatial parallelism ensures real-time computation for 1,024 × 768 resolution videos without any off-chip memory consumption. This framework is evaluated on ALTERA DE2-115 FPGA evaluation platform at a maximum frequency of 200 MHz, and it can calculate straight line parameters in 15.59 ms on the average for one frame. Qualitative and quantitative evaluation results have validated the system performance regarding data throughput, memory bandwidth, resource, speed and robustness. PMID:23867746

  16. Parallel Hough Transform-based straight line detection and its FPGA implementation in embedded vision.

    PubMed

    Lu, Xiaofeng; Song, Li; Shen, Sumin; He, Kang; Yu, Songyu; Ling, Nam

    2013-07-17

    Hough Transform has been widely used for straight line detection in low-definition and still images, but it suffers from execution time and resource requirements. Field Programmable Gate Arrays (FPGA) provide a competitive alternative for hardware acceleration to reap tremendous computing performance. In this paper, we propose a novel parallel Hough Transform (PHT) and FPGA architecture-associated framework for real-time straight line detection in high-definition videos. A resource-optimized Canny edge detection method with enhanced non-maximum suppression conditions is presented to suppress most possible false edges and obtain more accurate candidate edge pixels for subsequent accelerated computation. Then, a novel PHT algorithm exploiting spatial angle-level parallelism is proposed to upgrade computational accuracy by improving the minimum computational step. Moreover, the FPGA based multi-level pipelined PHT architecture optimized by spatial parallelism ensures real-time computation for 1,024 × 768 resolution videos without any off-chip memory consumption. This framework is evaluated on ALTERA DE2-115 FPGA evaluation platform at a maximum frequency of 200 MHz, and it can calculate straight line parameters in 15.59 ms on the average for one frame. Qualitative and quantitative evaluation results have validated the system performance regarding data throughput, memory bandwidth, resource, speed and robustness.

  17. A New Framework for Systematic Reviews: Application to Social Skills Interventions for Preschoolers with Autism

    ERIC Educational Resources Information Center

    Goldstein, Howard; Lackey, Kimberly C.; Schneider, Naomi J. B.

    2014-01-01

    This review presents a novel framework for evaluating evidence based on a set of parallel criteria that can be applied to both group and single-subject experimental design (SSED) studies. The authors illustrate use of this evaluation system in a systematic review of 67 articles investigating social skills interventions for preschoolers with autism…

  18. Causes of drug shortages in the legal pharmaceutical framework.

    PubMed

    De Weerdt, Elfi; Simoens, Steven; Hombroeckx, Luc; Casteels, Minne; Huys, Isabelle

    2015-03-01

    Different causes of drug shortages can be linked to the pharmaceutical legal framework, such as: parallel trade, quality requirements, economic decisions to suspend or cease production, etc. However until now no in-depth study of the different regulations affecting drug shortages is available. The aim of this paper is to provide an analysis of relevant legal and regulatory measures in the European pharmaceutical framework which influence drug shortages. Different European and national legislations governing human medicinal products were analyzed (e.g. Directive 2001/83/EC and Directive 2011/62/EU), supplemented with literature studies. For patented drugs, external price referencing may encompass the largest impact on drug shortages. For generic medicines, internal or external reference pricing, tendering as well as price capping may affect drug shortages. Manufacturing/quality requirements also contribute to drug shortages, since non-compliance leads to recalls. The influence of parallel trade on drug shortages is still rather disputable. Price and quality regulations are both important causes of drug shortages or drug unavailability. It can be concluded that there is room for improvement in the pharmaceutical legal framework within the lines drawn by the EU to mitigate drug shortages. Copyright © 2015 Elsevier Inc. All rights reserved.

  19. Thread concept for automatic task parallelization in image analysis

    NASA Astrophysics Data System (ADS)

    Lueckenhaus, Maximilian; Eckstein, Wolfgang

    1998-09-01

    Parallel processing of image analysis tasks is an essential method to speed up image processing and helps to exploit the full capacity of distributed systems. However, writing parallel code is a difficult and time-consuming process and often leads to an architecture-dependent program that has to be re-implemented when changing the hardware. Therefore it is highly desirable to do the parallelization automatically. For this we have developed a special kind of thread concept for image analysis tasks. Threads derived from one subtask may share objects and run in the same context but may process different threads of execution and work on different data in parallel. In this paper we describe the basics of our thread concept and show how it can be used as the basis of an automatic task parallelization to speed up image processing. We further illustrate the design and implementation of an agent-based system that uses image analysis threads for generating and processing parallel programs by taking into account the available hardware. The tests made with our system prototype show that the thread concept combined with the agent paradigm is suitable to speed up image processing by an automatic parallelization of image analysis tasks.

  20. A simple computational algorithm of model-based choice preference.

    PubMed

    Toyama, Asako; Katahira, Kentaro; Ohira, Hideki

    2017-08-01

    A broadly used computational framework posits that two learning systems operate in parallel during the learning of choice preferences, namely the model-free and model-based reinforcement-learning systems. In this study, we examined another possibility, through which model-free learning is the basic system and model-based information is its modulator. Accordingly, we proposed several modified versions of a temporal-difference learning model to explain the choice-learning process. Using the two-stage decision task developed by Daw, Gershman, Seymour, Dayan, and Dolan (2011), we compared their original computational model, which assumes a parallel learning process, and our proposed models, which assume a sequential learning process. Choice data from 23 participants showed a better fit with the proposed models. More specifically, the proposed eligibility adjustment model, which assumes that the environmental model can weight the degree of the eligibility trace, can explain choices better under both model-free and model-based controls and has a simpler computational algorithm than the original model. In addition, the forgetting learning model and its variation, which assume changes in the values of unchosen actions, substantially improved the fits to the data. Overall, we show that a hybrid computational model best fits the data. The parameters used in this model succeed in capturing individual tendencies with respect to both model use in learning and exploration behavior. This computational model provides novel insights into learning with interacting model-free and model-based components.

  1. Self-calibrated correlation imaging with k-space variant correlation functions.

    PubMed

    Li, Yu; Edalati, Masoud; Du, Xingfu; Wang, Hui; Cao, Jie J

    2018-03-01

    Correlation imaging is a previously developed high-speed MRI framework that converts parallel imaging reconstruction into the estimate of correlation functions. The presented work aims to demonstrate this framework can provide a speed gain over parallel imaging by estimating k-space variant correlation functions. Because of Fourier encoding with gradients, outer k-space data contain higher spatial-frequency image components arising primarily from tissue boundaries. As a result of tissue-boundary sparsity in the human anatomy, neighboring k-space data correlation varies from the central to the outer k-space. By estimating k-space variant correlation functions with an iterative self-calibration method, correlation imaging can benefit from neighboring k-space data correlation associated with both coil sensitivity encoding and tissue-boundary sparsity, thereby providing a speed gain over parallel imaging that relies only on coil sensitivity encoding. This new approach is investigated in brain imaging and free-breathing neonatal cardiac imaging. Correlation imaging performs better than existing parallel imaging techniques in simulated brain imaging acceleration experiments. The higher speed enables real-time data acquisition for neonatal cardiac imaging in which physiological motion is fast and non-periodic. With k-space variant correlation functions, correlation imaging gives a higher speed than parallel imaging and offers the potential to image physiological motion in real-time. Magn Reson Med 79:1483-1494, 2018. © 2017 International Society for Magnetic Resonance in Medicine.

  2. Improving parallel I/O autotuning with performance modeling

    DOE PAGES

    Behzad, Babak; Byna, Surendra; Wild, Stefan M.; ...

    2014-01-01

    Various layers of the parallel I/O subsystem offer tunable parameters for improving I/O performance on large-scale computers. However, searching through a large parameter space is challenging. We are working towards an autotuning framework for determining the parallel I/O parameters that can achieve good I/O performance for different data write patterns. In this paper, we characterize parallel I/O and discuss the development of predictive models for use in effectively reducing the parameter space. Furthermore, applying our technique on tuning an I/O kernel derived from a large-scale simulation code shows that the search time can be reduced from 12 hours to 2 hours, while achieving 54X I/O performance speedup.

  3. Flood predictions using the parallel version of distributed numerical physical rainfall-runoff model TOPKAPI

    NASA Astrophysics Data System (ADS)

    Boyko, Oleksiy; Zheleznyak, Mark

    2015-04-01

    The original numerical code TOPKAPI-IMMS of the distributed rainfall-runoff model TOPKAPI (Todini et al., 1996-2014) was developed and implemented in Ukraine. A parallel version of the code has recently been developed for use on multiprocessor systems: multicore/multiprocessor PCs and clusters. The algorithm is based on a binary-tree decomposition of the watershed to balance the amount of computation across all processors/cores. The Message Passing Interface (MPI) protocol is used as the parallel computing framework. The numerical efficiency of the parallelization algorithms is demonstrated in case studies of flood prediction for mountain watersheds of the Ukrainian Carpathian region. The modeling results are compared with predictions based on lumped-parameter models.

  4. Studies in optical parallel processing. [All optical and electro-optic approaches

    NASA Technical Reports Server (NTRS)

    Lee, S. H.

    1978-01-01

    Threshold and A/D devices for converting a gray scale image into a binary one were investigated for all-optical and opto-electronic approaches to parallel processing. Integrated optical logic circuits (IOC) and optical parallel logic devices (OPAL) were studied as an approach to processing optical binary signals. In the IOC logic scheme, a single row of an optical image is coupled into the IOC substrate at a time through an array of optical fibers. Parallel processing is carried out, on each image element of these rows, in the IOC substrate and the resulting output exits via a second array of optical fibers. The OPAL system for parallel processing, which uses a Fabry-Perot interferometer for image thresholding and analog-to-digital conversion, achieves a higher degree of parallel processing than is possible with IOC.

  5. Parallel workflow tools to facilitate human brain MRI post-processing

    PubMed Central

    Cui, Zaixu; Zhao, Chenxi; Gong, Gaolang

    2015-01-01

    Multi-modal magnetic resonance imaging (MRI) techniques are widely applied in human brain studies. To obtain specific brain measures of interest from MRI datasets, a number of complex image post-processing steps are typically required. Parallel workflow tools have recently been developed, concatenating individual processing steps and enabling fully automated processing of raw MRI data to obtain the final results. These workflow tools are also designed to make optimal use of available computational resources and to support the parallel processing of different subjects or of independent processing steps for a single subject. Automated, parallel MRI post-processing tools can greatly facilitate relevant brain investigations and are being increasingly applied. In this review, we briefly summarize these parallel workflow tools and discuss relevant issues. PMID:26029043

  6. Cooperative storage of shared files in a parallel computing system with dynamic block size

    DOEpatents

    Bent, John M.; Faibish, Sorin; Grider, Gary

    2015-11-10

    Improved techniques are provided for parallel writing of data to a shared object in a parallel computing system. A method is provided for storing data generated by a plurality of parallel processes to a shared object in a parallel computing system. The method is performed by at least one of the processes and comprises: dynamically determining a block size for storing the data; exchanging a determined amount of the data with at least one additional process to achieve a block of the data having the dynamically determined block size; and writing the block of the data having the dynamically determined block size to a file system. The determined block size comprises, e.g., a total amount of the data to be stored divided by the number of parallel processes. The file system comprises, for example, a log structured virtual parallel file system, such as a Parallel Log-Structured File System (PLFS).
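    The block-size rule stated above (total amount of data divided by the number of parallel processes) can be illustrated with a minimal sketch. The abstract does not say how processes choose exchange partners or how a remainder is handled, so the function name plan_exchange, the greedy pairing of surplus and deficit ranks, and the integer division are all assumptions made purely for illustration; no actual file system or PLFS call is involved.

```python
def plan_exchange(sizes):
    """Return (block_size, transfers) for a list of per-process data sizes.

    block_size follows the rule quoted in the abstract: total data divided
    by the number of processes. The transfer plan is a hypothetical greedy
    pairing, not the patented method.
    """
    n = len(sizes)
    block_size = sum(sizes) // n  # assumption: any remainder stays where it is

    # Ranks holding more than one block give data; ranks holding less receive.
    surplus = [[rank, s - block_size] for rank, s in enumerate(sizes) if s > block_size]
    deficit = [[rank, block_size - s] for rank, s in enumerate(sizes) if s < block_size]

    transfers = []  # (sender_rank, receiver_rank, amount)
    i = j = 0
    while i < len(surplus) and j < len(deficit):
        amount = min(surplus[i][1], deficit[j][1])
        transfers.append((surplus[i][0], deficit[j][0], amount))
        surplus[i][1] -= amount
        deficit[j][1] -= amount
        if surplus[i][1] == 0:
            i += 1
        if deficit[j][1] == 0:
            j += 1
    return block_size, transfers


if __name__ == "__main__":
    # Three processes holding 10, 2 and 6 units: each should end up with 6.
    print(plan_exchange([10, 2, 6]))
```

    In this sketch every process ends with at least one full block before writing, which is the property the abstract describes; a real implementation would carry out the transfers with message passing rather than compute them centrally.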

  7. Parallel Processing of Big Point Clouds Using Z-Order Partitioning

    NASA Astrophysics Data System (ADS)

    Alis, C.; Boehm, J.; Liu, K.

    2016-06-01

    As laser scanning technology improves and costs are coming down, the amount of point cloud data being generated can be prohibitively difficult and expensive to process on a single machine. This data explosion is not only limited to point cloud data. Voluminous amounts of high-dimensionality and quickly accumulating data, collectively known as Big Data, such as those generated by social media, Internet of Things devices and commercial transactions, are becoming more prevalent as well. New computing paradigms and frameworks are being developed to efficiently handle the processing of Big Data, many of which utilize a compute cluster composed of several commodity grade machines to process chunks of data in parallel. A central concept in many of these frameworks is data locality. By its nature, Big Data is large enough that the entire dataset would not fit on the memory and hard drives of a single node hence replicating the entire dataset to each worker node is impractical. The data must then be partitioned across worker nodes in a manner that minimises data transfer across the network. This is a challenge for point cloud data because there exist different ways to partition data and they may require data transfer. We propose a partitioning based on Z-order which is a form of locality-sensitive hashing. The Z-order or Morton code is computed by dividing each dimension to form a grid then interleaving the binary representation of each dimension. For example, the Z-order code for the grid square with coordinates (x = 1 = 01₂, y = 3 = 11₂) is 1011₂ = 11. The number of points in each partition is controlled by the number of bits per dimension: the more bits, the fewer the points. The number of bits per dimension also controls the level of detail with more bits yielding finer partitioning. 
    We present this partitioning method by implementing it on Apache Spark and investigating how different parameters affect the accuracy and running time of the k nearest neighbour algorithm for a hemispherical and a triangular wave point cloud.
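    The bit interleaving described in this abstract can be sketched in a few lines. The version below is a two-dimensional illustration only; the function names z_order_code and partition_key, the bounding-box arguments, and the choice to place y bits above x bits (which reproduces the worked example in the abstract) are assumptions, and the Apache Spark implementation the authors mention is not shown.

```python
def z_order_code(x: int, y: int, bits: int) -> int:
    """Interleave the low `bits` bits of the grid indices x and y.

    Bit i of x goes to position 2*i and bit i of y to position 2*i + 1,
    so the result reads y1 x1 y0 x0 for bits = 2.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def partition_key(px, py, lower, upper, bits):
    """Map a point to the Z-order code of the grid cell that contains it.

    `lower` and `upper` are (x, y) corners of an assumed bounding box;
    each dimension is divided into 2**bits cells.
    """
    cells = 1 << bits
    gx = min(cells - 1, int((px - lower[0]) / (upper[0] - lower[0]) * cells))
    gy = min(cells - 1, int((py - lower[1]) / (upper[1] - lower[1]) * cells))
    return z_order_code(gx, gy, bits)


if __name__ == "__main__":
    print(z_order_code(1, 3, bits=2))                               # grid cell (1, 3)
    print(partition_key(0.3, 0.9, (0.0, 0.0), (1.0, 1.0), bits=2))  # same cell
```

    Points whose keys are equal fall in the same grid cell and could be sent to the same worker. Raising bits shrinks the cells, which matches the abstract's remark that more bits per dimension give fewer points per partition.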

  8. Efficient multitasking: parallel versus serial processing of multiple tasks

    PubMed Central

    Fischer, Rico; Plessow, Franziska

    2015-01-01

    In the context of performance optimizations in multitasking, a central debate has unfolded in multitasking research around whether cognitive processes related to different tasks proceed only sequentially (one at a time), or can operate in parallel (simultaneously). This review features a discussion of theoretical considerations and empirical evidence regarding parallel versus serial task processing in multitasking. In addition, we highlight how methodological differences and theoretical conceptions determine the extent to which parallel processing in multitasking can be detected, to guide their employment in future research. Parallel and serial processing of multiple tasks are not mutually exclusive. Therefore, questions focusing exclusively on either task-processing mode are too simplified. We review empirical evidence and demonstrate that shifting between more parallel and more serial task processing critically depends on the conditions under which multiple tasks are performed. We conclude that efficient multitasking is reflected by the ability of individuals to adjust multitasking performance to environmental demands by flexibly shifting between different processing strategies of multiple task-component scheduling. PMID:26441742

  9. Efficient multitasking: parallel versus serial processing of multiple tasks.

    PubMed

    Fischer, Rico; Plessow, Franziska

    2015-01-01

    In the context of performance optimizations in multitasking, a central debate has unfolded in multitasking research around whether cognitive processes related to different tasks proceed only sequentially (one at a time), or can operate in parallel (simultaneously). This review features a discussion of theoretical considerations and empirical evidence regarding parallel versus serial task processing in multitasking. In addition, we highlight how methodological differences and theoretical conceptions determine the extent to which parallel processing in multitasking can be detected, to guide their employment in future research. Parallel and serial processing of multiple tasks are not mutually exclusive. Therefore, questions focusing exclusively on either task-processing mode are too simplified. We review empirical evidence and demonstrate that shifting between more parallel and more serial task processing critically depends on the conditions under which multiple tasks are performed. We conclude that efficient multitasking is reflected by the ability of individuals to adjust multitasking performance to environmental demands by flexibly shifting between different processing strategies of multiple task-component scheduling.

  10. Array-based Hierarchical Mesh Generation in Parallel

    DOE PAGES

    Ray, Navamita; Grindeanu, Iulian; Zhao, Xinglin; ...

    2015-11-03

    In this paper, we describe an array-based hierarchical mesh generation capability through uniform refinement of unstructured meshes for efficient solution of PDEs using finite element methods and multigrid solvers. A multi-degree, multi-dimensional and multi-level framework is designed to generate the nested hierarchies from an initial mesh that can be used for a number of purposes, from multi-level methods to generating large meshes. The capability is developed under the parallel mesh framework “Mesh Oriented dAtaBase” a.k.a. MOAB. We describe the underlying data structures and algorithms to generate such hierarchies and present numerical results for computational efficiency and mesh quality. In conclusion, we also present results to demonstrate the applicability of the developed capability to a multigrid finite-element solver.

  11. PROTO-PLASM: parallel language for adaptive and scalable modelling of biosystems.

    PubMed

    Bajaj, Chandrajit; DiCarlo, Antonio; Paoluzzi, Alberto

    2008-09-13

    This paper discusses the design goals and the first developments of PROTO-PLASM, a novel computational environment to produce libraries of executable, combinable and customizable computer models of natural and synthetic biosystems, aiming to provide a supporting framework for predictive understanding of structure and behaviour through multiscale geometric modelling and multiphysics simulations. Admittedly, the PROTO-PLASM platform is still in its infancy. Its computational framework (language, model library, integrated development environment and parallel engine) intends to provide patient-specific computational modelling and simulation of organs and biosystems, exploiting novel functionalities resulting from the symbolic combination of parametrized models of parts at various scales. PROTO-PLASM may define the model equations, but it is currently focused on the symbolic description of model geometry and on the parallel support of simulations. Conversely, CellML and SBML could be viewed as defining the behavioural functions (the model equations) to be used within a PROTO-PLASM program. Here we exemplify the basic functionalities of PROTO-PLASM, by constructing a schematic heart model. We also discuss multiscale issues with reference to the geometric and physical modelling of neuromuscular junctions.

  12. jInv: A Modular and Scalable Framework for Electromagnetic Inverse Problems

    NASA Astrophysics Data System (ADS)

    Belliveau, P. T.; Haber, E.

    2016-12-01

    Inversion is a key tool in the interpretation of geophysical electromagnetic (EM) data. Three-dimensional (3D) EM inversion is very computationally expensive and practical software for inverting large 3D EM surveys must be able to take advantage of high performance computing (HPC) resources. It has traditionally been difficult to achieve those goals in a high level dynamic programming environment that allows rapid development and testing of new algorithms, which is important in a research setting. With those goals in mind, we have developed jInv, a framework for PDE constrained parameter estimation problems. jInv provides optimization and regularization routines, a framework for user defined forward problems, and interfaces to several direct and iterative solvers for sparse linear systems. The forward modeling framework provides finite volume discretizations of differential operators on rectangular tensor product meshes and tetrahedral unstructured meshes that can be used to easily construct forward modeling and sensitivity routines for forward problems described by partial differential equations. jInv is written in the emerging programming language Julia. Julia is a dynamic language targeted at the computational science community with a focus on high performance and native support for parallel programming. We have developed frequency and time-domain EM forward modeling and sensitivity routines for jInv. We will illustrate its capabilities and performance with two synthetic time-domain EM inversion examples. First, in airborne surveys, which use many sources, we achieve distributed memory parallelism by decoupling the forward and inverse meshes and performing forward modeling for each source on small, locally refined meshes. Secondly, we invert grounded source time-domain data from a gradient array style induced polarization survey using a novel time-stepping technique that allows us to compute data from different time-steps in parallel. 
    These examples both show that it is possible to invert large scale 3D time-domain EM datasets within a modular, extensible framework written in a high-level, easy to use programming language.

  13. CMIP: a software package capable of reconstructing genome-wide regulatory networks using gene expression data.

    PubMed

    Zheng, Guangyong; Xu, Yaochen; Zhang, Xiujun; Liu, Zhi-Ping; Wang, Zhuo; Chen, Luonan; Zhu, Xin-Guang

    2016-12-23

    A gene regulatory network (GRN) represents interactions of genes inside a cell or tissue, in which vertices and edges stand for genes and their regulatory interactions, respectively. Reconstruction of gene regulatory networks, in particular genome-scale networks, is essential for comparative exploration of different species and mechanistic investigation of biological processes. Currently, most network inference methods are computationally intensive; they are usually effective for small-scale tasks (e.g., networks with a few hundred genes) but have difficulty constructing GRNs at genome scale. Here, we present a software package for gene regulatory network reconstruction at a genomic level, in which gene interaction is measured by conditional mutual information using a parallel computing framework (so the package is named CMIP). The package is a greatly improved implementation of our previous PCA-CMI algorithm. In CMIP, we provide not only an automatic threshold determination method but also an effective parallel computing framework for network inference. Performance tests on benchmark datasets show that the accuracy of CMIP is comparable to most current network inference methods. Moreover, running tests on synthetic datasets demonstrate that CMIP can handle large datasets, especially genome-wide datasets, within an acceptable time period. In addition, successful application to a real genomic dataset confirms the practical applicability of the package. This new software package provides a powerful tool for genomic network reconstruction to the biological community. The software can be accessed at http://www.picb.ac.cn/CMIP/ .
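
    To make the conditional mutual information measure concrete, here is a minimal sketch that estimates I(X;Y|Z) from covariance determinants. It assumes the expression values are jointly Gaussian, an assumption of this example rather than something the abstract states, and it is not CMIP's implementation; the helper names are hypothetical.

```python
import math


def covariance(columns):
    """Sample covariance matrix of equally long value lists (one per variable)."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    return [[sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n - 1)
             for cb, mb in zip(columns, means)]
            for ca, ma in zip(columns, means)]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    det = 1.0
    for k in range(len(m)):
        pivot = max(range(k, len(m)), key=lambda r: abs(m[r][k]))
        if m[pivot][k] == 0.0:
            return 0.0
        if pivot != k:
            m[k], m[pivot] = m[pivot], m[k]
            det = -det
        det *= m[k][k]
        for r in range(k + 1, len(m)):
            factor = m[r][k] / m[k][k]
            for c in range(k, len(m)):
                m[r][c] -= factor * m[k][c]
    return det


def gaussian_cmi(x, y, z):
    """I(X;Y|Z) = 0.5 * log(|C_xz| |C_yz| / (|C_z| |C_xyz|)) for Gaussian data."""
    numerator = determinant(covariance([x, z])) * determinant(covariance([y, z]))
    denominator = determinant(covariance([z])) * determinant(covariance([x, y, z]))
    return 0.5 * math.log(numerator / denominator)
```

    A value near zero suggests that the association between genes X and Y is explained by Z, so the X-Y edge would be a candidate for removal. Each gene pair can be scored independently of the others, which is what makes this kind of computation straightforward to spread across workers.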

  14. GRASS GIS: The first Open Source Temporal GIS

    NASA Astrophysics Data System (ADS)

    Gebbert, Sören; Leppelt, Thomas

    2015-04-01

    GRASS GIS is a full featured, general purpose Open Source geographic information system (GIS) with raster, 3D raster and vector processing support[1]. Recently, time was introduced as a new dimension that transformed GRASS GIS into the first Open Source temporal GIS with comprehensive spatio-temporal analysis, processing and visualization capabilities[2]. New spatio-temporal data types were introduced in GRASS GIS version 7, to manage raster, 3D raster and vector time series. These new data types are called space time datasets. They are designed to efficiently handle hundreds of thousands of time stamped raster, 3D raster and vector map layers of any size. Time stamps can be defined as time intervals or time instances in Gregorian calendar time or relative time. Space time datasets are simplifying the processing and analysis of large time series in GRASS GIS, since these new data types are used as input and output parameter in temporal modules. The handling of space time datasets is therefore equal to the handling of raster, 3D raster and vector map layers in GRASS GIS. A new dedicated Python library, the GRASS GIS Temporal Framework, was designed to implement the spatio-temporal data types and their management. The framework provides the functionality to efficiently handle hundreds of thousands of time stamped map layers and their spatio-temporal topological relations. The framework supports reasoning based on the temporal granularity of space time datasets as well as their temporal topology. It was designed in conjunction with the PyGRASS [3] library to support parallel processing of large datasets, that has a long tradition in GRASS GIS [4,5]. We will present a subset of more than 40 temporal modules that were implemented based on the GRASS GIS Temporal Framework, PyGRASS and the GRASS GIS Python scripting library. These modules provide a comprehensive temporal GIS tool set. 
    The functionality ranges from space time dataset and time stamped map layer management over temporal aggregation, temporal accumulation, spatio-temporal statistics, spatio-temporal sampling, temporal algebra, temporal topology analysis, time series animation and temporal topology visualization to time series import and export capabilities with support for NetCDF and VTK data formats. We will present several temporal modules that support parallel processing of raster and 3D raster time series. [1] GRASS GIS Open Source Approaches in Spatial Data Handling In Open Source Approaches in Spatial Data Handling, Vol. 2 (2008), pp. 171-199, doi:10.1007/978-3-540-74831-19 by M. Neteler, D. Beaudette, P. Cavallini, L. Lami, J. Cepicky edited by G. Brent Hall, Michael G. Leahy [2] Gebbert, S., Pebesma, E., 2014. A temporal GIS for field based environmental modeling. Environ. Model. Softw. 53, 1-12. [3] Zambelli, P., Gebbert, S., Ciolli, M., 2013. Pygrass: An Object Oriented Python Application Programming Interface (API) for Geographic Resources Analysis Support System (GRASS) Geographic Information System (GIS). ISPRS Intl Journal of Geo-Information 2, 201-219. [4] Löwe, P., Klump, J., Thaler, J. (2012): The FOSS GIS Workbench on the GFZ Load Sharing Facility compute cluster, (Geophysical Research Abstracts Vol. 14, EGU2012-4491, 2012), General Assembly European Geosciences Union (Vienna, Austria 2012). [5] Akhter, S., Aida, K., Chemin, Y., 2010. "GRASS GIS on High Performance Computing with MPI, OpenMP and Ninf-G Programming Framework". ISPRS Conference, Kyoto, 9-12 August 2010

  15. plasmaFoam: An OpenFOAM framework for computational plasma physics and chemistry

    NASA Astrophysics Data System (ADS)

    Venkattraman, Ayyaswamy; Verma, Abhishek Kumar

    2016-09-01

    As emphasized in the 2012 Roadmap for low temperature plasmas (LTP), scientific computing has emerged as an essential tool for the investigation and prediction of the fundamental physical and chemical processes associated with these systems. While several in-house and commercial codes exist, with each having its own advantages and disadvantages, a common framework that can be developed by researchers from all over the world will likely accelerate the impact of computational studies on advances in low-temperature plasma physics and chemistry. In this regard, we present a finite volume computational toolbox to perform high-fidelity simulations of LTP systems. This framework, primarily based on the OpenFOAM solver suite, allows us to enhance our understanding of multiscale plasma phenomenon by performing massively parallel, three-dimensional simulations on unstructured meshes using well-established high performance computing tools that are widely used in the computational fluid dynamics community. In this talk, we will present preliminary results obtained using the OpenFOAM-based solver suite with benchmark three-dimensional simulations of microplasma devices including both dielectric and plasma regions. We will also discuss the future outlook for the solver suite.

  16. Tiered Approach to Resilience Assessment.

    PubMed

    Linkov, Igor; Fox-Lent, Cate; Read, Laura; Allen, Craig R; Arnott, James C; Bellini, Emanuele; Coaffee, Jon; Florin, Marie-Valentine; Hatfield, Kirk; Hyde, Iain; Hynes, William; Jovanovic, Aleksandar; Kasperson, Roger; Katzenberger, John; Keys, Patrick W; Lambert, James H; Moss, Richard; Murdoch, Peter S; Palma-Oliveira, Jose; Pulwarty, Roger S; Sands, Dale; Thomas, Edward A; Tye, Mari R; Woods, David

    2018-04-25

    Regulatory agencies have long adopted a three-tier framework for risk assessment. We build on this structure to propose a tiered approach for resilience assessment that can be integrated into the existing regulatory processes. Comprehensive approaches to assessing resilience at appropriate and operational scales, reconciling analytical complexity as needed with stakeholder needs and resources available, and ultimately creating actionable recommendations to enhance resilience are still lacking. Our proposed framework consists of tiers by which analysts can select resilience assessment and decision support tools to inform associated management actions relative to the scope and urgency of the risk and the capacity of resource managers to improve system resilience. The resilience management framework proposed is not intended to supplant either risk management or the many existing efforts of resilience quantification method development, but instead provide a guide to selecting tools that are appropriate for the given analytic need. The goal of this tiered approach is to intentionally parallel the tiered approach used in regulatory contexts so that resilience assessment might be more easily and quickly integrated into existing structures and with existing policies. Published 2018. This article is a U.S. government work and is in the public domain in the USA.

  17. Enabling parallel simulation of large-scale HPC network systems

    DOE PAGES

    Mubarak, Misbah; Carothers, Christopher D.; Ross, Robert B.; ...

    2016-04-07

    Here, with the increasing complexity of today’s high-performance computing (HPC) architectures, simulation has become an indispensable tool for exploring the design space of HPC systems, in particular networks. In order to make effective design decisions, simulations of these systems must possess the following properties: (1) have high accuracy and fidelity, (2) produce results in a timely manner, and (3) be able to analyze a broad range of network workloads. Most state-of-the-art HPC network simulation frameworks, however, are constrained in one or more of these areas. In this work, we present a simulation framework for modeling two important classes of networks used in today’s IBM and Cray supercomputers: torus and dragonfly networks. We use the Co-Design of Multi-layer Exascale Storage Architecture (CODES) simulation framework to simulate these network topologies at a flit-level detail using the Rensselaer Optimistic Simulation System (ROSS) for parallel discrete-event simulation. Our simulation framework meets all the requirements of a practical network simulation and can assist network designers in design space exploration. First, it uses validated and detailed flit-level network models to provide an accurate and high-fidelity network simulation. Second, instead of relying on serial time-stepped or traditional conservative discrete-event simulations that limit simulation scalability and efficiency, we use the optimistic event-scheduling capability of ROSS to achieve efficient and scalable HPC network simulations on today’s high-performance cluster systems. Third, our models give network designers a choice in simulating a broad range of network workloads, including HPC application workloads using detailed network traces, an ability that is rarely offered in parallel with high-fidelity network simulations.
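
    For readers unfamiliar with discrete-event simulation, the following sketch shows the basic mechanism on a ring (a one-dimensional torus): events are kept in a priority queue ordered by timestamp and processed one at a time. It is a serial toy, not the optimistic parallel scheduling of ROSS and not the CODES API; `simulate_ring`, the fixed `hop_latency`, and the absence of link contention are all simplifying assumptions of this example.

```python
import heapq


def simulate_ring(num_nodes, messages, hop_latency=1.0):
    """Route messages around a ring, one hop per event.

    messages: list of (inject_time, source_node, destination_node)
    Returns {message_index: arrival_time}. Contention is not modeled.
    """
    events = []  # heap of (time, message_index, current_node, destination)
    for msg_id, (inject_time, src, dst) in enumerate(messages):
        heapq.heappush(events, (inject_time, msg_id, src, dst))

    arrivals = {}
    while events:
        # Always process the event with the smallest timestamp next.
        time, msg_id, node, dst = heapq.heappop(events)
        if node == dst:
            arrivals[msg_id] = time
            continue
        # Take the shorter direction around the ring (ties go forward).
        forward = (dst - node) % num_nodes
        step = 1 if forward <= num_nodes - forward else -1
        next_node = (node + step) % num_nodes
        heapq.heappush(events, (time + hop_latency, msg_id, next_node, dst))
    return arrivals
```

    An optimistic parallel simulator differs in that it lets separate processes run ahead on their own event queues and rolls back when an event arrives out of timestamp order; the serial loop above never needs to roll back.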

  18. Enabling parallel simulation of large-scale HPC network systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mubarak, Misbah; Carothers, Christopher D.; Ross, Robert B.

    Here, with the increasing complexity of today’s high-performance computing (HPC) architectures, simulation has become an indispensable tool for exploring the design space of HPC systems, in particular networks. In order to make effective design decisions, simulations of these systems must possess the following properties: (1) have high accuracy and fidelity, (2) produce results in a timely manner, and (3) be able to analyze a broad range of network workloads. Most state-of-the-art HPC network simulation frameworks, however, are constrained in one or more of these areas. In this work, we present a simulation framework for modeling two important classes of networks used in today’s IBM and Cray supercomputers: torus and dragonfly networks. We use the Co-Design of Multi-layer Exascale Storage Architecture (CODES) simulation framework to simulate these network topologies at a flit-level detail using the Rensselaer Optimistic Simulation System (ROSS) for parallel discrete-event simulation. Our simulation framework meets all the requirements of a practical network simulation and can assist network designers in design space exploration. First, it uses validated and detailed flit-level network models to provide an accurate and high-fidelity network simulation. Second, instead of relying on serial time-stepped or traditional conservative discrete-event simulations that limit simulation scalability and efficiency, we use the optimistic event-scheduling capability of ROSS to achieve efficient and scalable HPC network simulations on today’s high-performance cluster systems. Third, our models give network designers a choice in simulating a broad range of network workloads, including HPC application workloads using detailed network traces, an ability that is rarely offered in parallel with high-fidelity network simulations.

  19. A parallel method of atmospheric correction for multispectral high spatial resolution remote sensing images

    NASA Astrophysics Data System (ADS)

    Zhao, Shaoshuai; Ni, Chen; Cao, Jing; Li, Zhengqiang; Chen, Xingfeng; Ma, Yan; Yang, Leiku; Hou, Weizhen; Qie, Lili; Ge, Bangyu; Liu, Li; Xing, Jin

    2018-03-01

    Remote sensing images are usually contaminated by atmospheric components, especially aerosol particles. For quantitative remote sensing applications, atmospheric correction based on a radiative transfer model is used to retrieve the surface reflectance by decoupling the atmosphere from the surface, at the cost of a long computation time. Parallel computing is one way to accelerate it. A parallel strategy in which multiple CPUs work simultaneously is designed to perform atmospheric correction of a multispectral remote sensing image. The flow of the parallel framework and the main parallel body of the atmospheric correction are described. Then, a multispectral remote sensing image from the Chinese Gaofen-2 satellite is used to test the acceleration efficiency. As the number of CPUs increases from 1 to 8, the computational speed also increases. The largest speedup is 6.5. In the 8-CPU working mode, atmospheric correction of the whole image takes 4 minutes.
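
    The reported figures can be put in context with two standard quantities: parallel efficiency (speedup divided by worker count) and the serial fraction implied by Amdahl's law. The sketch below assumes the 6.5 speedup was measured on 8 CPUs; applying Amdahl's model is an assumption of this example, not an analysis made in the abstract.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / workers


def amdahl_serial_fraction(speedup, workers):
    """Serial fraction s solving speedup = 1 / (s + (1 - s) / workers)."""
    return (workers / speedup - 1.0) / (workers - 1.0)


efficiency = parallel_efficiency(6.5, 8)          # 0.8125
serial_fraction = amdahl_serial_fraction(6.5, 8)  # 3/91, about 0.033
```

    Under that model, roughly 3% of the work would be inherently serial, which caps the attainable speedup near 30 regardless of how many CPUs are added.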

  20. Massively parallel sparse matrix function calculations with NTPoly

    NASA Astrophysics Data System (ADS)

    Dawson, William; Nakajima, Takahito

    2018-04-01

    We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well-developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization-free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication-avoiding sparse matrix multiplication algorithm. OpenMP task parallelization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.
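
    The idea of evaluating a matrix function through a polynomial expansion can be sketched in a few lines: represent the sparse matrix as a dictionary of nonzeros, multiply with thresholding so small entries are dropped, and apply Horner's rule. This is a serial illustration only and does not reflect NTPoly's interface or its distributed multiplication; `sparse_matmul`, `matrix_polynomial`, and the dictionary storage are assumptions of this example.

```python
def sparse_matmul(a, b, threshold=0.0):
    """Multiply sparse matrices stored as {(row, col): value} dictionaries."""
    rows_of_b = {}
    for (k, j), value in b.items():
        rows_of_b.setdefault(k, []).append((j, value))
    product = {}
    for (i, k), value in a.items():
        for j, other in rows_of_b.get(k, []):
            product[(i, j)] = product.get((i, j), 0.0) + value * other
    # Dropping small entries keeps the result sparse.
    return {ij: v for ij, v in product.items() if abs(v) > threshold}


def matrix_polynomial(a, coefficients, size, threshold=1e-12):
    """Evaluate sum_k coefficients[k] * A**k with Horner's rule."""
    result = {(i, i): coefficients[-1] for i in range(size)}
    for c in reversed(coefficients[:-1]):
        result = sparse_matmul(result, a, threshold)
        for i in range(size):
            result[(i, i)] = result.get((i, i), 0.0) + c
    return result
```

    With Taylor coefficients `1 / k!` this approximates the matrix exponential; other coefficient sets give other functions. The cost stays proportional to the number of retained nonzeros, which is the source of the linear scaling the abstract refers to.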

  1. MaMR: High-performance MapReduce programming model for material cloud applications

    NASA Astrophysics Data System (ADS)

    Jing, Weipeng; Tong, Danyu; Wang, Yangang; Wang, Jingyuan; Liu, Yaqiu; Zhao, Peng

    2017-02-01

    With the increasing data size in materials science, existing programming models no longer satisfy the application requirements. MapReduce is a programming model that enables the easy development of scalable parallel applications to process big data on cloud computing systems. However, this model does not directly support the processing of multiple related data, and the processing performance does not reflect the advantages of cloud computing. To enhance the capability of workflow applications in material data processing, we defined a programming model for material cloud applications, called MaMR, that supports multiple different Map and Reduce functions running concurrently based on a hybrid shared-memory BSP model. An optimized data sharing strategy to supply the shared data to the different Map and Reduce stages was also designed. We added a new merge phase to MapReduce that can efficiently merge data from the map and reduce modules. Experiments showed that the model and framework present effective performance improvements compared to previous work.
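
    A small sketch may help picture several Map and Reduce functions running concurrently over related datasets, followed by a merge phase. It is not MaMR's programming interface: the function names, the thread pool, and the choice to merge by joining on shared keys are assumptions of this example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def emit_pair(record):
    """Hypothetical map function: pass a (key, value) record through unchanged."""
    return [record]


def map_reduce(records, map_fn, reduce_fn):
    """Run one map stage, group by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(values) for key, values in groups.items()}


def run_concurrently(jobs):
    """jobs: list of (records, map_fn, reduce_fn), each run in its own thread."""
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [pool.submit(map_reduce, *job) for job in jobs]
        return [future.result() for future in futures]


def merge(left, right):
    """Merge phase: join two reduced outputs on the keys they share."""
    return {key: (left[key], right[key]) for key in left.keys() & right.keys()}
```

    For example, one job could reduce hardness measurements per material with `max` while another reduces density measurements with `min`; `merge` then yields one combined record per material present in both datasets.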

  2. 3-D modeling of ductile tearing using finite elements: Computational aspects and techniques

    NASA Astrophysics Data System (ADS)

    Gullerud, Arne Stewart

    This research focuses on the development and application of computational tools to perform large-scale, 3-D modeling of ductile tearing in engineering components under quasi-static to mild loading rates. Two standard models for ductile tearing---the computational cell methodology and crack growth controlled by the crack tip opening angle (CTOA)---are described and their 3-D implementations are explored. For the computational cell methodology, quantification of the effects of several numerical issues---computational load step size, procedures for force release after cell deletion, and the porosity for cell deletion---enables construction of computational algorithms to remove the dependence of predicted crack growth on these issues. This work also describes two extensions of the CTOA approach into 3-D: a general 3-D method and a constant front technique. Analyses compare the characteristics of the extensions, and a validation study explores the ability of the constant front extension to predict crack growth in thin aluminum test specimens over a range of specimen geometries, absolute sizes, and levels of out-of-plane constraint. To provide a computational framework suitable for the solution of these problems, this work also describes the parallel implementation of a nonlinear, implicit finite element code. The implementation employs an explicit message-passing approach using the MPI standard to maintain portability, a domain decomposition of element data to provide parallel execution, and a master-worker organization of the computational processes to enhance future extensibility. A linear preconditioned conjugate gradient (LPCG) solver serves as the core of the solution process.
    The parallel LPCG solver utilizes an element-by-element (EBE) structure of the computations to permit a dual-level decomposition of the element data: domain decomposition of the mesh provides efficient coarse-grain parallel execution, while decomposition of the domains into blocks of similar elements (same type, constitutive model, etc.) provides fine-grain parallel computation on each processor. A major focus of the LPCG solver is a new implementation of the Hughes-Winget element-by-element (HW) preconditioner. The implementation employs a weighted dependency graph combined with a new coloring algorithm to provide load-balanced scheduling for the preconditioner and overlapped communication/computation. This approach enables efficient parallel application of the HW preconditioner for arbitrary unstructured meshes.
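
    The role of coloring in element-by-element computations can be shown with a plain greedy scheme: two elements that share a node must not be processed at the same time, so they receive different colors, and all elements of one color can run in parallel. This is a textbook greedy coloring used as a stand-in; it is not the weighted dependency graph or the new coloring algorithm described in the abstract, and `color_elements` is a hypothetical name.

```python
def color_elements(elements):
    """Assign each element the smallest color unused by its node-sharing neighbors.

    elements: list of tuples of node indices, one tuple per element.
    Returns a list of integer colors, one per element.
    """
    elements_at_node = {}
    for index, nodes in enumerate(elements):
        for node in nodes:
            elements_at_node.setdefault(node, []).append(index)

    colors = [None] * len(elements)
    for index, nodes in enumerate(elements):
        # Colors already taken by elements touching any of this element's nodes.
        used = {colors[other]
                for node in nodes
                for other in elements_at_node[node]
                if colors[other] is not None}
        color = 0
        while color in used:
            color += 1
        colors[index] = color
    return colors
```

    For a chain of four line elements `[(0, 1), (1, 2), (2, 3), (3, 4)]` the result alternates between two colors, so the preconditioner could be applied in two parallel sweeps.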

  3. Scout: high-performance heterogeneous computing made simple

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jablin, James; Mc Cormick, Patrick; Herlihy, Maurice

    2011-01-26

    Researchers must often write their own simulation and analysis software. During this process they simultaneously confront both computational and scientific problems. Current strategies for aiding the generation of performance-oriented programs do not abstract the software development from the science. Furthermore, the problem is becoming increasingly complex and pressing with the continued development of many-core and heterogeneous (CPU-GPU) architectures. To achieve high performance, scientists must expertly navigate both software and hardware. Co-design between computer scientists and research scientists can alleviate but not solve this problem. The science community requires better tools for developing, optimizing, and future-proofing codes, allowing scientists to focus on their research while still achieving high computational performance. Scout is a parallel programming language and extensible compiler framework targeting heterogeneous architectures. It provides the abstraction required to buffer scientists from the constantly-shifting details of hardware while still realizing high performance by encapsulating software and hardware optimization within a compiler framework.

  4. Configurable analog-digital conversion using the neural engineering framework

    PubMed Central

    Mayr, Christian G.; Partzsch, Johannes; Noack, Marko; Schüffny, Rene

    2014-01-01

    Efficient Analog-Digital Converters (ADC) are one of the mainstays of mixed-signal integrated circuit design. Besides the conventional ADCs used in mainstream ICs, there have been various attempts in the past to utilize neuromorphic networks to accomplish an efficient crossing between analog and digital domains, i.e., to build neurally inspired ADCs. Generally, these have suffered from the same problems as conventional ADCs, that is they require high-precision, handcrafted analog circuits and are thus not technology portable. In this paper, we present an ADC based on the Neural Engineering Framework (NEF). It carries out a large fraction of the overall ADC process in the digital domain, i.e., it is easily portable across technologies. The analog-digital conversion takes full advantage of the high degree of parallelism inherent in neuromorphic networks, making for a very scalable ADC. In addition, it has a number of features not commonly found in conventional ADCs, such as a runtime reconfigurability of the ADC sampling rate, resolution and transfer characteristic. PMID:25100933

  5. Development and Application of the Collaborative Optimization Architecture in a Multidisciplinary Design Environment

    NASA Technical Reports Server (NTRS)

    Braun, R. D.; Kroo, I. M.

    1995-01-01

    Collaborative optimization is a design architecture applicable in any multidisciplinary analysis environment but specifically intended for large-scale distributed analysis applications. In this approach, a complex problem is hierarchically decomposed along disciplinary boundaries into a number of subproblems which are brought into multidisciplinary agreement by a system-level coordination process. When applied to problems in a multidisciplinary design environment, this scheme has several advantages over traditional solution strategies. These advantages include a reduction in the amount of information transferred between disciplines, the removal of large iteration loops, the ability to use different subspace optimizers among the various analysis groups, an analysis framework which is easily parallelized and can operate on heterogeneous equipment, and a structural framework that is well-suited for conventional disciplinary organizations. In this article, the collaborative architecture is developed and its mathematical foundation is presented. An example application is also presented which highlights the potential of this method for use in large-scale design applications.

  6. Computation of free energy profiles with parallel adaptive dynamics

    NASA Astrophysics Data System (ADS)

    Lelièvre, Tony; Rousset, Mathias; Stoltz, Gabriel

    2007-04-01

    We propose a formulation of an adaptive computation of free energy differences, in the adaptive biasing force or nonequilibrium metadynamics spirit, using conditional distributions of samples of configurations which evolve in time. This allows us to present a truly unifying framework for these methods, and to prove convergence results for certain classes of algorithms. From a numerical viewpoint, a parallel implementation of these methods is very natural, the replicas interacting through the reconstructed free energy. We demonstrate how to improve this parallel implementation by resorting to some selection mechanism on the replicas. This is illustrated by computations on a model system of conformational changes.

  7. A high-speed linear algebra library with automatic parallelism

    NASA Technical Reports Server (NTRS)

    Boucher, Michael L.

    1994-01-01

    Parallel or distributed processing is key to getting the highest performance from workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely limited even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.

  8. Development of IR imaging system simulator

    NASA Astrophysics Data System (ADS)

    Xiang, Xinglang; He, Guojing; Dong, Weike; Dong, Lu

    2017-02-01

    To overcome the disadvantages of traditional semi-physical simulation and injection simulation equipment in the performance evaluation of the infrared imaging system (IRIS), a low-cost and reconfigurable IRIS simulator, which can simulate the realistic physical process of infrared imaging, is proposed to test and evaluate the performance of the IRIS. According to the theoretical simulation framework and the theoretical models of the IRIS, the architecture of the IRIS simulator is constructed. The 3D scenes are generated and the infrared atmospheric transmission effects are simulated in real time on the computer using OGRE technology. The physical effects of the IRIS are classified as the signal response characteristic, modulation transfer characteristic and noise characteristic, and they are simulated in real time on a single-board signal processing platform based on an FPGA core processor using a high-speed parallel computation method.

  9. National Combustion Code: A Multidisciplinary Combustor Design System

    NASA Technical Reports Server (NTRS)

    Stubbs, Robert M.; Liu, Nan-Suey

    1997-01-01

    The Internal Fluid Mechanics Division conducts both basic research and technology, and system technology research for aerospace propulsion systems components. The research within the division, which is both computational and experimental, is aimed at improving fundamental understanding of flow physics in inlets, ducts, nozzles, turbomachinery, and combustors. This article and the following three articles highlight some of the work accomplished in 1996. A multidisciplinary combustor design system is critical for optimizing the combustor design process. Such a system should include sophisticated computer-aided design (CAD) tools for geometry creation, advanced mesh generators for creating solid model representations, a common framework for fluid flow and structural analyses, modern postprocessing tools, and parallel processing. The goal of the present effort is to develop some of the enabling technologies and to demonstrate their overall performance in an integrated system called the National Combustion Code.

  10. Neural Parallel Engine: A toolbox for massively parallel neural signal processing.

    PubMed

    Tam, Wing-Kin; Yang, Zhi

    2018-05-01

    Large-scale neural recordings provide detailed information on neuronal activities and can help elicit the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts depending on the algorithms. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing only focus on a few rudimentary algorithms, are not well-optimized and often do not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings up to thousands of channels. Copyright © 2018 Elsevier B.V. All rights reserved.
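
    Peak detection through a compact operation typically follows a flag, scan, scatter pattern: each sample is flagged independently, a prefix sum assigns every flagged sample an output slot, and the flagged indices are written to those slots. The sketch below runs the three steps serially in Python to show the data flow; it is not NPE's GPU code, and the peak criterion and the name `detect_peaks` are assumptions of this example.

```python
from itertools import accumulate


def detect_peaks(signal, threshold):
    """Return indices of local maxima above threshold via flag, scan, scatter."""
    n = len(signal)
    # Step 1 (independent per sample): flag local maxima above the threshold.
    # On a plateau only the first sample is flagged.
    flags = [1 if (0 < i < n - 1
                   and signal[i] > threshold
                   and signal[i] > signal[i - 1]
                   and signal[i] >= signal[i + 1]) else 0
             for i in range(n)]
    # Step 2: an exclusive prefix sum gives each flagged sample its output slot.
    inclusive = list(accumulate(flags))
    slots = [total - flag for total, flag in zip(inclusive, flags)]
    # Step 3 (independent per sample): scatter flagged indices into a dense array.
    peaks = [0] * (inclusive[-1] if inclusive else 0)
    for i in range(n):
        if flags[i]:
            peaks[slots[i]] = i
    return peaks
```

    Steps 1 and 3 have no dependencies between samples, and the prefix sum in step 2 has well-known parallel formulations, which is why the pattern maps well onto GPUs.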

  11. Anatomically constrained neural network models for the categorization of facial expression

    NASA Astrophysics Data System (ADS)

    McMenamin, Brenton W.; Assadi, Amir H.

    2004-12-01

    In humans, the recognition of facial expressions is performed by the amygdala, which uses parallel processing streams to identify expressions quickly and accurately. Additionally, a feedback mechanism may play a role in this process. Implementing a model with a similar parallel structure and feedback mechanisms could improve current facial recognition algorithms, for which varied expressions are a source of error. An anatomically constrained artificial neural-network model was created that uses this parallel processing architecture and feedback to categorize facial expressions. The presence of a feedback mechanism was not found to significantly improve performance for models with parallel architecture. However, the use of parallel processing streams significantly improved accuracy over a similar network that did not have parallel architecture. Further investigation is necessary to determine the benefits of using parallel streams and feedback mechanisms in more advanced object recognition tasks.

  12. Anatomically constrained neural network models for the categorization of facial expression

    NASA Astrophysics Data System (ADS)

    McMenamin, Brenton W.; Assadi, Amir H.

    2005-01-01

    In humans, the recognition of facial expressions is performed by the amygdala, which uses parallel processing streams to identify expressions quickly and accurately. Additionally, a feedback mechanism may play a role in this process. Implementing a model with a similar parallel structure and feedback mechanisms could improve current facial recognition algorithms, for which varied expressions are a source of error. An anatomically constrained artificial neural-network model was created that uses this parallel processing architecture and feedback to categorize facial expressions. The presence of a feedback mechanism was not found to significantly improve performance for models with parallel architecture. However, the use of parallel processing streams significantly improved accuracy over a similar network that did not have parallel architecture. Further investigation is necessary to determine the benefits of using parallel streams and feedback mechanisms in more advanced object recognition tasks.

  13. Parallel processing data network of master and slave transputers controlled by a serial control network

    DOEpatents

    Crosetto, D.B.

    1996-12-31

    The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor to a plurality of slave processors to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer, a digital signal processor, a parallel transfer controller, and two three-port memory devices. A communication switch within each node connects it to a fast parallel hardware channel through which all high density data arrives or leaves the node. 6 figs.

  14. Parallel processing data network of master and slave transputers controlled by a serial control network

    DOEpatents

    Crosetto, Dario B.

    1996-01-01

    The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node (100) connects it to a fast parallel hardware channel (70) through which all high density data arrives or leaves the node.

  15. Super and parallel computers and their impact on civil engineering

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kamat, M.P.

    1986-01-01

    This book presents the papers given at a conference on the use of supercomputers in civil engineering. Topics considered at the conference included solving nonlinear equations on a hypercube, a custom architectured parallel processing system, distributed data processing, algorithms, computer architecture, parallel processing, vector processing, computerized simulation, and cost benefit analysis.

  16. Parallel processing architecture for computing inverse differential kinematic equations of the PUMA arm

    NASA Technical Reports Server (NTRS)

    Hsia, T. C.; Lu, G. Z.; Han, W. H.

    1987-01-01

    In advanced robot control problems, on-line computation of the inverse Jacobian solution is frequently required. A parallel processing architecture is an effective way to reduce computation time. A parallel processing architecture is developed for the inverse Jacobian (inverse differential kinematic equation) of the PUMA arm. The proposed pipeline/parallel algorithm can be implemented on an IC chip using systolic linear arrays. This implementation requires 27 processing cells and 25 time units. Computation time is thus significantly reduced.

  17. Performance evaluation of canny edge detection on a tiled multicore architecture

    NASA Astrophysics Data System (ADS)

    Brethorst, Andrew Z.; Desai, Nehal; Enright, Douglas P.; Scrofano, Ronald

    2011-01-01

    In the last few years, a variety of multicore architectures have been used to parallelize image processing applications. In this paper, we focus on assessing the parallel speed-ups of different Canny edge detection parallelization strategies on the Tile64, a tiled multicore architecture developed by the Tilera Corporation. Included in these strategies are different ways Canny edge detection can be parallelized, as well as differences in data management. The two parallelization strategies examined were loop-level parallelism and domain decomposition. Loop-level parallelism is achieved through the use of OpenMP, and it is capable of parallelization across the range of values over which a loop iterates. Domain decomposition is the process of breaking down an image into subimages, where each subimage is processed independently, in parallel. The results of the two strategies show that, for the same number of threads, programmer-implemented domain decomposition exhibits higher speed-ups than the compiler-managed loop-level parallelism implemented with OpenMP.
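
    Domain decomposition as described above can be sketched in a few lines. The example below is an illustration only and is not taken from the paper: it splits an image into horizontal strips and computes a simple central-difference gradient (one ingredient of Canny edge detection) on each strip in a thread pool. The function names, the strip layout, and the gradient formula are assumptions. Every strip reads its neighbouring rows from the shared input image, so no explicit halo exchange is needed in this shared-memory setting. In CPython the global interpreter lock prevents real speed-up for pure-Python loops, so the sketch shows the structure of the decomposition rather than its performance.

```python
from concurrent.futures import ThreadPoolExecutor


def gradient_rows(image, row_start, row_end):
    """Gradient magnitude (|gx| + |gy|) for rows [row_start, row_end)."""
    height, width = len(image), len(image[0])
    out = []
    for r in range(row_start, row_end):
        # Clamp neighbour indices at the image border.
        up, down = max(r - 1, 0), min(r + 1, height - 1)
        row = []
        for c in range(width):
            left, right = max(c - 1, 0), min(c + 1, width - 1)
            gx = image[r][right] - image[r][left]
            gy = image[down][c] - image[up][c]
            row.append(abs(gx) + abs(gy))
        out.append(row)
    return out


def gradient_serial(image):
    """Reference result: process the whole image as a single domain."""
    return gradient_rows(image, 0, len(image))


def gradient_decomposed(image, n_strips=4):
    """Split the image into horizontal strips and process them concurrently."""
    height = len(image)
    n_strips = max(1, min(n_strips, height))
    bounds = [
        (i * height // n_strips, (i + 1) * height // n_strips)
        for i in range(n_strips)
    ]
    with ThreadPoolExecutor(max_workers=n_strips) as pool:
        # pool.map returns results in submission order, so the strips can be
        # concatenated directly to rebuild the full output image.
        strips = pool.map(lambda b: gradient_rows(image, *b), bounds)
        return [row for strip in strips for row in strip]
```

    Loop-level parallelism would instead leave the code as a single loop over rows and let the runtime distribute iterations, which is what the OpenMP variant in the abstract does.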

  18. From experiment to design -- Fault characterization and detection in parallel computer systems using computational accelerators

    NASA Astrophysics Data System (ADS)

    Yim, Keun Soo

    This dissertation summarizes experimental validation and co-design studies conducted to optimize the fault detection capabilities and overheads in hybrid computer systems (e.g., using CPUs and Graphics Processing Units, or GPUs), and consequently to improve the scalability of parallel computer systems using computational accelerators. The experimental validation studies were conducted to help us understand the failure characteristics of CPU-GPU hybrid computer systems under various types of hardware faults. The main characterization targets were faults that are difficult to detect and/or recover from, e.g., faults that cause long latency failures (Ch. 3), faults in dynamically allocated resources (Ch. 4), faults in GPUs (Ch. 5), faults in MPI programs (Ch. 6), and microarchitecture-level faults with specific timing features (Ch. 7). The co-design studies were based on the characterization results. One of the co-designed systems has a set of source-to-source translators that customize and strategically place error detectors in the source code of target GPU programs (Ch. 5). Another co-designed system uses an extension card to learn the normal behavioral and semantic execution patterns of message-passing processes executing on CPUs, and to detect abnormal behaviors of those parallel processes (Ch. 6). The third co-designed system is a co-processor that has a set of new instructions in order to support software-implemented fault detection techniques (Ch. 7). The work described in this dissertation gains more importance because heterogeneous processors have become an essential component of state-of-the-art supercomputers. GPUs were used in three of the five fastest supercomputers that were operating in 2011. Our work included comprehensive fault characterization studies in CPU-GPU hybrid computers. 
    In CPUs, we monitored the target systems for a long period of time after injecting faults (a temporally comprehensive experiment), and injected faults into various types of program states that included dynamically allocated memory (to be spatially comprehensive). In GPUs, we used fault injection studies to demonstrate the importance of detecting silent data corruption (SDC) errors that are mainly due to the lack of fine-grained protections and the massive use of fault-insensitive data. This dissertation also presents transparent fault tolerance frameworks and techniques that are directly applicable to hybrid computers built using only commercial off-the-shelf hardware components. This dissertation shows that by developing understanding of the failure characteristics and error propagation paths of target programs, we were able to create fault tolerance frameworks and techniques that can quickly detect and recover from hardware faults with low performance and hardware overheads.

  19. Use of parallel computing in mass processing of laser data

    NASA Astrophysics Data System (ADS)

    Będkowski, J.; Bratuś, R.; Prochaska, M.; Rzonca, A.

    2015-12-01

    The first part of the paper includes a description of the rules used to generate the algorithm needed for the purpose of parallel computing and also discusses the origins of the idea of research on the use of graphics processors in large scale processing of laser scanning data. The next part of the paper includes the results of an efficiency assessment performed for an array of different processing options, all of which were substantially accelerated with parallel computing. The processing options were divided into the generation of orthophotos using point clouds, coloring of point clouds, transformations, and the generation of a regular grid, as well as advanced processes such as the detection of planes and edges, point cloud classification, and the analysis of data for the purpose of quality control. Most algorithms had to be formulated from scratch in the context of the requirements of parallel computing. A few of the algorithms were based on existing technology developed by the Dephos Software Company and then adapted to parallel computing in the course of this research study. Processing time was determined for each process employed for a typical quantity of data processed, which helped confirm the high efficiency of the solutions proposed and the applicability of parallel computing to the processing of laser scanning data. The high efficiency of parallel computing yields new opportunities in the creation and organization of processing methods for laser scanning data.

  20. A review of cognitive therapy in acute medical settings. Part I: therapy model and assessment.

    PubMed

    Levin, Tomer T; White, Craig A; Kissane, David W

    2013-04-01

    Although cognitive therapy (CT) has established outpatient utility, there is no integrative framework for using CT in acute medical settings where most psychosomatic medicine (P-M) clinicians practice. Biopsychosocial complexity challenges P-M clinicians who want to use CT as the a priori psychotherapeutic modality. For example, how should clinicians modify the data gathering and formulation process to support CT in acute settings? Narrative review methodology is used to describe the framework for a CT informed interview, formulation, and assessment in acute medical settings. Because this review is aimed largely at P-M trainees and educators, exemplary dialogues model the approach (specific CT strategies for common P-M scenarios appear in the companion article.) Structured data gathering needs to be tailored by focusing on cognitive processes informed by the cognitive hypothesis. Agenda setting, Socratic questioning, and adaptations to the mental state examination are necessary. Specific attention is paid to the CT formulation, Folkman's Cognitive Coping Model, self-report measures, data-driven evaluations, and collaboration (e.g., sharing the formulation with the patient.) Integrative CT-psychopharmacological approaches and the importance of empathy are emphasized. The value of implementing psychotherapy in parallel with data gathering because of time urgency is advocated, but this is a significant departure from usual outpatient approaches in which psychotherapy follows evaluation. This conceptual approach offers a novel integrative framework for using CT in acute medical settings, but future challenges include demonstrating clinical outcomes and training P-M clinicians so as to demonstrate fidelity.

  1. Interaction Between Ecohydrologic Dynamics and Microtopographic Variability Under Climate Change

    NASA Astrophysics Data System (ADS)

    Le, Phong V. V.; Kumar, Praveen

    2017-10-01

    Vegetation acclimation resulting from elevated atmospheric CO2 concentration, along with response to increased temperature and altered rainfall pattern, is expected to result in emergent behavior in ecologic and hydrologic functions. We hypothesize that microtopographic variability, that is, landscape features such as topographic depressions with length scales on the order of meters, will play an important role in determining these dynamics by altering the persistence and variability of moisture. To investigate these emergent ecohydrologic dynamics, we develop a modeling framework, Dhara, which explicitly incorporates the control of microtopographic variability on vegetation, moisture, and energy dynamics. The intensive computational demand from such a modeling framework, which allows coupling of multilayer modeling of the soil-vegetation continuum with 3-D surface-subsurface flow processes, is addressed using a hybrid CPU-GPU parallel computing framework. The study is performed for different climate change scenarios for an intensively managed agricultural landscape in central Illinois, USA, which is dominated by row-crop agriculture, primarily soybean (Glycine max) and maize (Zea mays). We show that rising CO2 concentration will decrease evapotranspiration, thus increasing soil moisture and surface water ponding in topographic depressions. However, increased atmospheric demand from higher air temperature overcomes this conservative behavior, resulting in a net increase of evapotranspiration and leading to a reduction in both soil moisture storage and persistence of ponding. These results shed light on the linkage between vegetation acclimation under climate change and microtopographic variability controls on ecohydrologic processes.

  2. Analytical modeling and feasibility study of a multi-GPU cloud-based server (MGCS) framework for non-voxel-based dose calculations.

    PubMed

    Neylon, J; Min, Y; Kupelian, P; Low, D A; Santhanam, A

    2017-04-01

    In this paper, a multi-GPU cloud-based server (MGCS) framework is presented for dose calculations, exploring the feasibility of remote computing power for parallelization and acceleration of computationally and time intensive radiotherapy tasks in moving toward online adaptive therapies. An analytical model was developed to estimate theoretical MGCS performance acceleration and intelligently determine workload distribution. Numerical studies were performed with a computing setup of 14 GPUs distributed over 4 servers interconnected by a 1 gigabit per second (Gbps) network. Inter-process communication methods were optimized to facilitate resource distribution and minimize data transfers over the server interconnect. The analytically predicted computation time matched experimental observations within 1-5%. MGCS performance approached a theoretical limit of acceleration proportional to the number of GPUs utilized when computational tasks far outweighed memory operations. The MGCS implementation reproduced ground-truth dose computations with negligible differences by distributing the work among several processes and implementing optimization strategies. The results showed that a cloud-based computation engine was a feasible solution for enabling clinics to make use of fast dose calculations for advanced treatment planning and adaptive radiotherapy. The cloud-based system was able to exceed the performance of a local machine even for optimized calculations, and provided significant acceleration for computationally intensive tasks. Such a framework can provide access to advanced technology and computational methods to many clinics, providing an avenue for standardization across institutions without the requirements of purchasing, maintaining, and continually updating hardware.

  3. A GPU-Parallelized Eigen-Based Clutter Filter Framework for Ultrasound Color Flow Imaging.

    PubMed

    Chee, Adrian J Y; Yiu, Billy Y S; Yu, Alfred C H

    2017-01-01

    Eigen-filters with attenuation response adapted to clutter statistics in color flow imaging (CFI) have shown improved flow detection sensitivity in the presence of tissue motion. Nevertheless, their practical adoption in clinical use is not straightforward due to the high computational cost of solving eigendecompositions. Here, we provide a pedagogical description of how a real-time computing framework for eigen-based clutter filtering can be developed through a single-instruction, multiple data (SIMD) computing approach that can be implemented on a graphical processing unit (GPU). Emphasis is placed on the single-ensemble-based eigen-filtering approach (Hankel singular value decomposition), since it is algorithmically compatible with GPU-based SIMD computing. The key algebraic principles and the corresponding SIMD algorithm are explained, and annotations on how such an algorithm can be rationally implemented on the GPU are presented. Real-time efficacy of our framework was experimentally investigated on a single GPU device (GTX Titan X), and the computing throughput for varying scan depths and slow-time ensemble lengths was studied. Using our eigen-processing framework, real-time video-range throughput (24 frames/s) can be attained for CFI frames with full view in azimuth direction (128 scanlines), up to a scan depth of 5 cm (λ pixel axial spacing) for slow-time ensemble length of 16 samples. The corresponding CFI image frames, with respect to the ones derived from non-adaptive polynomial regression clutter filtering, yielded enhanced flow detection sensitivity in vivo, as demonstrated in a carotid imaging case example. These findings indicate that the GPU-enabled eigen-based clutter filtering can improve CFI flow detection performance in real time.

  4. Accelerating semantic graph databases on commodity clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morari, Alessandro; Castellana, Vito G.; Haglin, David J.

    We are developing a full software system for accelerating semantic graph databases on commodity clusters that scales to hundreds of nodes while maintaining constant query throughput. Our framework comprises a SPARQL to C++ compiler, a library of parallel graph methods and a custom multithreaded runtime layer, which provides a Partitioned Global Address Space (PGAS) programming model with fork/join parallelism and automatic load balancing over commodity clusters. We present preliminary results for the compiler and for the runtime.

  5. Benchmarking Ada tasking on tightly coupled multiprocessor architectures

    NASA Technical Reports Server (NTRS)

    Collard, Philippe; Goforth, Andre; Marquardt, Matthew

    1989-01-01

    The development of benchmarks and performance measures for parallel Ada tasking is reported with emphasis on the macroscopic behavior of the benchmark across a set of load parameters. The application chosen for the study was the NASREM model for telerobot control, relevant to many NASA missions. The results of the study demonstrate the potential of parallel Ada in accomplishing the task of developing a control system for a system such as the Flight Telerobotic Servicer using the NASREM framework.

  6. LORAKS makes better SENSE: Phase-constrained partial fourier SENSE reconstruction without phase calibration.

    PubMed

    Kim, Tae Hyung; Setsompop, Kawin; Haldar, Justin P

    2017-03-01

    Parallel imaging and partial Fourier acquisition are two classical approaches for accelerated MRI. Methods that combine these approaches often rely on prior knowledge of the image phase, but the need to obtain this prior information can place practical restrictions on the data acquisition strategy. In this work, we propose and evaluate SENSE-LORAKS, which enables combined parallel imaging and partial Fourier reconstruction without requiring prior phase information. The proposed formulation is based on combining the classical SENSE model for parallel imaging data with the more recent LORAKS framework for MR image reconstruction using low-rank matrix modeling. Previous LORAKS-based methods have successfully enabled calibrationless partial Fourier parallel MRI reconstruction, but have been most successful with nonuniform sampling strategies that may be hard to implement for certain applications. By combining LORAKS with SENSE, we enable highly accelerated partial Fourier MRI reconstruction for a broader range of sampling trajectories, including widely used calibrationless uniformly undersampled trajectories. Our empirical results with retrospectively undersampled datasets indicate that when SENSE-LORAKS reconstruction is combined with an appropriate k-space sampling trajectory, it can provide substantially better image quality at high-acceleration rates relative to existing state-of-the-art reconstruction approaches. The SENSE-LORAKS framework provides promising new opportunities for highly accelerated MRI. Magn Reson Med 77:1021-1035, 2017. © 2016 International Society for Magnetic Resonance in Medicine.

  7. Array-based, parallel hierarchical mesh refinement algorithms for unstructured meshes

    DOE PAGES

    Ray, Navamita; Grindeanu, Iulian; Zhao, Xinglin; ...

    2016-08-18

    In this paper, we describe an array-based hierarchical mesh refinement capability through uniform refinement of unstructured meshes for efficient solution of PDEs using finite element methods and multigrid solvers. A multi-degree, multi-dimensional and multi-level framework is designed to generate the nested hierarchies from an initial coarse mesh that can be used for a variety of purposes such as in multigrid solvers/preconditioners, to do solution convergence and verification studies and to improve overall parallel efficiency by decreasing I/O bandwidth requirements (by loading smaller meshes and in memory refinement). We also describe a high-order boundary reconstruction capability that can be used to project the new points after refinement using high-order approximations instead of linear projection in order to minimize and provide more control on geometrical errors introduced by curved boundaries. The capability is developed under the parallel unstructured mesh framework "Mesh Oriented dAtaBase" (MOAB Tautges et al. (2004)). We describe the underlying data structures and algorithms to generate such hierarchies in parallel and present numerical results for computational efficiency and effect on mesh quality. Furthermore, we also present results to demonstrate the applicability of the developed capability to study convergence properties of different point projection schemes for various mesh hierarchies and to a multigrid finite-element solver for elliptic problems.

  8. Parallelized CCHE2D flow model with CUDA Fortran on Graphics Process Units

    USDA-ARS's Scientific Manuscript database

    This paper presents the CCHE2D implicit flow model parallelized using CUDA Fortran programming technique on Graphics Processing Units (GPUs). A parallelized implicit Alternating Direction Implicit (ADI) solver using Parallel Cyclic Reduction (PCR) algorithm on GPU is developed and tested. This solve...

  9. Parallel scalability and efficiency of vortex particle method for aeroelasticity analysis of bluff bodies

    NASA Astrophysics Data System (ADS)

    Tolba, Khaled Ibrahim; Morgenthal, Guido

    2018-01-01

    This paper presents an analysis of the scalability and efficiency of a simulation framework based on the vortex particle method. The code is applied to the numerical aerodynamic analysis of line-like structures. The numerical code runs on multicore CPU and GPU architectures using the OpenCL framework. The focus of this paper is the analysis of the parallel efficiency and scalability of the method when applied to an engineering test case, specifically the aeroelastic response of a long-span bridge girder at the construction stage. The target is to assess the optimal configuration and the required computer architecture, such that it becomes feasible to efficiently utilise the method within the computational resources available to a regular engineering office. The simulations and the scalability analysis are performed on a regular gaming-type computer.

  10. NASA Exhibits

    NASA Technical Reports Server (NTRS)

    Deardorff, Glenn; Djomehri, M. Jahed; Freeman, Ken; Gambrel, Dave; Green, Bryan; Henze, Chris; Hinke, Thomas; Hood, Robert; Kiris, Cetin; Moran, Patrick; ...

    2001-01-01

    A series of NASA presentations for the Supercomputing 2001 conference are summarized. The topics include: (1) Mars Surveyor Landing Sites "Collaboratory"; (2) Parallel and Distributed CFD for Unsteady Flows with Moving Overset Grids; (3) IP Multicast for Seamless Support of Remote Science; (4) Consolidated Supercomputing Management Office; (5) Growler: A Component-Based Framework for Distributed/Collaborative Scientific Visualization and Computational Steering; (6) Data Mining on the Information Power Grid (IPG); (7) Debugging on the IPG; (8) Debakey Heart Assist Device; (9) Unsteady Turbopump for Reusable Launch Vehicle; (10) Exploratory Computing Environments Component Framework; (11) OVERSET Computational Fluid Dynamics Tools; (12) Control and Observation in Distributed Environments; (13) Multi-Level Parallelism Scaling on NASA's Origin 1024 CPU System; (14) Computing, Information, & Communications Technology; (15) NAS Grid Benchmarks; (16) IPG: A Large-Scale Distributed Computing and Data Management System; and (17) ILab: Parameter Study Creation and Submission on the IPG.

  11. The UPSF code: a metaprogramming-based high-performance automatically parallelized plasma simulation framework

    NASA Astrophysics Data System (ADS)

    Gao, Xiatian; Wang, Xiaogang; Jiang, Binhao

    2017-10-01

    UPSF (Universal Plasma Simulation Framework) is a new plasma simulation code designed for maximum flexibility by using cutting-edge techniques supported by the C++17 standard. Through the use of metaprogramming, UPSF provides arbitrary-dimensional data structures and methods to support various kinds of plasma simulation models, such as Vlasov, particle in cell (PIC), fluid, and Fokker-Planck models, and their variants and hybrid methods. Through C++ metaprogramming, a single code can be applied to arbitrary-dimensional systems with no loss of performance. UPSF can also automatically parallelize the distributed data structure and accelerate matrix and tensor operations with BLAS. A three-dimensional particle in cell code is developed based on UPSF. Two test cases, Landau damping and Weibel instability, for the electrostatic and electromagnetic situations respectively, are presented to show the validation and performance of the UPSF code.

  12. A CPU/MIC Collaborated Parallel Framework for GROMACS on Tianhe-2 Supercomputer.

    PubMed

    Peng, Shaoliang; Yang, Shunyun; Su, Wenhe; Zhang, Xiaoyu; Zhang, Tenglilang; Liu, Weiguo; Zhao, Xingming

    2017-06-16

    Molecular Dynamics (MD) is the simulation of the dynamic behavior of atoms and molecules. As the most popular software for molecular dynamics, GROMACS cannot work on large-scale data because of limited computing resources. In this paper, we propose a CPU and Intel® Xeon Phi Many Integrated Core (MIC) collaborated parallel framework to accelerate GROMACS using the offload mode on a MIC coprocessor, with which the performance of GROMACS is improved significantly, especially when using the Tianhe-2 supercomputer. Furthermore, we optimize GROMACS so that it can run on both the CPU and MIC at the same time. In addition, we accelerate multi-node GROMACS so that it can be used in practice. Benchmarking on real data, our accelerated GROMACS performs very well and reduces computation time significantly. Source code: https://github.com/tianhe2/gromacs-mic.

  13. Development of mpi_EPIC model for global agroecosystem modeling

    DOE PAGES

    Kang, Shujiang; Wang, Dali; Nichols, Jeff A.; ...

    2014-12-31

    Models that address policy-maker concerns about multi-scale effects of food and bioenergy production systems are computationally demanding. We integrated the message passing interface algorithm into the process-based EPIC model to accelerate computation of ecosystem effects. Simulation performance was further enhanced by applying the Vampir framework. When this enhanced mpi_EPIC model was tested, total execution time for a global 30-year simulation of a switchgrass cropping system was shortened to less than 0.5 hours on a supercomputer. The results illustrate that mpi_EPIC using parallel design can balance simulation workloads and facilitate large-scale, high-resolution analysis of agricultural production systems, management alternatives and environmental effects.

  14. Motor and somatosensory conversion disorder: a functional unawareness syndrome?

    PubMed

    Perez, David L; Barsky, Arthur J; Daffner, Kirk; Silbersweig, David A

    2012-01-01

    Although conversion disorder is closely connected to the origins of neurology and psychiatry, it remains poorly understood. In this article, the authors discuss neural and clinical parallels between lesional unawareness disorders and unilateral motor and somatosensory conversion disorder, emphasizing functional neuroimaging/disease correlates. The authors suggest that a functional-unawareness neurobiological framework, mediated by right hemisphere-lateralized, large-scale brain network dysfunction, may play a significant role in the neurobiology of conversion disorder. The perigenual anterior cingulate and the posterior parietal cortices are detailed as important in disease pathophysiology. Further investigations will refine the functional-unawareness concept, clarify the role of affective circuits, and delineate the process through which functional neurologic symptoms emerge.

  15. Automatic Camera Orientation and Structure Recovery with Samantha

    NASA Astrophysics Data System (ADS)

    Gherardi, R.; Toldo, R.; Garro, V.; Fusiello, A.

    2011-09-01

    SAMANTHA is software capable of computing camera orientation and structure recovery from a sparse block of casual images without human intervention. It can process either calibrated or uncalibrated images; in the latter case an autocalibration routine is run. Pictures are organized into a hierarchical tree which has single images as leaves and partial reconstructions as internal nodes. The method proceeds bottom up until it reaches the root node, corresponding to the final result. This framework is one order of magnitude faster than sequential approaches, inherently parallel, and less sensitive to the error accumulation that causes drift. We have verified the quality of our reconstructions both qualitatively, producing compelling point clouds, and quantitatively, comparing them with laser scans serving as ground truth.

  16. Benchmarking and performance analysis of the CM-2. [SIMD computer]

    NASA Technical Reports Server (NTRS)

    Myers, David W.; Adams, George B., II

    1988-01-01

    A suite of benchmarking routines testing communication, basic arithmetic operations, and selected kernel algorithms written in LISP and PARIS was developed for the CM-2. Experiment runs are automated via a software framework that sequences individual tests, allowing for unattended overnight operation. Multiple measurements are made and treated statistically to generate well-characterized results from the noisy values given by cm:time. The results obtained provide a comparison with similar, but less extensive, testing done on a CM-1. Tests were chosen to aid the algorithmist in constructing fast, efficient, and correct code on the CM-2, as well as gain insight into what performance criteria are needed when evaluating parallel processing machines.

  17. A high performance data parallel tensor contraction framework: Application to coupled electro-mechanics

    NASA Astrophysics Data System (ADS)

    Poya, Roman; Gil, Antonio J.; Ortigosa, Rogelio

    2017-07-01

    The paper presents aspects of implementation of a new high performance tensor contraction framework for the numerical analysis of coupled and multi-physics problems on streaming architectures. In addition to explicit SIMD instructions and smart expression templates, the framework introduces domain specific constructs for the tensor cross product and its associated algebra recently rediscovered by Bonet et al. (2015, 2016) in the context of solid mechanics. The two key ingredients of the presented expression template engine are as follows. First, the capability to mathematically transform complex chains of operations to simpler equivalent expressions, while potentially avoiding routes with higher levels of computational complexity and, second, to perform a compile time depth-first or breadth-first search to find the optimal contraction indices of a large tensor network in order to minimise the number of floating point operations. For optimisations of tensor contraction such as loop transformation, loop fusion and data locality optimisations, the framework relies heavily on compile time technologies rather than source-to-source translation or JIT techniques. Every aspect of the framework is examined through relevant performance benchmarks, including the impact of data parallelism on the performance of isomorphic and nonisomorphic tensor products, the FLOP and memory I/O optimality in the evaluation of tensor networks, the compilation cost and memory footprint of the framework and the performance of tensor cross product kernels. The framework is then applied to finite element analysis of coupled electro-mechanical problems to assess the speed-ups achieved in kernel-based numerical integration of complex electroelastic energy functionals. In this context, domain-aware expression templates combined with SIMD instructions are shown to provide a significant speed-up over the classical low-level style programming techniques.

  18. Crystal structure of dimanganese(II) zinc bis[orthophosphate(V)] monohydrate

    PubMed Central

    Alhakmi, Ghaleb; Assani, Abderrazzak; Saadi, Mohamed; El Ammari, Lahcen

    2015-01-01

    The title compound, Mn2Zn(PO4)2·H2O, was obtained under hydrothermal conditions. The structure is isotypic with other transition metal phosphates of the type M3−xM′x(PO4)2·H2O, but shows no statistical disorder of the three metallic sites. The principal building units are distorted [MnO6] and [MnO5(H2O)] octahedra, a distorted [ZnO5] square pyramid and two regular PO4 tetrahedra. The connection of the polyhedra leads to a framework structure. Two types of layers parallel to (-101) can be distinguished in this framework. One layer contains [Zn2O8] dimers linked to PO4 tetrahedra via common edges. The other layer is more corrugated and contains [Mn2O8(H2O)2] dimers and [MnO6] octahedra linked together by common edges. The PO4 tetrahedra link the two types of layers into a framework structure with channels parallel to [101]. The H atoms of the water molecules point into the channels and form O-H⋯O hydrogen bonds (one of which is bifurcated) with framework O atoms across the channels. PMID:25878806

  19. Massively parallel information processing systems for space applications

    NASA Technical Reports Server (NTRS)

    Schaefer, D. H.

    1979-01-01

    NASA is developing massively parallel systems for ultra high speed processing of digital image data collected by satellite borne instrumentation. Such systems contain thousands of processing elements. Work is underway on the design and fabrication of the 'Massively Parallel Processor', a ground computer containing 16,384 processing elements arranged in a 128 x 128 array. This computer uses existing technology. Advanced work includes the development of semiconductor chips containing thousands of feedthrough paths. Massively parallel image analog to digital conversion technology is also being developed. The goal is to provide compact computers suitable for real-time onboard processing of images.

  20. Parallel log structured file system collective buffering to achieve a compact representation of scientific and/or dimensional data

    DOEpatents

    Grider, Gary A.; Poole, Stephen W.

    2015-09-01

    Collective buffering and data pattern solutions are provided for storage, retrieval, and/or analysis of data in a collective parallel processing environment. For example, a method can be provided for data storage in a collective parallel processing environment. The method comprises receiving data to be written for a plurality of collective processes within a collective parallel processing environment, extracting a data pattern for the data to be written for the plurality of collective processes, generating a representation describing the data pattern, and saving the data and the representation.
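
    The steps named in the record above (receive the writes, extract a data pattern, generate a representation, save it) can be illustrated with a minimal sketch. It assumes the simplest possible pattern, equally sized writes at a fixed stride; the names StridedPattern and extract_pattern are hypothetical and are not taken from the patent, whose actual pattern representation is not described here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class StridedPattern:
    """Compact description of equally sized, equally spaced writes (hypothetical)."""
    start: int   # byte offset of the first write
    stride: int  # distance in bytes between consecutive writes
    length: int  # size in bytes of every write
    count: int   # number of writes described


def extract_pattern(writes: Sequence[Tuple[int, int]]) -> Optional[StridedPattern]:
    """Return a StridedPattern if the (offset, length) pairs are regular, else None."""
    if len(writes) < 2:
        return None
    ordered = sorted(writes)
    first_offset, first_length = ordered[0]
    stride = ordered[1][0] - first_offset
    for i, (offset, length) in enumerate(ordered):
        # Every write must have the same size and sit exactly i strides from the start.
        if length != first_length or offset != first_offset + i * stride:
            return None
    return StridedPattern(first_offset, stride, first_length, len(ordered))


# Four hypothetical processes each write 1024 bytes, 4096 bytes apart.
writes = [(rank * 4096, 1024) for rank in range(4)]
pattern = extract_pattern(writes)
print(pattern)
```

    In this toy case four (offset, length) pairs collapse into a single four-field record; irregular writes return None and would have to be stored individually.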

  1. A High Order, Locally-Adaptive Method for the Navier-Stokes Equations

    NASA Astrophysics Data System (ADS)

    Chan, Daniel

    1998-11-01

    I have extended the FOSLS method of Cai, Manteuffel and McCormick (1997) and implemented it within the framework of a spectral element formulation using the Legendre polynomial basis function. The FOSLS method solves the Navier-Stokes equations as a system of coupled first-order equations and provides the ellipticity that is needed for fast iterative matrix solvers like multigrid to operate efficiently. Each element is treated as an object and its properties are self-contained. Only C^0 continuity is imposed across element interfaces; this design allows local grid refinement and coarsening without the burden of having an elaborate data structure, since only information along element boundaries is needed. With the FORTRAN 90 programming environment, I can maintain a high computational efficiency by employing a hybrid parallel processing model. The OpenMP directives provide loop-level parallelism, which is executed on a shared-memory SMP, and the MPI protocol allows the distribution of elements to a cluster of SMPs connected via a commodity network. This talk will provide timing results and a comparison with a second order finite difference method.

  2. Using Python to generate AHPS-based precipitation simulations over CONUS using Amazon distributed computing

    NASA Astrophysics Data System (ADS)

    Machalek, P.; Kim, S. M.; Berry, R. D.; Liang, A.; Small, T.; Brevdo, E.; Kuznetsova, A.

    2012-12-01

    We describe how the Climate Corporation uses Python and Clojure, a language implemented on top of Java, to generate climatological forecasts for precipitation based on the Advanced Hydrologic Prediction Service (AHPS) radar-based daily precipitation measurements. A 2-year-long forecast is generated on each of the ~650,000 CONUS land-based 4-km AHPS grids by constructing 10,000 ensembles sampled from a 30-year reconstructed AHPS history for each grid. The spatial and temporal correlations between neighboring AHPS grids and the sampling of the analogues are handled by Python. The parallelization for all the 650,000 CONUS stations is further achieved by utilizing the MAP-REDUCE framework (http://code.google.com/edu/parallel/mapreduce-tutorial.html). Each full-scale computational run requires hundreds of nodes with up to 8 processors each on the Amazon Elastic MapReduce (http://aws.amazon.com/elasticmapreduce/) distributed computing service, resulting in 3-terabyte datasets. We further describe how we have productionalized a monthly run of the simulation process at full scale of the 4-km AHPS grids and how the resultant terabyte-sized datasets are handled.
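
    The map-reduce decomposition mentioned in the record above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' pipeline: the functions map_cell, reduce_cell and run, the three-cell histories and the choice of the mean as the reduction are all hypothetical, and the spatial correlation between neighboring grids is ignored.

```python
import random
from collections import defaultdict
from statistics import mean


def map_cell(cell_id, history, n_members, seed):
    """Map step: emit (cell_id, sampled_value) pairs for one grid cell."""
    rng = random.Random(seed + cell_id)  # per-cell seed keeps runs reproducible
    return [(cell_id, rng.choice(history)) for _ in range(n_members)]


def reduce_cell(values):
    """Reduce step: collapse one cell's ensemble to its mean."""
    return mean(values)


def run(histories, n_members=100, seed=0):
    # Map: cells are independent, so this loop is what a cluster would distribute.
    emitted = []
    for cell_id, history in histories.items():
        emitted.extend(map_cell(cell_id, history, n_members, seed))
    # Shuffle: group the emitted values by key.
    grouped = defaultdict(list)
    for cell_id, value in emitted:
        grouped[cell_id].append(value)
    # Reduce: one summary value per cell.
    return {cell_id: reduce_cell(values) for cell_id, values in grouped.items()}


# Toy precipitation histories (mm/day) for three hypothetical grid cells.
histories = {0: [0.0, 1.2, 3.4], 1: [0.0, 0.0, 5.0], 2: [2.0, 2.0, 2.0]}
summary = run(histories)
```

    Because each map call touches only one cell's history, the work splits cleanly across nodes; only the grouping step needs data movement.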

  3. A Concept for Run-Time Support of the Chapel Language

    NASA Technical Reports Server (NTRS)

    James, Mark

    2006-01-01

    A document presents a concept for run-time implementation of other concepts embodied in the Chapel programming language. (Now undergoing development, Chapel is intended to become a standard language for parallel computing that would surpass older such languages in both computational performance and the efficiency with which pre-existing code can be reused and new code written.) The aforementioned other concepts are those of distributions, domains, allocations, and access, as defined in a separate document called "A Semantic Framework for Domains and Distributions in Chapel" and linked to a language specification defined in another separate document called "Chapel Specification 0.3." The concept presented in the instant report is recognition that a data domain that was invented for Chapel offers a novel approach to distributing and processing data in a massively parallel environment. The concept is offered as a starting point for development of working descriptions of functions and data structures that would be necessary to implement interfaces to a compiler for transforming the aforementioned other concepts from their representations in Chapel source code to their run-time implementations.

  4. An Efficient Computational Framework for the Analysis of Whole Slide Images: Application to Follicular Lymphoma Immunohistochemistry

    PubMed Central

    Samsi, Siddharth; Krishnamurthy, Ashok K.; Gurcan, Metin N.

    2012-01-01

    Follicular Lymphoma (FL) is one of the most common non-Hodgkin lymphomas in the United States. Diagnosis and grading of FL is based on the review of histopathological tissue sections under a microscope and is influenced by human factors such as fatigue and reader bias. Computer-aided image analysis tools can help improve the accuracy of diagnosis and grading and act as another tool at the pathologist’s disposal. Our group has been developing algorithms for identifying follicles in immunohistochemical images. These algorithms have been tested and validated on small images extracted from whole slide images. However, the use of these algorithms for analyzing the entire whole slide image requires significant changes to the processing methodology since the images are relatively large (on the order of 100k × 100k pixels). In this paper we discuss the challenges involved in analyzing whole slide images and propose potential computational methodologies for addressing these challenges. We discuss the use of parallel computing tools on commodity clusters and compare performance of the serial and parallel implementations of our approach. PMID:22962572

  5. A multi-platform evaluation of the randomized CX low-rank matrix factorization in Spark

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gittens, Alex; Kottalam, Jey; Yang, Jiyan

    We investigate the performance and scalability of the randomized CX low-rank matrix factorization and demonstrate its applicability through the analysis of a 1TB mass spectrometry imaging (MSI) dataset, using Apache Spark on an Amazon EC2 cluster, a Cray XC40 system, and an experimental Cray cluster. We implemented this factorization both as a parallelized C implementation with hand-tuned optimizations and in Scala using the Apache Spark high-level cluster computing framework. We obtained consistent performance across the three platforms: using Spark we were able to process the 1TB size dataset in under 30 minutes with 960 cores on all systems, with the fastest times obtained on the experimental Cray cluster. In comparison, the C implementation was 21X faster on the Amazon EC2 system, due to careful cache optimizations, bandwidth-friendly access of matrices and vector computation using SIMD units. We report these results and their implications on the hardware and software issues arising in supporting data-centric workloads in parallel and distributed environments.

  6. Parallel halftoning technique using dot diffusion optimization

    NASA Astrophysics Data System (ADS)

    Molina-Garcia, Javier; Ponomaryov, Volodymyr I.; Reyes-Reyes, Rogelio; Cruz-Ramos, Clara

    2017-05-01

    In this paper, a novel approach to halftoning is proposed and implemented for images obtained by the Dot Diffusion (DD) method. The technique is based on optimizing the so-called class matrix used in the DD algorithm: it generates new versions of the class matrix that contain no baron or near-baron, in order to minimize inconsistencies during the distribution of the error. The proposed class matrices have different properties, each designed for one of two kinds of application: those where inverse halftoning is necessary and those where it is not required. The proposed method has been implemented on a GPU (NVIDIA GeForce GTX 750 Ti) and on multicore processors (AMD FX(tm)-6300 Six-Core Processor and Intel Core i5-4200U), using CUDA and OpenCV on a PC running Linux. Experimental results show that the novel framework produces good-quality halftone images and inverse halftone images. The simulation results on parallel architectures demonstrate the efficiency of the novel technique when it is implemented for real-time processing.

  7. schwimmbad: A uniform interface to parallel processing pools in Python

    NASA Astrophysics Data System (ADS)

    Price-Whelan, Adrian M.; Foreman-Mackey, Daniel

    2017-09-01

    Many scientific and computing problems require doing some calculation on all elements of some data set. If the calculations can be executed in parallel (i.e. without any communication between calculations), these problems are said to be perfectly parallel. On computers with multiple processing cores, these tasks can be distributed and executed in parallel to greatly improve performance. A common paradigm for handling these distributed computing problems is to use a processing "pool": the "tasks" (the data) are passed in bulk to the pool, and the pool handles distributing the tasks to a number of worker processes when available. schwimmbad provides a uniform interface to parallel processing pools and enables switching easily between local development (e.g., serial processing or with multiprocessing) and deployment on a cluster or supercomputer (via, e.g., MPI or JobLib).
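
    The pool paradigm described in the record above can be sketched with the standard library alone. The example below does not import schwimmbad or reproduce its API; SerialPool, worker and run are illustrative names, and a thread pool stands in for whichever parallel back end (processes, MPI, and so on) a real deployment would choose. The point is only that code written against a shared map method does not change when the pool does.

```python
from multiprocessing.pool import ThreadPool


class SerialPool:
    """Pool look-alike that runs tasks one after another in the caller."""

    def map(self, func, tasks):
        return [func(task) for task in tasks]

    def close(self):
        pass


def worker(task):
    """Independent calculation on a single task (no communication needed)."""
    return task * task


def run(pool, tasks):
    """Written against pool.map only, so any pool exposing that method works."""
    try:
        return pool.map(worker, tasks)
    finally:
        pool.close()


tasks = list(range(8))
serial_results = run(SerialPool(), tasks)               # local debugging
threaded_results = run(ThreadPool(processes=4), tasks)  # parallel back end
```

    Both calls return the results in task order, so the serial pool can be used to check a run before switching to a parallel one.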

  8. Business model configuration and dynamics for technology commercialization in mature markets.

    PubMed

    Flammini, Serena; Arcese, Gabriella; Lucchetti, Maria Claudia; Mortara, Letizia

    2017-01-01

    The food industry is a well-established and complex industry. New entrants attempting to penetrate it via the commercialization of a new technological innovation could face high uncertainty and constraints. The capability to innovate through collaboration and to identify suitable strategies and innovative business models (BMs) can be particularly important for bringing a technological innovation to this market. However, although the potential for these capabilities has been advocated, we still lack a complete understanding of how new ventures could support the technology commercialization process via the development of BMs. The paper aims to discuss these issues. To address this gap, this paper builds a conceptual framework that knits together the different bodies of extant literature (i.e. entrepreneurship, strategy and innovation) to analyze the BM innovation processes associated with the exploitation of emerging technologies; determines the suitability of the framework using data from the exploratory case study of IT IS 3D - a firm which has started to exploit 3D printing in the food industry; and improves the initial conceptual framework with the findings that emerged in the case study. From this analysis it emerged that: companies could use more than one BM at a time; hence, BM innovation processes could co-exist and be run in parallel; the facing of high uncertainty might lead firms to choose a closed and/or a familiar BM, while explorative strategies could be pursued with open BMs; significant changes in strategies during the technology commercialization process are not necessarily reflected in a radical change in the BM; and firms could deliberately adopt interim strategies and BMs as means to identify the more suitable ones to reach the market. This case study illustrates how firms could innovate the processes of their BM development to face the uncertainties linked with the entry into a mature and highly conservative industry (food).

  9. A big data approach for climate change indicators processing in the CLIP-C project

    NASA Astrophysics Data System (ADS)

    D'Anca, Alessandro; Conte, Laura; Palazzo, Cosimo; Fiore, Sandro; Aloisio, Giovanni

    2016-04-01

    Defining and implementing processing chains with multiple (e.g. tens or hundreds of) data analytics operators can be a real challenge in many practical scientific use cases such as climate change indicators. This is usually done via scripts (e.g. bash) on the client side and requires climate scientists to implement and replicate workflow-like control logic (which may itself be error-prone) in their scripts, along with the expected application-level part. Moreover, the large volume of data and the strong I/O demand pose additional performance challenges. In this regard, production-level tools for climate data analysis are mostly sequential, and there is a lack of big data analytics solutions implementing fine-grain data parallelism or adopting stronger parallel I/O strategies, data locality, workflow optimization, etc. High-level solutions leveraging workflow-enabled big data analytics frameworks for eScience could help scientists define and implement the workflows related to their experiments through a more declarative, efficient and powerful approach. This talk will start by introducing the main needs and challenges regarding big data analytics workflow management for eScience and will then provide some insights into the implementation of real use cases related to climate change indicators on large datasets produced in the context of the CLIP-C project, an EU FP7 project aiming to provide access to climate information of direct relevance to a wide variety of users, from scientists to policy makers and private sector decision makers. All the proposed use cases have been implemented using the Ophidia big data analytics framework. The software stack includes an internal workflow management system, which coordinates, orchestrates, and optimises the execution of multiple scientific data analytics and visualization tasks. Real-time monitoring of workflow execution is also supported through a graphical user interface. To address the challenges of the use cases, the implemented data analytics workflows include parallel data analysis, metadata management, virtual file system tasks, map generation, rolling of datasets, and import/export of datasets in NetCDF format. The use cases have been implemented on an HPC cluster of 8 nodes (16 cores/node) of the Athena Cluster available at the CMCC Supercomputing Centre. Benchmark results will also be presented during the talk.

  10. Parallel Signal Processing and System Simulation using aCe

    NASA Technical Reports Server (NTRS)

    Dorband, John E.; Aburdene, Maurice F.

    2003-01-01

    Recently, networked and cluster computation have become very popular for both signal processing and system simulation. A new C-based parallel language for architecture-adaptive programming, aCe C, is ideally suited to parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, aCe C allows programmers to implement algorithms and system simulation applications on parallel architectures with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we focus on some fundamental features of the language and present a signal processing application (FFT).
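
    To show why the FFT is a natural example of explicitly concurrent computation, here is a minimal radix-2 sketch in Python rather than in the language the record describes. It is a textbook decimation-in-time FFT, not the authors' implementation; the comments mark the parts that share no data and could therefore be expressed as concurrent work. The helper dft exists only to check the result.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    # The two half-size transforms share no data, so they could run concurrently.
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    # Each butterfly writes its own pair of outputs, so all n/2 are independent.
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dft(x):
    """Direct O(n^2) transform, used only to check the FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


signal = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
spectrum = fft(signal)
```

    The recursion has log2(n) levels, and within each level every butterfly is independent of the others, which is the structure a data-parallel language can state directly.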

  11. Parallel processing in finite element structural analysis

    NASA Technical Reports Server (NTRS)

    Noor, Ahmed K.

    1987-01-01

    A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).

  12. Connectionism, parallel constraint satisfaction processes, and gestalt principles: (re)introducing cognitive dynamics to social psychology.

    PubMed

    Read, S J; Vanman, E J; Miller, L C

    1997-01-01

    We argue that recent work in connectionist modeling, in particular the parallel constraint satisfaction processes that are central to many of these models, has great importance for understanding issues of both historical and current concern for social psychologists. We first provide a brief description of connectionist modeling, with particular emphasis on parallel constraint satisfaction processes. Second, we examine the tremendous similarities between parallel constraint satisfaction processes and the Gestalt principles that were the foundation for much of modern social psychology. We propose that parallel constraint satisfaction processes provide a computational implementation of the principles of Gestalt psychology that were central to the work of such seminal social psychologists as Asch, Festinger, Heider, and Lewin. Third, we describe how parallel constraint satisfaction processes have been applied to three areas that were key to the beginnings of modern social psychology and remain central today: impression formation and causal reasoning, cognitive consistency (balance and cognitive dissonance), and goal-directed behavior. We conclude by discussing implications of parallel constraint satisfaction principles for a number of broader issues in social psychology, such as the dynamics of social thought and the integration of social information within the narrow time frame of social interaction.

  13. Parallelization and checkpointing of GPU applications through program transformation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Solano-Quinde, Lizandro Damian

    2012-01-01

    GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that make writing general-purpose applications for GPUs tractable has consolidated GPUs as an alternative for accelerating such applications. Among the areas that have benefited from GPU acceleration are signal and image processing, computational fluid dynamics, quantum chemistry, and, in general, the High Performance Computing (HPC) industry. In order to continue to exploit higher levels of parallelism with GPUs, multi-GPU systems are gaining popularity. In this context, single-GPU applications are parallelized for running on multi-GPU systems. Furthermore, multi-GPU systems help to overcome the GPU memory limitation for applications with a large memory footprint. Parallelizing single-GPU applications has been approached through libraries that distribute the workload at runtime; however, they impose execution overhead and are not portable. On traditional CPU systems, by contrast, parallelization has been approached through application transformation at pre-compile time, which enhances the application to distribute the workload at application level and does not have the issues of library-based approaches. Hence, a parallelization scheme for GPU systems based on application transformation is needed. As with any computing engine today, reliability is also a concern for GPUs, which are vulnerable to transient and permanent failures. Current checkpoint/restart techniques are not suitable for systems with GPUs. Checkpointing for GPU systems presents new and interesting challenges, primarily due to the natural differences imposed by the hardware design, the memory subsystem architecture, the massive number of threads, and the limited amount of synchronization among threads. Therefore, a checkpoint/restart technique suitable for GPU systems is needed. The goal of this work is to exploit higher levels of parallelism and to develop support for application-level fault tolerance in applications using multiple GPUs. Our techniques reduce the burden of enhancing single-GPU applications to support these features. To achieve our goal, this work designs and implements a framework for enhancing a single-GPU OpenCL application through application transformation.

  14. Using Parallel Processing for Problem Solving.

    DTIC Science & Technology

    1979-12-01

    ...are the basic parallel processing primitive. Different goals of the system can be pursued in parallel by placing them in separate activities... Language primitives are provided for manipulating running activities. Viewpoints are a generalization of context...

  15. Pursuing realistic hydrologic model under SUPERFLEX framework in a semi-humid catchment in China

    NASA Astrophysics Data System (ADS)

    Wei, Lingna; Savenije, Hubert H. G.; Gao, Hongkai; Chen, Xi

    2016-04-01

    Hydrologists perpetually pursue model realism for flood and drought prediction, integrated water resources management and decision support for water security. "Physically based" distributed hydrologic models are developing rapidly, but they also face non-negligible challenges, for instance long computation times and parameter uncertainty. This study tested, step by step, four conceptual hydrologic models under the SUPERFLEX framework in a small semi-humid catchment in the southern Huai River basin of China. The original lumped model, FLEXL, has a hypothesized structure of four reservoirs representing canopy interception, the unsaturated zone, subsurface flow with fast and slow components, and base flow storage. To account for spatially uneven rainfall, the second model (FLEXD) applies the same parameter set to the different rain gauge controlling units. To reveal the effect of topography, the terrain descriptor height above the nearest drainage (HAND), combined with slope, is used to classify the experimental catchment into two landscapes. The third model (FLEXTOPO) then builds different model blocks according to the dominant hydrologic process for each topographical condition. The fourth model, FLEXTOPOD, integrates the parallel framework of FLEXTOPO in four controlled units and is designed to interpret the spatial variability of rainfall patterns and topographic features. Through pairwise comparison, our results suggest that: (1) the semi-distributed models (FLEXD and FLEXTOPOD), which take the spatial heterogeneity of precipitation into account, improve model performance with a parsimonious parameter set, and (2) a hydrologic model architecture flexible enough to reflect the perceived dominant hydrologic processes can incorporate the local terrain circumstances of each landscape. Hence, the modeling choices are consistent with the catchment behaviour and close to "reality". The presented methodology treats the hydrologic model as a tool to test our hypotheses and deepen our understanding of hydrologic processes, which will help improve modeling realism.

  16. Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools

    PubMed Central

    Cheng, Yinhe; Tzeng, Tzy-Hwa Kathy

    2016-01-01

    This paper introduces a high-throughput software tool framework called sam2bam that enables users to significantly speed up pre-processing for next-generation sequencing data. The sam2bam is especially efficient on single-node multi-core large-memory systems. It can reduce the runtime of data pre-processing in marking duplicate reads on a single node system by 156–186x compared with de facto standard tools. The sam2bam consists of parallel software components that can fully utilize multiple processors, available memory, high-bandwidth storage, and hardware compression accelerators, if available. The sam2bam provides file format conversion between well-known genome file formats, from SAM to BAM, as a basic feature. Additional features such as analyzing, filtering, and converting input data are provided by using plug-in tools, e.g., duplicate marking, which can be attached to sam2bam at runtime. We demonstrated that sam2bam could significantly reduce the runtime of next generation sequencing (NGS) data pre-processing from about two hours to about one minute for a whole-exome data set on a 16-core single-node system using up to 130 GB of memory. The sam2bam could reduce the runtime of NGS data pre-processing from about 20 hours to about nine minutes for a whole-genome sequencing data set on the same system using up to 711 GB of memory. PMID:27861637

  17. Real-time implementations of image segmentation algorithms on shared memory multicore architecture: a survey (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Akil, Mohamed

    2017-05-01

    Real-time processing is becoming increasingly important in many image processing applications. Image segmentation is one of the most fundamental tasks in image analysis, and consequently many different approaches to it have been proposed. The watershed transform is a well-known image segmentation tool, but it is very data intensive. To accelerate watershed algorithms and achieve real-time processing, parallel architectures and programming models for multicore computing have been developed. This paper surveys approaches to the parallel implementation of sequential watershed algorithms on multicore general-purpose CPUs: homogeneous multicore processors with shared memory. An efficient parallel implementation requires exploring different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. We compare various parallelizations of sequential watershed algorithms on shared-memory multicore architectures. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on its performance. In this comparison, we also discuss the advantages and disadvantages of the parallel programming models. Specifically, we compare OpenMP (an application programming interface for multi-processing) with Pthreads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.

  18. Evolution of CMS workload management towards multicore job support

    NASA Astrophysics Data System (ADS)

    Pérez-Calero Yzquierdo, A.; Hernández, J. M.; Khan, F. A.; Letts, J.; Majewski, K.; Rodrigues, A. M.; McCrea, A.; Vaandering, E.

    2015-12-01

    The successful exploitation of multicore processor architectures is a key element of the LHC distributed computing system in the coming era of LHC Run 2. High-pileup complex-collision events represent a challenge for traditional sequential programming in terms of memory and processing time budget. The CMS data production and processing framework is introducing the parallel execution of the reconstruction and simulation algorithms to overcome these limitations. CMS plans to execute multicore jobs while still supporting single-core processing for other tasks that are difficult to parallelize, such as user analysis. The CMS strategy for job management thus aims at integrating single-core and multicore job scheduling across the Grid. This is accomplished by employing multicore pilots with internal dynamic partitioning of the allocated resources, capable of running payloads of various core counts simultaneously. An extensive test programme has been conducted to enable multicore scheduling with the various local batch systems available at CMS sites, with the focus on the Tier-0 and Tier-1s, which were responsible during 2015 for the prompt data reconstruction. Scale tests have been run to analyse the performance of this scheduling strategy and ensure an efficient use of the distributed resources. This paper presents the evolution of the CMS job management and resource provisioning systems in order to support this hybrid scheduling model, as well as its deployment and performance tests, which will enable CMS to transition to a multicore production model for the second LHC run.
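
The idea of a multicore pilot that partitions its allocated cores among payloads of different core counts can be sketched in a few lines. This is not the CMS scheduler: the function name `partition_pilot`, the greedy first-fit policy, and the payload names are all assumptions made purely for illustration.

```python
def partition_pilot(pilot_cores, queued_payloads):
    """Greedy first-fit packing of payloads into one pilot.

    Illustrative assumption only, not the CMS scheduling policy.
    queued_payloads: list of (name, cores_requested) in queue order.
    Returns (started, waiting, free_cores).
    """
    free = pilot_cores
    started, waiting = [], []
    for name, cores in queued_payloads:
        if cores <= free:
            # The payload fits in the cores still unassigned: start it.
            started.append(name)
            free -= cores
        else:
            # Not enough free cores right now: leave it in the queue.
            waiting.append(name)
    return started, waiting, free


if __name__ == "__main__":
    # Hypothetical mix of multicore and single-core payloads.
    queue = [("reco", 4), ("analysis", 1), ("sim", 4), ("analysis2", 1)]
    print(partition_pilot(8, queue))
```

In a real system the partitioning would be re-evaluated whenever a payload finishes and releases its cores; the sketch shows only a single scheduling pass.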

  19. Evolution of CMS Workload Management Towards Multicore Job Support

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Perez-Calero Yzquierdo, A.; Hernández, J. M.; Khan, F. A.

    The successful exploitation of multicore processor architectures is a key element of the LHC distributed computing system in the coming era of LHC Run 2. High-pileup complex-collision events represent a challenge for traditional sequential programming in terms of memory and processing time budget. The CMS data production and processing framework is introducing the parallel execution of the reconstruction and simulation algorithms to overcome these limitations. CMS plans to execute multicore jobs while still supporting single-core processing for other tasks that are difficult to parallelize, such as user analysis. The CMS strategy for job management thus aims at integrating single-core and multicore job scheduling across the Grid. This is accomplished by employing multicore pilots with internal dynamic partitioning of the allocated resources, capable of running payloads of various core counts simultaneously. An extensive test programme has been conducted to enable multicore scheduling with the various local batch systems available at CMS sites, with the focus on the Tier-0 and Tier-1s, which were responsible during 2015 for the prompt data reconstruction. Scale tests have been run to analyse the performance of this scheduling strategy and ensure an efficient use of the distributed resources. This paper presents the evolution of the CMS job management and resource provisioning systems in order to support this hybrid scheduling model, as well as its deployment and performance tests, which will enable CMS to transition to a multicore production model for the second LHC run.

  20. Online optimal experimental re-design in robotic parallel fed-batch cultivation facilities.

    PubMed

    Cruz Bournazou, M N; Barz, T; Nickel, D B; Lopez Cárdenas, D C; Glauche, F; Knepper, A; Neubauer, P

    2017-03-01

    We present an integrated framework for the online optimal experimental re-design applied to parallel nonlinear dynamic processes that aims to precisely estimate the parameter set of macro-kinetic growth models with minimal experimental effort. This provides a systematic solution for rapid validation of a specific model to new strains, mutants, or products. In biosciences, this is especially important as model identification is a long and laborious process which continues to limit the use of mathematical modeling in this field. The strength of this approach is demonstrated by fitting a macro-kinetic differential equation model for Escherichia coli fed-batch processes after 6 h of cultivation. The system includes two fully automated liquid handling robots: one containing eight mini-bioreactors and another used for automated at-line analyses, which allows for the immediate use of the available data in the modeling environment. As a result, the experiment can be continually re-designed while the cultivations are running, using the information generated by periodic parameter estimations. The advantages of an online re-computation of the optimal experiment are demonstrated by a 50-fold lower average coefficient of variation on the parameter estimates compared to the sequential method (4.83% instead of 235.86%). The success obtained in such a complex system is a further step towards more efficient computer-aided bioprocess development. Biotechnol. Bioeng. 2017;114: 610-619. © 2016 Wiley Periodicals, Inc.

  1. Image Processing Using a Parallel Architecture.

    DTIC Science & Technology

    1987-12-01

    ENG/87D-25 Abstract This study developed a set of low level image processing tools on a parallel computer that allows concurrent processing of images...environment, the set of tools offers a significant reduction in the time required to perform some commonly used image processing operations. ...step toward developing these systems, a structured set of image processing tools was implemented using a parallel computer. More important than

  2. Molecular simulation workflows as parallel algorithms: the execution engine of Copernicus, a distributed high-performance computing platform.

    PubMed

    Pronk, Sander; Pouya, Iman; Lundborg, Magnus; Rotskoff, Grant; Wesén, Björn; Kasson, Peter M; Lindahl, Erik

    2015-06-09

    Computational chemistry and other simulation fields are critically dependent on computing resources, but few problems scale efficiently to the hundreds of thousands of processors available in current supercomputers, particularly for molecular dynamics. This has turned into a bottleneck as new hardware generations primarily provide more processing units rather than making individual units much faster, which simulation applications are addressing by increasingly focusing on sampling with algorithms such as free-energy perturbation, Markov state modeling, metadynamics, or milestoning. All these rely on combining results from multiple simulations into a single observation. They are potentially powerful approaches that aim to predict experimental observables directly, but this comes at the expense of added complexity in selecting sampling strategies and keeping track of dozens to thousands of simulations and their dependencies. Here, we describe how the distributed execution framework Copernicus allows the expression of such algorithms in generic workflows: dataflow programs. Because dataflow algorithms explicitly state the dependencies of each constituent part, algorithms only need to be described on a conceptual level, after which the execution is maximally parallel. The fully automated execution facilitates the optimization of these algorithms with adaptive sampling, where undersampled regions are automatically detected and targeted without user intervention. We show how several such algorithms can be formulated for computational chemistry problems, and how they are executed efficiently with many loosely coupled simulations using either distributed or parallel resources with Copernicus.
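
The dataflow principle described in this abstract, where each task declares its dependencies and everything whose inputs are ready runs concurrently, can be sketched with the Python standard library. This is not the Copernicus API: `run_dataflow`, the task names, and the thread-pool executor are assumptions chosen for illustration, and `graphlib` requires Python 3.9 or later.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dataflow(tasks, deps, max_workers=4):
    """Run tasks as soon as all of their declared dependencies are done.

    tasks: {name: callable(inputs) -> result}, where inputs maps each
           upstream task name to its result.
    deps:  {name: [upstream task names]}; tasks without an entry have none.
    """
    sorter = TopologicalSorter({name: deps.get(name, []) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    results, running = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose inputs are all available.
            for name in sorter.get_ready():
                inputs = {d: results[d] for d in deps.get(name, [])}
                running[pool.submit(tasks[name], inputs)] = name
            # Block until at least one running task finishes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                results[name] = future.result()
                sorter.done(name)  # may unlock downstream tasks
    return results


if __name__ == "__main__":
    # Two independent "simulations" feed one combining step.
    tasks = {
        "sim_a": lambda _: 1.0,
        "sim_b": lambda _: 3.0,
        "combine": lambda inp: (inp["sim_a"] + inp["sim_b"]) / 2,
    }
    print(run_dataflow(tasks, {"combine": ["sim_a", "sim_b"]}))
```

Because the scheduler only consults the declared dependencies, `sim_a` and `sim_b` run concurrently without the workflow author writing any parallel code, which is the property the abstract attributes to dataflow programs.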

  3. Parallel algorithm for multiscale atomistic/continuum simulations using LAMMPS

    NASA Astrophysics Data System (ADS)

    Pavia, F.; Curtin, W. A.

    2015-07-01

    Deformation and fracture processes in engineering materials often require simultaneous descriptions over a range of length and time scales, with each scale using a different computational technique. Here we present a high-performance parallel 3D computing framework for executing large multiscale studies that couple an atomic domain, modeled using molecular dynamics, and a continuum domain, modeled using explicit finite elements. We use the robust Coupled Atomistic/Discrete-Dislocation (CADD) displacement-coupling method, but without the transfer of dislocations between atoms and continuum. The main purpose of the work is to provide a multiscale implementation within an existing large-scale parallel molecular dynamics code (LAMMPS) that enables use of all the tools associated with this popular open-source code, while extending CADD-type coupling to 3D. Validation of the implementation includes the demonstration of (i) stability in finite-temperature dynamics using Langevin dynamics, (ii) elimination of wave reflections due to large dynamic events occurring in the MD region, and (iii) the absence of spurious forces acting on dislocations due to the MD/FE coupling, for dislocations further than 10 Å from the coupling boundary. A first non-trivial example application of dislocation glide and bowing around obstacles is shown, for dislocation lengths of ∼50 nm using fewer than 1 000 000 atoms but reproducing results of extremely large atomistic simulations at much lower computational cost.

  4. Parallelizing quantum circuit synthesis

    NASA Astrophysics Data System (ADS)

    Di Matteo, Olivia; Mosca, Michele

    2016-03-01

    Quantum circuit synthesis is the process in which an arbitrary unitary operation is decomposed into a sequence of gates from a universal set, typically one which a quantum computer can implement both efficiently and fault-tolerantly. As physical implementations of quantum computers improve, the need is growing for tools that can effectively synthesize components of the circuits and algorithms they will run. Existing algorithms for exact, multi-qubit circuit synthesis scale exponentially in the number of qubits and circuit depth, leaving synthesis intractable for circuits on more than a handful of qubits. Even modest improvements in circuit synthesis procedures may lead to significant advances, pushing forward the boundaries of not only the size of solvable circuit synthesis problems, but also of what can be realized physically as a result of having more efficient circuits. We present a method for quantum circuit synthesis using deterministic walks. Also termed pseudorandom walks, these are walks in which, once a starting point is chosen, the path is completely determined. We apply our method to construct a parallel framework for circuit synthesis, and implement one such version performing optimal T-count synthesis over the Clifford+T gate set. We use our software to present examples where parallelization offers a significant speedup on the runtime, as well as to directly confirm that the 4-qubit 1-bit full adder has optimal T-count 7 and T-depth 3.

  5. Parallelization of the TRIGRS model for rainfall-induced landslides using the message passing interface

    USGS Publications Warehouse

    Alvioli, M.; Baum, R.L.

    2016-01-01

    We describe a parallel implementation of TRIGRS, the Transient Rainfall Infiltration and Grid-Based Regional Slope-Stability Model for the timing and distribution of rainfall-induced shallow landslides. We have parallelized the four time-demanding execution modes of TRIGRS, namely both the saturated and unsaturated model with finite and infinite soil depth options, within the Message Passing Interface framework. In addition to new features of the code, we outline details of the parallel implementation and show the performance gain with respect to the serial code. Results are obtained both on commercial hardware and on a high-performance multi-node machine, showing the different limits of applicability of the new code. We also discuss the implications for the application of the model on large-scale areas and as a tool for real-time landslide hazard monitoring.

  6. DasPy – Open Source Multivariate Land Data Assimilation Framework with High Performance Computing

    NASA Astrophysics Data System (ADS)

    Han, Xujun; Li, Xin; Montzka, Carsten; Kollet, Stefan; Vereecken, Harry; Hendricks Franssen, Harrie-Jan

    2015-04-01

    Data assimilation has become a popular method to integrate observations from multiple sources with land surface models to improve predictions of the water and energy cycles of the soil-vegetation-atmosphere continuum. In recent years, several land data assimilation systems have been developed in different research agencies. Because of limited software availability or adaptability, these systems are not easy to apply for the purpose of multivariate land data assimilation research. Multivariate data assimilation refers to the simultaneous assimilation of observation data for multiple model state variables into a simulation model. Our main motivation was to develop an open source multivariate land data assimilation framework (DasPy), which is implemented in the Python scripting language mixed with C++ and Fortran. This system has been evaluated in several soil moisture, L-band brightness temperature and land surface temperature assimilation studies. The implementation also allows parameter estimation (soil properties and/or leaf area index) on the basis of the joint state and parameter estimation approach. LETKF (Local Ensemble Transform Kalman Filter) is implemented as the main data assimilation algorithm, and uncertainties in the data assimilation can be represented by perturbed atmospheric forcings, perturbed soil and vegetation properties and model initial conditions. The CLM4.5 (Community Land Model) was integrated as the model operator. The CMEM (Community Microwave Emission Modelling Platform), COSMIC (COsmic-ray Soil Moisture Interaction Code) and the two source formulation were integrated as observation operators for assimilation of L-band passive microwave, cosmic-ray soil moisture probe and land surface temperature measurements, respectively. DasPy is parallelized using the hybrid MPI (Message Passing Interface) and OpenMP (Open Multi-Processing) techniques. All the input and output data flow is organized efficiently using the commonly used NetCDF file format. Online 1D and 2D visualization of data assimilation results is also implemented to facilitate the post-simulation analysis. In summary, DasPy is a ready-to-use open source parallel multivariate land data assimilation framework.

  7. PARLO: PArallel Run-Time Layout Optimization for Scientific Data Explorations with Heterogeneous Access Pattern

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gong, Zhenhuan; Boyuka, David; Zou, X

    The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their run-time environments. The growing gap is exacerbated by exploratory, data-intensive analytics, such as querying simulation data with multivariate, spatio-temporal constraints, which induces heterogeneous access patterns that stress the performance of the underlying storage system. Previous work addresses data layout and indexing techniques to improve query performance for a single access pattern, which is not sufficient for complex analytics jobs. We present PARLO, a parallel run-time layout optimization framework, to achieve multi-level data layout optimization for scientific applications at run-time before data is written to storage. The layout schemes optimize for heterogeneous access patterns with user-specified priorities. PARLO is integrated with ADIOS, a high-performance parallel I/O middleware for large-scale HPC applications, to achieve user-transparent, light-weight layout optimization for scientific datasets. It offers simple XML-based configuration for users to achieve flexible layout optimization without the need to modify or recompile application codes. Experiments show that PARLO improves performance by 2 to 26 times for queries with heterogeneous access patterns compared to state-of-the-art scientific database management systems. Compared to traditional post-processing approaches, its underlying run-time layout optimization achieves a 56% savings in processing time and a reduction in storage overhead of up to 50%. PARLO also exhibits a low run-time resource requirement, while also limiting the performance impact on running applications to a reasonable level.

  8. Construction and comparison of parallel implicit kinetic solvers in three spatial dimensions

    NASA Astrophysics Data System (ADS)

    Titarev, Vladimir; Dumbser, Michael; Utyuzhnikov, Sergey

    2014-01-01

    The paper is devoted to the further development and systematic performance evaluation of a recent deterministic framework Nesvetay-3D for modelling three-dimensional rarefied gas flows. Firstly, a review of the existing discretization and parallelization strategies for solving numerically the Boltzmann kinetic equation with various model collision integrals is carried out. Secondly, a new parallelization strategy for the implicit time evolution method is implemented which improves scaling on large CPU clusters. Accuracy and scalability of the methods are demonstrated on a pressure-driven rarefied gas flow through a finite-length circular pipe as well as an external supersonic flow over a three-dimensional re-entry geometry of complicated aerodynamic shape.

  9. On some methods for improving time of reachability sets computation for the dynamic system control problem

    NASA Astrophysics Data System (ADS)

    Zimovets, Artem; Matviychuk, Alexander; Ushakov, Vladimir

    2016-12-01

    The paper presents two approaches to reducing the time needed to compute reachability sets. The first approach uses different data structures for storing the reachability sets in computer memory for calculation in single-threaded mode. The second approach is based on parallel algorithms that operate on the data structures from the first approach. Within the framework of this paper, a parallel algorithm for approximate reachability set calculation on a computer with SMP architecture is proposed. The results of numerical modelling are presented in tables that demonstrate the high efficiency of parallel computing technology and show how computing time depends on the data structure used.

  10. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers

    DOE PAGES

    Abraham, Mark James; Murtola, Teemu; Schulz, Roland; ...

    2015-07-15

    GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights through several new and enhanced parallelization algorithms. These work on every level: SIMD registers inside cores, multithreading, heterogeneous CPU–GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. Finally, the latest best-in-class compressed trajectory storage format is supported.

  11. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abraham, Mark James; Murtola, Teemu; Schulz, Roland

    GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights through several new and enhanced parallelization algorithms. These work on every level: SIMD registers inside cores, multithreading, heterogeneous CPU–GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. Finally, the latest best-in-class compressed trajectory storage format is supported.

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boman, Erik G.

    This LDRD project was a campus exec fellowship to fund (in part) Donald Nguyen’s PhD research at UT-Austin. His work has focused on parallel programming models, and scheduling irregular algorithms on shared-memory systems using the Galois framework. Galois provides a simple but powerful way for users and applications to automatically obtain good parallel performance using certain supported data containers. The naïve user can write serial code, while advanced users can optimize performance using advanced features, such as specifying the scheduling policy. Galois was used to parallelize two sparse matrix reordering schemes: RCM and Sloan. Such reordering is important in high-performance computing to obtain better data locality and thus reduce run times.

  13. Design of a dataway processor for a parallel image signal processing system

    NASA Astrophysics Data System (ADS)

    Nomura, Mitsuru; Fujii, Tetsuro; Ono, Sadayasu

    1995-04-01

    Recently, demands for high-speed signal processing have been increasing, especially in the fields of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called 'dataway processor' designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link is 8 bits wide and operates in full-duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel. Therefore, sufficient throughput is available for high-speed digital video signals. The processor is designed in a top-down fashion using a CAD system called 'PARTHENON.' The hardware is fabricated using 0.5-micrometer CMOS technology and comprises about 200 K gates.

  14. Search asymmetries: parallel processing of uncertain sensory information.

    PubMed

    Vincent, Benjamin T

    2011-08-01

    What is the mechanism underlying search phenomena such as search asymmetry? Two-stage models such as Feature Integration Theory and Guided Search propose parallel pre-attentive processing followed by serial post-attentive processing. They claim search asymmetry effects are indicative of finding pairs of features, one processed in parallel, the other in serial. An alternative proposal is that a 1-stage parallel process is responsible, and search asymmetries occur when one stimulus has greater internal uncertainty associated with it than another. While the latter account is simpler, only a few studies have set out to empirically test its quantitative predictions, and many researchers still subscribe to the 2-stage account. This paper examines three separate parallel models (Bayesian optimal observer, max rule, and a heuristic decision rule). All three parallel models can account for search asymmetry effects and I conclude that either people can optimally utilise the uncertain sensory data available to them, or are able to select heuristic decision rules which approximate optimal performance. Copyright © 2011 Elsevier Ltd. All rights reserved.

  15. Layout Study and Application of Mobile App Recommendation Approach Based On Spark Streaming Framework

    NASA Astrophysics Data System (ADS)

    Wang, H. T.; Chen, T. T.; Yan, C.; Pan, H.

    2018-05-01

    For the recommendation of mobile phone apps, this work combines the weighted Slope One algorithm with an item-based collaborative filtering algorithm to further improve on the cold-start, data-matrix sparseness, and other problems of traditional collaborative filtering. The recommendation algorithm is parallelized on the Spark platform, and the Spark Streaming real-time computing framework is introduced to improve the timeliness of app recommendations.
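
The weighted Slope One algorithm named in this abstract can be sketched on a single machine as follows. This is a generic textbook formulation, not the authors' Spark implementation; the function names `fit_slope_one` and `predict` and the toy ratings are assumptions for illustration.

```python
from collections import defaultdict


def fit_slope_one(ratings):
    """Compute average rating deviations between item pairs.

    ratings: {user: {item: rating}}
    Returns (dev, count), where dev[j][i] is the mean of (r_j - r_i) over
    users who rated both items and count[j][i] is the number of such users.
    """
    diff_sum = defaultdict(lambda: defaultdict(float))
    count = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for j, r_j in user_ratings.items():
            for i, r_i in user_ratings.items():
                if i != j:
                    diff_sum[j][i] += r_j - r_i
                    count[j][i] += 1
    dev = {j: {i: diff_sum[j][i] / count[j][i] for i in diff_sum[j]}
           for j in diff_sum}
    return dev, count


def predict(user_ratings, target, dev, count):
    """Weighted Slope One prediction of `target` for one user.

    Each item i the user has rated contributes (dev[target][i] + r_i),
    weighted by how many users co-rated i and target. Returns None when
    no co-rated item exists (the cold-start case).
    """
    numerator, denominator = 0.0, 0
    for i, r_i in user_ratings.items():
        if i == target or i not in dev.get(target, {}):
            continue
        weight = count[target][i]
        numerator += (dev[target][i] + r_i) * weight
        denominator += weight
    return numerator / denominator if denominator else None


if __name__ == "__main__":
    # Hypothetical app ratings on a 1-5 scale.
    ratings = {
        "alice": {"maps": 5, "chat": 3, "game": 2},
        "bob": {"maps": 3, "chat": 4},
        "carol": {"chat": 2, "game": 5},
    }
    dev, count = fit_slope_one(ratings)
    print(predict(ratings["carol"], "maps", dev, count))
```

The pairwise deviation sums are simple additive aggregates over users, which is what makes the fitting step straightforward to distribute across partitions of the rating data.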

  16. LSD: Large Survey Database framework

    NASA Astrophysics Data System (ADS)

    Juric, Mario

    2012-09-01

    The Large Survey Database (LSD) is a Python framework and DBMS for distributed storage, cross-matching and querying of large survey catalogs (>10^9 rows, >1 TB). The primary driver behind its development is the analysis of Pan-STARRS PS1 data. It is specifically optimized for fast queries and parallel sweeps of positionally and temporally indexed datasets. It transparently scales to more than 10^2 nodes, and can be made to function in "shared nothing" architectures.

  17. Forces and stress in second order Møller-Plesset perturbation theory for condensed phase systems within the resolution-of-identity Gaussian and plane waves approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Del Ben, Mauro, E-mail: mauro.delben@chem.uzh.ch; Hutter, Jürg, E-mail: hutter@chem.uzh.ch; VandeVondele, Joost, E-mail: Joost.VandeVondele@mat.ethz.ch

    The forces acting on the atoms as well as the stress tensor are crucial ingredients for calculating the structural and dynamical properties of systems in the condensed phase. Here, these derivatives of the total energy are evaluated for the second-order Møller-Plesset perturbation energy (MP2) in the framework of the resolution of identity Gaussian and plane waves method, in a way that is fully consistent with how the total energy is computed. This consistency is non-trivial, given the different ways employed to compute Coulomb, exchange, and canonical four center integrals, and allows, for example, for energy conserving dynamics in various ensembles. Based on this formalism, a massively parallel algorithm has been developed for finite and extended systems. The designed parallel algorithm displays, with respect to the system size, cubic, quartic, and quintic requirements, respectively, for the memory, communication, and computation. All these requirements are reduced with an increasing number of processes, and the measured performance shows excellent parallel scalability and efficiency up to thousands of nodes. Additionally, the computationally more demanding quintic scaling steps can be accelerated by employing graphics processing units (GPUs), showing, for large systems, a gain of almost a factor of two compared to the standard central processing unit-only case. In this way, the evaluation of the derivatives of the RI-MP2 energy can be performed within a few minutes for systems containing hundreds of atoms and thousands of basis functions. With good time to solution, the implementation thus opens the possibility to perform molecular dynamics (MD) simulations in various ensembles (microcanonical ensemble and isobaric-isothermal ensemble) at the MP2 level of theory. Geometry optimization, full cell relaxation, and energy conserving MD simulations have been performed for a variety of molecular crystals including NH3, CO2, formic acid, and benzene.

  18. 77 FR 47573 - Approval and Promulgation of Implementation Plans; Mississippi; 110(a)(2)(E)(ii) Infrastructure...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-08-09

    ... Mississippi Department of Environmental Quality (MDEQ), on July 13, 2012, for parallel processing. This... of Contents I. What is parallel processing? II. Background III. What elements are required under... Executive Order Reviews I. What is parallel processing? Consistent with EPA regulations found at 40 CFR Part...

  19. Double Take: Parallel Processing by the Cerebral Hemispheres Reduces Attentional Blink

    ERIC Educational Resources Information Center

    Scalf, Paige E.; Banich, Marie T.; Kramer, Arthur F.; Narechania, Kunjan; Simon, Clarissa D.

    2007-01-01

    Recent data have shown that parallel processing by the cerebral hemispheres can expand the capacity of visual working memory for spatial locations (J. F. Delvenne, 2005) and attentional tracking (G. A. Alvarez & P. Cavanagh, 2005). Evidence that parallel processing by the cerebral hemispheres can improve item identification has remained elusive.…

  20. SciSpark's SRDD : A Scientific Resilient Distributed Dataset for Multidimensional Data

    NASA Astrophysics Data System (ADS)

    Palamuttam, R. S.; Wilson, B. D.; Mogrovejo, R. M.; Whitehall, K. D.; Mattmann, C. A.; McGibbney, L. J.; Ramirez, P.

    2015-12-01

    Remote sensing data and climate model output are multi-dimensional arrays of massive sizes locked away in heterogeneous file formats (HDF5/4, NetCDF 3/4) and metadata models (HDF-EOS, CF), making it difficult to perform multi-stage, iterative science processing since each stage requires writing and reading data to and from disk. We have developed SciSpark, a robust Big Data framework that extends Apache Spark for scaling scientific computations. Apache Spark improves the map-reduce implementation in Apache Hadoop for parallel computing on a cluster by emphasizing in-memory computation, "spilling" to disk only as needed, and relying on lazy evaluation. Central to Spark is the Resilient Distributed Dataset (RDD), an in-memory distributed data structure that extends the functional paradigm provided by the Scala programming language. However, RDDs are ideal for tabular or unstructured data, and not for highly dimensional data. The SciSpark project introduces the Scientific Resilient Distributed Dataset (sRDD), a distributed-computing array structure which supports iterative scientific algorithms for multidimensional data. SciSpark processes data stored in NetCDF and HDF files by partitioning them across time or space and distributing the partitions among a cluster of compute nodes. We show usability and extensibility of SciSpark by implementing distributed algorithms for geospatial operations on large collections of multi-dimensional grids. In particular we address the problem of scaling an automated method for finding Mesoscale Convective Complexes. SciSpark provides a tensor interface to support the pluggability of different matrix libraries. We evaluate the performance of various matrix libraries, such as Nd4j and Breeze, in distributed pipelines. We detail the architecture and design of SciSpark, our efforts to integrate climate science algorithms, parallel ingest and partitioning (sharding) of A-Train satellite observations from model grids. These solutions are encompassed in SciSpark, an open-source software framework for distributed computing on scientific data.
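
    The partition-and-distribute idea in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SciSpark's actual API: the function names `partition_by_time`, `partial_sum`, and `distributed_mean` are hypothetical, the grid values are made up, and a local thread pool stands in for a cluster of compute nodes.

```python
from concurrent.futures import ThreadPoolExecutor

def partition_by_time(grid, n_parts):
    """Split a list of 2D time slices into at most n_parts contiguous chunks."""
    size = -(-len(grid) // n_parts)  # ceiling division
    return [grid[i:i + size] for i in range(0, len(grid), size)]

def partial_sum(chunk):
    """Map step: sum every cell in a chunk and count the cells."""
    total = sum(v for tslice in chunk for row in tslice for v in row)
    count = sum(len(row) for tslice in chunk for row in tslice)
    return total, count

def distributed_mean(grid, n_parts=2):
    """Reduce step: combine per-partition partial results into a global mean."""
    chunks = partition_by_time(grid, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(partial_sum, chunks))
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

# Four time steps of a 2x2 grid (made-up values, not real satellite data).
grid = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(4)]
mean_value = distributed_mean(grid, n_parts=2)
```

    Each partition only returns a small (sum, count) pair, so the combining step stays cheap no matter how large the individual chunks are.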

  1. Mapping SOA Artefacts onto an Enterprise Reference Architecture Framework

    NASA Astrophysics Data System (ADS)

    Noran, Ovidiu

    Currently, there is still no common agreement on the service-oriented architecture (SOA) definition, or the types and meaning of the artefacts involved in the creation and maintenance of an SOA. Furthermore, the SOA image shift from an infrastructure solution to a business-wide change project may have promoted a perception that SOA is a parallel initiative, a competitor and perhaps a successor of enterprise architecture (EA). This chapter attempts to map several typical SOA artefacts onto an enterprise reference framework commonly used in EA. This is done in order to show that the EA framework can express and structure most of the SOA artefacts and therefore, a framework for SOA could in fact be derived from an EA framework with the ensuing SOA-EA integration benefits.

  2. On the costs of parallel processing in dual-task performance: The case of lexical processing in word production.

    PubMed

    Paucke, Madlen; Oppermann, Frank; Koch, Iring; Jescheniak, Jörg D

    2015-12-01

    Previous dual-task picture-naming studies suggest that lexical processes require capacity-limited processes and prevent other tasks from being carried out in parallel. However, studies involving the processing of multiple pictures suggest that parallel lexical processing is possible. The present study investigated the specific costs that may arise when such parallel processing occurs. We used a novel dual-task paradigm by presenting 2 visual objects associated with different tasks and manipulating between-task similarity. With high similarity, a picture-naming task (T1) was combined with a phoneme-decision task (T2), so that lexical processes were shared across tasks. With low similarity, picture-naming was combined with a size-decision T2 (nonshared lexical processes). In Experiment 1, we found that a manipulation of lexical processes (lexical frequency of T1 object name) showed an additive propagation with low between-task similarity and an overadditive propagation with high between-task similarity. Experiment 2 replicated this differential forward propagation of the lexical effect and showed that it disappeared with longer stimulus onset asynchronies. Moreover, both experiments showed backward crosstalk, indexed as worse T1 performance with high between-task similarity compared with low similarity. Together, these findings suggest that conditions of high between-task similarity can lead to parallel lexical processing in both tasks, which, however, does not result in benefits but rather in extra performance costs. These costs can be attributed to crosstalk based on the dual-task binding problem arising from parallel processing. Hence, the present study reveals that capacity-limited lexical processing can run in parallel across dual tasks but only at the expense of extraordinarily high costs.

  3. Graphical Representation of Parallel Algorithmic Processes

    DTIC Science & Technology

    1990-12-01

    interface with the AAARF main process. The source code for the AAARF class-common library is in the common subdirectory and consists of the following files... for public release; distribution unlimited AFIT/GCE/ENG/90D-07 Graphical Representation of Parallel Algorithmic Processes THESIS Presented to the...goal of this study is to develop an algorithm animation facility for parallel processes executing on different architectures, from multiprocessor

  4. Parallel tools GUI framework-DOE SBIR phase I final technical report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Galarowicz, James

    2013-12-05

    Many parallel performance, profiling, and debugging tools require a graphical way of displaying the very large datasets typically gathered from high performance computing (HPC) applications. Most tool projects create their graphical user interfaces (GUI) from scratch, many times spending their project resources on simply redeveloping commonly used infrastructure. Our goal was to create a multiplatform GUI framework, based on Nokia/Digia's popular Qt libraries, which will specifically address the needs of these parallel tools. The Parallel Tools GUI Framework (PTGF) uses a plugin architecture facilitating rapid GUI development and reduced development costs for new and existing tool projects by allowing the reuse of many common GUI elements, called "widgets." Widgets created include 2D data visualizations, a source code viewer with syntax highlighting, and integrated help and welcome screens. Application programming interface (API) design was focused on minimizing the time to getting a functional tool working. Having a standard, unified, and user-friendly interface which operates on multiple platforms will benefit HPC application developers by reducing training time and allowing users to move between tools rapidly during a single session. However, Argo Navis Technologies LLC will not be submitting a DOE SBIR Phase II proposal and commercialization plan for the PTGF project. Our preliminary estimates for gross income over the next several years were based upon initial customer interest and income generated by similar projects. Unfortunately, as we further assessed the market during Phase I, we grew to realize that there was not enough demand to warrant such a large investment. While we do find that the project is worth our continued investment of time and money, we do not think it worthy of the DOE's investment at this time. We are grateful that the DOE has afforded us the opportunity to make this assessment, and come to this conclusion.

  5. AnRAD: A Neuromorphic Anomaly Detection Framework for Massive Concurrent Data Streams.

    PubMed

    Chen, Qiuwen; Luley, Ryan; Wu, Qing; Bishop, Morgan; Linderman, Richard W; Qiu, Qinru

    2018-05-01

    The evolution of high performance computing technologies has enabled the large-scale implementation of neuromorphic models and pushed the research in computational intelligence into a new era. Among the machine learning applications, unsupervised detection of anomalous streams is especially challenging due to the requirements of detection accuracy and real-time performance. Designing a computing framework that harnesses the growing computing power of the multicore systems while maintaining high sensitivity and specificity to the anomalies is an urgent research topic. In this paper, we propose anomaly recognition and detection (AnRAD), a bioinspired detection framework that performs probabilistic inferences. We analyze the feature dependency and develop a self-structuring method that learns an efficient confabulation network using unlabeled data. This network is capable of fast incremental learning, which continuously refines the knowledge base using streaming data. Compared with several existing anomaly detection approaches, our method provides competitive detection quality. Furthermore, we exploit the massive parallel structure of the AnRAD framework. Our implementations of the detection algorithm on the graphics processing unit and the Xeon Phi coprocessor both obtain substantial speedups over the sequential implementation on a general-purpose microprocessor. The framework provides real-time service to concurrent data streams within diversified knowledge contexts, and can be applied to large problems with multiple local patterns. Experimental results demonstrate high computing performance and memory efficiency. For vehicle behavior detection, the framework is able to monitor up to 16000 vehicles (data streams) and their interactions in real time with a single commodity coprocessor, and uses less than 0.2 ms for one testing subject. Finally, the detection network is ported to our spiking neural network simulator to show the potential of adapting to the emerging neuromorphic architectures.

  6. A Distributed Computing Framework for Real-Time Detection of Stress and of Its Propagation in a Team.

    PubMed

    Pandey, Parul; Lee, Eun Kyung; Pompili, Dario

    2016-11-01

    Stress is one of the key factors that impact the quality of our daily life: from the productivity and efficiency in the production processes to the ability of (civilian and military) individuals in making rational decisions. Also, stress can propagate from one individual to others working in close proximity or toward a common goal, e.g., in a military operation or workforce. Real-time assessment of the stress of individuals alone is, however, not sufficient, as understanding its source and direction in which it propagates in a group of people is equally, if not more, important. A continuous near real-time in situ personal stress monitoring system to quantify the level of stress of individuals and its direction of propagation in a team is envisioned. However, stress monitoring of an individual via his/her mobile device may not always be possible for extended periods of time due to limited battery capacity of these devices. To overcome this challenge, a novel distributed mobile computing framework is proposed to organize the resources in the vicinity and form a mobile device cloud that enables offloading of computation tasks in stress detection algorithm from resource constrained devices (low residual battery, limited CPU cycles) to resource rich devices. Our framework also supports computing parallelization and workflows, defining how the data and tasks divided/assigned among the entities of the framework are designed. The direction of propagation and magnitude of influence of stress in a group of individuals are studied by applying real-time, in situ analysis of Granger Causality. Tangible benefits (in terms of energy expenditure and execution time) of the proposed framework in comparison to a centralized framework are presented via thorough simulations and real experiments.

  7. unmarked: An R package for fitting hierarchical models of wildlife occurrence and abundance

    USGS Publications Warehouse

    Fiske, Ian J.; Chandler, Richard B.

    2011-01-01

    Ecological research uses data collection techniques that are prone to substantial and unique types of measurement error to address scientific questions about species abundance and distribution. These data collection schemes include a number of survey methods in which unmarked individuals are counted, or determined to be present, at spatially referenced sites. Examples include site occupancy sampling, repeated counts, distance sampling, removal sampling, and double observer sampling. To appropriately analyze these data, hierarchical models have been developed to separately model explanatory variables of both a latent abundance or occurrence process and a conditional detection process. Because these models have a straightforward interpretation paralleling mechanisms under which the data arose, they have recently gained immense popularity. The common hierarchical structure of these models is well-suited for a unified modeling interface. The R package unmarked provides such a unified modeling framework, including tools for data exploration, model fitting, model criticism, post-hoc analysis, and model comparison.

  8. Aerodynamic Design of Complex Configurations Using Cartesian Methods and CAD Geometry

    NASA Technical Reports Server (NTRS)

    Nemec, Marian; Aftosmis, Michael J.; Pulliam, Thomas H.

    2003-01-01

    The objective for this paper is to present the development of an optimization capability for the Cartesian inviscid-flow analysis package of Aftosmis et al. We evaluate and characterize the following modules within the new optimization framework: (1) A component-based geometry parameterization approach using a CAD solid representation and the CAPRI interface. (2) The use of Cartesian methods in the development of automated optimization tools. (3) Optimization techniques using a genetic algorithm and a gradient-based algorithm. The discussion and investigations focus on several real world problems of the optimization process. We examine the architectural issues associated with the deployment of a CAD-based design approach in a heterogeneous parallel computing environment that contains both CAD workstations and dedicated compute nodes. In addition, we study the influence of noise on the performance of optimization techniques, and the overall efficiency of the optimization process for aerodynamic design of complex three-dimensional configurations.

  9. Analysis of the tobacco industry's interference in the enforcement of health warnings on tobacco products in Brazil.

    PubMed

    Perez, Cristina de Abreu; Silva, Vera Luiza da Costa E; Bialous, Stella Aguinaga

    2017-10-19

    This article aims to analyze the relationship between the Brazilian government's adoption of a regulatory measure with a strong impact on the population and the opposition by vested interest groups. The methodology involves the analysis of official documents on the enforcement of health warnings on tobacco products sold in Brazil. In parallel, a search was conducted for publicly available tobacco industry documents resulting from lawsuits, with the aim of identifying the industry's reactions to this process. The findings suggest that various government acts were affected by direct interference from the tobacco industry. In some cases the interventions were explicit and in others they were indirect or difficult to identify. In light of the study's theoretical framework, the article provides original information on the Brazilian process that can be useful for government policymakers in the strategic identification of tobacco control policies.

  10. Stanislavsky's system as an enactive guide to embodied cognition?

    NASA Astrophysics Data System (ADS)

    Clare, Ysabel

    2017-01-01

    This paper presents a model of the structure of subjective experience derived from the work of Konstantin Stanislavsky, and demonstrates its usefulness as a functional framework of enacted cognitive embodiment by using it to articulate his approach to the process of acting. Research into Stanislavsky's training exercises reveals that they evoke a spatial adpositional conceptualisation of experience. When reflected back onto the practice from which it emerges, this situates the choices made by actors as contributing towards the construction of a stable attention field with which they enter into relationship during performance. It is suggested that the resulting template might clarify conceptual distinctions between practices at the unconscious level, and a brief illustrative comparison between Stanislavsky's and Meisner's practices is essayed. A parallel is drawn throughout with the basic principles of embodied cognition, and correlations found with aspects of Dynamic Field Theory and Wilson's notions of "on-" and "off-line" processing.

  11. Motivating women and men to take protective action against rape: examining direct and indirect persuasive fear appeals.

    PubMed

    Morrison, Kelly

    2005-01-01

    This article examines the effectiveness of persuasive fear appeals in motivating women to enroll in self-defense classes to take protective action against rape. Witte's extended parallel process model is used as a framework to examine the relations between perceived invulnerability, perceived fear, and fear control processes. Because women may perceive invulnerability to rape, persuasive fear appeals targeted toward them may be ineffective in achieving attitude, intention, and behavioral change toward protecting themselves. One possible solution is to persuade men to talk with women about whom they care. Results indicated that women did not perceive invulnerability to rape, and although there was no differential impact between high- and low-threat messages, women did report positive intention and behaviors in response to direct fear appeals. Moreover, men reported positive intention and behaviors in response to indirect fear appeals.

  12. Developmental trajectories during adolescence in males and females: a cross-species understanding of underlying brain changes

    PubMed Central

    Brenhouse, Heather C.; Andersen, Susan L.

    2011-01-01

    Adolescence is a transitional period between childhood and adulthood that encompasses vast changes within brain systems that parallel some, but not all, behavioral changes. Elevations in emotional reactivity and reward processing follow an inverted U shape in terms of onset and remission, with the peak occurring during adolescence. However, cognitive processing follows a more linear course of development. This review will focus on changes within key structures and will highlight the relationships between brain changes and behavior, with evidence spanning from functional magnetic resonance imaging (fMRI) in humans to molecular studies of receptor and signaling factors in animals. Adolescent changes in neuronal substrates will be used to understand how typical and atypical behaviors arise during adolescence. We draw upon clinical and preclinical studies to provide a neural framework for defining adolescence and its role in the transition to adulthood. PMID:21600919

  13. A framework for collaboration in public transit systems

    DOT National Transportation Integrated Search

    1997-05-01

    The 494 transportation corridor stretches eight miles and connects residential suburbs with major commercial areas, including the Mall of America and the Minneapolis-St. Paul International Airport. The corridor includes I-494 as well as parallel loca...

  14. A framework for parallelized efficient global optimization with application to vehicle crashworthiness optimization

    NASA Astrophysics Data System (ADS)

    Hamza, Karim; Shalaby, Mohamed

    2014-09-01

    This article presents a framework for simulation-based design optimization of computationally expensive problems, where economizing the generation of sample designs is highly desirable. One popular approach for such problems is efficient global optimization (EGO), where an initial set of design samples is used to construct a kriging model, which is then used to generate new 'infill' sample designs at regions of the search space where there is high expectancy of improvement. This article attempts to address one of the limitations of EGO, where generation of infill samples can become a difficult optimization problem in its own right, as well as allow the generation of multiple samples at a time in order to take advantage of parallel computing in the evaluation of the new samples. The proposed approach is tested on analytical functions, and then applied to the vehicle crashworthiness design of a full Geo Metro model undergoing frontal crash conditions.
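
    The "expectancy of improvement" used to place infill samples can be illustrated with the standard expected-improvement criterion for a minimization problem. The sketch below is an assumption-laden illustration, not the article's method: the surrogate predictions are made up, and `pick_infill` is a hypothetical, deliberately naive batch rule that simply takes the top-k candidates, whereas the article proposes its own way of generating multiple samples at a time.

```python
import math

def expected_improvement(mu, sigma, f_best):
    """Expected improvement over f_best for a minimization problem, given a
    surrogate prediction with mean mu and standard deviation sigma."""
    if sigma <= 0.0:
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (f_best - mu) * cdf + sigma * pdf

def pick_infill(candidates, f_best, k=2):
    """Hypothetical batch rule: names of the k candidates with the highest EI."""
    ranked = sorted(candidates,
                    key=lambda c: expected_improvement(c[1], c[2], f_best),
                    reverse=True)
    return [name for name, _, _ in ranked[:k]]

# (name, predicted mean, predicted std): made-up surrogate predictions.
candidates = [("a", 1.0, 0.0), ("b", 0.9, 0.1), ("c", 1.2, 0.5), ("d", 2.0, 0.1)]
batch = pick_infill(candidates, f_best=1.0, k=2)
```

    Candidate "c" has a worse predicted mean than "b" but a larger uncertainty, so the criterion still ranks it highly; this trade-off between exploiting good predictions and exploring uncertain regions is what makes the infill search a nontrivial optimization problem.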

  15. A purely Lagrangian method for simulating the shallow water equations on a sphere using smooth particle hydrodynamics

    NASA Astrophysics Data System (ADS)

    Capecelatro, Jesse

    2018-03-01

    It has long been suggested that a purely Lagrangian solution to global-scale atmospheric/oceanic flows can potentially outperform traditional Eulerian schemes. Meanwhile, a demonstration of a scalable and practical framework remains elusive. Motivated by recent progress in particle-based methods when applied to convection dominated flows, this work presents a fully Lagrangian method for solving the inviscid shallow water equations on a rotating sphere in a smooth particle hydrodynamics framework. To avoid singularities at the poles, the governing equations are solved in Cartesian coordinates, augmented with a Lagrange multiplier to ensure that fluid particles are constrained to the surface of the sphere. An underlying grid in spherical coordinates is used to facilitate efficient neighbor detection and parallelization. The method is applied to a suite of canonical test cases, and conservation, accuracy, and parallel performance are assessed.

  16. Research on retailer data clustering algorithm based on Spark

    NASA Astrophysics Data System (ADS)

    Huang, Qiuman; Zhou, Feng

    2017-03-01

    Big data analysis is a hot topic in the IT field now. Spark is a high-reliability and high-performance distributed parallel computing framework for big data sets. K-means algorithm is one of the classical partition methods in clustering algorithm. In this paper, we study the k-means clustering algorithm on Spark. Firstly, the principle of the algorithm is analyzed, and then the clustering analysis is carried out on the supermarket customers through the experiment to find out the different shopping patterns. At the same time, this paper proposes the parallelization of k-means algorithm and the distributed computing framework of Spark, and gives the concrete design scheme and implementation scheme. This paper uses the two-year sales data of a supermarket to validate the proposed clustering algorithm and achieve the goal of subdividing customers, and then analyze the clustering results to help enterprises to take different marketing strategies for different customer groups to improve sales performance.
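
    A common way to parallelize k-means is to compute per-partition sums and counts for each cluster and then merge them into new centers. The plain-Python sketch below illustrates that pattern under stated assumptions; it is not the paper's design, the function names and data points are made up, and on Spark the per-partition step would typically run inside the cluster (for example via an RDD's `mapPartitions`) rather than in a local loop.

```python
def assign(point, centers):
    """Map step: index of the nearest center (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])))

def partition_stats(points, centers):
    """Per-partition partial sums and counts for each cluster."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for pt in points:
        k = assign(pt, centers)
        counts[k] += 1
        sums[k] = [s + p for s, p in zip(sums[k], pt)]
    return sums, counts

def kmeans_step(partitions, centers):
    """Reduce step: merge partition statistics into updated centers."""
    dim = len(centers[0])
    total = [[0.0] * dim for _ in centers]
    n = [0] * len(centers)
    for sums, counts in (partition_stats(p, centers) for p in partitions):
        for k in range(len(centers)):
            total[k] = [a + b for a, b in zip(total[k], sums[k])]
            n[k] += counts[k]
    # Keep the old center if a cluster received no points.
    return [[v / n[k] for v in total[k]] if n[k] else list(centers[k])
            for k in range(len(centers))]

# Two partitions of made-up 2D customer features and two initial centers.
partitions = [[(1.0, 1.0), (9.0, 9.0)], [(2.0, 2.0), (10.0, 10.0)]]
centers = [[0.0, 0.0], [8.0, 8.0]]
for _ in range(5):
    centers = kmeans_step(partitions, centers)
```

    Only the small per-cluster sums and counts cross partition boundaries in each iteration, which is why the algorithm maps well onto a distributed framework.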

  17. Reusable Component Model Development Approach for Parallel and Distributed Simulation

    PubMed Central

    Zhu, Feng; Yao, Yiping; Chen, Huilong; Yao, Feng

    2014-01-01

    Model reuse is a key issue to be resolved in parallel and distributed simulation at present. However, component models built by different domain experts usually have diversiform interfaces, couple tightly, and bind with simulation platforms closely. As a result, they are difficult to be reused across different simulation platforms and applications. To address the problem, this paper first proposed a reusable component model framework. Based on this framework, then our reusable model development approach is elaborated, which contains two phases: (1) domain experts create simulation computational modules observing three principles to achieve their independence; (2) model developer encapsulates these simulation computational modules with six standard service interfaces to improve their reusability. The case study of a radar model indicates that the model developed using our approach has good reusability and it is easy to be used in different simulation platforms and applications. PMID:24729751

  18. Amplitude analysis of four-body decays using a massively-parallel fitting framework

    NASA Astrophysics Data System (ADS)

    Hasse, C.; Albrecht, J.; Alves, A. A., Jr.; d'Argent, P.; Evans, T. D.; Rademacker, J.; Sokoloff, M. D.

    2017-10-01

    The GooFit Framework is designed to perform maximum-likelihood fits for arbitrary functions on various parallel back ends, for example a GPU. We present an extension to GooFit which adds the functionality to perform time-dependent amplitude analyses of pseudoscalar mesons decaying into four pseudoscalar final states. Benchmarks of this functionality show a significant performance increase when utilizing a GPU compared to a CPU. Furthermore, this extension is employed to study the sensitivity on the $D^0$-$\bar{D}^0$ mixing parameters x and y in a time-dependent amplitude analysis of the decay $D^0 \to K^+\pi^-\pi^+\pi^-$. Studying a sample of 50 000 events and setting the central values to the world average of x = (0.49 ± 0.15)% and y = (0.61 ± 0.08)%, the statistical sensitivities of x and y are determined to be σ(x) = 0.019 % and σ(y) = 0.019 %.

  19. al3c: high-performance software for parameter inference using Approximate Bayesian Computation.

    PubMed

    Stram, Alexander H; Marjoram, Paul; Chen, Gary K

    2015-11-01

    The development of Approximate Bayesian Computation (ABC) algorithms for parameter inference which are both computationally efficient and scalable in parallel computing environments is an important area of research. Monte Carlo rejection sampling, a fundamental component of ABC algorithms, is trivial to distribute over multiple processors but is inherently inefficient. While development of algorithms such as ABC Sequential Monte Carlo (ABC-SMC) helps address the inherent inefficiencies of rejection sampling, such approaches are not as easily scaled on multiple processors. As a result, current Bayesian inference software offerings that use ABC-SMC lack the ability to scale in parallel computing environments. We present al3c, a C++ framework for implementing ABC-SMC in parallel. By requiring only that users define essential functions such as the simulation model and prior distribution function, al3c abstracts the user from both the complexities of parallel programming and the details of the ABC-SMC algorithm. By using the al3c framework, the user is able to scale the ABC-SMC algorithm in parallel computing environments for his or her specific application, with minimal programming overhead. al3c is offered as a static binary for Linux and OS X computing environments. The user completes an XML configuration file and C++ plug-in template for the specific application, which are used by al3c to obtain the desired results. Users can download the static binaries, source code, reference documentation and examples (including those in this article) by visiting https://github.com/ahstram/al3c. Contact: astram@usc.edu. Supplementary data are available at Bioinformatics online.
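
    The rejection-sampling building block mentioned in this abstract can be sketched as follows. This is a toy illustration in Python and has nothing to do with al3c's C++ plug-in interface: the model (a normal distribution with unknown mean), the prior, the tolerance, and all function names are assumptions made for the example.

```python
import random

def abc_rejection(observed, simulate, sample_prior, distance, tolerance, n_accept, rng):
    """Draw from the prior, simulate, and keep parameters whose simulated
    summary lies within `tolerance` of the observed summary."""
    accepted = []
    while len(accepted) < n_accept:
        theta = sample_prior(rng)
        if distance(simulate(theta, rng), observed) <= tolerance:
            accepted.append(theta)
    return accepted

# Toy problem (made up): infer the mean of a normal with known unit variance.
def sample_prior(rng):
    return rng.uniform(-5.0, 5.0)

def simulate(theta, rng):
    draws = [rng.gauss(theta, 1.0) for _ in range(20)]
    return sum(draws) / len(draws)  # summary statistic: sample mean

def distance(a, b):
    return abs(a - b)

rng = random.Random(0)
posterior = abc_rejection(2.0, simulate, sample_prior, distance, 0.2, 200, rng)
estimate = sum(posterior) / len(posterior)
```

    Every proposal is independent of the others, so separate workers could each run this loop with their own seed and pool the accepted values. Most proposals are rejected, however, which is the inefficiency that sequential schemes such as ABC-SMC aim to reduce.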

  20. Fast parallel algorithm for slicing STL based on pipeline

    NASA Astrophysics Data System (ADS)

    Ma, Xulong; Lin, Feng; Yao, Bo

    2016-05-01

    In the field of additive manufacturing, current research on data processing focuses mainly on slicing large STL files or complicated CAD models. A parallel algorithm offers great advantages for improving efficiency and reducing slicing time. However, traditional algorithms cannot make full use of multi-core CPU hardware resources. In this paper, a fast parallel algorithm is presented to speed up data processing. A pipeline mode is adopted to design the parallel algorithm, and the complexity of the pipeline algorithm is analyzed theoretically. To evaluate the performance of the new algorithm, the effects of the number of threads and the number of layers are investigated in a series of experiments. The experimental results show that the number of threads and the number of layers are two significant factors in the speedup ratio. Speedup increases with the number of threads, in good agreement with Amdahl's law, and also increases with the number of layers, in agreement with Gustafson's law. The new algorithm uses topological information to compute contours with a parallel method of speedup. Another parallel algorithm based on data parallel is used in experiments to show that pipeline parallel mode is more efficient. Finally, a case study shows a suspending performance of the new parallel algorithm. Compared with the serial slicing algorithm, the new pipeline parallel algorithm can make full use of the multi-core CPU hardware and accelerate the slicing process; compared with the data-parallel slicing algorithm, it achieves a much higher speedup ratio and efficiency.
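
    For reference, the two scaling laws named in this abstract can be written as short Python functions. The parallel fraction of 0.9 used below is an illustrative assumption and is not a figure reported in the paper.

```python
def amdahl_speedup(parallel_fraction, n_threads):
    """Amdahl's law: fixed problem size; the serial part bounds the speedup."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_threads)

def gustafson_speedup(parallel_fraction, n_threads):
    """Gustafson's law: the problem size grows with the number of workers."""
    return (1.0 - parallel_fraction) + parallel_fraction * n_threads

# Assumed parallel fraction of 0.9 (illustrative only, not from the paper).
for n in (1, 2, 4, 8):
    print(n, round(amdahl_speedup(0.9, n), 2), round(gustafson_speedup(0.9, n), 2))
```

    Under Amdahl's law the speedup saturates as threads are added to a fixed workload, while under Gustafson's law it keeps growing when the workload (here, the number of layers) grows with the available parallelism.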

  1. Improving NASA's Multiscale Modeling Framework for Tropical Cyclone Climate Study

    NASA Technical Reports Server (NTRS)

    Shen, Bo-Wen; Nelson, Bron; Cheung, Samson; Tao, Wei-Kuo

    2013-01-01

    One of the current challenges in tropical cyclone (TC) research is how to improve our understanding of TC interannual variability and the impact of climate change on TCs. Recent advances in global modeling, visualization, and supercomputing technologies at NASA show potential for such studies. In this article, the authors discuss recent scalability improvement to the multiscale modeling framework (MMF) that makes it feasible to perform long-term TC-resolving simulations. The MMF consists of the finite-volume general circulation model (fvGCM), supplemented by a copy of the Goddard cumulus ensemble model (GCE) at each of the fvGCM grid points, giving 13,104 GCE copies. The original fvGCM implementation has a 1D data decomposition; the revised MMF implementation retains the 1D decomposition for most of the code, but uses a 2D decomposition for the massive copies of GCEs. Because the vast majority of computation time in the MMF is spent computing the GCEs, this approach can achieve excellent speedup without incurring the cost of modifying the entire code. Intelligent process mapping allows differing numbers of processes to be assigned to each domain for load balancing. The revised parallel implementation shows highly promising scalability, obtaining a nearly 80-fold speedup by increasing the number of cores from 30 to 3,335.
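
    The scaling figure quoted above can be put in perspective with a quick parallel-efficiency calculation. The sketch below uses 80 as a round stand-in for the "nearly 80-fold" speedup, so the result is an approximation, and the helper name `relative_efficiency` is hypothetical.

```python
def relative_efficiency(speedup, base_cores, scaled_cores):
    """Speedup divided by the ideal speedup implied by the core-count ratio."""
    ideal = scaled_cores / base_cores
    return speedup / ideal

# 80 is a round stand-in for the "nearly 80-fold" figure in the abstract.
eff = relative_efficiency(80.0, 30, 3335)
print(f"ideal speedup: {3335 / 30:.1f}, relative efficiency: {eff:.2f}")
```

    Going from 30 to 3,335 cores is roughly a 111-fold increase, so an approximately 80-fold speedup corresponds to a relative efficiency of about 0.72.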

  2. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E

    2014-02-11

    Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  3. Distributed computing feasibility in a non-dedicated homogeneous distributed system

    NASA Technical Reports Server (NTRS)

    Leutenegger, Scott T.; Sun, Xian-He

    1993-01-01

    The low cost and availability of clusters of workstations have led researchers to re-explore distributed computing using independent workstations. This approach may provide better cost/performance than tightly coupled multiprocessors. In practice, this approach often utilizes wasted cycles to run parallel jobs. The feasibility of such a non-dedicated parallel processing environment, assuming workstation processes have preemptive priority over parallel tasks, is addressed. An analytical model is developed to predict parallel job response times. Our model provides insight into how significantly workstation owner interference degrades parallel program performance. A new term, task ratio, which relates the parallel task demand to the mean service demand of nonparallel workstation processes, is introduced. It is proposed that task ratio is a useful metric for determining how large the demand of a parallel application must be in order to make efficient use of a non-dedicated distributed system.

  4. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    DOEpatents

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-08-12

    Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  5. Session on High Speed Civil Transport Design Capability Using MDO and High Performance Computing

    NASA Technical Reports Server (NTRS)

    Rehder, Joe

    2000-01-01

    Since the inception of CAS in 1992, NASA Langley has been conducting research into applying multidisciplinary optimization (MDO) and high performance computing toward reducing aircraft design cycle time. The focus of this research has been the development of a series of computational frameworks and associated applications that increased in capability, complexity, and performance over time. The culmination of this effort is an automated high-fidelity analysis capability for a high speed civil transport (HSCT) vehicle installed on a network of heterogeneous computers with a computational framework built using Common Object Request Broker Architecture (CORBA) and Java. The main focus of the research in the early years was the development of the Framework for Interdisciplinary Design Optimization (FIDO) and associated HSCT applications. While the FIDO effort was eventually halted, work continued on HSCT applications of ever increasing complexity. The current application, HSCT4.0, employs high fidelity CFD and FEM analysis codes. For each analysis cycle, the vehicle geometry and computational grids are updated using new values for design variables. Processes for aeroelastic trim, loads convergence, displacement transfer, stress and buckling, and performance have been developed. In all, a total of 70 processes are integrated in the analysis framework. Many of the key processes include automatic differentiation capabilities to provide sensitivity information that can be used in optimization. A software engineering process was developed to manage this large project. Defining the interactions among 70 processes turned out to be an enormous, but essential, task. A formal requirements document was prepared that defined data flow among processes and subprocesses. A design document was then developed that translated the requirements into actual software design. 
A validation program was defined and implemented to ensure that codes integrated into the framework produced the same results as their standalone counterparts. Finally, a Commercial Off the Shelf (COTS) configuration management system was used to organize the software development. A computational environment, CJOpt, based on the Common Object Request Broker Architecture (CORBA) and the Java programming language, has been developed as a framework for multidisciplinary analysis and optimization. The environment exploits the parallelism inherent in the application and distributes the constituent disciplines on machines best suited to their needs. In CJOpt, a discipline code is "wrapped" as an object. An interface to the object identifies the functionality (services) provided by the discipline, defined in Interface Definition Language (IDL) and implemented using Java. The results of using the HSCT4.0 capability are described. A summary of lessons learned is also presented. The use of some of the processes, codes, and techniques by industry is highlighted. The application of the methodology developed in this research to other aircraft is described. Finally, we show how the experience gained is being applied to entirely new vehicles, such as the Reusable Space Transportation System. Additional information is contained in the original.

  6. NDL-v2.0: A new version of the numerical differentiation library for parallel architectures

    NASA Astrophysics Data System (ADS)

    Hadjidoukas, P. E.; Angelikopoulos, P.; Voglis, C.; Papageorgiou, D. G.; Lagaris, I. E.

    2014-07-01

    We present a new version of the numerical differentiation library (NDL) used for the numerical estimation of first and second order partial derivatives of a function by finite differencing. In this version we have restructured the serial implementation of the code so as to achieve optimal task-based parallelization. The pure shared-memory parallelization of the library has been based on the lightweight OpenMP tasking model allowing for the full extraction of the available parallelism and efficient scheduling of multiple concurrent library calls. On multicore clusters, parallelism is exploited by means of TORC, an MPI-based multi-threaded tasking library. The new MPI implementation of NDL provides optimal performance in terms of function calls and, furthermore, supports asynchronous execution of multiple library calls within legacy MPI programs. In addition, a Python interface has been implemented for all cases, exporting the functionality of our library to sequential Python codes. Catalog identifier: AEDG_v2_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDG_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 63036 No. of bytes in distributed program, including test data, etc.: 801872 Distribution format: tar.gz Programming language: ANSI Fortran-77, ANSI C, Python. Computer: Distributed systems (clusters), shared memory systems. Operating system: Linux, Unix. Has the code been vectorized or parallelized?: Yes. RAM: The library uses O(N) internal storage, N being the dimension of the problem. It can use up to O(N^2) internal storage for Hessian calculations, if a task throttling factor has not been set by the user. Classification: 4.9, 4.14, 6.5. Catalog identifier of previous version: AEDG_v1_0 Journal reference of previous version: Comput. Phys. Comm. 
180(2009)1404 Does the new version supersede the previous version?: Yes Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, and sensitivity analysis. For a large number of scientific and engineering applications, the underlying functions correspond to simulation codes for which analytical estimation of derivatives is difficult or almost impossible. A parallel implementation that exploits systems with multiple CPUs is very important for large scale and computationally expensive problems. Solution method: Finite differencing is used with a carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries. Reasons for new version: The updated version was motivated by our endeavors to extend a parallel Bayesian uncertainty quantification framework [1], by incorporating higher order derivative information as in most state-of-the-art stochastic simulation methods such as Stochastic Newton MCMC [2] and Riemannian Manifold Hamiltonian MC [3]. The function evaluations are simulations with significant time-to-solution, which also varies with the input parameters such as in [1, 4]. The runtime of the N-body-type of problem changes considerably with the introduction of a longer cut-off between the bodies. In the first version of the library, the OpenMP-parallel subroutines spawn a new team of threads and distribute the function evaluations with a PARALLEL DO directive. This limits the functionality of the library as multiple concurrent calls require nested parallelism support from the OpenMP environment. Therefore, either their function evaluations will be serialized or processor oversubscription is likely to occur due to the increased number of OpenMP threads. 
In addition, the Hessian calculations include two explicit parallel regions that compute first the diagonal and then the off-diagonal elements of the array. Due to the barrier between the two regions, the parallelism of the calculations is not fully exploited. These issues have been addressed in the new version by first restructuring the serial code and then running the function evaluations in parallel using OpenMP tasks. Although the MPI-parallel implementation of the first version is capable of fully exploiting the task parallelism of the PNDL routines, it does not utilize the caching mechanism of the serial code and, therefore, performs some redundant function evaluations in the Hessian and Jacobian calculations. This can lead to: (a) higher execution times if the number of available processors is lower than the total number of tasks, and (b) significant energy consumption due to wasted processor cycles. Overcoming these drawbacks, which become critical as the time of a single function evaluation increases, was the primary goal of this new version. Due to the code restructure, the MPI-parallel implementation (and the OpenMP-parallel in accordance) avoids redundant calls, providing optimal performance in terms of the number of function evaluations. Another limitation of the library was that the library subroutines were collective and synchronous calls. In the new version, each MPI process can issue any number of subroutines for asynchronous execution. We introduce two library calls that provide global and local task synchronizations, similarly to the BARRIER and TASKWAIT directives of OpenMP. The new MPI-implementation is based on TORC, a new tasking library for multicore clusters [5-7]. TORC improves the portability of the software, as it relies exclusively on the POSIX-Threads and MPI programming interfaces. 
It allows MPI processes to utilize multiple worker threads, offering a hybrid programming and execution environment similar to MPI+OpenMP, in a completely transparent way. Finally, to further improve the usability of our software, a Python interface has been implemented on top of both the OpenMP and MPI versions of the library. This allows sequential Python codes to exploit shared and distributed memory systems. Summary of revisions: The revised code improves the performance of both parallel (OpenMP and MPI) implementations. The functionality and the user-interface of the MPI-parallel version have been extended to support the asynchronous execution of multiple PNDL calls, issued by one or multiple MPI processes. A new underlying tasking library increases portability and allows MPI processes to have multiple worker threads. For both implementations, an interface to the Python programming language has been added. Restrictions: The library uses only double precision arithmetic. The MPI implementation assumes the homogeneity of the execution environment provided by the operating system. Specifically, the processes of a single MPI application must have identical address space and a user function resides at the same virtual address. In addition, address space layout randomization should not be used for the application. Unusual features: The software takes into account bound constraints, in the sense that only feasible points are used to evaluate the derivatives, and given the level of the desired accuracy, the proper formula is automatically employed. Running time: Running time depends on the function's complexity. The test run took 23 ms for the serial distribution, 25 ms for the OpenMP with 2 threads, 53 ms and 1.01 s for the MPI parallel distribution using 2 threads and 2 processes respectively and yield-time for idle workers equal to 10 ms. References: [1] P. Angelikopoulos, C. Paradimitriou, P. 
Koumoutsakos, Bayesian uncertainty quantification and propagation in molecular dynamics simulations: a high performance computing framework, J. Chem. Phys 137 (14). [2] H.P. Flath, L.C. Wilcox, V. Akcelik, J. Hill, B. van Bloemen Waanders, O. Ghattas, Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations, SIAM J. Sci. Comput. 33 (1) (2011) 407-432. [3] M. Girolami, B. Calderhead, Riemann manifold Langevin and Hamiltonian Monte Carlo methods, J. R. Stat. Soc. Ser. B (Stat. Methodol.) 73 (2) (2011) 123-214. [4] P. Angelikopoulos, C. Paradimitriou, P. Koumoutsakos, Data driven, predictive molecular dynamics for nanoscale flow simulations under uncertainty, J. Phys. Chem. B 117 (47) (2013) 14808-14816. [5] P.E. Hadjidoukas, E. Lappas, V.V. Dimakopoulos, A runtime library for platform-independent task parallelism, in: PDP, IEEE, 2012, pp. 229-236. [6] C. Voglis, P.E. Hadjidoukas, D.G. Papageorgiou, I. Lagaris, A parallel hybrid optimization algorithm for fitting interatomic potentials, Appl. Soft Comput. 13 (12) (2013) 4481-4492. [7] P.E. Hadjidoukas, C. Voglis, V.V. Dimakopoulos, I. Lagaris, D.G. Papageorgiou, Supporting adaptive and irregular parallelism for non-linear numerical optimization, Appl. Math. Comput. 231 (2014) 544-559.
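
    The solution method in this record (finite differencing with a step that balances truncation and round-off error) can be sketched as follows. This is a minimal serial illustration, not NDL's code or API: the function name is hypothetical, and the step rule h = eps^(1/3) * max(1, |x_i|) is the standard textbook choice for central differences, assumed here rather than taken from the library.

```python
import sys


def central_gradient(f, x):
    """Estimate the gradient of f at x with central differences.

    The per-coordinate step is eps**(1/3) scaled by the coordinate's
    magnitude (textbook rule; assumed, not taken from NDL).
    """
    eps = sys.float_info.epsilon
    grad = []
    for i, xi in enumerate(x):
        h = eps ** (1.0 / 3.0) * max(1.0, abs(xi))
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] = xi + h
        x_minus[i] = xi - h
        # The two evaluations below are independent of every other
        # coordinate's evaluations, so they could be run as separate tasks.
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * h))
    return grad


if __name__ == "__main__":
    def quadratic(v):
        return v[0] ** 2 + 3.0 * v[0] * v[1]

    print(central_gradient(quadratic, [1.0, 2.0]))
```

    Each coordinate needs two function evaluations that do not depend on the others, which is the kind of task parallelism the record describes exploiting when a single evaluation is an expensive simulation.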

  7. A heterogeneous and parallel computing framework for high-resolution hydrodynamic simulations

    NASA Astrophysics Data System (ADS)

    Smith, Luke; Liang, Qiuhua

    2015-04-01

    Shock-capturing hydrodynamic models are now widely applied in the context of flood risk assessment and forecasting, accurately capturing the behaviour of surface water over ground and within rivers. Such models are generally explicit in their numerical basis, and can be computationally expensive; this has prohibited full use of high-resolution topographic data for complex urban environments, now easily obtainable through airborne altimetric surveys (LiDAR). As processor clock speed advances have stagnated in recent years, further computational performance gains are largely dependent on the use of parallel processing. Heterogeneous computing architectures (e.g. graphics processing units or compute accelerator cards) provide a cost-effective means of achieving high throughput in cases where the same calculation is performed with a large input dataset. In recent years this technique has been applied successfully for flood risk mapping, such as within the national surface water flood risk assessment for the United Kingdom. We present a flexible software framework for hydrodynamic simulations across multiple processors of different architectures, within multiple computer systems, enabled using OpenCL and Message Passing Interface (MPI) libraries. A finite-volume Godunov-type scheme is implemented using the HLLC approach to solving the Riemann problem, with optional extension to second-order accuracy in space and time using the MUSCL-Hancock approach. The framework is successfully applied on personal computers and a small cluster to provide considerable improvements in performance. The most significant performance gains were achieved across two servers, each containing four NVIDIA GPUs, with a mix of K20, M2075 and C2050 devices. Advantages are found with respect to decreased parametric sensitivity, and thus in reducing uncertainty, for a major fluvial flood within a large catchment during 2005 in Carlisle, England. 
Simulations for the three-day event could be performed on a 2 m grid within a few hours. In the context of a rapid pluvial flood event in Newcastle upon Tyne during 2012, the technique allows simulation of inundation for a 31 km² area of the city centre in less than an hour on a 2 m grid; however, further grid refinement is required to fully capture important smaller flow pathways. Good agreement between the model and observed inundation is achieved for a variety of dam failure, slow fluvial inundation, rapid pluvial inundation, and defence breach scenarios in the UK.

  8. A framework for investigating geographical variation in diseases, based on a study of Legionnaires' disease.

    PubMed

    Bhopal, R S

    1991-11-01

    Demonstration of geographical variations in disease can yield powerful insight into the disease pathway, particularly for environmentally acquired conditions, but only if the many problems of data interpretation can be solved. This paper presents the framework, methods and principles guiding a study of the geographical epidemiology of Legionnaires' Disease in Scotland. A case-list was constructed and disease incidence rates were calculated by geographical area; these showed variation. Five categories of explanation for the variation were identified: short-term fluctuations of incidence in time masquerading as differences by place; artefact; and differences in host-susceptibility, agent virulence, or environment. The methods used to study these explanations, excepting agent virulence, are described, with an emphasis on the use of previously existing data to test hypotheses. Examples include the use of mortality, census and hospital morbidity data to assess the artefact and host-susceptibility explanations; and the use of ratios of serology tests to disease to examine the differential testing hypothesis. The reasoning and process by which the environmental focus of the study was narrowed and the technique for relating the geographical pattern of disease to the putative source are outlined. This framework allows the researcher to plan for the parallel collection of the data necessary both to demonstrate geographical variation and to point to the likely explanation.

  9. Examining the concept of convenient collection: an application to extended producer responsibility and product stewardship frameworks.

    PubMed

    Wagner, Travis P

    2013-03-01

    Increasingly, Extended Producer Responsibility (EPR) and Product Stewardship (PS) frameworks are being adopted as a preferred policy approach to promote cost-effective diversion and recovery of post-consumer solid waste. Because the application of EPR/PS generally requires the creation of a separate and often parallel collection and/or management system, key to increasing the amount of waste recovered is maximizing the convenience of the collection system, and thereby consumer participation. Convenient collection is often mandated in EPR/PS laws; however, it is not defined. Convenience is a subjective construct, which makes it extremely difficult to define. However, based on a dissection of post-consumer collection efforts under a generic EPR/PS system, this paper identifies and examines five categories of convenience - knowledge requirements, proximity to a collection site, opportunity to drop off materials, the draw of the collection site, and the ease of the process - and the various factors of convenience within each of these categories. By using a simplified multiple criteria decision analysis, this paper proposes a performance matrix of criteria of convenience. Stakeholders can use this matrix to assist in the design, assessment, and/or implementation of a convenient post-consumer collection system under an EPR/PS framework. Copyright © 2012 Elsevier Ltd. All rights reserved.

  10. The fusion of large scale classified side-scan sonar image mosaics.

    PubMed

    Reed, Scott; Tena, Ruiz Ioseba; Capus, Chris; Petillot, Yvan

    2006-07-01

    This paper presents a unified framework for the creation of classified maps of the seafloor from sonar imagery. Significant challenges in photometric correction, classification, navigation and registration, and image fusion are addressed. The techniques described are directly applicable to a range of remote sensing problems. Recent advances in side-scan data correction are incorporated to compensate for the sonar beam pattern and motion of the acquisition platform. The corrected images are segmented using pixel-based textural features and standard classifiers. In parallel, the navigation of the sonar device is processed using Kalman filtering techniques. A simultaneous localization and mapping framework is adopted to improve the navigation accuracy and produce georeferenced mosaics of the segmented side-scan data. These are fused within a Markovian framework and two fusion models are presented. The first uses a voting scheme regularized by an isotropic Markov random field and is applicable when the reliability of each information source is unknown. The Markov model is also used to inpaint regions where no final classification decision can be reached using pixel level fusion. The second model formally introduces the reliability of each information source into a probabilistic model. Evaluation of the two models using both synthetic images and real data from a large scale survey shows significant quantitative and qualitative improvement using the fusion approach.
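
    The pixel-level voting step mentioned in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: it performs plain majority voting across co-registered label maps and marks ties as undecided, and it omits the isotropic Markov random field regularization and inpainting that the paper applies. The function name and the class labels in the usage example are hypothetical.

```python
from collections import Counter

UNDECIDED = None  # marker for pixels where the vote gives no clear winner


def vote_fusion(label_maps):
    """Fuse co-registered label maps (lists of rows) by per-pixel majority vote.

    A pixel with no votes, or with a tie for first place, is left UNDECIDED.
    No spatial (Markov random field) regularization is applied here.
    """
    rows = len(label_maps[0])
    cols = len(label_maps[0][0])
    fused = []
    for r in range(rows):
        fused_row = []
        for c in range(cols):
            votes = Counter(
                m[r][c] for m in label_maps if m[r][c] is not UNDECIDED
            )
            ranked = votes.most_common(2)
            tie = len(ranked) == 2 and ranked[0][1] == ranked[1][1]
            fused_row.append(UNDECIDED if not ranked or tie else ranked[0][0])
        fused.append(fused_row)
    return fused


if __name__ == "__main__":
    # Three hypothetical 1x2 classified mosaics of the same seafloor patch.
    maps = [
        [["sand", "rock"]],
        [["sand", "mud"]],
        [["rock", None]],
    ]
    print(vote_fusion(maps))
```

    Undecided pixels are the ones a spatial model would later have to fill in from their neighbours, which is the role the abstract assigns to the Markov model.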

  11. A Watershed Scale Life Cycle Assessment Framework for Hydrologic Design

    NASA Astrophysics Data System (ADS)

    Tavakol-Davani, H.; Tavakol-Davani, PhD, H.; Burian, S. J.

    2017-12-01

    Sustainable hydrologic design has recently received attention from researchers with different backgrounds, including hydrologists and sustainability experts. On one hand, hydrologists have been analyzing ways to achieve hydrologic goals through implementation of recent environmentally-friendly approaches, e.g. Green Infrastructure (GI) - without quantifying the life cycle environmental impacts of the infrastructure through the ISO Life Cycle Assessment (LCA) method. On the other hand, sustainability experts have been applying the LCA to study the life cycle impacts of water infrastructure - without considering the important hydrologic aspects through hydrologic and hydraulic (H&H) analysis. In fact, defining proper system elements for a watershed scale urban water sustainability study requires both H&H and LCA specialties, which reveals the necessity of performing an integrated, interdisciplinary study. Therefore, the present study developed a watershed scale coupled H&H-LCA framework to bring the hydrology and sustainability expertise together and help move the current vague definition of sustainable hydrologic design towards a globally standard concept. The proposed framework was employed to study GIs for an urban watershed in Toledo, OH. Lastly, uncertainties associated with the proposed method and parameters were analyzed through a robust Monte Carlo simulation using parallel processing. Results indicated the necessity of both hydrologic and LCA components in the design procedure in order to achieve sustainability.

  12. The influence of interdisciplinary collaboration on decision making: a framework to analyse stakeholder coalitions, evolution and learning in strategic delta planning

    NASA Astrophysics Data System (ADS)

    Vermoolen, Myrthe; Hermans, Leon

    2015-04-01

    The sustained development of urbanizing deltas requires that conflicting interests are reconciled, in an environment characterized by technical complexity and knowledge limitations. However, integrating ideas and establishing cooperation between actors with different backgrounds and roles still proves a challenge. Agreeing on strategic choices is difficult and implementation of agreed plans may lead to unanticipated and unintended outcomes. How can individual disciplinary perspectives come together and establish a broadly-supported and well-informed plan, the implementation of which contributes to sustainable delta development? The growing recognition of this need to bring together different stakeholders and different disciplinary perspectives runs parallel to a paradigm shift from 'hard' hydrological engineering to multi-functional and more 'soft' hydrological engineering in water management. As a result, there is now more attention for interdisciplinary collaboration that not only takes the physical characteristics of water systems into account, but also the interaction between physical and societal components of these systems. Thus, it is important to study interdisciplinary collaboration and how this influences decision-making. Our research looks into this connection, using a case in delta planning in the Netherlands, where there have been several (attempts for) integration of spatial planning and flood risk/ water management, e.g. in the case of the Dutch Delta Programme. This means that spatial designers and their designs play an important role in the strategic delta planning process as well, next to civil engineers, etc. This study explores the roles of stakeholders, experts and policy makers in interdisciplinary decision-making in dynamic delta planning processes, using theories and methods that focus on coalitions, learning and changes over time in policy and planning processes. 
This requires an expansion of the existing frameworks to study interdisciplinary collaboration. The question here is how to combine policy science frameworks (e.g. the Advocacy Coalition Framework) and social network methods (e.g. Social Network Analysis) with frameworks that allow a connection with the physical delta systems. This will result in a new framework for analysing interdisciplinary stakeholder coalitions, evolution and learning in strategic delta planning. The use of this framework will be illustrated with an example from strategic delta planning in the Dutch Southwest Delta. With this, we want to see how spatial planning and water management disciplines have combined into new policies for delta management in the Netherlands over the past 25 years.

  13. A review of aircraft turnaround operations and simulations

    NASA Astrophysics Data System (ADS)

    Schmidt, Michael

    2017-07-01

    The ground operational processes are the connecting element between aircraft en-route operations and airport infrastructure. An efficient aircraft turnaround is an essential component of airline success, especially for regional and short-haul operations. It is imperative that advancements in ground operations, specifically process reliability and passenger comfort, are developed while dealing with increasing passenger traffic in the coming years. This paper provides an introduction to aircraft ground operations focusing on the aircraft turnaround and passenger processes. Furthermore, key challenges for current aircraft operators, such as airport capacity constraints, schedule disruptions and the increasing cost pressure, are highlighted. A review of the conducted studies and conceptual work in this field shows pathways for potential process improvements. Promising approaches attempt to reduce apron traffic and parallelize passenger processes and taxiing. The application of boarding strategies and novel cabin layouts focusing on aisle, door and seat are options to shorten the boarding process inside the cabin. A summary of existing modeling and simulation frameworks gives an insight into state-of-the-art assessment capabilities for advanced concepts. They are the prerequisite to allow a holistic assessment during the early stages of the preliminary aircraft design process and to identify benefits and drawbacks for all involved stakeholders.

  14. Parallel software for lattice N = 4 supersymmetric Yang-Mills theory

    NASA Astrophysics Data System (ADS)

    Schaich, David; DeGrand, Thomas

    2015-05-01

    We present new parallel software, SUSY LATTICE, for lattice studies of four-dimensional N = 4 supersymmetric Yang-Mills theory with gauge group SU(N). The lattice action is constructed to exactly preserve a single supersymmetry charge at non-zero lattice spacing, up to additional potential terms included to stabilize numerical simulations. The software evolved from the MILC code for lattice QCD, and retains a similar large-scale framework despite the different target theory. Many routines are adapted from an existing serial code (Catterall and Joseph, 2012), which SUSY LATTICE supersedes. This paper provides an overview of the new parallel software, summarizing the lattice system, describing the applications that are currently provided and explaining their basic workflow for non-experts in lattice gauge theory. We discuss the parallel performance of the code, and highlight some notable aspects of the documentation for those interested in contributing to its future development.

  15. 3D Data Denoising via Nonlocal Means Filter by Using Parallel GPU Strategies

    PubMed Central

    Cuomo, Salvatore; De Michele, Pasquale; Piccialli, Francesco

    2014-01-01

    The Nonlocal Means (NLM) algorithm is widely considered a state-of-the-art denoising filter in many research fields. Its high computational complexity has led researchers to develop parallel programming approaches and to use massively parallel architectures such as GPUs. In recent years, GPU devices have made it possible to achieve reasonable running times by filtering 3D datasets slice by slice with a 2D NLM algorithm. In our approach we design and implement a fully 3D NonLocal Means parallel approach, adopting different algorithm mapping strategies on GPU architecture and multi-GPU framework, in order to demonstrate its high applicability and scalability. The experimental results we obtained encourage the use of our approach in a large spectrum of applicative scenarios such as magnetic resonance imaging (MRI) or video sequence denoising. PMID:25045397
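
    The core of the NLM filter named in this abstract can be shown with a small serial sketch. To stay short it works on a 1D signal rather than a 3D volume and runs on the CPU, so it is not the authors' GPU implementation; the Gaussian patch-similarity weight exp(-d^2 / h^2) is the commonly used form, and the function and parameter names are hypothetical.

```python
import math


def nlm_1d(signal, patch_radius=1, search_radius=5, h=0.5):
    """Denoise a 1D signal with a basic nonlocal means filter.

    Each output sample is a weighted average of the samples in its search
    window; a weight is large when the patches around the two samples look
    alike. Indices outside the signal are clamped to the nearest edge.
    """
    n = len(signal)

    def sample(k):
        return signal[min(max(k, 0), n - 1)]

    denoised = []
    for i in range(n):
        weight_sum = 0.0
        value_sum = 0.0
        lo = max(0, i - search_radius)
        hi = min(n, i + search_radius + 1)
        for j in range(lo, hi):
            # Squared distance between the patches centred at i and j.
            d2 = sum(
                (sample(i + k) - sample(j + k)) ** 2
                for k in range(-patch_radius, patch_radius + 1)
            )
            w = math.exp(-d2 / (h * h))
            weight_sum += w
            value_sum += w * signal[j]
        denoised.append(value_sum / weight_sum)  # weight_sum >= 1 (j == i)
    return denoised


if __name__ == "__main__":
    noisy_step = [0.1, -0.05, 0.02, 0.0, 1.03, 0.97, 1.01, 0.99]
    print(nlm_1d(noisy_step))
```

    Every output sample is computed independently of the others, which is why the algorithm maps naturally onto one GPU thread per voxel, and the nested patch comparison is the source of the computational cost the abstract refers to.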

  16. Parallel processing via a dual olfactory pathway in the honeybee.

    PubMed

    Brill, Martin F; Rosenbaum, Tobias; Reus, Isabelle; Kleineidam, Christoph J; Nawrot, Martin P; Rössler, Wolfgang

    2013-02-06

    In their natural environment, animals face complex and highly dynamic olfactory input. Thus vertebrates as well as invertebrates require fast and reliable processing of olfactory information. Parallel processing has been shown to improve processing speed and power in other sensory systems and is characterized by extraction of different stimulus parameters along parallel sensory information streams. Honeybees possess an elaborate olfactory system with unique neuronal architecture: a dual olfactory pathway comprising a medial projection-neuron (PN) antennal lobe (AL) protocerebral output tract (m-APT) and a lateral PN AL output tract (l-APT) connecting the olfactory lobes with higher-order brain centers. We asked whether this neuronal architecture serves parallel processing and employed a novel technique for simultaneous multiunit recordings from both tracts. The results revealed response profiles from a high number of PNs of both tracts to floral, pheromonal, and biologically relevant odor mixtures tested over multiple trials. PNs from both tracts responded to all tested odors, but with different characteristics indicating parallel processing of similar odors. Both PN tracts were activated by widely overlapping response profiles, which is a requirement for parallel processing. The l-APT PNs had broad response profiles suggesting generalized coding properties, whereas the responses of m-APT PNs were comparatively weaker and less frequent, indicating higher odor specificity. Comparison of response latencies within and across tracts revealed odor-dependent latencies. We suggest that parallel processing via the honeybee dual olfactory pathway provides enhanced odor processing capabilities serving sophisticated odor perception and olfactory demands associated with a complex olfactory world of this social insect.

  17. A multimodal parallel architecture: A cognitive framework for multimodal interactions.

    PubMed

    Cohn, Neil

    2016-01-01

    Human communication is naturally multimodal, and substantial focus has examined the semantic correspondences in speech-gesture and text-image relationships. However, visual narratives, like those in comics, provide an interesting challenge to multimodal communication because the words and/or images can guide the overall meaning, and both modalities can appear in complicated "grammatical" sequences: sentences use a syntactic structure and sequential images use a narrative structure. These dual structures create complexity beyond that typically addressed by theories of multimodality where only a single form uses combinatorial structure, and also pose challenges for models of the linguistic system that focus on single modalities. This paper outlines a broad theoretical framework for multimodal interactions by expanding on Jackendoff's (2002) parallel architecture for language. Multimodal interactions are characterized in terms of their component cognitive structures: whether a particular modality (verbal, bodily, visual) is present, whether it uses a grammatical structure (syntax, narrative), and whether it "dominates" the semantics of the overall expression. Altogether, this approach integrates multimodal interactions into an existing framework of language and cognition, and characterizes interactions between varying complexity in the verbal, bodily, and graphic domains. The resulting theoretical model presents an expanded consideration of the boundaries of the "linguistic" system and its involvement in multimodal interactions, with a framework that can benefit research on corpus analyses, experimentation, and the educational benefits of multimodality. Copyright © 2015.

  18. Visual analysis of inter-process communication for large-scale parallel computing.

    PubMed

    Muelder, Chris; Gygi, Francois; Ma, Kwan-Liu

    2009-01-01

    In serial computation, program profiling is often helpful for optimization of key sections of code. When moving to parallel computation, not only does the code execution need to be considered but also the communication between the different processes, which can induce delays that are detrimental to performance. As the number of processes increases, so does the impact of the communication delays on performance. For large-scale parallel applications, it is critical to understand how the communication impacts performance in order to make the code more efficient. There are several tools available for visualizing program execution and communications on parallel systems. These tools generally provide either views which statistically summarize the entire program execution or process-centric views. However, process-centric visualizations do not scale well as the number of processes gets very large. In particular, the most common representation of parallel processes is a Gantt chart with a row for each process. As the number of processes increases, these charts can become difficult to work with and can even exceed screen resolution. We propose a new visualization approach that affords more scalability and then demonstrate it on systems running with up to 16,384 processes.

  19. Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

    NASA Technical Reports Server (NTRS)

    Hsieh, Shang-Hsien

    1993-01-01

    The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.

  20. When the lowest energy does not induce native structures: parallel minimization of multi-energy values by hybridizing searching intelligences.

    PubMed

    Lü, Qiang; Xia, Xiao-Yan; Chen, Rong; Miao, Da-Jun; Chen, Sha-Sha; Quan, Li-Jun; Li, Hai-Ou

    2012-01-01

    Protein structure prediction (PSP), which is usually modeled as a computational optimization problem, remains one of the biggest challenges in computational biology. PSP encounters two difficult obstacles: the inaccurate energy function problem and the searching problem. Even if the lowest energy has been luckily found by the searching procedure, the correct protein structures are not guaranteed to be obtained. A general parallel metaheuristic approach is presented to tackle the above two problems. Multi-energy functions are employed to simultaneously guide the parallel searching threads. Searching trajectories are in fact controlled by the parameters of heuristic algorithms. The parallel approach allows the parameters to be perturbed while the searching threads are running in parallel, with each thread searching for the lowest energy value determined by an individual energy function. By hybridizing the intelligences of parallel ant colonies and Monte Carlo Metropolis search, this paper demonstrates an implementation of our parallel approach for PSP. Sixteen classical instances were tested to show that the parallel approach is competitive for solving the PSP problem. This parallel approach combines various sources of both searching intelligences and energy functions, and thus predicts protein conformations with good quality jointly determined by all the parallel searching threads and energy functions. It provides a framework to combine different searching intelligences embedded in heuristic algorithms. It also constructs a container to hybridize different not-so-accurate objective functions which are usually derived from the domain expertise.
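
    The idea of several search threads, each guided by its own energy function, can be illustrated with a minimal sketch. This is not the authors' implementation: it omits the ant-colony component and the run-time parameter perturbation, and the one-dimensional toy energy functions, the step size, the temperature, and the `consensus` step are all assumptions made for illustration. Only the Metropolis acceptance rule itself is standard.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def metropolis_search(energy, start, steps, temperature, seed):
    """Minimize `energy` from `start` using the Metropolis acceptance rule."""
    rng = random.Random(seed)  # private generator keeps each thread reproducible
    current, current_e = start, energy(start)
    best, best_e = current, current_e
    for _ in range(steps):
        candidate = current + rng.uniform(-0.5, 0.5)  # hypothetical move set
        candidate_e = energy(candidate)
        delta = candidate_e - current_e
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e


# Toy stand-ins for several "not-so-accurate" energy functions (assumed).
ENERGIES = [
    lambda x: (x - 1.0) ** 2,
    lambda x: abs(x - 1.2),
    lambda x: (x - 0.9) ** 2 + 0.1 * math.sin(5 * x),
]


def run_parallel(start=4.0, steps=2000, temperature=0.1):
    """Run one Metropolis thread per energy function."""
    with ThreadPoolExecutor(max_workers=len(ENERGIES)) as pool:
        futures = [
            pool.submit(metropolis_search, e, start, steps, temperature, seed)
            for seed, e in enumerate(ENERGIES)
        ]
        return [f.result() for f in futures]


def consensus(results):
    """Pick the candidate with the lowest summed energy (an assumed rule)."""
    return min((x for x, _ in results),
               key=lambda x: sum(e(x) for e in ENERGIES))


if __name__ == "__main__":
    results = run_parallel()
    print(results)
    print(consensus(results))
```

    Because each thread tracks the best state it has visited, its reported energy can never exceed the energy of the starting point, whichever energy function guides it.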
